Creating SECCOMP profiles for docker containers
June 13, 2024•681 words
Goal
Docker enables its default seccomp profile out of the box, but that profile still allows more than 300 syscalls, about 3/4 of the syscalls available on Linux.
We pay a hefty performance cost for enabling seccomp so we may as well get some serious protection from it!
The same is true for containerd/kubernetes, but GKE for example does not enable it by default outside of Autopilot.
Profiles can be created by hand, but that takes expertise few people possess. Fortunately a great tool exists, oci-seccomp-bpf-hook, but it is designed and documented to run with podman, not with docker and docker compose.
Getting oci-seccomp-bpf-hook to play nicely with docker compose.
Problems
Docker is vaguely OCI compliant, but does not allow running OCI hooks natively.
docker compose started supporting annotations recently, but was held back by a bug until very recently.
Solution (ubuntu 24.04)
Note that you do not need any of this to use a generated seccomp profile; it is only needed to generate a new one.
Building the tools
Some dependencies to install:
sudo apt install bpfcc-tools libseccomp-dev golang
Remove the docker-compose-v2 package.
git clone and build the following repos:
- https://github.com/containers/oci-seccomp-bpf-hook/
  - copy the oci-seccomp-bpf-hook binary to /usr/libexec/oci/hooks.d/oci-seccomp-bpf-hook
- https://github.com/awslabs/oci-add-hooks/
  - copy the oci-add-hooks binary to /usr/local/bin/
- https://github.com/docker/compose/
  - copy the docker-compose binary to /usr/local/libexec/docker/cli-plugins/
Configuration
Configure oci-seccomp-bpf-hook in oci-add-hooks (the path must point to wherever you copied the hook binary):
/etc/docker/oci-add-hooks.json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/local/libexec/oci/hooks.d/oci-seccomp-bpf-hook",
        "args": ["oci-seccomp-bpf-hook", "-s"]
      }
    ]
  }
}
Configure oci-add-hooks in docker:
/etc/docker/daemon.json
{
  "runtimes": {
    "oci-add-hook": {
      "path": "/usr/local/bin/oci-add-hooks",
      "runtimeArgs": [
        "--hook-config-path",
        "/etc/docker/oci-add-hooks.json",
        "--runtime-path",
        "/usr/sbin/runc"
      ]
    }
  }
}
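The two files above form a chain: daemon.json points at the oci-add-hooks binary, its hook config and the real runtime, and the hook config points at the hook binary. A wrong path anywhere in that chain is easy to miss, so here is a minimal Python sketch that lists the referenced files and reports the ones that do not exist. The function names (hook_binaries, runtime_files, missing, check_setup) are made up for this illustration and are not part of docker or oci-add-hooks; it only assumes the two JSON layouts shown above.

```python
import json
import os


def hook_binaries(hooks_config: dict) -> list:
    """Return the "path" of every hook listed in an oci-add-hooks config."""
    paths = []
    for stage_hooks in hooks_config.get("hooks", {}).values():
        for hook in stage_hooks:
            paths.append(hook["path"])
    return paths


def runtime_files(daemon_config: dict, runtime: str) -> list:
    """Return the runtime binary plus the values of its two path arguments."""
    entry = daemon_config["runtimes"][runtime]
    args = entry.get("runtimeArgs", [])
    files = [entry["path"]]
    for flag in ("--hook-config-path", "--runtime-path"):
        if flag in args:
            # The value of a flag is the element right after it.
            files.append(args[args.index(flag) + 1])
    return files


def missing(paths: list) -> list:
    """Return the paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


def check_setup(daemon_path="/etc/docker/daemon.json",
                hooks_path="/etc/docker/oci-add-hooks.json",
                runtime="oci-add-hook") -> list:
    """Load both config files and return every referenced file that is missing."""
    with open(daemon_path) as f:
        daemon = json.load(f)
    with open(hooks_path) as f:
        hooks = json.load(f)
    return missing(runtime_files(daemon, runtime) + hook_binaries(hooks))
```

Calling check_setup() on the docker host and printing the result should give an empty list when every path in both files resolves.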
Now restart the dockerd service. Everything should work as usual, with one extra runtime available, oci-add-hook:
$ sudo docker info | grep Runtime
Runtimes: io.containerd.runc.v2 oci-add-hook runc
Default Runtime: runc
Using our new powers
To run the hook, we need to choose oci-add-hook as the runtime for a particular service, and we need to annotate the service to tell oci-seccomp-bpf-hook how to behave.
Example, securing the traefik-forward-auth middleware
traefik-forward-auth is a traefik middleware that intercepts calls to an application and redirects the user to an OIDC endpoint, such as keycloak, if they are not authenticated. It turns traefik into an authenticating access proxy.
Surely it doesn't need access to more than 300 syscalls...
To instrument it, it is enough to add this to the service definition:
runtime: oci-add-hook
annotations:
- "io.containers.trace-syscall=of:/tmp/traefik-forward-auth.json;if:/etc/docker/default-seccomp.json"
The if: part is optional; see the documentation.
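If you instrument several services, typing the annotation by hand gets error-prone. The sketch below builds the string in the same of:...;if:... form used in the example. The helper name trace_annotation is hypothetical, and the check for absolute paths is an assumption about what the hook expects, so verify it against the hook's documentation.

```python
import posixpath
from typing import Optional

ANNOTATION_KEY = "io.containers.trace-syscall"


def trace_annotation(output_file: str, input_file: Optional[str] = None) -> str:
    """Build the compose annotation entry: key=of:<output>[;if:<input>]."""
    for path in (output_file, input_file):
        # Assumption: the hook wants absolute paths for both files.
        if path is not None and not posixpath.isabs(path):
            raise ValueError(f"expected an absolute path, got {path!r}")
    value = f"of:{output_file}"
    if input_file is not None:
        value += f";if:{input_file}"
    return f"{ANNOTATION_KEY}={value}"
```

For example, trace_annotation("/tmp/traefik-forward-auth.json", "/etc/docker/default-seccomp.json") reproduces the annotation line shown above.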
After re-creating the container, capture some real-world normal traffic, then stop the container. You can now remove the special runtime.
If all went well, you should obtain a nice /tmp/traefik-forward-auth.json file that looks like this (once beautified with jq):
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_AARCH64"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "bind",
        "brk",
        "capget",
        "capset",
        "chdir",
        "clone",
        "close",
        "connect",
        "dup3",
        "epoll_create1",
        "epoll_ctl",
        "epoll_pwait",
        "execve",
        "exit_group",
        "faccessat2",
        "fchdir",
        "fchown",
        "fcntl",
        "fstat",
        "fstatfs",
        "futex",
        "getcwd",
        "getdents64",
        "getpeername",
        "getpid",
        "getppid",
        "getrandom",
        "getsockname",
        "getsockopt",
        "gettid",
        "listen",
        "madvise",
        "mmap",
        "mount",
        "nanosleep",
        "newfstatat",
        "openat",
        "pipe2",
        "pivot_root",
        "prctl",
        "prlimit64",
        "read",
        "rt_sigaction",
        "rt_sigprocmask",
        "rt_sigreturn",
        "sched_getaffinity",
        "sched_yield",
        "setgid",
        "setgroups",
        "sethostname",
        "setsockopt",
        "setuid",
        "sigaltstack",
        "socket",
        "statfs",
        "tgkill",
        "umask",
        "umount2",
        "write"
      ],
      "action": "SCMP_ACT_ALLOW",
      "args": [],
      "comment": "",
      "includes": {},
      "excludes": {}
    },
    {
      "action": "SCMP_ACT_ALLOW",
      "args": [],
      "comment": "",
      "includes": {},
      "excludes": {}
    }
  ]
}
That is a grand total of 60 syscalls, down from the 300+ allowed by default: a reduction of more than 80% of the kernel's exposed attack surface, at no extra performance cost!
Of course, traefik-forward-auth is a middleware that should be quite simple. What about bigger applications? I profiled a NextCloud container: it needs 132 syscalls. Still a good reduction.
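To check numbers like these on your own profiles, a few lines of Python are enough. This is only a sketch: allowed_syscalls and reduction are names invented here, and the count is a plain tally of names under SCMP_ACT_ALLOW rules. It ignores argument filters and the includes/excludes conditions, so for a profile full of conditional rules (such as docker's default one) it gives an upper bound rather than an exact figure.

```python
import json


def load_profile(path: str) -> dict:
    """Read a seccomp profile from a JSON file."""
    with open(path) as f:
        return json.load(f)


def allowed_syscalls(profile: dict) -> set:
    """Collect the syscall names granted SCMP_ACT_ALLOW in a profile."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            # Rules without a "names" key (like the second rule in the
            # generated profile above) contribute nothing.
            names.update(rule.get("names", []))
    return names


def reduction(allowed: int, baseline: int) -> float:
    """Fraction of the baseline syscalls that are no longer allowed."""
    return 1 - allowed / baseline
```

With the figures from this post, reduction(60, 300) gives 0.8, and because the default profile allows more than 300 syscalls the real reduction is a bit higher.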
Putting it to the test
Once you have your json file, make it part of your deployment and enforce it with the security_opt service element in docker compose, for example:
security_opt:
- seccomp:traefik-forward-auth-seccomp.json
Then re-create your container. It should now be limited to the syscalls that were encountered during the profiling phase.