Creating SECCOMP profiles for docker containers
June 13, 2024
Goal
Docker's default seccomp profile is enabled out of the box, yet it still allows more than 300 syscalls, roughly three quarters of the syscalls available on Linux.
We pay a hefty performance cost for enabling seccomp, so we may as well get some serious protection from it!
The same is true for containerd/kubernetes, but GKE for example does not enable it by default outside of Autopilot.
Profiles can be written by hand, but that takes expertise few people possess. Fortunately a great tool exists, oci-seccomp-bpf-hook, but it is designed and documented to run with podman, not with docker and docker compose.
Getting oci-seccomp-bpf-hook to play nicely with docker compose.
Problems
Docker is vaguely OCI compliant, but it does not allow running OCI hooks natively.
docker compose only recently started supporting annotations, and a bug held that support back until very recently.
Solution (ubuntu 24.04)
Note that you do not need any of this to use a generated seccomp profile; it is only needed to generate a new one.
Building the tools
Some dependencies to install:
sudo apt install bpfcc-tools libseccomp-dev golang
Remove the docker-compose-v2 package.
git clone and build the following repos:
- https://github.com/containers/oci-seccomp-bpf-hook/
  - copy the oci-seccomp-bpf-hook binary to /usr/local/libexec/oci/hooks.d/oci-seccomp-bpf-hook
- https://github.com/awslabs/oci-add-hooks/
  - copy the oci-add-hooks binary to /usr/local/bin/
- https://github.com/docker/compose/
  - copy the docker-compose binary to /usr/local/libexec/docker/cli-plugins/
Configuration
Configure oci-seccomp-bpf-hook in oci-add-hooks:
/etc/docker/oci-add-hooks.json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/local/libexec/oci/hooks.d/oci-seccomp-bpf-hook",
        "args": ["oci-seccomp-bpf-hook", "-s"]
      }
    ]
  }
}
Configure oci-add-hooks in docker:
/etc/docker/daemon.json
{
  "runtimes": {
    "oci-add-hook": {
      "path": "/usr/local/bin/oci-add-hooks",
      "runtimeArgs": ["--hook-config-path",
        "/etc/docker/oci-add-hooks.json",
        "--runtime-path",
        "/usr/sbin/runc"]
    }
  }
}
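Before restarting the daemon, it can help to check that every binary the two files point at is actually installed, since a wrong path here is an easy mistake to make. The following is a minimal Python sketch, not part of any of the tools above: the helper names `load_json`, `referenced_binaries` and `missing_binaries` are made up for illustration, and it assumes the files have exactly the layout shown in this section.

```python
import json
import os


def load_json(path: str) -> dict:
    """Read one of the JSON config files from disk."""
    with open(path) as handle:
        return json.load(handle)


def referenced_binaries(hooks_cfg: dict, daemon_cfg: dict,
                        runtime: str = "oci-add-hook") -> list:
    """Collect every executable path the two configs point at."""
    # Hook binaries listed under hooks.prestart in oci-add-hooks.json.
    prestart = hooks_cfg.get("hooks", {}).get("prestart", [])
    paths = [hook["path"] for hook in prestart]

    # The wrapper runtime registered in daemon.json.
    runtime_cfg = daemon_cfg["runtimes"][runtime]
    paths.append(runtime_cfg["path"])

    # The real runtime is the value that follows --runtime-path.
    args = runtime_cfg.get("runtimeArgs", [])
    if "--runtime-path" in args:
        index = args.index("--runtime-path") + 1
        if index < len(args):
            paths.append(args[index])
    return paths


def missing_binaries(paths: list) -> list:
    """Return the paths that are not executable regular files."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.X_OK))]
```

On a correctly installed system, `missing_binaries(referenced_binaries(load_json("/etc/docker/oci-add-hooks.json"), load_json("/etc/docker/daemon.json")))` should return an empty list.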
Now you should be able to restart the dockerd service, and everything should work as usual. The only difference is an extra available runtime, oci-add-hook:
$ sudo docker info | grep Runtime
 Runtimes: io.containerd.runc.v2 oci-add-hook runc
 Default Runtime: runc
Using our new powers
To run the hook, we need to choose oci-add-hook as a runtime for a particular service, and we need to annotate the service to tell oci-seccomp-bpf-hook how to behave.
Example, securing the traefik-forward-auth middleware
traefik-forward-auth is a traefik middleware that intercepts calls to an application and redirects users to an OIDC endpoint, such as keycloak, if they are not authenticated. It turns traefik into an authenticating access proxy.
Surely it doesn't need access to more than 300 syscalls...
To instrument it, it is enough to add this to the service definition:
    runtime: oci-add-hook
    annotations:
      - "io.containers.trace-syscall=of:/tmp/traefik-forward-auth.json;if:/etc/docker/default-seccomp.json"
The if: part is optional, see the documentation.
After re-creating the container, let it handle some normal real-world traffic, then stop it. You can now remove the special runtime and the annotation from the service definition.
If it all went well, you should obtain a nice /tmp/traefik-forward-auth.json file that looks like this (once beautified with jq):
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_AARCH64"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "bind",
        "brk",
        "capget",
        "capset",
        "chdir",
        "clone",
        "close",
        "connect",
        "dup3",
        "epoll_create1",
        "epoll_ctl",
        "epoll_pwait",
        "execve",
        "exit_group",
        "faccessat2",
        "fchdir",
        "fchown",
        "fcntl",
        "fstat",
        "fstatfs",
        "futex",
        "getcwd",
        "getdents64",
        "getpeername",
        "getpid",
        "getppid",
        "getrandom",
        "getsockname",
        "getsockopt",
        "gettid",
        "listen",
        "madvise",
        "mmap",
        "mount",
        "nanosleep",
        "newfstatat",
        "openat",
        "pipe2",
        "pivot_root",
        "prctl",
        "prlimit64",
        "read",
        "rt_sigaction",
        "rt_sigprocmask",
        "rt_sigreturn",
        "sched_getaffinity",
        "sched_yield",
        "setgid",
        "setgroups",
        "sethostname",
        "setsockopt",
        "setuid",
        "sigaltstack",
        "socket",
        "statfs",
        "tgkill",
        "umask",
        "umount2",
        "write"
      ],
      "action": "SCMP_ACT_ALLOW",
      "args": [],
      "comment": "",
      "includes": {},
      "excludes": {}
    },
    {
      "action": "SCMP_ACT_ALLOW",
      "args": [],
      "comment": "",
      "includes": {},
      "excludes": {}
    }
  ]
}
That is a grand total of 60 syscalls, or a reduction of more than 80% of the attack surface of the kernel, at no extra performance cost!
Of course, traefik-forward-auth is a middleware that should be quite simple. What about bigger applications? I profiled a NextCloud container: how many syscalls does it need? 132. Still a good reduction.
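If you want to reproduce these counts on your own profiles, a few lines of Python are enough. This is only a sketch: `allowed_syscalls` and `reduction` are hypothetical helper names, and the baseline is whatever number you pass in (300 would match the "more than 300" figure quoted earlier, which is approximate because the default profile also contains conditional rules).

```python
import json


def allowed_syscalls(profile: dict) -> set:
    """Names of all syscalls a seccomp profile explicitly allows."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            # Rules without a "names" list contribute nothing.
            names.update(rule.get("names", []))
    return names


def reduction(profile: dict, baseline_count: int) -> float:
    """Fraction of the baseline syscalls no longer allowed."""
    return 1 - len(allowed_syscalls(profile)) / baseline_count


def load_profile(path: str) -> dict:
    """Load a generated profile such as /tmp/traefik-forward-auth.json."""
    with open(path) as handle:
        return json.load(handle)
```

For example, `len(allowed_syscalls(load_profile("/tmp/traefik-forward-auth.json")))` gives the number of distinct allowed syscalls, and `reduction(..., 300)` gives the share removed relative to a baseline of 300.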
Putting it to the test
Once you have your json file, make it part of your deployment and enforce it by using the security_opt service element in docker compose, for example:
    security_opt:
      - seccomp:traefik-forward-auth-seccomp.json
Then re-create your container. It should now be limited to the syscalls that were encountered during the profiling phase.
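Because the profile only contains what was seen while profiling, a rarely used code path may later need a syscall that is missing. One way to check, sketched below in Python, is to profile the service again at a later date and diff the new trace against the profile you enforce. The function names are hypothetical, and the sketch assumes both files use the JSON layout shown above.

```python
def allowed_syscalls(profile: dict) -> set:
    """Names of all syscalls a seccomp profile explicitly allows."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names.update(rule.get("names", []))
    return names


def newly_observed(enforced: dict, retrace: dict) -> list:
    """Syscalls present in a fresh trace but absent from the enforced profile."""
    return sorted(allowed_syscalls(retrace) - allowed_syscalls(enforced))
```

An empty list suggests the enforced profile still covers the traffic seen in the new trace. Anything else is a candidate for review before being added to the profile.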