I was too optimistic about the userspace fix. On its own it might reduce the attack surface, but unfortunately we still seem to need the kernel fix. The child we attach in lxc-attach wants to change its LSM label appropriately right before exec(), and for that it needs an fd to /proc/self/attr/current. So we seem to always have such an fd around. What we can do, instead of passing an fd to /proc itself around, is open an fd to /proc/self/attr/current in the parent and send it to the child. This might reduce the attack surface, but we still need the kernel fix. I'm posting an updated version of the patch I sent before here, and I'll keep thinking a little more about how we can avoid having to pass any procfd around. But I doubt we can. The more complex solution I outlined above, involving a second lxc_clone() which serves as a simple chrooting process to place us into an isolated set of namespaces, would further reduce the attack surface. @Stéphane, do you think it would be worth adding another process that chroots/minimally namespaces us before attaching to the child's namespaces?
Here's the outline of the current patch:
So far, we opened a file descriptor referring to proc on the host inside the
host namespace and handed that fd to the attached process in
attach_child_main(). This was done to ensure that LSM labels were correctly
set up. However, by exploiting a potential kernel bug, ptrace could be used to
prevent the file descriptor from being closed, which in turn could be used by an
unprivileged container to gain access to the host namespace. Aside from this
needing an upstream kernel fix, we should make sure that we don't pass the fd
for proc itself to the attached process. However, we cannot completely prevent
this, as the attached process needs to be able to change its AppArmor profile
by writing to /proc/self/attr/exec or /proc/self/attr/current. To minimize the
attack surface, we only send the fd for /proc/self/attr/exec or
/proc/self/attr/current to the attached process. To do this we introduce a
little more IPC between the child and parent:
 * IPC mechanism: (X is receiver)
 *   initial process        intermediate          attached
 *        X           <---  send pid of
 *                          attached proc,
 *                          then exit
 *    send 0 ------------------------------------>    X
 *                                              [do initialization]
 *        X  <------------------------------------  send 1
 *   [add to cgroup, ...]
 *    send 2 ------------------------------------>    X
 *                                              [set LXC_ATTACH_NO_NEW_PRIVS]
 *        X  <------------------------------------  send 3
 *   [open LSM label fd]
 *    send 4 ------------------------------------>    X
 *                                              [set LSM label]
 *   close socket                                 close socket
 *                                                run program
The attached child tells the parent when it is ready to have its LSM labels set
up. The parent then opens an appropriate fd for the child PID to
/proc/<pid>/attr/exec or /proc/<pid>/attr/current and sends it via SCM_RIGHTS
to the child. The child can then set its LSM label. Both sides then close the
socket fds and the child execs the requested process.
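
To illustrate the last step of the handshake, here is a minimal sketch of passing a single fd over a Unix socket with SCM_RIGHTS. It is not the code from the patch: the helper names send_fd(), recv_fd() and demo_label_fd_handoff() are made up for this example, the "3" status value only mirrors the diagram above, and a plain fork() stands in for the attach logic. The path argument is there so the sketch can be tried against any file; passing NULL makes the parent open /proc/<pid>/attr/current for the child, assuming the kernel exposes it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Buffer for one fd worth of ancillary data, suitably aligned. */
union fd_cmsg {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one fd over a Unix socket via SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x'; /* at least one byte of real data must accompany the fd */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Child says "ready" (3), parent opens the file and hands over only that fd. */
int demo_label_fd_handoff(const char *path)
{
	int sv[2], status, ready = 0, fd, ret = -1;
	char buf[64];
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) { /* stand-in for the attached child */
		int msg = 3, lfd;

		close(sv[0]);
		if (write(sv[1], &msg, sizeof(msg)) != sizeof(msg))
			_exit(1);
		lfd = recv_fd(sv[1]);
		if (lfd < 0 || fcntl(lfd, F_GETFD) < 0)
			_exit(1);
		/* A real child would write its label to lfd here, then exec(). */
		close(lfd);
		close(sv[1]);
		_exit(0);
	}
	close(sv[1]);
	if (read(sv[0], &ready, sizeof(ready)) == sizeof(ready) && ready == 3) {
		if (!path) {
			snprintf(buf, sizeof(buf), "/proc/%d/attr/current", (int)pid);
			path = buf;
		}
		fd = open(path, O_RDWR | O_CLOEXEC);
		if (fd >= 0) {
			ret = send_fd(sv[0], fd);
			close(fd); /* the child holds its own reference now */
		}
	}
	close(sv[0]);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		ret = -1;
	return ret;
}
```

Calling demo_label_fd_handoff("/dev/null") exercises the fd passing without touching any LSM state. The point of the design is that the child only ever holds an fd for the one attr file it needs, never one for /proc itself.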