Confined processes inside container cannot fully access host pty device passed in by lxc exec
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
apparmor (Ubuntu) | Confirmed | Undecided | Unassigned |
lxd (Ubuntu) | Invalid | Undecided | Unassigned |
tcpdump (Ubuntu) | Confirmed | High | Unassigned |
Bug Description
Now that AppArmor policy namespaces and profile stacking are in place, I noticed odd stdout buffering behavior when running confined processes via lxc exec. Much more stdout data is buffered before getting flushed when the program is confined by an AppArmor profile inside of the container.
I see that lxd is calling openpty(3) in the host environment, using the returned fd as stdout, and then executing the command inside of the container. This results in an AppArmor denial because the file descriptor returned by openpty(3) originates outside of the namespace used by the container.
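The general pattern described above (allocate a pty in the parent, hand the slave end to the child as its stdout) can be sketched roughly as follows. This is only an illustration in Python, not LXD's actual code, and it does not cross any namespace boundary; the helper name `run_with_host_pty` is hypothetical.

```python
import os
import select
import subprocess

def run_with_host_pty(argv, timeout=5.0):
    """Run argv with a freshly allocated pty slave as its stdout.

    Hypothetical helper: the pty is created by the parent, so the child
    only ever sees an inherited file descriptor, never the open() itself.
    """
    master, slave = os.openpty()
    try:
        subprocess.run(argv, stdin=subprocess.DEVNULL, stdout=slave, check=True)
        # The child's output is queued in the pty; wait briefly so a silent
        # child cannot make the read block forever.
        ready, _, _ = select.select([master], [], [], timeout)
        return os.read(master, 4096) if ready else b""
    finally:
        os.close(slave)
        os.close(master)
```

In the real bug the child runs inside the container's AppArmor namespace, so the inherited fd refers to a /dev/pts entry that the confined profile has no rule for.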
The denial is likely from glibc calling fstat(), from inside the container, on the file descriptor associated with stdout to decide how much buffering to use. The fstat() is denied by AppArmor and glibc ends up handling the buffering differently than it would have if the fstat() had succeeded.
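A simplified model of that decision, written in Python for illustration. It assumes glibc picks line buffering only when fstat() succeeds and the fd is a terminal character device, and falls back to full buffering otherwise; the function name and the injectable `fstat` parameter are hypothetical and exist only so the denied case can be simulated.

```python
import os
import stat

def choose_stdout_buffering(fd, fstat=os.fstat):
    """Return "line" or "full" for fd, loosely mimicking stdio's choice.

    Assumption: a failed fstat() (for example EACCES from an AppArmor
    denial) is treated like "not a terminal", i.e. full buffering.
    """
    try:
        st = fstat(fd)
    except OSError:
        return "full"
    # Only character devices that are terminals get line buffering.
    if stat.S_ISCHR(st.st_mode) and os.isatty(fd):
        return "line"
    return "full"
```

Passing a stand-in `fstat` that raises `PermissionError` makes the function return "full" even for a terminal fd, which matches the behavior observed with the confined tcpdump.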
Steps to reproduce (using an up-to-date 16.04 amd64 VM):
Create a 16.04 container
$ lxc launch ubuntu-daily:16.04 x
Run tcpdump in one terminal and generate traffic in another terminal (wget google.com)
$ lxc exec x -- tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
<Packet dump>
47 packets captured
48 packets received by filter
1 packet dropped by kernel
<ctrl-c>
Note that everything above <Packet dump> was printed immediately because it was printed to stderr. <Packet dump>, which is printed to stdout, was not printed until you pressed ctrl-c and the buffers were flushed thanks to the program terminating. Also, this AppArmor denial shows up in the logs:
audit: type=1400 audit(147890271
Now run tcpdump unconfined and take note that <Packet dump> is printed immediately, before you terminate tcpdump. Also, there are no AppArmor denials.
$ lxc exec x -- aa-exec -p unconfined -- tcpdump -i eth0
...
Now run tcpdump confined but in lxc exec's non-interactive mode and note that <Packet dump> is printed immediately and no AppArmor denials are present. (Looking at the lxd code in lxd/container_
$ lxc exec x --mode=
...
Applications that manually call fflush(stdout) are not affected by this as manually flushing stdout works fine. The problem seems to be caused by glibc not being able to fstat() the /dev/pts/12 fd from the host's namespace.
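The effect of an explicit flush on a fully buffered stream can be shown with a small sketch. It uses Python's own I/O layer and a pipe rather than glibc stdio and a pty, so it is an analogy under those assumptions; `readable` and `demo_flush` are hypothetical names.

```python
import os
import select

def readable(fd):
    """True if fd has data waiting, without blocking."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)

def demo_flush():
    r, w = os.pipe()
    # An explicit buffer size forces full buffering, like stdio on a non-tty.
    out = os.fdopen(w, "w", buffering=8192)
    out.write("packet 1\n")
    before = readable(r)   # nothing has reached the pipe yet
    out.flush()            # explicit flush pushes the data out
    after = readable(r)
    data = os.read(r, 1024)
    out.close()
    os.close(r)
    return before, after, data
```

The written line stays in the writer's buffer until `flush()` is called, which is why programs that flush stdout themselves do not show the delayed output.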
Changed in tcpdump (Ubuntu):
importance: Undecided → High
Changed in apparmor (Ubuntu):
status: Invalid → Confirmed
There's currently no way in the AppArmor policy language to allow the getattr operation on the passed-in /dev/pts/12 file. The typical workaround of adding the attach_disconnected flag to the profile does not work here because *every* AppArmor profile inside of the container would need that flag.
John Johansen has thought out an AppArmor feature that would let the policy language allow this kind of fd passing between namespaces, but it is a sizeable feature and is not on the immediate roadmap.
I haven't had a chance to think it through very much but I'm curious if the LXD developers have any ideas on how this can be solved in LXD. Maybe it is possible to call openpty() from inside the container's namespace? I'm not sure if that would work or if it is safe to do but maybe it is worth investigating.