openssh-server SIGSYS with 'UsePrivilegeSeparation sandbox'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openssh (Ubuntu) | Confirmed | Low | Unassigned |
Bug Description
The 'sshd' process gets 'authentication failure' and refuses to allow any login.
dmesg indicates that the problem is SIGSYS on a call to 'socket' (syscall #41, signal #31).
On a hunch, I decided to test whether the problem is related to 'seccomp' and changed /etc/ssh/sshd_config from
# UsePrivilegeSeparation sandbox
to the former standard value
UsePrivilegeSeparation yes
and logins started to work again.
Obviously, I'd like to have the additional protection that sandboxing would give me.
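For illustration only (this is not sshd's own filter, just a minimal standalone C sketch of the mechanism described above): under a seccomp-BPF filter that forbids socket(2), the next socket() call is killed with SIGSYS, which on x86_64 shows up in the kernel/audit log as signal 31 on syscall 41, matching the dmesg output reported here.

    /* seccomp_socket_demo.c (hypothetical name) -- build with: gcc seccomp_socket_demo.c */
    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/socket.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        struct sock_filter insns[] = {
            /* Load the syscall number from the seccomp data. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            /* Kill on socket(2), allow everything else.
             * (A real filter also checks seccomp_data.arch first.) */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(insns) / sizeof(insns[0]),
            .filter = insns,
        };

        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

        puts("calling socket() under the filter...");
        socket(AF_INET, SOCK_STREAM, 0);   /* the process dies with SIGSYS here */
        puts("never reached");
        return 0;
    }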
ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: openssh-server 1:7.4p1-10
ProcVersionSignature:
Uname: Linux 4.10.0-20-generic x86_64
ApportVersion: 2.20.4-0ubuntu4
Architecture: amd64
CurrentDesktop: XFCE
Date: Fri May 12 21:06:20 2017
InstallationDate: Installed on 2017-04-08 (35 days ago)
InstallationMedia:
SourcePackage: openssh
UpgradeStatus: Upgraded to zesty on 2017-04-24 (19 days ago)
KEVIN KENNY (ke9tv) wrote : | #1 |
- Dependencies.txt (3.7 KiB, text/plain; charset="utf-8")
- JournalErrors.txt (13.8 KiB, text/plain; charset="utf-8")
- ProcCpuinfoMinimal.txt (1022 bytes, text/plain; charset="utf-8")
- ProcEnviron.txt (112 bytes, text/plain; charset="utf-8")
- SSHDConfig.txt (3.0 KiB, text/plain; charset="utf-8")
Colin Watson (cjwatson) wrote : | #2 |
It seems surprising to me that the pre-authentication network listener would need to create new sockets. I think the best thing to do here is going to be to rebuild sshd locally with SANDBOX_SECCOMP_FILTER_DEBUG defined (see the top of sandbox-seccomp-filter.c) to try to narrow this down.
Christian Ehrhardt (paelzer) wrote : | #3 |
Hi Kevin,
I quickly checked with the default config in zesty, but ssh logins are working fine (password logins are what I tested).
There must be more to your config that triggers this.
Is what you see in dmesg an apparmor denial or something else?
Could you attach your dmesg covering the issue that you see?
Your logs attached so far only hold plenty of:
May 12 19:17:15 hostname sshd[4241]: pam_unix(
[...]
May 12 20:51:55 hostname sshd[8608]: error: maximum authentication attempts exceeded for root from 192.168.1.1 port 9990 ssh2 [preauth]
May 12 20:51:55 hostname sshd[8608]: PAM service(sshd) ignoring max retries; 6 > 3
But those could just as well be from your testing.
I imported your sshd config as auto attached to the report, restarted it and it still worked fine. So we really would need more to help debugging.
Christian Ehrhardt (paelzer) wrote : | #4 |
Thank you for taking the time to report this bug and helping to make Ubuntu better.
There isn't really enough information here for a developer to confirm this issue is a bug, or to begin working on it, so I am marking this bug Incomplete for now.
If you can provide exact steps so that a developer can reproduce the original problem, then please add them to this bug and change the status back to New.
Changed in openssh (Ubuntu):
status: New → Incomplete
KEVIN KENNY (ke9tv) wrote : | #5 |
- auth.log from the ssh daemon restart through the login failure (6.8 KiB, text/plain)
I reset /etc/ssh/sshd_config to the failing 'UsePrivilegeSeparation sandbox' configuration and attempted
ssh -v localhost
while logged in as username=kennykb uid=117
The output of 'ssh -v' is attached as 'sshclient.txt'.
The only lines that appeared in syslog after I restarted the daemon were in the attached 'syslog.txt'.
The corresponding time period in auth.log is attached as 'authlog.txt'.
The contents of /etc/ssh/sshd_config at the time of the failure are attached.
(I'll attach the files in the next few messages.)
The first is auth.log. 192.168.1.1 is my gateway machine. I see that I'm being hammered 3-4 times a minute with unauthorized requests to log in as root from some external machine. Isn't the internet a spectacularly hostile place?
If this is not enough, what else do you need? Another developer mentioned rebuilding from source with a particular debugging option turned on - if you need this, can you give me a precise description of the location and version of the package you want rebuilt and the exact change you want?
KEVIN KENNY (ke9tv) wrote : | #6 |
- /var/log/syslog (3.3 KiB, text/plain)
Here's /var/log/syslog from the ssh daemon restart through the authorization failure
KEVIN KENNY (ke9tv) wrote : | #7 |
- Output of 'ssh -v localhost' (2.6 KiB, text/plain)
Here's the output of 'ssh -v localhost' when authorization is failing
KEVIN KENNY (ke9tv) wrote : | #8 |
- /etc/ssh/sshd_config (3.2 KiB, text/plain)
And here's /etc/ssh/sshd_config.
Changed in openssh (Ubuntu):
status: Incomplete → New
KEVIN KENNY (ke9tv) wrote : | #9 |
- Output of strace - showing the 'socket' call in the child process. (5.1 KiB, text/plain)
Upgraded to 17.10. Uninstalled and reinstalled openssh-client and openssh-server (including loading a fresh /etc/ssh directory).
Still fails.
I attach the output of:
sudo strace -f -e trace=socket /usr/sbin/sshd -d 2>&1 | tee sshd.result.txt
All the other configuration is as before.
What other information can I get for you so as not to just get this issue closed again?
I've downloaded source, but of course it configures to run from /usr/local. What are the correct flags for ./configure to get it to build the way Ubuntu builds it and install to the Ubuntu paths? Alternatively, where can I find out how to run the daemon from /usr/local/sbin instead of /usr/sbin? (That would allow me to configure the extra debugging information requested in the second comment.)
KEVIN KENNY (ke9tv) wrote : | #10 |
Oh, another note: changing the UsePrivilegeSeparation setting no longer makes any difference.
Colin Watson (cjwatson) wrote : | #11 |
This issue was never previously closed, only marked Incomplete (which is an open state).
While it's possible to do a normal package build to get things configured exactly the way we do, I don't think that's necessary here. I suggest:
* git clone https:/
* cd openssh
* sudo apt build-dep ./
* make the change I suggested in comment #2
* ./configure --sysconfdir=/etc/ssh
* make
Don't install the result. Instead, run "sudo `pwd`/sshd -p 2222 -ddd" (where 2222 is some free port on your system) and try "ssh -oStrictHostKeyChecking=no -p 2222 localhost" against it.
Also, could you attach your PAM configuration (/etc/pam.d/sshd plus any files mentioned in @include lines there)?
The strace you attached is unfortunately not very useful. What we need to find out here is what bit of code is making the offending socket call, which is going to require some context around it: that's usually best achieved by not limiting the set of syscalls traced by strace. Unfortunately that also means that your private host keys will show up in the strace, so if you do that then you need to be careful to redact anything like that from the output!
Colin Watson (cjwatson) wrote : | #12 |
And the exact change I want is to uncomment the "#define SANDBOX_SECCOMP_FILTER_DEBUG" line near the top of sandbox-seccomp-filter.c.
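As a generic sketch of the technique such a debug switch enables (this is an illustration, not OpenSSH's actual code): have the filter return a catchable SECCOMP_RET_TRAP instead of killing the process outright, and report the offending syscall number from a SIGSYS handler, which is exactly the piece of information missing from the plain dmesg output.

    /* seccomp_trap_demo.c (hypothetical name) -- build with: gcc seccomp_trap_demo.c */
    #define _GNU_SOURCE
    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <signal.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Report the syscall that tripped the filter, then bail out. */
    static void sigsys_handler(int sig, siginfo_t *info, void *ctx)
    {
        char msg[64];
        int n = snprintf(msg, sizeof(msg),
                         "seccomp violation: syscall %d\n", info->si_syscall);
        (void)sig; (void)ctx;
        write(STDERR_FILENO, msg, (size_t)n);
        _exit(1);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigsys_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSYS, &sa, NULL);

        struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 2, 0),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
            /* Debug behaviour: trap (catchable) instead of killing the process. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = { .len = sizeof(insns) / sizeof(insns[0]),
                                   .filter = insns };

        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

        /* getpid is not on the allow list, so the handler fires and reports it. */
        syscall(SYS_getpid);
        return 0;
    }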
KEVIN KENNY (ke9tv) wrote : | #13 |
- Requested files from test on 2017-10-25 (29.9 KiB, application/x-tar)
OK, I think I've followed instructions here.
I built with the '#define SANDBOX_SECCOMP_FILTER_DEBUG' uncommented. Recalling at long last that Ubuntu is Debian-based (I use Red Hat/CentOS at work and get them confused), I used 'dpkg-buildpackage -rfakeroot -uc -b' to do the build; hope that's OK. I also found that I needed to comment away the four '#include' lines that follow the SANDBOX_SECCOMP_FILTER_DEBUG definition, since otherwise I got many errors relating to conflicting structure definitions.
The attached tarball contains the result of running 'sudo sshd -p 2222 -ddd'. It also contains an etc/ hierarchy that includes the current /etc/ssh/sshd_config and the relevant files from /etc/pam.d. I also threw in the result of 'strace -f' applied to that command, in case it helps narrow the point of failure further. I took a quick troll through the output, and I don't *think* I see it revealing more than a few bytes of a private key.
Thanks for responding! I hope this stuff helps to move the walls in on the problem.
Colin Watson (cjwatson) wrote : | #14 |
- sshd-debug.patch (1.4 KiB, text/plain)
Thanks. The strace is indeed useful, although it doesn't get us all the way there. This is very strange! It appears that the network monitor is attempting to create a Unix socket in the middle of sending a USERAUTH_SUCCESS packet, which is peculiar to say the least.
This may take a few iterations. Could you please revert your changes to sandbox-seccomp-filter.c, apply the attached sshd-debug.patch instead, rebuild, and attach the output of the same 'sshd -p 2222 -ddd' test?
Seth Arnold (seth-arnold) wrote : | #15 |
Hello Kevin, I don't want to step on Colin's toes here, but I couldn't sort out anything from the logs or files so far. I've had success before with perf to grab call graph information. The usual process is along the lines of:
- enable the ddebs.ubuntu.com repository to get access to debug symbols: https://wiki.ubuntu.com/Debug%20Symbol%20Packages
- install packages openssh-server-dbgsym
- Record perf results during testing with something like:
sudo perf record -a -g -e syscalls:sys_enter_socket
- test
- Check results with something like:
sudo perf script
With luck this will spit out the function names of the call chain at the time of the socket() syscall that is being killed (probably alongside a lot of unrelated instances -- it's possible to tune the perf line to narrow it down if necessary).
I hope this helps,
Thanks
KEVIN KENNY (ke9tv) wrote : | #16 |
- sshd-strace-20171026.txt (204.0 KiB, text/plain)
The plot thickens.
The configuration of the build appears to be partially implicated.
Could it be that with your ./configure flags it's failing to find PAM or something?
When I build with the ./configure that you suggested, it works.
When I build with dpkg-buildpackage, on the same source code, it fails.
Output from the failing case, with your patch to add debug output, is attached as sshd-strace-20171026.txt.
KEVIN KENNY (ke9tv) wrote : | #17 |
- perf-20171026.txt (66.8 KiB, text/plain)
Seth: Your requested output is perf-20171026.txt
It's not clear to me what's out of the ordinary in the stack
traces, except of course that once we're in the Python code
of 'apport', things have unquestionably already gone to Hell.
Maybe someone who's familiar with the code will have a better
idea.
Colin Watson (cjwatson) wrote : | #18 |
Seth, this is not as helpful as you might think, because the socket call is being denied by seccomp so the relevant call never shows up in perf. Feel free to try to construct a different perf call that takes account of that ...
Colin Watson (cjwatson) wrote : | #19 |
- sshd-debug.patch (3.9 KiB, text/plain)
OK, so that puts it somewhere inside ssh_packet_send2_wrapped().
The configuration difference is certainly suggestive. You can see the other configure options in debian/rules, and it would perhaps be worth trying different combinations of those if you have the time: the relevant ones are probably --with-pam, --with-ssl-engine, and --with-kerberos5.
KEVIN KENNY (ke9tv) wrote : | #20 |
OK, I applied your patch, and was lucid enough to follow farther
into the maze with a few additional debug3's.
I get down into the 'cipher_crypt' function, at line 378:
if (EVP_Cipher(cc->evp, dest + aadlen, (u_char *)src + aadlen,
    len) < 0)
        return SSH_ERR_LIBCRYPTO_ERROR;
... so now we're off spelunking into libcrypto somewhere.
I tried plunging a short distance into the OpenSSL source, but didn't have time to get very far down that road. At the time of the crash, cc->evp is set, and I noted the values of a couple of the EVP_CIPHER_CTX fields, but I don't know whether any of these results is significant - I'm looking at this API for the very first time, and not yet entirely sure of what all the objects represent.
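For readers unfamiliar with that interface, here is a minimal standalone illustration of the OpenSSL EVP calls under discussion (not OpenSSH's code; AES-128-CTR is chosen arbitrarily just to have a concrete cipher). A negative return from EVP_Cipher() is what cipher_crypt() maps to SSH_ERR_LIBCRYPTO_ERROR in the snippet quoted above.

    /* evp_cipher_demo.c (hypothetical name) -- build with: gcc evp_cipher_demo.c -lcrypto */
    #include <openssl/evp.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned char key[16] = {0}, iv[16] = {0};
        unsigned char in[16] = {0}, out[16];

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        if (ctx == NULL)
            return 1;

        /* Initialise the context for AES-128-CTR encryption. */
        if (EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv) != 1) {
            EVP_CIPHER_CTX_free(ctx);
            return 1;
        }

        /* The same call cipher_crypt() makes: encrypt 'in' into 'out'. */
        if (EVP_Cipher(ctx, out, in, sizeof(in)) < 0) {
            fprintf(stderr, "EVP_Cipher failed\n");
            EVP_CIPHER_CTX_free(ctx);
            return 1;
        }

        printf("first ciphertext byte: %02x\n", out[0]);
        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }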
I likely shan't have any further time today to experiment with
./configure flags. Maybe on the weekend.
KEVIN KENNY (ke9tv) wrote : | #21 |
I lied. I experimented with ./configure flags.
--with-kerberos5 is definitely the flag that is triggering the crash. Removing this flag alone cures the crash.
Command that was used:
../configure --build=... (followed by the same long list of options that debian/rules passes, omitted here)
# removed
# --with-kerberos5
KEVIN KENNY (ke9tv) wrote : | #22 |
(And, for what it's worth, I don't, to the best of my knowledge, have anything Kerberos-related set up. There is no /etc/krb5.conf file.)
Colin Watson (cjwatson) wrote : | #23 |
- sshd-enable-socket.patch (523 bytes, text/plain)
I'm at a loss as to why Kerberos should affect this particular thing, at least when there's no actual Kerberos authentication involved. Silly question, but you don't have a modified OpenSSL or anything related to it, do you, and what exact package version of libssl1.0.0 do you have installed? Also, if you put --with-kerberos5 back but remove --with-ssl-engine, does it still fail?
I'm not entirely convinced about the EVP_CIPHER_CTX values you reported.
The next thing I can think of to try is to allow the network monitor to use this system call and see what else happens around it. Obviously do this very cautiously, and do not run with the attached patch in production (I'm pretty sure the socket syscall is deliberately forbidden in this context), but it should be enough to get a more complete strace and (probably more usefully) to try Seth's perf idea again: with this patch, the socket syscall should actually make it as far as the tracepoint, so we should be able to get a stack trace for it.
KEVIN KENNY (ke9tv) wrote : | #24 |
Disable ssl-engine, re-enable kerberos5, it still crashes.
I'm pretty sure I'm running a stock OpenSSL. The only things
that I can think of that I might have done in that general
vicinity were to install xinetd, althttpd, and stunnel4, and
to obtain a host certificate from letsencrypt.org.
I'm still seeing it falling apart in EVP_Cipher. When I
allow 'socket', it unsurprisingly fails with SIGSYS on a
'connect' that follows immediately. The stack trace is
confusing. I've installed libkeyutils1-dbgsym, but it still
reports [unknown] for the locations.
sshd 9257 [006] 260133.546558: syscalls:
But at least we know it's going through keyutils, which isn't a huge library!
What I get from a separate run using strace is also likely a useful clue. The abort
on 'connect' shows that it's trying to connect to D-Bus. I have Absolutely No Idea
what's going on there, but it seems quite peculiar. (There are a bunch of writes
from debug3 calls that I strewed throughout cipher_crypt.)
[pid 9903] write(7, "\0\0\0#
[pid 9901] <... write resumed> ) = 53
[pid 9903] <... write resumed> ) = 39
[pid 9901] write(2, "debug3: mm_request_receive enter"..., 37debug3: mm_request_receive entering
<unfinished ...>
[pid 9903] write(7, "\0\0\0\
[pid 9901] <... write resumed> ) = 37
[pid 9903] <... write resumed> ) = 26
[pid 9901] read(5, <unfinished ...>
[pid 9903] write(7, "\0\0\0#
[pid 9903] write(7, "\0\0\0\
[pid 9903]...
Colin Watson (cjwatson) wrote : | #25 |
This looks very weird. If it's calling EVP_Cipher, that should go through OpenSSL; it's conceivable that it might end up in libkeyutils via the Kerberos libraries, sure, but a simple cipher call should already have everything it needs; and libkeyutils doesn't use D-Bus.
Could you attach /lib/x86_64-linux-gnu/libkeyutils.so.1.5 from your system?
KEVIN KENNY (ke9tv) wrote : | #26 |
- libkeyutils.so.1.5 (13.9 KiB, application/octet-stream)
Yeah, this is almost starting to look like malware trying to exfiltrate keys somehow! Unauthorized socket I/O being done by a library that doesn't match the symbol tables? (I'm glad I don't have any Kerberos keys to leak!)
Kevin Kenny (kennykb) wrote : | #27 |
Not trying to pester, but bumping this in case you didn't see that I posted the requested file.
Luke A. Perkins (public-a) wrote : | #28 |
See also Ticket #1831765 and #1832110 regarding the path of the privilege separation directory (aka: /run/sshd).
This path is hard-coded into sshd and there is no means of changing its location. I have found that if the directory is missing, all ssh communication unceremoniously stops (iff UsePrivilegeSeparation is set to 'sandbox').
My work-around is to create a service that does nothing other than create the "/run/sshd" directory, using the RuntimeDirectory= setting to define it.
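For anyone wanting to apply the same work-around, a minimal sketch of such a unit might look like the following (the unit name is made up, and this is only one of several ways to do it; the RuntimeDirectory= line is what makes systemd create /run/sshd):

    # /etc/systemd/system/sshd-runtime-dir.service  (hypothetical name)
    [Unit]
    Description=Create /run/sshd for sshd privilege separation
    Before=ssh.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    # RuntimeDirectory= paths are relative to /run; the directory persists
    # for as long as this (oneshot, remain-after-exit) unit stays active.
    RuntimeDirectory=sshd
    ExecStart=/bin/true

    [Install]
    WantedBy=multi-user.target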
Robie Basak (racb) wrote : | #29 |
Thank you for posting the additional information here. It sounds like this will help others affected.
I see that the bug Importance has never been set, so I'm setting it now, to Low, based on "unusual end-user configurations" from https://wiki.ubuntu.com/Bugs/Importance.
Changed in openssh (Ubuntu):
importance: Undecided → Low
Launchpad Janitor (janitor) wrote : | #30 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in openssh (Ubuntu):
status: New → Confirmed