multipath random crashes on use-after-free
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
multipath-tools (Ubuntu) |
Fix Released
|
High
|
Unassigned | ||
Trusty |
Fix Released
|
Medium
|
Rafael David Tinoco |
Bug Description
[Impact]
* multipath crashes when device-mapper is modified. DM_NAME was being freed twice.
* expect multipath daemon to crash and not run any checkers on path groups.
* not checking path groups, in an event of failure, the mpath won't change path prios.
* openstack relies on flushing device maps frequently when using iscsi.
[Test Case]
* having a multipathed environment (4 paths, 2 and 2, to a lun):
- while true; do multipath -F ; multipath -r ; multipath -ll; done
* run multipath with valgrind and see:
==31831== Invalid read of size 1
==31831== at 0x4C2E902: strncmp (in /usr/lib/
==31831== by 0x56FC26E: find_mp_by_alias (structs.c:296)
==31831== by 0x404B2F: ev_add_map (main.c:264)
==31831== by 0x404A8B: uev_add_map (main.c:244)
...
==31831== Address 0x728d8d1 is 1 bytes inside a block of size 6 free'd
==31831== at 0x4C2BDEC: free (in /usr/lib/
==31831== by 0x404A9A: uev_add_map (main.c:245)
==31831== by 0x40623C: uev_trigger (main.c:756)
[Regression Potential]
* using strdup for this char *, if there was no double free - like i discovered, would cause a slight memory leak of the size of DM_NAME every time a device mapper disappears and is re-created. it wouldn't be an important regression.
* based on upstream commit and tested by the reported. fixes initial issue.
* What releases are affected ?
The following releases already got the fix
- Xenial/
Note that Debian also has the fix.
Meaning that ONLY Trusty is affected by this bug.
* This SRU contained fixes for 2 LP bugs:
https:/
https:/
[Other Info]
It has brought to my attention that multipath in trusty has been crashing randomly. Some dumps were given to me and I was able to generate some others. I have also generated valgrind output to help me with these random crashes.
Crashes:
#0 malloc_consolidate (av=av@
#1 0x00007f5b62df3cf8 in _int_malloc (av=0x7f5b58000020, bytes=16384) at malloc.c:3423
#2 0x00007f5b62df66d0 in __GI___libc_malloc (bytes=16384) at malloc.c:2891
#3 0x00007f5b638134d7 in dm_task_run () from /lib/x86_
#4 0x00007f5b6314be5c in dm_map_present (str=0x7f5b58000990 "lun02") at devmapper.c:304
#5 0x0000000000404ac7 in ev_add_map (dev=, alias=, vecs=) at main.c:257
#6 0x0000000000000000 in ?? ()
And:
#0 0x00007f13a5933c37 in __GI_raise (sig=sig@entry=6) at ../nptl/
#1 0x00007f13a5937028 in __GI_abort () at abort.c:89
#2 0x00007f13a59702a4 in __libc_message (do_abort=
#3 0x00007f13a597c56e in malloc_printerr (ptr=<optimized out>, str=0x7f13a5a82020 "double free or corruption (out)", action=1) at malloc.c:4996
#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5 0x00007f13a5cdbe86 in free_multipath (mpp=0x7f138c03
#6 0x00007f13a5cfe117 in _remove_map (mpp=0x7f138c03
#7 0x00007f13a5cfe175 in remove_
#8 0x0000000000406b4d in mpvec_garbage_
...
#14 0x00000000004076b7 in checkerloop (ap=<error reading variable: can't compute CFA for this frame>) at main.c:1163
Please follow my analysis in the subsequent comments.
Changed in multipath-tools (Ubuntu): | |
milestone: | none → ubuntu-14.04.5 |
tags: | added: sts-sponsor |
description: | updated |
description: | updated |
Changed in multipath-tools (Ubuntu Trusty): | |
assignee: | nobody → Rafael David Tinoco (inaddy) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in multipath-tools (Ubuntu): | |
status: | In Progress → Fix Released |
assignee: | Rafael David Tinoco (inaddy) → nobody |
## Valgrind
==31831== Invalid read of size 1 valgrind/ vgpreload_ memcheck- amd64-linux. so) 64-linux- gnu/libpthread- 2.19.so) valgrind/ vgpreload_ memcheck- amd64-linux. so) 64-linux- gnu/libpthread- 2.19.so)
==31831== at 0x4C2E0F4: strlen (in /usr/lib/
==31831== by 0x56FC243: find_mp_by_alias (structs.c:295)
==31831== by 0x404B2F: ev_add_map (main.c:264)
==31831== by 0x404A8B: uev_add_map (main.c:244)
==31831== by 0x40623C: uev_trigger (main.c:756)
==31831== by 0x5713958: service_uevq (uevent.c:118)
==31831== by 0x5713B67: uevent_dispatch (uevent.c:167)
==31831== by 0x406485: uevqloop (main.c:815)
==31831== by 0x4E3F183: start_thread (in /lib/x86_
==31831== by 0x5A2EBEC: clone (clone.S:111)
==31831== Address 0x728d8d1 is 1 bytes inside a block of size 6 free'd
==31831== at 0x4C2BDEC: free (in /usr/lib/
==31831== by 0x404A9A: uev_add_map (main.c:245)
==31831== by 0x40623C: uev_trigger (main.c:756)
==31831== by 0x5713958: service_uevq (uevent.c:118)
==31831== by 0x5713B67: uevent_dispatch (uevent.c:167)
==31831== by 0x406485: uevqloop (main.c:815)
==31831== by 0x4E3F183: start_thread (in /lib/x86_
==31831== by 0x5A2EBEC: clone (clone.S:111)
All the errors above exist because of a use-after-free caused by the code:
static int
uev_add_map (struct uevent * uev, struct vectors * vecs)
{
char *alias;
int major = -1, minor = -1, rc;
condlog(2, "%s: add map (uevent)", uev->kernel); get_dm_ name(uev) ; map(uev- >kernel, alias, vecs);
alias = uevent_
...
rc = ev_add_
FREE(alias); <--- freed here
return rc
}
Mainly because ev_add_map will call:
/* without_ path(vecs, alias))) { map_state( mpp);
* now we can register the map
*/
if (map_present && (mpp = add_map_
sync_
condlog(2, "%s: devmap %s registered", alias, dev);
return 0;
}
which will set:
extern struct multipath * without_ path (struct vectors * vecs, char * alias)
add_map_
{
struct multipath * mpp = alloc_multipath();
if (!mpp)
return NULL;
mpp->alias = alias;
So when this logic returns, mpp-> alias will be already freed by uev_add_map.
Any other code traversing mpvecs using "mpp" as "struct multipath" will have
pointer "alias" pointing into random memory causing all kinds of memory
corruption.
This logic is triggered every time the "ueventloop" gets a new uevent from
kernel saying that there was a device map change. If device is "dm-" and
its action is "change" (meaning a device mapper was changed), this logic
will try to create a new "mpp" structure in global vecs->mpvec vectors
with this discovered multipath.
This is not used in daemon startup, since all paths are initially created
by the configure() -> {path_discovery, map_discovery, coalesce_paths, coalesce_
maps}. Multipath devices are more susceptible to be created from the beginning,
or by a discovered path, that would trigger another logic to for mpp structures
(thus creating multipath devices): That might explain why this bug has not
been fixed before.
Example of crashes because of this:
#0 malloc_consolidate (av=av@ entry=0x7f5b580 00020) at malloc.c:4149
#1 0x00007f5b62df3cf8 in _int_malloc (av=0x7f5b58000020, bytes=16384) at malloc.c:3423
#2 0x000...