Race conditions when creating multiple LVM2 snapshots

Bug #483890 reported by Jürgen Kreileder
This bug affects 4 people
Affects        Status  Importance  Assigned to  Milestone
lvm2 (Ubuntu)  New     Undecided   Unassigned
udev (Ubuntu)  New     Undecided   Unassigned

Bug Description

Binary package hint: lvm2

I'm creating several LVM2 snapshots in quick succession for backup. The attached bash script shows what is done.
If I run the script as it is, I *always* get error messages on the console and in syslog, presumably all coming from udev. The snapshots seem to work fine, but I'm a bit uneasy about this.

If I add "sync; sleep 2;" statements between the snapshot commands it gets a bit better, the error messages only show up sporadically then.

The system is an up-to-date x86-64 Karmic installation. Here are the error messages from two runs of the test script, plus the corresponding syslog entries:

# ./btest.sh
  pre-backup: creating and mounting snapshots
  Logical volume "backup_root" created
  Logical volume "backup_usr" created
  Path /dev/ACTION=_change_ no longer valid for device(253,15)
  Logical volume "backup_home" created
  Path /dev/ACTION=_add_ no longer valid for device(253,21)
  Path /dev/DEVTYPE=_disk_ no longer valid for device(253,21)
  Path /dev/PWD=_/_ no longer valid for device(253,21)
  Path /dev/DM_STATE_ACTIVE=_1_ no longer valid for device(253,21)
  Path /dev/SUBSYSTEM=_block_ no longer valid for device(253,21)
  Path /dev/DM_TABLE_LIVE=_1_ no longer valid for device(253,21)
  Path /dev/IFS=_ no longer valid for device(253,21)
  Path /dev/DM_TARGET_COUNT=_1_ no longer valid for device(253,21)
  Path /dev/UDEV_LOG=_3_ no longer valid for device(253,21)
  Path /dev/_ no longer valid for device(253,21)
  Logical volume "backup_var_log" created
  Path /dev/ACTION=_change_ no longer valid for device(253,20)
  Path /dev/ACTION=_add_ no longer valid for device(253,24)
  Logical volume "backup_var_spool" created
  Logical volume "backup_srv_www" created
  MySQL: flush tables with read lock
  Logical volume "backup_srv_mysql" created
  MySQL: unlock tables
  pre-backup: snapshots mounted
  pre-backup: copying svn repos
  post-backup: unmounting and removing snapshots
  /dev/ACTION=_add_: stat failed: No such file or directory
  Path /dev/ACTION=_add_ no longer valid for device(253,33)
  Path /dev/DEVTYPE=_disk_ no longer valid for device(253,33)
  Path /dev/PWD=_/_ no longer valid for device(253,33)
  Path /dev/DM_STATE_ACTIVE=_1_ no longer valid for device(253,33)
  Logical volume "backup_var_spool" successfully removed
  Logical volume "backup_var_log" successfully removed
  Logical volume "backup_home" successfully removed
  Logical volume "backup_srv_www" successfully removed
  Logical volume "backup_srv_mysql" successfully removed
  Logical volume "backup_usr" successfully removed
  Logical volume "backup_root" successfully removed
  post-backup: snapshots removed

# ./btest.sh
  pre-backup: creating and mounting snapshots
  Logical volume "backup_root" created
  Logical volume "backup_usr" created
  Path /dev/DEVTYPE=_disk_ no longer valid for device(253,18)
  Path /dev/PWD=_/_ no longer valid for device(253,18)
  Path /dev/DM_STATE_ACTIVE=_1_ no longer valid for device(253,18)
  Path /dev/SUBSYSTEM=_block_ no longer valid for device(253,18)
  Path /dev/DM_TABLE_LIVE=_1_ no longer valid for device(253,18)
  Path /dev/IFS=_ no longer valid for device(253,18)
  Path /dev/DM_TARGET_COUNT=_1_ no longer valid for device(253,18)
  Path /dev/UDEV_LOG=_3_ no longer valid for device(253,18)
  Path /dev/PS1=_# no longer valid for device(253,18)
  Path /dev/_ no longer valid for device(253,18)
  Path /dev/PS2=__ no longer valid for device(253,18)
  Path /dev/DM_TARGET_TYPES=_linear_ no longer valid for device(253,18)
  Path /dev/DM_STATE=_ACTIVE_ no longer valid for device(253,18)
  Path /dev/PATH=_/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin_ no longer valid for device(253,18)
  Path /dev/PS4=_+ no longer valid for device(253,18)
  Path /dev/OPTIND=_1_ no longer valid for device(253,18)
  Path /dev/DM_LAST_EVENT_NR=_0_ no longer valid for device(253,18)
  Path /dev/DM_MAJOR=_253_ no longer valid for device(253,18)
  Path /dev/MAJOR=_253_ no longer valid for device(253,18)
  Logical volume "backup_home" created
  Logical volume "backup_var_log" created
  Logical volume "backup_var_spool" created
  Logical volume "backup_srv_www" created
  MySQL: flush tables with read lock
  Logical volume "backup_srv_mysql" created
  MySQL: unlock tables
  pre-backup: snapshots mounted
  pre-backup: copying svn repos
  post-backup: unmounting and removing snapshots
  Logical volume "backup_var_spool" successfully removed
  Logical volume "backup_var_log" successfully removed
  Logical volume "backup_home" successfully removed
  Logical volume "backup_srv_www" successfully removed
  Logical volume "backup_srv_mysql" successfully removed
  Logical volume "backup_usr" successfully removed
  Logical volume "backup_root" successfully removed
  post-backup: snapshots removed

/var/log/syslog:
Nov 17 00:00:39 sphere udevd-work[15754]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:40 sphere udevd-work[15096]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:42 sphere udevd-work[15754]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:43 sphere udevd-work[15096]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:43 sphere udevd-work[15096]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:45 sphere udevd-work[15754]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:46 sphere udevd-work[15096]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:51 sphere udevd-work[15096]: last message repeated 3 times
Nov 17 00:00:51 sphere udevd-work[15754]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:51 sphere udevd-work[15096]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:53 sphere udevd-work[15754]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:54 sphere udevd-work[15096]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:56 sphere udevd-work[15754]: rename(/dev//.udev-tmp, /dev//) failed: Invalid cross-device link
Nov 17 00:00:56 sphere udevd-work[15096]: rmdir(/dev/) failed: Device or resource busy

rmdir(/dev/)? What the heck?

Tags: hardy
Revision history for this message
ppww (pete-launchpad) wrote :

Not convinced it's a duplicate of bug #444563, but perhaps related. I have a script that creates snapshots of / and /var in succession, and often receive the same strange error message the reporter notes above:

lvcreate --size 4GB --snapshot --name root-snap-20091123b /dev/vg0/root
  Logical volume "root-snap-20091123b" created

lvcreate --size 16GB --snapshot --name var-snap-20091123b /dev/vg1/var
  Path /dev/ACTION=_change_ no longer valid for device(252,10)
  Logical volume "var-snap-20091123b" created

This is accompanied by "Invalid cross-device link" errors in syslog.

Revision history for this message
Jürgen Kreileder (jk) wrote :

I'm not entirely convinced it's a duplicate either.

In addition to the messages already reported, I also see upstart-udev-bridge restarting from time to time while the snapshots are created:
upstart-udev-bridge[25372]: Disconnected from Upstart
init: upstart-udev-bridge main process (25372) terminated with status 1
init: upstart-udev-bridge main process ended, respawning

Revision history for this message
K Richard Pixley (rich-noir) wrote :

I'm seeing similar problems in a similar situation. I'm not doing backups, but I am creating and removing a sequence of LVs and snapshots. Mine are on amd64 karmic. I don't believe I've seen these on lucid with the same code.

Revision history for this message
K Richard Pixley (rich-noir) wrote :

I'm not convinced this is the same bug either. Mine is more like yours than like #444563.

Revision history for this message
K Richard Pixley (rich-noir) wrote :

I'm including a copy of the script I'm using to reproduce the problem. It assumes you have a volume group named "v4c" with at least 8 GB available.
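
(The attachment itself isn't reproduced here; a loop in the same spirit, assuming the "v4c" volume group and illustrative LV names, would look something like this:)

#!/bin/bash
# Hypothetical reducer: rapidly create and remove LVs plus snapshots
# in the "v4c" volume group to provoke the udev races.
set -e
for i in $(seq 1 10); do
    lvcreate --size 256M --name "test$i" v4c
    lvcreate --size 256M --snapshot --name "test$i-snap" "/dev/v4c/test$i"
done
for i in $(seq 1 10); do
    lvremove -f "/dev/v4c/test$i-snap" "/dev/v4c/test$i"
done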

Revision history for this message
Jürgen Kreileder (jk) wrote :

Removed the duplicate mark, as I and others (see https://bugs.launchpad.net/ubuntu/+source/udev/+bug/444563/comments/32 for example) are pretty sure it's a different issue.

Revision history for this message
Wessel Dankers (wsl) wrote :

Pasted from #444563, which actually does seem to be the same bug:

I think I got it. In /lib/udev/rules.d/65-dmsetup.rules there's this line:

ENV{DM_UUID}=="LVM-*", PROGRAM="/bin/sh -c 'set `lvm lvdisplay -C -o vg_name,lv_name --noheadings /dev/$name` ; echo $1/$2'", SYMLINK+="$result"

which is a particularly fragile piece of shell script that doesn't do any error checking. Replacing it with:

ENV{DM_UUID}=="LVM-*", PROGRAM="/bin/sh -ec 'set -- $(lvm lvdisplay -C -o vg_name,lv_name --noheadings /dev/$name); echo ${2+$1/$2}'", SYMLINK+="$result"

will fix the errors and warnings. No more empty or bogus symlinks.
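
That would also explain the bogus /dev/ACTION=_change_ symlinks above: when lvdisplay races with the device and prints nothing, the shell's "set" builtin ends up being called with no arguments, which makes it dump all shell variables to stdout, and udev then turns each word of $result into a symlink. This is easy to demonstrate in a plain shell (":" stands in for the empty lvdisplay output):

# Original rule's script with empty lvdisplay output: "set" with no
# arguments prints every shell variable, which udev treats as symlink names.
sh -c 'set `:`; echo $1/$2'

# Fixed version: "set --" copes with the empty expansion, and ${2+$1/$2}
# expands to nothing unless both fields exist, so no bogus symlink is made.
sh -ec 'set -- $(:); echo ${2+$1/$2}'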

Revision history for this message
Jürgen Kreileder (jk) wrote :

Good catch, will give it a try.

Of course, /lib/udev/rules.d/65-dmsetup.rules isn't a conffile, so changes to it will be lost when the dmsetup package gets upgraded.
(Also, this symlink rule isn't in lucid. On the other hand, I haven't upgraded any of the machines showing this problem to lucid yet; maybe it's fixed there anyway.)

Revision history for this message
Wessel Dankers (wsl) wrote :

You can create a file /etc/udev/rules.d/65-dmsetup.rules, which will override the file in /lib/udev/rules.d/.
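
Something along these lines makes the fix survive package upgrades (a sketch; paths as on karmic):

# Copy the packaged rules into /etc/udev/rules.d, which overrides the
# copy shipped in /lib/udev/rules.d, then edit in the fixed PROGRAM line.
cp /lib/udev/rules.d/65-dmsetup.rules /etc/udev/rules.d/65-dmsetup.rules
editor /etc/udev/rules.d/65-dmsetup.rules
udevadm control --reload-rules    # tell udevd to re-read the rules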

In lucid this seems to be handled completely differently (and much better imho):

From /lib/udev/rules.d/56-lvm.rules:

# Use DM name and split it up into its VG/LV/layer constituents.
IMPORT{program}="/sbin/dmsetup splitname --nameprefixes --noheadings --rows $env{DM_NAME}"

(this creates the DM_VG_NAME, DM_LV_NAME and DM_LV_LAYER environment variables)

And then, in /lib/udev/rules.d/60-persistent-storage-lvm.rules:

ENV{DM_VG_NAME}=="?*", ENV{DM_LV_NAME}=="?*", ENV{DM_LV_LAYER}=="", SYMLINK+="$env{DM_VG_NAME}/$env{DM_LV_NAME}"
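
One way to check the imported variables on a lucid system (the device node below is illustrative):

# List the DM_* properties udev recorded for a device-mapper node;
# the VG/LV symlink is assembled from DM_VG_NAME and DM_LV_NAME.
udevadm info --query=property --name=/dev/dm-3 | grep '^DM_'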

Revision history for this message
Jürgen Kreileder (jk) wrote :

Yep, no more messages about symlinks with lucid.

I still do see some kind of race condition with lucid, though:

May 8 16:24:53 sphere udevd-work[24973]: inotify_add_watch(6, /dev/dm-23, 10) failed: No such file or directory
May 8 16:25:04 sphere udevd-work[24973]: inotify_add_watch(6, /dev/dm-32, 10) failed: No such file or directory

From the timestamps I'd say this happened while the snapshots made for the backup were being unmounted and lvremoved.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Tagging hardy. Leaving off karmic, as karmic bugs will be sifted through and closed at EOL in 3 months; an SRU is unlikely at this point.

tags: added: hardy
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

So, on inspecting this on hardy (I know the original report was against karmic), I have no /lib/udev/rules.d, and my /etc/udev/rules.d/65-dmsetup.rules looks like this:

# This file causes devicemapper devices to be assigned names by udev
# based on the name given to dmsetup.
# See udev(8) for syntax.

SUBSYSTEM!="block", GOTO="dmsetup_end"
KERNEL!="dm-*", GOTO="dmsetup_end"
ACTION!="add|change", GOTO="dmsetup_end"

# Obtain device status
IMPORT{program}="/sbin/dmsetup export -j%M -m%m"
ENV{DM_NAME}!="?*", GOTO="dmsetup_end"

# Make the device take the /dev/mapper name
OPTIONS+="string_escape=none", NAME="mapper/$env{DM_NAME}"
SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}"
ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}"

# Skip vol_id for suspended devices and those with empty or error tables
ENV{DM_STATE}=="SUSPENDED", GOTO="dmsetup_end"
ENV{DM_TARGET_TYPES}=="|*error*", GOTO="dmsetup_end"

# by-uuid and by-label symlinks
IMPORT{program}="vol_id --export $tempnode"
OPTIONS="link_priority=-100"
ENV{DM_TARGET_TYPES}=="*snapshot-origin*", OPTIONS="link_priority=-90"
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", \
   SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", \
   SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

LABEL="dmsetup_end"

And I'm not seeing any messages from udev when running the 'btest.sh' script (modified to fit inside my 10 GB test VM image).

Given that karmic is nearly EOL, I'm not all that interested in trying to reproduce this on karmic. Can one of the other users confirm that bug #241109 is in fact a duplicate of this, or are they separate issues (given that it is against hardy, not karmic)?
