New ext3 partitions should not have max-mount count

Bug #3581 reported by Sitsofe Wheeler
Affects Status Importance Assigned to Milestone
partman-ext3 (Baltix)
New
Undecided
Unassigned
partman-ext3 (Ubuntu)
Fix Released
Medium
Unassigned
Declined for Intrepid by Colin Watson

Bug Description

Currently, an ext3 partition set up by the Ubuntu installer appears to have a max mount count of 30. I don't know whether this goes down on larger partitions, but a 200G partition takes an awfully long time to check. While this is appropriate for servers, on a desktop I would rather take my chances than sit through that...

Other distros (Fedora, SUSE, Mandriva) default the mount count on new ext3 partitions created during the install to 0.
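
For reference, tune2fs(8) can switch off both the mount-count and the time-based trigger on an existing filesystem. The sketch below only builds the command line and does not run it. `/dev/sdXN` is a placeholder device and `tune2fs_check_args` is a made-up helper name. `-c 0` disables mount-count-dependent checking and `-i 0` disables time-dependent checking.

```python
import shlex

def tune2fs_check_args(device, max_mount_count=0, interval="0"):
    """Build (but do not run) a tune2fs command that sets the periodic-check knobs.

    -c 0 disables mount-count-dependent checking and -i 0 disables the
    time-based check. Larger values (e.g. -c 500, -i 2m) relax the checks
    instead of disabling them.
    """
    return ["tune2fs", "-c", str(max_mount_count), "-i", str(interval), device]

# /dev/sdXN is a placeholder; substitute the real partition and run as root.
print(shlex.join(tune2fs_check_args("/dev/sdXN")))
```

The tune2fs(8) warning about disabling these checks entirely, quoted further down in this report, still applies.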

Related to http://bugzilla.ubuntu.com/show_bug.cgi?id=16254 (which asks for a warning).

Tags: ext3
Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Now similar to Bug #22460

Matt Zimmerman (mdz)
Changed in debian-installer:
assignee: nobody → kamion
status: Unconfirmed → Confirmed
Revision history for this message
Alexandre Otto Strube (surak) wrote :

Kamion, did you see the follow-up on #22460? It seems more detailed (there is even a poll on the subject there). Perhaps this one should be marked as a duplicate of that one? (More info there.)

Revision history for this message
Colin Watson (cjwatson) wrote :

This bug is not a duplicate of bug 22460. The issue of what the defaults should be is distinct from the issue of how the system should behave given those defaults.

Revision history for this message
Vassilis Pandis (pandisv) wrote :

From tune2fs(8) man page:

You should strongly consider the consequences of disabling mount-count-dependent checking entirely. Bad disk drives, cables, memory, and kernel bugs could all corrupt a filesystem without marking the filesystem dirty or in error. If you are using journaling on your filesystem, your filesystem will never be marked dirty, so it will not normally be checked. A filesystem error detected by the kernel will still force an fsck on the next reboot, but it may already be too late to prevent data loss at that point.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Still here on a system with a disk whose partition table was created from scratch with the Ubuntu Feisty herd 5 alternate install CD.

Colin Watson (cjwatson)
Changed in partman-ext3:
assignee: kamion → nobody
Revision history for this message
Doug Holton (edtechdev) wrote :

If the disk checking behavior cannot be changed (forcing you to wait 10 minutes or more during boot), then the default should be off. No forced disk checking every Xth time you boot.

There are times when you need to use your computer now. For example a presentation or a class. In fact that is the normal situation when you turn on a computer. This disk check forces you to wait a long time before you can use your own computer.

Many have posted on the Ubuntu forums about this. I hope it will be disabled in Gutsy (or the behavior changed so that a user can override it when a disk check occurs).

Revision history for this message
Thiago Teixeira (tvst) wrote :

In my opinion, the best solution is to make fsck run on *shutdown* rather than on boot-up. The rationale behind this is that people usually turn on their computers to do something with them. At those times, people want their computers to boot up as fast as possible.

Conversely, people usually do not shut down their computer to do something to it. The exceptions to this are when one is going to replace some hardware, or move a laptop around. Other than that, shutdown speed is not as important.

Thus I believe fsck should run on shutdown by default, and there should be an option to change its behavior if the user would prefer that.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote : Re: [Bug 3581] Re: New ext3 partitions should not have max-mount count

Running fsck on shutdown is an interesting idea, if it can easily be
quit and done the next time, and the user is told of any problems on the
next boot.

In general, you should be able to walk away from the shutdown and know
that the shutdown will actually complete regardless.

If the user gets the option to resolve the issues at the time of
shutdown, then we should not block the shutdown if the user is
unresponsive (i.e. has walked away). The shutdown should complete and
the user gets the option to deal with those issues on boot.

Mark

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

I think any suggestion about warning the user before doing the fsck is probably better off in Bug #22460 . However since people are discussing stuff here...

Mark:
Oof. This sounds dangerous because if an fsck stops that almost certainly means it needed user intervention to work out what to do next. Additionally checking on shutdown is hard because you have to guarantee that the filesystem has been mounted read only in case you need to change anything.

I can also see this from the remote machine perspective where I do a remote shutdown of a machine via halt only to have it kick off a massive fsck which I can no longer interrupt. Additionally you run into the Windows XP "problem" of people saying "but I just wanted it to power off!" as it uses shutdown as an opportunity to install critical updates. Imagine you are given a machine and just as you reboot the first thing it says is "can I check this disk?". You say yes not knowing how long it will take. Hours later the check is still going and finds a fault which then promptly causes your machine to power off leaving you to spend several hours getting back to the fault when you power the machine back up... which then tells you to launch fsck interactively and asks for the root password. You log in and wait a few more hours while you do an interactive fsck because the shutdown fsck had nowhere safe to write that it needed to do an interactive fsck on boot (this can probably be worked around but still).

The best short term thing I think you can do is an irritating cron job that kicks in every 4 months and says "checking your disks for failures before they go bad is a good idea. Do a backup now then check disks? Just check disks?". I suspect the very best thing that can be done is an online fsck that happens in the background while your system is "running" and is short.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

(I still say that turning off the fsck is the "modern" thing to do though)

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

All good points. Ok, back to the drawing board. Thanks Sitsofe :-)

Revision history for this message
Florin Andrei (florin-andrei) wrote :

This is truly very annoying and completely pointless. Ext3 on modern kernels is very robust. There's no need for extra paranoia every 30 reboots.
In fact, Ubuntu is the only distribution that I'm using that does this. If periodic fsck would truly be needed, I would notice file system corruption on the other systems. That does not occur. The last time I saw that was many years ago with an experimental ReiserFS.

I'm running Fedora and CentOS on a variety of machines, including laptops, sometimes these systems get powercycled, yet there's no corruption.

Heck, my wife's laptop running Vista gets powercycled every once in a while and yet there are no issues with the filesystem.

And yet the Ubuntu machine makes me waste several minutes every 30 reboots for nothing.

fsck every 30 boots is so 1997. Please let's all enter the new millennium already.

Thanks!

Revision history for this message
ma2412ma (ma2412ma) wrote :

Well, Ubuntu is not the only distribution - openSUSE is also doing it (I'm gonna have to file a bug report there as well). Other than that, I fully agree with Florin. What about creating an option in the control center that allows the user to enable and disable the periodic fsck? The default value of this option should be off, and paranoid users can always turn it on with just a few mouse clicks.

Revision history for this message
feld (felderado) wrote :

Data integrity is SO 2007.

Let's keep it that way.

Disabling this is risking instability and loss of data if there are hardware faults.

This minor "inconvenience" could save someone. Better safe than sorry. Speed should NEVER be sacrificed for correctness or data integrity.

If this check is too time consuming for you, SKIP IT. You can kill it with Alt+Sysrq+K you know. Just kill the process and it won't check. End of story.

Revision history for this message
feld (felderado) wrote :

ehh I meant that the other way around.

Correctness and data integrity should NEVER be sacrificed for speed.

You get the point.

Do the right thing devs.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

(Subscribing feld so that reply is seen
To new people (not feld!) who contribute to this bug - if you are participating in this discussion by posting comments that might lead to follow ups please subscribe yourself to this bug so that people have a means in which to follow up)

feld:
Do you disable write caching on your disks? Additionally do you use any extra ext3 journalling options?

Revision history for this message
Florin Andrei (florin-andrei) wrote :

First off, a periodic fsck does not address the problem of data integrity. It's there to make sure that the filesystem is consistent. It's just a repair _attempt_ that occurs long after the damage has been done.

Anyway, both data integrity and filesystem consistency are just fine with Ext3 on modern 2.6 kernels. Look at the pretty large community using (just to pick the RH-based distros) Fedora, Red Hat and CentOS: those distributions disable the fsck, and if that were a problem, it would surely show up in their bug reports. That, of course, does not happen. What is the logical conclusion?
As for my personal experience - I used Fedora since version 1 on laptops, no problem, despite some fairly stupid things I did with the file systems. I just recently migrated to Ubuntu and now I have to sit through useless fscks that waste my time and drain the battery and do nothing otherwise.

For the exceedingly paranoid, fine, keep it as an option. But don't _force_ it upon everyone else.

The installer should provide this as an option somewhere. The user should be allowed to choose whether the system should perform fsck every X reboots, how big X is, or even to disable fsck altogether.
If that is implemented as a choice, my preference for the default setting would be to disable fsck, but I don't care very much about that as long as I can choose.
If it's deemed too abstract for the regular user, fine, hide it behind an "Advanced Settings" menu or screen.
Like ma2412ma suggested, also adding this as an option to the control panel or something like that would be great.

If you are that worried about data integrity, read up on filesystems and consider the more realistic choices. Periodic fsck does not address your worries. What you actually need in that case are things such as:
1. disable the write cache at the disk/controller level
2. mounting the filesystems with the "sync" option
3. mounting Ext3 with the "data=journal" option
Option #1 is probably the most efficient to keep your data safe and the filesystem sane. But it slows down the disk quite a lot.
Option #2 is the next best thing. Still slow.
Option #3 is probably just as good as #2 in terms of data integrity, probably less good in terms of filesystem consistency, and probably significantly faster than #1 and #2 (but still slower than the default, except in fringe cases such as Postfix spools where some users benchmarked higher performance with data=journal for some reason).

Of course, a simple technique such as periodic backups is more efficient than all the software tricks in the world.
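
As a rough illustration of options #2 and #3 in the list above, this is what they would look like as /etc/fstab entries. The device and mount point are placeholders and `fstab_line` is a hypothetical helper. The sixth field (fs_passno) controls whether the boot-time fsck looks at the filesystem at all; 0 means it is skipped. Option #1 is set on the drive rather than in fstab (for ATA disks, `hdparm -W 0 <device>`).

```python
def fstab_line(device, mount_point, options, fstype="ext3", dump=0, passno=2):
    """Format one /etc/fstab entry: device, mount point, type, options, dump, pass."""
    return f"{device} {mount_point} {fstype} {','.join(options)} {dump} {passno}"

# Placeholder device and mount point; pick one of the two variants.
print(fstab_line("/dev/sdXN", "/home", ["defaults", "sync"]))          # option #2
print(fstab_line("/dev/sdXN", "/home", ["defaults", "data=journal"]))  # option #3
```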

Revision history for this message
Manuel López-Ibáñez (manuellopezibanez) wrote :
Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

I've just done a fresh openSUSE 10.3 install and I can confirm that ma2412ma is right - ext3 partitions seemingly do not have their mount count/check dates disabled by default on SUSE. However, the max mount count on the partition newly made during install was 500 and the check interval was set for 2 months.

Revision history for this message
Victor Osadci (victor-os) wrote :

Regarding the check-on-shut-down:
It is an interesting idea, but care should be taken in special cases such as UPS initiated shut-down or low battery laptops - it would be awful to lose power in the middle of a fsck.

Another idea would be to have a time-based forced fsck. This should be beneficial to both people that reboot frequently and bump into the forced fsck more often than necessary, and people with impressive uptimes, for whom the 30 mounts could be years without a fsck.
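
To make the difference between the two triggers concrete, here is a minimal sketch of the decision, assuming a check is forced once the mount count reaches the maximum or once the check interval has elapsed since the last check. `check_due` and the example numbers are illustrative only; this is not e2fsck's actual code.

```python
from datetime import datetime, timedelta

def check_due(mount_count, max_mount_count, last_checked, interval, now):
    """Return the reasons a forced check would trigger (empty list: none).

    A max_mount_count <= 0 or a zero interval means that trigger is disabled.
    """
    reasons = []
    if max_mount_count > 0 and mount_count >= max_mount_count:
        reasons.append("mount count")
    if interval > timedelta(0) and now - last_checked >= interval:
        reasons.append("time")
    return reasons

now = datetime(2008, 3, 29)
six_months = timedelta(days=180)
# Laptop rebooted twice a day: 30 mounts in 15 days trips the mount-count trigger only.
print(check_due(30, 30, now - timedelta(days=15), six_months, now))
# Server with long uptimes: 3 mounts in two years trips the time trigger only.
print(check_due(3, 30, now - timedelta(days=730), six_months, now))
```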

Revision history for this message
Jonathan Musther (musther-deactivatedaccount) wrote :

I know AutoFsck was mentioned already, but I think it's worth mentioning with regard to the last comment.

AutoFsck doesn't interfere with the start up check (if it's called for some reason, it'll still run), and prompts the user on shutdown (no user confirmation, no check run), which means it doesn't cause problems with things like UPS initiated shutdowns or low battery shutdowns.

You can find it at:
http://wiki.ubuntu.com/AutoFsck

There's also a nice simple script (with zenity GUI) on that page which allows you to easily change the interval.

Revision history for this message
Jonathan Musther (musther-deactivatedaccount) wrote :

I was wondering if solving the problem was worthwhile, or whether we'd be better off just scrapping the check altogether, so a while ago I asked a couple of ext3 devs:

Does fsck need to be run on ext3 partitions?

Andrew Morton:

Theoretically: no.
Practically: yes. There are software bugs and hardware bugs and
things can go wrong on-disk. You want to catch them early.
Personally I disable the auto-fsck thing and I'll run fsck manually
once or twice a year, when the time suits me.

Mingming Cao:

Periodically fsck ext3 is still needed, even if ext3 is a journalled fs.
kernel code vm/fs could be buggy, or disks IO errors, which cause
filesystem metadata corrupted silently, this can't be detected by simply
replaying the journal log.
Well how often should ext3 do the sanity check is really depend on the
customer's priority, whether they would like to trade some of the boot
up time with more confident of the fs's healthy. It's probably a good
idea to warning the user that the scheduled fsck is coming and let user
to decide whether they want to do it or delaying it.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

My understanding is that *finding* errors can be done read-only, and
*fixing* errors requires r/w. What I'd like to see is:

 - the ability to run scheduled "error finding" checks, that log the
places to be fixed somewhere
 - the ability to check those logs on boot, and fix errors quickly based
on the pre-discovered problems

That way the boot-time check can be very fast.

Mark

Revision history for this message
Jonathan Musther (musther-deactivatedaccount) wrote :

Using the -n switch, fsck can check a mounted filesystem (-n tells it not to make any changes, so it won't do any repairs). So the quick way to implement this would be to have a scheduled 'fsck -n': if it comes back clean, fine; if it comes back dirty, schedule a full check to run (on shutdown, with a note in the logout/shutdown dialogue?).
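
A minimal sketch of that scheduled read-only check, assuming the exit-status bits documented in fsck(8) and e2fsck(8). The helper names are made up, and the `run` parameter exists only so the function can be exercised without root. Note that e2fsck(8) warns that results printed for a mounted filesystem are not valid, so a "dirty" answer from this sketch is a hint at best.

```python
import subprocess

# Exit-status bits documented in fsck(8) / e2fsck(8); the status is a bit mask.
FSCK_BITS = {
    1: "errors corrected",
    2: "reboot required",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "cancelled by user",
    128: "shared library error",
}

def decode_fsck_status(status):
    """Turn an fsck exit status into a list of human-readable meanings."""
    return [text for bit, text in FSCK_BITS.items() if status & bit]

def readonly_check(device, run=subprocess.run):
    """Run 'fsck -n' on device; return (full_check_needed, decoded_status)."""
    result = run(["fsck", "-n", device])
    needs_full_check = bool(result.returncode & 4)
    return needs_full_check, decode_fsck_status(result.returncode)

print(decode_fsck_status(4))
```

A caller that gets `True` back could then arrange the full check for the next boot, for example by creating `/forcefsck` as done in the next comment.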

That should be quicker and easier to implement than writing the errors to a log for quick fixing.

Mark, are you wanting this for Hardy, or Hardy+?

Revision history for this message
Jonathan Musther (musther-deactivatedaccount) wrote :

Of course I started fiddling with the aim of throwing together something that could do that, but found a problem.

I ran fsck with -n on my root partition, it said there was an inode problem, so I touched /forcefsck and rebooted to run a full fsck. After rebooting I ran fsck -n again to see what it would tell me, unfortunately I now had more problems than I had the first time, lots of:

Free blocks count wrong for group #10 (19891, counted=19838).

and

Free inodes count wrong for group #10 (16186, counted=16187).

type errors. I also did the same with a different disk, and got the same issue. So it seems that ext3 always has inode and block count issues, making my suggestion of running a full fsck when errors are found with a read only one, unworkable.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

So it does look like it would be better to have a set of "known issues"
to point fsck at. I wonder how amenable fsck is to that sort of "just
fix these issues" approach?

Revision history for this message
Jonathan Musther (musther-deactivatedaccount) wrote :

I've done some research, and there's no way to use fsck (well, we're really looking at e2fsck here) in the targeted manner you described - that's not to say it can't be done, just that it can't be done with the current e2fsck, we'll need a rewrite, or a new tool altogether. That's beyond my skills ;-)

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

I've just stumbled across this mailing list thread discussing it https://www.redhat.com/archives/ext3-users/2008-January/msg00027.html (although the conversation quickly turns to the creation of a LVM based cron script). The suggestion is that people use LVM snapshotting and then check the snapshot (which is a rather clever way of doing online checking)...
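
The snapshot approach boils down to three steps: create a snapshot, check it read-only, and remove it. The sketch below only assembles the commands. It is not the script from that thread, and the volume group, logical volume, snapshot name and snapshot size are all placeholders.

```python
def snapshot_check_commands(vg, lv, snap_name="fsck-snap", snap_size="1G"):
    """Return the command sequence for checking an LVM snapshot of vg/lv."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return [
        # Copy-on-write snapshot; snap_size bounds how much the origin may
        # change while the check is running.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # Forced, read-only check of the frozen snapshot.
        ["e2fsck", "-f", "-n", snapshot],
        # Remove the snapshot whatever the result was.
        ["lvremove", "--force", snapshot],
    ]

# "vg0" and "root" are hypothetical names.
for argv in snapshot_check_commands("vg0", "root"):
    print(" ".join(argv))
```

Because the snapshot is not mounted and does not change underneath e2fsck, the result avoids the validity problem of checking a live filesystem with `fsck -n`.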

Revision history for this message
Octavio Alvarez (alvarezp) wrote :

Another idea is to warn the user that he is in the 25..29th mount after he logs in and offer a read-only, valid check *whenever the user wants but before the 30th mount*. And, if the check proves the disk to need r/w to fix something, to provide an option to schedule the check for the next reboot.

This could be done in a system notification bubble, a la "your system needs a restart to be fully upgraded" with a button for the read-only check and a checkbox for the "schedule r/w check for the next reboot."

This way, the video won't suddenly be disrupted because of disk access during an AutoFsck, and also, the user won't have to wait for a full fsck even if he is turning off his laptop in a rush.
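
The warning window described in this comment could be computed along these lines. This is a sketch with made-up helper names; the default window of 5 mounts matches the "25..29th mount" range for a maximum of 30.

```python
def mounts_remaining(mount_count, max_mount_count):
    """Mounts left before the forced check, or None if the trigger is disabled."""
    if max_mount_count <= 0:
        return None
    return max(max_mount_count - mount_count, 0)

def should_warn(mount_count, max_mount_count, window=5):
    """True while the next forced check is between 1 and `window` mounts away."""
    left = mounts_remaining(mount_count, max_mount_count)
    return left is not None and 0 < left <= window

# With a maximum of 30, the bubble would appear on mounts 25 through 29.
print([n for n in range(20, 31) if should_warn(n, 30)])
```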

Revision history for this message
Florin Andrei (florin-andrei) wrote :

The notification bubble is an interesting idea.

But honestly, with a new Ubuntu version being released, what?, every 6 ... 9 months or so? then does it really matter whether the partition is verified or not?

When upgrading to a new release, if the disk is formatted then it doesn't matter. If it's not formatted ("soft upgrade") then just let the installer do the verification. This should be enough for any desktop / laptop being upgraded regularly to the latest Ubuntu release. I completely disabled all fsck verifications on this busy, heavily used laptop. It will get formatted when I upgrade to Ubuntu 8.04, end of story.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

(Still here in Hardy (development branch) on a freshly made partition:
# tune2fs -l /dev/hdc1
tune2fs 1.40.8 (13-Mar-2008)
Filesystem volume name: <none>
[...]
Mount count: 2
Maximum mount count: 36
Last checked: Sat Mar 29 19:20:52 2008
Check interval: 15552000 (6 months)
Next check after: Thu Sep 25 20:20:52 2008
)
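
For anyone who wants to read these fields from a script, here is a small sketch that parses `tune2fs -l` output like the listing above. The sample text is copied from that listing; `parse_tune2fs` is a hypothetical helper, not part of e2fsprogs.

```python
def parse_tune2fs(text):
    """Parse 'tune2fs -l' output into a dict of field name -> raw string value."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so timestamps in the value stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

sample = """\
tune2fs 1.40.8 (13-Mar-2008)
Mount count:              2
Maximum mount count:      36
Last checked:             Sat Mar 29 19:20:52 2008
Check interval:           15552000 (6 months)
"""
info = parse_tune2fs(sample)
remaining = int(info["Maximum mount count"]) - int(info["Mount count"])
interval_days = int(info["Check interval"].split()[0]) // 86400
print("forced check after", remaining, "more mounts or",
      interval_days, "days after the last check")
```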

Revision history for this message
ma2412ma (ma2412ma) wrote :

Can we at least make this configurable for the user? I mean configurable for everybody (i.e. a module in the settings window) and not by fiddling around in some config file. I just recently had the pleasure to enjoy a fsck on my 200GB laptop 4200rpm drive. It took over 20 minutes and I really needed my computer at that moment...

IMHO this is really a serious and annoying issue, and although it's probably too late to be fixed in 8.04 it should be resolved ASAP with an update. Either disable it by default or at least let the user decide. No other OS does such a check at bootup, and it was introduced in Linux distros about a year ago - before that we also had ext3 with no check and I didn't hear of anybody having troubles because of that.

Revision history for this message
Victor Osadci (victor-os) wrote :

Is this still a problem? In Hardy it is possible to either skip or cancel a running fsck (bug #22460).

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

The Hardy interface for this is quite slick - at least it gives one the
impression of being cleanly executed. We should still figure out how to
handle it more gracefully than blocking-on-boot.

Mark

Revision history for this message
ma2412ma (ma2412ma) wrote :

I haven't tried Hardy - how was this solved? Is a message displayed how to skip or cancel a running check? Does it fall back to the text console or is it integrated in the boot splash thing?

Revision history for this message
Colin Watson (cjwatson) wrote :

It's integrated into the boot splash screen, with an option to cancel the check.

Revision history for this message
rusivi2 (rusivi2-deactivatedaccount) wrote :

Thank you for posting this bug.

Is this an issue in Lucid?

Changed in partman-ext3 (Ubuntu):
status: Confirmed → Incomplete
Changed in partman-ext3 (Baltix):
status: New → Incomplete
Revision history for this message
Florin Andrei (florin-andrei) wrote :

The way this issue is handled in Lucid is much better. No need to change anything now, in my view.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

An ext4 partition I have on my EeePC 900 with a 10.04 install has both a max mount count of 38 and a forced check after date of six months. Nasty on a headless machine but the laptop case is vastly improved (if a bit broken when the splash screen is skipped).

Changed in partman-ext3 (Baltix):
status: Incomplete → New
Changed in partman-ext3 (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Phillip Susi (psusi) wrote :

This was fixed a release or two ago.

Changed in partman-ext3 (Ubuntu):
status: Confirmed → Fix Released