'apt-mark showauto' and 'apt-cache show' are slow

Bug #1713219 reported by Jarno Suni
This bug affects 2 people
Affects       Status   Importance  Assigned to  Milestone
apt (Ubuntu)  Triaged  Low         Unassigned

Bug Description

$ time apt-mark showauto >/dev/null

real 0m0,620s
user 0m0,557s
sys 0m0,052s

When I run the command for the first time, it is even slower.

I could do the job* in a fraction of the time using an AWK script
(saved as ./apt-mark-showauto.awk):

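#!/usr/bin/awk -f
# Print the names of packages marked "Auto-Installed: 1" in apt's
# extended_states file (the path is hard-coded here).
BEGIN{
 file="/var/lib/apt/extended_states"
 # Read the file line by line; each stanza starts with "Package:".
 while ((getline < file) > 0) {
  if ($0 ~ /^Package:/) {
   pkg=$2
   # Scan the rest of this stanza, which ends at an empty line.
   while ((getline < file) > 0 && $0) {
    if ($1 == "Auto-Installed:") {
     if ($2==1) print pkg
     break
    }
   }
  }
 }
 close(file)
}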

$ time ./apt-mark-showauto.awk >/dev/null

real 0m0,004s
user 0m0,004s
sys 0m0,000s

Tested on Ubuntu 20.04 with mawk 1.3.4 as the AWK interpreter.

*) The script omits the Architecture information, though. Also, should apt-config be queried for the path of the extended_states file?
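
A minimal sketch addressing both points in the footnote, assuming a POSIX shell and any POSIX awk (mawk included). It reads the file in paragraph mode (RS = ""), so each blank-line-separated stanza is one record; this tolerates extra fields and any field order, and it appends the Architecture value when one is present. The function name showauto_from_file is hypothetical, not part of apt, and its pkg:arch output format is this sketch's choice, not necessarily the format apt-mark uses.

```sh
#!/bin/sh
# Sketch: list auto-installed packages from an extended_states-style file.
# showauto_from_file is a hypothetical helper, not part of apt.
showauto_from_file() {
  awk '
    BEGIN { RS = ""; FS = "\n" }   # one record per blank-line-separated stanza
    {
      pkg = ""; arch = ""; auto = ""
      for (i = 1; i <= NF; i++) {
        line = $i
        if (line ~ /^Package:/) {
          sub(/^Package:[ \t]*/, "", line); pkg = line
        } else if (line ~ /^Architecture:/) {
          sub(/^Architecture:[ \t]*/, "", line); arch = line
        } else if (line ~ /^Auto-Installed:/) {
          sub(/^Auto-Installed:[ \t]*/, "", line); auto = line
        }
      }
      # Only an explicit "1" counts as automatic; anything else is skipped.
      if (pkg != "" && auto == "1") {
        if (arch != "") print pkg ":" arch
        else print pkg
      }
    }' "$1"
}
```

For the path, something like `eval "$(apt-config shell EXT Dir::State::extended_states/f)"` followed by `showauto_from_file "$EXT"` should resolve the configured location; treat that apt-config invocation as an assumption to verify on your apt version. As later comments in this thread point out, the file format is not a stable interface.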

Similarly

apt-cache show <pkg>

is slow. (It also shows whether a package is manually or automatically installed.)

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: apt 1.0.1ubuntu2.17
ProcVersionSignature: Ubuntu 4.4.0-92.115~14.04.1-generic 4.4.76
Uname: Linux 4.4.0-92-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.25
Architecture: amd64
CurrentDesktop: XFCE
Date: Sat Aug 26 12:59:00 2017
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-09-21 (1070 days ago)
InstallationMedia: Ubuntu-Studio 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.1)
SourcePackage: apt
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.cron.daily.apt: [modified]
modified.conffile..etc.kernel.postinst.d.apt.auto.removal: [modified]
mtime.conffile..etc.cron.daily.apt: 2017-05-03T10:27:27.617839
mtime.conffile..etc.kernel.postinst.d.apt.auto.removal: 2017-06-01T14:39:39.236080

Jarno Suni (jarnos) wrote :
Julian Andres Klode (juliank) wrote :

Everything in apt uses the cache and that has to be generated. Plus the depcache, I guess.

Changed in apt (Ubuntu):
status: New → Invalid
Jarno Suni (jarnos) wrote :

Are you saying that information about automatically installed packages cannot be read from /var/lib/apt/extended_states (or whatever file is set in the configuration variable Dir::State::extended_states)? When I install something with apt, I see that the modification time of /var/lib/apt/extended_states is about 1 sec later than that of /var/log/apt/term.log.

Julian Andres Klode (juliank) wrote :

It first builds or opens the cache, then the depcache, which reads extended_states. Just reading extended_states may be inaccurate.

Jarno Suni (jarnos)
summary: - apt-mark showauto is slow
+ 'apt-mark showauto' and 'apt show' is slow
description: updated
Changed in apt (Ubuntu):
status: Invalid → New
Julian Andres Klode (juliank) wrote : Re: 'apt-mark showauto' and 'apt show' is slow

Two problems: (1) depcache opening is slow (2) we don't really need the depcache here. Anyway, triaged, but low importance.

Changed in apt (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Jarno Suni (jarnos) wrote :

Well, I think it is not a problem, if we do not need depcache here :)

Julian Andres Klode (juliank) wrote :

The problem is that the code that reads the state reads it into the depcache.

Julian Andres Klode (juliank) wrote :

I'm not going to add a second independent reader to apt just so that apt-mark runs a bit faster.

Jarno Suni (jarnos) wrote :

I just found out that information about packages installed from PPAs is not found in the file configured in Dir::State::extended_states, so my script does not work for them.
Is it stored in some other file, then?

Julian Andres Klode (juliank) wrote :

That's simply not true.

Jarno Suni (jarnos) wrote :

Maybe so, but the fact is that the file does not contain information about all packages installed on my system. (apt 2.0.4 (amd64) / Ubuntu 20.04)

$ grep -c '^Package:' /var/lib/apt/extended_states
698
$ dpkg -l | grep ^ii | wc -l
1893

My file contains only records for packages that are automatically installed for some reason. (All records have "Auto-Installed: 1")
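
One way to check the claim that every record carries "Auto-Installed: 1" is to tally the field's values per stanza. A sketch, again assuming paragraph-mode awk; tally_auto_values is a hypothetical name, and "(missing)" is simply the label this sketch uses for stanzas without the field.

```sh
# tally_auto_values FILE: count stanzas per Auto-Installed value.
# Hypothetical helper; "(missing)" marks stanzas that lack the field.
tally_auto_values() {
  awk '
    BEGIN { RS = ""; FS = "\n" }   # one record per stanza
    {
      value = "(missing)"
      for (i = 1; i <= NF; i++)
        if ($i ~ /^Auto-Installed:/) {
          value = $i
          sub(/^Auto-Installed:[ \t]*/, "", value)
        }
      count[value]++
    }
    END { for (v in count) print v, count[v] }' "$1" | LC_ALL=C sort
}
```

On the system described above, a single output line "1 698" would confirm the observation.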

apt-mark showmanual works anyway.

Jarno Suni (jarnos) wrote :

Maybe I should make another bug report about it?

Julian Andres Klode (juliank) wrote :

There is no bug there, this is all working correctly.

Jarno Suni (jarnos) wrote :

So the file should list only packages that are installed automatically?

Julian Andres Klode (juliank) wrote :

Yes

Jarno Suni (jarnos) wrote :

I wonder why you didn't choose to list only manually installed packages in the file. It would be an even smaller file.

David Kalnischkies (donkult) wrote :

Mechanical train signals, which indicated whether the next section of track was clear or blocked by another train, had the arm raised if it was clear and lowered if not. That way, if the mechanism failed in some way, the arm would fall down and rest in the "blocked" state rather than in a "clear" state, which could cause huge problems.

I like to think that it is the same here. If the data gets lost, corrupted or whatever, we fall back to a safe state: protected from autoremove as manually installed, rather than causing havoc. In reality the original implementer might not have thought that far, but he isn't around anymore to ask… and it isn't really important, is it?

Your perceived "slowness" might be due to an outdated cache. Have you recently run an apt command with root rights? If it is not run as root, apt builds the cache in memory only. That cache can also easily be many megabytes in size, which can take a while to get into the disk cache from a slow spinning disk.

Your script might be very fast, but it is also wrong (Auto-Installed is not the only field which can be set there, even if it's the only one apt uses. Other clients could use other fields, and in that case apt would of course also keep the manual entries…) and doesn't even begin to support what apt-mark does. If you want to work constructively on speeding up apt in these cases, look at where the time is lost and optimize those code paths. Showing off your script-fu is not helping and is borderline trolling… after all, I can easily point at a lot of traveling salespeople (well, perhaps not in corona times), but that seems like a very hard problem for computers somehow.

Jarno Suni (jarnos) wrote :

Well, the script in the description is a bit too simple. In my application, I was looking for an 'Auto-Installed:' field with value 0 to find manually installed packages, and due to the change in the way the file is built nowadays, my code does not find them anymore. Now the field is pointless. I suppose I could now find the packages by set subtraction, if I want to optimize my code.

"time apt-mark showauto >/dev/null"
is slow no matter when I run it. It takes about the same time if you run it for any single package. Do you call those borderline cases? It might not be slow on the latest and greatest supercomputers, but I write code for slower hardware, too, and strive for reasonable responsiveness in my applications.

Jarno Suni (jarnos) wrote :

Oh, actually I was already using set subtraction in my code, but I was also checking for 'Auto-Installed: 0'. Maybe that is necessary with some older version of apt? (There was a bug in my code, and that is why it did not work correctly for some packages.) Sorry for the blame and the confusion. However, I will change my code just in case you decide to remove the pointless Auto-Installed fields from the file.
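
A minimal sketch of that set subtraction (manually installed = installed minus auto-installed), assuming two plain-text lists with one package name per line; manual_by_subtraction is a hypothetical helper. comm needs both inputs sorted in the same collation, so the sketch pins LC_ALL=C for sort and comm alike.

```sh
# manual_by_subtraction INSTALLED_FILE AUTO_FILE
# Hypothetical helper: print names present in the first list but not the second.
manual_by_subtraction() {
  installed_sorted=$(mktemp) || return 1
  auto_sorted=$(mktemp) || { rm -f "$installed_sorted"; return 1; }
  LC_ALL=C sort -u "$1" > "$installed_sorted"
  LC_ALL=C sort -u "$2" > "$auto_sorted"
  # -23 suppresses lines unique to file 2 and lines common to both.
  LC_ALL=C comm -23 "$installed_sorted" "$auto_sorted"
  rm -f "$installed_sorted" "$auto_sorted"
}
```

On a live system the inputs might come from `dpkg -l | awk '/^ii/ { print $2 }'` and `apt-mark showauto`, but the two tools may print architecture qualifiers (such as `:i386`) differently, so names may need normalizing before subtracting.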

Julian Andres Klode (juliank) wrote :

It's slow because it loads the entire depcache, and checks all dependencies. This can be avoided by refactoring the code, but it seems unnecessary. I'd rather add useful features like telling you which packages become autoremovable garbage after marking something, rather than refactoring the code to not require a depcache.
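
To illustrate why computing autoremovable packages means walking dependencies, here is a toy mark-and-sweep over a hypothetical text graph (one line per package: its name followed by its dependencies), with the manually installed packages as roots. This is a simplified model under those assumptions, not apt's DepCache code: it ignores Recommends, alternatives, versions and architectures. The point it shows is that the mark phase may traverse every edge reachable from the roots, even if you only care about one package.

```sh
# autoremovable ROOTS_FILE GRAPH_FILE  (hypothetical helper and formats)
# ROOTS_FILE: manually installed package names, one per line.
# GRAPH_FILE: lines of the form "pkg dep1 dep2 ...".
autoremovable() {
  awk '
    FILENAME == ARGV[1] { manual[$1] = 1; next }   # roots
    NF == 0 { next }
    {
      pkgs[$1] = 1
      for (i = 2; i <= NF; i++) deps[$1] = deps[$1] " " $i
    }
    END {
      # Mark phase: depth-first walk from every manually installed package.
      n = 0
      for (p in manual) { marked[p] = 1; stack[++n] = p }
      while (n > 0) {
        p = stack[n--]
        m = split(deps[p], d, " ")
        for (j = 1; j <= m; j++)
          if (!(d[j] in marked)) { marked[d[j]] = 1; stack[++n] = d[j] }
      }
      # Sweep phase: every known package that was never marked is garbage.
      for (p in pkgs) if (!(p in marked)) print p
    }' "$1" "$2" | LC_ALL=C sort
}
```

Answering "what becomes garbage after marking X" in this model means re-running the whole mark phase, which matches the cost profile described in the comment above.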

Jarno Suni (jarnos) wrote :

Julian, BTW, can you tell me which was the last version of apt in which the value of the Auto-Installed field could be something other than 1 in the file?

David Kalnischkies (donkult) wrote :

No such version exists as it would be a bug. An Auto-Installed field != 1 is still possible if the section includes another field the current apt version doesn't know about and hence can't reason about. apt itself does not currently generate such stanzas, but a future version might. Or other clients might.

Jarno Suni (jarnos) wrote :

Oh, so currently a missing Auto-Installed field seems to mean the package is manually installed, even if the Package field exists. I suppose I can rely on that in the future.

Julian Andres Klode (juliank) wrote :

It's also imaginable that we might change the value from e.g. 1 to yes. I don't know why it's an integer. In any case, I'd say you can't rely on that file at all. It might get changed in format, renamed or removed entirely without any notice.

Jarno Suni (jarnos) wrote :

Well, Boolean algebra may have had some influence on that. It is a common practice. 1 is shorter than yes.

Oh, that is sad, because I am not satisfied with the speed of apt-mark. On the other hand, it is good: another format or data structure may be more efficient.

Julian Andres Klode (juliank) wrote :

There is no performance issue with the file format, the parsing is not noticeable.

There is a massive design problem in the DepCache having to visit every dependency in the cache and do marking stuff. This worked fine for tiny sets, but scales superlinearly with the package count.

That's a very long-term effort to fix for a future ABI break: we need to check only those dependencies we actually care about.

Of those 0.5 s, probably about 0.4 s are spent checking dependencies. That part will presumably decrease by a factor of 10 to 50.
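
A back-of-the-envelope reading of these estimates, assuming the remaining ~0.1 s outside dependency checking stays unchanged and the dependency part shrinks by a factor k:

```latex
\begin{aligned}
t'(k) &\approx 0.1\,\mathrm{s} + \frac{0.4\,\mathrm{s}}{k}, \qquad 10 \le k \le 50,\\
t'(10) &\approx 0.1 + 0.04 = 0.14\,\mathrm{s}, \qquad t'(50) \approx 0.1 + 0.008 = 0.108\,\mathrm{s}.
\end{aligned}
```

That is an overall speedup of roughly 3.6× to 4.6× (0.5/0.14 ≈ 3.6, 0.5/0.108 ≈ 4.6), after which the non-dependency part would dominate the run time.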

Jarno Suni (jarnos) wrote :

Good; besides, a simple text file format is easy to parse and check with command-line tools, if needed.
To my surprise, I have more manually installed packages than automatically installed ones, so listing manually installed packages instead of automatically installed ones in the file would give no benefit.

'apt-mark showauto' and 'apt-mark showmanual' are simple filters that do not need any information about dependencies.

/var/lib/dpkg/status lists the dependencies shown by 'apt-cache show'.
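
For an installed package, the stanza in /var/lib/dpkg/status (including its Depends field) can be printed without building apt's cache; `dpkg-query -s <pkg>` does this with dpkg's own parser. A plain paragraph-mode awk equivalent is sketched below; status_stanza is a hypothetical helper. Note that this covers only what dpkg records for the installed version, whereas apt-cache show may also list other available versions from the repository indexes.

```sh
# status_stanza FILE PKG: print every stanza whose Package field equals PKG.
# Hypothetical helper; multi-arch installs may yield more than one stanza.
status_stanza() {
  awk -v want="$2" '
    BEGIN { RS = ""; FS = "\n" }   # one record per stanza
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^Package:/) {
          name = $i
          sub(/^Package:[ \t]*/, "", name)
          if (name == want) { print; print "" }
          break
        }
    }' "$1"
}
```

Usage would be along the lines of `status_stanza /var/lib/dpkg/status bash`.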

summary: - 'apt-mark showauto' and 'apt show' is slow
+ 'apt-mark showauto' and 'apt-cache show' is slow
Jarno Suni (jarnos)
description: updated
Jarno Suni (jarnos)
description: updated