Actually, the reason for DB_HASH is hysterical, not O(1) performance.
db-1.85 did not have a btree implementation a decade ago.
Citing O(1) or O(log(n)), while true, misses real world issues. E.g.
RPM must lookup file paths in two indices because the data is
not rationally indexed, and a string beginning with '/' might be in
either the Providename or the Basenames table. RPM is often forced to do sequential
access (true for rpm -qa e.g.), and does too many redundant
accesses.
The above issues largely obliterate any performance benefit from using
DB_HASH or DB_BTREE.
Its rather easy to do the benchmarks, just add --stats to any RPM command
and compare using DB_BTREE and DB_HASH. I did due diligence when
I switched from DB_HASH to DB_BTREE @rpm5.org and there was no
measurable performance gain from using either DB_BTREE or DB_HASH.
There are other performance gains from improved access on certain
paths. E.g. rpm-5.2.0 @rpm5.org has a measured (with callgrind and --stats)
14.6x speed-up by changing perhaps 50 lines of code on path lookups.
But clearly I got "lucky" and just guessed which lines of code to change.
Re comment #6:
Actually, the reason for DB_HASH is hysterical, not O(1) performance.
db-1.85 did not have a btree implementation a decade ago.
Citing O(1) or O(log(n)), while true, misses real world issues. E.g.
RPM must lookup file paths in two indices because the data is
not rationally indexed, and a string beginning with '/' might be in
either the Providename or the Basenames table. RPM is often forced to do sequential
access (true for rpm -qa e.g.), and does too many redundant
accesses.
The above issues largely obliterate any performance benefit from using
DB_HASH or DB_BTREE.
Its rather easy to do the benchmarks, just add --stats to any RPM command
and compare using DB_BTREE and DB_HASH. I did due diligence when
I switched from DB_HASH to DB_BTREE @rpm5.org and there was no
measurable performance gain from using either DB_BTREE or DB_HASH.
There are other performance gains from improved access on certain
paths. E.g. rpm-5.2.0 @rpm5.org has a measured (with callgrind and --stats)
14.6x speed-up by changing perhaps 50 lines of code on path lookups.
But clearly I got "lucky" and just guessed which lines of code to change.