Ubuntu
findimagedupes package

findimagedupes should be parallelizable

Bug #502224 reported by gwern on 2010-01-02

This bug affects 1 person

Affects                  Status  Importance  Assigned to  Milestone
findimagedupes (Ubuntu)  New     Undecided   Unassigned

Bug Description

Binary package hint: findimagedupes

An excellent feature for findimagedupes would be hashing/analyzing multiple images at once, in parallel. Each image can be analyzed independently, and file I/O makes up a minuscule share of the runtime, so the problem is embarrassingly parallel. Near-linear speedups with the number of cores should be achievable.
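To illustrate the shape of what I mean, here is a minimal sketch in Python (not findimagedupes' actual Perl code). The names `fingerprint` and `find_duplicates` are hypothetical, and the SHA-256 digest is only a stand-in for the real perceptual fingerprint, so this version matches exact byte-for-byte duplicates only. The point is the structure: each file is fingerprinted independently by a pool of workers, and only the parent process collects the results.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def fingerprint(path):
    """Placeholder fingerprint: SHA-256 of the file's bytes.

    Assumption: a real tool would compute a perceptual fingerprint here;
    this stand-in only detects exact byte-for-byte duplicates.
    """
    with open(path, "rb") as f:
        return path, hashlib.sha256(f.read()).hexdigest()


def find_duplicates(paths, workers=4, executor_cls=ProcessPoolExecutor):
    """Fingerprint files in parallel, then group paths sharing a fingerprint."""
    groups = defaultdict(list)
    with executor_cls(max_workers=workers) as pool:
        # Each file is processed independently of the others. Results are
        # gathered in the parent process, so workers share no mutable state.
        for path, digest in pool.map(fingerprint, paths):
            groups[digest].append(path)
    # Keep only fingerprints that occurred more than once.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Calling `find_duplicates(list_of_paths, workers=4)` would return a list of groups of identical files. Because all grouping happens in one process after the workers return, there is no shared database for the workers to corrupt.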

And the benefits are real: on large collections, the runtime can be many minutes or hours. I have 4 cores which are generally not doing much; why can't they all be used to cut the runtime by half or more?

I looked into running 4 findimagedupes processes concurrently and then using --merge to combine their results, but this is deeply hacky and I worry about race conditions and data consistency in the final fingerprint database; parallelism is something the application should handle internally.

gwern (gwern0) wrote on 2010-06-15:

It's possible that this has been fixed as of 2.18-3: I regularly see findimagedupes using 200-300% CPU in top, i.e. 2 or 3 of my 4 cores.

Jonathan H N Chin (jhnc) wrote on 2019-01-24:

Sorry, I just saw this as I don't monitor bug trackers.
I'm the author.

This is a good idea but the code would need to be refactored.
I'll have a think.

-jonathan
