SubDownloader

Include the HASH cache to speed future searches

Bug #242699 reported by Ivan Garcia on 2008-06-24

Affects		Status	Importance	Assigned to	Milestone
	SubDownloader	Confirmed	Wishlist	Ivan Garcia

Bug Description

Reported by capiscuas, Jun 04, 2008

The hash values of the AVI files would be saved in a cache, so that
future searches of the same AVIs are faster.

This was implemented in the old 1.2.9 release by a fan.
http://forum.opensubtitles.org/viewtopic.php?t=145&postdays=0&postorder=asc&start=150

Revision history for this message

Rolf Leggewie (r0lf) wrote on 2008-07-04:

Is this really worth the trouble of more code? The way I understand it, several GB worth of video take only a few seconds to hash (only part of each file is hashed, not the complete file).
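To illustrate why hashing only part of a file stays fast regardless of file size, here is a minimal sketch. It is not SubDownloader's actual hash function: the name `partial_hash`, the 64 KiB chunk size and the use of MD5 are all assumptions chosen for illustration.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only


def partial_hash(path, chunk_size=CHUNK_SIZE):
    """Hash the file size plus the first and last chunk_size bytes of a file.

    Hypothetical helper: only a fixed amount of data is read, so the cost
    does not grow with the size of the video.
    """
    size = os.path.getsize(path)
    digest = hashlib.md5(str(size).encode("ascii"))
    with open(path, "rb") as f:
        digest.update(f.read(chunk_size))  # head of the file
        f.seek(max(0, size - chunk_size))  # tail (overlaps the head for small files)
        digest.update(f.read(chunk_size))
    return digest.hexdigest()
```

Because the middle of the file is never read, two files that differ only there would get the same value, which is the trade-off a partial hash accepts.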

Changed in subdownloader:
status:	New → Incomplete

Ivan Garcia (capiscuas) wrote on 2008-07-08:

I agree. Currently SubDownloader hashes many GB of videos very quickly; coding this would take more effort and add complexity to the software.

Changed in subdownloader:
importance:	Undecided → Wishlist
status:	Incomplete → Won't Fix

James (owyjxnimlbcm) wrote on 2008-07-17:

I added this feature to the previous version. I don't get the phrase "trouble of more code". The code change I made was sent to the author, so the code already exists.

Hashing ~100 GB of movies takes a long time (1.7 GHz, 1 GB RAM, 7200 rpm ATA disk), and searching in stored .txt files is a lot quicker. Anyway, the user can choose to enable or disable this option.

I vote 100% yes for this feature.

Ivan Garcia (capiscuas) wrote on 2008-07-17:

Hi James, SD 2.0 won't use relative text files to store information; we are using Qt preferences for that now, so we'll have to modify your code to make it fit the new policy.

I agree that this feature may be useful for SD fans.

Can you help us to integrate it?

Thanks.

James (owyjxnimlbcm) wrote on 2008-07-17:

I would like to, but I have reinstalled my computer and don't have python/qt installed. Unfortunately I lost the source code too, and http://www.edisk.cz/stahni/44900/Modified_files.zip_13.13KB.html doesn't work anymore.

But the changes were very simple, and maybe you still have the source code which I once sent to you. I only had problems with the whole python/qt thing (file handling, compiling, final file size, etc.).

I would like to help, but I don't know if (and how much) time I will have to write this. I must finish my diploma work, and I have free time for it only on weekends.

I can try this (I guess I must download some py/qt and the sources?), but an experienced py/qt programmer can implement this much quicker than me.

It is only about one additional variable: if it is true, then before hashing check for filename.hash. If it exists, skip the hashing and load the stored string; if it doesn't exist, hash the file and then store the hash in filename.hash.
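The logic described above can be sketched roughly as follows. This is not the code from the modified files: the function name `get_hash_cached`, the `compute_hash` callable and the choice of appending `.hash` to the full video path are assumptions made for illustration.

```python
import os


def get_hash_cached(video_path, compute_hash, use_cache=True):
    """Return the hash for video_path, reusing a '<video_path>.hash' file if present.

    compute_hash is a hypothetical stand-in for the real hashing function;
    use_cache plays the role of the "one additional variable" (the preference).
    """
    if not use_cache:
        return compute_hash(video_path)

    cache_path = video_path + ".hash"  # assumed naming of the sidecar file
    if os.path.exists(cache_path):
        # Cache hit: skip hashing and load the stored string.
        with open(cache_path, "r", encoding="ascii") as f:
            return f.read().strip()

    # Cache miss: hash the file, then store the result for next time.
    value = compute_hash(video_path)
    with open(cache_path, "w", encoding="ascii") as f:
        f.write(value)
    return value
```

Note that this variant never notices when the video file is replaced by a different one under the same name; the stored hash is trusted as long as the `.hash` file exists.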

James (owyjxnimlbcm) wrote on 2008-07-17:

Does anyone have my modified getHash function?

James (owyjxnimlbcm) wrote on 2008-07-17:

modified files.zip (8.2 KiB, application/zip)

OK, I installed py/qt and the Eric4 IDE and typed some stuff.
Modified files are:
gui\preferences.py
gui\preferences_ui.py
videofile.py

All my changes are commented with the word James. For me, the saved hashes work.

http://img232.imageshack.us/img232/9342/snap1dk5.jpg

Ivan Garcia (capiscuas) on 2008-07-18

Changed in subdownloader:
assignee:	nobody → capiscuas
importance:	Wishlist → Low
status:	Won't Fix → In Progress

James (owyjxnimlbcm) wrote on 2008-07-20:

But I can't compile the sources :( windows_installer.py doesn't work for me:

line 34 error:
ImportError: No module named subdownloader..

Any help? :)

opensubtitles (j-admin-opensubtitles-org) wrote on 2008-07-21:

I like the idea of storing hashes. Some users use SD with network drives, where they have 500+ GB of movies, so if the drive is offline, it is a lot faster. Also, I am not sure how the hash storage is implemented: is a file written in each movie directory, or is everything in one file stored where SubDownloader is?

eduo (eduo) wrote on 2008-07-21:

#10

James: Latest code revision should fix all the troubles with "module named subdownloader" (path issues in the way modules were being called).

I have all my movies in network drives. Scanning 500GB of movies, in dozens of directories, for a thousand or so files (TV episodes) takes around 1 minute. That's BLAZINGLY fast, all things considered. And that's, also, a fringe use that is not common (nor should it be).

There are some problems with caching hashes, which are not easily solvable from a User Interface point of view:

-How is the cache built? How can you know what's in the cache or not? Can you add to the cache? Delete from the cache?

-How does the program deal with changed videofiles? Checking the cache against the existing file defeats the purpose of the cache but if you download a better version of a movie the cache is not valid any more.

-How does the program handle moved/renamed movie files? How does it handle if the network drive is mounted slightly different?

-What happens if subtitles are found against the cached hashes, but the network drive is not mounted and the preference is there for downloading to the same path as the movie?

I like the idea of caches, I've always liked caches. But I think it's not simple or straightforward. Especially from a User Interface point of view (which is what I'm referring to in all cases, I already know how to implement it technically, that's not the issue).

Ivan Garcia (capiscuas) wrote on 2008-07-21:

#11

Hi James, I think we need to improve the way the hashes are handled a bit. As eduo mentions, the full file path of the video cannot be used as the index for our hashes, because it's common for users to move their files.

Also, having one .hash file per video file is not very clean, I think. Here is what I'm thinking.

I propose using Qt Settings for faster storage of the hashes.
The index can be filesize+filename, and the stored value will be the hash.

It's rare for the filename to be changed, so this double key can assure us that we are talking about the same video file.

A second possibility is having just one hash file, where each line has our filesize+filename index followed by the hash value. I think this will be a slower procedure because it requires reading all the lines one by one.
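As a rough sketch of the second possibility, the single hash file could look like this. The helper names (`make_key`, `load_cache`, `store_hash`) and the tab-separated line layout are assumptions, not an agreed format. Reading the file once into a dict is one way to avoid scanning the lines on every lookup.

```python
import os


def make_key(video_path):
    """Build the (filesize, filename) key; the directory is deliberately ignored."""
    return (os.path.getsize(video_path), os.path.basename(video_path))


def load_cache(cache_file):
    """Read the whole hash file once into a dict keyed by (filesize, filename)."""
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file, "r", encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                # Assumed line layout: size <TAB> filename <TAB> hash
                size, rest = line.split("\t", 1)
                name, value = rest.rsplit("\t", 1)
                cache[(int(size), name)] = value
    return cache


def store_hash(cache_file, cache, video_path, value):
    """Remember a hash in memory and append it to the hash file."""
    size, name = make_key(video_path)
    cache[(size, name)] = value
    with open(cache_file, "a", encoding="utf-8") as f:
        f.write(f"{size}\t{name}\t{value}\n")
```

Since the key ignores the directory, a video that is moved to another folder still maps to the same entry; a renamed file, or one replaced by a different file with the same name and size, would not be detected.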

eduo (eduo) wrote on 2008-07-21:

#12

I don't believe using the settings is a good idea; settings should be just that and not change once set.

If I wanted this functionality (I don't) I'd vote for a separate database (sqlite) for this.

Also, it would be a good idea to consider an expiration date for cached hashes. If there's been more than, say, a month since the hash was captured into the cache, then try to refresh it. If it can't be refreshed then it is deleted.
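A minimal sketch of such an sqlite cache with expiration, using Python's standard `sqlite3` module. The table layout, the function names and the 30-day limit are assumptions; the (filesize, filename) key is the one proposed in comment #11. The refresh step is simplified: an expired entry is deleted on lookup and the caller is expected to re-hash the file and store the new value.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 30 * 24 * 3600  # "say, a month" (assumed as 30 days)


def open_cache(db_path=":memory:"):
    """Open (or create) the hash cache; the schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes ("
        " size INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " hash TEXT NOT NULL,"
        " cached_at REAL NOT NULL,"
        " PRIMARY KEY (size, name))"
    )
    return conn


def put_hash(conn, size, name, value, now=None):
    """Insert or refresh an entry, stamping it with the current time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO hashes (size, name, hash, cached_at)"
        " VALUES (?, ?, ?, ?)",
        (size, name, value, now),
    )
    conn.commit()


def get_hash(conn, size, name, max_age=MAX_AGE_SECONDS, now=None):
    """Return the cached hash, or None if it is missing or older than max_age."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT hash, cached_at FROM hashes WHERE size = ? AND name = ?",
        (size, name),
    ).fetchone()
    if row is None:
        return None
    value, cached_at = row
    if now - cached_at > max_age:
        # Expired: drop the entry so the caller re-hashes and stores it again.
        conn.execute("DELETE FROM hashes WHERE size = ? AND name = ?", (size, name))
        conn.commit()
        return None
    return value
```

A caller would try `get_hash` first and fall back to hashing the file followed by `put_hash` when it returns `None`.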

I think a note needs to be made. We're talking about two different things here:

1.-Cached hashes: This is storing the hash (computing it is the time-consuming part of the process) so it doesn't need to be generated every time for existing videofiles. This caching assumes the videofile is available but speeds up the process.

2.-Offline access: This is not caching AT ALL. This means keeping a separate database with the necessary information to check for subtitles without having the video files available at the moment. An example would be storing the videofiles in a "multimedia portable drive". You may have it connected to the TV most of the time, but you want to check if subtitles have become available for the files in it.

The processes for both may be similar but they are two very distinct situations that need to be dealt with separately. The first focuses on overall speed, the second on availability.

Ivan Garcia (capiscuas) on 2008-10-20

Changed in subdownloader:
importance:	Low → Wishlist
status:	In Progress → Confirmed
