Fulltext Searching

Bug #383873 reported by x-rayman
This bug affects 1 person
Affects: Referencer
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Moved from Answers:

Would it be possible, or practical, to add full-text searching of the files stored in Referencer, not just of their metadata?

I'm not certain of the best way to do this. One option is to parse all the files and, where possible, load the extracted text into a database such as MySQL, SQLite, or both (as Bibus does), and then use the database's built-in searching. Another is to use some internal system, or an external indexer such as Swish-e.
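For the "internal system" option, the core data structure is an inverted index: a map from each word to the set of documents containing it. The sketch below is a minimal in-memory version in Python, the language Referencer plugins are written in. The class and function names are hypothetical, and it deliberately omits ranking, stemming, and persistence; it only answers "which files contain all of these words".

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Minimal in-memory full-text index: term -> set of document keys."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, key, text):
        """Record every term in `text` as occurring in document `key`."""
        for term in tokenize(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return the keys of documents containing every query term (AND search)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so intersecting never mutates the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after `idx.add("a.pdf", "Crystal structure of a zeolite")` and `idx.add("b.pdf", "Zeolite catalysis in water")`, `idx.search("zeolite")` returns both keys and `idx.search("zeolite water")` returns only `"b.pdf"`. A database-backed approach trades this simplicity for persistence and built-in relevance scoring.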

I was thinking of trying to write a plugin based on a MySQL database. I see that there is already a plugin for searching, but its main goal looks like adding entries rather than finding existing ones. I haven't looked, but it also appears that search-test, as a database source, is located somewhere in the main code?

PS

I've written a very crude plugin that converts the default PDF filenames used by ACS and RSC into DOI strings, and I'd be happy to put it somewhere. It is very basic: it relies on the fact that ACS publications are saved by default under the final part of their DOI. RSC publications now do something similar, but they seem to have used a different convention previously. The Python plugin works like genkey, but moves the title/filename into the DOI field after adding the correct DOI prefix.
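The conversion described above can be sketched as a single function. This is not the plugin itself, only an illustration under stated assumptions: the DOI registrant prefixes are 10.1021 for ACS and 10.1039 for RSC, the filename minus `.pdf` is taken to be the DOI suffix, and the older RSC convention is not handled. The function name and the example filenames are hypothetical.

```python
import os
import re

# DOI registrant prefixes: ACS is 10.1021 and RSC is 10.1039.
DOI_PREFIXES = {"ACS": "10.1021", "RSC": "10.1039"}

# Characters accepted in a filename stem before treating it as a DOI suffix.
_SUFFIX_RE = re.compile(r"[A-Za-z0-9.\-]+")


def filename_to_doi(filename, publisher):
    """Guess a DOI from a publisher's default PDF filename.

    Assumes the filename minus '.pdf' is the DOI suffix, e.g.
    'ja0123456.pdf' -> '10.1021/ja0123456' for ACS. Returns None when the
    publisher is unknown or the name does not look like a DOI suffix.
    """
    prefix = DOI_PREFIXES.get(publisher.upper())
    if prefix is None:
        return None
    stem = os.path.basename(filename)
    if stem.lower().endswith(".pdf"):
        stem = stem[:-4]
    # Reject renamed files such as 'my paper (1).pdf' instead of inventing a DOI.
    if not _SUFFIX_RE.fullmatch(stem):
        return None
    return f"{prefix}/{stem}"
```

Rejecting stems with spaces or brackets is a deliberate choice: a file the user has renamed is better left without a DOI than given a wrong one.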

Tags: wishlist
x-rayman (ya93hjdqalf9) wrote:

I've written a primitive interface between a plugin and a hard-coded MySQL database. For each PDF, an MD5 checksum is computed and pdftotext is run; both results are loaded into the database, with the pdftotext output stored in a FULLTEXT field.
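The ingestion step described above might look roughly like the following. It is a sketch, not the plugin's code: it swaps MySQL for Python's built-in `sqlite3` so it runs without a server (with MySQL you would declare a `FULLTEXT` index on the text column and query it with `MATCH ... AGAINST`). It calls poppler's `pdftotext` with `-` as the output file so the text arrives on stdout. The table layout and function names are hypothetical, and the extractor is a parameter so it can be swapped out for testing.

```python
import hashlib
import sqlite3
import subprocess


def md5_of_file(path, chunk_size=1 << 16):
    """MD5 checksum of a file, read in chunks so large PDFs are not loaded at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def pdf_to_text(path):
    """Extract text with pdftotext; '-' as the output file writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def open_db(path=":memory:"):
    """Open (or create) a database with one row per distinct PDF content."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "md5 TEXT PRIMARY KEY, filename TEXT, body TEXT)"
    )
    return conn


def index_pdf(conn, path, extract=pdf_to_text):
    """Store (checksum, filename, text) for one PDF and return the checksum.

    The MD5 is the primary key, so re-indexing an unchanged file replaces its
    row rather than adding a duplicate.
    """
    checksum = md5_of_file(path)
    conn.execute(
        "INSERT OR REPLACE INTO documents (md5, filename, body) VALUES (?, ?, ?)",
        (checksum, path, extract(path)),
    )
    return checksum
```

Usage would be along the lines of `conn = open_db("fulltext.db")`, then `index_pdf(conn, p)` for each PDF path and a final `conn.commit()`. Keying on the checksum rather than the filename means a renamed file keeps its row, but two identical copies at different paths collapse into one.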

Searching the full-text field returns results with relevance scores, which could then be fed back into Referencer.
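To show what "results with appropriate scoring" means in practice, here is a minimal ranking sketch using plain TF-IDF: a document scores higher when a query term makes up more of its text and when that term is rare across the library. This is an illustrative stand-in, not MySQL's own relevance formula, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    """Rank documents by a plain TF-IDF score for the query terms.

    docs: dict mapping a document key (e.g. a filename) to its extracted text.
    Returns (key, score) pairs, best first; documents scoring 0 are omitted.
    """
    counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    lengths = {key: sum(c.values()) for key, c in counts.items()}
    n_docs = len(counts)
    scores = {}
    for term in set(tokenize(query)):
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue
        # Smoothed IDF: stays positive even when every document has the term.
        idf = math.log((1 + n_docs) / df)
        for key, c in counts.items():
            if term in c:
                tf = c[term] / lengths[key]
                scores[key] = scores.get(key, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The sorted `(key, score)` list is the kind of result that would need to be handed back to Referencer's GUI.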

For now I'm hijacking test-search to bring the results back.

Being rather lazy, I'll just ask: how does Referencer identify each file? Just by filename, or is there some internal unique ID?

In testing on my system, 811 entries were imported in one minute, which pleasantly surprised me!

John S (jcspray) wrote:

Sorry for ignoring you for so long. Did you end up doing anything more with this?

x-rayman (ya93hjdqalf9) wrote:

I only got as far as letting your search system use the full-text search and bring back the results. I really needed a better way to get them into the GUI.

Lucas Roesler (theaxer) wrote:

Would it be possible to use a program like Tracker or Beagle to achieve this?
