Release Search API enhancements
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Open Library | Confirmed | High | Edward Betts |
Bug Description
Stephanie @ California Digital Library has asked for the following:
1) Search API should return full record for each book, instead of just ID
2) Query multiple books with a single request. This is a barrier to
surfacing OpenLibrary links on the search results page because we would have
to make requests for every book.
3) Enable an author/title search within a single query. Right now we need to
search for the author, then use the author ID to search for the book.
4) (2008-06-26) Jonathan Rochkind has asked for API access to fulltext search (see comment #20).
(2008-07-23) There's now a separate tracker item for fulltext search access, bug #251276.
solrize (solrize) wrote : | #1 |
Changed in openlibrary: | |
assignee: | nobody → solrize |
importance: | Undecided → High |
status: | New → Confirmed |
importance: | High → Wishlist |
solrize (solrize) wrote : | #2 |
How does this look as an expanded search api:
q is a dictionary (JSON formatted) that contains the following fields:
offset :: Integer -> result number to start at, default 0
rows :: Integer -> number of rows to return, default 20
EXACTLY ONE OF:
query :: String -> basic query string ("basic query" means cross-field)
advanced_query :: dictionary
The advanced query dictionary would be something like the above. We could make a syntax for Boolean combinations.
Another (simpler) approach would be to just use a string in Lucene syntax, but we may not want to always stay with that.
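A minimal sketch of how a client might build the `q` parameter proposed above. The function name `build_search_params` is hypothetical, and the shape of the advanced query dictionary (field name mapped to search string) is an assumption; only `offset`, `rows`, `query` and `advanced_query` come from the proposal.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_search_params(query=None, advanced_query=None, offset=0, rows=20):
    """Return a URL-encoded `q=...` string for the proposed search API.

    Hypothetical helper: only the field names come from the proposal.
    """
    # The proposal requires EXACTLY ONE of `query` and `advanced_query`.
    if (query is None) == (advanced_query is None):
        raise ValueError("supply exactly one of query or advanced_query")
    q = {"offset": offset, "rows": rows}
    if query is not None:
        q["query"] = query  # basic cross-field query string
    else:
        q["advanced_query"] = advanced_query  # assumed: field -> search string
    return urlencode({"q": json.dumps(q)})


# Usage: encode an author/title search, then decode it again to inspect it.
encoded = build_search_params(
    advanced_query={"author": "mark twain", "title": "connecticut yankee"}
)
print(json.loads(parse_qs(encoded)["q"][0]))
```

Keeping the whole request in one JSON value means new fields can be added later without changing how the URL is parsed.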
Stephanie Collett (stephanie-collett) wrote : | #3 |
RE: 1+2) Yes, I think the fewer transactions the better for creating dynamic links on the client side.
RE 2) Multiple books in a single request is a tricky requirement because there needs to be a way to determine which book came from what submitted criteria. Google's API nests the results, but they only allow an identifier search at this time.
http://
RE 3) I like this approach. Would it then be possible to limit search results by scanned books?
RE 4) Sounds interesting, I'd have to think about it. But I don't have a use case for this right now.
RE 5) Offset and Row options would be a good way to handle large sets.
Stephanie Collett (stephanie-collett) wrote : | #4 |
I am most interested in the advanced query for my catalog integration, but I think a basic cross-field query could be useful in many other situations.
solrize (solrize) wrote : | #5 |
To select scanned books, just add the has_fulltext field with value 1.
For testing purposes I will add an advanced query using Lucene syntax, but we may later want to switch to something more dictionary based like in comment #1.
solrize (solrize) wrote : | #6 |
As a preliminary pass at this, please try:
http://
Changed in openlibrary: | |
status: | Confirmed → In Progress |
solrize (solrize) wrote : | #7 |
Here is one limited to full text. Sorry that Launchpad is breaking up these long URLs. You'll have to paste them back together.
http://
Stephanie Collett (stephanie-collett) wrote : | #8 |
This is great! Being able to search by author, title, and fulltext is key for my application. And multiple format options enable both quick linking by the ID and more heavyweight applications working directly with the data.
Could there be a way for multiple queries to be passed within one request to the API? My particular use case can have up to 30 books to look up, and it would be nice to only make a single (or few) requests to query all those books.
solrize (solrize) wrote : | #9 |
Really 30 different search queries? Each one is typically an author-title combination where they are all basically unrelated? OK, I made it so you can supply a list as the query value. I'm just a little surprised that the application wants that.
Let me know how this looks and if there's too much crud in the results etc. Also I probably want to clean up the syntax some before releasing it, so this is just a test of concept.
http://
Stephanie Collett (stephanie-collett) wrote : | #10 |
Yeah, I know that is a little crazy, but it is an upper bound. Searches on Melvyl have an average of 8-9 items, and many of the books will have identifiers (e.g. LCCN, ISBN, OCLC). However, some books will require at least an author and title search. So, theoretically, there could be 30 author/title searches, but it is unlikely. We are excited to try linking to the OCA content directly in the search results.
I like the prototype, and think it will integrate well into Melvyl. My next step will be to build it into our Melvyl test instance alongside the Google Book Search code. I'll let you know how it goes. Thanks!
solrize (solrize) wrote : | #11 |
Per Stephanie's request I've added a callback parameter, e.g.
http://
This wraps the json result string in a function call with the name given in the callback parameter, e.g.
olresults(
I think the idea is to insert the result directly into the dhtml sent to the user's browser so that client side javascript can process it.
Anand Chitipothu (anandology) wrote : Re: [Bug 236947] Re: search API improvements | #12 |
> This wraps the json result string in a function call with the name given
> in the callback parameter, e.g.
>
> olresults(
What are the available callbacks?
> I think the idea is to insert the result directly into the dhtml sent to
> the user's browser so that client side javascript can process it.
Client-side JavaScript can also process JSON. I don't think we should
be doing this.
Anand Chitipothu (anandology) wrote : | #13 |
On Sat, Jun 7, 2008 at 6:37 AM, Anand Chitipothu <email address hidden> wrote:
>> This wraps the json result string in a function call with the name given
>> in the callback parameter, e.g.
>>
>> olresults(
>
Sorry, I misunderstood this. I thought you were executing a Python function.
This is useful in general. We can add this support for all API functions.
solrize (solrize) wrote : Re: search API improvements | #14 |
Aaron explained it to me: the idea is that the client sends the OL request and the callback gets around browser restrictions on cross-site scripting. The callback function name is just whatever is in the request URL. Out of general paranoia I check that it's an identifier-like token.
One issue is the case where there are a bunch of different queries in one request: that may lead to an overlong URL, so you would want to POST instead of GET the request. I'm not sure that works so well with this model. Also, it occurs to me that putting the search terms into the URL is a slight privacy hazard because of company firewalls that log outgoing URLs but don't log POST contents. We should at minimum put up an HTTPS server to help with such issues.
rejon (rejon) wrote : | #15 |
Cool, this is what David needs for Open Library + Wikipedia plugin. David, anything more needed?
Stephanie Collett (stephanie-collett) wrote : | #16 |
To enable JS libraries with namespaces, would it be possible to allow period characters in the callback method?
This would look like:
q={"query"
and return:
JS.Library.
solrize (solrize) wrote : | #17 |
Does anyone see any exploits? I'm way out of the game with this stuff and not thinking very clearly. I'll put it in tomorrow unless someone sees a problem.
Aaron Swartz (aaronsw) wrote : Re: [Bug 236947] Re: search API improvements | #18 |
Del.icio.us doesn't seem to do any sort of filtering, so I suspect it's OK:
http://
solrize (solrize) wrote : Re: search API improvements | #19 |
I added periods to the set of allowed characters, except the first char still has to be alphabetic.
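A minimal sketch of the callback-name check described in this exchange: the first character must be alphabetic, and the rest may include periods so that namespaced JavaScript functions work. The exact character set (ASCII letters, digits, underscore, period) and the name `is_valid_callback` are assumptions, not the deployed code.

```python
import re

# Assumed rule: one ASCII letter, then letters, digits, underscores or periods.
# \Z anchors at the very end so a trailing newline cannot sneak through.
CALLBACK_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.]*\Z")


def is_valid_callback(name):
    """Return True if `name` is safe to echo back as a JSONP function name."""
    return bool(CALLBACK_RE.match(name))


# Usage: plain and namespaced names pass, anything with call syntax is refused.
for candidate in ["olresults", "JS.Library.handler", "alert(1)//", ".hidden"]:
    print(candidate, is_valid_callback(candidate))
```

Rejecting parentheses, quotes and whitespace keeps the callback from smuggling arbitrary script into the response, which is the exploit concern raised above.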
solrize (solrize) wrote : | #20 |
Jonathan Rochkind has asked for a search API to access fulltext search of scanned books. I will add something like that, so the API takes search terms and returns a list of OL book records and leaf numbers where the search terms occur. Jonathan also asks about launching flipbook with the search terms bookmarked and highlighted. See bug #126611 for more about that.
Stephanie Collett (stephanie-collett) wrote : | #21 |
Would it be possible to wrap json error messages in callbacks? When an error occurs there is no way to recover in Javascript since it requires a callback.
olresults(
description: | updated |
solrize (solrize) wrote : | #22 |
I'll see if I can add the callback around the error message this evening.
solrize (solrize) wrote : | #23 |
Callback for error return is added.
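A minimal sketch of wrapping both normal results and error messages in the requested callback, so that client-side JavaScript always receives a function call it can handle. The function names and the error payload shape (`status`, `message`) are hypothetical; the thread does not show the real format.

```python
import json


def jsonp(payload, callback=None):
    """Serialize `payload`, wrapping it in `callback(...)` when one is given."""
    body = json.dumps(payload)
    if callback is None:
        return body
    return "%s(%s)" % (callback, body)


def search_response(run_search, callback=None):
    """Run a search and return a response string, even when the search fails."""
    try:
        result = run_search()
    except Exception as exc:
        # Assumed error shape: the point is only that it goes through jsonp() too.
        result = {"status": "error", "message": str(exc)}
    return jsonp(result, callback)


# Usage: a failing search still comes back wrapped in the caller's function.
print(search_response(lambda: 1 / 0, callback="olresults"))
```

The callback name should be validated before it reaches `jsonp`, as discussed in the earlier comments.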
jrochkind (rochkind-jhu) wrote : | #24 |
Thanks Paul. What I'm asking for is actually a bit different/simpler than a full search API via Open Library. That would be useful, but would be actually too much overhead on my end for what I want to do here, that I am already doing with Amazon and GBS for analogy. To explain:
I can already search using the IA XML search, and discover from those XML results if there is a flipbook available. If there is a flipbook available, I can determine its URL, such as:
http://
And I can send the user there, and once there, they can enter a search query and see search results. But I'd like to let them enter a search query over in my interface, and send them to that page with their search already submitted and the results immediately shown. Both Amazon and Google Books let me do that with URLs analogous to:
http://
( Ie, http://
or: http://
This is much simpler than having to use a full-scale API and render the search results myself; I don't need that full power right now. I'm happy to just send the user into your rendered search results, like I can with Amazon and GBS.
So, whether it's through OpenLibrary or not, I really want something much simpler than a full API (which would maybe return hits in XML). I just want a way to "link" into search results in the IA's website. Make sense? If it's through OpenLibrary, the problem is that many books that are accessible and have flipbooks via the IA website in general (and the XML search) are not necessarily in the OpenLibrary yet, and may not be for some time. So my specific suggestion is really just enhancing that flipbook interface to pay attention to a URL query parameter "&query=" or what have you, and automatically present search results for that query if present.
Make sense?
solrize (solrize) wrote : Re: [Bug 236947] Re: search API improvements | #25 |
> http://
Yes, I understand, we have a rather old open request about this (see
above), and there's an implementation that does about what you're asking,
but it isn't reliable enough to release at the moment. I should really
give it more attention but nobody has been asking about it recently. It
requires hacking inside of Flipbook itself, which is a complex Javascript
application that we've been wanting to replace. I'm about to start doing
some more fulltext-related stuff for other reasons too, so I'll see if I
can fix the remaining problems.
solrize (solrize) wrote : Re: search API improvements | #26 |
The development search engine (used by apollonius in the links above) will be down for part of today for a ram upgrade. I hope this doesn't interfere with anyone.
Aaron Swartz (aaronsw) wrote : | #27 |
It still seems to be down -- everything is returning:
IOError: [Errno 32] Broken pipe
solrize (solrize) wrote : | #28 |
Restarted apollonius:9071 server. I've seen it wedge like that a couple other times. I don't know why.
Jason Ronallo (jronallo) wrote : | #29 |
> 1) Search API should return full record for each book, instead of just ID
Has this been implemented yet? I'm still getting back just IDs. As the original requester commented, it would be nice to be able to make a single request of OL and be returned all metadata on an edition.
> http://
This example link was given for how to pass multiple queries in a batch. The concern was that there would be too much crud in the response. I tried to take a look but got an error as Aaron did.
I can easily see having a batch of 30 queries to make at a time even though I'm looking for a single work. If I have an identifier like an ISBN, I may run it through a service that returns all related identifiers (OCLC, ISBN, LCCN), then query OL with all of them. This greatly increases the chances that I will find fulltext. Now if OL is FRBRized (all related editions grouped) and could return those related editions-
When you start returning results for a batch of queries, it would be nice to have deduping already done. Some thought will need to be given to how to do this. Some folks will likely want the results for each query in a batch to be identified by the query, so this complicates what you might be able to do with deduping.
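One way to reconcile deduping with per-query identification is to merge results by book key while recording which queries matched each book. This is a minimal client-side sketch under that assumption; the function name and the book keys in the usage example are made up for illustration.

```python
def merge_batch_results(results_by_query):
    """Deduplicate batch results while remembering which queries hit each book.

    `results_by_query` maps a query string to the list of book keys it returned.
    Returns a dict mapping each book key to the list of queries that matched it.
    """
    merged = {}
    for query, keys in results_by_query.items():
        for key in keys:
            matched = merged.setdefault(key, [])
            if query not in matched:  # guard against repeats within one query
                matched.append(query)
    return merged


# Usage with made-up keys: OL2M is found by both queries but listed once.
batch = {
    "isbn query": ["OL1M", "OL2M"],
    "author/title query": ["OL2M", "OL3M"],
}
print(merge_batch_results(batch))
```

Each book appears once, and a caller who needs results grouped by query can still recover that view from the recorded query lists.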
One API that you might like to look at for ideas is for MBooks/SDR. It does nice things like scoring results as well as deduping. It is still in development, so it doesn't do batches yet, but here's an example of what it can do with multiple identifiers for the same edition:
http://
solrize (solrize) wrote : | #30 |
Jason, please go ahead and try that query again, the test server was down for some reason and I just restarted it.
MBooks/SDR looks interesting, though its scoring function (like ours) doesn't look that useful. I can put solr's scores into the API search results if they aren't already there and you want them, but search ranking for bibliographic records is quite problematic. Most libraries don't attempt it.
When you say deduping, do you mean merging the search results when more than one query hits a specific book? I can think of various ways to do that but maybe it's more sensible to let the client app do it.
Solr implements Lucene-syntax queries with arbitrary boolean combinations, so you could do your ISBN/OCLC/LCCN query as a big OR instead of a bunch of separate queries, if that helps. I'm hesitant to "officially" support that syntax in the API but if it's an important type of query then I could come up with some json-ified version.
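A minimal sketch of building the "big OR" query from a set of related identifiers. The index field names used in the usage example (`isbn`, `oclc`, `lccn`) are assumptions, since the thread does not list the field names, and the helper name is hypothetical.

```python
def identifiers_to_lucene(identifiers):
    """Build one Lucene OR query from a dict of field name -> list of values."""
    clauses = []
    for field, values in identifiers.items():
        for value in values:
            # Escape backslashes and double quotes so the value stays one phrase.
            safe = str(value).replace("\\", "\\\\").replace('"', '\\"')
            clauses.append('%s:"%s"' % (field, safe))
    return " OR ".join(clauses)


# Usage: one request instead of one query per identifier.
# The LCCN and OCLC values are placeholders, not real records.
print(identifiers_to_lucene({
    "isbn": ["188296831X"],
    "lccn": ["placeholder-lccn"],
    "oclc": ["placeholder-oclc"],
}))
```

A single OR query also sidesteps the deduping question, because a book that matches several identifiers is returned once.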
aaron r (arubinstein) wrote : | #31 |
A couple very minor points...
1. Is it possible to return the cover images with the expanded results?
2. Filters like "has_fulltext" are great. Is there one planned for language?
Thanks
solrize (solrize) wrote : | #32 |
1. You mean you want the url for the cover image jpeg? Hmm. Let me get back to you on that. I don't think it's in the index now but it might be possible to add it in the next rebuild.
2. Unfortunately most of our catalog records don't specify the book's language. Trying to filter on language will usually get an empty result set, even if you're just looking for books in English.
aaron r (arubinstein) wrote : | #33 |
1. I'm sorry for not being totally clear. It looks like user-added cover images are returned with a "/get" query using an OpenLibrary #.
Here's an example query that returns the coverimage:
http://
When a title with a user-added image comes up in a query using the expanded format, the coverimage information is not included. After I posted this question, I noticed that the results in expanded format are also missing the "isbn_10" that is returned with a /get query for a specific title. Since those seem to be the identifiers used for the jpegs that were gathered from Google Books (I assume), I've been using them to generate URLs for the cover art. I guess this is all moot if you're able to add the image URLs directly into the index, which would be great.
2. Too bad... I'm working for an organization that's part of the OCA and will be contributing a large number of books in one particular language. When the books are submitted, we'll make sure that the language code is included in the metadata. It would be great if there was a way to filter on language for that reason. I hate to ask for special favors but it seems like something that in an ideal world could be very useful for others as well.
solrize (solrize) wrote : | #34 |
1. I'm not sure why the query
http://
doesn't return the cover attribute. Anand, if you're reading this, any idea?
2. You can send in arbitrary Lucene queries, so you can say (for example) languages:eng (use the 3-letter MARC code for the language). The problem is that it returns few books because the language data is not there for the search to find. But if you add a bunch of records with language fields, the search engine can see those fields.
aaron r (arubinstein) wrote : | #35 |
I can get it to work for English books but this query returned no results:
http://
while this query returns items that clearly have "yid" in the language field:
http://
Please feel free to email me (arubinstein at bikher dot org) if this is no longer relevant to the discussion.
Thanks!
solrize (solrize) wrote : | #36 |
Thanks, I think I see the problem and will fix it in the next index build, hopefully within a week or so.
solrize (solrize) wrote : | #37 |
I have opened a new tracker item, bug #251276, for the (nonexistent right now) fulltext search API. Main reason for having it in a separate item is that this one is getting rather long. The two can be seen as related. Please put any discussion or requests about the fulltext API in the other item rather than here. I expect to be doing some work on it pretty soon.
description: | updated |
solrize (solrize) wrote : | #38 |
I just checked in the stuff described above to the main hg repo. Any reason not to pull it to staging?
Jonathan Narwold (jonathan-narwold) wrote : | #39 |
Any progress on #1 at the top (showing full book information instead of just IDs)? I tried some of the links in this post, and it looks like the test server is down at the moment. Any chance this could be moved to the main OpenLibrary search API?
solrize (solrize) wrote : | #40 |
Unless I'm mistaken we did push out that code to production a while back, but right now I see that trying the example strings above crashes the JSON encoder, so it looks like there has been some kind of regression. I'll look into it. Thanks for bringing this up, it has been on the back burner for a while.
Nate Irwin (nate-nateirwin) wrote : | #41 |
I'm interested in any progress on this, as well.
Trying something like this http://
Am I just doing something wrong?
solrize (solrize) wrote : | #42 |
That query crashes with a stack dump, which is automatically a server side bug. If something is wrong with the user query, the server should at worst send a reasonable error message.
Looking at the stack trace, the json serializer is crashing on an input that looks valid at first glance. I'm in the middle of something else right now but will try to fix this soon.
Nate Irwin (nate-nateirwin) wrote : | #43 |
Have you been able to take a look at this? I want to integrate Open Library with my application, but I have to be able to get more information back from the search query. Otherwise the search is just too inefficient to integrate. Thanks!
i30817 (i30817) wrote : | #44 |
If you don't mind, besides the list of results (that doesn't appear to work yet) I would like something like the Amazon "not" functionality, to filter out audiobooks and comics for example. If your database is built correctly this may help queries (Amazon has a book index for everything and thus must disambiguate in search).
What is going to be the return format for the List? I need a way to know what search corresponds to what result.
i30817 (i30817) wrote : | #45 |
Also, it would be good if the first result on a search was the one that had a book cover. I know this is not always possible even in the Amazon search, but that's what I'm using the search for: finding the OLIDs for a book filename.
solrize (solrize) wrote : | #46 |
Nate Irwin 3/12, sorry, I didn't notice your question when you posted it. If I don't get back to you within a few days, please remind me, either here or by email.
i30817, you can use "not" in Lucene queries--does that take care of what you're asking for? The one concern that I have is wanting to keep open the possibility of moving away from Lucene, so I don't want to promise long-term support for every weird feature of Lucene query syntax. But for now and the immediately foreseeable future, it should work, and any replacement should certainly support "not" in some form.
i30817 (i30817) wrote : Re: [Bug 236947] Re: search API improvements | #47 |
I see - it works. There is still one small problem:
besides multiple requests, I still have to check all the OLIDs' images returned
to see if there is a viable image there. If the returned list could be
sorted according to having images or not, or if it returned a parameter saying
whether there is an image or not, this could be avoided - but if you don't want to
change this, it's not a great problem.
Anand Chitipothu (anandology) wrote : | #48 |
2009/4/27 i30817 <email address hidden>:
> I see - it works. There is still one small problem:
> besides multiple requests, I still have to check all the OLIDs' images returned
> to see if there is a viable image there. If the returned list could be
> sorted according to having images or not, or if it returned a parameter saying
> whether there is an image or not, this could be avoided - but if you don't want to
> change this, it's not a great problem.
The search engine doesn't know about the availability of covers. You have
to use a separate query to find that.
i30817 (i30817) wrote : Re: search API improvements | #49 |
How about an author query parameter? I'm seeing results from authors who write about other authors and books.
James Joyce gets a bucketful of leeches, for example.
solrize (solrize) wrote : | #50 |
Anand, Is there some reason the presence of covers isn't mentioned in the json records? If the info is there then I can index it.
i30817, you can use "authors:(james joyce)". Is that what you want?
Anand Chitipothu (anandology) wrote : Re: [Bug 236947] Re: search API improvements | #51 |
2009/4/27 solrize <email address hidden>:
> Anand, Is there some reason the presence of covers isn't mentioned in
> the json records? If the info is there then I can index it.
We don't have cover info in the JSON records. They are stored separately.
Here is a related issue: https:/
i30817 (i30817) wrote : Re: search API improvements | #52 |
Does "authors:(james joyce)" work with multiple authors? Do I need to put a comma or something between authors?
solrize (solrize) wrote : | #53 |
You can do arbitrary boolean queries using AND, OR, and NOT:
authors:(james joyce) OR authors:(anton chekhov)
i30817 (i30817) wrote : | #54 |
Thanks. Can you tell me why this search:
"title:(Holidays are Hell) authors:(Kim Harrison ) OR authors:( Lynsay Sands ) OR authors:( Vicki Pettersson ) OR authors:( Marjorie Liu)"
gives me unrelated matches by other authors?
solrize (solrize) wrote : | #55 |
It looks like "are" is a stopword, so it is ignored in "holidays are hell". Without an explicit AND between the title and author clauses, the query matches all books with "holidays" and "hell" in the title, including a bunch of editions of "Holidays in Hell" by P. J. O'Rourke, plus "Hell for the Holidays" by Chris Grabenstein etc.
I think I see what you are trying to do:
title:(Holidays are Hell) AND authors:(Kim Harrison)
or the more complete version:
title:(Holidays are Hell) AND (authors:(Kim Harrison ) OR authors:( Lynsay Sands ) OR authors:( Vicki Pettersson ) OR authors:( Marjorie Liu))
In general, if you want to use fancy Lucene syntax, you should look at the Lucene docs for how to do it.
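A minimal sketch of assembling the title-plus-authors query shown in this comment, with the author clauses grouped in parentheses so that AND applies to the whole group. The helper name is hypothetical, and it does not escape Lucene special characters in the inputs.

```python
def title_author_query(title, authors):
    """Build `title:(...) AND (authors:(...) OR authors:(...))`."""
    if not authors:
        raise ValueError("at least one author is required")
    clauses = " OR ".join("authors:(%s)" % name.strip() for name in authors)
    if len(authors) > 1:
        # Parentheses keep the OR group together on the right side of AND.
        clauses = "(%s)" % clauses
    return "title:(%s) AND %s" % (title.strip(), clauses)


# Usage: the multi-author anthology from the comment above.
print(title_author_query(
    "Holidays are Hell",
    ["Kim Harrison", "Lynsay Sands", "Vicki Pettersson", "Marjorie Liu"],
))
```

Without the outer parentheses the OR clauses are not tied to the title clause, which is how unrelated books by the listed authors end up in the results.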
i30817 (i30817) wrote : Re: [Bug 236947] Re: search API improvements | #56 |
Ah, ok. I think i'm golden now.
i30817 (i30817) wrote : Re: search API improvements | #57 |
Say, why does the search:
title:(Roadside Picnic) AND (authors:(Arkady) OR authors:(Boris Strugatsky))
give nothing while
title:(Roadside Picnic) AND authors:(Arkady) OR authors:(Boris Strugatsky)
gives the book, if the first search is more correct? (I'm not doubting it - I've seen that it's more correct in other cases.)
solrize (solrize) wrote : | #58 |
It is a little bit odd that the stored author fields for some of the books relevant to that query are missing. I'm in the process of reindexing the search engine so will try again when the new index is done.
Let's try to keep this tracker item on the topic of API improvements. For other issues, please feel free to open a separate item. Thanks.
solrize (solrize) wrote : | #59 |
To not leave the question hanging, that query seems to work with the new index, which is now on the staging server. It should go on production in the next day or so.
summary: |
- search API improvements
+ FR: search API improvements
Jonathan Narwold (jonathan-narwold) wrote : Re: FR: search API improvements | #60 |
This full-text search function STILL does not appear to work on the old JSON API, and I don't see an equivalent search function in the new RESTful API. Could someone give me an update on what's available?
Jonathan Narwold (jonathan-narwold) wrote : | #61 |
Just for clarification - I didn't mean full-text in the sense that you're searching the text of a book. I'm referring to full book details (as opposed to just keys).
solrize (solrize) wrote : | #62 |
Jonathan, thanks for pinging this. format=expanded is definitely still broken. I'll see if I can fix it.
Are you willing to help write a spec for the restful api? One reason it's slid is that I'm not sure exactly what it should do.
Anand Chitipothu (anandology) wrote : Re: [Bug 236947] Re: FR: search API improvements | #63 |
2009/8/2 solrize <email address hidden>
> Jonathan, thanks for pinging this. format=expanded is definitely still
> broken. I'll see if I can fix it.
>
> Are you willing to help write a spec for the restful api? One reason
> it's slid is that I'm not sure exactly what it should do.
The search API must be exactly the same as the query API.
http://
Please let me know if you need any more info.
solrize (solrize) wrote : Re: FR: search API improvements | #64 |
It looks like format=expanded broke because the format of author objects changed. I've made a patch to fix this and will try to deploy it.
http://
George (george-archive) wrote : | #65 |
Anand - are you calling this resolved?
Changed in openlibrary: | |
assignee: | solrize (solrize) → Anand Chitipothu (anandology) |
sancho (sancho) wrote : | #66 |
Hi,
Does the restful api support text searching yet?
What I'd like to be able to do is pass a string to the search and return all matching books e.g.
http://
OR
http://
This would return all details for matching books.
I'd even be happy having to do 2 queries e.g.
http://
http://
As you can see from the links below both should return books
http://
http://
http://
Thanks
Simon
Anand Chitipothu (anandology) wrote : | #67 |
Sorry Simon, there isn't a way yet to access search through the API.
Changed in openlibrary: | |
assignee: | Anand Chitipothu (anandology) → Edward Betts (edwardbetts) |
assignee: | Edward Betts (edwardbetts) → Anand Chitipothu (anandology) |
sancho (sancho) wrote : | #68 |
Thanks George,
Any idea when this functionality might be available?
At the moment the multiple queries required to get data via the JSON API are quite expensive.
Is there a possibility that paging could be added to the json search, as it appears it is only possible at the moment to get the first 20 results e.g.
http://
Cheers
Simon
Changed in openlibrary: | |
status: | In Progress → Confirmed |
Anand Chitipothu (anandology) wrote : | #69 |
You can try using our experimental search API.
http://
Please keep in mind that this is experimental and can change at any time.
sancho (sancho) wrote : | #70 |
Hi Anand,
Thanks, this is exactly the sort of thing I was after :-)
If you're open to comments, the only thing that could be tweaked is the text section (example below), which could do with keys for the values.
Thanks again.
Simon
"text": [
"OL8893758W",
"Terry Pratchett",
"Guilty Of Literature",
"OL8684550M",
"OL24298975M",
"Pocket Essentials",
"Old Earth Books",
"188296831X",
"9781882968
"9781848396
"OL3062820A",
"Andrew M. Butler",
"OverDrive",
"Language Arts",
"Reference",
"Nonfiction"
],
zombiepig (nyall-zombiepigs) wrote : | #71 |
Is there a similar interface to http://
George (george-archive) wrote : | #72 |
Another feature request:
Comment: Hi Open Library,
>
> Thanks for providing such a great service to the world. We at the
> National
> Library of Australia are periodically downloading a full dump of your
> edition data, in JSON format. From this full dump, we would like to
> extract just the records where the full text is freely available online,
> for inclusion in our search engine Trove.
> http://
>
> We currently do this by checking for an ocaid, however we have noticed
> that this includes records where the only online content is a locked
> daisy
> format, which is not available to users in Australia.
>
> DAISY format example: http://
>
> We would like some way to filter out these records, however currently
> they don't seem to include enough information to allow us to do this. Is it
> possible for you to include more information in your records so that we
> can identify these records? For example, perhaps the formats the item is
> available in could be included.
>
> Looking forward to your reply,
> Joanna Meakins and Kent Fitch
> Trove Team
> National Library of Australia
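A minimal sketch of the check the letter describes: keeping only edition records that carry an `ocaid`. It assumes the dump can be read as one JSON object per line, which the letter does not state, and the function name and sample keys are made up. As the letter points out, this test alone cannot tell freely readable scans from records whose only online content is a locked DAISY file.

```python
import json


def editions_with_ocaid(lines):
    """Yield edition records (dicts) that have a non-empty `ocaid` field.

    Assumes each non-blank line holds one JSON-encoded edition record.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        record = json.loads(line)
        if record.get("ocaid"):
            yield record


# Usage with two made-up records; only the first has an ocaid.
sample = [
    '{"key": "/b/OL1M", "ocaid": "exampleitem00"}',
    '{"key": "/b/OL2M"}',
]
for edition in editions_with_ocaid(sample):
    print(edition["key"], edition["ocaid"])
```

If the records later gain a field listing available formats, as the letter requests, the extra condition would go next to the `ocaid` check.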
Changed in openlibrary: | |
assignee: | Anand Chitipothu (anandology) → George (george-archive) |
assignee: | George (george-archive) → Edward Betts (edwardbetts) |
milestone: | none → search-september-release |
importance: | Wishlist → High |
summary: |
- FR: search API improvements
+ Release Search API enhancements
George (george-archive) wrote : | #73 |
Edward - this bug is massive now - feel free to close and start a fresh one.
Changed in openlibrary: | |
milestone: | none → general-bucket |
Some questions for Stephanie:
Should 1) and 2) actually be combined? That is, does #2 also mean that you want to get the full record for each book?
3) How about if I add a search_params field to the JSON request, i.e.
search_fields : {"author": "mark twain", "title": "connecticut yankee" }
that would allow you to specify any fields you wanted. Default would be a basic (cross-field) search as we have now.
4) In general, do you want to see the OL search facets?
5) What do you want to have happen when there's a large result set? For example, "france" (one of our standard test searches) returns 1000's of results.