Open Library is not pulling jackets

Bug #1748537 reported by Steve Callender
This bug affects 4 people
Affects: Evergreen
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Jacket images from Open Library have stopped working. I've tested this in Evergreen 2.11 and 3.0.

I'm seeing 404 errors in the logs, and it looks like it's related to the following line in AddedContent.pm:

     my $agent = LWP::UserAgent->new(timeout => $net_timeout);

Tags: addedcontent
Dan Wells (dbw2) wrote :

We are experiencing this as well. I am actually seeing 403 errors, and it appears the OpenLibrary API is blocking LWP at the user-agent level. Here are some quick tests which make me believe this:

perl -e 'use Data::Dumper; use LWP::UserAgent; my $agent = LWP::UserAgent->new(); my $res = $agent->get("http://openlibrary.org/api/volumes/brief/json/isbn:130608198X"); print Dumper($res);'

(returns 403 for me)

perl -e 'use Data::Dumper; use LWP::UserAgent; my $agent = LWP::UserAgent->new(); $agent->agent("Evergreen_ILS"); my $res = $agent->get("http://openlibrary.org/api/volumes/brief/json/isbn:130608198X"); print Dumper($res);'

(works fine for me)

In a quick search, I could not find any information announcing this as a new policy, but I didn't look very hard. In any case, this would be a simple fix, but:

a) Do we want to supply a custom/spoofed UA string?
b) If we do, what should it be?

(Also note that AddedContent has a built-in error retry timeout, 10 minutes by default. So if you apply this to AddedContent.pm as a fix, you may not see images for another 10 minutes; you will see "added content lookup disabled" instead.)
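
For illustration, here is a minimal sketch of what the fix discussed above might look like: the agent is constructed with an explicit User-Agent string instead of LWP's default "libwww-perl/x.y". The string "Evergreen_ILS" is only the value used in the quick test, not an agreed choice, and the helper name build_agent and the timeout value are hypothetical. This is not the actual AddedContent.pm code.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumed values for illustration only; the real timeout setting and
# the UA string are still open questions in this bug.
my $net_timeout = 5;
my $ua_string   = 'Evergreen_ILS';

# Hypothetical helper: build an agent that identifies itself explicitly
# rather than sending LWP's default "libwww-perl/x.y" User-Agent.
sub build_agent {
    my ($timeout, $agent_string) = @_;
    return LWP::UserAgent->new(
        timeout => $timeout,
        agent   => $agent_string,
    );
}

my $agent = build_agent($net_timeout, $ua_string);

# Show the header value that would be sent; no request is made here.
print "User-Agent: ", $agent->agent, "\n";
```

Passing agent to the constructor should be equivalent to calling $agent->agent("Evergreen_ILS") afterwards, as in the one-liner test above.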

Josh Stompro (u-launchpad-stompro-org) wrote :

The Open Library covers API page does mention rate limiting resulting in 403 Forbidden responses.
https://openlibrary.org/dev/docs/api/covers

They say they allow 100 requests/IP per 5 minutes. I can see a site hitting that easily with a few people searching/browsing at once.

They also state some guidelines that Evergreen doesn't follow:

If you want to display covers on public-facing pages, please use a src URL that points to covers.openlibrary.org. For example, if you'd like to call a cover using an ISBN, you can do it like this:
<img src="http://covers.openlibrary.org/b/isbn/9780385533225-S.jpg" />

A courtesy link back to Open Library is appreciated, whether it be on each individual book's page (where you can link back to the book's page on Open Library, for example, using the same ISBN http://openlibrary.org/isbn/9780385533225), or on your About page or in your footer.
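
To make the quoted guidelines concrete, here is a rough sketch of how the direct cover URL and the courtesy link could be built from an ISBN. The helper names cover_url and book_url are hypothetical and not part of Evergreen; the size suffix follows the "-S" pattern in the example above, and the "M" and "L" sizes are assumed from the covers documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: strip hyphens and spaces so the ISBN matches
# the bare form used in the Open Library URLs quoted above.
sub clean_isbn {
    my ($isbn) = @_;
    (my $clean = $isbn) =~ s/[^0-9Xx]//g;
    return $clean;
}

# Build a covers.openlibrary.org image URL; size defaults to "S".
sub cover_url {
    my ($isbn, $size) = @_;
    $size //= 'S';
    my $clean = clean_isbn($isbn);
    return "http://covers.openlibrary.org/b/isbn/${clean}-${size}.jpg";
}

# Build the courtesy link back to the book's Open Library page.
sub book_url {
    my ($isbn) = @_;
    return "http://openlibrary.org/isbn/" . clean_isbn($isbn);
}

print cover_url('9780385533225'), "\n";
print book_url('9780385533225'), "\n";
```

Because the browser would fetch the image directly from covers.openlibrary.org, requests built this way would not go through the server-side LWP agent at all.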

Dan Wells (dbw2) wrote :

Thanks, Josh. I don't think we are hitting the rate limit, since it claims to be IP based, and requests from the same IP with a different UA string work fine, while our regular access has been down for days. We have also recently found that LoC blocks requests with an "LWP" UA string, so this is apparently becoming more common as a basic means to block scripted harvesters/attacks.

As for the more direct cover link, I'd be curious to see whether it follows the same "FRBR"-ish logic allowed in the volumes API. It seems like we may have gone down that road before, but switched to the volumes API to increase the odds of finding some cover, even if for a different edition, but maybe I am remembering something else (Google books API, etc.).

Elaine Hardy (ehardy)
tags: added: addedcontent