Open Library

parse amazon data

Bug #152793 reported by Aaron Swartz on 2007-10-15

Affects	Status	Importance	Assigned to	Milestone
Open Library	Confirmed	Medium	Edward Betts	Open Library 1.0

Bug Description

There are around 6M Amazon books now up at:

http://www.archive.org/details/amazon_crawl.catalog/

They should be parsed and eventually integrated. (Also, another million or so have been added since the last time you grabbed the ISBNs from here.)

Aaron Swartz (aaronsw) on 2007-10-15

Changed in openlibrary:
assignee:	nobody → edward-debian
importance:	Undecided → High
milestone:	none → launch
status:	New → Confirmed

Edward Betts (edwardbetts) wrote on 2007-10-19:

#1

The catalog.txt file contains duplicates, for example:

0002165163 1 Amazon.com: Spinner's yarn: Books: Ian Alexander Ross Peebles
0002165163 o-0 Amazon.com: Spinner's yarn: Books: Ian Alexander Ross Peebles
0002165171 1 Amazon.com: Memoirs: Books: Jean Monnet
0002165171 o-0 Amazon.com: Memoirs: Books: Jean Monnet
000216518X 1 Amazon.com: Media Mob: Books: George Melly
000216518X o-0 Amazon.com: Media Mob: Books: George Melly
000216521X 1 Amazon.com: Old Glory an American Voyage: Books: Johnathan Raban
000216521X o-0 Amazon.com: Old Glory an American Voyage: Books: Johnathan Raban
0002165252 1 404 - Document Not Found
0002165252 o-0 404 - Document Not Found
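
A minimal sketch of how such duplicates could be collapsed, assuming each catalog.txt line has the form `ISBN crawl-id page-title` separated by whitespace (inferred from the sample above; the real file may differ). The function name `dedupe_catalog`, the choice to keep the first crawl seen for each ISBN, and the decision to skip "404" pages are illustrative assumptions, not a description of the actual parser.

```python
def dedupe_catalog(lines):
    """Return {isbn: (crawl_id, title)}, keeping the first entry per ISBN."""
    seen = {}
    for line in lines:
        # Split into at most three fields so the title keeps its spaces.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        isbn, crawl_id, title = parts
        if title.startswith("404 "):
            continue  # assumption: a "404" title means no usable page was fetched
        seen.setdefault(isbn, (crawl_id, title))
    return seen


sample = [
    "0002165163 1 Amazon.com: Spinner's yarn: Books: Ian Alexander Ross Peebles",
    "0002165163 o-0 Amazon.com: Spinner's yarn: Books: Ian Alexander Ross Peebles",
    "0002165252 1 404 - Document Not Found",
    "0002165252 o-0 404 - Document Not Found",
]
unique = dedupe_catalog(sample)
for isbn, (crawl_id, title) in unique.items():
    print(isbn, crawl_id, title)
```

Keeping the first entry is arbitrary; if the two crawls differ in content, comparing the downloaded pages before discarding one may be safer.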

Aaron Swartz (aaronsw) wrote on 2007-10-19:

#2

Hmm. The catalogs for 1 and o-0 are different, so some things must have been downloaded twice by accident.

Edward Betts (edwardbetts) wrote on 2008-01-30:

#3

Amazon parser is working. Got some more fields to add:

has_cover_img: boolean, done
amazon_availability: string, like "In Stock.", done
list_price, amazon_price, used_price: value in $
editorial_reviews: list
more_editorial_reviews: boolean
customer_review_count: int
average_customer_review: string
other_editions: list
statistically_improbable_phrases: list, done
capitalized_phrases: list, done
tags: list, done

Also to add: lists of ISBNs and page numbers for books cited and for books citing this one.
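
The field list above could be captured as a typed record along these lines. This is only a sketch: the class name `AmazonRecord`, the `isbn` key field, the default values, and the use of `Decimal` for dollar amounts are assumptions made for illustration, and the citation lists are left out because their exact shape is not specified.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class AmazonRecord:
    isbn: str  # assumption: records are keyed by ISBN
    has_cover_img: bool = False
    amazon_availability: Optional[str] = None  # e.g. "In Stock."
    list_price: Optional[Decimal] = None  # value in $
    amazon_price: Optional[Decimal] = None  # value in $
    used_price: Optional[Decimal] = None  # value in $
    editorial_reviews: List[str] = field(default_factory=list)
    more_editorial_reviews: bool = False
    customer_review_count: int = 0
    average_customer_review: Optional[str] = None  # string, as listed above
    other_editions: List[str] = field(default_factory=list)
    statistically_improbable_phrases: List[str] = field(default_factory=list)
    capitalized_phrases: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


# Illustrative values only, not taken from the crawl.
record = AmazonRecord(
    isbn="0002165163",
    has_cover_img=True,
    amazon_availability="In Stock.",
    tags=["example-tag"],
)
print(record)
```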

Aaron Swartz (aaronsw) wrote on 2008-01-30:

#4

Isn't average customer review a float?

Edward Betts (edwardbetts) wrote on 2008-01-30:

#5

Average customer review is a fixed-point number with one decimal place. It could be represented as a float.
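
A small sketch of the two representations, assuming the scraped rating arrives as a plain numeric string such as "4.5" (the helper name `parse_average_review` is hypothetical). Python's `decimal.Decimal` keeps the one-decimal fixed-point value exactly, and `float()` gives the floating-point alternative.

```python
from decimal import Decimal


def parse_average_review(text):
    """Parse a rating string into a Decimal with exactly one decimal place."""
    return Decimal(text).quantize(Decimal("0.1"))


rating = parse_average_review("4.5")  # fixed-point style value
as_float = float(rating)  # float representation of the same rating
print(rating, as_float)
```

With only one decimal place either form is workable; the fixed-point form simply avoids binary rounding when values are compared or summed.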

Edward Betts (edwardbetts) on 2010-01-12

Changed in openlibrary:
importance:	High → Medium
