TPAC: Search engines and browsers redirect endlessly on some URLs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Triaged | Undecided | Unassigned |
Bug Description
* Evergreen 2.3.0 beta2-ish
I checked Google's Webmaster Tools for our catalogue for the first time in a LONG time, and it reported a significant increase in the number of errors starting when we cut over to TPAC.
One of the problems appears to be endless redirects that we run into when adding items to the temporary list, using the following URL:
http://
Chrome and Firefox both refuse to load the page, as it appears to result in endless redirects.
Here's what Google's Webmaster Tools "Fetch as Google" shows:
-------
Fetch as Google
This is how Googlebot fetched the page.
URL: http://
Date: Thursday, August 30, 2012 5:01:20 PM PDT
Googlebot Type: Web
Download Time (in milliseconds): 292
HTTP/1.1 302 Found
Date: Fri, 31 Aug 2012 00:01:20 GMT
Server: Apache/2.2.16 (Debian)
Set-Cookie: anoncache=
Location: #record_591425
Cache-Control: max-age=5
Expires: Fri, 31 Aug 2012 00:01:25 GMT
Content-Length: 284
Keep-Alive: timeout=1, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="#
<hr>
<address>
</body></html>
-------
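One plausible reading of the trace above: the `Location` header contains only a fragment (`#record_591425`). A fragment-only reference resolves against the URL that was just requested, and browsers never send the fragment to the server. Following the redirect therefore re-requests the same resource, which answers with the same 302, and so on. The sketch below checks this with Python's standard `urllib.parse`. The request URL is hypothetical, because the real one is elided in this report, and the helper names are illustrative only.

```python
from urllib.parse import urldefrag, urljoin


def redirect_target(request_url: str, location: str) -> str:
    """Resolve a Location header value against the URL that was requested."""
    return urljoin(request_url, location)


def is_self_redirect(request_url: str, location: str) -> bool:
    """True if following the redirect would fetch the same resource again.

    The fragment (#...) is never sent to the server, so two URLs that
    differ only in their fragment fetch the same resource.
    """
    target = redirect_target(request_url, location)
    return urldefrag(target).url == urldefrag(request_url).url


# Hypothetical request URL; the real one is elided above.
request_url = "http://catalogue.example.org/eg/opac/mylist/add?record=591425"

print(redirect_target(request_url, "#record_591425"))
# http://catalogue.example.org/eg/opac/mylist/add?record=591425#record_591425
print(is_self_redirect(request_url, "#record_591425"))
# True
```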
To avoid indexing links that a search engine wouldn't care about, and to cut down on inadvertent search engine loops in this case, we can publish a robots.txt with something like:
User-agent: *
Disallow: /eg/opac/mylist/
Disallow: /eg/opac/myopac/
Disallow: /eg/opac/
Disallow: /eg/opac/
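A quick way to sanity-check rules like these is Python's standard `urllib.robotparser`, which applies simple prefix matching to `Disallow` paths. Only the two complete rules are used here: the last two lines above appear truncated, and as written `Disallow: /eg/opac/` would block the entire OPAC, record pages included. The host and paths below are hypothetical, and individual crawlers may support extensions beyond plain prefixes, so treat this as a rough check rather than a statement of how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser

# Only the two complete rules from the proposed robots.txt.
rules = """\
User-agent: *
Disallow: /eg/opac/mylist/
Disallow: /eg/opac/myopac/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical host and paths, for illustration only.
base = "http://catalogue.example.org"
for path in ("/eg/opac/mylist/add?record=591425",  # temporary list: blocked
             "/eg/opac/myopac/main",               # account pages: blocked
             "/eg/opac/record/591425"):            # record page: still crawlable
    print(path, parser.can_fetch("Googlebot", base + path))
```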
... and I believe we should probably add something like this to the default install (with appropriate notes in the docs).
However, we should also prevent creating these endlessly redirecting links in the first place.
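This report does not say how the fix should work. One possible approach, assuming the add-to-list handler knows a safe page to return to (for example the referring search results page), is to resolve the redirect target before sending it and, if it points back at the requested resource, attach the fragment to that fallback page instead. The sketch below is illustrative Python, not Evergreen code; `safe_redirect`, `fallback` and the URLs are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def safe_redirect(request_url: str, location: str, fallback: str) -> str:
    """Return an absolute redirect target that cannot point back at request_url."""
    target = urljoin(request_url, location)
    if urldefrag(target).url != urldefrag(request_url).url:
        return target  # a different resource: safe to redirect as-is
    # Same resource: redirect to the fallback page instead, keeping the
    # fragment so the browser can still scroll to e.g. #record_591425.
    base = urldefrag(urljoin(request_url, fallback)).url
    if base == urldefrag(request_url).url:
        raise ValueError("fallback would also redirect to the requested URL")
    fragment = urldefrag(target).fragment
    return f"{base}#{fragment}" if fragment else base


# Hypothetical URLs for illustration.
request_url = "http://catalogue.example.org/eg/opac/mylist/add?record=591425"
referer = "http://catalogue.example.org/eg/opac/results?query=cats"

print(safe_redirect(request_url, "#record_591425", referer))
# http://catalogue.example.org/eg/opac/results?query=cats#record_591425
```

Sending an absolute URL that names a different resource means neither browsers nor crawlers can resolve the redirect back onto the page that issued it.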