launchpad sometimes serves download files as content-type text/html

Bug #703807 reported by Zooko Wilcox-O'Hearn
This bug affects 7 people
Affects               Status        Importance  Assigned to    Milestone
Launchpad itself      Fix Released  Critical    Steve Kowalik
Twisted               Fix Released  Unknown
pyOpenSSL             Invalid       Undecided   Unassigned
twisted-web (Ubuntu)  Confirmed     Undecided   Unassigned

Bug Description

My "easy_install" is provided by distribute-0.6.14. I'm on Mac OS X.

HACK Zooko-Ofsimplegeos-MacBook-Pro:~/playground/tahoe-lafs/trunk$ sudo easy_install pyOpenSSL
install_dir /Library/Python/2.6/site-packages/
Searching for pyOpenSSL
Reading http://pypi.python.org/simple/pyOpenSSL/
Reading http://pyopenssl.sourceforge.net/
Reading http://launchpad.net/pyopenssl
Best match: pyOpenSSL 0.11
Downloading http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz
error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz
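
The error above is the installer refusing to treat an HTML response as an archive. A minimal sketch of that kind of guard follows, assuming the decision is made on the Content-Type header alone. `is_unexpected_html` is a hypothetical helper written for illustration; it is not distribute's actual code, which may check more than this.

```python
def is_unexpected_html(url, content_type):
    """Return True when an archive URL was answered with an HTML content type.

    Hypothetical helper: the real installer's check may differ in detail.
    """
    archive_suffixes = (".tar.gz", ".tgz", ".zip", ".egg")
    wants_archive = url.lower().endswith(archive_suffixes)
    # Content-Type may carry parameters such as "; charset=utf-8".
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return wants_archive and media_type == "text/html"
```

With a check like this, a correct tarball body is still rejected whenever the header says text/html, which matches the symptom reported here.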

In a potentially related story, the "home page" field on http://pypi.python.org/pypi/pyOpenSSL points to http://pyopenssl.sourceforge.net/ and should probably be changed to point to launchpad.

Revision history for this message
Zooko Wilcox-O'Hearn (zooko) wrote :

wget http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz && easy_install pyOpenSSL-0.11.tar.gz

works as desired.

Revision history for this message
Jean-Paul Calderone (exarkun) wrote :
Revision history for this message
Robert Collins (lifeless) wrote :

wget -S http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
--2011-01-18 04:09:50-- http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
Resolving launchpadlibrarian.net... 91.189.89.229, 91.189.89.228
Connecting to launchpadlibrarian.net|91.189.89.229|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Date: Fri, 14 Jan 2011 22:13:19 GMT
  Server: TwistedWeb/10.1.0
  Content-Length: 242152
  Accept-Ranges: bytes
  Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
  Cache-Control: max-age=31536000, public
  Content-Type: application/x-tar

- looks fine to me.

Changed in launchpad:
status: New → Incomplete
Revision history for this message
Robert Collins (lifeless) wrote :

Why do you think it's getting text/html?

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 703807] Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

I wonder if it was a transient error containing html, and it ignored or
didn't get an error status?

- Martin

Revision history for this message
Jean-Paul Calderone (exarkun) wrote : Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

I investigated a little further; here are the results of two HEAD requests to launchpadlibrarian.net:

HEAD /58498441/pyOpenSSL-0.11.tar.gz HTTP/1.1
Host: launchpadlibrarian.net

HTTP/1.1 200 OK
Date: Sat, 15 Jan 2011 12:56:20 GMT
Server: TwistedWeb/10.1.0
Content-Length: 242152
Accept-Ranges: bytes
Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
Cache-Control: max-age=31536000, public
Content-Type: text/html
Age: 182423
X-Cache: HIT from nutmeg.canonical.com
X-Cache-Lookup: HIT from nutmeg.canonical.com:3128
Via: 1.0 nutmeg.canonical.com:3128 (squid/2.7.STABLE7)
Via: 1.1 launchpadlibrarian.net
Vary: Accept-Encoding

HEAD /58498441/pyOpenSSL-0.11.tar.gz HTTP/1.1
Host: launchpadlibrarian.net

HTTP/1.1 200 OK
Date: Fri, 14 Jan 2011 22:13:19 GMT
Server: TwistedWeb/10.1.0
Content-Length: 242152
Accept-Ranges: bytes
Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
Cache-Control: max-age=31536000, public
Content-Type: application/x-tar
Age: 235405
X-Cache: HIT from banana.canonical.com
X-Cache-Lookup: HIT from banana.canonical.com:3128
Via: 1.0 banana.canonical.com:3128 (squid/2.7.STABLE7)
Via: 1.1 launchpadlibrarian.net
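
When comparing header dumps like the two above, only a few fields matter: the content type, which squid answered, and how old the cached entry is. A small sketch that pulls those out of a pasted header block, using only the standard library; `summarize_response` and `NUTMEG_HEADERS` are names made up for this example, and the header text is copied from the first response above.

```python
from email.parser import Parser

# Header block of the first HEAD response above (status line omitted).
NUTMEG_HEADERS = """\
Date: Sat, 15 Jan 2011 12:56:20 GMT
Server: TwistedWeb/10.1.0
Content-Length: 242152
Accept-Ranges: bytes
Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
Cache-Control: max-age=31536000, public
Content-Type: text/html
Age: 182423
X-Cache: HIT from nutmeg.canonical.com
X-Cache-Lookup: HIT from nutmeg.canonical.com:3128
Via: 1.0 nutmeg.canonical.com:3128 (squid/2.7.STABLE7)
Via: 1.1 launchpadlibrarian.net
Vary: Accept-Encoding
"""


def summarize_response(raw_headers):
    """Extract the fields relevant to this bug from a raw header block."""
    msg = Parser().parsestr(raw_headers, headersonly=True)
    x_cache = msg.get("X-Cache") or ""
    age = msg.get("Age")
    return {
        "content_type": msg.get("Content-Type"),
        # X-Cache looks like "HIT from nutmeg.canonical.com".
        "served_by": x_cache.rpartition(" from ")[2] or None,
        "age": int(age) if age is not None else None,
    }
```

Calling `summarize_response(NUTMEG_HEADERS)` reports text/html served from nutmeg.canonical.com; the second response above differs in both the content type and the proxy.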

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 703807] Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

What Accept-Encoding headers did those two HEAD requests have?

Revision history for this message
Jean-Paul Calderone (exarkun) wrote : Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

None, the requests were made exactly as pasted (with telnet).

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 703807] Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

When I try that from here, I get application/x-tar:

mbp@joy% telnet launchpadlibrarian.net 80
Trying 91.189.89.228...
Connected to launchpadlibrarian.net.
Escape character is '^]'.
HEAD /58498441/pyOpenSSL-0.11.tar.gz HTTP/1.1
Host: launchpadlibrarian.net

HTTP/1.0 200 OK
Date: Mon, 17 Jan 2011 17:00:21 GMT
Server: TwistedWeb/10.1.0
Content-Length: 242152
Accept-Ranges: bytes
Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
Cache-Control: max-age=31536000, public
Content-Type: application/x-tar
Age: 28285
X-Cache: HIT from nutmeg.canonical.com
X-Cache-Lookup: HIT from nutmeg.canonical.com:3128
X-Cache: MISS from conf-gw.conference
X-Cache-Lookup: MISS from conf-gw.conference:3128
Via: 1.0 nutmeg.canonical.com:3128 (squid/2.7.STABLE7), 1.0 launchpadlibrarian.net, 1.1 conf-gw.conference:3128 (squid/2.7.STABLE9)
Connection: close

Could it be that you have an intercepting proxy or something similar?

--
Martin

Revision history for this message
Robert Collins (lifeless) wrote :

Well, we definitely have one here and it already has the right content
type; you need a cache-buster header to see what upstream is sending.
Or issue the request from inside the datacentre.

Revision history for this message
Jean-Paul Calderone (exarkun) wrote : Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

Repeating this experiment now, the nutmeg response produces application/x-tar content-type. I'm relatively confident that there's no proxy between my machine and launchpadlibrarian.net, but I can't 100% rule out my ISP doing something sneaky (as far as I know, they're not known for doing this sort of thing). Note though that the ticket was first filed by zooko, so presumably the two of us both saw this behavior, and we're several thousand miles apart and on different ISPs.

Revision history for this message
Zooko Wilcox-O'Hearn (zooko) wrote :

Repeating the experiment just now did not reproduce the problem. This time it worked:

HACK Zooko-Ofsimplegeos-MacBook-Pro:~/playground/python-simplegeo-shared$ sudo easy_install http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz
install_dir /Library/Python/2.6/site-packages/
Downloading http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz
Processing pyOpenSSL-0.11.tar.gz
Running pyOpenSSL-0.11/setup.py -q bdist_egg --dist-dir /tmp/easy_install-G5mDCT/pyOpenSSL-0.11/egg-dist-tmp-sdjFoE
warning: no previously-included files matching '*.pyc' found anywhere in distribution
Adding pyOpenSSL 0.11 to easy-install.pth file

Installed /Library/Python/2.6/site-packages/pyOpenSSL-0.11-py2.6-macosx-10.6-universal.egg
Processing dependencies for pyOpenSSL==0.11
Finished processing dependencies for pyOpenSSL==0.11

This time I had tcpdump running, and here are the relevant flows:

GET /pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz HTTP/1.1
Accept-Encoding: identity
Host: launchpad.net
Connection: close
User-Agent: Python-urllib/2.6 distribute/0.6.14

HTTP/1.1 303 See Other
Date: Tue, 18 Jan 2011 03:24:56 GMT
Server: zope.server.http (HTTP)
X-Powered-By: Zope (www.zope.org), Python (www.python.org)
X-Content-Type-Warning: guessed from content
Content-Length: 0
Location: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
Vary: Cookie,Authorization,Accept-Encoding
Content-Type: text/plain;charset=utf-8
Via: 1.1 launchpad.net
Connection: close

GET /58498441/pyOpenSSL-0.11.tar.gz HTTP/1.1
Accept-Encoding: identity
Host: launchpadlibrarian.net
Connection: close
User-Agent: Python-urllib/2.6 distribute/0.6.14

HTTP/1.1 200 OK
Date: Fri, 14 Jan 2011 22:13:19 GMT
Server: TwistedWeb/10.1.0
Content-Length: 242152
Accept-Ranges: bytes
Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
Cache-Control: max-age=31536000, public
Content-Type: application/x-tar
Age: 277897
X-Cache: HIT from banana.canonical.com
X-Cache-Lookup: HIT from banana.canonical.com:3128
Via: 1.0 banana.canonical.com:3128 (squid/2.7.STABLE7)
Via: 1.1 launchpadlibrarian.net
Connection: close

(followed by the body)

Revision history for this message
Martin Pool (mbp) wrote :

We just got dupe bug 704450, which suggests that this is not caused by the client environment or network, and is caused by a recent regression in Launchpad. I don't think this is a bug in pyOpenSSL as such, but they may choose to keep the bug open as an issue that affects them.

Changed in launchpad:
status: Incomplete → Confirmed
Revision history for this message
Leonard Richardson (leonardr) wrote :

As reported in bug 704450, on January 18th I uploaded a tarball and verified that it was being served as application/x-tgz. As of today (the 25th), it's now being served as text/html.

The tarball is this one: http://launchpad.net/beautifulsoup/trunk/3.2/+download/BeautifulSoup-3.2.0.tar.gz

I'll try to leave it up so you can diagnose the problem, but I may have to take it down. This bug breaks easy_install. As such, I can't register Launchpad as the PyPI download location for my package until it's fixed.

Changed in launchpad:
importance: Undecided → High
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 703807] Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

Connecting to launchpadlibrarian.net|91.189.89.228|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Wed, 26 Jan 2011 02:20:06 GMT
  Server: TwistedWeb/10.1.0
  Accept-Ranges: bytes
  Last-Modified: Tue, 18 Jan 2011 14:53:35 GMT
  Cache-Control: max-age=31536000, public
  Content-Type: application/x-tgz
  X-Cache: MISS from banana.canonical.com
  X-Cache-Lookup: MISS from banana.canonical.com:3128
  Via: 1.0 banana.canonical.com:3128 (squid/2.7.STABLE7), 1.1 launchpadlibrarian.net, 1.1 AKmdrL2CacheBC6.telecom.co.nz
  Content-Length: 31060
  Connection: Keep-Alive
  Age: 16

All looks fine; went through to the backend.

Have we got a tcpdump of the problem?


Changed in launchpad:
status: Confirmed → Opinion
status: Opinion → Incomplete
Revision history for this message
Martin Pool (mbp) wrote : Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

This is intermittent (and therefore likely annoying) but I don't think it's incomplete. Multiple people have reproduced it, using simple clients like curl, across multiple networks.

Changed in launchpad:
importance: High → Critical
status: Incomplete → Triaged
Revision history for this message
Zooko Wilcox-O'Hearn (zooko) wrote :
Revision history for this message
Martin Pool (mbp) wrote :

I can reproduce this from home too. It's interesting that I did actually get the bytes of a tarball, but marked as text/html.

mbp@joy% curl -L -v -O http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz
* About to connect() to launchpad.net port 80 (#0)
* Trying 91.189.89.222... connected
* Connected to launchpad.net (91.189.89.222) port 80 (#0)
> GET /pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: launchpad.net
> Accept: */*
>
< HTTP/1.1 303 See Other
< Date: Mon, 07 Feb 2011 06:55:25 GMT
< Server: zope.server.http (HTTP)
< X-Powered-By: Zope (www.zope.org), Python (www.python.org)
< X-Content-Type-Warning: guessed from content
< Content-Length: 0
< Location: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
< Vary: Cookie,Authorization,Accept-Encoding
< Content-Type: text/plain;charset=utf-8
< Via: 1.1 launchpad.net
<
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connection #0 to host launchpad.net left intact
* Issue another request to this URL: 'http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz'
* About to connect() to launchpadlibrarian.net port 80 (#1)
  0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0connected
* Connected to launchpadlibrarian.net (91.189.89.229) port 80 (#1)
> GET /58498441/pyOpenSSL-0.11.tar.gz HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: launchpadlibrarian.net
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 03 Feb 2011 17:00:18 GMT
< Server: TwistedWeb/10.1.0
< Content-Length: 242152
< Accept-Ranges: bytes
< Last-Modified: Mon, 01 Nov 2010 22:24:33 GMT
< Cache-Control: max-age=31536000, public
< Content-Type: text/html
< Age: 309308
< X-Cache: HIT from banana.canonical.com
< X-Cache-Lookup: HIT from banana.canonical.com:3128
< Via: 1.0 banana.canonical.com:3128 (squid/2.7.STABLE7)
< Via: 1.1 launchpadlibrarian.net
< Vary: Accept-Encoding
<
{ [data not shown]
100 236k 100 236k 0 0 68221 0 0:00:03 0:00:03 --:--:-- 102k* Connection #1 to host launchpadlibrarian.net left intact

* Closing connection #0
* Closing connection #1
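
Since the body in the transfer above really was a tarball, a client can confirm that independently of the Content-Type header by looking at the first two bytes, which for any gzip stream are 0x1f 0x8b. A minimal sketch; `looks_like_gzip` is a hypothetical helper, and it only recognises gzip-compressed files, not plain tar or zip.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def file_looks_like_gzip(path):
    """Same check for a file on disk, reading only the first two bytes."""
    with open(path, "rb") as handle:
        return looks_like_gzip(handle.read(2))
```

Running `file_looks_like_gzip` on the file saved by `curl -O` separates "right bytes, wrong header" from a genuine HTML error page.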

Revision history for this message
Martin Pool (mbp) wrote :

OK, consistently across 8 trials: nutmeg gives the correct content type, and banana gives the wrong content type. So I wonder if something is wrong at the proxy level.
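
Tallying the responses pasted earlier in this thread by proxy gives a slightly broader picture than these 8 trials. The sketch below is only bookkeeping: the observation list is transcribed by hand from the comments above, and `types_seen_per_proxy` is a name invented for the example.

```python
from collections import defaultdict

# (proxy, Content-Type) pairs copied from the header dumps in this thread.
OBSERVATIONS = [
    ("nutmeg", "text/html"),          # exarkun, first HEAD request
    ("banana", "application/x-tar"),  # exarkun, second HEAD request
    ("nutmeg", "application/x-tar"),  # mbp, telnet HEAD
    ("banana", "application/x-tar"),  # zooko, tcpdump of easy_install
    ("banana", "text/html"),          # mbp, curl -L -v
]


def types_seen_per_proxy(observations):
    """Group the content types observed by the proxy that served them."""
    seen = defaultdict(set)
    for proxy, content_type in observations:
        seen[proxy].add(content_type)
    return dict(seen)
```

On this data both proxies have served both content types at different times, which fits a cache entry going bad and later being refreshed better than it fits one misconfigured proxy.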

Revision history for this message
Martin Pool (mbp) wrote :

... though nutmeg is not exactly getting it right in describing this as 'application/x-tar' with no other encoding, since it is in fact tgz. But it is less wrong than banana.
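
For comparison, Python's standard `mimetypes` module splits a `.tar.gz` name into a media type plus a content encoding, which is the distinction being drawn here. A short sketch; whether the librarian uses this module to pick its header is an assumption not confirmed in this thread.

```python
import mimetypes

# A fresh MimeTypes instance starts from the standard library's default tables.
guesser = mimetypes.MimeTypes()
media_type, encoding = guesser.guess_type("pyOpenSSL-0.11.tar.gz")

# Expected with the default tables: application/x-tar, gzip
print(media_type, encoding)
```

A server that sends only the first half of that pair as Content-Type, without a matching Content-Encoding, produces the bare 'application/x-tar' seen from nutmeg.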

Revision history for this message
Martin Pool (mbp) wrote :

Sending 'Cache-control: no-cache' gets the content-type correct on that request, and it stays correct on later requests that go through the same proxy.

So there's a workaround for anyone affected by this: use curl or similar to blow out the caches.
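
The same workaround can be scripted. A minimal sketch using the standard library's `urllib.request`; `build_no_cache_request` is a hypothetical helper name, and the sketch only constructs the request so it can be inspected before anything is sent.

```python
import urllib.request


def build_no_cache_request(url):
    """Build a GET request that asks intermediate caches to revalidate."""
    request = urllib.request.Request(url)
    request.add_header("Cache-Control", "no-cache")
    return request


request = build_no_cache_request(
    "http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz"
)
```

Passing `request` to `urllib.request.urlopen` would send it. The curl equivalent is `curl -H 'Cache-Control: no-cache' -O <url>`.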

I wonder why squid is getting the wrong header stuck in its cache? Is it that launchpad librarian occasionally serves the right body with the wrong header? Or is it something within our squid setup?

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 703807] Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

FWIW our two squids are configured identically modulo their IP addresses.

Revision history for this message
Robert Collins (lifeless) wrote : Re: "easy_install pyOpenSSL" says "error: Unexpected HTML page found at http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"

Tagging as a regression: we didn't use to have this problem.

tags: added: regression
Martin Pool (mbp)
summary: - "easy_install pyOpenSSL" says "error: Unexpected HTML page found at
- http://launchpad.net/pyopenssl/main/0.11/+download/pyOpenSSL-0.11.tar.gz"
+ launchpad sometimes serves download files as content-type text/html
Revision history for this message
Graham Binns (gmb) wrote :

FTR, I just got bitten by this when trying to download http://launchpadlibrarian.net/73387253/testtools-0.9.11.tar.gz. Setting the Cache-control header as Martin suggested fixed the problem.

Martin Pool (mbp)
summary: launchpad sometimes serves download files as content-type text/html
+ (only banana, not nutmeg)
William Grant (wgrant)
summary: launchpad sometimes serves download files as content-type text/html
- (only banana, not nutmeg)
Revision history for this message
William Grant (wgrant) wrote :

The librarian serves any 304 Not Modified as text/html, which pollutes the squid cache. This may be a Twisted thing.

Revision history for this message
William Grant (wgrant) wrote :

From twisted.web.static.File.render_GET:

        if request.setLastModified(self.getmtime()) is http.CACHED:
            return ''

        producer = self.makeProducer(request, fileForReading)

makeProducer calls _setContentHeaders, which sets Content-Type. This is conveniently skipped when returning a 304.

But RFC2616 says:

    If a cache uses a received 304 response to update a cache entry, the
    cache MUST update the entry to reflect any new field values given in
    the response.

So twisted.web should really always be returning the correct Content-Type.
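
To make the pollution mechanism concrete, here is a toy model of a cache applying the RFC 2616 rule quoted above. It is a sketch of the rule only, not squid's implementation; `apply_304` and the sample header dictionaries are invented for illustration.

```python
def apply_304(cached_headers, revalidation_headers):
    """Return the cache entry's headers after a 304 revalidation.

    Per the quoted rule, any field present in the 304 response replaces
    the stored value; fields absent from the 304 are left alone.
    """
    updated = dict(cached_headers)
    updated.update(revalidation_headers)
    return updated


stored = {"Content-Type": "application/x-tar", "Content-Length": "242152"}

# A 304 that carries the server's default type overwrites the good value.
polluted = apply_304(stored, {"Content-Type": "text/html"})

# A 304 with no Content-Type leaves the stored value untouched.
intact = apply_304(stored, {})
```

After the first call the entry still holds the tarball's bytes and length but is labelled text/html, which is what the curl trace earlier in the thread shows.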

Paul Sladen (sladen)
Changed in twisted-web (Ubuntu):
status: New → Incomplete
Revision history for this message
Martin Pool (mbp) wrote :

Twisted defaults to Content-Type: text/html if it returns a body, which is reasonable. But for a 304, it should probably undo that, and not send a content type at all.

Changed in twisted-web (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Martin Pool (mbp) wrote :

> So twisted.web should really always be returning the correct Content-Type.

If it's sending back not-modified, I think there's really no need for it to report the content type, and it's hard to see how it could ever help. The cache can hold onto whatever Content-Type it already has.
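
Both positions in this exchange keep the cache correct; what breaks it is a 304 carrying a default type. A server-side sketch of the two options, written as a plain function rather than against Twisted's API. `conditional_get` and its parameters are hypothetical, and timestamps are plain numbers (seconds) to keep the example self-contained.

```python
def conditional_get(content_type, last_modified, if_modified_since=None,
                    send_type_on_304=False):
    """Return (status, headers) for a GET that may be conditional.

    send_type_on_304=False: omit Content-Type on 304, so a cache keeps
    whatever type it already stored.
    send_type_on_304=True: repeat the file's real type on 304.
    Neither branch ever falls back to a text/html default.
    """
    not_modified = (
        if_modified_since is not None and last_modified <= if_modified_since
    )
    if not_modified:
        headers = {"Content-Type": content_type} if send_type_on_304 else {}
        return 304, headers
    return 200, {"Content-Type": content_type}
```

For example, `conditional_get("application/x-tar", 100, if_modified_since=200)` yields a 304 with no Content-Type at all.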

Revision history for this message
Martin Pool (mbp) wrote :
Changed in twisted:
status: Unknown → New
Revision history for this message
Martin Pool (mbp) wrote :

It looks like this will soon be fixed in Twisted through http://twistedmatrix.com/trac/ticket/4156 and it can probably then be fixed in Launchpad by deploying a new copy of Twisted. I don't think any code changes will be needed in lp.

Revision history for this message
Jean-Paul Calderone (exarkun) wrote :

Marking this pyOpenSSL bug as invalid, as it seems clear that it's an issue with the hosting infrastructure (with progress resolving the issue being made in the appropriate forums).

Changed in pyopenssl:
status: New → Invalid
Changed in twisted:
status: New → Fix Released
Graham Binns (gmb)
Changed in launchpad:
assignee: nobody → Graham Binns (gmb)
Graham Binns (gmb)
Changed in launchpad:
assignee: Graham Binns (gmb) → nobody
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Steve Kowalik (stevenk)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Revision history for this message
William Grant (wgrant) wrote :
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released