So, firstly, it makes sense that this has changed recently, since we've recently rolled out a new stack of frontend machines for various reasons. One feature of the new stack that was previously only present on non-production environments is a caching layer for those requests that are cacheable (mostly anonymous requests), and that's where the problem seems to be here.
I finally managed to run into this locally, and went looking for the request matching the `X-Request-Id` header in appserver logs:
The origin IP address (redacted here for privacy) wasn't mine, so I was clearly getting a cached response back, and I notice that the status here is 304 Not Modified. So somebody sent a `GET` query probably with an `If-None-Match` header (as `lazr.restfulclient` would do), the appserver correctly returned 304 with an empty response, and then Squid incorrectly cached the empty response body and returned 200. Somehow.
This at least gives me something to go on, and I'll experiment further on a non-production instance to see if I can work out how to trigger this on demand.
So, firstly, it makes sense that this has changed recently, since we've recently rolled out a new stack of frontend machines for various reasons. One feature of the new stack that was previously only present on non-production environments is a caching layer for those requests that are cacheable (mostly anonymous requests), and that's where the problem seems to be here.
I finally managed to run into this locally, and went looking for the request matching the `X-Request-Id` header in appserver logs:
2023-10-16 13:35:48.410Z INFO talisker.wsgi "GET /api/1. 0/~maxmind/ +archive/ ubuntu/ ppa" method=GET path=/api/ 1.0/~maxmind/ +archive/ ubuntu/ ppa status=304 duration_ms=35.475 ip=10.131.215.147 proto=HTTP/1.1 forwarded= "[redacted] , 10.131. 215.233, 10.131. 215.105" ua=Python- urllib/ 3.6 logging_count=5 logging_ time_ms= 0.251 nonpython_actions=8 sql_statements=7 sql_ms=10 publication_ thread_ duration_ ms=3.857 publication_ duration_ ms=6.47 userid=Anonymous traversal_ thread_ duration_ ms=10.329 traversal_ duration_ ms=19.253 pageid= Archive: EntryResource host=launchpad.net scheme=https request_ id=5256682c- e5ce-46e8- 9e0d-adebe54581 60
The origin IP address (redacted here for privacy) wasn't mine, so I was clearly getting a cached response back, and I notice that the status here is 304 Not Modified. So somebody sent a `GET` query probably with an `If-None-Match` header (as `lazr.restfulcl ient` would do), the appserver correctly returned 304 with an empty response, and then Squid incorrectly cached the empty response body and returned 200. Somehow.
This at least gives me something to go on, and I'll experiment further on a non-production instance to see if I can work out how to trigger this on demand.