There have been a lot of emails from the autopkgtest infra over the long (US) holiday weekend, reporting a significant number of dead runners. The emails don't contain enough information to tell why they were killed; e.g.:
==== worker <email address hidden> failed ====
[...]
Started autopkgtest worker lcy01-8.
[...]
2018-05-28 03:37:35,394 [26050] WARNING: Testbed failure, retrying in 5 minutes
2018-05-28 03:37:35,394 [26050] ERROR: Three tmpfails in a row, aborting worker. Log follows:
2018-05-28 03:37:35,395 [26050] ERROR: autopkgtest [02:46:11]: git checkout: 5243905 ssh-setup/nova: Add support for keystone v3 auth
autopkgtest [02:46:11]: host juju-prod-ues-proposed-migration-machine-11; command line: /home/ubuntu/autopkgtest/runner/autopkgtest --output-dir /tmp/autopkgtest-work.158kldh_/out --timeout-copy=6000 --setup-commands /home/ubuntu/autopkgtest-cloud/worker-config-production/setup-canonical.sh --setup-commands /home/ubuntu/autopkgtest/setup-commands/setup-testbed --apt-pocket=proposed=src:graphicsmagick --apt-upgrade diaspora-installer --env=ADT_TEST_TRIGGERS=graphicsmagick/1.3.29+hg15665-1 -- ssh -s /home/ubuntu/autopkgtest/ssh-setup/nova -- --flavor autopkgtest --security-groups <email address hidden> --name adt-cosmic-amd64-diaspora-installer-20180528-005242 --image adt/ubuntu-cosmic-amd64-server --keyname testbed-juju-prod-ues-proposed-migration-machine-11 --net-id=net_ues_proposed_migration -e ''"'"'http_proxy=http://squid.internal:3128'"'"'' -e ''"'"'https_proxy=http://squid.internal:3128'"'"'' -e ''"'"'no_proxy=127.0.0.1,127.0.1.1,localhost,localdomain,novalocal,internal,archive.ubuntu.com,security.ubuntu.com,ddebs.ubuntu.com,changelogs.ubuntu.com,ppa.launchpad.net'"'"'' --mirror=http://ftpmaster.internal/ubuntu
autopkgtest [02:46:49]: @@@@@@@@@@@@@@@@@@@@ test bed setup
[...]
autopkgtest [02:47:25]: test command1: preparing testbed
Reading package lists...
Building dependency tree...
Reading state information...
Correcting dependencies...Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
Done
Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
The following additional packages will be installed:
build-essential cpp cpp-7 dbconfig-common dbconfig-pgsql diaspora-common
diaspora-installer exim4 exim4-base exim4-config exim4-daemon-light
fontconfig fontconfig-config fonts-dejavu-core g++ g++-7 gcc gcc-7
gcc-7-base ghostscript gir1.2-freedesktop gir1.2-gdkpixbuf-2.0
gir1.2-harfbuzz-0.0 gir1.2-rsvg-2.0 hicolor-icon-theme icu-devtools
imagemagick imagemagick-6-common imagemagick-6.q16 libasan4 libatomic1
libavahi-client3 libavahi-common-data libavahi-common3 libbz2-dev libc-ares2
libc-dev-bin libc6-dev libcairo-gobject2 libcairo-script-interpreter2
libcairo2 libcairo2-dev libcc1-0 libcilkrts5 libcroco3 libcups2
libcupsimage2 libcurl4-openssl-dev libdatrie1 libdjvulibre-dev
libdjvulibre-text libdjvulibre21 libexif-dev libexif12 libexpat1-dev
libffi-dev libfftw3-double3 libfontconfig1 libfontconfig1-dev
libfreetype6-dev libgcc-7-dev libgd3 libgdk-pixbuf2.0-0
libgdk-pixbuf2.0-common libgdk-pixbuf2.0-dev libglib2.0-bin libglib2.0-dev
libglib2.0-dev-bin libgmp-dev libgmpxx4ldbl libgomp1 libgraphite2-3
libgraphite2-dev libgs9 libgs9-common libharfbuzz-dev libharfbuzz-gobject0
libharfbuzz-icu0 libharfbuzz0b libhttp-parser2.8 libice-dev libice6
libicu-dev libicu-le-hb-dev libicu-le-hb0 libiculx60 libijs-0.35
libilmbase-dev libilmbase12 libisl19 libitm1 libjbig-dev libjbig0
libjbig2dec0 libjemalloc1 libjpeg-dev libjpeg-turbo8 libjpeg-turbo8-dev
libjpeg8 libjpeg8-dev liblcms2-2 liblcms2-dev liblqr-1-0 liblqr-1-0-dev
liblsan0 libltdl-dev libltdl7 liblzma-dev libmagickcore-6-arch-config
libmagickcore-6-headers libmagickcore-6.q16-3 libmagickcore-6.q16-3-extra
libmagickcore-6.q16-dev libmagickwand-6-headers libmagickwand-6.q16-3
libmagickwand-6.q16-dev libmagickwand-dev libmpc3 libmpx2
libnginx-mod-http-geoip libnginx-mod-http-image-filter
libnginx-mod-http-xslt-filter libnginx-mod-mail libnginx-mod-stream
libopenexr-dev libopenexr22 libpango-1.0-0 libpangocairo-1.0-0
libpangoft2-1.0-0 libpaper1 libpcre16-3 libpcre3-dev libpcre32-3
libpcrecpp0v5 libpixman-1-0 libpixman-1-dev libpng-dev libpq-dev libpq5
libpthread-stubs0-dev libquadmath0 librsvg2-2 librsvg2-common librsvg2-dev
libruby2.5 libsm-dev libsm6 libssl-dev libstdc++-7-dev libthai-data libthai0
libtiff-dev libtiff5 libtiff5-dev libtiffxx5 libtsan0 libubsan0 libuv1
libwebp6 libwmf-dev libwmf0.2-7 libx11-dev libxau-dev libxcb-render0
libxcb-render0-dev libxcb-shm0 libxcb-shm0-dev libxcb1-dev libxdmcp-dev
libxext-dev libxml2-dev libxpm4 libxrender-dev libxrender1 libxslt1-dev
libxt-dev libxt6 linux-libc-dev nginx nginx-common nginx-core nodejs
pkg-config poppler-data postgresql postgresql-10 postgresql-client
postgresql-client-10 postgresql-client-common postgresql-common
python3-distutils python3-lib2to3 rake redis-server redis-tools ruby
ruby-dev ruby-did-you-mean ruby-diff-lcs ruby-minitest ruby-net-telnet
ruby-power-assert ruby-rspec ruby-rspec-core ruby-rspec-expectations
ruby-rspec-mocks ruby-rspec-support ruby-test-unit ruby-thread-order ruby2.5
Deleting existing group {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'name': '<email address hidden>', 'description': 'copy <email address hidden> of default (Default security group)', 'security_group_rules': [{'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '15657ed8-c781-4080-9677-8ede743e30cd', 'port_range_min': None, 'created_at': '2018-05-27T09:20:17Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': 'icmp', 'remote_ip_prefix': '91.189.90.53/32', 'updated_at': '2018-05-27T09:20:17Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '3c9ef2c0-8618-4e2c-b97c-b43ce7068a9e', 'port_range_min': None, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': '843e3a73-f1ee-4ecc-bc8c-b2d297df96b9', 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '6f392f68-54ca-4c0b-a5ba-daa7a3999d6a', 'port_range_min': 22, 'created_at': '2018-05-27T09:20:17Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': 22, 'protocol': 'tcp', 'remote_ip_prefix': '162.213.33.179/32', 'updated_at': '2018-05-27T09:20:17Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '7ac1770e-e6ce-4843-b114-e4f24281f784', 'port_range_min': 22, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': None, 'security_group_id': 
'5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number'
: 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': 22, 'protocol': 'tcp', 'remote_ip_prefix': '91.189.90.53/32', 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv6', 'id': '7bf03f0d-432b-4c46-b2b1-c12af62793f7', 'port_range_min': None, 'created_at': '2018-05-27T09:20:17Z', 'remote_group_id': '843e3a73-f1ee-4ecc-bc8c-b2d297df96b9', 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:17Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv6', 'id': 'ec28f1fe-75cf-49f8-9e2c-e83afc72da24', 'port_range_min': None, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'egress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': 'f6b2100e-f068-4111-9f68-6e86e6250b29', 'port_range_min': None, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'egress'}], 'id': '5c87742c-aaa3-4806-932e-9317521587dc', 'updated_at': '2018-05-27T09:20:18Z', 'created_at': '2018-05-27T09:20:17Z', 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'revision_number': 10}
<fin>
The only verbose OpenStack output here is about the deletion of a security group. Nothing in this log directly links back to a particular OpenStack instance, or gives any hint of why the instance has disappeared. The comments in the source suggest this is due to hitting quotas when trying to run too many huge jobs in parallel. But that should not be a problem with proper quotas, it doesn't explain why there appears to be a recent increase, and none of the output corroborates that this is what is happening.
If we have frequently failing autopkgtest instances, we ought to know why. For that, we need better logging.
If the failures are not interesting, then we should not generate emails for all of them; we should detect that they're not interesting, and avoid sending mail.
One thing I want to see is the test request as well as the worker's name, so you can see straight away if a particular test is causing problems.
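As a rough illustration of that last point, the worker name and the test request are both already recoverable from the log quoted above (the "Started autopkgtest worker" line, and the --name, --image and ADT_TEST_TRIGGERS arguments on the command line). A minimal sketch in Python, assuming the log keeps that shape; summarize_failure and its output format are hypothetical and not part of the existing worker code:

```python
import re

# Hypothetical helper, not part of autopkgtest-cloud: it pulls identifying
# fields out of a worker log shaped like the one quoted in this report.
WORKER_RE = re.compile(r"Started autopkgtest worker (\S+?)\.?$", re.MULTILINE)
NAME_RE = re.compile(r"--name (adt-\S+)")
IMAGE_RE = re.compile(r"--image (\S+)")
TRIGGER_RE = re.compile(r"--env=ADT_TEST_TRIGGERS=(\S+)")


def summarize_failure(log_text):
    """Return a one-line summary of which worker and test request failed."""
    def first(pattern):
        # Fall back to "unknown" so a truncated log still yields a summary.
        match = pattern.search(log_text)
        return match.group(1) if match else "unknown"

    return "worker={} instance={} image={} triggers={}".format(
        first(WORKER_RE), first(NAME_RE), first(IMAGE_RE), first(TRIGGER_RE))
```

A line like this at the top of the mail (or in the subject) would make it obvious when the same test request keeps showing up across failures.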
Maybe if we had this thing: https://trello.com/c/jaReiQ53/6-ops-debuggability-helper-commands-to-map-systemd-unit-cloud-instance-running-stuck-test - we could put its output in the emails.