Testflinger disconnections
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Testflinger | Invalid | Undecided | Paul Larson |
Bug Description
Lately I've been seeing a lot of disconnection error messages during testflinger runs. Some recover; others seem to fail completely. Below are two examples. The first one had several disconnections followed by an eventual reconnection, and it seems OK otherwise, as testing did complete.
The second one, however, seems to have disconnected completely. Interestingly, I was able to reconnect manually using checkbox-cli, and once I did, testing resumed. So that's an additional issue: the checkbox test in progress seems to have paused when testflinger lost the connection, and it sat there paused until I manually connected more than a day later.
This run disconnected and then later reconnected multiple times:
stress-ng: info: [3609835] dispatching hogs: 8 qsort
stress-ng: info: [3609835] successful run completed in 300.00s (5 mins, 0.00 secs)
stress-ng: info: [3609850] dispatching hogs: 8 stack
Reconnecting...
Rejoined session.
In progress: com.canonical.
stress-ng: info: [3609850] successful run completed in 300.82s (5 mins, 0.82 secs)
stress-ng: info: [3609892] dispatching hogs: 8 str
stress-ng: info: [3609892] successful run completed in 300.00s (5 mins, 0.00 secs)
stress-ng: info: [3609906] dispatching hogs: 8 stream
stress-ng: info: [3609908] stress-ng-stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng: info: [3609908] stress-ng-stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng: info: [3609908] stress-ng-stream: Using CPU cache size of 8192K
stress-ng: info: [3609911] stress-ng-stream: memory rate: 1293.94 MB/sec, 517.58 Mflop/sec (instance 3)
stress-ng: info: [3609914] stress-ng-stream: memory rate: 999.71 MB/sec, 399.88 Mflop/sec (instance 6)
stress-ng: info: [3609908] stress-ng-stream: memory rate: 1593.38 MB/sec, 637.35 Mflop/sec (instance 0)
stress-ng: info: [3609915] stress-ng-stream: memory rate: 910.95 MB/sec, 364.38 Mflop/sec (instance 7)
stress-ng: info: [3609912] stress-ng-stream: memory rate: 1199.38 MB/sec, 479.75 Mflop/sec (instance 4)
stress-ng: info: [3609913] stress-ng-stream: memory rate: 1100.02 MB/sec, 440.01 Mflop/sec (instance 5)
stress-ng: info: [3609910] stress-ng-stream: memory rate: 1399.39 MB/sec, 559.76 Mflop/sec (instance 2)
stress-ng: info: [3609909] stress-ng-stream: memory rate: 1498.41 MB/sec, 599.37 Mflop/sec (instance 1)
stress-ng: info: [3609906] successful run completed in 300.02s (5 mins, 0.02 secs)
stress-ng: info: [3609917] dispatching hogs: 8 tsearch
stress-ng: info: [3609917] successful run completed in 300.07s (5 mins, 0.07 secs)
stress-ng: info: [3609934] dispatching hogs: 8 vm-rw
stress-ng: info: [3609934] successful run completed in 300.01s (5 mins, 0.01 secs)
stress-ng: info: [3609953] dispatching hogs: 8 wcs
stress-ng: info: [3609953] successful run completed in 300.00s (5 mins, 0.00 secs)
stress-ng: info: [3609968] dispatching hogs: 8 zero
stress-ng: info: [3609968] successful run completed in 300.00s (5 mins, 0.00 secs)
stress-ng: info: [3609979] dispatching hogs: 8 mlock
stress-ng: info: [3609979] successful run completed in 300.26s (5 mins, 0.26 secs)
stress-ng: info: [3610002] dispatching hogs: 8 mmapfork
stress-ng: info: [3610002] successful run completed in 300.48s (5 mins, 0.48 secs)
stress-ng: info: [3692061] dispatching hogs: 8 mmapmany
stress-ng: info: [3692061] successful run completed in 300.03s (5 mins, 0.03 secs)
stress-ng: info: [3692083] dispatching hogs: 8 mremap
stress-ng: info: [3692083] successful run completed in 300.70s (5 mins, 0.70 secs)
stress-ng: info: [3692103] dispatching hogs: 8 shm-sysv
stress-ng: info: [3692103] successful run completed in 301.00s (5 mins, 1.00 secs)
stress-ng: info: [3692127] dispatching hogs: 8 vm-splice
stress-ng: info: [3692127] successful run completed in 300.00s (5 mins, 0.00 secs)
stress-ng: info: [3692137] dispatching hogs: 8 malloc
stress-ng: info: [3692137] successful run completed in 377.02s (6 mins, 17.02 secs)
stress-ng: info: [3692161] dispatching hogs: 8 mincore
stress-ng: info: [3692161] successful run completed in 377.00s (6 mins, 17.00 secs)
stress-ng: info: [3692177] dispatching hogs: 8 vm
stress-ng: info: [3692177] successful run completed in 377.01s (6 mins, 17.01 secs)
stress-ng: info: [3692197] dispatching hogs: 8 bigheap
stress-ng: info: [3692197] successful run completed in 377.22s (6 mins, 17.22 secs)
stress-ng: info: [3692221] dispatching hogs: 8 brk
Reconnecting...
Reconnecting...
Reconnecting...
Rejoined session.
In progress: com.canonical.
stress-ng: info: [3692221] successful run completed in 379.32s (6 mins, 19.32 secs)
This run disconnected and never recovered:
stress-ng: info: [3349656] successful run completed in 922.02s (15 mins, 22.02 secs)
stress-ng: info: [3349749] dispatching hogs: 40 bigheap
stress-ng: info: [3349749] successful run completed in 922.55s (15 mins, 22.55 secs)
stress-ng: info: [3349845] dispatching hogs: 40 brk
Reconnecting...
Reconnecting...
Reconnecting...
Reconnecting...
Reconnecting...
Reconnecting...
Reconnecting...
Connection lost!
Service explicitly disconnected you. Possible reason: new remote connected to the service
+ EXITCODE=0
+ mkdir -p artifacts
+ cp launcher artifacts
+ find /home/ubuntu/ -name 'submission_
+ find /home/ubuntu/ -name 'submission_*.html' -exec mv '{}' artifacts/
+ find /home/ubuntu/ -name 'submission_*.xlsx' -exec mv '{}' artifacts/
+ find /home/ubuntu/ -name 'submission_
+ tar -xf artifacts/
tar: artifacts/
tar: Error is not recoverable: exiting now
+ mv submission.json artifacts
mv: cannot stat 'submission.json': No such file or directory
++ _run grep /boot/efi /proc/mounts
++ ssh -o StrictHostKeyCh
++ grep -o '.*[^0-9]'
++ cut -d ' ' -f 1
+ ROOT_DISK=/dev/sda
+ echo 'Zeroing Disk /dev/sda'
Zeroing Disk /dev/sda
+ _run sudo sgdisk -Z /dev/sda
+ ssh -o StrictHostKeyCh
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
2020-11-20 11:15:21,439 drapion INFO: DEVICE AGENT: END testrun
*******
* Starting testflinger cleanup phase on drapion *
*******
Cleaning up container if it exists...
drapion
complete
Note: in the drapion example, I did NOT at any time connect to the checkbox remote session from another machine. I kicked it off via testflinger, started polling the output, and walked away to wait for the job to complete.
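To compare runs like the two above, the disconnect episodes in a saved output log can be tallied mechanically. The following is a minimal Python sketch, not part of testflinger or checkbox: the function name `summarize_disconnects` is hypothetical, and it assumes the marker lines appear exactly as in the excerpts above (`Reconnecting...`, `Rejoined session.`, `Connection lost!`).

```python
def summarize_disconnects(log_text):
    """Group reconnect attempts into episodes.

    Returns a list of (attempts, outcome) tuples, where outcome is
    "rejoined", "lost", or "unknown" (log ended mid-reconnect).
    The marker strings are assumed to match the excerpts in this report.
    """
    episodes = []
    attempts = 0
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line == "Reconnecting...":
            # Each of these lines counts as one reconnect attempt.
            attempts += 1
        elif line == "Rejoined session.":
            episodes.append((attempts, "rejoined"))
            attempts = 0
        elif line == "Connection lost!":
            episodes.append((attempts, "lost"))
            attempts = 0
    if attempts:
        # The log stopped while still trying to reconnect.
        episodes.append((attempts, "unknown"))
    return episodes


if __name__ == "__main__":
    import sys

    # Usage: python summarize_disconnects.py < saved_output.log
    for attempts, outcome in summarize_disconnects(sys.stdin.read()):
        print(f"{attempts} reconnect attempt(s): {outcome}")
```

Applied to the excerpts above, the first run gives two episodes that both end in "rejoined" (one attempt, then three), and the drapion run gives a single episode of seven attempts ending in "lost".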