Another instance of this. No upgrades have happened recently (in fact, I'm trying to prep the site for one).
Site is running 1.25.13 on Trusty. Juju is HA on machines 0, 1 and 2, and there are several other machines. Cloud is MAAS 1.9.
The units residing on machine 2 (not in LXCs but on the machine itself) are in state 'failed'. I have tried restarting the machine and unit agents, those on machines 0 and 1 as well, all the juju-db services, and all the rsyslog daemons.
I ran mgopurge (1.6) with all the state servers stopped.
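With the unit's log level set to TRACE, I see the following in the unit log when I run this command:
juju run --unit ceph/1 'uptime'
2018-03-06 03:10:12 DEBUG juju.worker.uniter runlistener.go:61 RunCommands: {Commands:uptime RelationId:-1 RemoteUnitName: ForceRemoteUnit:false}
2018-03-06 03:10:12 TRACE juju.worker.uniter uniter.go:336 run commands: uptime
However, the command never returns, the agents don't move away from failed status, and hooks don't run. I don't see anything in the machine log that looks related (I can attach it, but it contains potentially sensitive info that would need scrubbing).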
Also, I note there are a number of rsyslog connection attempts and frequent disconnects, which could be a red herring or could be significant, e.g.:
2018-03-06 03:15:08 INFO juju.worker.dependency engine.go:352 "rsyslog-config-updater" manifold worker stopped: dial tcp 10.28.16.13:6514: getsockopt: connection refused
2018-03-06 03:15:08 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:08 INFO juju.worker.dependency engine.go:294 starting "rsyslog-config-updater" manifold worker in 3s...
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:302 starting "rsyslog-config-updater" manifold worker
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "agent" resource
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "api-caller" resource
2018-03-06 03:15:11 DEBUG juju.worker.rsyslog worker.go:108 starting rsyslog worker mode 1 for "unit-os-cs-1" ""
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:309 running "rsyslog-config-updater" manifold worker
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:315 registered "rsyslog-config-updater" manifold worker
2018-03-06 03:15:11 INFO juju.worker.dependency engine.go:339 "rsyslog-config-updater" manifold worker started
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:11 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.16.13:6514
2018-03-06 03:15:11 INFO juju.worker.dependency engine.go:352 "rsyslog-config-updater" manifold worker stopped: dial tcp 10.28.16.13:6514: getsockopt: connection refused
2018-03-06 03:15:11 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:11 INFO juju.worker.dependency engine.go:294 starting "rsyslog-config-updater" manifold worker in 3s...
2018-03-06 03:15:12 DEBUG juju.worker.leadership tracker.go:138 os-cs/1 renewing lease for os-cs leadership
2018-03-06 03:15:12 DEBUG juju.worker.leadership tracker.go:165 checking os-cs/1 for os-cs leadership
2018-03-06 03:15:13 DEBUG juju.worker.leadership tracker.go:180 os-cs/1 confirmed for os-cs leadership until 2018-03-06 03:16:12.552651545 +0000 UTC
2018-03-06 03:15:13 INFO juju.worker.leadership tracker.go:182 os-cs/1 will renew os-cs leadership at 2018-03-06 03:15:42.552651545 +0000 UTC
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:302 starting "rsyslog-config-updater" manifold worker
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "agent" resource
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:268 "rsyslog-config-updater" manifold requested "api-caller" resource
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:108 starting rsyslog worker mode 1 for "unit-os-cs-1" ""
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:309 running "rsyslog-config-updater" manifold worker
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:315 registered "rsyslog-config-updater" manifold worker
2018-03-06 03:15:14 INFO juju.worker.dependency engine.go:339 "rsyslog-config-updater" manifold worker started
2018-03-06 03:15:14 DEBUG juju.worker.dependency engine.go:444 restarting dependents of "rsyslog-config-updater" manifold
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.16.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.2.22:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.24.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.6.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.8.13:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.2.20:6514
2018-03-06 03:15:14 DEBUG juju.worker.rsyslog worker.go:225 making syslog connection for "juju-unit-os-cs-1" to 10.28.16.12:6514
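To tell whether the refused connection is specific to 10.28.16.13 or affects every endpoint, each address from the log above could be probed directly from machine 2. A minimal Python sketch, assuming plain TCP reachability is what matters (it does not attempt the TLS handshake); `probe`, `probe_all` and `ENDPOINTS` are names made up for this illustration, not part of Juju:

```python
import socket

# Addresses copied from the "making syslog connection" lines above.
ENDPOINTS = [
    "10.28.16.13", "10.28.2.22", "10.28.24.13", "10.28.6.13",
    "10.28.8.13", "10.28.2.20", "10.28.16.12",
]
RSYSLOG_PORT = 6514  # port shown in the log lines above


def probe(host, port, timeout=3.0):
    """Return "open" if a TCP connection succeeds, else the error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as exc:
        return str(exc)


def probe_all(hosts=ENDPOINTS, port=RSYSLOG_PORT, timeout=3.0):
    """Probe every host once and return a {host: result} mapping."""
    return {host: probe(host, port, timeout) for host in hosts}
```

Printing the result of `probe_all()` on the affected machine would show which endpoints refuse connections. If only some do, that points at particular state servers rather than at the unit.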
At a similar time in syslog:
Mar 6 03:15:08 hostname rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1778857" x-info="http://www.rsyslog.com"] exiting on signal 15.
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 14 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1788531" x-info="http://www.rsyslog.com"] start
Mar 6 03:15:12 hostname rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Mar 6 03:15:12 hostname rsyslogd-2221: module 'imuxsock' already in this config, cannot be added
[try http://www.rsyslog.com/e/2221 ]
Mar 6 03:15:12 hostname rsyslogd: rsyslogd's groupid changed to 104
Mar 6 03:15:12 hostname rsyslogd: rsyslogd's userid changed to 101
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 4 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 5 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 6 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 7 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 8 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 9 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 11 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 10 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 12 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:12 hostname rsyslogd-2040: fatal error on disk queue 'action 13 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]
Mar 6 03:15:15 hostname rsyslogd-2083: gnutls returned error on handshake: A TLS warning alert has been received.
[try http://www.rsyslog.com/e/2083 ]
Mar 6 03:15:22 hostname rsyslogd-2027: imfile: could not persist state file machine-2 - data may be repeated on next startup. Is WorkDirectory set? [try http://www.rsyslog.com/e/2027 ]
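To see which of these rsyslogd errors dominate over a longer period than this excerpt, the error codes could be tallied from the syslog file. A rough Python sketch, assuming the `rsyslogd-<code>:` tag format seen in the lines above; `count_error_codes` is a hypothetical helper name:

```python
import re
from collections import Counter

# In the excerpt above, rsyslogd tags its diagnostics as "rsyslogd-<code>:",
# and <code> is the same number that appears in the rsyslog.com/e/<code> hint.
CODE_RE = re.compile(r"rsyslogd-(\d+):")


def count_error_codes(lines):
    """Count how often each rsyslogd error code appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = CODE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example use on the affected machine (path is the Ubuntu default):
#     with open("/var/log/syslog") as fh:
#         print(count_error_codes(fh).most_common())
```

For the excerpt above this would report code 2040 (the disk queue error) eleven times, and 2307, 2221, 2083 and 2027 once each.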
I tried clearing out the rsyslog config from /etc/rsyslog.d/25-juju.conf, emptying /var/spool/rsyslog to clean out any broken files (with rsyslog stopped), and restarting the machine agent, but the .qi etc. files all came back immediately, as did these errors.
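Since the queue files reappear straight away, it may be worth recording what is in the spool directory and who owns it. The syslog excerpt shows rsyslogd dropping to uid 101 / gid 104, and it is only a guess on my part (not confirmed by anything above) that the 2040 errors relate to files that user cannot write. A small Python sketch; `list_spool` is a hypothetical helper name:

```python
import os


def list_spool(path="/var/spool/rsyslog"):
    """Return (name, uid, gid, size) for each entry in the spool directory."""
    entries = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        # Do not follow symlinks, so a dangling link is still reported.
        info = entry.stat(follow_symlinks=False)
        entries.append((entry.name, info.st_uid, info.st_gid, info.st_size))
    return entries
```

Running this before and after restarting the machine agent, and comparing the uid/gid columns with the 101/104 reported in syslog, would show whether the recreated files belong to the rsyslog user.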