Some units not reporting swift usage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Landscape Client | Fix Committed | Undecided | Simon Poirier |
landscape-client (Ubuntu) | Fix Released | Undecided | Andreas Hasenack |
Bug Description
landscape-client 16.04~bzr841-
I had a ceph/swift cloud deploy where for some reason just 1/3 of the swift units were reporting swift data. Two of them were saying this:
2016-06-02 02:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
The openstack dashboard in landscape was reporting just 1/3 of the swift storage (see screenshot).
swift recon was showing all the storage (also see attached):
Disk usage: space used: 1996472320 of 199605878784
Disk usage: space free: 197609406464 of 199605878784
Disk usage: lowest: 0.11%, highest: 3.48%, avg: 1.00020717434%
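As a quick sanity check on the recon figures above, the used and free values can be added up and compared against the total. This is only a small arithmetic sketch using the three numbers quoted in the output; the variable names are illustrative.

```python
# Figures copied from the swift-recon "Disk usage" lines above (bytes).
used = 1996472320
free = 197609406464
total = 199605878784

# Used plus free should account for the whole reported capacity.
assert used + free == total

# Overall fill level of the cluster as seen by recon.
percent_used = 100 * used / total
print(f"{percent_used:.4f}% used")
```

The result agrees with the average reported by recon, so recon was seeing the full storage while the Landscape dashboard showed only a third of it.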
This was all seen several hours after the deployment finished, almost half a day.
I then decided to restart landscape-client in the foreground, to see if there were any backtraces (that's the usual trick, because backtraces in the swift plugin are lost, see bug #1563565). To my surprise, the swift plugin started reporting data.
The monitor log below covers the period when reporting was broken and the period after my restart, where I first ran the client in the foreground and then in the background with a shorter reporting interval:
# grep Swift monitor.log
2016-06-01 22:08:46,449 INFO [MainThread] Registering plugin landscape.
2016-06-01 23:08:46,451 WARNING [MainThread] 1 of 720 expected Swift device usage snapshot events (0.14%) occurred in the last 3600.00s.
2016-06-02 00:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 01:08:46,451 WARNING [MainThread] 0 of 720 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 02:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 03:08:46,451 WARNING [MainThread] 0 of 720 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 04:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 05:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 06:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 07:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 08:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 09:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 10:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 11:08:46,450 WARNING [MainThread] 0 of 720 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 12:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 12:54:05,236 WARNING [MainThread] 0 of 543 expected Swift device usage snapshot events (0.00%) occurred in the last 2718.79s.
2016-06-02 12:57:02,272 INFO [MainThread] Registering plugin landscape.
2016-06-02 12:57:29,322 INFO [MainThread] 5 of 5 expected Swift device usage snapshot events (100.00%) occurred in the last 27.05s.
2016-06-02 12:58:10,375 INFO [MainThread] Registering plugin landscape.
2016-06-02 13:02:04,883 INFO [MainThread] 46 of 46 expected Swift device usage snapshot events (100.00%) occurred in the last 234.51s.
2016-06-02 13:02:07,217 INFO [MainThread] Registering plugin landscape.
2016-06-02 13:04:07,218 INFO [MainThread] 23 of 24 expected Swift device usage snapshot events (95.83%) occurred in the last 120.00s.
2016-06-02 13:06:07,218 INFO [MainThread] 24 of 23 expected Swift device usage snapshot events (104.35%) occurred in the last 120.00s.
And indeed, after I restarted the clients on the two broken units, it all worked as it should. You can see the jump in the graph in the attached screenshot.
It's not clear how to debug this should it happen in a live system again.
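One way to at least detect the condition on a live unit is to scan monitor.log for the summary lines shown in the excerpt above and flag windows where few or none of the expected snapshot events occurred. The following is a minimal sketch, not part of landscape-client: the function names and the 50% threshold are hypothetical, and the regular expression assumes the exact log format seen in this report. It only spots the symptom; it does not explain why the plugin stopped reporting.

```python
import re

# Matches the summary lines from the monitor.log excerpt in this report.
SUMMARY_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"(?P<level>\w+) +\[\w+\] "
    r"(?P<seen>\d+) of (?P<expected>\d+) expected Swift device usage "
    r"snapshot events \((?P<percent>[\d.]+)%\) occurred in the last "
    r"(?P<seconds>[\d.]+)s\.$"
)


def parse_summary(line):
    """Return the counts from one summary line, or None if it is not one."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "seen": int(match["seen"]),
        "expected": int(match["expected"]),
        "seconds": float(match["seconds"]),
    }


def stalled_windows(lines, threshold=0.5):
    """Return summaries where less than `threshold` of expected events occurred.

    The threshold is an arbitrary choice for illustration.
    """
    stalled = []
    for line in lines:
        summary = parse_summary(line)
        if summary and summary["expected"] > 0:
            if summary["seen"] / summary["expected"] < threshold:
                stalled.append(summary)
    return stalled
```

Feeding the lines of a unit's monitor.log to `stalled_windows` would return the hourly windows with 0 or 1 events from the excerpt above and skip the healthy windows after the restart.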
Related branches
- Free Ekanayaka (community): Approve
- 🤖 Landscape Builder: Approve (test results)
- Geoff Teale (community): Approve
Diff: 161 lines (+71/-13), 6 files modified
landscape/broker/client.py (+15/-3)
landscape/broker/tests/test_client.py (+23/-0)
landscape/lib/monitor.py (+6/-10)
landscape/lib/tests/test_monitor.py (+12/-0)
landscape/monitor/swiftusage.py (+3/-0)
landscape/monitor/tests/test_swiftusage.py (+12/-0)
tags: removed: kanban
Changed in landscape-client:
assignee: nobody → Simon Poirier (simpoir)
Changed in landscape-client:
status: New → In Progress
Changed in landscape-client:
status: In Progress → Fix Committed
Changed in landscape-client (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
status: New → In Progress
output of swift-recon --all