Data corruption with Ceph and gnocchi-metricd forces deleting the whole Ceph pool and losing all data
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gnocchi | Fix Released | High | Mehdi Abaakouk |
Bug Description
When gnocchi-metricd (on the master branch and on stable/1.2) is writing to Ceph and you kill it with:
# pkill -f gnocchi-metricd
metricd apparently leaves corrupted files on Ceph when it is killed. After a restart it never resumes processing and stalls, so you just see measures from gnocchi-api accumulating:
-----------------
2015-09-28 18:39:26.376 19169 INFO gnocchi.cli [-] Metricd reporting: 58 measurements bundles across 54 metrics wait to be processed.
2015-09-28 18:49:26.376 19169 INFO gnocchi.cli [-] Metricd reporting: 68 measurements bundles across 85 metrics wait to be processed.
2015-09-28 18:53:26.376 19169 INFO gnocchi.cli [-] Metricd reporting: 88 measurements bundles across 99 metrics wait to be processed.
-----------------
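The growing backlog in the excerpt above can be checked mechanically. The following is a minimal sketch, not part of Gnocchi: the function names `parse_backlog` and `backlog_only_grows` are made up for illustration, and the regular expression assumes the exact log format quoted above. A backlog that never shrinks is only a hint of a stalled metricd, since a slow but healthy worker could produce the same pattern over a short window.

```python
import re
from datetime import datetime

# Pattern for the "Metricd reporting" lines quoted in the bug report.
REPORT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ .*"
    r"Metricd reporting: (?P<bundles>\d+) measurements bundles "
    r"across (?P<metrics>\d+) metrics wait to be processed\."
)


def parse_backlog(lines):
    """Return a (timestamp, bundles, metrics) tuple per reporting line."""
    samples = []
    for line in lines:
        match = REPORT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            samples.append(
                (ts, int(match.group("bundles")), int(match.group("metrics")))
            )
    return samples


def backlog_only_grows(samples):
    """True if there are at least two samples and bundles never decrease."""
    counts = [bundles for _, bundles, _ in samples]
    return len(counts) >= 2 and all(b >= a for a, b in zip(counts, counts[1:]))


LOG_LINES = """\
2015-09-28 18:39:26.376 19169 INFO gnocchi.cli [-] Metricd reporting: 58 measurements bundles across 54 metrics wait to be processed.
2015-09-28 18:49:26.376 19169 INFO gnocchi.cli [-] Metricd reporting: 68 measurements bundles across 85 metrics wait to be processed.
2015-09-28 18:53:26.376 19169 INFO gnocchi.cli [-] Metricd reporting: 88 measurements bundles across 99 metrics wait to be processed.
""".splitlines()

samples = parse_backlog(LOG_LINES)
print(backlog_only_grows(samples))
```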
Debugging a little, we see that when metricd stalls, it keeps going in and out of this function (which reads a Ceph xattr):
https:/
It never gets out of this caller function:
https:/
and it never returns to the carbonara caller (which does return when things are working fine while processing measures) at:
https:/
This is critical because there is no way to repair the damage: nothing is logged (by metricd or by gnocchi-api) that identifies which file or files are corrupted, so you have to destroy the whole pool or delete all RADOS objects.
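To illustrate the missing diagnostic, here is a minimal sketch of a polling loop that gives up after a deadline and logs the name of the object it was waiting on. It is not Gnocchi's code and not the fix that was merged: `wait_until_ready`, `StuckObjectError`, and the `is_ready` callback are hypothetical names, and the callback stands in for whatever storage check the real driver performs.

```python
import logging
import time

LOG = logging.getLogger("metricd-sketch")


class StuckObjectError(RuntimeError):
    """Raised when an object never reaches the expected state."""


def wait_until_ready(object_name, is_ready, timeout=30.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    object_name is used only for logging and for the exception, so an
    operator can see which object blocked processing.
    Returns the number of attempts that were needed.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if is_ready():
            return attempts
        if clock() >= deadline:
            LOG.error("Gave up on object %r after %d attempts (%.1fs timeout)",
                      object_name, attempts, timeout)
            raise StuckObjectError(object_name)
        sleep(interval)


# Usage: the object name and the always-False check are placeholders.
try:
    wait_until_ready("hypothetical_object_name", lambda: False,
                     timeout=0.2, interval=0.05)
except StuckObjectError as exc:
    print("stuck on:", exc)
```

With a bound like this, a worker would fail loudly on one named object instead of spinning silently, which would narrow the cleanup to that object rather than the whole pool.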
gnocchi.conf example:
[DEFAULT]
debug = True
verbose = True
log_file = /var/log/
[api]
port = 8041
host = 0.0.0.0
workers = 2
[archive_policy]
[database]
[indexer]
url = mysql:/
[keystone_
signing_dir = /var/cache/gnocchi
auth_uri = http://
auth_url = http://
project_domain_id = default
project_name = service
project_name = admin
password = MYSUPERPASSWD
username = cloudadmin
auth_plugin = password
memcached_servers = memcache2:
memcache_
memcache_secret_key = LE9_s0kyh7Z_
[metricd]
[oslo_policy]
[statsd]
[storage]
driver = ceph
metric_
ceph_pool = gnocchi
ceph_username = gnocchi
ceph_keyring = /etc/ceph/
ceph_conffile = /etc/ceph/ceph.conf
file_basepath = /var/lib/gnocchi
file_basepath_tmp = ${file_
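The sample above sets `project_name` twice in the same section. As a small lint sketch, Python's standard `configparser` in strict mode rejects a repeated key. This is an assumption-laden illustration: Gnocchi reads its configuration through oslo.config, which may treat repeated keys differently, and the section name `auth_example` and the helper `find_duplicate_option` are made up because the real section name is truncated in the report.

```python
import configparser

# Hypothetical fragment: the section name "auth_example" is invented;
# only the duplicated project_name key mirrors the sample in the report.
SAMPLE = """\
[auth_example]
project_domain_id = default
project_name = service
project_name = admin
username = cloudadmin
"""


def find_duplicate_option(text):
    """Return (section, option) of the first duplicated key, or None."""
    parser = configparser.ConfigParser(strict=True)
    try:
        parser.read_string(text)
    except configparser.DuplicateOptionError as exc:
        return exc.section, exc.option
    return None


print(find_duplicate_option(SAMPLE))
```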
summary:
- data corruption with CEPH & gnocchi-metricd leaves to delete whole CEPH pool
+ data corruption with CEPH & gnocchi-metricd leades to delete whole CEPH pool and loose all data
Changed in gnocchi:
status: New → Triaged
importance: Undecided → High
Changed in gnocchi:
assignee: Mehdi Abaakouk (sileht) → Chris Dent (cdent)
Changed in gnocchi:
assignee: Chris Dent (cdent) → Mehdi Abaakouk (sileht)
Changed in gnocchi:
milestone: none → 1.3.0
status: Fix Committed → Fix Released
Fix proposed to branch: master
Review: https://review.openstack.org/232061