All alarm sanity cases failed on build 341 in the RHOSP13 setup.
While debugging one particular test case, test_vrouter_process_status_alarms, we saw an alarm being generated with the wrong host name.
When the vrouter agent was stopped on overcloud-novacompute-2, the alarm was seen for overcloud-novacompute-1.
{u'vrouter': [{u'name': u'overcloud-novacompute-1', u'value': {u'UVEAlarms': {u'alarms': [{u'severity': 2, u'alarm_rules': {u'or_list': [{u'and_list': [{u'condition': {u'operation': u'>=', u'operand1': u'VrouterStatsAgent.out_bps_ewm.*.sigma', u'variables': [u'VrouterStatsAgent.out_bps_ewm.__key'], u'operand2': {u'json_value': u'2'}}, u'match': [{u'json_operand1_value': u'3.0', u'json_variables': {u'VrouterStatsAgent.out_bps_ewm.__key': u'"enp4s0f1"'}}]}]}, {u'and_list': [{u'condition': {u'operation': u'>=', u'operand1': u'VrouterStatsAgent.in_bps_ewm.*.sigma', u'variables': [u'VrouterStatsAgent.in_bps_ewm.__key'], u'operand2': {u'json_value': u'2'}}, u'match': [{u'json_operand1_value': u'3.0', u'json_variables': {u'VrouterStatsAgent.in_bps_ewm.__key': u'"enp4s0f1"'}}]}]}]}, u'timestamp': 1541756354310490, u'ack': False, u'token': u'eyJ0aW1lc3RhbXAiOiAxNTQxNzU2MzU0MzEwNDkwLCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMS4wLjI2In0=', u'type': u'default-global-system-config:system-defined-phyif-bandwidth', u'description': u'Physical Bandwidth usage anomaly.'}], u'__T': 1541756354311573}}}]}
I tried stopping the agent manually on the other computes too, but the alarm is always seen with the name overcloud-novacompute-1.
Still debugging the issue; I will update the bug when I have more information.
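To make the host/alarm mismatch easier to spot, the alarm dump can be reduced to (host name, alarm type) pairs. The following is only a sketch: `summarize_alarms` is a hypothetical helper, not part of the test framework, and it assumes the response has the same nesting as the dump above. The sample data is a trimmed copy of that dump.

```python
def summarize_alarms(response, table="vrouter"):
    """Return (host name, alarm type) pairs from an alarms response dict.

    Assumes the nesting seen in the dump above:
    {table: [{"name": ..., "value": {"UVEAlarms": {"alarms": [...]}}}]}
    """
    pairs = []
    for entry in response.get(table, []):
        alarms = entry.get("value", {}).get("UVEAlarms", {}).get("alarms", [])
        for alarm in alarms:
            pairs.append((entry.get("name"), alarm.get("type")))
    return pairs


# Trimmed copy of the alarm dump above (only the fields used here).
sample = {
    "vrouter": [{
        "name": "overcloud-novacompute-1",
        "value": {"UVEAlarms": {"alarms": [{
            "severity": 2,
            "type": "default-global-system-config:system-defined-phyif-bandwidth",
            "description": "Physical Bandwidth usage anomaly.",
        }]}},
    }]
}

for host, alarm_type in summarize_alarms(sample):
    print(host, alarm_type)
```

For the dump above, the only pair is overcloud-novacompute-1 with a phyif-bandwidth alarm type, that is, a physical bandwidth anomaly rather than a process status alarm.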
Setup info:
(undercloud) [stack@queensa ~]$ openstack server list
| ID | Name | Status | Networks | Image | Flavor |
| 0127b6d9-0942-4047-9739-29d2cc9bf26f | overcloud-contrailcontroller-0 | ACTIVE | ctlplane= | overcloud-full | contrail-controller |
| 86879faf-293b-4be7-ae24-7bd268451be1 | overcloud-contrailcontroller-1 | ACTIVE | ctlplane= | overcloud-full | contrail-controller |
| 0e3e0f2e-8ef5-4ab8-8236-f58a0cae929b | overcloud-controller-1 | ACTIVE | ctlplane= | overcloud-full | control |
| 99509e79-4264-4e41-94bd-18252cb6428b | overcloud-controller-0 | ACTIVE | ctlplane= | overcloud-full | control |
| e847506d-2615-49b8-833e-432de751a5c0 | overcloud-novacompute-2 | ACTIVE | ctlplane= | overcloud-full | compute |
| 774b6e04-9704-4f33-b8c5-6c2a9ec0352e | overcloud-novacompute-0 | ACTIVE | ctlplane= | overcloud-full | compute |
| c0bda172-7039-47b8-92d0-03ad30be9103 | overcloud-controller-2 | ACTIVE | ctlplane= | overcloud-full | control |
| 59fc3a15-66d6-4e48-81ef-00e724ad2aed | overcloud-novacompute-1 | ACTIVE | ctlplane= | overcloud-full | compute |
| 79d3c0c0-305a-4e50-9502-ecde64276fd4 | overcloud-contrailcontroller-2 | ACTIVE | ctlplane= | overcloud-full | contrail-controller |
It looks like the alarm with the different host name was raised for some other reason. Now, when the agent is stopped manually, the process status UVE is updated with PROCESS_STATE_EXITED, which is expected, but no alarm is generated.
(Pdb) self.ops_inspect[collector_ip].dict_get('analytics/uves/vrouter/overcloud-novacompute-2?flat')
{u'NodeStatus': {u'build_info': u'{"build-info" : [{"build-version" : "5.0.2", "build-time" : "2018-11-07 07:55:22.713293", "build-user" : "zuul", "build-hostname" : "rhel-7-builder-juniper-contrail-ci-0000139243", "build-id" : "5.0-341.el7", "build-number" : "@contrail"}]}', u'installed_package_version': u'5.0-341.el7', u'deleted': False, u'disk_usage_info': {u'/dev/sda2': {u'partition_space_available_1k': 963361544, u'partition_space_used_1k': 13388732, u'percentage_partition_space_used': 1, u'partition_type': u'xfs'}}, u'__T': 1541761913816681, u'running_package_version': u'5.0-341.el7', u'process_mem_cpu_usage': {u'contrail-vrouter-nodemgr': {u'mem_res': 32808, u'cpu_share': 0.41, u'mem_virt': 60456}}, u'system_cpu_usage': {u'fifteen_min_avg': 0.29, u'node_type': u'vrouter', u'cpu_share': 0.02, u'five_min_avg': 0.24, u'one_min_avg': 0.18}, u'system_mem_usage': {u'used': 2806072, u'cached': 4112024, u'free': 190961068, u'node_type': u'vrouter', u'total': 197881792, u'buffers': 2628}, u'process_status': [{u'instance_id': u'0', u'module_id': u'contrail-vrouter-nodemgr', u'state': u'Functional', u'description': None, u'connection_infos': [{u'server_addrs': [u'10.1.0.16:8086'], u'status': u'Up', u'type': u'Collector', u'name': None, u'description': u'ClientInit to Established on EvSandeshCtrlMessageRecv'}]}], u'system_cpu_info': {u'num_cpu': 32, u'num_core_per_socket': 8, u'num_thread_per_core': 2, u'num_socket': 2}, u'process_info': [{u'process_name': u'contrail-vrouter-agent', u'start_count': 9, u'process_state': u'PROCESS_STATE_EXITED', u'last_stop_time': None, u'core_file_list': [], u'last_start_time': u'1541761626829708', u'stop_count': 0, u'last_exit_time': u'1541761652287532', u'exit_count': 9}, {u'process_name': u'contrail-vrouter-nodemgr', u'start_count': 1, u'process_state': u'PROCESS_STATE_RUNNING', u'last_stop_time': None, u'core_file_list': [], u'last_start_time': u'1541684145000000', u'stop_count': 0, u'last_exit_time': None, u'exit_count': 0}]}, u'ContrailConfig': {u'deleted': False, u'__T': 1541743958781663, u'elements': {u'fq_name': u'["default-global-system-config", "overcloud-novacompute-2"]', u'parent_uuid': u'"47fe663e-7f70-404b-adfb-00f579062afe"', u'virtual_router_dpdk_enabled': u'false', u'parent_type': u'"global-system-config"', u'uuid': u'"e5c7c7e9-4b21-4496-aa1c-e83f84b3aa97"', u'perms2': u'{"owner": "cloud-admin", "owner_access": 7, "global_access": 0, "share": []}', u'id_perms': u'{"enable": true, "description": null, "created": "2018-11-08T13:36:36.070703", "creator": null, "uuid": {"uuid_mslong": 16557422359852696726, "uuid_lslong": 12257927645302598295}, "user_visible": true, "last_modified": "2018-11-08T17:32:13.278946", "permissions": {"owner": "admin", "owner_access": 7, "other_access": 7, "group": "admin", "group_access": 7}}', u'display_name': u'"overcl...
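The symptom described here (a process in PROCESS_STATE_EXITED with no corresponding alarm) can be checked directly against a flat UVE dict. This is a rough sketch, not the test's actual verification code: all three helpers are hypothetical, the sample is a trimmed copy of the dump above, and two things are assumptions rather than facts taken from this report: that alarms would appear under a top-level `UVEAlarms` key of the flat UVE, and that the process status alarm type contains the fragment `process-status`.

```python
def exited_processes(uve):
    """Names of processes reported as PROCESS_STATE_EXITED in NodeStatus."""
    info = uve.get("NodeStatus", {}).get("process_info", [])
    return [p.get("process_name") for p in info
            if p.get("process_state") == "PROCESS_STATE_EXITED"]


def alarm_types(uve):
    """Alarm types in the UVE (assumes a top-level 'UVEAlarms' key)."""
    return [a.get("type") for a in uve.get("UVEAlarms", {}).get("alarms", [])]


def missing_process_alarm(uve, type_fragment="process-status"):
    """True if a process has exited but no alarm type contains type_fragment.

    The default type_fragment is an assumption, not taken from this report.
    """
    if not exited_processes(uve):
        return False
    return not any(type_fragment in (t or "") for t in alarm_types(uve))


# Trimmed copy of the flat UVE above (only the fields used here).
sample_uve = {
    "NodeStatus": {
        "process_info": [
            {"process_name": "contrail-vrouter-agent",
             "process_state": "PROCESS_STATE_EXITED"},
            {"process_name": "contrail-vrouter-nodemgr",
             "process_state": "PROCESS_STATE_RUNNING"},
        ]
    }
}

print(exited_processes(sample_uve))
print(missing_process_alarm(sample_uve))
```

With the trimmed sample, the agent is listed as exited and the check reports a missing alarm, which matches what is observed on overcloud-novacompute-2.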