Activity log for bug #1569159

Date Who What changed Old value New value Message
2016-04-12 05:20:09 kalagesan bug added bug
2016-04-12 05:20:09 kalagesan attachment added failed_tor-agent_after_deletedpng.png https://bugs.launchpad.net/bugs/1569159/+attachment/4633576/+files/failed_tor-agent_after_deletedpng.png
2016-04-12 05:21:25 kalagesan description

Old value:

Hello, I'm Koji Yoshida from CTC.
Please investigate the problem below.

Koji

################################
1. Problem Occurrence Date & Time
#################################

#################
2. Service Impact
#################
none

#################
3. Current Status
#################
Problem happened at
[ x ] LAB
[ ] Production Network

Current Status
[ ] Still Happening
[ x ] Recovered
[ ] Auto Recovery
[ ] Hardware Replacement
[ x ] CLI Operation
[ ] Other

######################
4. Problem Description
######################
Customer deleted a tor-agent by running delete_tor_agent_by_id on the build server. Even after the task completed, the deleted tor-agent remained on the Web GUI: Monitor > Infrastructure > Virtual Routers pane with the status "System Information unavailable,Configuration unavailable". Customer restarted the supervisor-vrouter service, and it was recovered.

In order to delete this, they had to restart one of the remaining services except contrail-vrouter-nodemgr on the TSN. This operation can impact the BUM traffic through the TSN. When they restarted contrail-vrouter-nodemgr, the status of every vrouter-agent or tor-agent became "Process States unavailable" and the situation got worse.

I am able to reproduce this issue in my lab 2.21 setup.

###################################
Troubleshooting & Recovery Steps
###################################
Currently, delete_tor_agent_by_index (called by delete_tor_agent_by_id) first stops and deletes the tor-agent service, then deletes the definitions of the physical router and virtual router through the Config API. I think this is the cause of the issue. If we first remove the definitions of the physical router and virtual router and then stop and delete the service, the deleted object doesn't remain on the Web GUI.

Here is a temporary fix for delete_tor_agent_by_index:
-----------------------------------------------------------------------------
@task
def delete_tor_agent_by_index(index, node_info, restart=True):
    '''Disable tor agent functionality in particular node.
        USAGE: fab delete_tor_agent_by_index:0,root@1.1.1.2
    '''
    i = int(index)
    host_string = node_info
    with settings(host_string=host_string):
        toragent_dict = getattr(env, 'tor_agent', None)
        if not host_string in toragent_dict:
            print 'tor-agent entry for %s does not exist in testbed file' \
                % (host_string)
            return
        if not i < len(toragent_dict[host_string]):
            print 'tor-agent entry for host %s and index %d does not exist in ' \
                'testbed file' % (host_string, i)
            return
        # Populate the argument to pass for setup-vnc-tor-agent
        tor_id = int(get_tor_agent_id(toragent_dict[host_string][i]))
        if tor_id == -1:
            return
        tor_name = toragent_dict[host_string][i]['tor_name']
        tor_vendor_name = toragent_dict[host_string][i]['tor_vendor_name']
        tgt_hostname = sudo("hostname")
        # Default agent name
        agent_name = tgt_hostname + '-' + str(tor_id)
        # If tor_agent_name is not specified or if its value is not
        # specified use default agent name
        tor_agent_name = ''
        if 'tor_agent_name' in toragent_dict[host_string][i]:
            tor_agent_name = toragent_dict[host_string][i]['tor_agent_name']
        if tor_agent_name != None:
            tor_agent_name = tor_agent_name.strip()
        if tor_agent_name == None or not tor_agent_name:
            tor_agent_name = agent_name

        cfgm_host = get_control_host_string(env.roledefs['cfgm'][0])
        cfgm_host_password = get_env_passwords(env.roledefs['cfgm'][0])
        cfgm_ip = get_contrail_internal_vip() or hstr_to_ip(cfgm_host)
        cfgm_user = env.roledefs['cfgm'][0].split('@')[0]
        cfgm_passwd = get_env_passwords(env.roledefs['cfgm'][0])
        compute_host = get_control_host_string(host_string)
        (tgt_ip, tgt_gw) = get_data_ip(host_string)
        compute_mgmt_ip = host_string.split('@')[1]
        compute_control_ip = hstr_to_ip(compute_host)
        admin_tenant_name = get_admin_tenant_name()
        admin_user, admin_password = get_authserver_credentials()
        authserver_ip = get_authserver_ip()
        prov_args = ("--host_name %s --host_ip %s --api_server_ip %s --oper del "
                     "--admin_user %s --admin_password %s --admin_tenant_name %s "
                     "--openstack_ip %s"
                     % (tor_agent_name, compute_control_ip, cfgm_ip, admin_user,
                        admin_password, admin_tenant_name, authserver_ip))
        pr_args = ("--device_name %s --vendor_name %s --api_server_ip %s "
                   "--oper del --admin_user %s --admin_password %s "
                   "--admin_tenant_name %s --openstack_ip %s"
                   % (tor_name, tor_vendor_name, cfgm_ip, admin_user,
                      admin_password, admin_tenant_name, authserver_ip))
        # Delete the physical router and virtual router definitions through
        # the Config API first, before the tor-agent service is stopped
        with settings(host_string=env.roledefs['cfgm'][0], password=cfgm_passwd):
            sudo("python /opt/contrail/utils/provision_physical_device.py %s" % (pr_args))
            sudo("python /opt/contrail/utils/provision_vrouter.py %s" % (prov_args))

        # Stop tor-agent process
        tor_process_name = 'contrail-tor-agent-' + str(tor_id)
        cmd = 'service ' + tor_process_name + ' stop'
        sudo(cmd)

        # Remove tor-agent config file
        tor_file_name = '/etc/contrail/' + tor_process_name + '.conf'
        if exists(tor_file_name, use_sudo=True):
            remove_file(tor_file_name)
        # Remove tor-agent INI file used by supervisord
        tor_ini_file_name = '/etc/contrail/supervisord_vrouter_files/' + tor_process_name + '.ini'
        if exists(tor_ini_file_name, use_sudo=True):
            remove_file(tor_ini_file_name)
        # Remove tor-agent init file
        tor_init_file = '/etc/init.d/' + tor_process_name
        if exists(tor_init_file, use_sudo=True):
            remove_file(tor_init_file)

        # If SSL files generated for tor-agent exist, remove them
        cert_file = "/etc/contrail/ssl/certs/tor." + str(tor_id) + ".cert.pem"
        privkey_file = "/etc/contrail/ssl/private/tor." + str(tor_id) + ".privkey.pem"
        if exists(cert_file, use_sudo=True):
            remove_file(cert_file)
        if exists(privkey_file, use_sudo=True):
            remove_file(privkey_file)
        if exists('/etc/contrail/ssl/certs/cacert.pem', use_sudo=True):
            remove_file('/etc/contrail/ssl/certs/cacert.pem')
        if restart:
            sudo("supervisorctl -c /etc/contrail/supervisord_vrouter.conf update")

Steps to Reproduce
1. Delete tor-agent with delete_tor_agent_by_index.
2. Check the status of tor-agent in Monitor pane of Virtual Routers on Web GUI.

New value:

The same description from "Customer deleted a tor-agent ..." through "Steps to Reproduce", without the greeting and the headings of sections 1 to 4, and with "=>" added before "Steps to Reproduce".
2016-04-12 05:24:54 kalagesan bug added subscriber Koji Yoshida
2016-04-12 13:05:53 Koji Yoshida information type Proprietary Public
2016-04-19 09:40:26 Rahul tags analytics
2016-04-19 09:40:45 Rahul juniperopenstack: assignee Raj Reddy (rajreddy)
2016-04-27 21:27:26 Raj Reddy juniperopenstack: assignee Raj Reddy (rajreddy) Hari Prasad Killi (haripk)
2016-04-27 21:27:41 Raj Reddy tags analytics analytics vrouter
2016-05-12 10:45:33 Hari Prasad Killi juniperopenstack: assignee Hari Prasad Killi (haripk) RAVI KIRAN (ravibk)
2016-05-16 17:54:20 OpenContrail Admin nominated for series juniperopenstack/r2.21.x
2016-05-16 17:54:20 OpenContrail Admin bug task added juniperopenstack/r2.21.x
2016-05-24 00:21:16 Raj Reddy nominated for series juniperopenstack/r2.20
2016-05-24 00:21:16 Raj Reddy bug task added juniperopenstack/r2.20
2016-05-24 00:21:16 Raj Reddy nominated for series juniperopenstack/r3.0
2016-05-24 00:21:16 Raj Reddy bug task added juniperopenstack/r3.0
2016-05-24 00:21:16 Raj Reddy nominated for series juniperopenstack/r2.22.x
2016-05-24 00:21:16 Raj Reddy bug task added juniperopenstack/r2.22.x
2016-05-24 00:21:42 Raj Reddy juniperopenstack/r2.20: assignee RAVI KIRAN (ravibk)
2016-05-24 00:21:49 Raj Reddy juniperopenstack/r2.22.x: assignee RAVI KIRAN (ravibk)
2016-05-24 00:21:56 Raj Reddy juniperopenstack/r3.0: assignee RAVI KIRAN (ravibk)
2016-05-24 00:48:45 Raj Reddy tags analytics vrouter vrouter
2016-05-25 07:04:07 OpenContrail Admin juniperopenstack/r2.21.x: status In Progress Fix Committed
2016-05-25 07:04:08 OpenContrail Admin juniperopenstack/r2.21.x: milestone r2.21.2
2016-05-25 08:09:27 OpenContrail Admin nominated for series juniperopenstack/trunk
2016-05-25 08:09:27 OpenContrail Admin bug task added juniperopenstack/trunk
2016-05-25 17:33:17 OpenContrail Admin juniperopenstack/r3.0: status New In Progress
2016-05-26 04:15:37 OpenContrail Admin juniperopenstack/r2.22.x: status New In Progress
2016-05-28 10:11:58 OpenContrail Admin juniperopenstack/r3.0: status In Progress Fix Committed
2016-05-28 10:11:59 OpenContrail Admin juniperopenstack/r3.0: milestone r3.0.2.0
2016-05-29 18:30:44 OpenContrail Admin juniperopenstack/r2.20: status New In Progress
2016-05-30 04:41:16 OpenContrail Admin juniperopenstack/r2.20: status In Progress Fix Committed
2016-05-30 04:41:17 OpenContrail Admin juniperopenstack/r2.20: milestone r2.23
2016-05-30 13:43:37 OpenContrail Admin juniperopenstack/trunk: status In Progress Fix Committed
2016-05-30 13:43:38 OpenContrail Admin juniperopenstack/trunk: milestone r3.1.0.0-fcs
2016-05-31 04:29:21 OpenContrail Admin juniperopenstack/r2.22.x: status In Progress Fix Committed
2016-05-31 04:29:22 OpenContrail Admin juniperopenstack/r2.22.x: milestone r2.22.3