2016-04-12 05:21:25 |
kalagesan |
description |
Hello,
I'm Koji Yoshida from CTC.
Please investigate the problem below.
a
Koji
################################
1. Problem Occurrence Date & Time
#################################
#################
2. Service Impact
#################
none
#################
3. Current Status
#################
Problem happened at
[ x ] LAB
[ ] Production Network
Current Status
[ ] Still Happening
[ x ] Recovered
[ ] Auto Recovery
[ ] Hardware Replacement
[ x ] CLI Operation
[ ] Other
######################
4. Problem Description
######################
The customer deleted a tor-agent by running delete_tor_agent_by_id on the build server.
Even after the task completed, the deleted tor-agent remained on the Web GUI in the
Monitor > Infrastructure > Virtual Routers pane with the status "System
Information unavailable, Configuration unavailable".
The customer restarted the supervisor-vrouter service, and the stale entry was cleared.
To remove the entry, they had to restart one of the remaining services other than
contrail-vrouter-nodemgr on the TSN. This operation can impact the BUM traffic
through the TSN.
When they restarted contrail-vrouter-nodemgr instead, the status of every
vrouter-agent and tor-agent became "Process States unavailable" and the situation
got worse.
I am able to reproduce this issue in my lab 2.21 setup.
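The symptom can also be checked outside the Web GUI by comparing two lists of names: the virtual routers that are still configured, and the virtual routers that are still being reported to monitoring. The sketch below only does the comparison. How the two lists are collected is not covered here, and the function name and the example names (compute1, tsn1-1, tsn1-2) are hypothetical, not taken from the product.

```python
def find_stale_vrouters(configured_names, reported_names):
    """Return the names that are still reported but no longer configured.

    Both arguments are plain iterables of strings. Collecting them from a
    running system is outside the scope of this sketch (assumption).
    """
    configured = set(configured_names)
    # A name is "stale" when monitoring still reports it after its
    # configuration object has been deleted.
    return sorted(name for name in set(reported_names) if name not in configured)


if __name__ == "__main__":
    configured = ["compute1", "tsn1-1"]          # hypothetical names
    reported = ["compute1", "tsn1-1", "tsn1-2"]  # tsn1-2 was deleted
    print(find_stale_vrouters(configured, reported))
```

An empty result after the delete task would suggest that no stale entry is left behind.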
###################################
Troubleshooting & Recovery Steps
###################################
Currently, delete_tor_agent_by_index (called by delete_tor_agent_by_id)
first stops and deletes the tor-agent service, then deletes
the physical router and virtual router definitions through the Config API.
I think this ordering is the cause of the issue.
If we first remove the physical router and virtual router definitions
and then stop and delete the service, the deleted object does not remain
on the Web GUI.
Here is a temporary fix for delete_tor_agent_by_index:
-----------------------------------------------------------------------------
@task
def delete_tor_agent_by_index(index, node_info, restart=True):
    '''Disable tor agent functionality in particular node.
    USAGE: fab delete_tor_agent_by_index:0,root@1.1.1.2
    '''
    i = int(index)
    host_string = node_info
    with settings(host_string=host_string):
        toragent_dict = getattr(env,'tor_agent', None)
        if not host_string in toragent_dict:
            print 'tor-agent entry for %s does not exist in testbed file' \
                %(host_string)
            return
        if not i < len(toragent_dict[host_string]):
            print 'tor-agent entry for host %s and index %d does not exist in '\
                'testbed file' %(host_string, i)
            return
        # Populate the argument to pass for setup-vnc-tor-agent
        tor_id = int(get_tor_agent_id(toragent_dict[host_string][i]))
        if tor_id == -1:
            return
        tor_name= toragent_dict[host_string][i]['tor_name']
        tor_vendor_name= toragent_dict[host_string][i]['tor_vendor_name']
        tgt_hostname = sudo("hostname")
        # Default agent name
        agent_name = tgt_hostname + '-' + str(tor_id)
        # If tor_agent_name is not specified or if its value is not
        # specified use default agent name
        tor_agent_name = ''
        if 'tor_agent_name' in toragent_dict[host_string][i]:
            tor_agent_name = toragent_dict[host_string][i]['tor_agent_name']
        if tor_agent_name != None:
            tor_agent_name = tor_agent_name.strip()
        if tor_agent_name == None or not tor_agent_name:
            tor_agent_name = agent_name
        cfgm_host = get_control_host_string(env.roledefs['cfgm'][0])
        cfgm_host_password = get_env_passwords(env.roledefs['cfgm'][0])
        cfgm_ip = get_contrail_internal_vip() or hstr_to_ip(cfgm_host)
        cfgm_user = env.roledefs['cfgm'][0].split('@')[0]
        cfgm_passwd = get_env_passwords(env.roledefs['cfgm'][0])
        compute_host = get_control_host_string(host_string)
        (tgt_ip, tgt_gw) = get_data_ip(host_string)
        compute_mgmt_ip= host_string.split('@')[1]
        compute_control_ip= hstr_to_ip(compute_host)
        admin_tenant_name = get_admin_tenant_name()
        admin_user, admin_password = get_authserver_credentials()
        authserver_ip = get_authserver_ip()
        prov_args = "--host_name %s --host_ip %s --api_server_ip %s --oper del " \
                    "--admin_user %s --admin_password %s --admin_tenant_name %s\
                    --openstack_ip %s" \
                    %(tor_agent_name, compute_control_ip, cfgm_ip, admin_user,
                      admin_password, admin_tenant_name, authserver_ip)
        pr_args = "--device_name %s --vendor_name %s --api_server_ip %s\
                  --oper del --admin_user %s --admin_password %s\
                  --admin_tenant_name %s --openstack_ip %s"\
                  %(tor_name, tor_vendor_name, cfgm_ip, admin_user, admin_password,
                    admin_tenant_name, authserver_ip)
        with settings(host_string=env.roledefs['cfgm'][0], password=cfgm_passwd):
            sudo("python /opt/contrail/utils/provision_physical_device.py %s" %(pr_args))
            sudo("python /opt/contrail/utils/provision_vrouter.py %s" %(prov_args))
        # Stop tor-agent process
        tor_process_name = 'contrail-tor-agent-' + str(tor_id)
        cmd = 'service ' + tor_process_name + ' stop'
        sudo(cmd)
        # Remove tor-agent config file
        tor_file_name = '/etc/contrail/' + tor_process_name + '.conf'
        if exists(tor_file_name, use_sudo=True):
            remove_file(tor_file_name)
        # Remove tor-agent INI file used by supervisord
        tor_ini_file_name = '/etc/contrail/supervisord_vrouter_files/' + tor_process_name + '.ini'
        if exists(tor_ini_file_name, use_sudo=True):
            remove_file(tor_ini_file_name)
        # Remove tor-agent init file
        tor_init_file = '/etc/init.d/' + tor_process_name
        if exists(tor_init_file, use_sudo=True):
            remove_file(tor_init_file)
        # If SSL files generated for tor-agent exists, remove them
        cert_file = "/etc/contrail/ssl/certs/tor." + str(tor_id) + ".cert.pem"
        privkey_file = "/etc/contrail/ssl/private/tor." + str(tor_id) + ".privkey.pem"
        if exists(cert_file, use_sudo=True):
            remove_file(cert_file)
        if exists(privkey_file, use_sudo=True):
            remove_file(privkey_file)
        if exists('/etc/contrail/ssl/certs/cacert.pem', use_sudo=True):
            remove_file('/etc/contrail/ssl/certs/cacert.pem')
        if restart:
            sudo("supervisorctl -c /etc/contrail/supervisord_vrouter.conf update")
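
The only behavioral change in the fix above is the order of the steps: the configuration objects are deleted first, and the local service is stopped and cleaned up afterwards. The following is a minimal sketch of that ordering with the real work replaced by placeholder callables. The names run_teardown and proposed_order are hypothetical and are not part of the Fabric utilities.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def proposed_order(delete_config: Callable[[], None],
                   stop_service: Callable[[], None],
                   remove_files: Callable[[], None],
                   reload_supervisor: Callable[[], None]) -> List[Step]:
    """Return the teardown steps in the order used by the temporary fix."""
    return [
        # 1. Remove the physical router and virtual router definitions first.
        ("delete config objects", delete_config),
        # 2. Only then stop the tor-agent service on the TSN.
        ("stop tor-agent service", stop_service),
        # 3. Remove the conf, INI, init and SSL files.
        ("remove tor-agent files", remove_files),
        # 4. Let supervisord pick up the removed program definition.
        ("reload supervisor", reload_supervisor),
    ]


def run_teardown(steps: List[Step]) -> List[str]:
    """Run each step in list order and return the names of the steps run."""
    completed = []
    for name, action in steps:
        action()
        completed.append(name)
    return completed


if __name__ == "__main__":
    noop = lambda: None  # placeholder for the real Fabric calls
    for name in run_teardown(proposed_order(noop, noop, noop, noop)):
        print(name)
```

In the original task the second and first entries are effectively swapped, which is the ordering suspected of leaving the stale entry behind.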
##################
Steps to Reproduce
##################
1. Delete a tor-agent with delete_tor_agent_by_index.
2. Check the status of the tor-agent in the Monitor > Infrastructure > Virtual Routers pane of the Web GUI. |
|