Our contrail-api node experiences a problem when trying to marshall a sandesh message into serialized form.
We run daily builds, so this build we currently run is from 2016-02-02 01:47 CET (some 15 hours old by now).
See this log / debug excerpt:
Feb 02 16:22:49 contrail-dev-config1.srv.lab.cloud.ipnett.se contrail-api[14463]: 02/02/2016 04:22:49 PM [TXMLProtocol]: TXML Protocol: Invalid u32 value 0.02490234375
Feb 02 16:22:49 contrail-dev-config1.srv.lab.cloud.ipnett.se contrail-api[14463]: ERROR:TXMLProtocol:TXML Protocol: Invalid u32 value 0.02490234375
Feb 02 16:22:49 contrail-dev-config1.srv.lab.cloud.ipnett.se contrail-api[14463]: ERROR:contrail-api:Failed to send sandesh
Feb 02 16:22:49 contrail-dev-config1.srv.lab.cloud.ipnett.se contrail-api[14463]: ERROR:contrail-api:PFORMAT: ModuleCpuStateTrace(_context='', _scope='', _category='', _send_queue_enabled=True, _seqnum=1, _versionsig=4054510910, _s
ource='contrail-dev-config1.srv.lab.cloud.ipnett.se', _instance_id='0', _client=None, _type=6, _hints=1, _http_server=None, _logger=None, _more=False, _node_type='Config', data=ModuleCpuState(module_cpu_info=[ModuleCpuInfo(instance
_id='0', module_id='contrail-api', cpu_info=CpuLoadInfo(sys_mem_info=SysMemInfo(total=1884124L, buffers=24956L, free=0.02490234375, used=1404096L), meminfo=MemInfo(virt=677196, peakvirt=677196, res=71532), cpu_share=10.0, num_cpu=1
, cpuload=CpuLoadAvg(fifteen_min_avg=0.08, five_min_avg=0.12, one_min_avg=0.14)))], name='contrail-dev-config1.srv.lab.cloud.ipnett.se', deleted=None, config_node_ip=['10.19.30.116'], _table='ObjectConfigNode', build_info='{"build-
info" : [{"build-version" : "3.0", "build-time" : "2016-02-02 02:22:41.454097", "build-user" : "jenkins-remote", "build-hostname" : "jenkinsx86-64-test-rpmcentos7.srv.lab.cloud.ipnett.se", "build-git-ver" : "6caa96c", "build-id" :
"3.0-1602020147.el7", "build-number" : "@ipnett"}]}'), _module='contrail-api', _level=2147483647, _timestamp=1454426569455028, _client_context='', _connect_to_collector=True, _role=0)
Feb 02 16:22:49 contrail-dev-config1.srv.lab.cloud.ipnett.se contrail-api[14463]: ERROR:contrail-api:sandesh type: : ModuleCpuStateTrace
We traced the error itself to https://github.com/Juniper/contrail-sandesh/blob/master/library/python/pysandesh/protocol/TXMLProtocol.py#L255
It happens at https://github.com/Juniper/contrail-sandesh/blob/master/library/python/pysandesh/sandesh_session.py#L177
So, as far as we underestand, this ModuleCpueStateTrace , held in /usr/lib/python2.7/site-packages/cfgm_common/uve/cfgm_cpuinfo/ttypes.py , distributed by the package "python-contrail", is actually generated:
anticimex@galileo ~/code/cloud-services/opencontrail $ grep -r ttypes.py .
./contrail-sandesh/compiler/generate/t_py_generator.cc: string f_types_name = package_dir_+"/"+"ttypes.py";
The error can be said to be that "someone put a float where there must be an int", but shouldn't the marshaller code actually be able to detect whether it's float or int, since python allows you to do that detection, and use the appropriate format when marshalling?
I.e.
>>> type(5).__name__
'int'
>>> type(0.2).__name__
'float'
>>>
Anyway, looking forward to finding a solution. The bug is severe, it appears this prevents nodes from reporting in that they are alive in some sense for contrail; though discovery pub/sub is working unaffected.
More information, and patch submitted.
The problem is that numeric array references to python-psutil's phymem_usage() are used at /github. com/Juniper/ contrail- controller/ blob/8cc4abe1d8 422ce515673a79b c614b27c4fb088e /src/config/ common/ vnc_cpu_ info.py# L132
https:/
We have currently python- psutil- 2.2.1-el7 installed, and it looks like this: phymem_ usage() 1929342976L, available= 1342205952L, percent=30.4, used=1854337024L, free=75005952L, active=851570688, inactive=844365824, buffers=85241856L, cached=1181958144) usage[2] /1024
>>> import psutil
>>> phymem_usage = psutil.
>>> phymem_usage
svmem(total=
>>> phymem_
0.0296875
Now, this is slightly tricky since psutil is an external library.
First of all, one can note that since commit https:/ /github. com/giampaolo/ psutil/ commit/ 13689e21aae3cef 779b54a57bc985c ab36fdaa9e which was between psutil 0.5.1 and 0.6.0, phymem_usage() has been deprecated in favour of the virtual_memory() call, which provides the data as named tuples, i.e.: virtual_ memory( )
>>> vmem = psutil.
>>> dir(vmem)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_asdict', '_fields', '_make', '_replace', 'active', 'available', 'buffers', 'cached', 'count', 'free', 'inactive', 'index', 'percent', 'total', 'used']
>>> vmem.available
1341956096L
>>> vmem.free
109649920L
Three files use this psutil SysMemInfo method in contrail- controller: cloud-services/ external/ opencontrail/ contrail- controller [bug] $ grep -lr SysMemInfo --include='*.py' . agent/uve/ cpuinfo. py cpuinfo. py common/ vnc_cpu_ info.py
anticimex@galileo ~/code/
./src/vnsw/
./src/opserver/
./src/config/
The proposed fix is two-fold:
1) impose minimum version requirement of python-psutil to 0.6.0 around the board,
2) use the new and less imprecise python-psutil method virtual_memory(), which has been stable since version 0.6.0, released 2012-07-24.
A proposed fix for contrail-packages comes in a separate bugreport.