getting too many dataframes breaks the socket

Bug #1607592 reported by Jeremy Liu
This bug affects 3 people
Affects: cloudkitty
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When I go to the "Reporting" page, it keeps hanging. After some time, an error occurs with "504 Gateway Time-out".
Meanwhile, cloudkitty-api logs the following:

172.16.6.121 - - [28/Jul/2016 22:46:59] "GET /v1/storage/dataframes?tenant_id=8211ef43d7e74f32b43542e1e701afc5&begin=2016-07-01T00%3A00%3A00&end=2016-07-31T23%3A59%3A59 HTTP/1.1" 200 58201348
Traceback (most recent call last):
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 86, in run
    self.finish_response()
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 128, in finish_response
    self.write(data)
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 212, in write
    self.send_headers()
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 270, in send_headers
    self.send_preamble()
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 191, in send_preamble
    self._write('HTTP/%s %s\r\n' % (self.http_version,self.status))
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 391, in _write
    self.stdout.write(data)
  File "/usr/lib64/python2.7/socket.py", line 324, in write
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 104] Connection reset by peer
172.16.6.121 - - [28/Jul/2016 22:46:59] "GET /v1/storage/dataframes?tenant_id=8211ef43d7e74f32b43542e1e701afc5&begin=2016-07-01T00%3A00%3A00&end=2016-07-31T23%3A59%3A59 HTTP/1.1" 500 59
----------------------------------------
Exception happened during processing of request from ('172.16.6.121', 49895)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/lib64/python2.7/SocketServer.py", line 710, in finish
    self.wfile.close()
  File "/usr/lib64/python2.7/socket.py", line 279, in close
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

If I switch to another tenant and open the "Reporting" page, everything works fine.

version info: stable/mitaka

Revision history for this message
Stéphane Albert (sheeprine) wrote :

Hi,

I guess the first block is from the API and the second from Horizon.
Can you try to do the same request using the CLI client?
It might be Apache serving Horizon that is explicitly killing the request because it thinks it timed out.
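
For reference, here is a quick way to replay the same request directly against cloudkitty-api, bypassing Apache/Horizon entirely. This is only a sketch: the host, the port (8889 is the default cloudkitty-api port) and the token variable are assumptions; the path and query parameters are copied from the log above.

import requests

resp = requests.get(
    "http://controller:8889/v1/storage/dataframes",
    params={
        "tenant_id": "8211ef43d7e74f32b43542e1e701afc5",
        "begin": "2016-07-01T00:00:00",
        "end": "2016-07-31T23:59:59",
    },
    headers={"X-Auth-Token": token},  # a valid Keystone token
    timeout=600,  # give the API far more time than the gateway does
)
print(resp.status_code, len(resp.content))

If this call succeeds while Horizon still shows 504, the proxy timeout is the culprit.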

Thanks.

Changed in cloudkitty:
status: New → Incomplete
Revision history for this message
Jeremy Liu (liujiong) wrote :

Well, I tried with the CLI client and got the same error.
cloudkitty storage-dataframe-list --begin "2016-07-01T00:00" --end "2016-07-31T23:59:59" --tenant 8211ef43d7e74f32b43542e1e701afc5
It returned with "Gateway Timeout (HTTP 504)".
And cloudkitty-api logged the following:

172.16.6.121 - - [29/Jul/2016 05:19:36] "GET /v1/storage/dataframes?tenant_id=8211ef43d7e74f32b43542e1e701afc5&begin=2016-07-01T00%3A00&end=2016-07-31T23%3A59%3A59 HTTP/1.1" 200 59413588
Traceback (most recent call last):
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 86, in run
    self.finish_response()
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 128, in finish_response
    self.write(data)
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 212, in write
    self.send_headers()
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 270, in send_headers
    self.send_preamble()
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 191, in send_preamble
    self._write('HTTP/%s %s\r\n' % (self.http_version,self.status))
  File "/usr/lib64/python2.7/wsgiref/handlers.py", line 391, in _write
    self.stdout.write(data)
  File "/usr/lib64/python2.7/socket.py", line 324, in write
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 104] Connection reset by peer
172.16.6.121 - - [29/Jul/2016 05:19:36] "GET /v1/storage/dataframes?tenant_id=8211ef43d7e74f32b43542e1e701afc5&begin=2016-07-01T00%3A00&end=2016-07-31T23%3A59%3A59 HTTP/1.1" 500 59
----------------------------------------
Exception happened during processing of request from ('172.16.6.121', 50422)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/lib64/python2.7/SocketServer.py", line 710, in finish
    self.wfile.close()
  File "/usr/lib64/python2.7/socket.py", line 279, in close
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

Revision history for this message
Jeremy Liu (liujiong) wrote :

I can retrieve the dataframes of another tenant properly using the CLI client.

Revision history for this message
Stéphane Albert (sheeprine) wrote :

This seems to be related to wsgiref triggering a timeout. You can work around it by moving from wsgiref to a proper WSGI server (Apache + mod_wsgi).
It might be fixable in wsgiref, but wsgiref is not supposed to be used in production, and that is what the TC decided.

Revision history for this message
Jeremy Liu (liujiong) wrote :

OK, thanks.

Revision history for this message
Stéphane Albert (sheeprine) wrote :

serve_forever() seems to ignore the timeout attribute of the server object; besides, it is actually set to None, which means no timeout. The issue seems to be at a really low level, like syscalls or the network stack. I'll try to have a look later, but the proper fix is to use a proper WSGI server.
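
To illustrate the failure mode itself rather than CloudKitty: below is a minimal, self-contained sketch (host, port and payload size are made up) of wsgiref writing a large response to a client that has already given up, the way a gateway returning 504 does. On the Python 2.7 stack from this report the server side then prints the same "Broken pipe" traceback as above; newer Python versions may swallow the error silently.

# Repro sketch, not CloudKitty code.
import socket
import threading
import time
from wsgiref.simple_server import make_server

def app(environ, start_response):
    time.sleep(2)  # stand-in for building ~58 MB of dataframes
    body = b"x" * (50 * 1024 * 1024)  # large payload, same order of magnitude as the bug
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

server = make_server("127.0.0.1", 8999, app)
print(server.timeout)  # None: nothing ever times the request socket out

t = threading.Thread(target=server.serve_forever)
t.daemon = True
t.start()

# Impatient client: sends the request, then hangs up without reading the body.
client = socket.create_connection(("127.0.0.1", 8999))
client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
client.close()

time.sleep(5)  # the server thread now fails while writing to the closed socket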

Jeremy Liu (liujiong)
Changed in cloudkitty:
status: Incomplete → Confirmed
Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hello,

We have the same problem, as described below. CloudKitty version: 6.0.0-0ubuntu0ol3, installed and configured following the documentation: https://docs.openstack.org/cloudkitty/pike/configuration/index.html

However, when we request /v1/storage/dataframes, we receive Gateway Timeout (HTTP 504). We configured the API as an Apache WSGI application, as was proposed previously.

So, the code below causes the hang:

>>> from cloudkittyclient import client as ck_client
>>> begin = "2018-01-01T00:00:00"
>>> end = "2018-01-15T00:00:00"
>>> client = ck_client.Client('1', 'http://controller:8889', token=token, insecure=True)
>>> data = client.storage.dataframes.list(begin=begin, end=end, tenant_id=tenant)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/cloudkittyclient/apiclient/base.py", line 358, in list
    self.collection_key)
  File "/usr/local/lib/python2.7/dist-packages/cloudkittyclient/apiclient/base.py", line 131, in _list
    body = self.client.get(url).json()
  File "/usr/local/lib/python2.7/dist-packages/cloudkittyclient/apiclient/client.py", line 359, in get
    return self.client_request("GET", url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cloudkittyclient/apiclient/client.py", line 349, in client_request
    self, method, url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cloudkittyclient/apiclient/client.py", line 265, in client_request
    method, self.concat_url(endpoint, url), **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cloudkittyclient/apiclient/client.py", line 205, in request
    raise exceptions.from_response(resp, method, url)
cloudkittyclient.apiclient.exceptions.GatewayTimeout: Gateway Timeout (HTTP 504)
>>>

We've increased the Apache time limits several times; collecting the data between 2018-01-01 and 2018-01-15 takes 3m14s, which is too slow.
And this happens when we have only 4 VMs within the tenant. So we're actually only able to collect data via the API (curl) for the last several days.
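
For now the only thing that seems feasible is fetching smaller windows. Below is a rough sketch of that idea, reusing the same storage.dataframes.list() call from the traceback above; the helper name, the one-day chunk size and the date format are assumptions, not tested code.

import datetime

def iter_dataframes(client, tenant, begin, end, step=datetime.timedelta(days=1)):
    """Yield dataframes for [begin, end) in small windows to stay under the gateway timeout."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = datetime.datetime.strptime(begin, fmt)
    stop = datetime.datetime.strptime(end, fmt)
    while start < stop:
        chunk_end = min(start + step, stop)
        # Same call as above, just over a one-day window instead of two weeks.
        for frame in client.storage.dataframes.list(begin=start.strftime(fmt),
                                                    end=chunk_end.strftime(fmt),
                                                    tenant_id=tenant):
            yield frame
        start = chunk_end

# data = list(iter_dataframes(client, tenant, "2018-01-01T00:00:00", "2018-01-15T00:00:00"))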

Do you have any advice on how to speed up this process, or on what may be causing it? It results in a non-working Reporting Horizon plugin, while rating itself works.

Revision history for this message
zhangguoqing (474751729-o) wrote :

Hi Dmitriy R. You can try changing the storage backend to gnocchi, if you have installed it for metering.

[storage]
backend = sqlalchemy --> gnocchi or gnocchihybrid

[storage_gnocchi]
auth_section = ks_auth

FYI. https://docs.openstack.org/cloudkitty/latest/configuration/configuration.html#for-keystone-identity-api-v3

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hi,

Yep, exactly this has been set in my cloudkitty configuration:

[storage]

#
# From cloudkitty.common.config
#

# Name of the storage backend driver. (string value)
backend = gnocchihybrid

[storage_gnocchi]

#
# From cloudkitty.common.config
#

# Gnocchi storage archive policy definition. (string value)
auth_section = ks_auth

Revision history for this message
bhujay kumar (bhatta) wrote :

Is there any solution to this problem? I have the same issue as Dmitriy R. When I change the storage backend to gnocchi, the error changes to:

ERROR wsme.api [req-220e7a71-a521-42a7-87bf-f75356751bfe 4a3c972a86564a4784bf4f0ba77b4b49 373b270dc9a649d0b36671bdeec1e7e9 default - -] Server-side error: "list index out of range". Detail:
Traceback (most recent call last):

  File "/openstack/venvs/cloudkitty-16.0.4/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
    result = f(self, *args, **kwargs)

  File "/openstack/venvs/cloudkitty-16.0.4/lib/python2.7/site-packages/cloudkitty/api/v1/controllers/storage.py", line 66, in ge t_all
    res_type=resource_type)

  File "/openstack/venvs/cloudkitty-16.0.4/lib/python2.7/site-packages/cloudkitty/storage/gnocchi/__init__.py", line 417, in get _time_frame
    measure=measure))

  File "/openstack/venvs/cloudkitty-16.0.4/lib/python2.7/site-packages/cloudkitty/storage/gnocchi/__init__.py", line 353, in _to _cloudkitty
    resource_data = resource_data[0]

IndexError: list index out of range
: IndexError: list index out of range
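
The traceback shows _to_cloudkitty doing resource_data = resource_data[0] on an empty list, i.e. Gnocchi returned no resource for that measure. Purely as an illustration (this is not the upstream fix, and the names are assumptions), a guard of this kind would skip such measures instead of crashing:

def first_resource_or_none(resource_data, measure_id=None):
    """Return the first Gnocchi resource from a query result, or None if there is none."""
    if not resource_data:
        # The resource was deleted or never indexed in Gnocchi: skip it
        # instead of letting IndexError abort the whole dataframes request.
        print("No Gnocchi resource found for measure %s, skipping" % measure_id)
        return None
    return resource_data[0]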

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Agreed, the gnocchi backend simply doesn't work.
