Controller loses connection to Elasticsearch/Kibana

Bug #1488717 reported by Ryan
This bug affects 1 person
Affects       Status        Importance  Assigned to                 Milestone
Fuel Plugins  Fix Released  High        LMA-Toolchain Fuel Plugins
StackLight    Fix Released  High        Simon Pasquier
0.8           Fix Released  Undecided   Unassigned

Bug Description

When deploying a cluster with an Elasticsearch/Kibana node, I notice that after about 24-36 hours the primary controller falls off the Kibana dashboard and no messages from the node are received at all. The dashboard then shows only the remaining 2 controllers.

The Elasticsearch/Kibana node has 32GB of memory, with 13GB allocated to the JVM (roughly 40% of total system memory).

Primary Controller
#/var/log/lma_collector.log

2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
2015/08/25 23:27:48 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
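
To gauge how fast the error repeats, the log excerpt above can be tallied per second and per plugin. The following is only a minimal sketch, assuming every line follows the format shown in the excerpt; the function name `count_queue_full` is hypothetical and not part of the LMA collector or Heka.

```python
import re
from collections import Counter

# Matches lines in the format shown in the excerpt above, for example:
# 2015/08/25 23:27:47 Plugin 'elasticsearch_output' error: can't deliver matched message: Queue is full
QUEUE_FULL = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Plugin '([^']+)' error: .*Queue is full"
)


def count_queue_full(lines):
    """Count 'Queue is full' errors, keyed by (timestamp, plugin name).

    Hypothetical helper for illustration; lines that do not match the
    assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = QUEUE_FULL.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open handle on /var/log/lma_collector.log to `count_queue_full` and sorting the result shows whether the error rate is steady or growing.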

The primary controller has 40GB of free memory.

Thanks, Ryan

Tags: lma
Changed in fuel-plugins:
assignee: nobody → LMA-Toolchain Fuel Plugins (mos-lma-toolchain)
milestone: none → 6.1
Simon Pasquier (simon-pasquier) wrote :

Hi Ryan, thanks for the report. Which version of LMA are you testing with? 0.7 or master?

Simon Pasquier (simon-pasquier) wrote :

In particular, I'd like to know the Heka version (hekad -version) since it might be a bug in Heka 0.10.

Ryan (ryanohagan75) wrote : RE: [Bug 1488717] Re: Controller loses connection to Elasticsearch/Kibana

Hey Simon, this is from the master branch as of yesterday, so yes, the newest Heka 0.1.0.

> Date: Wed, 26 Aug 2015 13:54:16 +0000
>
> In particular, I'd like to know the Heka version (hekad -version) since
> it might be a bug in Heka 0.10.

Ryan (ryanohagan75) wrote :

Master as of 8/25/15

> Date: Wed, 26 Aug 2015 09:13:10 +0000
>
> Hi Ryan, thanks for the report. Which version of LMA are you testing
> with? 0.7 or master?

Simon Pasquier (simon-pasquier) wrote :

Thanks! Looks like we're facing a bug in the latest version of Heka [1]. Someone else reported a similar problem to us yesterday, I'll investigate...

[1] https://github.com/mozilla-services/heka/issues/1630

Changed in fuel-plugins:
milestone: 6.1 → 7.0
status: New → Triaged
importance: Undecided → High
Ryan (ryanohagan75) wrote :

Thanks Simon!

> Date: Wed, 26 Aug 2015 15:13:29 +0000
>
> Thanks! Looks like we're facing a bug in the latest version of Heka [1].
> Someone else reported a similar problem to us yesterday, I'll
> investigate...
>
> [1] https://github.com/mozilla-services/heka/issues/1630

Simon Pasquier (simon-pasquier) wrote :

This is due to a bug [1] in Heka affecting the management of the output queue. The issue is fixed in the 0.10 branch but we'll have to wait for the next beta release of 0.10 (or the official 0.10 release). IIUC it should be a matter of days.

[1] https://github.com/mozilla-services/heka/issues/1627

Ryan (ryanohagan75) wrote :

Thanks Simon! Great work on this!

> Date: Fri, 28 Aug 2015 08:56:26 +0000
>
> This is due to a bug [1] in Heka affecting the management of the output
> queue. The issue is fixed in the 0.10 branch but we'll have to wait for
> the next beta release of 0.10 (or the official 0.10 release). IIUC it
> should be a matter of days.
>
> [1] https://github.com/mozilla-services/heka/issues/1627

Ryan (ryanohagan75) wrote :

Hey Simon, I don't know if this helps.

I redeployed the cluster from master as of September 17th. After the first controller stopped reporting to the ES/Kibana node, I opened the output configuration:

# nano /etc/lma_collector/output-elasticsearch.toml

and changed

max_buffer_size = -1000000000

to

max_buffer_size = -1

and then restarted the lma_collector:

crm resource restart lma_collector

All of a sudden the controller started reporting to ES/Kibana again. Thanks, Ryan
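
For anyone who needs to apply the same edit on several controllers, the manual change above could be scripted. This is only a sketch under stated assumptions: the setting appears as a plain `max_buffer_size = <integer>` line as quoted in this comment, and the helper name `set_max_buffer_size` is hypothetical. It rewrites text only; reading and writing /etc/lma_collector/output-elasticsearch.toml and restarting lma_collector are left to the operator.

```python
import re

# Matches a whole "max_buffer_size = <integer>" line, keeping its indentation.
# Only spaces and tabs are allowed around the value so that the surrounding
# newlines are never consumed by the match.
_MAX_BUFFER_RE = re.compile(
    r"^([ \t]*max_buffer_size[ \t]*=[ \t]*)-?\d+[ \t]*$", re.MULTILINE
)


def set_max_buffer_size(toml_text, new_value):
    """Return toml_text with each max_buffer_size assignment set to new_value.

    Hypothetical helper for illustration; raises ValueError when the
    setting is not present so that a silent no-op cannot go unnoticed.
    """
    replacement = str(int(new_value))
    updated, replaced = _MAX_BUFFER_RE.subn(
        lambda m: m.group(1) + replacement, toml_text
    )
    if replaced == 0:
        raise ValueError("no max_buffer_size setting found")
    return updated
```

The collector still has to be restarted afterwards with `crm resource restart lma_collector`, as described above, for the new value to take effect.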

Simon Pasquier (simon-pasquier) wrote :

@Ryan, sorry for the lag, we're busy polishing the next release of LMA... I thought that the Heka team would release another 0.10 version but it's not there yet, so I've submitted a fix to disable disk buffering for now.
This isn't ideal since it means that data may be lost in case of a network outage, but at least it should fix the 'Queue is full' error that appears after 1G of data has been processed.

Changed in fuel-plugins:
status: Triaged → In Progress
Simon Pasquier (simon-pasquier) wrote :

A partial fix has been implemented that disables disk buffering for now. See https://github.com/stackforge/fuel-plugin-lma-collector/commit/474b255d329b604fc3636b7a07cda339a58fe7b2

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-collector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/250410

Changed in lma-toolchain:
assignee: nobody → Simon Pasquier (simon-pasquier)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/250411

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-collector (master)

Reviewed: https://review.openstack.org/250410
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=c5f97a203b5e5c517987be2c6aecc567f2f95656
Submitter: Jenkins
Branch: master

commit c5f97a203b5e5c517987be2c6aecc567f2f95656
Author: Simon Pasquier <email address hidden>
Date: Thu Nov 26 15:03:03 2015 +0100

    Update Heka to 0.10.0b2

    This version will allow us to enable the buffering for the output
    plugins and deal properly with RabbitMQ connection drops.

    Change-Id: I087236ecc7756d005a98cd11d3e5efe8cbdc00cb
    Closes-Bug: #1503251
    Partial-Bug: #1488717

Changed in lma-toolchain:
importance: Undecided → High
Changed in lma-toolchain:
milestone: none → 0.9.0
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/250411
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=8766ea76ecc39011199adf21683d39657e796028
Submitter: Jenkins
Branch: master

commit 8766ea76ecc39011199adf21683d39657e796028
Author: Simon Pasquier <email address hidden>
Date: Thu Nov 26 15:27:22 2015 +0100

    Enable buffering for Elasticsearch and TCP outputs

    Change-Id: If3ca7b35e808d802b84e834f70d7fb37e36a84be
    Closes-Bug: #1488717

Changed in lma-toolchain:
status: In Progress → Fix Committed
Changed in fuel-plugins:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-collector (stable/0.8)

Fix proposed to branch: stable/0.8
Review: https://review.openstack.org/258465

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/0.8
Review: https://review.openstack.org/258467

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-collector (stable/0.8)

Reviewed: https://review.openstack.org/258465
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=7206aa0e4b67afb13c22748268fd51a0ca58e139
Submitter: Jenkins
Branch: stable/0.8

commit 7206aa0e4b67afb13c22748268fd51a0ca58e139
Author: Simon Pasquier <email address hidden>
Date: Thu Nov 26 15:03:03 2015 +0100

    Update Heka to 0.10.0b2

    This version will allow us to enable the buffering for the output
    plugins and deal properly with RabbitMQ connection drops.

    Change-Id: I087236ecc7756d005a98cd11d3e5efe8cbdc00cb
    Closes-Bug: #1503251
    Partial-Bug: #1488717
    (cherry picked from commit c5f97a203b5e5c517987be2c6aecc567f2f95656)

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/258467
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=9222061063f4f2c51ea0ddbba10fcff514eaddaa
Submitter: Jenkins
Branch: stable/0.8

commit 9222061063f4f2c51ea0ddbba10fcff514eaddaa
Author: Simon Pasquier <email address hidden>
Date: Thu Nov 26 15:27:22 2015 +0100

    Enable buffering for Elasticsearch and TCP outputs

    Change-Id: If3ca7b35e808d802b84e834f70d7fb37e36a84be
    Closes-Bug: #1488717
    (cherry picked from commit 8766ea76ecc39011199adf21683d39657e796028)

Changed in fuel-plugins:
status: Fix Committed → Fix Released
Changed in lma-toolchain:
status: Fix Committed → Fix Released