Logs and notifications are dropped during a "long" Elasticsearch outage
Bug #1566748 reported by Simon Pasquier
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StackLight | Fix Released | High | Swann Croiset | | |
Bug Description
The current buffering policy for the Heka output plugins is 'drop'. As a result, when the Elasticsearch server is down for a relatively long time, the Elasticsearch output plugin fills its local queue (the limit is 1G) and then starts dropping the collected logs and notifications.
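To make the failure mode concrete, here is a minimal Python sketch of a bounded output queue during an outage. It is not Heka code: the function name `simulate_outage` is hypothetical, capacity is counted in events rather than bytes, and the 'block' branch assumes the input can be re-read once the output unblocks (as with log files on disk). 'block' is the policy the fix further down adopts for the Elasticsearch output.

```python
from collections import deque


def simulate_outage(n_events, capacity, full_action):
    """Feed n_events into a bounded queue while the backend is down, then recover.

    Returns (delivered, dropped). Capacity is counted in events here,
    standing in for the byte limit of a real disk buffer.
    """
    if full_action not in ("drop", "block"):
        raise ValueError(f"unsupported full_action: {full_action!r}")
    if capacity < 1:
        raise ValueError("capacity must be at least 1")

    queue = deque()
    source = deque(range(n_events))  # events still waiting at the source
    dropped = []

    # Phase 1: the backend is down, so nothing leaves the queue.
    while source:
        if len(queue) < capacity:
            queue.append(source.popleft())
        elif full_action == "drop":
            dropped.append(source.popleft())  # buffer full: the event is lost
        else:
            break  # "block": stop reading, remaining events stay at the source

    # Phase 2: the backend is back, the queue drains and blocked input resumes.
    delivered = []
    while queue or source:
        if queue:
            delivered.append(queue.popleft())
        if source and len(queue) < capacity:
            queue.append(source.popleft())
    return delivered, dropped


for action in ("drop", "block"):
    delivered, dropped = simulate_outage(n_events=10, capacity=4, full_action=action)
    print(action, "delivered:", len(delivered), "dropped:", len(dropped))
```

With 10 events and room for 4, the 'drop' run delivers 4 events and loses 6, while the 'block' run delivers all 10 at the cost of stalling the input until the backend returns.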
Changed in lma-toolchain:
milestone: 1.0.0 → 0.10.0
Changed in lma-toolchain:
assignee: LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → Swann Croiset (swann-w)
status: Confirmed → In Progress
Changed in lma-toolchain:
status: Fix Committed → Fix Released
Reviewed: https://review.openstack.org/300447
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=ebac150f8a0f3bb6e13c6759ad7c4ddaf2fad226
Submitter: Jenkins
Branch: master
commit ebac150f8a0f3bb6e13c6759ad7c4ddaf2fad226
Author: Swann Croiset <email address hidden>
Date: Sun Mar 27 22:46:52 2016 +0200
Separate the (L)og of the LMA collector
This change separates the processing of the logs/notifications and
metric/alerting into 2 dedicated hekad processes. These services are
named 'log_collector' and 'metric_collector'.
Both services are managed by Pacemaker on controller nodes and by Upstart on
other nodes.
All metrics computed by log_collector (HTTP response times and creation time
for instances and volumes) are sent directly to the metric_collector via TCP.
Elasticsearch output (log_collector) uses full_action='block' and the
TCP output uses full_action='drop'.
All outputs of metric_collector (InfluxDB, HTTP and TCP) use full_action='drop'.
The buffer size configurations are:
* metric_collector:
  - influxdb-output buffer size is increased to 1Gb.
  - aggregator-output (tcp) buffer size is decreased to 256Mb (vs 1Gb).
  - nagios outputs (x3) buffer size are decreased to 1Mb.
* log_collector:
  - elasticsearch-output buffer size is decreased to 256Mb (vs 1Gb).
  - tcp-output buffer size is set to 256Mb.
Implements: blueprint separate-lma-collector-pipelines
Fixes-bug: #1566748
Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8
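As a rough illustration of what the buffer settings in the commit message add up to, the Python sketch below records them as data and computes the worst-case disk usage per collector. It makes several assumptions: "Mb" and "Gb" are read as MiB and GiB, the elasticsearch-output entry is read as 256Mb, the three nagios outputs are modelled as one row with a count of 3, and the names `OUTPUTS`, `worst_case_disk_usage` and `lossy_outputs` are hypothetical helpers, not part of the plugin.

```python
MIB = 1024 * 1024  # assumption: "Mb"/"Gb" in the commit message mean MiB/GiB

# (collector, output name, instance count, max buffer size in bytes, full_action)
OUTPUTS = [
    ("metric_collector", "influxdb-output", 1, 1024 * MIB, "drop"),
    ("metric_collector", "aggregator-output", 1, 256 * MIB, "drop"),
    ("metric_collector", "nagios-output", 3, 1 * MIB, "drop"),
    ("log_collector", "elasticsearch-output", 1, 256 * MIB, "block"),
    ("log_collector", "tcp-output", 1, 256 * MIB, "drop"),
]


def worst_case_disk_usage(collector):
    """Total bytes the collector's buffers can occupy when all of them are full."""
    return sum(
        count * size
        for owner, _name, count, size, _action in OUTPUTS
        if owner == collector
    )


def lossy_outputs(collector):
    """Names of the outputs that discard messages once their buffer is full."""
    return [
        name
        for owner, name, _count, _size, action in OUTPUTS
        if owner == collector and action == "drop"
    ]


for collector in ("log_collector", "metric_collector"):
    usage_mib = worst_case_disk_usage(collector) // MIB
    print(collector, usage_mib, "MiB", lossy_outputs(collector))
```

Under these assumptions the log_collector buffers can grow to 512 MiB and the metric_collector buffers to 1283 MiB. Only the Elasticsearch output blocks rather than drops, which is the behaviour this bug asked for.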