StackLight

Logs and notifications are dropped during a "long" Elasticsearch outage

Bug #1566748 reported by Simon Pasquier on 2016-04-06

Affects      Status         Importance   Assigned to     Milestone
StackLight   Fix Released   High         Swann Croiset   StackLight 0.10.0

Bug Description

The Heka output plugins currently use the 'drop' buffering policy. As a result, when the Elasticsearch server is down for a relatively long time, the Elasticsearch output plugin fills its local queue (the limit is 1G) and then starts dropping the collected logs and notifications.
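To make the failure mode concrete, here is a minimal sketch of a bounded output queue under the two policies discussed in this report: 'drop', and the 'block' policy that the fix adopts for the Elasticsearch output. This is not Heka code. The names `BoundedBuffer` and `simulate`, the in-memory queue, and the tiny sizes are assumptions made purely for illustration.

```python
from collections import deque


class BoundedBuffer:
    """Toy stand-in for an output plugin's local queue (hypothetical, not Heka code)."""

    def __init__(self, max_bytes, full_action):
        if full_action not in ("drop", "block"):
            raise ValueError("full_action must be 'drop' or 'block'")
        self.max_bytes = max_bytes
        self.full_action = full_action
        self.queue = deque()
        self.used = 0
        self.dropped = 0

    def offer(self, message):
        """Try to queue a message; return False if the producer must wait and retry."""
        size = len(message)
        if size > self.max_bytes:
            raise ValueError("message is larger than the whole buffer")
        if self.used + size > self.max_bytes:
            if self.full_action == "drop":
                self.dropped += 1  # the message is lost for good
                return True        # the producer carries on regardless
            return False           # 'block': apply back-pressure instead
        self.queue.append(message)
        self.used += size
        return True

    def drain(self):
        """The backend is reachable: flush everything that is queued."""
        flushed = list(self.queue)
        self.queue.clear()
        self.used = 0
        return flushed


def simulate(full_action, messages, max_bytes):
    """Return (delivered, dropped) for an outage followed by a recovery."""
    buf = BoundedBuffer(max_bytes, full_action)
    pending = deque(messages)
    # Outage: the backend is down, so nothing is drained.
    while pending:
        if not buf.offer(pending[0]):
            break  # blocked: stop consuming input until the backend returns
        pending.popleft()
    # Recovery: the backend is back, and a blocked producer resumes.
    delivered = buf.drain()
    while pending:
        if buf.offer(pending[0]):
            pending.popleft()
        else:
            delivered.extend(buf.drain())
    delivered.extend(buf.drain())
    return len(delivered), buf.dropped


if __name__ == "__main__":
    msgs = [b"x" * 10] * 10
    print(simulate("drop", msgs, max_bytes=30))   # (3, 7)
    print(simulate("block", msgs, max_bytes=30))  # (10, 0)
```

With ten 10-byte messages and a 30-byte buffer, 'drop' delivers three messages and loses seven, while 'block' delivers all ten at the cost of stalling the producer during the outage.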

Simon Pasquier (simon-pasquier) on 2016-04-25

Changed in lma-toolchain:
milestone:	1.0.0 → 0.10.0

OpenStack Infra (hudson-openstack) on 2016-05-04

Changed in lma-toolchain:
assignee:	LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → Swann Croiset (swann-w)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-05-04: Fix merged to fuel-plugin-lma-collector (master)

Reviewed: https://review.openstack.org/300447
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-collector/commit/?id=ebac150f8a0f3bb6e13c6759ad7c4ddaf2fad226
Submitter: Jenkins
Branch: master

commit ebac150f8a0f3bb6e13c6759ad7c4ddaf2fad226
Author: Swann Croiset <email address hidden>
Date: Sun Mar 27 22:46:52 2016 +0200

    Separate the (L)og of the LMA collector

    This change separates the processing of the logs/notifications and
    metric/alerting into 2 dedicated hekad processes, these services are
    named 'log_collector' and 'metric_collector'.

    Both services are managed by Pacemaker on controller nodes and by Upstart on
    other nodes.

    All metrics computed by log_collector (HTTP response times and creation time
    for instances and volumes) are sent directly to the metric_collector via TCP.
    Elasticsearch output (log_collector) uses full_action='block' and the
    TCP output uses full_action='drop'.

    All outputs of metric_collector (InfluxDB, HTTP and TCP) use
    full_action='drop'.

    The buffer size configurations are:
    * metric_collector:
      - influxdb-output buffer size is increased to 1Gb.
      - aggregator-output (tcp) buffer size is decreased to 256Mb (vs 1Gb).
      - nagios outputs (x3) buffer size are decreased to 1Mb.
    * log_collector:
      - elasticsearch-output buffer size is decreased to 256Mb (vs 1Gb).
      - tcp-output buffer size is set to 256Mb.

    Implements: blueprint separate-lma-collector-pipelines
    Fixes-bug: #1566748

    Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8

Changed in lma-toolchain:
status:	In Progress → Fix Committed

Simon Pasquier (simon-pasquier) on 2016-07-26

Changed in lma-toolchain:
status:	Fix Committed → Fix Released
