monitoring oops rates is hard
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
python-oops-tools | Triaged | High | Unassigned |
Bug Description
We'd like to be able to alert on an oops spike (ideally per source: U1/ISD/LP etc.), as that would let us find out before everything comes crashing down around our ears: e.g. the recent DoS attacks and poor APIs would have been picked up.
One way to do this would be to have a stream of oops metadata (e.g. project, time, samples since last minute) that can be consumed by e.g. Esper or custom code. This might be AMQP-based, or stdout. Polling is possible but scales poorly. We probably don't want the spike-analysis code examining the full body of each oops, so a separate network of consumers is likely sensible.
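As a rough illustration of what such a metadata message might look like (the field names and OOPS id here are invented for the example, not an existing schema), a small JSON payload would keep the stream cheap to consume:

```python
import json
import time

# Illustrative only: field names and values are assumptions, not a
# defined python-oops-tools schema.
oops_metadata = {
    "project": "launchpad",      # source, e.g. U1/ISD/LP
    "oops_id": "OOPS-example1",  # hypothetical identifier
    "timestamp": time.time(),    # when the oops occurred
}

# Serialise for publishing on e.g. an AMQP exchange or stdout.
message = json.dumps(oops_metadata)
```

Consumers only need to parse this small message to count oopses per project per minute, rather than fetching each full report.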
E.g. an implementation sketch:
amqp2disk -> stream of oops metadata messages -> aggregator that tracks a smoothed rate over time and alerts on spikes.
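The aggregator step above could be sketched as follows. This is a minimal illustration, not an existing python-oops-tools component: it keeps an exponentially smoothed per-project oops rate and flags a spike when one minute's count exceeds a multiple of the smoothed baseline. The class name, parameters, and thresholds are all assumptions for the example.

```python
class SpikeDetector:
    """Hypothetical per-project spike detector using an EWMA baseline."""

    def __init__(self, alpha=0.3, threshold=3.0, warmup=5):
        self.alpha = alpha          # EWMA smoothing factor
        self.threshold = threshold  # spike = count > threshold * baseline
        self.warmup = warmup        # minutes observed before alerting
        self.baseline = {}          # project -> smoothed oopses/minute
        self.seen = {}              # project -> minutes observed so far

    def observe(self, project, count):
        """Feed one minute's oops count for a project; return True on spike."""
        seen = self.seen.get(project, 0)
        baseline = self.baseline.get(project, float(count))
        # Only alert once we have enough history to trust the baseline.
        spike = seen >= self.warmup and count > self.threshold * max(baseline, 1.0)
        # Update the smoothed baseline after the spike check, so a spike
        # does not immediately pollute its own reference level.
        self.baseline[project] = (1 - self.alpha) * baseline + self.alpha * count
        self.seen[project] = seen + 1
        return spike
```

A consumer of the metadata stream would call `observe()` once per project per minute; how the alert is delivered (nagios, email, etc.) is left open here, as in the sketch above.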