Documentation update: calculation of disk space and rotation period for InfluxDB/LMA installation

Bug #1544797 reported by Stanislav Kolenkin
This bug affects 4 people
Affects: StackLight
Status: Confirmed
Importance: Medium
Assigned to: Patrick Petit
Milestone: (none)

Bug Description

Our LMA documentation (http://plugins.mirantis.com/docs/i/n/influxdb_grafana/influxdb_grafana-0.8-0.8.0-1.pdf) does not contain enough information about the requirements for InfluxDB.

Please include the following information in the relevant documentation:
1. The number of metrics to expect for each node type (controller, compute, storage node, …)
2. A table of typical configurations that gives, for each one, the calculated disk space requirement and the recommended shard and retention policy settings

The size on disk depends primarily on the size of your field values. Measurement names, tag keys, tag values, and field keys are stored once, not for every point. Field values, by necessity, are stored independently for each point. We don't have a wide range of tests with the new BZ1 compression, but in our single-field, 1 billion point test, the billion points take up about 28GB on disk (roughly 30 bytes per point).

For your 30-day retention of 15 million points per minute, that works out to about 650 billion points, which gives us a very rough 18TB for your full raw data every 30 days. That's quite a bit, and we don't have a good feel for terabyte-scale datasets yet. Can you do some downsampling and early expiry of the raw points? InfluxDB makes it easy to configure retention policies and downsampling continuous queries so that your data automatically downsamples and expires when appropriate.
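
The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes the roughly 28 bytes per point measured in the single-field test above and decimal terabytes; the function names are illustrative only and are not part of InfluxDB or the LMA plugins.

```python
# Assumption: ~28 bytes per point, taken from the 1 billion point test above.
BYTES_PER_POINT = 28


def points_in_retention(points_per_minute: int, retention_days: int) -> int:
    """Total number of raw points written during the retention window."""
    return points_per_minute * 60 * 24 * retention_days


def raw_disk_bytes(points: int, bytes_per_point: int = BYTES_PER_POINT) -> int:
    """Very rough on-disk size of the raw points, before any downsampling."""
    return points * bytes_per_point


points = points_in_retention(15_000_000, 30)
size = raw_disk_bytes(points)
print(f"{points / 1e9:.0f} billion points, {size / 1e12:.1f} TB")
# prints "648 billion points, 18.1 TB"
```

Real field sizes vary by metric, so the result is an order-of-magnitude estimate rather than a sizing guarantee.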

The write volume is roughly 250,000 points per second, which is feasible on a well-tuned box with 0.9.3. Clustering should help with throughput as well, assuming the data isn't fully replicated to every node in the cluster. Clustering is still beta but should mature by the end of September.

The size per column will be roughly (24 bytes + size_of_column), where size_of_column is 8 bytes for doubles, 8 bytes for int64, 1 byte for bool values, and the average string length for string columns. Hope you find this useful.
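
As a rough illustration of the per-column rule, the Python sketch below adds the 24-byte overhead to the value size of each column. Summing the columns to get a per-point size is an assumption made here, not something the quoted text states, and the names are hypothetical.

```python
# Per-column overhead and value sizes, as quoted in the estimate above.
FIXED_OVERHEAD = 24
TYPE_SIZES = {"double": 8, "int64": 8, "bool": 1}


def column_bytes(kind: str, avg_string_len: float = 0.0) -> float:
    """Approximate stored size of one column value."""
    if kind == "string":
        return FIXED_OVERHEAD + avg_string_len
    return FIXED_OVERHEAD + TYPE_SIZES[kind]


def point_bytes(columns) -> float:
    """Assumption: a point costs the sum of its columns.

    `columns` is a list of (kind, avg_string_len) pairs.
    """
    return sum(column_bytes(kind, avg_len) for kind, avg_len in columns)


print(column_bytes("double"))                        # 32
print(point_bytes([("double", 0), ("string", 12)]))  # 68
```

A single double column comes out at 32 bytes, which is in line with the roughly 30 bytes per point measured in the single-field test.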

The file system should have 20-30% more space than the estimated data size in order to prevent performance degradation.
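
A minimal Python sketch of that headroom rule, assuming the 20-30% is added on top of the estimated data size (the other possible reading, keeping 20-30% of the file system free, would give a larger figure). The function name is illustrative.

```python
def filesystem_bytes(data_bytes: int, headroom_pct: int) -> int:
    """File system size needed for data_bytes plus headroom_pct percent extra."""
    return data_bytes * (100 + headroom_pct) // 100


data = 18_000_000_000_000  # the very rough 18TB of raw data quoted above
low = filesystem_bytes(data, 20)
high = filesystem_bytes(data, 30)
print(f"{low / 1e12:.1f} TB to {high / 1e12:.1f} TB")
# prints "21.6 TB to 23.4 TB"
```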

Also, please make the Puppet manifests configure the shards and retention policy according to the disk space available on the InfluxDB node.

tags: added: customer-found support
Swann Croiset (swann-w)
Changed in lma-toolchain:
assignee: nobody → LMA-Toolchain Fuel Plugins (mos-lma-toolchain)
tags: added: doc perf
Swann Croiset (swann-w)
Changed in lma-toolchain:
importance: Undecided → Medium
status: New → Confirmed
Changed in lma-toolchain:
milestone: none → 0.10.0
assignee: LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → Patrick Petit (patrick-michel-petit)
Changed in lma-toolchain:
milestone: 0.10.0 → 0.11.0
Changed in lma-toolchain:
milestone: 0.11.0 → 1.0.0
Changed in lma-toolchain:
milestone: 1.0.0 → none