MAAS slow performance + growing database

Bug #1830365 reported by Peter Sabaini
This bug affects 6 people
Affects | Status | Importance | Assigned to | Milestone
maas | Fix Committed | High | nobody | 2.3.6

Bug Description

On a long-running installation I'm seeing performance problems when commissioning nodes.

The installation has 130+ nodes and is running MAAS 2.3.5 (6511-gf466fdb-0ubuntu1) under xenial.

When commissioning 4 nodes, the web interface becomes unresponsive (a node overview page takes 15s+ to render), and a "maas <profile> node read xxx" takes up to 9s instead of 2s in the idle case.

One thing I'm noticing is a lot of "maas@maasdb ERROR: could not serialize access due to concurrent update" lines in the postgres log.

Looking at the database, it seems that some tables in postgres have become quite big, especially the maasserver_event table and its indexes:

maasdb=# SELECT nspname || '.' || relname AS "relation",
maasdb-# pg_size_pretty(pg_relation_size(C.oid)) AS "size"
maasdb-# FROM pg_class C
maasdb-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
maasdb-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
maasdb-# ORDER BY pg_relation_size(C.oid) DESC
maasdb-# LIMIT 20;
                       relation | size
 public.maasserver_event | 2472 MB
 public.maasserver_event_node_id_3f03c875fc2d72eb_idx | 726 MB
 public.maasserver_event_94757cae | 487 MB
 public.maasserver_event_c693ebc8 | 484 MB
 public.maasserver_event__created | 386 MB
 public.maasserver_event_pkey | 385 MB

maasdb=# SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r' AND relname like 'maasserver_event%'
  ) a
) a;
  oid | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table
 19125 | public | maasserver_event | 1.64169e+07 | 5180358656 | 2587885568 | 8192 | 2592464896 | 4940 MB | 2468 MB | 8192 bytes | 2472 MB
 19136 | public | maasserver_eventtype | 42 | 114688 | 65536 | 8192 | 40960 | 112 kB | 64 kB | 8192 bytes | 40 kB

From inspecting the maasserver_event table, it seems to keep every event since the initial installation. It also defines quite a few btree indexes, each of which has to be maintained on every insert/update and could therefore hurt write performance as well.
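To see which of those indexes are actually used, something like the following could help. It is only a sketch: it relies on the standard pg_stat_user_indexes statistics view, and the scan counters only cover the period since statistics were last reset, so a low count is a hint rather than proof that an index is unused.

```sql
-- List the indexes on maasserver_event with their on-disk size and
-- how often the planner has scanned each one since the last stats reset.
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan AS scans_since_stats_reset
FROM pg_stat_user_indexes
WHERE relname = 'maasserver_event'
ORDER BY pg_relation_size(indexrelid) DESC;
```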

Do you think this could explain the sluggishness when commissioning? Should those tables be trimmed as part of regular maintenance?
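For illustration, trimming could look roughly like the sketch below. It rests on assumptions that are not confirmed here: that maasserver_event has a "created" timestamp column (suggested by the maasserver_event__created index), that a 90-day retention window is acceptable (the number is made up), and that MAAS tolerates old events being deleted behind its back. Take a database backup before trying anything like this.

```sql
-- Assumption: "created" is the event timestamp column.
-- First check how many rows fall outside the hypothetical 90-day window.
SELECT count(*) AS old_events
FROM maasserver_event
WHERE created < now() - interval '90 days';

-- Delete the old rows in a single transaction.
BEGIN;
DELETE FROM maasserver_event
WHERE created < now() - interval '90 days';
COMMIT;

-- Mark the freed space as reusable and refresh planner statistics.
VACUUM (VERBOSE, ANALYZE) maasserver_event;
```

A plain VACUUM makes the freed space reusable but generally does not return it to the operating system; VACUUM FULL would rewrite the table and indexes, at the cost of an exclusive lock on the table while it runs.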

Peter Sabaini (peter-sabaini) wrote :

I'm uploading logs here (sorry, Canonical only):

Also the above queries with slightly less horrible formatting:

Lee Trager (ltrager)
Changed in maas:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Lee Trager (ltrager)
milestone: none → 2.3.6
Lee Trager (ltrager) wrote :

I've backported a number of performance improvements from master to 2.3 which should significantly reduce memory usage over the websocket. You can test [1], which has all patches applied.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
assignee: Lee Trager (ltrager) → nobody