On a long-running installation I'm seeing performance problems when commissioning nodes.

The installation has 130+ nodes and is running MAAS 2.3.5 (6511-gf466fdb-0ubuntu1) under xenial.

While commissioning 4 nodes, the web interface becomes unresponsive (a node overview page takes 15s+ to render), and a "maas <profile> node read xxx" takes up to 9s instead of the 2s it takes when idle.

One thing I'm noticing is a lot of "maas@maasdb ERROR: could not serialize access due to concurrent update" lines in the postgres log.

Looking at the database, it seems that some tables in postgres have become quite big, especially the maasserver_event table and its indexes:

maasdb=# SELECT nspname || '.' || relname AS "relation",
maasdb-# pg_size_pretty(pg_relation_size(C.oid)) AS "size"
maasdb-# FROM pg_class C
maasdb-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
maasdb-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
maasdb-# ORDER BY pg_relation_size(C.oid) DESC
maasdb-# LIMIT 20;
                       relation                       |  size
 public.maasserver_event                              | 2472 MB
 public.maasserver_event_node_id_3f03c875fc2d72eb_idx | 726 MB
 public.maasserver_event_94757cae                     | 487 MB
 public.maasserver_event_c693ebc8                     | 484 MB
 public.maasserver_event__created                     | 386 MB
 public.maasserver_event_pkey                         | 385 MB

maasdb=# SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r' AND relname like 'maasserver_event%'
  ) a
) a;
  oid | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table
 19125 | public | maasserver_event | 1.64169e+07 | 5180358656 | 2587885568 | 8192 | 2592464896 | 4940 MB | 2468 MB | 8192 bytes | 2472 MB
 19136 | public | maasserver_eventtype | 42 | 114688 | 65536 | 8192 | 40960 | 112 kB | 64 kB | 8192 bytes | 40 kB

From inspecting that table, it seems to keep all events since the initial installation. It also defines quite a few btree indexes, which could further slow down updates and inserts.
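
Two read-only queries could help confirm both suspicions. This is only a sketch: it assumes the timestamp column is called created (guessed from the maasserver_event__created index name) and uses the standard pg_stat_user_indexes statistics view, whose counters only cover the period since the last statistics reset.

```sql
-- Events per month, to see how far back the table reaches.
-- Assumption: the timestamp column is named "created".
SELECT date_trunc('month', created) AS month,
       count(*) AS events
  FROM maasserver_event
 GROUP BY 1
 ORDER BY 1;

-- Size and scan count for each index on the table.
-- A large index with very few scans may be worth a closer look,
-- since every insert still has to maintain it.
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size,
       idx_scan AS scans
  FROM pg_stat_user_indexes
 WHERE relname = 'maasserver_event'
 ORDER BY pg_relation_size(indexrelid) DESC;
```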

Do you think this could explain the sluggishness when commissioning? Should those tables be trimmed as part of regular maintenance?
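
If trimming turns out to be reasonable, it might look roughly like the following. This is a sketch under assumptions, not a supported MAAS procedure: the 90-day retention and the 10000-row batch size are arbitrary, the created and id column names are inferred from the index names above, and whether other tables reference event rows has not been checked here. A database backup should come first.

```sql
-- Delete one batch of old events; repeat until it reports DELETE 0.
-- Small batches keep each transaction short, so locks are not held for long.
DELETE FROM maasserver_event
 WHERE id IN (
       SELECT id
         FROM maasserver_event
        WHERE created < now() - interval '90 days'
        ORDER BY id
        LIMIT 10000
       );

-- Mark the freed space as reusable and refresh planner statistics.
-- Plain VACUUM usually does not return the space to the operating system.
VACUUM ANALYZE maasserver_event;
```

Actually shrinking the files on disk would need something like VACUUM FULL or REINDEX, both of which take heavier locks and are better suited to a maintenance window.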

Peter Sabaini (peter-sabaini) wrote :

I'm uploading logs here (sorry, Canonical only):

Also, here are the above queries with slightly less horrible formatting:

Lee Trager (ltrager) wrote :

I've backported a number of performance improvements from master to 2.3, which should significantly reduce memory usage over the websocket. You can test [1], which has all patches applied.


