Websocket machines list: performance degradation

Bug #2020760 reported by Anton Troyanov

This bug affects 1 person

| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| MAAS | Status tracked in 3.5 | | | |
| 3.4 | Won't Fix | High | Unassigned | |
| 3.5 | Fix Committed | High | Unassigned | |

Bug Description

One of the performance metrics, `Execution Time - test_perf_list_machines_Websocket_endpoint`, showed a significant degradation.

The regression may have started after fix [1] was applied.

[1] https://code.launchpad.net/~troyanov/maas/+git/maas/+merge/442021

Changed in maas:
milestone: none → 3.4.0
status: New → Triaged
importance: Undecided → High
Anton Troyanov (troyanov) wrote:

I don't think the added subquery introduced such a dramatic penalty (3x, as seen in the performance tests) on its own.

Here is the [query](https://pastebin.ubuntu.com/p/Ny4VNtrfW4/), a simplified version of what Django [generates](https://pastebin.ubuntu.com/p/xdRYHwMyTR/), with the WHERE filtering removed.

Original (with WHERE and retrieving all the columns):
`Execution Time: 1873.566 ms`

Simplified (without WHERE and most of the columns):
`Execution Time: 1567.970 ms`

Interestingly, if I remove the subquery completely:
`Execution Time: 1458.419 ms`
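
As a rough check, the subquery's share of the simplified query can be worked out from the two timings above. This is only arithmetic on the single-run numbers quoted in this comment; the variable names are illustrative.

```python
# Timings quoted above, in milliseconds (single EXPLAIN ANALYZE runs).
simplified_with_subquery_ms = 1567.970
simplified_without_subquery_ms = 1458.419

# Extra time attributable to the subquery in the simplified query.
subquery_overhead_ms = simplified_with_subquery_ms - simplified_without_subquery_ms
subquery_overhead_pct = 100 * subquery_overhead_ms / simplified_without_subquery_ms

print(f"subquery overhead: {subquery_overhead_ms:.3f} ms ({subquery_overhead_pct:.1f}%)")
```

That works out to roughly 110 ms, or about 7.5% of the query without the subquery, which is far from the 3x slowdown seen in the performance tests.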

When I analyzed the execution plan, one line stood out that I think explains why this is slow (note `rows=1062792`):
```
 GroupAggregate (cost=904.26..31523.40 rows=505 width=16) (actual time=4.274..289.491 rows=505 loops=1)
   Group Key: maasserver_node.id
   -> Merge Left Join (cost=904.26..26425.12 rows=1018645 width=16) (actual time=4.024..213.045 rows=1062792 loops=1)
```

This is the cost of being able to filter on everything via LEFT JOIN: the joins produce an enormous intermediate result (about a million rows here).
We then do a GROUP BY over those rows to collapse them back to one row per machine (505 here), and that is what makes this query expensive.
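
The effect can be reproduced in miniature. The sketch below is not the MAAS schema or the query Django generates; the tables `node`, `node_tag`, and `node_iface` are hypothetical, and SQLite stands in for PostgreSQL. It only illustrates how two LEFT JOINs against one-to-many tables multiply the row count before GROUP BY collapses it again.

```python
import sqlite3


def build_demo(n_nodes=5, tags_per_node=4, ifaces_per_node=3):
    """Create an in-memory database with hypothetical one-to-many tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (id INTEGER PRIMARY KEY);
        CREATE TABLE node_tag (node_id INTEGER, tag TEXT);
        CREATE TABLE node_iface (node_id INTEGER, name TEXT);
        """
    )
    for node_id in range(1, n_nodes + 1):
        conn.execute("INSERT INTO node (id) VALUES (?)", (node_id,))
        conn.executemany(
            "INSERT INTO node_tag VALUES (?, ?)",
            [(node_id, f"tag{i}") for i in range(tags_per_node)],
        )
        conn.executemany(
            "INSERT INTO node_iface VALUES (?, ?)",
            [(node_id, f"eth{i}") for i in range(ifaces_per_node)],
        )
    return conn


# Both child tables are joined so that either could be filtered on.
JOINED = """
    FROM node
    LEFT JOIN node_tag ON node_tag.node_id = node.id
    LEFT JOIN node_iface ON node_iface.node_id = node.id
"""


def joined_row_count(conn):
    """Rows produced by the joins, before any grouping."""
    return conn.execute("SELECT COUNT(*) " + JOINED).fetchone()[0]


def grouped_row_count(conn):
    """Rows left after grouping back to one row per node."""
    rows = conn.execute("SELECT node.id " + JOINED + " GROUP BY node.id")
    return len(rows.fetchall())


conn = build_demo()
print(joined_row_count(conn), grouped_row_count(conn))
```

With the defaults, each node contributes 4 × 3 = 12 joined rows, so the join yields 60 rows that are then grouped back down to 5. The plan above shows the same shape at scale: 1062792 joined rows for 505 machines.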

Also, the fix was backported to 3.3, which is already released, but the performance tests show no change there.

Changed in maas:
milestone: 3.4.0 → 3.5.0
no longer affects: maas/3.5
no longer affects: maas/3.4
Changed in maas:
milestone: none → 3.5.0
Jacopo Rota (r00ta) wrote:

I think we can close this one: in 3.5, the performance of the machine.list action improved by 50% by using SQLAlchemy.
