Use sqlalchemy baked query

Bug #1786226 reported by venkata anil
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Confirmed
Wishlist
venkata anil

Bug Description

I am running rally scenario test create_and_list_ports on a 3 controller setup(each controller have 8 CPUs i.e 4 cores*2 HTs) with (function call trace enabled on neutron server processes) a concurrency of 8 for 400 iterations.

Average time taken for create port is 7.207 seconds(when 400 ports are created) and the function call trace for this run is at http://paste.openstack.org/show/727718/ and rally results are
+---------------------------------------------------------------------------------------------------------------------------+
| Response Times (sec) |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| neutron.create_network | 2.085 | 2.491 | 3.01 | 3.29 | 7.558 | 2.611 | 100.0% | 400 |
| neutron.create_port | 5.69 | 6.878 | 7.755 | 9.394 | 17.0 | 7.207 | 100.0% | 400 |
| neutron.list_ports | 0.72 | 5.552 | 9.123 | 9.599 | 11.165 | 5.559 | 100.0% | 400 |
| total | 10.085 | 15.263 | 18.789 | 19.734 | 28.712 | 15.377 | 100.0% | 400 |
| -> duration | 10.085 | 15.263 | 18.789 | 19.734 | 28.712 | 15.377 | 100.0% | 400 |
| -> idle_duration | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0% | 400 |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

Michael Bayer (zzzeek) has analysed this callgraph and had some suggestions. One suggestion is to use baked query i.e https://review.openstack.org/#/c/430973/2

This is his analysis - "But looking at the profile I see here, it is clear that the vast majority of time is spent doing lots and lots of small queries, and all of the mechanics involved with turning them into SQL strings and invoking them. SQLAlchemy has a very effective optimization for this but it must be coded into Neutron.

Here is the total time spent for Query to convert its state into SQL:

148029/356073 15.232 0.000 4583.820 0.013 /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

that's 4583 seconds spent in Query compilation, which if Neutron were modified to use baked queries, would be vastly reduced. I demonstrated the beginning of this work in 2017 here: https://review.openstack.org/#/c/430973/1 , which illustrates how to first start to create a base query method in neutron that other functions can begin to make use of. As more queries start using the baked form, this 4500 seconds number will begin to drop."

I have restored his patch https://review.openstack.org/#/c/430973/2 , with this the average time taken to create port is 5.196 seconds (when 400 ports are created), and the function call trace for this run is at http://paste.openstack.org/show/727719/ also total time spent on Query compilation (Query._compile_context) is only 1675 seconds.

83696/169062 7.308 0.000 1675.140 0.010 /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

Rally results for this run are

+---------------------------------------------------------------------------------------------------------------------------+
| Response Times (sec) |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| neutron.create_network | 2.03 | 2.435 | 2.872 | 3.42 | 8.301 | 2.631 | 100.0% | 400 |
| neutron.create_port | 4.196 | 4.996 | 5.664 | 6.5 | 12.303 | 5.196 | 100.0% | 400 |
| neutron.list_ports | 0.592 | 5.432 | 8.97 | 9.475 | 10.1 | 5.373 | 100.0% | 400 |
| total | 7.165 | 13.265 | 16.544 | 17.296 | 25.041 | 13.201 | 100.0% | 400 |
| -> duration | 7.165 | 13.265 | 16.544 | 17.296 | 25.041 | 13.201 | 100.0% | 400 |
| -> idle_duration | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0% | 400 |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

In https://review.openstack.org/#/c/430973/2 we are using baked query for only get_by_id i.e
https://review.openstack.org/#/c/430973/2/neutron/db/_model_query.py@225

While creating port, plugin will call get_network() which internally calls get_by_id.

The cumulative time taken for 4800 ncalls of get_network is 3249 seconds

4800 0.325 0.000 3249.537 0.677 /usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py:928(Ml2Plugin.get_network)

but when get_network is using baked query, this has come down to 1075 seconds.

4800 0.321 0.000 1075.695 0.224 /usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py:928(Ml2Plugin.get_network)

If we enhance any other neutron DB methods to use baked query, then we can improve the neutron performance further.

Revision history for this message
venkata anil (anil-venkata) wrote :
Revision history for this message
venkata anil (anil-venkata) wrote :
Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I compared rally results from Your patch and some other neutron patch (https://review.openstack.org/#/c/589885/). Looks for me that improvement is really impressive :)

Changed in neutron:
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Daniel Alvarez (dalvarezs) wrote :

Awesome!!!!!!

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.