DHCP agent logging is causing performance issues

Bug #1557640 reported by Stephen Eilert
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Stephen Eilert
Kilo
New
Undecided
Unassigned

Bug Description

When a new port gets added or removed, the DHCP agent is currently dumping the entirety of the hosts file in the logs. This is a problem in some deployments we have encountered, some of which with upwards of 10000 ports.

Change-Id: I3ad7864eeb2f959549ed356a1e34fa18804395cc removed logging calls from inside the loop. It also added two new log calls: one at the beginning of the loop that builds the hosts file in memory and one at the end, that also includes the full content. Those could be useful pieces of debugging information. However, adding the host file contents can get expensive when there are large numbers of ports.

For large numbers of ports, it can take more than one hour for the agent to sync up. The log output contributes a significant amount of that time.

This was introduced in Liberty. One reviewer at the time had already expressed concerns about the impact of the log call.

Will propose a patch to keep the log message, but remove the file contents.

summary: - DHCP agent logging is too verbose
+ DHCP agent logging is causing performance issues
description: updated
Changed in neutron:
assignee: nobody → Stephen Eilert (outworlder)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/293144

Changed in neutron:
status: New → In Progress
tags: added: l3-ipam-dhcp
tags: added: loadimpact logging
Changed in neutron:
importance: Undecided → Low
Gary Kotton (garyk)
Changed in neutron:
importance: Low → High
tags: added: mitaka-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/293144
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ed7411fa1c4d6a26cea3a8737b3c29ce6ff64363
Submitter: Jenkins
Branch: master

commit ed7411fa1c4d6a26cea3a8737b3c29ce6ff64363
Author: Stephen Eilert <email address hidden>
Date: Tue Mar 15 13:54:23 2016 -0700

    Removes host file contents from DHCP agent logs

    The change I3ad7864eeb2f959549ed356a1e34fa18804395cc addressed a
    performance impact by removing logging calls from inside the loop that
    creates the hosts file in memory. However, it also introduced the
    dumping of said hosts file to the logs.

    On deployments with a very high number of ports(>5K), this increases the
    load of the node substantially. The problem is worse when a high number
    of port updates is received in a short amount of time.

    There is also the administrative burden caused by the sheer amount of
    data being logged.

    If the content of the host file is required for debugging purposes, it
    can be found at /opt/stack/data/neutron/dhcp/<net-id>/host

    This patch is trivial and only removes the logging of the file contents.
    The 'building'/'done building' logging calls may still provide useful
    information.

    Change-Id: Ie176ef123c194c22dca0967a6acfb9a48c959e6c
    Closes-Bug: 1557640

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/295932

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/296352

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/296506

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/295932
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d774e70d6bd511ab52e387493f1fb18ae4975565
Submitter: Jenkins
Branch: stable/liberty

commit d774e70d6bd511ab52e387493f1fb18ae4975565
Author: Stephen Eilert <email address hidden>
Date: Tue Mar 15 13:54:23 2016 -0700

    Removes host file contents from DHCP agent logs

    The change I3ad7864eeb2f959549ed356a1e34fa18804395cc addressed a
    performance impact by removing logging calls from inside the loop that
    creates the hosts file in memory. However, it also introduced the
    dumping of said hosts file to the logs.

    On deployments with a very high number of ports(>5K), this increases the
    load of the node substantially. The problem is worse when a high number
    of port updates is received in a short amount of time.

    There is also the administrative burden caused by the sheer amount of
    data being logged.

    If the content of the host file is required for debugging purposes, it
    can be found at /opt/stack/data/neutron/dhcp/<net-id>/host

    This patch is trivial and only removes the logging of the file contents.
    The 'building'/'done building' logging calls may still provide useful
    information.

    Conflicts:
     neutron/agent/linux/dhcp.py

    Change-Id: Ie176ef123c194c22dca0967a6acfb9a48c959e6c
    Depends-On: I4041478ca09bd124827782774b8520908ef07be0
    Closes-Bug: 1557640
    (cherry picked from commit ed7411fa1c4d6a26cea3a8737b3c29ce6ff64363)

tags: added: in-stable-liberty
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/296352
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bd3e9c3b1e759d4296cdb9a816a63ef773ea5f63
Submitter: Jenkins
Branch: stable/mitaka

commit bd3e9c3b1e759d4296cdb9a816a63ef773ea5f63
Author: Stephen Eilert <email address hidden>
Date: Tue Mar 15 13:54:23 2016 -0700

    Removes host file contents from DHCP agent logs

    The change I3ad7864eeb2f959549ed356a1e34fa18804395cc addressed a
    performance impact by removing logging calls from inside the loop that
    creates the hosts file in memory. However, it also introduced the
    dumping of said hosts file to the logs.

    On deployments with a very high number of ports(>5K), this increases the
    load of the node substantially. The problem is worse when a high number
    of port updates is received in a short amount of time.

    There is also the administrative burden caused by the sheer amount of
    data being logged.

    If the content of the host file is required for debugging purposes, it
    can be found at /opt/stack/data/neutron/dhcp/<net-id>/host

    This patch is trivial and only removes the logging of the file contents.
    The 'building'/'done building' logging calls may still provide useful
    information.

    Change-Id: Ie176ef123c194c22dca0967a6acfb9a48c959e6c
    Closes-Bug: 1557640
    (cherry picked from commit ed7411fa1c4d6a26cea3a8737b3c29ce6ff64363)

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/296506
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=910737690546e8dcc1b9bd0330ece4099e933541
Submitter: Jenkins
Branch: stable/kilo

commit 910737690546e8dcc1b9bd0330ece4099e933541
Author: Stephen Eilert <email address hidden>
Date: Tue Mar 15 13:54:23 2016 -0700

    Removes host file contents from DHCP agent logs

    The change I3ad7864eeb2f959549ed356a1e34fa18804395cc addressed a
    performance impact by removing logging calls from inside the loop that
    creates the hosts file in memory. However, it also introduced the
    dumping of said hosts file to the logs.

    On deployments with a very high number of ports(>5K), this increases the
    load of the node substantially. The problem is worse when a high number
    of port updates is received in a short amount of time.

    There is also the administrative burden caused by the sheer amount of
    data being logged.

    If the content of the host file is required for debugging purposes, it
    can be found at /opt/stack/data/neutron/dhcp/<net-id>/host

    This patch is trivial and only removes the logging of the file contents.
    The 'building'/'done building' logging calls may still provide useful
    information.

    Conflicts:
     neutron/agent/linux/dhcp.py

    Change-Id: Ie176ef123c194c22dca0967a6acfb9a48c959e6c
    Depends-On: I4041478ca09bd124827782774b8520908ef07be0
    Closes-Bug: 1557640
    (cherry picked from commit ed7411fa1c4d6a26cea3a8737b3c29ce6ff64363)

tags: added: in-stable-kilo
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 8.1.0

This issue was fixed in the openstack/neutron 8.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/314250

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)
Download full text (36.9 KiB)

Reviewed: https://review.openstack.org/314250
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3bf73801df169de40d365e6240e045266392ca63
Submitter: Jenkins
Branch: master

commit a323769143001d67fd1b3b4ba294e59accd09e0e
Author: Ryan Moats <email address hidden>
Date: Tue Oct 20 15:51:37 2015 +0000

    Revert "Improve performance of ensure_namespace"

    This reverts commit 81823e86328e62850a89aef9f0b609bfc0a6dacd.

    Unneeded optimization: this commit only improves execution
    time on the order of milliseconds, which is less than 1% of
    the total router update execution time at the network node.

    This also

    Closes-bug: #1574881

    Change-Id: Icbcdf4725ba7d2e743bb6761c9799ae436bd953b

commit 7fcf0253246832300f13b0aa4cea397215700572
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Apr 21 07:05:16 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9e930750dde85a9beb0b6f85eeea8a0962d3e020

commit 643b4431606421b09d05eb0ccde130adbf88df64
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Apr 19 06:52:48 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I52d7460b3265b5460b9089e1cc58624640dc7230

commit 1ffea42ccdc14b7a6162c1895bd8f2aae48d5dae
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Apr 18 15:03:30 2016 +0000

    Updated from global requirements

    Change-Id: Icb27945b3f222af1d9ab2b62bf2169d82b6ae26c

commit b970ed5bdac60c0fa227f2fddaa9b842ba4f51a7
Author: Kevin Benton <email address hidden>
Date: Fri Apr 8 17:52:14 2016 -0700

    Clear DVR MAC on last agent deletion from host

    Once all agents are deleted from a host, the DVR MAC generated
    for that host should be deleted as well to prevent a buildup of
    pointless flows generated in the OVS agent for hosts that don't
    exist.

    Closes-Bug: #1568206
    Change-Id: I51e736aa0431980a595ecf810f148ca62d990d20
    (cherry picked from commit 92527c2de2afaf4862fddc101143e4d02858924d)

commit eee9e58ed258a48c69effef121f55fdaa5b68bd6
Author: Mike Bayer <email address hidden>
Date: Tue Feb 9 13:10:57 2016 -0500

    Add an option for WSGI pool size

    Neutron currently hardcodes the number of
    greenlets used to process requests in a process to 1000.
    As detailed in
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html

    this can cause requests to wait within one process
    for available database connection while other processes
    remain available.

    By adding a wsgi_default_pool_size option functionally
    identical to that of Nova, we can lower the number of
    greenlets per process to be more in line with a typical
    max database connection pool size.

    DocImpact: a previously unused configuration value
               wsgi_default_pool_size is now used to a...

tags: added: neutron-proactive-backport-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b1

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

tags: removed: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.