Sending amphora heartbeats fails when the LB reaches a certain size

Bug #2025262 reported by Gabriel Hartmann
Affects    Status    Importance    Assigned to    Milestone
octavia    New       Undecided     Unassigned

Bug Description

The amphora-agent appears unable to send heartbeat messages larger than the maximum size of a UDP datagram.

We noticed this on a very large LB with 106 listeners, 106 pools and 26 members per pool.
That's 2862 resources in total.

After adding some debugging output in octavia.amphorae.backends.health_daemon.health_sender, a "[Errno 90] Message too long" socket error could be observed.
In that case the payload size of a UDP heartbeat was 66250 bytes.

It seems that we are limited to a length of 65507 bytes with UDP and IPv4.
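
The 65507-byte figure follows from the 16-bit IPv4 total-length field (65535 bytes) minus a 20-byte IPv4 header without options and the 8-byte UDP header, so the observed 66250-byte payload is 743 bytes too large. The following is a minimal sketch of that arithmetic, not code from the amphora-agent; the function names and the destination address are hypothetical and chosen only for illustration.

```python
import socket

IPV4_MAX_PACKET = 65535   # largest value of the 16-bit IPv4 total-length field
IPV4_MIN_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8            # fixed UDP header size


def max_udp_payload_ipv4():
    """Largest UDP payload that fits in a single IPv4 packet."""
    return IPV4_MAX_PACKET - IPV4_MIN_HEADER - UDP_HEADER


def fits_in_one_datagram(payload_len):
    """True if a payload of this length can be sent as one UDP datagram."""
    return payload_len <= max_udp_payload_ipv4()


def try_send(payload_len, addr=("127.0.0.1", 5555)):
    """Send a dummy payload over UDP; return None on success, else the errno.

    The address is an arbitrary placeholder (assumption); no listener is needed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(b"\x00" * payload_len, addr)
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    print(max_udp_payload_ipv4())            # 65507
    print(fits_in_one_datagram(66250))       # False
    print(66250 - max_udp_payload_ipv4())    # 743 bytes over the limit
    # On Linux, the oversized send is expected to fail with errno.EMSGSIZE,
    # matching the "[Errno 90] Message too long" error reported above.
    print(try_send(66250))
```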

As a result, the Octavia health manager constantly triggers failovers on the amphorae of the LB.
On a new amphora, the first few heartbeats succeed until the haproxy config is deployed.
After that, no heartbeats go through.

This was observed on amphoras with Ubuntu 20.04.4 LTS and amphora-agent 9.0.2.dev34.

Gregory Thiemonge (gthiemonge) wrote :

Thanks for reporting this issue.

We have an RFE on storyboard that mitigates this packet size issue:
[RFE] Optionally send amphora heart beat messages via TCP https://storyboard.openstack.org/#!/story/2010216

A commit was proposed:
852269: Health sender: support to send via TCP | https://review.opendev.org/c/openstack/octavia/+/852269

It got one CR+2 and needs more reviews.
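
For readers wondering why TCP avoids the limit: a TCP stream has no per-message size cap, so a sender only needs some way to mark message boundaries. The sketch below illustrates one common approach, a length prefix, and is not the code from the proposed patch; the framing format and all function names here are assumptions made for illustration.

```python
import socket
import struct
import threading


def send_framed(sock, payload):
    """Send a 4-byte big-endian length prefix followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_framed(sock):
    """Read one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def roundtrip(payload):
    """Send a payload across a local stream socket pair and return what arrives."""
    sender, receiver = socket.socketpair()
    result = []
    # Read in a separate thread so the sender cannot block on a full buffer.
    reader = threading.Thread(target=lambda: result.append(recv_framed(receiver)))
    reader.start()
    try:
        send_framed(sender, payload)
        reader.join()
    finally:
        sender.close()
        receiver.close()
    return result[0]


if __name__ == "__main__":
    # 66250 bytes is the heartbeat size that failed over UDP in this report.
    echoed = roundtrip(b"x" * 66250)
    print(len(echoed))
```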

Gabriel Hartmann (gagaha) wrote :

Thank you for pointing me to the RFE.

We deployed the fix and I can confirm that it works and fixes the issue.
