Creating an Octavia load balancer gets into ERROR state unless failed over manually

Bug #2037300 reported by Diko Parvanov
Affects                   Status        Importance   Assigned to   Milestone
OpenStack Octavia Charm   Triaged       High         Unassigned
octavia                   In Progress   Medium       Unassigned

Bug Description

Running 1.26/stable k8s + openstack-integrator on a jammy/yoga OpenStack cloud: all load balancers created by the k8s API silently fail in Octavia with provisioning status ERROR and no amphorae getting built. After manually running 'openstack loadbalancer failover <UUID>', the amphorae are built and the load balancer gets provisioned.

Revision history for this message
Michael Johnson (johnsom) wrote :

Can you provide the log snippet from the Octavia controller worker process that set the ERROR state?

Revision history for this message
Diko Parvanov (dparv) wrote (last edit ):

Seems this is the error traceback:

2023-09-26 05:04:35.315 2682774 ERROR oslo_messaging.rpc.server novaclient.exceptions.Forbidden: Quota exceeded, too many server groups. (HTTP 403) (Request-ID: req-5fa48cb3-0591-4afc-a8cc-bd8e12092ed9)

and

2023-09-26 05:04:35.315 2682774 ERROR oslo_messaging.rpc.server octavia.common.exceptions.ServerGroupObjectCreateException: Failed to create server group object.

Full: https://pastebin.ubuntu.com/p/QgRJYJ3xH3/

The security group quota for the project is set to 200, and the number of security groups in that project is 26.

openstack quota show -c secgroups XXXX
+-----------+-------+
| Field     | Value |
+-----------+-------+
| secgroups | 200   |
+-----------+-------+

openstack security group list --project XXXX -c ID -f value | wc -l
26

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

Hi, Please check the quota for server-groups (and maybe also server-group-members) in the project that controls the Octavia resources (most of the deployment tools use the "admin" tenant).

When using Active-Standby topology, you need one server-group for each load balancer.
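
For reference, a minimal python-novaclient sketch for checking those values (the auth endpoint, credentials, and project name below are placeholders, not taken from this deployment):

from keystoneauth1 import loading, session as ks_session
from novaclient import client as nova_client

# Authenticate against the project that owns the Octavia resources
# (often "services" or "admin", depending on the deployment tool).
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='https://keystone.example:5000/v3',
    username='admin', password='secret',
    project_name='services',
    user_domain_name='Default', project_domain_name='Default')
sess = ks_session.Session(auth=auth)

nova = nova_client.Client('2.1', session=sess)
quotas = nova.quotas.get(sess.get_project_id())
print(quotas.server_groups)         # nova default is 10
print(quotas.server_group_members)  # nova default is 10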

Revision history for this message
Diko Parvanov (dparv) wrote :

Indeed - the issue was with server-groups, fixed with:

openstack quota set <octavia_services_project> --server-groups -1

no longer affects: charm-kubernetes-master
no longer affects: charm-kubernetes-worker
Changed in octavia:
status: New → Invalid
Revision history for this message
Diko Parvanov (dparv) wrote :

I still don't understand why openstack loadbalancer failover overrides this, though; it seems to be inconsistent behavior.

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

hmm, right, this is weird, I'll check that in the code.

what was the previous value in the quota?

BTW, I think that defining a default quota for Octavia should be handled by the deployment tool; maybe they will have to fix that.

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote (last edit ):

It might be a bug in Octavia.

The two amphora VMs are created in the same server group to enforce the nova anti-affinity policy.

But if the creation of the server group fails, the amphorae are not created, the LB goes into ERROR, and load_balancer.server_group_id in the DB is NULL.

During a failover, Octavia recreates the amphora VMs based on this load_balancer.server_group_id value, which is empty, so the anti-affinity policy is not applied correctly (but the VMs are created successfully).
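
As a rough illustration of that flow (hypothetical names and simplified logic, not Octavia's actual task/flow code):

# Simplified sketch of the behavior described above; all names are made up.
QUOTA_EXHAUSTED = True  # nova: "Quota exceeded, too many server groups."

def create_server_group():
    if QUOTA_EXHAUSTED:
        raise RuntimeError("Failed to create server group object.")
    return "server-group-uuid"

def boot_amphorae(server_group_id):
    # With server_group_id=None nova gets no anti-affinity hint,
    # so the boot itself succeeds either way.
    return ["amphora-1", "amphora-2"]

class LoadBalancer:
    server_group_id = None
    provisioning_status = "PENDING_CREATE"

lb = LoadBalancer()

# Create path: the server-group failure aborts the flow before any VM boots.
try:
    lb.server_group_id = create_server_group()
    boot_amphorae(lb.server_group_id)
except RuntimeError:
    lb.provisioning_status = "ERROR"  # server_group_id stays NULL in the DB

# Failover path: reuses the stored (NULL) server_group_id, so the amphorae
# boot, but without the anti-affinity policy being enforced.
boot_amphorae(lb.server_group_id)
lb.provisioning_status = "ACTIVE"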

Changed in octavia:
importance: Undecided → Medium
status: Invalid → Confirmed
Revision history for this message
Diko Parvanov (dparv) wrote :

The charm should handle the quotas, so I'm adding that project to this bug as well. The deployment is using channel yoga/stable, revision 115. The server-groups quota had the default value of 10.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

In theory, the octavia charm already sets the quotas to unlimited, in set_service_quotas_unlimited():

def set_service_quotas_unlimited(identity_service):
    """Set services project quotas to unlimited.

    :param identity_service: reactive Endpoint of type ``identity-service``
    :type identity_service: RelationBase class
    :returns: None
    :rtype: None
    :raises: api_crud.APIUnavailable
    """
    try:
        _ul = -1
        session = session_from_identity_service(identity_service)
        nova = get_nova_client(session)
        nova.quotas.update(
            identity_service.service_tenant_id(),
            cores=_ul, ram=_ul, instances=_ul)
        nc = init_neutron_client(session)
        nc.update_quota(
            identity_service.service_tenant_id(),
            body={
                "quota": {
                    "port": _ul, "security_group": _ul,
                    "security_group_rule": _ul, "network": _ul, "subnet": _ul,
                    "floatingip": _ul, "router": _ul, "rbac_policy": _ul}})
    ...

This is run from the 'configure-resources' action on the charm. Could you please verify that the 'configure-resources' action was run? (In theory it must have been, as otherwise Octavia wouldn't be set up to work, but it's always worth checking!)

It may be that there's a bug in which quotas are being set.
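
For what it's worth, a sketch of what extending that nova quota update inside the same function could look like (assuming novaclient's quotas.update accepts the server_groups and server_group_members keys, which nova's quota API exposes):

        # Sketch only: also lift the server-group quotas that this bug hit.
        nova.quotas.update(
            identity_service.service_tenant_id(),
            cores=_ul, ram=_ul, instances=_ul,
            server_groups=_ul, server_group_members=_ul)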

Thanks.

Changed in charm-octavia:
status: New → Incomplete
Revision history for this message
Diko Parvanov (dparv) wrote :

The fce log shows it was successfully run - maybe there is an issue here with the Octavia API and yoga? I just checked several other jammy/yoga clouds, and all of them have server-groups in the services project set to 10.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Thanks @dparv; this needs to be investigated further. As a workaround, I guess the fix from comment #4 should be used for the moment.

Changed in charm-octavia:
status: Incomplete → Triaged
importance: Undecided → High
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/octavia/+/898212

Changed in octavia:
status: Confirmed → In Progress