Wrong haproxy configuration in HA setup

Bug #1916498 reported by Hybrid512
This bug affects 1 person
Affects: OpenStack Octavia Charm
Status: Triaged
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

I'm still facing many issues with Octavia, and while trying to pinpoint the root causes I found something weird in the haproxy configuration maintained by hacluster.

Here is my /etc/haproxy/haproxy.conf from one of my Octavia nodes:

----------------------------------------------------------------------
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 20000
    user haproxy
    group haproxy
    spread-checks 0

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    retries 3
    timeout queue 9000
    timeout connect 9000
    timeout client 90000
    timeout server 90000

listen stats
    bind 127.0.0.1:8888
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth admin:6pmtHODVsjK8VqLC5GTgRwDFFtjVNzfu

frontend tcp-in_octavia-api_admin
    bind *:9876
    bind :::9876
    acl net_192.168.203.162 dst 192.168.203.162/255.255.255.0
    use_backend octavia-api_admin_192.168.203.162 if net_192.168.203.162
    acl net_192.168.211.24 dst 192.168.211.24/255.255.255.0
    use_backend octavia-api_admin_192.168.211.24 if net_192.168.211.24
    default_backend octavia-api_admin_192.168.203.162

backend octavia-api_admin_192.168.203.162
    balance leastconn
    server octavia-0 192.168.203.162:9866 check
    server octavia-1 192.168.211.25:9866 check
    server octavia-2 192.168.211.26:9866 check

backend octavia-api_admin_192.168.211.24
    balance leastconn
    server octavia-0 192.168.211.24:9866 check
    server octavia-1 192.168.211.25:9866 check
    server octavia-2 192.168.211.26:9866 check
----------------------------------------------------------------------

As you can see, there are 2 backends configured, and if I compare with other HA services such as nova-cloud-controller, neutron-api, cinder, etc., there should be only one.

In this configuration, 192.168.203.0/24 is my "PXE, non-routed network" and 192.168.211.0/24 is my "internal" network.
I should mention that this deployment is done with Juju/MAAS.
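
For comparison, this is roughly what I would expect the generated configuration to look like, with a single backend on the "internal" subnet (a sketch based on the addresses above, not the output of a working deployment):

----------------------------------------------------------------------
frontend tcp-in_octavia-api_admin
    bind *:9876
    bind :::9876
    default_backend octavia-api_admin_192.168.211.24

backend octavia-api_admin_192.168.211.24
    balance leastconn
    server octavia-0 192.168.211.24:9866 check
    server octavia-1 192.168.211.25:9866 check
    server octavia-2 192.168.211.26:9866 check
----------------------------------------------------------------------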

I ran some curl requests against the different <IP>:9866 endpoints and they don't even respond the same way.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Please could you indicate ubuntu version, openstack version and charms versions that this is happening with. Thanks.

Changed in charm-octavia:
status: New → Incomplete
Revision history for this message
Hybrid512 (walid-moghrabi) wrote :

Sure, sorry for forgetting that:

* Ubuntu version: 20.04 (Focal)
* OpenStack version: Ussuri ("distro", based on Focal)
* Charms version: 20.01 (more specifically, octavia-32 and hacluster-74)
* MAAS version: 2.9.2 (snap)
* Juju version: 2.8.8

Regards.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

So, that backend for 192.168.203.162 looks very strange, i.e. it mixes addresses from two networks (or there should probably just be one backend). This is either a bug (that part of the code has been tweaked recently) or a configuration problem. Let's rule out the latter.

Do you have the bundle (including spaces) for the configuration of the octavia charm with its hacluster units? The charm seems to be picking up the 192.168.203.162 network as something it is listening on, in addition to 192.168.211.24.

Thanks.
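
For reference, one way to capture that (a sketch; the exact output depends on your Juju version) is to export the model as a bundle and list the spaces and subnets the model sees:

----------------------------------------------------------------------
# Export the deployed model as a bundle (includes bindings/spaces):
juju export-bundle > octavia-model-bundle.yaml

# Show the spaces and subnets Juju knows about:
juju spaces
juju subnets
----------------------------------------------------------------------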

Revision history for this message
Hybrid512 (walid-moghrabi) wrote :

I couldn't answer until now because I destroyed my cluster; I'm redeploying a new one and will get you the information then.

TBH I would definitely say it's a bug, because this is the only unit showing this behaviour in its hacluster configuration.
All the others are configured the same way with hacluster (3 nodes, only 1 space and the same 2 subnets) and they all show only 1 backend, with the 3 nodes on the right space (the "internal" subnet).

Here is a snippet of my bundle. "ost-int" is my "internal" space (subnet 192.168.211.0/24); 192.168.203.0/24 corresponds to my MAAS "PXE" space (i.e. the subnet dedicated to provisioning machines with DHCP/PXE).
Just to note: Octavia is deployed on 3 bare-metal machines rather than in LXD containers, in order to work around this issue: https://bugs.launchpad.net/charm-layer-ovn/+bug/1896630

-------------------------------------------------------------------------------
applications:
  barbican:
    charm: cs:barbican
    num_units: 3
    to:
    - lxd:7
    - lxd:8
    - lxd:9
    options:
      openstack-origin: distro
      vip: 192.168.211.230
      worker-multiplier: 0.25
    bindings:
      "": ost-int

  barbican-hacluster:
    charm: cs:hacluster
    options:
      cluster_count: 3
    bindings:
      "": ost-int

  barbican-mysql-router:
    charm: cs:mysql-router
    bindings:
      "": ost-int

  barbican-vault:
    charm: cs:barbican-vault
    bindings:
      "": ost-int

  octavia:
    charm: cs:octavia
    num_units: 3
    to:
    - 7
    - 8
    - 9
    options:
      openstack-origin: distro
      vip: 192.168.211.231
      worker-multiplier: 0.25
    bindings:
      "": ost-int

  octavia-dashboard:
    charm: cs:octavia-dashboard
    bindings:
      "": ost-int

  octavia-diskimage-retrofit:
    charm: cs:octavia-diskimage-retrofit
    options:
      amp-image-tag: octavia-amphora
    bindings:
      "": ost-int

  octavia-hacluster:
    charm: cs:hacluster
    options:
      cluster_count: 3
    bindings:
      "": ost-int

  octavia-mysql-router:
    charm: cs:mysql-router
    bindings:
      "": ost-int

  octavia-ovn-chassis:
    charm: cs:ovn-chassis
    bindings:
      "": ost-int

relations:
## HA
- - barbican:ha
  - barbican-hacluster:ha
- - octavia:ha
  - octavia-hacluster:ha

## Barbican
- - barbican-mysql-router:db-router
  - mysql-innodb-cluster:db-router
- - barbican-mysql-router:shared-db
  - barbican:shared-db
- - barbican:certificates
  - vault:certificates
- - keystone:identity-service
  - barbican:identity-service
- - rabbitmq-server:amqp
  - barbican:amqp
- - barbican-vault:secrets
  - barbican:secrets
- - vault:secrets
  - barbican-vault:secrets-storage

## Octavia
- - octavia-mysql-router:db-router
  - mysql-innodb-cluster:db-router
- - octavia-mysql-router:shared-db
  - octavia:shared-db
- - octavia:certificates
  - vault:certificates
- - keystone:identity-service
  - octavia:identity-service
- - rabbitmq-server:amqp
  - octavia:amqp
- - neutron-api:neutron-load-balancer
  - octavia:neutron-api
- - octavia-ovn-chassis:ovsdb-subordinate
  - octavia:ovsdb-subordinate
- - octavia-ovn-chassis:certificates
  - vault:certificates
- - octavia-ovn-chassis:ovsdb
  - ovn-central:ovsdb
- - octavia:ov...


Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Thanks for reporting this, Hybrid512. Can you give us an example of what is not working for you with Octavia that you think is caused by this presumably wrong haproxy configuration?

Changed in charm-octavia:
status: Incomplete → New
Changed in charm-octavia:
status: New → Incomplete
Revision history for this message
Hybrid512 (walid-moghrabi) wrote :

Hi Aurelien,

First, the API is very slow to respond; in Horizon, I constantly get error messages popping up saying it can't find a load balancer even though I have some created.
Every action takes a long time, such as retrieving the nova AZs or the subnets (but most of the time it eventually gets them anyway).
As for assigning a FIP to a LB, it says it fails but in fact it succeeds, though sometimes I have to do it more than once ...
Well, it misbehaves a lot and it is really slow.
Before you ask, I don't have any high load or low memory on any of my nodes; this might be Octavia's fault or not.

Curling the HAProxy backend endpoints on the wrong subnet doesn't give me the right answer (which is normal, since the Apache configuration they target doesn't listen on that subnet), but it still returns an HTTP 200 code because the default Apache vhost answers instead. This fools haproxy: the backend looks healthy thanks to the 200 code, but it doesn't provide what the API expects.
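
To illustrate (addresses taken from the first dump above; the exact responses will of course vary, and you may need https:// and -k if TLS is enabled):

----------------------------------------------------------------------
# Backend address on the "internal" subnet: the Octavia API answers.
curl -si http://192.168.211.25:9866/ | head -n 1

# Same port on the PXE subnet: the API doesn't listen there, yet the
# default Apache vhost still answers with a 200.
curl -si http://192.168.203.162:9866/ | head -n 1
----------------------------------------------------------------------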

But basically, as you can see, there is something wrong with the haproxy configuration.
I deployed a fresh cluster yesterday; here is what I have on some other services and on Octavia.
They are all deployed in HA mode with 3 nodes and a VIP via the hacluster subordinate:

octavia/0:
=========
--------------------------------------------------------------------------------
frontend tcp-in_octavia-api_admin
    bind *:9876
    bind :::9876
    acl net_192.168.203.119 dst 192.168.203.119/255.255.255.0
    use_backend octavia-api_admin_192.168.203.119 if net_192.168.203.119
    acl net_192.168.211.26 dst 192.168.211.26/255.255.255.0
    use_backend octavia-api_admin_192.168.211.26 if net_192.168.211.26
    default_backend octavia-api_admin_192.168.203.119

backend octavia-api_admin_192.168.203.119
    balance leastconn
    server octavia-0 192.168.203.119:9866 check
    server octavia-1 192.168.211.27:9866 check
    server octavia-2 192.168.211.28:9866 check

backend octavia-api_admin_192.168.211.26
    balance leastconn
    server octavia-0 192.168.211.26:9866 check
    server octavia-1 192.168.211.27:9866 check
    server octavia-2 192.168.211.28:9866 check
--------------------------------------------------------------------------------

octavia/1:
=========
--------------------------------------------------------------------------------
frontend tcp-in_octavia-api_admin
    bind *:9876
    bind :::9876
    acl net_192.168.203.120 dst 192.168.203.120/255.255.255.0
    use_backend octavia-api_admin_192.168.203.120 if net_192.168.203.120
    acl net_192.168.211.27 dst 192.168.211.27/255.255.255.0
    use_backend octavia-api_admin_192.168.211.27 if net_192.168.211.27
    default_backend octavia-api_admin_192.168.203.120

backend octavia-api_admin_192.168.203.120
    balance leastconn
    server octavia-1 192.168.203.120:9866 check
    server octavia-0 192.168.211.26:9866 check
    server octavia-2 192.168.211.28:9866 check

backend octavia-api_admin_192.168.211.27
    balance leastconn
    server octavia-1 192.168.211.27:9866 check
    server octavia-0 192.168.211.26:9866 check
    server octavia-2 192.168.211.28:9866 check
-------------------------------------------------...


Changed in charm-octavia:
status: Incomplete → New
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Setting to critical as I think the haproxy config is wrong for the charm. We need to see if it is reproduced with the current -next charms, and if not, backport the fix to the stable charms. It's probably due to the charm-helpers changes around picking the local address, introduced in https://github.com/juju/charm-helpers/pull/561 -- i.e. this may have already been solved.
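
One way to check whether the -next charm already behaves correctly (a sketch, assuming the usual openstack-charmers-next namespace in the charm store) is to switch a test deployment over and inspect the regenerated haproxy configuration:

----------------------------------------------------------------------
# Switch the charm to the -next build, then re-check the haproxy config
# rendered on the units:
juju upgrade-charm octavia --switch cs:~openstack-charmers-next/octavia
----------------------------------------------------------------------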

Changed in charm-octavia:
importance: Undecided → Critical
status: New → Triaged