Designate

Bind backend rndc commands aren't limited

Bug #1896783 reported by Michael Chapman on 2020-09-23

This bug affects 1 person

Affects     Status          Importance   Assigned to       Milestone
Designate   Fix Committed   High         Michael Chapman

Bug Description

In a local test environment I misconfigured pools.yaml to point at BIND servers that don't exist, then ran tempest. The result was about 1500 rndc processes that never seem to exit, causing significant load on the system.
The environment is deployed using TripleO, so the processes are all in the designate-worker container:

[root@controller-1 ~]# podman stats designate_worker

ID            NAME              CPU %  MEM USAGE / LIMIT  MEM %   NET IO   BLOCK IO           PIDS
18103f368b4e  designate_worker  1.73%  19.13GB / 33.56GB  57.02%  -- / --  4.882GB / 8.536MB  2076

Steps to reproduce:

1. Deploy devstack with the bind backend.
2. Edit /etc/designate/pools.yaml so that rndc_host doesn't point at a BIND server.
3. Update the pools: designate-manage pool update --file /etc/designate/pools.yaml
4. Run tempest: tox -e all-plugin -- designate

The process count climbs this high even though the Designate worker service is configured with only 2 workers:
/etc/designate.conf:
[service:worker]
workers = 2

It might make sense to maintain a task queue for the rndc commands so that only a limited number can be active at any given time.
rndc doesn't have a timeout option that I can see in the man page, so it might make sense to add one via oslo processutils. I think the built-in timeout is about 30 seconds.
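
A minimal sketch of both ideas, not Designate's actual code: it uses the standard library's subprocess timeout rather than oslo processutils, and a semaphore rather than a real task queue. The names run_limited, MAX_CONCURRENT_RNDC and RNDC_TIMEOUT_SECONDS, and their values, are made up for illustration.

```python
import subprocess
import threading

# Hypothetical limits; neither value comes from Designate.
MAX_CONCURRENT_RNDC = 4
RNDC_TIMEOUT_SECONDS = 10

# Each running command holds one slot, so at most
# MAX_CONCURRENT_RNDC child processes exist at a time.
_rndc_slots = threading.BoundedSemaphore(MAX_CONCURRENT_RNDC)


def run_limited(cmd, timeout=RNDC_TIMEOUT_SECONDS, slots=_rndc_slots):
    """Run cmd with a hard timeout while holding one concurrency slot.

    Returns (returncode, stdout), or (None, "") if the command timed out.
    """
    with slots:
        try:
            # On timeout, subprocess.run kills the child and waits for it,
            # so no stray process is left behind.
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            return None, ""
        return proc.returncode, proc.stdout
```

For example, run_limited(["rndc", "-s", "192.0.2.1", "status"]) would give up after 10 seconds instead of lingering (192.0.2.1 is a placeholder address). Callers beyond the fourth block until a slot frees up, so the number of live rndc processes stays bounded.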


Michael Chapman (michaeltchapman) on 2020-09-23

Changed in designate:
assignee:	nobody → Michael Chapman (michaeltchapman)

Michael Chapman (michaeltchapman) on 2020-09-24

summary:	- Bind backend rndc commands have no timeout
	+ Bind backend rndc commands aren't limited
description:	updated

OpenStack Infra (hudson-openstack) wrote on 2020-11-04: Related fix proposed to designate (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/761274

OpenStack Infra (hudson-openstack) wrote on 2020-11-17: Related fix merged to designate (master)

Reviewed: https://review.opendev.org/761274
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=10f19870c4bc503f414d5beef92c3939d91764d9
Submitter: Zuul
Branch: master

commit 10f19870c4bc503f414d5beef92c3939d91764d9
Author: Michael Chapman <email address hidden>
Date: Wed Nov 4 15:24:43 2020 +1100

    Add timeout to rndc commands

    In the event of a backend BIND server being unreachable for any reason,
    rndc commands will persist for a very long time and can consume
    significant resources. This can be seen when running devstack with
    a pool configured to point at a bind server that doesn't exist - the
    rndc process count can climb into the thousands.

    An optional timeout has been added to rndc to alleviate this.

    Change-Id: Idd61e79715b21fdd3249136cf68a7b9d3069c3f9
    Related-Bug: 1896783

Michael Johnson (johnsom) on 2021-03-19

Changed in designate:
status:	New → Fix Committed
importance:	Undecided → High
