named.conf.options.inside.maas reverts to default
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Fix Released | High | Björn Tillenius | |
2.8 | Fix Released | High | Björn Tillenius | |
Bug Description
Default /var/snap/
-----
dnssec-validation auto;
allow-query { any; };
allow-recursion { trusted; };
allow-query-cache { trusted; };
-----
After changing the upstream_dns config setting to our upstream DNS, and turning off dnssec_validation (our upstream DNS is a bit broken in that regard), it changes to this:
-----
forwarders {
x.x.x.x;
};
dnssec-validation no;
allow-query { any; };
allow-recursion { trusted; };
allow-query-cache { trusted; };
-----
At this point, DNS works fine (in this case, we have a Juju-deployed OpenStack, and all machines/containers can resolve).
You wait... time passes. Services start timing out, and you discover DNS no longer works in your machines/
Looking at /var/snap/
A quick fix is possible by changing the config (we set dnssec_validation to "yes" and then back to "no") which regenerates the named.conf.
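A minimal sketch of how the reversion described above could be detected automatically, rather than waiting for services to time out. It parses the text of the options file and compares the forwarders and dnssec-validation values with what was configured. The function names and the two sample strings are hypothetical, written for illustration; they are not part of MAAS, and the file path is left to the caller.

```python
import re

# Sample contents copied from the two states shown in this report.
DEFAULT_OPTIONS = """
dnssec-validation auto;
allow-query { any; };
allow-recursion { trusted; };
allow-query-cache { trusted; };
"""

CONFIGURED_OPTIONS = """
forwarders {
x.x.x.x;
};
dnssec-validation no;
allow-query { any; };
allow-recursion { trusted; };
allow-query-cache { trusted; };
"""


def parse_options(text):
    """Extract the forwarders list and dnssec-validation value (hypothetical helper)."""
    forwarders = []
    block = re.search(r"forwarders\s*\{([^}]*)\}", text)
    if block:
        # Entries inside the block are separated by semicolons.
        forwarders = [f.strip() for f in block.group(1).split(";") if f.strip()]
    dnssec = re.search(r"dnssec-validation\s+(\w+)\s*;", text)
    return {
        "forwarders": forwarders,
        "dnssec_validation": dnssec.group(1) if dnssec else None,
    }


def has_reverted(text, expected_forwarders, expected_dnssec):
    """Return True when the file no longer matches the configured settings."""
    opts = parse_options(text)
    return (
        opts["forwarders"] != list(expected_forwarders)
        or opts["dnssec_validation"] != expected_dnssec
    )


if __name__ == "__main__":
    print(has_reverted(DEFAULT_OPTIONS, ["x.x.x.x"], "no"))
    print(has_reverted(CONFIGURED_OPTIONS, ["x.x.x.x"], "no"))
```

Run periodically against the real file contents, a check like this would flag the moment the file falls back to its defaults.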
We have observed this on:
2.8.1-8567-
2.7.1-8261-
We haven't seen this behaviour on our pre-snap system (2.4.2-
In case it's relevant, on our 2.8.1/2.7.1 systems we're running dual region/rack controllers for redundancy, so we're also using an external postgres. Our 2.4.2 is a single region/rack controller.
I'm hoping this is reproducible elsewhere. Downloading logs from the affected systems is difficult, and I don't currently have access to them. If my logs are necessary, I will add them when I can.
Related branches
- MAAS Lander: Approve
- Björn Tillenius: Approve
- Diff: 13 lines (+2/-1), 1 file modified: snap/local/tree/bin/run-named (+2/-1)
- Alberto Donato: Approve
- Diff: 13 lines (+2/-1), 1 file modified: snap/local/tree/bin/run-named (+2/-1)
Changed in maas:
milestone: none → next
status: In Progress → Fix Committed
Changed in maas:
milestone: next → 2.9.0b4
Changed in maas:
status: Fix Committed → Fix Released
I've reproduced this on a test system to which I have full access.
At the time DNS starts failing, /var/snap/maas/common/log/named.log shows:
25-Jul-2020 09:44:34.333 ../../../lib/dns/rbtdb.c:1499: fatal error:
25-Jul-2020 09:44:34.333 RUNTIME_CHECK(rbtdb->next_serial != 0) failed
25-Jul-2020 09:44:34.333 exiting (due to fatal error in library)
25-Jul-2020 09:44:38.443 starting BIND 9.11.3-1ubuntu1.12-Ubuntu (Extended Support Version) <id:a375815>
So BIND crashes and is restarted - with a bad configuration.
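To find when this happened on other systems, the crash-then-restart pattern in the log can be searched for mechanically. The following is a rough sketch that assumes the timestamp format seen in the excerpt above; the function name and the abbreviated sample lines are made up for illustration.

```python
from datetime import datetime

# Abbreviated sample lines in the same format as the named log excerpt.
SAMPLE_LOG = [
    "25-Jul-2020 09:44:34.333 exiting (due to fatal error in library)",
    "25-Jul-2020 09:44:38.443 starting BIND 9.11.3-",
]


def find_crash_restarts(lines):
    """Return (crash_time, restart_time) pairs for each fatal exit followed by a start."""
    pairs = []
    crash_time = None
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            stamp = datetime.strptime(
                parts[0] + " " + parts[1], "%d-%b-%Y %H:%M:%S.%f"
            )
        except ValueError:
            # Skip lines that do not begin with a timestamp.
            continue
        message = parts[2]
        if message.startswith("exiting (due to fatal error"):
            crash_time = stamp
        elif message.startswith("starting BIND") and crash_time is not None:
            pairs.append((crash_time, stamp))
            crash_time = None
    return pairs


if __name__ == "__main__":
    for crashed, restarted in find_crash_restarts(SAMPLE_LOG):
        print(crashed, "->", restarted)
```

Each pair marks a point at which the options file is worth inspecting, since the report suggests the bad configuration appears when BIND is restarted after a crash.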
I think the crash is actually due to changing the dnssec-validation option and reloading, and restarting instead of reloading prevents this crash. This means I should use a different 'quick fix'!
However, I can't find anything in the logs as to why the configuration file was changed, which is the real issue here.