[master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| netplan | Triaged | Low | Unassigned | |
| heartbeat (Ubuntu) | Won't Fix | Low | Unassigned | |
| keepalived (Ubuntu) | In Progress | Medium | Athos Ribeiro | |
| keepalived (Ubuntu Bionic) | Confirmed | Medium | Athos Ribeiro | |
| keepalived (Ubuntu Focal) | Confirmed | Undecided | Athos Ribeiro | |
| systemd (Ubuntu) | Fix Released | Medium | Unassigned | |
| systemd (Ubuntu Xenial) | Won't Fix | Medium | Unassigned | |
| systemd (Ubuntu Bionic) | Fix Released | Medium | Eric Desrochers | |
| systemd (Ubuntu Disco) | Won't Fix | Medium | Unassigned | |
| systemd (Ubuntu Eoan) | Fix Released | Medium | Unassigned | |
| systemd (Ubuntu Focal) | Fix Released | Undecided | Unassigned | |
Bug Description
[impact]
- All related HA software has a small problem when interfaces are managed by systemd-networkd: NIC restarts and reconfigurations always wipe all interface aliases at moments the HA software does not expect, because there is no coordination between them.
- keepalived, Samba CTDB and pacemaker all suffer from this. Pacemaker copes better because its service monitor restarts the virtual IP resource on the affected node and NIC before declaring a real failure, but other HA services may treat the wipe as a real failure when it is not.
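The restart-before-failover behavior attributed to pacemaker can be modeled as a tiny state machine. This is a hypothetical sketch, not pacemaker's code: `VipResource`, `monitor_step` and the threshold value are illustrative assumptions, loosely modeled on pacemaker's `migration-threshold` resource option.

```python
from dataclasses import dataclass


@dataclass
class VipResource:
    """Hypothetical floating-IP resource as seen by a cluster monitor."""

    # Assumed tolerance: failures allowed on this node before failing over,
    # loosely modeled on pacemaker's migration-threshold meta-attribute.
    migration_threshold: int = 3
    failcount: int = 0


def monitor_step(res: VipResource, vip_present: bool) -> str:
    """Return the action for one monitor check: "ok", "restart" or "failover".

    "restart" means re-adding the VIP on the same node and NIC, which is how a
    transient wipe (e.g. a systemd-networkd restart) gets repaired locally.
    """
    if vip_present:
        return "ok"
    res.failcount += 1
    if res.failcount >= res.migration_threshold:
        return "failover"
    return "restart"


if __name__ == "__main__":
    # One networkd restart wipes the alias once; the monitor repairs it locally.
    vip = VipResource()
    print([monitor_step(vip, present) for present in (True, False, True)])
```

With `migration_threshold=1`, the same single wipe yields `"failover"`, which roughly corresponds to an HA stack that treats any missing alias as a real failure.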
[test case]
- Comment #14 contains a full test case: a 3-node pacemaker cluster in which restarting the networkd service triggers a failure of the virtual IP resource monitor.
- Another example, for keepalived, is given in the original description below. Both suffer from the same issue, as does other HA software.
[regression potential]
- This backports the KeepConfiguration parameter, which adds significant complexity to networkd's configuration and behavior. This could lead to regressions such as misconfiguring the network at networkd start, incorrectly maintaining configuration at networkd restart, or losing network state at networkd stop.
- Any regressions are most likely to occur during networkd start, restart, or stop, and most likely to involve missing or incorrect IP address(es).
- The change is based on upstream patches that add the exact feature needed to fix this issue, and it will be integrated with a netplan change that adds the needed stanza to the systemd NIC configuration file (KeepConfigurat
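The fix centers on systemd-networkd's `KeepConfiguration=` setting, which lives in the `[Network]` section of a `.network` file. A minimal sketch, assuming the installed systemd includes the backport and supports `.network` drop-in directories, that renders such a drop-in. The drop-in file name `keep-configuration.conf`, the netplan-generated name `10-netplan-eth2.network` and the choice of `static` are assumptions for illustration, not what the netplan integration actually emits.

```python
from pathlib import PurePosixPath

# Values listed for KeepConfiguration= in systemd.network(5); other boolean
# spellings (true/false, 1/0, ...) are also accepted there but omitted here.
KEEP_CONFIGURATION_VALUES = {"yes", "no", "static", "dhcp-on-stop", "dhcp"}


def keep_configuration_dropin(network_file: str, mode: str = "static") -> tuple[PurePosixPath, str]:
    """Return (path, contents) of a drop-in that sets KeepConfiguration=<mode>.

    network_file is the basename of an existing .network file; the drop-in
    directory is <network_file>.d/ under /etc/systemd/network. The file name
    keep-configuration.conf is an arbitrary, hypothetical choice.
    """
    if mode not in KEEP_CONFIGURATION_VALUES:
        raise ValueError(f"unsupported KeepConfiguration value: {mode!r}")
    if not network_file.endswith(".network"):
        raise ValueError(f"not a .network file name: {network_file!r}")
    path = PurePosixPath("/etc/systemd/network") / f"{network_file}.d" / "keep-configuration.conf"
    contents = f"[Network]\nKeepConfiguration={mode}\n"
    return path, contents


if __name__ == "__main__":
    # "10-netplan-eth2.network" is an assumed name for netplan's generated file.
    path, contents = keep_configuration_dropin("10-netplan-eth2.network")
    print(path)
    print(contents, end="")
```

The function only builds the path and text; writing the file and reloading networkd are left out on purpose, since those steps depend on the release in use.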
[other info]
original description:
---
Configure netplan for the interfaces, for example (a working config with IP addresses obfuscated):
```yaml
network:
  ethernets:
    eth0:
      dhcp4: false
    eth2:
      addresses:
        - 12.13.14.18/29
        - 12.13.14.19/29
      dhcp4: false
    eth3:
      dhcp4: false
    eth4:
      dhcp4: false
    eth7:
      dhcp4: false
  version: 2
```
Configure keepalived (again, a working config with IP addresses obfuscated):
```
global_defs # Block id
{
    notification_email {
        <email address hidden>
    }
    smtp_server 10.22.11.7 # IP
    router_id system3 # string identifying the machine
}

vrrp_sync_group collection {
    group {
        wan
        lan
    }
}

vrrp_instance wan {
    state MASTER
    interface eth2
    priority 150
    advert_int 1
    smtp_alert
    virtual_ipaddress {
        12.13.14.20
    }
}

vrrp_instance lan {
    state MASTER
    interface eth3
    priority 150
    advert_int 1
    smtp_alert
}

vrrp_instance phone {
    state MASTER
    interface eth4
    priority 150
    advert_int 1
    smtp_alert
}
```
At boot, the affected interfaces have:
```
5: eth4: <BROADCAST,
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
       valid_lft forever preferred_lft forever
    inet 10.22.14.3/24 scope global secondary eth4
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:
       valid_lft forever preferred_lft forever
7: eth3: <BROADCAST,
    link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
    inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.22.11.13/24 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:
       valid_lft forever preferred_lft forever
9: eth2: <BROADCAST,
    link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
    inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
       valid_lft forever preferred_lft forever
    inet 12.13.14.20/32 scope global eth2
       valid_lft forever preferred_lft forever
    inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:
       valid_lft forever preferred_lft forever
```
Running `netplan try` (without making any changes to the configuration) makes the keepalived addresses disappear, never to return. The affected interfaces then have:
```
5: eth4: <BROADCAST,
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:
       valid_lft forever preferred_lft forever
7: eth3: <BROADCAST,
    link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
    inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:
       valid_lft forever preferred_lft forever
9: eth2: <BROADCAST,
    link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
    inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
       valid_lft forever preferred_lft forever
    inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:
       valid_lft forever preferred_lft forever
```
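The before/after snapshots in this report amount to a per-interface set difference. A small Python sketch, assuming the plain `ip addr show` text format shown above (the function names are hypothetical), that collects IPv4 addresses per interface and reports which ones vanished, for example when checking a node before and after `systemctl restart systemd-networkd`:

```python
import re
from collections import defaultdict

# "9: eth2: <BROADCAST,..." starts an interface; "inet a.b.c.d/nn ..." is IPv4.
IFACE_RE = re.compile(r"^\s*\d+:\s+([^:\s]+):")
INET_RE = re.compile(r"^\s*inet\s+(\S+)")


def ipv4_by_iface(ip_addr_output: str) -> dict[str, set[str]]:
    """Map interface name -> IPv4 "address/prefix" strings found in `ip addr` text."""
    result: dict[str, set[str]] = defaultdict(set)
    current = None
    for line in ip_addr_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        inet = INET_RE.match(line)  # "inet6" lines do not match
        if inet and current is not None:
            result[current].add(inet.group(1))
    return dict(result)


def lost_addresses(before: str, after: str) -> dict[str, set[str]]:
    """Addresses present in `before` but missing from `after`, per interface."""
    old, new = ipv4_by_iface(before), ipv4_by_iface(after)
    lost = {iface: addrs - new.get(iface, set()) for iface, addrs in old.items()}
    return {iface: addrs for iface, addrs in lost.items() if addrs}


if __name__ == "__main__":
    before = "9: eth2: <BROADCAST,\n    inet 12.13.14.18/29 scope global eth2\n    inet 12.13.14.20/32 scope global eth2\n"
    after = "9: eth2: <BROADCAST,\n    inet 12.13.14.18/29 scope global eth2\n"
    print(lost_addresses(before, after))  # {'eth2': {'12.13.14.20/32'}}
```

Applied to the two full snapshots in this report, it reports 10.22.14.3/24 on eth4, 10.22.11.13/24 on eth3 and 12.13.14.20/32 on eth2 as lost, which are exactly the keepalived-managed addresses.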
Related branches
- Dan Streetman (community): Approve
- Christian Ehrhardt (community): Approve
- Balint Reczey: Pending requested
- Dimitri John Ledkov: Pending requested
- Canonical Server: Pending requested

Diff: 674 lines (+628/-0), 7 files modified:
- debian/changelog (+11/-0)
- debian/patches/lp1815101-01-networkd-add-support-to-keep-configuration.patch (+209/-0)
- debian/patches/lp1815101-02-networkd-stop-clients-when-networkd-shuts-down.patch (+94/-0)
- debian/patches/lp1815101-03-network-add-KeepConfiguration-dhcp-on-stop.patch (+155/-0)
- debian/patches/lp1815101-04-network-make-KeepConfiguration-static-drop-DHCP-addr.patch (+93/-0)
- debian/patches/lp1815101-05-man-add-documentation-about-KeepConfiguration.patch (+61/-0)
- debian/patches/series (+5/-0)
summary: netplan removes keepalived configuration → Restarting systemd-networkd breaks keepalived clusters
summary: Restarting systemd-networkd breaks keepalived clusters → [master] Restarting systemd-networkd breaks keepalived clusters
Changed in netplan:
status: Invalid → Confirmed
Changed in keepalived (Ubuntu):
status: Triaged → Confirmed
Changed in systemd (Ubuntu):
status: Triaged → Confirmed
Changed in keepalived (Ubuntu Bionic):
status: New → Confirmed
Changed in keepalived (Ubuntu Disco):
status: New → Confirmed
Changed in heartbeat (Ubuntu Bionic):
importance: Undecided → Medium
status: New → Triaged
Changed in heartbeat (Ubuntu Disco):
importance: Undecided → Medium
status: New → Triaged
Changed in heartbeat (Ubuntu Eoan):
importance: Undecided → Low
status: New → Triaged
Changed in heartbeat (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in heartbeat (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in heartbeat (Ubuntu Eoan):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
summary: [master] Restarting systemd-networkd breaks keepalived clusters → [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)
tags: added: sts
description: updated
description: updated
Changed in keepalived (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
importance: Undecided → Medium
status: New → Confirmed
no longer affects: heartbeat (Ubuntu Xenial)
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
importance: Undecided → Medium
status: New → Confirmed
tags: added: ddstreet
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Jorge Niedbalski (niedbalski)
status: Confirmed → In Progress
Changed in systemd (Ubuntu Bionic):
assignee: Jorge Niedbalski (niedbalski) → Eric Desrochers (slashd)
Changed in systemd (Ubuntu Bionic):
assignee: Eric Desrochers (slashd) → nobody
Changed in netplan:
status: Incomplete → Triaged
Changed in keepalived (Ubuntu):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Changed in keepalived (Ubuntu Xenial):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Changed in keepalived (Ubuntu Bionic):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
no longer affects: keepalived (Ubuntu Xenial)
Changed in keepalived (Ubuntu Focal):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
This isn't netplan, it's systemd-networkd. Netplan only writes configuration for the chosen renderer (in this case, systemd-networkd).
Either systemd needs to stop wiping out foreign addresses (I believe there is a PR in git for that), or keepalived should somehow interface with systemd so that the two can collaborate on setting and keeping up the IP addresses.
Reassigning.