Configuration option to tweak SubnetLen

Bug #1930087 reported by Felipe Reyes
This bug affects 2 people
Affects: Flannel Charm
Status: In Progress
Importance: High
Assigned to: Felipe Reyes
Milestone: 1.29

Bug Description

[Impact]

Environments with big machines running kubernetes-worker, or with a high rate of pod creation and destruction, will see starvation of available IP addresses, since a /24 subnet (flannel's default) can hold at most 254 hosts. Flannel uses a 24h lease, which makes this problem easier to hit.

Currently the configuration passed by the charm omits the SubnetLen key:

https://github.com/charmed-kubernetes/charm-flannel/blob/master/reactive/flannel.py#L178

This should be exposed as a configuration option, and in my opinion the charm should set it to 22 by default, since that would allow environments running CI/CD jobs (e.g. GitLab's CI runner) to run without problems.
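
For reference, flannel reads its network configuration as a JSON document from etcd, so exposing the option amounts to including SubnetLen in that document. A minimal sketch of the config the charm would write (the 10.1.0.0/16 network and the vxlan backend are illustrative values here, not necessarily the charm's defaults):

    import json

    subnet_len = 22  # value of the hypothetical charm option

    network_config = {
        "Network": "10.1.0.0/16",      # cluster-wide CIDR
        "SubnetLen": subnet_len,       # per-node subnet size; flannel defaults to 24
        "Backend": {"Type": "vxlan"},
    }

    # Flannel expects this JSON at its etcd config key
    # (/coreos.com/network/config by default).
    print(json.dumps(network_config))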

Using a /22 per worker and a /16 for the cluster would give room for 64 workers with 1022 IP addresses each:

 ~ $ ipcalc-ng 10.1.0.0/16 --split 22 | grep Network | wc -l
64
 ~ $ ipcalc-ng 10.1.252.0/22
Network: 10.1.252.0/22
Netmask: 255.255.252.0 = 22
Broadcast: 10.1.255.255

Address space: Private Use
HostMin: 10.1.252.1
HostMax: 10.1.255.254
Hosts/Net: 1022
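
The same numbers fall out of the prefix arithmetic; a quick check with the Python standard library (using the same example network as above):

    import ipaddress

    cluster = ipaddress.ip_network("10.1.0.0/16")

    # Split the /16 cluster network into /22 per-worker subnets.
    subnets = list(cluster.subnets(new_prefix=22))
    print(len(subnets))                  # 64 workers
    print(subnets[0].num_addresses - 2)  # 1022 usable hosts per worker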

[Workaround]

This script can be used to set the SubnetLen manually - https://paste.ubuntu.com/p/Bzr83SCb9H/ - but be aware that the charm may overwrite the change, so you may need to install a cron job to keep the configuration consistent over time.
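
The paste remains the reference; in essence such a script does a read-modify-write of flannel's config key in etcd. A minimal sketch of the idea, assuming the etcd v2 API and flannel's default /coreos.com/network/config key (the endpoint and certificate paths are placeholders you would need to adjust):

    import json
    import subprocess

    ETCDCTL = [
        "etcdctl",
        "--endpoints", "https://127.0.0.1:2379",  # placeholder endpoint
        "--cert-file", "/path/to/client.crt",     # placeholder client cert
        "--key-file", "/path/to/client.key",      # placeholder client key
        "--ca-file", "/path/to/ca.crt",           # placeholder CA cert
    ]
    KEY = "/coreos.com/network/config"

    # Read the current flannel network config, set SubnetLen, write it back.
    config = json.loads(subprocess.check_output(ETCDCTL + ["get", KEY]))
    config["SubnetLen"] = 22
    subprocess.check_call(ETCDCTL + ["set", KEY, json.dumps(config)])

Running something like this periodically from cron is what keeps the setting in place if the charm rewrites the config.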

Felipe Reyes (freyes)
tags: added: sts
Changed in charm-flannel:
assignee: nobody → Felipe Reyes (freyes)
George Kraft (cynerva) wrote:

> This should be exposed as a configuration option, and in my opinion the charm should set it to 22 by default, since that would allow environments running CI/CD jobs (e.g. GitLab's CI runner) to run without problems.

Whatever we set the default to, we will need to make sure that it is handled correctly when upgrading existing clusters that already have the default at 24.
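
One way to do that (a sketch with a hypothetical helper, not the charm's actual code) is to apply the new default only when the network config is first written, so upgraded clusters keep their existing /24:

    def effective_subnet_len(option_value, existing_config):
        # option_value: the hypothetical subnet-len charm option, None if unset.
        # existing_config: flannel config already in etcd, None on fresh deploys.
        if option_value is not None:
            return option_value  # operator explicitly chose a value
        if existing_config is not None:
            # Upgrade path: keep whatever the cluster already uses (24 today).
            return existing_config.get("SubnetLen", 24)
        return 22  # the new default applies to fresh deployments only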

Changed in charm-flannel:
importance: Undecided → High
status: New → Triaged
Felipe Reyes (freyes) wrote:

I was implementing a patch and realized that changing the default from /24 to /22 on a running system will have an impact, since pods would need to request a new IP address to get the correct netmask. A /22 could be pushed into FCB instead, since that would only be used for new deployments.

Felipe Reyes (freyes)
Changed in charm-flannel:
status: Triaged → In Progress
Nathan M. (codge) wrote:

So this bug is over a year old now. Is there any movement to get this patched in Juju?

We've had rather long periods of downtime attributed to the default /24 subnet over two years of running production workloads with Juju.

Adam Dyess (addyess)
Changed in charm-flannel:
milestone: none → 1.28
Adam Dyess (addyess)
Changed in charm-flannel:
milestone: 1.28 → 1.28+ck1
Adam Dyess (addyess)
tags: added: backport-needed
Adam Dyess (addyess)
Changed in charm-flannel:
milestone: 1.28+ck1 → 1.29