Note that I think the best stopgap we can do right now, something that is actually viable to get into 1.25, is to limit the parallelism in juju run so that we're not forking N processes simultaneously, where N is the number of machines in the environment. This is a pretty trivial fix, and it will at least give us some breathing room.
Finding every major memory leak in 500,000 LOC is not something we're likely to get done in any kind of short timeframe. Even once we do fix those leaks, forking will still be a problem: if you have 150 machines in your environment, we're effectively multiplying Juju's memory footprint by 150. So the parallelism limit described above will still be needed and useful.
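
To make the idea concrete, here is a rough sketch of capping the number of tasks in flight, using a buffered channel as a counting semaphore. This is not the actual juju run code: runLimited, peakConcurrency, the machine names, and the limit of 10 are all made up for illustration, and a short sleep stands in for the forked command.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited is a hypothetical helper, not part of Juju. It calls fn once
// per target with at most limit calls in flight at any moment, and returns
// the results in the same order as targets.
func runLimited(targets []string, limit int, fn func(target string) error) []error {
	if limit < 1 {
		limit = 1
	}
	errs := make([]error, len(targets))
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	for i, target := range targets {
		sem <- struct{}{} // blocks here while limit calls are already running
		wg.Add(1)
		go func(i int, target string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call finishes
			errs[i] = fn(target)
		}(i, target)
	}
	wg.Wait()
	return errs
}

// peakConcurrency pushes n dummy tasks through runLimited and reports the
// largest number that were ever running at the same time.
func peakConcurrency(n, limit int) int {
	var mu sync.Mutex
	inFlight, peak := 0, 0
	targets := make([]string, n)
	for i := range targets {
		targets[i] = fmt.Sprintf("machine-%d", i) // made-up machine names
	}
	runLimited(targets, limit, func(string) error {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()
		time.Sleep(time.Millisecond) // stand-in for the forked command
		mu.Lock()
		inFlight--
		mu.Unlock()
		return nil
	})
	return peak
}

func main() {
	// 150 machines, but never more than 10 tasks running at once.
	fmt.Println("peak in flight:", peakConcurrency(150, 10))
}
```

With 150 targets and a limit of 10, the reported peak can never exceed 10, so the memory multiplier is bounded by the limit rather than by the size of the environment. The right value for the limit is a tuning question this sketch does not try to answer.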