Ubuntu Repository Cache Charm

leader_id stale/incorrect; causes rsync cron job missing on leader unit

Bug #1797297 reported by Haw Loeung on 2018-10-11

This bug affects 2 people

Affects                        Status        Importance  Assigned to  Milestone
Ubuntu Repository Cache Charm  Fix Released  High        Haw Loeung

Bug Description

Hi,

As seen today, leader_id was stale/incorrect:

| ubuntu@machine-0:~$ sudo juju-run ubuntu-repository-cache/0 "leader-get"
| leader_id: ubuntu-repository-cache/2

| ubuntu@machine-2:~$ sudo juju-run ubuntu-repository-cache/2 "leader-get"
| leader_id: ubuntu-repository-cache/2

leader_id only gets set when the leader-elected hook fires. I think we should also set it on config-changed or some other hook to ensure that leader_id isn't stale.

Bit of evidence - https://pastebin.canonical.com/p/9qDdJ6jv45/

| 2018-10-11 01:09:20 WARNING juju-log cluster:1: Leader changed between peer_update_metadata and _nonleader_update_metadata

Or even when the sync job runs from cron:

| 2018-10-11 02:23:36,164 - Executing hook: ['juju-run', 'ubuntu-repository-cache/0', '/var/lib/juju/agents/unit-ubuntu-repository-cache-0/charm/hooks/ubuntu-repository-cache-sync ubuntu_2018-10-11_02:23:01_u0']

Have hooks/ubuntu-repository-cache-sync check and ensure leader_id isn't stale.
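
A minimal sketch of such a check, not the charm's actual code. The function name refresh_leader_id and its callable parameters are hypothetical; in a charm they would presumably be backed by the is-leader, leader-get and leader-set hook tools (for example via charmhelpers.core.hookenv). The same function could be called from config-changed and from the sync script:

```python
def refresh_leader_id(local_unit, is_leader, leader_get, leader_set):
    """Rewrite leader_id when this unit is the leader and the stored value is stale.

    is_leader, leader_get and leader_set are injected callables (hypothetical
    names) standing in for the Juju hook tools of the same purpose.
    Returns True if leader_id was rewritten, False otherwise.
    """
    if not is_leader():
        # Only the current leader is allowed to write leader settings.
        return False
    if leader_get("leader_id") == local_unit:
        # Stored value already matches this unit; nothing to do.
        return False
    leader_set({"leader_id": local_unit})
    return True
```

Because non-leaders return early, calling this from every unit is harmless; only the unit that Juju currently considers the leader will correct the setting.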

Related branches

lp://staging/~hloeung/ubuntu-repository-cache/ensure-leader-id-setting-correct

Merged into lp://staging/ubuntu-repository-cache at revision 295

Barry Price: Approve on 2021-01-08

Canonical IS Reviewers: Pending requested 2021-01-08

Haw Loeung (hloeung) on 2018-10-11

description:	updated

Stuart Bishop (stub) wrote on 2018-10-11:

Updating this leader setting more aggressively will help, and it is a quick fix, so it can be done.

However, a setting like this is not reliable: it can only ever state which unit *was* the leader, and cannot state which unit *is* the leader. The only unit that can reliably know who the leader is is the leader itself (by calling is-leader). We should drop this leadership setting, and the charm should be refactored to only use 'is-leader' and not the stored leader_id setting. To communicate with the leader, place the message on the peer relation. Any message being sent to a specific unit because it *was* the leader is a bug, because there is no guarantee that the unit will still be the leader when the message arrives.
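
A rough sketch of what relying only on is-leader could look like, not the charm's actual code. The helper names (is_leader, run_on_leader) and the injectable run parameter are assumptions for illustration; the only real interface used is the is-leader hook tool with --format=json, which is only available inside a hook or juju-run context:

```python
import json
import subprocess


def is_leader(run=subprocess.check_output):
    """Ask Juju, at the moment of the call, whether this unit holds leadership."""
    out = run(["is-leader", "--format=json"])
    return json.loads(out.decode("utf-8")) is True


def run_on_leader(leader_action, follower_action, run=subprocess.check_output):
    """Choose the code path from a fresh is-leader call, never from a stored leader_id."""
    if is_leader(run):
        return leader_action()
    return follower_action()
```

The point of the design is that leadership is checked right before acting, so a unit never acts on a leader_id value recorded during an earlier election.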

Stuart Bishop (stub) wrote on 2018-10-11:

Ideally, the only thing the lead unit does is select which of the units is primary. We don't want the primary unit in an ubuntu-repository-cache deployment to flap every time there is a netsplit (which will trigger Juju leadership elections and flapping).
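
One way the leader could pick a primary without flapping, as a sketch only. The function names and the tie-break rule (lowest unit number) are assumptions, not something the charm is known to do; the idea is simply that the current primary is kept for as long as it is still present, regardless of which unit holds Juju leadership:

```python
def unit_number(unit_name):
    """Return the numeric suffix of a Juju unit name such as 'app/3'."""
    return int(unit_name.rsplit("/", 1)[1])


def choose_primary(current_primary, live_units):
    """Keep the current primary while it is alive; otherwise pick a new one.

    live_units is the list of unit names the leader can currently see.
    Returns None when there are no live units.
    """
    if current_primary in live_units:
        # Sticky choice: a leadership change alone never moves the primary.
        return current_primary
    if not live_units:
        return None
    # Assumed tie-break: the lowest-numbered live unit becomes primary.
    return min(live_units, key=unit_number)
```

The leader would run this after an election and publish the result; a new leader seeing the same live units arrives at the same primary, so an election by itself changes nothing.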

Haw Loeung (hloeung) wrote on 2020-11-05:

Also LP:1673325

Changed in ubuntu-repository-cache:
status:	New → Triaged
importance:	Undecided → High

Haw Loeung (hloeung) on 2020-11-05

Changed in ubuntu-repository-cache:
assignee:	nobody → Haw Loeung (hloeung)

Haw Loeung (hloeung) on 2020-12-17

Changed in ubuntu-repository-cache:
status:	Triaged → In Progress

Haw Loeung (hloeung) wrote on 2020-12-18:

To add:

2020-12-17 16:17:45 INFO juju.worker.leadership tracker.go:194 ubuntu-repository-cache/1 promoted to leadership of ubuntu-repository-cache
2020-12-17 16:18:07 INFO juju-log Reactive main running for hook leader-elected
tracer: ++ queue handler reactive/ubuntu-repository-cache.py:218:leader_elected
2020-12-17 16:18:07 INFO juju-log Invoking reactive handler: reactive/ubuntu-repository-cache.py:218:leader_elected
2020-12-17 16:18:07 INFO juju-log leader-elected fired. This is not the leader
2020-12-17 16:18:07 INFO juju.worker.uniter.operation runhook.go:142 ran "leader-elected" hook (via explicit, bespoke hook script)
2020-12-17 22:28:25 INFO juju.worker.leadership tracker.go:194 ubuntu-repository-cache/1 promoted to leadership of ubuntu-repository-cache
2020-12-17 22:48:09 INFO juju-log cluster:1: Updating metadata on the leader
2020-12-17 22:48:09 WARNING juju-log cluster:1: Leader changed between peer_update_metadata and _leader_update_metadata

And:

| ubuntu-repository-cache/1* unknown idle 1 20.195.53.225 80/tcp

However:

| ubuntu@machine-1:~$ sudo juju-run ubuntu-repository-cache/1 "leader-get"
| leader_id: ubuntu-repository-cache/0

summary:

- leader_id stale/incorrect
+ leader_id stale/incorrect; causes rsync cron job missing on leader unit

Haw Loeung (hloeung) wrote on 2020-12-28:

Duplicates of this bug

Bug #1885653

Changed in ubuntu-repository-cache:
status:	In Progress → Fix Committed

Changed in ubuntu-repository-cache:
status:	Fix Committed → Fix Released