Relations aren't added when deploying from a bundle

Bug #1931632 reported by james beedy
This bug affects 2 people
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone:

Bug Description

Hello,

We are experiencing an issue where juju doesn't add relations between applications when a bundle is deployed.

I can deploy my charms and relate them by hand, but when I deploy a bundle, only some or no relations are made.

See https://github.com/canonical/operator/issues/551#issuecomment-859031955

applications:
  slurmctld:
    charm: slurmctld
    channel: stable
    num_units: 1
  slurmd:
    charm: slurmd
    channel: stable
    num_units: 1
  slurmdbd:
    charm: slurmdbd
    channel: stable
    num_units: 1
  percona-cluster:
    charm: cs:percona-cluster-293
    series: bionic
    num_units: 1
relations:
  - - slurmdbd:db
    - percona-cluster:db
  - - slurmctld:slurmdbd
    - slurmdbd:slurmdbd
  - - slurmctld:slurmd
    - slurmd:slurmd

Deploying the above bundle fails to create the relations, whereas deploying the components and making the relations by hand works.
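
For anyone trying to reproduce this, a small check like the following can list which of the bundle's relations did not come up. This is only a sketch: `missing_relations` is a hypothetical helper, and the `observed` list is made-up example data standing in for whatever `juju status --relations` shows on a real model.

```python
def normalize(pair):
    """Return an order-independent key for a relation between two endpoints."""
    return frozenset(pair)


def missing_relations(expected, established):
    """List the expected endpoint pairs that are absent from `established`."""
    established_keys = {normalize(pair) for pair in established}
    return [list(pair) for pair in expected if normalize(pair) not in established_keys]


# Relations declared in the bundle above.
BUNDLE_RELATIONS = [
    ["slurmdbd:db", "percona-cluster:db"],
    ["slurmctld:slurmdbd", "slurmdbd:slurmdbd"],
    ["slurmctld:slurmd", "slurmd:slurmd"],
]

if __name__ == "__main__":
    # Hypothetical observation: only the database relation was created.
    observed = [["percona-cluster:db", "slurmdbd:db"]]
    for left, right in missing_relations(BUNDLE_RELATIONS, observed):
        # Print the manual command that would add the missing relation.
        print(f"juju relate {left} {right}")
```

Endpoint order is ignored when comparing, since a relation can be listed with either side first.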

Ian Booth (wallyworld) wrote :

Adding to a milestone so we can get someone to try and reproduce this to see what's happening.

Changed in juju:
milestone: none → 2.9.5
importance: Undecided → High
status: New → Triaged
Changed in juju:
milestone: 2.9.5 → 2.9.6
Heitor (heitorpbittencourt) wrote :

This is curious to me. Sometimes the relations are added, sometimes some of the relations are added, and sometimes none of the relations are added.

How can I debug this further?

Heitor (heitorpbittencourt) wrote :

Occasionally, this bug makes my units go into a weird state. Sometimes the relations are added but not accounted for.

I deployed the same bundle James described above. The slurmdbd <-> slurmctld relation was added, as I can see in `juju status --relations`, but the charms did not "see" it. I have the impression that the relation-created hook did not fire, because when I tried to relate the applications manually, juju gave an error saying the relation already exists.

I removed the relation and re-added it, to make the units run the relation-created functions, but I saw an error in the debug-log that I've never seen before:

machine-140: 11:09:29 ERROR unit.slurmctld/35.juju-log slurmdbd:265: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 403, in <module>
    main(SlurmctldCharm)
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/venv/ops/main.py", line 404, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/venv/ops/framework.py", line 732, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/venv/ops/framework.py", line 767, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/src/interface_slurmrestd.py", line 54, in _on_relation_created
    if not self._charm.slurmdbd_info:
  File "./src/charm.py", line 84, in slurmdbd_info
    return self._slurmdbd.get_slurmdbd_info()
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/src/interface_slurmdbd.py", line 106, in get_slurmdbd_info
    relation = self._relation
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/src/interface_slurmdbd.py", line 102, in _relation
    return self.framework.model.get_relation(self._relation_name)
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/venv/ops/model.py", line 143, in get_relation
    return self.relations._get_unique(relation_name, relation_id)
  File "/var/lib/juju/agents/unit-slurmctld-35/charm/venv/ops/model.py", line 470, in _get_unique
    raise TooManyRelatedAppsError(relation_name, num_related, 1)
ops.model.TooManyRelatedAppsError: Too many remote applications on slurmdbd (2 > 1)

I have only 1 slurmdbd and 1 slurmctld.

This is with Juju controller 2.9.4, juju client 2.9.5 (latest/candidate), and the units running on CentOS7.
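
For readers unfamiliar with the error, the traceback shows `get_relation` delegating to a uniqueness check that raises when an endpoint has more than one relation. The sketch below is a simplified stand-in for that check, not the ops source: the class and `get_unique` are re-implemented here for illustration, and the relation ids in the usage lines are hypothetical apart from 265, which appears in the log line above.

```python
class TooManyRelatedAppsError(Exception):
    """Stand-in for ops.model.TooManyRelatedAppsError (not the real class)."""

    def __init__(self, relation_name, num_related, max_supported):
        super().__init__(
            f"Too many remote applications on {relation_name} "
            f"({num_related} > {max_supported})"
        )


def get_unique(relations, relation_name):
    """Return the single relation on an endpoint, or None if there is none.

    `relations` maps an endpoint name to the list of relations the unit
    currently knows about on that endpoint.
    """
    matches = relations.get(relation_name, [])
    if len(matches) > 1:
        raise TooManyRelatedAppsError(relation_name, len(matches), 1)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(get_unique({"slurmdbd": [265]}, "slurmdbd"))
    try:
        # Hypothetical: an older relation id is still visible next to 265.
        get_unique({"slurmdbd": [264, 265]}, "slurmdbd")
    except TooManyRelatedAppsError as err:
        print(err)
```

Under this reading, the "2 > 1" counts relations visible to the unit on the endpoint, so it can occur even with a single slurmdbd application if more than one relation id is visible there.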

Changed in juju:
milestone: 2.9.6 → 2.9.7
John A Meinel (jameinel) wrote :

So units don't fire "relation-joined" until they have completed their 'start' hook (until then, you don't see other units, and they don't see you).

If your application was in a stuck state, it may not have progressed through start to get the rest of the hooks.

It may also be a case where, if you remove the relation before it gets to relation-joined, we *did* fire relation-created but then fail to fire relation-broken to inform the charm that the relation is now gone.

If you still have that situation can you run something like:

 `juju run --unit slurmdbd/0 relation-ids DBENDPOINT`

to see what relations the controller thinks that you have.
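
The output of that command can be grouped per endpoint to spot leftovers. A minimal sketch, assuming `relation-ids` prints one `endpoint:id` entry per line (as in the outputs pasted in this thread); `group_relation_ids` is a hypothetical helper, and the two-id sample is invented for illustration.

```python
from collections import defaultdict


def group_relation_ids(output):
    """Group `relation-ids` entries such as 'db:238' by endpoint name."""
    grouped = defaultdict(list)
    for entry in output.split():
        endpoint, sep, number = entry.rpartition(":")
        if not sep or not number.isdigit():
            continue  # skip anything that is not in endpoint:id form
        grouped[endpoint].append(int(number))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical output with two ids on the same endpoint.
    sample = "slurmdbd:264\nslurmdbd:265\n"
    for endpoint, ids in group_relation_ids(sample).items():
        note = " (more than one relation id)" if len(ids) > 1 else ""
        print(f"{endpoint}: {ids}{note}")
```

More than one id on an endpoint that should carry a single relation would be consistent with a relation that was created but never reported as broken.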

Pen Gale (pengale)
Changed in juju:
status: Triaged → Incomplete
Changed in juju:
milestone: 2.9.7 → 2.9.8
Changed in juju:
milestone: 2.9.8 → 2.9.9
Heitor (heitorpbittencourt) wrote :

Today this happened again, with juju controller 2.9.5 and juju client 2.9.8 (2.9/candidate).

The peer relations are added correctly, but some of the regular relations are missing. I have some commands here: https://paste.ubuntu.com/p/BFyk7ZqBsx/

Heitor (heitorpbittencourt) wrote :

(followup commands from previous pastebin)

After removing the relation and re-adding it, we can see it exists now:

$ juju remove-relation slurmdbd percona-cluster
$ juju relate slurmdbd percona-cluster
$ juju run --unit slurmdbd/23 relation-ids db
db:238
$ juju run --unit slurmdbd/23 "relation-list -r db:238"
percona-cluster/8

John A Meinel (jameinel) wrote :

So from the original paste:
```
$ juju run --unit slurmdbd/23 relation-ids db
db:233
$ juju run --unit slurmdbd/23 "relation-list -r db:233"

(previous line is empty)
```

That would indicate that the relation exists (there is a relation-id for it), but either slurmdbd/23 or percona-cluster/8 has not entered scope yet for the other side to see it. You don't enter scope until you have successfully completed the start hook.
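
The distinction between "relation exists" and "remote units are in scope" can be sketched as a small classifier over the two command outputs. This is an illustration only: `relation_state` and its three labels are hypothetical names, and the inputs are the raw text printed by `relation-ids` and `relation-list`.

```python
def relation_state(relation_ids_output, relation_list_output):
    """Classify a relation from the text printed by the two hook tools."""
    if not relation_ids_output.split():
        return "no-relation"  # the unit knows of no relation id on the endpoint
    if not relation_list_output.split():
        return "created-not-joined"  # id exists, but no remote unit is in scope
    return "joined"  # at least one remote unit has entered scope


if __name__ == "__main__":
    # Outputs quoted in this thread: db:233 with an empty relation-list,
    # then db:238 listing percona-cluster/8 after the relation was re-added.
    print(relation_state("db:233", ""))
    print(relation_state("db:238", "percona-cluster/8"))
```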

One of the earlier posts hinted that a charm hook was failing with an exception, which could put the charm into an error state so that it wouldn't move on to the next hooks.

Your 'juju status' from the paste says that everything is either active or blocked, and nothing is in error, which would indicate that the units shouldn't be prevented from getting through 'start'. We'd need to see 'juju show-status-log --type unit slurmdbd/23', and the same for the other units, to see what hooks have actually been fired.

It would also be useful to grab debug-log from both the running model and the controller model, to see what has been going on in the system (is there an error somewhere, etc).

Heitor (heitorpbittencourt) wrote :

```
$ juju show-status-log --type unit slurmdbd/23
Time Type Status Message
08 Jul 2021 11:49:52-03:00 juju-unit executing running slurmdbd-relation-changed hook for slurmctld/22
08 Jul 2021 11:49:53-03:00 workload active slurmdbd available
08 Jul 2021 11:49:53-03:00 workload waiting Starting slurmdbd
08 Jul 2021 11:49:54-03:00 workload active slurmdbd available
08 Jul 2021 11:49:54-03:00 workload waiting Starting slurmdbd
08 Jul 2021 11:49:55-03:00 workload active slurmdbd available
08 Jul 2021 11:49:55-03:00 workload waiting Starting slurmdbd
08 Jul 2021 11:49:57-03:00 workload active slurmdbd available
08 Jul 2021 11:49:57-03:00 workload waiting Starting slurmdbd
08 Jul 2021 11:49:58-03:00 workload active slurmdbd available
08 Jul 2021 11:49:58-03:00 juju-unit executing running slurmdbd-relation-changed hook
08 Jul 2021 11:49:58-03:00 juju-unit idle
08 Jul 2021 11:50:52-03:00 juju-unit executing running action juju-run
08 Jul 2021 11:50:52-03:00 juju-unit idle
08 Jul 2021 11:50:55-03:00 juju-unit executing running action juju-run
08 Jul 2021 11:50:55-03:00 juju-unit idle
08 Jul 2021 11:51:06-03:00 juju-unit executing running action juju-run
08 Jul 2021 11:51:07-03:00 juju-unit idle
08 Jul 2021 11:51:10-03:00 juju-unit executing running action juju-run
08 Jul 2021 11:51:11-03:00 juju-unit idle

$ juju show-status-log --type unit percona-cluster/8
Time Type Status Message
08 Jul 2021 10:44:21-03:00 juju-unit executing running start hook
08 Jul 2021 10:44:21-03:00 juju-unit executing running db-relation-joined hook for slurmdbd/22
08 Jul 2021 10:44:22-03:00 juju-unit executing running db-relation-changed hook for slurmdbd/22
08 Jul 2021 10:44:22-03:00 juju-unit idle
08 Jul 2021 11:22:55-03:00 juju-unit executing running db-relation-departed hook for slurmdbd/22
08 Jul 2021 11:22:56-03:00 juju-unit executing running db-relation-broken hook
08 Jul 2021 11:22:56-03:00 juju-unit idle
08 Jul 2021 11:23:44-03:00 juju-unit executing running db-relation-created hook
08 Jul 2021 11:23:44-03:00 juju-unit idle
08 Jul 2021 11:25:54-03:00 juju-unit executing running db-relation-joined hook for slurmdbd/23
08 Jul 2021 11:25:54-03:00 juju-unit executing running db-relation-changed hook for slurmdbd/23
08 Jul 2021 11:25:55-03:00 juju-unit idle
08 Jul 2021 11:49:26-03:00 juju-unit executing running db-relation-departed hook for slurmdbd/23
08 Jul 2021 11:49:26-03:00 juju-unit executing running db-relation-broken hook
08 Jul 2021 11:49:26-03:00 juju-unit idle
08 Jul 2021 11:49:35-03:00 juju-unit executing running db-relation-created hook
08 Jul 2021 11:49:35-03:00 juju-unit executing running db-relation-joined hook for slurmdbd/23
08 Jul 2021 11:49:35-03:00 juju-unit executing running db-relation-changed hook for slurmdbd/23
08 Jul 2021 11:49:36-03:00 juju-unit idle
08 Jul 2021 13:09:29-03:00 workload active Unit is ready
```
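
To see at a glance which hooks a unit has run, the status log above can be filtered mechanically. A minimal sketch, assuming hook entries always read "running <hook> hook" with an optional "for <unit>" suffix, as they do in this output; `hooks_fired` is a hypothetical helper, and the sample lines are copied from the percona-cluster/8 log.

```python
import re

# Matches messages like "running db-relation-joined hook for slurmdbd/22".
HOOK_RE = re.compile(r"running (\S+) hook(?: for (\S+))?\s*$")


def hooks_fired(status_log):
    """Return (hook_name, remote_unit_or_None) for each hook entry, in order."""
    fired = []
    for line in status_log.splitlines():
        match = HOOK_RE.search(line)
        if match:
            fired.append((match.group(1), match.group(2)))
    return fired


SAMPLE = """\
08 Jul 2021 10:44:21-03:00 juju-unit executing running start hook
08 Jul 2021 10:44:21-03:00 juju-unit executing running db-relation-joined hook for slurmdbd/22
08 Jul 2021 10:44:22-03:00 juju-unit idle
08 Jul 2021 13:09:29-03:00 workload active Unit is ready
"""

if __name__ == "__main__":
    hooks = hooks_fired(SAMPLE)
    print(hooks)
    # True when a 'start' hook shows up in the portion of the log examined.
    print(any(name == "start" for name, _ in hooks))
```

Entries such as "running action juju-run" are skipped because they do not name a hook.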

Not sure if this means much, but this model is not new; I have `juju deploy`ed and removed a lot of units and applications in it.

I grab...


Changed in juju:
milestone: 2.9.9 → 2.9.10
Pen Gale (pengale)
Changed in juju:
status: Incomplete → Triaged
Changed in juju:
milestone: 2.9.10 → 2.9.11
Changed in juju:
milestone: 2.9.11 → 2.9.12
Changed in juju:
milestone: 2.9.12 → 2.9.13
Changed in juju:
milestone: 2.9.13 → 2.9.14
Changed in juju:
milestone: 2.9.14 → 2.9.15
Changed in juju:
milestone: 2.9.15 → 2.9.16
Changed in juju:
milestone: 2.9.16 → 2.9.17