ensure every machine has a nrpe unit on it

Bug #1893272 reported by Andrea Ieri
This bug affects 1 person
Affects: Juju Lint
Status: Fix Released
Importance: High
Assigned to: Gabriel Cocenza
Milestone: 1.0.5

Bug Description

Consider a model where a machine hosts several units in LXD containers but has no principal unit of its own. juju-lint alerts on principal charms that lack an nrpe subordinate, but it does not report the lack of nrpe on a machine with no principal.

In addition to validating relations between nrpe and principal units, we should ensure that every machine in the model has an nrpe unit on it.
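
The requested check could be sketched roughly as follows. This is not juju-lint's code; it assumes the status comes from `juju status --format json` with top-level `machines` and `applications` keys, where each unit carries a `machine` ID and an optional `subordinates` mapping. The function name `machines_missing_subordinate` is made up for illustration.

```python
from typing import Any, Dict, List


def machines_missing_subordinate(status: Dict[str, Any], sub_app: str = "nrpe") -> List[str]:
    """Return top-level machine IDs that host no unit of `sub_app` directly.

    Assumed (not verified against every Juju version) status layout:
    status["applications"][app]["units"][unit] = {"machine": ..., "subordinates": {...}}
    """
    covered = set()
    for app in status.get("applications", {}).values():
        for unit in app.get("units", {}).values():
            for sub_unit in unit.get("subordinates", {}):
                # Unit names look like "nrpe/3"; keep the application part.
                if sub_unit.split("/")[0] == sub_app:
                    covered.add(unit.get("machine"))
    # Only top-level machines are listed here; containers are nested under them.
    return sorted(m for m in status.get("machines", {}) if m not in covered)


example = {
    "machines": {"0": {}, "1": {"containers": {"1/lxd/0": {}}}},
    "applications": {
        "ceph-osd": {"units": {"ceph-osd/0": {"machine": "0", "subordinates": {"nrpe/0": {}}}}},
        "keystone": {"units": {"keystone/0": {"machine": "1/lxd/0", "subordinates": {"nrpe/1": {}}}}},
    },
}
print(machines_missing_subordinate(example))
```

In this made-up example, machine 1 only hosts a container, so the nrpe unit in `1/lxd/0` does not count for the host and machine 1 is reported.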

Tags: bseng-142

Andrea Ieri (aieri) wrote :

On second thought, this affects more than just nrpe: if you have a machine with no principal, you will also be missing ntp / telegraf / etc.

Shall we perhaps have lists of "mandatory units" that all machines and/or containers must have?
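
One possible shape for such lists, as a sketch only: a mapping from location kind to the set of subordinate applications that must be present. The contents of `mandatory`, the function name `missing_mandatory`, and the assumed `juju status --format json` layout (units with `machine` and `subordinates`, machines with a nested `containers` mapping) are all illustrative assumptions, not juju-lint configuration.

```python
from typing import Any, Dict, List, Set


def missing_mandatory(status: Dict[str, Any], mandatory: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each machine or container ID to the mandatory subordinates it lacks."""
    present: Dict[str, Set[str]] = {}
    for app in status.get("applications", {}).values():
        for unit in app.get("units", {}).values():
            subs = present.setdefault(unit.get("machine"), set())
            subs.update(name.split("/")[0] for name in unit.get("subordinates", {}))

    report: Dict[str, List[str]] = {}
    for machine_id, machine in status.get("machines", {}).items():
        locations = [(machine_id, "machine")]
        locations += [(c, "container") for c in machine.get("containers", {})]
        for location, kind in locations:
            missing = mandatory.get(kind, set()) - present.get(location, set())
            if missing:
                report[location] = sorted(missing)
    return report


# Hypothetical rule set: hosts need nrpe and ntp, containers only nrpe.
mandatory = {"machine": {"nrpe", "ntp"}, "container": {"nrpe"}}
status = {
    "machines": {"0": {"containers": {"0/lxd/0": {}}}},
    "applications": {
        "keystone": {
            "units": {
                "keystone/0": {"machine": "0/lxd/0", "subordinates": {"nrpe/0": {}, "telegraf/0": {}}}
            }
        }
    },
}
print(missing_mandatory(status, mandatory))
```

Keeping separate sets for machines and containers allows for cases such as ntp, which may only be wanted on the host.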

James Hebden (ec0) wrote :

Reviewing the code after the refactor, the current logic tracks subordinates at the machine level, and the lint rules are checked against the set of subordinates on each machine. Have you seen this with the current code base? If it is still a problem with the latest snap, do you have an example YAML that shows the behaviour?

Changed in juju-lint:
status: New → Incomplete
importance: Undecided → High
Andrea Ieri (aieri) wrote :

I have re-tried linting my model with juju-lint 1.1.dev11+ge9499f8 and the problem persists.

JSON status output showing the issue is available here (internal link): https://private-fileshare.canonical.com/~aieri/lp1893272.json

Machines 18, 19, 20, and 23 (not an exhaustive list) have no principal charms deployed on them, but juju-lint is not throwing any warning about missing subordinates.

Changed in juju-lint:
status: Incomplete → New
Gabriel Cocenza (gabrielcocenza) wrote :

Hi Andrea. Could you try running with the changes in this MR and see whether it works?
https://code.launchpad.net/~gabrielcocenza/juju-lint/+git/juju-lint/+merge/422918

After running it, I could see that the apps canonical-livepatch, ceilometer-agent, hacluster-vault, landscape-haproxy, landscape-postgresql, lldpd, memcached, ntp, thruk-agent are missing a relation with nrpe on the "nrpe-external-master" endpoint.

I could also see that the apps aodh, bcache-tuning, cinder-ceph, cloudstats, designate, designate-bind, dns-policy-routing, easyrsa, external-policy-routing, filebeat, gnocchi, heat, keystone-ldap, landscape-client, landscape-server, logrotate, neutron-openvswitch, neutron-openvswitch-sriov, telegraf, telegraf-prometheus were missing a relation with nrpe on the "juju-info" endpoint, because they don't have "nrpe-external-master".

This approach focuses on relations rather than on whether the subordinate is present on a machine. I think this makes sense because a subordinate is only deployed on a machine when the relation exists.
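
A relation-based check of this kind might look like the sketch below. It is not the code from the merge request; it assumes each application in the status JSON has a `relations` mapping from endpoint name to a list of related application names (newer Juju versions may use a different shape), and the function name `apps_not_related_to` is hypothetical.

```python
from typing import Any, Dict, List


def apps_not_related_to(status: Dict[str, Any], subordinate: str = "nrpe") -> List[str]:
    """Return applications that have no relation with `subordinate` on any endpoint."""
    unrelated = []
    for name, app in status.get("applications", {}).items():
        if name == subordinate:
            continue
        # Flatten {endpoint: [peer, ...]} into the set of related applications.
        peers = {peer for related in app.get("relations", {}).values() for peer in related}
        if subordinate not in peers:
            unrelated.append(name)
    return sorted(unrelated)


status = {
    "applications": {
        "nrpe": {"relations": {"nrpe-external-master": ["keystone"]}},
        "keystone": {"relations": {"nrpe-external-master": ["nrpe"], "cluster": ["keystone"]}},
        "ntp": {"relations": {"juju-info": ["keystone"]}},
    }
}
print(apps_not_related_to(status))
```

Because the check looks only at relations, it needs no per-machine bookkeeping; the trade-off is that it relies on the relation being the only way a subordinate ends up on a machine.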

Eric Chen (eric-chen)
Changed in juju-lint:
assignee: nobody → Canonical BootStack DevOps Centre (canonical-bootstack-doc)
assignee: Canonical BootStack DevOps Centre (canonical-bootstack-doc) → nobody
assignee: nobody → Eric Chen (eric-chen)
assignee: Eric Chen (eric-chen) → nobody
Eric Chen (eric-chen)
tags: added: bseng-142
Changed in juju-lint:
assignee: nobody → Gabriel Angelo Sgarbi Cocenza (gabrielcocenza)
Changed in juju-lint:
status: New → Fix Committed
Changed in juju-lint:
milestone: none → 1.0.5
Changed in juju-lint:
status: Fix Committed → Fix Released