juju debug-log stops prematurely

Bug #1700453 reported by Paul Gear
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
Tim Penhey
2.2
Fix Released
Critical
Tim Penhey

Bug Description

After upgrade from 2.1.x to 2.2.1, juju debug-log often stops after a few lines of output.

The following in logsink.log on the controller seems to be symptomatic of this problem:

machine-0 2017-06-26 02:06:50 ERROR juju.apiserver debuglog.go:92 debug-log handler error: tailer stopped: "" is not a valid tag

Tags: canonical-is
Paul Gear (paulgear)
tags: added: canonical-is
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.2.2
importance: Undecided → High
status: New → Triaged
importance: High → Critical
Revision history for this message
John A Meinel (jameinel) wrote :

are you passing in any specific arguments to debug-log?

Revision history for this message
Paul Gear (paulgear) wrote :

@jameinel: No - just "juju debug-log" with the appropriate model selected.

Revision history for this message
John A Meinel (jameinel) wrote :

There isn't a clear smoking gun, but I would guess this is from line 691 of state/logs.go:
 entity, err := names.ParseTag(doc.Entity)

If it was that, I would have expected more annotations on the error, namely:
  return nil, errors.Annotate(err, "while parsing entity tag")
and
   return errors.Annotate(err, "deserialization failed (possible DB corruption)")

Every log entry is supposed to be associated with the entity that generated the entry (machine-0/unit-apache-2, etc).

Having an entity of "" looks like bad data in the database. Is it possible to do something like

  db.logs.find({"n": ""})

Or possibly something about Mongo is causing the connection to die and we end up interpreting it as an empty log entry. But I would expect some sort of line like "tcp ... 37017".

Revision history for this message
Tim Penhey (thumper) wrote :

The reason that we don't see the annotated value is that we are returning the tomb.Err() as the logtailer Err() method.

The tomb Err is set in the Kill call in the goroutine started in NewLogTailer, which uses Errors.Cause(err) to deal with the case where we wrap tomb.ErrDying.

Revision history for this message
Tim Penhey (thumper) wrote :

Wondering if there is some case where we get an empty document.

Tim Penhey (thumper)
Changed in juju:
assignee: nobody → Tim Penhey (thumper)
Tim Penhey (thumper)
Changed in juju:
status: Triaged → In Progress
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1700453] Re: juju debug-log stops prematurely

https://github.com/juju/juju/pull/7577

On Fri, Jun 30, 2017 at 8:20 AM, Tim Penhey <email address hidden>
wrote:

> ** Changed in: juju
> Status: Triaged => In Progress
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1700453
>
> Title:
> juju debug-log stops prematurely
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1700453/+subscriptions
>

Tim Penhey (thumper)
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Tim Penhey (thumper) wrote :

Found the underlying reason. We were not ignoring deletion operations from the oplog tailing.

This meant when the log pruner ran, it would return a valid oplog doc, which we would then try to deserialize into a log record, but it didn't have all the right details, hence the empty tag.

Full fix here: https://github.com/juju/juju/pull/7592

Revision history for this message
John A Meinel (jameinel) wrote :

Interestingly, I wasn't seeing it in my scale testing, because I almost
never ran a 'bare' 'juju debug-log', I always filtered for a module, etc,
so it would stay running for a long time (presumably oplog tailer was
filtering modules, etc before deserializing).

On Tue, Jul 4, 2017 at 8:33 AM, Tim Penhey <email address hidden> wrote:

> Found the underlying reason. We were not ignoring deletion operations
> from the oplog tailing.
>
> This meant when the log pruner ran, it would return a valid oplog doc,
> which we would then try to deserialize into a log record, but it didn't
> have all the right details, hence the empty tag.
>
> Full fix here: https://github.com/juju/juju/pull/7592
>
> --
> You received this bug notification because you are subscribed to juju.
> Matching subscriptions: juju bugs
> https://bugs.launchpad.net/bugs/1700453
>
> Title:
> juju debug-log stops prematurely
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/juju/+bug/1700453/+subscriptions
>

Changed in juju:
milestone: 2.2.2 → 2.3-alpha1
status: Fix Committed → In Progress
Revision history for this message
Paul Gear (paulgear) wrote :

I ran bare juju debug-log for quite a while during upgrades of my Canonistack controllers today. I had one instance of debug-log failing unexpectedly which might be attributable to this still being present, but I can't be certain. One interesting side-effect of the fix is that I got about 4 days of backlogs before the current logs started coming through. So it seems that this fix has uncovered a few unseen log messages still hanging around.

Revision history for this message
John A Meinel (jameinel) wrote :

If the controller you're currently connected to is restarted, debug-log
will stop. Is that what you saw or is there something else?

John
=:->

On Jul 17, 2017 11:11 AM, "Paul Gear" <email address hidden> wrote:

I ran bare juju debug-log for quite a while during upgrades of my
Canonistack controllers today. I had one instance of debug-log failing
unexpectedly which might be attributable to this still being present,
but I can't be certain. One interesting side-effect of the fix is that
I got about 4 days of backlogs before the current logs started coming
through. So it seems that this fix has uncovered a few unseen log
messages still hanging around.

--
You received this bug notification because you are subscribed to juju.
Matching subscriptions: juju bugs
https://bugs.launchpad.net/bugs/1700453

Title:
  juju debug-log stops prematurely

To manage notifications about this bug go to:
https://bugs.launchpad.net/juju/+bug/1700453/+subscriptions

Revision history for this message
Paul Gear (paulgear) wrote :

@jameinel: I did restart one of the controllers once, and that may have been the reason why my debug-log died. I think we can safely call this one done and we can reopen if needed.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

This patch would have merged into develop (2.3) as part of the bigger merge right after 2.2.2 release.

Marking this report as Fix Committed.
Please create new report if you are experiencing issues in this area.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.