Empty exchanges after sequence number jumping.

Bug #1917540 reported by Simon Poirier
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
Landscape Client           Fix Released  High        Kevin Nasto
landscape-client (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

If the landscape-server next-expected-sequence is above the client's current message sequence, the client sends empty messages and drops its outgoing messages.

Reproduction:
1. Register the client and let it exchange.
2. On the server, reset the sequence number to something high:
   psql landscape-test-main -c 'update computer_status set next_expected_sequence = 9000'

Result
If debugging is enabled on the client, you'll see only empty outgoing messages:
{.. 'messages': [], 'sequence': 9000, 'server-api': '3.3', 'total-messages': 0}

Expected
If the expected sequence is above the current message number, the client should reset its sequence number
to the next-expected-sequence and initiate a resync. (The sequence reset is required; otherwise the server might ignore the resync.)
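
To make the expected behaviour concrete, here is a minimal sketch of the check in Python. It is not the landscape-client code: `ToyStore` and `handle_next_expected` are hypothetical names invented for illustration, and the handling of a backwards jump only mirrors the timeline described in this report.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for the client's message store."""
    sequence: int = 0                       # sequence number of the first pending message
    pending: list = field(default_factory=list)
    resync_requested: bool = False


def handle_next_expected(store: ToyStore, next_expected: int) -> str:
    """Reconcile the local sequence with the server's next-expected-sequence."""
    last_known = store.sequence + len(store.pending)
    if next_expected > last_known:
        # The server expects a message the client never produced:
        # adopt the server's number and ask for a resync.
        store.sequence = next_expected
        store.resync_requested = True
        return "resync"
    if next_expected < store.sequence:
        # The server went backwards; the client can't replay, so it
        # resets its sequence (as in the timeline in this report).
        store.sequence = next_expected
        return "rewind"
    # Normal case: the server acknowledged some or all pending messages.
    acked = next_expected - store.sequence
    del store.pending[:acked]
    store.sequence = next_expected
    return "ack"
```

With a store at sequence 100 holding two pending messages, a next-expected-sequence of 9000 takes the resync branch and leaves the store at sequence 9000 with a resync requested, instead of sending empty payloads.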

How we got there is still somewhat of a mystery, but my best guess is that
stale message-server processes were still using the old database server when
we did the DB server switch. This led to an exchange which corrupted the client's
expectation of sequence numbers:
   Client: 10 messages, sequence 100
   New server: ack. next-expected-sequence is 110
   Client: 2 messages, sequence 110
   Stale server: (rejects message) next-expected-sequence 100
   Client: (tries to replay from 100. can't. resets sequence to 100)
   Client: 2 messages, sequence: 100
   New server: (ignores messages). next-expected-sequence is 110

   * At this point the client is stuck sending empty messages (as it has no message 110)
     with the same sequence number (110).
     Sequence numbers don't increase (as only empty messages are sent).
     The server considers those empty messages successful exchanges.
     The empty messages carry on, and the client broker just drops future outgoing messages.
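
The arithmetic behind the stuck state can be sketched with a toy server in Python. This is an illustration under assumptions, not the message-server implementation: `server_reply` is a hypothetical function that accepts a payload only when its sequence matches what the server expects.

```python
def server_reply(expected: int, payload: dict) -> int:
    """Toy server: return the next-expected-sequence after one exchange."""
    if payload["sequence"] != expected:
        # Sequence mismatch: messages are ignored, expectation unchanged.
        return expected
    # Accepted: advance by the number of messages received.
    return expected + len(payload["messages"])


expected = 110
history = []
for _ in range(3):
    # The client has no message 110, so it sends an empty payload.
    payload = {"messages": [], "sequence": 110}
    expected = server_reply(expected, payload)
    history.append(expected)
# history stays at [110, 110, 110]: every exchange "succeeds", nothing advances.
```

Because an empty payload adds zero to the expected sequence, the exchange looks healthy from the server's side while the client never makes progress.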

Also, I suspect it's possible to hit the same issue when restoring a database backup.

Simon Poirier (simpoir)
information type: Private → Public
Changed in landscape-client:
assignee: nobody → Kevin Nasto (silverdrake11)
status: New → In Progress
Kevin Nasto (silverdrake11) wrote :

Manual testing instructions for the pull request: send tags from the client to the server, with the tags message server branch. Before the new tag is sent, update the next expected sequence on the server using the psql command in the description. With the fix, the new tag should go through. Without the fix, it shouldn't.

Kevin Nasto (silverdrake11) wrote :

This is some debugging info I used. The following set of messages is normal, when there are no problems.

NORMAL MESSAGES
Sending 4 messages
Received 2 messages
********************
Server Sequence: 8
Next Expected S: 32
Client Sequence: 28
Pending Offset: 5
Number Messages: 4
********************
Server Sequence: 10
Next Expected S: 32
Client Sequence: 32
Pending Offset: 4
Number Messages: 0

This is the output when there is a sequence jump, using the fix in the pull request linked above.

SEQUENCE JUMP
Sending 6 messages
Received 0 messages
********************
Server Sequence: 20
Next Expected S: 200
Client Sequence: 64
Pending Offset: 5
Number Messages: 6
********************
Server Sequence: 20
Next Expected S: 200
Client Sequence: 200
Pending Offset: 0
Number Messages: 7

The sequence jump starts a resync and resets the offsets, and things go back to normal, as in the following.

SEQUENCE JUMP AFTER RESYNC
Sending 24 messages
Received 2 messages
********************
Server Sequence: 20
Next Expected S: 224
Client Sequence: 200
Pending Offset: 0
Number Messages: 24
********************
Server Sequence: 22
Next Expected S: 224
Client Sequence: 224
Pending Offset: 24
Number Messages: 0

Simon Poirier (simpoir)
Changed in landscape-client:
status: In Progress → Fix Committed
no longer affects: landscape
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 23.02-0ubuntu1

---------------
landscape-client (23.02-0ubuntu1) lunar; urgency=medium

  * New upstream release 23.02:
    - Preventing the generation of large messages and logs that can overwhelm
      Landscape Server (LP: #1995775)
    - Improved MOTD slowdown on machines with many tap network interfaces
      (LP: #2006396)
    - No longer using deprecated apt-key when storing trusted GPG keys
      (LP: #1973202)
    - Fixed issue recognising Parallels VMs as Virtual Machine clients
      (LP: #1827909)
    - Fixes for incorrect logfile rotation config (LP: #1968189)
    - Client-side backoff handling to moderate traffic to Landscape Server
      during high load (LP: #1947399)
    - Avoid sending empty messages when catching up to expected next message
      (LP: #1917540)
    - --is-registered CLI option to quickly check if client is registered
      (LP: #1912516)
    - Can now report Ubuntu Pro attachment information if the version of
      Landscape Server it is registered to supports this (LP: #2006401)
    - Packages installed as dependencies as part of package profiles are now
      appropriately autoremovable (LP: #1878957)
    - Registration timeouts give an error instead of timing out (LP: #1889464)
    - RHEV hypervisor VMs are now recognized as virtual machines (LP: #1884116)
    - Doing a Landscape-driven release upgrade from a release running python 2
      to one running python 3 no longer hangs forever (LP: #1943291)

 -- Mitch Burton <email address hidden> Wed, 08 Feb 2023 10:23:31 -0800

Changed in landscape-client (Ubuntu):
status: New → Fix Released
Changed in landscape-client:
status: Fix Committed → Fix Released