Mcollective hearbeat failed: rabbitmq.rb:50:in `on_hbread_fail'

Bug #1357932 reported by Dennis Dmitriev
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Dima Shulyak
Milestone: (none listed)

Bug Description

System test 'deploy_ha' (Ubuntu, thread_5) failed while the cluster was deploying, with the following errors in mcollective.log on node slave-02:

E, [2014-08-15T07:18:17.479550 #1116] ERROR -- : rabbitmq.rb:50:in `on_hbread_fail' Heartbeat read failed from 'stomp://mcollective@10.108.35.2:61613': {"lock_fail"=>true, "lock_fail_count"=>1, "ticker_interval"=>29.5, "read_fail_count"=>0}
D, [2014-08-15T07:18:24.202553 #1116] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.108.35.2:61613: send_fire, curt1408087104.20195last_sleep30.4989078044891
E, [2014-08-15T07:18:37.539039 #1116] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@10.108.35.2:61613: es_oldrecv: receive failed: Connection timed out

ISO #446 (#439 failed with the same error on this test):
{"build_id": "2014-08-17_02-01-17", "ostf_sha": "d2a894d228c1f3c22595a77f04b1e00d09d8e463", "build_number": "446", "auth_required": true, "api": "1.0", "nailgun_sha": "bc9e377dbe010732bc2ba47161ed9d433998e07b", "production": "docker", "fuelmain_sha": "08f04775dcfadd8f5b438a31c63e81f29276b7d3", "astute_sha": "8e1db3926b2320b30b23d7a772122521b0d96166", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "81741445d4ab2db198585149815bb80a238a1214"}

Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
Dima Shulyak (dshulyak)
summary: - rabbitmq.rb:50:in `on_hbread_fail' Heartbeat read failed
+ Mcollective hearbeat failed: rabbitmq.rb:50:in `on_hbread_fail'
Dima Shulyak (dshulyak) wrote :

I will probably close this one as a duplicate of https://bugs.launchpad.net/fuel/+bug/1356954, after additional investigation.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Astute Team (fuel-astute)
Changed in fuel:
status: New → Confirmed
assignee: Fuel Astute Team (fuel-astute) → Dima Shulyak (dshulyak)
Dima Shulyak (dshulyak) wrote :

With the new version of RabbitMQ (3.3.5) installed, this problem should no longer occur.

When no heartbeat was received, RabbitMQ handled it and closed the connection:

=WARNING REPORT==== 19-Aug-2014::14:37:51 ===
STOMP detected missed client heartbeat(s) on connection 172.17.42.1:60217 -> 172.17.0.12:61613, closing it

=INFO REPORT==== 19-Aug-2014::14:37:51 ===
closing STOMP connection <0.1628.0> (172.17.42.1:60217 -> 172.17.0.12:61613)
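
For illustration only, the broker-side behaviour in the log above (close the connection once client heartbeats stop arriving) can be sketched as below. This is not taken from the RabbitMQ or mcollective source; the class name `HeartbeatMonitor`, the 30-second interval and the tolerance factor of 2 are assumptions chosen for the example.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: tracks the last frame seen from a peer."""

    def __init__(self, interval_s, tolerance=2.0, clock=time.monotonic):
        self.interval_s = interval_s  # heartbeat interval agreed with the peer
        self.tolerance = tolerance    # assumed error margin for timing jitter
        self.clock = clock            # injectable clock, makes the logic testable
        self.last_seen = clock()

    def on_frame(self):
        # Any frame, including a bare heartbeat, proves the peer is alive.
        self.last_seen = self.clock()

    def missed(self):
        # True once the peer has been silent for longer than the
        # interval multiplied by the tolerance factor.
        return self.clock() - self.last_seen > self.interval_s * self.tolerance


if __name__ == "__main__":
    now = [0.0]
    monitor = HeartbeatMonitor(30.0, clock=lambda: now[0])
    now[0] = 61.0  # simulate 61 seconds of silence
    if monitor.missed():
        print("missed client heartbeat(s), closing connection")
```

A server loop would call `on_frame()` for each incoming frame and close the socket when `missed()` returns true, which is the point at which the client is expected to reconnect.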

After that, the mcollective client successfully reconnected and the deployment continued:

2014-08-19T14:37:11 debug: [431] Retry #1 to run mcollective agent on nodes: '6'
2014-08-19T14:37:54 debug: [431] Retry #2 to run mcollective agent on nodes: '6'
2014-08-19T14:37:54 debug: [431] ee649aa6-4468-48ac-a0b9-0c0e1f7dc3d0: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"6", :statuscode=>0, :statusmsg=>"OK", :data=>{:idling=>0, :status=>"running", :runtime=>1408455474, :stopped=>0, :resources=>{"total"=>0, "restarted"=>0, "out_of_sync"=>0, "failed"=>1, "changed"=>0}, :lastrun=>0, :version=>nil, :output=>"Currently running; last completed run 1408455474 seconds ago", :time=>{"last_run"=>0}, :changes=>nil, :running=>1, :enabled=>1, :events=>nil}}
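
The retry pattern visible in the log above (the call is retried until the agent answers again) can be sketched roughly as follows. This is a generic illustration, not the actual Astute code; `call_with_retries`, its parameters and the choice of `ConnectionError` as the retryable error are assumptions.

```python
import time


def call_with_retries(action, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Run `action`, retrying on ConnectionError up to `attempts` times in total."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                # Give the client time to reconnect before the next try.
                sleep(delay_s)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With `attempts=3`, a call that fails twice because the connection was dropped and then succeeds on the third try returns normally, much like the two retries followed by a result in the log.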

Changed in fuel:
status: Confirmed → Invalid
Dima Shulyak (dshulyak)
Changed in fuel:
status: Invalid → Confirmed