[Description]
On a 3 nodes rabbitMQ cluster after a failure on one of the nodes (kernel crash) the following succession
of events has been registered:
===
This initial crash report
$ ack-grep -i mnesia_locker | wc -l
16
60-=CRASH REPORT==== 7-Sep-2015::14:48:45 ===
61- crasher:
--
66: {mnesia_locker,'rabbit@10-10-34-2',granted}}
67- in function gen_server2:terminate/3 (src/gen_server2.erl, line 1045)
68- ancestors: [worker_pool_sup,rabbit_sup,<0.110.0>]
69- messages: []
70- links: [<0.114.0>]
71- dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
72- {{xtype_to_module,topic},rabbit_exchange_type_topic},
73- {random_seed,{5643,25632,27953}},
74- {{xtype_to_module,fanout},rabbit_exchange_type_fanout},
75- {worker_pool_worker,true},
76- {fhc_age_tree,{0,nil}},
--
88: Reason: {unexpected_info,{mnesia_locker,'rabbit@10-10-34-2',granted}}
89- Offender: [{pid,<0.142.0>},
90- {name,25},
91- {mfargs,{worker_pool_worker,start_link,[25]}},
92- {restart_type,transient},
93- {shutdown,4294967295},
94- {child_type,worker}]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1.log
5166:Mnesia('rabbit@10-10-34-1'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-2'}
sosreport-svz-op-fdc-os-3.00087142-20150909140820/var/log/rabbitmq/rabbit@10-10-34-3.log.1
545:Mnesia('rabbit@10-10-34-3'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-1'}
After this I started seeing this traces:
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log:572:=CRASH REPORT==== 7-Sep-2015::14:48:45 ===
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log:573: crasher:
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-574- initial call: gen:init_it/6
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-575- pid: <0.2346.189>
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-576- registered_name: []
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-577- exception exit: {{unexpected_cast,{next_job_from,<0.4960.181>}},
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-578- {gen_server2,call,
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-579- [<0.22014.255>,
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-580- {submit,#Fun<rabbit_misc.6.25154013>,<0.2862.189>},
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-581- infinity]}}
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-582- in function gen_server2:terminate/3 (src/gen_server2.erl, line 1045)
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-583- ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.110.0>]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-584- messages: []
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-585- links: [<0.259.0>]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-586- dictionary: [{guid,{{3735536962,2437967587,3023977752,60868675},0}}]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-587- trap_exit: true
Related Bugs:
- https://bugs.launchpad.net/mos/+bug/1401948
- https://github.com/erlang/otp/compare/maint...dgud:dgud%3Bmnesia%3Bsticky-race%3BOTP-11375
Possible related upstream commit:
- http://hg.rabbitmq.com/rabbitmq-server/rev/5a63c9e273cc
This seems to be related to [1] and [2] both fixes available since rabbitmq-server 3.4.0
[1] https://github.com/rabbitmq/rabbitmq-server/commit/1a32616b744f4dab09cba4e7a7e747c2b6550361#diff-3b9dc5e3c18be9549b0ab00763e4e123
[2] https://github.com/rabbitmq/rabbitmq-server/commit/238243b10ba6f666fcd8e84289525961fe6e68b9#diff-3b9dc5e3c18be9549b0ab00763e4e123
Jorge
Have you seen this issue in Xenial or later releases of the RabbitMQ server package? Xenial included a 3.5.x series rabbitmq which should contain the fixes referenced.
Note that we also provide this version of RMQ for trusty uses via the Icehouse Ubuntu Cloud Archive.