[Description]
On a 3 nodes rabbitMQ cluster after a failure on one of the nodes (kernel crash) the following succession of events has been registered:
=== This initial crash report
$ ack-grep -i mnesia_locker | wc -l 16
60-=CRASH REPORT==== 7-Sep-2015::14:48:45 === 61- crasher: -- 66: {mnesia_locker,'rabbit@10-10-34-2',granted}} 67- in function gen_server2:terminate/3 (src/gen_server2.erl, line 1045) 68- ancestors: [worker_pool_sup,rabbit_sup,<0.110.0>] 69- messages: [] 70- links: [<0.114.0>] 71- dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct}, 72- {{xtype_to_module,topic},rabbit_exchange_type_topic}, 73- {random_seed,{5643,25632,27953}}, 74- {{xtype_to_module,fanout},rabbit_exchange_type_fanout}, 75- {worker_pool_worker,true}, 76- {fhc_age_tree,{0,nil}}, -- 88: Reason: {unexpected_info,{mnesia_locker,'rabbit@10-10-34-2',granted}} 89- Offender: [{pid,<0.142.0>}, 90- {name,25}, 91- {mfargs,{worker_pool_worker,start_link,[25]}}, 92- {restart_type,transient}, 93- {shutdown,4294967295}, 94- {child_type,worker}]
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1.log 5166:Mnesia('rabbit@10-10-34-1'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-2'}
sosreport-svz-op-fdc-os-3.00087142-20150909140820/var/log/rabbitmq/rabbit@10-10-34-3.log.1 545:Mnesia('rabbit@10-10-34-3'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@10-10-34-1'}
After this I started seeing this traces:
sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log:572:=CRASH REPORT==== 7-Sep-2015::14:48:45 === sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log:573: crasher: sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-574- initial call: gen:init_it/6 sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-575- pid: <0.2346.189> sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-576- registered_name: [] sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-577- exception exit: {{unexpected_cast,{next_job_from,<0.4960.181>}}, sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-578- {gen_server2,call, sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-579- [<0.22014.255>, sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-580- {submit,#Fun<rabbit_misc.6.25154013>,<0.2862.189>}, sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-581- infinity]}} sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-582- in function gen_server2:terminate/3 (src/gen_server2.erl, line 1045) sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-583- ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.110.0>] sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-584- messages: [] sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-585- links: [<0.259.0>] sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-586- dictionary: [{guid,{{3735536962,2437967587,3023977752,60868675},0}}] sosreport-svz-op-fdc-os-1.00087142-20150909104806/var/log/rabbitmq/rabbit@10-10-34-1-sasl.log-587- trap_exit: true
Related Bugs:
- https://bugs.launchpad.net/mos/+bug/1401948 - https://github.com/erlang/otp/compare/maint...dgud:dgud%3Bmnesia%3Bsticky-race%3BOTP-11375
Possible related upstream commit:
- http://hg.rabbitmq.com/rabbitmq-server/rev/5a63c9e273cc
[Description]
On a 3 nodes rabbitMQ cluster after a failure on one of the nodes (kernel crash) the following succession
of events has been registered:
===
This initial crash report
$ ack-grep -i mnesia_locker | wc -l
16
60-=CRASH REPORT==== 7-Sep-2015: :14:48: 45 === locker, 'rabbit@ 10-10-34- 2',granted} } terminate/ 3 (src/gen_ server2. erl, line 1045) pool_sup, rabbit_ sup,<0. 110.0>] to_module, direct} ,rabbit_ exchange_ type_direct} , to_module, topic}, rabbit_ exchange_ type_topic} , seed,{5643, 25632,27953} }, to_module, fanout} ,rabbit_ exchange_ type_fanout} , pool_worker, true}, tree,{0, nil}}, info,{mnesia_ locker, 'rabbit@ 10-10-34- 2',granted} } {worker_ pool_worker, start_link, [25]}}, type,transient} , 4294967295} , type,worker} ]
61- crasher:
--
66: {mnesia_
67- in function gen_server2:
68- ancestors: [worker_
69- messages: []
70- links: [<0.114.0>]
71- dictionary: [{{xtype_
72- {{xtype_
73- {random_
74- {{xtype_
75- {worker_
76- {fhc_age_
--
88: Reason: {unexpected_
89- Offender: [{pid,<0.142.0>},
90- {name,25},
91- {mfargs,
92- {restart_
93- {shutdown,
94- {child_
sosreport- svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1.log 'rabbit@ 10-10-34- 1'): ** ERROR ** mnesia_event got {inconsistent_ database, starting_ partitioned_ network, 'rabbit@ 10-10-34- 2'}
5166:Mnesia(
sosreport- svz-op- fdc-os- 3.00087142- 20150909140820/ var/log/ rabbitmq/ rabbit@ 10-10-34- 3.log.1 'rabbit@ 10-10-34- 3'): ** ERROR ** mnesia_event got {inconsistent_ database, starting_ partitioned_ network, 'rabbit@ 10-10-34- 1'}
545:Mnesia(
After this I started seeing this traces:
sosreport- svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log:572: =CRASH REPORT==== 7-Sep-2015: :14:48: 45 === svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log:573: crasher: svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-574- initial call: gen:init_it/6 svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-575- pid: <0.2346.189> svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-576- registered_name: [] svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-577- exception exit: {{unexpected_ cast,{next_ job_from, <0.4960. 181>}}, svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-578- {gen_server2,call, svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-579- [<0.22014.255>, svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-580- {submit, #Fun<rabbit_ misc.6. 25154013> ,<0.2862. 189>}, svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-581- infinity]}} svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-582- in function gen_server2: terminate/ 3 (src/gen_ server2. erl, line 1045) svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-583- ancestors: [rabbit_ mirror_ queue_slave_ sup,rabbit_ sup,<0. 110.0>] svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-584- messages: [] svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-585- links: [<0.259.0>] svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-586- dictionary: [{guid, {{3735536962, 2437967587, 3023977752, 60868675} ,0}}] svz-op- fdc-os- 1.00087142- 20150909104806/ var/log/ rabbitmq/ rabbit@ 10-10-34- 1-sasl. log-587- trap_exit: true
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
sosreport-
Related Bugs:
- https:/ /bugs.launchpad .net/mos/ +bug/1401948 /github. com/erlang/ otp/compare/ maint.. .dgud:dgud% 3Bmnesia% 3Bsticky- race%3BOTP- 11375
- https:/
Possible related upstream commit:
- http:// hg.rabbitmq. com/rabbitmq- server/ rev/5a63c9e273c c