fuel-ccp

RPC component went down due to a Docker gRPC error?

Bug #1696675 reported by suzuki on 2017-06-08

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	fuel-ccp	New	Undecided	Unassigned

Bug Description

Hi everyone,

Sometimes the RPC container seems to stop after running properly for a few hours.
I suspect this is caused by a Docker error.
Has anyone seen a similar issue?

I am using Docker v1.13.1.

I am deploying the components one by one.
The following components have been started:

# ccp status
+------------------+-----+-------+-------+----------------------+
| service          | pod | job   | ready | links                |
+------------------+-----+-------+-------+----------------------+
| database         | 3/3 | 0/0   | ok    | http://0.0.0.0:32215 |
| etcd             | 1/1 | 0/0   | ok    | http://0.0.0.0:30070 |
|                  |     |       |       | http://0.0.0.0:32183 |
| keystone         | 1/1 | 7/7   | ok    | http://0.0.0.0:30372 |
|                  |     |       |       | http://0.0.0.0:30397 |
| memcached        | 1/1 | 0/0   | ok    | http://0.0.0.0:30816 |
| notifications    | 1/1 | 0/0   | ok    | http://0.0.0.0:31065 |
| nova-api         | 1/1 | 20/20 | ok    | http://0.0.0.0:31636 |
|                  |     |       |       | http://0.0.0.0:31466 |
| nova-conductor   | 1/1 | 0/0   | ok    |                      |
| nova-consoleauth | 1/1 | 0/0   | ok    |                      |
| nova-novncproxy  | 1/1 | 0/0   | ok    | http://0.0.0.0:31647 |
| rpc              | 3/3 | 0/0   | ok    | http://0.0.0.0:31684 |
+------------------+-----+-------+-------+----------------------+

But a few hours later, one of the three rpc pods stopped, as shown below:

# ccp status
+------------------+-----+-------+-------+----------------------+
| service          | pod | job   | ready | links                |
+------------------+-----+-------+-------+----------------------+
| database         | 3/3 | 0/0   | ok    | http://0.0.0.0:32215 |
| etcd             | 1/1 | 0/0   | ok    | http://0.0.0.0:30070 |
|                  |     |       |       | http://0.0.0.0:32183 |
| keystone         | 1/1 | 7/7   | ok    | http://0.0.0.0:30372 |
|                  |     |       |       | http://0.0.0.0:30397 |
| memcached        | 1/1 | 0/0   | ok    | http://0.0.0.0:30816 |
| notifications    | 1/1 | 0/0   | ok    | http://0.0.0.0:31065 |
| nova-api         | 1/1 | 20/20 | ok    | http://0.0.0.0:31636 |
|                  |     |       |       | http://0.0.0.0:31466 |
| nova-conductor   | 1/1 | 0/0   | ok    |                      |
| nova-consoleauth | 1/1 | 0/0   | ok    |                      |
| nova-novncproxy  | 1/1 | 0/0   | ok    | http://0.0.0.0:31647 |
| rpc              | 2/3 | 0/0   | wip   | http://0.0.0.0:31684 |
+------------------+-----+-------+-------+----------------------+

The following is part of the output of "kubectl -n ccp get pod". Pod rpc-1937807526-082sl is not ready (0/1) and has restarted 45 times:

# kubectl -n ccp get pod
NAME                   READY   STATUS    RESTARTS   AGE
*snip*
rpc-1937807526-082sl   0/1     Running   45         15h
rpc-1937807526-4fd75   1/1     Running   0          15h
rpc-1937807526-z70f3   1/1     Running   0          15h
*snip*
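With a longer pod listing, the unhealthy pod can be picked out by filtering on the READY and RESTARTS columns. The Python sketch below assumes the default five-column layout of "kubectl get pod" shown above (NAME, READY, STATUS, RESTARTS, AGE); lines that do not have exactly five fields (such as "*snip*", or newer kubectl output like "45 (3m ago)") are skipped. The helper names parse_get_pod and find_unhealthy_pods and the max_restarts threshold are hypothetical, not part of fuel-ccp or kubectl.

```python
from typing import List, NamedTuple


class PodRow(NamedTuple):
    name: str
    ready: int
    total: int
    status: str
    restarts: int


def parse_get_pod(output: str) -> List[PodRow]:
    """Parse default-format 'kubectl get pod' output (assumes 5 columns)."""
    rows = []
    for line in output.strip().splitlines():
        fields = line.split()
        # Skip the header and any non-table lines such as '*snip*'.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, ready, status, restarts, _age = fields
        ready_count, total_count = (int(x) for x in ready.split("/"))
        rows.append(PodRow(name, ready_count, total_count, status, int(restarts)))
    return rows


def find_unhealthy_pods(output: str, max_restarts: int = 0) -> List[str]:
    """Return pods that are not fully ready or restarted more than max_restarts times."""
    return [
        row.name
        for row in parse_get_pod(output)
        if row.ready < row.total or row.restarts > max_restarts
    ]


SAMPLE = """\
NAME                   READY   STATUS    RESTARTS   AGE
rpc-1937807526-082sl   0/1     Running   45         15h
rpc-1937807526-4fd75   1/1     Running   0          15h
rpc-1937807526-z70f3   1/1     Running   0          15h
"""

if __name__ == "__main__":
    print(find_unhealthy_pods(SAMPLE))  # ['rpc-1937807526-082sl']
```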

I checked the log with "kubectl -n ccp logs rpc-1937807526-082sl".
The log has not been updated since "2017-06-07 20:57:39".
The log timestamps are in UTC, so this is 2017-06-08 05:57:39 JST.

# kubectl -n ccp logs rpc-1937807526-082sl
*snip*
[readiness:29629] DIAGNOSTICS
[readiness:29629] ===========
[readiness:29629]
[readiness:29629] attempted to contact: ['rabbit@172.30.31.6']
[readiness:29629]
[readiness:29629] rabbit@172.30.31.6:
[readiness:29629] * connected to epmd (port 4369) on 172.30.31.6
[readiness:29629] * node rabbit@172.30.31.6 up, 'rabbit' application running
[readiness:29629]
[readiness:29629] current node details:
[readiness:29629] - node name: '<email address hidden>'
[readiness:29629] - home dir: .
[readiness:29629] - cookie hash: cPeI/H+zjaqvVesQ1Kjqew==
[readiness:29629] Ready to return 0
[readiness:30489] Starting readiness probe at 2017-06-07 20:55:01
[liveness:30077] Ready to return 0
[liveness:30719] Starting liveness probe at 2017-06-07 20:55:30
[readiness:30489] Ready to return 0
[readiness:31304] Starting readiness probe at 2017-06-07 20:57:08
[liveness:30719] Ready to return 0
[readiness:31691] Starting readiness probe at 2017-06-07 20:57:39
[readiness:31304] Ready to return 0
[readiness:31691] Ready to return 0
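Finding the last probe entry and converting it from UTC to JST (UTC+9) can be done mechanically. The Python sketch below assumes probe lines use exactly the "Starting readiness/liveness probe at YYYY-MM-DD HH:MM:SS" wording shown above and that the timestamps are UTC; last_probe_time is a hypothetical helper name. It uses finditer so that several entries run together on one line are still found.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

JST = timezone(timedelta(hours=9), "JST")

# Assumed line format:
# "[readiness:31691] Starting readiness probe at 2017-06-07 20:57:39"
PROBE_RE = re.compile(
    r"Starting (?:readiness|liveness) probe at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def last_probe_time(lines: Iterable[str]) -> Optional[datetime]:
    """Return the latest probe start time in the log as an aware UTC datetime."""
    latest = None
    for line in lines:
        for match in PROBE_RE.finditer(line):
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            ts = ts.replace(tzinfo=timezone.utc)  # assumed: pod log timestamps are UTC
            if latest is None or ts > latest:
                latest = ts
    return latest


LOG = [
    "[readiness:30489] Starting readiness probe at 2017-06-07 20:55:01",
    "[liveness:30719] Starting liveness probe at 2017-06-07 20:55:30",
    "[readiness:31304] Starting readiness probe at 2017-06-07 20:57:08",
    "[readiness:31691] Starting readiness probe at 2017-06-07 20:57:39",
]

if __name__ == "__main__":
    last = last_probe_time(LOG)
    print(last.astimezone(JST).strftime("%Y-%m-%d %H:%M:%S %Z"))
    # 2017-06-08 05:57:39 JST
```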

I also checked syslog and found the following Docker errors:

Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.169566287+09:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"
Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.369724051+09:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"
Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.369886922+09:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"

These Docker errors were logged at 05:57:08 JST, less than a minute before the last entry in the rpc pod's log (05:57:39 JST).
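The timing can also be checked numerically. Each dockerd line carries an RFC 3339 timestamp with nanoseconds and a gRPC status code; in the gRPC status code table, code 14 is UNAVAILABLE. The Python sketch below assumes the dockerd line format quoted above (including a fractional-seconds part); parse_dockerd_error is a hypothetical helper, and nanoseconds are truncated to microseconds because Python's datetime only stores microseconds.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Small subset of the gRPC status code table.
GRPC_CODES = {4: "DEADLINE_EXCEEDED", 13: "INTERNAL", 14: "UNAVAILABLE"}

TIME_RE = re.compile(r'time="([^"]+)"')
CODE_RE = re.compile(r"rpc error: code = (\d+)")


def parse_dockerd_error(line: str) -> Tuple[datetime, str]:
    """Extract the timestamp and gRPC status name from a dockerd syslog line."""
    raw = TIME_RE.search(line).group(1)  # e.g. 2017-06-08T05:57:08.169566287+09:00
    raw = re.sub(r"(\.\d{6})\d+", r"\1", raw)  # truncate nanoseconds to microseconds
    when = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")
    code = int(CODE_RE.search(line).group(1))
    return when, GRPC_CODES.get(code, f"code {code}")


LINE = ('Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.169566287+09:00" '
        'level=error msg="Error running exec in container: rpc error: code = 14 '
        'desc = grpc: the connection is unavailable"')

# Last entry in the rpc pod log (UTC).
LAST_LOG = datetime(2017, 6, 7, 20, 57, 39, tzinfo=timezone.utc)

if __name__ == "__main__":
    when, status = parse_dockerd_error(LINE)
    gap = (LAST_LOG - when).total_seconds()
    print(status, when.astimezone(timezone.utc).isoformat(), round(gap, 1))
    # UNAVAILABLE 2017-06-07T20:57:08.169566+00:00 30.8
```

Subtracting two timezone-aware datetimes compares the underlying instants, so the JST syslog time and the UTC pod-log time can be compared directly without manual offset arithmetic.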

I suspect the RPC container went down because of this Docker error.
Has anyone seen a similar issue?

Thanks.


suzuki (suzuki.thi) on 2017-06-08: description updated
