RPC component went down, possibly due to a Docker gRPC error?

Bug #1696675 reported by suzuki
This bug affects 1 person
Affects     Status   Importance   Assigned to   Milestone
fuel-ccp    New      Undecided    Unassigned

Bug Description

Hi everyone,

Sometimes the RPC container seems to stop after running properly for a few hours.
I suspect this issue is caused by a Docker error.
Has anyone seen a similar issue?

I am using Docker v1.13.1.

I'm trying to deploy each component one by one.
The following components have been started.

# ccp status
+------------------+-----+-------+-------+----------------------+
| service          | pod | job   | ready | links                |
+------------------+-----+-------+-------+----------------------+
| database         | 3/3 | 0/0   | ok    | http://0.0.0.0:32215 |
| etcd             | 1/1 | 0/0   | ok    | http://0.0.0.0:30070 |
|                  |     |       |       | http://0.0.0.0:32183 |
| keystone         | 1/1 | 7/7   | ok    | http://0.0.0.0:30372 |
|                  |     |       |       | http://0.0.0.0:30397 |
| memcached        | 1/1 | 0/0   | ok    | http://0.0.0.0:30816 |
| notifications    | 1/1 | 0/0   | ok    | http://0.0.0.0:31065 |
| nova-api         | 1/1 | 20/20 | ok    | http://0.0.0.0:31636 |
|                  |     |       |       | http://0.0.0.0:31466 |
| nova-conductor   | 1/1 | 0/0   | ok    |                      |
| nova-consoleauth | 1/1 | 0/0   | ok    |                      |
| nova-novncproxy  | 1/1 | 0/0   | ok    | http://0.0.0.0:31647 |
| rpc              | 3/3 | 0/0   | ok    | http://0.0.0.0:31684 |
+------------------+-----+-------+-------+----------------------+

But after a few hours, one rpc pod stopped being ready, as shown below.

# ccp status
+------------------+-----+-------+-------+----------------------+
| service          | pod | job   | ready | links                |
+------------------+-----+-------+-------+----------------------+
| database         | 3/3 | 0/0   | ok    | http://0.0.0.0:32215 |
| etcd             | 1/1 | 0/0   | ok    | http://0.0.0.0:30070 |
|                  |     |       |       | http://0.0.0.0:32183 |
| keystone         | 1/1 | 7/7   | ok    | http://0.0.0.0:30372 |
|                  |     |       |       | http://0.0.0.0:30397 |
| memcached        | 1/1 | 0/0   | ok    | http://0.0.0.0:30816 |
| notifications    | 1/1 | 0/0   | ok    | http://0.0.0.0:31065 |
| nova-api         | 1/1 | 20/20 | ok    | http://0.0.0.0:31636 |
|                  |     |       |       | http://0.0.0.0:31466 |
| nova-conductor   | 1/1 | 0/0   | ok    |                      |
| nova-consoleauth | 1/1 | 0/0   | ok    |                      |
| nova-novncproxy  | 1/1 | 0/0   | ok    | http://0.0.0.0:31647 |
| rpc              | 2/3 | 0/0   | wip   | http://0.0.0.0:31684 |
+------------------+-----+-------+-------+----------------------+

The following is part of the output of the command "kubectl -n ccp get pod".

# kubectl -n ccp get pod
NAME                   READY   STATUS    RESTARTS   AGE
 *snip*
rpc-1937807526-082sl   0/1     Running   45         15h
rpc-1937807526-4fd75   1/1     Running   0          15h
rpc-1937807526-z70f3   1/1     Running   0          15h
 *snip*
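
For anyone triaging a similar report, the unhealthy pod can be picked out of such a listing with a few lines of Python. This is only a minimal sketch: it assumes the plain whitespace-separated five-column layout shown above, and the helper name find_unready_pods is hypothetical (it is not part of ccp or kubectl).

```python
def find_unready_pods(listing):
    """Return (name, ready, restarts) for pods whose READY count is not full."""
    unready = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, "*snip*" markers and anything that is not a 5-column row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, ready, _status, restarts, _age = fields
        ready_now, ready_want = ready.split("/")
        if ready_now != ready_want:
            unready.append((name, ready, int(restarts)))
    return unready


# Sample input copied from the listing in this report.
listing = """\
NAME READY STATUS RESTARTS AGE
 *snip*
rpc-1937807526-082sl 0/1 Running 45 15h
rpc-1937807526-4fd75 1/1 Running 0 15h
rpc-1937807526-z70f3 1/1 Running 0 15h
"""
print(find_unready_pods(listing))
```

On the listing above this reports only rpc-1937807526-082sl, which is not ready and has been restarted 45 times in 15 hours.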

I checked the log of that pod with the command "kubectl -n ccp logs rpc-1937807526-082sl".
The log has no entries after "2017-06-07 20:57:39".
That timestamp corresponds to 2017-06-08 05:57:39 JST, so the container log appears to be written in UTC.

# kubectl -n ccp logs rpc-1937807526-082sl
 *snip*
[readiness:29629] DIAGNOSTICS
[readiness:29629] ===========
[readiness:29629]
[readiness:29629] attempted to contact: ['rabbit@172.30.31.6']
[readiness:29629]
[readiness:29629] rabbit@172.30.31.6:
[readiness:29629] * connected to epmd (port 4369) on 172.30.31.6
[readiness:29629] * node rabbit@172.30.31.6 up, 'rabbit' application running
[readiness:29629]
[readiness:29629] current node details:
[readiness:29629] - node name: '<email address hidden>'
[readiness:29629] - home dir: .
[readiness:29629] - cookie hash: cPeI/H+zjaqvVesQ1Kjqew==
[readiness:29629] Ready to return 0
[readiness:30489] Starting readiness probe at 2017-06-07 20:55:01
[liveness:30077] Ready to return 0
[liveness:30719] Starting liveness probe at 2017-06-07 20:55:30
[readiness:30489] Ready to return 0
[readiness:31304] Starting readiness probe at 2017-06-07 20:57:08
[liveness:30719] Ready to return 0
[readiness:31691] Starting readiness probe at 2017-06-07 20:57:39
[readiness:31304] Ready to return 0
[readiness:31691] Ready to return 0
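
To check whether any probe in this log started but never finished, the probe lines can be paired up by their [kind:pid] prefix. The Python below is a rough sketch, not a ccp tool: summarize_probes is a hypothetical helper, and it assumes that "Ready to return N" is the last line a probe script prints and that N is its exit code.

```python
import re

# "[readiness:31691] Starting readiness probe at 2017-06-07 20:57:39"
START_RE = re.compile(
    r"\[(readiness|liveness):(\d+)\] Starting \1 probe at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)
# "[readiness:31691] Ready to return 0"
DONE_RE = re.compile(r"\[(readiness|liveness):(\d+)\] Ready to return (\d+)")


def summarize_probes(log_text):
    """Return (latest start timestamp, probes that started but never reported)."""
    started = {}   # (kind, pid) -> start timestamp
    finished = {}  # (kind, pid) -> reported return code (assumed exit code)
    for m in START_RE.finditer(log_text):
        started[(m.group(1), m.group(2))] = m.group(3)
    for m in DONE_RE.finditer(log_text):
        finished[(m.group(1), m.group(2))] = int(m.group(3))
    unfinished = sorted(key for key in started if key not in finished)
    # Timestamps in this fixed format sort correctly as plain strings.
    last_start = max(started.values()) if started else None
    return last_start, unfinished


# Tail of the log quoted in this report.
log_tail = """\
[readiness:29629] Ready to return 0
[readiness:30489] Starting readiness probe at 2017-06-07 20:55:01
[liveness:30077] Ready to return 0
[liveness:30719] Starting liveness probe at 2017-06-07 20:55:30
[readiness:30489] Ready to return 0
[readiness:31304] Starting readiness probe at 2017-06-07 20:57:08
[liveness:30719] Ready to return 0
[readiness:31691] Starting readiness probe at 2017-06-07 20:57:39
[readiness:31304] Ready to return 0
[readiness:31691] Ready to return 0
"""
print(summarize_probes(log_tail))
```

For the tail quoted here, the latest probe start is 2017-06-07 20:57:39 and every probe that logged a start also logged "Ready to return 0", so the log itself does not show a failing probe before it goes quiet.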

I also checked syslog and found the following Docker errors:

Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.169566287+09:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"
Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.369724051+09:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"
Jun 8 05:57:08 cent-ccp01 dockerd: time="2017-06-08T05:57:08.369886922+09:00" level=error msg="Error running exec in container: rpc error: code = 14 desc = grpc: the connection is unavailable"

These Docker errors occurred at about the same time that the RPC container's log stopped.
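
The syslog entries are stamped in JST (+09:00) while the pod log appears to be in UTC, so the two have to be put on the same clock before comparing them. The following Python is a sketch under that assumption; dockerd_time_utc is a hypothetical helper name, and fractional seconds are simply dropped.

```python
import re
from datetime import datetime, timezone

# Matches the time="..." field in a dockerd log line; the fractional
# seconds are matched but not captured.
TIME_RE = re.compile(
    r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?([+-]\d{2}:\d{2})"'
)


def dockerd_time_utc(line):
    """Return the dockerd timestamp of a syslog line in UTC (whole seconds)."""
    m = TIME_RE.search(line)
    if m is None:
        return None
    local = datetime.strptime(m.group(1) + m.group(2), "%Y-%m-%dT%H:%M:%S%z")
    return local.astimezone(timezone.utc)


# First error line quoted in this report.
line = (
    'Jun 8 05:57:08 cent-ccp01 dockerd: '
    'time="2017-06-08T05:57:08.169566287+09:00" level=error '
    'msg="Error running exec in container: rpc error: code = 14 '
    'desc = grpc: the connection is unavailable"'
)
print(dockerd_time_utc(line).strftime("%Y-%m-%d %H:%M:%S"))
```

This prints 2017-06-07 20:57:08, which is the same second as the "Starting readiness probe at 2017-06-07 20:57:08" entry in the pod log, 31 seconds before the last probe start at 20:57:39. gRPC status code 14 is UNAVAILABLE. The "Error running exec in container" message would be consistent with the probe commands failing to be executed inside the container, though the logs quoted here do not confirm that.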

I suspect the RPC container went down because of this Docker error.
Have you seen a similar issue?

Thanks.

Tags: docker rpc
suzuki (suzuki.thi)
description: updated