Cluster lock cannot be released

Bug #1725883 reported by yaofenghua
This bug affects 1 person
| Affects | Status        | Importance | Assigned to | Milestone |
| senlin  | Fix Committed | Undecided  | Duc Truong  |           |

Bug Description

senlin/ocata

When a cluster is locked by some action:

mysql> select * from cluster_lock;
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| cluster_id | action_ids | semaphore |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| 5e72360f-481b-49b1-9ada-0ce029fcb8f9 | ["313daf5e-4ffd-4ebb-8dc6-3bd44b04d3f2", "9a2a31f1-a1fc-43eb-a16c-5ce58f760efd", "ce79de11-b7d0-4491-a9ae-b85367d2634c", "207638a7-df87-4561-a872-53bd565b75aa"] | 4 |
+--------------------------------------+-----------------------------------------------------------

This lock is never released. Later actions try to grab the lock but fail. If this cluster has a health check, the action list looks like this:
| e2b972dc | detach_policy_5e72360f | CLUSTER_DETACH_POLICY | RUNNING | 5e72360f | | | 2017-10-22T00:41:39Z |
| 3eb530e9 | detach_policy_5e72360f | CLUSTER_DETACH_POLICY | RUNNING | 5e72360f | | | 2017-10-22T01:10:31Z |
| 0c5704f2 | detach_policy_5e72360f | CLUSTER_DETACH_POLICY | RUNNING | 5e72360f | | | 2017-10-22T01:12:09Z |
I simulated this scenario with the CLUSTER_DETACH_POLICY action.

If this is a check action, these actions accumulate over time and the load on senlin-engine keeps growing:

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 4051 admin 20 0 261732 103780 5420 R 26.8 0.6 35:57.72 senlin-engine
13411 rabbitmq 20 0 5374964 911292 2580 S 7.0 5.5 1509:29 beam.smp
 2525 nova 20 0 506244 97080 2336 S 5.0 0.6 1694:25 nova-conductor
 7798 influxdb 20 0 1209724 174092 7336 S 5.0 1.1 217:16.46 influxd

In the end, all actions time out. I think Senlin should have a function to release the cluster lock.
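A minimal sketch of what such a release function could look like, assuming a table shaped like the `cluster_lock` output above. This is not Senlin code: `release_holder` is a hypothetical helper, SQLite stands in for MySQL, the IDs are shortened, and treating `semaphore` as the number of remaining holders is only a guess based on the first output in this report (4 holders, semaphore 4).

```python
import json
import sqlite3


def release_holder(conn, cluster_id, action_id):
    """Drop one action from a cluster's lock row; delete the row when empty.

    Hypothetical helper for illustration only, not Senlin's implementation.
    """
    row = conn.execute(
        "SELECT action_ids FROM cluster_lock WHERE cluster_id = ?",
        (cluster_id,),
    ).fetchone()
    if row is None:
        return False  # no lock on this cluster
    holders = json.loads(row[0])
    if action_id not in holders:
        return False  # this action does not hold the lock
    holders.remove(action_id)
    if holders:
        # Assumption: semaphore mirrors the number of remaining holders.
        conn.execute(
            "UPDATE cluster_lock SET action_ids = ?, semaphore = ? "
            "WHERE cluster_id = ?",
            (json.dumps(holders), len(holders), cluster_id),
        )
    else:
        conn.execute(
            "DELETE FROM cluster_lock WHERE cluster_id = ?", (cluster_id,)
        )
    conn.commit()
    return True


# Stand-in table with the same three columns as the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster_lock ("
    "cluster_id TEXT PRIMARY KEY, action_ids TEXT, semaphore INTEGER)"
)
conn.execute(
    "INSERT INTO cluster_lock VALUES (?, ?, ?)",
    ("5e72360f", json.dumps(["313daf5e", "9a2a31f1"]), 2),
)

released = release_holder(conn, "5e72360f", "313daf5e")
remaining = conn.execute(
    "SELECT action_ids, semaphore FROM cluster_lock"
).fetchall()
```

Removing the last holder deletes the row entirely, so a later action would find the cluster unlocked.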

yaofenghua (ydqnyfh) wrote :

| ccfdb158 | webhook_76631dc0 | CLUSTER_SCALE_OUT | FAILED | a260a9f6 | | | 2017-10-22T07:52:05Z |
| 68b18716 | webhook_76631dc0 | CLUSTER_SCALE_OUT | FAILED | a260a9f6 | | | 2017-10-22T07:52:35Z |
| c3be7de2 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:53:05Z |
| dc191a32 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:53:35Z |
| 866ead83 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:54:05Z |
| 87f3d245 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:54:36Z |
| b0bbd9fa | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:55:06Z |
| ae89a892 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:55:37Z |
| 127fcc26 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:56:07Z |
| 11bc01c5 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:56:37Z |
| 66071adb | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:57:07Z |
| bbcc478c | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:57:38Z |
| 4868f63a | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:58:10Z |
| cff6c1a6 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:58:40Z |
| 70d777f9 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:59:09Z |
| 146ec5bc | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:59:41Z |
+----------+----------------------------+-----------------------+-----------+-----------+------------+-------------+----------------------+
I hit this problem again.

mysql> select * from cluster_lock;
+--------------------------------------+------------------------------------------+-----------+
| cluster_id | action_ids | semaphore |
+--------------------------------------+------------------------------------------+-----------+
| a260a9f6-33e0-44b8-90e7-5cd29889a2dc | ["68b18716-a104-4ca1-9022-af1c45a97743"] | -1 |
+--------------------------------------+------------------------------------------+-----------+

This lock cannot be released.

Qiming Teng (tengqim) wrote : Re: [Bug 1725883] Re: cluster lock can not be release

Looks like the cluster-scale-out operations are forever running.
When these operations/actions are running, cluster is locked for sure.
Please check the reason why CLUSTER_SCALE_OUT is running for so long.

yaofenghua (ydqnyfh) wrote : Re: Re: [Bug 1725883] Re: cluster lock can not be release

| a260a9f6-33e0-44b8-90e7-5cd29889a2dc | ["68b18716-a104-4ca1-9022-af1c45a97743"] | -1 |
The cluster is locked by action 68b18716-a104-4ca1-9022-af1c45a97743. This action is FAILED, but the lock was not released, so the later scale_out actions stay RUNNING indefinitely because they cannot get the lock.
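The condition described here (a lock whose holder has already finished) can be checked mechanically. The following is a rough sketch rather than anything from Senlin: `find_stale_holders` and `TERMINAL_STATUSES` are hypothetical names, and the set of terminal statuses is an assumption. The data is copied from the `cluster_lock` row and action list in this report.

```python
# Assumption: an action in one of these states will never release its lock itself.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def find_stale_holders(lock_rows, action_status):
    """Return {cluster_id: [action_id, ...]} for lock holders that already finished.

    lock_rows: iterable of (cluster_id, action_ids) pairs from cluster_lock.
    action_status: mapping of action ID to its current status.
    Hypothetical helper for illustration only.
    """
    stale = {}
    for cluster_id, action_ids in lock_rows:
        finished = [
            action_id
            for action_id in action_ids
            if action_status.get(action_id) in TERMINAL_STATUSES
        ]
        if finished:
            stale[cluster_id] = finished
    return stale


lock_rows = [
    (
        "a260a9f6-33e0-44b8-90e7-5cd29889a2dc",
        ["68b18716-a104-4ca1-9022-af1c45a97743"],
    ),
]
action_status = {"68b18716-a104-4ca1-9022-af1c45a97743": "FAILED"}

stale = find_stale_holders(lock_rows, action_status)
```

A holder with an unknown or non-terminal status (for example RUNNING) is left alone, since it may still release the lock normally.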

    Best wishes!
********************************************************************
Yao Fenghua (姚封华)
China Mobile (Suzhou) Software Technology Co., Ltd.
China Mobile Suzhou R&D Center
Phone: 18896725051
Email: <email address hidden>
********************************************************************


Qiming Teng (tengqim) wrote :

Can we know if the bug is still there?

Duc Truong (dtruong) wrote :

Several fixes have been made to ensure that cluster locks are released. Also, there is now an action update API to cancel actions. So this bug should no longer happen.

Changed in senlin:
status: New → Fix Committed
assignee: nobody → Duc Truong (dtruong)