Comment 3 for bug 1725883

yaofenghua (ydqnyfh) wrote : Re: Re: [Bug 1725883] Re: cluster lock can not be release

| a260a9f6-33e0-44b8-90e7-5cd29889a2dc | ["68b18716-a104-4ca1-9022-af1c45a97743"] | -1 |
The cluster is locked by action 68b18716-a104-4ca1-9022-af1c45a97743. That action is FAILED, but its lock was not released, so the later scale_out actions stay RUNNING indefinitely because they cannot get the lock.
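
To make the condition concrete, here is a minimal sketch of how such a stale lock could be detected. It is not Senlin code: the function name find_stale_locks, the FINISHED set, and the in-memory rows are assumptions for illustration only. The rows mirror the cluster_lock output and the action status reported in this bug.

```python
import json

# Statuses after which an action can no longer release a lock by itself.
# Only FAILED is confirmed by this report; the other names are assumptions.
FINISHED = {"FAILED", "SUCCEEDED", "CANCELLED"}


def find_stale_locks(lock_rows, action_status):
    """Return (cluster_id, [action_id, ...]) for locks held by finished actions.

    lock_rows: iterable of (cluster_id, action_ids_json, semaphore) tuples,
               shaped like rows of the cluster_lock table.
    action_status: dict mapping action id to its current status string.
    """
    stale = []
    for cluster_id, action_ids_json, _semaphore in lock_rows:
        holders = json.loads(action_ids_json)
        dead = [a for a in holders if action_status.get(a) in FINISHED]
        if dead:
            stale.append((cluster_id, dead))
    return stale


# Data copied from this report.
lock_rows = [
    ("a260a9f6-33e0-44b8-90e7-5cd29889a2dc",
     '["68b18716-a104-4ca1-9022-af1c45a97743"]',
     -1),
]
action_status = {"68b18716-a104-4ca1-9022-af1c45a97743": "FAILED"}

stale = find_stale_locks(lock_rows, action_status)
print(stale)
```

A lock reported by such a check is a candidate for manual cleanup only after confirming that no engine is still working on the listed action.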

    Best wishes!
********************************************************************
Yao Fenghua
China Mobile (Suzhou) Software Technology Co., Ltd.
China Mobile Suzhou R&D Center
Phone: 18896725051
Email: <email address hidden>
********************************************************************

From: Qiming Teng
Date: 2017-10-23 09:31
To: yaofenghua
Subject: Re: [Bug 1725883] Re: cluster lock can not be release
It looks like the cluster-scale-out operations are running forever.
While these operations/actions are running, the cluster is certainly locked.
Please check why CLUSTER_SCALE_OUT has been running for so long.
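
As a starting point for that check, the sketch below picks out actions that have been RUNNING longer than a threshold. It is only an illustration: the helper long_running and the 5-minute limit are assumptions, and the timestamp column of the quoted listing is assumed to be the time the action was created. The sample rows are taken from the listing quoted below.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format used in the action listing


def long_running(actions, now, limit):
    """Return ids of actions that are RUNNING and older than `limit`.

    actions: iterable of (action_id, status, timestamp_string) tuples.
    now: datetime to compare against (same timezone as the timestamps).
    limit: timedelta threshold (an arbitrary choice for this sketch).
    """
    return [
        action_id
        for action_id, status, stamp in actions
        if status == "RUNNING"
        and now - datetime.strptime(stamp, TIME_FORMAT) > limit
    ]


actions = [
    ("68b18716", "FAILED", "2017-10-22T07:52:35Z"),
    ("c3be7de2", "RUNNING", "2017-10-22T07:53:05Z"),
    ("146ec5bc", "RUNNING", "2017-10-22T07:59:41Z"),
]
now = datetime(2017, 10, 22, 8, 2, 15)  # time of the quoted report
overdue = long_running(actions, now, timedelta(minutes=5))
print(overdue)
```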
On Sun, Oct 22, 2017 at 08:02:15AM -0000, yaofenghua wrote:
> | ccfdb158 | webhook_76631dc0 | CLUSTER_SCALE_OUT | FAILED | a260a9f6 | | | 2017-10-22T07:52:05Z |
> | 68b18716 | webhook_76631dc0 | CLUSTER_SCALE_OUT | FAILED | a260a9f6 | | | 2017-10-22T07:52:35Z |
> | c3be7de2 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:53:05Z |
> | dc191a32 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:53:35Z |
> | 866ead83 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:54:05Z |
> | 87f3d245 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:54:36Z |
> | b0bbd9fa | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:55:06Z |
> | ae89a892 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:55:37Z |
> | 127fcc26 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:56:07Z |
> | 11bc01c5 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:56:37Z |
> | 66071adb | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:57:07Z |
> | bbcc478c | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:57:38Z |
> | 4868f63a | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:58:10Z |
> | cff6c1a6 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:58:40Z |
> | 70d777f9 | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:59:09Z |
> | 146ec5bc | webhook_76631dc0 | CLUSTER_SCALE_OUT | RUNNING | a260a9f6 | | | 2017-10-22T07:59:41Z |
> +----------+----------------------------+-----------------------+-----------+-----------+------------+-------------+----------------------+
> I hit this problem again.
>
> mysql> select * from cluster_lock;
> +--------------------------------------+------------------------------------------+-----------+
> | cluster_id | action_ids | semaphore |
> +--------------------------------------+------------------------------------------+-----------+
> | a260a9f6-33e0-44b8-90e7-5cd29889a2dc | ["68b18716-a104-4ca1-9022-af1c45a97743"] | -1 |
> +--------------------------------------+------------------------------------------+-----------+
>
> This lock cannot be released.
>
> --
> You received this bug notification because you are subscribed to senlin.
> https://bugs.launchpad.net/bugs/1725883
>
> Title:
> cluster lock can not be release
>
> Status in senlin:
> New
>
> Bug description:
> senlin/ocata
>
> when a cluster was locked by some action
>
> mysql> select * from cluster_lock;
> +--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
> | cluster_id | action_ids | semaphore |
> +--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
> | 5e72360f-481b-49b1-9ada-0ce029fcb8f9 | ["313daf5e-4ffd-4ebb-8dc6-3bd44b04d3f2", "9a2a31f1-a1fc-43eb-a16c-5ce58f760efd", "ce79de11-b7d0-4491-a9ae-b85367d2634c", "207638a7-df87-4561-a872-53bd565b75aa"] | 4 |
> +--------------------------------------+-----------------------------------------------------------
>
> This lock cannot be released. Later actions try to grab the lock but fail. If this cluster has a health check, it will be
> | e2b972dc | detach_policy_5e72360f | CLUSTER_DETACH_POLICY | RUNNING | 5e72360f | | | 2017-10-22T00:41:39Z |
> | 3eb530e9 | detach_policy_5e72360f | CLUSTER_DETACH_POLICY | RUNNING | 5e72360f | | | 2017-10-22T01:10:31Z |
> | 0c5704f2 | detach_policy_5e72360f | CLUSTER_DETACH_POLICY | RUNNING | 5e72360f | | | 2017-10-22T01:12:09Z |
> I simulated this scenario with the CLUSTER_DETACH_POLICY action.
>
> If this is a check action, these actions accumulate over time and the load on
> senlin-engine keeps growing.
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4051 admin 20 0 261732 103780 5420 R 26.8 0.6 35:57.72 senlin-engine
> 13411 rabbitmq 20 0 5374964 911292 2580 S 7.0 5.5 1509:29 beam.smp
> 2525 nova 20 0 506244 97080 2336 S 5.0 0.6 1694:25 nova-conductor
> 7798 influxdb 20 0 1209724 174092 7336 S 5.0 1.1 217:16.46 influxd
>
> In the end, all actions will time out. I think senlin should have a function
> to release the cluster lock.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/senlin/+bug/1725883/+subscriptions
>
