Mirantis OpenStack

Platform system tests failed with timeout

Bug #1442206 reported by Timur Nurlygayanov on 2015-04-09

This bug affects 1 person

Affects             Status        Importance  Assigned to         Milestone
Mirantis OpenStack  Fix Released  Critical    Timur Nurlygayanov  Mirantis OpenStack 6.1
6.0.x               Fix Released  Critical    Timur Nurlygayanov  Mirantis OpenStack 6.0-updates

Bug Description

System tests for platform components fail with a timeout error, for example:
http://jenkins-product.srt.mirantis.net:8080/view/6.1_swarm/job/6.1.system_test.ubuntu.services_ha/87/console

The timeout is 600 minutes for all system tests on Ubuntu HA configurations (across different components), and it looks like we need to increase it to 1000 minutes.

Timur Nurlygayanov (tnurlygayanov) on 2015-04-10

Changed in mos:
assignee:	nobody → Timur Nurlygayanov (tnurlygayanov)
importance:	Undecided → Critical
milestone:	none → 6.1
status:	New → Confirmed
status:	Confirmed → Fix Committed
status:	Fix Committed → Fix Released
status:	Fix Released → Confirmed

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-04-10:

I checked how the time is spent during the tests and found that most of it goes to:

1 hour 20 minutes - deployment of OpenStack cluster with Sahara
1 hour - execution of OSTF tests for Sahara
1 hour 9 minutes - deployment of OpenStack cluster with Murano
1 hour 12 minutes - deployment of OpenStack cluster with Ceilometer #1
1 hour 10 minutes - deployment of OpenStack cluster with Ceilometer #2

So, as we can see, we have added new system tests and these tests require more time to run. This means we just need to increase the timeout of the Jenkins job that runs this type of system test.

We also need to improve the configuration of the compute node in the test environment, because executing the Sahara OSTF tests takes 1 hour, which is too long.
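As a rough check, the stage timings listed above can be summed and compared against the job timeout. The sketch below is only illustrative: the names `STAGES`, `total_minutes` and `remaining_minutes` are hypothetical, and it covers only the five stages listed in this comment, not the rest of the job.

```python
from datetime import timedelta

# Durations copied from the list in this comment; everything else the
# job does (other tests, setup, teardown) is NOT included here.
STAGES = {
    "deploy cluster with Sahara": timedelta(hours=1, minutes=20),
    "OSTF tests for Sahara": timedelta(hours=1),
    "deploy cluster with Murano": timedelta(hours=1, minutes=9),
    "deploy cluster with Ceilometer #1": timedelta(hours=1, minutes=12),
    "deploy cluster with Ceilometer #2": timedelta(hours=1, minutes=10),
}


def total_minutes(stages):
    """Return the summed duration of all stages in whole minutes."""
    total = sum(stages.values(), timedelta())
    return int(total.total_seconds() // 60)


def remaining_minutes(stages, timeout_minutes):
    """Return the minutes left in the timeout after the listed stages."""
    return timeout_minutes - total_minutes(stages)


if __name__ == "__main__":
    print(f"listed stages: {total_minutes(STAGES)} min")
    # 600 is the current timeout; 1000 and 1200 are the values proposed in this report.
    for timeout in (600, 1000, 1200):
        left = remaining_minutes(STAGES, timeout)
        print(f"timeout {timeout} min -> {left} min left for everything else")
```

The listed stages alone add up to 351 minutes, which leaves 249 minutes of the 600-minute timeout for all remaining work in the job.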

Changed in mos:
assignee:	Timur Nurlygayanov (tnurlygayanov) → Fuel DevOps (fuel-devops)

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-04-10:

We can also see that the OpenStack nodes run as VMs with 1 vCPU:

tnurlygayanov@srv36-bud:~$ virsh dumpxml 6.1.system_test.ubuntu.services_ha.88.2015-04-10_02-00-45_slave-01 | grep cpu
  <vcpu placement='static'>1</vcpu>
  <cpu mode='host-model'>
  </cpu>

We need to increase these parameters to at least 2 vCPU for each node.
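The same check could be scripted instead of grepping by hand. This is a minimal sketch that assumes the `virsh dumpxml` output has been captured as a string; `SAMPLE_XML` is a trimmed, hypothetical domain definition, and `vcpu_count` and `has_enough_vcpus` are illustrative helper names.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical domain XML containing only the elements shown above.
# In practice this string would be the output of `virsh dumpxml <domain>`.
SAMPLE_XML = """<domain type='kvm'>
  <name>slave-01</name>
  <vcpu placement='static'>1</vcpu>
  <cpu mode='host-model'>
  </cpu>
</domain>"""


def vcpu_count(domain_xml):
    """Return the vCPU count from the top-level <vcpu> element."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    if vcpu is None or vcpu.text is None:
        raise ValueError("no <vcpu> element in domain XML")
    return int(vcpu.text.strip())


def has_enough_vcpus(domain_xml, minimum=2):
    """Return True if the domain has at least `minimum` vCPUs."""
    return vcpu_count(domain_xml) >= minimum


if __name__ == "__main__":
    count = vcpu_count(SAMPLE_XML)
    status = "ok" if has_enough_vcpus(SAMPLE_XML) else "too few"
    print(f"vCPUs: {count} ({status})")
```

The default `minimum=2` mirrors the target stated in this comment and can be raised if the nodes need more.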

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-04-10:

We need to set SLAVE_NODE_CPU=2 for all services jobs and set timeout to 1200 minutes.

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-04-10:

Fixed in https://review.fuel-infra.org/5600

Timur Nurlygayanov (tnurlygayanov) on 2015-04-10

Changed in mos:
status:	Confirmed → Fix Committed
status:	Fix Committed → In Progress

Aleksandra Fedorova (bookwar) on 2015-04-10

Changed in mos:
assignee:	Fuel DevOps (fuel-devops) → Aleksandra Fedorova (afedorova)

Timur Nurlygayanov (tnurlygayanov) on 2015-04-10

Changed in mos:
assignee:	Aleksandra Fedorova (afedorova) → Timur Nurlygayanov (tnurlygayanov)

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-04-13:

Fixed, verified and looks like it works :)

Changed in mos:
status:	In Progress → Fix Released

Dennis Dmitriev (ddmitriev) wrote on 2015-04-17:

Reproduced again on CI:

http://jenkins-product.srt.mirantis.net:8080/job/6.1.system_test.centos.services_ha_one_controller/72/testReport/%28root%29/deploy_sahara_ha_one_controller_gre/deploy_sahara_ha_one_controller_gre/?

Changed in mos:
status:	Fix Released → Confirmed

Dennis Dmitriev (ddmitriev) wrote on 2015-04-17:

fail_error_deploy_sahara_ha_one_controller_gre-2015_04_17__03_39_59.tar.xz (31.0 MiB, application/octet-stream)

Timur Nurlygayanov (tnurlygayanov) wrote on 2015-04-17:

Hi Denis,

the issue was successfully solved, and the issue you mentioned is another known issue: https://bugs.launchpad.net/mos/+bug/1443360. It is in progress now.

Changed in mos:
status:	Confirmed → Fix Released
