Bug #1828228 “corosync fails to start in unprivileged containers...” : Bugs : Auto Package Testing

Robie Basak (racb) wrote on 2019-05-09:

#1

Am I right in thinking that the limits being too low are causing false positives in autopkgtests?

If so, we could check the limits in the tests themselves and skip (exit 77 and declare "skippable") if on armhf and the limits aren't high enough. That's a reasonable action for the packages, I think.
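
As a rough illustration of the check suggested above, a test script could compare the memlock hard limit against a minimum before doing anything else. This is only a sketch: the function name memlock_ok and the 1024 KiB threshold are assumptions made for the example, not values taken from the corosync or pacemaker tests.

```shell
#!/bin/sh
# Hypothetical threshold in KiB; the thread does not state a required value.
MIN_MEMLOCK_KIB=1024

# memlock_ok LIMIT MIN: succeed if LIMIT (as printed by `ulimit -l`) is
# "unlimited" or a number greater than or equal to MIN.
memlock_ok() {
    case "$1" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 1 ;;   # empty or not a number: treat as too low
    esac
    [ "$1" -ge "$2" ]
}

current=$(ulimit -H -l)
if memlock_ok "$current" "$MIN_MEMLOCK_KIB"; then
    echo "memlock hard limit ok: $current"
else
    echo "SKIP: memlock hard limit too low: $current"
    # exit 77   # a real test script would exit here
fi
```

In an autopkgtest, the SKIP branch would end with exit 77, with skippable listed in the test's Restrictions so that the exit code is reported as a skip rather than a failure.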

Changed in corosync (Ubuntu):
status:	New → Triaged
Changed in pacemaker (Ubuntu):
status:	New → Triaged
Changed in corosync (Ubuntu):
importance:	Undecided → Medium
Changed in pacemaker (Ubuntu):
importance:	Undecided → Medium

Rafael David Tinoco (rafaeldtinoco) on 2019-07-04

tags:	added: ubuntu-ha
Changed in corosync (Ubuntu):
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in pacemaker (Ubuntu):
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)

Rafael David Tinoco (rafaeldtinoco) on 2019-07-04

Changed in corosync (Ubuntu):
assignee:	Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in pacemaker (Ubuntu):
assignee:	Rafael David Tinoco (rafaeldtinoco) → nobody
tags:	removed: ubuntu-ha

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-08:

#2

I assigned this to myself to address comment #1 from Robie and to try to bump the needed values from the test itself. I'll test in an armhf environment just to make sure it's good. This will unblock:

https://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses_by_team.html#ubuntu-server

pacemaker (1.1.18-2ubuntu1 to 2.0.1-4ubuntu1) in proposed for 56 days
- pacemaker/2.0.1-4ubuntu1: armhf (log, history)

And as soon as corosync is unblocked because of libknet1 MIR, we will be good for corosync and pacemaker.

Changed in pacemaker (Ubuntu):
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync (Ubuntu):
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-15:

#3

I flagged this as high since it is impacting the pacemaker migration. After this is fixed, corosync (which depends on libknet1) will still block migration, but that has already been addressed in the MIR (https://bugs.launchpad.net/ubuntu/+source/kronosnet/+bug/1811139).

I'm working on this now.

Changed in corosync (Ubuntu):
importance:	Medium → High
Changed in pacemaker (Ubuntu):
importance:	Medium → High
Changed in corosync (Ubuntu):
status:	Triaged → In Progress
Changed in pacemaker (Ubuntu):
status:	Triaged → In Progress

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-15:

#4

Hello Dimitri,

I tried to reproduce the same behaviour using default LXC containers in real HW (ARMv8 - ARMHF containers) and wasn't able to.

Nevertheless, I was able to cause corosync not to start due to failed mlock() calls:

main.log:Jul 15 18:27:57 [2386] hasid01 corosync warning [MAIN  ] main.c:corosync_mlockall:481 Could not lock memory of service to avoid page faults: Operation not permitted (1)
main.log:Jul 15 18:27:57 [2386] hasid01 corosync error   [MAIN  ] main.c:corosync_flock:1087 Corosync Executive couldn't create lock file.

when I set the memlock soft/hard limits to 0 for the "hacluster/haclient" user/group, as you suggested.

hacluster@hasid01:~$ strace -f /usr/sbin/corosync -f 2>&1 | grep -i mlock
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
mlockall(MCL_CURRENT|MCL_FUTURE)        = -1 EPERM (Operation not permitted)

and both calls, prlimit64() and mlockall(), failed with EPERM.

When testing with 1MB soft/hard limit:

hacluster@hasid01:~$ strace -f /usr/sbin/corosync -f 2>&1 | grep -i mlock
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
mlockall(MCL_CURRENT|MCL_FUTURE)        = 0

only prlimit64() fails with EPERM.

It tries to set RLIMIT_MEMLOCK soft and hard limits to RLIM64_INFINITY, which is defined as:

#define RLIM64_INFINITY		(~0ULL)

And it is, possibly, the "unlimited" value.

Since it failed with EPERM, I checked what that return value means for this call: an unprivileged process tried to raise the hard limit; the CAP_SYS_RESOURCE capability is required to do this.

It looks like, unless your container has "sys_resource" in its lxc.cap.keep= value AND you give the corosync binary CAP_SYS_RESOURCE:

sudo setcap 'CAP_SYS_RESOURCE=+ep' /usr/sbin/corosync

the prlimit64() call will fail, UNLESS the memlock limit is already unlimited, in which case it works:

(c)inaddy@hasid01:~$ sudo su - hacluster
hacluster@hasid01:~$ ulimit -H -l
unlimited
hacluster@hasid01:~$ strace -f /usr/sbin/corosync -f 2>&1 | grep -i mlock
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = 0
mlockall(MCL_CURRENT|MCL_FUTURE)        = 0

And, despite failing in other parts:

sched_setscheduler(0, SCHED_RR, [99])   = -1 EPERM (Operation not permitted)                                          
setpriority(PRIO_PGRP, 0, -2147483648)  = -1 EACCES (Permission denied)

It works:

(c)inaddy@hasid02:~$ sudo crm status
Stack: corosync
Current DC: hasid02 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Mon Jul 15 19:40:25 2019
Last change: Mon Jul 15 17:51:53 2019 by root via cibadmin on hasid01

3 nodes configured
0 resources configured

Node hasid01: pending
Online: [ hasid02 hasid03 ]

And

(c)inaddy@hasid02:~$ sudo corosync-quorumtool
Quorum information
------------------
Date:             Mon Jul 15 19:40:41 2019
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          2
Ring ID:          1/136
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 hasid01
         2          1 hasid02 (local)
         3          1 hasid03
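
The soft and hard memlock limits that a process actually runs with can also be read from /proc, without strace. A minimal sketch, assuming a Linux /proc filesystem; memlock_limits is a name made up for this example:

```shell
#!/bin/sh
# memlock_limits: read a /proc/<pid>/limits table on stdin and print the
# soft and hard "Max locked memory" values (a number or "unlimited").
memlock_limits() {
    awk '/^Max locked memory/ { print $4, $5 }'
}

# Limits inherited by this shell; for a running daemon one would read
# /proc/<pid>/limits of that process instead.
if [ -r /proc/self/limits ]; then
    echo "memlock soft/hard: $(memlock_limits < /proc/self/limits)"
fi
```

If the hard value printed is not "unlimited", the prlimit64() call shown in the strace output above would be expected to need CAP_SYS_RESOURCE.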

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-16:

#5

Quick clarifications on next steps:

- corosync runs as root, so it's unclear to me whether prlimit64() would fail inside a container when sys_resource is denied. prlimit64() definitely fails in 2 conditions: (1) not root and no "cap_sys_resource" configured for the binary (CAP_SYS_RESOURCE=+ep), and (2) not root and the memlock ulimit is not unlimited. Neither is the case here, since corosync runs as root.

- I'm going to test the lxd defaults, since I was using a vanilla lxc setup. The intention is to check whether sys_resource is granted by default and, if it is not, what impact that has on prlimit64() calls made by root when the memlock ulimit is not unlimited.

- I will check for anything else that might be getting in our way.

Robie Basak (racb) wrote on 2019-07-16: Re: [Bug 1828228] Re: corosync fails to start in container (armhf) bump some limits

#6

Note that if this turns out to be challenging a "force-badtest" is likely to be acceptable to get the package migrated for now.

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-16: Re: corosync fails to start in container (armhf) bump some limits

#7

Thanks Robie, and I totally agree. I'll take a quick look at the lxd cases and comment back here so we can make a decision.

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-20:

#8

This "bug" happens because of "unprivileged" containers:

root@corosync:~# corosync -f
Jul 20 21:26:32 notice  [MAIN  ] Corosync Cluster Engine 3.0.1 starting up
Jul 20 21:26:32 info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 20 21:26:32 warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 20 21:26:32 warning [MAIN  ] Could not set priority -2147483648: Permission denied (13)
Jul 20 21:26:32 notice  [TOTEM ] Initializing transport (Kronosnet).
Jul 20 21:26:33 crit    [TOTEM ] knet_handle_new failed: File name too long (36)
Jul 20 21:26:33 error   [KNET  ] transport: Failed to set socket buffer via force option 33: Operation not permitted
Jul 20 21:26:33 error   [KNET  ] transport: Unable to set local socketpair receive buffer: File name too long
Jul 20 21:26:33 error   [KNET  ] handle: Unable to initialize internal hostsockpair: File name too long
Jul 20 21:26:33 error   [MAIN  ] Can't initialize TOTEM layer
Jul 20 21:26:33 error   [MAIN  ] Corosync Cluster Engine exiting with status 15 at main.c:1529.

connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/fs/cgroup/cpu/cpu.rt_runtime_us", O_RDONLY) = -1 ENOENT (No such file or directory)
sched_setscheduler(0, SCHED_RR, [99])   = -1 EPERM (Operation not permitted)
setpriority(PRIO_PGRP, 0, -2147483648)  = -1 EACCES (Permission denied)
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
[pid   694] setsockopt(11, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
[pid   694] epoll_ctl(0, EPOLL_CTL_DEL, 11, 0xff968fb8) = -1 EINVAL (Invalid argument)
[pid   694] epoll_ctl(0, EPOLL_CTL_DEL, 0, 0xff968fb8) = -1 EINVAL (Invalid argument)
[pid   694] close(0)                    = -1 EBADF (Bad file descriptor)
[pid   694] close(0)                    = -1 EBADF (Bad file descriptor)
[pid   695] madvise(0xf6055000, 8368128, MADV_DONTNEED) = -1 EINVAL (Invalid argument)

----

I was able to reproduce the exact same issue by using lxd on armhf with unprivileged containers. The problem is easy to confirm by running:

root@corosync:~# ulimit -l unlimited
-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted

as root, which shows that "root" does not have the "cap_sys_resource" capability. There is also the Kronosnet initialization failure because of low {r,w}mem_max values.
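
To see whether the socket buffer maximums are below what was requested, the sysctl values can be compared against the size from the failed setsockopt() call. This is a sketch only: check_sockbuf is a name invented here, and using the same 8388608 target for wmem_max is an assumption, since the strace output only shows the receive buffer.

```shell
#!/bin/sh
# Size requested via SO_RCVBUFFORCE in the strace output above.
WANT=8388608

# check_sockbuf NAME VALUE WANT: report whether VALUE reaches WANT.
check_sockbuf() {
    if [ "$2" -ge "$3" ]; then
        echo "$1 ok ($2)"
    else
        echo "$1 too low ($2 < $3)"
    fi
}

for name in rmem_max wmem_max; do
    f=/proc/sys/net/core/$name
    if [ -r "$f" ]; then
        check_sockbuf "$name" "$(cat "$f")" "$WANT"
    else
        # Assumption: the sysctl may not be exposed inside the container.
        echo "$name not readable here"
    fi
done
```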

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-21:

#9

## unprivileged x64:

root@corosync:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu Eoan Ermine (development branch)
Release:        19.10
Codename:       eoan
root@corosync:~# uname -a
Linux corosync 5.0.0-21-generic #22-Ubuntu SMP Tue Jul 2 13:27:33 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@corosync:~# corosync -f
Jul 21 04:20:38 notice  [MAIN  ] Corosync Cluster Engine 3.0.1 starting up
Jul 21 04:20:38 info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 21 04:20:38 warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 21 04:20:38 warning [MAIN  ] Could not set priority -2147483648: Permission denied (13)
Jul 21 04:20:38 notice  [TOTEM ] Initializing transport (Kronosnet).
Jul 21 04:20:38 crit    [TOTEM ] knet_handle_new failed: Cannot allocate memory (12)
Jul 21 04:20:38 error   [MAIN  ] Can't initialize TOTEM layer
Jul 21 04:20:38 error   [MAIN  ] Corosync Cluster Engine exiting with status 15 at main.c:1529.

## unprivileged armhf

root@corosync:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu Eoan Ermine (development branch)
Release:        19.10
Codename:       eoan

root@corosync:~# uname -a
Linux corosync 5.0.0-21-generic #22-Ubuntu SMP Tue Jul 2 13:27:45 UTC 2019 armv8l armv8l armv8l GNU/Linux

root@corosync:~# corosync -f
Jul 21 04:21:35 notice  [MAIN  ] Corosync Cluster Engine 3.0.1 starting up
Jul 21 04:21:35 info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 21 04:21:35 warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 21 04:21:35 warning [MAIN  ] Could not set priority -2147483648: Permission denied (13)
Jul 21 04:21:35 notice  [TOTEM ] Initializing transport (Kronosnet).
Jul 21 04:21:35 crit    [TOTEM ] knet_handle_new failed: Resource temporarily unavailable (11)
Jul 21 04:21:35 error   [KNET  ] handle: Unable to allocate memory for link to datafd buffer: Resource temporarily unavailable
Jul 21 04:21:35 error   [MAIN  ] Can't initialize TOTEM layer
Jul 21 04:21:35 error   [MAIN  ] Corosync Cluster Engine exiting with status 15 at main.c:1529.

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-21:

#10

The lxd containers used for autopkgtest are likely different across architectures. x64 seems to be running privileged containers for needs-root tests, while armhf is not (or else the x64 autopkgtests wouldn't pass either, as demonstrated in the previous comment).

I'll suggest a hints-ubuntu entry marking this test as bad-test, but it seems that the environment is bad, and not the test.

no longer affects:	pacemaker (Ubuntu)
Changed in corosync-qdevice (Ubuntu):
status:	New → In Progress
importance:	Undecided → High
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-21:

#11

corosync-qdevice autopkgtest is also failing because of the same reason (armhf architecture).

Steve Langasek (vorlon) wrote on 2019-07-21: Re: [Bug 1828228] Re: corosync fails to start in container (armhf) bump some limits

#12

On Sun, Jul 21, 2019 at 04:24:08AM -0000, Rafael David Tinoco wrote:
> Somehow the lxd containers being used for autopkgtest are, likely,
> different. x64 seems to be running privileged containers for need_root
> tests, while armhf is not (orelse x64 selfpkgtests wouldn't pass either,
> like demonstrated in previous comment).

armhf is the only architecture that runs tests in containers. All other
architectures run them in VMs. (The only reason armhf doesn't use VMs is
because we can't deploy an armhf VM in openstack.)

> I'll suggest a hints-ubuntu test marking this as bad-test, but it seems
> that the environment is bad, and not the test.

Hard disagree. If the test can detect that it's running in an unprivileged
container, it should skip any tests which require privileges. If the tests
as a whole can't be run in an unprivileged container, then they should
declare Restrictions: isolation-machine instead of Restrictions:
isolation-container.
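
One possible way for a test to detect an unprivileged container, sketched here as an assumption rather than as what any of these packages actually do, is to look at the uid map of the current user namespace: in the initial namespace it is the identity map, while unprivileged containers run with remapped ids. The function name in_initial_userns is made up for this example.

```shell
#!/bin/sh
# in_initial_userns: read a uid_map on stdin and succeed only if it is the
# identity map of the initial user namespace ("0 0 4294967295").
in_initial_userns() {
    awk 'NR == 1 && $1 == 0 && $2 == 0 && $3 == 4294967295 { ok = 1 }
         END { exit(ok ? 0 : 1) }'
}

if [ -r /proc/self/uid_map ] && in_initial_userns < /proc/self/uid_map; then
    echo "uid map is the identity map: not an unprivileged container"
else
    echo "uid map is remapped or unreadable: privileged tests would be skipped"
fi
```

This is a heuristic: it says nothing about which capabilities or limits are available, only whether the ids are remapped.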

Rafael David Tinoco (rafaeldtinoco) on 2019-07-21

Changed in corosync (Ubuntu):
status:	In Progress → Invalid
Changed in corosync-qdevice (Ubuntu):
status:	In Progress → Invalid

Steve Langasek (vorlon) wrote on 2019-07-21: Re: corosync fails to start in container (armhf) bump some limits

#13

Reopening per my preceding comment

Changed in corosync (Ubuntu):
status:	Invalid → Triaged
Changed in corosync-qdevice (Ubuntu):
status:	Invalid → Triaged

Rafael David Tinoco (rafaeldtinoco) on 2019-07-22

Changed in corosync (Ubuntu):
status:	Triaged → In Progress
summary:	- corosync fails to start in container (armhf) bump some limits
	+ corosync fails to start in unprivileged containers - autopkgtest failure

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-22:

#14

Steve, you are right. I was preparing this comment a few minutes ago:

"""Speaking with Andreas, we had the idea to just exit 2 (the "at least one test was skipped" return code) from the test when running in an unprivileged environment. That can be easily detected by changing the memlock size limit as root (needs-root is required in the test) and checking for an error return.""" It goes in the same direction you pointed.

I'll add isolation-machine and skip the test if ulimit -H -l can't be raised (since, with needs-root, that indicates an unprivileged namespace).

Tks!
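
The skip described in this comment could look roughly like the following. It is a sketch under assumptions, not the change that was uploaded: the control stanza in the comment is illustrative, can_raise_memlock is a hypothetical name, and the exit is left commented out so the snippet can be run on its own.

```shell
#!/bin/sh
# Assumed debian/tests/control stanza (field values are illustrative):
#   Tests: corosync
#   Restrictions: needs-root, skippable
#
# can_raise_memlock: succeed if the memlock hard limit can be set to
# unlimited. Runs in a subshell so the caller's limits stay untouched.
can_raise_memlock() {
    ( ulimit -H -l unlimited ) 2>/dev/null
}

if can_raise_memlock; then
    echo "memlock hard limit can be raised: running tests"
else
    echo "unprivileged environment: tests would be skipped"
    # exit 77   # a real test script would exit here
fi
```

The check is only meaningful when the script runs as root; a non-root user whose hard limit is already unlimited would also pass it.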

Changed in auto-package-testing:
status:	New → Invalid
Changed in corosync-qdevice (Ubuntu):
assignee:	Rafael David Tinoco (rafaeldtinoco) → nobody
no longer affects:	corosync-qdevice (Ubuntu)

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-23:

#15

Pacemaker also depends on corosync, =), and its autopkgtests can't run on armhf in an unprivileged container. The same change we made for corosync has to be made in pacemaker.

Changed in pacemaker (Ubuntu):
status:	New → In Progress
importance:	Undecided → High

Rafael David Tinoco (rafaeldtinoco) on 2019-07-25

Changed in pacemaker (Ubuntu):
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync (Ubuntu):
status:	In Progress → Fix Released

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-07-26:

#16

This issue is fixed in both pacemaker and corosync. Other regressions are being investigated at:

https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1838024

Changed in pacemaker (Ubuntu):
status:	In Progress → Fix Released

Andreas Hasenack (ahasenack) on 2020-01-07

Changed in pcs (Ubuntu):
status:	New → In Progress
assignee:	nobody → Rafael David Tinoco (rafaeldtinoco)

Andreas Hasenack (ahasenack) on 2020-01-10

tags:	added: update-excuse

Launchpad Janitor (janitor) wrote on 2020-04-07:

#17

This bug was fixed in the package pcs - 0.10.4-3

---------------
pcs (0.10.4-3) unstable; urgency=medium

  [ Rafael David Tinoco ]
  * d/p/Fix-python-tornado-5.patch: bring back workaround that fixes
    python-tornado until v6 becomes available.
  * Skip autopkgtest for unprivileged containers (LP: #1828228)

  [ Valentin Vidic ]
  * d/patches: fix warnings in ruby testsuite
  * d/control: update Standards-Version to 4.5.0
  * d/tests: show verbose progress for python tests

-- Valentin Vidic <email address hidden> Sun, 05 Apr 2020 19:40:03 +0200

Changed in pcs (Ubuntu):
status:	In Progress → Fix Released

Rafael David Tinoco (rafaeldtinoco) on 2020-04-07

Changed in pcs (Ubuntu):
assignee:	Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in pacemaker (Ubuntu):
assignee:	Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in corosync (Ubuntu):
assignee:	Rafael David Tinoco (rafaeldtinoco) → nobody

Auto Package Testing

corosync fails to start in unprivileged containers - autopkgtest failure

	Status	Importance	Assigned to
Auto Package Testing	Invalid	Undecided	Unassigned
corosync (Ubuntu)	Fix Released	High	Unassigned
pacemaker (Ubuntu)	Fix Released	High	Unassigned
pcs (Ubuntu)	Fix Released	Undecided	Unassigned