test failure - test-regex

Bug #1940996 reported by Dan Bungert
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
grep
Fix Released
Medium
grep (Ubuntu)
New
Undecided
Unassigned

Bug Description

'test-regex' fails when building grep against glibc 2.34.
Per commentary from grep upstream at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=50069,
the test failure can be attributed to skew between the glibc built-in regex and the one that is found in the grep source code.

Revision history for this message
In , Paolo Bonzini (bonzini) wrote :

$ echo 87654321 | \
  grep -E -e '^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1$'
Segmentation fault

Will work on a C reproducer soon.

Revision history for this message
In , Paolo Bonzini (bonzini) wrote :

Minimized testcases (same regex):

$ echo 8 | grep -E -e "$regex"
8 # >>> okay
$ echo 87 | grep -E -e "$regex"
Segmentation fault

$ echo 88 | grep -E -e "$regex"
88 # >>> okay
$ echo 887 | grep -E -e "$regex"
Segmentation fault

Also, everything I tried to feed that is of length 9 or higher and should not
match, gives either a false positive or a segfault:

$ echo 987654321 | grep -E -e "$regex"
887654321
$ echo 484635532 | grep -E -e "$regex"
484635532
$ echo 0123454321 | grep -E -e "$regex"
Segmentation fault
$ echo 0000123454321 | grep -E -e "$regex"
Segmentation fault

Revision history for this message
In , Paul Eggert (eggert-gnu) wrote :

I ran into what appears to be the same bug independently and came up with a simpler (all-C) reproducer; please see Bug#17356. I tried to merge the two bug reports via the web interface, but failed to do so.

Revision history for this message
In , Florian Weimer (fweimer) wrote :

*** Bug 17356 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Paul Eggert (eggert-gnu) wrote :

This bug causes GNU coreutils Bug#22793 "grep -E assertion failure with back references"; see <https://bugs.gnu.org/22793>. I'm adding comments to both bug reports so that the connection between the two bugs is clearer.

Although this bug's current assignee is Paolo Bonzini (the original reporter), I think Paolo is pretty busy doing other stuff. Is someone else available to work on regex bugs? I suspect the fix for this bug will not be trivial.

Revision history for this message
In , Paul Eggert (eggert-gnu) wrote :

Created attachment 9758
C code to reproduce the bug

I attached a slightly-simpler C-language reproducer for the bug, derived from the attachment in Bug#17356. If I compile and run this program, it outputs "a.out: regexec.c:1375: pop_fail_stack: Assertion `num >= 0' failed." and then aborts.

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

Created attachment 10674
This test case silently returns the wrong answer

Following up on a 'grep' bug report here:

https://debbugs.gnu.org/29613

attached is a seemingly-related test case which illustrates a bug that causes 'grep' to quietly return the wrong answer instead of dumping core. This test case should exit successfully, but because of the bug regexec returns 0 so the test case exits with status 1. I compiled and ran it on Fedora 27 x86-64 with "gcc regbug.c; ./a.out".

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

Another test case for this bug can be found here:

https://debbugs.gnu.org/32806

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

Another test case for this bug can be found here:

https://debbugs.gnu.org/34238

Revision history for this message
In , Cvs-commit (cvs-commit) wrote :

The master branch has been updated by Andreas Schwab <email address hidden>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fc141ea78ee3d87c67b18488827fe2d89c9343e7

commit fc141ea78ee3d87c67b18488827fe2d89c9343e7
Author: Andreas Schwab <email address hidden>
Date: Wed Oct 30 10:38:36 2019 +0100

    Fix array bounds violation in regex matcher (bug 25149)

    If the regex has more subexpressions than the number of elements allocated
    in the regmatch_t array passed to regexec then proceed_next_node may
    access the regmatch_t array outside its bounds.

    No testcase added because even without this bug it would then crash in
    pop_fail_stack which is bug 11053.

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

Created attachment 13204
regex: fix undefined backref behavior

I am attaching a proposed patch for this longstanding bug. I plan to email this to libc-alpha shortly.

Dan Bungert (dbungert)
no longer affects: grep
Changed in grep:
importance: Unknown → Medium
status: Unknown → In Progress
Revision history for this message
In , Michael Hudson-Doyle (mwhudson) wrote :

Did the patch ever get sent to libc-alpha?

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

(In reply to Michael Hudson-Doyle from comment #11)
> Did the patch ever get sent to libc-alpha?

Unfortunately I never got around to it.

Someone else can shepherd it if it's urgent; otherwise I suppose it can wait until someone gets around to syncing Gnulib with glibc.

Revision history for this message
In , Cvs-commit (cvs-commit) wrote :

The master branch has been updated by Paul Eggert <email address hidden>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0b5ca7c3e551e5502f3be3b06453324fe8604e82

commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82
Author: Paul Eggert <email address hidden>
Date: Tue Sep 21 07:47:45 2021 -0700

    regex: copy back from Gnulib

    Copy regex-related files back from Gnulib, to fix a problem with
    static checking of regex calls noted by Martin Sebor. This merges the
    following changes:

    * New macro __attribute_nonnull__ in misc/sys/cdefs.h, for use later
    when copying other files back from Gnulib.

    * Use __GNULIB_CDEFS instead of __GLIBC__ when deciding
    whether to include bits/wordsize.h etc.

    * Avoid duplicate entries in epsilon closure table.

    * New regex.h macro _REGEX_NELTS to let regexec say that its pmatch
    arg should contain nmatch elts. Use that for regexec, instead of
    __attr_access (which is incorrect).

    * New regex.h macro _Attr_access_ which is like __attr_access except
    portable to non-glibc platforms.

    * Add some DEBUG_ASSERTs to pacify gcc -fanalyzer and to catch
    recently-fixed performance bugs if they recur.

    * Add Gnulib-specific stuff to port the dynarray- and lock-using parts
    of regex code to non-glibc platforms.

    * Fix glibc bug 11053.

    * Avoid some undefined behavior when popping an empty fail stack.

Revision history for this message
Dan Bungert (dbungert) wrote :

Naively dropping this commit into the glibc package yields a dozen test failures.

FAIL: conform/POSIX2008/regex.h/conform
FAIL: conform/POSIX/regex.h/conform
FAIL: conform/UNIX98/regex.h/conform
FAIL: conform/XOPEN2K8/regex.h/conform
FAIL: conform/XOPEN2K/regex.h/conform
FAIL: conform/XPG42/regex.h/conform
FAIL: conform/XPG4/regex.h/conform
FAIL: elf/tst-env-setuid
FAIL: misc/tst-ttyname
FAIL: nss/tst-nss-compat1
FAIL: posix/tst-vfork3
FAIL: support/tst-support_descriptors

Revision history for this message
Dan Bungert (dbungert) wrote :
Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

(In reply to <email address hidden> from comment #13)
> The master branch has been updated by Paul Eggert <email address hidden>:
>
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;
> h=0b5ca7c3e551e5502f3be3b06453324fe8604e82
>
> commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82
> Author: Paul Eggert <email address hidden>
> Date: Tue Sep 21 07:47:45 2021 -0700
[...]
> * Fix glibc bug 11053.

What is the status of this bug? The comment says that it is fixed, and I could check on an Ubuntu 22.04.1 LTS machine with libc6 2.35-0ubuntu3.1 that regbug.c and rebug2.c no longer fail, but the result is still incorrect with the grep example from Debian bug 884075:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075

vinc17@gcc92:~$ echo 11111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
11111111111
0

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

On 9/5/22 18:06, vincent-srcware at vinc17 dot net wrote:
>
> What is the status of this bug? The comment says that it is fixed, and I could
> check on an Ubuntu 22.04.1 LTS machine with libc6 2.35-0ubuntu3.1 that regbug.c
> and rebug2.c no longer fail, but the result is still incorrect with the grep
> example from Debian bug 884075:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075
>
> vinc17@gcc92:~$ echo 11111111111 | grep -E '^(11+)\1+$|^1?$' ; echo $?
> 11111111111
> 0
>
It looks like my comment
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884075#27> was
incorrect, in that the two bugs are different bugs. glibc bug 11053 is
fixed, but Debian bug 884075 is not fixed. Perhaps a better match for
Debian bug 884075 is glibc bug 10844.

It's not an important bug. However, if you have time to fix it please
feel free to send in a fix.

Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

(In reply to eggert from comment #15)
> glibc bug 11053 is fixed,

Shouldn't this bug be resolved as fixed, then?

> but Debian bug 884075 is not fixed. Perhaps a better match for
> Debian bug 884075 is glibc bug 10844.

It seems different. With Debian bug 884075, the "|^1?$" part is important (it yields the incorrect output, even though this part isn't matched), and there is nothing like that in glibc bug 10844:

vinc17@gcc92:~$ echo 11111111111 | grep --color=auto -E '^(11+)\1+$|^1?$'
11111111111
vinc17@gcc92:~$ echo 11111111111 | grep --color=auto -E '^(11+)\1+$'
vinc17@gcc92:~$

Note that for the first command, nothing is colored in "11111111111", i.e. the line is output as appeared to be matched, but no matches are shown by colors. As a comparison, with ten 1s instead of eleven, the line is output and the ten 1s are colored.

Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

This could be simplified a bit:

vinc17@gcc92:~$ echo 11111111111 | grep --color=auto -E '^(11+)\1+$|^$'
11111111111

(nothing colored).

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

(In reply to Vincent Lefèvre from comment #16)
> (In reply to eggert from comment #15)
> > glibc bug 11053 is fixed,
>
> Shouldn't this bug be resolved as fixed, then?

OK, done.

> > Perhaps a better match for
> > Debian bug 884075 is glibc bug 10844.
>
> It seems different.

In that case it might be better to file a new glibc bug report, one that corresponds more closely to Debian bug 884075.

Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

Sorry, actually both regbug.c and rebug2.c fail as they return the exit status 1 (with my usual configuration, my prompt shows any non-zero exit status, but this is not the case of the machine on which I had done the test, so that I missed the failure initially):

vinc17@gcc92:~$ ./regbug
vinc17@gcc92:~$ echo $?
1
vinc17@gcc92:~$ ./rebug2
vinc17@gcc92:~$ echo $?
1

However, in the test from Paolo Bonzini's bug report (comment 0), grep no longer crashes (while it still crashes with glibc 2.34, which does not have the fix).

regbug.c is derived from the attachment in Bug#17356 (as said in comment 5). I've tested this original testcase: with glibc 2.34 on x86_64, it crashes (segmentation fault); with glibc 2.35 on riscv64 (host gcc92), it outputs "no match (incorrect)".

So it seems that the fix mentioned in comment 13 fixed the crashes (which was the initial bug report), but not the misbehavior.

Now, with these new details, is it still OK to regard this bug as fixed and that the misbehavior (rebug.c from Bug#17356; regbug.c and rebug2.c from this bug) is actually a new bug?

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

(In reply to Vincent Lefèvre from comment #19)

> regbug.c is derived from the attachment in Bug#17356 (as said in comment 5).
> I've tested this original testcase: with glibc 2.34 on x86_64, it crashes
> (segmentation fault); with glibc 2.35 on riscv64 (host gcc92), it outputs
> "no match (incorrect)".
>
> So it seems that the fix mentioned in comment 13 fixed the crashes (which
> was the initial bug report), but not the misbehavior.

OK, so in that case how about if we update Bug#17356 by (1) saying it is no longer a duplicate of Bug#11053 (as we've fixed the latter but not the former), and (2) reopening Bug#17536? If I understand you correctly, that would match the symptoms you describe.

Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

(In reply to eggert from comment #20)
> OK, so in that case how about if we update Bug#17356 by (1) saying it is no
> longer a duplicate of Bug#11053 (as we've fixed the latter but not the
> former), and (2) reopening Bug#17536? If I understand you correctly, that
> would match the symptoms you describe.

Yes, I think that this is the best solution.

Changed in grep:
status: In Progress → Fix Released
Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

(In reply to Vincent Lefèvre from comment #21)
> (In reply to eggert from comment #20)
> > OK, so in that case how about if we update Bug#17356 by (1) saying it is no
> > longer a duplicate of Bug#11053 (as we've fixed the latter but not the
> > former), and (2) reopening Bug#17536? If I understand you correctly, that
> > would match the symptoms you describe.
>
> Yes, I think that this is the best solution.

OK, done.

Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

What about attachment 10674 ("This test case silently returns the wrong answer"), with the pattern "^(11+)\\1+$|^1?$" and the string "1111111111111"?

Should it be regarded as part of Bug#17356 or another bug? This case seems quite different from Bug#10844 and Bug#17356. Unless the intent is to group all the bugs about regexp involving backreferences giving a wrong answer[*] (in which case Bug#10844 and Bug#17356 should be regarded as duplicates to each other), I think that this should be a new bug.

[*] as opposed to a crash like in this bug 11053.

Revision history for this message
In , Paul Eggert (eggert-cs) wrote :

Sure, feel free to file it as a new bug.

Revision history for this message
In , Vincent-srcware (vincent-srcware) wrote :

(In reply to eggert from comment #24)
> Sure, feel free to file it as a new bug.

Bug 29560.

Revision history for this message
In , Cvs-commit (cvs-commit) wrote :

The release/2.34/master branch has been updated by Florian Weimer <email address hidden>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=86a701a20479dfbc23540b3143fd5b28660a2447

commit 86a701a20479dfbc23540b3143fd5b28660a2447
Author: Paul Eggert <email address hidden>
Date: Tue Sep 21 07:47:45 2021 -0700

    regex: copy back from Gnulib

    Copy regex-related files back from Gnulib, to fix a problem with
    static checking of regex calls noted by Martin Sebor. This merges the
    following changes:

    * New macro __attribute_nonnull__ in misc/sys/cdefs.h, for use later
    when copying other files back from Gnulib.

    * Use __GNULIB_CDEFS instead of __GLIBC__ when deciding
    whether to include bits/wordsize.h etc.

    * Avoid duplicate entries in epsilon closure table.

    * New regex.h macro _REGEX_NELTS to let regexec say that its pmatch
    arg should contain nmatch elts. Use that for regexec, instead of
    __attr_access (which is incorrect).

    * New regex.h macro _Attr_access_ which is like __attr_access except
    portable to non-glibc platforms.

    * Add some DEBUG_ASSERTs to pacify gcc -fanalyzer and to catch
    recently-fixed performance bugs if they recur.

    * Add Gnulib-specific stuff to port the dynarray- and lock-using parts
    of regex code to non-glibc platforms.

    * Fix glibc bug 11053.

    * Avoid some undefined behavior when popping an empty fail stack.

    (cherry picked from commit 0b5ca7c3e551e5502f3be3b06453324fe8604e82)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.