sed stops replacing when reaching a special character
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
sed (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Binary package hint: sed
When filtering a large file (~600,000 lines) with sed, it stops replacing when it encounters a special character.
For example, when using the regular expression below to remove all characters except numbers:
sed -e 's/[^0123456789 ]//g'
and if the file contains the following line:
AAAAAüBBBBBB999
the output is:
ü999
instead of:
999
When using the regular expression below to remove all characters before the numbers:
sed -e 's/.*999/Range:/g'
the output is:
AAAAAü999
instead of:
999
It only happens with files containing a large number of lines.
I have applied the same regular expression filtering to the same file with perl and the output is perfect.
ProblemType: Bug
Architecture: i386
Date: Sat Oct 10 05:49:02 2009
DistroRelease: Ubuntu 9.10
NonfreeKernelMo
Package: sed 4.2.1-1
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcVersionSign
SourcePackage: sed
Uname: Linux 2.6.31-12-generic i686
Changed in sed (Ubuntu):
status: New → Invalid
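If you don't know the charset of the file, you should set the LANG or LC_CTYPE variable to "C", so that sed treats every byte as a character instead of refusing to match bytes that are not valid in UTF-8: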
$ echo $'AAAA\x88BBBB' | sed -e 's/[^0123456789 ]//g' | od -x
0000000 0a88
0000002
$ echo $'AAAA\x88BBBB' | LANG=C sed -e 's/[^0123456789 ]//g' | od -x
0000000 000a
0000001
This does indeed differ from Perl:
$ echo $'AAAA\x88BBBB' | psed 's/[^0123456789 ]//g' | od -x
0000000 000a
0000001
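Applied to the line from the report, the workaround might look like the sketch below. It assumes the ü in the file is stored as the single Latin-1 byte 0xFC (the report does not say which encoding the file actually uses), and the helper function name is made up for illustration. It sets LC_ALL rather than LANG, because LC_ALL overrides both LANG and LC_CTYPE if it happens to be set in the environment.

```shell
# Assumption: the "ü" is the single byte 0xFC (octal 374), as in ISO-8859-1.
# On its own, that byte is not a valid UTF-8 sequence.
line_with_latin1_byte() {
    printf 'AAAAA\374BBBBBB999\n'
}

# LC_ALL=C makes sed treat every byte as one character, so the
# negated bracket expression also matches the 0xFC byte.
result=$(line_with_latin1_byte | LC_ALL=C sed -e 's/[^0123456789]//g')
printf '%s\n' "$result"   # prints 999
```

If the file's encoding is known, converting it to UTF-8 first (for example with iconv) is an alternative that keeps the UTF-8 locale.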