Ubuntu
sed package

sed stops replacing when reaching a special character

Bug #447866 reported by lovinglinux on 2009-10-10

This bug affects 1 person

Affects         Status    Importance    Assigned to    Milestone
sed (Ubuntu)    New       Undecided     Unassigned

Bug Description

Binary package hint: sed

When filtering a large file (~600,000 lines) with sed, it stops replacing when it encounters a special (non-ASCII) character.

For example, when using the regular expression below to remove all characters except numbers:

sed -e 's/[^0123456789]//g'

and if the file contains the following line:

AAAAAüBBBBBB999

the output is:

ü999

instead of:

999

When using the regular expression below to remove all characters before the numbers:

sed -e 's/.*999/Range:/g'

the output is:

AAAAAü999

instead of:

999

It only happens with files containing a large number of lines.

I applied the same regular expressions to the same file with perl, and the output is correct.

ProblemType: Bug
Architecture: i386
Date: Sat Oct 10 05:49:02 2009
DistroRelease: Ubuntu 9.10
NonfreeKernelModules: nvidia
Package: sed 4.2.1-1
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-12.41-generic
SourcePackage: sed
Uname: Linux 2.6.31-12-generic i686

lovinglinux (lovinglinux) wrote on 2009-10-10:

Dependencies.txt (488 bytes, text/plain; charset="utf-8")
XsessionErrors.txt (2.3 KiB, text/plain; charset="utf-8")

Paolo Bonzini (bonzini) wrote on 2009-11-11:

If you don't know the charset of the file, you should set the LANG or LC_CTYPE variables to "C":

$ echo $'AAAA\x88BBBB' | sed -e 's/[^0123456789]//g' | od -x
0000000 0a88
0000002
$ echo $'AAAA\x88BBBB' | LANG=C sed -e 's/[^0123456789]//g' | od -x
0000000 000a
0000001

This does indeed differ from Perl:

$ echo $'AAAA\x88BBBB' | psed 's/[^0123456789]//g' | od -x
0000000 000a
0000001
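
The same workaround can be sketched against the line from the bug description. This assumes the "ü" in the file is stored as the single byte 0xFC (as it would be in ISO-8859-1), which is not valid UTF-8 on its own; the report does not confirm the file's actual encoding. The helper name digits_only is hypothetical. LC_ALL is used here because it overrides both LANG and LC_CTYPE:

```shell
#!/bin/sh
# Hypothetical helper: delete everything except ASCII digits,
# forcing the C locale so sed treats the input as raw bytes.
digits_only() {
    LC_ALL=C sed -e 's/[^0123456789]//g'
}

# \374 is the octal escape for byte 0xFC (assumed encoding of "ü").
result=$(printf 'AAAAA\374BBBBBB999\n' | digits_only)
printf '%s\n' "$result"   # prints 999
```

In the C locale every byte counts as one character, so the bracket expression matches the stray byte and removes it along with the letters.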

Paolo Bonzini (bonzini) on 2010-02-12

Changed in sed (Ubuntu):
status:	New → Invalid

lovinglinux (lovinglinux) wrote on 2010-02-14:

I don't see why it should be considered invalid, so I'm changing the status back to new.

Changed in sed (Ubuntu):
status:	Invalid → New
