xxdiff trashes output file when comparing UTF-8 files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
xxdiff (Ubuntu) | New | Undecided | Unassigned |
Bug Description
xxdiff 5.1 on Ubuntu 23.04 converts all input files to UTF-8 for comparison. Then, when saving
the results, it converts the data to Latin-1 (and replaces characters not representable in Latin-1 with '?').
Comparing two UTF-8 files therefore results in a trashed output file.
Apparently Qt is now smart enough to convert any ASCII, Latin-1, or UTF-8 encoded file to UTF-8 for
comparison, but it blindly writes out Latin-1.
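The effect can be reproduced outside xxdiff. The following is a minimal Python sketch, assuming the save step behaves like a Latin-1 encode that substitutes '?' for unrepresentable characters. That assumption is inferred from the observed output, not taken from xxdiff's source.

```python
# Assumption: the save step acts like a Latin-1 encode with '?' substitution.
# This mimics the observed output; it is not xxdiff's actual code.
line = "The city is not Quëbec, nor Quēbec"

utf8_bytes = line.encode("utf-8")    # bytes as stored in the UTF-8 input file
text = utf8_bytes.decode("utf-8")    # decoded to Unicode for comparison
saved = text.encode("latin-1", errors="replace")  # lossy write on save

print(saved)  # b'The city is not Qu\xebbec, nor Qu?bec'
```

The 'ë' survives as the single byte 0xEB, which is not valid UTF-8 on its own, and the 'ē' has no Latin-1 code point, so it is lost entirely.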
Test Case
file1
Is the city Quebec spelt Québec, Quëbec, or Quēbec
it's Québec in both English and French, the Region is Quebec
it's Québec, Quebec, Canada
file2
Is the city Quebec spelt Québec, Quëbec, or Quēbec
it's Québec in both English and French, the Region is Quebec
it's Québec, Quebec, Canada
The city is not Quëbec, nor Quēbec
Run xxdiff file1 file2.
Select the last line of file2 to add it to the output.
Save the results as new, and you get:
new (if you cat new)
Is the city Quebec spelt Qu�bec, Qu�bec, or Qu?bec
it's Qu�bec in both English and French, the Region is Quebec
it's Qu�bec, Quebec, Canada
The city is not Qu�bec, nor Qu?bec
new (if you more new)
Is the city Quebec spelt Qu<E9>bec, Qu<EB>bec, or Qu?bec
it's Qu<E9>bec in both English and French, the Region is Quebec
it's Qu<E9>bec, Quebec, Canada
The city is not Qu<EB>bec, nor Qu?bec
new has been converted to Latin-1 (file2 was UTF-8),
and the last Quēbec on the affected lines has a '?' instead of the e with a bar (ē).
So basically any saved UTF-8 file is likely trashed.
I rate this as a completely unacceptable result,
especially when it occurs somewhere in a large file,
where you don't notice it immediately.
Since xxdiff is comparing the two files in UTF-8,
it should write the output in UTF-8 as well,
especially since my LANG=en_US.UTF-8.
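Until this is fixed, a saved file can be checked for this kind of damage before it is trusted. The following is a rough sketch; the helper name first_invalid_offset is hypothetical and not part of xxdiff. It only detects stray Latin-1 bytes. Characters already replaced with '?' are plain ASCII and cannot be detected this way.

```python
from typing import Optional


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for checking a saved file; not part of xxdiff.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset of the first undecodable byte
    return None


good = "it's Québec, Quebec, Canada".encode("utf-8")
bad = b"it's Qu\xe9bec, Quebec, Canada"  # Latin-1 byte, as in the saved file

print(first_invalid_offset(good))
print(first_invalid_offset(bad))
```

To check a real file, read it in binary mode, for example open("new", "rb").read(), and pass the bytes to the helper.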