UnicodeDecodeError in posixpath for non-ascii filename
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
CVS to Bazaar importer | Triaged | Medium | Unassigned |
Mukti Bangla Open Type font | New | Undecided | Unassigned |
Bug Description
cvsps-import fails for me, probably due to a non-ascii character in some file name.
$ bzr cvsps-import --encoding=latin1 $PWD/CVSROOT ModuleName .
Creating cvsps dump file: ./staging/
Read 5718 patchsets (string cache hits: 0, total: 14181)
Failed while processing: Patchset(871, HEAD, materlik, 2002/02/11 18:01:04)
Processed 870 patches (870 new, 0 existing) on 14 branches (6 tags) in 2264.3s (0.38 patch/s)
bzr: ERROR: exceptions.
Traceback (most recent call last):
File "/usr/lib/
return the_callable(*args, **kwargs)
File "/usr/lib/
ret = run(*run_argv)
File "/usr/lib/
return self.run(
File "/home/
importer.
File "/home/
self.
File "/home/
rev_id, action = cvs_to_
File "/home/
revision_id = self._extract_
File "/home/
txt, executable = self._cvs_
File "/home/
rcs_file = self._get_
File "/home/
filename + ',v')
File "/usr/lib/
path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 20: ordinal not in range(128)
bzr 2.0.0 on python 2.6.2 (Linux-
arguments: ['/usr/bin/bzr', 'cvsps-import', '--encoding=
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'de_DE.utf8'
plugins:
bzrtools /usr/lib/
cvsps_import /home/mvg/
launchpad /usr/lib/
netrc_
qbzr /usr/lib/
svn /home/mvg/
*** Bazaar has encountered an internal error. This probably indicates a
bug in Bazaar. You can help us fix it by filing a bug report at
https:/
including this traceback and a description of the problem.
Even though I configured the CVS encoding as latin1 and my filesystem uses UTF-8, Python still tries to interpret the string as ASCII. That seems a clear indication that this is not a configuration issue.
Related branches
- Jelmer Vernooij (community): Approve (code)
- Diff: 72 lines, 1 file modified: cvsps/importer.py (+11/-6)
Changed in bzr-cvsps-import:
status: New → Triaged
importance: Undecided → Medium
I added some debug output. The problem lies in these lines in _get_rcs_filename:
rcs_file = osutils.pathjoin(self._cvs_root, self._cvs_module,
filename + ',v')
The first two arguments are unicode strings, while the third one is a byte string, so Python 2 implicitly decodes the byte string as ASCII when joining them, and that fails on byte 0xf6. As far as I know, CVS doesn't particularly care about encodings, and there might well be legacy files in some Attic whose names are illegal in the current filesystem encoding. Paths inside the repository should therefore be treated as binary, and the code should also allow for the possibility that a clean conversion using the current filesystem character set is impossible. I'm thinking about a patch, but have no good solution yet.
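To illustrate the idea of treating repository paths as binary, here is a minimal sketch. It is not the fix that was merged. The helper name join_rcs_path, the fs_encoding parameter, and the sample root /srv/cvs and file name are all made up for illustration. The sketch encodes the unicode components to bytes before joining, so the file name is never decoded at all.

```python
import posixpath


def join_rcs_path(cvs_root, cvs_module, filename, fs_encoding="utf-8"):
    """Hypothetical helper: build the ,v path purely from bytes."""
    def to_bytes(part):
        # Byte strings pass through untouched; unicode is encoded explicitly.
        if isinstance(part, bytes):
            return part
        return part.encode(fs_encoding)

    return posixpath.join(to_bytes(cvs_root),
                          to_bytes(cvs_module),
                          to_bytes(filename) + b",v")


# A latin1-encoded file name containing byte 0xf6 (o with umlaut).
name = u"sch\u00f6n.txt".encode("latin1")

# Decoding it as ASCII fails, which is what the implicit conversion did.
try:
    name.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Joining bytes with bytes never triggers a decode.
rcs = join_rcs_path(u"/srv/cvs", u"ModuleName", name)
```

The resulting path is a byte string, so any conversion for display would have to happen separately and tolerate names that cannot be decoded cleanly.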