I realized the approach is different, that's why I asked you to
test it :)
Sorry for not having commented on your changes more clearly, see
my comments below.
You started on the assumption that the client and the server will
use the same encoding which is less likely to work than using
UTF8 (see RFC2640).
Kamil> FTP servers are usually encoding agnostic.
Not really, they have to obey whatever file system is used
underneath.
Kamil> They treat file names as built of 8-bit characters
Kamil> without any specific encoding. They don't support any
Kamil> encoding explicitly,
Many file systems will refuse to create files with arbitrary
8-bits characters. Using UTF8 as an intermediate representation
guarantees that clients and servers talk about the same paths
(again, have a look at RFC2640).
And some ftp servers will just not support arbitrary 8-bits paths
(medusa for one).
Kamil> Please take a look at my changes. They should fix the
Kamil> problem even if your system uses some other encoding
Kamil> for filenames (for example iso-8859-2).
Your changes only impact the client side, the problem is on the
server side.
RFC2640 says that the clients should use UTF8 and that servers
should handle UTF8 and *then* do what is needed on their side.
If the clients uses iso-8859-1, mac-roman or utf8 and the server
uses iso-8859-2, then using your changes can break but using utf8
should work.
Now, it depends on whether the server is handling utf8.
If it doesn't, then if (but only if) the server and all the
clients use the same fs encoding, your changes will work.
If it does, your changes will break but using utf8 will work.
So the proposed changes may not be the final answer but it's more
likely to work under more configurations.
And that's why I wanted to know if it worked for *you* !
Kamil> The root of my problem seems to come from the fact
Kamil> that python os.path functions appear not to be aware
Kamil> of the fact that OS might use different encodings for
Kamil> file names. They just return string and if it is
Kamil> implicitly converted to unicode at some point then the
Kamil> 'ascii' encoding is used.
True and you re-discover what is done in bzrlib.osutils (_fsenc,
get_terminal_encoding and get_user_encoding, but mainly the first
one).
The ftp transport receives paths that have been processed by
higher layers, it should not worry about what the file system
encoding is, the higher layers did.
Kamil> I wonder how the core of bzr deals with this.
By using unicode internally and decoding the paths respecting the
file system encoding when needed (outside the scope or both your
changes and mine).
Kamil> Some people get around such bugs by setting default
Kamil> encoding for unicode<->string conversions by calling
Kamil> os.setdefaultencoding() like complained about here
Kamil> http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-
Kamil> is-evil/
Kamil> I have not made merge proposal yet because I wanted to
Kamil> hear your opinion first and test if it breaks
Kamil> testsuite for myself.
But merge proposals are easier to deal with even at that point as
they present a diff of your changes without the need to drill
down into each revision or grab a local copy of your branch.
They also allows us to discuss there instead of here :)
<snip/>
Kamil> I'm using bazaar with python 2.6.4, I'll try to
Kamil> install medusa or pyftpdlib, run the tests the way you
Kamil> described and report my findings.
If you use 2.6.4 then install pyftpdlib. You can have a look at
medusa but the bugs there are really hard to understand and bzr
will avoid trying to use it (so you'll have to revert the check
in bzrlib/tests/ftp_server/__init__.py first).
>>>>> "Kamil" == Kamil Szot writes:
Kamil> By examining the changes you madebazaar.launchpad.net/~vila/bzr/472161-ftp-utf8/revision/4936 I
Kamil> http://
Kamil> doubt this goes anywhere near fixing my trouble.
I realized the approach is different, that's why I asked you to
test it :)
Sorry for not having commented on your changes more clearly, see
my comments below.
You started on the assumption that the client and the server will
use the same encoding which is less likely to work than using
UTF8 (see RFC2640).
Kamil> FTP servers are usually encoding agnostic.
Not really, they have to obey whatever file system is used
underneath.
Kamil> They treat file names as built of 8-bit characters
Kamil> without any specific encoding. They don't support any
Kamil> encoding explicitly,
Many file systems will refuse to create files with arbitrary
8-bits characters. Using UTF8 as an intermediate representation
guarantees that clients and servers talk about the same paths
(again, have a look at RFC2640).
And some ftp servers will just not support arbitrary 8-bits paths
(medusa for one).
Kamil> Please take a look at my changes. They should fix the
Kamil> problem even if your system uses some other encoding
Kamil> for filenames (for example iso-8859-2).
Your changes only impact the client side, the problem is on the
server side.
RFC2640 says that the clients should use UTF8 and that servers
should handle UTF8 and *then* do what is needed on their side.
If the clients uses iso-8859-1, mac-roman or utf8 and the server
uses iso-8859-2, then using your changes can break but using utf8
should work.
Now, it depends on whether the server is handling utf8.
If it doesn't, then if (but only if) the server and all the
clients use the same fs encoding, your changes will work.
If it does, your changes will break but using utf8 will work.
So the proposed changes may not be the final answer but it's more
likely to work under more configurations.
And that's why I wanted to know if it worked for *you* !
Kamil> The root of my problem seems to come from the fact
Kamil> that python os.path functions appear not to be aware
Kamil> of the fact that OS might use different encodings for
Kamil> file names. They just return string and if it is
Kamil> implicitly converted to unicode at some point then the
Kamil> 'ascii' encoding is used.
True and you re-discover what is done in bzrlib.osutils (_fsenc,encoding and get_user_encoding, but mainly the first
get_terminal_
one).
The ftp transport receives paths that have been processed by
higher layers, it should not worry about what the file system
encoding is, the higher layers did.
Kamil> I wonder how the core of bzr deals with this.
By using unicode internally and decoding the paths respecting the
file system encoding when needed (outside the scope or both your
changes and mine).
Kamil> Some people get around such bugs by setting defaultcoding() like complained about heretarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-
Kamil> encoding for unicode<->string conversions by calling
Kamil> os.setdefaulten
Kamil> http://
Kamil> is-evil/
Kamil> I have not made merge proposal yet because I wanted to
Kamil> hear your opinion first and test if it breaks
Kamil> testsuite for myself.
But merge proposals are easier to deal with even at that point as
they present a diff of your changes without the need to drill
down into each revision or grab a local copy of your branch.
They also allows us to discuss there instead of here :)
<snip/>
Kamil> I'm using bazaar with python 2.6.4, I'll try to
Kamil> install medusa or pyftpdlib, run the tests the way you
Kamil> described and report my findings.
If you use 2.6.4 then install pyftpdlib. You can have a look attests/ftp_server/__init__.py first).
medusa but the bugs there are really hard to understand and bzr
will avoid trying to use it (so you'll have to revert the check
in bzrlib/