Comment #13 : Bug #472161 : Bugs : Bazaar

Comment 13 for bug 472161

Revision history for this message
Vincent Ladeuil (vila) wrote : Re: [Bug 472161] Re: Can't upload non-ascii URLs

>>>>> "Kamil" == Kamil Szot writes:

    Kamil> By examining the changes you made
    Kamil> http://bazaar.launchpad.net/~vila/bzr/472161-ftp-utf8/revision/4936 I
    Kamil> doubt this goes anywhere near fixing my trouble.

I realized the approach is different, that's why I asked you to
test it :)

Sorry for not having commented on your changes more clearly, see
my comments below.

You started on the assumption that the client and the server will
use the same encoding which is less likely to work than using
UTF8 (see RFC2640).

    Kamil> FTP servers are usually encoding agnostic.

Not really, they have to obey whatever file system is used
underneath.

    Kamil> They treat file names as built of 8-bit characters
    Kamil> without any specific encoding. They don't support any
    Kamil> encoding explicitly,

Many file systems will refuse to create files with arbitrary
8-bits characters. Using UTF8 as an intermediate representation
guarantees that clients and servers talk about the same paths
(again, have a look at RFC2640).

And some ftp servers will just not support arbitrary 8-bits paths
(medusa for one).

    Kamil> Please take a look at my changes. They should fix the
    Kamil> problem even if your system uses some other encoding
    Kamil> for filenames (for example iso-8859-2).

Your changes only impact the client side, the problem is on the
server side.

RFC2640 says that the clients should use UTF8 and that servers
should handle UTF8 and *then* do what is needed on their side.

If the clients uses iso-8859-1, mac-roman or utf8 and the server
uses iso-8859-2, then using your changes can break but using utf8
should work.

Now, it depends on whether the server is handling utf8.

If it doesn't, then if (but only if) the server and all the
clients use the same fs encoding, your changes will work.

If it does, your changes will break but using utf8 will work.

So the proposed changes may not be the final answer but it's more
likely to work under more configurations.

And that's why I wanted to know if it worked for *you* !

    Kamil> The root of my problem seems to come from the fact
    Kamil> that python os.path functions appear not to be aware
    Kamil> of the fact that OS might use different encodings for
    Kamil> file names. They just return string and if it is
    Kamil> implicitly converted to unicode at some point then the
    Kamil> 'ascii' encoding is used.

True and you re-discover what is done in bzrlib.osutils (_fsenc,
get_terminal_encoding and get_user_encoding, but mainly the first
one).

The ftp transport receives paths that have been processed by
higher layers, it should not worry about what the file system
encoding is, the higher layers did.

    Kamil> I wonder how the core of bzr deals with this.

By using unicode internally and decoding the paths respecting the
file system encoding when needed (outside the scope or both your
changes and mine).

    Kamil> Some people get around such bugs by setting default
    Kamil> encoding for unicode<->string conversions by calling
    Kamil> os.setdefaultencoding() like complained about here
    Kamil> http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-
    Kamil> is-evil/

    Kamil> I have not made merge proposal yet because I wanted to
    Kamil> hear your opinion first and test if it breaks
    Kamil> testsuite for myself.

But merge proposals are easier to deal with even at that point as
they present a diff of your changes without the need to drill
down into each revision or grab a local copy of your branch.

They also allows us to discuss there instead of here :)

<snip/>

    Kamil> I'm using bazaar with python 2.6.4, I'll try to
    Kamil> install medusa or pyftpdlib, run the tests the way you
    Kamil> described and report my findings.

If you use 2.6.4 then install pyftpdlib. You can have a look at
medusa but the bugs there are really hard to understand and bzr
will avoid trying to use it (so you'll have to revert the check
in bzrlib/tests/ftp_server/__init__.py first).