unicode support for doc_ids/content buggy and/or inconsistent
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
U1DB | In Progress | High | Unassigned |
Bug Description
Right now we get the following behavior:
Python 2.7.3 (default, Apr 10 2012, 22:21:37)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import u1db
>>> x = u1db.open('foo.db', create=True)
>>> x.create_doc('{}', doc_id=u"\xab")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in position 9: ordinal not in range(128)
>>> x.create_
Document(ab, 514785fd3a66469
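The traceback is consistent with the doc ID being converted to a byte string with the ASCII codec somewhere inside the backend. The session does not show where this happens. The reported position 9 suggests the ID had already been embedded in a longer string by that point. The Python 3 snippet below is an illustration outside u1db; in Python 3, `str` plays the role of Python 2's `unicode`. It reproduces the same class of error and shows that an explicit UTF-8 encoding round-trips cleanly.

```python
# Illustration only, not u1db code: why u"\xab" fails under the ASCII codec.
doc_id = "\xab"  # U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

try:
    # Python 2's implicit unicode -> str conversion uses this same codec by default.
    doc_id.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character '\xab' in position 0: ordinal not in range(128)

utf8_bytes = doc_id.encode("utf-8")  # two bytes: 0xC2 0xAB
print(utf8_bytes == b"\xc2\xab")  # True
print(utf8_bytes.decode("utf-8") == doc_id)  # True
```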
We need to decide what we really want here. Should we refuse unicode and accept only UTF-8 (byte) strings? Should we accept only ASCII for doc_ids?
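The first option above is to refuse unicode objects and accept only byte strings that are valid UTF-8. A check at the API boundary for that policy might look like the following Python 3 sketch. `check_doc_id` is a hypothetical helper, not part of u1db. It assumes Python 3's `bytes` stands in for Python 2's `str` and `str` for Python 2's `unicode`.

```python
def check_doc_id(doc_id: bytes) -> bytes:
    """Hypothetical check for the 'UTF-8 byte strings only' policy.

    Raises TypeError for text (Python 2 unicode / Python 3 str) and
    ValueError for byte strings that are not valid UTF-8.
    """
    if not isinstance(doc_id, bytes):
        raise TypeError(
            f"doc_id must be a UTF-8 byte string, not {type(doc_id).__name__}"
        )
    try:
        doc_id.decode("utf-8")  # validate only; the bytes are returned unchanged
    except UnicodeDecodeError as exc:
        raise ValueError(f"doc_id is not valid UTF-8: {doc_id!r}") from exc
    return doc_id


print(check_doc_id(b"\xc2\xab") == b"\xc2\xab")  # True: U+00AB encoded as UTF-8
```

Under this policy the caller has to encode text to UTF-8 before calling, so the encoding decision moves to the application. The alternative is to accept text and encode it once inside the library.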
Related branches
- Lucio Torre (community): Approve
- Diff: 0 lines
- Ubuntu One hackers: Pending requested
- Diff: 0 lines
- Lucio Torre (community): Approve
- Diff: 0 lines
- Lucio Torre (community): Approve
- Diff: 0 lines
- John A Meinel (community): Approve
- Diff: 275 lines (+79/-42), 4 files modified:
  - .bzrignore (+1/-0)
  - setup.py (+1/-1)
  - src/u1db_query.c (+70/-41)
  - u1db/tests/test_backends.py (+7/-0)
Changed in u1db:
  status: New → Confirmed
  importance: Undecided → High
Changed in u1db:
  assignee: nobody → Eric Casteleijn (thisfred)
  status: Confirmed → In Progress
Changed in u1db:
  status: Fix Released → Fix Committed
  summary: "unicode support for doc_ids/content buggy and or incosistent" → "unicode support for doc_ids/content buggy and or inconsistent"
Changed in u1db:
  status: Fix Committed → In Progress
Changed in u1db:
  assignee: Eric Casteleijn (thisfred) → nobody
What's the use case for non-ASCII doc IDs?
I'm tempted to say doc IDs should be printable, non-whitespace ASCII. Anything else will get weird very quickly; even that raises some concerns.
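Read literally, "printable, non-whitespace ASCII" allows only the characters 0x21 to 0x7E (`!` through `~`). A minimal Python 3 validator under that reading is sketched below. `is_valid_doc_id` is a hypothetical name, and rejecting the empty string is an added assumption.

```python
import re

# Printable ASCII minus space: '!' (0x21) through '~' (0x7E), one or more chars.
# Tabs, newlines, DEL (0x7F) and all non-ASCII characters fall outside the class.
_DOC_ID_RE = re.compile(r"[\x21-\x7e]+")


def is_valid_doc_id(doc_id: str) -> bool:
    """Hypothetical check: True if doc_id is non-empty, printable, non-whitespace ASCII."""
    return _DOC_ID_RE.fullmatch(doc_id) is not None


print(is_valid_doc_id("doc-1"))      # True
print(is_valid_doc_id("\xab"))       # False: non-ASCII
print(is_valid_doc_id("has space"))  # False: contains whitespace
print(is_valid_doc_id(""))           # False: empty (assumed invalid)
```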