JSON loader fails with 'Unpaired high surrogate'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Meliae | New | Undecided | Unassigned |
Bug Description
When loading the attached file, I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/
manager = _load(source, using_json, show_prog, input_size)
File "/usr/lib/
factory=
File "/usr/lib/
yield decoder(factory, line, temp_cache=
File "/usr/lib/
val = simplejson.
File "/usr/lib/
return _default_
File "/usr/lib/
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/
obj, end = self.scan_once(s, idx)
simplejson.
These 2 lines are actual lines from my much bigger heap dump. The string is clearly bogus unicode data, so I'm not sure if this is a problem with the writer when serializing unicode strings, or a problem with the loader in not ignoring the escapes.
The following egregious hack gets me past, but a proper fix would be welcome:
def _from_json(cls, line, temp_cache=None):
    try:
        val = simplejson.
    except:
        print "Could not parse %s" % line
        val = simplejson.
    # etc ....
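A narrower alternative to the bare except might be to scrub unpaired surrogate escapes out of the line before it reaches the parser. The following is only a sketch, written for Python 3 with the standard library json module rather than simplejson; the helper name scrub_lone_surrogates and the "value" key are made up for illustration and are not part of Meliae. It replaces each unpaired \uD800-\uDFFF escape with \ufffd and leaves well-formed pairs and escaped backslashes alone.

```python
import json
import re

# Alternatives are tried in order at each position:
#   1. an escaped backslash (so the text after it is never read as an escape)
#   2. a well-formed high+low surrogate pair
#   3. any remaining surrogate escape, which must therefore be unpaired
_SURROGATE_RE = re.compile(
    r'\\\\'
    r'|\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2}'
    r'|\\u[dD][89a-fA-F][0-9a-fA-F]{2}'
)


def scrub_lone_surrogates(line):
    """Hypothetical helper: replace unpaired surrogate escapes with \\ufffd."""
    def _fix(match):
        text = match.group(0)
        # Only a single 6-character \uXXXX match is an unpaired surrogate;
        # escaped backslashes (2 chars) and pairs (12 chars) pass through.
        if len(text) != 6:
            return text
        return '\\ufffd'
    return _SURROGATE_RE.sub(_fix, line)


# A made-up line containing the same lone high surrogate as in this report.
bad_line = '{"value": "\\ud948\\u00a3llo"}'
clean_line = scrub_lone_surrogates(bad_line)
parsed = json.loads(clean_line)
```

This loses the original bytes of the string, so it only makes sense if the dumped string values are not needed exactly.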
Seems this is the root cause: http://bugs.python.org/issue11489
e.g. this script fails:
#!/usr/bin/python
import simplejson
import binascii
origstr = binascii.unhexlify('EDA588C2A36C6C6F')
z = simplejson.dumps(origstr)
print z
y = simplejson.loads(z)
print y
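To see why those bytes produce the error, it may help to decode them by hand. The sketch below is Python 3 and uses only the standard library (the script above is Python 2 with simplejson, so behaviour there can differ); the helper name unpaired_surrogates is made up for illustration. The first three bytes, ED A5 88, are the UTF-8-style encoding of U+D948, a high surrogate with no low surrogate after it.

```python
raw = bytes.fromhex('EDA588C2A36C6C6F')

# A strict UTF-8 decode rejects encoded surrogates outright.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'surrogatepass' lets the surrogate through so it can be inspected.
text = raw.decode('utf-8', 'surrogatepass')


def unpaired_surrogates(s):
    """Hypothetical helper: return (index, code point) for each lone surrogate."""
    hits = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            nxt = ord(s[i + 1]) if i + 1 < len(s) else 0
            if not 0xDC00 <= nxt <= 0xDFFF:
                hits.append((i, cp))
        elif 0xDC00 <= cp <= 0xDFFF:
            # A low surrogate must be preceded by a high surrogate.
            prev = ord(s[i - 1]) if i > 0 else 0
            if not 0xD800 <= prev <= 0xDBFF:
                hits.append((i, cp))
    return hits


lone = unpaired_surrogates(text)
```

Here lone comes out as a single entry at index 0 with code point 0xD948, followed in text by U+00A3 and the ASCII letters "llo".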