I investigated the attached images, as well as a compressed tarball (a 7 MB .tar.xz archive), all uploaded to Launchpad using poc.py with python2 and python3.
First of all, the binary sizes differ (the following uses the python3 REPL; kyo2 and kyo3 are the binary data of the images uploaded with the python2 and python3 libs, respectively):
kyo2 = open('/tmp/kyo2.jpg', 'rb').read()
kyo3 = open('/tmp/kyo3.jpg', 'rb').read()
len(kyo2)
83703
len(kyo3)
83702
The difference boils down to a different representation of CR (carriage return, \r):
kyo2.count(b'\r')
355
kyo2.count(b'\n')
334
kyo3.count(b'\r')
0
kyo3.count(b'\n')
688
→ 355+334 = 689, one byte more than the number of \n in the image uploaded using python3.
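The counts above are consistent with one hypothesis: every lone \r was turned into \n, and a single \r\n pair was collapsed into one \n, which would account for the one-byte difference. The following is only a sketch for checking that idea; the helper names newline_stats and normalize_newlines are made up for illustration, and the sample bytes are a synthetic stand-in for kyo2, not data from the actual image.

```python
def newline_stats(data):
    """Count CRLF pairs, lone CRs and lone LFs in a bytes object."""
    crlf = data.count(b'\r\n')
    return {
        'crlf': crlf,
        'lone_cr': data.count(b'\r') - crlf,
        'lone_lf': data.count(b'\n') - crlf,
    }


def normalize_newlines(data):
    """Hypothesised conversion (an assumption, not confirmed):
    CRLF -> LF first, then every remaining CR -> LF."""
    return data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


# Synthetic stand-in for kyo2: one CRLF pair, one lone CR, one lone LF.
sample = b'\xfc\xc2\r\n\xc3\xd1\rB\n\x92'

stats = newline_stats(sample)
converted = normalize_newlines(sample)

print(stats)
# Each collapsed CRLF pair shortens the data by exactly one byte.
print(len(sample) - len(converted))
```

If the hypothesis holds for the real files, normalize_newlines(kyo2) == kyo3 should be True and newline_stats(kyo2)['crlf'] should be 1.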
The "missing byte" seems to be at byte 41177:
for i in range(41177-5, 41177+10):
    print('{}\t{} \t{}'.format(i, kyo2[i:i+1], kyo3[i:i+1]))
41172 b'\xfc' b'\xfc'
41173 b'\xc2' b'\xc2'
41174 b'\xb5' b'\xb5'
41175 b'\x81' b'\x81'
41176 b'\r' b'\n'
41177 b'\n' b'\xc3' ← from this byte, everything is shifted by one byte
41178 b'\xc3' b'\xd1'
41179 b'\xd1' b'B'
41180 b'B' b'\x92'
41181 b'\x92' b'r'
41182 b'r' b','
41183 b',' b'<'
41184 b'<' b'\xac'
41185 b'\xac' b'n'
41186 b'n' b'N'
(...)
The same issue happens with the .tar.xz binary, except that the data is shifted in more than one place (probably because it's much bigger than the 83 kB image file).
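To locate every shift point in a larger file such as the tarball, the two byte strings can be walked in parallel. This is a rough sketch, assuming the only differences are CR -> LF and CRLF -> LF, as observed in the image; find_dropped_bytes is a hypothetical helper, and the sample data is synthetic.

```python
def find_dropped_bytes(original, converted):
    """Return the offsets in `original` of each CRLF pair that appears as a
    single LF in `converted`.

    Assumes the only differences are CR -> LF and CRLF -> LF, and that every
    CRLF pair is collapsed; raises ValueError on any other mismatch.
    """
    dropped = []
    i = j = 0
    while i < len(original) and j < len(converted):
        if original[i:i + 2] == b'\r\n' and converted[j:j + 1] == b'\n':
            # CRLF collapsed to LF: one byte is lost here.
            dropped.append(i)
            i += 2
            j += 1
        elif original[i] == converted[j] or (
                original[i:i + 1] == b'\r' and converted[j:j + 1] == b'\n'):
            # Identical byte, or a lone CR replaced by LF (no shift).
            i += 1
            j += 1
        else:
            raise ValueError('unexpected difference at offsets %d/%d' % (i, j))
    if i != len(original) or j != len(converted):
        raise ValueError('trailing data differs')
    return dropped


# Synthetic example: two CRLF pairs and one lone CR.
sample_original = b'ab\r\ncd\re\r\nf'
sample_converted = b'ab\ncd\ne\nf'
offsets = find_dropped_bytes(sample_original, sample_converted)
print(offsets)
```

Each reported offset is the position of the \r of a collapsed pair, so for the image one would expect find_dropped_bytes(kyo2, kyo3) to give [41176], with the shift visible from byte 41177 onwards.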