I ran into a bug that causes lxml to truncate the output when using "tostring" with encoding set to "utf8", while it works correctly when encoding is set to "utf-8". Running the attached example file produces the following output for me:
Bad (encoding="utf8"):
b'<record><datafield tag="520" ind1=" " ind2=" "><subfield code="9">APS</subfield><subfield code="a">The first measurement of the dependence of <math display="inline"><mrow><mi>\xce\xb3</mi><mi>\xce\xb3</mi><mo stretchy="false">\xe2\x86\x92</mo><msup><mrow><mi>\xce\xbc</mi></mrow><mrow><mo>+</mo></mrow></msup><msup><mrow><mi>\xce\xbc</mi></mrow><mrow><mo>\xe2\x88\x92</mo></mrow></msup></mrow></math> production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at <math display="inline"><mrow><msqrt><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>N</mi><mi>N</mi></mrow></msub></mrow></msqrt><mo>=</mo><mn>5.02</mn><mtext>\xe2\x80\x89</mtext><mtext>\xe2\x80\x89</mtext><mi>TeV</mi></mrow></math>, with an integrated luminosity of approximately <math display="inline"><mrow><mn>1.5</mn><mtext>\xe2\x80\x89</mtext><mtext>\xe2\x80\x89</mtext><msup><mrow><mi>nb</mi></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup></mrow></math>, are collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region <math display="inline"><mrow><mn>8</mn><mo>&lt;</mo><msub><mrow><mi>m</mi></mrow><mrow><mi>\xce\xbc</mi><mi>\xce\xbc</mi></mrow></msub><mo>&lt;</mo><mn>60</mn><mtext>\xe2\x80\x89</mtext><mtext>\xe2\x80\x89</mtext><mi>GeV</mi></mrow></math> are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range <math display="inline"><mrow><mrow><mo stretchy="false">|</mo><mi>\xce\xb7</mi><mo stretchy="false">|</mo></mrow><mo>&gt;</mo><mn>8.3</mn></mrow></math>. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. 
This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="9">arXiv</subfield><subfield code="a">The first measurement of the dependence of $\\gamma\\gamma$$\\to$$\\mu^{+}\\mu^{-}$ production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at $\\sqrt{s_\\mathrm{NN}} =$ 5.02 TeV, with an integrated luminosity of approximately 1.5 nb$^{-1}$, were collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region 8 $\\lt$$m_{\\mu\\mu}$$\\lt$ 60 GeV are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range $|\\eta|$$\\gt$ 8.3. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. This observation provides a data-driven demonstrat</subfield></datafield></record>'
Good (encoding="utf-8"):
b'<record><datafield tag="520" ind1=" " ind2=" "><subfield code="9">APS</subfield><subfield code="a">The first measurement of the dependence of <math display="inline"><mrow><mi>\xce\xb3</mi><mi>\xce\xb3</mi><mo stretchy="false">\xe2\x86\x92</mo><msup><mrow><mi>\xce\xbc</mi></mrow><mrow><mo>+</mo></mrow></msup><msup><mrow><mi>\xce\xbc</mi></mrow><mrow><mo>\xe2\x88\x92</mo></mrow></msup></mrow></math> production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at <math display="inline"><mrow><msqrt><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>N</mi><mi>N</mi></mrow></msub></mrow></msqrt><mo>=</mo><mn>5.02</mn><mtext>\xe2\x80\x89</mtext><mtext>\xe2\x80\x89</mtext><mi>TeV</mi></mrow></math>, with an integrated luminosity of approximately <math display="inline"><mrow><mn>1.5</mn><mtext>\xe2\x80\x89</mtext><mtext>\xe2\x80\x89</mtext><msup><mrow><mi>nb</mi></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup></mrow></math>, are collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region <math display="inline"><mrow><mn>8</mn><mo>&lt;</mo><msub><mrow><mi>m</mi></mrow><mrow><mi>\xce\xbc</mi><mi>\xce\xbc</mi></mrow></msub><mo>&lt;</mo><mn>60</mn><mtext>\xe2\x80\x89</mtext><mtext>\xe2\x80\x89</mtext><mi>GeV</mi></mrow></math> are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range <math display="inline"><mrow><mrow><mo stretchy="false">|</mo><mi>\xce\xb7</mi><mo stretchy="false">|</mo></mrow><mo>&gt;</mo><mn>8.3</mn></mrow></math>. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. 
This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="9">arXiv</subfield><subfield code="a">The first measurement of the dependence of $\\gamma\\gamma$$\\to$$\\mu^{+}\\mu^{-}$ production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at $\\sqrt{s_\\mathrm{NN}} =$ 5.02 TeV, with an integrated luminosity of approximately 1.5 nb$^{-1}$, were collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region 8 $\\lt$$m_{\\mu\\mu}$$\\lt$ 60 GeV are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range $|\\eta|$$\\gt$ 8.3. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.</subfield></datafield></record>'
As you can see, the output of the last subfield is truncated in the first case.
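For anyone who wants to check their own installation, here is a rough sketch of how the two outputs can be compared. It is not the attached example file: the record built below is a hypothetical stand-in (a long text with non-ASCII characters), and the helper names common_prefix_length and describe_truncation are made up for illustration. It may or may not trigger the truncation on an affected system, so swapping in the real record is the safer test.

```python
def common_prefix_length(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            return i
    return limit


def describe_truncation(bad: bytes, good: bytes) -> str:
    """Summarise where two serialisations start to differ."""
    if bad == good:
        return "outputs are identical"
    shared = common_prefix_length(bad, good)
    return (
        f"outputs diverge at byte {shared}: "
        f"first has {len(bad)} bytes, second has {len(good)} bytes"
    )


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed, skipping the comparison")
    else:
        # Hypothetical stand-in for the attached record, not the real data.
        root = etree.Element("record")
        subfield = etree.SubElement(root, "subfield", code="a")
        subfield.text = "\u03b3\u03b3 \u2192 \u03bc+\u03bc\u2212 production. " * 200

        first = etree.tostring(root, encoding="utf8")
        second = etree.tostring(root, encoding="utf-8")
        print(describe_truncation(first, second))
```

On a working installation both calls should return the same bytes, so any reported divergence points at the serialiser rather than the input.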
Required information:
Python : sys.version_info(major=3, minor=9, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
Further testing shows that this is specific to Debian's lxml package: the binary wheel works correctly on the same system, so I've reported the bug against the Debian package.
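To compare the Debian package with the binary wheel, it helps to print the version information in both environments. The sketch below collects the same fields as listed under "Required information"; the function name collect_versions is my own, and it assumes you run it once with the system Python and once inside a virtualenv where the wheel is installed.

```python
import sys


def collect_versions() -> dict:
    """Gather Python, lxml, libxml2 and libxslt versions, if lxml is available."""
    info = {"Python": sys.version_info}
    try:
        from lxml import etree
    except ImportError:
        info["lxml.etree"] = "not installed"
        return info
    info["lxml.etree"] = etree.LXML_VERSION
    info["libxml used"] = etree.LIBXML_VERSION
    info["libxml compiled"] = etree.LIBXML_COMPILED_VERSION
    info["libxslt used"] = etree.LIBXSLT_VERSION
    info["libxslt compiled"] = etree.LIBXSLT_COMPILED_VERSION
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name:<20}: {value}")
```

If the "libxml used" line differs between the two runs, that is a hint that the underlying library build, rather than lxml itself, is involved.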