Comment 48 for bug 623438

Jakub Wilk (jwilk) wrote : Re: [Bug 623438] Re: Font size not correct in merged sandvich PDF

>I find the specification somewhat difficult to interpret at times but
>it is my understanding that character bbox info goes within the
>ocr_line tag element. Whether it goes before or after the textual
>elements is irrelevant. E.g.
> <span class='ocr_line' id='line_18' title="bbox 363 1253 581 1289">
> <b>BYGGNADER </b>
> <span class='ocr_cinfo' title="x_bboxes 363 1253 382 1279 383 1254 407 1281 409 1255 431 1283 434 1256 458 1284 460 1258 485 1285 486 1260 511 1286 514 1261 538 1287 541 1260 560 1289 561 1261 581 1289 -1 -1 -1 -1 ">
> </span>

Apart from not being valid HTML, this doesn't make sense. And this was
already pointed out a year ago(!):
https://lists.launchpad.net/cuneiform/msg00450.html

>and
> <span class='ocr_line' id='line_18' title="bbox 363 1253 581 1289">
> <span class='ocr_cinfo' title="x_bboxes 363 1253 382 1279 383 1254 407 1281 409 1255 431 1283 434 1256 458 1284 460 1258 485 1285 486 1260 511 1286 514 1261 538 1287 541 1260 560 1289 561 1261 581 1289 -1 -1 -1 -1 ">
> <b>BYGGNADER </b>
> </span>
>are equally correct, it is the association to the correct line which matters.

If you don't close the <span>, it's not even valid HTML...
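
To make the unclosed <span> concrete, here is a minimal sketch using only
Python's standard html.parser module. The class name UnclosedTagFinder and
the helper unclosed() are hypothetical names chosen for illustration, and
the x_bboxes list is abbreviated from the quoted example. This is a rough
balance check under those assumptions, not a full HTML validator.

```python
from html.parser import HTMLParser


class UnclosedTagFinder(HTMLParser):
    """Track open elements on a stack; whatever remains was never closed."""

    # Void elements never take an end tag, so they are not tracked.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class attribute) for each still-open element

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Close the innermost open element with the same tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i]
                return


def unclosed(fragment):
    """Return the (tag, class) pairs that are still open at end of input."""
    finder = UnclosedTagFinder()
    finder.feed(fragment)
    finder.close()
    return finder.stack


# First quoted variant, with the x_bboxes list abbreviated.
fragment = """
<span class='ocr_line' id='line_18' title="bbox 363 1253 581 1289">
<b>BYGGNADER </b>
<span class='ocr_cinfo' title="x_bboxes 363 1253 382 1279 -1 -1 -1 -1 ">
</span>
"""

print(unclosed(fragment))  # [('span', 'ocr_line')]
```

The single </span> closes the inner ocr_cinfo element, so the outer
ocr_line element is the one left open. The second quoted variant gives the
same result, since it also has two opening <span> tags and one closing tag.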

--
Jakub Wilk