PDF search doesn't find strings that run over line breaks

Bug #376073 reported by markling
This bug affects 3 people
Affects            Status    Importance   Assigned to   Milestone
Poppler            Unknown   Wishlist
poppler (Ubuntu)   Triaged   Low          Unassigned

Bug Description

Binary package hint: evince

E.g. in the following pdf:

http://www.ico.gov.uk/upload/documents/library/data_protection/detailed_specialist_guides/review_of_eu_dp_directive.pdf

On page 33, paragraph 6:

The string "13 years" is split across the first and second lines of this paragraph.

Searching for "13 years" finds no results.

Searching for "13" alone does find the occurrence of this string on this page.

o/s:- Jaunty
App:- Evince 2.26.1 Using poppler 0.10.5 (cairo)
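
To illustrate the behaviour being asked for, here is a minimal sketch of a search in which any whitespace in the query matches any run of whitespace in the page text, including a line break. It is not poppler's or Evince's actual search code; the function names and the sample page text are made up for the example.

```python
import re


def phrase_pattern(query):
    """Build a regex in which each gap between query words matches any
    run of whitespace (spaces, tabs, newlines) in the searched text."""
    words = query.split()
    return re.compile(r"\s+".join(re.escape(word) for word in words))


def find_phrase(query, page_text):
    """Return (start, end) offsets of every match of query in page_text."""
    if not query.split():
        return []  # an empty query matches nothing
    return [m.span() for m in phrase_pattern(query).finditer(page_text)]


# Hypothetical page text: the phrase is split by a line break.
page_text = "retained until the child is 13\nyears old"

print("13 years" in page_text)             # plain substring search
print(find_phrase("13 years", page_text))  # whitespace-insensitive search
```

The plain substring test fails on this text because the characters between "13" and "years" are a newline rather than a space, while the whitespace-insensitive search reports one match.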

Tags: likely-dup
In , Phil-ganchev (phil-ganchev) wrote :

Hyphens should also be treated as whitespace.

It should also be optionally diacritic insensitive: http://bugzilla.gnome.org/show_bug.cgi?id=418189

Perhaps this should be in the same option as whitespace insensitive?
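
A rough sketch of what such a combined option could do, assuming both the query and the page text are folded the same way before comparison: diacritics are dropped via Unicode decomposition, hyphens are treated as whitespace, and whitespace runs are collapsed. The names `fold` and `contains` are hypothetical and this is not how poppler implements anything.

```python
import unicodedata


def fold(text):
    """Strip diacritics, treat hyphens as whitespace, collapse whitespace."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Hyphens become spaces, then every whitespace run becomes one space.
    return " ".join(without_marks.replace("-", " ").split())


def contains(query, text):
    """True if the folded query occurs in the folded text."""
    return fold(query) in fold(text)


# "\u00e9" is e with acute accent, "\u00ef" is i with diaeresis.
print(fold("Donn\u00e9es"))
print(contains("naive approach", "a na\u00efve\napproach"))
print(contains("data protection", "data-protection"))
```

Folding discards the original offsets, so a real viewer would also need a mapping back to positions on the page in order to highlight a match.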

In , Fk-cogitatio (fk-cogitatio) wrote :

Actually, hyphens should be ignored when searching text, and whitespace after hyphens as well. Otherwise you will not find a match on hyphenated words.
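
A minimal sketch of that suggestion, assuming the hyphen and any whitespace following it are simply removed from both the query and the page text before comparing. The function names and sample text are hypothetical.

```python
import re


def strip_hyphenation(text):
    """Remove every hyphen together with any whitespace right after it."""
    return re.sub(r"-\s*", "", text)


def matches_ignoring_hyphens(query, text):
    """True if query occurs in text once hyphenation is ignored in both."""
    return strip_hyphenation(query) in strip_hyphenation(text)


# Hypothetical page text with a word hyphenated at the end of a line.
page_text = "the processing of per-\nsonal data"

print(matches_ignoring_hyphens("personal data", page_text))
```

The trade-off is false positives: with this rule "re-sign" and "resign" become the same string, which is one reason it may belong behind an option rather than being the default.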

Pedro Villavicencio (pedro) wrote :

Works fine with Acrobat. It seems it doesn't find phrases in which one word is followed by the next on a second line.

Changed in evince (Ubuntu):
importance: Undecided → Low
affects: evince (Ubuntu) → poppler (Ubuntu)
affects: poppler (Ubuntu) → evince (Ubuntu)
Changed in evince (Ubuntu):
assignee: nobody → Ubuntu Desktop Bugs (desktop-bugs)
status: New → Confirmed
tags: added: likely-dup
Pedro Villavicencio (pedro) wrote :

It was indeed a poppler issue, and it is known upstream: http://bugs.freedesktop.org/show_bug.cgi?id=11381

affects: evince (Ubuntu) → poppler (Ubuntu)
Changed in poppler (Ubuntu):
status: Confirmed → Triaged
Changed in poppler:
status: Unknown → Confirmed
Changed in poppler:
importance: Unknown → Wishlist
markling (markling) wrote :

This should be high priority.

As it stands, the Ubuntu document viewer cannot be trusted. The search function does not work.

How many professions can you think of where it is crucial to know whether a string occurs in a document or not?

Law, journalism, health, security, law enforcement.

Need I go on?

Can anyone trust the search in Ubuntu's document viewer? Of course they can't. They will have to read a document manually before they can establish whether the text they are looking for is there or not.

Now consider an instance where one of these professionals has to establish that a string DOES NOT appear in a document. These documents can stretch to hundreds of pages. Are you going to sit down and search through manually and trust your judgment when you need to establish the absence of something as a matter of life and death, of national security, of legal importance? Of course you're not. You are going to get another piece of software to do it, aren't you?

Changed in poppler:
importance: Wishlist → Unknown
Changed in poppler:
importance: Unknown → Wishlist
markling (markling) wrote :

This fix should be high priority.

It can be hard to prove a negative.
The same goes for documents.
I.e. professional people who use PDF readers sometimes need to prove that something does not exist in a document.
They can do this by searching for the thing they need to disprove.

But if the search does not work properly because it can't spot strings that run over lines, and they don't know that this fault exists, they will end up making decisions based on faulty information given to them by the software.
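
As a sketch of how a cautious user could guard against this, assuming the document text has already been extracted to a plain string: report a phrase as absent only when at least one of its words is missing too, and flag the result as uncertain when the literal phrase is not found but every word is. The function name and the three labels are invented for this example.

```python
def check_absence(phrase, document_text):
    """Classify a phrase as 'present', 'uncertain' or 'absent'.

    'uncertain' means the literal phrase was not found, but every one of
    its words occurs somewhere, so the phrase may be split by a line
    break or other layout and needs a manual look.
    """
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    if phrase in document_text:
        return "present"
    if all(word in document_text for word in words):
        return "uncertain"
    return "absent"


# Hypothetical extracted text with a line break inside "13 years".
document_text = "retained until the child is 13\nyears old"

for phrase in ("years old", "13 years", "14 years"):
    print(phrase, "->", check_absence(phrase, document_text))
```

An "absent" result here still depends on the quality of the text extraction, so it narrows the manual checking rather than replacing it.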

markling (markling) wrote :

This error is still affecting me. It has been this way for four years.

I have a pdf open in Okular. If I search for a term with two words, Okular will not find it if the term runs over a line. So you have to remember to search for either of the words in the term, and lug past all those matches where the single word was found but the term wasn't there. Linux pdf search can't handle two words. It's a joke.

Fredrik Wendt (fredrik-wendt) wrote :

Evince uses poppler for searching, and the bug against poppler is reported here: https://bugs.freedesktop.org/show_bug.cgi?id=61104

In , Germán Poo-Caamaño (gpoo) wrote :

*** Bug 61104 has been marked as a duplicate of this bug. ***

In , Fredrik Wendt (fredrik-wendt) wrote :

I assume that https://bugzilla.gnome.org/show_bug.cgi?id=652909 is related too - "find" doesn't match when the text is "small caps".

Changed in poppler (Ubuntu):
assignee: Ubuntu Desktop Bugs (desktop-bugs) → nobody
In , Germán Poo-Caamaño (gpoo) wrote :

*** Bug 9648 has been marked as a duplicate of this bug. ***

In , Gitlab-migration (gitlab-migration) wrote :

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/56.

Changed in poppler:
status: Confirmed → Unknown