Fuzzy Search allows for near matches in both Search Terms and Search text.
This article describes the approach I take throughout my book “PDF Expert”: first explain the rich functionality built into the PDF format, then demonstrate how to exercise that potential in Kofax Power PDF. What follows is an overview of that approach as applied in Chapter 14: “Search – Simple to Sophisticated.”
Please note that while this article focuses on the Search functions themselves, one of the most powerful applications of this functionality is Search and Redact, where actions are applied to the Search Results.
In order from simplest to most advanced, these are the eight options you have to search PDF documents with Kofax Power PDF:
- Search Current
To search within the open document, use the pulldown menu on the Home ribbon, which offers two basic options for precision: Match Case and Match whole word only. You can also include Bookmarks and Comments in the search; these are stored separately from the body text. Finally, you have the option to do a Fuzzy Search, which finds approximate matches to your target term. This helps when you are unsure of a spelling, and it is especially useful with Searchable PDF, where the Hidden Text Layer may include misrecognized letters and words.
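In Power PDF these are simply checkboxes, but it can help to see what each one does to a search. The following is a minimal Python sketch of the concept, not Power PDF's implementation; the function name `find_matches` and the sample page text are hypothetical. Match Case toggles case-insensitive matching, and Match whole word only requires a word boundary on both sides of the term.

```python
import re

def find_matches(text: str, term: str, match_case: bool = False,
                 whole_word: bool = False) -> list[int]:
    """Return the start offset of every occurrence of `term` in `text`.

    Hypothetical helper illustrating the two basic search options.
    """
    pattern = re.escape(term)  # treat the search term literally, not as a regex
    if whole_word:
        # \b requires a word boundary on both sides, so "Invoices" won't match "Invoice"
        pattern = rf"\b{pattern}\b"
    flags = 0 if match_case else re.IGNORECASE
    return [m.start() for m in re.finditer(pattern, text, flags)]

page = "Invoice total: see INVOICE 12. Invoices are due monthly."
print(find_matches(page, "invoice"))                   # -> [0, 19, 31]
print(find_matches(page, "invoice", match_case=True))  # -> []
print(find_matches(page, "Invoice", whole_word=True))  # -> [0, 19]
```

Note how each option narrows the result set: case-insensitive search finds all three occurrences, while whole-word matching drops the plural “Invoices.”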
- Search Multiple
To search many documents at once, Search Multiple offers a pulldown to choose where to Look In, including PDF Packages or Portfolios, Folders, or Indexes. This option includes all the choices described in Search Current, plus an additional option to include Stemming, which finds variants such as “Search,” “Searching,” “Searchers” and so on. Additional flexibility and precision are available through the options described below.
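Stemming works by reducing each word to a common root before comparing it with the search term. The sketch below uses a deliberately tiny, hypothetical suffix list to show the idea; it is not Power PDF's stemming algorithm, and real stemmers (such as the Porter stemmer) use far more careful rules.

```python
import re

# Toy suffix list for illustration only (an assumption, not Power PDF's rules).
# Three-letter suffixes come first so "ers" is tried before "er" or "s".
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def stemmed_search(text: str, term: str) -> list[str]:
    """Return every word in `text` whose stem equals the stem of `term`."""
    target = stem(term)
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if stem(w) == target]

text = "Searchers kept searching; one search found the searched files."
print(stemmed_search(text, "Search"))
# -> ['Searchers', 'searching', 'search', 'searched']
```

Because both the term and every word in the text are reduced to the root “search,” one query catches all four variants.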
- Search for Single or Multiple Words
Beyond the obvious benefit of searching for multiple words or even phrases, Search Multiple opens a simple menu where you can add multiple entries. These lists can then be saved and exported for reuse in future searches, which is very helpful when you work on new documents that involve related information and contain the same terms.
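The value of a saved term list is that it is built once and reused across many documents. This article does not describe the file format Power PDF uses for exported lists, so the sketch below simply stores the list as JSON; the file name, helper functions and sample terms are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_terms(terms: list[str], path: Path) -> None:
    """Export a search-term list (JSON is an assumed stand-in format)."""
    path.write_text(json.dumps(terms, indent=2), encoding="utf-8")

def load_terms(path: Path) -> list[str]:
    """Re-import a previously exported search-term list."""
    return json.loads(path.read_text(encoding="utf-8"))

def search_terms(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive substring hits for every term in one pass."""
    lowered = text.lower()
    return {t: lowered.count(t.lower()) for t in terms}

with tempfile.TemporaryDirectory() as tmp:
    term_file = Path(tmp) / "contract_terms.json"  # hypothetical file name
    save_terms(["indemnify", "force majeure", "termination"], term_file)
    reused = load_terms(term_file)

doc = ("Termination requires notice. Force Majeure events excuse delay. "
       "Termination fees apply.")
print(search_terms(doc, reused))
# -> {'indemnify': 0, 'force majeure': 1, 'termination': 2}
```

Note that multi-word entries such as “force majeure” are searched as phrases, and a term with zero hits is still reported, which is useful for confirming that a clause is absent.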
- Pattern Search
Standard Search patterns include common items: Social Security Numbers, Phone Numbers, Credit Card Numbers, Email addresses and Dates. These patterns have been available in PDF editors for years. About 7 or 8 versions ago, Power PDF introduced a very significant enhancement of this functionality: Custom Patterns, which use Wild Card characters to define any combination of alphabetic, numeric or keyboard characters. This provides the flexibility and precision to find unique patterns such as serial numbers, customer IDs or insurance codes.
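Under the hood, a Custom Pattern is essentially a compact way of writing a regular expression. The sketch below translates a wildcard pattern into a Python regex. The wildcard characters themselves (`#` for a digit, `@` for a letter, `?` for any non-space character) are hypothetical choices for illustration; check Power PDF's own documentation for the characters it actually uses.

```python
import re

# HYPOTHETICAL wildcard syntax, chosen for illustration only:
#   #  -> any digit    @  -> any letter    ?  -> any single non-space character
# Every other character in the pattern is matched literally.
WILDCARDS = {"#": r"\d", "@": r"[A-Za-z]", "?": r"\S"}

def wildcard_to_regex(pattern: str) -> re.Pattern:
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in pattern]
    # Word boundaries stop "AB-1234" from matching inside a longer token
    return re.compile(r"\b" + "".join(parts) + r"\b")

def pattern_search(text: str, pattern: str) -> list[str]:
    """Return every substring of `text` matching the wildcard pattern."""
    return wildcard_to_regex(pattern).findall(text)

serials = "Serials: AB-1234, XY-9876, A1-0000, ABC-12345."
print(pattern_search(serials, "@@-####"))                 # -> ['AB-1234', 'XY-9876']
print(pattern_search("SSN 123-45-6789", "###-##-####"))  # -> ['123-45-6789']
```

The second call shows that a standard pattern such as a Social Security Number is just another instance of the same mechanism.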
- Fuzzy Search
This is another feature unique to Power PDF among PDF editors. It is often referred to as “web-like” search because we are all familiar with online search engines that present results approximately matching the entered search terms, often with a hint such as “do you want to search for only XYZ?” to let you know that the Search Results contain additional terms you didn’t enter but that might be relevant. Fuzzy Search is handy when you might not know how to spell a term, such as a name you are looking for. But as mentioned above, Fuzzy Search is extremely valuable when you are perusing Searchable PDF documents that might contain invisible OCR errors in the Hidden Text Layer.
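A common way to define an “approximate match” is edit distance: the number of single-character insertions, deletions or substitutions needed to turn one word into another. Power PDF's actual fuzzy-matching algorithm is not described here, so the sketch below uses classic Levenshtein distance as a stand-in; the function names and the OCR-damaged sample text are hypothetical.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete ca
                curr[j - 1] + 1,          # insert cb
                prev[j - 1] + (ca != cb)  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]

def fuzzy_search(text: str, term: str, max_edits: int = 1) -> list[str]:
    """Return words within `max_edits` edits of `term`, ignoring case."""
    words = re.findall(r"\w+", text)
    return [w for w in words if levenshtein(w.lower(), term.lower()) <= max_edits]

# Simulated Hidden Text Layer: "lnvoice" is a typical OCR l-for-I error
ocr_text = "Payment due per lnvoice 4471; see also Invoce 4472 and Invoices."
print(fuzzy_search(ocr_text, "invoice"))
# -> ['lnvoice', 'Invoce', 'Invoices']
```

An exact search for “invoice” would miss the first two hits entirely, which is why fuzzy matching pays off on OCR text.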
- Logical and Boolean Operators
These operators can be thought of as having the opposite effect of Fuzzy Search. While Fuzzy Search widens your search to find approximate matches, Logical and Boolean Operators let you tune your search much more finely to deliver a precise set of results. Briefly, the Logical Operators AND, OR and NOT specify how search terms must, or must not, occur together. Boolean Operators let you combine Logical Operators into a single compound search expression, just like algebra. To use these advanced features, you need to be working within a pre-built Search Index.
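Conceptually, each search term selects a set of documents, and the operators combine those sets: AND is intersection, OR is union, and NOT is everything except. The sketch below evaluates nested expressions against a small hypothetical corpus. The tuple-based query form is an illustrative assumption, not Power PDF's query syntax.

```python
# Hypothetical corpus: document name -> set of words it contains
DOCS = {
    "inv_001.pdf": {"invoice", "overdue", "acme"},
    "inv_002.pdf": {"invoice", "paid", "acme"},
    "inv_003.pdf": {"invoice", "overdue", "globex"},
    "memo_01.pdf": {"memo", "acme"},
}

def evaluate(query, docs=DOCS) -> set[str]:
    """Evaluate a term string or a nested tuple such as ("AND", a, b),
    ("OR", a, b) or ("NOT", a). The tuple form is an assumed notation."""
    if isinstance(query, str):
        return {name for name, words in docs.items() if query in words}
    op, *args = query
    results = [evaluate(arg, docs) for arg in args]
    if op == "AND":
        return set.intersection(*results)  # in every operand
    if op == "OR":
        return set.union(*results)         # in any operand
    if op == "NOT":
        return set(docs) - results[0]      # everything except the operand
    raise ValueError(f"unknown operator: {op}")

# invoice AND (overdue OR NOT acme): grouping works just like algebra
query = ("AND", "invoice", ("OR", "overdue", ("NOT", "acme")))
print(sorted(evaluate(query)))  # -> ['inv_001.pdf', 'inv_003.pdf']
```

The nesting is what makes the Boolean combination precise: moving the parentheses changes which sets are combined first, and therefore the result.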
- Search with Document Properties
There are two classes of Document Properties for every PDF file. One class is generated automatically, including Important Dates: Created and Last Modified. The other properties are referred to as the Doc Info fields, including File, Title, Author, Subject and Keywords. These can be entered manually or captured from the source file when the PDF is created from applications like MS Word. These fields give Boolean Search its high precision because you can search against several criteria at once. A simple example is searching a stack of invoices, which share many of the same contents, fields and criteria. Boolean Search with Doc Info criteria lets you find invoices by date, using Before, After or Equal To.
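The invoice example can be modeled as filtering records on several properties at once. In the sketch below, the `DocInfo` record, its field names and the sample invoices are hypothetical; only the property names (Author, Keywords, Created) and the Before, After and Equal To comparisons come from the text above.

```python
import operator
from dataclasses import dataclass
from datetime import date

@dataclass
class DocInfo:
    """Hypothetical record holding a few Doc Info and date properties."""
    file: str
    author: str
    keywords: str
    created: date

# The three date comparisons named in the text, mapped to Python operators
DATE_OPS = {"Before": operator.lt, "After": operator.gt, "Equal To": operator.eq}

def search_properties(docs, author=None, keyword=None, created=None):
    """Return files matching ALL given criteria.
    `created` is an optional (comparison, date) pair, e.g. ("After", date(2023, 2, 1))."""
    hits = []
    for d in docs:
        if author and d.author != author:
            continue
        if keyword and keyword.lower() not in d.keywords.lower():
            continue
        if created:
            op_name, when = created
            if not DATE_OPS[op_name](d.created, when):
                continue
        hits.append(d.file)
    return hits

invoices = [
    DocInfo("inv_101.pdf", "Billing", "invoice; acme", date(2023, 1, 15)),
    DocInfo("inv_102.pdf", "Billing", "invoice; globex", date(2023, 3, 2)),
    DocInfo("inv_103.pdf", "Sales", "invoice; acme", date(2023, 3, 20)),
]
print(search_properties(invoices, author="Billing",
                        created=("After", date(2023, 2, 1))))   # -> ['inv_102.pdf']
print(search_properties(invoices, keyword="acme",
                        created=("Before", date(2023, 4, 1))))  # -> ['inv_101.pdf', 'inv_103.pdf']
```

Because the invoices share nearly identical body text, it is the properties, not the contents, that separate one invoice from another.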
- Create Search Index
This is the feature that allows you to gather a large group of PDF files, including entire Folders and Subfolders, and render them as a self-contained, portable Document Management System, all within your PDF Editor! The combination of the rich architecture built into the PDF format and the Index and Search functionality of Power PDF gives you tremendous document management capability.
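Search indexes are typically built as an inverted index: a map from each word to the documents that contain it, so queries are answered from the index rather than by re-reading every file. Power PDF's index format is not described here; the sketch below is a generic inverted index over text assumed to be already extracted from the PDFs, with hypothetical file paths.

```python
import re
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return dict(index)

def lookup(index: dict[str, set[str]], *words: str) -> set[str]:
    """Documents containing every word: an AND query answered from the index alone."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

# Stand-in for text already extracted from PDFs in a folder and its subfolders
corpus = {
    "contracts/acme.pdf": "Master services agreement with Acme. Termination clause.",
    "contracts/globex.pdf": "Services agreement with Globex.",
    "invoices/2023/inv_101.pdf": "Invoice for Acme services.",
}
idx = build_index(corpus)
print(sorted(lookup(idx, "acme", "services")))
# -> ['contracts/acme.pdf', 'invoices/2023/inv_101.pdf']
```

The cost of reading every document is paid once, at indexing time; each later query is just a few dictionary lookups and set intersections, which is also why operator-based searches depend on a pre-built index.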
I cover all these features, and demonstrate how easy it is to exercise this powerful functionality, with 14 helpful screenshots in Chapter 14 of “PDF Expert – Master PDF and OCR.”
PDF Expert – Master PDF and OCR
Copyright © 2023 Tony McKinley. All rights reserved.
Email: amckinley1@verizon.net