Generating PDF Links by Text Search with AutoBookmark plug-in
- Introduction
- This tutorial shows how to automatically generate PDF links by text search using the AutoBookmark™ plug-in for Adobe® Acrobat®. The "Link by Search" operation finds text that matches a user-specified text pattern and adds links based on a user-specified action description. Text patterns are defined with regular expressions, a widely used syntax for describing text patterns and performing powerful search operations.
- Prerequisites
- You need a copy of Adobe® Acrobat® with the AutoBookmark™ plug-in installed on your computer in order to use this tutorial. You can download trial versions of both Adobe® Acrobat® and the AutoBookmark™ plug-in.
- Contents:
- Overview of the Linking Methods
- What Kind of PDF Links Can Be Created?
- Getting Started
- Creating PDF Links to a Single Web Page
- Creating PDF Links to Different Web Pages (Online Catalog Example)
- Creating PDF Links to Pages (Using Page Numbers)
- Creating PDF Links to Pages (Using Page Labels)
- Creating PDF Links to Pages With a Page Offset
- Creating PDF Links to Named Destinations
- Creating PDF Links to External Files
- Creating PDF Links to Item Lists with Multiple Rules
- Using Search Context to Add Links to Page Ranges and Lists
- Selecting Link Appearance and Other Settings
- Batch Processing with Action Wizard
- Overview of Linking by Text Search
- The AutoBookmark software provides two similar "Generate Link By Text Search" operations. The first one uses only a single rule to perform linking, while the second one provides a way to define multiple rules. Both operations search PDF documents for text that matches the user-defined pattern(s) and add interactive links to the matching text.
- Use single-rule linking (available via "Plug-ins > Links > Generate Links > Generate Links By Text Search (Single Rule)" menu) for the simple cases when a single search expression is enough. It provides a simpler user interface that does not have to deal with creating and managing multiple rules.
- Use multiple-rule linking (available via "Plug-ins > Links > Generate Links > Generate Links By Text Search (Multiple Rules)" menu) for the complex cases when just one search expression cannot do the job.
- Both methods share the same foundation and provide the same interface for defining the linking rules.
- What Kind of PDF Links Can Be Created?
- The "Generate Links By Text Search" operation can create the following PDF link types:
-
- Go to a page view
- Go to a page view in another document
- Go to a page view in PDF attachment
- Open a file
- Open a weblink
- Execute JavaScript code
- Execute a menu
- Getting Started
-
- Start Adobe Acrobat.
- Open a PDF document that needs to be linked.
- Select "Plug-ins > Links > Generate Links > Generate Links By Text Search (Single Rule)" from the menu.
- Configure the processing parameters and press OK to generate links.
- The "Create Links By Text Search" dialog provides control over all aspects of the linking operation. The user needs to specify a text pattern to search for, define a linking action, and optionally provide link appearance and other settings.
- The "Find text pattern" and "Link action" parameters together control which text is linked and which action each link performs. The "Find text pattern" field defines a search pattern (regular expression) used to find the text that needs a link. A regular expression is a special text string for describing a search pattern; regular expressions are supported by most text processing applications and tools. The "Link action" field determines what happens when a reader clicks on a link.
- Common Link Generation Scenarios
- Adding Links To a Single Web Page
- The following example illustrates how to automatically add links to a single web page. For example, link every occurrence of the "U.S. Department of Justice" text to the "https://www.justice.gov" web address:
-
Find text pattern: U.S. Department of Justice
Link action: URI:https://www.justice.gov
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The plug-in automatically adds links to the user-specified web address from all occurrences of the matched text.
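- Because the "Find text pattern" is a regular expression, each "." in "U.S." matches any single character, not only a period. The sketch below uses Python's standard `re` module purely as an illustration (the plug-in's own regular expression engine may differ in details) to compare the pattern as written with a fully escaped variant. The `lookalike` string is a made-up example.

```python
import re

literal = "U.S. Department of Justice"

# Pattern exactly as typed in the example: "." matches any character.
loose = re.compile(literal)

# Escaped variant: every special character is matched literally.
strict = re.compile(re.escape(literal))

# Hypothetical text that differs from the intended phrase only at the dots.
lookalike = "UxSx Department of Justice"

print(bool(loose.search(lookalike)))   # the unescaped pattern also accepts this text
print(bool(strict.search(lookalike)))  # the escaped pattern rejects it
print(bool(strict.search(literal)))    # and still finds the intended phrase
```

- In most documents the unescaped pattern is good enough; writing U\.S\. instead is only needed when look-alike text could be linked by mistake.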
- Adding Web Links To Different Web Pages (Online Catalog Example)
- It is a common task to generate web links to online "product" pages based on some kind of product index or SKU. For example, if the website www.mystore.com has a number of product pages that have the SKU as a part of the URL, then it is possible to automatically add links to SKUs in the PDF document. Let's assume that the SKU is formatted as a single letter followed by a 10-digit number (A1234567890 or F1392493420). There are corresponding web pages with the following URLs: www.mystore.com/A1234567890 and www.mystore.com/F1392493420. Use the following settings to automatically add web links to every occurrence of an SKU in the PDF document:
-
Find text pattern: ([A-Z]\d{10})
Link action: URI:http://www.mystore.com/\1
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The "Find text pattern" expression will match a single letter ([A-Z]) that is followed by a 10 digit number (\d{10}). The whole expression is enclosed in round brackets ( ) to allow referring back to the search results in the link action. The "Link Action" starts with URI: prefix. It indicates an "Open a web link" action. It is followed by a static portion of the web address: http://www.mystore.com/ which is always the same for all links. Next, it is followed by the search results returned by the first matching group (expression enclosed with round brackets) \1. If text search has located F1392493420 SKU in the text, it will add a web link with the following URL: http://www.mystore.com/F1392493420.
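- The way the \1 back-reference is substituted into the link action can be sketched in Python. This is only an illustration of the regular expression mechanics, not the plug-in's implementation; the function name `sku_links` and the sample sentence are made up for this example.

```python
import re

# Pattern from the example: one capital letter followed by exactly 10 digits.
SKU_PATTERN = re.compile(r"([A-Z]\d{10})")


def sku_links(text, base_url="http://www.mystore.com/"):
    """Return (linked_text, url) pairs, one per SKU found in the text."""
    # group(1) plays the role of \1 in the "Link action" string.
    return [(m.group(0), base_url + m.group(1)) for m in SKU_PATTERN.finditer(text)]


sample = "Order A1234567890 or F1392493420 today."
for linked_text, url in sku_links(sample):
    print(linked_text, "->", url)
```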
- Adding Links To Pages (Using Page Numbers)
- Quite often it is necessary to add a page link to "see page N" text, where N is a page number. Use the following expression to accomplish that. The page number is expected to be just a series of digits. This is a basic example that deals only with a single page reference. See the advanced tutorial that shows how to add multiple links to page ranges and page lists.
-
Find text pattern: see page (\d+)
Link action: \1
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The \d+ in "Find text pattern" is a regular expression for matching one or more digits. The \1 in "Link action" refers to the text that is matched by the first capturing group (\d+) in the "Find text pattern" expression. A capturing group is a part of the pattern enclosed in round brackets ( ). For example, when this operation encounters the "see page 3" text string, it will use "3" for the link action. This is the most basic link action: it refers to a page number in the current PDF document.
- If you want to add a link only to the page number, then use the following settings:
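- A minimal Python sketch of the same capture, assuming Python's `re` module behaves like the plug-in's engine for this simple pattern. The helper name `page_targets` and the sample text are illustrative only.

```python
import re

# "see page " followed by one or more digits; the digits are capturing group 1.
PAGE_REF = re.compile(r"see page (\d+)")


def page_targets(text):
    """Return (linked_text, page_number) for every "see page N" reference."""
    return [(m.group(0), int(m.group(1))) for m in PAGE_REF.finditer(text)]


sample = "For details see page 3, and also see page 27."
print(page_targets(sample))
```

- Note that the whole matched text ("see page 3") would carry the link, while only the captured digits determine the target page.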
-
Find text pattern: see page \K(\d+)
Link action: \1
- The link will be added only to the text that is matched to the right of the \K, while everything to the left of it will be excluded. The \K keyword is available starting with AutoBookmark version 6.12. If you are using an older version, then use the following "Find text pattern", which produces the same results: (?<=see page )(\d+).
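- The effect of restricting the link to the page number can be checked with a lookbehind in Python. Python's standard `re` module does not support \K, so this sketch only demonstrates the lookbehind form with the prefix "see page "; the sample sentence is made up.

```python
import re

# Whole reference is matched, so the whole phrase would carry the link.
WHOLE = re.compile(r"see page (\d+)")

# Lookbehind: "see page " must precede the digits but is not part of the match.
NUMBER_ONLY = re.compile(r"(?<=see page )(\d+)")

text = "Please see page 42 for details."

whole = WHOLE.search(text)
number_only = NUMBER_ONLY.search(text)

linked_whole = text[whole.start():whole.end()]
linked_number = text[number_only.start():number_only.end()]

print(repr(linked_whole))   # text covered by the link without the lookbehind
print(repr(linked_number))  # text covered by the link with the lookbehind
```

- In both cases capturing group 1 holds the same page number; only the extent of the linked text changes.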
- Adding Links To Pages (Using Page Labels)
- A PDF document may have logical page labels assigned to pages in the "Thumbnails" panel. Page labels can be used in the page selector and provide a more flexible approach to page numbering in PDF documents. Page labels can use any combination of characters, symbols and digits. The following example shows how to add page links to "see page A-14" or "see page B-17" text references.
-
Find text pattern: see page ([A-Z][-]\d+)
Link action: plabel:\1
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The [A-Z][-]\d+ in "Find text pattern" is a regular expression for matching a single letter (any letter from A to Z) that is followed by a dash and then by one or more digits, for example A-14 or B-17. The plabel: keyword indicates that the text that follows it (\1) is a page label reference. The \1 refers to the text matched by the first capturing group ([A-Z][-]\d+). See the following tutorials on assigning page labels.
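- The difference between a page label and a physical page position can be sketched as follows. The list `page_labels` is a hypothetical set of labels in physical page order, and the lookup is only an illustration of the idea; how the plug-in itself resolves labels (or treats labels that do not exist) is not described here.

```python
import re

# Pattern from the example: a letter, a dash, then one or more digits.
LABEL_REF = re.compile(r"see page ([A-Z][-]\d+)")

# Hypothetical page labels in physical order (index 0 is the first page).
page_labels = ["i", "ii", "A-1", "A-2", "B-1"]


def resolve_label_refs(text, labels):
    """Return (label, zero_based_page_index or None) for each reference."""
    index_of = {label: i for i, label in enumerate(labels)}
    return [(m.group(1), index_of.get(m.group(1))) for m in LABEL_REF.finditer(text)]


sample = "Wiring: see page A-2. Parts: see page B-1. Old: see page C-9."
print(resolve_label_refs(sample, page_labels))
```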
- Adding Links To Pages With a Page Offset
- It is often necessary to add a page link to "see page N" text with a page shift. A page shift is the difference between a page number printed in the text and the "physical" page number in the PDF document, because printed page numbers do not always match "physical" page numbers in a PDF file. For example, "see page 3" needs to point to page 13 in the PDF document. This means that we cannot directly use a page number extracted from the text to refer to a specific page. The page offset comes to the rescue. A page offset is a positive or negative integer that is added to the page number specified in the "Link action" string before creating a page link. To specify a negative page offset, type it after the plus sign, for example +-10. The example below assumes that there is an offset of 10 pages between printed and physical page numbers.
-
Find text pattern: see page (\d+)
Link action: \1, +10
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The \d+ in the "Find text pattern" is a regular expression for matching one or more digits. The \1 in the "Link action" refers to the text that is matched by the first sub-pattern (\d+) in the "Find text pattern" expression. The +10 offset is appended to the end of the link action description. For example, when the "see page 3" text string is encountered, a link to page 13 is added.
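- The offset arithmetic is simple enough to verify with a short Python sketch. The function name `target_pages` and the sample strings are assumptions made for this illustration.

```python
import re

PAGE_REF = re.compile(r"see page (\d+)")


def target_pages(text, offset):
    """Add a (positive or negative) offset to every printed page number."""
    return [int(m.group(1)) + offset for m in PAGE_REF.finditer(text)]


# Printed page 3 is physical page 13 when the offset is +10.
print(target_pages("see page 3 and see page 18", 10))

# A negative offset shifts the target in the other direction.
print(target_pages("see page 13", -10))
```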
- Adding Links To Named Destinations
- This example shows how to add links to "see page XX" text, where XX is a reference to a named destination. The "named destination" or just "destination" is a named page view that can be created in the "Destinations" panel. The following example assumes that the PDF document has named destinations and that the document text contains references such as "see page AB234" or "see page 254A", where PageAB234 and Page254A actually exist as destinations in the current PDF document (the link action below prefixes the matched text with "Page").
-
Find text pattern: see page (\w+)
Link action: @Page\1
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The \w+ in "Find text pattern" is a regular expression for matching one or more "word" characters (letter, digit or underscore). It matches any alphanumeric page reference that contains only letters, digits or underscores (such as A1 or D23). The \1 in the "Link action" refers to the text that is matched by the first capture group (\w+) in the "Find text pattern" expression. A capture group is a part of the pattern enclosed in round brackets ( ). For example, when this operation encounters the "see page A53" text, it will use @PageA53 for the link action. This is a link action that refers to the named destination PageA53. The @ symbol is used to designate named destinations.
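- As an illustration, Python's `Match.expand` performs the same kind of \1 substitution into a template string. The template mirrors the link action above; the helper name and the sample text are made up.

```python
import re

DEST_REF = re.compile(r"see page (\w+)")


def destination_actions(text):
    """Build an "@Page..." action string for every matched reference."""
    # expand() replaces \1 in the template with the text of capturing group 1.
    return [m.expand(r"@Page\1") for m in DEST_REF.finditer(text)]


sample = "First see page A53. Then see page 254A, if needed."
print(destination_actions(sample))
```

- Because \w+ stops at the first non-word character, the trailing period and comma in the sample are not included in the destination names.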
- Adding Links to External Files
- For example, use the following parameters to add file links to every occurrence of the "catalog XXXXXX" text, where XXXXXX represents a full or partial name of the corresponding file.
-
Find text pattern: catalog (\w+)
Link action: file://\1.pdf
- Click "OK" once done.
- The dialog will show the total number of links created. Click "OK".
- The \w+ in "Find text pattern" is a regular expression for matching one or more "word" characters (letter, digit or underscore). This expression finds all occurrences of the word "catalog" followed by an alphanumeric name (which may contain letters, digits and underscores only). The \1 in "Link action" refers to the text that is matched by the first sub-pattern (\w+) in the "Find text pattern" expression. A sub-pattern is a part of the pattern enclosed in round brackets ( ). For example, when this operation encounters the "catalog B200" text string, it will use "file://B200.pdf" as the link action to create an "Open a file" PDF action. This link action refers to the file B200.pdf located in the same folder as the current PDF document.
- Important: Always save the file to its intended location prior to using this operation to make sure all links point to the correct folder.
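- The reason the document location matters can be sketched in Python: a file name without a folder is resolved relative to the folder of the current document. The paths below are hypothetical, and `PurePosixPath` is used only to make the example behave the same on every operating system.

```python
import re
from pathlib import PurePosixPath

CATALOG_REF = re.compile(r"catalog (\w+)")


def file_targets(text, document_path):
    """Resolve "<name>.pdf" next to the document for each "catalog <name>"."""
    folder = PurePosixPath(document_path).parent
    return [str(folder / f"{m.group(1)}.pdf") for m in CATALOG_REF.finditer(text)]


sample = "See catalog B200 and catalog C_17."

# The same text yields different targets depending on where the document is saved.
print(file_targets(sample, "/docs/manuals/index.pdf"))
print(file_targets(sample, "/tmp/index.pdf"))
```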
- Creating PDF Links to Item Lists with Multiple Rules
- This example shows how to use "Plug-ins > Links > Generate Links > Generate Links by Text Search (Multiple Rules)" menu. The "Multiple Rules" operation is essentially a collection of single rules that we covered in the previous examples.
- One of the common linking tasks is to add separate links to the multiple items that appear as a list after a specific keyword. For example, each page reference in the following example needs to get its own link to a page in the corresponding external PDF file. The name of the file is also derived from the document's text:
- It is not possible to create such context-dependent links with just a single rule. This is because we need a separate rule for each item in the list in order to establish a reference to the correct external file (A.pdf or B.pdf). For example, if there can be up to 3 items in the list, then we need 3 rules to cover each possible case. One way to link variable-length list items is with the help of the \K regular expression keyword. Here are the 3 rules that we are going to use to link each page number to the correct document and page. We assume that there is a corresponding PDF file for each appendix in this example: "Appendix A" refers to A.pdf, "Appendix B" refers to B.pdf and so on.
- RULE 1: This rule adds a link to the first item in the page reference list that follows "Appendix X:" (where X is any letter):
-
Find text pattern: Appendix ([A-Z]): \K(\d+)
Link action: \2,file://\1.pdf
- Explanation:
- Note that the "Find text pattern" has two capture groups: one for the appendix letter ([A-Z]) and one for the page number (\d+). They are referred to in the link action as \1 and \2. Capture groups are used to refer to a specific part of the matched text. The link action is composed of a page number (\2) followed by the file:// keyword (which defines a reference to an external file), and then by the appendix letter and the “.pdf” file extension. For example, for “Appendix A: 9”, the link action is going to be: 9,file://A.pdf. The \K is used to drop all matched text to the left of the page reference. This is necessary to limit the link to the page number only and to exclude "Appendix A:" from the link. This is the key part of the text pattern that makes linking of list items possible.
- RULE 2: This rule adds a link to the second item in each page reference list. The approach is the same as for the first rule:
-
Find text pattern: Appendix ([A-Z]): \d+, \K(\d+)
Link action: \2,file://\1.pdf
- RULE 3: This rule adds a link to the third item in each page reference list while skipping the previous two items:
Find text pattern: Appendix ([A-Z]): \d+, \d+, \K(\d+)
Link action: \2,file://\1.pdf
- Note how the \K keyword is used in all three rules to exclude all text to the left of the page reference from the link. It is positioned immediately before the page number capturing group (\d+). Important: this feature is available starting with AutoBookmark version 6.12 (March 24th, 2019).
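- The three rules can be emulated in Python to see which text each one links. Python's standard `re` module does not support \K, so in this sketch the \K is removed and the span of capturing group 2 stands in for the text that \K would keep. The sample line and the function name are made up; the action strings follow the \2,file://\1.pdf format used above.

```python
import re

# The three rules without \K; group 1 is the appendix letter, group 2 the page.
RULES = [
    re.compile(r"Appendix ([A-Z]): (\d+)"),
    re.compile(r"Appendix ([A-Z]): \d+, (\d+)"),
    re.compile(r"Appendix ([A-Z]): \d+, \d+, (\d+)"),
]


def list_item_links(text):
    """Return (linked_text, action) pairs in reading order."""
    found = []
    for rule in RULES:
        for m in rule.finditer(text):
            start, end = m.span(2)  # only the page number carries the link
            action = f"{m.group(2)},file://{m.group(1)}.pdf"
            found.append((start, text[start:end], action))
    found.sort()
    return [(linked, action) for _, linked, action in found]


sample = "Appendix A: 9, 12, 15 and Appendix B: 4, 7"
for linked, action in list_item_links(sample):
    print(linked, "->", action)
```

- Each rule contributes at most one link per list, which is why a list of up to 3 items needs 3 rules.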
- Here is a screenshot of the "Multiple Rules" dialog screen after adding all 3 linking rules. Use "Add Link Rule..." button to add new rules:
- It is a good idea to save the rules into a settings file for later reuse. Use "Save Rules to File" button to do that.
- Press "OK" button to start linking after adding all necessary rules. Note that if you need to re-run linking multiple times, then you would need to delete existing links first. You can do that via "Plug-ins > Links > Delete All" menu. The new links are not added to the text that is already covered by a link.
- Using "Search Context" to Add Links to Page Ranges and Page Lists
- A powerful feature called “search context” simplifies many linking tasks that would otherwise require a long list of rules. It works by establishing a two-stage text search. First, a “search context” regular expression is used to find text that matches a specific text pattern. This establishes a search context for the actual linking rule. The linking rule is then applied only to the text that matches the “search context” pattern, and the rest of the text is ignored. This means that the linking search expression is significantly simpler, because it does not have to check any conditions that are already established by the first expression. For example, the search context pattern may be used to find all page numbers and ranges that follow “See pages”, which limits linking to the part of the text where each number is known to be a page number. The linking expression then only needs to add links to every number within the context, without additional checks.
-
- Here is a single linking rule that will add page references to single page numbers and page-ranges that follow "see page" or "see pages" text.
Find text pattern: (\d+)
Link action: \1
Search context: See page(s)? [\d, \-]+
- The above linking rule is easy to understand and avoids the complexity that would be required without the search context functionality.
- The search context is available as an option in both single-rule and multiple-rule linking operations starting with AutoBookmark version 6.14.
- Check "Search context" option and enter a desired search pattern (regular expression) into the box:
- A drawback of this method is that it is not possible to use any matching text from the search context in the link action. The previous example with multiple rules can be implemented with the help of the search context, but then it is necessary to have one rule for each “Appendix X” reference: one rule/context for Appendix A, one rule/context for Appendix B, and so on. Nevertheless, search context is a great tool that significantly expands the applicability of automated linking and improves the readability and manageability of the linking rules.
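- The two-stage search can be sketched in Python: the context pattern first selects the stretches of text to look at, and the linking pattern is then applied only inside them. This is an illustration of the idea under the assumption that the plug-in works in a comparable way; the sample sentence and the function name are made up.

```python
import re

# Stage 1: the search context from the example above.
CONTEXT = re.compile(r"See page(s)? [\d, \-]+")

# Stage 2: the linking rule, applied only inside each context match.
LINK_RULE = re.compile(r"(\d+)")


def context_links(text):
    """Return the page numbers that would be linked, in reading order."""
    pages = []
    for ctx in CONTEXT.finditer(text):
        for m in LINK_RULE.finditer(ctx.group(0)):
            pages.append(int(m.group(1)))
    return pages


sample = "In 2019 we sold 40 units. See pages 3, 7-9 for details."
print(context_links(sample))
```

- The numbers 2019 and 40 are never considered because they lie outside the search context, even though the linking rule (\d+) on its own would match them.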
- Selecting Link Appearance and Other Settings
- Step 1 - Open a PDF File
- Start the Adobe® Acrobat® application and open a PDF file using “File > Open…”.
- Step 2 - Open the "Generate Links By Text Search" Dialog
- Select “Plug-Ins > Links > Generate Links > Generate Links By Text Search (Single Rule)” to open the "Generate Links By Text Search" dialog.
- Use "Linking" toolbar (installed by AutoBookmark into "Tools") for a quick access to this operation:
- Step 3 - Specify Page Area (Optional)
- Press "Edit Page Area..." button to specify an area on the page to perform a text search. All text outside of the specified area will be ignored.
- Check the "Process text located only in the following area" box. Click “Select Page Area From Sample Page…”.
- Click and hold the left mouse button while using the Selection tool, then draw a rectangle around an area on the sample page.
- Step 4 - Specify Link Appearance (Optional)
- Press "Edit Appearance..." button to specify desired visual appearance of the resulting links.
- Select a link type: visible or invisible rectangle. Select "Invisible Rectangle" if you don’t want users to see the link in the PDF. An invisible link is useful if the link is over an image. Note that the "Line Thickness", "Line Style" and "Color" options are not available if "Invisible Rectangle" is selected.
- Select a highlight style: none, invert, outline or inset.
- None: doesn’t change the appearance of the link.
- Invert: changes the link’s color to its opposite.
- Outline: changes the link’s outline color to its opposite.
- Inset: creates the appearance of an embossed rectangle.
- Select a line thickness: thin, medium or thick.
- Select a line style: solid, dashed or underline.
- Select a color for the link using the "Color" menu. The color is used only for drawing the link outline. Check the "Change underlying text color to" box and select a desired color from the pull-down menu. Click "OK" to close the dialog.
- Batch Processing with Action Wizard
- Both single-rule and multiple-rule linking operations are available in Action Wizard. Action Wizard is a batch processing tool available in Adobe Acrobat Pro. Batch processing allows applying the same processing workflow to multiple PDF files without any manual intervention.
- These operations are available as "Add Links by Text Search" and "Add Links By Rules" commands in Action Wizard. See this tutorial to learn how to use Action Wizard.