Bookmarking PDF Documents by Text Style
- Introduction
- The tutorial shows how to generate PDF bookmarks based on text style using the AutoBookmark™ plug-in for Adobe® Acrobat®. Use this method to automatically generate multi-level bookmarks from text attributes such as font style, text size, indentation and/or text pattern. All text that uses a selected font and/or text size will be automatically bookmarked.
- The main part of the tutorial contains step-by-step instructions for bookmarking by text style.
- In addition, an advanced section covers various bookmarking settings in detail:
- Processing Page Range
- Where to Insert New Bookmarks
- Using Stop Words
- Set Bookmarking Level
- Set Matching Text Style
- Set Tolerance for Matching Text
- Using Text Patterns
- Set Page Area for Text Search
- Set Visual Appearance
- Customize Bookmark Text
- Format Bookmarks with Text Patterns
- Save Settings
- Load Settings
- This operation is available via the application menu/toolbar and via the Action Wizard (Acrobat's batch processing tool).
- Prerequisites
- You need a copy of Adobe® Acrobat® with the AutoBookmark™ plug-in installed on your computer to use this tutorial. You can download trial versions of both Adobe® Acrobat® and the AutoBookmark™ plug-in.
- Bookmarking by Text Style
- Step 1 - Check for Searchable Text
- Open the PDF document that needs to be bookmarked using the "File > Open..." menu.
- The first step is to verify that the input PDF document actually contains searchable text. If you can highlight a text string and copy/paste it into another application (such as Notepad, MS Word or even Outlook), then the document contains searchable text and can be bookmarked by style.
- If the PDF file has been scanned from a paper document, it must first be processed with the "Text Recognition" tool to make it searchable. Select "Enhance Scans" from the "Tools" menu and click "Recognize Text" to run text recognition on the currently open PDF document.
- Step 2 - Select a Sample Text
- Use the selection tool to select a sample of the text that needs to be bookmarked.
- Step 3 - Start Bookmarking Tool
- Select "Plug-Ins > Bookmarks > Generate From Text Styles..." from the main menu to open the "Generate Bookmarks From Text Style" dialog.
- Step 4 - Add a Bookmark Level
- Click "Add..." to create a new bookmark level description. The level description defines the settings used to find and bookmark text for one bookmark level. In many cases only a single level of bookmarks, the top one, is required. However, if multiple levels of bookmarks are necessary, each level needs its own set of settings. For example, the top-level bookmarks can be created from 20pt Arial text, while the second-level bookmarks are created from 15pt Tahoma text.
- If sample text has been selected on the page, the "Add New Level Definition" dialog will prompt you to create a new level based on the selected text style. Click "OK" to proceed. Optionally, check the "Open settings dialog to edit level parameters" option to configure the settings in detail.
- The new bookmark level is now added to the list:
- Step 5 - Add Additional Level(s)
- If you only want to bookmark text in the selected style, click "OK" to start bookmarking.
- If you want to add more bookmark levels, move the "Generate Bookmarks From Text Style" dialog to the side of the screen and use the selection tool to highlight sample text for the next level.
- Step 6 - Add a New Bookmark Level
- Click "Add..." to create a new bookmark level description.
- The "Add New Level Definition" dialog appears, prompting you to create a new level based on the selected text style. Optionally, specify the desired bookmark level from the "Bookmark at Level:" pull-down menu. It is possible to create multiple descriptions for the same bookmark level. For example, use two level descriptions for the top level to create bookmarks both from 20pt Arial text and from 25pt Times text.
- Click "OK" to confirm adding a new level.
- Two levels are now defined. To add more levels, repeat steps 5 and 6.
- Step 7 - Configure Processing Settings
- Many processing settings can be customized to get the desired results. The advanced section explains how to configure the various processing parameters.
- Step 8 - Start Bookmarking
- Click "OK" in the "Generate Bookmarks From Text Style" dialog to start the bookmarking process.
- Click "OK" again to confirm the processing.
- At the end of processing, a report dialog shows the number of bookmarks created. Click "OK" to close it.
- Step 9 - Inspect the Results
- The software searches the document pages for all occurrences of text that matches the bookmark level description(s). The matching text is used to create bookmarks, which are automatically arranged into a nested hierarchy based on the configuration settings.
- The bookmark panel is automatically opened at the end of processing to show the bookmarks created. Inspect the bookmarks to make sure everything is bookmarked correctly. If there are any problems, adjust processing settings accordingly.
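- The nesting described in Step 9 can be pictured with a short Python sketch. It assumes that each match has already been reduced to a (level, title) pair in reading order, with level 1 as the top level. The `Bookmark` class and the `build_outline` function are hypothetical names used for illustration. They are not the plug-in's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical outline node: a title plus its nested child bookmarks."""
    title: str
    children: list["Bookmark"] = field(default_factory=list)


def build_outline(matches: list[tuple[int, str]]) -> list["Bookmark"]:
    """Nest (level, title) pairs, given in reading order, into a tree.

    Each bookmark becomes a child of the most recent preceding bookmark
    with a smaller level number; if there is none, it goes to the top level.
    """
    roots = []
    stack = []  # currently open ancestors as (level, node) pairs
    for level, title in matches:
        node = Bookmark(title)
        # Close every open bookmark at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots


outline = build_outline([
    (1, "Chapter 1"), (2, "Section 1.1"), (2, "Section 1.2"),
    (1, "Chapter 2"), (2, "Section 2.1"),
])
for chapter in outline:
    print(chapter.title, [s.title for s in chapter.children])
# Chapter 1 ['Section 1.1', 'Section 1.2']
# Chapter 2 ['Section 2.1']
```

- A second-level match that appears before any top-level match has no parent, so in this sketch it is placed at the top level.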
- Configuring Processing Settings
- Processing Page Range
- Specify the page range in which to look for bookmark text. Enter the first and last page numbers in the "Generate Bookmarks From Text Style" dialog. This option is useful when certain portions of the document, such as the table of contents or the index, must be excluded from processing.
- Where to Insert New Bookmarks
- Use the "Insert bookmarks" pull-down menu to specify where to insert the new bookmarks: after, before, or in place of (replacing) the existing bookmarks.
- Here are examples of the output:
- Using Stop Words
- The "stop words" feature can be used to filter out unwanted bookmark titles. If any stop word is present in a bookmark title, that bookmark is excluded from the output. A stop word can be a single word or a phrase. For example, use "Annual report" to avoid creating bookmarks that contain "Annual Report" anywhere in the title.
- Click the "Options..." button in the "Generate Bookmarks From Text Style" dialog to enter a list of stop words.
- Check the "Ignore text that contains stop words:" option to use a list of stop words. Click the "Edit Stop Words..." button to manage the list.
- Enter the stop words or regular expressions one per line in the text editing area of the "Edit Stop Words" dialog.
- You can enter "stop words" by:
- Typing them manually into the editing area of the dialog, with each entry on a separate line.
- Using the "Select Text" tool to select the desired text in a document, copying it to the clipboard and pasting it into the stop words editing area.
- Copying text from another text editor.
- Check the "Use regular expressions (text patterns)" option to indicate that the stop words use regular expression syntax.
- Check the "Match case" option to make matching case-sensitive.
- Check the "Match whole words" option to match only whole words. For example, if this option is on, then "Account" will not match "Accounts" or "Accounting".
- Click the "OK" button to finish editing the stop words and return to the "Bookmarking Options" dialog.
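- The three matching options can be illustrated with Python's `re` module. This is only a sketch of the documented behavior, and it makes several assumptions:
- Matching is case-insensitive unless "Match case" is checked.
- "Match whole words" is approximated with `\b` word boundaries.
- `is_stopped` is a hypothetical helper.
- The plug-in's own regular expression syntax may differ from Python's in details.

```python
import re


def is_stopped(title: str, stop_words: list[str], use_regex: bool = False,
               match_case: bool = False, whole_words: bool = False) -> bool:
    """Return True if any stop word (or pattern) occurs in the title."""
    flags = 0 if match_case else re.IGNORECASE
    for entry in stop_words:
        # Plain stop words are escaped so characters like "." match literally.
        pattern = entry if use_regex else re.escape(entry)
        if whole_words:
            # \b requires a word boundary on both sides of the match.
            pattern = rf"\b(?:{pattern})\b"
        if re.search(pattern, title, flags):
            return True
    return False


titles = ["Annual Report 2023", "Accounting Policies", "Account Summary"]
kept = [t for t in titles
        if not is_stopped(t, ["annual report", "Account"], whole_words=True)]
print(kept)  # ['Accounting Policies']
```

- Without `whole_words=True`, the stop word "Account" would also remove "Accounting Policies". A plain search finds it inside longer words.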
- Stop word example:
- Bookmarking Options
- Check the "Ignore consecutive duplicate bookmarks" option to skip consecutive bookmarks that have the same title. Only the first bookmark will be retained.
- Check the "Sort bookmarks vertically within each page" option to sort the resulting bookmarks within each page before adding them to the document's bookmarks. PDF documents are not "text documents" in the traditional sense. A PDF file may store text elements in a different order than they appear on the page, which can put bookmarks in the wrong nesting order. It's generally recommended to turn this option on unless the input document has multi-column text. In that case, the vertical order of the bookmarks does not reflect the logical order of the text on the page.
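- Both options in the "Bookmarking Options" dialog can be sketched in Python. The sketch assumes that each match carries a page number and the vertical position of its text in PDF coordinates, where a larger y value is higher on the page. `Match` and `order_and_dedupe` are hypothetical names, not part of the plug-in.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """Hypothetical matched heading: page number, vertical position, text."""
    page: int
    y: float  # PDF user space: a larger y is higher on the page
    title: str


def order_and_dedupe(matches: list[Match], sort_vertically: bool = True,
                     skip_duplicates: bool = True) -> list[Match]:
    if sort_vertically:
        # Keep page order; within each page, go from the top down.
        matches = sorted(matches, key=lambda m: (m.page, -m.y))
    result = []
    for m in matches:
        # Skip a bookmark whose title repeats the immediately preceding one.
        if skip_duplicates and result and result[-1].title == m.title:
            continue
        result.append(m)
    return result


# Text in the order it is stored in the file, not the order it is shown.
stored = [
    Match(1, 700, "1 Introduction"),
    Match(2, 400, "1.1 Scope"),
    Match(2, 720, "1 Introduction"),  # running header on page 2
]
print([m.title for m in order_and_dedupe(stored)])
# ['1 Introduction', '1.1 Scope']
```

- Without sorting, the stored order places "1.1 Scope" before the page 2 header. The duplicate title is then no longer consecutive, so it survives and ends up in the wrong position.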
- Click the "OK" button to return to the "Generate Bookmarks From Text Style" dialog.
- Set Bookmarking Level
- Select the bookmarking level in the tree and click "Set Level..." to modify the bookmarking level.
- The "Set Bookmarking Level" dialog appears. Select the desired level from the "Set Bookmarking Level:" pull-down menu. Click "OK".
- Set Matching Text Style
- Double-click the specific bookmark level in the list to edit its bookmarking settings. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- In the "Text Matching" tab of the "Bookmark Level Description" dialog, select any combination of the text attributes that you want to use for automatic bookmark generation.
- The most commonly used attributes are the font name(s) and size. Most documents contain section headings with a distinctive font style that differs from the surrounding text. You can either enter these parameters manually or take them from sample text in the document. It is possible to use more than one font name to describe a bookmark level.
- To set the matching text attributes manually, select a font name and specify the font size. Click "Add..." and select the desired font name from the list. The list contains only the most common font names. However, you can retrieve the names of all fonts used in the current document by pressing the "Update Fonts" button. The software scans all pages in the PDF document and enumerates the font names.
- Alternatively, select sample text in the document and click the "Set Font Style From Selected Text" button to set the font name and size to match the selected text. The sample text must be selected before opening the "Bookmark Level Description" dialog.
- Set Tolerance for Matching Text
- The software allows you to specify a tolerance for matching the text size. Tolerance is the maximum acceptable variation from the target value.
- Double-click on the specific bookmark level in the tree of the "Generate Bookmarks From Text Style" dialog. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- Select the "Text Matching" tab in the "Bookmark Level Description" dialog and adjust the "Tolerance" parameter for the text size. For example, if the font size parameter is set to 10pt and the tolerance is set to 1pt, the software will match all text with a font size between 9pt and 11pt.
- You may also check the "Allow partial match for font names" option to relax the font name matching requirement and allow similar font names to match. For example, if you specify the "Helvetica" font, then all fonts that have "Helvetica" anywhere in their names (such as "Helvetica-Bold" or "Helvetica-Italic") will also produce a match.
- Check the "Allow characters with different style and size inside text line" option to ignore differences in style or size in the middle and at the end of the line. The software will then check the text style only for the first character/word on the line. The rest of the line will match regardless of its style and size.
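- The size tolerance and the partial font name option amount to simple comparisons, shown here as a Python sketch. `style_matches` is a hypothetical helper. Treating the partial match as a case-sensitive substring test is an assumption. So is testing one run of text at a time; with the last option checked, that run would be the first character/word of the line.

```python
def style_matches(font: str, size: float, target_fonts: list[str],
                  target_size: float, tolerance: float,
                  partial_font_names: bool = False) -> bool:
    """Hypothetical style test for one run of text."""
    if partial_font_names:
        # "Helvetica" also accepts "Helvetica-Bold", "Helvetica-Italic", ...
        font_ok = any(target in font for target in target_fonts)
    else:
        font_ok = font in target_fonts  # exact font name required
    # Accept any size in [target_size - tolerance, target_size + tolerance].
    size_ok = abs(size - target_size) <= tolerance
    return font_ok and size_ok


print(style_matches("Helvetica-Bold", 11, ["Helvetica"], 10, 1))        # False
print(style_matches("Helvetica-Bold", 11, ["Helvetica"], 10, 1, True))  # True
print(style_matches("Helvetica", 11.5, ["Helvetica"], 10, 1))           # False
```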
- Tolerance example:
- Using Text Patterns
- The software can restrict bookmarking to text that matches a user-defined text pattern. Use this option to bookmark text that can be described by a pattern, for example email addresses, account numbers, dates, repeating headers/footers, etc.
- Double-click on the specific bookmark level in the tree of the "Generate Bookmarks From Text Style" dialog. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- Select the "Text Matching" tab in the "Bookmark Level Description" dialog.
- Check the "Match Text Pattern" option to specify a text matching pattern. Note that the pattern is used in addition to the other matching parameters (such as text style). There is also a separate bookmarking method that focuses on text patterns.
- A text pattern is a sequence of letters and symbols that defines which characters can appear in the matching text string. The AutoBookmark™ plug-in uses regular expressions to define text patterns. Only text that matches the specified pattern is used to create bookmarks. The software does not require an exact match; it only checks whether a text line contains the given pattern.
- For example, if you specify the word "Chapter" as the matching text pattern, multiple text lines might match it: "Chapter 1" and "Chapter 1 - Functionality Overview" both satisfy the matching criteria, so both text lines will be bookmarked. The entire matching text line is used as the bookmark title. This may produce very long titles or titles that contain unwanted text.
- Check the "Limit bookmark titles to matching pattern only" option to use only the portion of the text string that matches the specified pattern. For example, with the text pattern "Chapter \d" (\d matches any digit), both "Chapter 1" and "Chapter 2 - Functionality Overview" will be matched. If the "Limit bookmark titles to matching pattern only" option is checked, the bookmark titles will read "Chapter 1" and "Chapter 2" respectively.
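- The difference between searching for a pattern and limiting the title to the matched portion can be sketched with Python's `re` module. `pattern_title` is a hypothetical helper. Python's regular expression dialect stands in for the plug-in's, and the two may differ in details.

```python
import re
from typing import Optional


def pattern_title(line: str, pattern: str,
                  limit_to_match: bool) -> Optional[str]:
    """Return the bookmark title for a text line, or None if it does not match."""
    m = re.search(pattern, line)  # search: the pattern may occur anywhere
    if m is None:
        return None
    return m.group(0) if limit_to_match else line


lines = ["Chapter 1", "Chapter 2 - Functionality Overview", "See Chapter notes"]
print([pattern_title(s, r"Chapter \d", False) for s in lines])
# ['Chapter 1', 'Chapter 2 - Functionality Overview', None]
print([pattern_title(s, r"Chapter \d", True) for s in lines])
# ['Chapter 1', 'Chapter 2', None]
```

- `\d` matches exactly one digit. With the title limited to the match, "Chapter 12" would therefore become "Chapter 1". The pattern `Chapter \d+` matches one or more digits and avoids this.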
- Text pattern example:
- Set Page Area for Text Search
- Sometimes unwanted text gets bookmarked because it uses the same font style as legitimate heading text. Use a processing page area to limit the text search to a specific part of the page.
- Double-click on the specific bookmark level in the tree of the "Generate Bookmarks From Text Style" dialog. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- Select the "Text Location" tab in the "Bookmark Level Description" dialog.
- Check the "Match text located only in the following area:" option. Click "Set Page Area From a Sample Page...".
- Select a sample page number. Specify a text location on a sample page by drawing a rectangle. The text search will be limited to the selected area on the page. Click "OK" once done.
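- Conceptually, the page area acts as a filter on the position of each text match. The Python sketch below rests on three assumptions:
- Coordinates are PDF page coordinates in points, with the origin at the bottom-left corner.
- Text counts as inside the area only when its whole bounding box is inside. The plug-in may apply a different rule.
- `Rect` and the coordinates are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in PDF page coordinates (points)."""
    left: float
    bottom: float
    right: float
    top: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies completely inside this rectangle."""
        return (self.left <= other.left and other.right <= self.right
                and self.bottom <= other.bottom and other.top <= self.top)


# Hypothetical search area: the upper part of a US Letter page (612 x 792 pt).
search_area = Rect(36, 600, 576, 756)
heading_box = Rect(72, 700, 300, 718)  # a heading near the top
footer_box = Rect(72, 30, 300, 42)     # same font style, but in the footer
print(search_area.contains(heading_box), search_area.contains(footer_box))
# True False
```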
- Define a Visual Appearance of the Bookmarks
- The software allows you to customize the visual appearance of the resulting bookmarks by specifying the text color and style.
- Double-click on the specific bookmark level in the tree of the "Generate Bookmarks From Text Style" dialog. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- Select the "Appearance" tab in the "Bookmark Level Description" dialog.
- Set a desired text style (Plain, Bold, Italic, Bold & Italic).
- Zoom option defines how a bookmarked page is displayed in the viewer when a bookmark is clicked:
- Inherit Zoom - Displays the page designated by a bookmark using the current zoom factor. The page is positioned in the viewer so that the bookmarked text appears at the top of the view window. This only happens when the page layout mode of the viewer is set to "Continuous". Use the "View/Page Layout" menu to set the desired page layout mode.
- Fit Page - Displays a page designated by a bookmark, with its contents magnified just enough to fit the entire page within the window both horizontally and vertically. If the required horizontal and vertical magnification factors are different, uses the smaller of the two, centering the page within the window in the other dimension.
- Fit Width - Displays a page designated by a bookmark, with the vertical coordinate positioned at the top edge of the window and the contents of the page magnified just enough to fit the entire width of the page within the window.
- Fit Visible - Displays a page designated by a bookmark, with its contents magnified just enough to fit its bounding box entirely within the window both horizontally and vertically. If the required horizontal and vertical magnification factors are different, uses the smaller of the two, centering the bounding box within the window in the other dimension.
- Actual Size - Displays the page designated by a bookmark at 100% magnification.
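- The magnification for "Fit Page", "Fit Width" and "Fit Visible" follows directly from the descriptions above. These descriptions correspond to the /Fit, /FitH and /FitB destinations of the PDF specification. The Python sketch assumes that the window size is expressed in the same units as the page at 100% zoom. The function names are hypothetical.

```python
def fit_page(win_w: float, win_h: float, page_w: float, page_h: float) -> float:
    # Use the smaller factor so the whole page fits in both directions.
    return min(win_w / page_w, win_h / page_h)


def fit_width(win_w: float, page_w: float) -> float:
    # Only the page width has to fit; the page may extend below the window.
    return win_w / page_w


def fit_visible(win_w: float, win_h: float, box_w: float, box_h: float) -> float:
    # Like fit_page, but for the bounding box of the page contents.
    return min(win_w / box_w, win_h / box_h)


ACTUAL_SIZE = 1.0  # 100% magnification

# A 1000 x 800 window showing a US Letter page (612 x 792 pt)
# whose visible contents occupy a 468 x 648 box (1-inch margins).
print(f"{fit_page(1000, 800, 612, 792):.0%}")     # 101%
print(f"{fit_width(1000, 612):.0%}")              # 163%
print(f"{fit_visible(1000, 800, 468, 648):.0%}")  # 123%
```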
- Check the "Show expanded" option to display all bookmarks at this level expanded.
- Click "OK" once done.
- Examples of the different bookmark styles:
- Customize the Bookmark Titles
- The software allows you to customize the titles of the resulting bookmarks by changing the text case, adding leading numbers, inserting custom text and/or performing a search and replace operation.
- Double-click on the specific bookmark level in the tree of the "Generate Bookmarks From Text Style" dialog. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- Select the "Content" tab in the "Bookmark Level Description" dialog.
- Initially, the original text from the document is used for the bookmark titles. This text can be modified in a number of ways.
- The text case can be changed to produce uniformly formatted titles. Available options are:
- Do Not Change - the original text is not changed.
- UPPERCASE - all titles are converted to uppercase characters.
- Title Case - the first letter of each word is capitalized.
- Sentence Case - only the first letter of the title is capitalized; all other letters appear in lowercase.
- lowercase - all characters appear in lower case.
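- The case options can be expressed as simple string transformations. In this Python sketch, lowercasing the rest of each word for "Title Case" is an assumption, because the description above only says that first letters are capitalized. `change_case` is a hypothetical helper.

```python
def change_case(title: str, mode: str) -> str:
    """Apply one of the text case options to a bookmark title."""
    if mode == "UPPERCASE":
        return title.upper()
    if mode == "lowercase":
        return title.lower()
    if mode == "Title Case":
        # Capitalize the first letter of every space-separated word.
        return " ".join(w[:1].upper() + w[1:].lower() for w in title.split(" "))
    if mode == "Sentence Case":
        lower = title.lower()
        return lower[:1].upper() + lower[1:]
    return title  # "Do Not Change"


for mode in ["Do Not Change", "UPPERCASE", "Title Case", "Sentence Case", "lowercase"]:
    print(f"{mode:14} {change_case('chapter ONE: the BASICS', mode)}")
```

- Words are split on spaces by hand because Python's built-in `str.title()` also capitalizes letters after apostrophes (for example, "don't" becomes "Don'T").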
- Text case examples:
- Additional text can be added to all bookmark titles. Check the "Insert this before each title" option and enter the desired text in the editing box to the right; this text will be inserted before each title. Check the "Insert this text after each title" option and enter the desired text in the editing box to the right; this text will be appended to the end of each bookmark title.
- Leading numbers can optionally be added to or removed from bookmark titles. Numbers, letters or Roman numerals can be used as leading numbers. The format of the leading numbers is set separately for each bookmark level, which allows you to create arbitrary numbering schemes.
- Bookmark titles can be limited to a certain number of characters to avoid accidentally creating long, unreadable titles. Enter the maximum allowed title length (in characters) in the "Maximum title length" entry box. The default value is 128 characters.
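- The following Python sketch combines these options for one bookmark level:
- custom text before and after the title,
- a leading number in decimal, letter or Roman numeral form,
- the maximum title length.
- The order in which the options are applied and the "1. " number format are assumptions. All names are hypothetical.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to Roman numerals (e.g. 14 -> 'XIV')."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n: int) -> str:
    """Convert a positive integer to letters: 1 -> 'A', 26 -> 'Z', 27 -> 'AA'."""
    letters = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters = chr(ord("A") + r) + letters
    return letters


def format_title(title: str, index: int, numbering: str = "none",
                 before: str = "", after: str = "", max_length: int = 128) -> str:
    """Decorate one bookmark title; `index` is its 1-based position in its level."""
    formats = {"decimal": str, "letters": to_letters, "roman": to_roman}
    text = f"{before}{title}{after}"
    if numbering in formats:
        text = f"{formats[numbering](index)}. {text}"
    return text[:max_length]  # enforce the maximum title length


parts = ["Scope", "Methods", "Results"]
print([format_title(t, i, "roman", after=" (draft)") for i, t in enumerate(parts, 1)])
# ['I. Scope (draft)', 'II. Methods (draft)', 'III. Results (draft)']
```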
- Examples of the bookmarks customization:
- Format the Resulting Bookmark Titles with Text Patterns
- Bookmark titles can be modified and formatted using powerful text patterns called regular expressions. The AutoBookmark™ plug-in can search bookmark titles with regular expressions and either remove the matching text completely or replace it with other text. For example, it is possible to replace all phone numbers or email addresses with something else, or to perform advanced formatting such as rearranging words in the bookmark titles.
- Double-click on the specific bookmark level in the tree of the "Generate Bookmarks From Text Style" dialog. Alternatively, select the bookmark level and click "Edit...". The "Bookmark Level Description" dialog will appear.
- Select the "Content" tab in the "Bookmark Level Description" dialog.
- Check the "Search and replace bookmark titles with text patterns" option to search and replace bookmark text.
- When formatting with regular expressions, you need to know what you want to format and how. First, write a regular expression that finds the desired text substring in a bookmark title. Second, specify a replacement pattern that will replace this substring in the bookmark title. In its simplest form, you can just specify the text string you want to find and the string you want to replace it with.
- For example, if bookmark titles contain the words "the court of last resort" and you want to replace them with "COLR", enter "the court of last resort" as the search pattern and "COLR" as the replace pattern. However, the real potential of this operation appears when you use the full power of regular expressions, which allow you to match dynamic text and refer to matched substrings while performing the replacement. This way you can completely transform the input text.
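- Here are a few of the replacements described above, sketched with Python's `re.sub`:
- a plain text substitution,
- a dynamic pattern that masks phone numbers,
- a pattern that rearranges words using numbered groups.
- Python writes group references as `\1`, `\2`, and so on. The plug-in's replacement syntax may differ. The sample titles are invented for illustration.

```python
import re

# Plain text search and replace.
print(re.sub(r"the court of last resort", "COLR",
             "Appeal to the court of last resort"))
# Appeal to COLR

# Dynamic text: mask phone numbers such as "(555) 123-4567" or "555-123-4567".
phone = r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"
print(re.sub(phone, "[phone]", "Contact: (555) 123-4567"))
# Contact: [phone]

# Rearranging words: "Last, First - Topic" becomes "First Last: Topic".
names = r"^(\w+), (\w+) - (.*)$"
titles = ["Smith, John - Deposition", "Doe, Jane - Exhibit A"]
print([re.sub(names, r"\2 \1: \3", t) for t in titles])
# ['John Smith: Deposition', 'Jane Doe: Exhibit A']
```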
- Sample output after text "search and replace" operation:
- Save Configuration Settings
- Bookmarking settings can be saved to a settings file for later reuse. The settings file stores all processing parameters, including the stop words and bookmark level descriptions. This saves time when the same processing settings are used frequently.
- Click "Save..." in the "Generate Bookmarks From Text Style" dialog to save bookmarking settings into a file.
- The "Save As" dialog appears. Browse to the desired storage folder and enter an appropriate file name. Click "Save". The settings will be stored in a file with the *.ABM extension.
- Load Configuration Settings
- Configuration settings can be loaded from a previously saved *.ABM file. The configuration file stores all processing parameters, including the stop words and bookmark level descriptions.
- Open the "Generate Bookmarks From Text Style" dialog by selecting "Plug-Ins > Bookmarks > Generate From Text Styles..." in the main menu. Click "Load..." to use previously saved configuration settings.
- The "Open" dialog appears. Browse to the desired storage folder and select an AutoBookmark™ settings file with the *.ABM extension. Click "Open".
- The settings will be loaded and the user interface will be updated. All current settings will be lost.