Deleting PDF Pages By Text Search
Introduction
This tutorial shows how to delete pages from a PDF document by text search using the AutoSplit™ plug-in for Adobe® Acrobat®. The software searches a PDF document for pages that match a user-specified search list and deletes them from the document. The search list can contain both plain text strings and text patterns (written in regular expression syntax).
Input Document Description
The sample input PDF document used in this tutorial contains multiple invoices. Each page of an invoice contains a client name, a client ID, or both. In example 1, the goal is to delete only the pages that contain invoices for John Doe. Not all invoices contain the name, so we are going to search for both text strings, "John Doe" and "CLIENT ID: 00340957" (John Doe's client ID), to make sure that all pages with John Doe's invoices are found and deleted.
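The reason for searching both strings can be illustrated with a minimal Python sketch. This is not how AutoSplit is implemented; the `find_matching_pages` function and the sample page texts are hypothetical, and each page is modeled simply as a string of its text. A page is reported when it contains at least one entry from the search list.

```python
def find_matching_pages(pages, search_list):
    """Return 1-based numbers of pages that contain at least one search string."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if any(term in text for term in search_list)
    ]


# Hypothetical page texts standing in for the text of each PDF page.
pages = [
    "INVOICE  John Doe  CLIENT ID: 00340957",   # name and ID
    "INVOICE  CLIENT ID: 00340957",             # ID only
    "INVOICE  Jane Roe  CLIENT ID: 00112233",   # a different client
    "INVOICE  John Doe",                        # name only
]

name_only = find_matching_pages(pages, ["John Doe"])
name_or_id = find_matching_pages(pages, ["John Doe", "CLIENT ID: 00340957"])

print(name_only)   # [1, 4]
print(name_or_id)  # [1, 2, 4]
```

Searching for the name alone would miss page 2 in this sample, which carries only the client ID.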
Each invoice contains a "Page N of M" text pattern (Page 1 of 3, Page 2 of 3, etc.). In example 2, the goal is to delete the 3rd page of every invoice. To do that, we are going to use the regular expression "Page 3 of \d+".
Prerequisites
You need a copy of Adobe® Acrobat® with the AutoSplit™ plug-in installed on your computer in order to use this tutorial. You can download trial versions of both Adobe® Acrobat® and the AutoSplit™ plug-in. This function is available in both AutoSplit Standard and AutoSplit Professional.
Step 1 - Open A PDF File
Start the Adobe® Acrobat® application and open the PDF document that needs to be processed using the "File > Open…" menu.
Step 2 - Open The "Find And Delete Pages With Matching Text" Dialog
Select "Plug-Ins > Split Documents > Delete Pages By Text Search" from the menu.
Step 3 - Specify Text Searching Options
Type one or more search strings into the text search box, one item per line. Check the options you need. Click "OK" to start the search.
Check the "Match text case" option to match upper and lower case exactly as the text is entered in the search box. Without this option, the search ignores differences in letter case.
Check the "Match whole words" option to match text that represents a complete word. Use this option to avoid partial matching.
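The effect of these two options can be approximated in Python. The sketch below only illustrates the general idea and makes no claim about the plug-in's internals; the `build_pattern` helper and its parameter names are hypothetical.

```python
import re


def build_pattern(term, match_case=False, whole_words=False):
    """Compile a plain-text search term into a regular expression.

    match_case  -- when False, letter case is ignored
    whole_words -- when True, the term must not be part of a longer word
    """
    body = re.escape(term)  # treat the term as literal text, not as a pattern
    if whole_words:
        # Reject matches that are preceded or followed by a word character.
        body = r"(?<!\w)" + body + r"(?!\w)"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(body, flags)


# Case-insensitive by default, exact case when match_case is set.
print(bool(build_pattern("john doe").search("Invoice for John Doe")))
print(bool(build_pattern("john doe", match_case=True).search("Invoice for John Doe")))

# Whole-word matching avoids partial matches such as "Doe" inside "Doering".
print(bool(build_pattern("Doe", whole_words=True).search("John Doering")))
print(bool(build_pattern("Doe", whole_words=True).search("John Doe, Jr.")))
```

In this sketch the four calls print True, False, False and True, in that order.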
In example 1, the "John Doe" and "CLIENT ID: 00340957" text strings are entered on separate lines and the "Match text case" option is checked. The software will search for exact matches of these strings.
Check the "Use regular expressions" option to search for patterns instead of exact text. Use regular expression syntax to search for social security numbers, phone numbers, account numbers, etc. For example, to find all pages with social security numbers (an SSN follows the pattern 123-45-6789), enter the following regular expression: \d{3}[-]\d{2}[-]\d{4}.
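The SSN expression can be tried outside Acrobat before it is used in a search. The sketch below uses Python's `re` module, whose syntax may differ in small details from the regular expression engine used by the plug-in; the sample text is made up.

```python
import re

# Three digits, a hyphen, two digits, a hyphen, four digits.
SSN_PATTERN = re.compile(r"\d{3}[-]\d{2}[-]\d{4}")

# Hypothetical page text: one SSN and one phone number.
text = "Applicant SSN: 123-45-6789, phone: 555-0100"

found = SSN_PATTERN.findall(text)
print(found)  # ['123-45-6789']
```

The phone number is not reported because it lacks the second hyphenated group. Note that the expression is not anchored, so it also matches an SSN-shaped fragment inside a longer run of digits.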
In example 2, the "Page 3 of \d+" text pattern is entered and the "Use regular expressions" option is checked. The software will search for pages whose text contains the "Page 3 of M" pattern, where M is any number.
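A short Python sketch shows which pages the example 2 pattern would select. The page texts and invoice numbers are hypothetical, and Python's `re` module stands in for the plug-in's own regular expression engine.

```python
import re

# "Page 3 of " followed by one or more digits.
PAGE_3 = re.compile(r"Page 3 of \d+")

# Hypothetical document: a 3-page invoice followed by a 4-page invoice.
pages = [
    "Invoice 1001  Page 1 of 3",
    "Invoice 1001  Page 2 of 3",
    "Invoice 1001  Page 3 of 3",
    "Invoice 1002  Page 1 of 4",
    "Invoice 1002  Page 2 of 4",
    "Invoice 1002  Page 3 of 4",
    "Invoice 1002  Page 4 of 4",
]

# Collect 1-based page numbers whose text matches the pattern.
matching = [n for n, text in enumerate(pages, start=1) if PAGE_3.search(text)]
print(matching)  # [3, 6]
```

Because `\d+` accepts any page count, the 3rd page is found regardless of how many pages each invoice has.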
Step 4 - Select Pages For Deletion
The list of matching pages is displayed once the search is completed. Use the checkboxes to include pages in or exclude them from the deletion list. Click an item in the list to display the corresponding page in the Adobe® Acrobat® document window.
Click "Delete Pages" to delete all checked pages from the PDF document.
A dialog appears showing the number of deleted pages. Click "OK" to close it.
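The review-then-delete workflow of this step can be modeled in a few lines of Python. This is only an illustration under simplifying assumptions: the document is a plain list of page labels, and the `delete_pages` function is hypothetical rather than part of the plug-in.

```python
def delete_pages(pages, checked):
    """Return a new page list without the checked 1-based page numbers."""
    checked = set(checked)
    return [page for number, page in enumerate(pages, start=1) if number not in checked]


pages = ["page A", "page B", "page C", "page D"]

# Suppose the search matched pages 1, 2 and 4, and page 2 was unchecked
# during review, so only pages 1 and 4 remain checked.
checked = [1, 4]

remaining = delete_pages(pages, checked)
deleted_count = len(pages) - len(remaining)

print(remaining)      # ['page B', 'page C']
print(deleted_count)  # 2
```

Only checked pages are removed; a matching page that was unchecked stays in the document.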
Step 5 - Examine The Results
In example 1, the software searched for the "John Doe" and "CLIENT ID: 00340957" text strings and deleted only the checked pages with matching text from the PDF document.
In example 2, the software searched for the "Page 3 of M" text pattern, which appears on the 3rd page of every invoice, and deleted only the checked pages with matching text from the PDF document.
You can find a list of other step-by-step tutorials here: http://www.evermap.com/AutoSplit.asp#tutorials.