Extract Email Addresses from a PDF Document into a Spreadsheet
Introduction
This tutorial explains how to use the AutoDocSearch™ plug-in to search one or more PDF documents for email addresses and export them to a spreadsheet.
This is useful for extracting email addresses from large PDF files, for example to create contact records. The plug-in makes the job much faster than manually searching for email addresses and copying them into another document.
The sample PDF used in this tutorial contains multiple single-page payslips, each of which includes the recipient's email address. The goal is to use a search expression to find the email address on each page, extract the addresses, and save them in a CSV spreadsheet.
Prerequisites
To follow this tutorial, you need Adobe® Acrobat® with the AutoDocSearch™ plug-in installed on your computer. Trial versions of both are available.
Step 1 - Open the Tool
In Acrobat, select "Plug-Ins > AutoDocSearch Plug-in > Search PDF Files..." from the main menu.
Press "Start Using the Software" to open the tool.
Optionally, use the other two buttons for further help with AutoDocSearch™.
Step 2 - Add Search Expression
Enter a search expression in the entry box. Here, we will use [\w._-]+@[\w_-]+[.]\w{2,9} to match text in a typical email address format (the hyphen is placed last in each character class so that it is treated as a literal hyphen rather than as part of a range). This basic regular expression is provided for the purpose of this tutorial only; more thorough regular expressions for matching email addresses are widely available online.
Ensure that the "Use regular expressions" option is checked.
Press "Next>>" to proceed.
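If you want to see what the search expression matches before running a full search, the sketch below tests it with Python's standard `re` module on some invented sample text. It is an illustration only: AutoDocSearch™ may use a different regular expression engine, and the addresses are made up. Python's `re` module, for instance, rejects the `[\w-_]` ordering with a "bad character range" error, so the sketch uses `[\w_-]`.

```python
import re

# Email pattern from Step 2, with each hyphen placed last in its character class.
EMAIL_PATTERN = re.compile(r"[\w._-]+@[\w_-]+[.]\w{2,9}")

# Invented sample text standing in for text found on payslip pages.
sample = (
    "Payslip for J. Smith, email: j.smith@example.com\n"
    "Contact payroll-team@acme-corp.org with any queries.\n"
    "Subdomain address: jane@mail.example.co.uk"
)

# findall returns every non-overlapping match, scanning left to right.
for match in EMAIL_PATTERN.findall(sample):
    print(match)
```

The last match shows a limitation of this basic pattern: for jane@mail.example.co.uk it captures only jane@mail.example, because the domain part allows a single dot. If your documents contain addresses with multi-part domains, check the results for truncated entries or use a more thorough expression.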
Step 3 - Select Input Files
Use the "Select Input Files" screen to select files for processing via the "Add Files/Folder..." buttons. In this example, we will add one input PDF containing all the payslips to be processed.
Locate and select the desired input file(s) and press "Open".
All selected files/folders and their file paths will be shown in the "Selected Files" box. Press "OK" to proceed.
Step 4 - View Search Results
Once the input document(s) have been searched, the "Search Results" box opens. Clicking any match in the list opens the corresponding file and page in Acrobat, with the matching text highlighted. Here, the plug-in has identified an email address on each page of the PDF.
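Conceptually, each entry in the results list pairs a match with the page it was found on. The sketch below illustrates that idea in Python; it is not how AutoDocSearch™ works internally. It assumes the text of each page has already been extracted (which would need a PDF library, not shown), and the function name `find_matches_by_page` and the sample page texts are hypothetical.

```python
import re

# Email pattern from Step 2, with each hyphen placed last in its character class.
EMAIL_PATTERN = re.compile(r"[\w._-]+@[\w_-]+[.]\w{2,9}")


def find_matches_by_page(pages):
    """Return (page_number, matched_text) pairs, numbering pages from 1.

    `pages` is a hypothetical list holding the already-extracted text of each page.
    """
    results = []
    for page_number, text in enumerate(pages, start=1):
        for match in EMAIL_PATTERN.finditer(text):
            results.append((page_number, match.group()))
    return results


# Hypothetical page texts standing in for three single-page payslips.
pages = [
    "Payslip 001  Employee: A. Jones  a.jones@example.com",
    "Payslip 002  Employee: B. Patel  b.patel@example.com",
    "Payslip 003  Employee: C. Wong  c_wong@example.org",
]

for page_number, address in find_matches_by_page(pages):
    print(f"page {page_number}: {address}")
```

Keeping the page number alongside each match mirrors what the results list lets you do in Acrobat: trace any extracted address back to the page it came from.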
Step 5 - Export Search Results
To export these results as a spreadsheet, use either "File > Save Search Results as Spreadsheet..." or the "Export Search Results..." option below.
In the "Select Output Format" dialog that opens, select "Save search results into a plain text file", and press "OK".
Use the "Save As" window to choose the folder where the results will be saved. Enter a name for the output file, select "All Files" from the "Save as type:" drop-down list, and manually add a .csv extension to the filename.
Press "Save" to continue.
Step 6 - Inspect the Spreadsheet
The spreadsheet is saved in the chosen folder and opened automatically in Excel. Check that the email addresses have been extracted successfully.
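For a large export, a quick script can help with this check. The sketch below reads a CSV file with Python's standard `csv` module, scans every cell for the Step 2 pattern, and returns each address once in first-seen order. Because this tutorial does not describe the exact column layout of the exported file, the sketch makes no assumptions about columns; the function name `unique_addresses` and the sample rows (including the "File,Page,Match" header) are hypothetical.

```python
import csv
import io
import re

# Email pattern from Step 2, with each hyphen placed last in its character class.
EMAIL_PATTERN = re.compile(r"[\w._-]+@[\w_-]+[.]\w{2,9}")


def unique_addresses(csv_file):
    """Return the email addresses found in any cell, without repeats."""
    found = []
    for row in csv.reader(csv_file):
        for cell in row:
            found.extend(EMAIL_PATTERN.findall(cell))
    # dict.fromkeys drops exact repeats while keeping first-seen order.
    return list(dict.fromkeys(found))


# Hypothetical stand-in for the exported file; the real layout may differ.
sample_export = io.StringIO(
    "File,Page,Match\n"
    "payslips.pdf,1,a.jones@example.com\n"
    "payslips.pdf,2,b.patel@example.com\n"
    "payslips.pdf,3,a.jones@example.com\n"
)

addresses = unique_addresses(sample_export)
print(f"{len(addresses)} unique addresses: {addresses}")
```

To check a real export, open it with `open("your-file.csv", newline="", encoding="utf-8")` (adjust the name and encoding to match the saved file) and pass the file object to `unique_addresses`.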