Extract Files From a PDF Portfolio
Introduction
This tutorial demonstrates how to extract all files (including attachments) from a PDF Portfolio into a specific output location. All items (emails) and attachments are extracted into individual files and placed in the output folder. File attachments can optionally be converted to PDF and appended to their parent documents, or extracted and kept in their native file formats. The AutoPortfolio™ plug-in can also process only specific entries, for example those for a selected date or person (in the case of email Portfolios). The contents of MSG and ZIP files are processed automatically, and the portfolio metadata is exported into a number of spreadsheet-ready formats.
What is a PDF Portfolio?
A PDF Portfolio contains multiple files assembled into an integrated PDF unit. For example, it can include text documents, e-mail messages, spreadsheets, CAD drawings, and PowerPoint presentations. The original files retain their individual identities but are assembled into one PDF Portfolio file. In this tutorial, the sample Portfolio is an entire Microsoft Outlook inbox of emails, exported into a Portfolio file together with all corresponding attachments. See the tutorial on how to extract Outlook emails as a Portfolio file here.
It is important to understand that a PDF Portfolio is not an ordinary PDF document. It is an archive of files stored inside a single file with a PDF extension. PDF Portfolios are commonly used for storing emails exported from Microsoft Outlook.
Every file inside a PDF Portfolio may have associated metadata. In the case of emails, it can include "From", "To", "Subject", "Sent", "Description", "Attachments" and other fields. The list of fields depends on the type of the email messages and may vary. The metadata fields may be absent if the portfolio was not exported from Outlook but was instead created directly in Adobe Acrobat.
Prerequisites
You need a copy of Adobe® Acrobat® with the AutoPortfolio™ plug-in installed on your computer to follow this tutorial. Both are available as trial versions.
Step 1 - Opening the Tool
Start Adobe® Acrobat® and select “Plug-ins > AutoPortfolio Plug-in > Extract Files from Portfolio(s)...” from the main Adobe Acrobat menu to open the extraction dialog. Do not open a PDF Portfolio directly in Adobe Acrobat, or the program will automatically disable most tool menus including “Plug-ins”.
Step 2 - Select Input Files
Press the “Add Files…” button to select the input PDF portfolio for processing.
Select the required input PDF Portfolio file and click "Open".
Step 3 - Select Portfolio Components for Processing
Use the “Specify Sort Order” dialog to select the parts of the PDF Portfolio to process. Select entries manually using the check box in front of each record. Use the "Select All" button to return to the default selection, or use "Toggle Selection" to de-select all entries and then check only the entries you need. Alternatively, click "Select by Search..." to run a text search that selects or unselects all matching entries.
The "Select by Search..." button opens the "Select Records by Search" dialog. First, choose whether the text search selects or unselects records. Then type the text to search for in the entry box. Regular expression syntax is also supported; to use it, make sure that "Use regular expressions" is checked. Set the remaining options ("Match text case/whole words") as needed, and use the drop-down list to restrict the search to a specific field. By default, the search covers all fields.
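To illustrate what a regular-expression selection does, here is a minimal Python sketch that mimics the dialog's behavior on a few invented records. The record data, the select_by_search function, and its parameters are hypothetical and are not part of the plug-in; the regular expression dialect used by the plug-in may also differ from Python's.

```python
import re

# Hypothetical records; the field names mirror the email metadata fields
# mentioned earlier ("From", "Subject"), but the data is invented.
records = [
    {"From": "alice@example.com", "Subject": "Invoice 2041"},
    {"From": "bob@example.com", "Subject": "Lunch on Friday"},
    {"From": "carol@example.com", "Subject": "RE: invoice 2042"},
]


def select_by_search(records, pattern, field=None, match_case=False):
    """Return the records whose field(s) match the regular expression.

    field=None searches every field, mirroring the dialog's default.
    """
    flags = 0 if match_case else re.IGNORECASE
    regex = re.compile(pattern, flags)
    selected = []
    for record in records:
        values = [record.get(field, "")] if field else list(record.values())
        if any(regex.search(value) for value in values):
            selected.append(record)
    return selected


# Select every record whose subject mentions an invoice number.
matches = select_by_search(records, r"invoice \d+", field="Subject")
print([m["From"] for m in matches])
```

With case-insensitive matching, the pattern selects both the "Invoice 2041" and "RE: invoice 2042" records and skips the unrelated one.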
Step 4 - Confirm Selections
Optionally, use the “Select Records” menu to adjust the current selection. Its commands select specific subsets of Portfolio entries. Click "OK" in the “Specify Sort Order” dialog to confirm the selection.
Step 5 - Select an Output Location
The input Portfolio file is now added to the processing list. To process multiple Portfolios at the same time, repeat this procedure using the "Add Files..." button.
Select an output folder by pressing the “Browse…” button.
Step 6 - Select Output Options

By default, certain output options will be automatically selected:

  • Use “Extract and process attachments” to extract all attachment files for each portfolio entry.
  • Use “Convert non-PDF attachments into PDF format” to convert all supported non-PDF files into PDF format.
  • Use “Delete links in output documents” to completely remove links to file attachments in their native formats.
Also check “Append attachments to parent documents” to append all file attachments at the end of their parent PDF documents. This way each portfolio entry (“email” in the case of email-based portfolios) contains all of its attachments in a single PDF file.
Step 7 - Confirm Settings
When ready, press “OK” to start extracting files from the input PDF portfolio.
Step 8 - Process Files
The “Job Cover Page” is automatically created for each job and displayed on the screen during the procedure. Note that the extraction process may take a considerable amount of time depending on the size of the input portfolio, so it is generally a good idea to work with smaller portfolios where possible. During this time, the standard Acrobat progress dialog is displayed in the lower-right corner of the screen.
Step 9 - Check the Processing Report
Once processing is completed, a report message appears on the screen, prompting you to display the detailed HTML processing report that the plug-in has created. Click “OK” to open it in your default web browser.
The processing report lists every file and attachment that was processed and contains separate records for the attachments that were converted into PDF format and optionally appended to the parent document.
It’s a good idea to inspect the report and check whether any file attachments were not converted into PDF format. File attachments that failed to convert are highlighted by a red line.
Scroll down to the end of the report to see the total counts of non-PDF file attachments that were, and were not, successfully converted into PDF format.
Step 10 - Inspect the Contents of the Output Folder
Open the output folder to inspect its contents. It will not contain attachments in their native file formats, because the “Convert non-PDF attachments into PDF format” option was selected.
The output folder also contains a number of auxiliary files, such as extraction reports (in both CSV and HTML formats), the processing cover page, and a *.pdf.report.csv file that contains metadata for all extracted files and can be opened directly in Microsoft Excel.
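Because the metadata report is a CSV file, it can also be read by a script instead of a spreadsheet. The following is a minimal sketch using Python's standard csv module on an invented in-memory sample. The column names ("From", "To", "Subject", "Sent") are assumptions based on the email metadata fields named in this tutorial, and the helper functions are hypothetical; check the header row of your own report before relying on any column.

```python
import csv
import io

# Invented sample standing in for a *.pdf.report.csv file. The column names
# are assumptions; the actual report may use different or additional columns.
sample_report = io.StringIO(
    "From,To,Subject,Sent\n"
    "alice@example.com,bob@example.com,Invoice 2041,2023-01-05\n"
    "carol@example.com,bob@example.com,Status update,2023-01-06\n"
    "alice@example.com,dave@example.com,Invoice 2042,2023-01-09\n"
)


def load_report(handle):
    """Read a metadata report into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def count_by_field(rows, field):
    """Count how many rows share each value of the given column."""
    counts = {}
    for row in rows:
        value = row.get(field, "")
        counts[value] = counts.get(value, 0) + 1
    return counts


rows = load_report(sample_report)
per_sender = count_by_field(rows, "From")
print(per_sender)
```

To read a real report, pass an open file handle instead of the sample, for example one returned by open(path, newline=""), where path points at the report in your output folder. The correct text encoding for that file is not stated here and may need to be specified.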
There is also a file called LoadFile.txt (in CaseMap format) that lists all the extracted top-level files (in the case of email-based portfolios, only top level emails).
Here is an alternative view of the output folder when the “Convert non-PDF attachments into PDF format” and “Append attachments to parent documents” options were turned off. The folder now contains file attachments as well as PDF portfolio items in their native formats. All main level PDF Portfolio items (emails) are still present as PDF files, as they were already stored in the portfolio in PDF format.
If the input PDF Portfolio contains an MSG file (Microsoft Outlook message format) or a ZIP archive, a folder is created for each such file, containing the files extracted from it. If there are nested MSG or ZIP files, second-level subfolders are created automatically and their files are extracted into them.
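The folder-per-archive layout described above can be sketched for ZIP files with Python's standard zipfile module. This is only an illustration of the idea, not the plug-in's implementation: the extract_nested and build_sample functions and all file names are hypothetical, MSG files are not handled (they would need a third-party parser), and no protection against very large or deeply nested archives is included.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_nested(zip_path, dest_dir):
    """Extract a ZIP into dest_dir, then expand any ZIP files found inside
    into subfolders named after each nested archive."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Snapshot the list first so archives created by deeper calls
    # are handled only by those calls.
    for inner in list(dest_dir.rglob("*.zip")):
        extract_nested(inner, inner.with_suffix(""))


def build_sample(path):
    """Create a sample archive containing a text file and a nested inner.zip."""
    inner_bytes = io.BytesIO()
    with zipfile.ZipFile(inner_bytes, "w") as inner:
        inner.writestr("notes.txt", "nested file")
    with zipfile.ZipFile(path, "w") as outer:
        outer.writestr("readme.txt", "top-level file")
        outer.writestr("inner.zip", inner_bytes.getvalue())


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "outer.zip"
    build_sample(sample)
    out = Path(tmp) / "outer"
    extract_nested(sample, out)
    extracted = sorted(
        p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file()
    )
    print(extracted)
```

In this sketch the nested archive is left in place next to its subfolder; whether the plug-in keeps or removes the original MSG and ZIP files is not covered here.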
You can find more AutoPortfolio™ tutorials here.