Convert a PDF Portfolio into TIFF and Text Format for Concordance/Summation
Introduction
This tutorial shows you how to use the AutoPortfolio™ plug-in to convert a PDF Portfolio into TIFF and Text format, suitable for import into litigation support systems such as Concordance and Summation. The procedure’s output is a collection of files: one TIFF image file and one text file for each page of every document in the portfolio. The plug-in can also process only specific entries, for example those for a selected date or person (as in the case of email Portfolios).
What is a PDF Portfolio?
A PDF Portfolio contains multiple files assembled into an integrated PDF unit. For example, it can include text documents, e-mail messages, spreadsheets, CAD drawings, and PowerPoint presentations. The original files retain their individual identities but are assembled into one PDF Portfolio file. The sample Portfolio used in this tutorial is an entire Microsoft Outlook inbox, extracted into a Portfolio file together with all corresponding attachments. See the tutorial on how to extract Outlook emails as a Portfolio file here.
It is important to understand that a PDF Portfolio is not a PDF document. It is an archive of files stored inside a single document, with a PDF extension. PDF portfolios are commonly used for storing emails exported from Microsoft Outlook.
Every file inside a PDF Portfolio may have associated metadata. In the case of emails, it can include "From", "To", "Subject", "Sent", "Description", "Attachments" and other fields. The list of fields depends on the type of the email messages and may vary. The metadata fields may be absent if the portfolio was created directly in Adobe Acrobat rather than exported from Outlook.
Prerequisites
You need a copy of Adobe® Acrobat® along with the AutoPortfolio™ plug-in installed on your computer in order to use this tutorial. Both are available as trial versions.
Step 1 - Opening the Tool
Start Adobe® Acrobat® and select “Plug-ins > AutoPortfolio Plug-in > Convert PDF Files for Concordance and Summation (TIFF and Text)...” from the main Adobe Acrobat menu to open the conversion dialog. Do not open a PDF Portfolio directly in Acrobat, or the program will automatically disable most tool menus including “Plug-ins”.
Step 2 - Select Input Files
In the "Convert Files for Concordance and Summation" dialog, press the “Add Files…” button to select the input PDF portfolio for processing.
Select the required input PDF Portfolio file and click "Open". The sample Portfolio used here contains multiple emails with various file attachments of different formats.
Step 3 - Select Portfolio Components for Processing
Use the “Specify Sort Order” dialog to select the parts of the PDF Portfolio to process. Select or clear individual records using the check box in front of each one. Use the "Select All" button to return to the default selection, or use "Toggle Selection" to de-select all entries and then check only the entries you need. Alternatively, click "Select by Search..." to run a text search that selects or unselects all matching entries.
The "Select by Search..." button opens the "Select Records by Search" dialog. First, choose whether the search will select or unselect records. Then type the text to search for in the entry box. To use regular expression syntax, make sure that "Use regular expressions" is checked. Set the remaining options ("Match text case/whole words") as needed, and use the drop-down list to restrict the search to a specific field. By default, the search covers all fields.
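The select/unselect behavior described above can be illustrated with a small Python sketch. This is not the plug-in's implementation: the record layout (a dictionary of metadata fields per entry), the function names `build_matcher` and `select_by_search`, and the sample data are all hypothetical, and the option names only mirror the dialog's check boxes.

```python
import re

def build_matcher(text, use_regex=False, match_case=False, whole_words=False):
    """Compile the search text into a regular expression (hypothetical helper)."""
    # Plain text is escaped so characters like "." are matched literally.
    pattern = text if use_regex else re.escape(text)
    if whole_words:
        pattern = r"\b(?:" + pattern + r")\b"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(pattern, flags)

def select_by_search(records, selected, text, mode="select", field=None, **options):
    """Return a new set of selected record indexes after applying the search."""
    matcher = build_matcher(text, **options)
    result = set(selected)
    for index, record in enumerate(records):
        # Search one named field, or every field when no field is given.
        values = [record.get(field, "")] if field else record.values()
        if any(matcher.search(str(value)) for value in values):
            if mode == "select":
                result.add(index)
            else:
                result.discard(index)
    return result

# Made-up email metadata, for illustration only.
records = [
    {"From": "alice@example.com", "Subject": "Contract draft", "Sent": "2015-03-02"},
    {"From": "bob@example.com", "Subject": "Lunch", "Sent": "2015-03-02"},
    {"From": "carol@example.com", "Subject": "Re: contract review", "Sent": "2015-03-05"},
]

# Start with nothing selected and select every record that mentions "contract".
picked = select_by_search(records, set(), "contract")
print(sorted(picked))    # [0, 2]

# Then unselect records sent on 5 to 9 March, using a regex on the "Sent" field.
narrowed = select_by_search(records, picked, r"2015-03-0[5-9]",
                            mode="unselect", field="Sent", use_regex=True)
print(sorted(narrowed))  # [0]
```

Running a "select" search followed by an "unselect" search, as in the last two calls, is one way to narrow a large Portfolio down to the entries for a given person or date range.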
Step 4 - Confirm Selections
Optionally, use the “Select Records” menu to adjust the current selection. Its commands let you select a specific subset of Portfolio entries. Click "OK" on the “Specify Sort Order” dialog to confirm the selection.
Step 5 - Select an Output Location
The input Portfolio file is now added to the processing list. To process multiple Portfolios at the same time, repeat this procedure using the "Add Files..." button.
Select an output folder by pressing the “Browse…” button.
Step 6 - Select Further Output Options
Select the necessary output options using the checkboxes. "Extract and process attachments" is selected by default. If the Portfolio files being processed contain email messages and non-PDF attachments, also select "Convert non-PDF attachments into PDF format (if possible)". Files that fail to convert can optionally be copied to a specific folder via the "Conversion Options..." button.
Click "File Numbering Options..." to choose desired file numbering settings.
Step 7 - Select File Numbering Options
This dialog allows you to configure the plug-in to number output files to match file naming in the Summation database. Refer to Summation documentation for details on file naming and DII import, or press the “Help…” button to read more about these options.
Click "OK" to close the dialog.
Step 8 - Confirm the Conversion
When ready, press “OK” to start the conversion process.
Step 9 - Process the Files
The "AutoPortfolio Job Cover Page" is automatically created for each job and displayed on the screen during the procedure.
Note that the conversion process may take a considerable amount of time depending on the size of the input portfolio, so it is generally a good idea to process smaller portfolios. During this time, the standard Acrobat progress dialog is displayed in the lower-right corner of the screen.
The procedure consists of extracting every PDF document from the portfolio as well as any existing file attachments. Non-PDF attachments are also converted into PDF (if this option is selected), then all resulting PDF files are converted into TIFF/TEXT format. For every page in a PDF file there are two output files: one TIFF image and one plain text file.
Once processing is completed, a message appears on the screen offering to display the detailed HTML processing report that the plug-in has created. Click “OK” if you want to open it in your default web browser.
Step 10 - Inspect the Output Files
Open the specified output folder to view the files that have been created. There will be two sub-folders in the output folder: “PDF” and “TIFF and TEXT”:
The "PDF" folder will contain a number of PDF files named File_X_*.pdf. The total number of PDF files in this folder is equal to the total number of pages in all the documents of the input PDF Portfolio, and each PDF file contains only one page. Here is an example of the contents of the “PDF” folder:
The “TIFF and TEXT” folder will contain a number of *.TXT files named File_X_*.txt. There is exactly one text file for each page in the PDF Portfolio. It will also contain a number of image *.TIF files named File_X_*.tif. Here is an example of the contents of the “TIFF and TEXT” folder:
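For large jobs it can be useful to check the "one TIFF and one text file per page" rule with a script instead of by eye. The following Python sketch is not part of the plug-in. It assumes that the TIFF and text files for the same page share the same base name and differ only in extension, which the File_X_* naming suggests but which this tutorial does not state explicitly. The function name `check_page_pairs` is hypothetical.

```python
from pathlib import Path

def check_page_pairs(folder):
    """Compare TIFF and text file names in one folder.

    Returns (paired, tiff_only, text_only) as sorted lists of base names.
    Assumption: both files for a page share a base name, e.g. NAME.tif and NAME.txt.
    """
    folder = Path(folder)
    files = [p for p in folder.iterdir() if p.is_file()]
    tiff = {p.stem for p in files if p.suffix.lower() in (".tif", ".tiff")}
    text = {p.stem for p in files if p.suffix.lower() == ".txt"}
    return sorted(tiff & text), sorted(tiff - text), sorted(text - tiff)

# Example usage (replace the path with your own output folder):
# paired, tiff_only, text_only = check_page_pairs(r"C:\Output\TIFF and TEXT")
# print(len(paired), "pages have both files")
# print("TIFF without text:", tiff_only)
# print("Text without TIFF:", text_only)
```

If both "only" lists are empty, the number of paired names should equal the number of single-page files in the "PDF" folder, since each of those is converted into one TIFF and one text file.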
You can find more AutoPortfolio™ tutorials here.