Using Scripting to Extract Document Bookmarks and Attachments
Introduction
Acrobat JavaScript provides access to many properties and elements of PDF documents that can be used for data extraction purposes. A document's metadata, bookmarks, file attachments, page information and text can all be accessed via custom scripting. In this tutorial, we will demonstrate how to use custom scripting to extract a document’s bookmarks and file attachments, then assign them as field values in an output *.csv spreadsheet.
What is JavaScript?
Acrobat JavaScript is the scripting language built into Adobe Acrobat. Custom scripts can be used for data formatting, for assigning field values based on a document's metadata properties, or for custom processing logic. Each data field can optionally have a user-supplied script that runs after the data value is extracted from the document. Refer to the Adobe Acrobat documentation for details on the Acrobat JavaScript language.
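As an illustration of the "data formatting" use case, here is a minimal sketch of a field script that tidies up whitespace. It assumes that event.value already holds the extracted value when the script starts. The helper name normalizeWhitespace and the sample value are hypothetical, and the event object is mocked so the snippet can be tried outside Acrobat:

```javascript
// Hypothetical helper: collapse runs of whitespace into single spaces
// and strip leading/trailing whitespace from the value.
function normalizeWhitespace(value) {
    return String(value).replace(/\s+/g, " ").replace(/^\s+|\s+$/g, "");
}

// Stand-in for the event object that Acrobat supplies to a field script.
// Assumption: event.value contains the extracted text at this point.
var event = { value: "  Invoice   No.\n 12345 " };

event.value = normalizeWhitespace(event.value);
console.log(event.value); // Invoice No. 12345
```

Inside an actual field script, only the function and the assignment to event.value would be needed.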
Prerequisites
You need a copy of Adobe® Acrobat® along with the AutoExtract™ plug-in installed on your computer in order to use this tutorial. Both are available as trial versions.
Step 1 - Open AutoExtract
Select "Plug-Ins > Extract Data > Extract Data Records From Document Text…" to open the "AutoExtract Plug-in" dialog.
Step 2 - Add a 'Bookmarks' Data Field
Press the "Add Field..." button to add a field to the settings configuration.
Enter a name for the data field next to "Field name:". This will become the field header in the output spreadsheet(s).
Next, check "Set or change field value by running JavaScript code" and press "Edit Script...".
Type the desired JavaScript code - the code used here will get a list of all bookmark titles in the PDF file (as a single text string) and assign it to the field value. Titles will be separated by a semicolon:
function ExtractBookmark(Bm, isRoot)
{
    // The root bookmark is only a container, so its own name is skipped
    var Title = "";
    if (isRoot == false)
    {
        Title += Bm.name;
    }

    // Recurse into child bookmarks, separating titles with "; "
    if (Bm.children != null)
    {
        for (var i = 0; i < Bm.children.length; i++)
        {
            if (!(Title === "")) Title += "; ";
            Title += ExtractBookmark(Bm.children[i], false);
        }
    }
    return Title;
}

var root = this.bookmarkRoot;
event.value = ExtractBookmark(root, true);
Press "OK" to proceed.
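To see what the bookmark script produces, the same traversal can be run outside Acrobat against a mock bookmark tree. This is only a sketch: mockRoot and its titles are made-up stand-ins for this.bookmarkRoot, and the function is renamed extractBookmark so it does not clash with the field script:

```javascript
// Same depth-first traversal as the field script, runnable outside Acrobat.
function extractBookmark(bm, isRoot) {
    var title = isRoot ? "" : bm.name;
    if (bm.children != null) {
        for (var i = 0; i < bm.children.length; i++) {
            if (title !== "") title += "; ";
            title += extractBookmark(bm.children[i], false);
        }
    }
    return title;
}

// Hypothetical bookmark tree; leaves use children: null, as Acrobat does.
var mockRoot = {
    name: "Root",
    children: [
        { name: "Chapter 1", children: [
            { name: "Section 1.1", children: null },
            { name: "Section 1.2", children: null }
        ] },
        { name: "Chapter 2", children: null }
    ]
};

var bookmarkList = extractBookmark(mockRoot, true);
console.log(bookmarkList);
// Chapter 1; Section 1.1; Section 1.2; Chapter 2
```

Nested bookmarks are flattened in document order, so the hierarchy itself is not preserved in the output string.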
Step 3 - Add an 'Attachments' Data Field
Press "Add Field..." again to add another field definition.
Name the field and proceed to add a script.
Type JavaScript code - the code used here will get a list of all file attachments and assign it to the field value. File names will be separated by a semicolon:
var d = this.dataObjects;
var names = "";
// Guard: dataObjects may be null when the document has no attachments
if (d != null)
{
    for (var i = 0; i < d.length; i++)
    {
        if (i > 0) names += ";";
        names += d[i].name;
    }
}
event.value = names;
Press "OK" to proceed.
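The attachment logic can also be tried outside Acrobat with mock data. The sketch below is an illustration under stated assumptions: listAttachmentNames is a hypothetical helper name, mockDataObjects stands in for this.dataObjects, and the file names are invented:

```javascript
// Hypothetical helper: join attachment names with ";".
// Returns an empty string when there is no attachment list at all.
function listAttachmentNames(dataObjects) {
    if (dataObjects == null) return "";
    var names = [];
    for (var i = 0; i < dataObjects.length; i++) {
        names.push(dataObjects[i].name);
    }
    return names.join(";");
}

// Made-up stand-in for this.dataObjects
var mockDataObjects = [
    { name: "invoice.xml" },
    { name: "notes.txt" }
];

var attachmentList = listAttachmentNames(mockDataObjects);
console.log(attachmentList);            // invoice.xml;notes.txt
console.log(listAttachmentNames(null)); // prints an empty line
```

Passing null models a document without attachments, in which case the field is simply left empty.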
Step 4 - Confirm Extraction Settings
Enter an output filename template - the output spreadsheet in this example will be titled "Bookmarks_attachments.csv". Optional: check "Create single data file..." to store extracted data from all input PDFs in one spreadsheet file.
Press "OK" to proceed.
Step 5 - Extract Bookmarks/Attachments
Proceed through the next dialogs and select the desired input PDF documents. Run the procedure, then open the output spreadsheet to inspect the extracted data. Each row is the record for one input PDF, with the extracted bookmarks and attachments listed under the corresponding data field headers.
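If the extracted values are processed further, each semicolon-separated cell can be split back into individual items. A minimal sketch, assuming the cell text has already been read from the spreadsheet; splitList is a hypothetical helper and the sample cell value is invented:

```javascript
// Hypothetical helper: split a semicolon-separated cell into trimmed items.
function splitList(cell) {
    if (cell == null || cell === "") return [];
    return cell.split(";").map(function (item) {
        return item.replace(/^\s+|\s+$/g, "");
    });
}

var titles = splitList("Chapter 1; Section 1.1; Chapter 2");
console.log(titles.length); // 3
console.log(titles[1]);     // Section 1.1
```

Note that a bookmark title or attachment name that itself contains a semicolon would be split in two by this approach, so a different separator may be preferable for such documents.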