Using Scripting to Extract Document Bookmarks and Attachments

Introduction: Acrobat JavaScript provides access to many properties and elements of PDF documents that can be used for data extraction purposes. A document's metadata, bookmarks, file attachments, page information and text can all be accessed via custom scripting. In this tutorial, we will demonstrate how to use custom scripting to extract a document’s bookmarks and file attachments, then assign them as field values in an output *.csv spreadsheet.
What is JavaScript?: JavaScript is the scripting language supported by Adobe Acrobat's built-in scripting engine. Custom scripts can be used for data formatting, for assigning field values based on a document's metadata properties, or for custom processing logic. Each data field can optionally have a user-supplied script that is executed after the data value is extracted from the document. Please refer to the Adobe Acrobat documentation for details on Acrobat's JavaScript programming interface.
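As an illustration of a data-formatting script, the following minimal sketch trims whitespace from a value and converts it to upper case. It assumes that event.value already holds the extracted text when the field script starts, which this tutorial does not state explicitly. The event object is mocked so the snippet can run outside Acrobat, and normalizeValue and the sample value are hypothetical.

```javascript
// Stand-in for the event object that Acrobat supplies to a field script.
// Assumption: on entry, event.value holds the text extracted for the field.
var event = { value: "  inv-00123 \n" };

// Hypothetical helper: strip leading/trailing whitespace, then upper-case.
// A regular expression is used instead of trim() to stay compatible with
// older JavaScript engines.
function normalizeValue(text) {
    return String(text).replace(/^\s+|\s+$/g, "").toUpperCase();
}

// Assigning to event.value replaces the field value.
event.value = normalizeValue(event.value);

console.log(event.value); // prints "INV-00123"
```

Inside a real field script only the helper and the assignment would be needed; the mock object and console.log exist only to make the sketch runnable on its own.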
Prerequisites: You need a copy of Adobe® Acrobat® along with the AutoExtract™ plug-in installed on your computer in order to use this tutorial. Both are available as trial versions.

Step 1 - Open AutoExtract: Select "Plug-Ins > Extract Data > Extract Data Records From Document Text…" to open the "AutoExtract Plug-in" dialog.
Step 2 - Add a 'Bookmarks' Data Field: Press the "Add Field..." button to add a field to the settings configuration. Enter a name for the data field next to "Field name:". This will become the field header in the output spreadsheet(s). Next, check "Set or change field value by running JavaScript code" and press "Edit Script...". Type the desired JavaScript code. The code used here collects the titles of all bookmarks in the PDF file into a single text string and assigns it to the field value. Titles are separated by a semicolon followed by a space:

// Recursively collect bookmark titles into one string.
// The root node is only a container, so its own name is skipped.
function ExtractBookmark(Bm, isRoot)
{
    var Title = "";
    if (isRoot == false)
    {
        Title += Bm.name;
    }

    // get a list of child bookmarks (children is null when there are none)
    if (Bm.children != null)
    {
        for (var i = 0; i < Bm.children.length; i++)
        {
            // add a separator only when something precedes this title
            if (!(Title === "")) Title += "; ";
            Title += ExtractBookmark(Bm.children[i], false);
        }
    }
    return Title;
}

// bookmarkRoot is the top of the document's bookmark tree
var root = this.bookmarkRoot;
event.value = ExtractBookmark(root, true);

Press "OK" to proceed.
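To see what kind of string the bookmark traversal produces, it can help to run it outside Acrobat on a small hand-made tree. The sketch below is a variant of the approach, not the script above: it collects titles into an array and joins them at the end, so the separator logic does not depend on whether a title is empty. The names collectTitles and mockRoot and the sample titles are hypothetical; the mock objects imitate only the name and children properties that the tutorial's script reads.

```javascript
// Variant sketch: depth-first walk that pushes every title into an array.
// isRoot is true only for the top-level container, whose name is skipped.
function collectTitles(bm, isRoot, out) {
    if (!isRoot) {
        out.push(bm.name);
    }
    var kids = bm.children; // null when a bookmark has no children
    if (kids != null) {
        for (var i = 0; i < kids.length; i++) {
            collectTitles(kids[i], false, out);
        }
    }
    return out;
}

// Hand-made stand-in for this.bookmarkRoot (sample data, not from a real PDF).
var mockRoot = {
    name: "Root",
    children: [
        {
            name: "Chapter 1",
            children: [
                { name: "Section 1.1", children: null },
                { name: "Section 1.2", children: null }
            ]
        },
        { name: "Chapter 2", children: null }
    ]
};

var joined = collectTitles(mockRoot, true, []).join("; ");
console.log(joined); // prints "Chapter 1; Section 1.1; Section 1.2; Chapter 2"
```

In an Acrobat field script, the same function could presumably be called with this.bookmarkRoot in place of mockRoot and the joined string assigned to event.value.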
Step 3 - Add an 'Attachments' Data Field: Press "Add Field..." again to add another field definition. Name the field, then check the script option and press "Edit Script..." as in Step 2. Type the JavaScript code. The code used here collects the names of all file attachments and assigns the list to the field value. File names are separated by a semicolon:

// dataObjects lists the document's embedded file attachments
var d = this.dataObjects;
var names = "";
// guard for documents without attachments, where dataObjects may be null
if (d != null)
{
    for (var i = 0; i < d.length; i++)
    {
        if (i > 0) names += ";";
        names += d[i].name;
    }
}
event.value = names;

Press "OK" to proceed.
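The attachment loop can be extended to report more than the file name. The following minimal sketch adds each attachment's size property (in bytes) and can be run outside Acrobat against mock data. The function name describeAttachments, the mock array and its values are hypothetical, and the null check is a defensive assumption for documents without attachments.

```javascript
// Build a "name (N bytes)" list from an array of attachment-like objects.
function describeAttachments(dataObjects) {
    // Assumption: the list may be null when a document has no attachments.
    if (dataObjects == null) {
        return "";
    }
    var parts = [];
    for (var i = 0; i < dataObjects.length; i++) {
        parts.push(dataObjects[i].name + " (" + dataObjects[i].size + " bytes)");
    }
    return parts.join("; ");
}

// Hand-made stand-in for this.dataObjects (sample data only).
var mockDataObjects = [
    { name: "invoice.xml", size: 2048 },
    { name: "notes.txt", size: 512 }
];

var summary = describeAttachments(mockDataObjects);
console.log(summary); // prints "invoice.xml (2048 bytes); notes.txt (512 bytes)"
```

In a field script, the call would presumably become event.value = describeAttachments(this.dataObjects).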
Step 4 - Confirm Extraction Settings: Enter an output filename template - the output spreadsheet in this example will be named "Bookmarks_attachments.csv". Optional: check "Create single data file..." to store extracted data from all input PDFs in one spreadsheet file. Press "OK" to proceed.
Step 5 - Extract Bookmarks/Attachments: Proceed through the next dialogs by selecting the desired input PDF documents. Run the procedure, then open the output spreadsheet to inspect the extracted data. Each row is the record for one input PDF, and the extracted bookmarks and attachments appear under the corresponding data field headers.