Understanding PDF File Size
- PDF File Size Issue
- Users often wonder why a specific PDF file is so big when it is just a few pages long. Similar questions arise when a PDF document is split into multiple files and the resulting file sizes turn out not to be proportional to the number of pages. It is crucial to understand how the file size is distributed among the various PDF components before attempting to reduce the size of a specific PDF document or concluding that a file is "too big" for its number of pages.
- What Is a PDF File Composed Of?
- A PDF file is composed of multiple components:
- Content streams (the actual text content of the document)
- Embedded Fonts
- Images
- Bookmarks, links, annotations
- Document overhead and various low-level elements such as extended graphics states, structure info, etc.
- PDF form data
- File attachments
- PDF File Size and Number of Pages
- The only part of a PDF file whose size is proportional to the number of pages is the "content streams". Depending on the internal file structure, content streams might occupy just a small percentage of the overall file size or almost the entire document. This means that the number of pages cannot be used to judge how large or small a specific PDF file should be.
- Causes of File Size Increase
- Typically, there are two major reasons why a PDF file can be "disproportionately" large.
- The first reason is that one or more fonts are stored inside the PDF document. Fonts can be subsetted and embedded right into the PDF file. Adding a single font to a PDF document may increase the file size by 400-600KB. It is recommended to avoid editing text directly in PDF documents with Adobe Acrobat, because this can add fonts to the file.
- The second reason is the use of images in the PDF file. The resolution and bit depth of images greatly affect the overall file size. Color images take up more space than monochrome or grayscale images. The image resolution (the image dimensions in pixels) is also crucial: the higher the image resolution, the more space the images take.
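The effect of resolution and bit depth can be illustrated with a little arithmetic. The sketch below computes the uncompressed size of a raster image; the function name, the US Letter page size, and the 300 dpi scan resolution are assumptions chosen for illustration. Images inside a real PDF are normally stored compressed, so actual sizes are smaller, but the ratios between the cases still show the trend.

```python
def raw_image_size_bytes(width_px: int, height_px: int,
                         bits_per_component: int, components: int) -> int:
    """Uncompressed size of a raster image, with each row padded to whole bytes."""
    bits_per_row = width_px * bits_per_component * components
    bytes_per_row = (bits_per_row + 7) // 8  # round each row up to a full byte
    return bytes_per_row * height_px


# Assumption: a US Letter page (8.5 x 11 in) scanned at 300 dpi is 2550 x 3300 pixels.
width, height = 2550, 3300

mono = raw_image_size_bytes(width, height, 1, 1)    # 1-bit black and white
gray = raw_image_size_bytes(width, height, 8, 1)    # 8-bit grayscale
color = raw_image_size_bytes(width, height, 8, 3)   # 24-bit RGB

# The same page at half the resolution (150 dpi) has a quarter of the pixels.
color_150dpi = raw_image_size_bytes(width // 2, height // 2, 8, 3)

for label, size in [("monochrome", mono), ("grayscale", gray),
                    ("color", color), ("color at 150 dpi", color_150dpi)]:
    print(f"{label}: {size / 1024:.0f} KB uncompressed")
```

Before compression, the color page is three times the size of the grayscale page, and halving the resolution cuts the size to one quarter.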
- Auditing PDF File Size
- Adobe Acrobat provides an excellent function for inspecting the PDF file structure. Open a PDF document and select "File > Save As Other > Optimized PDF..." from the Acrobat menu. Press the "Audit Space Usage..." button to display the distribution of the file size among the various file components. In one example audit, 53% of the file size is taken by fonts, while only 23% is occupied by the actual document text.
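Separately from the Acrobat dialog, the percentages in such an audit are simple ratios of component size to total file size. The sketch below shows the arithmetic using the figures quoted for the 74-page sample in Example 1 of this article (749KB total, 684KB of content streams, 25KB of fonts). The function name `size_breakdown` and the "other" bucket are illustrative assumptions, not part of any Acrobat feature.

```python
def size_breakdown(total_kb: float, components: dict) -> dict:
    """Return each component's share of the file as a percentage.

    Whatever is not covered by the listed components is reported as "other".
    """
    shares = {name: 100.0 * kb / total_kb for name, kb in components.items()}
    remainder_kb = total_kb - sum(components.values())
    shares["other"] = 100.0 * remainder_kb / total_kb
    return shares


# Figures from Example 1: 749KB file, 684KB content streams, 25KB fonts.
shares = size_breakdown(749, {"content streams": 684, "fonts": 25})

for name, percent in shares.items():
    print(f"{name}: {percent:.1f}%")
# content streams: 91.3%
# fonts: 3.3%
# other: 5.3%
```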
- Why Splitting Does Not Always Reduce File Size
- If a PDF file is mostly composed of content streams (text) and has no embedded fonts or images, then splitting it into multiple documents produces files whose sizes are proportional to the number of pages. However, if a PDF file has fonts and images, then the output files must, in many cases, also contain these fonts and images regardless of the number of pages. An output file cannot be smaller than the total size of the font resources it has to carry. If the pages of the input PDF file share a set of fonts, then each output file needs to have them as well.
- Example 1: PDF file with almost no fonts and no images.
The sample PDF file has 74 pages (749KB total file size). Most of the file (91%) is taken by content streams (page text). There are just 25KB of fonts in this document. These figures come from the "Audit Space Usage" dialog for this document.
If this PDF document is split into 74 files with one page per file, then each output file is about 44KB in size. The content stream in each output document is about 10KB (the input PDF document has 684KB in content streams, roughly 9-10KB per page). Each output file also has 24KB of fonts; the rest is so-called "document overhead". There is a direct correspondence between file size and number of pages, because most of the input file size is composed of document text.
- Example 2: PDF file with embedded fonts.
The sample PDF file has 5 pages and is 556KB in size. There are 404KB of fonts inside this PDF file, while the document text accounts for just 28KB. Most of the space in this file is taken by fonts (72%) and images (18%), while text occupies just 5% of the file size.
If this PDF document is split into 5 files, one page per file, then each output file is about 420KB in size. This is exactly as expected, because fonts account for 403KB in each output document. This is why there is very little difference between the input and output file sizes. This example shows that it is incorrect to assume that the sizes of the output documents will be proportional to the number of pages.
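The two examples can be approximated with a simple model: each single-page output carries its share of the content streams plus everything that has to be duplicated (fonts and overhead). The sketch below is only a rough estimate under that assumption. The function name is hypothetical, and the 10KB of overhead used for Example 1 is inferred from the reported 44KB output size rather than stated in the audit.

```python
def estimate_split_size_kb(content_kb: float, pages: int,
                           shared_kb: float, overhead_kb: float = 0.0) -> float:
    """Rough size of one single-page output file after splitting.

    content_kb  -- total size of content streams in the input file
    shared_kb   -- fonts (and other resources) that every output file must carry
    overhead_kb -- per-file document overhead (an assumed value)
    """
    if pages <= 0:
        raise ValueError("pages must be a positive number")
    return content_kb / pages + shared_kb + overhead_kb


# Example 1: 684KB of content streams over 74 pages, 24KB of fonts per output,
# and an assumed 10KB of overhead per output file.
example1 = estimate_split_size_kb(684, 74, shared_kb=24, overhead_kb=10)

# Example 2: 28KB of text over 5 pages, 403KB of fonts per output.
# Images and overhead are left out, so this is a lower bound.
example2 = estimate_split_size_kb(28, 5, shared_kb=403)

# What one would expect if size were simply proportional to page count.
naive_example2 = 556 / 5

print(f"Example 1 estimate: {example1:.0f} KB (reported: about 44KB)")
print(f"Example 2 lower bound: {example2:.0f} KB (reported: about 420KB)")
print(f"Example 2 naive per-page guess: {naive_example2:.0f} KB")
```

In the second example the shared fonts dominate the estimate, which is why the per-page guess of about 111KB is so far from the roughly 420KB actually observed.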
- How to Reduce PDF File Size
- 1. The first step is to save the file under a different file name by using the "File > Save As..." menu. Quite often, users make modifications to the document and use the "File > Save" menu to save changes to disk. In that case, Adobe Acrobat just appends the changes to the end of the file, and the file size keeps growing. "File > Save As..." completely rebuilds the file structure and purges all accumulated changes. For some files, this can immediately reduce the file size without performing any optimizations.
- 2. Use the "File > Save As Other > Reduced Size PDF" or "File > Save As Other > Optimized PDF" menus to perform optimization. However, it is important to understand that there is no magic: the file size can only be reduced by down-sampling images or removing embedded fonts. Down-sampling images or using a lossy image compression algorithm decreases file size by discarding information, and it always results in a loss of image quality. This may not be acceptable, depending on project requirements. Removing embedded fonts is possible in some cases by using "PDF Optimizer", but for some PDF files it is not an option.
- 3. The last resort is to print the PDF file to the "Adobe PDF" printer. This method often helps when there is a font or internal file structure problem. However, this method may also reduce image quality. In addition, this method will not carry over any interactive elements such as bookmarks, links, annotations, etc.
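Step 1 depends on whether changes have been appended to the end of the file. A rough way to check this without Acrobat is to count the `%%EOF` markers in the raw file, since each appended update normally ends with its own marker. The sketch below is only a heuristic, and the function names are hypothetical: files saved for fast web view (linearized) typically contain two markers even without appended changes, and the same byte sequence could in principle occur inside other data.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count end-of-file markers in raw PDF data."""
    return pdf_bytes.count(b"%%EOF")


def count_eof_markers_in_file(path: str) -> int:
    """Same check for a file on disk (the path is supplied by the caller)."""
    with open(path, "rb") as handle:
        return count_eof_markers(handle.read())


# Synthetic byte strings that only mimic the overall layout of a PDF file;
# they are not valid PDFs and exist purely to demonstrate the counting.
saved_once = b"%PDF-1.7\n(objects)\nxref\ntrailer\nstartxref\n0\n%%EOF\n"
saved_again = saved_once + b"(changed objects)\nxref\ntrailer\nstartxref\n0\n%%EOF\n"

print(count_eof_markers(saved_once))   # 1
print(count_eof_markers(saved_again))  # 2
```

A count noticeably greater than one suggests that the file has accumulated appended changes and may shrink after "File > Save As...".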