Let's Talk About .NET, Java, and Various File Formats!

PDF and Its Structure

PDF stands for Portable Document Format. It is an open standard for document exchange. A PDF file contains both text and binary data. When a PDF file is opened in a text editor, only the raw objects that make up its contents and structure are visible.

The PDF file is structured in a hierarchical manner. This structure defines the order in which a PDF viewer application reads the contents and draws them on the screen. The syntax of a PDF file can be described at three levels: object, file and document.

To better understand the structure of a PDF file, we need to consider it in four parts: the three levels above (objects, file structure and document structure) plus the content stream. In the following paragraphs, we’ll have a look at these individual parts of the PDF file.


Objects

A PDF file is composed of a small set of basic types of data objects. Together, these basic objects form the PDF document’s data structure. This part also covers the character set used to write the objects and other syntactic elements. The basic types define both the properties of the objects and their syntax.
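
To make the idea of basic object types more concrete, here is a minimal sketch in Python that guesses the type of a single PDF token from its leading characters. The type names follow the PDF specification (ISO 32000); the helper name classify_pdf_token and its simplified rules are assumptions made for illustration, not a complete lexer.

```python
import re

# Hypothetical helper: guesses the basic PDF object type of one token.
# Simplified on purpose: no comments, no nesting checks, no indirect objects.
# Streams are left out because a stream is a dictionary followed by
# 'stream ... endstream' and cannot be recognised from a single token.
def classify_pdf_token(token: str) -> str:
    token = token.strip()
    if token in ("true", "false"):
        return "boolean"
    if token == "null":
        return "null"
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", token):
        return "number"          # integer or real, e.g. 12, -3.5, .5
    if token.startswith("/"):
        return "name"            # e.g. /Type
    if token.startswith("<<"):
        return "dictionary"      # must be tested before the hex-string case
    if token.startswith("(") or token.startswith("<"):
        return "string"          # literal (text) or hexadecimal <4E6F> string
    if token.startswith("["):
        return "array"
    return "unknown"


for sample in ["true", "-3.5", "/Type", "(Hello)", "[1 2 3]", "<< /Type /Page >>"]:
    print(sample, "->", classify_pdf_token(sample))
```

The order of the checks matters: a dictionary opens with "<<" and a hexadecimal string with "<", so the longer delimiter is tested first.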

File Structure

The second part of a PDF file is the file structure. It defines how the basic objects are stored in the PDF file and how they are later accessed or updated. The file structure is independent of the semantics of the objects; it is only responsible for organizing and updating them.
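
In the PDF specification, the file structure consists of a header, a body of objects, a cross-reference table and a trailer. The sketch below, in Python, builds a tiny PDF-shaped byte string and locates those four sections. The helper names build_sample and locate_sections are hypothetical, the sample is not a renderable document, and the code assumes a classic (uncompressed) cross-reference table with a single revision.

```python
# Hypothetical helper: builds a tiny PDF-shaped byte string for illustration.
def build_sample() -> bytes:
    header = b"%PDF-1.4\n"
    obj1 = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    body = header + obj1
    xref_offset = len(body)  # byte offset where the xref table starts
    # Each cross-reference entry is exactly 20 bytes long.
    xref = b"xref\n0 2\n0000000000 65535 f \n" + b"%010d 00000 n \n" % len(header)
    trailer = b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    tail = b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return body + xref + trailer + tail


# Hypothetical helper: finds the file-structure sections in raw PDF bytes.
def locate_sections(data: bytes) -> dict:
    header_end = data.index(b"\n")              # header is the first line
    xref_pos = data.rindex(b"\nxref") + 1       # start of the xref table
    trailer_pos = data.rindex(b"trailer")       # trailer dictionary follows
    startxref_pos = data.rindex(b"startxref")   # pointer back to the xref table
    after = data[startxref_pos + len(b"startxref"):]
    declared = int(after.split()[0].decode("ascii"))
    return {
        "header": data[:header_end].decode("latin-1"),
        "body": (header_end + 1, xref_pos),
        "xref": (xref_pos, trailer_pos),
        "trailer": (trailer_pos, startxref_pos),
        "declared_xref_offset": declared,
    }


data = build_sample()
info = locate_sections(data)
print(info)
```

Note that nothing here interprets what object 1 means; the code only finds where things are stored, which mirrors the separation between file structure and object semantics.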

Document Structure

The document structure describes how the basic objects are grouped together to form the various components of the PDF file. These components can be pages, annotations, form fields, etc. In other words, this part describes the semantics of the components of the PDF file.
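
As a rough illustration of this grouping, the following Python sketch models a document as nested dictionaries and walks the page tree to collect its pages. The keys (/Type, /Pages, /Kids, /Count, /Annots) are the ones the PDF specification uses; the function collect_pages and the in-memory model, where direct nesting stands in for indirect references, are assumptions made for this example.

```python
# Hypothetical in-memory model: PDF dictionaries as Python dicts,
# with direct nesting standing in for indirect references.
def collect_pages(node: dict) -> list:
    """Walk a page tree depth-first and return its leaf /Page dictionaries."""
    if node.get("/Type") == "/Page":
        return [node]
    pages = []
    for kid in node.get("/Kids", []):
        pages.extend(collect_pages(kid))
    return pages


catalog = {
    "/Type": "/Catalog",
    "/Pages": {
        "/Type": "/Pages",
        "/Count": 2,
        "/Kids": [
            {"/Type": "/Page", "/Annots": [{"/Subtype": "/Link"}]},
            {"/Type": "/Page"},
        ],
    },
}

pages = collect_pages(catalog["/Pages"])
print(len(pages), "pages found")
```

The same basic object type, the dictionary, plays every role here; it is the /Type entry and the way dictionaries refer to each other that turn them into a catalog, a page tree or a page.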

Content Stream

A sequence of instructions that describes the appearance of a graphical entity is represented as a content stream. The content stream is also composed of objects; however, these objects are conceptually distinct from the objects that represent the document structure.
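
Content stream instructions are written in postfix form: the operands come first and the operator follows. The Python sketch below splits a short text-drawing stream into (operator, operands) pairs. The operators in the sample (BT, Tf, Td, Tj, ET) are standard PDF text operators; the function names and the simplified tokenizer, which ignores nested parentheses, arrays, dictionaries and inline images, are assumptions made for illustration.

```python
import re

# A literal string without nested parentheses, or any run of non-space characters.
TOKEN = re.compile(r"\([^()]*\)|\S+")


def is_operand(token: str) -> bool:
    # Simplified rule: names, literal strings and numbers are operands;
    # every other token is treated as an operator.
    if token.startswith("/") or token.startswith("("):
        return True
    return re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", token) is not None


def parse_content_stream(text: str) -> list:
    """Return a list of (operator, operands) pairs in stream order."""
    instructions, operands = [], []
    for token in TOKEN.findall(text):
        if is_operand(token):
            operands.append(token)
        else:
            instructions.append((token, operands))
            operands = []
    return instructions


sample = "BT /F1 12 Tf 72 712 Td (Hello World) Tj ET"
for operator, operands in parse_content_stream(sample):
    print(operator, operands)
```

A viewer would execute these pairs in order: begin a text object, select font /F1 at size 12, move to position (72, 712), show the string, and end the text object.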

