In my previous post, ‘Learn and Understand PDF Structure’, I shared some details about the structure of a PDF file. Sometimes we also need to look inside a PDF file to understand its objects and how they relate to each other. In this post, I’ll show how you can easily view this internal structure in detail.
You can view the PDF structure either with Adobe Acrobat or with a free tool called PDFXplorer. In Acrobat, the Preflight option lets you browse the internal structure. In Acrobat X, you can access this option from Tools -> Print Production -> Preflight.
The following snapshot shows what the internal PDF structure looks like in Acrobat.
As I mentioned earlier, there is also a free tool named PDFXplorer for viewing the internal PDF structure. You can download it from this link. It is a very simple tool that offers a single option: browse to a particular PDF and view its internal structure. The following snapshot shows what the internal structure looks like in this tool.
I hope this post helps you to view the internal PDF structure.
In general terms, accessibility describes the degree to which something is available to as many people as possible. However, the term is often used with a focus on people with disabilities and their right of access. Assistive technologies help people with special needs in all areas of life. This is true for computers and the internet as well, where assistive technologies make these resources accessible to people with disabilities.
When making computers and the web accessible to people with special needs, it is important to make the content and data accessible too, and PDF files are part of that content. Making a PDF accessible means following certain standards while creating or editing it, so that assistive technologies can read its contents and present them to the user.
These standards include tagging the contents completely and marking the document as tagged, defining the document language, providing accessible font encoding, adding bookmarks, and using a consistent heading structure. You can find more details about accessible PDF files in section 14.9 of the PDF Specification Reference. To understand accessibility for content published on the web, check out the W3C document Web Content Accessibility Guidelines.
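A few of these requirements leave visible markers in the PDF itself, so a very rough first check can be sketched in Python. This is only an illustration under strong assumptions: the function name and the two sample catalogs are made up for this post, the check looks at raw bytes (so it misses anything stored in compressed object streams), and finding a key does not prove the document is actually accessible.

```python
import re

def basic_accessibility_markers(data):
    """Look for a few accessibility-related keys in raw, uncompressed PDF bytes.

    Hypothetical helper: a crude byte search, not a real accessibility test.
    """
    return {
        # /MarkInfo << /Marked true >> flags the document as tagged
        "marked_as_tagged": re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", data) is not None,
        # /StructTreeRoot points to the structure (tag) tree
        "has_structure_tree": b"/StructTreeRoot" in data,
        # /Lang holds a language such as (en-US); it may be a literal or hex string
        "has_language": re.search(rb"/Lang\s*[(<]", data) is not None,
        # /Outlines is the root of the bookmarks
        "has_bookmarks": b"/Outlines" in data,
    }

# Two hand-written catalog objects used only as sample input.
tagged_catalog = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Lang (en-US) "
    b"/MarkInfo << /Marked true >> /StructTreeRoot 5 0 R /Outlines 6 0 R >>\nendobj\n"
)
plain_catalog = b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"

for name, sample in (("tagged", tagged_catalog), ("plain", plain_catalog)):
    print(name, basic_accessibility_markers(sample))
```

A dedicated checker goes much further than this, for example by validating the tag tree itself, so treat the sketch only as a way to see where these settings live in the file.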
We have seen a little bit about accessibility and accessible PDF files. Now, let’s have a look at a free tool called PAC (PDF Accessibility Checker), provided by the “Access for All” Foundation. It is a very simple tool for testing the technical accessibility of PDF documents. You only need to download and extract the tool, and run PAC.exe to check a PDF.
Simply browse to the PDF file you want to test for accessibility and press the Start Check button. The tool will then show you which of the testing criteria the document passed.
PDF and Its Structure
PDF stands for Portable Document Format. It is an open standard for document exchange. A PDF file contains both text and binary data. When you open a PDF file in a text editor, you see only the raw objects that form the contents and structure of the file.
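To get a feel for what those raw objects look like, here is a minimal Python sketch that lists the numbered objects in a small hand-written fragment. The fragment, the regular expression, and the function name are my own illustrations; real files usually contain compressed streams that a simple pattern like this will not decode.

```python
import re

# A tiny hand-written fragment (sample input only, not a complete PDF file).
raw = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>
endobj
"""

# An indirect object looks like: <number> <generation> obj ... endobj
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)

def list_objects(data):
    """Return (object number, generation, body text) for each object found."""
    found = []
    for match in OBJ_RE.finditer(data):
        body = match.group(3).strip().decode("latin-1")
        found.append((int(match.group(1)), int(match.group(2)), body))
    return found

for number, generation, body in list_objects(raw):
    print(number, generation, body)
```

This is roughly what you would pick out by eye in a text editor: each object has a number, a generation, and a body between the obj and endobj keywords.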
A PDF file is structured in a hierarchical manner. This structure defines the order in which a PDF viewer application reads the contents and draws them on the screen. The syntax of a PDF file is easiest to describe in layers, from the individual objects up to the document as a whole.
To better understand the structure of a PDF file, we need to consider it in four parts: objects, file structure, document structure, and content streams. In the following paragraphs, we’ll look at each of these parts.
A PDF file is composed of a small set of basic types of data objects. Together, these basic objects form the PDF document’s data structure. This part also covers the character set used to write the objects and other syntactic elements. The basic types define both the properties of the objects and their syntax.
The second part of a PDF document is the file structure. It defines how the basic objects are stored in the PDF file and how they are later accessed and updated. The file structure is independent of the semantics of the objects; it is only responsible for organizing and updating them.
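As an illustration of the file structure, the following Python sketch writes a tiny file with a header, a body, a cross-reference table and a trailer, and then reads the table back to locate each object by its byte offset. The function names are made up for this post, and the reader assumes a classic single-section cross-reference table, not the cross-reference streams or incremental updates found in many real files.

```python
import re

def build_minimal_pdf():
    """Assemble a tiny PDF, recording the byte offset of every object."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")                   # header
    offsets = []
    for number, body in enumerate(objects, start=1):  # body
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_pos = len(out)                               # cross-reference table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                   # entry 0 is always free
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos     # pointer to the table
    return bytes(out)

def read_file_structure(data):
    """Return (header line, xref position, {object number: byte offset})."""
    header = data.split(b"\n", 1)[0].decode("ascii")
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if match is None:
        raise ValueError("no startxref entry found at the end of the file")
    xref_pos = int(match.group(1))
    lines = data[xref_pos:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("startxref does not point at a cross-reference table")
    first, count = map(int, lines[1].split())
    offsets = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":                              # 'n' marks an in-use object
            offsets[first + i] = int(offset)
    return header, xref_pos, offsets

pdf = build_minimal_pdf()
header, xref_pos, offsets = read_file_structure(pdf)
print(header, xref_pos, offsets)
```

Notice that the reader never interprets what the objects mean. It only finds where they are stored, which is exactly the separation between file structure and semantics described above.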
The document structure describes how the basic objects are grouped together to form the various components of a PDF file. These components can be pages, annotations, form fields, and so on. In other words, this part describes the semantics of the components of the PDF file.
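To make the grouping concrete, here is a small Python sketch that starts at the trailer, follows the reference to the catalog, then to the page tree, and finally lists each page with its annotations. The sample data and helper names are hypothetical, and the sketch assumes a flat page tree with uncompressed objects; real documents can nest page tree nodes several levels deep.

```python
import re

# Hand-written sample: a catalog, a page tree, two pages and one annotation.
RAW = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
4 0 obj
<< /Type /Page /Parent 2 0 R /Annots [5 0 R] >>
endobj
5 0 obj
<< /Type /Annot /Subtype /Text /Rect [10 10 30 30] >>
endobj
trailer
<< /Size 6 /Root 1 0 R >>
"""

def load_objects(data):
    """Map each object number to the text of its body."""
    pattern = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
    return {int(m.group(1)): m.group(2).decode("latin-1") for m in pattern.finditer(data)}

def ref(body, key):
    """Object number of the single indirect reference stored under key, or None."""
    match = re.search(r"/%s\s+(\d+)\s+\d+\s+R" % key, body)
    return int(match.group(1)) if match else None

def ref_list(body, key):
    """Object numbers of all indirect references in the array stored under key."""
    match = re.search(r"/%s\s*\[(.*?)\]" % key, body, re.DOTALL)
    if match is None:
        return []
    return [int(n) for n in re.findall(r"(\d+)\s+\d+\s+R", match.group(1))]

def page_tree(data):
    """Follow trailer -> catalog -> page tree -> pages (flat tree assumed)."""
    objects = load_objects(data)
    trailer = data.decode("latin-1").rsplit("trailer", 1)[1]
    root = ref(trailer, "Root")
    pages = ref(objects[root], "Pages")
    kids = [(kid, ref_list(objects[kid], "Annots")) for kid in ref_list(objects[pages], "Kids")]
    return root, pages, kids

print(page_tree(RAW))
```

The objects themselves are plain dictionaries; it is only the keys such as /Root, /Pages, /Kids and /Annots that give them their roles in the document.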
A content stream holds a sequence of instructions that describe the appearance of a page or other graphical entity. These instructions are also represented as objects; however, they are conceptually distinct from the objects that represent the document structure.
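As a rough illustration, the Python sketch below splits a short, uncompressed content stream into instructions. In a content stream the operands come first and the operator follows them, so the parser collects operands until it meets an operator. The tokenizer is my own simplification: it handles names, numbers, simple strings and operators, but not arrays, dictionaries, hex strings, nested parentheses, or inline images.

```python
import re

TOKEN_RE = re.compile(r"""
    \((?:\\.|[^\\()])*\)          # literal string without nested parentheses
  | /[^\s/\[\]()<>]+              # name such as /F1
  | [+-]?(?:\d+\.?\d*|\.\d+)      # integer or real number
  | [A-Za-z][A-Za-z0-9*]*|['"]    # operator such as BT, Tf, Tj
""", re.VERBOSE)

def parse_content_stream(stream):
    """Group tokens into (operator, operands) pairs; operands precede the operator."""
    instructions = []
    operands = []
    for token in TOKEN_RE.findall(stream):
        if token[0].isalpha() or token in ("'", '"'):
            instructions.append((token, operands))
            operands = []
        else:
            operands.append(token)
    return instructions

# Begin text, select font /F1 at 12 pt, move to (72, 720), show a string, end text.
sample = "BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET"

for operator, operands in parse_content_stream(sample):
    print(operator, operands)
```

Here BT and ET take no operands, Tf takes a font name and a size, Td takes two numbers, and Tj takes the string to draw.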