In my previous post, titled ‘Learn and Understand PDF Structure’, I shared some details about the structure of PDF files. Sometimes we also need to view the internal structure of a PDF file in order to understand its objects and their relationships. In this post, I’ll share how you can easily view this internal structure in detail.
You can view the PDF structure using either Adobe Acrobat or a free tool called PDFXplorer. In Acrobat, use the Preflight option. In Acrobat X, you can access this option from Tools -> Print Production -> Preflight.
The following snapshot shows what the internal PDF structure looks like in Acrobat.
As I mentioned earlier, there is also a free tool for viewing the internal PDF structure, named PDFXplorer. You can download it from this link. It is a very simple tool that gives you only one option: browse to a particular PDF and view its internal structure. The following snapshot shows what the internal structure looks like in this tool.
I hope this post helps you to view the internal PDF structure.
In .NET applications, we sometimes need to render PDF files to the browser from our code (C#, VB.NET, etc.). It’s not a big deal! You only need to use the Response object to send the file to the browser. The only thing you need to take care of is using the proper methods and properties.
First of all, we need to save the PDF document to a MemoryStream object. For example, suppose we have a MemoryStream object named outStream that we need to render to the browser. The following code snippet can be used to render the file:
//create a new MemoryStream object; the PDF file's content must be saved into outStream before it is sent
MemoryStream outStream = new MemoryStream();
//specify the duration of time before a page cached on a browser expires
Response.Expires = 0;
//specify that the output page should be buffered
Response.Buffer = true;
//erase any buffered HTML output
Response.ClearContent();
//add a new HTTP header and value to the Response sent to the client
Response.AddHeader("content-disposition", "inline; filename=" + "output.pdf");
//specify the HTTP content type for Response as PDF
Response.ContentType = "application/pdf";
//write the contents of outStream to the HTTP output as a byte array
Response.BinaryWrite(outStream.ToArray());
//close the memory stream
outStream.Close();
//end the processing of the current page to ensure that no other HTML content is sent
Response.End();
You need to use the AddHeader method of the Response object to add a header and its value to the response sent to the client. The Content-Disposition response header field conveys additional information about how to process the response, and can also attach metadata such as the file name. Because the disposition here is inline, the PDF is displayed in the PDF viewer plugin installed in the browser. You can see a practical example of rendering a PDF file to the browser in the Aspose.Pdf Demo. To view the source code, click the Source tab.
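The snippet builds the Content-Disposition value by string concatenation. As a hedged alternative (not taken from the Aspose demo), .NET's System.Net.Mime.ContentDisposition class can build the same value and quote the file name when needed. It also makes it easy to switch between inline display and a download prompt (attachment). The class DispositionDemo and its BuildDisposition helper are hypothetical names used only for this sketch.

```csharp
using System;
using System.Net.Mime;

public static class DispositionDemo
{
    // Builds a Content-Disposition header value.
    // "inline" asks the browser to display the PDF itself;
    // "attachment" asks it to offer the file as a download instead.
    public static string BuildDisposition(string fileName, bool inline)
    {
        var disposition = new ContentDisposition
        {
            FileName = fileName, // stored as the "filename" parameter
            Inline = inline      // switches the type between "inline" and "attachment"
        };
        // ToString() serializes the type plus its parameters,
        // quoting the file name if it contains special characters.
        return disposition.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildDisposition("output.pdf", inline: true));
        Console.WriteLine(BuildDisposition("output.pdf", inline: false));
    }
}
```

In the page code, the result would be passed straight to Response.AddHeader("content-disposition", ...) in place of the concatenated string.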
PDF and Its Structure
PDF stands for Portable Document Format. It is an open standard for document exchange. A PDF file contains both text and binary data. When a PDF file is viewed using a text editor, one can see only the raw objects which form the contents and structure of the PDF file.
A PDF file is structured in a hierarchical manner. This structure defines the flow by which a PDF viewer application reads the contents in sequence and draws them on the screen. The syntax of a PDF file can be described at three levels: object, file, and document.
To better understand the structure of a PDF file, we need to consider it in four parts: objects, file structure, document structure, and content streams. In the following paragraphs, we’ll look at each of these parts.
A PDF file is composed of a small set of basic data object types: boolean values, numbers, strings, names, arrays, dictionaries, streams, and the null object. Together, these basic objects form the data structure of a PDF document. This part also covers the character set used to write the objects and the other syntactic elements. The basic types define both the properties of the objects and their syntax.
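To show what these basic types look like when written out, here is a minimal C# sketch that serializes ordinary .NET values into PDF object syntax. PdfName, PdfRef, and PdfSyntax are hypothetical types invented for illustration, not part of any library. Stream objects are left out to keep the sketch short, and names are assumed to need no #xx escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical wrapper types: a PDF name (/Type) and an indirect reference (2 0 R).
public sealed record PdfName(string Value);
public sealed record PdfRef(int Number, int Generation);

public static class PdfSyntax
{
    // Serializes a .NET value as one of PDF's basic object types (streams omitted).
    public static string Write(object value) => value switch
    {
        null => "null",                                     // null object
        bool b => b ? "true" : "false",                     // boolean
        int i => i.ToString(CultureInfo.InvariantCulture),  // integer
        double d => d.ToString("0.####", CultureInfo.InvariantCulture), // real, never in exponent form
        PdfName n => "/" + n.Value,                         // name (assumes no #xx escaping is needed)
        PdfRef r => $"{r.Number} {r.Generation} R",         // indirect reference
        string s => "(" + EscapeLiteral(s) + ")",           // literal string
        IDictionary<string, object> dict =>                 // dictionary: keys are names
            "<< " + string.Join(" ", dict.Select(kv => "/" + kv.Key + " " + Write(kv.Value))) + " >>",
        IEnumerable<object> items =>                        // array
            "[" + string.Join(" ", items.Select(Write)) + "]",
        _ => throw new ArgumentException($"Unsupported type: {value.GetType()}")
    };

    // Backslashes and parentheses must be escaped inside a literal string.
    private static string EscapeLiteral(string s) =>
        s.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)");

    public static void Main()
    {
        var page = new Dictionary<string, object>
        {
            ["Type"] = new PdfName("Page"),
            ["Parent"] = new PdfRef(2, 0),
            ["MediaBox"] = new List<object> { 0, 0, 612, 792 },
        };
        Console.WriteLine(Write(page));
    }
}
```

Running Main prints a page dictionary combining names, an indirect reference, and an array of numbers, which is the same shape of raw object you would see in a text editor or in PDFXplorer.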
The second part of a PDF document is its file structure. The file structure defines how the basic objects are stored in the PDF file and how they are later accessed or updated. It is independent of the semantics of the objects; that is, the file structure is only responsible for organizing and updating the objects.
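The file structure is easiest to see by assembling a file by hand. The following is a hedged C# sketch, with the hypothetical class name MinimalPdfWriter. It writes a header, a body of indirect objects, and a cross-reference (xref) table that records each object's byte offset. It finishes with a trailer and a startxref pointer that tell a reader where the table begins. It assumes object 1 is the document catalog and that all content is ASCII.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MinimalPdfWriter
{
    // objects[i] is the body of indirect object number i + 1; object 1 is assumed to be the catalog.
    public static byte[] Build(IReadOnlyList<string> objects)
    {
        var sb = new StringBuilder();
        var offsets = new List<int>();

        sb.Append("%PDF-1.4\n"); // header: identifies the file and its PDF version

        // Body: every indirect object, remembering where each one starts.
        // All content is ASCII, so the character count equals the byte offset.
        for (int i = 0; i < objects.Count; i++)
        {
            offsets.Add(sb.Length);
            sb.Append($"{i + 1} 0 obj\n{objects[i]}\nendobj\n");
        }

        // Cross-reference table: one fixed-width, 20-byte entry per object.
        int xrefOffset = sb.Length;
        sb.Append($"xref\n0 {objects.Count + 1}\n");
        sb.Append("0000000000 65535 f \n"); // object 0 is always the head of the free list
        foreach (int offset in offsets)
            sb.Append($"{offset:D10} 00000 n \n"); // byte offset, generation number, in use

        // Trailer: object count and document root, then the position of the xref table.
        sb.Append($"trailer\n<< /Size {objects.Count + 1} /Root 1 0 R >>\n");
        sb.Append($"startxref\n{xrefOffset}\n%%EOF\n");

        return Encoding.ASCII.GetBytes(sb.ToString());
    }

    public static void Main()
    {
        byte[] pdf = Build(new[]
        {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>"
        });
        Console.Write(Encoding.ASCII.GetString(pdf));
    }
}
```

Saving the bytes with File.WriteAllBytes should give a file containing one blank page. Because a reader jumps to objects through the xref offsets rather than parsing from the top, the same objects could be stored in a different order without changing what the document means.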
The document structure describes how the basic objects are grouped together to form the various components of the PDF file, such as pages, annotations, and form fields. In effect, this part describes the semantics of the components of the PDF file.
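As an illustration of that grouping, the sketch below builds the smallest useful document structure. A catalog points to a page tree node, which lists its pages in /Kids, and each page points back through /Parent. The class PageTreeDemo and its object numbering (catalog = 1, page tree = 2, pages from 3) are assumptions made for this example. It uses a single flat page tree node, whereas real files may nest intermediate nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageTreeDemo
{
    // Returns indirect-object bodies keyed by object number:
    // a catalog (1), one flat Pages node (2), and one Page object per page (3, 4, ...).
    public static SortedDictionary<int, string> Build(int pageCount)
    {
        var objects = new SortedDictionary<int, string>();
        List<int> pageNumbers = Enumerable.Range(3, pageCount).ToList();

        // Catalog: the document root, which points at the page tree.
        objects[1] = "<< /Type /Catalog /Pages 2 0 R >>";

        // Page tree node: references every page and records how many there are.
        string kids = string.Join(" ", pageNumbers.Select(n => $"{n} 0 R"));
        objects[2] = $"<< /Type /Pages /Kids [{kids}] /Count {pageCount} >>";

        // Leaf pages: each one points back to its parent node.
        foreach (int n in pageNumbers)
            objects[n] = "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>";

        return objects;
    }

    public static void Main()
    {
        foreach (var entry in Build(2))
            Console.WriteLine($"{entry.Key} 0 obj {entry.Value} endobj");
    }
}
```

Annotations and form fields hang off this same tree in a similar way, for example through a page's /Annots array, which is why this level describes meaning rather than storage.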
A content stream represents a sequence of instructions that describe the appearance of a page or any other graphical entity. The content stream is also composed of objects; however, these objects are distinct from the basic data objects.
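To make the idea of instructions concrete, here is a hedged C# sketch of a small content stream. The hypothetical class ContentStreamDemo writes operands followed by operators to show one line of text and stroke a line beneath it. It then wraps the instructions in a stream object whose /Length is the byte count of the data. The font resource name /F1 is an assumption; it would have to be defined in the page's /Resources for a viewer to draw the text.

```csharp
using System;
using System.Text;

public static class ContentStreamDemo
{
    // Operators use postfix notation: operands come first, then the operator.
    public static string BuildContent(string text) =>
        "BT\n" +                   // begin a text object
        "/F1 24 Tf\n" +            // select font resource /F1 at 24 points (assumed to exist)
        "72 720 Td\n" +            // move the text position to (72, 720)
        $"({text}) Tj\n" +         // show the string (not escaped, so avoid \, ( and ) here)
        "ET\n" +                   // end the text object
        "72 710 m 300 710 l S";    // path: moveto, lineto, then stroke

    // Wraps the instructions in a stream object; /Length counts the data bytes only.
    public static string WrapAsStream(string content)
    {
        int length = Encoding.ASCII.GetByteCount(content);
        return $"<< /Length {length} >>\nstream\n{content}\nendstream";
    }

    public static void Main() =>
        Console.WriteLine(WrapAsStream(BuildContent("Hello, PDF")));
}
```

The operands here (names, numbers, and strings) use the same syntax as the basic objects, but inside a content stream they are only arguments to operators such as Tf, Td, and Tj.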