Let's Talk About .NET, Java, and Various File Formats!

Archive for the ‘Understand File Formats’ Category

View PDF Structure using Adobe Acrobat or a Free Tool called PDFXplorer

In my previous post, titled ‘Learn and Understand PDF Structure’, I shared some details about the structure of a PDF file. Sometimes we also need to view the internal structure of a PDF file in order to understand its objects and the relationships between them. In this post, I’ll show how you can easily view this internal structure in detail.

You can use either Adobe Acrobat or a free tool called PDFXplorer to view the PDF structure. In Acrobat, the internal structure is available through the Preflight option. In Acrobat X, you can access this option from Tools -> Print Production -> Preflight.

[Screenshot: opening the Preflight option in Acrobat X]

The following snapshot shows what the internal PDF structure looks like in Acrobat.

[Screenshot: internal PDF structure viewed in Acrobat X]

As I mentioned earlier, there is also a free tool named PDFXplorer for viewing the internal PDF structure. You can download it from this link. It is a very simple tool that gives you only one option: browse to a particular PDF and then view its internal structure. The following snapshot shows what the internal structure looks like in this tool.

[Screenshot: internal PDF structure viewed in PDFXplorer]

I hope this post helps you to view the internal PDF structure.

Accessible PDF Files and Checking PDF Accessibility Using a Free Tool

In general terms, accessibility describes the degree to which something is available to as many people as possible. However, the term is most often used with a focus on people with disabilities and their right of access. Assistive technologies help people with special needs in all areas of life. This is true in the field of computers and the internet as well, where the aim is to make the technology itself accessible to people with disabilities.

When making computers and the web accessible to people with special needs, it is important to make the content and data accessible too, and PDF files are part of that content. Making a PDF accessible means following certain standards while creating or editing it, so that assistive technologies can access the contents of the file and present them to the user.

These standards include tagging the contents completely and marking the document as tagged, defining the document language, and providing accessible font encoding, bookmarks, and a consistent heading structure. You can find more details about accessible PDF files in section 14.9 of the PDF Specification Reference. To understand accessibility for content published on the web, check out the W3C document Web Content Accessibility Guidelines.
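
Two of the criteria above, the tagged-document flag and the document language, leave visible traces in the document catalog of a PDF file. The following Python sketch is only a rough illustration of that idea. It assumes the catalog dictionary is already available as uncompressed bytes; the function name `quick_accessibility_hints` and the sample catalog are hypothetical, and a byte search like this is no substitute for a real accessibility checker.

```python
import re

# Hypothetical catalog dictionary, written by hand for illustration.
SAMPLE_CATALOG = (
    b"<< /Type /Catalog /Pages 2 0 R /Lang (en-US) "
    b"/MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
)

def quick_accessibility_hints(catalog: bytes) -> dict:
    """Look for three catalog entries that tagged, accessible PDFs normally carry."""
    return {
        # /MarkInfo << /Marked true >> marks the document as tagged.
        "marked_as_tagged": re.search(
            rb"/MarkInfo\s*<<[^>]*/Marked\s+true", catalog) is not None,
        # /StructTreeRoot points at the tag (structure) tree.
        "has_structure_tree": b"/StructTreeRoot" in catalog,
        # /Lang holds the document language as a literal or hex string.
        "has_language": re.search(rb"/Lang\s*[(<]", catalog) is not None,
    }

if __name__ == "__main__":
    for name, found in quick_accessibility_hints(SAMPLE_CATALOG).items():
        print(name, found)
```

A catalog that lacks these entries is a strong hint that the file was never tagged; their presence alone does not prove that the tagging is complete or correct.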

We have seen a little about accessibility and accessible PDF files. Now, let’s look at a free tool called PAC (PDF Accessibility Checker), provided by the “Access for All” Foundation. It is a very simple tool for testing the technical accessibility of PDF documents. You only need to download and extract the tool, then run PAC.exe to check a PDF.

Simply browse to the PDF file you want to test for accessibility and press the Start Check button. The tool then shows the results for each of its testing criteria.

Learn and Understand the Structure of a PDF File

PDF and Its Structure

PDF stands for Portable Document Format. It is an open standard for document exchange. A PDF file contains both text and binary data. When a PDF file is viewed using a text editor, one can see only the raw objects which form the contents and structure of the PDF file.

The PDF file is structured in a hierarchical manner. This structure defines the flow by which a PDF viewer application reads the contents in sequence and draws them on the screen.

To better understand the structure of a PDF file, we need to consider its syntax in four parts: objects, file structure, document structure, and content streams. In the following paragraphs, we’ll look at each of these parts.

Objects

A PDF file is composed of a small set of basic types of data objects. Together, these basic objects form a PDF document’s data structure. This part of the syntax also covers the character set used to write the objects and the other syntactic elements. The basic types define both the properties of the objects and their syntax.
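
To make the basic object types concrete, here is a small Python sketch that classifies a few tokens written in PDF syntax. The sample tokens and the helper name `classify` are made up for illustration, and the rules are deliberately simplified: streams, which are a dictionary followed by raw data, are left out, and nested objects are recognized only by their opening delimiter.

```python
import re

# Hand-written sample tokens in PDF syntax (illustrative only).
SAMPLES = [b"true", b"42", b"3.14", b"(Hello)", b"<48656C6C6F>",
           b"/Type", b"[1 2 3]", b"<< /Type /Page >>", b"null"]

def classify(token: bytes) -> str:
    """Return the basic PDF object type that a single token represents."""
    if token in (b"true", b"false"):
        return "boolean"
    if token == b"null":
        return "null"
    if re.fullmatch(rb"[+-]?\d+", token):
        return "integer"
    if re.fullmatch(rb"[+-]?(\d+\.\d*|\.\d+)", token):
        return "real"
    if token.startswith(b"<<"):      # must be tested before the hex string
        return "dictionary"
    if token.startswith(b"<"):
        return "hex string"
    if token.startswith(b"("):
        return "literal string"
    if token.startswith(b"/"):
        return "name"
    if token.startswith(b"["):
        return "array"
    return "unknown"

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample.decode("ascii"), "->", classify(sample))
```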

File Structure

The second part is the file structure. It defines how the basic objects are stored in the PDF file and how they are later accessed or updated. The file structure is independent of the semantics of the objects; in other words, it is only responsible for organizing and updating the objects.
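
As a rough illustration of how the file structure organizes objects, the following Python sketch writes a tiny PDF with a header, a body of three objects, a cross-reference table, and a trailer, and then reads the object offsets back from the cross-reference table. The function names `build_minimal_pdf` and `read_xref_offsets` are hypothetical. The reader assumes the classic layout with one uncompressed cross-reference table in a single subsection; files that use cross-reference streams or incremental updates need more work.

```python
import re

def build_minimal_pdf() -> bytes:
    """Assemble header, body, cross-reference table and trailer by hand."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")                    # header
    offsets = []
    for num, body in enumerate(objects, start=1):     # body
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)       # cross-reference table
    out += b"0000000000 65535 f \n"                   # entry 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos     # pointer to the table
    return bytes(out)

def read_xref_offsets(pdf: bytes) -> dict:
    """Map object number -> byte offset using the cross-reference table."""
    start = int(re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf).group(1))
    lines = pdf[start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("no classic cross-reference table at startxref")
    first, count = map(int, lines[1].split())
    offsets = {}
    for i in range(count):
        off, _gen, kind = lines[2 + i].split()
        if kind == b"n":                              # 'n' marks an object in use
            offsets[first + i] = int(off)
    return offsets

if __name__ == "__main__":
    pdf = build_minimal_pdf()
    for num, off in read_xref_offsets(pdf).items():
        print(num, off, pdf[off:off + 9])
```

Note that the reader never interprets what the objects mean; it only locates them, which is exactly the division of labour described above.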

Document Structure

The document structure describes how the basic objects are grouped together to form the various components of the PDF file, such as pages, annotations, and form fields. In effect, this part describes the semantics of the components of the PDF file.
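
The grouping of objects into pages can be sketched in Python as a walk from the trailer to the document catalog and on through the page tree. The object table below is hand-written sample data, and the helper names (`ref`, `kids`, `collect_pages`, `page_object_numbers`) are hypothetical. The regular expressions assume simple, uncompressed dictionaries and are not a general PDF parser.

```python
import re

# Hand-written sample objects, keyed by object number (illustrative only).
OBJECTS = {
    1: b"<< /Type /Catalog /Pages 2 0 R >>",
    2: b"<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>",
    3: b"<< /Type /Page /Parent 2 0 R >>",
    4: b"<< /Type /Page /Parent 2 0 R >>",
}
TRAILER = b"<< /Size 5 /Root 1 0 R >>"

def ref(body: bytes, key: bytes) -> int:
    """Return the object number of the indirect reference stored under key."""
    match = re.search(rb"/" + key + rb"\s+(\d+)\s+\d+\s+R", body)
    return int(match.group(1))

def kids(body: bytes) -> list:
    """Return the object numbers listed in a page tree node's /Kids array."""
    inner = re.search(rb"/Kids\s*\[(.*?)\]", body).group(1)
    return [int(n) for n in re.findall(rb"(\d+)\s+\d+\s+R", inner)]

def collect_pages(num: int) -> list:
    """Depth-first walk: /Pages nodes are expanded, /Page leaves are kept."""
    body = OBJECTS[num]
    if re.search(rb"/Type\s*/Pages\b", body):
        pages = []
        for kid in kids(body):
            pages += collect_pages(kid)
        return pages
    return [num]

def page_object_numbers() -> list:
    catalog = ref(TRAILER, b"Root")              # trailer -> catalog
    page_root = ref(OBJECTS[catalog], b"Pages")  # catalog -> page tree root
    return collect_pages(page_root)              # page tree -> page objects

if __name__ == "__main__":
    print(page_object_numbers())
```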

Content Stream

A content stream is a sequence of instructions that describes the appearance of a page or other graphical entity. These instructions are also represented as objects; however, they are conceptually distinct from the objects that represent the document structure.
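
A content stream is written in postfix order: operands come first, followed by the operator that consumes them. The Python sketch below splits a short, hand-written text-drawing stream into instructions. The sample stream and the function name `instructions` are assumptions for illustration, and the tokenizer handles only names, numbers, simple literal strings, and operators (no arrays, hex strings, nested parentheses, or inline images).

```python
import re

# Hand-written sample: select font /F1 at 12 pt, move to (72, 720), show text.
CONTENT = b"BT /F1 12 Tf 72 720 Td (Hello PDF) Tj ET"

TOKEN = re.compile(
    rb"\((?:\\.|[^\\()])*\)"        # literal string without nested parentheses
    rb"|/[^\s/\[\]()<>]+"           # name
    rb"|[+-]?\d*\.?\d+"             # integer or real number
    rb"|[A-Za-z'\"*][A-Za-z0-9*]*"  # operator such as Tf, Td, Tj, T*
)

def instructions(stream: bytes) -> list:
    """Group tokens into (operator, operands) pairs in stream order."""
    operands, result = [], []
    for tok in TOKEN.findall(stream):
        if tok[:1] in b"(/+-.0123456789":
            operands.append(tok)     # operands pile up until an operator arrives
        else:
            result.append((tok.decode("ascii"), operands))
            operands = []
    return result

if __name__ == "__main__":
    for operator, operands in instructions(CONTENT):
        print(operator, operands)
```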

How Open File Formats Can Contribute to Green Computing

What is Green Computing?

Green Computing refers to environmentally sustainable computing. San Murugesan defines the field of green computing as “the study and practice of designing, manufacturing, using, and disposing of computers, servers, and associated subsystems—such as monitors, printers, storage devices, and networking and communications systems—efficiently and effectively with minimal or no impact on the environment.”

Bits, Bytes, and CO2!

Green computing is not limited to hardware; it also encompasses software and data. Whenever data is processed, it consumes computer resources; the more data we process, the more carbon dioxide (CO2) we add to the environment. The electricity consumed by cloud computing globally is projected to increase to 1,963 billion kWh by 2020, and the associated carbon dioxide equivalent emissions to reach 1,034 megatons.

Write Efficient Software

One way to reduce CO2 emissions is to use energy-efficient hardware. The IT industry can also contribute to green computing by increasing its use of renewable energy. In addition, efficient software, which processes the same amount of data with less code and without redundant work, saves CPU power and can also contribute greatly in this area.

Role of Open File Formats

Open file formats play no less a role in contributing to green computing. According to Wikipedia, an open file format is a published specification for storing digital data, usually maintained by a standards organization, which can therefore be used and implemented by anyone.

Proprietary file formats or standards work on particular platforms with certain software, while open file formats can work on any platform with a variety of tools and software for various purposes. Closed or proprietary formats are more likely to become obsolete sooner or later, leaving behind a pile of data as ‘digital waste’. This digital waste contributes to carbon dioxide emissions for no reason. All of this is much less likely to happen with open file formats.

Our Social Responsibility

The IT industry should focus on efficient hardware and software, the use of renewable energy, and open file formats, so that it contributes to green computing in the best interest of the generations to come.