PDF/A
PDF/A is defined by ISO 19005-1:2005, an ISO standard published on October 1, 2005, under the title:
- Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1)
PDF/A is a subset of PDF that leaves out features not suited to long-term archiving, much as PDF/X defines a subset of PDF for printing and the graphic arts.
The standard specifies two levels of compliance:
- PDF/A-1a - Level A compliance in Part 1
- PDF/A-1b - Level B compliance in Part 1 (less stringent requirements)
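A file records the conformance level it claims in its embedded XMP metadata. The following Python sketch shows one way to read that claim from the raw bytes of a file. It assumes the PDF/A identification properties pdfaid:part and pdfaid:conformance are present in an uncompressed, UTF-8 encoded XMP packet. The function name claimed_pdfa_level is hypothetical, and a declared level is only a claim, not proof that the file actually conforms.

```python
import re

# Match both common XMP serializations of a property:
#   attribute form: pdfaid:part="1"
#   element form:   <pdfaid:part>1</pdfaid:part>
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONFORMANCE = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes):
    """Return a label such as 'PDF/A-1b' if the bytes declare one, else None.

    Hypothetical helper: it only reads the declaration and does not
    validate the file against the standard.
    """
    part = _PART.search(data)
    conformance = _CONFORMANCE.search(data)
    if part is None or conformance is None:
        return None
    return "PDF/A-{}{}".format(
        part.group(1).decode("ascii"),
        conformance.group(1).decode("ascii").lower(),
    )
```

To use it, read a file in binary mode and pass its contents, for example claimed_pdfa_level(open(path, "rb").read()). A real validator is still needed to confirm that the file meets the requirements of the declared level.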
The standard does not define an archiving strategy or the goals of an archiving system. Instead, it identifies a "profile" for electronic documents that ensures they can be reproduced in exactly the same way in years to come. A key element of this reproducibility is the requirement that PDF/A documents be completely self-contained: all of the information needed to display the document in the same manner every time is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document may not rely on information from external sources, such as font programs outside the file or content reached through hyperlinks.
Other key elements to PDF/A compatibility include:
- Audio and video content are forbidden.
- JavaScript and executable file launches are prohibited.
- All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering.
- Color spaces must be specified in a device-independent manner.
- Encryption is disallowed.
- Use of standards-based metadata is mandated.
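Some of the prohibitions above can be approximated with a rough pre-check. The Python sketch below scans raw file bytes for PDF name tokens that usually accompany the forbidden features. The marker table and the function name find_forbidden_features are assumptions made for illustration. The scan can miss features and can report false positives, so it is no substitute for a real PDF/A validator.

```python
import re

# Hypothetical table: PDF name tokens and the rule each one hints at.
FORBIDDEN_MARKERS = {
    b"/Encrypt": "encryption",
    b"/JavaScript": "JavaScript",
    b"/JS": "JavaScript",
    b"/Launch": "executable file launch",
    b"/Sound": "audio content",
    b"/Movie": "video content",
}


def find_forbidden_features(data: bytes):
    """Return a sorted list of rule labels whose marker appears in the bytes.

    Heuristic only: it looks for name tokens and does not parse the
    PDF object structure.
    """
    found = set()
    for marker, label in FORBIDDEN_MARKERS.items():
        # The lookahead stops "/JS" from matching a longer name like "/JSFoo".
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, data):
            found.add(label)
    return sorted(found)
```

An empty result only means that none of the listed tokens were seen. Requirements such as font embedding, device-independent color and metadata still have to be checked separately.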
Advantages of PDF/A
Electronic documents have many advantages over traditional archiving media such as paper or microfilm. Improved accessibility alone may justify implementing an electronic archive. Some advantages of a PDF archive over a TIFF or paper-based archive are:
- PDF stores objects (e.g. text, graphics), allowing an efficient full-text search across an entire archive. TIFF is a raster format, so its pages must first be processed with an OCR (optical character recognition) engine before their text can be searched.
- PDF files typically require only a fraction of the storage space of the original or TIFF files, without loss of quality. The smaller file size is especially advantageous for electronic file transfers (FTP, e-mail attachments, etc.).
- The PDF format can be optimized. Optimization can focus on images (e.g. scanned checks) or on extracting structured data (e.g. voucher information). TIFF treats all file information the same way.
Background
PDF/A began as a joint activity between NPES - The Association for Suppliers of Printing, Publishing and Converting Technologies, and the Association for Information and Image Management, International (AIIM International) to develop an international standard that defines the use of the Portable Document Format (PDF) for archiving and preserving documents. The goal was to address the growing need to archive documents electronically in a way that preserves their contents over an extended period of time and ensures that they can be retrieved and rendered with a consistent and predictable result in the future. This need exists in a growing number of government and industry segments worldwide, including legal systems, libraries, newspapers, and regulated industries.
From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License.
