The Future Text Project



Future Text Publishing Initiative : Rich PDF


We are working on vital aspects of digital publishing to support rich, open standards that work seamlessly with legacy documents. Our proposal is to distribute the resulting document as a PDF file, with the original document zipped and embedded in it, along with a neutral XML representation of the document. A user who has only a legacy PDF reader would be able to open the file and see the basic document. Someone who has the creator software would get the native document with full interaction opportunities, and other applications that support the XML specification would be able to extract more interactive data than a plain PDF would offer. Online archives hosting such a Rich PDF could detect the rich content and provide access to it via their own tools.


More Technical Description: We propose a mechanism in which any number of packages of data may be encoded within a PDF. Each package consists of a block of data, one or more codes that indicate how to interpret the data in the package, and an identifier for the package that is unique within the PDF. Multiple codes on one package allow it to be interpreted by both general tools and very specific tools. For example, a package containing the data behind a graph in the document would have one code to indicate that it is a CSV file and a second code to indicate that it is in the classic "first row is headings" layout. A second package has a code indicating that it links one or more parts of the visible document to data, and within its dataset it records that the location of the graph in the document is linked to the package containing the relevant CSV data. A naive tool can still discover the CSV data. A more advanced tool can link to it from the document interface. This approach allows the variety and innovation that will be required to achieve the full potential of the format.
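
The package mechanism described above could be modelled roughly as follows. This is a minimal sketch, not a defined format: the proposal does not specify how packages are serialised inside the PDF, so the `Package` class, the code strings (`text/csv`, `x-example/...`), the JSON layout of the link package and the location string `page-2/figure-1` are all assumptions made for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    package_id: str   # unique within the PDF
    codes: tuple      # one or more interpretation codes, general to specific
    data: bytes       # the raw block of data


# Hypothetical code names; the proposal does not define a registry of codes.
CSV = "text/csv"
CSV_HEADER_ROW = "x-example/csv-first-row-headings"
LINKS = "x-example/document-links"


def find_by_code(packages, code):
    """Return every package that advertises the given code."""
    return [p for p in packages if code in p.codes]


def read_csv_rows(package):
    """Parse a CSV package, using headings only if the package declares them."""
    text = package.data.decode("utf-8")
    if CSV_HEADER_ROW in package.codes:
        return list(csv.DictReader(io.StringIO(text)))
    return list(csv.reader(io.StringIO(text)))


def resolve_links(packages):
    """Map each linked document location to the package it points at."""
    by_id = {p.package_id: p for p in packages}
    resolved = {}
    for link_package in find_by_code(packages, LINKS):
        # Assumed layout: a JSON list of {"location": ..., "package_id": ...}.
        for entry in json.loads(link_package.data):
            resolved[entry["location"]] = by_id[entry["package_id"]]
    return resolved


# The graph example from the text: one data package, one linking package.
graph_data = Package(
    "pkg-1", (CSV, CSV_HEADER_ROW), b"year,value\n2019,3\n2020,5\n"
)
links = Package(
    "pkg-2",
    (LINKS,),
    json.dumps(
        [{"location": "page-2/figure-1", "package_id": "pkg-1"}]
    ).encode("utf-8"),
)
packages = [graph_data, links]
```

A naive tool would only call `find_by_code(packages, CSV)` and treat the result as plain CSV. A more advanced tool would call `resolve_links(packages)` to learn which part of the visible document each dataset belongs to, and `read_csv_rows` to make use of the more specific "first row is headings" code.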



Document IDs to augment Server Networks


Furthermore, one possibility to be looked into is for published documents to have unique IDs appended. If the server location where a document was originally hosted becomes unavailable, a document linking to it would automatically be able to perform a (Google) search for the document and present the user with a quick dialogue stating that the document was not found at the specified location but has been located at another location. The user can then choose, with a quick click, whether to open the document or search for it elsewhere.
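
The fallback behaviour described above might be sketched as follows. This is an illustration under stated assumptions rather than a proposed implementation: `resolve_document`, `Resolution`, the document ID format and the example URLs are all hypothetical, and the availability check and the search step are passed in as plain functions so that the sketch does not depend on any particular search service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Resolution:
    url: Optional[str]   # where the document can be opened, if anywhere
    message: str         # text for the quick dialogue shown to the user


def resolve_document(
    document_id: str,
    original_url: str,
    is_available: Callable[[str], bool],
    search_by_id: Callable[[str], Iterable[str]],
) -> Resolution:
    """Try the original location first, then fall back to a search by ID."""
    if is_available(original_url):
        return Resolution(original_url, "Document found at the specified location.")
    for candidate in search_by_id(document_id):
        if candidate != original_url and is_available(candidate):
            return Resolution(
                candidate,
                "The document was not found at the specified location "
                "but has been located at " + candidate + ".",
            )
    return Resolution(
        None,
        "The document was not found at the specified location "
        "or anywhere else that was searched.",
    )


# Stand-ins for a real availability check and a real search service.
live_urls = {"https://mirror.example.org/papers/ft-0001.pdf"}
search_index = {
    "ft-0001": [
        "https://old.example.org/papers/ft-0001.pdf",
        "https://mirror.example.org/papers/ft-0001.pdf",
    ]
}

result = resolve_document(
    "ft-0001",
    "https://old.example.org/papers/ft-0001.pdf",
    is_available=lambda url: url in live_urls,
    search_by_id=lambda doc_id: search_index.get(doc_id, []),
)
```

The returned `message` corresponds to the dialogue text; the choice between opening the located copy and searching elsewhere would be left to the reading application's interface.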