## Which is a better document standard: pdf or xml?

Some time ago I was asked by Oxford University Press to write an article for their Library Magazine about which document format is better: pdf or xml. I defended pdf, and my text follows below. Martin Fenner defended xml. You can download both contributions as a pdf file (!) here.

### Pdf is still strong
For scientists the pdf (Portable Document Format) standard, put forward by Adobe in 1993, is a blessing. Suddenly we no longer needed expensive PostScript printers, and from then on the output was both of high graphical quality and consistent. To this day, publishers of scientific content cannot afford not to supply a pdf version of their content. For commercial reasons, software companies and publishing companies are continuously looking for opportunities to kill the standard. Microsoft, a company that to this day has not managed to make its cash-cow word processor MS-Word produce consistent output, decided to invent its own open xml standard, Office Open XML. The machinations by Microsoft to get this standard accepted read like a soap opera. Adobe's pdf has been an open ISO standard since 2008. It is also the only file format that can reasonably be protected without the use of dedicated servers, although the increasing use of JavaScript makes it less safe.

### Supplementary material
Another line of attack by science publishers is to lure scientists away from the pdf file by convincing the scientific community that important additional material (video, interviews, photographs and large datasets) can be packaged more conveniently in a different format. Preferably this is a proprietary xhtml-like format that, because it includes proprietary server-side scripts, is bound to the publisher's website. Many of these additions are unnecessary or even unwanted. As for large datasets, scientific collaborations find their own ways of sharing data and need no assistance from commercial publishers.

It is amusing that the state-of-the-art (x)html versions of scientific manuscripts shown on present-day publishers' websites are still of much lower quality than the pdf files of 1993.

### Improved pdf files
In my opinion, scientists are happy with the proliferation of the pdf file. There is, however, some room for improvement, and there are signs of danger. Pdf files support bookmarks and hyperlinks, which are useful for navigating large documents; I would like authors and publishers to use them more often. The danger is that Adobe keeps updating and complicating its Adobe Reader and Adobe Acrobat; JavaScript, for instance, is now allowed in pdf.

The answer to the question in the title is of course: pdf.

1. 20 Sep 2012 19:39, Jean Luc Lebrun

As an aside, the header of the comment box I am typing this comment into reads: "XHTML: You can use these tags:…"
Now to your question. Yes, there is always the danger that proprietary scripts get added, and I would normally have voted for PDF. But if the pdf format is so non-proprietary, why do all the non-Adobe pdf-to-doc converters out there have such a hard time producing a 100% perfect conversion? I work on an open-source project (SWAN, the Scientific Writing Assistant, Joensuu University) that assesses the quality of a scientific paper before submission. Our team has a hard time importing a pdf document and extracting its structure automatically. There is no such problem with .odt files, though. OpenOffice is really "Open".

2. 6 Feb 2013 6:08, Jan Bakker

PDF is better for contracts, but XML offers a lot more editing possibilities. I can't really make a judgement on this one.

