Ad Lagendijk, 20 September 2012

Which is a better document standard: pdf or xml?

Posted in Web 2.0

Some time ago Oxford University Press asked me to write an article for their Library Magazine on which document format is better: pdf or xml. I defended pdf, and my text follows. Martin Fenner defended xml. You can download both contributions as a pdf file (!) here.

Pdf is still strong
For scientists the pdf (Portable Document Format) standard, put forward in 1993 by Adobe, is a blessing. Suddenly we no longer needed expensive Postscript printers, and from then on the output was both consistent and of high graphical quality. To this day publishers of scientific content cannot afford to omit a pdf version of their content. For commercial reasons software companies and publishing companies are continuously looking for opportunities to kill the standard. Microsoft, a company that has so far been unable to make its cash-cow word processor MS-Word produce consistent output, decided to invent an open xml standard, Office Open XML. Microsoft’s machinations to get this standard accepted read like a soap opera. Adobe’s pdf has been an open ISO standard since 2008. It is also the only file format that can reasonably be protected without the use of dedicated servers, although the increased use of Javascript makes it less safe.

Supplementary material
Another line of attack by science publishers is to lure the scientist away from the pdf file by convincing the scientific community that important additional scientific material – like video, interviews, photographs and large datasets – can be more conveniently packaged in a different format, preferably a proprietary xhtml-like format that, due to the inclusion of proprietary server-side scripts, will be bound to the web site of the publisher. Many of the additions are unnecessary or even unwanted. As for large datasets, scientific collaborations find their own ways of sharing data and need no assistance from commercial publishers.

The amusing fact is that the state-of-the-art (x)html versions of scientific manuscripts shown on present-day publishers’ websites are still of much lower quality than the pdf files of 1993.

Improved pdf files
In my opinion scientists are happy with the proliferation of the pdf file. There is some room for improvement, and there are signs of danger. Pdf files allow for bookmarks and hyperlinks, which are useful for navigating through large pdf files. I would like to see authors and publishers implement them more often. The danger is that Adobe is continuously updating and complicating its Adobe Reader and Adobe Acrobat. Javascript is now allowed in pdf.
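For readers who want to check their own files, here is a rough sketch in Python of how one might test whether a pdf appears to contain bookmarks, hyperlinks or Javascript. It only searches the raw bytes for the name tokens `/Outlines`, `/URI`, `/JavaScript` and `/JS`. The function names `scan_pdf_bytes` and `scan_pdf_file` are made up for this illustration. It is a heuristic and not a pdf parser: names stored inside compressed object streams, or written with `#` hex escapes, will be missed.

```python
import re

# Name tokens to look for in the raw bytes of a pdf file.
# Heuristic only: compressed object streams can hide these names.
MARKERS = {
    "bookmarks": rb"/Outlines\b",            # document outline (bookmark tree)
    "hyperlinks": rb"/URI\b",                # URI actions behind link annotations
    "javascript": rb"/(?:JavaScript|JS)\b",  # Javascript actions
}


def scan_pdf_bytes(data: bytes) -> dict:
    """Report, per feature, whether its marker occurs anywhere in the bytes."""
    return {
        label: re.search(pattern, data) is not None
        for label, pattern in MARKERS.items()
    }


def scan_pdf_file(path: str) -> dict:
    """Read a pdf file from disk and scan its raw bytes."""
    with open(path, "rb") as handle:
        return scan_pdf_bytes(handle.read())
```

Calling `scan_pdf_file("paper.pdf")` (a hypothetical file name) returns a small dictionary of true/false values. A `False` for bookmarks in a long document is a hint that the navigation aids are missing, and a `True` for Javascript is a reason to look more closely before opening the file.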

The answer to the question in the title is of course: pdf.

- - - - - -
  1. Unregistered

    20 Sep 2012 19:39, Jean Luc Lebrun

    As an aside, the header of the comment box I am entering this comment into reads: “XHTML: You can use these tags:…
    Now to your question. Yes, there is always the danger of the addition of proprietary scripts, and I would normally have voted for PDF. But if the pdf format were so non-proprietary, why are all the non-Adobe pdf-to-doc file converters out there having such a hard time producing a 100% perfect conversion? I work on an open-source project (SWAN, the Scientific Writing Assistant, Joensuu University) that assesses the quality of a scientific paper before submission. Our team has a very hard time importing a pdf document to extract its structure automatically. No problem with .odt files though. OpenOffice is really “Open”.

  2. Ad Lagendijk

    20 Sep 2012 20:21, Ad Lagendijk

    Jean Luc, thanks very much.

    Importing from pdf indeed seems very difficult. Even Adobe itself finds it difficult, as its InDesign does not allow you to import a pdf file. Importing an MS Word doc into InDesign is much easier. Shouldn’t you compare odt to doc? I think pdf was never meant to be a text processor. It was a converter right from the start. What I learn from you is that the conversion is basically a one-way process.

    Exporting to pdf seems much easier as many programs do this quite well.

  3. Unregistered

    6 Feb 2013 6:08, Jan Bakker

    PDF is better for contracts, but XML has a lot more editing possibilities. Can't really make a judgement on this one.

  4. Unregistered

    20 Feb 2015 7:00, Marko

    I personally use and prefer PDF files since they are far more useful when it comes to using them on several devices. Xml cannot be opened on tablets and phones without additional apps, while PDF, in most cases, can. Thanks for the great article though 😉
