Ad Lagendijk, 7 March 2010

Making good pdf files with MS Word

Posted in Technical (ms word, tex), Tips

Office document or (La)Tex
Creating a pdf file
Additional requirements
Bookmark generation with MS Word
Save to PDF
Show your tree
Skip numbered entries
Different numbering of the same level
Thinning your pdf file
Fast web view
Bookmarks with Latex
Example pdf files
Recap

Adobe has blessed the digital world with a document format that is truly platform independent: the pdf (“portable document format”). As a result, pdf has become the leading format for scientific articles. Scientific journal editors require prospective authors to submit their manuscripts as pdf documents, and more and more grant organizations also ask for proposals to be sent in as pdf files.


Office document or (La)Tex
The pdf standard has many advantages. Its major disadvantage is that a pdf file is hard to edit. As a result authors have to keep at least two files of a manuscript in sync: the “source” manuscript and the pdf version. Additional graphic material is either embedded in the source or kept as separate files. The most popular formatter for generating the source is Microsoft’s Word, or a comparable office document formatter. Another approach is to use a simple ascii file containing script language directives that are interpreted or compiled. Of the latter class the Tex, LaTex, AmsTex family is the most popular. For some fortunate reason the Tex-family has escaped the fate of almost every open-source project: multiple distributions, bad documentation, incompatible versions, and a Windows-hostile developer community.

In office environments where (parts of) documents are written or corrected by non-scientists, the Tex-way is a no-go area. No secretary is willing to learn AmsTex. When grant organizations prescribe the formatting of their forms, escaping MS Word is also impossible. The availability of a good WYSIWYG editor for the Tex-family could bring this family into the office environment. In an upcoming post I will write about the WYSIWYG Tex-editors Scientific Word and Lyx. Unfortunately both editors fail in many respects.

Happily MS Word, or a comparable office document formatter, will often do the job, even for scientific papers, as long as a mathematical formula is needed only occasionally. All scientific journals accept manuscripts composed with MS Word. In the following I will assume you work on a Windows platform.


Creating a pdf file
To create a pdf file we need a program that “translates” a source file into a pdf file. And we would like to be able to polish up the pdf a little bit.

To make the pdf file readable on as many computers as possible, a good pdf file has all fonts embedded in the file (this makes the file larger).



Additional requirements
More and more people are discovering how convenient pdf bookmarks are for quick navigation through multi-page pdf files. Your professional pdf files should have many bookmarks. I will show in this post how to make these bookmarks. To my disappointment many pdf writer programs, including Adobe’s older pdf printer drivers and many other pdf printer drivers, do not generate bookmarks. Adobe’s newest printer driver, “Adobe pdf converter”, part of Acrobat, does generate bookmarks. In what follows I will use the built-in Save as PDF in MS Word 2007.


Bookmark generation with MS Word
You must prepare your MS Word document with the styles Heading 1, Heading 2 etc., or with styles that are derived from these heading styles. Assign a heading level to each paragraph that should become a heading. Right-click on the heading’s number and you can adjust the numbering style of all headings: like A.2.3 or 1.b.III, for a three-level deep hierarchy. The numbering of MS Word is fragile and buggy, even in Word 2007, but the final result can have a professional look. The outline numbering of MS Word is superior to that of the Tex-family, because with only one mouse click you can change the hierarchical level of one or more paragraphs. I find this extremely handy.
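To make the two example styles concrete, here is a minimal Python sketch of how a per-level numbering style turns the position of a heading in the hierarchy into a label such as A.2.3 or 1.b.III. It does not use or imitate any MS Word API. The function names and the two style tuples are my own illustrative assumptions.

```python
from typing import Callable, Sequence


def to_alpha(n: int, upper: bool = True) -> str:
    """Convert 1 -> A, 2 -> B, ..., 26 -> Z, 27 -> AA."""
    if n < 1:
        raise ValueError("counter must be at least 1")
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters if upper else letters.lower()


def to_roman(n: int) -> str:
    """Convert 1..3999 to an upper-case Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("counter out of range for Roman numerals")
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in pairs:
        while n >= value:
            out += symbol
            n -= value
    return out


def format_number(counters: Sequence[int],
                  styles: Sequence[Callable[[int], str]]) -> str:
    """Apply one formatter per level and join the parts with dots."""
    return ".".join(style(c) for style, c in zip(styles, counters))


# Hypothetical style definitions for a three-level hierarchy.
STYLE_LETTER_FIRST = (to_alpha, str, str)                               # A.2.3
STYLE_MIXED = (str, lambda n: to_alpha(n, upper=False), to_roman)       # 1.b.III

if __name__ == "__main__":
    # Third sub-subheading of the second subheading of the first heading.
    print(format_number((1, 2, 3), STYLE_LETTER_FIRST))
    print(format_number((1, 2, 3), STYLE_MIXED))
```

The point of the sketch is that the counters (1, 2, 3) are the same in both cases. Only the formatter attached to each level differs, which is why changing the style in MS Word changes all headings at once.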


Save to PDF
To create a pdf file with bookmarks, you must use MS Word 2007, because earlier versions of MS Word did not have their own “Save As PDF” option. In MS Word 2007 this saving to pdf is built-in and it is of good quality. If you have recently installed an Adobe program chances are that you have an Adobe printer driver installed with a name like “Adobe PDF”. If the version of this printer (driver) is new enough it also has an option to convert headings to bookmarks.

Before you save your MS Word document to PDF you have to do something else. MS Word is smart enough to keep the numbering of the headings stored as macros. If you save such a file to PDF the bookmarks will be there, but they will contain only the titles of the headings, not their numbers. You should flatten these macros to text. You can do this easily, but first you have to save the source file under a different name, for instance old_name_flat.doc. This is an indispensable safety precaution because the conversion of macros to text is irreversible, and you want to keep the file without the flattened macros. PLEASE DO NOT FORGET.
Flatten headings: Go to Developer. Click on Visual Basic. The Visual Basic window will show. Go to View and click Immediate Window. The “Immediate window” will be shown. Type in that window:
ActiveDocument.ConvertNumbersToText
and press Enter. The file buffer will now contain your document with the heading numbers as text. Do not make the mistake of editing that file. (If you see errors, reload the unflattened document version and edit that file.) Now you can Save As PDF or XPS. Click Options, check Create bookmarks using: and select Headings. Check Bitmap text when fonts may not be embedded. Click Publish. The pdf file will be generated, and to my surprise awfully quickly. But you are still not done.


Show your tree
Even if you have never seen a bookmark navigation pane before, once you see one in a pdf reading program you know immediately how to use it. But if you have never seen such a pane, or hardly ever used it, you will only use it if it is *shown* by the pdf reading program. You can force this showing.

Any pdf file contains a number of directives to tell the reading programs how it wants itself to be displayed. If these directives are not set, the reading program will use its own defaults. Well-developed reading programs will never override these directives if they are set.

The directive that is essential for us controls whether the navigation pane with the bookmarks is shown when the pdf file is opened. Of course any human reader can force the reading program to open the navigation pane, but human readers often do not know of this option. If the show-navigation-pane directive is not set in the pdf file, some reading programs, notably Adobe Reader, will not show the navigation tree, whereas other programs, notably Foxit, do show the tree (only if bookmarks are present, of course).

Navigation pane: Open the pdf file in Acrobat, go to File, then Properties, then Initial View, set Navigation tab to Bookmarks Panel and Page, and save the file. Then open it with an independent reading program to check that the navigation pane is present. With Acrobat you can even decide which bookmarks will be collapsed and which will be expanded in the navigation tree. It is quite likely that other programs, besides Acrobat, also allow for setting the initial opening of bookmarks.
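If you want to check this directive without opening a reader, a rough check is possible. In the pdf format the setting is stored as a /PageMode entry in the document catalog, and the value /UseOutlines asks the reader to show the bookmarks panel. The Python sketch below simply scans the raw bytes for that entry. It is a heuristic, not a pdf parser: it assumes the catalog is stored uncompressed, and it will find nothing in files that keep their objects in compressed object streams. The function names are my own.

```python
import re
from typing import Optional

# Matches e.g. "/PageMode /UseOutlines" or "/PageMode/UseNone".
_PAGE_MODE = re.compile(rb"/PageMode\s*/([A-Za-z]+)")


def declared_page_mode(pdf_bytes: bytes) -> Optional[str]:
    """Return the last /PageMode value found in the raw bytes, or None.

    The last match is used because a file that was edited and saved
    incrementally appends its newer objects at the end.
    """
    matches = _PAGE_MODE.findall(pdf_bytes)
    return matches[-1].decode("ascii") if matches else None


def opens_with_bookmarks(pdf_bytes: bytes) -> bool:
    """True if the file asks the reader to show the bookmarks panel."""
    return declared_page_mode(pdf_bytes) == "UseOutlines"


if __name__ == "__main__":
    # Synthetic fragment standing in for a real file's contents.
    sample = (b"%PDF-1.4\n1 0 obj\n"
              b"<< /Type /Catalog /Pages 2 0 R /PageMode /UseOutlines >>\n"
              b"endobj\n")
    print(opens_with_bookmarks(sample))
```

For a real file you would pass in its contents, for example the result of reading it in binary mode. A result of None only means that no uncompressed /PageMode entry was found, so the check in an independent reading program remains the reliable test.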


Skip numbered entries
It is not uncommon that a grant organization asks for information, in a numbered outline, that is not relevant for all applicants. One solution is to include the irrelevant heading and enter “Not applicable” under it, for instance when they ask for descriptions of your patents and you have none. This approach of “place holder headings” is bad practice. First, it takes space, and since the proposal will have severe space limitations you do not want to waste any. Second, you do not want to emphasize that you do not have patents. If you leave out the irrelevant heading, however, your numbering will be out of step with the format prescribed by the grant organization. MS Word does have a facility to skip numbers, but it does not work consistently and has severe unwanted side effects.
Hiding: The solution is simple. You introduce the “non-applicable” heading, but make its font hidden. MS Word will count it but not show it. Do this in the flattened version. Now you are done. After you Save as PDF or XPS you will have the right bookmarks, without the non-applicable headings and with the right numbering. Exactly what you wanted.
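The logic of the trick can be illustrated with a small Python model. This is only a sketch of the counting behaviour described above, not of MS Word itself: every heading advances the counter of its level, but hidden headings are left out of the list of bookmarks. The Heading class and the example titles are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Heading:
    level: int            # 1 = top level, 2 = one level deeper, ...
    title: str
    hidden: bool = False  # hidden headings are counted but not shown


def visible_bookmarks(headings: List[Heading]) -> List[str]:
    """Number all headings, then keep only the visible ones."""
    counters: List[int] = []
    bookmarks: List[str] = []
    for heading in headings:
        # A new heading resets the counters of all deeper levels.
        del counters[heading.level:]
        while len(counters) < heading.level:
            counters.append(0)
        counters[heading.level - 1] += 1
        if not heading.hidden:
            number = ".".join(str(c) for c in counters)
            bookmarks.append(f"{number} {heading.title}")
    return bookmarks


if __name__ == "__main__":
    outline = [
        Heading(1, "Project description"),
        Heading(1, "Patents", hidden=True),  # not applicable, so hidden
        Heading(1, "Budget"),
        Heading(2, "Personnel"),
    ]
    for line in visible_bookmarks(outline):
        print(line)
```

In this model the hidden “Patents” heading still consumes number 2, so “Budget” keeps the number 3 that the prescribed format expects.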


Different numbering of the same level
I have noticed recently that the European Research Council, the European grant organization that is quickly becoming very important, makes it difficult to implement their prescribed numbering of hierarchical levels. In one part of their prescription a level is “numbered” with a number and somewhere else the same level is numbered with letters. MS Word cannot deal with this. Period. The only solution is to partition the mother file into subfiles, make pdf files of the subfiles and combine the sub-pdf files, for instance in Acrobat. After you have done so, you will have to edit the root bookmarks of the individual subfiles in the combined file, as they contain the awkward names of the individual files. I always do this bookmark editing with Acrobat.


Thinning your pdf file
The pdf files can be bulky. Acrobat has a very simple tool, Optimizer, that easily reduces the size of a pdf file by 70 to 80% without loss of quality, as far as I can see.


Fast web view
If you put your pdf file on-line it should be optimized for fast web view (with, for instance, Acrobat). This option is often on by default in pdf creation programs. Pdf files optimized for fast web view are linearized: they can be shown page by page while the rest of the file is still downloading, which is good for slow Internet connections.
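Whether a file has been saved for fast web view can be checked roughly from the file itself. A pdf file prepared this way starts with a small dictionary containing a /Linearized entry, and the pdf specification requires that dictionary to lie within the first 1024 bytes. The Python sketch below tests only for the presence of that marker. The function name is my own, and the check is a heuristic: a file that was edited and saved incrementally after linearization can still carry the marker without being usable page by page.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic: True if a /Linearized marker sits in the first 1024 bytes."""
    head = pdf_bytes[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


if __name__ == "__main__":
    # Synthetic fragments standing in for the start of real files.
    fast = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 12345 /N 3 >>\nendobj\n"
    plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    print(looks_linearized(fast), looks_linearized(plain))
```

Because only the first 1024 bytes matter, this check does not require reading the whole file.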


Bookmarks with Latex
With Latex the generation and editing of bookmarks is relatively easy using the hyperref package. No Acrobat is needed.


Example pdf files
I have prepared an example pdf file that has numbered bookmarks, has a skipped heading, and opens with the navigation pane visible.


Recap

  1. In MS Word 2007 use Heading styles
  2. Set your style of outlined numbering
  3. Use the macro ActiveDocument.ConvertNumbersToText
  4. Save As PDF or XPS with headings as bookmarks
  5. Edit the pdf file with Acrobat and enforce the pdf file to be opened with the navigation pane showing
  6. Run optimizer to reduce its size
- - - - - -
  1. Unregistered

    7 Jun 2011 20:11, Hans van Leunen

    I know how to create a scientific paper in MS Word 2010. I use the equation editor extensively. I designed a way to add equation numbers via end-of-line text boxes. I have moved a set of commands to the ribbon at the top of the window. This eases the inline usage of equation parts. I find the font types inside MS Word acceptable. Cambria Math is very flexible. However, some annoying problems exist. Conversion to html is not reliable. You are often required to re-edit the result. And I do not want to publish in html. You never know what the browser does. Conversion to pdf is reliable, but after conversion the fonts look bad. Including fonts via the options settings does not give any relief. Publishers of scientific papers seem to have similar problems with Word and ask you to put equations in Courier New. Word does not allow that. It uses Cambria Math and that’s it!
    It would be nice to have a better pdf printer than the one included with Word 2010. See examples on my website or for example http://www.vixra.org/abs/1101.0055.

  2. Unregistered

    4 May 2012 6:19, Entourage

    This is very very useful information and it helped me a lot.
