Ad Lagendijk, 4 February 2009

Software, like EndNote, for managing references is basically trash

Posted in Getting published, Technical (ms word, tex), Tips, useful software, Web 2.0

Every scientist has to cope with the problem of managing references (or citations, or notes, or literature, or whatever you call it). When writing his second paper he discovers that he has to retype a number of references he already typed in for his first paper. This repetitive work calls for a repository of references. In an ideal world many group members submit their references to this repository, and after some time a very efficient storage medium has been built.

Pitfalls
Alas. The real world is never like this, for many reasons. Typos in entries will live forever, or will give rise to duplicate entries. Incomplete entries reduce the usefulness of the database. Inconsistent use of case (uppercase, lowercase, title case) causes a mess. Different spellings of a name lead to duplicate entries, or to angry readers who find their name misspelled in the reference list of an article in a high-impact journal. Many programs (or ‘wizards’) that import references cannot deal with extended characters (let alone Unicode). Names with diacritics (like umlauts) are handled either inconsistently or wrongly. Partitioning names into initials, first names and last names is full of traps, and many import filters fall into them. Typical in this respect is the following error in the book LaTeX by Leslie Lamport (an excellent book and an excellent macro package, of course): on page 141 (in the chapter “The Bibliography Database”) Lamport discusses “von Beethoven, Ludwig”. The name is of course Ludwig van Beethoven, as the family is of Flemish origin. And indeed “van” is not his middle name.
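To make the name-splitting trap concrete, here is a minimal Python sketch. It assumes names written in Western order (given names first) and a small, hypothetical list of particles such as “van” and “de”; a real import filter would need a much larger list and would still get some names wrong. It also composes diacritics to Unicode NFC form, so that a “ü” typed as one character and a “ü” typed as “u” plus a combining umlaut no longer produce two different entries.

```python
import unicodedata
from typing import NamedTuple

# Hypothetical, deliberately incomplete list of name particles
# (an assumption of this sketch, not a standard list).
PARTICLES = {"van", "von", "de", "der", "den", "la", "le", "du", "di", "da"}


class Name(NamedTuple):
    given: str
    particle: str
    family: str


def normalize(text: str) -> str:
    """Compose diacritics (Unicode NFC) so 'u' + combining umlaut equals 'ü'."""
    return unicodedata.normalize("NFC", text).strip()


def split_name(full: str) -> Name:
    """Split a name written as 'Given [particles] Family'.

    Assumes the family name is the last token and that any run of known
    particles (matched case-insensitively) directly before it belongs to
    the family name. At least one token is always kept as a given name,
    so 'Van Morrison' keeps 'Van' as a given name.
    """
    tokens = normalize(full).split()
    if len(tokens) <= 1:
        return Name("", "", tokens[0] if tokens else "")
    i = len(tokens) - 1
    while i > 1 and tokens[i - 1].lower() in PARTICLES:
        i -= 1
    return Name(" ".join(tokens[:i]), " ".join(tokens[i:-1]), tokens[-1])
```

With this sketch, split_name("Ludwig van Beethoven") yields the given name “Ludwig”, the particle “van” and the family name “Beethoven”, so “van” never ends up as a middle name.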

Science social networks
Given the success of LinkedIn and Facebook in building old-boys' networks, quite a number of entrepreneurs are trying to capture the social activities of scientists on their own social sites. A few examples of social sites aimed at scientists: Lalisio, ResearchGATE, SciBog, SciLink, SciMeet and SciSpace. The site SciTechNet collects many of them. More collaborations and more sharing of data are cited as advantages of participating in such a network. A common feature of all those sites is that a participating scientist can maintain a personal profile, into which he can, among other things, import his publication list. An up-to-date publication list is of utmost importance for the career of a scientist. I bet that any social site that gets this publication list complete will attract the majority of scientists. Unfortunately, at all those sites the imported list is far from complete and full of errors (for all the reasons I mentioned earlier). In addition, the editing facilities are unwieldy.

ISI Web of Science
The Web of Science is a product of Thomson Reuters. Given the vast resources of that company you would expect its web interfaces to be highly professional and comparable in quality to Google's. But its monopoly, the citation database, gives it such a comfortable position that it can get away with amateurish web interfaces. Let me make just a few comments on the quality of the database. The same author regularly appears under different names (three different names is not an exception). ISI made the stupid mistake, typical of Americans unable to deal with names of foreign origin, of merging the prefixes of last names into the last names. So a scientist called “Johannes de Boer”, whose first name is Johannes and whose last name is “de Boer”, is classified either as “deBoer” or as “DEBOER”. This is even dumber if you realize that people whose last name really is “deBoer” or “DeBoer” also exist, and also appear in their database. In addition, ISI has abbreviated the names of the authors' institutions, although the full names of the institutions were on the papers they store in their database. So why don't they clean up this mess? Why? Read what Wikipedia says about monopolies.
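The “DEBOER” problem can be reproduced in a few lines of Python. The function naive_key below only mimics the behaviour described above (it is not ISI's actual code): it glues the prefix onto the last name and upper-cases the result, so two different people get the same key. The hypothetical careful_key keeps the particle as a separate field and folds case only for matching.

```python
from typing import Tuple


def naive_key(family: str) -> str:
    """Mimics the behaviour criticised above (not ISI's real code):
    glue any prefix onto the last name and upper-case the result."""
    return family.replace(" ", "").upper()


def careful_key(particle: str, family: str) -> Tuple[str, str]:
    """Hypothetical alternative: keep the particle as its own field.

    Case is folded only for matching; the original spelling should be
    stored next to the key and used for display.
    """
    return (particle.casefold(), family.casefold())


# Two different people collide under the naive key ...
assert naive_key("de Boer") == naive_key("DeBoer") == "DEBOER"
# ... but stay distinct under the careful key.
assert careful_key("de", "Boer") != careful_key("", "DeBoer")
```

The design choice is simply not to throw information away at import time: once “de Boer” has been flattened to “DEBOER”, no later cleanup can tell the two people apart again.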

EndNote
ISI has acquired EndNote and is (still) attempting to integrate this software product into its Web of Science. Academic institutions can obtain EndNote at a very low price. Even at that price I do not like this poor software package. The Windows version seems to have been designed 10 years ago. Primitive interface. A very primitive MDI (Multiple Document Interface) layout with inflexible string grids. It is unclear how Unicode is handled. Poor context menus (not even a change of case of the selected text is offered). Very poor hyperlinks to journals. Etc. And that from a commercial company with unlimited resources. Maybe the free, open-source product Zotero is an interesting alternative. As always: if an open-source product is good, it will be attacked by the commercial market leader. So the Thomson Scientific division of Thomson Reuters, maintainer of the EndNote software, is suing George Mason University over Zotero. The fight is over the alleged proprietary file format of EndNote’s library files. Program developers (like those of EndNote) who use non-ASCII (binary) data files have something to hide, often their lack of competence.
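By contrast, a plain-text exchange format hides nothing. RIS, a tagged text format that EndNote, Zotero and many other reference managers can import, stores one field per line as a two-character tag, two spaces, a hyphen and a space, followed by the value, and closes each record with an ER line. The parser below is a minimal sketch under exactly those assumptions; it ignores continuation lines and the other irregularities found in real-world files.

```python
import re
from typing import Dict, List

# One RIS line: two-character tag, two spaces, hyphen, optional space + value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Parse RIS text into records mapping tag -> list of values.

    Repeated tags (e.g. several AU author lines) are kept in order.
    A record ends at an 'ER' line. Lines that do not match the tag
    pattern are skipped in this sketch.
    """
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        m = RIS_LINE.match(line.rstrip())
        if not m:
            continue
        tag, value = m.group(1), (m.group(2) or "").strip()
        if tag == "ER":
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


sample = (
    "TY  - JOUR\n"
    "AU  - de Boer, Johannes\n"
    "TI  - Hypothetical example record\n"
    "PY  - 2009\n"
    "ER  - \n"
)
print(parse_ris(sample))
```

Anyone can open such a file in a text editor, see exactly how an author name was stored, and correct it.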

ISI ambition
The long-run ambition of ISI is clearly to expand its site into *the* social site for scientists. Given that its citation database is a very good starting point, I think that if they are really serious they will win. Unless some other giant tries to grab this very interesting market. Or unless the academic community gets its act together.

Request for Comment Needed
How can we end this mess of different and incomplete storage formats for bibliographic information? Learned librarian societies have already put forward a lot of instructions. But these are very complicated, not well suited to modern computer implementation, and very general, as they cover all literature and all books. Let us stick to the sciences.

Compare how internet communication has been developed and standardized. More than thirty years ago a number of very smart people were already thinking about exchanging information between computers and networks running totally different operating systems (and even different character sets). The rules were developed in a set of voluntary standards called Requests for Comments. And these rules still work today. No company, not even Microsoft or Google, can afford to ignore them.

I think we should have a few Requests for Comments on how to deal with bibliographic records: how to deal with multiple author names, first names or initials, and how to deal with Chinese author names. If they succeed, ISI would have to rework its whole database. And we would no longer have this endless editing and correcting of imported records.
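To illustrate what such a Request for Comments could pin down, here is a hypothetical author record in Python. The field names (given, family, particle, family_first) are my own assumptions, not part of any existing standard. The point is that every part of the name is stored explicitly, so no importer has to guess, and that display order (family name first, as is customary for Chinese names) is a property of the record rather than something inferred from a string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    """Hypothetical author record; field names are assumptions, not a standard."""
    given: str                  # given names in full, e.g. "Jan Willem"; "" if unknown
    family: str                 # family name without particle, e.g. "Boer"
    particle: str = ""          # e.g. "de" or "van der", kept separate
    family_first: bool = False  # True where the family name is written first, e.g. Chinese

    def full_family(self) -> str:
        """Family name including its particle, e.g. 'de Boer'."""
        return f"{self.particle} {self.family}".strip()

    def initials(self) -> str:
        """'Jan Willem' -> 'J. W.'; hyphenated 'Jean-Pierre' -> 'J.-P.'"""
        words = []
        for word in self.given.split():
            parts = [p[0] + "." for p in word.split("-") if p]
            words.append("-".join(parts))
        return " ".join(words)

    def display(self) -> str:
        """Full name in its customary order."""
        if self.family_first:
            return f"{self.full_family()} {self.given}".strip()
        return f"{self.given} {self.full_family()}".strip()

    def citation(self) -> str:
        """Reference-list form, e.g. 'de Boer, J.'"""
        initials = self.initials()
        return f"{self.full_family()}, {initials}" if initials else self.full_family()
```

For example, Author("Johannes", "Boer", particle="de").citation() gives "de Boer, J.", and Author("Xiaoming", "Wang", family_first=True).display() gives "Wang Xiaoming". Initials are derived from the full given names rather than stored, so they can never disagree with them.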

Unique Author Identifier
Another helpful resource would be a database in which each author of a scientific paper gets a unique identifier and in which the correct spelling of his name can be found. Only the author himself (or a legal representative), through a secure connection and digitally signed communication, would be allowed to edit his entry. Databases like ISI's would then have to replace authors' names by this unique identifier.
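Such an identifier should also protect itself against typos, the very problem that started this post. One option is a check character, as in ISBNs. The Python sketch below assumes a 16-character identifier written as four groups of four, whose last character is an ISO 7064 MOD 11-2 check character over the first 15 digits (the scheme later adopted by ORCID); the format is an assumption for illustration only.

```python
def check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over a string of decimal digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 cannot be written as one digit, so 'X' is used.
    return "X" if result == 10 else str(result)


def is_valid(identifier: str) -> bool:
    """Validate an identifier formatted as 0000-0000-0000-000C (assumed format)."""
    compact = identifier.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return check_char(compact[:15]) == compact[15].upper()
```

With this sketch, is_valid("0000-0002-1825-0097") returns True, while changing the last digit to 8 makes it return False. A database could refuse any identifier that fails this test before it ever reaches a reference list.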

- - - - - -
  1. Unregistered

    4 Feb 2009 22:21, Klaas Wijnne

    Well, at least papers now have DOIs. As for names, there is always the option of changing one’s name to something more Anglo-Saxon. That’s what I did.

  2. Unregistered

    5 Feb 2009 11:09, Jacopo Bertolotti

    What about the “Wiki” way? Let’s assume that somewhere there is a server with the MediaWiki software installed. Scientists could get the right to edit this wiki only once their identity has been clearly established (this is far from difficult, since each of us personally knows a large number of other scientists) and would then log in with their real first name and surname. At that point they can create and edit a database of references, correcting mistakes and adding missing entries.
    Assuming that scientists can do it without damaging the references of their competitors this would provide an open-access and self-correcting database.
    Finding the server space is easy, and installing the MediaWiki software with all the necessary plug-ins and flags correctly set is even simpler. The only big problem would be (imho) building up a critical mass of users.

  3. Unregistered

    5 Feb 2009 20:55, Bruce

    Quick correction:

    The fight is over the alleged proprietary file format of EndNote’s library files.

    No; the fight is over EndNote’s citation styling configuration files.

  4. Unregistered

    17 Feb 2009 17:29, Antonio Neves

    It would be better to have a DOI for scientists; it could then be called a DSI, “Digital Scientist Identifier”. Provided it doesn’t get sued by the DOI Foundation.

  5. Unregistered

    1 Mar 2009 10:52, Bram van Ginneken

    I miss a reference to Scopus here, Elsevier’s alternative to Thomson Reuters’ Web of Science. It’s becoming a good alternative. Google Scholar is also a good tool, but it has a different focus. Scopus has already spurred ISI to improve their website and to include more and more journals and conferences in their database. In my field (medical), PubMed has emerged as a very well maintained and easy-to-use tool for searching the literature. Many researchers don’t bother with a homepage (or newfangled alternative) with an up-to-date publication list; they just refer people to PubMed. If you have a common name, you can design a search query that corrects for that. But the Unique Author Identifier would definitely be a big step forward.

    I think the blog post covers too many topics. Software for managing references is just one small part of the bigger topic of organizing scientific papers. We use JabRef to maintain a big bib file in my group, with most entries slurped directly from PubMed, and I’m quite happy with that.

  6. Unregistered

    27 Mar 2009 10:17, Jacopo Bertolotti

    In today’s issue of Science there is an article on the research identification number: http://www.sciencemag.org/cgi/content/full/323/5922/1662?rss=1

  7. Ad Lagendijk

    9 Apr 2009 18:54, Ad Lagendijk

    @Bram
    We also used JabRef but abandoned it for reasons I can’t remember right now. I agree that more and more companies/organizations supply bibliometric information. But it has to be really good (that is, almost complete, let us say 99.9%) and it has to go back quite far in history.

  8. Unregistered

    23 Apr 2009 14:43, Witek

    Hello! I agree with the author that names are often misspelled in bibliographic databases. Just today I spent time pulling bibliographic data from ISI, IEEE, EBSCO, ScienceDirect and Emerald. Names with foreign letters are westernised by some and not by others. Quite a lot of time was spent on corrections. Have you tried Zotero?

  9. Ad Lagendijk

    23 Apr 2009 19:57, Ad Lagendijk

    I am happy that the internet is more than the US-ASCII character set, although many Western developers pretend not to know that. I love Unicode and UTF-8. We do not use Zotero yet; we have developed our own PHP/MySQL database system. But maybe it is about time our group tried Zotero.

  10. Unregistered

    11 Sep 2009 1:55, David Stern

    ISI have introduced “ResearcherID”, but it doesn’t really seem to interact with the main Web of Science database at all. Claiming my articles in ResearcherID doesn’t add a tag to those records, so other people searching the database cannot work out which articles written by “D Stern” are mine. There is also academia.edu now.

  11. Unregistered

    24 Aug 2010 15:30, Witek

    Have you tried Mendeley? Looks like an interesting alternative to EndNote.

  12. Unregistered

    26 Sep 2012 15:40, Carlos Rivera

    Thankfully we have Mendeley now.