Organizing your results

Otto Muskens 17 July 2010

Tags: Administration, Management, research
Posted in Research and education, Tips, Web 2.0

When you start your career as a postgraduate student, it is normal to collect your scientific results in a slightly unorganized way. As time proceeds, however, some basic rules are needed to keep track of your work. Every scientist has to develop their own system for keeping organized. Ideally, a minimal set of rules should be used consistently by all group members, staff and students alike, to facilitate data exchange. Some aspects may appear trivial, but in my contact with undergraduate and postgraduate students I have seen many shocking examples of (a lack of) data management. Here I give an example of how to organize data on a Windows operating system, based on my own set of rules. I should emphasize that this is just one example of an organizational structure, aimed at avoiding some of the most common mistakes.

Folder hierarchy
Results should be organized in a hierarchical folder structure in which at least the raw data can be sorted by date. After several modifications, my file structure has crystallized into a form which can be explained by the following example:
The first folder identifies where and when the project took place. A first layer of subfolders separates ‘experiments’, ‘manuscripts’, ‘presentations’, ‘theory’, ‘administration’, etc. In the ‘experiments’ folder I have decided to sort thematically, i.e. by specific experiment, in this case ‘ebs’. For me this works well because different experiments usually imply different experimental setups and methods of data analysis. Within the thematic folder there are folders with raw data named year_month_day. Written with zero-padded numbers, this is the only date order that also sorts chronologically when the folders are sorted by name, although many students (including myself at the beginning) stubbornly start out with day/month/year. In the thematic folder I also place subfolders such as ‘matlab_programs/’, which contains the scripts used to analyze the data, and ‘origin/’, which contains all the data processed with the program Origin. In this way I do not have to dig into the raw data folders to find analyzed results. Special characters should be avoided in file names. When sharing data with Linux/Unix users, keep in mind that their file systems distinguish between upper-case and lower-case letters in file names, whereas Windows does not.
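The naming rule for the raw data folders can be illustrated with a short script. This is only a minimal sketch in Python (my own scripts are in Matlab), and the function names `raw_data_folder` and `raw_data_path` as well as the project folder ‘place_2010’ are made up for the example. It shows that zero-padded year_month_day names come out in chronological order under a plain sort by name, while day_month_year names do not.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_data_folder(day: date) -> str:
    """Return a zero-padded year_month_day folder name, e.g. '2010_07_17'."""
    return day.strftime("%Y_%m_%d")


def raw_data_path(project: str, experiment: str, day: date) -> PurePosixPath:
    """Build project/experiments/<experiment>/<year_month_day>.

    The project and experiment names are illustrative placeholders.
    """
    return PurePosixPath(project) / "experiments" / experiment / raw_data_folder(day)


days = [date(2010, 6, 2), date(2010, 5, 7), date(2009, 12, 31)]

# Plain alphabetical sorting, as a file browser does when sorting by name.
ymd_sorted = sorted(raw_data_folder(d) for d in days)
dmy_sorted = sorted(d.strftime("%d_%m_%Y") for d in days)

print(ymd_sorted)  # ['2009_12_31', '2010_05_07', '2010_06_02'] (chronological)
print(dmy_sorted)  # ['02_06_2010', '07_05_2010', '31_12_2009'] (not chronological)
print(raw_data_path("place_2010", "ebs", date(2010, 7, 17)))
```

The zero padding matters: without it, ‘2010_10_1’ would sort before ‘2010_9_1’.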

File names and unique identifiers
Even the results of a single measurement day can cause much confusion in the logbooks of undergraduate students. A common mistake is the lack of chronological identifiers and the use of ambiguous names like ‘data1.txt’, ‘test.dat’, or ‘signal.txt’. In all circumstances, the first rule should be that measurement file names start with a chronological identifier number, which can be used to track down the order of measurements and which relates to the lab journal. In experimental research, things often go wrong. Instruments break down and human error leads to mistakes in the data collection. The unique identifier number is often the only way to backtrack through your results and identify which ones are valid. Lab journals should contain information such as ‘during file 10 the laser became unstable’ or ‘I forgot to set the stage to 1.8 in files 10-20’. Ten years ago, the unique identifier number was about the only thing you could fit into the 8-character filename. Nowadays, we have the freedom to put a lot more information after the identifier, separated by underscores. This may include the sample identifier and experimental settings. Of course this information has to be present in the lab journal as well, and preferably also as a header in the file itself.
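As an illustration, such file names could be generated automatically by the data collection software. The following is a minimal sketch in Python under my own assumptions: the function name `measurement_filename`, the four-digit padding, and the choice to lower-case everything and replace special characters by hyphens are illustrative, not a fixed standard.

```python
import re


def measurement_filename(run_id: int, sample: str, settings: dict, ext: str = "txt") -> str:
    """Build '<identifier>_<sample>_<setting>...' with a zero-padded identifier first."""
    parts = [f"{run_id:04d}", sample] + [f"{key}{value}" for key, value in settings.items()]
    # Keep only letters and digits; any run of other characters becomes a hyphen.
    # Lower-casing avoids surprises on case-sensitive Linux/Unix file systems.
    safe = [re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower() for part in parts]
    return "_".join(safe) + "." + ext


name = measurement_filename(10, "sampleA", {"stage": 1.8})
print(name)  # 0010_samplea_stage1-8.txt

# The zero-padded identifier keeps files in measurement order when sorted by name.
names = sorted(measurement_filename(i, "sampleA", {"stage": 1.8}) for i in (10, 2, 1))
print(names[0])  # 0001_samplea_stage1-8.txt
```

Because the identifier comes first and is zero-padded, a note in the lab journal such as ‘files 10-20’ maps directly onto a contiguous block of files in the folder.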

Consistency
Once you have a system, the most important thing is to stick to it and be consistent in your use of folders and identifiers. It is always possible to add new folder names as you proceed (for example the folder ‘grants’ when you start your own group), but the basic framework should stay the same over many years. This brings me to another problem: what to do about the emergence of new tools such as Web 2.0 services, file sharing with Google Docs, wikis, and new ways of sorting data such as those introduced in Windows 7. Again, provided you plan to stick to it, I see no objection to starting off with one of these ways of managing and sharing results. However, and this I believe strongly, there is a significant risk in using different systems at the same time, as this can lead to problems with maintaining an ordered structure and, in the worst case, to loss of results. In terms of data management a conservative approach is probably the safest.

Data management
I carry around the results of 10 years of experimental research on a portable hard disk which at present contains 74 GB of content, ranging from raw data to manuscripts and presentations. The amount of data collected per year has increased tremendously, as shown in the bar diagram. The dramatic increase in data rate follows the technological capabilities. In my case this increase can be explained by the use of automated data collection with software-controlled stages, spectrometers, and cameras. Note that the drop in data in 2009 results from a change in my backup system, which was necessitated by the fact that some sessions now yield 1 GB of high-resolution CCD images, resulting in a total of around 30 GB for 2009. I now keep these measurements on the original PC, with a DVD backup. My case may be typical of experimental scientists, who nowadays are confronted with large data streams of up to terabytes. Of course the data management problems of a table-top scientist are peanuts compared to those of the real world or CERN. Still, many institutional file servers have not yet adapted to accommodate the rapidly increasing amounts of data from modern experiments.

Paperless office
While my computer is becoming more and more organized, my desk and shelves are still a complete mess, piled with forms, bills, and paperwork. Especially since most information arrives electronically, it is more efficient to avoid double bookkeeping and organize everything on the computer. The concept of the paperless office was invented to increase productivity and limit the amount of time spent on sorting paperwork. Here again, the keyword is consistency. If you go paperless, it is very annoying to still have piles of paperwork, as you don’t want to sustain a double system. For myself, I am still struggling with the maintenance of a partially hardcopy administration, and I would be happy if someone else wishes to share his or her views on that aspect of professional practice.

19 Jul 2010 8:46, Julio E. Peironcely

Thanks for the advice. It sounds almost too simple, like something people should come up with by themselves. Unfortunately, most PhD students need somebody to tell them how to be a bit organized.
19 Jul 2010 17:03, Otto Muskens

I gave a short presentation on file organization to my group several weeks ago, and people found it really useful and an eye opener. On the drive of one of the students I found folders named ‘02june10’, ‘7th May’, ‘26thmay’, ‘2010june9th’, ‘31march’. And most of the time they are not even aware of this chaos.
5 Sep 2010 1:05, Carbon Souls

Re: handling and organizing paper–
I recommend using the system from David Allen’s book _Getting Things Done_. He recommends using folders labeled with whatever most makes sense to you so that months later when you’re looking for a particular piece of paper you know you have, you have an idea of where to look. These folders are simply kept alphabetically. I also maintain a list of names of my folders; whenever I create a new folder, I add the name to the list in the right spot (alpha). Every so often, when the list is cluttered or I’m too burned out that day to do anything else, I add the new entries to the typed list (in Excel), print a new copy, and post it. That way, I can also scan the list for places where I might have put something I need. This works very well and has made my life much easier. A critical point Allen makes is that the barrier to making a new folder should be very low, or you won’t do it. Have the folders near your work space, have plenty of new folders available, have a label-maker near-by, and don’t worry if folders only have one lonely sheet in them; make folders for even single sheets of paper you need to keep. If you need it, you need it. If you might need it, you need it. Every year or so, go through the folders and purge stuff you no longer need. Sounds simple, and it is, which is why it works.
5 Sep 2010 12:19, Otto Muskens

@CarbonSouls thanks very much for the useful tip! It’s the sorting out where things usually go wrong with me: you have to do it at the moment you hold the paper in your hand, but it’s just too easy to throw it in the corner for later…
5 Jan 2011 10:20, Jo-M

Thanks for a well thought out article. I share your pain when it comes to students not maintaining any kind of organisation and not even seeming to notice the problem. I’m the lab Photoshop expert and students often come to me asking me for help with their figures. When I ask them to show me their original image data files they can very rarely find them without a great deal of searching and sometimes not at all. Makes me want to tear my hair out!
I recently posted about my suggestions for data filing – your system takes it even further and I’m looking forward to putting some of your ideas into place. Cheers and Happy New Year!
