A Little Order

I recently saw a tweet from Katie Furman about realizing partway through your degree that your file system is terrible. I had that realization at roughly the same point she mentioned: the third year of my PhD.


I was never shown a specific system for managing files when I started my degree. We didn't have a group policy, and any naming or organizing conventions I pieced together were initially based on the system the senior grad student was using. It was only in 2017, when I wrote my first article, that I became aware of how terrible my system was. Even now I would hardly call my current system great; in fact, it is almost always in the process of evolving.


To summarize my advice quickly, I would say the following:

  • Use consistent naming conventions and informative titles

  • Keep electronic records where possible

  • Templates are your friend

  • Use your data to store valuable extra information


To my mind, the key thing is consistency, at least for yourself, if not for your group. I like to start every file name with the date in year-first order, because sorting by name then sorts chronologically. Nearly every file I've generated since 2017 has the format "YYYY-MM-DD - Experiment Title - Relevant File Information - Python Tags". I separate data into folders, keep version numbers simple, and, most importantly, I use templated files for everything.
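The naming convention above can be sketched in a few lines of Python. The helper name and the example field values here are hypothetical; the point is that a zero-padded, year-first date makes plain alphabetical sorting equal to chronological sorting.

```python
from datetime import date

def experiment_filename(title, info, tags, day):
    """Build a sortable file name of the form
    'YYYY-MM-DD - Experiment Title - Relevant File Information - Tags'.
    (Hypothetical helper mirroring the convention described above.)"""
    return f"{day.isoformat()} - {title} - {info} - {tags}"

names = [
    experiment_filename("Solar Cell JV", "forward sweep", "jv", date(2019, 3, 2)),
    experiment_filename("Solar Cell JV", "reverse sweep", "jv", date(2018, 11, 20)),
]

# Lexicographic sort is also chronological, because the date is
# zero-padded and ordered most-significant-field first.
assert sorted(names)[0].startswith("2018-11-20")
```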


I really cannot oversell templates. I maintained all my experiment records in digital form using Word documents; these days I would likely use an electronic notebook. Regardless of the format, I used consistent section headers. I used templates for my graphs in Origin, and I collected all the key data for each experiment in a templated PowerPoint presentation. That made it easy to share with colleagues or supervisors and to quickly review what the experiment had accomplished.


Because many projects run simultaneously, I kept everything in a single "Experiments" folder separated into years. Projects got their own folders that contained essential information for publications (drafts, figures, etc.) as well as a folder of shortcut links to all the relevant experiment folders. Each project folder typically also held duplicate copies of all the data used in any presentation, proceeding, or publication, and any data used for averages was collected into a spreadsheet kept in that project folder. With consistent naming conventions, it was easy to refer back to the exact data files behind a publication if the need ever arose.
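A minimal sketch of that layout, assuming illustrative folder names (the subfolder names below are my guesses at the categories described, not an exact reproduction):

```python
from pathlib import Path

def make_project(root, year, project):
    """Create an 'Experiments/<year>' folder alongside a
    'Projects/<project>' folder with publication-related subfolders.
    (Folder names are illustrative, not a prescribed standard.)"""
    experiments = Path(root) / "Experiments" / str(year)
    project_dir = Path(root) / "Projects" / project
    experiments.mkdir(parents=True, exist_ok=True)
    for sub in ("Drafts", "Figures", "Data Copies", "Experiment Links"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return experiments, project_dir
```

Keeping experiments in one dated tree and linking to them from each project avoids duplicating raw data while still giving every project a self-contained publication folder.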


One thing that I would definitely change moving forward is embedding valuable information into data files instead of entering it manually into experimental procedures. In the later part of my degree I worked on perovskite devices, which can be very sensitive to temperature, humidity, and similar conditions. I made a habit of noting these values, as well as the dates when certain processing or analysis steps were conducted. Documenting this by hand for every process rapidly becomes overwhelming, so embedding that information directly into the data files has serious benefits.
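One simple way to embed that information, assuming a plain CSV workflow (the field names and values here are invented for illustration), is to prepend commented key-value lines to the data file itself, so the measurement carries its own conditions:

```python
import csv
import io

# Hypothetical measurement conditions worth preserving with the data.
metadata = {"date": "2021-06-14", "temperature_C": 21.5, "humidity_pct": 38}
rows = [("voltage_V", "current_mA"), (0.0, 0.01), (0.5, 1.2)]

buf = io.StringIO()
# Write the conditions as commented header lines ahead of the table,
# so the file documents itself without a separate lab-book entry.
for key, value in metadata.items():
    buf.write(f"# {key}: {value}\n")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Reading back: separate the comment lines from the data table.
recovered = dict(
    line[2:].split(": ", 1)
    for line in text.splitlines()
    if line.startswith("# ")
)
print(recovered["temperature_C"])  # -> 21.5 (as a string)
```

Many analysis tools can skip such comment lines when loading the table, so the embedded metadata costs nothing at read time.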


The last thing I give increased consideration to these days is data readability. I believe there is enormous value in large datasets, and extracting that value starts with being able to access it. For that reason, my attention has recently turned to machine-readable formatting and similar considerations.
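As one example of what machine-readable formatting can look like, a record and its conditions can be stored as JSON, which any language can parse without custom code. The schema below is an assumption for illustration, not a standard:

```python
import json

# An experiment record in a machine-readable form (hypothetical schema).
record = {
    "experiment": "Solar Cell JV",
    "date": "2021-06-14",
    "conditions": {"temperature_C": 21.5, "humidity_pct": 38},
    "data": {"voltage_V": [0.0, 0.5], "current_mA": [0.01, 1.2]},
}
text = json.dumps(record, indent=2)

# Round-trip check: any JSON parser recovers the full record intact.
assert json.loads(text)["conditions"]["humidity_pct"] == 38
```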
