Reproducible software and data science

Research projects often involve some software development and data processing. The challenge is that running the software usually requires many libraries and other dependencies, each of which has to be installed in a compatible version. One solution is to use containers (e.g. with Docker), which can be built from a declarative description of everything needed to run the software.

A team of researchers from several universities working on software and data science has published a paper titled “Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science”. Consider building a container and following these rules the next time you create software for a research project. That way, if you need to pick up your own software later or want to share it with someone, it should be up and running again in no time.
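
To give a rough idea of what such a declarative description can look like, here is a minimal sketch of a Dockerfile for a small Python-based analysis. It is only an illustration, not taken from the paper: the file names requirements.txt and analysis.py, the maintainer details, and the choice of base image are all assumptions made for this example.

```dockerfile
# Hypothetical example: requirements.txt and analysis.py are assumed file names.

# Start from an existing image and pin a specific version instead of "latest".
FROM python:3.11-slim

# Document who maintains the image and what it is for (placeholder values).
LABEL maintainer="Your Name <you@example.org>" \
      description="Environment for the analysis in this research project"

# All following instructions work relative to this directory.
WORKDIR /project

# Install the dependencies first so this layer is cached between builds.
# requirements.txt is assumed to list exact versions, e.g. "pandas==2.2.2".
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the analysis code; data is expected to be mounted when the container runs.
COPY analysis.py .

# Running the container without arguments starts the analysis.
CMD ["python", "analysis.py"]
```

Assuming this file sits next to the code, the image could be built with `docker build -t my-analysis .` and run with `docker run --rm -v "$(pwd)/data:/project/data" my-analysis`, where my-analysis and the data directory are again placeholder names.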
