Long-term archiving of (raw) data and routines
- Data and metadata (i.e., information about the numeric data)
need to be stored safely and securely.
- Archiving should last for decades, a problem that archivists have yet to solve.
- Processing steps and routines used for data processing
should be stored together with the (raw) data.
- Long-term storage requires open formats and well-developed strategies.
The most basic prerequisite of any reproducibility is access to the original data. While lab notebooks are still used for part of this, nowadays the vast majority of data is obtained in digital form and stored electronically in one way or another.
While well-crafted books can last for centuries if handled and stored correctly, the average lab notebook will barely survive physically for more than a few decades, let alone be archived properly. With the advent of data-driven science in most disciplines, the problem has grown considerably. Archiving of digital data should last for decades, a problem that archivists have yet to solve and that involves a number of different aspects, such as data formats, storage hardware, and the programs and operating systems needed to access the data.
Furthermore, archiving data alone is close to useless, as these would mostly be mere numbers. Hence, storing appropriate metadata is imperative. Only thus can we pass the meaning of the data on to future generations (or even to our future selves). These metadata should describe clearly and succinctly, yet in as much detail as possible, how and why the data were originally obtained.
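As an illustration of keeping data and metadata together, the following is a minimal sketch, not a recommended archive format. It assumes that plain text and JSON are acceptable open formats for the data at hand; the function names (`write_with_metadata`, `verify`), the `.meta.json` sidecar naming, and the metadata keys are hypothetical choices made for this example only.

```python
import hashlib
import json
from pathlib import Path


def write_with_metadata(data_path, values, metadata):
    """Write numeric values as plain text plus a JSON metadata sidecar.

    Hypothetical helper: file layout and key names are assumptions.
    """
    data_path = Path(data_path)
    # Plain text, one value per line: readable without special software.
    data_path.write_text(
        "\n".join(repr(v) for v in values) + "\n", encoding="utf-8"
    )
    # Checksum of the stored bytes, so later readers can detect corruption.
    checksum = hashlib.sha256(data_path.read_bytes()).hexdigest()
    sidecar = {"data_file": data_path.name, "sha256": checksum, **metadata}
    sidecar_path = data_path.with_name(data_path.name + ".meta.json")
    sidecar_path.write_text(
        json.dumps(sidecar, indent=2, sort_keys=True), encoding="utf-8"
    )
    return sidecar_path


def verify(data_path):
    """Return True if the data file still matches its recorded checksum."""
    data_path = Path(data_path)
    sidecar_path = data_path.with_name(data_path.name + ".meta.json")
    sidecar = json.loads(sidecar_path.read_text(encoding="utf-8"))
    actual = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return actual == sidecar["sha256"]
```

A call such as `write_with_metadata("run1.txt", [1.0, 2.5], {"purpose": "..."})` would then produce `run1.txt` and `run1.txt.meta.json` side by side. The checksum only detects changes to the file; it does not protect against them, so redundant storage is still needed.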
Nearly all data need to be processed and analysed in order to gain insight and draw conclusions. Therefore, not only the data and accompanying metadata need to be archived, but also both the routines used to process and analyse these data and the complete information on each individual processing and analysis step. The latter adds another level of metadata. For details, see the requirements section on documentation.
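One conceivable way of capturing the information on each processing step is to log it at the moment the routine is applied. The sketch below is an assumption-laden illustration, not an established scheme: `record_step`, the two toy routines, and the keys of the history entries are all hypothetical, and the parameters are assumed to be JSON-serialisable.

```python
import json
import platform
from datetime import datetime, timezone


def record_step(history, func, data, **parameters):
    """Apply func to data and append a description of the step to history."""
    result = func(data, **parameters)
    history.append({
        # Fully qualified name of the routine that was applied.
        "routine": f"{func.__module__}.{func.__qualname__}",
        "parameters": parameters,
        "python_version": platform.python_version(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return result


# Two toy routines standing in for real processing steps (hypothetical).
def subtract_baseline(values, offset=0.0):
    return [v - offset for v in values]


def scale(values, factor=1.0):
    return [v * factor for v in values]


history = []
raw = [1.0, 2.0, 3.0]
processed = record_step(history, subtract_baseline, raw, offset=1.0)
processed = record_step(history, scale, processed, factor=2.0)

# The history can be archived next to the data in an open format.
history_json = json.dumps(history, indent=2)
```

Such a history records which routine was applied with which parameters, but not the source code of the routine itself; that still has to be archived separately, ideally with an unambiguous version identifier.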
Taken together, long-term archiving of the data that form the basis of empirical science requires open formats as well as well-developed strategies and established routines.