The project will establish an institution-wide central research data repository. The repository will be accessible to all Eawag research groups to
The repository is based on the well-established open-source data-management software CKAN. While CKAN is being used by many organizations around the world in an “OpenData” context, its use as a research data management tool requires the development of custom-tailored extensions that adapt CKAN to the specific needs at Eawag.
Features of the ERDP will include:
Development of a metadata structure that is
The repository roll-out will be accompanied by the development of data-management guidelines. These guidelines will be developed iteratively with user input, and the software implementation itself will support compliance. The guidelines will presumably specify the precise metadata requirements, the types of datasets that should be submitted to the repository, rules for write access to the repository, and similar topics. The NERC Data Value Checklist gives an example of important considerations about which data to archive.
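As an illustration of how software could support compliance with metadata requirements, the following is a minimal Python sketch of a pre-submission check. The field names in `REQUIRED_FIELDS` and the function `missing_metadata` are hypothetical; the actual requirements would be set by the guidelines, and this is not part of CKAN.

```python
# Hypothetical list of mandatory metadata fields; the real list would be
# defined by the data-management guidelines.
REQUIRED_FIELDS = ("title", "author", "description", "keywords")


def _is_empty(value):
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_metadata(record):
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if _is_empty(record.get(name))]
```

A client or server-side hook could refuse a submission, or warn the user, whenever `missing_metadata` returns a non-empty list.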
Research groups will receive support to streamline their data submission procedures. This includes case-by-case analysis of current workflows and requirements; support with structuring, formatting, annotating, uploading, searching and downloading data; the provision of software to automate such tasks, where applicable; and the cooperative development of specific metadata schemes.
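Automating submission could, for example, build on CKAN's Action API, which exposes dataset creation at `/api/3/action/package_create` and accepts an API key in the `Authorization` header. The sketch below only constructs such a request using the Python standard library. The helper name `build_package_create_request`, the host name and the metadata values are assumptions for illustration, not part of the project's actual tooling.

```python
import json
import urllib.request


def build_package_create_request(base_url, api_key, metadata):
    """Build (but do not send) a CKAN `package_create` request.

    `base_url` is the CKAN site URL, `api_key` the user's API key, and
    `metadata` a dict of dataset fields such as "name" and "title".
    """
    url = base_url.rstrip("/") + "/api/3/action/package_create"
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage; the host, key and dataset fields are placeholders.
request = build_package_create_request(
    "https://ckan.example.org/",
    "my-api-key",
    {"name": "lake-profiles-2015", "title": "Lake temperature profiles 2015"},
)
```

Passing `request` to `urllib.request.urlopen` would submit the dataset description. Keeping request construction separate from sending makes the helper easy to test without network access.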
The Eawag Guidelines for Good Scientific Practice assigns responsibility for data management to a research project’s PI. In particular, that includes ensuring the data’s safe (person-independent) storage and accessibility for a prolonged time period, defining a suitable set of metadata and annotating datasets accordingly, guarding against misuse of data by temporary or external collaborators, and handing over their own knowledge and data if they leave Eawag.
In practice, secure storage and retrievability are riddled with difficulties when handled individually in each workgroup. Fluctuating scientific staff, diverse storage locations, and ad-hoc or missing metadata schemes are among the reasons for sub-optimal storage of often irreplaceable and non-reproducible datasets that were produced at substantial cost.
In general, proper mid- and long-term storage and archival of research data, when carried out individually, demands an undue amount of effort, time and know-how outside the research area of the scientists in charge, whose time would be much better spent doing research.
Therefore, the natural strategy is the establishment of a professionally managed central data repository. This goes hand in hand with the establishment of data-management procedures that can be followed “blindly”.
There is also a growing trend for research funders to require a data management plan as part of research proposals. The Eawag research data management guidelines and platform will be adapted to support researchers in fulfilling these requirements.
During the pilot phase, the platform will become available to all Eawag groups in order to test the system against real-world use cases and to adapt it to individual requirements. This phase will include individual meetings with research group representatives (“Data Managers”) to collect user feedback and to help refine the system towards a state where it can be used routinely as a standard service. In particular, that will include the development of user-specific client-side software to facilitate data preparation and upload. An equally important aspect of this phase is the users’ ability to actively influence the system’s development in terms of features, capabilities and modes of access.
Since significant changes to the system will most likely be necessary at that stage, we can guarantee neither the stability of the database structure nor adequate data security. Data delivered to the platform should therefore still be backed up elsewhere. However, we will take care that the work that goes into data preparation and packaging during the pilot phase is not lost and that re-uploading to the final system is trivial. We will also help with interim backup solutions, if necessary.