Eawag Research Data Management Project

General Data Management

Guide: The Eawag Guide for Research Data Publishing and Archiving

This is a hands-on guide for researchers who want to prepare their research data for deposit in an archive or repository. It also contains practical advice on day-to-day data-management best practices. While framed as a guide for Eawag, it should be useful for a wide range of scientists.

Data organization in spreadsheets
Article: Karl W. Broman and Kara H. Woo (2017).
Data organization in spreadsheets.

Absolute must-read in case you have to use Excel in your research.

Abstract (Broman and Woo, 2017)
Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this paper offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, don't leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, don't include calculations in the raw data files, don't use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
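
A few of these principles can be sketched in code. The following minimal Python example, assuming hypothetical records with dates written as DD.MM.YYYY, rewrites the dates as YYYY-MM-DD, fills empty cells with an explicit `NA`, and serializes the result as plain-text CSV with a single header row. The column names, the input date format, and the helper names `tidy_row` and `to_csv_text` are illustrative assumptions and are not taken from the paper.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw records; column names and the DD.MM.YYYY format are assumptions.
RAW_ROWS = [
    {"site": "A", "date": "03.07.2017", "temp_c": "18.4"},
    {"site": "B", "date": "14.07.2017", "temp_c": ""},
]


def tidy_row(row, date_format="%d.%m.%Y", missing="NA"):
    """Return a copy of row with an ISO 8601 date and no empty cells."""
    tidy = dict(row)
    tidy["date"] = datetime.strptime(row["date"], date_format).strftime("%Y-%m-%d")
    # Replace empty cells with an explicit missing-value code.
    return {key: (value if value.strip() else missing) for key, value in tidy.items()}


def to_csv_text(rows):
    """Serialize rows as CSV text: one header row, one record per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


TIDY_ROWS = [tidy_row(row) for row in RAW_ROWS]
CSV_TEXT = to_csv_text(TIDY_ROWS)
```

To store the result, the text could be written with `open("data.csv", "w", encoding="utf-8", newline="")`, where the file name is again only a placeholder.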


File Formats
File Formats: It is difficult to give comprehensive advice on file formats suited for archiving and digital preservation. The Library of Congress Recommended Formats Statement is a good starting point, as is the List of Archivable File Formats of the Swiss Federal Archives.

Rules of Thumb

  • Avoid proprietary formats (prefer standardized ones)
  • Avoid pseudo-standards such as xlsx and docx. You can recognize them by the lack of multiple independent implementations across different applications and platforms.
  • Anything "plain text" that is not pure ASCII should be UTF-8 encoded.
  • Simpler is better (as long as no info is lost).
  • If your original data comes in a "low-quality" format (e.g. MP3, GIF, ...), just keep it in that format.
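
The UTF-8 rule above can be checked mechanically. The following is a minimal Python sketch, assuming the source encoding of a legacy file is already known (here Latin-1, chosen only for illustration). The helper names `is_valid_utf8` and `reencode_to_utf8` are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Example: "Zürich" stored as Latin-1 is not valid UTF-8 until it is re-encoded.
latin1_bytes = "Zürich".encode("latin-1")
utf8_bytes = reencode_to_utf8(latin1_bytes, "latin-1")
```

Note that the source encoding generally cannot be detected reliably from the bytes alone, so it has to be known or documented. Pure ASCII data passes the check unchanged, because ASCII is a subset of UTF-8.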

File Organization
File Organization: Some rules and helpful hints about how to store files on your computer that will make your life easier during the course of research. Slides from the MIT Data Management Group.

CESSDA RDM Guide
RDM Guide: CESSDA ERIC provides a hands-on guide to elementary data-management. It is targeted at social sciences but contains enough general know-how to be worth the read for other fields.

OSODOS book
Book: Rutger Vos and Pedro Fernandes (2017).
Open Science, Open Data, Open Source:
21st century research skills for the life sciences.

An excellent resource, not only for life sciences, that covers a lot of ground.

R for datascience book
Book: Garrett Grolemund and Hadley Wickham (2017).
R for Data Science. Online.

Not really "Science", but an excellent tutorial and reference for data handling in R.

Managing and sharing data - book
Book (40 pages): Van den Eynden, V, et al. (2011).
Managing and Sharing Data; a best practice guide for researchers. UK Data Archive.

Contains many practical hints.

openrefine software
Software: OpenRefine (formerly Google Refine)

State-of-the-art power tool to clean, explore and transform large, messy datasets. A free desktop application for Windows, Mac and Linux. Check this out the moment you begin to swear at Excel (at the latest).

DLCM project
Project: Research Data Life-Cycle Management (DLCM): From Pilot Implementations to National Services.

A large, national-level project that aims at providing guidelines, training activities, policy support and various tools to support researchers and their institutions in all aspects of data management.

The Turing Way
Online Book: The Turing Way: A lightly opinionated guide to reproducible data science.

A work in progress that contains a number of selected topics, each treated in depth. It is aimed in large part at scientists who want to integrate basic tools, methods and techniques common in software development into their workflow. One remarkable section, for example, gives a very nice introduction to writing Makefiles, a dramatically under-hyped technique for low-threshold reproducible computing.
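
The core idea behind Makefiles is that a result is rebuilt only when it is missing or older than the inputs it depends on. The following minimal Python sketch illustrates that rule. It is not a Makefile and not taken from The Turing Way; the function name `needs_rebuild` and the file names are hypothetical, and modification times are set explicitly only to make the demonstration deterministic.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, sources: list[Path]) -> bool:
    """Return True if target is missing or older than any of its sources."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"          # hypothetical input file
    result = Path(tmp) / "result.txt"    # hypothetical derived file
    raw.write_text("x\n1\n2\n", encoding="utf-8")

    # The target does not exist yet, so it has to be built.
    missing_target = needs_rebuild(result, [raw])

    result.write_text("sum=3\n", encoding="utf-8")
    os.utime(raw, (1000, 1000))      # pretend the source is older ...
    os.utime(result, (2000, 2000))   # ... than the derived result
    up_to_date = not needs_rebuild(result, [raw])

    os.utime(raw, (3000, 3000))      # source edited after the result was built
    stale = needs_rebuild(result, [raw])
```

A real Makefile expresses the same dependency declaratively and additionally records the command that regenerates the target.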

Karl Broman's tutorials
Tutorials: Karl Broman provides a number of succinct tutorials relevant for reproducible research, data organization, and R-programming. For busy researchers who want to up their data-analysis skills and habits, these highly practical guides are an excellent first stop.

Legal aspects for research data

UNIBAS: Personal and sensitive data
Online Resource: Personal and sensitive data

The University of Basel provides several highly relevant documents detailing the legal aspects and giving practical advice about how to deal with sensitive data. This should be required reading for all researchers dealing with personal data. These resources are in German only (for now) because references to legal terminology and law are hard to translate correctly.

Leitfaden: Rechtsfragen bei Open Science
Book: Rechtsfragen bei Open Science

This is a long overdue guide to the legal aspects of Open Science, which also covers research data specific questions in detail. Written in German, targeting the German legal situation, it will be applicable in large parts also for other jurisdictions.

Data Analysis

Data Visualization
Online Book: A very comprehensive treatise on visualizing all kinds of data in a scientific context. The modern R code (using tidyverse) for the many example figures is also available for download, enabling the reader to learn ggplot2 in R along the way. The book itself is written in a language-agnostic style, though, so Python users, in particular those who work with ggplot, will profit as well.

Research Software Engineering

Managing Research Software Projects

Presentation: Managing Research Software Projects: How to set up, manage, and share your work in less time and with less pain. Greg Wilson, 2020.

A very concise overview that touches the most important points.


Research Software Engineering with Python

Online Book: Research Software Engineering with Python: Building software that makes research possible.

Targeting researchers, the book walks the reader through a worked example project and covers all required tools, techniques and best practices from the ground up. Topics include the UNIX shell, Git, Make, testing, continuous integration and Python package creation.

Data Management Plans

Eawag DMP Guide
Guide: The Eawag Data Management Guide: Instructions for creating data management plans for SNSF proposals.

Extensive step-by-step instructions to prepare an SNSF-compliant DMP. Parts are Eawag-specific (some cut & paste snippets), but it is also useful for other Swiss researchers.

DLCM DMP page
Resources & Guide: Help with data management plans from the DLCM Project.

Various useful links for DMP preparation. In particular, there is the DLCM Template for the SNSF Data Management Plan, a 30-page document with step-by-step instructions.

DMP online tools
Online Tools: There are various web applications that help to create and manage data management plans. These tools include templates that reflect the requirements of various funding agencies. Unfortunately, the Swiss National Science Foundation has not yet integrated its DMP template into any of these tools.
  • Software: DMPRoadmap. The joint codebase from DMPTool (University of California Curation Center, UC3) and DMPonline (Digital Curation Centre, DCC). Probably the most established tool today. Open source / Ruby.
  • Software: RDMO. A DMP web application by the Research Data Management Organiser (RDMO) project, which is funded by the DFG and supported and used by a large number of German institutions. This software is pretty new and still a bit rough around the edges but has a great number of advanced features. Open source / Python.
  • Service: DMPTool. Public service by UC3.
  • Service: DMPonline. Public service by DCC.
  • Examples: Public plans: 100s of public DMPs from DMPTool.

Metadata Standards

metadata directory
Maintained by the RDA Metadata Standards Directory Working Group, this directory lists a large number of (mostly) domain-specific metadata standards. Look here first if you are looking for a system to annotate your data that is established for your field of research.

Resources for RDM professionals

4TU

The 4TU.Centre for Research Data and in particular the Research Data Services of the TU Delft Library have long-term experience and impressive know-how.

Presentation with interesting numbers by Alastair Dunning
(PDF version from the original Powerpoint file)


Science Europe

Science Europe represents the national funding agencies of 28 European countries. "Research Data" is one of their top priorities, and their publications indicate where the community is heading. Of particular interest are the RDM-related evaluation criteria put forward by Science Europe.

The Practical Guide to Sustainable Research Data provides matrices that help to assess the maturity of research data management practices for

  • funding organisations,
  • research performing institutions, and
  • research data infrastructures.

Other interesting publications can be found on their Research Data pages.


Curating Research Data - V2

Book: Johnston, Lisa R. (2017). Curating Research Data Volume Two: A Handbook of Current Practice. Association of College & Research Libraries.

A very extensive checklist for people who establish an institutional research data repository.


RDM-Survey

Survey: TU Delft and EPFL have published the results of an RDM survey among their researchers (n = 659 across both institutions). Although it covers exclusively engineering and CS faculties, it is a good start towards an empirical picture of the status quo.

It would be nice if other institutions used a compatible survey.


Data Curation Network

The Data Curation Network is a Sloan-funded project that aims to conceptualize and develop a “network of expertise” model for U.S. academic libraries to collectively provide data curation services to support digital research data deposit into repositories for open access and reuse.

Next to the down-to-earth conceptual model, they provide a wealth of recent empirical information, advice, checklists & workflows.


#activeDMPs

#activeDMPs is a pointer-webpage to the community that drives forward concepts and tools for active and machine-actionable data management plans.


zbmedeln

The ELN-Wegweiser about Electronic Lab Notebooks (ELNs) (in German) gives practical advice for the establishment of ELNs in the life sciences but should also provide a lot of help for similar projects in other fields.

ZB MED (Hrsg.) 2019. Elektronische Laborbücher im Kontext von Forschungsdatenmanagement und guter wissenschaftlicher Praxis - ein Wegweiser für die Lebenswissenschaften, Köln.
https://doi.org/10.4126/FRL01-006415715
