Eawag Research Data Management Project

Frequently Asked Questions

Very good! Git is an excellent tool to keep track of and organize your analysis workflow, your code and your results.

Get it

Git is not a "server". Git is a program that you can install on your personal computer:
https://git-scm.com/downloads

Use it

To use it, you have to learn some basic concepts and a handful of commands. If you put in a day to read and experiment, you are likely set for 90% of all tasks you will ever want to accomplish with Git. Here are some resources to help you:
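
As a first taste of that handful of commands, here is a minimal sketch of a typical session. It assumes Git is installed; the directory name, file name and user identity are hypothetical placeholders, and everything happens in a throwaway temporary directory.

```shell
# Work in a throwaway directory so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a new, empty repository in the directory "analysis" (placeholder name).
git init -q analysis
cd analysis

# Identify yourself for this repository only (placeholder values).
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"

# Create a file, stage it, and record a first snapshot (a "commit").
echo "print('hello')" > analysis.py
git add analysis.py
git commit -q -m "Add first analysis script"

# Change the file, review what changed, and commit again.
echo "print('world')" >> analysis.py
git status --short
git diff
git commit -q -am "Extend analysis script"

# Show the recorded history, one line per commit.
git log --oneline
```

The commands `init`, `add`, `commit`, `status`, `diff` and `log` already cover a large share of everyday work.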

Show it

When you say "server" you probably think of services such as GitHub.com and GitLab.com. These are places to which you can "push" (synchronize) your local Git repository to make it visible to other people. They make it easy to browse your files and their history, and they also provide a ton of advanced collaboration and development features. Such services are useful to showcase your work and to collaborate with others. They are an unnecessary distraction if you just want to improve your personal data management.

There are many free, high-quality hosting services of this kind. This is the reason we don't provide another one at Eawag: it is simply not necessary. To pick one, your main guide should be the fact that collaboration features do not work across different providers. If you want to contribute to or make use of projects mostly hosted on one specific service, use the same one. Ask in your research group whether a convention is already established.
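
To illustrate what "pushing" means, here is a minimal sketch that runs entirely offline. It assumes a local bare repository (hosted.git) as a stand-in for the hosting service; with a real service you would use the URL it displays for your repository instead. All names and the user identity are hypothetical placeholders.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the hosting service: a local bare repository.
# With a real service, remote_url would be the URL shown on its web page.
git init -q --bare hosted.git
remote_url="$workdir/hosted.git"

# A small local repository with one commit.
git init -q myrepo
cd myrepo
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"
echo "My analysis" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Register the remote under the conventional name "origin",
# then push the current branch and remember it as the upstream.
git remote add origin "$remote_url"
git push -q -u origin HEAD

# List the configured remotes.
git remote -v
```

After the first push, a plain `git push` is enough to synchronize later commits.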

Here are the best ones:

Warning: All these web-services have to be regarded as ephemeral. Do not rely on them as "master repository"! Assume your Git web-service disappears tomorrow, including everything you put there.

Commercial

Even though all commercial providers listed here allow for "private" repositories, none of them should be trusted with sensitive information such as personal data!

GitHub
The largest and best-known service today. Choose GitHub if it is important that your repository shows up on Google or if your project uses code or other material from projects that are also hosted there.
GitLab
A smaller competitor of GitHub.
Bitbucket
A high-quality service of about the same size as GitLab. It is run by Atlassian and has outstanding documentation.

Community

c4science.ch
Infrastructure for scientific code co-creation, curation, sharing and testing. Available to the entire Swiss universities community and accessible to external collaborators. Hosted on SWITCHengines, managed by EPFL-SCITAS, and created via EnhanceR, a Swissuniversities-funded project. This appears to be the most reliable and trustworthy option at the moment.
gitlab.switch.ch
A server for Git repositories, issue trackers, continuous integration and more. Run by SWITCH. This is not an official service of SWITCH, but is provided on a best-effort basis.

Tip:
It is quite easy to keep an automatically synchronized copy on Eawag internal storage:
  1. Create a "bare" repository in a backed-up location, e.g.
    git init --bare path/to/secure/location/myrepo.git
  2. Usually the name of the "remote" at the web-service is "origin" (automatically assigned if you clone a repository, say, from GitHub). Set a "pushurl" for "origin" for both the GitHub URL and your secure local copy, e.g.:
    git remote set-url --add --push origin https://github.com/username/myrepo.git
    git remote set-url --add --push origin path/to/secure/location/myrepo.git
Now, each time you "push" your repository, both the one at GitHub and the local one are updated.
And don't worry, the commands above are way more complicated than anything you'd ever use in day-to-day work.
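
The two steps above can be tried out end to end without touching a real service. The following is a minimal sketch, assuming two local bare repositories as stand-ins: hosted.git plays the role of the web-service and backup.git the backed-up internal location. These names, the file name and the user identity are hypothetical placeholders.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-ins for the web-service and for the backed-up location.
git init -q --bare hosted.git
git init -q --bare backup.git

# Cloning assigns the remote name "origin" automatically.
# (Git warns that the cloned repository is empty; that is expected here.)
git clone -q "$workdir/hosted.git" myrepo
cd myrepo
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"

# Add both locations as push URLs of "origin". The first command replaces
# the default push target, which is why the hosted URL is listed explicitly.
git remote set-url --add --push origin "$workdir/hosted.git"
git remote set-url --add --push origin "$workdir/backup.git"

# Shows one fetch URL and two push URLs for "origin".
git remote -v

# Commit something and push once.
echo "some notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin HEAD

# Both repositories now contain the same commit.
git --git-dir="$workdir/hosted.git" log --oneline --all
git --git-dir="$workdir/backup.git" log --oneline --all
```

Fetching and pulling still use the single fetch URL of "origin"; only pushes are sent to both places.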

"Open data" means that your dataset is publicly available for unrestricted download and use. The SNSF expects data generated by funded projects to be published as Open Data. Eawag encourages to publish Open Research Data because it reflects well on scientific credibility and contributes to the efficiency of the scientific enterprise.

How to publish

  1. Make a data package and upload it to ERIC.
    Please consider that content and form of your published dataset will convey an impression of the quality of your scientific work. Please make sure you can tick each checkbox in the Checklist for publishing data packages. Have a look at the Eawag guide for publishing and archiving of research data.
  2. Thoroughly check that you are allowed to publish all information in your data package. Look out for
    • personal data,
    • creative works, such as text, maps, drawings or computer programs, that you don't have the right to publish, for example because you signed over those rights to the publisher of your paper,
    • data that you obtained under licensing conditions that prohibit redistribution,
    • data in which scientific collaborators have a stake and might disagree with publication,
    • data that you or your group might want to exploit further in the future and therefore prefer to keep from the public for now, to avoid being scooped.
    Note that you can exclude files that you uploaded to internal ERIC from external publication.
  3. Notify rdm@eawag.ch that you would like to publish this package to the world. Mention any files that should be excluded from the external publication.

A DOI (Digital Object Identifier) is a string, such as 10.25678/000066, which is registered in a global registry and is associated with a URL, such as https://doi.org/10.25678/000066, which is redirected to the location of the document, dataset, or landing-page thereof.
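
The relation between the DOI string and its URL can be sketched in a few lines. This is only an illustration using the example DOI from the paragraph above; the variable names are arbitrary, and the network lookup is left commented out because it needs internet access and the `curl` program.

```shell
# The DOI string as it would appear in a paper.
doi="10.25678/000066"

# The associated URL is the DOI string appended to the resolver address.
doi_url="https://doi.org/${doi}"
echo "$doi_url"

# Optional, needs network access: print the redirect target(s) of the URL.
# curl -sIL "$doi_url" | grep -i '^location:'
```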

DOIs are also associated with meta-data (pretty much everything that you submit when you create a data package in ERIC), which feeds into global search services, such as search.datacite.org or Google Dataset Search and thus makes your work more visible.

Thanks. But how do I get one?

Just follow the instructions from the previous question to make your data open. A DOI is assigned automatically to all Eawag Open Research Data.

Anything I need to be aware of, with regard to DOIs?

The content of a dataset is immutable once a DOI for it is registered. It is then no longer possible to

  • add files,
  • remove files,
  • change files.
In case an error is detected in a dataset that already has a DOI, a new version with a new DOI has to be created and linked back to the erroneous dataset and DOI. This is a tedious and manual process.

The meta-data associated with a DOI can be changed anytime though.

I need to make data available to the reviewer. There might be changes to the data before the final version can be published.

We recommend making an internal data package and submitting it to ERIC. Internal packages can always be edited. Then make the data available to the reviewer by uploading it to SwitchDrive. Notify rdm@eawag.ch to trigger external publication once the dataset is no longer expected to change.

I want to refer to the DOI of the dataset in my paper, but the dataset is not yet ready for publication

  1. Make an internal data package and submit it to ERIC.
  2. Notify rdm@eawag.ch that you would like to reserve a DOI for this package. This means that you will get a DOI string that is not yet registered: the associated URL will not resolve, but nobody else can use that DOI.
  3. Once the dataset is ready for publication, notify rdm@eawag.ch.
Warning: It is your responsibility to eventually provide a public dataset that can be associated with the DOI. A DOI in print that doesn't resolve is a very bad thing.

Do you think your software has the potential to be commercialized?

> Yes: Get in touch with the Knowledge and Technology Transfer Service at EMPA.

> No: Use either an OSI approved Open Source license, or put your software in the Public Domain by attaching the CC‑Zero Public Domain Dedication.

Which one then?

  1. Choose the GNU Affero General Public License (AGPL) if you build on or depend on software that you obtained under one of the GNU General Public Licenses (GPL), since you are then legally required to publish your software under GPL as well. The AGPL also ensures your work is and remains free software, a property that is regarded by many as ethically desirable.
  2. Choose the MIT license if you want it simple and human readable.
  3. Choose the CC-Zero Public Domain Dedication if you don't want to have anything to do with copyright and licenses. Be aware though, that software in the public domain provides less motivation to others to give you credit for your work.
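
One common way to record the chosen license in a project is sketched below. This is an assumption about practice, not an Eawag requirement: a LICENSE file holds the full official license text, and each source file carries a one-line SPDX identifier (MIT, AGPL-3.0-or-later or CC0-1.0 for the three options above). The file names are hypothetical, and the LICENSE content here is only a placeholder that must be replaced by the official text.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# SPDX identifier of the chosen license: MIT, AGPL-3.0-or-later or CC0-1.0.
license_id="MIT"

# Placeholder only: the real LICENSE file must contain the full official text.
echo "Replace this line with the full official text of the ${license_id} license." > LICENSE

# Put a machine-readable license tag on the first line of a source file.
printf '# SPDX-License-Identifier: %s\n' "$license_id" > analysis.py
echo "print('hello')" >> analysis.py

# Show the tag.
head -n 1 analysis.py
```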