Guide

1. Data collection and documentation

1.1 What data will you collect, observe, generate or re-use?

SNSF [SNSF2017]

Questions you might want to consider:

  • What type, format and volume of data will you collect, observe, generate or reuse?
  • Which existing data (yours or third-party) will you reuse?

Briefly describe the data you will collect, observe or generate. Also mention any existing data that will be (re)used. The descriptions should include the type, format and content of each dataset. Furthermore, provide an estimation of the volume of the generated datasets.

(This relates to the FAIR Data Principles F2, I3, R1 & R1.2)

Instructions

  1. Identify each dataset
    (or collection of datasets that have the same characteristics with respect to content-type, source, format, etc.) and consider introducing an abbreviation for each one.
  2. For pre-existing datasets,
    briefly describe the content, role in the project, and the source.
  3. For datasets to be produced in this project,
    briefly describe their content, role in the project, and how they will be collected, observed or generated.
  4. For all datasets,
    mention the general data type (e.g., text, tabular data, images, movies, audio files, device-specific binary data, geospatial data, census data, …) along with their file formats (as created by the device(s) used, by the simulation software, or as downloaded), e.g., CSV, netCDF, JPEG, PNG, MPEG-4, PDF, Shapefile, VCF, SMILES, … For datasets for which you can choose a file format, please refer to Appendix A: File format recommendations.
  5. For all datasets,
    give an estimate of the total volume and the number of files, if a large number of them is to be expected. Orders of magnitude are sufficient.

Note

Make sure to consider not only the “raw” data, but also the datasets that are expected at the end of the analysis chain and that directly underlie the resulting publications.

Example snippets

Example 1 [UGLA2015-1]

The data produced from this research project will fall into two categories:

  1. The various reaction parameters required for optimization of the chemical transformation.
  2. The spectroscopic and general characterization data of all compounds produced during the work.

All data is tabular. Data in category 1 will be documented in [file_format]. Spectroscopic data in category 2 will be produced as [file_format] and converted to [file_format] for further use. Other characterization data in this category will be collected in [file_format].

We anticipate that the data produced in category 1 will amount to approximately 10 MB and the data produced in category 2 will be in the range of 4 - 5 GB.

Example 2 [UGLA2015-2]

This project will work with and generate four main types of raw data:

  1. Images from transmitted-light microscopy of Giemsa-stained squashed larval brains.
  2. Images from confocal microscopy of immunostained whole-mounted larval brains.
  3. Western blot data.
  4. Results from LC/MS analyses of larval brains.

All data will be stored either in the format in which they were originally generated (i.e. MetaMorph files for confocal images; Spectrum Mill files for mass spectra, with the results of mass-spectra analyses stored in CSV files; TIFF files for transmitted-light microscopy), or will be converted into digital form via scanning to create TIFF or JPEG files (e.g. Western blots or other types of results).

Measurements and quantification of the images will be recorded in Excel files. For long-term preservation, they will be converted to CSV files. Micrograph data is expected to total between 100 GB and 1 TB over the course of the project. Scanned images of Western blots are expected to total around 1 GB over the course of the project. Other derived data (measurements and quantification) are not expected to exceed 10 MB.
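A conversion of this kind is easy to script. A minimal sketch using pandas (the file names are hypothetical; reading .xlsx files requires the openpyxl package):

    import pandas as pd
    from pathlib import Path

    # Convert every Excel workbook in the measurements folder to UTF-8 CSV
    # for long-term preservation; one CSV file per worksheet.
    for xlsx in Path("measurements").glob("*.xlsx"):
        sheets = pd.read_excel(xlsx, sheet_name=None)  # dict: sheet name -> DataFrame
        for name, df in sheets.items():
            df.to_csv(xlsx.with_name(f"{xlsx.stem}_{name}.csv"), index=False)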

Example 3 (modified from [EPFL2017])

The data are tabular health records generated by users of the application X, which is deployed to 2 million users by company Y, which will also collect the data from all users and provide us with the complete dataset.

All fields contain user observations and are entered manually, except for temperature, which is measured by a Bluetooth-connected thermometer. Data recording will take place over the course of three months.

Data fields per user: User identifier, Age, Weight, Size.

Seven data fields per user per day of observation: one numerical value (temperature), five values on a nominal scale (e.g. “cervical fluid quality”), and one free-text field.

Data will be received in CSV format and will have a total volume of about 15 GB.

Example 4 (from a real Eawag DMP)

There will be two categories of data: NEW data from this project and EXISTING data from the FOEN Lake Monitoring program. The NEW data will consist of several file types, all CSV real-number format, which are all organized along the same principle: matrices of time series with various channels, each corresponding to a sensor (the number of sensors varies from 1 to 10) and of very different lengths, as the sampling frequency varies by several orders of magnitude:

  (i) 6 files of CO2, DO, PAR and temperature (24 files at a time; Figure 2), each file only 1 sensor (Delta = 10 min; continuous);
  (ii) Thetis profiles corresponding to time series (equivalent to depth series) of 10 sensors (Delta = 1 s; 5-10 times per day);
  (iii) 5 files of CO2 time series for short-term surface flux measurements (several files, one per month);
  (iv) meteodata file (eight sensors; continuous);
  (v) T-microstructure profile files (6 sensors at 512 Hz; several files, once per month); and
  (vi) Excel files for individual chemical samples (such as alkalinity, sediment trap estimates, etc.; sporadic).

The EXISTING data is already available (CIPAIS, CIPEL) in Excel sheets with matrices for the individual samplings and a variable number of parameters (~10 to ~25). The EXISTING data will not be modified and remains with the organizations; we will keep a copy on our computers during the project. We anticipate the data produced in category 1 to amount to several hundred MB for the moored and profiled sensor files and ~100 GB for the T-microstructure profiles; the EXISTING data in category 2 is in the range of ~20 MB.

1.2 How will the data be collected, observed or generated?

SNSF [SNSF2017]

Questions you might want to consider:

  • What standards, methodologies or quality assurance processes will you use?
  • How will you organize your files and handle versioning?

Explain how the data will be collected, observed or generated. Describe how you plan to control and document the consistency and quality of the collected data: calibration processes, repeated measurements, data recording standards, usage of controlled vocabularies, data entry validation, data peer review, etc.

Discuss how the data management will be handled during the project, mentioning for example naming conventions, version control and folder structures. (This relates to the FAIR Data Principle R1)

Instructions

This section has two parts: 1. Quality assurance and 2. Data organization.

1. Quality assurance

For each dataset, mention standards, methodologies and processes that serve to ensure that the data meets the expected quality. This might for example include:

  • The use of core facility services (specify their certifications, if any)
  • Codes of good research practice that are being followed.
  • Quality control procedures such as plausibility checks, range checks, double data entry, statistical or visual outlier detection, instrument verification tests, etc., that you plan to apply (see the sketch after this list).
  • The method to record data quality (e.g. quality flags for data points), if applicable.
  • Arrangements to assign responsibilities for quality control.
  • Training activities.
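To make such checks concrete, here is a minimal Python sketch combining a range check with simple statistical outlier flagging; the file name, column name and limits are hypothetical:

    import pandas as pd

    df = pd.read_csv("sensor_data.csv")

    # Range check: flag temperatures outside physically plausible limits.
    df["temp_in_range"] = df["temperature"].between(-5.0, 40.0)

    # Simple statistical outlier detection: flag values more than three
    # standard deviations from the mean.
    z = (df["temperature"] - df["temperature"].mean()) / df["temperature"].std()
    df["temp_outlier"] = z.abs() > 3

    # Keep the quality flags with the data instead of deleting suspect values.
    df.to_csv("sensor_data_flagged.csv", index=False)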
2. Data Organization

Briefly describe how the data will be organized. This might be a folder structure together with a file naming convention, a local SQL or NoSQL database, a cloud-based collaboration platform, a version-control system such as Git, an Electronic Laboratory Notebook / Laboratory Information Management System (ELN/LIMS), etc.

Consider how the chosen organization scheme supports version control (if necessary) and collaboration (if necessary), and whether it is suited to the expected data volume and data structure.
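For illustration, such a scheme might combine a shallow folder hierarchy with a date-based file naming convention, for example (all names are placeholders):

    project_X/
      README.txt            <- describes the hierarchy and naming convention
      raw/
        2024-03-01_siteA_temperature.csv
        2024-03-01_siteA_oxygen.csv
      processed/
        2024-03-01_siteA_temperature_calibrated.csv
      scripts/              <- analysis code, under Git version control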

Example snippets

Example 1 (modified from [UGLA2015-1])

The reaction conditions will be recorded and collated using a spreadsheet application. The resulting files will be saved in directories, one for each scientist, with appropriately set file permissions. A filename convention that encodes reaction, reaction generation and date will be applied.

These directories will be mirrored to SWITCHdrive for collaboration.

The various experimental procedures and associated compound characterization will be written up using the Royal Society of Chemistry standard formatting in a Word document; each Word document will also be exported to PDF/A. The associated NMR spectra will be collated in chronological order in a PDF/A document.

Example 2 (modified from [UGLA2015-2])

All samples on which data are collected will be prepared according to published standard protocols in the field [cite reference]. Files will be named according to a pre-agreed convention. The dataset will be accompanied by a README file which will describe the directory hierarchy. Each directory will contain an INFO.txt file describing the experimental protocol used in that experiment. It will also record any deviations from the protocol and other useful contextual information.

The format used for microscope images captures and stores a range of metadata (field size, magnification, lens phase, …) with each image. We will use a Python script that automatically extracts these metadata and stores them together with the respective filenames in a SQLite database.
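Such an extraction script might look roughly like the following sketch (assuming Pillow is used to read the images; the paths and database layout are hypothetical):

    import sqlite3
    from pathlib import Path
    from PIL import Image

    con = sqlite3.connect("image_metadata.db")
    con.execute("""CREATE TABLE IF NOT EXISTS images
                   (filename TEXT PRIMARY KEY, width INTEGER,
                    height INTEGER, info TEXT)""")

    for tif in Path("micrographs").glob("*.tif"):
        with Image.open(tif) as img:
            # img.info holds format-specific metadata as key/value pairs.
            con.execute("INSERT OR REPLACE INTO images VALUES (?, ?, ?, ?)",
                        (tif.name, img.width, img.height, str(img.info)))
    con.commit()
    con.close()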

Example 3 (from a real Eawag DMP)

The data from the moored sensors are stored sensor-internally and recovered every two months, when the sensors are cleaned and recalibrated if the data indicate quality loss. The CO2 sensors will be cross-calibrated against atmospheric pressure. The DO and PAR sensors in the mooring will be compared to the profiled sensors and deviations detected. Temperature sensors are extremely stable and are only calibrated before and after the two years using the laboratory temperature bath, which is calibrated against the Office of Metrology in Bern every few years to 0.001 °C. The Thetis sensor data are transmitted via GSM when the instrument surfaces, directly to the lab, where sensor deterioration is checked weekly. The instrument will be retrieved every month and the sensors cleaned. The optical sensors will be calibrated according to the manual every six months. The T-microstructure sensors do not need calibration, as the data are matched to the (very accurate) CTD temperature. Small temperature shifts are irrelevant, as only the spectra matter. Sensor deterioration (or frequency loss) will be checked visually and is seen in the quality of the Batchelor spectra.

The very simple structure of the CSV files holding the raw data will be documented in a plain-text README file. This file, and all raw data files as they become available, will be uploaded to the Eawag Research Data Institutional Collection into one “data package”, which is annotated with general metadata. Copies of the raw data files, as well as the set of calibrated, quality-controlled files stored on the group computers at EPFL, will be organized in a folder structure that is also documented in a README file. At the end of the project, the entire set of calibrated, quality-controlled files will be annotated and stored in the Eawag institutional repository as well.

Example 4 [EPFL2017]

All experimental data will be automatically imported into the institutional electronic Laboratory Information System (LIMS) from the measurement device. Methods and materials will be recorded using the institutional Electronic Lab Notebook (ELN).

Example 5

The sensor data are being fed into a PostgreSQL database running on an institutional server. The database enforces rules for basic validity checks (range checks, plausibility checks). The R scripts for data analysis are stored in the institutional Git repository for version control and collaboration.
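Validity rules of this kind can, for instance, be declared as CHECK constraints, so that the database itself rejects implausible rows on insert. A minimal sketch, assuming the psycopg2 driver and hypothetical table and column names:

    import psycopg2

    con = psycopg2.connect("dbname=sensors user=project")
    with con, con.cursor() as cur:
        # Range and plausibility checks enforced by the database itself.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS readings (
                sensor_id   INTEGER   NOT NULL,
                measured_at TIMESTAMP NOT NULL,
                temperature REAL CHECK (temperature BETWEEN -5 AND 40),
                depth_m     REAL CHECK (depth_m >= 0)
            )
        """)
    con.close()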

1.3 What documentation and metadata will you provide with the data?

SNSF [SNSF2017]

Questions you might want to consider:

  • What information is required for users (computer or human) to read and interpret the data in the future?
  • How will you generate this documentation?
  • What community standards (if any) will be used to annotate the (meta)data?

Describe all types of documentation (README files, metadata, etc.) you will provide to help secondary users to understand and reuse your data. Metadata should at least include basic details allowing other users (computer or human) to find the data. This includes at least a name and a persistent identifier for each file, the name of the person who collected or contributed to the data, the date of collection and the conditions to access the data.

Furthermore, the documentation may include details on the methodology used, information about the performed processing and analytical steps, variable definitions, references to vocabularies used, as well as units of measurement. Wherever possible, the documentation should follow existing community standards and guidelines. Explain how you will prepare and share this information. (This relates to the FAIR Data Principles I1, I2, I3, R1, R1.2 & R1.3)

Instructions

Conceptualize two types of metadata: 1. Scientific metadata and 2. General metadata:

1. Scientific metadata

Scientific metadata provides all necessary information to correctly understand, interpret, assess, replicate (within limits), build upon, and generally use your data. This metadata might be compiled “free-form” into one or several README-file(s) that accompany the data.

Certain fields have formally defined, established metadata standards, e.g. the Ecological Metadata Language (EML), the Open Microscopy Environment schemas or WaterML. Mention it if you use such a standard. Have a look at the RDA metadata directory for an overview of existing standards.

This metadata could contain for example:

  • A description of the organization and relationships of the files or database tables and other supporting materials.
  • Information about the naming convention (if applicable).
  • A mapping of data files to the corresponding section of the associated publication, if applicable.
  • Information about units of measurement, variable definitions, column headings and abbreviations (if not present in the data files proper).
  • Information about the software (name, version, system environment) used to produce and read the data (if the software is not included as data).
  • Information about which files were used in what way at what stage of the work.
  • Suggestions for how to best reuse the data.
  • Any information suited to decrease the chances that a future user of the data needs to contact you with questions.

Describe in as much detail as possible what will comprise the scientific metadata. Make sure to mention all information, or information categories, that a future user of your data will need to read and interpret the data.

Describe how this metadata will be managed, i.e. who or what will generate it and when, in what form and in which location it will be stored, and how it is associated with the respective experiment, measurement or observation. Describe technical aspects of the metadata management, e.g. the use of database software, and the protocol or mechanism to handle updates and version control, if applicable.
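As an illustration, a free-form README covering the categories listed above might be structured like this (all entries are placeholders):

    README.txt
    ----------
    Dataset:        <short title and abbreviation>
    Contact:        <name, e-mail>
    Files:          <directory layout and file naming convention>
    Variables:      <column headings, units, missing-value codes>
    Methods:        <instruments, protocols, processing steps>
    Software:       <name, version, system environment>
    Related paper:  <citation; which files underlie which figure/table>
    Reuse notes:    <known limitations, suggestions for reuse>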

2. General metadata

This type of metadata serves to make your data findable. It consists of general attributes that help to search, sort, index, access and propagate the dataset or collection of datasets. At Eawag, capture, storage, formatting and dissemination of this metadata is handled by the institutional research data repository. You might use the Eawag standard snippet “metadata in ERIC”.

Examples for 1. Scientific metadata

Example 1 [DataONE2011]

We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and works well for the types of data we will be producing. We will create these metadata using Morpho software, available through KNB (http://knb.ecoinformatics.org/morphoportal.jsp). The metadata will fully describe the data files and the context of the measurements.

Example 2 [UGLA2015-1]

The data will be accompanied by the following contextual documentation, according to standard practice for synthetic methodology projects:

  1. spreadsheet documents which detail the reaction conditions.
  2. text files which detail the experimental procedures and compound characterization.

Files and folders will be named according to a pre-agreed convention. The final dataset as deposited in the institutional data repository will also be accompanied by a README file listing the contents of the other files and outlining the file-naming convention used.

Example 3 (from a real Eawag DMP)

For every data stream (a sequence of identical data files) over the entire 2-year period of data acquisition, a README file will be generated which contains: (a) the sensors used (product, type, serial number), (b) the temporal sequence of the sensors (time and location, sampling interval), (c) the observations made during maintenance and repairs, and (d) details on the physical units, as well as the calibration procedure and format. This is a standard procedure which we have used in the past.

Standard snippet for 2. General metadata

Eawag standard snippet “metadata in ERIC”

The completed dataset will be uploaded to the Eawag Research Data Institutional Collection (ERIC). This repository collects (upon upload) the metadata according to the DataCite metadata schema 4.0, an accepted state-of-the-art standard. In addition to the mandatory fields of the DataCite schema, ERIC collects several metadata fields such as time-range, spatial extent, geographical names, measured variables, chemical substances and taxonomic information. ERIC provides search functionality and assigns a persistent URL to each dataset.
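For orientation, the DataCite metadata schema 4.0 defines six mandatory properties; a minimal record could be sketched as follows (all values are placeholders; the identifier is assigned by the repository upon upload):

    Identifier:       <DOI>
    Creator:          <name(s) of the dataset author(s)>
    Title:            <dataset title>
    Publisher:        <institution or repository publishing the data>
    PublicationYear:  <year>
    ResourceType:     Dataset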

3. Data storage and preservation

3.1 How will your data be stored and backed-up during the research?

SNSF [SNSF2017]

Questions you might want to consider:

  • What are [sic] your storage capacity and where will the data be stored?
  • What are the back-up procedures?

Please mention what the needs are in terms of data storage and where the data will be stored. Please consider that data storage on laptops or hard drives, for example, is risky. Storage through IT teams is safer. If external services are asked for, it is important that this does not conflict with the policy of each entity involved in the project, especially concerning the issue of sensitive data. Please specify your back-up procedure (frequency of updates, responsibilities, automatic/manual process, security measures, etc.)

Instructions

Describe storage location and backup procedure during all phases of research, e.g. (a) during data collection / generation and (b) during analysis.

  1. At stages where data cannot be stored on Eawag infrastructure (e.g. field campaigns involving dataloggers and laptops), take care to implement a backup protocol that should

    • be as automatic as possible,
    • run frequently enough,
    • duplicate the data onto another storage medium that is kept at a different location, and
    • ideally include (automatic) checks for the success of each backup.

    From copying data from the field laptop to a flash drive kept by another person, to automatic synchronization with SWITCHdrive, there are many options to do this reliably and conveniently. Consult your IT department if you need help, or just to assess your strategy. Describe this backup strategy (see the sketch after this list).

  2. At a stage where you have access to the Eawag shared filesystem, store your data there. Make sure you know which directories of your workstation are mapped to backed-up server storage (see IT documentation - Backup). Check with IT whether you have access to the required storage capacity and arrange an increase of the quota, if necessary. Copy & paste the text-snippet below (Eawag standard snippet “file services - backup”) to account for this stage.

  3. In case you plan to use other servers, e.g. for doing bioinformatics at the Genetic Diversity Centre, inquire about their backup procedure and briefly describe it here. In case you need to set up a backup-solution by yourself, consider getting advice from the IT department.

  4. In case you plan to use cloud storage for collaboration (e.g. SWITCHdrive), make sure a replica of that data is kept on Eawag infrastructure at all times. Encrypt sensitive data that is stored by third parties. Mention such a setup here.

  5. Check whether you have the necessary storage capacity at all storage locations you plan to use. Mention that here (if not already covered by the Eawag standard snippet “file services - backup”).
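As an illustration of an automatic check for the success of a backup (item 1 above), a minimal Python sketch that copies a file to the backup medium and accepts the copy only if the SHA-256 checksums of source and copy match; all paths are hypothetical:

    import hashlib
    import shutil
    from pathlib import Path

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    src = Path("/field-laptop/data/logger_2024-03-01.csv")
    dst = Path("/media/backup_drive/data") / src.name
    shutil.copy2(src, dst)
    if sha256(src) != sha256(dst):
        raise RuntimeError(f"Backup verification failed for {src}")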

Standard snippet

Eawag standard snippet “file services - backup”

Data will be stored on backed-up servers in the Eawag local network. For file services and the virtual server farm, Eawag shares a server/storage platform (NetApp MetroCluster, Cisco UCS Server, VMware) with Empa. The backup procedure is fully automatic. Snapshots of files are taken at least three times during a working day. All data are mirrored synchronously between the two server sites on the Empa-Eawag campus in Dübendorf. Additionally, backups (to disk) are taken from the MetroCluster at a third location on the campus. Backups are kept for three months. We have arranged to have access to the required storage capacity.

Examples

Example 1

Data will be downloaded from the dataloggers daily to the field laptop and immediately copied to a flash drive, which is stored in a physically secure location in the field office. The success of the download is checked immediately. The laptop is brought to the Dübendorf campus (there is no network link on-site) on the same day and the data are copied to a backed-up server in the Eawag local network. [copy text from Eawag standard snippet “file services - backup”]

Example 2

The simulations will be carried out at supercomputing facility X, where backup is not available. A script on the local workstation periodically calls rsync to mirror the remote directory, where the simulation results are written, to a backed-up share on Eawag infrastructure (which is mounted on the local workstation). [copy text from Eawag standard snippet “file services - backup”]
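Such a mirroring script can be very small. A sketch in Python, meant to be run periodically (e.g. via cron); the host and paths are hypothetical:

    import subprocess

    # Mirror the remote results directory to a backed-up local share.
    # -a preserves timestamps and permissions, -z compresses in transit,
    # --delete keeps the mirror an exact copy of the source.
    subprocess.run(
        ["rsync", "-az", "--delete",
         "user@hpc.example.org:/scratch/project/results/",
         "/mnt/eawag_share/project/results/"],
        check=True,
    )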

Example 3

Our team stores the data to be analyzed along with the results using Eawag file services. [copy text from Eawag standard snippet “file services - backup”] To easily share data with our collaborators in Fribourg, we synchronize those data with a folder on SWITCHdrive. Since these are sensitive personal data, the folder being synchronized contains only encrypted files (public-key encryption, with key pairs created specifically for this project).
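One way to realize such a setup is to encrypt each file with the project’s public key before it enters the synchronized folder, e.g. by calling GnuPG. A minimal sketch; the key ID and paths are hypothetical:

    import subprocess
    from pathlib import Path

    outdir = Path("switchdrive_sync")
    outdir.mkdir(exist_ok=True)

    for f in Path("to_share").glob("*.csv"):
        # Encrypt for the project key pair; only holders of the private key
        # can decrypt the files synchronized via SWITCHdrive.
        subprocess.run(
            ["gpg", "--encrypt", "--recipient", "project-2024@eawag.ch",
             "--output", str(outdir / (f.name + ".gpg")), str(f)],
            check=True,
        )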

3.2 What is your data preservation plan?

SNSF [SNSF2017]

Questions you might want to consider:

  • What procedures would be used to select data to be preserved?
  • What file formats will be used for preservation?

Please specify which data will be retained, shared and archived after the completion of the project and the corresponding data selection procedure (e.g. long-term value, potential value for reuse, obligations to destroy some data, etc.). Please outline a long-term preservation plan for the datasets beyond the lifetime of the project. In particular, comment on the choice of file formats and the use of community standards. (This relates to the FAIR Data Principles F2 & R1.3)

Instructions

It is Eawag policy to generally preserve all relevant data generated or used by research projects in the Eawag Research Data Institutional Collection. Refer to internally communicated guidelines or contact the Eawag Research Data Management Project <rdm@eawag.ch> for help. You can copy & paste the standard text snippet below (Eawag standard snippet “preservation”). Note that this does not necessarily mean that all this data will be publicly shared. Data that will not be shared should be mentioned in Section 4.2.

  1. Check whether there are reasons not to preserve a part of the data and mention any that exist. That could apply, for example, to data that

    • is subject to a contractual or legal obligation to be destroyed after a certain amount of time,
    • can be re-created through computation (e.g. simulation output), or
    • is high-volume data that can be downloaded at any time from a reliable external long-term repository, e.g. climate model output.
  2. If there are no exceptions, follow Eawag standard procedure and copy & paste the Eawag standard snippet “preservation”.

  3. Check whether you consider any of the data eligible for Long Term Storage. Mention those datasets and adapt the standard text-snippet below (Eawag standard snippet “long-term storage”).

    This applies to data of long-term institutional or societal value. Long Term Storage tries to ensure re-usability of the data at a time when the creators of the data, the current custodians, the current storage platform, or the currently responsible institution (Eawag) are not available anymore. Such data is of high quality and consists primarily of unique observations of the environment or experimental results. Data flagged as Long Term Storage in the institutional repository will be reviewed with regard to file-formats and documentation.

  4. List the file formats that you are going to upload to the institutional repository (see Appendix A: File format recommendations).

Standard snippets

Eawag standard snippet “preservation”

It is Eawag policy to generally preserve all relevant data generated or used by research projects in the Eawag Research Data Institutional Collection. This includes all raw data, all processed data that directly underlies the reported results, and all ancillary information necessary to understand, evaluate, interpret and re-use the results of the study. Data from intermediate steps of the analysis that can be re-created from preserved information does not need to be stored.

Eawag standard snippet “long-term storage”

The datasets X and Y will be flagged for Long Term Storage upon submission to the Eawag institutional collection because they represent unique and non-reproducible information about the state of the environment and we expect them to be of high quality and of great utility for future researchers. Datasets flagged for Long Term Storage will be subjected to specific measures to preserve data integrity and data safety, such as additional backups, regular re-writes to new storage media and redundant storage in third-party repositories. Additionally, data flagged in this way will be stored in file formats that minimize the chance of format obsolescence.

Examples

Example 1

All data from this project will be stored in plain-text CSV files (UTF-8 encoding, no BOM). Documents containing graphics and layout will be stored as PDF/A. Microscopy images will be stored as TIFF.
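When such CSV files are written from Python, the encoding can be made explicit: “utf-8” writes no byte order mark, whereas “utf-8-sig” would add one. A minimal sketch with hypothetical content:

    import csv

    # encoding="utf-8" produces UTF-8 without a BOM, as required above.
    with open("results.csv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_id", "temperature_C"])
        writer.writerow(["S001", 21.4])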

4. Data sharing and reuse

4.1 How and where will the data be shared?

SNSF [SNSF2017]

Questions you might want to consider:

  • On which repository do you plan to share your data?
  • How will potential users find out about your data?

Consider how and on which repository the data will be made available. The methods applied to data sharing will depend on several factors such as the type, size, complexity and sensitivity of data. Please also consider how the reuse of your data will be valued and acknowledged by other researchers. (This relates to the FAIR Data Principles F1, F3, F4, A1, A1.1, A1.2 & A2)

Instructions

  1. Check whether there is a well-recognized, specialized data repository for the kind of data you are producing. For example, it might be standard procedure in your field to submit data to Gene Expression Omnibus or ArrayExpress. Mention it if you do so.
  2. Otherwise, you might use the Eawag standard snippet “sharing” below.

Standard snippet

Eawag standard snippet “sharing”

The data from this project will be shared through the public-facing mirror of the Eawag Research Data Institutional Collection, ERIC/open. This repository aims to support the FAIR Data Principles to the extent possible and provides

  • a persistent identifier (DOI) for each dataset,
  • a rich set of metadata (compliant with the DataCite Metadata Schema 4.0),
  • file fixity through SHA-2 checksums for all files,
  • long-term data safety as provided by Eawag/Empa redundant storage infrastructure and Eawag’s institutional commitment to keep the repository running,
  • a well-documented API (including a subset of the well-known SOLR/Lucene query language) for searching, finding and harvesting datasets, as well as a web-interface with search- and faceting functionality,
  • dissemination of the metadata through the DataCite Metadata Store, which is harvested by an increasing number of indexing services, such as the Bielefeld Academic Search Engine (BASE), OpenAIRE, OSF SHARE, Google Scholar, …
  • provision of cut & paste text snippets for proper citation, and
  • linking with the associated scholarly articles through DORA, the repository for publications run by Lib4RI, and through the partnership between Crossref and DataCite.

4.2 Are there any necessary limitations to protect sensitive data?

SNSF [SNSF2017]

Questions you might want to consider:

  • Under which conditions will the data be made available (timing of data release, reason for delay if applicable)?

Data have to be shared as soon as possible, but at the latest at the time of publication of the respective scientific output. Restrictions may only be due to legal, ethical, copyright, confidentiality or other clauses. Consider whether a non-disclosure agreement would give sufficient protection for confidential data. (This relates to the FAIR Data Principles A1 & R1.1)

Instructions

  1. If you can publish all data at the time of publication, just state that.
  2. If parts of the data will not be made available at all, state the reason(s).
  3. If you intend to publish part of the data before the related publication is finished, you might state this here.
  4. In general, the optimal time to publish data that underpins a publication is right after the publication has been accepted. In case you cannot publish the data at the latest at the time of publication of the respective output, state the reason(s) for the delay and state explicitly when you plan to publish it.
  5. If you are confident that you can make all relevant data public in time, you might use the Eawag standard snippet “publishing OK” below.

Note

The SNSF description seems to imply that delayed or forgone publication of data is only acceptable for sensitive data as described in Section 2.1. We believe there are other valid reasons and suggest you wait until this problem actually materializes at the end of the project, describe the issue in the final DMP, and hope that the SNSF will accept that.

Possible reasons for delayed publication (accepted by SNSF) could for example include:

  • The time necessary to anonymize personal data.
  • The need to keep patentable information secret until patent protection applies.

Possible reasons for delayed publication, which are currently not accepted by SNSF, include for example:

  • The intent to synchronize the publication of the data with other publications (e.g. project report, paper, press release) to maximize visibility and impact.
  • The intent to base follow-up publications on the data, after the project has finished.
  • The intent to couple re-use of the data by other groups to an offer for collaboration.

Standard snippet

Eawag standard snippet “publishing OK”

We expect no limitations with respect to publishing the data. It will be made available to the public in full, at the latest at the time of publication of the project report.

Examples

Example 1

Our data will include meteorological observations obtained from MeteoSwiss, which prohibits any further distribution of the data. Therefore, we will have to exclude these data from publication.

Example 2

The extensive household survey about water-borne diseases poses severe challenges with regard to anonymization, since simple pseudonymization might not be sufficient to guard against the identification of individual households by an inference attack that uses other available information.

Therefore, we will only be able to publish summary statistics together with the associated article. If a sufficiently anonymized dataset turns out to still hold scientific value, we will publish it no later than one year after completion of the project.

Example 3

We expect that the sampling campaign will yield useful data that cannot be completely exploited within the frame of this project. We therefore anticipate a follow-up project based on these data and might delay the publication of the full dataset by two years.

4.3 I will choose digital repositories that are conform to the FAIR Data Principles. [checkbox]

SNSF [SNSF2017]

The SNSF requires that repositories conform to the FAIR Data Principles (Section 5 of the guidelines for researchers, SNSF’s explanation of the FAIR Data Principles). If there are no repositories complying with these requirements in your research field, please deposit a copy of your data on a generic platform (see examples). If no data can be shared, this is a statement of principles.

Instructions

Just check the box.

4.4 I will choose digital repositories maintained by a non-profit organisation. [radio-button yes/no]

SNSF [SNSF2017]

-> If the answer is no: “Explain why you cannot share your data on a non-commercial digital repository.”

The SNSF supports the use of non-commercial repositories for data sharing. Costs related to data upload are only covered for non-commercial repositories.