Data management & data processing
Sensitive data management in practice
What does a secure and trusted environment for working with sensitive data look like?
Together with a wide range of collaborative partners, SURF is building a framework for sensitive data management that supports bringing guidelines together, formulating general standards and facilitating shared systems. SURF's role in this is to connect: as a neutral and trusted party, it brings collaborative partners together.
In this process, we have identified eight process steps that are desirable in most of the sensitive data workflows we have encountered:
- Publish metadata
- Find data
- Request access
- Request Trusted Research Environment (TRE) project
- Transfer data
- Process data
- Output check
- Publish results
For each process step, we provide an explanation and highlight the projects in which SURF is collaborating.
1. Publish metadata
The process begins with making data available via a metadata portal. A large number of organisations, such as governments, companies, knowledge institutions and even citizens, make datasets available, often via a metadata provider. This provider collects data from various sources, then manages and structures it by adding appropriate metadata so that the information is easier to find, understand and use.
2. Find data
For the researcher, the journey starts with finding suitable data for their research. A searchable metadata portal is the most common place for researchers to explore which data exist and how to apply for access to them.
A metadata portal is a web page or application designed to manage and present metadata. It helps organisations organise and manage their digital data more effectively and make it searchable.
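To make the idea of publishing and finding metadata concrete, here is a minimal sketch in Python. It assumes a hypothetical record layout (`DatasetMetadata`) and a hypothetical in-memory search helper (`find_datasets`); the field names and example datasets are illustrative and do not describe the schema of any real portal.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str
    title: str
    description: str
    publisher: str
    keywords: list = field(default_factory=list)
    access_level: str = "restricted"  # assumed values: "open" or "restricted"


def find_datasets(catalogue, term):
    """Return records whose title, description or keywords mention the term."""
    needle = term.lower()
    return [
        record for record in catalogue
        if needle in record.title.lower()
        or needle in record.description.lower()
        or any(needle in keyword.lower() for keyword in record.keywords)
    ]


# Two made-up records standing in for a published catalogue.
catalogue = [
    DatasetMetadata(
        "ds-001", "Household income survey",
        "Annual survey of household income.",
        "Example Institute", ["income", "households"],
    ),
    DatasetMetadata(
        "ds-002", "Regional air quality",
        "Hourly air quality measurements.",
        "Example Agency", ["environment"], access_level="open",
    ),
]

matches = find_datasets(catalogue, "income")
print(json.dumps([asdict(match) for match in matches], indent=2))
```

Note that the search only touches the metadata: a researcher can discover that a restricted dataset exists without seeing any of its contents.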
We list a few metadata portals:
- The ODISSEI Metadata portal: ODISSEI is the Open Data Infrastructure for Social Science and Economic Innovations. It offers a secure HPC enclave to work with CBS Microdata.
- The CLARIAH Media Suite: a common infrastructure for the humanities and social sciences.
- Health-RI: the Dutch National Health Data Catalogue for health and life science data.
3. Request access
When the right dataset is found and it contains sensitive data, this dataset can’t simply be downloaded. Researchers must submit an access request that explains:
- Who they are
- What they want to use the data for
- How they plan to protect it
This request is assessed by the data provider, often with help from a service such as a Data Access Broker. A Data Access Broker acts as an intermediary between researchers and various data sources, and simplifies obtaining datasets for research or analysis purposes by:
- Standardising access procedures
- Enabling automation where possible
- Giving providers tools to review and approve requests
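As an illustration of what standardising and partly automating this step could look like, the sketch below models an access request with the three elements listed above and runs a simple completeness check before a human review. The `AccessRequest` type and the `missing_fields` and `triage` helpers are hypothetical names chosen for this example; they are not the interface of any existing broker.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical request form mirroring the three questions above."""
    applicant: str        # who they are
    purpose: str          # what they want to use the data for
    protection_plan: str  # how they plan to protect it
    dataset_id: str


def missing_fields(request):
    """Return the names of required fields that were left empty."""
    required = ("applicant", "purpose", "protection_plan", "dataset_id")
    return [name for name in required if not getattr(request, name).strip()]


def triage(request):
    """Automated pre-check only; the data provider still makes the decision."""
    gaps = missing_fields(request)
    if gaps:
        return "returned to applicant: missing " + ", ".join(gaps)
    return "forwarded to data provider for review"


complete = AccessRequest(
    "Dr. A. Example, Example University",
    "Study income mobility",
    "Analysis only inside a trusted research environment",
    "ds-001",
)
incomplete = AccessRequest(
    "Dr. A. Example", "", "Analysis only inside a trusted research environment", "ds-001"
)

print(triage(complete))
print(triage(incomplete))
```

The automated part is deliberately limited to checking that the form is complete. Whether the purpose and protection plan are acceptable remains a judgment for the data provider.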
We list a few data access brokers:
- SSHOC-NL: Digital Infrastructure for Social Sciences and Humanities
- Health Data Access Body - NL: This entity is currently under development and should be operational for the Netherlands from 2029 onwards. All European countries will have their own Data Access Body.
4. Request Trusted Research Environment (TRE) project
To request a Trusted Research Environment (TRE) project, the researcher applies through a specific TRE provider. The application outlines the research benefit, and the researcher undergoes rigorous checks before gaining approval to access sensitive data remotely. The work takes place within a secure environment that the data never leaves; only vetted results are exported.
5. Transfer data
In most sensitive data workflows, the data moves into a controlled environment, or the researcher gets secure access to the environment where the data already lives. There are several common options for giving access to the data:
- Uploading data into a trusted research environment where the researcher logs in securely. A Trusted Research Environment, also known as a Secure Data Environment or Data Safe Haven, is a secure, digital environment in which researchers can access sensitive data for scientific research.
- Granting access to existing data in place via a secure connection.
- Only in low-risk cases: allowing controlled downloads (e.g. pseudonymised, aggregated data under license).
SURF provides several services that support secure data movement:
- SURFfilesender: For secure and encrypted file transfers
- Research Drive: Collaborative storage for research projects under policy controls
- SURFdrive: Personal cloud storage under policy controls with a maximum of 1 TB
6. Process data
Next, the research data is processed in a trusted research environment: a secure virtual workspace with strict access controls. Trusted research environments make sure that:
- Only authorized users can access the data
- All activity is logged and auditable
- Data cannot leave the environment without approval
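The three guarantees above can be sketched as a toy model in Python. The `TrustedEnvironment` class and its methods are hypothetical and exist only to show how authorisation, logging and export approval relate to each other; this is not how SANE, OSSC or any other real environment is implemented.

```python
from datetime import datetime, timezone


class TrustedEnvironment:
    """Toy model of the three guarantees; names and behaviour are assumptions."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self.audit_log = []
        self.approved_outputs = set()

    def _log(self, user, action, allowed):
        # Every attempt is recorded, including refused ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "allowed": allowed,
        })

    def read_data(self, user, dataset_id):
        allowed = user in self.authorized_users
        self._log(user, f"read {dataset_id}", allowed)
        if not allowed:
            raise PermissionError(f"{user} is not authorized")
        return f"contents of {dataset_id}"  # placeholder for the real data

    def approve_output(self, reviewer, output_name):
        self.approved_outputs.add(output_name)
        self._log(reviewer, f"approve {output_name}", True)

    def export(self, user, output_name):
        allowed = (
            user in self.authorized_users
            and output_name in self.approved_outputs
        )
        self._log(user, f"export {output_name}", allowed)
        if not allowed:
            raise PermissionError(f"{output_name} may not leave the environment")
        return output_name


tre = TrustedEnvironment(authorized_users=["alice"])
tre.read_data("alice", "ds-001")
try:
    tre.export("alice", "table1.csv")  # refused: not yet approved
except PermissionError as err:
    print(err)
tre.approve_output("reviewer", "table1.csv")
print(tre.export("alice", "table1.csv"))
```

In this sketch, the refused export is logged as well, so the audit trail covers attempts and not only successful actions.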
SURF provides several trusted research environments for different research needs:
- SANE – flexible, cloud-based trusted research environment on the SURF Research Cloud
- OSSC – high-performance trusted research environment on the Snellius supercomputer
- Alzheimer genetics hub – dedicated secure cluster for Alzheimer’s genetics research
These environments are ISO 27001-certified and designed to meet both researcher workflows and data provider requirements.
7. Output check
When the analysis is ready, researchers want to take their results, in the form of graphs, summaries or models, out of the trusted research environment. But those outputs may still carry sensitive traces. That is why most trusted research environments include a step called output control. Before anything leaves the environment, it is checked by the data provider or a designated reviewer to ensure:
- No personal data is exposed
- Results are sufficiently aggregated or anonymized
- Output aligns with the approved use
This step helps data providers stay in control, while still allowing researchers to publish meaningful insights.
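One common ingredient of such a check is verifying that counts in a table are not so small that individuals could be singled out. The sketch below shows this for a simple frequency table. The threshold of 10 and the function names `check_frequency_table` and `review_output` are assumptions made for this example; actual rules are set by the data provider and usually involve more than a single threshold.

```python
MIN_CELL_COUNT = 10  # assumed threshold; real rules are set by the data provider


def check_frequency_table(table, min_count=MIN_CELL_COUNT):
    """Return the cells whose counts are too small to release.

    `table` maps a category label to the number of individuals in it.
    Empty cells (count 0) are not flagged here and are left to the reviewer.
    """
    return {label: n for label, n in table.items() if 0 < n < min_count}


def review_output(table, min_count=MIN_CELL_COUNT):
    """Return a status and the flagged cells for a frequency table."""
    flagged = check_frequency_table(table, min_count)
    if flagged:
        return "rejected", flagged
    return "approved", {}


# Made-up counts per age group.
table = {"18-29": 134, "30-49": 212, "50-69": 97, "70+": 4}
status, flagged = review_output(table)
print(status, flagged)  # rejected {'70+': 4}
```

A rejected table would typically go back to the researcher, who can merge small categories (here, for example, combining the two oldest age groups) and submit it again.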
8. Publish results
Once the output has been checked, the results are released. They can be sent to the researcher, or they can become part of the catalogue of datasets for reuse in further research.