Data management & data processing

Sensitive data explained

What is sensitive data? 

Sensitive data refers to information that must be protected due to its potential impact on privacy, security, or legal compliance. This includes personally identifiable information (PII), health data, financial information, intellectual property, and other confidential details that require safeguarding.  

Some examples of sensitive data used for research:

  • Three-dimensional MRI scans of human patients used in life sciences research.
  • Chamber of Commerce company data utilized for socioeconomic research.
  • Copyright-protected e-books accessed for research in the humanities.
  • Detailed bathymetric data revealing ocean depths and underwater terrain for oceanographic research.

What are the reasons for calling data sensitive?

Sensitive data is defined by the potential harm that could be caused if it falls into the wrong hands. This harm can manifest in various ways, including: 

  • Privacy concerns: Many types of sensitive data are considered private and confidential. Their disclosure could violate an individual’s right to privacy.
  • Security risks: Sensitive data, especially when stored or transmitted, can be a target for cyberattacks. Protecting it is crucial for maintaining overall security.
  • Reputational damage: Information about an individual’s health, political opinions, or religious beliefs could lead to social stigma or discrimination.
  • Business impact: Sensitive data can include trade secrets, financial information, and other data that is critical to a company’s operations and competitive advantage. Unauthorized access could severely impact a business.
  • GDPR and other regulations: Many jurisdictions have laws and regulations that require specific protections for sensitive data, such as the General Data Protection Regulation (GDPR) in the European Union.

Why should I handle sensitive data differently from non-sensitive data? 

Sensitive data must be handled differently from non-sensitive data due to the significantly higher risks associated with exposure or misuse. This distinction is crucial for maintaining privacy, complying with regulations, and preventing severe consequences like identity theft, financial loss, or reputational damage. Properly identifying and handling sensitive data is a critical component of data security and privacy best practices.

What are the challenges of working with sensitive data in research?   

Research across many domains is becoming increasingly data-driven. Often, the most valuable datasets are also the most sensitive. As a result, working with sensitive data is now commonplace in fields such as life sciences and social sciences.

For data providers – organisations and people that make data available for research – maintaining control over access and use is critical. At the same time, researchers need the freedom to explore, analyse, and manipulate data effectively. These priorities frequently clash, making efficient management and support essential. 

These are the four key challenges of working with sensitive data in research:

  1. Access is complicated – Rules are fragmented, approvals take a long time, and the legal situation is often ambiguous.
  2. You can’t just send data around – Sensitive data must stay protected in transit and storage.
  3. Secure computing isn’t simple – Researchers need familiar tools in locked-down environments.
  4. Even results must be checked – Outputs may risk re-identification if not handled properly.

Why is getting access to sensitive datasets complicated?

Accessing sensitive data is often complicated because each data provider applies its own rules and conditions. There is no shared standard for how access should be requested, assessed, or granted. As a result, researchers and other data users must navigate lengthy, inconsistent, and sometimes ambiguous procedures. This lack of alignment leads to significant delays, especially for researchers, for whom timely access can be critical to advancing their work.

Why can’t you just send sensitive data around?

Transferring sensitive data is not as simple as sending a file from one place to another. Depending on the level of sensitivity, data providers must ensure that the data cannot be intercepted, copied, or accessed by unauthorized parties during transfer. Standard internet security protocols are sometimes insufficient, requiring additional safeguards such as end-to-end encryption or controlled transfer channels. In many cases, sensitive data may not be downloaded to a researcher’s personal device at all. Instead, it must be accessed within dedicated, secure computing environments that enforce strict controls over who can use the data and how it can be handled.

Why is secure computing not simple?

Secure computing environments place strict controls on what users can access and do. While these controls are essential for protecting sensitive data, they often disrupt familiar research workflows. For example, when internet access is blocked, installing or updating the required software becomes far from straightforward. And when researchers are not allowed to view or inspect the data directly, their entire approach must change. This often means working with synthetic data or mock datasets, or developing and testing analysis code in a separate environment before running it in the secure one. These constraints make secure computing effective for protection, but also substantially more complex to work with.
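
One way to develop analysis code outside the secure environment is to generate a mock dataset that mirrors only the structure of the real data. The sketch below is a minimal illustration in Python. The column names (patient_id, age, diagnosis_code), the value ranges, and the functions make_mock_rows and mean_age are hypothetical and are not taken from any real dataset or SURF service.

```python
import random

# Hypothetical schema: the codes and value ranges are illustrative assumptions.
DIAGNOSIS_CODES = ["A01", "B02", "C03"]


def make_mock_rows(n_rows, seed=0):
    """Return n_rows of random records that share the real data's column
    layout but contain no real values."""
    rng = random.Random(seed)  # fixed seed keeps test runs reproducible
    rows = []
    for i in range(n_rows):
        rows.append({
            "patient_id": f"MOCK-{i:05d}",  # clearly fake identifier
            "age": rng.randint(18, 90),
            "diagnosis_code": rng.choice(DIAGNOSIS_CODES),
        })
    return rows


def mean_age(rows):
    """Example analysis step that can be tested on mock data first."""
    return sum(r["age"] for r in rows) / len(rows)


mock = make_mock_rows(100)
print(mean_age(mock))
```

Because the mock rows are random, any result computed on them is meaningless. They only confirm that the analysis code runs before it is moved into the secure environment.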

Why do results need to be checked when working with sensitive data?

When researchers work with sensitive data, the outputs they produce may inadvertently reveal information about individuals or small groups, even if the data itself never leaves the secure environment. To prevent this, results must undergo an output check (also known as disclosure control) before they can be released. This review ensures that no identifiable patterns, outliers, or inadvertently included raw data can be traced back to a person or organization. By checking results before they leave the secure environment, data providers uphold legal and ethical obligations while still allowing researchers to publish meaningful, aggregated findings.
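
A common rule in output checking is that aggregated counts below a minimum threshold are withheld, because very small groups can point to individuals. The Python sketch below illustrates only that idea. The threshold of 10, the function name suppress_small_cells, and the example table are assumptions; the actual rules are set by each data provider.

```python
def suppress_small_cells(counts, threshold=10):
    """Replace counts below `threshold` with None so small groups are not released.

    threshold=10 is an illustrative assumption, not a rule from any provider.
    """
    return {
        group: (n if n >= threshold else None)
        for group, n in counts.items()
    }


# Hypothetical aggregated output: number of participants per region.
table = {"north": 120, "south": 4, "east": 37}
checked = suppress_small_cells(table)
print(checked)  # {'north': 120, 'south': None, 'east': 37}
```

A real output check covers more than this single rule, for example reviewing outliers and looking for raw data that was included in the results by accident.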

What is SURF's role in working with sensitive data?

Working with sensitive data can be challenging, and SURF’s role is to make that process as smooth and secure as possible. We provide services that help researchers work effectively with sensitive data while ensuring that all requirements set by data providers are fully met. We collaborate closely with research communities that have long-standing expertise in handling and sharing sensitive data. Through these partnerships, we continually refine our services to support researchers in the most reliable and practical way.

What does SURF do with secure data management?

SURF helps by initiating projects that involve processing and analysing large and complex amounts of research data, as well as sharing and accessing it securely. In doing so, we research new technologies, share knowledge and develop new services.

Why does SURF explore new technologies in secure data management?

At SURF we are always looking for insight into the impact and potential added value of new technologies. Several projects bring together needs and new technologies in data processing and data management. We research and validate these new technologies, including proofs of concept and pilots.

What are the benefits and outcomes of these explorations?

These explorations can lead to new services or to additions to existing services in SURF’s service portfolio. In this way, we strengthen the innovative power of both the SURF members and the SURF organisation.
