Sensitive data remain unused
Non-academic parties such as governments, heritage institutions and commercial parties like the Chamber of Commerce or Funda hold a growing number of interesting datasets. However, there is currently no infrastructure that lets researchers analyse sensitive data while the data providers remain in control. As a result, most potential data providers are reluctant to share their datasets, which therefore remain unused. Yet scientific breakthroughs would be possible if these datasets were available.
The Solution: SANE
The Secure ANalysis Environment (SANE) is a virtual, fully shielded computer containing pre-approved analysis software (such as R and Jupyter notebooks) and access to the sensitive data. It allows the data provider to maintain complete control while still allowing the researcher to study the data in a convenient manner.
Researchers can analyse the data within SANE once the data provider has granted access. Results of the analyses can only be exported to the researcher's own computer after verification by the data provider. The data provider can even prevent the researcher from seeing the data. All actions of the researcher are monitored. Uploading additional data can also be blocked, as combining datasets may result in de-anonymization.
SANE comes in two variants. Tinker SANE allows the researcher to see and manipulate the data. In Blind SANE, the researcher submits an algorithm without being able to see the data, and the data provider approves both the algorithm and the output. Interest in SANE is high: even before the project team started, six parties had already expressed interest.
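The Blind SANE workflow described above (submit an algorithm, have it approved, run it unseen, have the output verified, then export) can be pictured as a small sequence of gated steps. The Python sketch below is purely illustrative: the class `BlindSession`, its method names and the sample records are hypothetical and are not part of the actual SANE software, whose interface is not described here.

```python
from typing import Any, Callable, List


class BlindSession:
    """Hypothetical model of one Blind SANE run; not the real SANE interface."""

    def __init__(self, sensitive_data: List[dict]) -> None:
        self._data = sensitive_data       # stays inside the environment
        self._algorithm = None
        self._state = "empty"             # empty -> submitted -> approved -> ran -> verified
        self._output = None
        self.audit_log: List[str] = []    # every action is recorded

    def submit_algorithm(self, algorithm: Callable[[List[dict]], Any]) -> None:
        """Researcher hands in code without seeing the data."""
        self._algorithm = algorithm
        self._output = None
        self._state = "submitted"
        self.audit_log.append("researcher: submitted algorithm")

    def approve_algorithm(self) -> None:
        """Data provider reviews and approves the submitted code."""
        if self._state != "submitted":
            raise RuntimeError("no unreviewed algorithm to approve")
        self._state = "approved"
        self.audit_log.append("provider: approved algorithm")

    def run(self) -> None:
        """The environment runs the code; the result is held back."""
        if self._state != "approved":
            raise PermissionError("algorithm not approved by the data provider")
        self._output = self._algorithm(self._data)
        self._state = "ran"
        self.audit_log.append("environment: ran algorithm")

    def approve_output(self) -> None:
        """Data provider verifies the result before it may leave."""
        if self._state != "ran":
            raise RuntimeError("no unverified output to approve")
        self._state = "verified"
        self.audit_log.append("provider: verified output")

    def export(self) -> Any:
        """Researcher receives the result only after verification."""
        if self._state != "verified":
            raise PermissionError("output not verified by the data provider")
        self.audit_log.append("researcher: exported output")
        return self._output


# Usage with made-up records: the researcher only ever receives the aggregate.
records = [{"age": 34}, {"age": 51}, {"age": 47}]
session = BlindSession(records)
session.submit_algorithm(lambda rows: sum(r["age"] for r in rows) / len(rows))
session.approve_algorithm()
session.run()
session.approve_output()
mean_age = session.export()
```

In this sketch every step raises an error unless the preceding approval has happened, which mirrors the idea that the data provider stays in control at each stage. A real environment would enforce this through isolation and access control rather than in application code.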
Funding Awarded by PDI-SSH Foundation
PDI-SSH (Platform Digital Infrastructure Social Sciences & Humanities) granted a funding request of nearly one million euros in December for the development of this secure data environment. SANE is being developed by the Erasmus School of Social and Behavioural Sciences, ODISSEI (Open Data Infrastructure for Social Science and Economic Innovations), the Netherlands Institute for Sound and Vision, CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities), SURF, and the KB, National Library of the Netherlands.
Researchers in all disciplines
SANE builds on previous initiatives of the project partners, such as the CBS Remote Access Environment, ODISSEI Secure Supercomputer, SURF Data Exchange, SURF Research Cloud and CLARIAH-as-a-Service (CLaaS). We are building a generic off-the-shelf solution that can be used by any provider of sensitive data and by any researcher. SANE can be used by researchers in all disciplines, as illustrated by the involvement of consortia in both the social sciences (ODISSEI) and the humanities (CLARIAH). The platform is expected to go into production within three years.