Case: Comprehensive datasets provide insight into causes of schizophrenia (Publication)

Genetic variants play a major role in the occurrence of psychiatric conditions such as depression, schizophrenia, and autism. Comprehensive datasets and plenty of processing power are needed to map these variants. Prof. Danielle Posthuma and her research group are therefore making intensive use of the Research Capacity Computing Service (RCCS) at SURFsara.

18 MAY 2017

Needle in a haystack

Posthuma is Professor of Psychiatric Genetics at the Vrije Universiteit Amsterdam. Together with her research team she is investigating the genetic variants that play an important role in the occurrence of psychiatric conditions. This is like looking for a needle in a haystack, given that thousands of variants can be involved in a single condition. Mapping these thus requires large datasets and plenty of processing power.

Global consortium

For the last 10 years, Posthuma has been working in a consortium of research institutions called the Psychiatric Genomics Consortium (PGC). Within this consortium, a large number of researchers across the world are working on psychiatric conditions. "Each of those researchers has a relatively small dataset that might contain 1,000 patients," explains Posthuma. "A dataset of this kind is too small on its own to form the basis of a publication, but when you combine them you have access to 70,000 patients and 100,000 control subjects (healthy individuals)."

Federated or central

Many researchers who cooperate internationally opt for a federated structure, in which each institution manages its own data and processes it on local facilities. Posthuma has opted for a single central location to store the data: SURFsara. Why? "We want to have all the data in our own hands to ensure that there is one person who always cleans up and analyses the data in the same way. It is possible to work together with other institutions by following a single shared protocol, but this has the disadvantage that each institution interprets that protocol in its own way. You can also perform additional analyses when you have the data in your own hands."

Data sharing

Researchers who have contributed data can also use data belonging to other parties. "We have set up a procedure for this, as you need separate permission for some data," explains Posthuma. "There are all kinds of privacy aspects involved in sharing raw data. That's why it's beneficial that the data is hosted by a neutral body, SURFsara, that complies with all privacy and security requirements. We apply a web-based procedure that checks whether all the permissions are in order."

Processing power

A lot of processing power is needed for the analyses. Although RCCS is constantly being expanded, the need for computing power is growing even more quickly, according to Posthuma: "We all put in some money a few years ago and bought a terabyte server that we added to the computing cluster. We also invested in a high-memory node. These investments are necessary in order to be able to load extensive datasets into memory and to analyse them."


Although the researchers sometimes complain about long waiting times, Posthuma is enthusiastic about the cooperation with SURFsara: "We've been working together for over 10 years now. The helpdesk is genuinely fantastic, always ready to help you out, and sometimes we even get unsolicited advice if a job doesn't run entirely effectively. Even when I have other requests, such as a high-memory node, they are always handled in close consultation. We are very happy with that."

Last modified: 29 May 2017