Case study

Comprehensive datasets provide insight into psychiatric disorders

Genetic variants play an important role in the development of psychiatric disorders such as depression, schizophrenia and autism. Mapping these variants requires extensive datasets and a great deal of computing power. Prof. Dr. Danielle Posthuma and her research group make intensive use of Research Compute Capacity Services (RCCS) for this purpose.

29 November 2023

Needle in a haystack

Posthuma is professor of psychiatric genetics at VU University Amsterdam. Together with her research team, she investigates the genetic variants that are important in the development of psychiatric disorders. This is like looking for a needle in a haystack, as a single disorder may involve thousands of variants. Mapping them therefore requires large datasets and a lot of computing power.

Global consortium

Posthuma is part of the Psychiatric Genomics Consortium (PGC), a consortium of research institutes in which a large number of researchers from all over the world collaborate on psychiatric disorders. "Each of those researchers has a relatively small dataset, say 1,000 patients," says Posthuma. "Such a dataset by itself is too small for a publication, but if you combine them you have access to 70,000 patients and 100,000 controls (healthy individuals)."

Federated or centralised

Many researchers collaborating internationally choose a federated set-up, managing and processing the data themselves at local facilities. Posthuma opted for one central location where the data are stored: SURF. Why? "We want to control all the data ourselves, so that there is one person who always cleans up and analyses the data in the same way. You can cooperate with other institutes according to the same protocol, but the disadvantage is then that each institute interprets that protocol in its own way. Moreover, you can do additional analyses if you have the data yourself."

Sharing data

Researchers who have contributed data can also use data from other parties. "We have set up a procedure for that, because for some data you need separate permission," says Posthuma. "Sharing raw data involves all kinds of privacy issues. That is why it is a godsend that the data are hosted by a neutral body, SURF, which meets all privacy and security requirements. We use a web-based procedure that checks that all permissions are in order."

Computing power

The analyses require a lot of computing power. SURF's computing capacity is constantly being expanded, but a few years ago the need for computing power grew even faster, says Posthuma: "We all put some money in the pot at the time and bought a terabyte server from that, which was added to the computing cluster. We also invested in a high-memory node. These investments are needed to load and analyse the large datasets in memory."

Collaboration

Although researchers sometimes complain about long waiting times, Posthuma is very pleased with the cooperation with SURF: "We have been working together for more than 10 years. The helpdesk is really fantastic and always willing to help; sometimes we even get unsolicited advice if a job is not running efficiently. Even when I have other wishes, for example a high-memory node, it is always worked out in good consultation. We are very happy with that."