Use case: statistical analyses with Lisa for insight into diseases

The amount of available knowledge on the relationship between genetic variants and psychiatric conditions is growing at a rapid pace. However, it is becoming clear that each condition may involve thousands of variants, rendering the relationship far more complex than had previously been assumed. Computational analysis can be of great help.

Genetic research

Geneticists are applying computational analysis to gain greater insight into the relative importance of genetic variants. The research group headed by Danielle Posthuma, professor of psychiatric genetics at VU University Amsterdam (VU), has been making intensive use of the compute cluster Lisa at SURF.

“We've known that psychiatric conditions are caused by a large number of variants for several years now”, Posthuma explains. “We used to think depression could be attributed to a single gene, but we now know at least a thousand - and possibly up to eight thousand - are involved in the process. This applies to all the conditions we've studied, such as depression, schizophrenia, autism and ADHD. However, this also means each individual variant can only have a minor effect. A person suffering from depression won't have all the genetic variants we associate with depression. In most cases, the number will be limited to a few hundred. This makes it difficult to identify all the relevant genes.”

Large-scale datasets

The VU researchers have been collaborating in international consortia as part of this research. These consortia share large-scale datasets, as Posthuma explains: “Our largest dataset is the one for schizophrenia, which includes data on 70,000 patients and 110,000 control group subjects (healthy individuals). We measured a total of one million genetic variants for each control group subject. The subsequent analysis obviously required a lot of computing power. We use the compute cluster Lisa at SURF for statistical analysis: we assess each measured variant to determine whether it's more common in patients than the control group subjects. We also use Lisa to conduct advanced analyses, such as the interaction between various genetic variants and the effects of multiple variants from the same biological pathway.”
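
To make the per-variant comparison concrete, the following is a minimal sketch of one common way to test whether a variant is more frequent in patients than in control subjects: a chi-square test on a 2x2 table of allele counts. It is an illustration only, not the group's actual pipeline. The function name `allelic_association` and the example counts are hypothetical, and the Bonferroni-style threshold is a widely used convention rather than something stated in the interview.

```python
import math


def allelic_association(case_alt, case_total, ctrl_alt, ctrl_total):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    case_alt / ctrl_alt: number of copies of the variant allele observed
    in patients / control subjects; *_total: total alleles counted per group.
    Returns (odds_ratio, chi2, p_value).
    """
    a, b = case_alt, case_total - case_alt    # patients: variant, other
    c, d = ctrl_alt, ctrl_total - ctrl_alt    # controls: variant, other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("each row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, chi2, p_value


# Hypothetical toy counts, chosen only to keep the arithmetic easy to follow.
odds_ratio, chi2, p_value = allelic_association(60, 100, 40, 100)

# Testing one million variants means one million tests, so the significance
# threshold is usually tightened (assumption: Bonferroni correction at 0.05).
n_variants = 1_000_000
threshold = 0.05 / n_variants
is_significant = p_value < threshold
```

Repeating such a test for every measured variant is simple in principle, but with around a million variants and well over a hundred thousand subjects the volume of data is what drives the need for a compute cluster.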

Expanding Lisa's capacity

As Posthuma explains, the collaboration with SURF is intensive: “We started conducting project-based computations in 2006. However, it wasn't long until we needed more computing power. At a certain point, we used an NWO (Netherlands Organisation for Scientific Research) grant to finance an expansion of the Lisa cluster. From that moment onwards, our cooperation with SURF became more structural. However, we're coming up against the limits of our capacity, as the scale of our analyses outgrows our infrastructure. We're currently working to further expand Lisa. We've actually purchased a terabyte server and are in the process of acquiring a high-memory node. We need to invest in Lisa in order to ensure that we can load our large-scale datasets into the system memory and analyze them.”
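
A rough back-of-the-envelope calculation shows why memory becomes the bottleneck. The sketch below estimates the size of a genotype matrix for the schizophrenia dataset under some loudly stated assumptions: that all 180,000 subjects (70,000 patients plus 110,000 control subjects) are measured at one million variants, and that each genotype is stored in 2, 8 or 64 bits. The storage widths and the function name `genotype_matrix_bytes` are illustrative choices, not details from the interview.

```python
def genotype_matrix_bytes(n_subjects, n_variants, bits_per_genotype):
    """Size in bytes of a dense subjects-by-variants genotype matrix."""
    return n_subjects * n_variants * bits_per_genotype // 8


n_subjects = 70_000 + 110_000   # patients + control subjects (from the article)
n_variants = 1_000_000          # measured variants per subject (assumed for all)

# Assumed storage widths: packed 2-bit codes, one byte, or a 64-bit float.
packed_gb = genotype_matrix_bytes(n_subjects, n_variants, 2) / 1e9
byte_gb = genotype_matrix_bytes(n_subjects, n_variants, 8) / 1e9
float_tb = genotype_matrix_bytes(n_subjects, n_variants, 64) / 1e12

print(f"2 bits per genotype:  {packed_gb:.0f} GB")
print(f"8 bits per genotype:  {byte_gb:.0f} GB")
print(f"64 bits per genotype: {float_tb:.2f} TB")
```

Under these assumptions, even a compactly encoded matrix runs to tens of gigabytes, and a naive floating-point representation exceeds a terabyte, which is consistent with the need for high-memory hardware described above.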

From statistics to therapy

In early 2014, the VU researchers received another substantial NWO grant. “We'll continue to do a lot of computing with Lisa in 2014”, Posthuma explains. “Amongst other objectives, we aim to develop new methods that will offer greater insight into the relative importance of the thousands of genetic variants involved in a psychiatric condition. Knowledge of the genes involved in a specific condition still doesn't tell us how the actual process works. That's going to be our next challenge: we need to make the transition from statistical correlations to causal relationships.”