Vast amounts of data are available nowadays. But how can you process, analyse and (re)use them safely and securely? The projects in this Labs theme explore these questions.
eTEC-BIG Advanced data management
Data is becoming ever cheaper to obtain, so datasets are growing explosively. To keep plant genome analyses feasible even on large datasets, SURF is working with WUR and the eScience Center on a scalable solution for the bioinformatics application PanTools.
Why are we doing this project?
In plant sciences, researchers study how plants can be improved, and they use genetic data to do so. Because this data is becoming ever cheaper to obtain, datasets are growing larger. Whereas researchers once worked with a handful of lettuce genomes, they now work with datasets of as many as a hundred genomes. Together with WUR and the eScience Center, SURF is optimising the bioinformatics application PanTools to keep pace with this explosive growth in data, so that plant researchers can analyse their data more effectively.
What are the main activities?
SURF is designing a scalable solution that will enable PanTools to analyse ever larger datasets. We are doing this in collaboration with scientists from WUR and engineers from the eScience Center. SURF explores and advises on Big Data techniques and databases, and provides infrastructure for developing and testing scalable analyses. Finally, SURF advises on improving development methods, so that a high-quality, scalable application can ultimately be delivered to researchers.
Who do we collaborate with?
We are collaborating on this project with WUR and the eScience Center.