Three winning proposals for ‘big science’ call
The eTEC-BIG call from SURF and the Netherlands eScience Center aims to support research and development of innovative eScience technologies and software for big data handling, big data analytics and related computational methods. The work is driven by direct demand from any research area that falls broadly under the term ‘Big Science’.
Each proposal is classified into one of three technological research directions: Scalable Machine Learning & AI; Processing of Streaming Data; Large-Scale (Distributed) Data Organization, Management & Semantics.
Each of the winning projects will receive a grant consisting of funds and in-kind support from research engineers at the eScience Center and from technology and e-Infrastructure experts at SURFsara.
The awarded projects
DarkGenerators – Interpretable Large Scale Deep Generative Models for Dark Matter Searches
Dr. Christoph Weniger (University of Amsterdam)
Dark matter is five times more abundant in the universe than visible matter. Yet its nature remains unknown, making it one of the most exciting and complex research questions today. This project will use advanced data science methods to enhance and accelerate the interpretation of astrophysical and collider data in the search for signals of dark matter. To that end, deep generative models and differentiable probabilistic programming will be used to construct a framework for fast and precise inference with high-dimensional data models.
Technological research direction: Scalable Machine Learning & AI
The PetaFLOP AARTFAAC Data-Reduction Engine
Dr. John Romein (Netherlands Institute for Radio Astronomy)
AARTFAAC is an all-sky radio telescope and transient-detection facility. It piggybacks on raw data from a limited number of antennas of the LOFAR telescope. Last year, the AARTFAAC 2.0 program started, which combines a planned telescope upgrade with better transient-detection capabilities and new science cases. The project will improve the AARTFAAC processing pipeline in order to:
- incorporate algorithmic improvements and new GPU technologies to permit scaling to a larger collecting area, larger bandwidth and higher resolution;
- detect transients well within 7 seconds to allow triggering of the transient buffer boards (TBBs) and alerting of other instruments;
- provide near real-time calibrated data/images for space-weather and ionospheric monitoring;
- facilitate other science cases by providing intermediate data products.
Technological research direction: Processing of Streaming Data
Scaling up Pangenomics for Plant Breeding
Dr. Sandra Smit (Wageningen University)
Modern plant research is being transformed into a data-driven endeavor. A main driver of this development is the continuous reduction in DNA sequencing costs. Reconstructing the complete genome of a plant from short DNA sequences and finding genetic variants relative to a reference genome are applications that generate large amounts of sequencing data, which are used to study plants and to accelerate and improve breeding. Traditional approaches to comparing genomes, centered on a single reference, no longer suffice, and the field of genomics is therefore switching to so-called pangenome approaches. Several novel graph-based data structures and algorithms are under development, but none of these can handle the numbers of large plant genomes required in modern research and in plant breeding applications. This project will improve the scalability of a promising pangenome approach, called PanTools, using eScience technologies.
Technological research direction: Large-Scale (Distributed) Data Organization, Management & Semantics