Nobel Prize winners have used the Grid to measure gravitational waves.
High-performance data processing
Do you want to process and store large volumes of data? Our team of experts can support you in using our high-throughput data processing systems and storage solutions.
For processing large, structured datasets
Our high-throughput data processing services are suited to projects that need to process large, structured datasets, such as instrument data from sensors, DNA sequencers, telescopes, and satellites.
Use data processing services for:
- Collaborative work on a shared set of data and software
- Parallel processing of large amounts of data, from many terabytes to petabytes
- Processing of large independent simulations and workflows
- Optimized data transport over a scalable, high-bandwidth network
- Easy access to scalable data storage solutions
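The parallel processing of independent work items mentioned in the list above can be sketched in a few lines of Python. This is only an illustration, not SURF's interface: the names `process_chunk` and `run_parallel` are hypothetical, the data is made up, and a standard-library thread pool stands in for the worker nodes that would run the jobs on a real platform.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-chunk analysis: here, just sum the readings."""
    return sum(chunk)


def run_parallel(chunks, max_workers=4):
    """Hand each independent chunk to a pool of workers.

    pool.map returns the results in the same order as the input chunks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    # Made-up example data: three independent chunks of sensor readings.
    example_chunks = [[1, 2, 3], [10, 20], [100]]
    print(run_parallel(example_chunks))  # [6, 30, 100]
```

Because the chunks do not depend on each other, the same pattern scales from a handful of local workers to many separate jobs; only the mechanism that hands out the chunks changes.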
A number of massive (inter)national research projects have already set up their own production and collaborative scenarios on our data processing infrastructure. Some examples:
- Astronomers reveal hundreds of thousands of previously unknown galaxies
- SURF supports SRON for Tropomi satellite data analysis
- LIGO-Virgo: gravitational-wave detectors by Nobel Prize winners Rainer Weiss and Kip Thorne
- Project MinE: international genomic sequencing project to understand the neurodegenerative disease ALS
- Large Hadron Collider, the world’s largest and most powerful particle accelerator
- Radio telescope LOFAR: long-term storage and data analysis
- BBMRI: large biobank storing omics data for reusable research
- Xenon: international research collaboration searching for dark matter
High-performance data processing projects need support focused on transferring, storing, accessing and processing large amounts of data. We have extensive experience in all aspects of large-scale data processing. We offer specialized support in building custom production solutions for your research community.
Data Processing Platforms
We offer two powerful data processing platforms that cater to different community needs:
Grid for scalability and federation
Grid offers very large computing capacity through a series of international clusters interconnected by a fast network. These clusters are located in data centres around the world.
Advantages of Grid:
- Access to the European Grid for federated data processing
- Huge processing capacity aimed at steady production workflows
- Opportunity to collaborate with large distributed communities
Spider for agility and flexibility
Spider is a dynamic, flexible, and customizable platform hosted locally at SURF. It is optimized for collaboration and supported by an ecosystem of tools, so that you can start data-intensive projects quickly and easily.
Advantages of the Spider platform:
- Interactive processing through a user-friendly, interoperable, industry-standard interface
- Private nodes and clusters for tailored productivity and availability
- Collaboration in a private, secure, project-based workspace
These specifications describe the data processing platforms at SURF as of January 2020 and give an indication of the quality and capacity of our facilities.
|Specification||Grid||Spider|
|Operating system||Linux CentOS 7.x 64bit||Linux CentOS 7.x 64bit|
gLite & DIRAC
|Network uplink to external sources||1200 Gbit/s||1200 Gbit/s|
|Existing data||10,000 TB disk / 50,000 TB tape||10,000 TB disk / 50,000 TB tape|
|Non-shared node storage (scratch) max||80 GB per core / 11 TB per node||80 GB per core / 11 TB per node|
|Shared local storage||n/a||250 TB+|
|Remote storage||dCache||dCache / SWIFT / Archive / Cloud storage|
|Memory (RAM) max||8 GB per core / 1 TB per node||8 GB per core / 1 TB per node|
|Private resources||n/a||project-tailored nodes & clusters|
|Type of jobs||Single-core / multi-core / whole-node||Single-core / multi-core / whole-node|
|Security and privacy||Standard||Customizable|
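As a worked example of reading the table above, the sketch below estimates how many cores a job would have to request to stay within the per-core memory and scratch maxima. It assumes, purely for illustration, that memory and scratch are granted in proportion to the number of cores requested. The actual allocation policy is not stated here, and the function name `cores_needed` is hypothetical.

```python
import math

# Per-core maxima taken from the table (January 2020 figures).
MEM_GB_PER_CORE = 8
SCRATCH_GB_PER_CORE = 80


def cores_needed(mem_gb, scratch_gb):
    """Smallest core count whose combined per-core limits cover the job.

    Assumption (not from the source): memory and scratch scale linearly
    with the number of cores requested.
    """
    by_memory = math.ceil(mem_gb / MEM_GB_PER_CORE)
    by_scratch = math.ceil(scratch_gb / SCRATCH_GB_PER_CORE)
    return max(1, by_memory, by_scratch)


if __name__ == "__main__":
    # A job needing 20 GB of RAM and 100 GB of scratch space.
    print(cores_needed(mem_gb=20, scratch_gb=100))  # 3
```

In this example memory is the limiting factor: 20 GB at 8 GB per core rounds up to 3 cores, while 100 GB of scratch at 80 GB per core would need only 2.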
More (technical) information can be found in the user information.
If you use this service, you may also be interested in the following services:
Long-term storage of research data
Our services are connected to the central archive of SURFsara, which offers you extensive options for storing your research data. You can also use the Persistent Identifier (PID) service on data that is stored on SURFsara storage services. Do you want to store your data securely over long periods? Then make use of our Data Archive service.
Visualisation: immediate clarity of results
Do you work with calculations that produce large amounts of data? Then you should use our visualisation techniques and support. Visualisation helps you to better interpret the results of your calculations.
Send data quickly with SURFlichtpaden
Do you want a fast and reliable connection from your own network to our systems? A lightpath is a direct connection that is shielded from the Internet. It is extra secure and reliable, which makes it suitable for privacy-sensitive information, for example. The biggest challenge with lightpaths is connecting them to the systems on both sides. We help you by bridging the final metres between the end point of a lightpath and your data sources.
Consultancy: independent advice
Our consultants support you from the first analysis of the problem to the final implementation. They provide independent advice on, among other things:
- accessing our systems
- submitting jobs
- methods for approaching your data
- design and optimisation of your software
- the exact design of your data storage system
- how to organise your data infrastructure
- how to make optimal use of our calculation and storage facilities
- integrating your virtual infrastructure into your work processes
- optimisation of applications
- running your software in parallel for faster processing
Depending on the size and complexity of your question, you will receive a customised proposal. We offer many options in the field of Big Data Services. These include education and training, as well as advice on architecture and the use of technology. For more information, please contact our consultancy service.