High-performance data processing

Do you want to process and store large volumes of data? Our team of experts can support you in using our high-throughput data processing systems and storage solutions.

For processing large, structured datasets

Our high-throughput data processing services are suited for projects that require processing of large, structured datasets, such as instrument data from sensors, DNA sequencers, telescopes, and satellites.

Use data processing services for:

  • Collaborative work on a shared set of data and software
  • Parallel processing of large amounts of data, from many terabytes to petabytes
  • Processing of large independent simulations and workflows
  • Optimized data transport over a scalable high-bandwidth network
  • Easy access to scalable data storage solutions
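
As an illustration of how a large set of independent inputs can be divided for parallel processing, the sketch below splits a list of work items into roughly equal batches, one per job. The function name `split_into_batches` and the example file names are hypothetical and not part of any SURF tooling; how each batch is then submitted depends on the platform you use.

```python
def split_into_batches(items, n_batches):
    """Divide items into n_batches contiguous batches of near-equal size.

    Hypothetical helper for illustration only: each batch could be
    handed to one independent job.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    base, extra = divmod(len(items), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


# Example: ten made-up input files spread over three jobs.
files = [f"run_{i:02d}.dat" for i in range(10)]
for job_id, batch in enumerate(split_into_batches(files, 3)):
    print(job_id, batch)
```

Because the batches do not depend on each other, they can run in any order and on any number of nodes.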

Request access 

Nobel Prize winners have used the Grid to measure gravitational waves.

Big Science examples

A number of large national and international research projects have already set up their own production and collaboration workflows on our data processing infrastructure. Some examples:

Custom support

High-performance data processing projects need support focused on transferring, storing, accessing and processing large amounts of data. We have extensive experience in all aspects of large-scale data processing. We offer specialized support in building custom production solutions for your research community.

Data Processing Platforms

We offer two powerful data processing platforms, each tailored to different community needs:

Grid for scalability and federation

Grid provides large-scale computing capacity through a series of international clusters interconnected via a fast network. These clusters are located in datacentres distributed around the world.

Advantages of Grid:

  • Access to the European Grid for federated data processing
  • Huge processing capacity aimed at steady production workflows
  • Opportunity to collaborate with large distributed communities

Spider for agility and flexibility

Spider is a dynamic, flexible, and customizable platform hosted locally at SURF. Optimized for collaboration, it is supported by an ecosystem of tools that lets you start up data-intensive projects quickly and easily.

Advantages of the Spider platform:

  • Interactive processing through a user-friendly, interoperable, industry-standard interface
  • Private nodes and clusters for tailored productivity and availability
  • Collaboration in a private and secure project-based workspace

Technical specifications

These specifications describe the data processing platforms at SURF as of January 2020 and give an indication of the quality and capacity of our facilities.

Platform                                Grid                                    Spider
Cores                                   10,000+                                 10,000+
Operating system                        Linux CentOS 7.x 64-bit                 Linux CentOS 7.x 64-bit
Workload manager                        gLite & DIRAC                           Slurm
Network uplink to external sources      1200 Gbit/s                             1200 Gbit/s
Existing data                           10,000 TB disk / 50,000 TB tape         10,000 TB disk / 50,000 TB tape
Non-shared node storage (scratch) max   80 GB per core / 11 TB per node         80 GB per core / 11 TB per node
Shared local storage                    n/a                                     250 TB+
Remote storage                          dCache                                  dCache / SWIFT / Archive / Cloud storage
Memory (RAM) max                        8 GB per core / 1 TB per node           8 GB per core / 1 TB per node
Federated sites                         worldwide                               n/a
Private resources                       n/a                                     project-tailored nodes & clusters
Type of jobs                            Single-core / multi-core / whole-node   Single-core / multi-core / whole-node
Container support                       Singularity                             Singularity
Security and privacy                    Standard                                Customizable
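
To give a feel for what the per-core figures mean when sizing a job, here is a minimal sketch. It assumes that memory and scratch space are allotted in proportion to the number of cores requested; that is an assumption made for illustration, not a documented allocation policy, and the function names are hypothetical.

```python
import math

# Per-core maxima taken from the specifications (January 2020).
RAM_GB_PER_CORE = 8
SCRATCH_GB_PER_CORE = 80


def cores_for_memory(mem_gb, gb_per_core=RAM_GB_PER_CORE):
    """Smallest core count whose combined memory share covers mem_gb.

    Assumes memory scales linearly with requested cores (illustrative).
    """
    if mem_gb <= 0:
        raise ValueError("mem_gb must be positive")
    return math.ceil(mem_gb / gb_per_core)


def scratch_for_cores(cores, gb_per_core=SCRATCH_GB_PER_CORE):
    """Scratch space in GB that comes with the given number of cores."""
    return cores * gb_per_core


cores = cores_for_memory(20)        # a job needing 20 GB of RAM
print(cores)                        # 3
print(scratch_for_cores(cores))     # 240
```

Under this assumption, a job that needs 20 GB of RAM would request three cores and would have up to 240 GB of scratch space available.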

More technical information can be found in the user information for Grid and the user information for Spider.

Complete support with our additional services

If you use this service, you may also be interested in the following services:

Long-term storage of research data

Our services are connected to the central archive of SURFsara, which offers extensive options for storing your research data. You can also use the Persistent Identifier (PID) service for data stored on SURFsara storage services. Do you want to store your data securely over long periods? Then make use of our Data Archive service.

Visualisation: immediate clarity of results

Do you work with calculations that produce large amounts of data? Then you should use our visualisation techniques and support. Visualisation helps you to better interpret the results of your calculations.

Send data quickly with SURFlichtpaden

Do you want a fast and reliable connection from your own network to our systems? A lightpath is a direct connection that is shielded from the Internet. It is extra secure and reliable, which makes it suitable for privacy-sensitive information, for example. The biggest challenge with lightpaths is connecting them to the systems on both sides. We help you bridge the final metres between the end point of a lightpath and your data sources.

Consultancy: independent advice

Our consultants support you from the first analysis of the problem to the final implementation. They provide independent advice on, among other things:

  • accessing our systems
  • submitting jobs
  • methods for approaching your data
  • design and optimisation of your software
  • the exact design of your data storage system
  • how to organise your data infrastructure
  • how to make optimal use of our calculation and storage facilities
  • integrating your virtual infrastructure into your work processes
  • optimisation of applications
  • running your software in parallel for faster processing

Depending on the size and complexity of your question, you will receive a customised proposal. We offer many options in the field of Big Data Services, including education and training as well as advice on architecture and the use of technology. For more information, please contact our consultancy service.

Want to find out more about the possibilities?

Please contact us:

Mail info@surfsara.nl

This is an optional SURF service.