SURF’s Data Archive now stores 100 PB of research data

The SURF Data Archive has reached a new milestone: researchers now store 100 petabytes (PB) of data on our system. The amount of data has doubled over the past four years. The Data Archive is our centralised location for long-term storage of research data.
How much data is 100 petabytes? 

  • If we stored the data on CDs, the pile would be 184 km high.
  • If the files were music, we could listen to it for 11,384,615,322 minutes, or about 21,660 years.
  • And we could watch movies for 47,058,823 hours or 5,372 years.  
  • If we stored the data on DVDs, the pile would be 14 km high.
  • If we stored the data on Amazon S3, it would cost us $2,250,000 per month. 
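
These comparisons can be reproduced approximately with a few lines of Python. The constants below (650 MB and 74 minutes per CD, 8.5 GB and 4 hours per dual-layer DVD, a disc thickness of 1.2 mm, and an S3 price of $0.0225 per GB per month) are assumptions chosen because they come close to the figures in the list; they are not stated by SURF.

```python
# Rough reproduction of the comparisons above.
# All per-disc and pricing constants are assumptions, not SURF figures.
TOTAL_BYTES = 100 * 10**15          # 100 PB in decimal units

CD_BYTES = 650 * 10**6              # assumed CD capacity
CD_MINUTES = 74                     # assumed playing time per CD
DVD_BYTES = 8.5 * 10**9             # assumed dual-layer DVD capacity
DVD_HOURS = 4                       # assumed film time per DVD
DISC_THICKNESS_MM = 1.2             # assumed thickness of one disc
S3_USD_PER_GB_MONTH = 0.0225        # assumed storage price


def comparisons(total_bytes=TOTAL_BYTES):
    """Return the everyday equivalents of a given data volume."""
    cds = total_bytes / CD_BYTES
    dvds = total_bytes / DVD_BYTES
    return {
        "cd_pile_km": cds * DISC_THICKNESS_MM / 1e6,   # mm -> km
        "music_minutes": cds * CD_MINUTES,
        "music_years": cds * CD_MINUTES / (60 * 24 * 365),
        "dvd_pile_km": dvds * DISC_THICKNESS_MM / 1e6,
        "movie_hours": dvds * DVD_HOURS,
        "movie_years": dvds * DVD_HOURS / (24 * 365),
        "s3_usd_per_month": total_bytes / 10**9 * S3_USD_PER_GB_MONTH,
    }


if __name__ == "__main__":
    for name, value in comparisons().items():
        print(f"{name}: {value:,.1f}")
```

Changing any of the assumed constants, for example using 700 MB CDs, shifts the results noticeably, so the figures are best read as illustrations rather than exact values.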

We can store even more with our current libraries and the latest LTO tape drive technology. At the moment, we have a total of 20,000 tape slots in our two tape libraries, which at 12 TB per tape gives a capacity of 12 TB × 20,000 = 240 PB.
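
The capacity estimate is a single multiplication, sketched here so that other cartridge sizes can be tried. The slot count and the 12 TB per tape come from the text; the function names are illustrative only.

```python
TAPE_SLOTS = 20_000       # total slots across both libraries (from the text)
TB_PER_CARTRIDGE = 12     # capacity per tape used in the text


def max_capacity_pb(slots=TAPE_SLOTS, tb_per_cartridge=TB_PER_CARTRIDGE):
    """Total capacity in PB if every slot holds one full cartridge."""
    return slots * tb_per_cartridge / 1000   # 1 PB = 1000 TB


def headroom_pb(stored_pb=100):
    """Capacity still free, given the amount already stored."""
    return max_capacity_pb() - stored_pb


if __name__ == "__main__":
    print(f"maximum capacity: {max_capacity_pb():.0f} PB")
    print(f"headroom: {headroom_pb():.0f} PB")
```

Passing a larger `tb_per_cartridge` value shows how a newer tape generation would raise the total without adding slots.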

What data do we hold?  

SURF is, of course, not storing music or movies in the archive, but scientific data. Our top users come from the fields of astrophysics and particle physics.

Real data vs managed data 

If we asked all our users how much data they store, the total would not be 100 PB but "only" 77.9 PB. This is because most of our users have two copies of their data stored without knowing it. To ensure that the data survives even a disaster, SURF writes it to two tape volumes in two different libraries at separate locations in Amsterdam. Some users, such as the High-Energy Particle Physics communities, store only a single copy at SURF because they keep a second (or sometimes third) copy at other data centres around the world.
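
The gap between the two figures allows a rough estimate of how much user data is duplicated. The sketch below assumes that every dataset has either exactly one or exactly two copies at SURF; this is a simplification, and the text does not give the actual breakdown.

```python
def copy_breakdown(user_pb, managed_pb):
    """Split user data into single-copy and double-copy volumes (in PB).

    Assumes: user_pb    = single + double
             managed_pb = single + 2 * double
    """
    double = managed_pb - user_pb
    single = user_pb - double
    return single, double


if __name__ == "__main__":
    single, double = copy_breakdown(77.9, 100.0)
    print(f"single copy: {single:.1f} PB, double copy: {double:.1f} PB")
```

Under this assumption, roughly 22.1 PB of user data is held twice and about 55.8 PB is held once, which fits the picture of a few very large single-copy communities alongside many smaller users with duplicated data.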