"We are interested in IT solutions, but not in managing them"
Insomnia is one of the problems that the Netherlands Institute for Neuroscience (NIN) studies. Test subjects wear a kind of bathing cap fitted with a large number of sensors that register brain activity over the course of a night. "The enormous data streams that this generates pose a great challenge for us," says Tom Bresser, PhD student in the Sleep & Cognition group at the NIN. "In the past, a colleague built a kind of computer cluster for data storage and processing. But when he went to work elsewhere, he took all his knowledge and experience with him. Moreover, our cluster is very outdated and is starting to get a bit shaky. So we started looking for alternatives, preferably cheaper ones."
"iRODS facilitates collaboration"
When the research group saw that SURF was offering data storage in combination with iRODS, the choice was quickly made. "iRODS facilitates collaboration. With this tool, we can give colleagues from other institutions remote access to anonymised data, rather than having to physically send files, or having people come to us to work on that data."
Equally important, SURF's iRODS servers are well integrated with computing facilities such as the Lisa cluster. Bresser: "So, together with the entire NIN, we switched to Lisa for our data processing and analysis." The research group is now carrying out the migration step by step. Bresser: "The ladies and gentlemen at SURF are happy to help. We tend to devise complicated constructions ourselves, but there is often a much easier solution."
The biotechnologists at Wageningen University & Research were also looking for a data management system that offered more possibilities than their existing internal facilities. The reason was the UNLOCK project, in which they work together with researchers from Delft to map out which bacteria could be used for applications such as food production and water purification. Up to now, mankind has used only one percent of all micro-organisms.
Researcher Jasper Koehorst: "We were looking for a way to store centrally, and process as automatically as possible, all the genetic and other information that will be produced by the equipment in the participating laboratories. SURF introduced us to iRODS. It enables us to set up very structured working environments."
For example, the researchers have linked their own Kubernetes cluster (a system that manages applications) to iRODS. Koehorst: "We can say to Kubernetes: perform this analysis, get the data in from iRODS, and store the results there as well." The workflows being developed for UNLOCK will soon be made publicly available by the participants. "The SURF storage with iRODS works very well," says Koehorst. "We really like the fact that our data is immediately available in the right place, linked to the right metadata. And that we will still be able to find everything there in a few years' time." Scalability is also essential: "The size of the data may soon reach hundreds of terabytes. iRODS can handle that."
The researchers from Wageningen have a great deal of ICT expertise, which is now being put to good use. Koehorst: "iRODS is a very bare system: everything goes through text commands. If you want to use it efficiently, you need to be technically savvy. Fortunately, we are, so it is not a problem for us. There is software available to make it easier to use, such as Yoda, but we have no experience with that. It also has to fit in with our way of working."
iRODS is open source. Have Koehorst and his colleagues not considered managing it themselves? "No, because you need two or three people to manage iRODS. Security, back-up, archiving... As biologists, we don't have time for that. So we're really pleased that we can purchase the hosting from SURF at an acceptable price. Because we're interested in ICT solutions, but not in managing them."
What does iRODS hosting by SURF entail?
The open-source tool iRODS enables researchers to collect, store, describe and share their data in a structured way. It has quickly become a market standard: most Dutch universities are already using or seriously considering iRODS. But to implement it properly you need experts, and they are scarce.
SURF is now making iRODS accessible to everyone by offering, if desired, to host it on the SURF infrastructure, including tasks such as security, back-up, archiving, and updates. For SURF, it is a new addition to a growing portfolio of à la carte services for research data management, says Hylke Koers, group leader for data services at SURF. "The first was Storage scale out: it allows institutions that need extra storage capacity to link their iRODS tool to SURF's Data Archive."
"The next addition we are developing is hosting a customer-specific Yoda environment. That's an application on top of iRODS, developed by Utrecht University, that offers a graphical interface. It also offers all kinds of extra functions, for example to add metadata, or to prepare your dataset for archiving or publication according to the FAIR principles." After a successful pilot, Yoda hosting will now be available for two years as a pre-production service for a number of universities and medical centres.
A community edition of iRODS hosting is also being considered. That would be aimed at individual researchers or groups who have received an NWO grant, for example, to use the national supercomputer or another SURF service. Koers: "They can then access their data from different working environments, with iRODS acting as a data management layer over all our computing and data processing facilities. That can save them a lot of work."
Text: Aad van de Wijngaart
Photo: Marieke de Lorijn
'Data management à la carte' was first published in SURF Magazine, September 2020.