Data Persistent Identifier: data always findable by permanent references
Persistent identifiers (PIDs) keep your data findable, now and in the future. PIDs are comparable to the ISBNs assigned to books: even if the storage location or underlying infrastructure changes, the reference remains intact. SURFsara offers the PID service in cooperation with the European Persistent Identifier Consortium (EPIC).
The data explosion
The volume of stored data is growing rapidly across all fields of science, as is the number of links between data sets. Publications, which are themselves data objects, are supported by analysed data that is in turn based on raw data. The data behind a single publication may be housed in several data centers and recorded on different types of media, and storage locations change over time. This makes it increasingly difficult to guarantee that the data can be found and accessed. At the same time, such access is becoming ever more vital because of reproducibility requirements in research and the reuse of scientific information.
Persistent identifiers: ISBN numbers for data
To resolve this issue, an identification system for data has been developed: persistent identifiers (PIDs). Just as an ISBN provides a permanent, citable reference to a particular book, a PID does the same for data. PIDs let us both find data and cite it. One of the most important functions of a PID is to serve as a fixed reference to the underlying data, wherever that data is located. Any researcher consulting a PID must be able to trust that he or she will find the underlying data, even if its storage location or physical form has changed.
SURFsara offers researchers the opportunity to register their collected data and to make it accessible through the use of PIDs. This is done as follows:
- SURFsara uses the Handle System software from the Corporation for National Research Initiatives (CNRI) as its foundation. Like DNS, the Handle System is hierarchical: a top-level registry determines which handle server holds each PID.
- PIDs consist of a prefix and a suffix. The prefix is the first part of the persistent identifier and can be requested from SURFsara via email@example.com. The prefix belongs to the applicant, and an applicant or institution may only register PIDs that begin with its own prefix. Any number of unique suffixes can be created under a single prefix.
- SURFsara hosts the PIDs. Each PID is then replicated internally at SURFsara and externally within the EPIC consortium.
- The holder of a prefix can create, modify, search for and delete PIDs under that prefix through an HTTPS RESTful API.
- The PID resolver is an application that, given a PID, returns the location of the data or the data object itself. The resolver has an HTTP interface, so PIDs can be resolved with a web browser or a plain URL at http://hdl.handle.net. The resolver always works with one of the three identical copies of each PID.
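The prefix/suffix structure and the API operations described above can be illustrated with a minimal Python sketch. It assumes a hypothetical REST endpoint (`https://pid.example.org/api/handles`), a made-up prefix (`21.T99999`) and a hypothetical JSON payload; the real addresses, payload format and authentication come from SURFsara's or EPIC's API documentation. The code only builds requests and sends nothing.

```python
import json
import urllib.request
import uuid
from typing import Optional, Tuple

# HYPOTHETICAL endpoint: replace with the API address your PID provider gives you.
API_BASE = "https://pid.example.org/api/handles"


def split_handle(handle: str) -> Tuple[str, str]:
    """Split a PID such as '21.T99999/abc' into (prefix, suffix)."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix PID: {handle!r}")
    return prefix, suffix


def mint_handle(prefix: str) -> str:
    """Return a new PID under your own prefix, using a random UUID as a unique suffix."""
    return f"{prefix}/{uuid.uuid4()}"


def build_request(method: str, handle: str,
                  location: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a create/modify (PUT) or delete (DELETE) request.

    Authentication is omitted here; a real service will require credentials.
    """
    prefix, suffix = split_handle(handle)
    url = f"{API_BASE}/{prefix}/{suffix}"
    body = None
    if location is not None:
        # HYPOTHETICAL payload: a single URL entry pointing at the data object.
        body = json.dumps([{"type": "URL", "parsed_data": location}]).encode("utf-8")
    request = urllib.request.Request(url, data=body, method=method)
    request.add_header("Content-Type", "application/json")
    return request
```

For example, `build_request("PUT", mint_handle("21.T99999"), "https://data.example.org/run1.nc")` prepares the registration of a new PID under the example prefix; passing the result to `urllib.request.urlopen` would actually send it. Using a random UUID as the suffix is one simple way to keep suffixes unique under a single prefix, not a requirement of the service.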
The PID service is especially relevant to research projects with large collections of data used by multiple parties. A real-life example is the collaboration with KNMI/ORFEUS on seismic data.
Do it yourself or let SURFsara help
A system administrator at your institution can create and modify the PIDs for a data project; this requires some programming skill. Alternatively, SURFsara can create PIDs for you, but only if the data is also stored on SURFsara systems, because SURFsara has no control over data stored on its clients' own systems.
The client remains responsible for the integrity of the PIDs and the corresponding data objects. This matters most when data is relocated: the data manager must then update the PID so that it points to the new location.
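A relocation can be checked by comparing what the public resolver reports with the new location. The Handle.net proxy offers a JSON view of a PID at `https://hdl.handle.net/api/handles/<prefix>/<suffix>`, where `responseCode` 1 means the PID was found and each entry in `values` has a `type` and a `data.value`. The minimal sketch below only parses such responses; the sample record in the usage note and the old and new locations are made up for illustration, and `with_new_location` is a hypothetical helper, not part of any SURFsara API.

```python
import json
from typing import Optional

RESOLVER_API = "https://hdl.handle.net/api/handles/"


def resolver_url(handle: str) -> str:
    """URL of the Handle.net proxy's JSON view of one PID."""
    return RESOLVER_API + handle


def current_location(response_text: str) -> Optional[str]:
    """Return the URL value from a Handle.net proxy JSON response, or None."""
    record = json.loads(response_text)
    if record.get("responseCode") != 1:  # 1 means the handle was found
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


def with_new_location(values: list, new_location: str) -> list:
    """HYPOTHETICAL helper: copy of a PID's values with its URL entry updated."""
    updated = []
    for value in values:
        value = dict(value)  # copy, so the original record is left untouched
        if value.get("type") == "URL":
            value["data"] = {"format": "string", "value": new_location}
        updated.append(value)
    return updated


def relocation_done(response_text: str, new_location: str) -> bool:
    """True once the resolver reports the data's new storage location."""
    return current_location(response_text) == new_location
```

In practice the data manager would first update the PID through the provider's API, then fetch `resolver_url(pid)` and call `relocation_done` on the response body to confirm that the change is visible to everyone resolving the PID.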
Support & consultancy
PID users can always count on us for support. We can help you create PIDs, for example, and offer advice on maximizing the findability of your data.
Our helpdesk can be reached by telephone and email, and can also assist you in person. If you have any questions or want to report a problem, please email firstname.lastname@example.org or call +31-208001400. The helpdesk is available during office hours (9:00–17:00).
For advice on more specific topics, such as designing your data infrastructure, please contact our consultants.
If you use this service, you may also be interested in the following services:
Consultancy: independent advice
Our consultants support you from the first analysis of the problem to the final implementation. They provide independent advice on, among other things:
- accessing the Grid
- submitting jobs
- how to get even better performance from Cartesius or the Lisa Computing Cluster
- methods for approaching your data
- design and optimisation of your own software
- the exact design of your data storage system
- how to organise your data infrastructure
- how to make optimal use of our calculation and storage facilities
- integrating your virtual infrastructure into your work processes
- optimisation of applications
- running your software in parallel for faster processing
Depending on the size and complexity of your question, you will receive a customised proposal. Our Big Data Services offer many options, including education and training as well as advice on architecture and the use of technology. For more information, please contact our consultancy service.
Long-term storage of research data
The Grid, the HPC Cloud and Data Ingest are all connected to SURFsara's central archive, which offers extensive options for storing your research data. You can also use the Persistent Identifier (PID) service for data stored on SURFsara storage services such as Data Archive. Do you want to store your data securely over long periods? Then use our Data Archive service.