Data Persistent Identifier

Persistent identifiers (PIDs) ensure the findability of your data, now and always. PIDs are comparable to the ISBN numbers assigned to books. Even if the location or underlying infrastructure changes, the reference path remains intact. SURFsara offers the PID service in cooperation with the European Persistent Identifier Consortium (EPIC).

The data explosion

The volume of stored data is growing rapidly across all fields of science, as is the number of data connections. Publications, which are themselves data objects, are supported by analysed data which is in turn based on raw data. Data corresponding to a specific publication may be housed in various data centres and recorded on various types of media. Storage locations are subject to change as well. This makes it increasingly difficult to guarantee the findability of, and access to, the data. At the same time, access is becoming ever more vital due to the reproducibility requirements of research and the reuse of scientific information.

Persistent identifiers: ISBN numbers for data

To resolve this issue, a coding system for data has been developed: persistent identifiers (PIDs). PIDs are comparable to the ISBNs assigned to books. Just as an ISBN provides a permanent, citable reference to a certain book, a PID does the same for data. PIDs allow us to find data and refer back to it as well. One of the most important functions of a PID is its role as a fixed reference to the underlying data, no matter where that data is located. Any researcher consulting a PID must be able to trust that they will find the underlying data. This applies even if the storage location or physical form has been altered.

PIDs for researchers

SURFsara offers researchers the opportunity to register their collected data and to make it accessible through the use of PIDs. This is done as follows:

  • SURFsara uses the handle software provided by the Corporation for National Research Initiatives (CNRI) as a structural foundation. This handle software uses a hierarchical model resembling DNS: a top-level reference determines where each PID is located.
  • PIDs consist of a prefix and a suffix. The prefix, the first part of the code, can be requested from CNRI (http://handle.net), which manages these top-level references. The prefix belongs to the applicant. An applicant or institution may only create PIDs that begin with its own prefix. As many unique suffixes as desired may be registered under a single prefix.
  • SURFsara can act as a host for the PIDs. The PIDs are then replicated internally at SURFsara, as well as externally, so that there are always three identical copies of each PID referring to the same URL.
  • The prefix can be used to create, modify, search for and delete PIDs. This is done through an HTTPS RESTful interface called EPIC-API. The EPIC-API was developed by a European consortium to which SURFsara belongs: the European Persistent Identifier Consortium (EPIC).
  • The so-called PID resolver is an application that allows the user to determine the location of data, or to request the data object itself, based on a PID. The PID resolver is accessible via an HTTP interface. This makes it possible to use a browser or URL to resolve PIDs at http://hdl.handle.net. The PID resolver always works with one of the three identical PIDs.
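
The prefix/suffix structure and the resolver address described above can be illustrated with a minimal Python sketch. It only splits a PID and builds the URL a browser would open at http://hdl.handle.net; it does not contact any service. The function names and the sample PID 12345/example-dataset-001 are hypothetical and chosen purely for illustration.

```python
from urllib.parse import quote

# Resolver address as named in the text above.
RESOLVER_BASE = "http://hdl.handle.net/"


def split_pid(pid: str) -> tuple[str, str]:
    """Split a PID into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix PID: {pid!r}")
    return prefix, suffix


def resolver_url(pid: str) -> str:
    """Build the URL at which the PID can be resolved in a browser."""
    prefix, suffix = split_pid(pid)
    # Percent-encode characters that are unsafe in a URL path;
    # slashes inside the suffix are kept as they are.
    return RESOLVER_BASE + quote(prefix) + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    # Hypothetical PID, for illustration only.
    print(resolver_url("12345/example-dataset-001"))
```

Splitting at the first slash reflects the rule that everything after the prefix belongs to the suffix, so a suffix may itself contain slashes.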

Do it yourself or let SURFsara help

A system administrator at your institution can create and modify the PIDs accompanying a data project. Doing so will require a certain amount of programming ability. You can also ask SURFsara to create PIDs for you, although we will only be able to assist in this matter if the data is also stored in SURFsara systems. This is because SURFsara has no control over data stored in its clients' systems.

Client's responsibility

The client retains responsibility for the integrity of the PIDs and the corresponding data objects. This responsibility is especially salient when data is being relocated. In such cases the data manager must ensure the PID reference is altered to reflect the new location.
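
Why the reference must be updated after a relocation can be shown with a toy model. The sketch below is an in-memory stand-in written for illustration only; it is not the EPIC-API, and the class name, method names, prefix and URLs are all hypothetical. It shows that the PID stays the same while the location it points to is changed by the data manager.

```python
class PidRegistry:
    """Toy in-memory stand-in for a PID service (not the EPIC-API)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._locations: dict[str, str] = {}

    def create(self, suffix: str, url: str) -> str:
        """Register a new PID under this registry's prefix."""
        pid = f"{self.prefix}/{suffix}"
        if pid in self._locations:
            raise ValueError(f"PID already exists: {pid}")
        self._locations[pid] = url
        return pid

    def update(self, pid: str, new_url: str) -> None:
        """Point an existing PID at the new location of the data."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        """Return the location currently registered for the PID."""
        return self._locations[pid]


if __name__ == "__main__":
    registry = PidRegistry("12345")  # hypothetical prefix
    pid = registry.create("survey-2015", "https://old.example.org/survey.zip")
    # The data is moved; the data manager updates the reference.
    registry.update(pid, "https://new.example.org/archive/survey.zip")
    print(pid, "->", registry.resolve(pid))
```

If the update step is skipped, the PID keeps resolving to the old location, which is exactly the integrity problem the client is responsible for preventing.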

Support & consultancy

PID users can always count on us for support. We can help you create PIDs, for example, and offer advice on maximising the findability of your data.

Helpdesk

Our helpdesk is available by telephone and email, and can also assist you in person. If you have any questions or want to report a problem, please send an email to helpdesk@surfsara.nl or phone +31-208001400. The helpdesk is available during office hours (9:00–17:00).

For advice on more specific topics, such as designing your data infrastructure, please contact our consultants.

Case example

The PID service is especially relevant to research projects involving vast amounts of collected data that is used by multiple parties. A real-life example is the collaboration with KNMI/ORFEUS regarding seismic data.

Contact

Need additional information? Please contact us at info@surfsara.nl.

Latest modifications 02 Jul 2015