HAdW Web Services – Beaconizer & Discoverer (Hagrid)

Term: March 1, 2025 – Dec. 31, 2025

The research units based at the Heidelberg Academy are producing digital data on an ever-increasing scale. As part of this digital work, entities such as personal names and place names are increasingly being assigned persistent identifiers for the purposes of disambiguation and index creation, typically GND identifiers for personal names and Geonames identifiers for toponyms.

From the user’s perspective, this presents two challenges: First, when searching for documents and information—for example, about a person or a place—there are many entry points, since each research institution has its own website. Second, these project websites typically allow only simple text-based searches for people and places; relevant indexes are rudimentary, and there is currently no linking of entities already uniquely identified by standard data across research center boundaries or with other databases worldwide, nor is there integration into larger research networks.

The Hagrid project aims to enable the active reuse of datasets accumulated over many years by developing tailored interfaces to extract data from the respective data silos and make it available to the scientific community through links and Linked Open Data modeling. In this respect, Hagrid serves—much like the character of the same name from the Harry Potter book series, who is described there as a kind of friendly, helpful “Keeper of Keys and Grounds of Hogwarts”—as a central gateway to and aid in researching standard data at the HAdW research centers.

Hagrid extracts standard data from the databases of individual research centers and aggregates it into a central standard data database. During the pilot phase, standard data from two thematically closely related research centers at the HAdW is being integrated: (1) “Melanchthon Correspondence” and (2) “Correspondence of Theologians in the Southwest of the Empire in the Early Modern Period (1550–1620)”, both of which are compiling editions of letters from the Reformation era. This data is exported as CSV; for this purpose, a generic exchange format (Normdata Interchange Format, NDIF) is being developed that defines column information and data types.
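The NDIF column definitions are not reproduced in this text, so the following is only a minimal sketch of how a typed CSV exchange format of this kind could be checked on import. The column names (`gnd_id`, `entity_type`, `label`, `source_url`), the allowed entity types, and the loose identifier pattern are all assumptions made for illustration, not the actual NDIF specification.

```python
import csv
import io
import re
from dataclasses import dataclass

# Hypothetical column set; the real NDIF columns may differ.
EXPECTED_COLUMNS = ["gnd_id", "entity_type", "label", "source_url"]
# Hypothetical entity types, chosen to mirror "persons and places".
ENTITY_TYPES = {"person", "place"}
# Loose plausibility check only, not the full GND identifier syntax.
GND_PATTERN = re.compile(r"^[0-9][0-9-]*[0-9X]$")


@dataclass(frozen=True)
class NdifRecord:
    gnd_id: str
    entity_type: str
    label: str
    source_url: str


def parse_ndif(text: str) -> list[NdifRecord]:
    """Parse CSV text and validate the header and each row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        if not GND_PATTERN.match(row["gnd_id"] or ""):
            raise ValueError(f"line {line_no}: implausible GND identifier")
        if row["entity_type"] not in ENTITY_TYPES:
            raise ValueError(f"line {line_no}: unknown entity type")
        records.append(
            NdifRecord(
                gnd_id=row["gnd_id"],
                entity_type=row["entity_type"],
                label=row["label"] or "",
                source_url=row["source_url"] or "",
            )
        )
    return records


# Placeholder row; the label and URL are invented for the example.
SAMPLE = (
    "gnd_id,entity_type,label,source_url\n"
    "100089003,person,Example Person,https://example.org/letters/1\n"
)
records = parse_ndif(SAMPLE)
print(records[0])
```

Rejecting a file on the first malformed row keeps the central aggregator free of partially imported exports; a production importer might instead collect all errors and report them back to the exporting research center.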

Definition of the Normdata Interchange Format (NDIF)

This data is periodically updated by research institutions, retrieved by Hagrid, and transferred to the central Hagrid reference data aggregator. Various interfaces enable (partially) automated information exchange between computer systems, while the Hagrid website allows for user-friendly interaction with the Hagrid dataset via a web browser.

Hagrid Infrastructure

Hagrid follows an API-first approach: the APIs are developed first, and the Hagrid website is then built on top of them. The Hagrid APIs conform to the OpenAPI standard, and documentation for the available APIs is generated automatically by the Python framework FastAPI.

As an example, consider the Discovery interface: given a GND identifier, it identifies which individuals are connected to that person within the letter network, whether as the writer or recipient of a letter or as a person mentioned in the letter’s text. The interface returns the data in either JSON or GraphML format; the latter can be imported directly into the network visualization program Gephi, for example, and analyzed further there. This enables federated person searches across research institution boundaries.
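How such a discovery result could be derived and serialized can be sketched as follows. The internal data model of Hagrid is not described here, so the letter records, the `discover` and `to_graphml` helpers, and the placeholder identifiers `person-A` to `person-C` are assumptions for illustration; only the GraphML element names follow the GraphML format itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical letter records: each letter lists identifiers by role.
LETTERS = [
    {"sender": "100089003", "recipient": "person-A", "mentioned": ["person-B"]},
    {"sender": "person-A", "recipient": "100089003", "mentioned": []},
    {"sender": "person-B", "recipient": "person-C", "mentioned": []},
    {"sender": "person-C", "recipient": "person-A", "mentioned": ["100089003"]},
]


def discover(focus: str, letters: list[dict]) -> Counter:
    """Count, per other person, the letters in which they co-occur with `focus`."""
    weights: Counter = Counter()
    for letter in letters:
        people = {letter["sender"], letter["recipient"], *letter["mentioned"]}
        if focus in people:
            for other in people - {focus}:
                weights[other] += 1
    return weights


def to_graphml(focus: str, weights: Counter) -> str:
    """Serialize the star-shaped, weighted network around `focus` as GraphML."""
    ET.register_namespace("", GRAPHML_NS)
    tag = lambda name: f"{{{GRAPHML_NS}}}{name}"
    root = ET.Element(tag("graphml"))
    # Declare the edge attribute that carries the weight.
    ET.SubElement(root, tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "int",
    })
    graph = ET.SubElement(root, tag("graph"), {"edgedefault": "undirected"})
    for node_id in [focus, *sorted(weights)]:
        ET.SubElement(graph, tag("node"), {"id": node_id})
    for other, weight in sorted(weights.items()):
        edge = ET.SubElement(graph, tag("edge"), {"source": focus, "target": other})
        data = ET.SubElement(edge, tag("data"), {"key": "weight"})
        data.text = str(weight)
    return ET.tostring(root, encoding="unicode")


weights = discover("100089003", LETTERS)
graphml = to_graphml("100089003", weights)
print(graphml)
```

In this sketch the edge weight is simply the number of letters two people share in any role; a real service might weight senders, recipients, and mentioned persons differently.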

GraphML response from the Discovery web service (/api/discovery/100089003)

Visualization of the weighted graph of Georg von Erbach's letter network in Gephi

The additional interfaces help researchers match existing identifiers, such as Geonames, Wikidata, and Pleiades, to the corresponding GND identifiers. Publishing Beacon files makes the use of authority data in the research institutions’ datasets publicly visible, thereby increasing the visibility of research findings. Furthermore, the Beaconizer makes legacy data in open data repositories accessible in the form of dynamically generated Beacon files.
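To illustrate what a dynamically generated Beacon file might contain, here is a minimal sketch that writes the plain-text BEACON layout: meta lines beginning with `#`, followed by one identifier per line. The `build_beacon` helper, the target URL template, and the description text are hypothetical; how the Beaconizer actually assembles its files is not described here.

```python
from typing import Iterable

# Identifiers in the file are read as relative to this prefix.
GND_PREFIX = "https://d-nb.info/gnd/"


def build_beacon(gnd_ids: Iterable[str], target_template: str, description: str) -> str:
    """Return BEACON text: meta lines first, then one GND identifier per line."""
    meta = [
        "#FORMAT: BEACON",
        f"#PREFIX: {GND_PREFIX}",
        # {ID} is a placeholder that consumers replace with each identifier.
        f"#TARGET: {target_template}",
        f"#DESCRIPTION: {description}",
    ]
    links = sorted(set(gnd_ids))  # de-duplicate and sort for stable output
    return "\n".join(meta + links) + "\n"


# Hypothetical target URL and description, for illustration only.
beacon = build_beacon(
    ["100089003", "100089003"],
    "https://example.org/hagrid/person/{ID}",
    "Persons in an example letter edition",
)
print(beacon)
```

Because a Beacon file is just a list of identifiers plus a URL template, other portals can link to a project's entries without querying its database directly.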

Overall, Hagrid serves as a central hub for conducting targeted searches for and working with standardized data in research projects at the Heidelberg Academy of Sciences and Humanities.


Publication

Grieshaber, F. (2023). GND-based Standard Data Web Service Suite. Text+ Plenary 2023: Connecting People and Data, SUB Göttingen. Zenodo. https://doi.org/10.5281/zenodo.10033934.
(Poster presentation as part of the 2nd Text+ Plenary: Connecting People and Data on September 28–29 at the SUB Göttingen.)