WissKI - Virtual Research Environment and Linked Open Data Management System

Session Room

Room 218 (BoFs & Workshops)

Time Slot

Fri 14:15-16:15

Speaker(s)

rnsrk

Session track

Clients & Industry Experiences

Experience level

Beginner

Duration

1.5h (workshop)

What is WissKI?

WissKI (Scientific Communication Infrastructure / Wissenschaftliche Kommunikationsinfrastruktur) is a set of modules that combines Drupal's content management features with linked open data and semantic web technologies to form a scientific research environment. The main components of WissKI are the WissKI Core, which provides the WissKI entity and bundle definitions and their basic management operations; the WissKI Pathbuilder, which maps RDF patterns to bundles and fields; the WissKI Abstraction Layer, which orchestrates and optimises queries to (multiple) repository adapters; and the WissKI SPARQL 1.1 Adapter, which acts as the default connector to a triplestore used as graph database storage and translates queries to SPARQL for communication.

Additional modules add specific field or media types (e.g. for IIIF support), bulk operations, connectors to authority files (e.g. GeoNames) or features that make working with research data easier in general.

WissKI started as a joint venture of the Digital Humanities Research Group of the Department of Computer Science at Friedrich-Alexander University (FAU) in Erlangen, the Department of Museum Informatics at the Germanic National Museum (GNM) in Nuremberg and the Biodiversity Informatics Group at Zoological Research Museum Koenig (ZFMK) in Bonn. Software development was funded by the German Research Foundation (DFG) from 2008 to 2012 and from 2014 to 2017, and in 2021 as a participant in the NFDI4Culture consortium within the National Research Data Infrastructure (NFDI) in Germany. WissKI is now supported by the Association for Semantic Data Processing (IGSD e.V.) and has a small but continuously growing worldwide community. You can try it with a WissKI Cloud account or the example Docker container.

What’s the purpose?

WissKI provides functionality for all steps of the research data life cycle (planning, creation, cleaning and analysis, dissemination and publication, archiving, reusing). The focus of WissKI is to provide findable, accessible, interoperable and reusable (FAIR) data. That's why WissKI's data modelling follows an ontological approach, preferably, but not exclusively, according to the Conceptual Reference Model of the International Committee for Documentation (CIDOC CRM). The so-called "Pathbuilder" is used to create "paths" through the ontology. A path is a concatenation of n concepts and n-1 relationships between the concepts. When storing data using a path, an individual is first created for each concept. Then the created individuals are connected by the relations according to the specifications of the path. At the end of such a path in WissKI there is always a relation to a primitive data type in which the actual input is stored. In order to store several inputs about the same subject, paths can be combined into so-called groups. The group defines the common part of all paths belonging to it. The trivial case (groups and paths of length 1) corresponds directly to the implementation of OWL described above. However, the mechanism also allows more complex modelling, as required by the CIDOC CRM.

For the implementation in Drupal, the Pathbuilder forms an intermediate layer between the triplestore, with its data stored in triples, on the one hand and Drupal, with its data storage based on entity types, bundles, entities and fields, on the other. It creates a mapping from groups and subgroups in the Pathbuilder to bundles and referenced bundles in Drupal, and from paths to data fields. Fields like “Name” of a bundle “Person” are mapped to semantic structures like “E21 Person -> P1 is identified by -> E41 Appellation -> P190 has symbolic content -> ‘string’”. WissKI takes care of creating unique identifiers for the individuals, disambiguation, references and other necessities. This mechanism hides the full complexity of the semantic web approach as well as the CIDOC CRM from the actual user, who only has to fill in forms that are translated into RDF/OWL structures when saved.
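The storing steps described above (one individual per concept, relations between neighbouring individuals, a final relation to the entered value) can be illustrated with a minimal Python sketch. It is not WissKI's actual implementation, which is written for Drupal: the function name `path_to_triples`, the `http://example.org/wisski/` data namespace and the UUID-based identifiers are assumptions made for illustration only, and the sketch skips disambiguation and the sharing of individuals within a group.

```python
import uuid

# Assumed namespaces, chosen for this illustration only.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
DATA = "http://example.org/wisski/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def path_to_triples(path, datatype_property, value):
    """Expand a path of n concepts and n-1 relations into triples.

    `path` alternates concept, relation, concept, ... and must start
    and end with a concept. The literal `value` is attached to the
    last individual via `datatype_property`.
    """
    if len(path) % 2 == 0:
        raise ValueError("path must alternate concept, relation, ..., concept")
    concepts = path[0::2]
    relations = path[1::2]

    triples = []
    # Step 1: create one fresh individual per concept and type it.
    individuals = [DATA + str(uuid.uuid4()) for _ in concepts]
    for individual, concept in zip(individuals, concepts):
        triples.append((individual, RDF_TYPE, CRM + concept))
    # Step 2: connect neighbouring individuals with the path's relations.
    for subject, relation, obj in zip(individuals, relations, individuals[1:]):
        triples.append((subject, CRM + relation, obj))
    # Step 3: store the actual input as a literal at the end of the path.
    triples.append((individuals[-1], CRM + datatype_property, value))
    return triples


triples = path_to_triples(
    ["E21_Person", "P1_is_identified_by", "E41_Appellation"],
    "P190_has_symbolic_content",
    "Ada Lovelace",
)
for s, p, o in triples:
    print(s, p, o)
```

For the “Name” field of a “Person” bundle this yields four triples: two type statements, one relation between the person and the appellation, and one literal holding the entered name.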

In this way, RDF data can easily be created, stored, managed and displayed through the Drupal framework, while the content data is stored separately from Drupal's configuration. Our vision was, in the sense of linked open data, to build an agnostic middleware between Drupal’s configurable content management and independent remote triplestores, which can easily be (un-)plugged, combined and federated without system or framework dependencies.
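To give an impression of how field values could be read back from a remote triplestore for display, the following sketch turns a path into a SPARQL 1.1 SELECT query. It only illustrates the idea; the function name `path_to_sparql`, the variable naming scheme (`?x0`, `?x1`, ...) and the namespace URI are assumptions, and the queries WissKI actually generates may look different.

```python
# Assumed namespace URI for the CIDOC CRM classes and properties.
CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"


def path_to_sparql(path, datatype_property):
    """Build a SELECT query that follows a path to its literal value.

    `path` alternates concept, relation, concept, ... and must start
    and end with a concept.
    """
    if len(path) % 2 == 0:
        raise ValueError("path must alternate concept, relation, ..., concept")
    concepts = path[0::2]
    relations = path[1::2]

    patterns = []
    # One variable per concept, constrained to the concept's class.
    for i, concept in enumerate(concepts):
        patterns.append(f"  ?x{i} a crm:{concept} .")
    # Neighbouring variables are linked by the path's relations.
    for i, relation in enumerate(relations):
        patterns.append(f"  ?x{i} crm:{relation} ?x{i + 1} .")
    # The last variable carries the literal value.
    last = len(concepts) - 1
    patterns.append(f"  ?x{last} crm:{datatype_property} ?value .")

    return (
        f"PREFIX crm: <{CRM_NS}>\n"
        "SELECT ?x0 ?value WHERE {\n"
        + "\n".join(patterns)
        + "\n}"
    )


query = path_to_sparql(
    ["E21_Person", "P1_is_identified_by", "E41_Appellation"],
    "P190_has_symbolic_content",
)
print(query)
```

The resulting query string could be sent to any SPARQL 1.1 endpoint; `?x0` identifies the person and `?value` holds the stored name.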

WissKI was primarily designed for research activities in the fields of cultural heritage, humanities and archives, but appears to be evolving into a tool set for a broader user group in politics, administration and knowledge management in general.

Challenges and future work

One of the main challenges was the abstraction of the entity, storage and database engine. Since WissKI potentially uses multiple data backends (not one relational database), and RDF and SPARQL as metamodel and query language (not SQL), core assumptions and implementations that are often tied to relational data structures and Drupal's database have to be rethought and restructured.
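The idea of several data backends behind one abstraction can be sketched as follows. This is a simplified Python illustration, not WissKI's API: the `RepositoryAdapter` interface, the `InMemoryAdapter` stand-in for a triplestore connection and the `load_from_all` helper are hypothetical names introduced here.

```python
from typing import Protocol


class RepositoryAdapter(Protocol):
    """Hypothetical adapter interface; not WissKI's actual API."""

    def load_field_values(self, entity_uri: str, field: str) -> list[str]: ...


class InMemoryAdapter:
    """Stand-in for a triplestore connection, backed by a plain dict."""

    def __init__(self, data: dict[tuple[str, str], list[str]]) -> None:
        self._data = data

    def load_field_values(self, entity_uri: str, field: str) -> list[str]:
        return list(self._data.get((entity_uri, field), []))


def load_from_all(
    adapters: list[RepositoryAdapter], entity_uri: str, field: str
) -> list[str]:
    """Ask every adapter and merge the answers, keeping first-seen order."""
    merged: list[str] = []
    for adapter in adapters:
        for value in adapter.load_field_values(entity_uri, field):
            if value not in merged:
                merged.append(value)
    return merged


person = "http://example.org/wisski/p1"  # assumed example URI
local = InMemoryAdapter({(person, "name"): ["Ada Lovelace"]})
remote = InMemoryAdapter({(person, "name"): ["Ada Lovelace", "A. A. Lovelace"]})

names = load_from_all([local, remote], person, "name")
print(names)
```

Because the entity is addressed by a URI rather than a local database row, the calling code does not need to know which backend, or how many, contributed a value.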

Unfortunately, Drupal relies on entity and data structures that produce some overhead: for example, an auto-incrementing entity ID as primary key, which is inadequate as a globally unique resource identifier; media file entities that are difficult to abstract into Drupal-independent structures; and modules that focus on SQL structures, such as the Solr Search API.

Our main goals are therefore to further develop clean abstraction layers between the Drupal API and RDF/SPARQL backend storages and engines, especially for multiple data repositories, federated queries and better Views integration.

Content of the Workshop

Participants will have the opportunity to create their own WissKI instance via the WissKI Cloud and build their own linked open data content management system using a practical example of a fictitious computer game console collection.

You will get to know the infrastructure of WissKI, the connection between Drupal, WissKI and the graph database, how to configure WissKI and how to connect to SPARQL endpoints. You will learn the principles of RDF/OWL, how to create an ontology (based on the conceptual reference model CIDOC CRM and using WebProtégé) and import it into WissKI, and how to generate a semantic data model with the WissKI Pathbuilder.

We will also show which extensions increase the potential of WissKI and how they are integrated. Depending on the interests of the participants, we can discuss in more detail the installation, configuration and use of IIP Server and IIIF Viewer for hosting IIIF images, Sketchfab Embed for displaying 3D models, ODBC Import for bulk data import, WissKI API for programmatic headless management, or Solr Server and Viewfields for high-performance search and display of individual views. Alternatively, we can take a deep dive into the code base of WissKI.

The workshop is aimed at site builders, developers and database curators interested in managing complex data, graph data, linked open data, RDF/OWL and SPARQL. Experience with Drupal configuration, entities and views, RDF/OWL description and query languages, or the aforementioned extensions is helpful, but not required. We show the strengths and weaknesses of the WissKI system, how best to use it and where interested parties can get involved.

Robert Nasarek lives in Halle (Saale), Germany, and works at the Germanic National Museum in Nuremberg, Germany, and as a freelance Drupal developer. He is a linked open data and ontology enthusiast and dreams of a web of federated data.