WissKI - Virtual Research Environment and Linked Open Data Management System

Session Room
Room 218 (BoFs & Workshops)
Time Slot
Fri 14:15-16:15
Speaker(s)
rnsrk
Session track
Clients & Industry Experiences
Experience level
Beginner
Duration
1.5h (workshop)

What is WissKI? 

WissKI (Scientific Communication Infrastructure / Wissenschaftliche Kommunikationsinfrastruktur) is a set of modules that combines Drupal's content management features with linked open data and semantic web technologies into a scientific research environment. The main components of WissKI are the WissKI Core, which provides the WissKI entity and bundle definitions and their basic management operations; the WissKI Pathbuilder, which maps RDF patterns to bundles and fields; the WissKI Abstraction Layer, which orchestrates and optimises queries to one or more repository adapters; and the WissKI SPARQL 1.1 Adapter, which acts as the default connector to a triplestore used as graph database storage and translates queries to SPARQL for communication.

Additional modules add specific field or media types (e.g. for IIIF support), bulk operations, connectors to authority files (e.g. GeoNames) or features that make working with research data easier in general.

WissKI started as a joint venture of the Digital Humanities Research Group of the Department of Computer Science at Friedrich-Alexander University (FAU) in Erlangen, the Department of Museum Informatics at the Germanic National Museum (GNM) in Nuremberg and the Biodiversity Informatics Group at the Zoological Research Museum Koenig (ZFMK) in Bonn. The software development was funded by the German Research Foundation (DFG) from 2008 to 2012 and from 2014 to 2017, and in 2021 it was funded as a participant by the NFDI4Culture Consortium within the National Research Data Infrastructure (NFDI) in Germany. WissKI is now supported by the Association for semantic data processing (IGSD e.V.) and has a small but continuously growing worldwide community. You can try it with a WissKI Cloud account or the example Docker container.

What’s the purpose? 

WissKI provides functionality for all steps of the research data life cycle (planning, creation, cleaning and analysis, dissemination and publication, archiving, reusing). The focus of WissKI is to provide findable, accessible, interoperable and reusable (FAIR) data. That's why WissKI's data modelling follows an ontological approach, preferably, but not exclusively, according to the Conceptual Reference Model of the International Committee on Documentation (CIDOC CRM).

The so-called "Pathbuilder" is used to create "paths" through the ontology. A path is a concatenation of n concepts and n-1 relationships between the concepts. When storing data using a path, an individual is first created for each concept. Then the created individuals are connected by the relations according to the specifications of the path. At the end of such a path in WissKI there is always a relation to a primitive data type in which the actual input is stored. In order to store several inputs about the same subject, paths can be combined into so-called groups. The group defines the common part of all paths belonging to it. The trivial case (groups and paths of length 1) corresponds directly to the implementation of OWL described above. However, it also allows the implementation of more complex modelling, as required by the CIDOC CRM.

For the implementation in Drupal, the Pathbuilder forms an intermediate layer between the triplestore, with the data stored in triples, on the one hand and Drupal, with its data storage based on entity types, bundles, entities and fields, on the other. It creates a mapping from groups and subgroups in the Pathbuilder to bundles and referenced bundles in Drupal, and from paths to data fields. Fields like "Name" of a bundle "Person" are mapped to semantic structures like "E21 Person -> P1 is identified by -> E41 Appellation -> P190 has symbolic content -> 'string'". WissKI takes care of creating unique identifiers for the individuals, disambiguation, references and other necessities. This mechanism hides the full complexity of the semantic web approach as well as the CIDOC CRM from the actual user, who only has to fill in forms that are translated into RDF/OWL structures when saved.
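The storing procedure described above (one individual per concept, individuals linked by the path's relations, the input attached at the end) can be sketched in a few lines of Python. This is an illustration only and not WissKI's actual code: the function `expand_path`, the helper `demo_uri`, the `http://example.org/wisski/` base URI and the sample name are all assumptions made for this example, and literals are kept as plain Python strings for brevity.

```python
from itertools import count

# Namespaces used for illustration; BASE is a made-up example namespace.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "http://example.org/wisski/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_path(concepts, relations, datatype_property, value, new_uri):
    """Turn one path plus one form input into a list of (s, p, o) triples."""
    if len(relations) != len(concepts) - 1:
        raise ValueError("a path of n concepts needs exactly n-1 relations")

    # Step 1: create one individual per concept and state its type.
    individuals = [new_uri() for _ in concepts]
    triples = [
        (individual, RDF_TYPE, CRM + concept)
        for individual, concept in zip(individuals, concepts)
    ]

    # Step 2: connect neighbouring individuals with the path's relations.
    for subject, relation, obj in zip(individuals, relations, individuals[1:]):
        triples.append((subject, CRM + relation, obj))

    # Step 3: attach the actual input to the last individual of the path.
    triples.append((individuals[-1], CRM + datatype_property, value))
    return triples


# Hypothetical identifier generator: a simple counter under BASE.
_counter = count(1)


def demo_uri():
    return f"{BASE}{next(_counter)}"


# The "Name" field of a "Person" bundle, using the path quoted in the text.
triples = expand_path(
    concepts=["E21_Person", "E41_Appellation"],
    relations=["P1_is_identified_by"],
    datatype_property="P190_has_symbolic_content",
    value="Ada Lovelace",
    new_uri=demo_uri,
)

for triple in triples:
    print(triple)
```

For a path with two concepts this yields four triples: two type statements, one relation between the two individuals and one statement carrying the entered string.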

In this way, RDF data can easily be created, stored, managed and displayed through the Drupal framework, while the content data is stored separately from Drupal's configuration. Our vision was, in the sense of linked open data, to build an agnostic middleware between Drupal's configurable content management and independent remote triplestores, which can easily be (un-)plugged, combined and federated without system or framework dependencies.

WissKI was primarily designed for research activities in the fields of cultural heritage, humanities and archives, but it seems to be evolving into a toolset for a broader user group in politics, administration and knowledge management in general.

Challenges and future work 

One of the main challenges was the abstraction of the entity, storage and database engine. Since WissKI potentially uses multiple data backends (not one relational database), with RDF and SPARQL as metamodel and query language (not SQL), core assumptions and implementations that are often tied to relational data structures and Drupal's database have to be rethought and restructured.
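To give a feel for what querying through SPARQL instead of SQL means for path-based data, the following sketch turns a Pathbuilder-style path into a SPARQL SELECT query string. It is a simplified assumption of how such a translation could look, not the query that the WissKI SPARQL 1.1 Adapter actually generates; the function name `path_to_sparql` and the variable naming scheme (`?x0`, `?x1`, `?value`) are invented for this example.

```python
# CIDOC CRM namespace, bound to the prefix "crm" in the generated query.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"


def path_to_sparql(concepts, relations, datatype_property):
    """Build a SELECT query that reads the values stored along one path."""
    if len(relations) != len(concepts) - 1:
        raise ValueError("a path of n concepts needs exactly n-1 relations")

    patterns = []
    # One variable per concept, constrained to the concept's class.
    for i, concept in enumerate(concepts):
        patterns.append(f"  ?x{i} a crm:{concept} .")
    # Neighbouring variables are joined by the path's relations.
    for i, relation in enumerate(relations):
        patterns.append(f"  ?x{i} crm:{relation} ?x{i + 1} .")
    # The last variable carries the stored input.
    last = len(concepts) - 1
    patterns.append(f"  ?x{last} crm:{datatype_property} ?value .")

    body = "\n".join(patterns)
    return f"PREFIX crm: <{CRM}>\nSELECT ?x0 ?value WHERE {{\n{body}\n}}"


query = path_to_sparql(
    concepts=["E21_Person", "E41_Appellation"],
    relations=["P1_is_identified_by"],
    datatype_property="P190_has_symbolic_content",
)
print(query)
```

Where a relational backend would read one column of one table, a field value here is the result of a graph pattern with one join per relation in the path, which is one reason why assumptions tied to SQL storage do not carry over directly.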

Unfortunately, Drupal relies on entity and data structures that produce some overhead, for example the existence of an auto-incrementing entity ID as a primary key, which is inadequate as a globally unique resource identifier, media file entities that are difficult to abstract into Drupal-independent structures, or modules that focus on SQL structures, such as the Solr Search API.
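The mismatch between local integer entity IDs and globally unique resource identifiers can be illustrated with a small bidirectional lookup table. This is a generic sketch of the idea and makes no claim about how WissKI resolves it internally; the class `EntityIdMap` and the example URIs are hypothetical.

```python
class EntityIdMap:
    """Keeps a local integer ID for each globally unique URI, and back."""

    def __init__(self):
        self._id_by_uri = {}
        self._uri_by_id = {}
        self._next_id = 1

    def id_for(self, uri):
        """Return the local ID for a URI, assigning a new one if needed."""
        if uri not in self._id_by_uri:
            entity_id = self._next_id
            self._next_id += 1
            self._id_by_uri[uri] = entity_id
            self._uri_by_id[entity_id] = uri
        return self._id_by_uri[uri]

    def uri_for(self, entity_id):
        """Return the URI behind a local ID (raises KeyError if unknown)."""
        return self._uri_by_id[entity_id]


ids = EntityIdMap()
first = ids.id_for("http://example.org/wisski/person/1")
second = ids.id_for("http://example.org/wisski/console/7")
print(first, second, ids.uri_for(first))
```

The integer is only meaningful inside one installation, whereas the URI stays valid across systems, so a table of this kind is pure bookkeeping overhead on the Drupal side.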

Our main goals are therefore to further develop clean abstraction layers between the Drupal API and RDF/SPARQL backend storages and engines, especially in the areas of multiple data repositories, federated queries and better Views integration.

Content of the Workshop 

Participants will have the opportunity to create their own WissKI instance via the WissKI Cloud and build their own linked open data content management system using a practical example of a fictitious computer game console collection.

You will get to know the infrastructure of WissKI, the connection between Drupal, WissKI and the graph database, how to configure WissKI and how to connect to SPARQL endpoints. You will learn the principles of RDF/OWL, how to create an ontology (based on the conceptual reference model CIDOC CRM and with WebProtégé) and import it into WissKI, and how to generate a semantic data model with the WissKI Pathbuilder.

It will also be shown which extensions increase the potential of WissKI and how they are integrated. Depending on the interests of the participants, we can discuss the installation, configuration and use of the following in more detail: IIP Server and IIIF Viewer for hosting IIIF images, Sketchfab Embed for displaying 3D models, ODBC Import for bulk data import, WissKI API for programmatic headless management, or Solr Server and Viewfields for high-performance search and display of individual views. Alternatively, we can take a deep dive into the WissKI code base.

The workshop is aimed at site builders, developers and database curators interested in managing complex data, graph data, linked open data, RDF/OWL and SPARQL. Experience with Drupal configuration, entities and views, RDF/OWL description and query languages, or the aforementioned extensions is helpful, but not required. We show the strengths and weaknesses of the WissKI system, how best to use it and where interested parties can get involved.

Robert Nasarek lives in Halle (Saale), Germany, and works at the Germanic National Museum in Nuremberg, Germany, and as a freelance Drupal developer. He is a linked open data and ontology enthusiast and dreams of a web of federated data.