The quality of data for cultural objects is crucial to their accessibility and subsequent use. This applies to all data providers, but especially to shared platforms such as the Deutsche Digitale Bibliothek (DDB) and the Graphikportal, as well as the growing data collections of NFDI consortia such as NFDI4Culture and Text+. Data to be integrated often does not meet the quality requirements of the target systems, so it must be analyzed and, if necessary, adapted before integration. However, defining data quality (DQ) requires deep domain knowledge, technical expertise (e.g., query languages), and coordination between domain experts, data engineers, and data model specialists. As a result, domain experts are often unable to define and implement quality assurance independently.

The objective of this project is to develop a workflow that will allow domain experts, regardless of their technical expertise, to assess DQ. At its core is Constrainify, an open-source web application that supports an agile, standalone QA process. It can be integrated into existing pipelines or used independently. Constrainify empowers domain experts to specify quality requirements in controlled natural language, minimizing the need for technical knowledge. The quality analysis approach builds on the results of KONDA and the MQAF (used by Europeana and DDB), transforming natural language constraints into machine-readable queries.
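The idea of turning a controlled natural language constraint into a machine-readable query can be illustrated with a minimal sketch. It is not Constrainify's implementation: the sentence pattern, the function names `compile_constraint` and `find_violations`, and the simplified, namespace-free sample XML are all assumptions made for illustration. Real LIDO or TEI data uses namespaces and a much richer structure.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical controlled-language pattern (an assumption, not Constrainify's grammar).
PATTERN = re.compile(
    r"^Every (?P<context>\w+) must have at least one (?P<target>\w+)\.$"
)


def compile_constraint(sentence):
    """Translate one controlled-language sentence into two path queries."""
    match = PATTERN.match(sentence)
    if match is None:
        raise ValueError(f"Unsupported constraint: {sentence!r}")
    # ElementTree supports a limited XPath subset; './/name' matches descendants.
    return f".//{match['context']}", f".//{match['target']}"


def find_violations(xml_text, sentence):
    """Return the context elements that lack a non-empty target element."""
    context_path, target_path = compile_constraint(sentence)
    root = ET.fromstring(xml_text)
    violations = []
    for element in root.findall(context_path):
        # A target counts only if it carries non-whitespace text.
        has_value = any(
            (target.text or "").strip() for target in element.findall(target_path)
        )
        if not has_value:
            violations.append(element)
    return violations


# Simplified sample data (illustrative only, not real LIDO).
SAMPLE = """
<records>
  <record id="r1"><title>Landscape with windmill</title></record>
  <record id="r2"><title>   </title></record>
  <record id="r3"><creator>Unknown</creator></record>
</records>
"""

if __name__ == "__main__":
    constraint = "Every record must have at least one title."
    for element in find_violations(SAMPLE, constraint):
        print("Violation in record", element.get("id"))
```

In this sketch the domain expert only writes the sentence. The translation into path queries and the evaluation against the data happen without any knowledge of the query language.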

Use cases include quality assurance of LIDO data for integration into the DDB and the Graphikportal, as well as TEI header data in the TextGrid repository. The evaluation of the approach is embedded in the NFDI consortia NFDI4Culture and Text+. Because the approach is independent of specific data formats and technologies, it is generic and can also be applied to data quality assurance in other domains.

Project Goals

  1. Development of a process for agile quality assurance of cultural heritage data in the context of data integration processes.
  2. Development of software for user-friendly, domain-specific quality assurance, building on the software developed in the KONDA project and on MQAF. This will enable domain experts to define and execute domain-specific quality assurance independently.
  3. Evaluation of the process and the supporting software for quality assurance using (1) LIDO data for integration into the Deutsche Digitale Bibliothek (DDB), (2) LIDO data for integration into the Graphikportal, and (3) TEI header data in the TextGrid repository.

GWDG’s Role in the Project

The GWDG and Philipps-Universität Marburg will jointly develop the process and the software for domain-specific quality assurance. In addition, the GWDG will provide the technical infrastructure for the software development.

Project Partners

Göttingen State and University Library (until 2024)
Philipps-Universität Marburg
Verbundzentrale des GBV (VZG) (since 2025)

Contact

Email

Duration

01.11.2025 - 30.10.2025

Website

Constrainify Code Repository
Constrainify Demo