The quality of data for cultural objects is crucial to their accessibility and subsequent use. This applies to all data providers, but especially to shared platforms such as the Deutsche Digitale Bibliothek (DDB) and the Graphikportal, as well as to the growing data collections of NFDI consortia such as NFDI4Culture and Text+. Data to be integrated often does not meet the quality requirements of the target systems, so it must be analyzed and, if necessary, adapted before integration. However, defining data quality (DQ) requires deep domain knowledge, technical expertise (e.g., query languages), and coordination between domain experts, data engineers, and data model specialists. As a result, domain experts are often unable to define and implement quality assurance independently.
The objective of this project is to develop a workflow that will allow domain experts, regardless of their technical expertise, to assess DQ. At its core is Constrainify, an open-source web application that supports an agile, standalone QA process. It can be integrated into existing pipelines or used independently. Constrainify empowers domain experts to specify quality requirements in controlled natural language, minimizing the need for technical knowledge. The quality analysis approach builds on the results of KONDA and the MQAF (used by Europeana and DDB), transforming natural language constraints into machine-readable queries.
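To illustrate the general idea of turning a controlled-natural-language constraint into a machine-readable query, the following is a minimal sketch only. The sentence pattern, the function names `compile_constraint` and `find_violations`, and the simplified sample XML are all assumptions made for illustration; they do not reflect Constrainify's actual grammar, API, or the LIDO and TEI schemas.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical controlled-language pattern (an assumption for this sketch,
# not Constrainify's grammar): "Every <element> must have at least <n> <child>."
PATTERN = re.compile(r"^Every (\w+) must have at least (\d+) (\w+)\.?$")


def compile_constraint(sentence):
    """Translate one sentence into (context path, child path, minimum count)."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        raise ValueError(f"Unsupported constraint: {sentence!r}")
    context, minimum, child = match.groups()
    # ElementTree supports a limited XPath subset, which is enough here.
    return f".//{context}", f"./{child}", int(minimum)


def find_violations(root, sentence):
    """Return all context elements that have fewer children than required."""
    context_path, child_path, minimum = compile_constraint(sentence)
    return [
        element
        for element in root.findall(context_path)
        if len(element.findall(child_path)) < minimum
    ]


# Simplified sample data; real LIDO or TEI records are namespaced and richer.
SAMPLE = """
<records>
  <record id="a"><title>Landscape</title></record>
  <record id="b"></record>
</records>
""".strip()

root = ET.fromstring(SAMPLE)
violations = find_violations(root, "Every record must have at least 1 title.")
violation_ids = [element.get("id") for element in violations]
print(violation_ids)
```

In this sketch the domain expert only writes the sentence, while the path expressions are derived from it automatically. A real system would need a much richer grammar, namespace handling, and a more expressive query language than the XPath subset used here.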
Use cases include quality assurance of LIDO data for integration into the DDB and the Graphikportal, as well as of TEI header data in the TextGrid repository. The evaluation of the approach is embedded in the NFDI consortia NFDI4Culture and Text+. Because the approach is independent of specific data formats and technologies, it is generic and can also be applied to data quality assurance in other domains.
The GWDG and Philipps-Universität Marburg will jointly develop the process and the software for domain-specific quality assurance. In addition, the GWDG will provide the technical infrastructure for the software development.
Göttingen State and University Library (until 2024)
Philipps-Universität Marburg
Verbundzentrale des GBV (VZG) (since 2025)