Guest Talk "Why Databases Should Know How Much They Know"
Date/Time: 27.01.2020, 10:00 am
Data in a database is not always complete. A bibliographical database
need not contain all publications of a given researcher, nor all
researchers of a given university. A knowledge base like DBpedia need
not contain all actors of a movie. As a consequence, answers to a
query over such a database can be unreliable: relevant answers may be
missing, aggregate values like counts or averages may be incorrect, or
answers to queries with negation may simply be wrong.
A generally incomplete database will usually be partially complete:
for some researchers, it may contain all journal papers and for some
movies the complete cast. Online resources like Wikipedia or IMDb
contain explicit statements of such partial completeness.
In this talk, we argue that databases should come with metadata that
states for which parts of their domain they are complete and show how
this can be used to generate meta-information about the reliability
of query answers.
We discuss possible forms of such statements, both for relational
and RDF data, and how they can be given a formal semantics. Then we
present algorithms that decide, given a query, whether the
information in the database is sufficient to return a complete set of
answers or, in the case of queries with aggregation or negation,
correct answers. Finally, to cope with
possibly large sets of completeness metadata, we report on indexing
techniques that potentially allow one to reduce the time spent on
reasoning to the same order of magnitude as query evaluation. We
conclude with a discussion of how to obtain completeness metadata and
give an overview of work to generate it automatically.
Werner Nutt has been a professor at the Faculty of Computer Science at
the Free University of Bozen-Bolzano since 2005. Prior to this, he was
a reader at Heriot-Watt University, Edinburgh (2000-2005), visiting
professor at the Hebrew University of Jerusalem (1997-2000), and
research scientist at the German Research Center for Artificial
Intelligence (DFKI) in Saarbruecken (1992-2000).
His research interests are in data and knowledge management, with a
focus on extracting structured information from text, ensuring data
quality and organising knowledge-intensive processes. Since 2017, he
has been coordinating the ERDF project COCkPiT, which aims to create
techniques and tools to model, schedule and monitor construction