Guest Talk: Paul Groth "Data Curation and Debugging for Data Centric AI"


Paul Groth, University of Amsterdam

Date/Time: Friday, 10 March 2023, 12:30 p.m.
Location: WU Vienna Campus, TC.1.02 (Teaching Center)


It is increasingly recognized that data is a central challenge for AI systems - whether training an entirely new model, discovering data for a model, or applying an existing model to new data. Given this centrality of data, there is a need for new tools that help data teams create, curate and debug datasets in the context of complex machine learning pipelines. In this talk, I outline the underlying challenges for data debugging and curation in these environments. I then discuss our recent research, which both takes advantage of ML to improve datasets and uses core data management and semantic techniques for debugging in such complex ML pipelines.


Paul Groth is Professor of Algorithmic Data Science at the University of Amsterdam where he leads the Intelligent Data Engineering Lab (INDElab). He holds a Ph.D. in Computer Science from the University of Southampton (2007) and has done research at the University of Southern California, the Vrije Universiteit Amsterdam and Elsevier Labs. His research focuses on intelligent systems for dealing with large amounts of diverse contextualized knowledge with a particular focus on web and science applications. This includes research in data provenance, data integration and knowledge sharing.
Paul is scientific director of the UvA’s Data Science Center. Additionally, he is co-scientific director of two Innovation Center for Artificial Intelligence (ICAI) labs: the AI for Retail (AIR) Lab - a collaboration between UvA and Ahold Delhaize; and the Discovery Lab - a collaboration between Elsevier, the University of Amsterdam and VU University Amsterdam.
Previously, Paul led the design of a number of large-scale data integration and knowledge graph construction efforts in the biomedical domain. Paul was co-chair of the W3C Provenance Working Group that created a standard for provenance interchange. He has also contributed to the emergence of community initiatives to build a better scholarly ecosystem, including altmetrics and the FAIR data principles.
Paul is co-author of “Provenance: an Introduction to PROV” and “The Semantic Web Primer: 3rd Edition” as well as numerous academic articles.


If you wish to attend the talk, please register here:
