It is widely accepted that humans construct narratives to make sense of complex issues in their lives and in society. If we want to build machines that are capable of truly understanding such narratives, we need to be able to reliably identify and track the characters and other entities that play a role in them. For example, consider the following text: “The boy likes The Witches. The book was written by Roald Dahl.”
We would like to identify three entities: a boy, The Witches, and Roald Dahl. We also need to work out that “the book” is a reformulation of the entity The Witches, and that the entity Roald Dahl is the author of that book. This micro-project therefore focuses on three tasks: how can entities be identified (“entity recognition”), disambiguated (“entity linking”, e.g. working out that “The Witches” refers to the book and not the movie), and traced over the course of a longer discourse (“entity reformulation and elaboration”)? We will explore these questions by integrating models of computational construction grammar and cognitive semantics with neuro-statistical NLP tools and knowledge graphs.
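To make the target analysis concrete, the following is a minimal sketch of what the desired result for the example text could look like as a data structure. It is hand-annotated and does not perform any recognition itself; the class names (`Entity`, `Relation`), the identifiers (`e1`, `e2`, `e3`) and the predicate labels are assumptions made for illustration and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    """A discourse entity together with the surface forms that refer to it."""
    entity_id: str   # hypothetical identifier
    label: str
    mentions: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Relation:
    """A directed relation between two entities, referenced by identifier."""
    subject: str
    predicate: str
    obj: str


def build_example_discourse() -> Tuple[Dict[str, Entity], List[Relation]]:
    """Hand-annotated target analysis of the two example sentences."""
    boy = Entity("e1", "a boy", ["The boy"])
    # "The book" is a reformulation of the entity first introduced as "The Witches".
    witches = Entity("e2", "The Witches", ["The Witches", "The book"])
    dahl = Entity("e3", "Roald Dahl", ["Roald Dahl"])
    entities = {e.entity_id: e for e in (boy, witches, dahl)}
    relations = [
        Relation("e1", "likes", "e2"),
        Relation("e3", "author_of", "e2"),
    ]
    return entities, relations


def entity_for_mention(entities: Dict[str, Entity], mention: str) -> Optional[Entity]:
    """Return the entity whose mention list contains `mention`, or None."""
    for entity in entities.values():
        if mention in entity.mentions:
            return entity
    return None
```

Looking up `"The book"` with `entity_for_mention` yields the same entity as looking up `"The Witches"`, which is the reformulation link the project aims to recover automatically.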
The project started on 18 October 2021 and will run for two months. We have already developed several key components: an NLP pipeline that allows constructional language processing to draw on information from neuro-statistical named entity recognition and dependency parsing models, an API for interfacing with knowledge graphs such as Wikidata, and a constraint-satisfaction system capable of integrating information from text, knowledge graphs, and discourse models.
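As a rough illustration of how constraints from different sources might be combined for entity linking, the sketch below resolves the two named mentions from the example by brute-force search. It is not the project's constraint-satisfaction system: the candidate list is a hand-written stand-in for a knowledge graph lookup, and the identifiers (`cand:...`) and function names are hypothetical.

```python
from itertools import product

# Hand-written stand-in for knowledge graph candidates per mention.
# The identifiers are hypothetical and are not real Wikidata IDs.
CANDIDATES = {
    "The Witches": [
        {"id": "cand:witches-novel", "type": "book", "author": "cand:roald-dahl"},
        {"id": "cand:witches-film-1990", "type": "film", "author": None},
        {"id": "cand:witches-film-2020", "type": "film", "author": None},
    ],
    "Roald Dahl": [
        {"id": "cand:roald-dahl", "type": "person", "author": None},
    ],
}


def is_book(assignment):
    """Discourse constraint: "The book" reformulates "The Witches"."""
    return assignment["The Witches"]["type"] == "book"


def written_by(assignment):
    """Text constraint: the work "was written by" the entity Roald Dahl."""
    return assignment["The Witches"]["author"] == assignment["Roald Dahl"]["id"]


CONSTRAINTS = [is_book, written_by]


def solve(candidates, constraints):
    """Enumerate every combination of candidates and keep the consistent ones."""
    mentions = list(candidates)
    solutions = []
    for combo in product(*(candidates[m] for m in mentions)):
        assignment = dict(zip(mentions, combo))
        if all(check(assignment) for check in constraints):
            solutions.append({m: cand["id"] for m, cand in assignment.items()})
    return solutions
```

Calling `solve(CANDIDATES, CONSTRAINTS)` leaves a single consistent assignment, in which “The Witches” is linked to the novel rather than to either film. Exhaustive enumeration is only workable for a toy example like this one; a longer discourse would need a proper constraint solver.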
Demonstrator & Article
Lara Verheyen, VUB - EHAI
Remi van Trijp, Sony Computer Science Laboratories Paris