
Recipe Execution Benchmark | a benchmark for natural language understanding


This benchmark for recipe understanding in autonomous agents aims to advance the field of natural language understanding by providing a setting in which performance can be measured on an everyday human activity: cooking. Demonstrating deep understanding of such an activity requires both linguistic and extralinguistic skills, including reasoning with domain knowledge. To this end, the benchmark provides a number of recipes written in natural (human) English that should be converted into a procedural semantic network of cooking operations that can be interpreted and executed by autonomous agents. The benchmark also includes a system, supporting one-click installation and execution, that can perform recipe execution tasks in simulation, allowing predicted networks to be both analysed and evaluated. The provided evaluation metrics are mostly simulation-based, because deep understanding of a recipe is best demonstrated by effectively taking all the actions required to cook the intended dish.
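To illustrate the idea of a procedural network of cooking operations, here is a minimal sketch in Python. All names (the `Operation` class, the operation labels, the pantry contents) are illustrative assumptions for this sketch and do not reflect the benchmark's actual schema or the Babel toolkit's API; it only shows how a recipe can be modelled as an ordered set of operations over ingredients and intermediate products, and executed step by step.

```python
# Hypothetical sketch: a recipe as a sequence of cooking operations.
# Names and structure are illustrative, not the benchmark's real schema.
from dataclasses import dataclass


@dataclass
class Operation:
    name: str           # e.g. "mix", "bake"
    inputs: list[str]   # ingredients or intermediate products consumed
    output: str         # product this operation yields


def execute(network: list[Operation]) -> set[str]:
    """Run the operations in order, checking each input is available."""
    available = {"butter", "sugar", "flour"}  # assumed pantry contents
    for op in network:
        missing = [i for i in op.inputs if i not in available]
        if missing:
            raise ValueError(f"{op.name}: missing inputs {missing}")
        available -= set(op.inputs)
        available.add(op.output)
    return available


cookie_recipe = [
    Operation("mix", ["butter", "sugar"], "creamed-mixture"),
    Operation("mix", ["creamed-mixture", "flour"], "dough"),
    Operation("bake", ["dough"], "cookies"),
]

print(execute(cookie_recipe))  # {'cookies'}
```

A simulation-based metric in this spirit would compare the products left in the simulated world after execution against those produced by a gold-standard network, rather than comparing the networks' surface forms directly.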

The full benchmark is available both standalone and as part of the Babel toolkit. Both options provide the same benchmark functionality, but the Babel toolkit additionally offers the option of extending the system.

Download the benchmark:

Discover the benchmark in this video pill!