This research, conducted at the University of Amsterdam, explores how dialogue context affects the crowdsourced evaluation of response relevance and usefulness in conversations. Using the ReDial dataset, which includes over 11,000 dialogues, the study is structured in two phases: the first examines the effect of varying the amount of context available to evaluators, and the second assesses how different types of preceding contextual information influence judgments. The findings have implications for how the performance of conversational AI systems is assessed in real-world applications.
The study examines how varying the dialogue context shown to crowd workers affects the consistency of their judgments of the relevance and usefulness of conversational responses.
Using the ReDial dataset, the research ran two phases of experiments to measure how the amount and type of context influence annotators' evaluations of system responses.
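To make the first-phase manipulation concrete, the sketch below shows one way evaluation items with different amounts of dialogue context could be generated. It is a minimal Python sketch under stated assumptions: dialogues are treated as plain lists of utterance strings, and the function names, context-size conditions, and example turns are illustrative, not the authors' actual annotation protocol or the ReDial schema.

```python
def truncate_context(turns, k):
    """Keep only the last k turns of dialogue history (None = full history, 0 = no context)."""
    if k is None:
        return list(turns)
    return list(turns[-k:]) if k > 0 else []


def build_annotation_items(turns, response, context_sizes=(0, 1, 3, None)):
    """Create one crowdsourcing item per context-size condition.

    `turns` is the prior dialogue as a list of utterance strings; `response`
    is the system reply to be judged for relevance and usefulness. The
    context sizes here are hypothetical conditions, not the study's design.
    """
    items = []
    for k in context_sizes:
        items.append({
            "context": truncate_context(turns, k),
            "response": response,
            "condition": "full" if k is None else f"last_{k}_turns",
        })
    return items


if __name__ == "__main__":
    # Hypothetical ReDial-style movie-recommendation exchange.
    turns = [
        "Hi! I'm looking for a movie like Inception.",
        "Have you seen Interstellar? It has a similar feel.",
        "I have, but I'd prefer something with less space travel.",
    ]
    candidate = "You might enjoy Shutter Island; it's another mind-bending thriller."
    for item in build_annotation_items(turns, candidate):
        print(item["condition"], "->", len(item["context"]), "turns shown")
```

Each condition pairs the same candidate response with a different slice of the preceding dialogue, so differences in annotators' relevance and usefulness judgments can be attributed to the context shown rather than to the response itself.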