Retrieval-augmented generation (RAG) systems pose new evaluation challenges. The Eval4RAG Workshop at ECIR 2025 aims to help the community grapple with these challenges and reconcile the variety of proposed evaluation protocols. We ultimately aim to push towards a common mindset or conceptual framework for the evaluation of RAG systems that considers the diverse viewpoints of the community.
Tentative Program
Time | Program | Speaker or Participants
---|---|---
14:30 – 14:40 | Opening with Overview of Existing RAG Shared Tasks | TBD
14:40 – 15:15 | Keynote Talk – A Journey Through Domain-Specific RAG and Agent Evaluation | Fabio Petroni
15:18 – 15:25 | Automated Evaluation of RAG in Romanian (Authors: Claudiu Creanga, Teodor Marchitan and Liviu Dinu) | TBD
15:25 – 15:32 | MARE: Automatic Modality-Agnostic Report Evaluation (Authors: Alexander Martin, Kate Sanders, William Walden, Eugene Yang, Reno Kriz, Francis Ferraro and Benjamin Van Durme) | TBD
15:32 – 15:39 | Controlled Retrieval-Augmented Context Evaluation (Authors: Jia-Huei Ju, Suzan Verberne, Maarten de Rijke and Andrew Yates) | TBD
15:39 – 15:46 | Open-Ended Error Analysis in Retrieval-Augmented Generation (Author: Nadezhda Chirkova) | TBD
15:46 – 15:53 | Challenges in RAG Evaluation for Text Classification in Evidence Synthesis (Authors: Sagar Uprety, Ailbhe Finnerty and James Thomas) | TBD
15:53 – 16:00 | Automated Evaluations of RAG Systems in Customer Support in Automotive Applications (Authors: Luis Wagner, Gayane Sedrakyan and Jos Van Hillegersberg) | TBD
16:00 – 16:30 | Coffee Break – Organizers Collecting and Organizing Discussion Ideas | 
16:30 – 16:45 | Discussion Idea Presentation and Group Assignment | TBD
16:45 – 17:30 | Breakout Discussion | Attendees
17:30 – 17:50 | Group Report Back | Attendees
17:50 – 18:00 | Closing | TBD
Keynote: A Journey Through Domain-Specific RAG and Agent Evaluation
Speaker: Fabio Petroni
Abstract
In this talk, I will share the evaluation journey at Samaya over the past two and a half years—starting from factoid questions we handcrafted internally, to evaluating complex, real-world predictions about the future. I’ll highlight the evolution of our methodologies and the growing challenges of assessing systems operating in specialized domains, with an eye toward actionable insights and open questions for the community.
Bio
Fabio Petroni is the Co-Founder and CTO of Samaya AI, specializing in the intersection of AI and knowledge. He holds a Ph.D. in Engineering of Computer Science from Sapienza University of Rome and has conducted research in leading industrial labs, including the FAIR team at Meta AI and the R&D department at Thomson Reuters. Fabio is known for his work on knowledge-intensive NLP, with awards such as first place in the NeurIPS Efficient Open-Domain Question Answering competition (2020) and the Google Best Paper Award at AKBC (2020). His research contributions include several high-impact publications, including "Language Models as Knowledge Bases?" and the original RAG paper.
Presentation Topics
We call for oral presentations at the workshop to spark discussion and share perspectives on RAG evaluation. The call covers, but is not limited to:
- Desired evaluation aspects or qualities for RAG systems;
- Unification of RAG task structure, e.g., TREC topic structure and Cranfield paradigm;
- Bridging gaps between IR and other RAG-related fields, e.g., summarization, question answering, etc.;
- Proposed or published relevant evaluation methods or datasets for RAG;
- Automation of evaluation methods (see the illustrative sketch after this list);
- Reproducibility for RAG evaluation;
- Reasons to abandon evaluation for RAG, i.e., we don’t need evaluation;
- Other spicy topics in RAG evaluation.
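To make the automation and reproducibility points above concrete, the sketch below shows one shape such work often takes: a small, self-contained harness that scores RAG outputs with a pluggable judge. It is illustrative only and not tied to any workshop submission; the names (`RAGOutput`, `overlap_judge`, `evaluate`) are hypothetical, and the lexical-overlap "judge" is a toy stand-in for a real protocol such as rubric-based LLM judging or human assessment.

```python
"""Minimal sketch of an automated, reference-free RAG evaluation harness.

Illustrative only: `overlap_judge` is a toy stand-in for whatever judging
protocol a real study would use (LLM-as-judge, human assessors, etc.).
"""
from dataclasses import dataclass
from typing import Callable


@dataclass
class RAGOutput:
    question: str
    retrieved: list[str]  # passages the generator conditioned on
    answer: str           # generated response


def overlap_judge(output: RAGOutput) -> dict[str, float]:
    """Toy judge: 'groundedness' = fraction of answer tokens that also
    appear in the retrieved passages."""
    answer_tokens = set(output.answer.lower().split())
    context_tokens = set(" ".join(output.retrieved).lower().split())
    grounded = len(answer_tokens & context_tokens) / max(len(answer_tokens), 1)
    return {"groundedness": grounded}


def evaluate(outputs: list[RAGOutput],
             judge: Callable[[RAGOutput], dict[str, float]]) -> dict[str, float]:
    """Average per-example judge scores over a test collection."""
    totals: dict[str, float] = {}
    for out in outputs:
        for metric, score in judge(out).items():
            totals[metric] = totals.get(metric, 0.0) + score
    return {metric: total / len(outputs) for metric, total in totals.items()}


if __name__ == "__main__":
    sample = [RAGOutput(
        question="Who led the Cranfield experiments?",
        retrieved=["Cyril Cleverdon led the Cranfield experiments in the 1960s."],
        answer="Cyril Cleverdon led the Cranfield experiments",
    )]
    print(evaluate(sample, overlap_judge))  # averaged judge scores
```

Because the judge is just a function argument, swapping the toy heuristic for a stronger protocol changes one line, which is exactly the kind of separation between harness and judging protocol that makes automated evaluations comparable and reproducible across studies.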
Interested presenters should submit a one-page extended abstract in the ACM two-column conference format (with unlimited references) as the presentation proposal. The extended abstract should explain the relevance of the presentation to the workshop, especially when proposing to present published or accepted work. Both published and unpublished work are welcome as long as the presentation is relevant to the workshop. Authors of accepted extended abstracts will be given a short dedicated time slot to present their perspectives at the workshop. We encourage authors of relevant papers accepted at the main conference to submit an extended abstract and present the work again at the workshop with a focus on evaluation.
Submissions will undergo a lightweight single-blind review by the program committee and workshop organizers, i.e., the authors' identities are visible to reviewers but not vice versa.
- Submission portal: EasyChair
- Abstract Submission Deadline: February 21, 2025 (AoE)
- Notification of Acceptance: March 7, 2025
- Workshop Date: April 10, 2025
Steering and Program Committee
- Ian Soboroff, National Institute of Standards and Technology, USA
- Jimmy Lin, University of Waterloo, Canada
- Charlie Clarke, University of Waterloo, Canada
- Mark Smucker, University of Waterloo, Canada
- Jaap Kamps, University of Amsterdam, Netherlands
- Tetsuya Sakai, Waseda University, Japan
- Dina Demner-Fushman, National Library of Medicine, USA
- Dawn Lawrie, Johns Hopkins University, USA
- James Mayfield, Johns Hopkins University, USA
- Doug Oard, University of Maryland, USA
Organizers
- Eugene Yang, Human Language Technology Center of Excellence, Johns Hopkins University, USA
- Ronak Pradeep, University of Waterloo, Canada
- Dake Zhang, University of Waterloo, Canada
- Sean MacAvaney, University of Glasgow, UK
- Maria Maistro, University of Copenhagen, Denmark
- Mohammad Aliannejadi, University of Amsterdam, Netherlands