Retrieval-augmented generation (RAG) systems pose new evaluation challenges. The Eval4RAG Workshop at ECIR 2025 aims to help the community grapple with these challenges and reconcile the variety of proposed evaluation protocols. We ultimately aim to push towards a common mindset or conceptual framework for the evaluation of RAG systems that considers the diverse viewpoints of the community.
Tentative Program
Time | Program | Speaker or Participants
---|---|---
14:30 – 14:40 | Opening with Overview of Existing RAG Shared Tasks | TBD
14:40 – 15:15 | Keynote Talk – A Journey Through Domain-Specific RAG and Agent Evaluation | Fabio Petroni
15:18 – 15:25 | Automated Evaluation of RAG in Romanian (Authors: Claudiu Creanga, Teodor Marchitan and Liviu Dinu) | TBD
15:25 – 15:32 | MARE: Automatic Modality-Agnostic Report Evaluation (Authors: Alexander Martin, Kate Sanders, William Walden, Eugene Yang, Reno Kriz, Francis Ferraro and Benjamin Van Durme) | TBD
15:32 – 15:39 | Controlled Retrieval-Augmented Context Evaluation (Authors: Jia-Huei Ju, Suzan Verberne, Maarten de Rijke and Andrew Yates) | TBD
15:39 – 15:46 | Open-Ended Error Analysis in Retrieval-Augmented Generation (Author: Nadezhda Chirkova) | TBD
15:46 – 15:53 | Challenges in RAG Evaluation for Text Classification in Evidence Synthesis (Authors: Sagar Uprety, Ailbhe Finnerty and James Thomas) | TBD
15:53 – 16:00 | Automated Evaluations of RAG Systems in Customer Support in Automotive Applications (Authors: Luis Wagner, Gayane Sedrakyan and Jos Van Hillegersberg) | TBD
16:00 – 16:30 | Coffee Break – Organizers Collecting and Organizing Discussion Ideas | 
16:30 – 16:45 | Discussion Idea Presentation and Group Assignment | TBD
16:45 – 17:30 | Breakout Discussion | Attendees
17:30 – 17:50 | Group Report Back | Attendees
17:50 – 18:00 | Closing | TBD
Keynote: A Journey Through Domain-Specific RAG and Agent Evaluation
Speaker: Fabio Petroni
Abstract
In this talk, I will share the evaluation journey at Samaya over the past two and a half years—starting from factoid questions we handcrafted internally, to evaluating complex, real-world predictions about the future. I’ll highlight the evolution of our methodologies and the growing challenges of assessing systems operating in specialized domains, with an eye toward actionable insights and open questions for the community.
Bio
Fabio Petroni is the Co-Founder and CTO of Samaya AI, specializing in the intersection of AI and knowledge. He holds a Ph.D. in Engineering of Computer Science from Sapienza University of Rome and has conducted research in leading industrial labs, including the FAIR team at Meta AI and the R&D department at Thomson Reuters. Fabio is known for his work on knowledge-intensive NLP, with awards such as first place in the NeurIPS Efficient Open-Domain Question Answering competition (2020) and the Google Best Paper Award at AKBC (2020). His research contributions include several high-impact publications, including "Language Models as Knowledge Bases?" and the original RAG paper.
Presentation Topics
We call for oral presentations at the workshop to spark discussion and share perspectives on RAG evaluation. The call covers, but is not limited to:
- Desired evaluation aspects or qualities for RAG systems;
- Unification of RAG task structure, e.g., TREC topic structure and Cranfield paradigm;
- Bridging gaps between IR and other RAG-related fields, e.g., summarization, question answering, etc.;
- Proposed or published relevant evaluation methods or datasets for RAG;
- Automation of evaluation methods (see the illustrative sketch after this list);
- Reproducibility for RAG evaluation;
- Reasons to abandon evaluation for RAG, i.e., we don’t need evaluation;
- Other spicy topics in RAG evaluation.
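To make the automation and reproducibility points above concrete, the sketch below shows one shape such work often takes: a small, self-contained harness that scores RAG outputs with a pluggable judge. It is illustrative only and not tied to any workshop submission; the names (`RAGOutput`, `overlap_judge`, `evaluate`) are hypothetical, and the lexical-overlap "judge" is a toy stand-in for a real protocol such as rubric-based LLM judging or human assessment.

```python
"""Minimal sketch of an automated, reference-free RAG evaluation harness.

Illustrative only: `overlap_judge` is a toy stand-in for whatever judging
protocol a real study would use (LLM-as-judge, human assessors, etc.).
"""
from dataclasses import dataclass
from typing import Callable


@dataclass
class RAGOutput:
    question: str
    retrieved: list[str]  # passages the generator conditioned on
    answer: str           # generated response


def overlap_judge(output: RAGOutput) -> dict[str, float]:
    """Toy judge: 'groundedness' = fraction of answer tokens that also
    appear in the retrieved passages."""
    answer_tokens = set(output.answer.lower().split())
    context_tokens = set(" ".join(output.retrieved).lower().split())
    grounded = len(answer_tokens & context_tokens) / max(len(answer_tokens), 1)
    return {"groundedness": grounded}


def evaluate(outputs: list[RAGOutput],
             judge: Callable[[RAGOutput], dict[str, float]]) -> dict[str, float]:
    """Average per-example judge scores over a test collection."""
    totals: dict[str, float] = {}
    for out in outputs:
        for metric, score in judge(out).items():
            totals[metric] = totals.get(metric, 0.0) + score
    return {metric: total / len(outputs) for metric, total in totals.items()}


if __name__ == "__main__":
    sample = [RAGOutput(
        question="Who led the Cranfield experiments?",
        retrieved=["Cyril Cleverdon led the Cranfield experiments in the 1960s."],
        answer="Cyril Cleverdon led the Cranfield experiments",
    )]
    print(evaluate(sample, overlap_judge))  # averaged judge scores
```

Because the judge is just a function argument, swapping the toy heuristic for a stronger protocol changes one line, which is exactly the kind of separation between harness and judging protocol that makes automated evaluations comparable and reproducible across studies.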
Interested presenters should submit a one-page extended abstract in the ACM two-column conference format (with unlimited references) as the presentation proposal. The extended abstract should explain the relevance of the presentation to the workshop, especially when proposing to present published or accepted work. Both published and unpublished work are welcome as long as the presentation is relevant to the workshop. Authors of accepted extended abstracts will be given a short dedicated time slot to present their perspectives at the workshop. We encourage authors of relevant papers accepted at the main conference to submit an extended abstract and present the work again at the workshop with a focus on evaluation.
Submissions will undergo a lightweight single-blind review by the program committee and workshop organizers, i.e., the authors' identities are visible to reviewers but not vice versa.
- Submission portal: EasyChair
- Abstract Submission Deadline: February 21, 2025 (AoE)
- Notification of Acceptance: March 7, 2025
- Workshop Date: April 10, 2025
Steering and Program Committee
- Ian Soboroff, National Institute of Standards and Technology, USA
- Jimmy Lin, University of Waterloo, Canada
- Charlie Clarke, University of Waterloo, Canada
- Mark Smucker, University of Waterloo, Canada
- Jaap Kamps, University of Amsterdam, Netherlands
- Tetsuya Sakai, Waseda University, Japan
- Dina Demner-Fushman, National Library of Medicine, USA
- Dawn Lawrie, Johns Hopkins University, USA
- James Mayfield, Johns Hopkins University, USA
- Doug Oard, University of Maryland, USA
Organizers
- Eugene Yang, Human Language Technology Center of Excellence, Johns Hopkins University, USA
- Ronak Pradeep, University of Waterloo, Canada
- Dake Zhang, University of Waterloo, Canada
- Sean MacAvaney, University of Glasgow, UK
- Maria Maistro, University of Copenhagen, Denmark
- Mohammad Aliannejadi, University of Amsterdam, Netherlands