Beyond Black-Box Interventions: Latent Probing for Faithful Retrieval-Augmented Generation

Abstract

Retrieval-Augmented Generation (RAG) systems often fail to maintain contextual faithfulness, generating responses that conflict with the provided context or fail to fully leverage the provided evidence. In this paper, we move beyond black-box interventions to analyze the model’s internal reasoning process. We discover that conflicting and aligned knowledge states are linearly separable in the model’s latent space, and contextual noise systematically increases the entropy of these representations. Based on these findings, we propose ProbeRAG, a novel framework for faithful RAG that operates in three stages: (i) fine-grained knowledge pruning to filter irrelevant context, (ii) latent conflict probing to identify hard conflicts in the model’s latent space, and (iii) conflict-aware attention to modulate attention heads toward faithful context integration. Extensive experiments demonstrate that ProbeRAG substantially improves both accuracy and contextual faithfulness.

Publication
Findings of the Association for Computational Linguistics, ACL Findings, 2026, pages 29981–30000