
Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution scores, which are then integrated with concrete examples in a table-based visualization, supporting dynamic analysis of model behavior. A user study with LLM practitioners and interviews with experts demonstrate the system's usability and effectiveness, emphasizing the importance of involving humans in the explanation process as active participants rather than passive recipients.
Large Language Models (LLMs) have shown remarkable capabilities in interpreting textual instructions and solving complex tasks. As their adoption grows across a wide range of applications, understanding how and why these models generate specific outputs becomes increasingly critical for ensuring their safety and reliability. To this end, it is essential to develop methods that enhance transparency and help users better understand these models' behavior and limitations.
Existing eXplainable Artificial Intelligence (XAI) approaches for interpreting local model behavior, such as feature attribution methods, primarily provide static, one-shot explanations. While these methods can be informative in certain contexts, they suffer from two major limitations. First, explanation is inherently an interactive and iterative process: users often want to ask follow-up questions, test new hypotheses about model behavior, and progressively refine their understanding. Static, monolithic explanations do not accommodate this exploratory workflow. Second, these methods typically operate at the word level, for example by quantifying the influence of individual tokens on the model's prediction. Word-level representations, however, lead to unnecessarily long running times for certain algorithms, including most commonly used removal-based methods, and fail to capture the semantic units that humans use for reasoning. Humans generally interpret and explain decisions in terms of higher-level meanings, such as phrases, propositions, or claims, rather than isolated words.
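To make the efficiency concern concrete (the numbers here are illustrative, not drawn from a specific experiment): an exhaustive removal-based method queries the model once per subset of input units, so a 20-word sentence requires up to 2^20 (about one million) evaluations when perturbing individual words, whereas grouping the same sentence into 8 semantic segments requires at most 2^8 = 256.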
In this context, explainability involves helping users develop accurate mental models of how a machine learning model behaves, enabling them to evaluate whether its behavior aligns with their knowledge and values.
Existing XAI approaches for understanding model behavior include feature attribution methods, counterfactual explanations, and anchor methods. Feature attribution methods use additive models to describe an ML model's local behavior and quantify the influence of each feature on the prediction, helping answer Why and Why not questions. Removal-based methods, represented by LIME and KernelSHAP, are widely used because they are model-agnostic. Counterfactual explanations are among the most commonly used example-based explanation methods; a counterfactual is an instance with a minimal difference from the original that leads to a different prediction. The Anchor method uses rules to identify sufficient conditions for a model's prediction. However, these methods provide only partial insights and position users as passive recipients of unified, static explanations, overlooking the inherently interactive and iterative nature of explanation.
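For reference, removal-based attribution methods such as LIME and KernelSHAP share the standard additive surrogate form (the notation here is ours, not tied to any one paper):

$$
g(z) = \phi_0 + \sum_{i=1}^{M} \phi_i z_i, \qquad z \in \{0, 1\}^M,
$$

where z_i indicates whether input feature i is kept and the coefficient \phi_i is the attribution assigned to feature i; the two methods differ mainly in how the perturbed samples are weighted when fitting the coefficients.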
Our approach consists of a computational pipeline that constructs interpretable representations of text and perturbs them to generate meaningful counterfactuals. The pipeline leverages the sentence's dependency syntax to decide which words should be grouped together, segmenting the input sentence into interpretable components; each segment corresponds to one entry of a binary vector that records whether it is kept or removed. Based on this interpretable representation, the algorithm systematically generates all meaningful counterfactuals by selectively removing or replacing specific segments. By evaluating model predictions on these counterfactuals, the system quantifies the influence of each input component on the model's prediction through KernelSHAP aggregation. The resulting attribution scores are visualized to support user exploration and understanding.
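The sketch below illustrates one way such a pipeline can be assembled; it is our own simplification, not the system's actual implementation. It assumes spaCy for dependency parsing, a user-supplied predict(text) -> float evaluator, removal-only counterfactuals (replacement is omitted for brevity), and exhaustive enumeration of segment subsets, which is feasible only when the number of segments is modest.

```python
# A minimal sketch under our own assumptions: spaCy supplies the dependency
# parse, tokens are merged with their heads along tightly-binding relations,
# counterfactuals are generated by removal only, and a user-supplied
# predict(text) -> float scores each perturbed input.
from itertools import product
from math import comb

import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")

# Dependency relations treated as "glue": a token linked to its head by one of
# these relations is folded into the head's segment. The set is illustrative.
MERGE_DEPS = {"det", "amod", "compound", "aux", "neg", "poss", "case", "prt"}


def segment(sentence: str) -> list[str]:
    """Split a sentence into interpretable segments using its dependency parse."""
    doc = nlp(sentence)
    parent = list(range(len(doc)))  # union-find over token indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for tok in doc:
        if tok.dep_ in MERGE_DEPS:  # merge the token into its head's group
            parent[find(tok.i)] = find(tok.head.i)

    groups: dict[int, list[str]] = {}
    for tok in doc:  # tokens arrive in sentence order
        groups.setdefault(find(tok.i), []).append(tok.text)
    return [" ".join(groups[k]) for k in sorted(groups)]


def removal_counterfactuals(segments: list[str]):
    """Enumerate every binary mask over the segments and render the masked text."""
    for mask in product([1, 0], repeat=len(segments)):
        yield mask, " ".join(s for keep, s in zip(mask, segments) if keep)


def kernelshap_attributions(segments, predict):
    """Aggregate counterfactual predictions into per-segment attribution scores
    using the KernelSHAP weighted least-squares formulation."""
    M = len(segments)
    assert M >= 2, "need at least two segments"
    full, empty = predict(" ".join(segments)), predict("")
    total = full - empty

    masks, gains, weights = [], [], []
    for mask, text in removal_counterfactuals(segments):
        k = sum(mask)
        if k == 0 or k == M:  # the empty/full coalitions enter via the constraints
            continue
        masks.append(mask)
        gains.append(predict(text) - empty)
        weights.append((M - 1) / (comb(M, k) * k * (M - k)))

    Z = np.array(masks, dtype=float)
    y = np.array(gains)
    w = np.sqrt(np.array(weights))

    # Standard trick: eliminate the last coefficient so the attributions are
    # constrained to sum to predict(full input) - predict(empty input).
    A = (Z[:, :-1] - Z[:, [-1]]) * w[:, None]
    b = (y - Z[:, -1] * total) * w
    phi_rest, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(phi_rest, total - phi_rest.sum())
```

With a typical parse, segment("The quick brown fox jumps over the lazy dog") merges determiners and adjectives into their noun phrases, yielding roughly four segments instead of nine words, so the full counterfactual set shrinks from 2^9 = 512 to 2^4 = 16 model queries.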
We evaluate LLM Analyzer through a hypothetical use case, a user study, and feedback from XAI and NLP experts. In the user study with eight participants, we explored how users interact with the system when interpreting LLM behaviors and evaluated its usability and usefulness. Participants found the system easy to use and useful for understanding LLM behaviors, particularly when exploring what-if scenarios and addressing How to be questions. Additionally, we interviewed three NLP researchers and three XAI researchers, each with three to more than ten years of experience. Experts from both domains found the system design concise and easy to comprehend while providing a comprehensive overview of its many functionalities.
Based on feedback from both users and experts, we outline two directions for future research and discuss the limitations of our system. First, the explanation process is inherently interactive, evolving through a dynamic exchange between the explainer and the explainee; a deeper understanding of this process can inform the development of explanation tools that improve both explanatory efficiency and user experience. Second, for free-form text generation tasks such as creative writing or complex reasoning, defining appropriate evaluators becomes much more challenging. We believe logic-based evaluators offer a promising solution in these cases, as they are more general and better equipped to handle the variability inherent in natural language.
This paper presents LLM Analyzer, an interactive visualization system with an efficient counterfactual generation algorithm designed to support LLM practitioners and users in understanding LLM behaviors. The system facilitates interactive counterfactual generation and analysis, enabling users to actively engage in the exploration of LLM responses by varying target instances and analyzing outcomes at customizable levels of granularity. We conducted experiments demonstrating that our counterfactual generation algorithm is both time-efficient and capable of producing high-quality counterfactuals. Additionally, a user study and expert interviews with professionals in NLP and XAI validate the system's usability and usefulness. Our findings underscore the importance of involving humans as active participants in the explanation process, rather than as passive recipients of explanations.