Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations

Published in: In review at TOSEM, 2024

The paper introduces ASTrust, an interpretability method for large language models (LLMs) of code that emphasizes the link between interpretability and trustworthiness. Existing interpretability techniques tend to report accuracy or other performance metrics rather than the fine-grained explanations needed to understand individual model predictions. ASTrust addresses this gap by generating explanations grounded in the relationship between model confidence and the syntactic structures of programming languages, as captured by Abstract Syntax Trees (ASTs). It assigns confidence scores to well-known syntactic structures, supporting comprehension at both local (single prediction) and global (model-level) scopes, and provides automated visualizations that help users interpret model predictions. Evaluations demonstrate a causal link between learning errors and the model's ability to predict syntax categories, and a user study finds ASTrust's visualizations helpful for assessing the trustworthiness of model outputs.
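To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of aggregating a model's per-token confidence scores over AST syntax categories. It assumes Python's built-in `ast` module as the parser and uses made-up token confidences keyed by (line, column); the function name `confidence_by_syntax_category` is illustrative only.

```python
# Minimal sketch: average hypothetical per-token model confidences over AST
# node categories, in the spirit of syntax-grounded explanations. This is an
# assumption-laden illustration, not the ASTrust implementation.
import ast
from collections import defaultdict

def confidence_by_syntax_category(source: str,
                                  token_confidences: dict[tuple[int, int], float]) -> dict[str, float]:
    """Average token-level confidences per AST node type.

    token_confidences maps the (line, column) at which a token starts to the
    model's confidence (e.g., softmax probability) for predicting that token.
    """
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for node in ast.walk(ast.parse(source)):
        # Only nodes that carry source positions can be matched to tokens.
        if not hasattr(node, "lineno"):
            continue
        key = (node.lineno, node.col_offset)
        if key in token_confidences:
            category = type(node).__name__  # e.g., FunctionDef, Return, BinOp
            sums[category] += token_confidences[key]
            counts[category] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

# Toy usage with made-up confidences keyed by (line, column):
code = "def add(a, b):\n    return a + b\n"
fake_confidences = {(1, 0): 0.91, (2, 4): 0.78, (2, 11): 0.66}
print(confidence_by_syntax_category(code, fake_confidences))
```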

Download paper here: https://arxiv.org/abs/2407.08983

Recommended citation:

@misc{palacio2024trustworthyinterpretablellmscode,
  title={Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations},
  author={David N. Palacio and Daniel Rodriguez-Cardenas and Alejandro Velasco and Dipin Khati and Kevin Moran and Denys Poshyvanyk},
  year={2024},
  eprint={2407.08983},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2407.08983},
}