
DoDaS Guest: Bilal Zafar

June 6, 2024, 10:00-11:45 (c.t.), Joseph-von-Fraunhofer-Straße 25, Room 303. Prof. Dr. Bilal Zafar, Ruhr Universität Bochum / RC Trust

Bilal Zafar is a joint guest of DoDaS, the RC Trust, and the Lamarr Institute.

Title: On early detection of hallucinations in factual question answering

Abstract: While generative large language models (LLMs) show impressive abilities to create human-like text, hallucinations remain a major impediment to their widespread adoption. In this work, we explore whether the artifacts associated with the model can provide hints that a response will contain hallucinations. Specifically, we probe LLMs at 1) the inputs via integrated gradients-based token attribution, 2) the outputs via the softmax probabilities, and 3) the internal state via the hidden layer and attention activations for signs of hallucinations on open-ended question answering tasks. Our results show differences between hallucinations and non-hallucinations at all three levels, even when the first generated token is a formatting character, such as a newline. Specifically, we observe changes in entropy in input token attribution and output softmax probability for hallucinated tokens, revealing an "uncertain" behavior during model inference. This uncertain behavior also manifests itself in auxiliary classifier models trained on outputs and internal activations, which we use to create a hallucination detector. We further show that tokens preceding the hallucination can predict subsequent hallucinations before they occur.
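To make the probing levels described in the abstract more concrete, here is a minimal, hypothetical sketch (not the speaker's implementation) of two of the signals mentioned: the entropy of the output softmax distribution and the hidden-state activations at the first generated token, which an auxiliary classifier could then be trained on. The model name ("gpt2") and the prompt are placeholders chosen only so the snippet runs end to end.

```python
# Illustrative sketch of output- and internal-state probing for hallucination
# detection, as described in the abstract. Assumes the Hugging Face
# transformers library; "gpt2" stands in for a larger LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

prompt = "Question: What is the capital of Australia?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# Output-level signal: entropy of the softmax over the first generated token.
logits = out.logits[0, -1]
probs = torch.softmax(logits, dim=-1)
entropy = -(probs * torch.log(probs + 1e-12)).sum().item()

# Internal-state signal: hidden activations at the last input position; an
# auxiliary classifier (hallucination vs. non-hallucination) could use these
# as features.
last_hidden = out.hidden_states[-1][0, -1]  # shape: (hidden_size,)

print(f"softmax entropy of first generated token: {entropy:.3f}")
print(f"hidden-state feature vector size: {last_hidden.shape[0]}")
```

The input-level signal (integrated gradients token attribution) would be computed analogously, e.g. with an attribution library, and its entropy compared between hallucinated and non-hallucinated answers.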


Vita: Bilal is a professor of Computer Science at Ruhr University Bochum (https://informatik.rub.de/en/) and the Research Center for Trustworthy Data Science and Security (http://rc-trust.ai/). Before joining RUB, he was a Senior Scientist at Amazon Web Services, where he built products (https://aws.amazon.com/sagemaker/clarify/) to support the trustworthy use of AI/ML. His research interests are in the area of human-centric Artificial Intelligence (AI) and Machine Learning (ML). His work aims to address challenges that arise when AI/ML models interact with human users. For instance, he develops algorithms for making AI/ML models more fair, explainable, and robust. His work has received an Otto Hahn Medal from the Max Planck Society in 2021, a nomination for the CNIL-INRIA Privacy Award '18, a Best Paper Honorable Mention Award at WWW '17, a Notable Paper Award at the NeurIPS '16 Symposium on ML and the Law, and a Best Paper Award at COSN '15.