January 2026

Distilling Knowledge from Large Language Models

A concept bottleneck model for hate and counter-speech recognition (Information Processing & Management, 2026)

  • PYTHON

Journal: Information Processing & Management, Vol. 63, Issue 2 (Part A), Article 104309 (2026)

Authors: R. Labadie-Tamayo, D. Slijepčević, X. Chen, A. J. Böck, A. Babic, L. Freimann, C. Atzmüller, M. Zeppelzauer


This journal paper explores how knowledge can be distilled from large language models into a more interpretable concept bottleneck model for recognizing hate speech and counter-speech. The approach pairs the predictive strength of LLMs with a transparent, concept-based layer, so classifications can be traced back to human-understandable factors rather than treated as a black box, an important property for sensitive moderation tasks.
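
To make the idea of a concept bottleneck concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the concept names, labels, weights, and the `predict` helper are all hypothetical and chosen only for illustration. It assumes the concept scores have already been produced upstream (for example, by a model trained on LLM-generated concept annotations) and shows how a simple linear layer over those scores yields both a label and a per-concept explanation.

```python
import math

# Hypothetical concepts and labels; the paper's actual sets are not listed here.
CONCEPTS = ["dehumanizing", "threatening", "empathetic", "fact_based"]
LABELS = ["hate", "counter_speech", "neutral"]

# Illustrative hand-set weights: one list per label, one entry per concept.
WEIGHTS = {
    "hate": [2.0, 1.5, -1.0, -0.5],
    "counter_speech": [-1.0, -1.0, 2.0, 1.5],
    "neutral": [-0.5, -0.5, 0.0, 0.0],
}
BIAS = {"hate": 0.0, "counter_speech": 0.0, "neutral": 0.5}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(concept_scores):
    """Classify from concept scores only (the 'bottleneck').

    concept_scores maps each concept name to a value in [0, 1].
    Returns the predicted label, the label probabilities, and each
    concept's additive contribution to the predicted label's logit.
    """
    x = [concept_scores[c] for c in CONCEPTS]
    logits = [
        sum(w * v for w, v in zip(WEIGHTS[label], x)) + BIAS[label]
        for label in LABELS
    ]
    probs = softmax(logits)
    label = LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
    contributions = {c: WEIGHTS[label][i] * x[i] for i, c in enumerate(CONCEPTS)}
    return label, dict(zip(LABELS, probs)), contributions


if __name__ == "__main__":
    scores = {"dehumanizing": 0.9, "threatening": 0.8,
              "empathetic": 0.05, "fact_based": 0.1}
    label, probs, contributions = predict(scores)
    print(label)
    for concept, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
        print(f"  {concept}: {value:+.2f}")
```

Because the final decision depends only on the concept scores, the contributions printed for each concept account for the whole predicted logit (up to the bias term), which is what makes this kind of model traceable.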


The work ties directly into my research on counter-speech and digital humanism, contributing to methods that make automated hate-speech detection both effective and explainable.


Read the full paper: https://doi.org/10.1016/j.ipm.2025.104309