Raphaël Millière - 11 January 2024
Title: Mechanistic Explanation in Deep Learning
ABSTRACT:
Deep neural networks such as large language models (LLMs) have achieved impressive performance across almost every domain of natural language processing, but there remains substantial debate about which cognitive capabilities can be ascribed to these models. Drawing inspiration from mechanistic explanations in the life sciences, the nascent field of "mechanistic interpretability" seeks to reverse-engineer human-interpretable features to explain how LLMs process information. This raises two questions: (1) Are causal claims about neural network components, based on coarse intervention methods (such as "activation patching"), genuine mechanistic explanations? (2) Does the focus on human-interpretable features risk imposing anthropomorphic assumptions? My answer will be "yes" to (1) and "no" to (2), closing with a discussion of some ongoing challenges.
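To make the intervention method mentioned above concrete: activation patching runs a model on a "corrupted" input while overwriting the activation of one component with the activation cached from a "clean" run, then checks how much of the clean behaviour is restored. The sketch below is purely illustrative and is not taken from the talk or the speaker's work; the toy PyTorch model, layer choice, and random inputs are assumptions made for the example.

```python
# Minimal sketch of activation patching on a toy model (illustrative only;
# the model, the patched layer, and the inputs are assumptions, not material
# from the talk).
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny two-layer MLP standing in for one component of a larger network.
model = nn.Sequential(
    nn.Linear(8, 16),   # hidden layer whose activation we will patch
    nn.ReLU(),
    nn.Linear(16, 2),   # readout layer
)

clean_input = torch.randn(1, 8)
corrupted_input = torch.randn(1, 8)

# 1. Cache the hidden activation from the clean run.
cached = {}

def cache_hook(module, inputs, output):
    cached["hidden"] = output.detach()

handle = model[0].register_forward_hook(cache_hook)
clean_logits = model(clean_input)
handle.remove()

# 2. Run the corrupted input, but patch in the clean activation at that layer
#    (returning a value from a forward hook replaces the module's output).
def patch_hook(module, inputs, output):
    return cached["hidden"]

handle = model[0].register_forward_hook(patch_hook)
patched_logits = model(corrupted_input)
handle.remove()

corrupted_logits = model(corrupted_input)

# 3. If patching this component moves the output back toward the clean run,
#    that is taken as evidence the component is causally implicated in the
#    behaviour under study.
print("clean:    ", clean_logits)
print("corrupted:", corrupted_logits)
print("patched:  ", patched_logits)
```

Whether such coarse, single-component interventions count as genuine mechanistic explanations is precisely question (1) of the talk.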
BIOGRAPHY:
Raphaël Millière is Lecturer in Philosophy of Artificial Intelligence at Macquarie University in Sydney, Australia. His research interests lie in the philosophy of artificial intelligence, cognitive science, and mind, particularly in understanding artificial neural networks based on deep learning architectures such as large language models. He has investigated syntactic knowledge, semantic competence, compositionality, variable binding, and grounding.