Anders SOGAARD – 23 November 2023
Title: LLMs: Indication or Representation?
People talk to LLMs - their new assistants, tutors, or partners - about the world they live in. But are LLMs merely parroting, or do they (also) have internal representations of the world? There seem to be five popular views:
(i) LLMs are all syntax, no semantics.
(ii) LLMs have inferential semantics, no referential semantics.
(iii) LLMs (also) have referential semantics through picturing.
(iv) LLMs (also) have referential semantics through causal chains.
(v) Only chatbots have referential semantics (through causal chains).
I present three sets of experiments suggesting that LLMs induce both inferential and referential semantics, and that they do so by inducing human-like representations, which lends some support to view (iii). I briefly compare the representations that seem to fall out of these experiments with those to which others have appealed in the past.
Anders SOGAARD is University Professor of Computer Science and Philosophy and leads the newly established Center for Philosophy of Artificial Intelligence at the University of Copenhagen. Known primarily for work on multilingual NLP, multi-task learning, and using cognitive and behavioral data to bias NLP models, Søgaard is an ERC Starting Grant and Google Focused Research Award recipient and the author of Semi-Supervised Learning and Domain Adaptation for NLP (2013), Cross-Lingual Word Embeddings (2019), and Explainable Natural Language Processing (2021).
Søgaard, A. (2023). Grounding the Vector Space of an Octopus. Minds and Machines 33, 33-54.
Li, J.; et al. (2023). Large Language Models Converge on Brain-Like Representations. arXiv preprint arXiv:2306.01930.
Abdou, M.; et al. (2021). Can Language Models Encode Perceptual Structure Without Grounding? CoNLL.
Garneau, N.; et al. (2021). Analogy Training Multilingual Encoders. AAAI.