Abstract
Humans can recognize and communicate about many actions performed by others. How are actions organized in the mind, and is this organization shared across vision and language? We collected similarity judgments of human actions depicted through naturalistic videos and sentences, and tested four models of action categorization that define actions at different levels of abstraction, ranging from specific (action verb) to broad (action target: whether an action is directed towards an object, another person, or the self). The similarity judgments reflected a shared semantic organization across videos and sentences, determined mainly by the target of actions, even after accounting for other semantic features. Large language model features predicted the behavioral similarity of action videos and sentences, and captured information about the target of actions alongside unique semantic information. Together, our results show how modality-invariant action concepts are organized in the human mind and in large language model representations.
Competing Interest Statement
The authors have declared no competing interest.