RT Journal Article
SR Electronic
T1 Vocal markers of autism: assessing the generalizability of machine learning models
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.11.22.469538
DO 10.1101/2021.11.22.469538
A1 Astrid Rybner
A1 Emil Trenckner Jessen
A1 Marie Damsgaard Mortensen
A1 Stine Nyhus Larsen
A1 Ruth Grossman
A1 Niels Bilenberg
A1 Cathriona Cantio
A1 Jens Richardt Møllegaard Jepsen
A1 Ethan Weed
A1 Arndis Simonsen
A1 Riccardo Fusaroli
YR 2022
UL http://biorxiv.org/content/early/2022/02/17/2021.11.22.469538.abstract
AB Machine learning (ML) approaches show increasing promise in their ability to identify vocal markers of autism. Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected, e.g., using a different speech task or in a different language. In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. We train promising published ML models of vocal markers of autism on novel cross-linguistic datasets following a rigorous pipeline to minimize overfitting, including cross-validated training and ensemble models. We test the generalizability of the models by testing them on i) different participants from the same study, performing the same task; ii) the same participants, performing a different (but similar) task; iii) a different study with participants speaking a different language, performing the same type of task. While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to different, though similar, tasks and not at all to new languages. The ML pipeline is openly shared. Generalizability of ML models of vocal markers of autism is an issue. We outline three recommendations for strategies researchers could take to be more explicit about generalizability and improve it in future studies. LAY SUMMARY: Machine learning approaches promise to be able to identify autism from voice only. These models underestimate how diverse the contexts in which we speak are, how diverse the languages used are and how diverse autistic voices are. Machine learning approaches need to be more careful in defining their limits and generalizability. Competing Interest Statement: Riccardo Fusaroli has been a consultant for F. Hoffmann-La Roche on related but not overlapping topics.
The other authors have no real or potential conflicts of interest that could have influenced the research.