New Results

Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks

Rishi Rajalingham, Elias B. Issa, Pouya Bashivan, Kohitij Kar, Kailyn Schmidt, James J. DiCarlo
doi: https://doi.org/10.1101/240614
Author affiliation (all authors): McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

ABSTRACT

Primates—including humans—can typically recognize objects in visual images at a glance even in the face of naturally occurring identity-preserving image transformations (e.g. changes in viewpoint). A primary neuroscience goal is to uncover neuron-level mechanistic models that quantitatively explain this behavior by predicting primate performance for each and every image. Here, we applied this stringent behavioral prediction test to the leading mechanistic models of primate vision (specifically, deep, convolutional, artificial neural networks; ANNs) by directly comparing their behavioral signatures against those of humans and rhesus macaque monkeys. Using high-throughput data collection systems for human and monkey psychophysics, we collected over one million behavioral trials for 2400 images over 276 binary object discrimination tasks. Consistent with previous work, we observed that state-of-the-art deep, feed-forward convolutional ANNs trained for visual categorization (termed DCNNIC models) accurately predicted primate patterns of object-level confusion. However, when we examined behavioral performance for individual images within each object discrimination task, we found that all tested DCNNIC models were significantly non-predictive of primate performance, and that this prediction failure was not accounted for by simple image attributes, nor rescued by simple model modifications. These results show that current DCNNIC models cannot account for the image-level behavioral patterns of primates, and that new ANN models are needed to more precisely capture the neural mechanisms underlying primate object vision. To this end, large-scale, high-resolution primate behavioral benchmarks—such as those obtained here—could serve as direct guides for discovering such models.
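The contrast drawn above between object-level and image-level comparison can be illustrated with a small sketch. This is not the authors' analysis pipeline: the helper names (`per_image_accuracy`, `per_object_accuracy`, `consistency`), the use of raw accuracy, the plain Pearson correlation, and the toy numbers are all assumptions made for illustration. The point is only that two systems can agree perfectly once performance is averaged per object while disagreeing about which individual images are hard.

```python
import numpy as np

def per_image_accuracy(correct, image_ids, n_images):
    """Fraction of correct trials for each image (hypothetical helper)."""
    correct = np.asarray(correct, dtype=float)
    image_ids = np.asarray(image_ids)
    hits = np.bincount(image_ids, weights=correct, minlength=n_images)
    counts = np.bincount(image_ids, minlength=n_images)
    return hits / np.maximum(counts, 1)  # avoid division by zero for unseen images

def per_object_accuracy(image_acc, image_to_object, n_objects):
    """Average image-level accuracies within each object (hypothetical helper)."""
    sums = np.bincount(image_to_object, weights=image_acc, minlength=n_objects)
    counts = np.bincount(image_to_object, minlength=n_objects)
    return sums / np.maximum(counts, 1)

def consistency(a, b):
    """Plain Pearson correlation between two behavioral signatures.
    Assumption: a simplification chosen for this sketch only."""
    return float(np.corrcoef(a, b)[0, 1])

# Toy trial data: image 0 seen twice (one correct), image 1 seen twice (both correct).
toy_acc = per_image_accuracy([1, 0, 1, 1], [0, 0, 1, 1], n_images=2)

# Toy signatures (made-up numbers): 3 objects, 2 images per object.
image_to_object = np.array([0, 0, 1, 1, 2, 2])
primate_img = np.array([0.9, 0.5, 0.8, 0.4, 0.7, 0.3])
model_img = np.array([0.5, 0.9, 0.4, 0.8, 0.3, 0.7])

primate_obj = per_object_accuracy(primate_img, image_to_object, 3)
model_obj = per_object_accuracy(model_img, image_to_object, 3)

object_level = consistency(primate_obj, model_obj)  # agreement after averaging per object
image_level = consistency(primate_img, model_img)   # agreement at single-image resolution

print("object-level consistency:", round(object_level, 3))
print("image-level consistency:", round(image_level, 3))
```

In this toy case the per-object averages of the two systems are identical, so the object-level correlation is at ceiling, while the image-level correlation is negative because the hypothetical model finds easy exactly the images the hypothetical primate finds hard.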

SIGNIFICANCE STATEMENT

Recently, specific feed-forward deep convolutional artificial neural network (ANN) models have dramatically advanced our quantitative understanding of the neural mechanisms underlying primate core object recognition. In this work, we tested the limits of those ANNs by systematically comparing the behavioral responses of these models with the behavioral responses of humans and monkeys, at the resolution of individual images. Using these high-resolution metrics, we found that all tested ANN models significantly diverged from primate behavior. Going forward, these high-resolution, large-scale primate behavioral benchmarks could serve as direct guides for discovering better ANN models of the primate visual system.

Footnotes

  • COMPETING FINANCIAL INTERESTS: The authors declare no competing financial interests.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted February 12, 2018.