## ABSTRACT

Today, most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a benchmark of the performance attainable with learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

### GLOSSARY

*Machine learning*- is a computer algorithm that uses data from the environment to improve performance on a task.
*Deep learning*- is the latest version of machine learning, distinguished by networks with more than three layers. It is ubiquitous on the internet.
*Supervised learning*- refers to any algorithm that accepts a set of labeled stimuli — a training set — and returns a classifier that can label stimuli similar to those in the training set.
*Unsupervised learning*- works without labels. It is less popular, but of great interest because labeled data are scarce while unlabeled data are plentiful. Without labels, the algorithm discovers structure and redundancy in the data.
*Cost function*- A function that assigns a real number representing cost to a candidate solution. Solving by optimization means minimizing cost.
*Gradient descent*- An algorithm that minimizes cost by incrementally changing the parameters in the direction of steepest descent of the cost function.
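As a minimal illustration of the definition above (a toy sketch; the function names, learning rate, and starting point are my own choices, not from the text), gradient descent on a one-dimensional quadratic cost:

```python
# Minimal gradient-descent sketch: minimize cost(w) = (w - 3)**2,
# whose gradient is 2*(w - 3) and whose unique minimum is at w = 3.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step the parameter against the gradient of the cost."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # move in the direction of steepest descent
    return w

w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Because this cost is convex (see the next entry), the descent converges to the global minimum near w = 3 regardless of the starting point.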
*Convexity*- A problem is convex if there are no local minima competing with the global minimum. In optimization, a convex cost function guarantees that gradient descent will always find the global minimum.
*Cross validation*- assesses how well a classifier generalizes. Usually the training and test stimuli are chosen to be independent, identically distributed samples.
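One common form is k-fold cross validation, in which each sample serves exactly once as a test item. A hypothetical sketch (function and variable names are my own, not from the text):

```python
# Toy k-fold cross-validation split: partition indices 0..n-1 into k folds,
# each used once for testing while the rest are used for training.

def kfold_indices(n, k):
    folds = []
    fold_size = n // k
    for i in range(k):
        start = i * fold_size
        stop = start + fold_size if i < k - 1 else n  # last fold takes the remainder
        test = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        folds.append((train, test))
    return folds

splits = kfold_indices(10, 5)  # 5 disjoint test folds covering all 10 samples
```

A classifier would be trained and scored once per fold, and the k test scores averaged to estimate generalization.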
*Backprop,*- short for “backward propagation of errors”, is widely used to apply gradient-descent learning to multi-layer networks. It uses the chain rule from calculus to iteratively compute the gradient of the cost function for each layer.
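The chain-rule computation above can be made concrete with a toy two-layer scalar network (my own illustrative sketch, not from the text): h = tanh(w1·x), y = w2·h, cost = (y − t)²/2.

```python
import math

# Toy backprop sketch: the backward pass applies the chain rule layer by
# layer, from the output back toward the input weights.

def forward_backward(x, t, w1, w2):
    # Forward pass, caching the intermediate activation h.
    h = math.tanh(w1 * x)
    y = w2 * h
    cost = 0.5 * (y - t) ** 2
    # Backward pass: chain rule, starting at the output layer.
    dy = y - t                    # d(cost)/dy
    dw2 = dy * h                  # d(cost)/dw2
    dh = dy * w2                  # d(cost)/dh
    dw1 = dh * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)**2
    return cost, dw1, dw2

cost, dw1, dw2 = forward_backward(x=0.5, t=1.0, w1=0.3, w2=0.7)
```

The analytic gradients dw1 and dw2 can be checked against finite differences of the cost, which is the standard sanity test for a backprop implementation.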
*Hebbian learning*- and spike-timing-dependent plasticity (*STDP*). According to Hebb’s rule, the efficacy of a synapse increases after correlated pre- and post-synaptic activity. In other words, neurons that fire together, wire together (Löwel & Singer, 1992).
*Support Vector Machine (SVM)*- is a learning machine for classification. SVMs generalize well. An SVM can quickly learn to perform a nonlinear classification using what is called the “kernel trick”, mapping its input into a high-dimensional feature space (Cortes & Vapnik, 1995).
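Hebb’s rule can be sketched in a few lines (an illustrative toy, not a biophysical model; names are my own):

```python
# Hebbian-rule sketch: each weight change is proportional to the product of
# pre- and post-synaptic activity ("fire together, wire together").

def hebbian_update(w, pre, post, lr=0.01):
    return [wi + lr * post * xi for wi, xi in zip(w, pre)]

# Only the synapse whose presynaptic input was active is strengthened.
w = hebbian_update([0.0, 0.0], pre=[1.0, 0.0], post=1.0)
```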
*Convolutional neural networks (ConvNets)*- have their roots in the Neocognitron (Fukushima, 1980) and are inspired by the simple and complex cells described by Hubel and Wiesel (1962). ConvNets apply backprop learning to multilayer neural networks based on convolution and pooling (LeCun et al., 1989; LeCun et al., 1990; LeCun et al., 1998).
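The two building blocks named above, convolution and pooling, can be sketched in one dimension (a toy illustration, not a full ConvNet; names are my own):

```python
# Toy 1-D convolution and max pooling, the two ConvNet building blocks.

def conv1d(signal, kernel):
    """Slide the kernel across the signal, taking a dot product at each shift."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, size):
    """Keep only the largest response in each window of the given size."""
    return [max(signal[i:i + size]) for i in range(0, len(signal), size)]

# A simple edge-like filter followed by pooling yields a coarser feature map.
feat = max_pool(conv1d([0, 1, 0, 0, 1, 0], [1, 1]), 2)  # → [1, 1, 1]
```

In a real ConvNet the kernels are learned by backprop, and many such convolution–pooling stages are stacked.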

Copyright

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.