## ABSTRACT

Today many vision-science presentations employ machine learning, especially “deep learning”, one of its more recent and successful variants. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand how living organisms recognize objects. To them, deep neural networks offer benchmarks of attainable accuracy for recognizing learned stimuli. Originally machine learning was inspired by the brain. Today, machine learning is used as a statistical tool to decode brain activity. Tomorrow, deep neural networks might become our best model of brain function. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions. Here, we hope to help vision scientists assess what role machine learning should play in their research.

## GLOSSARY

*Machine learning*- refers to computer algorithms that learn how to perform a task directly from examples, without a human providing explicit instructions or rules for how to do so. Correctly labeled examples are provided to the learning algorithm, which is then “trained” (i.e., its parameters are gradually adjusted) to perform the task correctly on its own and to generalize to unseen examples.
*Deep learning*- is a newly successful and popular version of machine learning that uses backprop neural networks with multiple hidden layers. The 2012 success of AlexNet, then the best machine learning network for object recognition, was the tipping point. Deep learning is now ubiquitous on the internet. The idea is to have each layer of processing perform successively more complex computations on the data, giving the full “multi-layer” network more expressive power. The drawback is that multi-layer networks are much harder to train (Goodfellow et al., 2016). Deep learning ranges from discovering the weights of a multilayer network to parameter learning in hierarchical belief networks. Note that the complexity of deep learning may be unwarranted for simple problems that are well handled by, e.g., an SVM. Try shallow networks first; when they fail, go deep. (A minimal sketch of a multi-layer network appears below.)
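
As a purely illustrative sketch of such a multi-layer network, here is one way to fit three hidden layers to a toy two-class problem; scikit-learn, the layer sizes, and the iteration count are our own choices, not anything prescribed in the text:

```python
# A minimal multi-layer ("deep") network on a toy dataset.
# Library, layer sizes, and hyperparameters are illustrative choices.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each hidden layer applies an affine map plus a nonlinearity,
# giving the full network more expressive power than a shallow one.
net = MLPClassifier(hidden_layer_sizes=(32, 32, 32), max_iter=2000,
                    random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```
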
*Neural nets*- are computing systems inspired by biological neural networks that learn tasks by considering examples.
*Supervised learning*- refers to any algorithm that accepts a set of labeled stimuli — a training set — and returns a classifier that can label stimuli similar to those in the training set.
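
A minimal sketch of this train-then-label workflow, assuming scikit-learn is available (the dataset and classifier are arbitrary illustrative choices):

```python
# Supervised learning: fit a classifier to correctly labeled examples,
# then use it to label stimuli it has never seen.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                 # stimuli and their labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # "training"
print(clf.predict(X_test[:5]))                    # labels for unseen stimuli
print("accuracy:", clf.score(X_test, y_test))
```
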
*Unsupervised learning*- works without labels. It is less popular, but of great interest because labeled data are scarce while unlabeled data are plentiful. Without labels, the algorithm discovers structure and redundancy in the data.
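
For contrast, a hedged sketch of unsupervised learning: k-means clustering (one of many possible algorithms) discovers group structure without ever seeing a label:

```python
# Unsupervised learning: no labels are provided; the algorithm
# discovers structure (here, clusters) in the data on its own.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])   # cluster assignments inferred from the data alone
```
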
*Cost function*- A function that assigns a real number, representing cost, to each candidate solution (i.e., a set of weights). Solving by optimization means minimizing cost.
*Gradient descent*- An algorithm that minimizes cost by incrementally changing the parameters in the direction of steepest descent of the cost function.
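
The two definitions above fit together naturally in code. A minimal sketch: a mean-squared-error cost over a single weight, minimized by stepping downhill along its gradient (the step size and iteration count are arbitrary illustrative choices):

```python
# Gradient descent on a one-parameter mean-squared-error cost.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x                        # data generated with true weight w = 2

def cost(w):                       # assigns a real number to a candidate w
    return np.mean((w * x - y) ** 2)

def grad(w):                       # dCost/dw
    return np.mean(2.0 * (w * x - y) * x)

w, step = 0.0, 0.05                # initial weight, step size (arbitrary)
for _ in range(100):
    w -= step * grad(w)            # move in the direction of steepest descent
print(w, cost(w))                  # w approaches 2, cost approaches 0
```
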
*Convexity*- A problem is convex if there are no local minima other than the global minimum (or minima if there are several equally good solutions). This guarantees that gradient descent will converge to a global minimum. There might be more than one global minimum, with equal cost, e.g. in problems with symmetric solutions.
*Generalization*- is how well a classifier performs on new examples that it did not see during training.
*Cross validation*- assesses the ability of a classifier to generalize from the data it was trained on to new data, typically by repeatedly training on one subset of the data and testing on the held-out remainder.
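
A short sketch using scikit-learn (our choice of library and classifier): k-fold cross-validation trains on part of the data and tests on the held-out remainder, k times:

```python
# Cross-validation: estimate generalization by repeatedly training on
# part of the data and testing on the held-out remainder.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
scores = cross_val_score(SVC(), X, y, cv=5)   # 5 train/test splits
print(scores, "mean:", scores.mean())
```
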
*Backprop*- short for “backward propagation of errors”, is widely used to apply gradient-descent learning to multi-layer networks. It uses the chain rule from calculus to iteratively compute the gradient of the cost function for each layer.
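
To make the chain rule concrete, here is a minimal numpy sketch of backprop through a two-layer network (the architecture, data, and learning rate are arbitrary illustrative choices):

```python
# Backprop on a tiny two-layer network: the chain rule carries the
# gradient of the cost backward through the layers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))             # 8 examples, 3 inputs
y = rng.normal(size=(8, 1))             # target outputs
W1 = rng.normal(size=(3, 4))            # input-to-hidden weights
W2 = rng.normal(size=(4, 1))            # hidden-to-output weights

for _ in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1)                 # hidden layer
    out = h @ W2                        # output layer
    # Backward pass: chain rule, one layer at a time.
    d_out = 2.0 * (out - y) / len(X)    # gradient of mean squared error
    d_W2 = h.T @ d_out
    d_h = d_out @ W2.T                  # propagate the error backward
    d_W1 = X.T @ (d_h * (1.0 - h**2))   # tanh'(a) = 1 - tanh(a)^2
    W2 -= 0.05 * d_W2                   # gradient-descent steps
    W1 -= 0.05 * d_W1

print(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))   # cost after training
```
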
*Hebbian learning and spike-timing-dependent plasticity (STDP)*- According to Hebb’s rule, the efficiency of a synapse increases after correlated pre- and post-synaptic activity. In other words, neurons that fire together, wire together (Löwel & Singer, 1992).
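
A minimal rate-based sketch of Hebb’s rule (real STDP additionally depends on the precise timing of spikes, which this toy update ignores):

```python
# Hebb's rule in its simplest rate-based form: the weight change is
# proportional to the product of pre- and post-synaptic activity.
import numpy as np

rng = np.random.default_rng(0)
w = 0.1 * rng.normal(size=5)       # synaptic weights onto one neuron
eta = 0.01                         # learning rate

for _ in range(200):
    x = rng.normal(size=5)         # pre-synaptic activity
    post = w @ x                   # post-synaptic activity
    w += eta * post * x            # fire together, wire together
print(w)                           # unchecked Hebbian growth is unbounded
```
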
*Support Vector Machine (SVM)*- is a learning machine for classification. SVMs generalize well. An SVM can quickly learn to perform a nonlinear classification using what is called the “kernel trick”, mapping its input into a high-dimensional feature space (Cortes & Vapnik, 1995).
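
A sketch of the kernel trick on a toy problem (concentric circles), assuming scikit-learn: a linear SVM fails where an RBF-kernel SVM, which implicitly maps the input to a high-dimensional feature space, succeeds:

```python
# The "kernel trick": an RBF-kernel SVM separates classes that are
# not linearly separable in the original input space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)      # implicit high-dimensional mapping
print("linear:", linear.score(X, y))   # near chance on concentric circles
print("rbf:   ", rbf.score(X, y))      # near perfect
```
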
*Convolutional neural networks (ConvNets)*- have their roots in the Neocognitron (Fukushima, 1980) and are inspired by the simple and complex cells described by Hubel and Wiesel (1962). ConvNets apply backprop learning to multilayer neural networks based on convolution and pooling (LeCun et al., 1989; LeCun et al., 1990; LeCun et al., 1998).
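
A minimal PyTorch sketch of the convolution-and-pooling motif (our choice of library; the filter counts and sizes are arbitrary): convolution plays the role of simple-cell-like filtering, pooling that of complex-cell-like invariance. Only an untrained forward pass is shown:

```python
# A minimal convolutional network: alternating convolution (filtering)
# and pooling (local invariance), ending in a linear classifier.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5),    # learnable convolution filters
    nn.ReLU(),
    nn.MaxPool2d(2),                   # pooling builds local invariance
    nn.Conv2d(8, 16, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),         # scores for 10 categories
)

x = torch.randn(1, 1, 28, 28)          # one 28x28 grayscale image
print(net(x).shape)                    # torch.Size([1, 10])
```
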