RT Journal Article
SR Electronic
T1 Deep learning: Using machine learning to study biological vision
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 178152
DO 10.1101/178152
A1 Najib J. Majaj
A1 Denis G. Pelli
YR 2017
UL http://biorxiv.org/content/early/2017/08/18/178152.abstract
AB Today most vision-science presentations mention machine learning. Many neuroscientists use machine learning to decode neural responses. Many perception scientists try to understand recognition by living organisms. To them, machine learning offers a reference of attainable performance based on learned stimuli. This brief overview of the use of machine learning in biological vision touches on its strengths, weaknesses, milestones, controversies, and current directions.

GLOSSARY

Machine learning is a computer algorithm that uses data from the environment to improve performance of a task.

Deep learning is the latest version of machine learning, distinguished by having more than three layers. It is ubiquitous on the internet.

Supervised learning refers to any algorithm that accepts a set of labeled stimuli — a training set — and returns a classifier that can label stimuli similar to those in the training set.

Unsupervised learning works without labels. It is less popular, but of great interest because labeled data are scarce while unlabeled data are plentiful. Without labels, the algorithm discovers structure and redundancy in the data.

Cost function. A function that assigns a real number, representing cost, to a candidate solution. Solving by optimization means minimizing cost.

Gradient descent: An algorithm that minimizes cost by incrementally changing the parameters in the direction of steepest descent of the cost function.

Convexity: A problem is convex if there are no local minima competing with the global minimum. In optimization, a convex cost function guarantees that gradient descent will always find the global minimum.

Cross validation assesses how well a classifier generalizes. Usually the training and test stimuli are chosen to be independent, identically distributed samples.

Backprop, short for “backward propagation of errors”, is widely used to apply gradient-descent learning to multi-layer networks. It uses the chain rule from calculus to iteratively compute the gradient of the cost function for each layer.

Hebbian learning and spike-timing-dependent plasticity (STDP). According to Hebb’s rule, the efficiency of a synapse increases after correlated pre- and post-synaptic activity. In other words, neurons that fire together, wire together (Löwel & Singer, 1992).

Support Vector Machine (SVM) is a learning machine for classification. SVMs generalize well. An SVM can quickly learn to perform a nonlinear classification using what is called the “kernel trick”, mapping its input into a high-dimensional feature space (Cortes & Vapnik, 1995).

Convolutional neural networks (ConvNets) have their roots in the Neocognitron (Fukushima, 1980) and are inspired by the simple and complex cells described by Hubel and Wiesel (1962). ConvNets apply backprop learning to multilayer neural networks based on convolution and pooling (LeCun et al., 1989; LeCun et al., 1990; LeCun et al., 1998).
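To make the cost-function, gradient-descent, and convexity entries concrete, here is a minimal sketch, not from the paper, of gradient descent on a convex quadratic cost; the toy data, learning rate, and variable names are illustrative assumptions. Python with NumPy:

    import numpy as np

    # Toy supervised problem: learn weights w so that X @ w approximates y.
    # The cost is mean squared error, which is convex in w, so gradient
    # descent converges to the global minimum (cf. the Convexity entry).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))        # 100 stimuli, 3 features each
    w_true = np.array([1.0, -2.0, 0.5])  # hidden "correct" weights
    y = X @ w_true                       # labels for the training set

    def cost(w):
        # Mean squared error of the candidate solution w.
        residual = X @ w - y
        return np.mean(residual ** 2)

    def gradient(w):
        # Gradient of the cost with respect to w.
        return 2.0 * X.T @ (X @ w - y) / len(y)

    w = np.zeros(3)                      # arbitrary starting guess
    learning_rate = 0.1
    for step in range(200):
        # Step in the direction of steepest descent of the cost.
        w -= learning_rate * gradient(w)

    print(cost(w))   # approaches 0
    print(w)         # approaches w_true

Because this cost is convex, the loop reaches the global minimum from any starting guess; with a non-convex cost, as in deep networks, the same update rule may settle in a local minimum.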
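The backprop entry can be illustrated the same way with a one-hidden-layer network trained on the XOR problem. This sketch is likewise not from the paper; the architecture, seed, and learning rate are arbitrary choices, so the exact outputs depend on the initialization.

    import numpy as np

    # XOR: a task that a single linear layer cannot solve.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(2, 4))   # input -> hidden weights
    b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1))   # hidden -> output weights
    b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)              # hidden-layer activity
        p = sigmoid(h @ W2 + b2)              # output (predicted label)
        # Backward pass: chain rule applied layer by layer to the
        # squared-error cost 0.5 * sum((p - y)**2) (cf. the Backprop entry).
        d_out = (p - y) * p * (1 - p)         # error at the output layer
        d_hid = (d_out @ W2.T) * h * (1 - h)  # error propagated back one layer
        # Gradient-descent update of each layer's parameters.
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid
        b1 -= lr * d_hid.sum(axis=0)

    # Typically approaches [0, 1, 1, 0]; convergence depends on the seed.
    print(np.round(p.ravel(), 2))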