ABSTRACT
Foraging for food, developing new medicines, and learning complex games are search problems with vast numbers of possible actions. Yet, under real-world time or resource constraints, optimal solutions are generally unobtainable. How do humans generalize and learn which actions to take when not all options can be explored? We present two behavioral experiments in which the spatial correlation of rewards provides traction for generalization, yet a limited search horizon allows for exploration of only a small fraction of all available options. We competitively test 27 different probabilistic and heuristic models for making out-of-sample predictions of individuals' search decisions. Our results show that a Gaussian Process function learning model, combined with an optimistic Upper Confidence Bound sampling strategy, robustly captures how humans use generalization to guide search behavior. Together, these two components form a model of exploration and generalization that yields reproducible and psychologically meaningful parameter estimates, providing novel insights into the nature of human search in vast spaces. We find a systematic, yet sometimes beneficial, tendency towards undergeneralization, as well as strong evidence for the separate phenomena of directed and undirected exploration. Our modeling results and parameter estimates are recoverable and can be used to simulate human-like performance, bridging a critical gap between human and machine learning.
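To make the combined model concrete, the following is a minimal illustrative sketch (not the authors' implementation): Gaussian Process regression with a radial-basis-function kernel generalizes observed rewards across spatially correlated options, and an Upper Confidence Bound rule selects the next option by adding an uncertainty bonus to the posterior mean. The grid size, kernel `length_scale`, exploration weight `beta`, and the toy reward function are all assumptions chosen for illustration.

```python
# Illustrative sketch of GP function learning + UCB sampling (assumed
# parameter values; not the authors' code).
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel; length_scale controls how far rewards generalize."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, length_scale=1.0, noise=1e-4):
    """Posterior mean and standard deviation of reward over all grid options."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid, length_scale)
    K_ss = rbf_kernel(x_grid, x_grid, length_scale)
    v = np.linalg.solve(K, K_s)
    mu = v.T @ y_obs
    var = np.diag(K_ss - K_s.T @ v)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def ucb_choice(mu, sigma, beta=0.5):
    """Optimistic UCB: value each option by its mean plus beta times its uncertainty."""
    return int(np.argmax(mu + beta * sigma))

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 10, 30)                      # 30 spatially correlated options
f_true = np.sin(x_grid) + 0.5 * np.cos(2 * x_grid)   # hidden reward function (toy example)

# Limited search horizon: only 10 choices out of all available options.
i0 = 15
x_obs, y_obs = np.array([x_grid[i0]]), np.array([f_true[i0]])
for t in range(9):
    mu, sigma = gp_posterior(x_obs, y_obs, x_grid, length_scale=1.5)
    i = ucb_choice(mu, sigma, beta=0.5)
    x_obs = np.append(x_obs, x_grid[i])
    y_obs = np.append(y_obs, f_true[i] + 0.1 * rng.standard_normal())
    print(f"trial {t + 1}: chose option {i}, reward {y_obs[-1]:.2f}")
```

In this sketch, a smaller `length_scale` corresponds to undergeneralization (rewards inform only nearby options), while `beta` separates directed exploration (the uncertainty bonus) from purely reward-driven choice.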