Abstract
Humans and animals consistently forgo, or “discount,” future rewards in favor of more proximal but less valuable options. This behavior is often attributed to a failure of “self-control”: a lack of inhibition when considering the possibility of immediate gratification. However, the same behavior can result not from overweighting the near-term reward, but from failing to properly consider the far-off one. The capacity to plan for future gains is a core construct in Reinforcement Learning (RL), known as “model-based” planning. Both discounting and model-based planning have been shown to track everyday behaviors ranging from diet and exercise habits to drug abuse. Here, we show that these two capacities are related via a common mechanism: people who are more likely to deliberate about a future reward in an intertemporal choice task, as indicated by the time they spend considering the choice, are also more likely to make multi-step plans for reward in a sequential reinforcement learning task. In contrast, the degree to which people’s intertemporal choices were driven by a more automatic bias did not correspond to their planning tendency, and neither did the more standard measure of discounting behavior. These results suggest that the standard behavioral economic measure of discounting is more fruitfully understood by decomposing it into its constituent parts, and that only one of these parts corresponds to the sort of multi-step thinking needed to make plans for the future.