Structural properties of individual instances predict human effort and performance on an NP-Hard problem

Life presents us with decisions of varying degrees of difficulty. Many of them are NP-hard, that is, computationally intractable. Two important questions arise: which properties of decisions drive extreme computational hardness, and what are the effects of these properties on human decision-making? Here, we postulate that we can study the effects of computational complexity on human decision-making by studying the mathematical properties of individual instances of NP-hard problems. We draw on prior work in computational complexity theory, which suggests that computational difficulty can be characterized based on the features of instances of a problem. This study is the first to apply this approach to human decision-making. We measured hardness, first, based on typical-case complexity (TCC), a measure of the average complexity of a random ensemble of instances, and, second, based on instance complexity (IC), a measure that captures the hardness of a single instance of a problem, regardless of the ensemble it came from. We tested the relation between these measures and (i) decision quality as well as (ii) time expended in a decision, using two variants of the 0-1 knapsack problem, a canonical and ubiquitous computational problem. We show that participants expended more time on instances with higher complexity but that decision quality was lower in those instances. These results suggest that computational complexity is an inherent property of the instances of a problem, which affects humans and other kinds of computers.

presented; each item has a weight w and a value v. In the decision variant, the task is to decide whether there exists a subset A of items from the set I for which (1) the sum of weights (Σ_{i∈A} w_i) is lower than or equal to a given capacity c and (2) the sum of values (Σ_{i∈A} v_i) is at least as high as a given target profit p. In the related optimization variant, the aim is to select the items that maximize the sum of values (Σ_{i∈A} v_i) without exceeding the knapsack's capacity (Σ_{i∈A} w_i ≤ c). Both variants are NP-hard (15). In our study, participants were asked to solve a number of random instances of both the decision variant and the optimization variant of the problem.

The knapsack problem is a canonical computational problem whose mathematical structure resembles that of many theories of decision-making, such as utility maximization (16) and satisficing (17). However, its relevance extends beyond decision theory. The problem manifests itself in everyday tasks involving the choice of stimuli to attend to, budgeting and time management, portfolio optimization and intellectual discovery, as well as in industrial applications such as the cargo business (15, 18). The problem has also been associated with symptoms of particular mental disorders, in particular attention-deficit/hyperactivity disorder (19, 20).

We consider two metrics that generically capture instance hardness for digital computers tasked with finding the solution. First, we exploit findings on typical-case complexity (TCC), a popular approach for studying the intrinsic hardness of random ensembles of instances of NP-hard problems (11, 12, 21). It has been shown that there exists a topology with which one can predict the average number of computations needed to find the solution of an instance (12, 22-27) (Fig 1a). We conjectured that TCC would predict performance and time-on-task for humans.
We tested this in both the decision and optimization variants of the knapsack problem. In a second approach, we construct a metric of complexity of instances of the decision variant of the knapsack problem that is specific to a single instance. The metric, referred to as instance complexity (IC), is based on the distance between the target profit of an instance and the maximum value attainable in the corresponding optimization instance.
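As a concrete reference, for the small instances used in this study (n = 6) both variants of the problem can be solved by exhaustive enumeration of all 2^n subsets. A minimal Python sketch; the example weights and values are hypothetical, not taken from the experiment:

```python
from itertools import combinations

def knapsack_decision(w, v, c, p):
    """Decision variant: does a subset with total weight <= c and total value >= p exist?"""
    n = len(w)
    for r in range(n + 1):
        for s in combinations(range(n), r):
            if sum(w[i] for i in s) <= c and sum(v[i] for i in s) >= p:
                return True  # a single witness suffices for a 'yes'
    return False  # a 'no' required checking every subset

def knapsack_optimization(w, v, c):
    """Optimization variant: maximum total value subject to the capacity constraint."""
    n = len(w)
    best = 0
    for r in range(n + 1):
        for s in combinations(range(n), r):
            if sum(w[i] for i in s) <= c:
                best = max(best, sum(v[i] for i in s))
    return best

w = [4, 3, 2, 5, 6, 1]    # hypothetical item weights (not from the experiment)
v = [10, 7, 4, 12, 13, 2]  # hypothetical item values
knapsack_optimization(w, v, c=10)    # -> 24
knapsack_decision(w, v, c=10, p=25)  # -> False (target above the optimum)
```

The asymmetry in the comments foreshadows a point made below: a 'yes' answer can be certified by one subset, whereas a 'no' answer may require exhausting the search space.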

Results

We studied how a set of mathematical properties of ensembles of instances (TCC) as well as of individual instances (IC) affects human decision quality and time-on-task in the knapsack problem. Participants first solved 72 instances of the decision variant of the knapsack problem (Fig 2a), followed by 18 instances of the optimization problem (Fig 2b).

Knapsack decision task.

Summary statistics. All instances in the experiment had n = 6 items. The number of items was selected, based on pilot data, to ensure that the task was neither too difficult nor too easy. Mean human performance, measured as the percentage of trials in which a correct response was made, was 83.1% (min = 0.56, max = 0.9, SD = 0.08).

Author contributions: PB, JPF, CM and NY designed the study; JPF and NY performed the instance selection; JPF performed data collection and analysis; PB, JPF, CM and NY wrote the manuscript.
No competing interests declared. To whom correspondence should be addressed. E-mail: carstenm@unimelb.edu.au

Fig 2. (a) Knapsack decision task. The green circle at the center of the screen indicated the time remaining in this stage of the trial. This stage lasted 3 seconds. Then, both the capacity constraint and the target profit were shown at the center of the screen. Participants had to decide whether there exists a subset of items for which (1) the sum of weights is lower than or equal to the capacity constraint and (2) the sum of values yields at least the target profit. This stage lasted 22 seconds. Finally, participants had 2 seconds to make either a 'YES' or 'NO' response using the keyboard. A fixation cross was shown during the inter-trial interval (5 seconds). (b) Knapsack optimization task. Participants were presented with a set of items of different values and weights together with a capacity constraint shown at the center of the screen. The green circle at the center of the screen indicated the time remaining in this stage of the trial. Participants had to find the subset of items with the highest total value subject to the capacity constraint. This stage lasted 60 seconds. Participants selected items by clicking on them and had the option of submitting their solution before the time limit was reached. After the time limit was reached or they submitted their solution, a fixation cross was shown for 10 seconds before the next trial started.
On average, participants chose the 'YES' option in 48.1% of trials (min = 0.32, max = 0.60, SD = 0.06). Performance did not vary during the course of the task (P = 0.196, main effect of trial number on performance, generalized logistic mixed model (GLMM); S1 Table).

Typical-case complexity and human performance. TCC is characterized in terms of the satisfiability probability of an instance (that is, the probability that the correct answer to the instance is 'yes'). In the phase transition, this probability changes precipitously from one to zero. The probability that an instance is satisfiable can be expressed in terms of a small set of instance parameters ᾱ = {α_p, α_c}:

α_p = p / (Σ_{i=1}^{n} v_i),  α_c = c / (Σ_{i=1}^{n} w_i),

where w_i are the weights of the items, v_i are the values of the items, n is the number of items, c is the weight capacity and p is the target profit.

Therefore, there exists a mapping from instance properties to the computational complexity of an instance. The phase transition separates instances of the problem into two regions: an under-constrained region, where the constraints are lenient and thus many solutions are likely to exist, and an over-constrained region, where the constraints are stringent and thus the existence of a solution is unlikely (that is, an instance is not satisfiable). Computing the solution of instances in the proximity of the satisfiability threshold requires, on average, more computational resources than for instances further away from it (Fig 1a), similar to what has been observed in relation to a number of other NP-hard problems (12, 22, 28-31).
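The phase transition can be illustrated numerically: fix (α_p, α_c), sample random instances with those normalized parameters, and estimate the fraction that is satisfiable. A sketch, assuming uniformly sampled integer weights and values (an illustrative choice, not the study's exact sampling scheme):

```python
import random
from itertools import combinations

def satisfiable(w, v, c, p):
    """Exhaustively check whether some subset meets both constraints."""
    n = len(w)
    return any(sum(w[i] for i in s) <= c and sum(v[i] for i in s) >= p
               for r in range(n + 1) for s in combinations(range(n), r))

def p_sat(alpha_p, alpha_c, n=6, trials=200, seed=0):
    """Estimate the satisfiability probability of random instances whose
    normalized target profit and capacity equal (alpha_p, alpha_c)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        w = [rng.randint(1, 100) for _ in range(n)]
        v = [rng.randint(1, 100) for _ in range(n)]
        # capacity and target profit implied by the normalized parameters
        hits += satisfiable(w, v, alpha_c * sum(w), alpha_p * sum(v))
    return hits / trials
```

Holding α_c fixed and sweeping α_p from low to high values traces the drop in satisfiability probability from one to zero that defines the phase transition.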

The instances in this study were chosen such that they were located at different distances from the satisfiability threshold (Fig 1a).

To examine this result in more detail, we hypothesized that performance is affected by the tightness of the profit and capacity constraints. We tested whether performance on instances in the under-constrained region (α_p ≈ 0.4) was different to performance on instances in the over-constrained region (α_p ≈ 0.9). We found no significant difference in performance between the two regions with low TCC (P = 0.355, main effect of region, GLMM; S1 Table Model 7; Fig 1c), but confirmed a significant difference in performance between instances with high TCC and each of the other two regions (P < 0.001, difference in performance between regions, GLMM; S1 Table Model 6).

We also hypothesized that the effect of TCC on performance is affected by the satisfiability of an instance, that is, whether the answer to the decision problem is 'yes' or 'no'. This hypothesis is based on an asymmetry of NP problems. Proving that an instance is satisfiable requires finding one subset of items that satisfies the constraints. Such a set may be identified without exploring the full search space and, additionally, there may be more than one such subset. In contrast, concluding that an instance is unsatisfiable requires proving that no such set exists. This might require a full search over every possible subset of items in order to determine that none of the subsets satisfies the constraints. We investigated the effect of satisfiability on performance and found that the effect of TCC was still significant when controlling for satisfiability (P < 0.001, main effect of TCC on performance, GLMM; S1 Table).

We further explored the link between TCC, which is related to the satisfiability probability, and the number of witnesses (subsets of items that satisfy both constraints).
We studied the theoretical connection and found that the same parameters (ᾱ = {α_p, α_c}) that characterize TCC also describe the expected number of witnesses (see S4 Appendix). We then tested this link empirically and found, as expected, that satisfiable instances with high TCC tend to have a lower number of witnesses than satisfiable instances with low TCC (P < 0.001, unpaired t-test; Fig 3a). Taken together, these findings corroborate that ᾱ = {α_p, α_c} captures the constrainedness of an instance (see S4 Appendix).
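For n = 6 the number of witnesses can be counted directly by enumeration. A sketch (the example data are hypothetical):

```python
from itertools import combinations

def count_witnesses(w, v, c, p):
    """Number of subsets that satisfy both the capacity constraint (<= c)
    and the profit constraint (>= p)."""
    n = len(w)
    return sum(1 for r in range(n + 1) for s in combinations(range(n), r)
               if sum(w[i] for i in s) <= c and sum(v[i] for i in s) >= p)
```

Raising the target profit p toward the optimum can only shrink the witness set, which is why instances near the satisfiability threshold (high TCC) tend to have few witnesses.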

Instance complexity and human performance. Next, we propose a measure of complexity that is based on the properties of a single instance. We refer to this measure as instance complexity (IC). IC is more comprehensive than the number of witnesses (it can quantify complexity for both satisfiable and unsatisfiable instances) and is easier to compute. It is defined based on the distance between the level of the profit constraint (target profit) and the maximum value attainable in the corresponding instance in the optimization variant of the 0-1 knapsack problem, that is, an instance with the same set of items and the same capacity constraint (Fig 3a). Specifically, IC is the normalized distance between p and p*, where p is the target profit of the decision instance and p* is the maximum value achievable in the corresponding optimization instance.

[bioRxiv preprint doi: https://doi.org/10.1101/405449; this version posted July 21, 2020; not certified by peer review. The author/funder has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.]
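A concrete IC computation might look as follows. The exact normalization used in the paper is not reproduced here; dividing the distance |p* − p| by the total value of all items is an assumption for illustration, so only the ordering of instances by IC should be read from this sketch:

```python
from itertools import combinations

def max_value(w, v, c):
    """p*: maximum attainable value of the corresponding optimization instance."""
    n = len(w)
    return max(sum(v[i] for i in s)
               for r in range(n + 1) for s in combinations(range(n), r)
               if sum(w[i] for i in s) <= c)

def instance_complexity(w, v, c, p):
    """IC sketch: distance between the target profit p and the optimum p*,
    normalized by the total value of all items (normalization assumed).
    Instances with p close to p* (small IC) are the hard ones."""
    return abs(max_value(w, v, c) - p) / sum(v)
```

Note that computing IC requires solving the optimization instance, a point the Discussion returns to when comparing IC with TCC.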

We explored the relation between IC and average human performance on individual instances in the knapsack decision task. We found a positive non-linear relation between this measure and average accuracy (R² = 0.542; best AIC among competing models; S7 Table; Fig 3b). We also compared the model fit with respect to a model with only TCC as explanatory variable. As would be expected, the IC model performs better than the TCC model (R² = 0.21; the AIC of the TCC model is highest among competing models; S7 Table). Finally, the effect of IC on performance was further corroborated using a mixed effects model (P < 0.001, main effect of IC on performance, GLMM; S1 Table Model 9).

Algorithm-specific complexity and human performance. The analyses so far did not presuppose any particular algorithm or strategy for solving the instance. We now turn to a more fine-grained analysis of computational resource requirements. To do this, we tested whether human performance was related to the number of computational operations of particular (canonical) algorithms. We considered two widely-known, generic solvers for constraint satisfaction problems, Gecode (32) and Minisat+ (33, 34). For each of these solvers, we chose a metric that indicates the difficulty for the algorithm of finding a solution and whose value is highly correlated with compute time. Both metrics quantify the extent of the search effort the respective solver had to undertake to find the solution, which is related to the number of computational steps performed and thus to compute time (see S2 Appendix).
We found that human performance on the instances was negatively related to the number of computational steps used by the Gecode algorithm (P < 0.001, main effect of number of propagations, GLMM; S1 Table Model 4). On the other hand, the relation between human performance and the number of computational steps the Minisat+ algorithm required was not significant (P = 0.395, main effect of number of decisions, GLMM; S1 Table Model 5).
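Neither Gecode propagations nor Minisat+ decisions are computed here; as an illustrative stand-in, the following sketch counts the nodes explored by a simple branch-and-bound decision solver, which captures the same notion of search effort:

```python
def search_effort(w, v, c, p):
    """Count nodes explored by a simple branch-and-bound decision solver.
    An illustrative proxy for solver metrics such as Gecode propagations,
    not the metric used in the study."""
    n = len(w)
    nodes = 0

    def dfs(i, weight, value):
        nonlocal nodes
        nodes += 1
        if weight > c:
            return False   # prune: capacity exceeded
        if value >= p:
            return True    # a witness was found: early exit
        if i == n or value + sum(v[i:]) < p:
            return False   # prune: target profit no longer reachable
        # branch on including vs. excluding item i
        return dfs(i + 1, weight + w[i], value + v[i]) or dfs(i + 1, weight, value)

    return dfs(0, 0, 0), nodes
```

The node count grows sharply near the satisfiability threshold, where neither pruning rule fires early, which is the behavior the solver metrics are meant to capture.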

Knapsack optimization task.
Summary statistics. We first analyzed participants' ability to find the optimal solution of an instance. We define computational performance as a dichotomous variable that is equal to 1 if the participant found the optimal solution and 0 otherwise. To extend TCC to the optimization task, we considered the decision problem of whether the optimal value is achievable (see S1 Appendix). We hypothesized that computational performance would be lower in instances with high optimization TCC (TCC_O) than in instances with low TCC_O (S1 Table Model 2).

So far, we have defined computational performance as a dichotomous variable. We now look at a finer-grained measure. To this end, we define item performance as the minimum number of item replacements needed to reach the optimal solution. These include both the removal of items that are not in the optimal solution and the addition of items that are in the optimal solution (but not part of the candidate solution). The higher the value of this measure, the further away the submitted solution is from the optimum in item space. We found that item performance was worse, on average, in instances with high TCC_O.

Relation between performance in the knapsack decision task and the knapsack optimization task. We hypothesized that participants' performance in the two tasks is related, and that participants who performed well in the decision task would also perform well in the optimization task.

Time-on-task. We hypothesized that participants would expend more time on more difficult instances.
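The item-performance measure defined above can be computed as the size of the symmetric difference between the submitted and the optimal item sets:

```python
def item_performance(submitted, optimal):
    """Minimum number of single-item replacements (removals of non-optimal
    items plus additions of missing optimal items) between two solutions."""
    return len(set(submitted) ^ set(optimal))
```

For example, if the optimal knapsack contains items {0, 3, 5} and a participant submits {0, 1}, three replacements are needed: remove item 1 and add items 3 and 5.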
In order to further examine the relation between optimization instances, time expended and complexity, we examined the amount of time participants spent, after each click, at each selection of items before performing the next click. After each click, participants were faced with the question:

"Is there another set of items with a higher profit that still satisfies the weight capacity constraint?" Previous results suggest that the TCC of the decision problem faced after each click has an effect on the time spent at each selection of items (28). We explored this link and found that the effect was driven by the constrainedness of the instance (Fig 4c). We found that the time spent after each click was related to the TCC of the decision problem at that selection (S1 Table Model 5).

We also examined whether these complexity measures were related to the time spent on each of the instances. We found that instances with a higher number of Gecode propagations were associated with more time expended.

We also analyzed the relation between computational performance and Sahni-k, another algorithm-specific measure of complexity. Sahni-k is proportional to both the number of computations and the amount of memory required to solve an instance of the knapsack optimization task. In general, a higher Sahni-k is related to higher computational resource requirements. This metric has previously been shown to be associated with performance in the knapsack optimization task (4, 18). In line with these studies, we found a negative relation between Sahni-k and computational performance.

Individual differences. We also administered a set of tasks measuring cognitive abilities, including episodic memory, strategy use as well as processing and psycho-motor speed. Correlations between performance in these tasks and the knapsack tasks were all non-significant (see Materials and Methods and S6 Table for details).

Discussion

We tested the effect of both measures of complexity on human problem-solving.
We found that performance was lower in random instances with high TCC, in both variants of the knapsack problem, compared to instances with low TCC. Moreover, time expended was positively correlated with TCC. We synthesized the features that characterize TCC in a new measure of instance complexity (IC) and showed that it predicts performance at the level of individual instances.

Our study both supports these proposals and complements them. It provides empirical evidence that computational complexity is relevant to people. Additionally, our study demonstrates that the study of the hardness of individual instances and of random ensembles provides insights that can be used to predict both human performance and time-on-task. This approach could be used to refine frameworks like fixed parameter tractability (1, 37). Specifically, it could be used to explore how the hardness of particular instances of a problem is related to worst-case asymptotic complexity.

Moreover, our results have a significant implication for the approach used to study human decision-making. Previous findings suggest that humans deploy a vast range of strategies (4, 41, 42), thus limiting the predictive power of current models of decision-making. Here, we postulate that performance and time-on-task are driven by properties of the instance at hand rather than being an idiosyncratic feature of the solver (the human), suggesting that the predictive power of decision-theoretic models can be improved by including instance properties.
That is, the goal so far has been to identify the procedures (algorithms, heuristics) that humans deploy in search of a solution (41, 43), while instead performance and time-on-task may be more readily predicted from properties of an instance.

Difficulty of cognitive tasks and computational complexity. It has previously been suggested that computational complexity could be used to measure the difficulty of cognitive tasks. We would argue that this is indeed the case, but that complexity classes such as P or NP are too broad for this purpose, given that they are based on asymptotic worst-case approaches. We propose that TCC and IC are better suited to study the difficulty of tasks for humans.

TCC is a more fine-grained measure that captures the complexity of "typical" instances of a problem.

Moreover, it captures complexity in a way that is independent of a particular algorithm or model of computation (11, 12, 30), and it has been proven to be applicable to a large range of problems, including the graph coloring problem (12, 22), the traveling salesperson problem (29) and K-SAT (Boolean satisfiability) problems (12, 22, 30, 31). Our findings show that TCC has an effect on performance as well as on time-on-task.

We also investigated IC as an alternative metric to capture the difficulty of an instance, and we find a close relation between this measure and human performance. IC complements TCC by providing a measure of difficulty at the individual-instance level. While TCC maps the average constrainedness of an ensemble of random instances to its average complexity, IC maps the constrainedness of a single sampled instance of that random ensemble (after the instance has been generated) to its complexity.

In other words, IC measures the difficulty of a single instance and does not depend on a random instance generation process. It is worth noting, however, that TCC has a key advantage over IC. In order to compute IC, the corresponding optimization problem needs to be solved, whereas TCC can be estimated entirely from mathematical properties of the problem. This makes TCC not only less computationally intensive, but perhaps also a better candidate for playing a role in human meta-decisions such as strategy selection (44, 45).

This combination of findings provides support for the conceptual premise that computational complexity constitutes a core aspect of task difficulty. Here, we looked at time complexity.

One possible explanation for the absence of correlations with the cognitive tasks we administered is that performance in the knapsack tasks relies on another cognitive ability that is not captured by any of the tests we administered. Another possible explanation is that we did not measure the active cognitive constraints that drove differences in individual performance. It is, of course, also possible that our study did not have sufficient statistical power to detect individual differences. Further research is needed in order to incorporate the full spectrum of cognitive costs and resource limitations and link them to performance and time-on-task in decision tasks.

Meta-decision making and adaptation of strategies. In order for a theory of decision-making to be plausible from a computational perspective, the computational requirements of a decision task need to be within the cognitive resources available to a decision-maker. Indeed, it has been suggested that the principle of rationality should not (only) be applied at the level of behavior (Marr's computational level) but (also) at the level of computation (Marr's algorithmic level), an approach known as resource rationality (48). In this framework, limited computational resources are allocated to tasks in a way that is optimal relative to a specified objective function (48). In order to understand how limited computational resources are allocated, it is necessary to study both the cognitive capacities of decision-makers and the cognitive requirements of a task. Our study provides a framework for studying the interaction between the two, that is, the computational limitations at play in a task. Specifically, these can be measured using insights from computational complexity theory to predict performance and compute time.

A relevant dimension of the expected costs of performing a task is, arguably, the computational resource requirement of performing it. Empirically, evidence from the current study suggests that agents expend more time on instances with higher TCC. In particular, we found that the time participants spent on the task was modulated by TCC. Theoretically, TCC is particularly suitable as an approximation of the expected computational requirements because of its characteristics. Firstly, it is an ex-ante measure, that is, it is based on a set of features of the task which could potentially be identified and used by the agent before solving the task. Secondly, the set of features related to TCC is intrinsic to the task, that is, the features are not specific to the particular algorithm used to solve a problem. Thirdly, TCC has been shown to be generalizable to a large set of computational problems (12, 28-31). Further research could usefully explore whether humans compute TCC in order to estimate the expected costs of performing a task.

Our approach provides a framework that lends itself to studying why particular heuristics or algorithms are successful on some instances but not on others. Moreover, it could explain why participants' use of heuristics changes with instance properties (49, 50). It is worth noting, however, that our results are based on a single computational problem, the knapsack problem. There are many other problems that are relevant for humans. The framework of TCC has been shown to generalize to other NP-hard problems (12, 28-31). It is an open question whether the applicability of TCC and IC to human problem-solving extends to these other problems as well.

Future work should address this question.

Furthermore, in our study, the optimization task involved finding the optimal solution. However, finding the exact solution might not always be required in the real world. In many cases, finding an approximate solution might suffice. Future research should investigate whether the results found in this study, for humans and other types of computers, can be extended to approximation instances.

Nevertheless, it is worth noting that for some hard problems, approximating the solution can be computationally as hard as computing the solution itself (3, 53).

Our results for TCC are based on a particular sampling distribution. Specifically, we used the uniform distribution to sample the knapsack instances. This approach has been used to understand hardness and to study "typical" instances of a problem, but these instances might not necessarily be representative of the instances people encounter in everyday life.

Another way that complexity could be related to behavior is through its effect on uncertainty (56). We leave it to future work to explore the effects of attitudes towards (or preferences over) complexity in decision-making, as well as the relation between complexity, uncertainty and behavior.

This study provides evidence that computational complexity is an inherent property of a computational problem and, in particular, of an instance (57). This supports the thesis that computational complexity affects problem solving across computing systems, such as von Neumann architectures and the brain, thus supporting frameworks that use computational complexity to study cognition, such as the FPT Cognition Thesis (37).

In a broader context, the present study might help to identify the limits of human cognition and decision-making, thus providing a building block for the development of a theory of complexity of human computation (40). This is crucial for the design of policies that aim to improve the quality of the decisions people make and the outcomes they achieve in areas such as financial investments or the selection of health insurance contracts, among many others. In those cases where the task is too demanding, mechanisms could be designed to help people improve the quality of their decisions.

Materials and methods

Knapsack decision task. In the response stage, participants were shown 'YES' and 'NO' buttons on the screen, in addition to the timer circle, and made a response using the keyboard (Fig 2a). A fixation cross was then shown (5 seconds) before the start of the next trial. We excluded a total of 13 trials (from 8 participants) in which no response was made.

All instances in the experiment had 6 items. Instances varied in their computational complexity. It has been shown that the computational complexity of instances of the 0-1 knapsack decision problem can be characterized in terms of a set of instance properties (28). In particular, TCC can be characterized in terms of the normalized capacity constraint α_c (capacity constraint normalized by the sum of all item weights) and the normalized target profit α_p (target profit normalized by the sum of all item values) (Fig 1a). We made use of this property to select instances for the task (see S3 Appendix; Fig 1b). Half of the instances with high TCC were selected to have high computational requirements (top 50%), according to an algorithm-specific ex-post complexity measure of a widely-used algorithm (Gecode; see S2 Appendix). Analogously, the other half was selected to have low computational requirements (bottom 50%).
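The selection of instances by normalized capacity and profit bins can be sketched as a rejection sampler; the uniform integer ranges below are an assumption for illustration, not the study's exact sampling scheme:

```python
import random

def sample_instance(alpha_c_range, alpha_p_range, n=6, seed=1):
    """Rejection-sample a random instance whose normalized capacity c/sum(w)
    and normalized target profit p/sum(v) fall in the given bins."""
    rng = random.Random(seed)
    while True:
        w = [rng.randint(1, 100) for _ in range(n)]
        v = [rng.randint(1, 100) for _ in range(n)]
        c = rng.randint(1, sum(w))
        p = rng.randint(1, sum(v))
        if (alpha_c_range[0] <= c / sum(w) <= alpha_c_range[1]
                and alpha_p_range[0] <= p / sum(v) <= alpha_p_range[1]):
            return w, v, c, p

# e.g. an instance from the capacity bin used in the study and a high-TCC profit bin
w, v, c, p = sample_instance((0.40, 0.45), (0.60, 0.65))
```

Narrow bins make rejection sampling slow but keep the sketch simple; the study's actual procedure is described in S3 Appendix.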

Knapsack optimization task. In this task, participants were asked to solve a number of instances of the 0-1 knapsack optimization problem. In each trial, they were shown a set of items with different weights and values as well as a capacity constraint. Participants had to find the subset of items that maximizes total value subject to the capacity constraint. This means that while in the knapsack decision problem participants only needed to determine whether a solution existed, in the knapsack optimization problem they also needed to determine the nature of the solution (i.e., the items in the optimal knapsack).

The task had two stages. In the first stage (60 seconds), the items were presented together with the capacity constraint and the timing indicator. Items were presented as in the knapsack decision task.

Unlike in the decision task, however, participants were able to add and remove items to/from the knapsack by clicking on them. An item added to the knapsack was indicated by a light around it (Fig 2b).

Participants submitted their solution by pressing the 'D' key on the keyboard before the time limit was reached. If participants did not submit within the time limit, the items selected at the end of the trial were automatically submitted as the solution. Participants were then shown a fixation cross (10 seconds) before the start of the next trial.

Each participant completed 18 trials (2 blocks of 9 trials with a rest period of 60 seconds between blocks).

Each trial presented a different instance of the knapsack optimization problem. The order of presentation of instances in the task was randomized for each participant. We excluded 2 trials (from 2 participants) because solutions were submitted less than 1 second into the task. Additionally, 3 participants were excluded from the analysis of submission times because they never submitted a solution before the time-out. This behavior suggests that these participants might have failed to understand the submission instructions.

The characterization of complexity using TCC is based on the satisfiability probability and is, therefore, only directly applicable to decision problems. We propose a way in which TCC can be extended to optimization problems by framing the optimization problem as a sequence of decision problems: "Is there another set of items with a higher profit that still satisfies the weight capacity constraint?". In other words, we model the decision-maker as selecting a subset of items that satisfies the capacity constraint and then deciding whether there exists another combination that would yield a higher profit and still satisfy the constraint. If the answer is yes, the agent chooses one such combination and asks themselves the same question again.

This process is repeated until the answer is no, which means that the optimum has been reached. We approximate the TCC of an optimization problem by the TCC of the decision problem faced after reaching the optimal solution, that is, the decision problem of whether the optimal value is achievable (see S1 Appendix).
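The sequential-decision framing described above can be made concrete: starting from the empty selection, repeatedly ask whether a better feasible set exists and move to one if so. A sketch (which improving set is chosen is arbitrary here; the model does not specify a move-selection rule):

```python
from itertools import combinations

def feasible_sets(w, v, c):
    """All subsets that satisfy the capacity constraint."""
    n = len(w)
    for r in range(n + 1):
        for s in combinations(range(n), r):
            if sum(w[i] for i in s) <= c:
                yield set(s)

def better_set_exists(w, v, c, current):
    """The decision problem faced after each click: is there a feasible set
    with a strictly higher total value than the current selection?"""
    target = sum(v[i] for i in current)
    return any(sum(v[i] for i in s) > target for s in feasible_sets(w, v, c))

def hill_climb_to_optimum(w, v, c):
    """Model the optimization task as a sequence of decision problems:
    the loop ends exactly when the answer to the question is 'no'."""
    current = set()
    while better_set_exists(w, v, c, current):
        target = sum(v[i] for i in current)
        # move to some improving feasible set (here: the first one found)
        current = next(s for s in feasible_sets(w, v, c)
                       if sum(v[i] for i in s) > target)
    return sum(v[i] for i in current)
```

The process terminates because the selected value strictly increases at every step; the final 'no' answer is the decision problem whose TCC is used to approximate the TCC of the optimization instance.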

To generate instances for the task, a sampling process similar to the one for the knapsack decision task was used (see S3 Appendix for more information). We first selected the same normalized capacity bin as for the knapsack decision task (α_c ∈ [0.4, 0.45]). Afterwards, in order to estimate the normalized profit of the optimization problem, we calculated the optimal set of items A* ⊆ I for each optimization instance. We then computed the corresponding optimal sum of values (p* = Σ_{i∈A*} v_i). The normalized profit was then calculated by dividing the optimal profit by the sum of values of all of the items (α*_p = p* / Σ_{i∈I} v_i).

The normalized profit was then selected to lie in the same regions as in the knapsack decision task, and optimization TCC (TCC_O) was defined accordingly. 12 instances were selected from the high-TCC_O region (α*_p ∈ [0.6, 0.65]) and 6 were selected from the low-TCC_O region (α*_p ∈ [0.85, 0.9]). It is worth noting that this process did not generate instances in the under-constrained region (α*_p ∈ [0.35, 0.4]).
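
For small instances, the quantities used for binning, the optimal profit p*, the normalized capacity α_c, and the normalized profit α*_p, can be computed by exhaustive search. The following sketch uses a hypothetical 4-item instance, not one drawn from the study's sampling process:

```python
from itertools import combinations

def optimal_profit(values, weights, capacity):
    """Brute-force optimal value p* of a small knapsack optimization instance."""
    best = 0
    n = len(values)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best

def normalized_parameters(values, weights, capacity):
    """Normalized capacity alpha_c and normalized optimal profit alpha_p."""
    alpha_c = capacity / sum(weights)                    # c / sum of all weights
    p_star = optimal_profit(values, weights, capacity)   # p*
    alpha_p = p_star / sum(values)                       # p* / sum of all values
    return alpha_c, alpha_p

# Hypothetical instance: alpha_c = 4/10 = 0.4, p* = 10, alpha_p = 10/25 = 0.4
values, weights, capacity = [10, 7, 5, 3], [4, 3, 2, 1], 4
alpha_c, alpha_p = normalized_parameters(values, weights, capacity)
```

Instances were then retained or discarded depending on which α*_p bin they fell into.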

In order to also ensure enough variability between instances with high TCC_O, we added the same additional constraint in the sampling as in the knapsack decision task. We forced half of the instances with high TCC_O to have high computational requirements (top 50%), according to an algorithm-specific ex-post complexity measure of a widely-used algorithm (Gecode; see S2 Appendix). Analogously, the other half was forced to have low computational requirements (bottom 50%).
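
The top/bottom-50% constraint amounts to a median split on the ex-post complexity score. A minimal sketch, with hypothetical instance labels and scores standing in for the Gecode-based measure:

```python
def split_by_complexity(instance_ids, complexity):
    """Split instances into high (top 50%) and low (bottom 50%) halves
    by an ex-post complexity score (hypothetical values below)."""
    ranked = sorted(instance_ids, key=lambda k: complexity[k])
    half = len(ranked) // 2
    return ranked[half:], ranked[:half]  # (high half, low half)

# Hypothetical scores for four candidate instances
complexity = {"i1": 120, "i2": 15, "i3": 480, "i4": 60}
high, low = split_by_complexity(list(complexity), complexity)
```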

Mental arithmetic task. In this task, participants were presented with 33 mental arithmetic problems (58).

The first three trials were considered practice trials and thus were not included in the analysis. Participants were given 13 seconds to solve each problem. The task involved addition and division of numbers, as well as questions in which participants were asked to round the result of an addition or division operation to the nearest integer.

Basic cognitive function tasks.
In addition, we also tested participants' performance on four aspects of cognitive function that we considered relevant for the knapsack tasks, namely, working memory, episodic memory, strategy use, as well as processing and psychomotor speed. To do so, we administered the CANTAB Reaction Time (RTI), Paired Associates Learning (PAL), Spatial Working Memory (SWM) and Spatial Span (SSP) tasks.

Procedure. After reading the plain language statement and providing written informed consent, participants were instructed in each of the tasks and completed a practice session for each task. Participants first solved the CANTAB RTI task, followed by the knapsack decision task. Then they completed the CANTAB RTI task again, followed by the knapsack optimization task. Subsequently, they completed the remaining CANTAB tasks in the following order: PAL, SWM and SSP. Finally, they performed the mental arithmetic task and completed a set of demographic and debriefing questionnaires. Each experimental session lasted around two hours.

Participants received a show-up fee of A$10 and additional monetary compensation based on performance.

They earned A$0.70 for each correct answer in the knapsack decision task and A$1 for each correct answer in the knapsack optimization task.

Statistical analysis. All of the generalized logistic mixed models (GLMM) and linear mixed models (LMM) included random effects on the intercept for participants. P-values were calculated using a two-tailed Wald test. All statistical analyses were done in R, and mixed models were estimated using the R package lme4 (60).
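
For reference, the two-tailed Wald p-value follows from the Wald statistic z = β̂ / SE(β̂) and the standard normal CDF. The coefficient and standard error below are illustrative, not estimates from the study:

```python
import math

def wald_p_two_tailed(estimate, std_error):
    """Two-tailed Wald test: z = beta / SE, p = 2 * (1 - Phi(|z|))."""
    z = estimate / std_error
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF at |z|
    return 2.0 * (1.0 - phi)

# e.g. an illustrative coefficient of 0.5 with SE 0.25 gives z = 2, p ≈ 0.0455
p = wald_p_two_tailed(0.5, 0.25)
```

In practice these p-values come directly from the lme4 model summaries; the sketch only makes the computation behind them explicit.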