Extended Bayesian inference incorporating symmetry bias

In this study, we start by proposing a causal induction model that incorporates symmetry bias. This model is important in two respects. First, it can reproduce the causal induction of human judgment with higher accuracy than conventional models. Second, it allows us to estimate the level of symmetry bias of subjects from experimental data. We further propose an inference method that incorporates the aforementioned causal induction model into Bayesian inference. In this method, the component of Bayesian inference, which updates the degree of confidence in each hypothesis, and the component of inverse Bayesian inference, which modifies the model of each hypothesis, coexist. Our study demonstrates that inverse Bayesian inference enables us to deal flexibly with unstable situations in which the object of inference changes from time to time.

Author summary

We acquire knowledge through learning and make various inferences based on such knowledge and observational data (evidence). If the evidence is insufficient, the certainty of the conclusion declines. Moreover, even if the evidence is sufficient, the conclusion may be wrong if the knowledge is incomplete in the first place. In order to model such inference based on incomplete knowledge, we propose an inference system that performs learning and inference simultaneously and seamlessly. Consider the following task: two coins A and B with different probabilities of landing heads are prepared, and a coin toss is repeated using one of them; however, the coin being tossed is also replaced repeatedly. The system observes only the result of each coin toss and estimates the probability of heads for the coin being tossed at that moment. In this task, it is necessary not only to estimate the probabilities of heads for coins A and B, but also to estimate which coin is being used at any given moment. In this paper, we show that the proposed system handles such tasks very efficiently by performing inference and learning simultaneously.


Introduction
…a strong sense of causal relation when P(q|p) is high, as well as when P(p|q) is high, where P(p|q) is the conditional probability of the antecedent occurrence of p, given the occurrence of q [13].

Consider a simple causal induction model that infers the strength of the causal relation from a candidate cause event C to an effect event E from four pieces of co-occurrence information concerning C and E: the joint presence of C and E, the presence of C with the absence of E, the presence of E in the absence of C, and the joint absence of C and E. The most representative model of causal induction is the ΔP model [14]. It takes the difference ΔP between the conditional probability of the occurrence of E given the occurrence of C, P(E|C), and the conditional probability of the occurrence of E given the non-occurrence of C, P(E|¬C), as an index of causal strength, that is, ΔP = P(E|C) − P(E|¬C).

Hattori and Oaksford proposed the dual-factor heuristic (DFH) model [13]. This model is based on the geometric mean of P(E|C), which stands for the predictability of the effect from the cause, and its inverse P(C|E), that is, DFH = √(P(E|C) · P(C|E)).
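For concreteness, the two indices can be computed from the four co-occurrence frequencies as in the following sketch; the cell-count names a, b, c, d and the example numbers are ours, not from the original text.

```python
# Sketch: Delta-P and the dual-factor heuristic (DFH) from a 2x2 contingency table.
# Cell counts (names are ours): a = C&E, b = C&not-E, c = not-C&E, d = not-C&not-E.
import math

def delta_p(a, b, c, d):
    p_e_given_c = a / (a + b)          # P(E|C)
    p_e_given_not_c = c / (c + d)      # P(E|not C)
    return p_e_given_c - p_e_given_not_c

def dfh(a, b, c, d):
    p_e_given_c = a / (a + b)          # P(E|C): predictability of effect from cause
    p_c_given_e = a / (a + c)          # P(C|E): its inverse
    return math.sqrt(p_e_given_c * p_c_given_e)  # geometric mean of the two

print(delta_p(27, 3, 6, 24), dfh(27, 3, 6, 24))
```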
Both the ΔP and DFH models contain P(E|C). In other words, given the occurrence of C, if the probability of occurrence of E following C is high, the chance of C being the cause of E increases. Intuitively speaking, however, the strength of the causal relation does not seem to be determined solely by P(E|C). The second term in the ΔP model, −P(E|¬C), shows that even if the probability of occurrence of E is high given the occurrence of C, if the probability of occurrence of E is still high in the absence of C, that is, if E occurs with high probability irrespective of the occurrence of C, then the chance of C being the cause of E decreases. For the DFH model, on the other hand, if the probability P(C|E), which is the probability of the antecedent occurrence of C given the occurrence of E, …

Viewing the relation between two events as a causal relation can therefore be understood as having both symmetry and mutual exclusivity biases.

Bayesian inference is based on the notion of conditional probability. It speculates about the hidden cause behind an observed result by retrospectively applying statistical inference. The relation between Bayesian inference and brain function has been attracting attention in recent years in the field of neuroscience [16,17].

In Bayesian inference, the degree of confidence in a hypothesis is updated based on a model of predefined hypotheses and current observational data. In other words, Bayesian inference is a process of narrowing down hypotheses to the one that best explains the observational data. Changing the model of each hypothesis or adding new ones in the course of performing Bayesian inference is not allowed. In addition, Bayesian inference itself does not deal with alterations to the inference target during the inference, or with its replacement. Such inference therefore substantially needs to assume the identity of the target.

Note, however, that the requirements of the invariability of the hypothetical model and the identity of the inference target stem from the theoretical framework, and they are not always met in actuality. For instance, if the object is unknown, it would be impossible to infer what it is without adding a new hypothetical model. Moreover, under unsteady circumstances, the inference target is likely to change from time to time or to be replaced by some other object.

In order to determine whether the object has been replaced by another, one must first infer its identity. A correct inference depends on as much observational data as possible. However, in order to properly use accumulated observational data, it must be ensured that these data derive from the same object. In other words, to determine whether the object has been replaced or not, the object must be hypothesized not to have been replaced in the first place. In this sort of situation, it is necessary to infer what the object is while at the same time evaluating the legitimacy of the inference itself. How, then, could we model inference in such a situation?

Arecchi [18] proposed the concept of inverse Bayesian inference, in which the hypothetical model, which is fixed in traditional Bayesian inference, is modified according to circumstances. Gunji …

Proposal of extended confidence model

We seek to establish an extended model of the degree of confidence as the generalised weighted average of P(q|p) and its inverse P(p|q), using parameters α and m.
The parameter α controls the strength of the symmetry bias. When α = 0, the extended confidence coincides with P(q|p) irrespective of the value of m, and expresses the normal conditional probability without the symmetry bias.
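The defining formula is given in the omitted equations above; as a sketch, a generalised weighted average of P(q|p) and P(p|q) can be written as a weighted power mean, which reduces to P(q|p) when α = 0 for any m (the weighted geometric mean is used as the m → 0 limit). The function below is our reading of that description, not the paper's exact expression.

```python
# Sketch (assumed form): extended confidence as a generalised weighted average
# of P(q|p) and P(p|q), with symmetry-bias weight alpha and exponent m.
def extended_confidence(p_q_given_p, p_p_given_q, alpha, m):
    if m == 0:
        # limiting case m -> 0: weighted geometric mean
        return (p_q_given_p ** (1 - alpha)) * (p_p_given_q ** alpha)
    return ((1 - alpha) * p_q_given_p ** m + alpha * p_p_given_q ** m) ** (1 / m)

# alpha = 0 recovers the ordinary conditional probability P(q|p) for any m.
assert abs(extended_confidence(0.8, 0.3, alpha=0.0, m=-1) - 0.8) < 1e-9
```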

Proposal of extended Bayesian inference
In this section, we propose the extended Bayesian inference, in which the conditional probability in Bayesian inference is replaced by the extended confidence model.

First, we describe Bayesian inference. This study deals with the problem of inferring a generative model (a probability distribution) from observational data. To this end, in what follows, the hypothesis h and the data d are used in place of p and q. Moreover, discrete models are considered.

Bayesian inference first defines several hypotheses h_i and provides a model for each hypothesis (a probability distribution over the data) in the form of the conditional probability P(d|h_i). When the data are fixed and this conditional probability is regarded as a function of the hypothesis, it is termed the likelihood. The confidence P(h_i) in each hypothesis is given as a prior probability.
We can take P(d|h_i) and P(h_i) as initial values and calculate the posterior probability P(h_i|d) when data d are observed, using Bayes' theorem as follows:

P(h_i|d) = P(d|h_i) P(h_i) / P(d).   (3)
Hereinafter, the data observed at each point in time are represented by the bold d. Afterwards, the posterior probability is used as the new prior probability; this is Bayesian updating:

P(h_i) ← P(h_i|d).   (4)
By combining formulas (3) and (4), we obtain

P(h_i) ← P(d|h_i) P(h_i) / Σ_k P(d|h_k) P(h_k).   (5)

Whenever new data d are observed, P(h_i) in formula (5), i.e., the confidence in each hypothesis, is updated and the inference continues. The inference distribution during this procedure can be expressed as

P(d) = Σ_i P(d|h_i) P(h_i).   (6)
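As an illustration of the cycle in formulas (5) and (6), the following is a minimal sketch of ordinary Bayesian updating over a discrete set of hypotheses; the variable names and the example numbers are ours.

```python
# Sketch: one step of ordinary Bayesian updating over discrete hypotheses.
def bayes_update(prior, likelihood, d):
    # prior[i] = P(h_i); likelihood[i][d] = P(d | h_i)
    evidence = sum(prior[i] * likelihood[i][d] for i in range(len(prior)))
    return [prior[i] * likelihood[i][d] / evidence for i in range(len(prior))]   # formula (5)

def inference_distribution(prior, likelihood, data_values):
    # P(d) = sum_i P(d | h_i) P(h_i), formula (6)
    return {d: sum(prior[i] * likelihood[i][d] for i in range(len(prior))) for d in data_values}

prior = [1 / 3, 1 / 3, 1 / 3]
likelihood = [{1: 0.2, 0: 0.8}, {1: 0.5, 0: 0.5}, {1: 0.8, 0: 0.2}]
prior = bayes_update(prior, likelihood, d=1)          # after observing d = 1
print(prior, inference_distribution(prior, likelihood, (0, 1)))
```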

Note that in Bayesian inference, while the confidence in each hypothesis changes over time, the model P(d|h_i) of each hypothesis remains fixed.

The extended Bayesian inference is an inference that has α and m as parameters and accommodates normal Bayesian inference as a special case when α = 0. Specifically, it is constructed from the following two update formulas, (7) and (8).
In formula (8), the bold-faced ĥ represents the hypothesis that has the highest confidence at that time. See the Methods section below for a detailed derivation of the update formulas. Here we can see that, supposing α = 0 in formula (7), the right-hand side takes the same form as that of Bayesian inference seen in formula (5). P(d), that is, the estimated value of the data, appears in the denominator on the right-hand side. Following Gunji et al. [19], the process shown in formula (8) is termed inverse Bayesian inference.

In what follows, we show the processing flow of the extended Bayesian inference. First, we take P(d|h_i) and P(h_i) as initial values, and update them using formulas (7) and (8) whenever data d are observed. Following the application of formulas (7) and (8), we normalise P(d|h_i) and P(h_i). Finally, we can calculate the values of the estimated distribution, as with Bayesian inference.
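The update formulas (7) and (8) themselves are given in the Methods section; the following is only a schematic sketch of the processing flow described above. The specific expressions inside step are our assumptions (a weighted power mean that reduces to ordinary Bayesian updating when α = 0, and an inverse-Bayesian step that pulls P(d|ĥ) toward the current confidence), not the paper's exact formulas.

```python
# Schematic sketch of one cycle of the extended Bayesian inference (assumed update forms).
EPS = 1e-5  # floor on confidences, as in the simulations

def step(prior, likelihood, d, alpha, m=-1):
    n = len(prior)
    # ordinary Bayes posterior P(h_i | d), used inside the assumed extended confidence
    p_d = sum(prior[i] * likelihood[i][d] for i in range(n))
    bayes_post = [prior[i] * likelihood[i][d] / p_d for i in range(n)]
    # Bayesian part (cf. formula (7)): reduces to ordinary Bayes when alpha = 0
    post = [max(((1 - alpha) * bayes_post[i] ** m + alpha * likelihood[i][d] ** m) ** (1 / m), EPS)
            for i in range(n)]
    z = sum(post)
    post = [p / z for p in post]                               # normalise P(h_i)
    # inverse-Bayesian part (cf. formula (8)): modify only the model of the
    # hypothesis h_hat with the highest confidence
    h_hat = max(range(n), key=lambda i: post[i])
    lik = likelihood[h_hat][d]
    likelihood[h_hat][d] = ((1 - alpha) * lik ** m + alpha * post[h_hat] ** m) ** (1 / m)
    z = sum(likelihood[h_hat].values())
    for k in likelihood[h_hat]:                                # normalise P(d | h_hat)
        likelihood[h_hat][k] /= z
    estimate = sum(post[i] * likelihood[i][1] for i in range(n))  # estimated P(heads), coding heads as 1
    return post, likelihood, estimate
```

With, for example, three hypotheses whose models all start from the uniform distribution over heads and tails and whose priors are equal, repeatedly calling step with the observed coin-toss results follows the processing flow described above.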
Whenever a coin toss result is observed, the true probability of landing heads is estimated by the extended Bayesian inference. Additionally, for comparison with the extended Bayesian inference, estimation using only inverse Bayesian inference and estimation using an exponential moving average (EMA) were also carried out.

First, heads and tails are expressed as the two possible values of the data d. Second, we prepare N hypotheses h_1, …, h_N and define the probability of heads and the probability of tails in each hypothesis as follows.
That is, the models for all hypotheses are initially the same, which means that this system has substantially no hypothesis model at the initial stage. Further, we suppose that the prior probability of each hypothesis is equal.
Whenever a coin toss result is observed, the estimate is successively updated by performing extended Bayesian inference using formulas (7), (8), (12), and (13). In the simulations, N = 3. For simplicity of the subsequent analysis, the parameter m was fixed to −1 in the extended Bayesian inference.

When updating the degree of confidence P(h_i) for each hypothesis using formula (7), we set a minimum value to impose a restriction so that the degree of confidence never becomes zero.
Here, max is a function whose output is the larger of its two arguments. In the simulation, the minimum value was set to 0.00001.

In the case where only inverse Bayesian inference is performed, a single hypothesis is used, the process of formula (7) is not performed, and the confidence in that hypothesis is always set to 1.0.

In this paper, we deal with a task in which the probability of heads can take two values, and the value in use is switched with a fixed probability. With a uniformly distributed random number generated from the interval [0.0, 1.0] at each trial, the probability of heads is switched according to the following formula.
In this simulation, the switching probability was set to 0.0001. The initial value of the probability of heads was set to 0.85.
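As an illustration of the task, the following sketch generates the switching coin-toss environment described above and runs the EMA baseline used for comparison; the second value of the head probability (0.15) and the EMA smoothing factor are placeholders, not values taken from the text.

```python
import random

MU = 0.0001              # switching probability per trial
P_HEADS = (0.85, 0.15)   # the two head probabilities; 0.15 is a placeholder value

def run_task(trials=100_000, beta=0.05, seed=0):
    """Generate coin tosses whose head probability switches with probability MU,
    and track it with an exponential moving average (EMA) baseline."""
    rng = random.Random(seed)
    state = 0                     # start from the initial head probability 0.85
    ema = 0.5                     # EMA estimate of P(heads)
    history = []
    for t in range(trials):
        if rng.random() < MU:     # with probability MU, the coin is replaced
            state = 1 - state
        d = 1 if rng.random() < P_HEADS[state] else 0   # 1 = heads, 0 = tails
        ema = (1 - beta) * ema + beta * d               # EMA update
        history.append((P_HEADS[state], ema))
    return history
```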
Here, the correct value and the estimated value in the t-th trial are compared over an interval of a given length. We use the RMSE of the first half of the interval as a measure of the ability to follow rapid changes (followability), and the RMSE of the second half as a measure of the accuracy of the estimation in the stable period.

The EMA estimation exhibits a trade-off between followability and accuracy. The data of the inverse Bayesian inference were located slightly to the lower left on this trade-off curve. On the other hand, the data of the extended Bayesian inference were almost the same as those of the inverse Bayesian inference with regard to accuracy, but the followability was greatly improved. That is, the extended Bayesian inference broke the trade-off found in the EMA estimation.
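A minimal sketch of the two measures, assuming the interval is simply split in half; the variable names are ours.

```python
import math

def split_rmse(true_vals, estimates):
    """RMSE over the first and second half of an interval following a switch:
    the first-half RMSE measures followability, the second-half RMSE accuracy."""
    n = len(true_vals)
    half = n // 2
    def rmse(lo, hi):
        return math.sqrt(sum((true_vals[t] - estimates[t]) ** 2 for t in range(lo, hi)) / (hi - lo))
    return rmse(0, half), rmse(half, n)
```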
At this step, formula (20) for inverse Bayesian inference can also be rewritten as follows.

The right-hand side of this formula is the weighted harmonic average of 1 and P(d|ĥ). Since 0 ≤ P(d|ĥ) ≤ 1, the denominator is necessarily less than 1, and the likelihood increases whenever it is updated. In other words, when certain data are observed, the connection between those data and the hypothesis with the highest degree of confidence at that time is reinforced. Conversely, unobserved data, i.e., data other than d, can be standardised using formula (12). These considerations suggest that P(d|ĥ) becomes larger as the number of updates to the model increases. In this sense, we can say that formula (20) for inverse Bayesian inference during the steady period represents a process of learning, and α corresponds to the rate of learning.

With respect to the portion that corresponds to Bayesian inference, supposing m = −1 in formula (7), we can rewrite it as:

… with high accuracy. However, as shown in Table 2, the values of the parameters differ for each experiment. It is known that the interpretation of conditionals changes considerably depending on the type and contents of the conditionals, as well as on the subject's age [26]. Further studies are necessary to determine how the parameters change according to the type and contents of conditionals, as well as age. The performance of the present model and the catalogue performance of DFH and pARIs were …

The von Domarus principle applies to the speech of schizophrenic patients [27] and refers to an inference of the form 'Men die. Weeds die. Therefore, men are weeds.' There is a widely observed tendency in schizophrenic patients to identify two things as the same when they share a common property, a mechanism said to underlie delusion [28]. Logically, it is wrong to conclude from 'A is C' and 'B is C' that 'A is B'.

The subjects were asked to judge a number of problems, and each problem involved a sequence of instances of these four information types. The frequencies of each information type varied from problem to problem. At the end of a problem, the subjects were asked to enter a number from 0 to 100 that best reflected their judgment of the drugs causing the side effects.

In W03.2, the participants were 40 undergraduate students. They were given information on the additive (manganese trioxide) contained in the foods a patient had eaten, and information on whether the patient had developed an allergic reaction. They were asked to judge the extent to which the statement 'Manganese trioxide causes the allergic reaction in this patient' was right for that patient, and to write a number from 0 (zero) to 100, where 0 (zero) means that the statement is definitely not right, and 100 means that the statement is definitely right.

In W03.6, the participants were 43 first-year undergraduate students. Most features of the method, including the initial written instructions, the format of stimulus presentations, and the procedure, were the same as in W03.2. The studies differed in design, however.

In the next step, we replace the conditional probability on the right-hand side of formulas (26) and (27) with the extended confidence to make the formulas recursive, and then we replace each equation with an update formula, yielding formulas (28) and (29).

As seen in formula (28), in inverse Bayesian inference the amount of modification to the model of each hypothesis increases as α becomes larger. However, in this article, not all hypothetical models are modified uniformly; the amount of modification changes according to the confidence levels, as follows. Here, formula (31) is a procedure from the field of machine learning called softmax [29], and T (> 0) is a parameter termed the temperature. The weight remains the same for all hypotheses if the temperature is high, in the limit T → ∞. On the other hand, if the temperature is low, the weight becomes greater for hypotheses with a higher confidence level.
It takes the value 1 in the limit T → 0 for the hypothesis with the highest confidence level, and the value 0 for all the other hypotheses.

In inverse Bayesian inference, the hypothetical model is modified when data d are observed, using this weight, as follows. There are reasons why the degree of modification of each hypothetical model changes according to the level of confidence. First, this process is a modification of the hypothetical model, which can be understood as a learning procedure rather than inference. Second, it is more likely that the currently observed data derive from a hypothesis if that hypothesis has a higher degree of confidence. Therefore, when modifying the model of each hypothesis based on the observed data d, a hypothesis with a higher degree of confidence requires a greater modification of its model. Of course, when T → ∞, all hypothetical models are modified equally. On the other hand, when T → 0, only the model of the hypothesis with the highest degree of confidence is modified. Moreover, supposing α = 0, the modification in formula (30) vanishes for all hypotheses.

In the simulation, the limit T → 0 was used. In other words, the inverse Bayesian inference was applied only to the hypothesis that has the highest confidence value.
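As a sketch, the softmax weighting with temperature T described above can be written as follows; the exact expression of formula (31) is not reproduced here, and the weight name and the example values are ours.

```python
import math

def softmax_weights(confidences, T):
    """Softmax over hypothesis confidences with temperature T (> 0).
    Large T: nearly equal weights for all hypotheses; small T: the weight
    concentrates on the hypothesis with the highest confidence (argmax as T -> 0)."""
    exps = [math.exp(c / T) for c in confidences]
    z = sum(exps)
    return [e / z for e in exps]

print(softmax_weights([0.7, 0.2, 0.1], T=100.0))  # nearly uniform weights
print(softmax_weights([0.7, 0.2, 0.1], T=0.01))   # approximately [1, 0, 0]
```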