Abstract
Integrated Information Theory is currently the leading mathematical theory of consciousness. The core of the theory relies on the calculation of a scalar mathematical measure of consciousness, Φ, which is deduced from the phenomenological axioms of the theory. Here, we show that despite its widespread use, Φ is not a well-defined mathematical concept in the sense that the value it specifies is neither unique nor specific. This problem, occasionally referred to as “undetermined qualia”, is the result of degeneracies in the optimization routine used to calculate Φ, which leads to ambiguities in determining the consciousness of systems under study. As demonstration, we first apply the mathematical definition of Φ to a simple AND+OR logic gate system and show 83 non-unique Φ values result, spanning a substantial portion of the range of possibilities. We then introduce a Python package called PyPhi-Spectrum which, unlike currently available packages, delivers the entire spectrum of possible Φ values for a given system. We apply this to a variety of examples of recently published calculations of Φ and show how virtually all Φ values from the sampled literature are chosen arbitrarily from a set of non-unique possibilities, the full range of which often includes both conscious and unconscious predictions. Lastly, we review proposed solutions to this degeneracy problem, and find none to provide a satisfactory solution, either because they fail to specify a unique Φ value or yield Φ = 0 for systems that are clearly integrated. We conclude with a discussion of requirements moving forward for scientifically valid theories of consciousness that avoid these degeneracy issues.
1 Introduction
Integrated Information Theory (IIT) is a leading contender among theories of consciousness due to its provision of a scalar mathematical measure, Φ, posited to predict the overall level of consciousness in virtually any dynamical system. In comparison to other contemporary theories of consciousness, such as Global Neuronal Workspace Theory [1] or Predictive Processing [2–6], IIT is set apart by its mathematical rigor. The concrete, mathematical formulation of the theory is at least partially responsible for its widespread adoption over the past decade1. However, there is some debate about IIT’s mathematical implementation. One of the more widely-discussed problems is the ambiguity in the derivation of Φ from the phenomenological axioms of the theory [8–10]. Not often discussed nor addressed is that the value of Φ one predicts is not unique, even within the standard accepted mathematical definition of Φ put forth in IIT 3.0 [11]. Indeed, as we will show, IIT can simultaneously predict a system to be both conscious and unconscious, implying the theory is ill-defined.
It has been recognized since as early as 2012 that Φ may be indeterminate for some systems [12]. Yet, to date, an investigation of how this impacts conclusions that can be drawn from the theory across different systems it is applied to has not been undertaken. Here we show, even the simplest of systems can evade a straightforward evaluation of the mathematical value of Φ. The specifics of this problem result as a consequence of what has been referred to as “underdetermined qualia” or “tied purviews” [13, 14]. In short, the problem results as a consequence of the fact that Φ (“big Phi”) is defined as an information-theoretic distance between cause-effect structures (vectors) that are not guaranteed to be unique. The non-uniqueness of cause-effect structures is, in turn, caused by an inability to select between degenerate core causes and effects (probability distributions) based on their ϕ (“little phi”) values. According to IIT 3.0, the cause/effect repertoire with the smallest ϕ value must be identified as the core cause/effect to satisfy the exclusion axiom but, in practice, there is no guarantee that this element is unique.
A standard resolution to this degeneracy problem is to select a core cause and core effect repertoire arbitrarily from the set of possibilities. For example, using an if less than statement in the optimization routine will select the first of the degenerate core causes/effects while using an if less than or equal to statement will select the last of the degenerate core causes/effects. In the configuration files for the software package PyPhi [15], which is used ubiquitously for the calculation of Φ [16–25], there is an option to select whether to keep the smallest or largest purview element in the event of a tie (i.e., the same ϕ value for different causes/effects). However, this choice is ad hoc for obvious reasons (see, e.g., [13, 14, 26]): while it appears to provide a unique Φ value despite degeneracy, it is not a valid solution [14] because the tied purview elements are often the same size, meaning an element must still be selected arbitrarily. In this case, PyPhi defaults to selecting the first of the degenerate values which depends arbitrarily on the order in which elements are considered. A second significant issue with this solution is that there is nothing phenomenological built into the theory to suggest why the smallest or largest purview element should be retained as the unique cause/effect. Consequently, different authors have reached conflicting conclusions as to whether the smallest or largest purview element is more in line with the phenomenological axioms of the theory [13, 14, 26].
Here, we aim to shed light on these issues by calculating Φ for a very simple model system in the form of an AND gate connected to an OR gate. Taking the mathematical definition of Φ at face value, we demonstrate that a spectrum of 83 different Φ values results, corresponding to both conscious and unconscious predictions. Next, we provide a modified version of PyPhi called PyPhi-Spectrum that can be used to calculate the entire spectrum of Φ values for a given dynamical system with a single function call, as opposed to the singular value that is typically reported. We then apply this algorithm to a corpus of ten recently published Φ values, in order to determine the extent to which non-unique Φ values are overlooked in the literature. Last, we investigate whether or not proposed solutions adequately address this problem. We conclude with a philosophy of science discussion related to the scientific handling of ideas.
2 Methods
In what follows, we assume the reader is familiar with the basic terms in IIT 3.0. For a more detailed explanation of terms, we refer the reader to the original IIT 3.0 publication [11].
2.1 Preliminaries
In IIT 3.0, ΦMax is the overall level of conscious experience that is predicted for a given dynamical system. The calculation of ΦMax is notoriously difficult to perform. In total, five nested optimization steps are required, as shown in Algorithm 1. At the core of this routine is a simple distance measure in the form of an earth mover’s distance [27] between two probability distributions. This results in a measure of integration known as ϕ (“little phi”). However, this elementary distance calculation must be performed for every possible partition of a given “purview” in order to calculate ϕMIP, then every possible purview for a given “mechanism” in order to find ϕMax. This results in what is known as a cause-effect structure (CES) or “constellation”, which is defined by the set of mechanisms, their ϕMax values, and two probability distributions per mechanism corresponding to the “core cause” and “core effect”. Next, one must generate a constellation for every possible partition of the subsystem under consideration and use a modification of the earth mover’s distance to quantify how close this constellation is to that of the unpartitioned subsystem. This results in a second measure of integration known as Φ (“big Phi”), which is designed to quantify the effect of a system-level partition on the underlying ability for a system’s components (mechanisms) to integrate information. The system-level partition with the smallest Φ value is the minimum information partition (MIP) and the corresponding Φ value is ΦMIP. Last, this entire process must be repeated for every possible subsystem in a given system in order to find the maximum integrated information ΦMax. In total, this hierarchy of nested optimization routines results in a computational complexity that scales as , where m is the number of elements in the system and is unrealizable in practice for all but the smallest dynamical systems (c.f. Section 6.1).
Pseudocode Overview of the Routine to Calculate Phi Max
2.2 Example: AND+OR System
To demonstrate the problems inherent in the mathematical definition of ΦMax we will consider a simple system comprised of an AND gate and an OR gate connected to each another, as shown in Figure 1. Since there are only two elements, we need not worry about the outermost optimization as a subsystem must be comprised of at least two components in order to generate ΦMIP > 0, so ΦMIP = ΦMax in what follows. To calculate ΦMIP we first must initialize the system into a given state. We assume an initial state s0 = 00 in all that follows, though our results are not sensitive to this choice. The next step is to identify the cause-effect structure (CES) or constellation C corresponding to the transition probability matrix (TPM) of the unpartitioned system. To do this, one must find the core cause and core effect of every potential mechanism in the system, where a mechanism is any element in the power set of the subsystem. In our case, the potential mechanisms are in where the superscript c denotes the mechanism in its current state. For each element in this set, we must identify how well it constrains elements in the past power set , known as the past purview, as well as how well it constrains elements in the future powerset , known as the future purview.
Next, we measure the earth mover’s distance D between the constrained distribution of each purview element and the constrained distribution of each purview element under the minimum information partition (MIP): where z is the purview element and m is the mechanism. The distribution p(z|m = s0) tells us the likelihood of z given the current state of m is s0 which, compared to an unconstrained distribution, tells us how much information m is generating about z. However, we also need to know whether or not that information is “integrated” so we must break m and z up into all possible parts and ask whether the parts acting independently can generate the same amount of information as the whole. For example, to find how much integrated information is generated by the mechanism Ac about the purview element z = ABp we calculate the probability distribution p(ABp|Ac = 0) and compare this to the two possible partitions of the purview: Ac/ABp → (Ac/Ap × []/Bp) and Ac/ABp → (Ac/Bp × []/Ap). The first partition allows Ac to constrain Ap but leaves Bp unconstrained (denoted by an empty bracket []) while the second partition allows Ac to constrain Bp but leaves Ap unconstrained. The distributions generated by these partitions, shown in Figure 2, are then compared to the distribution generated by the unpartitioned system, and the partition that minimizes the earth mover’s distance to the unpartitioned system is the MIP for this purview/mechanism combination. If multiple partitions yield the the same earth mover’s distance to the unpartioned system, as is the case in Figure 2, it is irrelevant which one is chosen as all that moves forward in the computation is the scalar value of ϕMIP.
Once we have identified the MIP (and calculated ϕMIP) for all purview elements for a given mechanism, we define the core cause and core effect as the past and future purview elements with the greatest ϕMIP. We denote the integrated information of the former as and of the latter as and define the total integrated information ϕMax of a given mechanism as:
If ϕMax > 0 for a mechanism we say that the mechanism gives rise to a concept. A concept is fully specified by three things: its ϕmax value, the cause repertoire corresponding to the core cause, and the effect repertoire corresponding to the core effect. We have already provided an example of a cause repertoire in Figure 2, namely, it is the distribution over previous states of the purview element, given the current state of the mechanism. In Figure 2, the state of Ac constrains the probability of observing ABp and this constrained distribution is the cause repertoire for the purview element ABp. Any element not included as part of the purview is left unconstrained and must be independently “noised” [11]. For example, if the purview of mechanism Ac is Ap we generate the constrained distribution p(Ap|Ac = s0) and combine this with the unconstrained distribution for Bp (denoted puc(Bp)). For the AND+OR system, we have p(Ap|Ac = 0) = [2/3, 1/3] and puc(Bp) = [1/2, 1/2] which yields the cause repertoire: [1/3, 1/3, 1/6, 1/6], where states are ordered in binary with the most significant digit on the left (i.e. [00, 01, 10, 11]).
The effect repertoire for a given purview element is generated in the same way as a cause repertoire. For example, the probability of observing Af given the state of mechanism Ac = 0 is P (Af| Ac = 0) = [0, 1]. Combining this with the unconstrained future distribution for Bf yields the effect repertoire [1/4, 3/4, 0, 0]. Note, this is an example where the unconstrained future distribution for Bf is not uniform. This is a direct result of the noising procedure: an OR gate receiving uniform random input is three times as likely to be in state 1 as it is to be in state 0. Furthermore, when more than one target node is involved, we must send independent noise to each target (to avoid correlated input). For example, in a three-node system if we were looking at the purview element BCf and noising over the state of Ac, we must imagine that Ac has the ability to send different signals to Bf and Cf at the same time (hence the term “noising”).
We can now define the CES (constellation) C as the set of all concepts for the system in the given state. Recall, each concept corresponds to a single mechanism and is comprised of the mechanism’s core cause repertoire, core effect repertoire, and ϕMax value. In our example, there are at most three concepts, corresponding to the mechanisms {Ac, Bc, ABc}. The core cause repertoire for a mechanism is found by optimizing over all past purview elements and identifying the purview with the highest ϕMIP, where ϕMIP is found by further optimization over all possible partitions. Figure 3 shows ϕMIP and the corresponding partition for all possible purview elements given the mechanism Ac.
2.2.1 Degenerate Core Causes and Effects
We are now in position to see how degenerate core causes and effects can occur and their consequences. The postulates of IIT - and the exclusion postulate in particular - imply a unique core cause should be assigned to each mechanism, but the purview element that generates is not unique. As Figure 3 shows, Ap, Bp, and ABp all generate the same ϕMIP value for the mechanism in question. Since each purview/mechanism combination is associated with a different cause repertoire, the core cause repertoire and the resulting constellation C are not uniquely defined. If the scalar value of was all that mattered to the calculation of Φ, this degeneracy would be inconsequential (as is the case for partitions that generate the same ϕMIP value for a given purview element). However, system-level integrated information Φ is defined as the cost of transforming the core cause/effect repertoires from one constellation C into another C′. That is: where D is an extension of the earth mover’s distance that calculates the cost of moving ϕMax between repertoires. If the core cause or effect repertoire changes, the distance between constellations will change accordingly, as the distance metric that goes into the EMD calculation is sensitive to the relative shape of the distributions and not just the scalar ϕMax values. For example, if we were to choose ABp as the core cause as the core cause for mechanism Ac, this generates the concept in C given by the tuple {[1/3, 1/3, 1/3, 0], [1/2, 1/2, 0, 0], 1/6} where the first element is the core cause repertoire, the second element is the core effect repertoire, and the third element is the ϕMax value. However we could just as easily have chosen Ap as our core cause and Af as our core effect. In which case, the concept generated for Ac would be {[1/3, 1/3, 1/6, 1/6], [1/4, 3/4, 0, 0], 1/6}. Clearly, these choices have the same ΦMax value but significantly different core cause and effect repertoires.
To illustrate the consequences of this, let C be the constellation consisting only of the concept generated by Ac with core cause cause ABp and core effect ABf and let C′ be the constellation consisting of only the null concept for this system (the unconstrained cause and effect repertoires):
The extended earth mover’s distance is the cost of transforming C into C′ by moving ϕMax = 1/6 a distance given by the sum of the (regular) earth mover’s distance between cause repertoires and effect repertoires. Namely, we have which results in the integrated conceptual information:
Now, if we instead choose Ap and Af as our core cause and core effect we have: corresponding to an integrated conceptual information:
Thus, we get different values of ΦMIP depending on our choice of core cause and core effect.
2.2.2 A Spectrum of Non-unique Φ Values
Each combination of degenerate core cause/effect repertoires results in the potential for a different Φ value. For example, if the unpartitioned system has three degenerate core causes for Ac and two for Bc, then there are 3 × 2 = 6 non-unique constellations (CES) for the unpartitioned system. If the partitioned system also has six non-unique CES, then there are a total of 36 different combinations for the distance between constellations (Φ). For the AND+OR system, the unpartitioned system has a total of 81 non-unique combinations, while each cut has a total of 9 non-unique constellations. Thus, one must examine the distance between 81 × 9 × 2 different combinations of constellations in order to determine all possible Φ values, which we refer to as the “spectrum” of Φ values for the subsystem. Note, not all Φ values are valid ΦMIP values; it is only those between the upper and lower Φ value of the minimum information partition (MIP) that satisfy the definition of ΦMIP. In total, 83 non-unique ΦMIP values result for the AND+OR system, as shown in Figure 2.2.2. Again, we emphasize there is nothing in the axioms or postulates of IIT to suggest which of these 83 values IIT actually predicts, as all are equally valid according to the mathematical definition of Φ. Crucially, both ΦMIP = 0 and ΦMIP > 0 are present in the spectrum, meaning IIT cannot actually predict whether or not this simple system is conscious (or conversely, one could interpret the theory to simultaneously predict the system is both conscious and unconscious).
3 Results
We now apply our methodology to a variety of recently published systems where Φ was calculated, in order to determine the extent to which degenerate core causes/effects lead to a need for revised interpretation of reports that provide only a unique Φ value. We begin with a pedagogical case study, followed by the broad application of the PyPhi-Spectrum package, described in Section 6.3, to as large of a corpus as is computationally feasible.
3.1 Case Study: Three-node Fission Yeast Cell Cycle
As a case study, we consider the Boolean network model of the fission yeast cell cycle from Marshall et al. [16]. In this study, IIT is used to analyze the causal structure of a minimal biological system, namely, the cell cycle of the fission yeast S. pombe. Using Φ, the authors identify three integrated subsystems corresponding to the full system (eight nodes), a six node subsystem, and a three-node subsystem - all potentially of biological importance. Of these three systems, only the smallest may be subject to our analysis, though we expect similar results for the other two systems. Applying the methodology from Section 2, we find a spectrum of 244 non-unique Φ values, spanning a range from 0.00 – 0.83 bits, see Figure 5. Crucially, only one of these values (Φ = 0.09) is published as the unique ΦMIP value for this subsystem. In reality, all of these values are valid according to the mathematical definition of Φ. Furthermore, the inclusion of Φ = 0 in the spectrum of possibilities changes the biological interpretation of the results entirely. If the subsystem under consideration has ΦMIP = 0, rather than ΦMIP > 0, it would not be identified as “integrated” and its biological function would not be deemed of interest. Thus, the conclusion that this subsystem is of biological importance is entirely dependent on the arbitrary selection of a single Φ value from the spectrum of possibilities and, in general, it is impossible to tell a priori whether or not the narrative being built around a particular Φ value is valid without studying the entire spectrum of possibilities.
3.2 The Non-uniqueness of Published Φ Values
Next, we use PyPhi-Spectrum to anaylze a corpus of recently published Φ values, with the goal of understanding the extent to which underdetermined qualia affect interpretation of previously published Φ. The corpus we analyze is intended to be comprehensive, but we are limited by the computational resources required to perform these calculations (see Section 6.1). Of the dozen or so Φ values that are published in the literature [16–19, 21–26, 28–32], only a handful are small enough to be subjected to our analysis [11, 16, 18, 21, 22, 24, 26, 32]. These systems, summarized in Table 1, are selected primarily for their size, though size alone is not a good indication of computational tractability. For example, certain three-node systems, such as that found in Hanson and Walker 2020 [24], have thousands of degenerate cause-effect structures while others, such as that found in Farnsworth 2021 [18], have just a few. It is not readily apparent what dictates the number of non-unique cause-effect structures that result for a given system, though symmetric inputs almost certainly play a role (see Section 4). Consequently, our corpus is limited to small systems (2-4 nodes) which happen to allow relatively fast evaluation via PyPhi-Spectrum2. While improvements such as parallelization could certainly be made to improve performance and increase the size of our corpus, we do not believe doing so would add much to the interpretation of our results.
Our primary result is shown in Figure 6, which reports calculated values for the entire spectrum of ΦMIP values relative to the published value for every text in our corpus. There are several things to note. First, the existence of a unique Φ value is rare: only the photodiode has a spectrum consisting of a single value (the number of different ΦMIP values is denoted by |ΦMIP| in Figure 6). For the rest of the corpus, the spectra often consist of dozens if not hundreds of non-unique ΦMIP values, of which only one is published (denoted as a black “x” in Figure 6). Importantly, we do find cases where the spectrum contains both Φ = 0 and Φ > 0 values, indicating predictions consistent with IIT that allow interpretation of a system as conscious and unconscious depending on the value chosen. This occurs for three out of the ten published Φ values in our corpus: AND+OR, Marshall et al. 2017 [16], and Hoel et al. 2016 [21]. This has significant implications for the logical foundations of the theory (the exclusion postulate in particular). Additionally, the span of ΦMIP values is often comparable to the entire range of possibilities expected for systems of this size. A typical Φ spectrum spans roughly 1/2 of a bit (Figure 6). In comparison, a (deterministic) two-node Boolean system is bounded from above by ΦMIP ≤ 1.5 bits (Section 6.2), which implies that the Φ values calculated by IIT are not only non-unique but also non-specific (i.e. they don’t constrain the possible Φ values to a small portion of the range).
4 Existing Solutions
The problem of degenerate core causes/effects in IIT is already recognized, but so far its impact has not be extensively studied and therefore there has not been sufficient motivation to address it [11, 13, 14, 26]. Previous studies have pointed to the problem of degenerate core causes/effects in IIT (although not studying the extensive impact on reported literature as we do here). To our knowledge, there are several different solutions to this problem, with differing degrees of justification, which we review next.
The first solution is that put forward by Oizumi et al. in IIT 3.0 [11]. In Figure S1 of their Supporting Information, the authors argue that the degenerate core cause corresponding to the biggest purview element should be selected as the core cause. The justification is that the larger purview “specifies information about more system elements for the same value of irreducibility”: that is, the ϕmax values of the degenerate purview elements are the same (same value of irreducibility) but bigger purview elements constrain more of the system (e.g. ABp constrains more of the system than Ap). This motivation however does not make direct connection to the axioms and/or postulates of the theory [14], meaning another choice could be equally consistent with the phenomenology of the theory. Additional postulates are therefore required. The degenerate core causes in the example they consider correspond to purview elements of different sizes, which allows this simple criterion to result in a unique core cause. However, we have already shown that in general there is no guarantee this is the case. Therefore, this criterion cannot be used to guarantee a unique Φ value and additional criterion would need to be added..
By contrast, Krohn and Ostwald (hereafter KO17) reach the exact opposite criterion as a proposed solution to the same problem [13]; namely, KO17 argue that it is the smallest purview element that should be selected as the core cause/effect in the case of tied purviews. Their motivation is “causes should not be multiplied beyond necessity” [12]. A similar language is found in the phe-nomenology of IIT in the exclusion postulate, but it is not clear that it can be applied to the dimensionality of purview elements. Further motivation is required to illustrate why choosing Bp as the core cause of Ac instead of ABp multiplies the cause of Ac beyond necessity. Additionally, again we encounter that the purview element is not guaranteed to be unique, which means Φ is still not mathematically well-defined. To address this, KO17 present a completely novel solution in the form of a modified definition of Φ based on the difference in the sum of ϕmax values between constellations, rather than the extended earth mover’s distance. The benefit of this definition is that it depends only on the scalar ϕmax values associated with concepts, rather than the non-unique cause/effect distributions. In other words, it does not matter which of the degenerate core causes is selected because they all have the same ϕMax value and the integrated conceptual information is just the sum of ϕMax values over concepts.
Another solution is the “differences that make a difference” criterion proposed by Moon [14]. Here, the author argues that if degenerate core causes/effects exist then none of the corresponding purview elements should be selected as the core cause/effect (i.e. ϕmax = 0 for the mechanism). This solution is based on the idea that in IIT “to exist is to cause differences”. If tied purview elements exist with the same ϕMax value, then the ϕMax value does not change if one of the tied purview elements is excluded. For example, if Ap and Bp both give rise to a concept with ϕMax = 1/6, one can eliminate Bp without changing the ϕMax value for the mechanism; therefore, the existence of Bp does not make a difference “from the intrinsic perspective of the system”. The fact that one can do this individually for each of the degenerate core causes or effects implies that none can give rise to a concept and ϕMax = 0 for the mechanism.
Problems arise with the Φ values that result from all four proposed solutions to the non-uniqueness problem. Namely, selecting the smallest or largest purview element requires additional phenomenological assumptions, and even with these would not guarantee a unique Φ value. The KO17 and Moon criteria result in Φ = 0 for systems that are clearly integrated. In the case of the KO17 definition of Φ, under a partition, the sum of ϕMax values can actually increase, resulting in the confounding conclusion that the system is integrating more information after a cut than it was before (Φ < 0). These so-called “magic cuts” are discussed by Krohn and Ostwald [13], as well as in the PyPhi documentation. More immediately, if we apply the KO17 definition of Φ to the AND+OR system from Section 2, we find Φ = 0 due to the fact that the ϕMax values for each concept are the same before and after the system level minimum information partition (ΣϕMax = 5/12 in both cases). Consequently, we reject this modified definition of Φ outright based on the idea that an AND+OR system is integrated, e.g. it satisfies the general mathematical definition of integrated information, which is an inability to tensor factorize a system without changing the underlying dynamics [33, 34]. In the case of the AND+OR system, one cannot cut A from B without affecting the state of B, and vise versa, which implies the system is integrated. The same conclusion applies to Moon’s criterion, for which the AND+OR system again fails to yield Φ > 0 due to the presence of degenerate core causes/effects for all mechanisms. In addition, the “differences that make a difference” argument relies entirely on the assumption that the ϕ value being measured is an accurate reflection of what it means to make a difference. Put simply, cutting information from an AND gate to an OR gate does make a difference in terms of system-level integration, and the only way to justify that it doesn’t is to define differences that make a difference in terms of ϕ values.
As demonstration of these problems inherent with each of the four existing solutions, we reanalyze our corpus enforcing each criterion in turn (Figure 7)3. As expected, the KO17 and Moon solution yield Φ = 0 for several systems that are integrated (e.g., that cannot be tensor factorized without changing the underlying dynamics). The “smallest” criterion does little to mitigate the degeneracy. At first glance, however, it appears the “biggest” criterion avoids both of these issues and provides a positive Φ value in all cases. This is an idiosyncrasy of our data set, as selecting the biggest purview element suffers from the same problem as selecting the smallest purview element; namely, degenerate core causes/effects are often the same size. The reason that the “biggest” solution appears to yield unique Φ values for the systems under consideration is due entirely to the ubiquitous use of two-input logic gates (AND, OR,XOR, NOR, etc.) in our corpus. In such cases, the tied purview elements are almost always A, B and AB for which selecting the largest purview element (AB) results in a unique core cause/effect. However, this does not hold in general, as systems comprised of logic gates with more than two inputs (e.g. neurons in the human brain) have entirely different symmetries. As Figure 8 shows, a simple system of majority gates, each with three inputs, is enough to prove that a unique Φ value does not result from the “biggest” criterion. Thus, the fundamental problem remains unaddressed.
5 Discussion
In IIT, the constellation corresponding to the unpartitioned CES is equated with nothing less than subjective experience itself. The geometric shape of this constellation (meaning the shape of the probability distributions that comprise the core cause/effect repertoires) is identified as “what it is like” to be something (c.f. [35]) while the ΦMax value is identified as the overall “level” of consciousness. Our results demonstrate how this structure is underspecified. If held to IIT’s own criteria, Φ cannot be used to measure the contents or quantity of subjective experience. In the words of Moon, “the qualia underdetermination problem shakes IIT to the ground” [14].
It is critical that such challenges be addressed, especially given how IIT is more popular than ever and Φ is being used to bolster arguments about the nature of consciousness in neuroscience and beyond [17, 18, 36]. While it is important to acknowledge that IIT is not necessary tied to its current mathematical formalism, it is equally important to point out that the theory demonstrates several hallmark features of an unscientific handling of ideas to date [37]. The first is the use of an ad hoc procedure to resolve fundamental contradictions inherent in the theory: IIT tries to resolve the undetermined qualia problem with an ad hoc procedure of selecting the smallest or largest purview element. The second is a lack of transparency with regard to the mathematical implementation of the theory: PyPhi essentially acts as a black-box algorithm that buries many of the mathematical problems inherent in the Φ calculation, most notably the arbitrary selection of a unique Φ value. And third, the theory struggles to ground itself experimentally: recent proofs have demonstrated that IIT is either falsified or inherently unfalsifiable depending on the inference procedure that is used to independently test predictions from the theory. In combination, these three features strongly suggest that IIT is on the wrong side of the demarcation problem.
Granted, there is hope that IIT may yet provide a blueprint for a successful scientific theory of consciousness [38, 39]. Culturally, however, what we see is that IIT is presented as a self-consistent theory, without resolving contradictions inherent in its logical foundation that imply it is in fact not self-consistent [14, 40, 41]. On one hand, this could be seen as a positive, as Feyerabend argues that all new theories must proceed counterinductively for a matter of time if they are to succeed. That is, they should go against well-established principles and hold onto assumptions in the face of overwhelming evidence to the contrary [42]. Yet, even Feyerabend is careful to distinguish between scientific and unscientific handling of ideas, stating:
The distinction between the crank and the respectable thinker lies in the research that is done once a certain point of view is adopted. The crank usually is content with defending the point of view in its original, undeveloped, metaphysical form, and he is not prepared to test its usefulness in all those cases which seem to favor the opponent, or even admit that there exists a problem. It is this further investigation, the details of it, the knowledge of the difficulties, of the general state of knowledge, the recognition of objections, which distinguishes the “respectable thinker” from the crank. The original content of his theory does not. [43]
Going forward, it is of paramount importance that proponents of IIT seek to address the difficulties associated with such problems head-on, rather than bypassing them, as the scientific merit of the theory will ultimately depend on it. This consideration is especially important given the negative repercussions that could result from the misapplication of a theory of consciousness in medical, legal, and moral settings.
6 Appendix
6.1 On the Computational Complexity of Φ
For a subsystem of size n, the computational complexity scales as follows. First, one must calculate the cause-effect structure (CES) for every possible partition of the subsystem. If the partition is a bipartition, as is typically assumed [13, 15], the number of ways to do this is S(n, 2), where n is the size of the subsystem and S(n, m) are Stirling numbers of the second kind [44]. However, two small corrections should be considered: first, partitions are unidirectional, and second, the unpartitioned system must also be addressed. The former consideration results in twice as many bipartitions, while the latter results in a single additional partition. Combining these results in a total of 2S(n, 2) + 1 partitions which, for large n, is well approximated as 2S(n, 2). For each CES, there are 2n − 1 potential mechanisms, corresponding to the size of the powerset of elements excluding the empty set. For each potential mechanism, there are purview elements of size k, each of which can be partitioned S(k, 2) times. Therefore, there are elementary distance calculations that must be performed to calculate a single CES, where the additional factor of two is due to the need to optimize ϕmax over both past and future purviews. Putting this together, there are a total of 2S(n, 2 + 1) × (2n − 1) × 2(3n) ≈ 12n elementary distance calculations required to get the system-level integrated information ΦMIP for a given subsystem. For the global system, this calculation must be embedded in an additional optimization corresponding to maximizing over the powerset of all possible subsystems (i.e. ΦMax = max{ΦMIP}). For a global system of size m, there are subsystems of size n, each with 12n elementary distance calculations. Therefore, there are a total of elementary calculations required to find Φmax for a global system of size m. For all but the smallest m values, the computational resources required to actually calculate ΦMax are impossible to realize.
Interestingly, the scaling derived here is in tension with the previously published value of [15]. This could be due to the possibility that the scaling considers all possible partitions, rather than strict bipartitions, or perhaps it resolves the elementary computation in terms of some more fundamental operation (e.g. bit flips). Without additional information, it is difficult to say whether or not either of these considerations could resolve the tension between values. We do note, however, that the published values of t = 1, t = 16, and t = 9900s for n = 3,n = 5, and n = 7 ([15]) are within an order of magnitude of the predicted scaling while off by 40 orders of magnitude from an scaling, though the use of parallelization complicates this point.
6.2 Calculating an Upper Bound on Φ
It is relatively straightforward to calculate a loose upper bound on ΦMIP for a subsystem of size n. To do so, one need only understand the extension of the earth mover’s distance D that is used in the calculation ΦMIP = D(C||C→). By definition, the “earth” being moved is ϕMax between concepts in the unpartitioned CES (C) and the partitioned CES (C→), while the “distance” it is moved is measured by the regular earth mover’s distance between concepts [11]. In light of this, a straightforward upper bound on ΦMIP can be found by asking what the maximum value of ϕMax is for each concept and moving that amount as far away as possible. For a mechanism of size m, its ϕMax value is bounded from above by the maximum value of the regular earth mover’s distance, which is EMDMax(m) = m. It is easy to see this is the case, as EMDMax is achieved when all the probability (p = 1) is moved the maximum Hamming distance (HMax), which is m for a mechanism comprised of m bits. For example, EMDMax for a three-bit mechanism is achieved when p = 1 is moved from state 000 to state 111 (HMax = 3), so ϕmax = EMDMax = 3. Next, we must ask what the maximum distance DMax in conceptual space is that this amount of ϕMax can be moved. Since this distance is again a regular earth mover’s distance, we have DMax = EMDMax(m) = m. Thus, the maximum contribution a mechanism of size m can make to the extended earth mover’s distance D is upper bounded by ϕMax(m)DMax(m) = [EMDMax(m)]2 = m2. Of course, not all mechanisms are the same size, so the total contribution is bounded by the sum of the maximum contribution from mechanisms of each size, namely:
To date, this is the only known upper bound on ΦMax that we are aware of (though bounds on ϕmax and IIT 2.0 are readily available [13, 34, 45–47]), and it is a very loose bound. For a subsystem of size n = 2, as is the case for the AND+OR system we consider in the main text, we have ΦMIP (2) ≤ (1)2 +(1)2 +(2)2 = 6 bits. In practice, we cannot reasonably expect ϕMax = EMDMax for all mechanisms, as the existence of ϕMax = EMDmax for one mechanism almost certainly precludes the existence of ϕMax = EMDMax for another. Likewise, cutting a CES cannot possibly result in a distance of DMax = EMDMax for all concepts, as additional noise cannot be used to increase the fidelity of constraints. At best, it is likely that concepts map to the null concept in the CES of the MIP, corresponding to a maximum distance DMax(n) = n/2. In this case, the bound that results is ΦMIP ≤ 2n−3n(n + 1), which is still likely loose. To tighten it, one must consider the ϕMax values that can result for a system of mechanisms as an ensemble, rather than individually, which is a task that we found quickly became intractable.
6.2.1 Numerical Approach
Fortunately, for our purposes a numerical approach will suffice. Given a small enough system, it is possible to calculate the Φ values for every possible transition probability matrix (TPM) that results from Boolean logic on a two-bit system. Namely, each bit (A and B) takes one of two possible states in response to the global state of the system. This means there are 24 = 16 possible state transitions for each coordinate, for a total of 162 unique TPMs. For each, it is possible to calculate the Φ spectrum that results using the algorithm we describe in the main text. Then, the upper bound on ΦMIP is simply the maximum ΦMIP value over all possible TPMs in all possible initial states. Since the system is only two bits, this bound on ΦMIP is equivalent to the bound on ΦMax, as subsystems must be comprised of at least two bits to generate ΦMIP > 0. Performing this exercise results in the bound ΦMax ≤ 1.5, which is interestingly exactly 1/4 the analytical bound derived in the previous section; as discussed, it is likely that a factor of 1/2 is accounted for if DMax(n) = n/2, while the other factor of 1/2 may be accounted for by the same type of argument applied to ϕMax (rather than DMax). If so, the upper bound on ΦMIP would be 2n−4n(n + 1) and is potentially more tractable to calculate than previously believed.
6.3 The PyPhi-Spectrum Package
The Python package PyPhi (available for download at http://integratedinformationtheory.org) provides all of the basic functionality needed to calculate the spectrum of Φ values for a given system. Namely, it allows the user to calculate purviews, cause/effect repertoires, earth mover’s distance, CES distance (extended earth mover’s distance), and more. In addition, it contains prebuilt classes for data structures such as concepts that are useful in the calculation. Here, we wrap this basic functionality into a modified version of PyPhi called PyPhi-Spectrum that allows the user to calculate all Φ values for a given subsystem with a single function call. To install this package, one can simply download or clone the entire Phi-Spectrum repository (which includes core PyPhi functionality) from https://github.com/elife-asu/pyphi-spectrum.
A pseudo-code overview of the PyPhi-Spectrum wrapper is shown in Algorithm 2, while basic usage is shown in Algorithm 3. Note, the get-phi-spectrum call returns the Φ values that result from all possible concepts for each cut, while the get-phi-MIP call returns the Φ values corresponding to the minimum information partition (i.e. ΦMIP). Optimizing the latter over all possible subsystems would provide ΦMax for a given system. To install the code, download or clone the entire PyPhi-Spectrum repository (which includes core PyPhi functionality) from https://github.com/elife-asu/pyphi-spectrum.
Pseudocode overview of the PyPhi-Spectrum wrapper
Basic Usage for the PyPhi-Spectrum Wrapper
6.4 Additional Details Related to the Calculation of Φ Values
In this section, we provide the transition probability matrices and initial states necessary to replicate our results. The same data can be found in downloadable form via the GitHub repository: https://github.com/elife-asu/pyphi-spectrum.
6.4.1 Photodiode [11, 32]
A photodiode is a simple system of two interacting COPY gates, taking input from one another. It is arguably the simplest “integrated” system one can study, and has been studied in the context of IIT at least twice [11, 32]. Following Chalmers and McQueen [32], we set the initial state of the system to be s0 = 10. The transition probability matrix is given below.
6.4.2 AND+OR [23, 26]
Like the photodiode, the AND+OR system has been studied in the context of IIT at least twice prior to the current work [23, 26]. However, a concrete Φ value has yet to be published. Therefore, we take the “published value” to be that of the PyPhi value found in Section 2. Similarly, we take the initial state to be s0 = 00 in accordance with Section 2. The transition probability matrix is given below.
6.4.3 Hanson and Walker 2020 [23]
This system is a three bit digital counter in the initial state ’101’. The initial state is selected somewhat arbitrarily, since any initial state will work, but s0 = 101 results in a particularly fast evaluation. The TPM, from Figure 4 of the original publication, is as follows:
6.4.4 Majority Gate System
This system is comprised of three interconnected majority gates, each with three inputs, as shown in Figure 8. If the majority of inputs to a given node are 0 the state of the node at the next timestep is 0 and if the majority of inputs to a given node are 1 the state of the node at the next timestep is 1. In the main text, the system is evaluated in initial state s0 = 000. The transition probability matrix is provided below.
6.4.5 Gomez et al. 2021 [20]
This papers studies the p53–Mdm2 biological regulatory network. Typically, this network is multivalued, but there are two possible binarizations that make standard Φ calculations possible. Of these, we chose the Fauré and Kaji binarization as it is much faster to analyze than the Tonello binarization. Following the authors, we choose an initial state s0 = 0001 and use the following TPM. Note, the PyPhi value we compute for this TPM differs from that published by the authors due to their use of several non-standard configuration settings, such as Krohn and Ostwalds definition of Φ as a difference in integrated conceptual information rather than the IIT 3.0 definition.
6.4.6 Farnsworth 2021 [18]
In this paper a virocell (virus infected cell) is introduced into a Boolean network model of host cell dynamics. There are two network models provided, the first consists of five nodes and is the “full system”, while the second consists of three nodes and is the “reduced system”. For both systems, we study the case where all the nodes are ‘ON’ (i.e. s0 = 11111 and s0 = 111, respectively). Following the Supplementary Material provided by Farnsworth, the transition probabilities matrices are given below. Note, in the full system, the second node is an AND gate (as shown in Figure 6 of their main paper [18]) rather than a COPY gate as shown in Figure 8 of their Supplementary Material.
6.4.7 Oizumi et al. 2014 [11]
This is the canonical OR+AND+XOR system that is often used in demonstrating how to calculate Φ [11, 15, 48]. Following Oizumi et al., we take the system to be in the initial state s0 = 100. The transition probability matrix is given below.
6.4.8 Tononi et al. 2016 [22]
This paper demonstrates the calculation of Φ for a simple system of four interacting logic gates: MAJORITY+OR+AND+AND. Following the authors, we use the initial state s0 = 1110. The transition probability matrix is given below.
6.4.9 Hoel et al. 2016 [21]
This paper examines several small Boolean networks at both micro and macro scales. We choose to analyze the smallest of microsystems here, which is a system of four interconnected AND gates with noisy input. Following the authors, we analyze the system in initial state s0 = 0000. Due to the noisy input, the TPM is not deterministic and therefore cannot be written as an N by 2 matrix. Instead, it must be written as an N by N matrix where entry (i, j) specifies the probability of state i transitioning to state j at timestep t + 1 (a standard transition probability matrix). The transition probability matrix is given below.
6.4.10 Marshall et al. 2017 [16]
This model system is the fission yeast cell cycle from Marshall et al. 2017 [16]. As mentioned in the main text, we study the three node subsystem rather than the full eight node subsystem (plus one external node) studied in the original publication. To calculate the spectrum of Φ values for this subsystem (or just the PyPhi Φ value), the TPM for the entire system (all nine nodes) is required. Therefore, there are 512 states in the TPM. Following the authors, the initial state of the system is set to s0 = 000110011. In little-end binary notation (most significant bit on the right), the TPM used is as follows:
Footnotes
↵* jake.hanson{at}asu.edu; sara.i.walker{at}asu.edu
↵1 see e.g., citation tracking statistics for IIT and other theories using the SCOPUS database [7]
↵2 Less than a day on a Mac Pro; 3.5 GHz 6-Core Intel Xeon E5 Processor
↵3 Implementation of the “Smallest”,“Biggest”, and “Moon” solutions are available via keyword arguments in the PyPhi-Spectrum package (see Algorithm 3), while the KO17 solution is available by changing the USE-SMALL-PHI-FOR-CES option in the standard PyPhi configuration file.