## Abstract

The animal gut is a complex ecosystem containing many interacting species. A major objective of microbiota research is to identify the scale at which gut taxa shape hosts. However, most studies focus solely on pairwise interactions and ignore higher-order interactions involving three or more component taxa. Higher-order interactions represent non-additive effects that cannot be predicted from first-order or pairwise interactions.

Studies of higher-order interactions have likely been scarce for several reasons: many host-associated systems are experimentally intractable, gut microbiota are prohibitively species rich, and the influence of any given taxon on hosts is often context-dependent. Furthermore, quantifying emergent effects that represent higher-order interactions, and that are not simply the result of lower-order interactions, presents a combinatorial challenge for which there are few well-developed statistical approaches in host-microbiota studies.

In this perspective, our goal is to quantify emergent higher-order effects and characterize their prevalence in the microbiota. To do so, we adapt a method from evolutionary genetics used to quantify epistatic effects between mutations and use it to quantify the effects of higher-order microbial interactions on host infection risk.

We illustrate this approach by applying it to an *in silico* dataset generated to resemble a population of hosts with gut-associated microbial communities. We assign each host a pathogen load, and then determine how emergent interactions between gut taxa influence this host trait. We find that the effect of higher-order interactions generally increases in magnitude with the number of species in the gut community. Based on the average magnitude of interaction for each order, we find that 9^{th} order interactions have the largest non-linear effect on host infection risk. Our approach illustrates how incorporating the effects of higher-order interactions among gut microbiota can be essential for understanding their effects on host infection risk. We conclude that, insofar as higher-order interactions between taxa may profoundly shape important organismal phenotypes (such as susceptibility to infection), they deserve greater attention in microbiome studies.

## Introduction

Animal guts contain complex microbial communities whose structure and function depend upon the interactions among microbes and the host. Gut microbiota serve as key actors in host health, impacting development, metabolism, and pathogen susceptibility (Brugman et al., 2018). The development of microbe-free (also known as germ-free) model hosts has made it possible to experimentally study how the microbiota influences host susceptibility to infection (Goodman et al., 2011; Ridaura et al., 2013). However, most studies rely on correlations between the relative abundances of individual bacterial taxa and host infection risk (e.g. pathogen load), ignoring the potential influence of higher-order interactions between taxa within the community. The field of complex systems is increasingly interested in understanding the emergent properties of higher-order interactions between objects (Lambiotte, Rosvall, & Scholtes, 2019a). Relatedly, a long-standing issue in ecology is to capture the vast diversity of multispecies interactions, that is, the unpredictable effects that arise when multiple species are present in an ecosystem (Hutchinson, 1962). For example, the order of arrival of species into an ecosystem, and other factors (deterministic or stochastic in nature), can dictate species composition and the overall behavior of the system (Saavedra et al., 2017; Uricchio, Daws, Spear, & Mordecai, 2019). This problem has more recently become the object of inquiry in communities of microbes (Enke et al., 2019; Mickalide & Kuehn, 2019; Sanchez-Gorostiaga, Bajić, Osborne, Poyatos, & Sanchez, 2018). Ecological studies involving complex network structures typically focus on pairwise interactions and tend to ignore higher-order effects among three or more components (Kareiva, 1994; Levine, Bascompte, Adler, & Allesina, 2017; Mayfield & Stouffer, 2017).
For example, in a system with two interacting microbes, *A* and *B*, the addition of a third microbe *C* may alter the pairwise interaction between *A* and *B* in a non-linear or non-intuitive fashion. This would constitute an emergent higher-order interaction between *A*, *B* and *C*. This is in contrast to a scenario where the microbe *C* interacts with either *A* or *B* in isolation, which would constitute pairwise interactions with their own interaction effects. Therefore, quantifying emergent higher-order effects between microbial taxa is necessary to fully capture the structure and dynamics of biological systems.
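
One common way to make this distinction concrete is sketched below. The notation is an assumption introduced only for illustration and does not appear in the studies cited: write $p_S$ for the host trait measured when exactly the set of taxa $S$ is present, and $p_{\varnothing}$ for the trait of a host carrying none of them. The three-way term is what remains of the three-taxon measurement after every lower-order contribution has been removed by inclusion-exclusion:

$$\varepsilon_{ABC} = p_{ABC} - p_{AB} - p_{AC} - p_{BC} + p_{A} + p_{B} + p_{C} - p_{\varnothing}$$

If $\varepsilon_{ABC} = 0$, the trait of the three-taxon community can be predicted from the single-taxon and pairwise measurements alone. A non-zero value indicates an emergent higher-order interaction of the kind described above.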

Higher-order interactions have recently been the object of study in the realm of genetics, where they are discussed in light of epistasis, or non-linear interactions between genes and mutations (Mackay & Moore, 2014; Weinreich, Lan, Jaffe, & Heckendorn, 2018a; Weinreich, Lan, Wylie, & Heckendorn, 2013). A useful non-technical definition of epistasis is the “surprise at the phenotype when mutations are combined, given the constituent mutations’ individual effects” (Weinreich, Lan, Jaffe, & Heckendorn, 2018b). This effectively captures what makes epistasis a provocative concept: the notion that interacting objects or parcels can have effects that are non-additive. In particular, higher-order epistasis is of interest, as it comprises all of the complexity and challenges of understanding and studying higher-order interactions in other systems (Lambiotte et al., 2019a).

Higher-order epistasis can have powerful effects on organismal phenotypes, which has complicated the genotype-phenotype mapping problem in genetics (Sackton & Hartl, 2016). To study higher-order epistasis in model organisms, molecular biologists engineer genes and mutations of interest in all possible combinations, a method labeled the “combinatorial approach” (Weinreich et al., 2018b, 2013). Other studies resolve higher-order epistasis through more advanced statistical methods (Guerrero, Scarpino, Rodrigues, Hartl, & Ogbunugafor, 2019; Otwinowski, McCandlish, & Plotkin, 2018; Poelwijk, Krishna, & Ranganathan, 2016; Sailer & Harms, 2017).

Insect gut microbiota have been used as model systems to study the formation and assembly of microbial communities. Insect guts harbor relatively few microbial species compared to higher eukaryote hosts, with restricted core members that can be grown axenically and manipulated genetically (Zheng, Steele, Leonard, Motta, & Moran, 2018). The protective function of microbes against invading pathogens has been studied across a range of insect hosts. For example, previous studies with bees found that core gut species were associated with increased host health, while non-core taxa were associated with decreased host health and increased pathogen infection (Cariveau, Elijah Powell, Koch, Winfree, & Moran, 2014; Koch & Schmid-Hempel, 2011; Raymann & Moran, 2018). However, other studies have also shown that pathogens alter the gut microbiota and facilitate gut infections (Abraham et al., 2017; Wei et al., 2017). Although many studies have shown correlations between core species and host traits, the extent to which individual species, as opposed to interactions among species, facilitate or resist gut infections remains understudied.

Not unlike genomes, societies, or neural circuits, insect gut microbiomes are complex systems defined by the interaction between individual parcels (component taxa in the microbiota). Consequently, we might predict that higher-order interactions between taxa in the microbiota underlie microbiota-associated organismal phenotypes, such as susceptibility to infection. Recent work by Gould et al. (2018) found that higher-order interactions in the gut microbiota impact lifespan, fecundity, development time, and bacterial composition of *Drosophila* sp. With a gut community composed of 5 core taxa, they found that three-way, four-way, and five-way interactions accounted for 13-44% of all possible cases depending on the host trait. Yet, lower-order (pairwise) interactions still accounted for at least half of all the observed phenotypes in the system (Gould et al., 2018).

Studies like Gould et al. (2018) provide an example of how higher-order interactions can be measured and suggest that they might be relevant for understanding how taxa influence certain phenotypes. But while the importance of diversity and host interactions is clear, no studies have attempted to specifically disentangle effects beyond four or five-way interactions. One major barrier to more of these studies is the paucity (or non-existence) of datasets structured like those in an evolutionary genetics framework, such that existing statistical methods might be used to resolve interactions (Tekin, Savage, & Yeh, 2017; Wood, Nishida, Sontag, & Cluzel, 2012). For example, the number of insects needed to carry every combination of constituent taxa of interest grows exponentially with the number of taxa. And (perhaps) unlike genetics, constructing a different insect for each possible combination of bacterial taxa is a non-trivial technical challenge. Nonetheless, the use of combinatorially complete datasets (insects containing all combinations of taxa) to explore higher-order interactions (beyond a single taxon or pairwise interactions) could help to inform how taxa interact in shaping organismal phenotypes.

In this commentary, we propose a theoretical examination of higher-order interactions in the gut microbiome. Specifically, we employ the Walsh-Hadamard transform (WHT), a mathematical method that has been used to demonstrate how higher-order interactions between mutations influence fitness or other organismal traits (Poelwijk et al., 2016; Weinreich et al., 2013), to explore how higher-order interactions among gut taxa can influence host infection risk. We use it to quantify higher-order interactions in an *in silico* dataset resembling the type of data that could, in the future, be collected empirically from insect guts. We introduce this approach with the hope that it may eventually be applied to a tractable experimental system for real-world validation, and believe that insect systems are among the most promising empirical systems.

## Methods

The Walsh-Hadamard Transform allows one to quantify the magnitude of interaction effects of different orders in a system of potentially interacting objects or parcels. It yields Walsh coefficients, each of which communicates the magnitude and sign of how a particular interaction influences an output of interest. The method takes the phenotypic values in the form of a vector and multiplies that vector by a Hadamard matrix, with the result then scaled by a diagonal matrix. The output is a collection of coefficients that measure the degree to which the map is linear (first order), second order, third order, and so forth. We provide a brief primer on the method in the Supplemental Information, and refer readers to two published manuscripts, Poelwijk et al. (2016) and Weinreich et al. (2013), that outline and apply the method in detail.

The Walsh-Hadamard Transform relies on the existence of combinatorially complete data sets, in which the objects whose interactions we want to understand (taxa in this study) are constructed in all possible combinations. A further limitation of the WHT is that it can only accommodate two variants per site, that is, two states per actor. In the case of taxa, we can think of this in terms of the presence/absence of a certain taxon, encoded as 0 (absence) or 1 (presence). For each hypothetical insect with a different presence/absence combination, we have a corresponding phenotypic measurement (e.g. parasite load). For example, if we wanted to measure the higher-order interactions between 4 taxa within an insect with regard to their role in parasite load (as a model phenotype), we would need 2^{L} = 2^{4} = 16 individual measurements (insects in this case), with *L* corresponding to the number of different taxa whose effects we are interested in disentangling. We can encode this combination of 4 taxa in bit string notation (see Figure 1).

Each site (0 or 1) in the string corresponds to the absence or presence of a given taxon in a given insect. This notation lets us keep track of which taxa are present in each insect for which we have a phenotypic measurement, and can be used to construct a vector of values. For example, the string 1010 corresponds to an insect with the pattern of present (1), absent (0), present (1), absent (0). The full data set includes a vector of phenotypic values for all possible combinations of taxa: 0000, 0001, 0010, 0100, 1000, 0011, 0101, 0110, 1001, 1010, 1100, 0111, 1101, 1011, 1110, 1111. Note that these can be divided into different classes based on the “order” of the interaction, where order corresponds to the number of interacting actors. “Zeroth order” corresponds to the 0000 variant, an insect that has none of the taxa present. There are four 1^{st} order terms (0001, 0010, 0100, 1000), six 2^{nd} order (or pairwise) interactions (0011, 0101, 0110, 1001, 1010, 1100), four 3^{rd} order interactions (0111, 1101, 1011, 1110), and one 4^{th} order interaction (1111). The WHT will quantify the effect associated with each of these combinations.
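
The grouping of presence/absence strings by order can be reproduced with a few lines of Python. This is a minimal sketch for illustration only; the function name `bitstrings_by_order` is hypothetical and is not taken from the authors' repository.

```python
from collections import defaultdict
from itertools import product


def bitstrings_by_order(n_taxa):
    """Group every presence/absence string of length n_taxa by its order.

    The order of a string is the number of taxa present, i.e. the number of 1s.
    (Hypothetical helper, written for illustration.)
    """
    groups = defaultdict(list)
    for bits in product("01", repeat=n_taxa):
        string = "".join(bits)
        groups[string.count("1")].append(string)
    return dict(groups)


groups = bitstrings_by_order(4)
for order in sorted(groups):
    # Prints the order, how many strings belong to it, and the strings themselves.
    print(order, len(groups[order]), groups[order])
```

For four taxa this yields 1, 4, 6, 4, and 1 strings at orders 0 through 4, matching the classes listed above.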

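This vector of phenotypic values for the 16 insects is multiplied by a (16 × 16) square matrix, which is the product of a diagonal matrix *V* and a Hadamard matrix *H*. These matrices are defined recursively (following Poelwijk et al., 2016) by:

$$V_{n+1} = \begin{pmatrix} \tfrac{1}{2} V_n & 0 \\ 0 & -V_n \end{pmatrix}, \qquad V_0 = 1 \tag*{[1]}$$

$$H_{n+1} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}, \qquad H_0 = 1 \tag*{[2]}$$

Here *n* is the number of loci (taxa in this study; *n* = 4 in this hypothetical example). Writing *p* for the vector of phenotypic values, this matrix multiplication gives an output:

$$y = V H p \tag*{[3]}$$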

Here *V* and *H* are the matrices described in [1] and [2] above, and *y* is the Walsh coefficient, the measure of the interaction between parcels of information in a string. Using this, we compute *y* values for every possible interaction between bits in a given string. The *in silico* generated data discussed in this commentary are composed of 10-bit strings, with each bit corresponding to the presence/absence of a different microbial taxon. Such a case has 2^{10} = 1024 total combinations of taxa, and corresponding phenotypic measurements (parasite load).
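
The calculation can be sketched in plain Python as follows. The sketch assumes the recursive definitions of *V* and *H* given by Poelwijk et al. (2016), and it assumes that the phenotype vector is ordered by the integer value of each bit string (00, 01, 10, 11 for two taxa). The function names and the toy parasite loads are hypothetical and are not the authors' code or data.

```python
def hadamard(n):
    """Build the 2^n x 2^n Hadamard matrix from H_{k+1} = [[H, H], [H, -H]], H_0 = [1]."""
    H = [[1]]
    for _ in range(n):
        top = [row + row for row in H]
        bottom = [row + [-x for x in row] for row in H]
        H = top + bottom
    return H


def v_diagonal(n):
    """Return the diagonal of V from V_{k+1} = diag(V_k / 2, -V_k), V_0 = [1]."""
    v = [1.0]
    for _ in range(n):
        v = [0.5 * x for x in v] + [-x for x in v]
    return v


def walsh_coefficients(phenotypes):
    """Compute V * H * p for a phenotype vector of length 2^n.

    Assumes phenotypes[i] belongs to the bit string whose integer value is i.
    """
    size = len(phenotypes)
    n = size.bit_length() - 1
    if size < 1 or 2 ** n != size:
        raise ValueError("need one phenotype per combination (length 2^n)")
    H = hadamard(n)
    v = v_diagonal(n)
    return [v[i] * sum(h * p for h, p in zip(H[i], phenotypes))
            for i in range(size)]


# Toy two-taxon example (made-up loads): baseline 1, taxon effects +2 and +3.
additive = [1, 4, 3, 6]      # strings 00, 01, 10, 11; purely additive
synergy = [1, 4, 3, 11]      # same, but 11 carries an extra +5
print(walsh_coefficients(additive))  # [3.5, 3.0, 2.0, 0.0]
print(walsh_coefficients(synergy))   # [4.75, 5.5, 4.5, 5.0]
```

In the additive case the pairwise coefficient (last entry) is zero, while the extra +5 in the second case appears directly as a pairwise coefficient of 5. Under these conventions the first-order coefficients are the effects of each taxon averaged over the backgrounds in which it can appear.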

Similar to the 4-bit string example used to explain the method, each order has a different number of possible combinations, that is, a different number of insects that can carry a combination of interacting taxa of that order. These are as follows: 0^{th} = 1; 1^{st} = 10; 2^{nd} = 45; 3^{rd} = 120; 4^{th} = 210; 5^{th} = 252; 6^{th} = 210; 7^{th} = 120; 8^{th} = 45; 9^{th} = 10; 10^{th} = 1. The methods offered here measure every one of these interactions (e.g. all 210 of the possible 6^{th} order interactions) between taxa. While our use of a 10-bit string (as opposed to an 8- or 15-bit string) is rather arbitrary, it is meant to highlight the vastness of the higher-order interaction problem: even if we suspect that only 10 taxa are meaningfully influencing a phenotype of interest (many studies contain more), the number of possible ways that these species can interact, and the number of measurable coefficients between them, is already very large and doubles with every added taxon.
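
These counts are binomial coefficients, and a short check in Python reproduces them. The variable names are illustrative only.

```python
from math import comb

n_taxa = 10

# Number of presence/absence strings with exactly k taxa present, for k = 0..10.
counts = [comb(n_taxa, k) for k in range(n_taxa + 1)]

print(counts)       # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(counts))  # 1024
```

The counts sum to 2^{10} = 1024, one for each combination of taxa in the dataset.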

Having outlined the method used to quantify higher-order interactions, it is important to explain the presumptive biological interpretation of the values. The WHT returns a Walsh coefficient for each interaction, and these coefficients can be summarized by “order” of interaction. The magnitude of the coefficients at a given order corresponds to the relative strength or importance of that order in the phenotype being measured. Therefore, the Walsh-Hadamard Transform can help to interpret the overall presence and magnitude of higher-order interactions between taxa in a microbiota.

## Results

Figure 2 depicts an *in silico* generated collection of 1024 insects, each containing one of the combinations of 10 taxa (2^{10} = 1024), organized into a fitness graph (see Supplemental Information for details on the *in silico* code and dataset). Each individual also has a parasite load. While other statistical methods may not require all possible combinations of taxa in order to extract meaningful information on the magnitude of higher-order interactions, creating the combinatorial set demonstrates the size and shape of the problem: all of the possible ways that taxa could interact.

Figure 3 depicts the raw calculations of the Walsh coefficients for all of the higher-order interactions (orders 2–9). Here we observe that the magnitude and direction of the interaction effect (Walsh coefficient) vary across different combinations of taxa. That the Walsh-Hadamard Transform can disentangle these types of effects is a feature of the calculation and reveals the possibilities that exist in complex systems, like the microbiota, where many different objects are interacting. Importantly, an interaction depends on the specific identity of the taxa present. We cannot assume that, for example, all third-order interactions (interactions between three taxa) will have the same magnitude or direction (e.g. positive or negative).

Figure 4 shows the sum of the absolute values of the interaction coefficients highlighted in Figure 3. Here, we can observe the raw magnitude (leaving the sign, positive or negative, aside) of higher-order interactions as a function of interaction order. Between 1^{st} and 9^{th} order, higher-order effects increase, suggesting that they become more *meaningful* with the number of interacting microbes. Without knowing the specific mechanism at work, determining the mean magnitude of coefficients provides relevant information on the prominence of a given order in the microbiota. For example, in our *in silico* microbiota, 9^{th} order interactions have the highest magnitude relative to other orders (Figure 4). As this is a theoretical, *in silico* generated microbiota, we can interpret this finding as meaning that 9^{th} order interactions contain the largest average deviation from additivity. That is, knowledge of how any given 9 taxa will interact requires very specific information on the identity of which 9 taxa are interacting. This is a characteristic of a highly non-linear, complex system.
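
A per-order summary of this kind can be sketched as follows. The sketch assumes the coefficients are indexed by the integer value of their bit string, so the order of a coefficient is the number of 1s in its index. It reports both the sum and the mean of the absolute values, since both summaries are mentioned above. The function name and the toy coefficients are hypothetical.

```python
from collections import defaultdict


def magnitude_by_order(coefficients):
    """Summarize absolute Walsh coefficients by interaction order.

    Assumes coefficients[i] belongs to the bit string with integer value i,
    so its order is the number of 1 bits in i. (Hypothetical helper.)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for index, value in enumerate(coefficients):
        order = bin(index).count("1")
        totals[order] += abs(value)
        counts[order] += 1
    return {
        order: {"sum": totals[order], "mean": totals[order] / counts[order]}
        for order in sorted(totals)
    }


# Toy two-taxon coefficients (made-up values) for strings 00, 01, 10, 11.
toy = [4.75, 5.5, -4.5, 5.0]
summary = magnitude_by_order(toy)
for order, stats in summary.items():
    print(order, stats)
```

Because the number of coefficients differs between orders, the sum and the mean can rank orders differently, so it is worth stating which summary a figure reports.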

Note that all of these values (the raw *in silico* parasite load data, the interaction coefficients for all individual interactions, and the scaled, absolute value coefficients) can be found in the Supplemental Information.

## Discussion

In this commentary, we explore the possibility of higher-order interactions between taxa composing an insect gut microbiota. Using *in silico* and applied mathematical approaches, we demonstrate how higher-order interactions can be measured in a complex system of interacting microbial taxa. In our theoretical scenario, higher-order interactions are present and generally increase in relevance with the order of interaction. Though our results are theoretical, they are results nonetheless (Goldstein, 2018): they highlight the vast scope of the higher-order interaction problem and outline one method that can be used to deconstruct such interactions in biological systems. Though empirical data of the size and scope used in this study are currently challenging to generate, this intractability may be temporary, and future methods may permit the generation of data similar in structure to those explored in our theoretical examination.

The approach used in this study, the Walsh-Hadamard Transform, has been previously used by theoretical population geneticists to measure non-linear interactions between mutations (Weinreich et al., 2013). Several empirical data sets in genetics and genomics have demonstrated that the sign of interaction effects can change readily with the identity of the interacting parcels (Guerrero et al., 2019; Weinreich et al., 2018a, 2013). Given this, we predict that the taxa that compose the gut microbiota might be similarly defined by higher-order interactions. The capacity for measuring the effects of higher-order interactions on host fitness is an important step towards understanding the effects of microbiota on their host. Indeed, considering higher-order interactions can yield more robust information on non-linear interactions in microbiome communities.

We found that higher-order interactions were present, and that taxa interacted both positively and negatively. Higher-order effects are positive when the combined effect of taxa is augmented compared to what is expected from their individual effects. In contrast, higher-order effects are negative when combined interactions among taxa show a diminished return and are less fit than would be expected from their individual effects (Figure 3). Such combinatorially complete datasets can tell us at what scale microbial interactions matter in predicting host infection. Moreover, they reveal patterns of interactions, particularly those combinations that interact synergistically or antagonistically (Hartl, 2014). One potential limitation of the outlined approach is the requirement for combinatorially complete datasets. For high-diversity microbiomes, including those of humans and plants, it is not currently feasible to carry out experiments measuring phenotypes for all the possible microbial interactions.

Microbe-mediated protection against pathogens depends on subtle differences in gut community structure. In North American wild bumble bees, lower *Crithidia* parasite infection loads are associated with higher microbiota diversity. Using transplants to naive hosts, it was shown that the core gut bacteria were responsible for conferring resistance to the *Crithidia* parasite, while non-core gut bacteria were found to be less effective against the parasite (Mockler, Kwong, Moran, & Koch, 2018). In mosquitoes, gut bacterial species can trigger an immune defense against *Plasmodium* parasites, the causative agent of malaria (Bahia et al., 2014). In sandflies, highly diverse midgut microbiota were found to be negatively correlated with the parasite that causes the vector-borne disease leishmaniasis (Kelly et al., 2017). While these studies did not investigate the effects of higher-order interactions on host fitness, future experimental studies manipulating microbial communities should consider combinatorial designs.

Recent theoretical work suggests that higher-order modeling approaches are able to capture volumes of rich data arising from complex ecological interactions (Lambiotte, Rosvall, & Scholtes, 2019b). In this perspective, we have adapted approaches from population genetics to the study of host-associated microbiota. Applying these methods to the analysis of real experiments will yield important insight into microbiome dynamics, towards a richer understanding of just how peculiar the microbiota is, and the many meaningful interactions that it embodies.

## Data Availability

The *in silico* data used in this study and the code used to generate them can be found on GitHub: https://github.com/OgPlexus/MicrobeTaxa1

## Supplemental Information

The authors have prepared a simple mathematical primer on the Walsh-Hadamard Transform: https://github.com/OgPlexus/MicrobeTaxa1. For a more rigorous understanding, readers are encouraged to engage the works cited in this manuscript.

## Acknowledgements

We wish to acknowledge the organizers and participants of the 2017 RCN-IDEAS arbovirus workshop held in New Orleans. SY acknowledges funding support from NSF Postdoctoral Fellowship award number 1612302. CBO acknowledges funding support from NSF RII Track-2 FEC award number 1736253. The authors would like to thank Victor Meszaros and Miles Miller-Dickson for their input on the *in silico* data, figures and Walsh-Hadamard primer. We finally thank Lawrence Uricchio for constructive feedback on our manuscript.