Symmetry, gauge freedoms, and the interpretability of sequence-function relationships

Quantitative models that describe how biological sequences encode functional activities are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e., directions in parameter space that do not affect model predictions. In physics, gauge freedoms arise when physical theories are formulated in ways that respect fundamental symmetries. However, the connections between the gauge freedoms of models of sequence-function relationships and the symmetries of sequence space have yet to be systematically studied. Here we study the gauge freedoms of models that respect a specific symmetry of sequence space: the group of position-specific character permutations. We find that gauge freedoms arise when model parameters transform under redundant irreducible matrix representations of this group. Based on this finding, we describe an "embedding distillation" procedure that enables analytic calculation of the number of independent gauge freedoms, as well as efficient computation of a sparse basis for the space of gauge freedoms. We also study how parameter transformation behavior affects parameter interpretability. We find that in many (and possibly all) nontrivial models, the ability to interpret individual model parameters as quantifying intrinsic allelic effects requires that gauge freedoms be present. This finding establishes an incompatibility between two distinct notions of parameter interpretability. Our work thus advances the understanding of symmetries, gauge freedoms, and parameter interpretability in sequence-function relationships.

Another method limits the mathematical models that one uses to models that do not have any gauge freedoms (19-24). But despite being frequently encountered in the course of research, the gauge freedoms present in models of sequence-function relationships have received only limited attention (e.g., 3, 5-7, 12, 25). In particular, the mathematical properties of these gauge freedoms have yet to be systematically studied.

In physics, by contrast, gauge freedoms are a topic of fundamental importance (26). Gauge freedoms are well known to arise when a physical theory is expressed in a form that manifestly respects fundamental symmetries. For example, the classical theory of electricity and magnetism (E&M) is invariant to Lorentz transformations, i.e., relativistic changes in an observer's velocity (27). Lorentz invariance is obscured, however, when the equations of E&M are expressed directly in terms of electric and magnetic fields. To express these equations in a form that is manifestly Lorentz invariant, one must instead formulate them in terms of an electromagnetic four-potential. Doing this introduces gauge freedoms because the […]

[…] relationships are linear in $\vec{\theta}$ and thus have the form

$$f(s) = \sum_{i=1}^{M} \theta_i x_i(s), \qquad [1]$$

for all $s \in S$. Here, $\vec{x}(\cdot)$ is an $M$-dimensional vector of sequence […]

[…] Here, $l, l' \in \{1, \ldots, L\}$ index the positions within each se-

† Our analysis is readily extended to models in which $\vec{x}$ and $\vec{\theta}$ are complex vectors (e.g., 24). All the results in Supplemental Information are, in fact, derived for this more general class of models. Here we restrict our discussion to the reals only to simplify the presentation.
$x_l^{c}(s)$, which equal one if $s_l = c$ and equal zero otherwise (where $s_l$ denotes the character at position $l$ in sequence $s$); and pairwise one-hot features, $x_{ll'}^{cc'}(s)$, which equal one if both $s_l = c$ and $s_{l'} = c'$, and which equal zero otherwise.
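As a minimal sketch (Python with NumPy; the function names and the feature ordering are illustrative choices, not taken from the paper), the constant, additive, and pairwise one-hot features can be computed for a sequence and combined with a parameter vector to evaluate the linear model $f(s) = \sum_i \theta_i x_i(s)$. Positions are 0-indexed in the code.

```python
from itertools import combinations

import numpy as np


def pairwise_onehot_embedding(seq, alphabet="ABC"):
    """Return x_pair^ohe(seq) as a NumPy vector.

    Ordering (an illustrative choice): constant feature x_0, then additive
    features x_l^c, then pairwise features x_{ll'}^{cc'} for l < l'.
    """
    L = len(seq)
    feats = [1.0]  # constant feature x_0
    for l in range(L):  # additive one-hot features x_l^c
        feats += [1.0 if seq[l] == c else 0.0 for c in alphabet]
    for l, lp in combinations(range(L), 2):  # pairwise features, l < l'
        feats += [1.0 if (seq[l] == c and seq[lp] == cp) else 0.0
                  for c in alphabet for cp in alphabet]
    return np.array(feats)


def linear_model(seq, theta, alphabet="ABC"):
    """Evaluate f(s) = sum_i theta_i x_i(s)."""
    return float(theta @ pairwise_onehot_embedding(seq, alphabet))


L, alpha = 3, 3
M = 1 + alpha * L + (L * (L - 1) // 2) * alpha ** 2  # number of parameters
x = pairwise_onehot_embedding("ABC")
# One active feature per position and per pair, plus x_0.
print(len(x), M, int(x.sum()))
```

Each sequence activates exactly one additive feature per position and one pairwise feature per pair of positions, which is the source of the linear dependencies discussed next.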

Gauge freedoms.
Gauge freedoms are transformations of model parameters that do not affect model predictions. Formally, a gauge freedom is any vector $\vec{g} \in \mathbb{R}^M$ that satisfies

$$\vec{g}^{\top}\vec{x}(s) = 0 \quad \text{for all } s \in S,$$

so that $\vec{\theta}$ and $\vec{\theta} + \vec{g}$ yield identical predictions. For linear sequence-function relationships, the set of gauge freedoms, denoted by $G$, forms a vector space in $\mathbb{R}^M$. It is readily shown that $G$ is the orthogonal complement of the space spanned by sequence embeddings (31). In what follows, we use $\gamma$ to represent the dimension of $G$, i.e., the number of (independent) gauge freedoms.
Gauge freedoms arise from linear dependencies among sequence features. By inspection we see that $f_{\text{pair}}^{\text{ohe}}$ has $M_{\text{pair}}^{\text{ohe}} = 1 + \alpha L + \binom{L}{2}\alpha^2$ parameters. However, the space spanned by the corresponding embedding, $\vec{x}_{\text{pair}}^{\text{ohe}}$, has only $1 + (\alpha - 1)L + \binom{L}{2}(\alpha - 1)^2$ dimensions. This difference reflects the presence of $L + \binom{L}{2}(2\alpha - 1)$ constraints on the features: $x_0 = \sum_{c'} x_l^{c'}$ for all positions $l$ (1 constraint per position), and both $x_l^{c} = \sum_{c'} x_{ll'}^{cc'}$ and $x_{l'}^{c'} = \sum_{c} x_{ll'}^{cc'}$ for all pairs of positions $l < l'$ ($2\alpha - 1$ independent constraints per pair of positions). The model $f_{\text{pair}}^{\text{ohe}}$ therefore has $\gamma_{\text{pair}}^{\text{ohe}} = L + \binom{L}{2}(2\alpha - 1)$ gauge freedoms. See our companion paper (31) for details, as well as (3, 5, 7, 10) for earlier treatments of gauge freedoms in the pairwise one-hot model.
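These counts can be checked numerically. The sketch below (Python/NumPy; helper names are hypothetical) stacks the pairwise one-hot embeddings of all $\alpha^L$ sequences into a matrix $X$. The rank of $X$ is the dimension of the space spanned by the embeddings, and, since $G$ is the orthogonal complement of that span, a basis for $G$ is given by the right singular vectors of $X$ beyond its rank.

```python
from itertools import combinations, product

import numpy as np


def embed(seq, alphabet):
    """Pairwise one-hot embedding: constant, additive, then pairwise (l < l') features."""
    L = len(seq)
    x = [1.0]
    for l in range(L):
        x += [float(seq[l] == c) for c in alphabet]
    for l, lp in combinations(range(L), 2):
        x += [float(seq[l] == c and seq[lp] == cp) for c in alphabet for cp in alphabet]
    return np.array(x)


def gauge_space(L, alpha):
    """Return the embedding matrix X, its rank, and an orthonormal basis of G (as rows)."""
    alphabet = "ABCDEFGHIJ"[:alpha]
    X = np.array([embed(s, alphabet) for s in product(alphabet, repeat=L)])
    rank = np.linalg.matrix_rank(X)
    # Right singular vectors beyond the rank span the null space of X,
    # i.e., the orthogonal complement of the span of the embeddings.
    _, _, Vt = np.linalg.svd(X)
    return X, rank, Vt[rank:]


for L, alpha in [(2, 2), (3, 3), (4, 2)]:
    X, rank, G = gauge_space(L, alpha)
    n_pairs = L * (L - 1) // 2
    print(L, alpha, X.shape[1], rank, len(G), L + n_pairs * (2 * alpha - 1))
```

The SVD basis is dense; it is useful only as a numerical check of $M$, of the dimension of the span, and of $\gamma$.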

Fixing the gauge.
Fixing the gauge is the process of removing gauge freedoms by restricting $\vec{\theta}$ to a subset $\Theta$ of parameter space called "the gauge". Linear gauges are choices of $\Theta$ that are vector spaces. One useful property of linear gauges is that gauge fixing can be accomplished through linear projection. Specifically, for any linear gauge $\Theta$, there exists a projection matrix $P$ that projects each parameter vector $\vec{\theta} \in \mathbb{R}^M$ onto an equivalent parameter vector $\vec{\theta}_{\text{fixed}} \in \Theta$ via $\vec{\theta}_{\text{fixed}} = P\vec{\theta}$. Our companion paper describes a parametric family of linear gauges (including an explicit formula for the corresponding projection matrices) that includes as special cases many of the most commonly used gauges in the literature (31).
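One simple linear gauge, used here purely for illustration and not claimed to be a member of the companion paper's family, is $\Theta = G^{\perp}$, the orthogonal complement of the gauge space. Its projection matrix is $P = X^{+}X$, where $X$ stacks the embeddings of all sequences and $X^{+}$ is the Moore-Penrose pseudoinverse. The sketch below (Python/NumPy; names hypothetical) uses the additive one-hot model to keep the example small.

```python
from itertools import product

import numpy as np


def additive_onehot(seq, alphabet):
    """x^triv (+) x_1^ohe (+) ... (+) x_L^ohe, as a concatenated vector."""
    x = [1.0]
    for ch in seq:
        x += [float(ch == c) for c in alphabet]
    return np.array(x)


alphabet, L = "ABC", 2
X = np.array([additive_onehot(s, alphabet) for s in product(alphabet, repeat=L)])

# Orthogonal projector onto the row space of X, i.e., onto the complement of G.
P = np.linalg.pinv(X) @ X

rng = np.random.default_rng(0)
theta = rng.normal(size=X.shape[1])
theta_fixed = P @ theta

print(np.allclose(X @ theta_fixed, X @ theta))  # predictions are unchanged
```

Because $XP = X$, gauge fixing never changes predictions, and because $P$ annihilates every vector in $G$, all parameter vectors that differ by a gauge freedom are mapped to the same $\vec{\theta}_{\text{fixed}}$.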

Results
We begin this section by defining a specific symmetry group of sequence space (the group of position-specific character permutations) as well as the notion of model equivariance under this group. We then illustrate, for two example models, how model transformation behavior under this symmetry group impacts gauge freedoms and parameter interpretability. Next we formally investigate this relationship using methods from the theory of group representations (32). In doing so, we establish an "embedding distillation" procedure that, for any equivariant model, enables analytic calculation of the number of gauge freedoms and efficient computation of a sparse basis for the space of gauge freedoms. We conclude by revisiting the issue of parameter interpretability in light of these results.
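Position-specific character permutations. Different transformations of sequence space impact models of sequence-function relationships in different ways. Here we focus on a specific kind of transformation: position-specific character permutations, i.e., transformations that permute the characters at each position of a sequence, with the permutation allowed to differ from one position to the next.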
These transformations of sequence space form a mathematical group, which we denote by $H_{\text{PSCP}}$. The action of a transformation $h \in H_{\text{PSCP}}$ on a sequence $s \in S$ is written $hs$. $H_{\text{PSCP}}$ is a symmetry group of sequence space in that transformations in $H_{\text{PSCP}}$ preserve the Hamming distances between sequences.
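For small examples the group can be enumerated directly. In the sketch below (Python; names hypothetical), an element $h$ is a tuple of one alphabet permutation per position, so that $|H_{\text{PSCP}}| = (\alpha!)^L$, and the check confirms that every $h$ preserves Hamming distances.

```python
from itertools import permutations, product


def pscp_group(alphabet, L):
    """Enumerate H_PSCP: one (independently chosen) permutation of the alphabet per position."""
    perms = [dict(zip(alphabet, p)) for p in permutations(alphabet)]
    return list(product(perms, repeat=L))


def act(h, seq):
    """Apply h = (pi_1, ..., pi_L): the character c at position l becomes pi_l(c)."""
    return "".join(pi[c] for pi, c in zip(h, seq))


def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))


alphabet, L = "ABC", 2
H = pscp_group(alphabet, L)
seqs = ["".join(s) for s in product(alphabet, repeat=L)]
preserved = all(hamming(act(h, s), act(h, t)) == hamming(s, t)
                for h in H for s in seqs for t in seqs)
print(len(H), preserved)
```

Distances are preserved because each per-position permutation is a bijection, so two sequences agree at a position after the transformation exactly when they agreed before it.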
There are other symmetry groups of sequence space as well, but we ultimately find that these symmetry groups do not have the same connections to gauge freedoms that $H_{\text{PSCP}}$ does (discussed below and in SI Sec. 7).
Equivariance. We also focus on linear models of sequence-function relationships for which $H_{\text{PSCP}}$ induces linear transformations of the embeddings and parameters. These linear transformations are called "representations" (32). In general, a representation $R$ of a group $H$ is a function that maps each […]

In what follows, we say that an embedding $\vec{x}$ is equivariant in $H$ if and only if there is a representation $R$ such that

$$\vec{x}(hs) = R(h)\,\vec{x}(s) \qquad [6]$$

for all $h \in H$ and all $s \in S$. We also say that a model is equivariant if and only if it has an equivariant embedding. For an equivariant model whose embedding transforms as in Eq. 6, the transformation of $S$ by any $h \in H$ can be compensated for by the transformation of $\vec{\theta}$ by $R(h)^{-1\top}$, in the sense that

$$\left(R(h)^{-1\top}\vec{\theta}\right)^{\top}\vec{x}(hs) = \vec{\theta}^{\top}\vec{x}(s)$$

for every $s \in S$ and every $\vec{\theta} \in \mathbb{R}^M$ (see SI Sec. 3.2). This follows directly from Eq. 6, since $\left(R(h)^{-1\top}\vec{\theta}\right)^{\top}R(h)\,\vec{x}(s) = \vec{\theta}^{\top}\vec{x}(s)$. Although linear models of sequence-function relationships can be equivariant in a variety of symmetry groups $H$, we use the term "equivariant" to specifically refer to equivariance under $H_{\text{PSCP}}$ unless otherwise noted.
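The compensation property can be checked numerically. The sketch below (Python/NumPy; all names hypothetical) uses the additive one-hot embedding $\vec{x}^{\text{triv}} \oplus \vec{x}_1^{\text{ohe}} \oplus \cdots \oplus \vec{x}_L^{\text{ohe}}$, for which $R(h)$ is block diagonal with a single 1 followed by one permutation matrix per position. It verifies Eq. 6 and checks that transforming $\vec{\theta}$ by $R(h)^{-1\top}$ undoes the effect of $h$ on the prediction.

```python
import numpy as np

ALPHABET = "ABC"


def onehot(c):
    return np.array([float(c == a) for a in ALPHABET])


def embed(seq):
    """Additive one-hot embedding x^triv (+) x_1^ohe (+) ... (+) x_L^ohe."""
    return np.concatenate([[1.0]] + [onehot(c) for c in seq])


def perm_matrix(pi):
    """Matrix P satisfying P @ onehot(c) == onehot(pi[c]) for every character c."""
    P = np.zeros((len(ALPHABET), len(ALPHABET)))
    for j, c in enumerate(ALPHABET):
        P[ALPHABET.index(pi[c]), j] = 1.0
    return P


def block_diag(*blocks):
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out


def R(h):
    """Representation matrix of h = (pi_1, ..., pi_L) acting on the additive embedding."""
    return block_diag(np.eye(1), *[perm_matrix(pi) for pi in h])


def act(h, seq):
    return "".join(pi[c] for pi, c in zip(h, seq))


h = ({"A": "C", "B": "A", "C": "B"}, {"A": "A", "B": "C", "C": "B"})
s = "BC"
theta = np.random.default_rng(1).normal(size=1 + len(ALPHABET) * len(s))

eq6_holds = np.allclose(embed(act(h, s)), R(h) @ embed(s))
theta_h = np.linalg.inv(R(h)).T @ theta  # R(h)^{-1 T} theta
compensated = np.isclose(theta_h @ embed(act(h, s)), theta @ embed(s))
print(eq6_holds, compensated)
```

Because $R(h)$ here is a permutation matrix, $R(h)^{-1\top} = R(h)$, so the compensating transformation of the parameters is itself a permutation.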
One-hot models. The most commonly used equivariant models are based on single-position one-hot embeddings. We denote the single-position one-hot embedding for position $l$ as $\vec{x}_l^{\text{ohe}}$ and define it to be a binary vector of dimension $\alpha$ with features $x_l^{c_1}, \ldots, x_l^{c_\alpha}$, where $c_1, \ldots, c_\alpha$ is an ordering of the characters in $\mathcal{A}$. For example, Fig. 1A shows $\vec{x}_l^{\text{ohe}}$ […]

This and all other matrices in the representation $R_l^{\text{ohe}}$ are permutation matrices: all matrix elements are 0 or 1, and each row and column contains a single 1. Consequently, multiplying a vector by one of these matrices changes the order of the elements in that vector, but does not change the overall set of values that those elements take. We refer to $\vec{x}_l^{\text{ohe}}$ and other embeddings that transform under permutation representations as permutation embeddings; their corresponding models are called permutation models.
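For the transformation $h_{A\leftrightarrow C}$ on the alphabet $\{A, B, C\}$, the matrix $R_l^{\text{ohe}}(h_{A\leftrightarrow C})$ can be built column by column from the images of the one-hot vectors. The sketch below (Python/NumPy) assumes the character ordering $c_1, c_2, c_3 = A, B, C$ and confirms the permutation-matrix properties described above.

```python
import numpy as np

ALPHABET = ["A", "B", "C"]  # ordering c_1, c_2, c_3 assumed here
swap_AC = {"A": "C", "B": "B", "C": "A"}


def onehot(c):
    return np.array([int(c == a) for a in ALPHABET])


# Column j is the image under h_{A<->C} of the one-hot vector for c_j,
# so R @ onehot(c) == onehot(swap_AC[c]) for every character c.
R = np.column_stack([onehot(swap_AC[c]) for c in ALPHABET])
print(R)

# Multiplying by R reorders entries without changing their values.
v = np.array([0.2, 0.5, 0.3])
print(R @ v)
```

Applying $h_{A\leftrightarrow C}$ twice restores every sequence, and correspondingly $R^2$ is the identity.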

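The embeddings of many different models can be built by taking direct sums of direct products of $\vec{x}_l^{\text{ohe}}$. For example, the pairwise one-hot model of Eq. 2 is based on the embedding

$$\vec{x}_{\text{pair}}^{\text{ohe}} = \vec{x}^{\text{triv}} \oplus \left[\bigoplus_{l=1}^{L}\vec{x}_l^{\text{ohe}}\right] \oplus \left[\bigoplus_{l<l'}\vec{x}_l^{\text{ohe}} \otimes \vec{x}_{l'}^{\text{ohe}}\right], \qquad [9]$$

where $\vec{x}^{\text{triv}}$ denotes the trivial embedding (defined to be the 1-dimensional vector $[1]$ for all sequences). Because $\vec{x}_l^{\text{ohe}}$ is a permutation embedding, so is $\vec{x}_{\text{pair}}^{\text{ohe}}$ (direct sums and Kronecker products of permutation matrices are themselves permutation matrices). In fact, any embedding constructed from a direct sum of direct products of $\vec{x}_l^{\text{ohe}}$ is a permutation embedding. We call this class of models the generalized one-hot models.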
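In NumPy terms, a direct sum is concatenation and a direct product of embeddings is `np.kron`. The sketch below (names hypothetical) builds Eq. 9 this way, builds the representation matrix by applying the same construction to the per-position permutation matrices, and checks that the result describes how the embedding transforms and is itself a permutation matrix. The element $h$ is an arbitrary choice with different permutations at different positions.

```python
from itertools import combinations, product

import numpy as np

ALPHABET = "ABC"


def onehot(c):
    return np.array([float(c == a) for a in ALPHABET])


def perm_matrix(pi):
    """P with P @ onehot(c) == onehot(pi[c])."""
    P = np.zeros((len(ALPHABET), len(ALPHABET)))
    for j, c in enumerate(ALPHABET):
        P[ALPHABET.index(pi[c]), j] = 1.0
    return P


def block_diag(blocks):
    n = sum(b.shape[0] for b in blocks)
    out, i = np.zeros((n, n)), 0
    for b in blocks:
        out[i:i + b.shape[0], i:i + b.shape[0]] = b
        i += b.shape[0]
    return out


def x_pair_ohe(seq):
    """Concatenate x^triv, each x_l^ohe, and each Kronecker product x_l^ohe (x) x_l'^ohe."""
    pairs = combinations(range(len(seq)), 2)
    parts = [np.ones(1)] + [onehot(c) for c in seq]
    parts += [np.kron(onehot(seq[l]), onehot(seq[lp])) for l, lp in pairs]
    return np.concatenate(parts)


def R_pair(h):
    """Same construction applied to representation matrices (block diagonal + np.kron)."""
    pairs = combinations(range(len(h)), 2)
    blocks = [np.eye(1)] + [perm_matrix(pi) for pi in h]
    blocks += [np.kron(perm_matrix(h[l]), perm_matrix(h[lp])) for l, lp in pairs]
    return block_diag(blocks)


h = ({"A": "C", "B": "B", "C": "A"},
     {"A": "B", "B": "C", "C": "A"},
     {"A": "A", "B": "B", "C": "C"})
Rh = R_pair(h)
seqs = ["".join(s) for s in product(ALPHABET, repeat=3)]
ok = all(np.allclose(x_pair_ohe("".join(h[l][c] for l, c in enumerate(s))),
                     Rh @ x_pair_ohe(s))
         for s in seqs)
is_perm = (set(np.unique(Rh)) <= {0.0, 1.0}
           and (Rh.sum(axis=0) == 1).all() and (Rh.sum(axis=1) == 1).all())
print(ok, is_perm)
```

The check relies on the identity $(P \otimes P')(\vec{a} \otimes \vec{b}) = P\vec{a} \otimes P'\vec{b}$, which is why direct products of permutation embeddings remain permutation embeddings.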

How a single-position embedding transforms has important consequences for how the parameters of models constructed from that embedding are interpreted. For the pairwise one-hot model, the fact that $\vec{x}_l^{\text{ohe}}$ transforms under a permutation representation implies that both $\vec{x}_{\text{pair}}^{\text{ohe}}$ and $\vec{\theta}_{\text{pair}}^{\text{ohe}}$ do as well. A consequence of this is that the individual parameters in $\vec{\theta}_{\text{pair}}^{\text{ohe}}$ can be interpreted as quantifying intrinsic allelic effects. For example, the transformation $h_{A\leftrightarrow C}$ induces a permutation of parameters that exchanges $\theta_l^{A} \leftrightarrow \theta_l^{C}$ at all positions $l$, $\theta_{ll'}^{AA} \leftrightarrow \theta_{ll'}^{CC}$ at all pairs of positions $l < l'$, and so on. Model parameters therefore track their corresponding alleles: $\theta_l^{A}$ tracks sequences that have A at position $l$, $\theta_{ll'}^{AA}$ tracks sequences that have AA at positions $l$ and $l'$, etc.
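"Parameters track their alleles" can be made concrete by keying each parameter by its allele instead of by a vector index. In the sketch below (Python; all names hypothetical), an allele is a tuple of (position, character) pairs, with `()` standing for the constant feature. The induced action of $h$ on parameters simply moves each value to the relabeled allele, and predictions are preserved.

```python
import random
from itertools import combinations, product

ALPHABET, L = "ABC", 3


def alleles(seq):
    """Alleles whose pairwise one-hot features equal 1 for seq; () is the constant feature."""
    keys = [()]
    keys += [((l, c),) for l, c in enumerate(seq)]
    keys += [((l, seq[l]), (lp, seq[lp])) for l, lp in combinations(range(len(seq)), 2)]
    return keys


def f(seq, theta):
    return sum(theta[a] for a in alleles(seq))


def relabel(allele, h):
    """Image of an allele under h = (pi_1, ..., pi_L)."""
    return tuple((l, h[l][c]) for l, c in allele)


swap = {"A": "C", "B": "B", "C": "A"}
h = (swap,) * L  # h_{A<->C} at every position
all_alleles = sorted({a for s in product(ALPHABET, repeat=L) for a in alleles(s)})
rng = random.Random(0)
theta = {a: rng.gauss(0.0, 1.0) for a in all_alleles}

# Induced action on parameters: the value attached to allele a moves to allele h(a).
theta_h = {relabel(a, h): v for a, v in theta.items()}

s = "ABB"
hs = "".join(h[l][c] for l, c in enumerate(s))
print(abs(f(hs, theta_h) - f(s, theta)) < 1e-12)
```

No linear combinations are needed: every parameter value is carried intact from one allele to another.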

The fact that $\vec{x}_l^{\text{ohe}}$ transforms under a permutation representation also means that the features therein are not linearly independent. For example, the three embedding vectors in Fig. 1B lie within a two-dimensional affine subspace defined by the […] to be an $(\alpha - 1)$-dimensional vector, the $i$'th element of which is […] [11]

The corresponding pairwise simplex model has the form

$$f_{\text{pair}}^{\text{sim}} = \theta_0 + \sum_{l=1}^{L}\sum_{i=1}^{\alpha-1}\theta_l^{i}x_l^{i} + \sum_{l<l'}\sum_{i=1}^{\alpha-1}\sum_{i'=1}^{\alpha-1}\theta_{ll'}^{ii'}x_l^{i}x_{l'}^{i'},$$

where $x_l^i$ denotes the $i$'th element of $\vec{x}_l^{\text{sim}}$, and where […]

The lack of gauge freedoms in $f_{\text{pair}}^{\text{sim}}$ is one example of the fact that, as we will see, models defined using (non-redundant) simplex embeddings do not have gauge freedoms. In fact, multiple groups (20, 22, 23) have argued for the use of simplex models, rather than one-hot models, based on the former not having gauge freedoms.
We argue, however, that the parameters of simplex models are fundamentally more difficult to interpret than are the parameters of one-hot models. Because $\vec{x}_l^{\text{sim}}$ does not transform under a permutation representation, neither does $\vec{x}_{\text{pair}}^{\text{sim}}$, and neither does $\vec{\theta}_{\text{pair}}^{\text{sim}}$. In the case of the three-character alphabet, one sees from Eq. 11 that $h_{A\leftrightarrow C}$ induces a transformation of model parameters that maps […], and so on. The fact that these parameters change not by a permutation but by nontrivial linear combinations means that individual parameters cannot be interpreted as individual allelic effects.
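To see the non-permutation behavior concretely, the sketch below assumes one common simplex choice, $x_l^{i} = x_l^{c_i} - x_l^{c_\alpha}$ for $i = 1, \ldots, \alpha - 1$. This specific definition is our assumption; any valid choice should differ at most by an invertible linear change of basis. The sketch (Python/NumPy; names hypothetical) computes the $2 \times 2$ matrix by which $h_{A\leftrightarrow C}$ acts on $\vec{x}_l^{\text{sim}}$ and checks that the pairwise simplex embedding has full column rank, i.e., no gauge freedoms.

```python
from itertools import combinations, product

import numpy as np

ALPHABET = "ABC"


def x_sim(c):
    """ASSUMED simplex embedding: element i is x^{c_i} - x^{c_alpha}, i = 1..alpha-1."""
    ohe = np.array([float(c == a) for a in ALPHABET])
    return ohe[:-1] - ohe[-1]


# Matrix of h_{A<->C} on x_sim, determined by the images of the first alpha-1 characters.
swap = {"A": "C", "B": "B", "C": "A"}
before = np.column_stack([x_sim(c) for c in ALPHABET[:-1]])
after = np.column_stack([x_sim(swap[c]) for c in ALPHABET[:-1]])
R_sim = after @ np.linalg.inv(before)
print(R_sim)  # has a -1 entry, so it is not a permutation matrix


def x_pair_sim(seq):
    parts = [np.ones(1)] + [x_sim(c) for c in seq]
    parts += [np.kron(x_sim(seq[l]), x_sim(seq[lp]))
              for l, lp in combinations(range(len(seq)), 2)]
    return np.concatenate(parts)


X = np.array([x_pair_sim(s) for s in product(ALPHABET, repeat=3)])
print(X.shape, np.linalg.matrix_rank(X))  # full column rank: no gauge freedoms
```

Under this choice, $h_{A\leftrightarrow C}$ sends $(x_l^1, x_l^2)$ to $(-x_l^1, -x_l^1 + x_l^2)$, so a parameter multiplying $x_l^1$ cannot simply be relabeled as the effect of another allele.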

Maschke decomposition of equivariant embeddings.
Having illustrated the connection between model transformation behavior, gauge freedoms, and parameter interpretability, we now use methods from the theory of group representations to formally investigate this connection. Every group representation is either reducible or irreducible. A representation is irreducible if and only if it has no invariant subspaces other than the zero subspace and the entire space on which it acts. Maschke's theorem, a foundational result in representation theory, states that all representations of finite groups (over the real or complex numbers) are equivalent (i.e., equal up to a similarity transformation) to a direct sum of irreducible representations. Because $H_{\text{PSCP}}$ is finite, any of its representations $R$ can be expressed as

$$R \simeq \bigoplus_{k=1}^{K}\bigoplus_{q=1}^{Q_k} R_k, \qquad [15]$$

where $\simeq$ denotes equivalence, the $R_k$ are pairwise inequivalent irreducible representations of $H_{\text{PSCP}}$, and each $Q_k$ denotes the multiplicity of $R_k$ in the direct sum.
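Multiplicities like the $Q_k$ in Eq. 15 can be computed with characters, a standard representation-theory tool rather than something specific to this paper. For a representation $R$ with character $\chi_R(h) = \operatorname{tr} R(h)$, the multiplicity of an irreducible representation with character $\chi_k$ is $\langle \chi_R, \chi_k \rangle = \frac{1}{|H|}\sum_h \chi_R(h)\overline{\chi_k(h)}$, and $\langle \chi_R, \chi_R \rangle = \sum_k Q_k^2$. The sketch below (Python; names hypothetical) applies this to the one-hot representation of the character permutations at a single position (the symmetric group on $\alpha$ characters), whose character counts fixed characters.

```python
from fractions import Fraction
from itertools import permutations


def one_hot_character_stats(alpha):
    """Character inner products for the one-hot (permutation) representation of S_alpha."""
    group = list(permutations(range(alpha)))
    # Trace of a permutation matrix = number of fixed points.
    chi = [sum(i == p[i] for i in range(alpha)) for p in group]
    mult_trivial = Fraction(sum(chi), len(group))            # <chi, chi_triv>
    norm_sq = Fraction(sum(c * c for c in chi), len(group))  # <chi, chi> = sum_k Q_k^2
    return mult_trivial, norm_sq


for alpha in range(2, 6):
    print(alpha, *one_hot_character_stats(alpha))
```

For every $\alpha \geq 2$ the trivial representation appears once and $\sum_k Q_k^2 = 2$, so the one-hot representation splits into exactly two inequivalent irreducible pieces, each with multiplicity one: the trivial representation and an $(\alpha-1)$-dimensional one.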
In what follows, we say that a sequence embedding is irreducible if and only if it transforms under an irreducible representation of $H_{\text{PSCP}}$. One consequence of Eq. 15 is that any embedding $\vec{x}$ that transforms under $R$ can be decomposed as

$$\vec{x} \simeq \bigoplus_{k=1}^{K}\bigoplus_{q=1}^{Q_k}\vec{x}_{kq}, \qquad [16]$$

where each $\vec{x}_{kq}$ is an irreducible embedding that transforms under $R_k$.
This Maschke decomposition of $R$ and $\vec{x}$ is illustrated in Fig. 2A,B. Note: here and in what follows we assume that all $\vec{x}_{kq}$ are nonzero, but this assumption can be removed without fundamentally changing our results; see SI Sec. 5.2 for details.
Distillation of equivariant embeddings. We now describe an "embedding distillation" procedure that connects the Maschke decomposition of $\vec{x}$ to the gauge freedoms of the corresponding model. In SI Sec. 5.1 we prove the following:

Theorem 1. Any two nonzero sequence embeddings that transform under the same irreducible representation of $H_{\text{PSCP}}$ are equal up to a constant of proportionality.

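Using Theorem 1, we obtain

$$\vec{x} \simeq \bigoplus_{k=1}^{K}\bigoplus_{q=1}^{Q_k}\vec{x}_k, \qquad [17]$$

where $\vec{x}_k$ is any one of the irreducible embeddings $\vec{x}_{kq}$ in Eq. 16 and $Q_k$ again denotes the multiplicity of each term in the direct sum. Additional similarity transformations can then be performed to zero out all except one copy of $\vec{x}_k$. There is therefore an invertible "distillation matrix" $T$ such that

$$T\vec{x} = \vec{x}^{\text{dist}} \oplus \vec{0}_\gamma, \qquad [18]$$

where $\vec{0}_\gamma$ is a $\gamma$-dimensional vector of zeros and

$$\vec{x}^{\text{dist}} = \bigoplus_{k=1}^{K}\vec{x}_k \qquad [19]$$

is the distilled embedding. When applied to the representation $R$, this distillation procedure yields

$$T R(h) T^{-1} = R^{\text{dist}}(h) \oplus R^{\text{redun}}(h), \qquad [20]$$

where the distilled representation, $R^{\text{dist}} = \bigoplus_{k=1}^{K} R_k$, comprises one copy of each $R_k$ present in Eq. 15, and where the redundant representation, $R^{\text{redun}} = \bigoplus_{k=1}^{K}\bigoplus_{q=2}^{Q_k} R_k$, comprises the remaining copies of $R_k$. The final distilled versions of $R$ and $\vec{x}$ are illustrated in Fig. 2C. Because $\vec{x}^{\text{dist}}$ is full rank, $\vec{g}^{\top}\vec{x}(s) = 0$ for all $s \in S$ if and only if

$$\vec{g} = T^{\top}\left(\vec{0} \oplus \vec{g}_\gamma\right), \qquad [21]$$

where $T$ is the distillation matrix in Eq. 18, $\vec{0}$ is a zero vector of dimension $\deg R^{\text{dist}}$, and $\vec{g}_\gamma$ is any vector in $\mathbb{R}^\gamma$. (To see this, write $\vec{g} = T^{\top}\vec{u}$; then $\vec{g}^{\top}\vec{x}(s) = \vec{u}^{\top}\left(\vec{x}^{\text{dist}}(s) \oplus \vec{0}_\gamma\right)$, which vanishes for all $s$ only if the first $\deg R^{\text{dist}}$ components of $\vec{u}$ are zero.) The space of gauge transformations $G$ is therefore given by the set of vectors having the form in Eq. 21. In particular, the number of gauge freedoms is seen to be

$$\gamma = \sum_{k=1}^{K}(Q_k - 1)\deg R_k. \qquad [22]$$

Equivalently, $\gamma = \deg R^{\text{redun}}$. We thus see that the number of gauge freedoms of an equivariant linear model is equal to the sum of the degrees of all the redundant irreducible representations under which that model's embedding (or equivalently, parameters) transforms.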
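Eq. 22 turns gauge counting into bookkeeping over multiplicities. The sketch below (Python/NumPy; names hypothetical) applies it to the additive one-hot embedding $\vec{x}^{\text{triv}} \oplus \vec{x}_1^{\text{ohe}} \oplus \cdots \oplus \vec{x}_L^{\text{ohe}}$, using the decomposition $\vec{x}_l^{\text{ohe}} \simeq \vec{x}^{\text{triv}} \oplus \vec{x}_l^{\text{sim}}$ that the paper uses for generalized one-hot models, and compares the result with a direct rank computation.

```python
from itertools import product

import numpy as np


def gamma_eq22(irreps):
    """Eq. 22: gamma = sum_k (Q_k - 1) deg R_k, with irreps given as {label: (Q_k, deg R_k)}."""
    return sum((Q - 1) * deg for Q, deg in irreps.values())


def additive_irreps(L, alpha):
    """Multiplicities for x^triv (+) x_1^ohe (+) ... (+) x_L^ohe, using x_l^ohe ~ x^triv (+) x_l^sim."""
    irreps = {"triv": (L + 1, 1)}
    irreps.update({f"sim_{l}": (1, alpha - 1) for l in range(L)})
    return irreps


def gamma_numeric(L, alpha):
    """M minus the rank of the matrix of additive one-hot embeddings of all sequences."""
    X = np.array([np.concatenate([[1.0]] + [np.eye(alpha)[c] for c in s])
                  for s in product(range(alpha), repeat=L)])
    return X.shape[1] - np.linalg.matrix_rank(X)


for L, alpha in [(2, 3), (3, 4), (4, 2)]:
    print(L, alpha, gamma_eq22(additive_irreps(L, alpha)), gamma_numeric(L, alpha))
```

Only the $L + 1$ copies of the trivial representation are redundant here, giving $\gamma = L$.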
Identification of all equivariant models. The mathematical structure of a group defines the models that transform equivariantly under that group. In the case of $H_{\text{PSCP}}$, the relatively simple group structure allows the straightforward identification of all inequivalent distilled embeddings, and thus all inequivalent equivariant linear models of sequence-function relationships.
$H_{\text{PSCP}}$ can be written as a product of simpler groups:

$$H_{\text{PSCP}} = H_1 \times H_2 \times \cdots \times H_L, \qquad [23]$$

where each $H_l$ denotes the group of character permutations at position $l$ only. Each irreducible representation $R_k$ of $H_{\text{PSCP}}$ can therefore be expressed as

$$R_k = R_1^{k} \otimes R_2^{k} \otimes \cdots \otimes R_L^{k}, \qquad [24]$$

where each $R_l^{k}$ is an irreducible representation of $H_l$ [e.g., see Theorem 1.11.3 of (32)]. An embedding $\vec{x}_k$ that transforms under $R_k$ will therefore have the form

$$\vec{x}_k = \vec{x}_1^{k} \otimes \vec{x}_2^{k} \otimes \cdots \otimes \vec{x}_L^{k}, \qquad [25]$$

where each $\vec{x}_l^{k}$ is an irreducible embedding of $H_l$ that transforms under $R_l^{k}$.

Fig. 2 (continued). (C) By Theorem 1, an additional similarity transformation can be performed that, for each value of $k$, zeroes out all but one $\vec{x}_{kq}$; the remaining $\vec{x}_{kq}$ is denoted by $\vec{x}_k$ (Eq. 18 and Eq. 19). Consequently, $\vec{x}$ decomposes into a direct sum of a distilled embedding $\vec{x}^{\text{dist}}$ and a zero vector $\vec{0}_\gamma$ having dimension $\gamma$ (Eq. 18). $\vec{x}^{\text{dist}}$ is given by the direct sum of all $\vec{x}_k$ (Eq. 19) and is full rank by Theorem 2. The distilled representation $R^{\text{dist}}$ describes how $\vec{x}^{\text{dist}}$ transforms and is given by a direct sum of one copy of each $R_k$. The redundant representation $R^{\text{redun}}$ operates on $\vec{0}_\gamma$ and comprises the $Q_k - 1$ remaining copies of each $R_k$. The number of gauge freedoms $\gamma$ is equal to the degree of $R^{\text{redun}}$ (Eq. 22).
In SI Sec. 4.3 we show that $H_l$ supports only two inequivalent irreducible equivariant embeddings (regardless of alphabet size): $\vec{x}^{\text{triv}}$ and $\vec{x}_l^{\text{sim}}$. Each $\vec{x}_l^{k}$ must therefore be equivalent to one of these two embeddings.
Ignoring factors of $\vec{x}^{\text{triv}}$, which do not alter direct products, Eq. 25 can therefore be written as

$$\vec{x}_k = \bigotimes_{l \in B_k}\vec{x}_l^{\text{sim}}, \qquad [26]$$

where $B_k$ is a subset of the positions $\{1, \ldots, L\}$ (with the empty product taken to be $\vec{x}^{\text{triv}}$). There are $2^L$ possible choices for each subset $B_k$, and thus $2^L$ inequivalent irreducible embeddings $\vec{x}_k$. Since each $\vec{x}_k$ can appear at most once in the direct sum of Eq. 19, we find that there are $2^{2^L}$ possible choices for $\vec{x}^{\text{dist}}$. For each choice of $\vec{x}^{\text{dist}}$ there is an infinite number of possible choices for $T$ and $\gamma$ that can be used, via Eq. 18, to define $\vec{x}$. The number of possible equivariant embeddings $\vec{x}$, and thus the number of equivariant models $f$, is therefore infinite. However, all models corresponding to a specific $\vec{x}^{\text{dist}}$ have the same expressivity, i.e., the set of sequence-function relationships that each model describes (considered over all possible values of model parameters) is the same. We therefore consider these models to be equivalent, and conclude that there are a total of $2^{2^L}$ inequivalent equivariant linear models of sequence-function relationships.
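The counting argument can be reproduced for a small case. The sketch below (Python/NumPy; names hypothetical) builds the irreducible embedding $\bigotimes_{l \in B}\vec{x}_l^{\text{sim}}$ for every subset $B$, using an ASSUMED simplex embedding $x_l^i = x_l^{c_i} - x_l^{c_\alpha}$ chosen only for illustration. It then checks that the distilled embedding containing all $2^L$ irreducible embeddings is square and invertible over all $\alpha^L$ sequences, so the corresponding model can represent any function on sequence space.

```python
from functools import reduce
from itertools import combinations, product

import numpy as np

ALPHABET = "ABC"


def x_sim(c):
    """ASSUMED simplex embedding (illustrative choice): element i is x^{c_i} - x^{c_alpha}."""
    ohe = np.array([float(c == a) for a in ALPHABET])
    return ohe[:-1] - ohe[-1]


def x_irrep(seq, B):
    """Irreducible embedding for position subset B: Kronecker product of x_l^sim over l in B.

    The empty product is x^triv = [1].
    """
    return reduce(np.kron, [x_sim(seq[l]) for l in B], np.ones(1))


L = 3
subsets = [B for k in range(L + 1) for B in combinations(range(L), k)]
# 2^L irreducible embeddings; each may be present or absent in x^dist.
print(len(subsets), 2 ** len(subsets))

# Distilled embedding that contains every irreducible embedding exactly once.
X = np.array([np.concatenate([x_irrep(s, B) for B in subsets])
              for s in product(ALPHABET, repeat=L)])
print(X.shape, np.linalg.matrix_rank(X))
```

The dimensions work out because $\sum_{B}(\alpha-1)^{|B|} = \alpha^L$ when $B$ ranges over all subsets of the $L$ positions.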
Analytical analysis of generalized one-hot models. We now use the embedding distillation procedure to compute the number of gauge freedoms of all generalized one-hot models. This derivation is based on the Maschke decomposition of $\vec{x}_l^{\text{ohe}}$, which is

$$\vec{x}_l^{\text{ohe}} \simeq \vec{x}^{\text{triv}} \oplus \vec{x}_l^{\text{sim}}; \qquad [27]$$

see SI Sec. 2.5 for details.

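We first demonstrate this analysis on the pairwise one-hot model. Plugging the decomposition of $\vec{x}_l^{\text{ohe}}$ in Eq. 27 into the definition of $\vec{x}_{\text{pair}}^{\text{ohe}}$ in Eq. 9, then expanding the direct products and grouping like terms, we find that

$$\vec{x}_{\text{pair}}^{\text{ohe}} \simeq \left[1 + L + \tbinom{L}{2}\right]\vec{x}^{\text{triv}} \oplus \left[\bigoplus_{l=1}^{L} L\,\vec{x}_l^{\text{sim}}\right] \oplus \left[\bigoplus_{l<l'}\vec{x}_l^{\text{sim}} \otimes \vec{x}_{l'}^{\text{sim}}\right], \qquad [28]$$

where the scalar coefficients (each denoting a direct sum of that many copies) correspond to the $Q_k$ in Eq. 17. We derive the corresponding distilled embedding by simply replacing each of these coefficients with 1. Doing so reveals the distillation of $\vec{x}_{\text{pair}}^{\text{ohe}}$ to be $\vec{x}_{\text{pair}}^{\text{sim}}$. Applying Eq. 22 to Eq. 28 gives $\gamma_{\text{pair}}^{\text{ohe}} = \left[L + \binom{L}{2}\right] + L(L-1)(\alpha-1) = L + \binom{L}{2}(2\alpha - 1)$. The result for $\gamma_{\text{pair}}^{\text{ohe}}$ in Eq. 14 is therefore just a manifestation of Eq. 22.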
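The expansion in Eq. 28 can be automated by noting that expanding $\bigotimes_{l \in A}\left(\vec{x}^{\text{triv}} \oplus \vec{x}_l^{\text{sim}}\right)$ yields one irreducible embedding $\bigotimes_{l \in B}\vec{x}_l^{\text{sim}}$ for every subset $B \subseteq A$. The sketch below (Python; names hypothetical) tallies these multiplicities for the pairwise one-hot model and recovers both $\gamma_{\text{pair}}^{\text{ohe}}$ and the dimension of its distilled embedding.

```python
from collections import Counter
from itertools import combinations
from math import comb


def expand(position_sets):
    """Expand (+)_j (x)_{l in A_j} (x^triv (+) x_l^sim) and count the copies of each
    irreducible embedding (x)_{l in B} x_l^sim; returns {B: Q_B}."""
    Q = Counter()
    for A in position_sets:
        for k in range(len(A) + 1):
            for B in combinations(sorted(A), k):
                Q[B] += 1
    return Q


L, alpha = 4, 3
pairwise = [()] + [(l,) for l in range(L)] + list(combinations(range(L), 2))
Q = expand(pairwise)
deg = {B: (alpha - 1) ** len(B) for B in Q}

gamma = sum((Q[B] - 1) * deg[B] for B in Q)  # Eq. 22
dist_dim = sum(deg.values())                 # every coefficient replaced by 1
print(Q[()], Q[(0,)], Q[(0, 1)])             # 1 + L + C(L,2), L, 1
print(gamma, L + comb(L, 2) * (2 * alpha - 1))
print(dist_dim, 1 + (alpha - 1) * L + comb(L, 2) * (alpha - 1) ** 2)
```

The distilled dimension equals the number of parameters of the pairwise simplex model, consistent with $\vec{x}_{\text{pair}}^{\text{sim}}$ being the distillation of $\vec{x}_{\text{pair}}^{\text{ohe}}$.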

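We now extend this analysis approach to all generalized one-hot models. The embedding $\vec{x}^{\text{goh}}$ of any generalized one-hot model can be written as

$$\vec{x}^{\text{goh}} = \bigoplus_{j=1}^{J}\bigotimes_{l \in A_j}\vec{x}_l^{\text{ohe}}, \qquad [29]$$

where $A_1, \ldots, A_J$ denote $J$ (not necessarily distinct) sets of positions, and where the direct product over the empty set is taken to be $\vec{x}^{\text{triv}}$. Because the dimension of $\vec{x}_l^{\text{ohe}}$ is $\alpha$, the number of corresponding model parameters is

$$M^{\text{goh}} = \sum_{j=1}^{J}\alpha^{|A_j|}. \qquad [30]$$

Again, using Eq. 27 to decompose $\vec{x}_l^{\text{ohe}}$ in terms of $\vec{x}^{\text{triv}}$ and $\vec{x}_l^{\text{sim}}$, then expanding each tensor product and grouping the resulting terms, we find that

$$\vec{x}^{\text{goh}} \simeq \bigoplus_{k=1}^{K} Q_k \bigotimes_{l \in B_k}\vec{x}_l^{\text{sim}}, \qquad [31]$$

where $B_1, \ldots, B_K$ denote the distinct subsets of positions that arise in this expansion (each $B_k$ is a subset of at least one $A_j$), and each scalar coefficient $Q_k$ denotes a direct sum of that many copies. Because the dimension of $\vec{x}_l^{\text{sim}}$ is $\alpha - 1$, the distilled embedding has dimension

$$\deg R^{\text{dist}} = \sum_{k=1}^{K}(\alpha - 1)^{|B_k|}.$$

The number of gauge freedoms of the generalized one-hot model having embedding $\vec{x}^{\text{goh}}$ is therefore given by

$$\gamma^{\text{goh}} = M^{\text{goh}} - \sum_{k=1}^{K}(\alpha - 1)^{|B_k|}. \qquad [33]$$

Table 1 reports the number of gauge freedoms computed in this manner for a variety of generalized one-hot models (illustrated in Fig. 3). SI Sec. 6 provides expanded descriptions for each generalized one-hot model, as well as detailed computations of the results in Table 1.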
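Eqs. 30 and 33 need only the position sets $A_j$ and the alphabet size. The sketch below (Python; the function name is hypothetical) computes $M^{\text{goh}}$ and $\gamma^{\text{goh}}$ directly from them; for the all-order model with $L = 3$ and $\alpha = 3$ it reproduces the values $M^{\text{goh}} = 64$ and $\gamma^{\text{goh}} = 37$ quoted for Fig. 4.

```python
from itertools import combinations


def goh_counts(position_sets, alpha):
    """(M_goh, gamma_goh) for the generalized one-hot model with position sets A_1..A_J."""
    M = sum(alpha ** len(A) for A in position_sets)  # Eq. 30
    distinct_B = {B for A in position_sets
                  for k in range(len(A) + 1)
                  for B in combinations(sorted(A), k)}
    return M, M - sum((alpha - 1) ** len(B) for B in distinct_B)  # Eq. 33


L, alpha = 3, 3
all_order = [A for k in range(L + 1) for A in combinations(range(L), k)]
additive = [()] + [(l,) for l in range(L)]
single_term = [tuple(range(L))]
print(goh_counts(all_order, alpha))    # all-order model
print(goh_counts(additive, alpha))     # additive model: gamma = L
print(goh_counts(single_term, alpha))  # a single term (J = 1): gamma = 0
```

The set comprehension collects each distinct $B_k$ once, which is exactly the "replace every coefficient by 1" step of the distillation.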
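One consequence of this analysis is that all generalized one-hot models have gauge freedoms, save models for which the direct sum in Eq. 29 includes only one term. To see this, observe that Eq. 17 gives

$$Q_k = \left|\{\, j : B_k \subseteq A_j \,\}\right|,$$

i.e., each multiplicity value $Q_k$ is equal to the number of sets $A_j$ that contain $B_k$. Using this together with Eq. 31 in Eq. 22 gives

$$\gamma^{\text{goh}} = \sum_{k=1}^{K}(Q_k - 1)(\alpha - 1)^{|B_k|}.$$

We thus see that $\gamma^{\text{goh}} = 0$ if and only if none of the $Q_k$ are greater than 1. But the empty set is a subset of every $A_j$, so the multiplicity of $\vec{x}^{\text{triv}}$ (corresponding to $B_k = \emptyset$) is $J$; consequently $\gamma^{\text{goh}} \geq J - 1$, which is nonzero whenever $J \geq 2$. Conversely, when $J = 1$, every $B_k$ is a subset of the single set $A_1$ and arises exactly once, so $\gamma^{\text{goh}} = 0$.

[…], where $T^{(1)}$ is the $\alpha \times \alpha$ matrix […] Using $T^{(1)}$, one can compute the distillation matrix $T$ for any generalized one-hot model as the product of three matrices:

$$T = T_{\text{sort}}\,T_{\text{thin}}\,T_{\text{decom}}.$$

The effects of these three matrices are illustrated in Fig. 4.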
The "decomposition matrix", $T_{\text{decom}}$, decomposes $\vec{x}^{\text{goh}}$ (Fig. 4A) into a direct sum of irreducible embeddings (Fig. 4B). The "thinning matrix", $T_{\text{thin}}$, then zeros out all except the first copy of each irreducible embedding (Fig. 4C). The "sorting matrix", $T_{\text{sort}}$, then rearranges the direct sum so that the remaining nonzero embeddings come first, followed by a zero vector of dimension $\gamma^{\text{goh}}$ (Fig. 4D). SI Sec. 8 provides explicit algorithms for constructing $T_{\text{decom}}$, $T_{\text{thin}}$, and $T_{\text{sort}}$, as well as the inverse of each of these three matrices. It is readily seen that all of these matrices are sparse in the large-$L$ limit when the maximal order of interaction described by the model is fixed. The resulting distillation matrix $T$ and its inverse are therefore also sparse. Moreover, every nonzero element of $T$ is $+1$ or $-1$ (Fig. 4E). Taking the last $\gamma^{\text{goh}}$ rows of $T$, we obtain a basis for $G^{\text{goh}}$ comprising sparse vectors whose only nonzero elements are $+1$ and $-1$. These sparse forms for the factors of $T$ and $T^{-1}$ also allow us to compute a sparse gauge-fixing projection matrix $P$; see SI Sec. 8 for details.
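A sparse basis with only $\pm 1$ entries can also be written down directly for the pairwise one-hot model from the feature constraints listed earlier for $f_{\text{pair}}^{\text{ohe}}$. The sketch below (Python/NumPy; all names hypothetical) does this and verifies that the vectors annihilate every embedding and are linearly independent. It is an illustrative construction and is not claimed to coincide with the last rows of the paper's $T$.

```python
from itertools import combinations, product

import numpy as np


def feature_index(L, alpha):
    """Column index of each pairwise one-hot feature: (), (l, c), or (l, c, l', c')."""
    idx = {(): 0}
    for l in range(L):
        for c in range(alpha):
            idx[(l, c)] = len(idx)
    for l, lp in combinations(range(L), 2):
        for c in range(alpha):
            for cp in range(alpha):
                idx[(l, c, lp, cp)] = len(idx)
    return idx


def pm_vector(M, plus, minus):
    """Vector with +1 at index `plus` and -1 at each index in `minus`."""
    g = np.zeros(M)
    g[plus] = 1.0
    g[list(minus)] = -1.0
    return g


def sparse_gauge_basis(L, alpha):
    idx = feature_index(L, alpha)
    M = len(idx)
    rows = []
    for l in range(L):  # x_0 - sum_c x_l^c = 0
        rows.append(pm_vector(M, idx[()], [idx[(l, c)] for c in range(alpha)]))
    for l, lp in combinations(range(L), 2):
        for c in range(alpha):  # x_l^c - sum_c' x_ll'^{cc'} = 0
            rows.append(pm_vector(M, idx[(l, c)], [idx[(l, c, lp, cp)] for cp in range(alpha)]))
        # x_l'^c' - sum_c x_ll'^{cc'} = 0; one of these alpha is redundant, so skip the last.
        for cp in range(alpha - 1):
            rows.append(pm_vector(M, idx[(lp, cp)], [idx[(l, c, lp, cp)] for c in range(alpha)]))
    return np.array(rows), idx


def embedding_matrix(L, alpha, idx):
    X = np.zeros((alpha ** L, len(idx)))
    for r, s in enumerate(product(range(alpha), repeat=L)):
        X[r, idx[()]] = 1.0
        for l in range(L):
            X[r, idx[(l, s[l])]] = 1.0
        for l, lp in combinations(range(L), 2):
            X[r, idx[(l, s[l], lp, s[lp])]] = 1.0
    return X


L, alpha = 3, 3
G, idx = sparse_gauge_basis(L, alpha)
X = embedding_matrix(L, alpha, idx)
print(G.shape[0], np.linalg.matrix_rank(G), np.abs(X @ G.T).max())
```

Each basis vector touches at most $\alpha + 1$ features, so the number of nonzero entries grows only linearly with the number of gauge freedoms.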
Other symmetry groups. The proof of Theorem 1 in SI Sec. […]

Transformation behavior and parameter interpretability. We now return to the connection between parameter transformation behavior and parameter interpretability. Above we observed for pairwise models that the ability to interpret individual parameters as quantifying intrinsic allelic effects appears to require the presence of gauge freedoms. We now formalize this observation and conjecture an extension to all linear models.
We define an allele $a$ to be a pattern of characters that is either present or absent in any sequence $s \in S$. The corresponding "allelic feature" $x_a$ is defined to be the indicator function on $S$ for whether a sequence has allele $a$, and an "allelic model" is defined to be a linear sequence-function relationship in which every feature is an allelic feature. In the context of an allelic model, the parameter $\theta_a$ that multiplies $x_a$ is said to be an "allelic effect." The parameters of a linear model can therefore be interpreted as allelic effects if and only if every one of the corresponding features is an indicator function on $S$.
For an allelic model to have parameters that describe intrinsic allelic effects, the model must be a "permutation model", i.e., the features and parameters of the model must transform under a permutation representation of $H_{\text{PSCP}}$. Requiring an allelic model to be a permutation model puts strong constraints on which sets of alleles it can describe. Because $H_{\text{PSCP}}$ permutes sequences, it also permutes alleles. Given a specific allele $a$, we call the set of alleles created by the action of $H_{\text{PSCP}}$ on $a$ an "allelic orbit". It is readily seen that, for an allelic model to be a permutation model, the set of alleles it describes must consist of an integral number $J$ of complete allelic orbits.
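Allelic orbits are easy to enumerate for small examples. In the sketch below (Python; names hypothetical), an allele is a tuple of (position, character) pairs, with the empty tuple standing for the constant feature. The orbit of a single-position allele is the full set of $\alpha$ additive one-hot features at that position, and the orbit of a two-position allele is the full set of $\alpha^2$ pairwise features at those positions.

```python
from itertools import permutations, product

ALPHABET, L = "ABC", 3


def pscp_group():
    """All h = (pi_1, ..., pi_L), one permutation of the alphabet per position."""
    perms = [dict(zip(ALPHABET, p)) for p in permutations(ALPHABET)]
    return list(product(perms, repeat=L))


def act_on_allele(h, allele):
    """An allele is a tuple of (position, character) pairs; h relabels characters position-wise."""
    return tuple((l, h[l][c]) for l, c in allele)


def orbit(allele, group):
    return {act_on_allele(h, allele) for h in group}


H = pscp_group()
for allele in [(), ((0, "A"),), ((0, "A"), (2, "B"))]:
    print(allele, len(orbit(allele, H)))  # orbit sizes 1, alpha, alpha**2
```

These orbits are precisely the blocks of features that appear in the generalized one-hot models, one orbit per position set $A_j$.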
All allelic models that comprise J ≥ 2 allelic orbits have gauge freedoms.To see this, observe that the features in each orbit transform among themselves according to a permutation representation.The features of the full model will therefore transform under a direct sum of J permutation representations.
Because every permutation representation contains the trivial representation in its Maschke decomposition, the decomposition of the full model's representation will contain at least $J$ copies of the trivial representation. By Eq. 22, the model will therefore have at least $J - 1$ gauge freedoms. Additional gauge freedoms can be present as well, so this result only provides a lower bound on the number of gauge freedoms.
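Both ingredients of this argument can be checked on the pairwise one-hot model, which consists of $J = 1 + L + \binom{L}{2}$ allelic orbits. The sketch below (Python/NumPy; names hypothetical) counts orbits with Burnside's lemma. This standard result states that the number of orbits equals the average number of features fixed by a group element, which is also the multiplicity of the trivial representation in the permutation representation. The sketch then compares $J - 1$ with the numerically computed number of gauge freedoms.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

import numpy as np

ALPHABET, L = "ABC", 3

# Pairwise one-hot features, labelled by their alleles; () is the constant feature.
features = [()]
features += [((l, c),) for l in range(L) for c in ALPHABET]
features += [((l, c), (lp, cp)) for l, lp in combinations(range(L), 2)
             for c in ALPHABET for cp in ALPHABET]

perms = [dict(zip(ALPHABET, p)) for p in permutations(ALPHABET)]
group = list(product(perms, repeat=L))


def act(h, allele):
    return tuple((l, h[l][c]) for l, c in allele)


# Burnside's lemma: average number of fixed features = number of orbits J
# = multiplicity of the trivial representation.
J = Fraction(sum(sum(act(h, a) == a for a in features) for h in group), len(group))

# Numerical gauge count: M minus the rank of the matrix of all embeddings.
X = np.array([[float(all(s[l] == c for l, c in a)) for a in features]
              for s in product(ALPHABET, repeat=L)])
gamma = int(len(features) - np.linalg.matrix_rank(X))
print(J, gamma, gamma >= J - 1)
```

Here the bound $J - 1$ is far from tight, consistent with the remark that additional gauge freedoms can be present.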

This result is reflected in our analytic treatment of generalized one-hot models above. All generalized one-hot models are allelic permutation models (though the converse is not true; see SI Sec. 9.1), and each allelic orbit of a generalized one-hot model corresponds to a position set $A_j$ in Eq. 29. The lower bound on the number of gauge freedoms identified here recapitulates the finding above that generalized one-hot models have no gauge freedoms if and only if $J = 1$.

An allelic permutation model that does not have gauge freedoms must therefore comprise only one allelic orbit. Are single-orbit allelic models useful in practice? We argue that the answer is essentially "no". In SI Sec. 9.1 we show that single-orbit generalized one-hot models cannot describe co-occurring alleles. We regard such models as "trivial" because the entire reason for quantitatively modeling sequence-function relationships is to deconvolve the influence of co-occurring alleles. There are single-orbit allelic permutation models that describe co-occurring alleles, but all the examples of these we have analyzed either have gauge freedoms or are mathematically equivalent to generalized one-hot models (see SI Sec. 9.1). Moreover, among models whose embeddings are built from direct sums of direct products of single-position embeddings, the generalized one-hot models have the fewest gauge freedoms (see SI Sec. 9.2). Based on these findings, we conjecture that all nontrivial allelic permutation models (i.e., all models whose parameters describe intrinsic allelic effects) have gauge freedoms.

Motivated by the connection between gauge freedoms and symmetries in physics, we investigated the relationship between gauge freedoms and symmetries in quantitative models of sequence-function relationships. We found that, for linear models that are equivariant under the group of position-specific character permutations (denoted $H_{\text{PSCP}}$), gauge freedoms arise due to model parameters transforming according to redundant irreducible matrix representations. From a practical standpoint, this result facilitates the analytic calculation of the number of independent gauge freedoms, as well as the efficient computation of a sparse basis for the space of gauge freedoms, in a large class of commonly used models. From a conceptual standpoint, this result links the gauge freedoms of models of sequence-function relationships to the transformation behavior of these models under a specific symmetry group of sequence space.

We also investigated the link between parameter transformation behavior and parameter interpretability. In doing so, we identified an incompatibility between two different notions of parameter interpretability: in linear models that are equivariant under $H_{\text{PSCP}}$, the ability to interpret individual parameters as quantifying intrinsic allelic effects requires that these parameters transform under a permutation representation of $H_{\text{PSCP}}$. But in many (and possibly in all) nontrivial models, this requirement is incompatible with the ability to interpret the values of individual parameters in the absence of gauge-fixing constraints. Consequently, models that have gauge freedoms can have advantages over equally expressive models that do not have gauge freedoms.

It should be noted that there are indeed useful models that do not have gauge freedoms. One such class of models is the […]

To study a sequence-function relationship of interest, researchers often propose a mathematical model, fit the parameters of the model to data, then biologically interpret the resulting parameter values. This interpretation step is complicated, however, by gauge freedoms: directions in parameter space along which model parameters can be changed without altering model predictions. If any gauge freedoms are present in a model, the values of individual model parameters cannot be meaningfully interpreted in the absence of additional constraints.
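features, and each feature $x_i(\cdot)$ is a function that maps $S$ to $\mathbb{R}$. We refer to the space $\mathbb{R}^M$ in which $\vec{x}$ and $\vec{\theta}$ live as feature space.† An example of a linear model is the pairwise one-hot model,

$$f_{\text{pair}}^{\text{ohe}} = \theta_0 + \sum_{l=1}^{L}\sum_{c \in \mathcal{A}}\theta_l^{c}x_l^{c} + \sum_{l<l'}\sum_{c, c' \in \mathcal{A}}\theta_{ll'}^{cc'}x_{ll'}^{cc'}. \qquad [2]$$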

quence s, $c, c' \in \mathcal{A}$ index the possible characters at these positions, and the sequence arguments of both the model and features are kept implicit. Pairwise one-hot models comprise three types of features: the constant feature, $x_0(s)$, which equals one for every sequence $s$; additive one-hot features, […]

[…] $h \in H$ to a matrix $R(h)$ in a way that preserves the multiplicative structure of $H$, i.e., $R(h_1 h_2) = R(h_1)R(h_2)$ for any two group elements $h_1, h_2 \in H$. The degree of the representation $R$ (denoted $\deg R$) is the dimension of the space on which $R$ acts.

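for the three-character alphabet $\mathcal{A} = \{A, B, C\}$. The one-hot embedding $\vec{x}_l^{\text{ohe}}$ transforms under a permutation representation, which we denote as $R_l^{\text{ohe}}$. For example, consider the transformation $h_{A\leftrightarrow C}$ that exchanges characters A and C at every position in a sequence. The effect of this transformation on $\vec{x}_l^{\text{ohe}}$ (Fig. 1B) is equivalent to multiplying $\vec{x}_l^{\text{ohe}}$ by the permutation matrix

$$R_l^{\text{ohe}}(h_{A\leftrightarrow C}) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$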

Fig. 1. Transformation behavior of two single-position embeddings. (A) Two possible embeddings of characters at position $l$ in a sequence built from the three-character alphabet $\mathcal{A} = \{A, B, C\}$: the three-dimensional one-hot embedding $\vec{x}_l^{\text{ohe}}$ and […]

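Fig. 1A,C illustrate $\vec{x}_l^{\text{sim}}$ for the three-character alphabet. Unlike $\vec{x}_l^{\text{ohe}}$, $\vec{x}_l^{\text{sim}}$ transforms under a non-permutation representation, which we denote as $R_l^{\text{sim}}$. For example, the effect of $h_{A\leftrightarrow C}$ on $\vec{x}_l^{\text{sim}}$ […]

As with one-hot embeddings, the embeddings of many different models can be built from direct sums of direct products of $\vec{x}_l^{\text{sim}}$. For example, a simplex embedding analogous to $\vec{x}_{\text{pair}}^{\text{ohe}}$ can be constructed as

$$\vec{x}_{\text{pair}}^{\text{sim}} = \vec{x}^{\text{triv}} \oplus \left[\bigoplus_{l=1}^{L}\vec{x}_l^{\text{sim}}\right] \oplus \left[\bigoplus_{l<l'}\vec{x}_l^{\text{sim}} \otimes \vec{x}_{l'}^{\text{sim}}\right].$$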

Fig. 2. Embedding distillation. (A) Given an $M$-dimensional embedding $\vec{x}$ that is equivariant under $H_{\text{PSCP}}$, let $R$ be the representation that describes how $\vec{x}$ transforms. (B) By Maschke's theorem, $R$ can be decomposed into a direct sum of irreducible representations, $R_k$ ($k \in \{1, \ldots, K\}$), each of which occurs with multiplicity $Q_k$ (Eq. 15). Similarly, $\vec{x}$ can be decomposed into a direct sum of irreducible embeddings $\vec{x}_{kq}$ ($q \in \{1, \ldots, Q_k\}$), where each $\vec{x}_{kq}$ transforms under $R_k$ (Eq. 16). (C) By […]

Fig. 3. Structure of generalized one-hot models analyzed in Table 1 for sequences of length $L = 5$. Open circles represent sequence positions. Closed circles represent allelic orbits, i.e., sets of sequence features that are closed under the action of $H_{\text{PSCP}}$. Edges indicate position indices shared by the features in each allelic orbit. (A) Structure of constant, additive, pairwise, nearest-neighbor, all-order, and all-adjacent models. (B) Structure of K-order models and K-adjacent models for various interaction orders K.

Fig. 4. Embedding distillation for an example generalized one-hot model. (A) Embedding $\vec{x}^{\text{goh}}$ of the $L = 3$ sequence $s = ABC$ for an all-order interaction model based on the alphabet $\mathcal{A} = \{A, B, C\}$. The embedding has degree $M^{\text{goh}} = 64$. (B) Result of multiplication by the decomposition matrix, $T_{\text{decom}}$. (C) Result of subsequent multiplication by the thinning matrix, $T_{\text{thin}}$. (D) Result of subsequent multiplication by the sorting matrix, $T_{\text{sort}}$, which yields $\vec{x}^{\text{dist}} \oplus \vec{0}_{\gamma^{\text{goh}}}$ with $\gamma^{\text{goh}} = 37$ gauge freedoms. In B-D, dots indicate $\vec{x}^{\text{triv}}$, dashes indicate zero vectors, and numbers indicate $\vec{x}_l^{\text{sim}}$ or Kronecker products thereof for specified positions $l$. (E) Distillation matrix $T$ that implements the full distillation procedure in A-D. The last $\gamma^{\text{goh}}$ rows of $T$ provide a sparse basis for the gauge space, $G^{\text{goh}}$. In A-E, vector and matrix elements are colored using: blue, +1; yellow, -1; gray, 0.

Table 1. Analytical results for various generalized one-hot models, computed using Eqs. 30 and 33. Columns show model type, the orders of interaction included in each model, the number of parameters of each model, and the number of gauge freedoms of each model. See SI Sec. 6 for derivations of these results. K-adjacent models assume K ≥ 1. ‡ Only includes interactions among adjacent positions.
[…] 5.1, and thus our embedding distillation procedure, applies only to the symmetry group $H_{\text{PSCP}}$. There are other symmetry groups of sequence space besides $H_{\text{PSCP}}$, however, and it is worth asking whether Theorem 1, and thus Eqs. 18-22, hold for those groups as well. One other symmetry group is the group of global character permutations, $H_{\text{GCP}}$. This group comprises transformations that apply the same permutation of characters at every position. Another is the group of position permutations, $H_{\text{PP}}$. This group comprises transformations that permute positions without otherwise changing characters. SI Sec. 7.1 shows that Theorem 1 does not hold for either $H_{\text{GCP}}$ or $H_{\text{PP}}$. Consequently, […] $H_{\text{Ham}}$ is the largest symmetry group that preserves Hamming distances (33), and includes $H_{\text{PSCP}}$, $H_{\text{PP}}$, and $H_{\text{GCP}}$ as subgroups. Theorem 1 does hold for $H_{\text{Ham}}$, due to the fact that $H_{\text{PSCP}}$ is a subgroup (see SI Sec. 7.2). However, the set of models that are equivariant under $H_{\text{Ham}}$ is a subset of the models that are equivariant under $H_{\text{PSCP}}$, and the irreducible representations of $H_{\text{Ham}}$ are more complex than those of $H_{\text{PSCP}}$. $H_{\text{PSCP}}$ is therefore more useful than $H_{\text{Ham}}$ for analyzing gauge freedoms.