The quantitative genetics of gene expression in Mimulus guttatus

Gene expression can be influenced by genetic variants that are closely linked to the expressed gene (cis eQTLs) and variants in other parts of the genome (trans eQTLs). We created a multiparental mapping population by sampling genotypes from a single natural population of Mimulus guttatus and scored gene expression in the leaves of 1,588 plants. We find that nearly every measured gene exhibits cis regulatory variation (91% have FDR < 0.05) and that cis eQTLs are usually allelic series with three or more functionally distinct alleles. The cis locus explains about two thirds of the standing genetic variance (on average) but varies among genes and tends to be greatest when there is high indel variation in the upstream regulatory region and high nucleotide diversity in the coding sequence. Despite mapping over 10,000 trans eQTL / affected gene pairs, most of the genetic variance generated by trans acting loci remains unexplained. This implies a large reservoir of trans acting genes with subtle or diffuse effects. Mapped trans eQTLs show lower allelic diversity but much higher genetic dominance than cis eQTLs. Several analyses also indicate that trans eQTL make a substantial contribution to the genetic correlations in expression among different genes. They may thus be essential determinants of “gene expression modules”, which has important implications for the evolution of gene expression and also how it is studied by geneticists.


Introduction
Gene expression as measured by RNAseq [1] is a quantitative trait.Expression scored from sequence-read counts is strongly influenced by environmental variables, measurement error, and the complex interaction of many genes [2].When RNAseq is applied to a population (genotypes randomly sampled from a deme), the machinery of quantitative genetics can be employed to address basic questions about expression variation: How many loci affect expression of each gene and how large are their respective effects?What is the allele frequency spectrum at expression Quantitative Trait Loci (eQTLs)?Is gene expression affected by genetic complexities such as dominance, epistasis, or genotype by environment interaction?Finally, recognizing that the entire transcriptome is just a very long vector of quantitative traits, the task is to determine the respective contributions of genetic and environment covariances to co-expression patterns across the genome.
A simple but important question: What fraction of the genetic variation in gene expression is generated by the cis-regulatory locus?Unlike with other quantitative characters, we know the location of this potentially key eQTL before collecting data.The DNA surrounding a gene is likely to contain regulatory sequences and is thus a strong candidate as an effector of expression.The priority of the cis-regulatory region suggests an oligogenic model of inheritance [3] with most variation generated by a major effect QTL with a lesser contribution of numerous unlinked modifiers (trans eQTLs in this context).However, association studies of gene expression variation in humans suggest a very different model.Even if the cis eQTL is the most important single locus, it may explain only a small fraction of the genetic variance in expression.The omnigenic model [4,5] posits that many small effect trans-eQTLs, distributed uniformly over the entire genome, generate the bulk of variation in expression.
Genetic dominance is likely to differ between cis and trans eQTLs.Additive gene action is expected for cis eQTLs [6] given that cis-regulatory molecules bind separately to each allele.With allele-specific effects on expression, additivity results if the overall expression of a gene is the sum of the mRNAs produced independently by each allele.This simple model can breakdown if there is imprinting [7] or if feedback mechanisms such as autoregulation [8] cause the realized mRNA levels of one allele to depend on the expression of the other.In contrast to cis, there is no a priori reason to assume additive gene action for trans eQTLs.The product of a trans acting locus (say a transcription factor protein) can affect both alleles of the expressed gene [2].
Cis and trans eQTLs should also contribute differentially to genetic covariances between expressed genes.Genetic covariances result from pleiotropy, linkage disequilibria, and in populations that inbreed to some extent, identity disequilibrium [9,10].In this paper, we apply a breeding design where all individuals have a known ancestry.This allows us to estimate the combined effects of pleiotropy and LD on the co-expression of genes and the contribution of individual QTLs to these covariances [11,12].When considering multiple expressed genes, a single locus can have multiple effects, both cis and trans.While it is typical to think of cis eQTLs as effectors of a single gene, a single mutation could affect the expression of multiple closely linked genes by altering local DNA accessibility.Distinct mutations in regulatory regions of closely linked genes will generate a genetic covariance if these mutations are in linkage disequilibrium in the population.Trans eQTLs can generate genetic covariances in several ways.Most obviously, a trans eQTL that affects many genes will generate covariation in expression among all its targeted genes.More directly, the cis effect of a mutation on a regulatory gene should generate a correlation between the expression of that gene and the expression level of downstream target genes (for which the mutation would be a trans eQTL).In this paper, we describe an experiment to characterize variation and covariation in gene expression, and then estimate the contribution of individual genetic loci to this (co)variation.We created a multiparental mapping population by intercrossing genotypes from one natural population and then measured gene expression in leaf tissue (Fig 1B).The replicated F2 crossing design (Fig 1A) produces high variance in relatedness of individuals, which is essential for estimating genetic (co)variances.It produces both homozygous and heterozygous genotypes at individual loci, necessary for characterizing how both the additive and dominance effects of eQTLs contribute to variation.We analyzed these data using two complimentary approaches.The "Cross-specific analysis" treats each of the nine families as a distinct entity and extracts estimates for QTL effects in the fashion of a single F2 mapping population, e.g.[13].The "Combined analysis" considers all plants simultaneously with the relatedness of each F2 plant to all other plants estimated through genomewide similarity [14].Given sufficient variation in relatedness, we partition expression variation into genetic and environmental components using the classical "animal model" ( [15], i.e. the linear mixed model [16]).Finally, we determine the contribution of individual loci to the genetic component of variation established in this context.

Mapping RNAseq reads to our denovo assemblies effectively genotypes F2 plants
Two of the ten parental lines (767 and 62) used in this study were sequenced and assembled by the Joint Genome Institute, while the other eight were assembled from our long-read sequencing (see Methods A).De novo assembly of the long reads yielded two to four large scaffolds per chromosome with a high inclusion of genes (BUSCO completeness 93-94%, Supplemental Table 8).We used genetic maps obtained from our F2 genotyping to assemble scaffolds into pseudo-chromosomes.We called SNPs among these lines and report the nucleotide diversity within and around each gene in Supplemental Table 1.These comparisons confirm our previous Illumina sequencing [17]: The 10 lines are about equally distant from each other in terms of sequence divergence (Supplemental Fig 1A) and can thus be treated as unrelated individuals from the natural population.
We genotyped F2 plants using the RNAseq reads (Methods C).Transcript reads can be suboptimal for genotyping owing to varying coverage per locus (expression levels differ among genes) and because the representation of the two parental alleles within the RNA of heterozygotes may be unequal (allele-specific expression).We address these difficulties by stringent filtering of genes, using only a minority as genetic markers.Next, we apply a Hidden Markov Model (HMM) to each chromosome of each individual allowing marker specific genotyping error rates (the emission probabilities of the HMM).The HMM leverages genetic information across the chromosome, and particularly from neighboring genes, to call the genotype (ancestry) at each locus [18,19].Given the recombination rate of M. guttatus (see below), a diploid F2 plant has an average of ~1.8 crossover events per chromosome.Consequently, there are large stretches of markers (usually hundreds of genes) between genotype transitions along chromosomes.Neighboring markers will (nearly) always have the same ancestry (homozygous for 767, heterozygote, or homozygous for the alternative allele), which greatly simplifies genotype inference.For the filtered dataset, we obtain posterior genotype probabilities of >99% at virtually all marker loci.The present effectiveness of the method is partly due to the experimental design -only two alleles segregating in each of our families and the number of genotype transitions along chromosomes is low owing to the single round of recombination from F1 to F2.The downside of limited recombination is limited mapping resolution [20], which is not a difficulty for cis eQTLs (since we already know their location) but is for trans eQTLs.
After filtering the RNAseq read data, we obtained an average of 4800 informative genetic markers per cross (family).The HMM yields genetic maps for each family (reported as Supplemental Fig 1B).The maps from different families are similar to each other by chromosome, and the average total genetic length (1260 centimorgans) is comparable to previously published maps of M. guttatus [19].Also, the maps exhibit the predicted pattern of recombination suppression over regions where large inversions are known to segregate.
Line 664 carries a large inversion on chromosome 6 [21] and the map for this family exhibits recombination suppression over the predicted region (1.22 to 8.57mb in the coordinates of the 767 genome assembly).The meiotic drive allele on chromosome 11 [22] segregates in families 62, 502, 541, 664, and 909, and these maps exhibit consistent suppression from coordinates 6.60mb to 17.62mb.As expected, this interval includes the centromere.Interpolating from the genetic markers, we established a nearly complete genotype matrix for eQTL mapping.For each gene measured for expression, we could score the locus as homozygous for the reference line (767), homozygous for the alternative line of that family, or heterozygous (the cis genotype).

Estimation of genetic variances and the contribution of specific eQTLs
Genetic variances are estimated by determining how phenotypic similarity increases with genetic similarity.
Estimation will be most effective when we can compare plants that range from unrelated (R=0) up to fully homozygous identical twins (R=2).We calculated the relatedness using the genotype matrix for the 1,588  We used simulations to select the statistic that we used to estimate the contribution of individual loci to genetic variation in expression (Appendix 1).The simulations used the observed genotype matrix as a framework with subsequent generation of expression levels with and without cis and trans eQTLs of varying effects.We tested the accuracy of three different methods: the least squares based Haseman-Elston (HE) regression [23] and two statistics derived from the maximum likelihood fit of the linear mixed model (Vg[r2] and Vg[a] are described in Methods D).Across a range of cases, all three estimators are nearly unbiased given our sample sizes and genotype matrix.In other words, the average of estimates across simulations is close to the true value of the parameter used to simulate data.However, when a locus contributes to genetic variation, the variance among replicate simulations is much larger for HE regression than for either Vg[r2] or Vg [a].The variance for Vg[a] is marginally lower than for Vg[r2] (see Supplemental Table 2).We chose Vg[a]to estimate the contribution of both cis and trans eQTLs to the genetic variance of expression in the real data because it was the most accurate.
In statistical terms, Vg[a] has the lowest mean square error.

The great majority of genes exhibit cis-regulatory variation with high allelic diversity
In the Combined analysis, 91% of genes have a significant cis eQTL (FDR < 0.05; Supplemental Table 3).The Cross-specific mapping identified 32,853 eQTLs over the 9 families, most (22,794) were cis to the affected gene.
The eQTL plot for four of the crosses (Fig 2 ) are typical of the full set (Supplemental Fig 2), with the many cis eQTLs filling the diagonal of this plot.Gaps along the diagonals are only present in centromeric parts of chromosomes where there are few genes.This is partly due to filtering: We did not test genes with a mean Count Per Million < 0.5 and average expression levels are lower for genes in centromeric regions (and for low recombination portions of the genome generally; Supplemental Fig 3).
For cis eQTLs, genetic effect estimates are very similar between Cross-specific and Combined analyses.When QTL effects are measured in standard deviations of the expression level, estimates for the same locus/cross are nearly equivalent between identical Cross-specific and Combined analyses: The correlation is 0.96 when including both significant and non-significant tests (n = 9 crosses x 12,987 genes = 116,883 estimates; Supplemental Fig 4).This high congruence is noteworthy given that (a) data transformations differ between pipelines, (b) the Cross-specific analysis considers only F2 individuals while the Combined analysis also includes data from the homozygous parental lines, and (c) the Combined analysis includes a random effect to absorb trans eQTL effects while the Cross-specific does not.The strength of evidence (level of statistical significance) for cis regulatory variation is much stronger from the Combined analysis because it integrates signal across families.However, the point estimates for allelic effects are remarkably consistent.
More important than the number of significant tests, we find that most of the additive genetic variation in gene expression is explained by cis eQTLs.From the Combined analysis, the mean values for VE, Vg(cis) and Vg(trans) were 0.828, 0.093, and 0.044, respectively.The fraction of the genetic variation generated by the cis locus varies (Fig 3A), but Vg(cis) > Vg(trans) for 63% of genes.Here, Vg(trans) is estimated using the relationship matrix and represents the combined effect of all trans acting loci on the affected gene.As expected, the strength of evidence for a cis eQTL is positively correlated with Vg(cis) (Supplemental Fig 5).The residual variance, VE, is the largest component for most genes where measurement error owing to finite sequencing depth contributes substantially.Average read depth per gene is a strong positive predictor of test significance, while average VE declines as coverage increases (Supplemental Fig 6).Both trends are expected if expression levels are less accurately estimated at genes with lower coverages The cis eQTLs identified by our Combined and Cross-specific analyses are significant when the overall expression of a gene differs among the three genotypes that segregate within each family (the alternative homozygotes and the heterozygote).Because genotype is called at the marker locus most proximal to the expressed gene, significant tests from this procedure are often called "local eQTLs" [24].Allele-specific expression provides an alternative method to detect cis eQTLs [25] based only on data from heterozygous individuals.If cis DNA variation only affects the expression of the physically linked gene (on the same chromosome), the "high allele" should be over-represented in the mRNA produced by heterozygotes.In this experiment, we have the genome sequences of all parental genomes and can distinguish alternative alleles within the mRNA produced by heterozygous plants for 46,828 gene/family combinations.In these cases, we see that whichever allele increases expression across all three genotypes is usually over-represented in the mRNA produced by heterozygous plants (Fig 3B).This is the signature of allele-specific expression.Given that both estimates (the additive effect on overall expression and allele frequency within heterozygotes) are subject to substantial estimation error, the high positive correlation (r = 0.83) provides a compelling corroboration of cis eQTLs identified using local markers.
Contradicting the assumption of additivity, about 20% of cis eQTLs exhibit some level of dominance (the test for dominance yields FDR<0.05).However, dominance is nearly always partial with heterozygote expression levels between the values of alternative homozygotes.Dominance is quantified by the parameter d, where the mean expression of genotypes RR, RA, and AA are ,  +  + , and  + 2, respectively (m is the mean expression of individuals homozygous for the reference allele (RR)).Partial dominance is implied if abs(d) < abs(a).The definition of "alleles" is different in multiparental mapping experiments (such as this study) than in genomewide association studies.In the latter, alleles are typically biallelic SNPs.Here, alleles are the distinct haplotypes carried by the founding parental lines in the vicinity of each gene.In our initial model fit, we allowed the cis allele from each of our nine alternative lines to uniquely differ from the allele carried by the reference line (767).From the point of view of statistical testing, it is appropriate to allow each allele to have a unique effect on expression that is characterized by a distinct free parameter.In fact, our simulations indicate this procedure to be slightly conservative for detecting QTLs (Appendix 1).However, in terms of characterizing QTL effects, this "full model" is overparameterized when fewer than 10 functionally distinct alleles segregate.
To address the number of functionally distinct cis alleles, we applied the allele partitioning method of King et al. [26] to each gene.The typical result is an allelic series with a median of 3 alleles per cis locus (Fig 3C).Some significant loci have only two distinct alleles, but in this case, each allele is typically carried by multiple ancestral lines.We can characterize allele number and relative frequency of alleles with heterozygosity: where   is the frequency of the i th allele and the sum is taken over all alleles at the locus.Across all cis loci (significant or not), the median H is 0.59 indicating high allelic heterogeneity.
We used multiple linear regression to test if sequence variation in the vicinity of genes can predict the strength of cis eQTL effects on expression.We measured variation within each of three windows around each gene: the 1kb upstream of the gene start codon, the gene itself, and the 1kb downstream.Within each region, we calculated nucleotide diversity (π) and a measure of insertion/deletion frequency (U) as potential effectors of cis eQTL significance.U = 0.0 if the ten lines are perfectly colinear over a region, but increases toward 1.0 as indels accumulate (see Methods B).Predicting the significance of cis eQTL effects on the full set of 12,897 genes, all six predictors are positive, but only four are strongly significant (Supplemental Table 7).cis eQTL significance increases most strongly with π within the genic region and with U for the upstream regulatory region.Indels in the genic and downstream regions have moderately positive effects on cis eQTL significance, while nucleotide diversity in the upstream and downstream regions are minimally important.

The characteristics of ascertained trans eQTLs are very different from cis eQTLs
While less frequent than cis eQTLs, Trans eQTLs are abundant: 10,059 significant tests across all nine families in the Cross-specific analysis (Fig 1 ).A substantial fraction of trans eQTLs occur in "hotspots" where a single locus effects the expression of many genes.Within each cross, we clustered significant trans eQTLs if located within 2 centimorgans of each other which distills all significant tests into 1,979 loci (Supplemental Table 4).The number of affected genes per locus is usually low (median = 2), but we can identify 35 hotspots where the trans eQTL affects the expression of 30 or more genes.In a few cases, hotspots in different families have roughly similar In the Cross-specific analysis, 98% of the trans eQTL / affected gene pairs were ascertained within only one family.This suggests low allelic diversity -a bi-allelic polymorphism with the minor allele carried by only one parental line.However, it is not tenable to assume the absence of significance as absence of effect.To provide a meaningful contrast of allelic diversity at cis and trans eQTLs, we estimated the effect of each trans eQTL considering all families simultaneously, essentially applying the Combined analysis model previously fit to cis loci.Across all trans eQTL/affected-gene pairs, the Combined analysis indicates that trans eQTLs explain about 10% of the genetic variance (Vg) on average.This is much less than the average cis contribution to Vg, and usually constitutes a minority of the overall Vg(trans) for genes.
The lower diversity at trans eQTLs is due first to a reduced number of functionally distinct alleles per locus  The average H of trans eQTLs (0.41) is only about two thirds that of cis eQTLs owing to the differences in allele number/evenness.This is a large difference, but less than suggested by literal extrapolation from the Crossspecific analysis.The "typical" trans eQTL has the minor allele present in two of ten lines.We obtain only one significant test in the Cross-specific analysis owing to sampling error and limited power (the Beavis effect [27]).
In fact, there are a small number of potentially important cases where the trans eQTL has high allelic diversity.
For example, MgIM767.11G072100.v1.1 (a MADS box transcription factor) is affected by a QTL about 17.5 mb into chromosome 4 that segregates within four of nine families.
Considering the results from perspective of the phenotype, we find that many genes are affected by multiple trans eQTLs, both within and between families.Across all genes measured for expression, the number of trans eQTLs ranged from 0 to 12.There is a strong positive relationship between the "trans heritability" (Vg(trans) as a proportion of the total variance in expression) and the number of significant trans eQTLs identified for that gene in the Cross-specific analysis (Fig 5A).For a given number of trans eQTLs affecting a trait, Vg(trans) increases with the amount of genetic variance generated by these loci (F1, 6808 = 58.4,p < 10 -13 ).In other words, the amount of variation contributed by mapped trans eQTLs is a strong predictor of the estimated Vg(trans) of a gene (which is the total contribution of trans eQTLs, mapped or not).However, even in this subset of genes where we identified at least one trans eQTL, the majority of Vg(trans) remains unexplained.This does not imply extensive over/under dominance because loci with complete dominance will produce point estimates with abs(d) > abs(a) about half the time owing to estimation error.We can test for over/under dominance by comparing the likelihood of the data with d unconstrained to the likelihood under complete dominance of either allele.For cis eQTLs, this test provides only one gene with a compelling case for over/under dominance: MgIM767.14G274100.v1.1, a pectin acetylesterase protein, is highly significant for overdominance in three families and marginally significant in a fourth.There are more trans eQTLs suggesting over/under dominance, but none rise to genomewide significance.

The contribution of trans eQTLs to genetic covariances
The additive genetic covariance between any two genes (CG) can be estimated simply by applying the Combined model analysis to the sum of expression at the two genes and then subtracting estimates obtained from the fits to each gene alone (see Methods D).To characterize the genomic distribution, we randomly paired each expressed gene with 10 other genes and estimated GG, and the environmental covariance (CE), for each of these 64,930 trait pairs.GG and CE each exhibit distributions with a roughly equal mixture of positive and negative values (Supplemental Fig 7).With expression levels standardized to unit variance, we can use squared covariances to measure the magnitude of genetic and environmental associations.For   2 , the genomewide mean is 0.000983 (SE = 0.000007), while the mean for   2 is 0.010545 (SE=0.000081).These low values reflect the fact that most genes are uncorrelated with the bulk of the transcriptome (thousands of pairwise comparisons), even if strongly correlated with a subset of other genes (tens to hundreds of comparisons).
We tested the effect of trans eQTLs on co-expression by identifying all pairs of genes affected by the same trans eQTL.Over 305,809 gene pairs, the mean   2 is 0.001639 (SE = 0.000007), which is the 67% greater than the genomewide average (t = 65.6,p < 10 -9 ; Fig 5B).This confirms the prediction that trans eQTLs contribute to coexpression.While most gene pairs were specific to one cross, there were 621 pairs mapped in two families and 4 pairs mapped in three families.The magnitude of the genetic covariance (mean   2 ) is much higher in these multi-family gene pairs (F2,305806=2848, p < 10 -9 ; Supplemental Fig 8).This is also expected given that intermediate allele frequency polymorphisms should generate more covariation than rare allele polymorphisms.An unexpected result is that   2 is also inflated in trans eQTL pairs where the mean of 0.014266 (SE=0.000053) is 35% greater than the genomewide average (t = 38.4,p < 10 -9 ).The standard approach to eQTL mapping is to progress gene by gene, predicting expression of each from genotype (as we did here, Figs 2-4).The obvious extension for correlations is to consider genotype effect on trait pairs.This is not a standard analysis, perhaps because the number of distinct gene pairs is very large.
Instead, researchers typically apply methods such as principal components analysis (PCA) or network analysis [28] or sparse factor analysis [29] to compress correlated expression patterns into a tractable number of aggregate traits (PCs or factors or modules).For the present dataset, we applied PCA to the transcriptome and tested for genetic effects on the resulting principal component scores.PCA combines the expression values from correlated genes to define PCs that are linear combinations of all expressed genes.After defining PCs, we applied our Combined model analysis to each, treating them as quantitative traits.We determined the genetic and environmental variance in each PC and mapped loci affecting these composite traits (pcQTLs).
We found that the first 50 principal components explain nearly half of the total variance in expression of the full set of 12,987 genes.Importantly, the average heritability of PC scores is considerably higher than for individual genes (the mean is 0.25 for the first 50 PC and 0.34 for the next 50; Supplemental Table 5).We mapped 196 pcQTLs that yield test p-values less than 10 -5 (Fig 6, Supplemental Table 6).These co-localize with the major trans eQTL hotspots.Moreover, an analysis of the loadings of different PCs on individual genes indicates that the pcQTLs are absorbing both cis and trans eQTLs to some extent (Appendix 2).It is also noteworthy that the dominance test is usually more significant than the additive test on pcQTLs (

Discussion
We found that the cis regulatory region of genes explains about two thirds of the genetic variance in expression on average (Fig 3A).This is an unexpectedly high proportion.Reviewing studies from a number of species, Liu et al [5] report that Vg(cis) is typically about one third of the genetic variance, half our estimate.It is difficult to know if the Mimulus estimate is atypically high, because while large Vg(cis) values are routinely obtained in eQTL mapping experiments (e.g.[24,26]), researchers usually only report estimates for genes with a significant cis eQTL.Our estimate and the summary by Liu et al [5] average over all measured genes.Methodological issues aside, a basic feature of Mimulus guttatus may be relevant to its high cis regulation.This species exhibits remarkably high gene sequence and insertion/deletion (indel) variation, even within local populations [30].In fact, it is difficult to reliably map Illumina sequencing reads to intergenic regions of the M. guttatus reference genome where indel variation is very high [31].Such variation could affect gene expression insofar as binding of regulatory elements to the DNA surrounding genes affects expression.
Speaking to the role of sequence variation, we find that genes with elevated indel variation upstream of the gene start codon and/or high nucleotide variation within the gene body have higher Vg(cis) (and increased statistical significance) than do genes with lower variation (Supplemental Table 7).This pattern has at least two non-mutually exclusive explanations.First, genes with high variation in general might be more likely to exhibit differences in the regions that are directly relevant to gene regulation [32].This is plausible but difficult to evaluate given that we cannot yet bioinformatically identify regulatory sequences (promoters, enhancers, etc.) in M. guttatus.Second, the level of sequence variation around a gene could be indicative of the history of natural selection at a locus.Genes under stronger purifying selection, or those that have recently experienced a selective sweep are expected to exhibit lower sequence variation.Such genes might also tend to exhibit lower standing variation in cis regulation.

Allelic heterogeneity and the allele frequency spectrum at eQTLs
Brown and Kelly [33] recently published a genomewide association study of gene expression variation (hereafter called the eGWAS) within this same IM population of M. guttatus.They examined a different plant tissue (flower buds instead of leaves), used a different experimental design (homozygous lines instead of lines intercrossed to produce F2 individuals), and employed a different allocation effort (the eGWAS scored 151 lines with few individuals per line, while here we have 10 lines with high replication of segregating variation between lines).Despite these differences, the current experiment ampliflies a key conclusion of the eGWAS.The eGWAS revealed a striking difference in the allele frequency spectrum between cis and trans acting variants.
"Cis-SNPs" have intermediate frequencies relative the overall genomic distribution while "trans-SNPs" exhibit a rare-alleles model.The former is consistent with balancing selection on cis eQTLs while the latter suggests purifying selection on trans eQTLs [34,35].
The high allelic heterogeneity documented in the present experiment suggests that the eGWAS may have actually underestimated the level of variation at cis loci because the analysis was at the level of biallelic SNPs rather than allelic series of haplotypes.The eGWAS tested biallelic SNPs.The maximum possible heterozygosity (H) with two alleles is 0.5, which is much below the average H for our cis eQTLs which usually segregate 3-4 alleles per gene (Fig 3C).The regulatory regions of our founding lines are haplotypes that differ in both SNPs and indels at many positions.Closely linked variants exhibit linkage disequilibrium in the IM population (see Supplemental Table 2 in [31]) aggregating mutations at distinct positions into functionally distinct alleles.This is an emerging empirical trend: Multi-parental mapping population in both plants and animals find that QTL are best described as allelic series and not binary alternatives [26,36].
We find that trans eQTLs have lower diversity than cis eQTLs, which corroborates the cis/trans difference in allele frequency discovered in the eGWAS.In truth, the current experiment overestimates the amount of variance generated by individual trans eQTLs because we only estimated the variance contribution of trans eQTLs that emerged as significant in the Cross-specific analysis.This is a simple manifestation of the Beavis effect [27], which does not apply to our cis eQTL because we could include all cis-loci in our estimation of effects (loci do not have to be discovered as significant to be included).Even with this inflation of importance, our mapped trans eQTLs explain only about 10% of the genetic variance in their affected genes on average.
Most of Vg(trans) remains unexplained and thus represents the aggregate contribution of loci with effects below our detection limit.
Ascertainment also implies that we are overestimating heterozygosity at trans eQTLs.The eGWAS discovered many trans-snps where the minor allele segregates at about 5% in the IM population (see Fig 2 of [33]).In the present design, we lose about half such loci just because the minor allele is not sampled into any of the ten parental lines.When such an allele is captured, its frequency in the experiment (at least 10%) is twice that in nature, which effectively doubles the heterozygosity estimate.Our estimation procedure is unbiased in the usual statistical sense (Appendix 1): Averaging over all trans acting loci, underestimates (loci where we fail to sample the rare allele) will cancel overestimates (loci where we do sample the allele) yielding the true heterozygosity on average.However, since we always focus on the significant tests after the experiment is completed, an inflated heterozygosity estimate is inevitable.

The multiparental mapping design enables the discovery of trans hotspots and the cis/trans difference for genetic dominance
A major advantage of multiparental mapping is that it can give a much better examination of rare alleles than GWAS [20].GWAS typically have low power for rare alleles, alleles carried by few individuals in the experiment.
As noted above, most rare alleles are not sampled into multiparental designs, but for those that are, there is high replication in measured individuals.The previous eGWAS [33] found many trans eQTLs but no hotspots.It is possible that the difference from the present study, where hotspots were evident in each cross, is biological (e.g. the bud transcriptome has a different architecture than the leaf), but a statistical explanation is more plausible.Because they tend towards rarity, trans-acting alleles were usually present in only 5-15 plants of the eGWAS.In this situation, a locus affecting many genes will yield genomewide significant tests on a minority of its targets just due to limited power.Most of the rare alleles present in the eGWAS were not sampled into the parental lines of this study.However, those included are likely carried by over 100 plants which will yield more reliable detection of trans effects.
To accurately estimate dominance at a QTL, we require substantial representation not just of alleles but also of diploid genotypes.Most multiparental mapping populations consist of inbred lines, which allows high replication of known genotypes.The replicated F2 design involves crosses among lines, not only to produce mosaics of the parental genomes (as is true of Recombinant Inbred lines e.g.[36]), but also to generate QTL heterozygotes.Owing to this feature, we find a striking difference in dominance between cis eQTLs, which tend toward additivity, and trans eQTLs that typically exhibit dominance (Fig 3D).The molecular biology of gene expression predisposes cis eQTLs to additivity.If each cis allele contributes independently by affecting only the linked gene copy, then additivity of overall expression results from a simple dosage effect.This logic does not apply to trans acting loci, but it is not clear why they should be so strongly skewed towards strong dominance at most loci.About 60% of trans eQTLs yield an absolute value for d/a that is greater than 0.75, which means that the heterozygote is closer to one of the homozygotes than to the additive midpoint.
While certainly more nearly additive than trans eQTLs, we can reject pure additivity of gene action at over 20% of cis eQTLs.At these loci, heterozygotes expression is nearly always intermediate (Fig 3D), but often closer to one homozygote than the other.A subtle deviation from the midpoint is expected given that we impose a nonlinear transformation on read counts prior to estimating allelic effects.For example, the log2 transform, which is particularly fashionable in gene expression studies, will tend to pull the heterozygote expression slightly towards the homozygote with higher expression at an additive locus.More substantial deviations suggest a feedback mechanism where expression of one allele is affected by the other.Autoregulation, which is well established in plants [8], provides one such mechanism.For example, transcription factors can increase or decrease their own transcription by binding their own promoter region.However, sequence differences in either the protein or the regulatory region could direct the feedback (enhancement or suppression) more to one allele than the other.

Trans eQTLs and genetic correlations in gene expression
The quantitative genetic summary of gene co-expression is the "G matrix" [37].Each of the n expressed genes is represented by a row and column in an n x n dimensional matrix with the additive genetic variances in expression on the diagonal.The additive genetic covariance of two genes is reported in the off-diagonal matrix elements corresponding to these rows and columns.The G matrix for the transcriptome is expected to be "sparse" relative to that for morphological traits.Morphological traits that emerge from common developmental processes routinely exhibit moderate to high correlations.In contrast, while individual genes may interact strongly within "expression modules" [38], we expect most interactions to be weak or at least diffuse.Consistent with this expectation, we find that genes typically have a low additive genetic covariance Given the high dimensionality of our G matrix (there are over 84 million distinct off diagonal terms), we applied principal components analysis (PCA) to the transcriptome and then mapped QTLs for the PC scores (pcQTLs).
For a plant, a PC score is a linear combination of the standardized expression levels at each gene.The weights (loadings) differ among the PCs, but in our case, each PC is strongly influenced by hundreds to a few thousand genes (Supplemental Table 9).Thus, pcQTLs likely affect many genes, although the effects on individual genes may be modest and below our detection limit for individual trans eQTLs.That said, there is a clear indication that pcQTLs "capture" some of the effects of our mapped eQTLs, both cis and trans.This is simply because the PC affected by a pcQTL tends to have higher loadings on genes with genomically proximal eQTLs (Appendix 2).
The association with trans eQTLs is stronger than with cis.Also, both pcQTLs and trans eQTLs show a much stronger signal of genetic dominance than do cis eQTLs (Figs 3D, 6).
PCA is a classic tool in quantitative genetics.It is applied directly to correlated traits to obtain uncorrelated predictors of fitness [39] and also to characterize the structure of the G matrix [40,41].PCA is also used in RNAseq experiments, often for data visualization but sometimes as a data cleaning tool to remove "confounding factors."If an environmental variable (say temperature) influences the expression of many genes, the failure to control for this variable can reduce the power to detect treatment effects.If the leading principal components 'absorb' the effects of unmeasured variables, the inclusion of PC scores as covariates can remove noise and improve power.Our results are cautionary with respect to this approach if genotype is the treatment.Genotypic differences generate a correlated response across genes.Pleiotropy can thus be (partly) responsible for the coexpression patterns that determine principal components.In this experiment, we found that PC scores actually have a higher genetic determination than individual genes on average (Supplemental Table 5).To statistically remove PCs before analyzing individual genes can thus eliminate signal (trans eQTLs) as well as noise.

Concluding remarks
Evolutionary inferences from genetic experiments always depend on sampling, on where genotypes come from.
Multiparental mapping populations are typically created from worldwide collections [20,36,42,43], a strategy designed to maximize genetic diversity.In this context, the frequency of alternative alleles at a QTL will depend on how variation is distributed within and among local populations of the species and the locations, habitats, etc. from which parents are sampled.In this experiment, we sampled parental genotypes from one natural population with the purpose of estimating features of that population.Allele frequencies thus reflect the balance of evolutionary processes (selection, migration, drift) at the Iron Mountain location.The high variation at cis eQTLs suggests that selection is maintaining variation at this local scale [33].Field studies at Iron Mountain directly measuring selection on genetic variants [17,44], as well as longer term studies measuring temporal changes in allele frequency [31], suggest that antagonistic pleiotropy and temporally fluctuating selection are both acting as balancing selective agents.
Trans eQTLs have lower allelic diversity than cis eQTLs and a greater contribution of uncommon alleles.
However, the aggregate of evidence from the current experiment and the previous eGWAS suggests that these relatively "minor" alleles segregate in the 1-10% range within Iron Mountain.This is considerably higher than the expected frequency of unconditionally deleterious alleles, which are likely to be less than 1% in a large population.The scale of sampling is a key consideration here.An allele that is uncommon within Iron Mountain, perhaps because it is usually disadvantageous under local environmental conditions, may be predominant in other populations.As in most widely distributed species, local adaptation is very common in M.
guttatus (e.g.[45,46]).If trans eQTLs are important to local adaptation in M. guttatus, we predict that the allele frequency spectrum for trans eQTLs will shift when we apply the same experimental design to a species wide sample of parental genotypes.
The characterization of gene expression in terms of genetic variances and covariances is necessary to predict the response to natural selection.Field experiments have demonstrated rapid evolution of gene expression in response to selection [47][48][49].From one generation to the next, the change in mean expression levels under selection can be predicted from the current G matrix without any information on the genetic architecture of expression variation [50,51].However, the rate that G matrix elements change is dependent on how eQTLs combine to determine genetic (co)variances.Our finding that a major locus, the cis eQTL, explains much of expression variation suggests that the G matrix will be malleable on ecological time scales.Shifts in allele frequencies at major QTLs rapidly change genetic variances and covariances [52,53].The usual view is that selection eliminates genetic variation, which should occur rapidly if fixation at one gene eliminates much of the variation.However, with the temporal fluctuations evident at Iron Mountain, variation can persist even with strong selection [54,55].
The finding of high allelic diversity at eQTLs further complicates G matrix dynamics, particularly when considering genetic covariances.With two alleles and additive gene action, we can describe a locus as either positive or negative with respect to the covariance of two affected traits.If the first allele increases expression at two genes, the alternative necessarily has a negative effect on both (because effects are defined by contrast between alleles).With multiple alleles, this is no longer assured.With four alleles, all directions for pleiotropic effects could be evident (positive/positive, negative/negative, positive/negative, and negative/positive).The extent to which allelic heterogeneity generates complex pleiotropy is currently unclear, making it an important target for future experimental work.

A. Study system, experimental protocols, RNA and DNA extractions, RNAseq library preparation, and sequencing
As parents, we used inbred lines of the yellow monkeyflower, Mimulus guttatus (syn Erythranthe guttata, Phrymaceae) extracted from the Iron Mountain (IM) population in the Cascade Mountains of Oregon (44.402217N, 122.153317W; [56]).This population is predominantly outcrossing with little internal population structure [57].In 2018, Troth et al. [17] sequenced whole genomes of 187 IM inbred lines.After selecting one of these lines (767) as the "reference" (which is common to all crosses), we sampled the nine alternative lines subject to the condition that they were fully unrelated to 767 and each other.Relatedness was based on genomewide pairwise nucleotide diversity () from the Illumina sequencing of these lines [17].The equidistance among the lines is confirmed by the subsequent denovo assembly of each line from long read sequencing data (described below, see Supplemental Fig 1A) . The programs used to analyze sequence differences between genomes, as well as all the operations described below, were written in Python (v3.7) and are provided in Supplemental File 1 along with a key to their use.
We crossed each alternative line to 767 with the latter used as the pollen donor (Fig 1A).We grew a single F1 plant from each cross and self-fertilized this individual extensively to produce >1000 seeds.We grew the F2 plants along with members of each parental line in four temporally overlapping cohorts using standard greenhouse conditions [33], about 500 plants per cohort.We collected whole leaves from the 2 nd leaf pair as soon as the third leaves were >1cm long (Fig 1B ), immediately flash froze the tissue in liquid N2.RNA was extracted after disrupting the frozen leaves with metal beads in RLT and β mercaptoethanol solution using a Qiagen RNeasy plant mini kit (Qiagen) according to manufacturer's instructions (without the optional DNase step).All samples were eluted in 60 µl RNase-free H2O.
We made RNAseq libraries using QuantSeq 3' mRNA-seq Library prep kits (Lexogen) at quarter volumes.We used four i5 primers (Lexogen), which along with the 96 i7 primers of the kit, allow barcoding of 384 samples per sequencing run.Each batch of 96 samples from one run of the protocol was pooled in equal volumes and checked for fragment size distribution using an Agilent TapeStation (Agilent, Santa Clara, CA, USA) and quantified using Qubit (Thermo Scientific) at the KU Center for Genomics.Four such batches, each with different i5 primers, were pooled equimolarly for a sequencing run.Sequencing was performed at the KU Center for Genomics on a NextSeq 2000 to obtain 75bp single end reads.For samples with low yield in sequencing, we remade libraries from the original RNA extraction and sequenced the remade libraries.

B. de novo assembly of parental genomes and annotation
From plants of each parental line, we extracted DNA using a modified PacBio protocol for high molecular weight DNA extraction using 5 g leaf tissue as starting material.The full protocol is available as Supplemental File 2.
We used Liftoff [62] to transfer the annotation of the 767 assembly onto the other genomes.Given an annotation file (gff3) from each assembly, we identified orthologs of each 767 gene in each alternative build.
Not all 767 genes were successfully located in the other assemblies and so we focused on the 12,987 genes discovered in all lines (and that also passed the minimum average expression level described below).We extracted the sequence of these genes in each build to create a line specific transcriptome for read mapping.
To score differences in the nucleotide sequences among our assemblies, we used Mummer 3.0 [63] and SVMU [64] with the following commands applied to each alternative genome (cross): From the output files [cross]_TO_IM767dnadiff2.1coords and [cross]_TO_IM767dnadiff2.snps we extracted all correctly aligned positions between genomes.We scored SNPs and indels (which were enumerated as gaps in 767 relative to the alternative genome and gaps in the alternative relative to 767) in each gene and in the 1kb upstream of the start codon and downstream of the gene end.For each interval, we calculated nucleotide diversity (π) as the fraction of aligned positions that differ in nucleotide and U = ( total bp -aligned bp )/ total bp.If the two sequences are perfectly colinear (no indels) then total bp = aligned bp and U = 1.U increases towards 1 as the region fills with indels and unalignable sequence.This yields six statistics for each gene (π and U for each of the three intervals), which we used as predictors of the LRT1 test statistic for a cis eQTL.We standardized each predictor (to mean zero and variance 1 across all genes) and then applied multiple linear regression using Minitab v19 to produce Supplemental Table 7.

C. Read mapping, genotype calling and scoring expression levels
We trimmed the RNAseq reads with Trimmomatic [65] and checked for contamination or mislabeling using custom python scripts (Supplemental File 1) that estimated the relatedness of all among samples including the parental lines.After eliminating dubious samples (low sequencing depth or questionable family assignment), we retained data from 1588 plants for subsequent analysis.We used Salmon v1.10.0 [66] to quantify gene expression.To remove bias caused by preferential mapping of alleles, we mapped the F2 individuals from each family to a composite genome including the (haploid) genome and transcriptome of each parental line.We excluded any gene that displayed aberrant mapping of reads from the parent line plants.Specifically, we required that reads from the inbred line plants (which are homozygous for a known parental allele at all loci) map specifically to the allele from that line (in which case the marker is informative for genotype as well as transcript level analysis) or that the line alleles map equally well to each allele.Only genes in the former category were amenable to allele specific expression analyses.
Within each cross (family), genotyping was based on the RNAseq data from the subset of genes where reads reliably map to each parental allele (identifying their origin).We used the count of reads to each parental allele to make a putative call within each marker locus (RR, RA, AA).These calls are error prone owing to low read counts at many loci (lowly expressed genes) and allele specific expression (which makes heterozygotes resemble the homozygotes associated with higher expression.Thus, we treat these putative calls as the "emitted states" with the true genotype treated as the latent states of a Hidden Markov Model (HMM).The HMM estimates the genotype of each F2 plant for each chromosome and is implemented using a series of python programs revised from the GOOGA pipeline [19].The model estimates marker-specific genotyping error rates (which determine emission probabilities) and the recombination rates between all adjacent markers (which determine the transmission probabilities of the Markov Chain).Given maximum likelihood estimates for all parameters, we extract the genetic map for each cross and the posterior genotype probabilities at each marker.The resulting genotype matrices are nearly complete given that the posterior probabilities for the most likely genotype at each marker are (almost) always greater than 0.99.To produce a genotype matrix with calls at each expressed gene, we interpolated calls from the scored markers that were immediately upstream and downstream of any gene not included in the HMM estimation (these are all genes without informative markers for parental assignment of reads).When adjacent markers differed owing to a recombination event, intermediate genes were scored as unknown genotype.Finally, we calculated the relatedness matrix using pairwise comparisons among all 1588 plants.The coefficient of coancestry at each marker is determined simply by the extent that the two individuals share alleles from the same parent line given our assumption that parental lines are unrelated.The overall relatedness between two plants (twice the coefficient of coancestry) is just an average across all loci.These programs to infer genotypes and relatedness (as well as those used for other aspects of the data analysis) are contained in Supplemental File 1 with an outline describing the sequence that programs are executed and the inputs/outputs for each step.
To obtain the phenotypes (the total expression of each plant at a gene), we summed the reads mapped to each allele of a gene within each plant.Lowly expressed genes, those with a mean expression less than 0.5 count per million, were not considered for further analysis.For the Cross specific analysis, we estimated the meanvariance relationship using Voom [67].Voom also generated a weight for each observation considering the growth cohort and Cross as factors.First, a DGEList object was generated in edgeR [68] from the raw counts, which included information on library size per individual using the calcNormFactors command.Then the DGEList object was Voom transformed given a matrix of cohort + group.The resulting normalized counts and weights were exported for the Cross specific analysis.The Combined analysis used the standardized counts per million for each gene of each plant directly.A Box-Cox transform was then applied, gene by gene, and finally standardized the transformed counts to unit variance.
For alleles specific expression analysis, we identified all genes in each cross where the total expression could be partitioned into reads contributed by each parental allele in heterozygotes.At such genes, we assembled a vector of count pairs (reads from the reference, reads from the alternative line) for all plant heterozygous at the cis locus.We treat these counts as samples from a beta-binomial distribution.We determine the maximum likelihood across all plant under the null model enforcing α = β (which implies no allele specific expression on average, equal expression of both parental alleles) to the more general alternative model α ≠ β (either parental allele can be overrepresented).The beta-binomial is superior to the usual binomial model for counts because it allows over-dispersion.However, we find that our MLE for the frequency of the alternative allele in heterozygote RNA (y-axis of Fig 3B) from the beta-binomial is usually very close to the "naïve estimate", which is the simple average of A/(R+A) across all heterozygous plants.

D. Testing procedures
For the Cross-specific analysis we used rQTL v 1.60 [69] to detect QTLs for the normalized expression of each gene using the scanone function with cohort as a covariate and taking the weights into account.The analysis was run separately on each family, and we extracted the LOD score, additive and dominance effects, and their standard errors for each marker.P-values were obtained from permutations specific to each gene using the scanone command of rqtl and the marker regression method.
For the Combined analysis, we fit a linear mixed model using maximum likelihood to the entire dataset for each gene, considering three different models in sequence.The fixed effects in the null model are cohort and the random effect absorbs all genetic effects on expression (the relatedness matrix determining the (co)variance matrix).This Model 0 yields a log-likelihood (LL0) and two variance estimates, Vg and Ve.Vg is the (whole genome) additive genetic variance in expression while Ve is the residual variance (environmental effects, measurement error, etc.).We next test for a cis eQTL by adding the genotype at this locus into the vector of fixed effects.We first consider purely additive gene action at the cis locus: Model 1 adds nine parameters, the effects of each alternative allele (specific to each line) that is crossed to 767.For all plants from cross z, the phenotype is incremented by az for heterozygotes and by 2 az for homozygotes for the allele from parental line z.Model fitting yields the log-likelihood (LL1), estimates for all nine az values, and the variance components, Vg and Ve.In the fit of model 1, Vg is the genetic variance due to trans-eQTL since any cis-locus effect has been absorbed into the fixed effects.The likelihood ratio statistic, 2(LL1 -LL0), is compared to a chi-square distribution with 9 df to test for an effect of the cis-locus.Finally, Model 2 allows dominance at the cis-eQTL with genotypic values of 0, az+dz, and 2az for reference homozygotes (767), heterozygotes, and alternative homozygotes (line z), respectively (the rQTL model used in the Cross specific analysis).Model 2 adds nine parameters, so the likelihood ratio test for dominance, 2(LL2 -LL1) is compared to a chi-square distribution with 9 degrees of freedom.We first applied models 0, 1, and 2 to each gene using the cis locus as the genotype.
Later, after identifying affected gene / trans eQTL pairs from the Cross-specific analysis, applied these models to each pair using the trans eQTL locus as the genotype.
From the ML model fits, we can estimate the genetic variance generated by the cis eQTL in two different ways.
First, we can subtract the Vg from model 1 (which includes only trans effects) from the Vg of model 0 (which includes both cis and trans effects).This estimator, denoted Vg(r 2 ), is similar to the 'variance explained by the QTL' method that has been applied to multi-parental mapping populations of Drosophila [26] and mouse [24].
A second estimator, Vg(a), is calculated from the estimated additive effects and their standard errors: where   2 is the variance among the nine az estimates and   2 ̅̅̅̅̅ is the average of squared standard errors on those estimates.This formula is the simple variance (s 2 ) among az values minus the variance generated by estimation error.The az estimates can be treated as unrelated because each is calculated from genotype/phenotype association within distinct families, The simulations described below indicate that both Vg(r2) and Vg(a) are (nearly) unbiased but Vg(a) has a lower mean square error.In other words, for this design Vg(a) is closer to the true variance on average.Both estimators perform substantially better than the HE regression approach [23], which also is nearly unbiased but has much larger mean square errors.
The alternative models (1 and 2) described above are fully unconstrained -each alternative line can be uniquely different from 767.This is necessary if variation is described by an 'infinite alleles' model [70], but over-parametrized if few alleles segregate at the cis locus.All the possible configurations, ranging from two to ten distinct alleles at a QTL, can be testing by first ranking the az estimates from most negative to most positive and then considering all "splits" that subdivide these estimates into discrete bins [26].We determined the maximum likelihood for all 511 distinct configurations for each gene.For a given number of partitions (e.g. two partitions equals three alleles), we selected the case with the highest likelihood.Starting with the 2-allele case, we accept increases in the allele number (rejecting the simpler model by accepting an additional parameter) only if twice the likelihood difference is greater than 3.84 (which is the p<0.05 threshold for a chi-square test with 1 df).
The linear mixed model was applied in two other analyses.To test cases of potential over/under dominance, we compared the likelihood of the data with d unconstrained (Model 2) to the likelihood under complete dominance of either allele.This has the same number of parameters as Model 1 but with a different assignment of genotype effect to heterozygotes.Second, we applied Model 0 to the sum of expression (Z) between pairs of genes (y1 and y2): Z=y1+y2.This yields estimates for genetic and environmental variance in Z. Consequently, we can solve for the genetic (CG) and environmental (CE) covariances given the corresponding variances in Z and as well as the single trait estimates for VG and VE.For all tests, we obtained a p-value across all 12987 genes.To assign the False Discovery Rate (Q-values), we used the R commands: library(data.

E. Mapping QTLs for expression principal components
We created a matrix with standardized expression levels for all genes (mean zero and variance one) for all plants prior to invoking Principal component estimation programs.Using the R libraries MASS, dplyr, and data.table,we applied the prcomp function to the expression matrix and extracted the variance explained by each principal component, the loadings on all 1588 PCs, and the PC scores for each plant for each principal component.We formatted each PC score list as a phenotype file for input to the Combined analysis pipeline.
We first fit the linear mixed model to each PC score without a QTL to estimate the genetic and environmental variance of that trait.Next, we added a QTL single to the model with the position adjust incrementally along each chromosome.Here we tested every other gene location (a step size << 1 cm).We retained the model fits with the highest log-likelihood per chromosome as putative pcQTLs.

Fig 1 (
Fig 1 (A) A diagram of the replicated F2 design with the number of plants used after filtering in parenetheses.Each "P" is an unrelated inbred line.(B) A photo of the plant with leaf number noted at the developmental stage when 2nd leaves were harvested.(C) The distribution of relatedness values from all pairwise comparisons of individuals.The set of contrasts centered on 0.5 corresponds to F2 individuals of different families.The contrasts centered on R = 1.0 come from intra-family comparisons.
plants.The distribution of pairwise relatedness values (depicted in Fig 1C) confirms that our crossing design produced the high variance in relatedness necessary for accurate estimation of the genetic variances in gene expression.At the low end (R = 0) are 451,200 contrasts among unrelated individuals -plants of different inbred lines compared to each other and contrasts of F2s to unrelated parental lines.The next most frequent contrast is among F2 plants in different families (average R = 0.5), which are related through shared inheritance of DNA from their common parent (767).F2 plants within a family (average R = 1.0) can have loci Identical by Descent through both parents.The variability in relatedness around the modal points of 0.5 and 1.0 are due to the stochastic events of segregation and recombination in gamete formation that will makes siblings more/less similar by chance.Finally, there are several thousand contrasts of genetically identical individuals within parental lines (R = 2.0).

Fig 2 .
Fig 2. All significant eQTLs are reported by QTL (x-axis chromosomal locations) and affected gene position (yaxis) in four of the crosses (alternative lines 502, 909, 541, and 1034).The vertical "chimneys" highlighted by arrows are trans eQTL hotspots, the locations of which are unique to each cross.
For cis eQTLs, 98% of point estimates for d and a satisfy this condition (Fig 3D).Partial dominance at cis eQTLs is corroborated by the allele-specific expression data.If we regress the alt-allele frequency in reads from heterozygotes (y-axis of Fig 3B) onto the estimates for both a and d simultaneously, both are highly significant as positive predictors of allele frequency (p < 10 -9 for each coefficient).The positive coefficient for d means that when the overall expression of the heterozygotes exceeds the value predicted by additivity, there is a corresponding increase in alt-allele frequency within the reads produced by heterozygous plants.When d < 0, this frequency is reduced relative to additivity.Thus, fluctuations in allele-specific expression parallel dominance estimates.

Fig 3 .
Fig 3. (A) The fraction of the total genetic variance due to the cis locus is reported for each of the 12,987 genes.Genes with negative Vg(cis) estimates are reported as 0. (B) The estimated effect of the alternative allele (not 767) on total gene expression (across genotypes) is a strong predictor of allele frequency within the reads produced by heterozygotes.The units for a (estimated additive effect) are standard deviations of expression.(C) The number of functionally distinct alleles per cis (blue) and trans (red) eQTL are reported.(D) The distributions of d/a are reported for cis and trans eQTLs.The end categories bin all estimates that are less than -1.1 or greater than 1.1.
genomic locations (Supplemental Fig 2), but since the affected genes are different, they are likely caused by different mutations.For example, three different families (155, 664, and 909) each have a trans eQTL hotspot within the first Mb of chromosome 1.However, the 68 affected genes in family 155 are different from the 30 genes affected in family 664, and the 35 in family 909 are non-overlapping with either previous set.

(
median of 2 instead of3; Fig 3C).Second, for a given allele number, the average heterozygosity is lower at trans than cis eQTLs because the latter exhibit more a even distribution of alleles(Fig 4).At two allele loci, a trans eQTL is more likely to be 1:9 than 5:5 for allele counts, while the reverse is true for cis eQTLs.With three alleles, there are a greater number of configurations, but cis eQTLs are over-represented in high heterozygosity categories (right side of Fig4b) and trans eQTLs in the lower categories (left side).The differences in the distributions of cis and trans are highly significant for both the two allele (X 2 = 505, df = 4, p < 10 -16 ) and three allele (X 2 = 274, df = 7, p < 10 -16 ) loci.Comparisons beyond three alleles are not possible owing to absence of trans eQTLs with high allele counts.

Fig 4 .
Fig 4. The distribution of allele configurations for cis (blue) and trans (red) eQTLs for loci with two alleles (A) or three alleles (B).The bracketed numbers refer to the counts of the minor and major alleles (A) or the counts of minor, intermediate and major alleles (B).The allelic configurations are ordered left to right according to increasing heterozygosity (H).

Fig 5 .
Fig 5. (A) Trans heritability, the fraction of the variance in expression explained by Vg(trans) , is predicted by the number of trans eQTLs affecting a gene (n = number of genes in each category and the rectangle is the 95% CI on the mean).(B) The frequency distributions (density) for genetic covariance between genes affected by the same trans eQTL and all gene pairs.Allelic dominance at trans eQTLs is both more frequent and more severe than at cis eQTLs.In the Cross-specific analysis, 60% of trans eQTLs are significant for dominance and abs(d) < abs(a) in only 51% of cases (Fig 3D).

Fig 6 .
Fig 6.The strength of evidence (LRTtotal) is reported for each of the 196 pcQTLs with test p-values < 10-5.The total LRT is the sum of the additive test LRT (yellow) and the dominance test LRT (red).

Fig 6 )
, which mirrors the prevalence of dominance for individual trans eQTLs (Fig 3D).

(
CG) with most other genes (grey in Fig 5B).Our experiment shows that cis eQTLs are the primary determinant of the G matrix diagonal (Fig 3A) while mapped trans eQTLs contribute incrementally to genetic correlations in expression.The latter effect is subtle for individual loci (Fig 5B), which may reflect the fact that our mapped trans eQTLs explain a minority of the genetic variation generated by trans acting loci.