FIT-GWA: A new method for the genetic analysis of small gene effects, high precision in phenotype measurements and small sample sizes

Small gene effects involved in complex traits remain difficult to analyse using current genome-wide association studies (GWAS) because of the number of individuals required to reach sufficient study power to return meaningful associations. Inspired by field theory in physics, we propose a different method called Fields Informational Theory for Genome-Wide Associations (FIT-GWA). Contrary to GWAS, FIT-GWA is designed for situations where the phenotype is measured precisely enough, and/or the number of individuals in the population is small enough, that phenotype values cannot be grouped into categories. To extract information, FIT-GWA uses the difference in the cumulative sum of gene microstates between two configurations: (i) when individuals are taken at random, without information on phenotype values, and (ii) when individuals are ranked as a function of their phenotype value. This difference can be accounted for by the emergence of ‘phenotypic fields’. We demonstrate that FIT-GWA recovers GWAS, i.e., Fisher’s theory, when the phenotypic fields are linear. However, unlike GWAS, FIT-GWA makes it possible to demonstrate how the variances of microstate distribution density functions are also involved in genotype-phenotype associations. Using genotype-phenotype simulations based on Fisher’s theory, we illustrate the application and power of the method with a small sample size of 1000 individuals.


Introduction
Identifying associations between phenotype and genotype is the fundamental basis of genetic analysis. In the early days of genetic studies, beginning with Mendel's work in the 19th century, genotypes were inferred by tracking the inheritance of phenotypes between individuals with known relationships (linkage analysis). In recent years, the development of molecular tools, culminating in high-density genotyping and whole-genome sequencing, has enabled DNA variants to be identified directly and phenotype to be associated with genotype in large populations of unrelated individuals through association mapping. Genome-wide association studies (GWAS) have become the method of choice, largely replacing linkage analyses, because they are more powerful for mapping complex traits, i.e., they can detect smaller gene effects, and they provide greater mapping precision, as they depend on population-level linkage disequilibrium rather than close family relationships. GWAS have been employed in many species, especially in the study of human disease (1). The 2021 NHGRI-EBI GWAS Catalog lists 316,782 associations identified in 5149 publications describing GWAS results (2). Additionally, extensive data collection has been initiated through efforts such as the UK Biobank (3), Generation Scotland (4) and the NIH All of Us research program (https://allofus.nih.gov/), in the expectation that large-scale GWAS will elucidate the basis of human health and disease and facilitate precision medicine.
While the genomic technologies used to generate data have advanced rapidly over the last 20 years, the statistical models used to analyse the data are still based on the linear models derived by Fisher more than 100 years ago (5,6). GWAS essentially make use of Fisher's method of partitioning genotypic values by performing a linear regression of the phenotype on marker allelic dosage (7). The regression coefficient estimates the average allele effect size, and the variance explained by the regression is the additive genetic variance due to the locus (8). However, there is ongoing debate over whether the present analysis paradigm in quantitative genetics has reached its limits for truly understanding complex traits.
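As a minimal sketch of the regression described above, the code below simulates one bi-allelic locus under Hardy-Weinberg proportions, codes genotypes by allelic dosage (0, 1 or 2 copies of one allele) and regresses phenotype on dosage by ordinary least squares. Dosage is an affine recoding of the microstates −1, 0, +1 used later in this paper, so only the sign of the slope depends on the coding. The allele frequency, effect size, noise level and the helper name `dosage_regression` are illustrative assumptions.

```python
import numpy as np

def dosage_regression(dosage, phenotype):
    """Ordinary least-squares fit of phenotype = b0 + b1 * dosage.

    Returns the slope b1 (average allele effect) and the variance of the
    fitted values (phenotypic variance explained by the locus).
    """
    X = np.column_stack([np.ones(len(dosage)), dosage])
    coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    fitted = X @ coef
    return coef[1], fitted.var()

rng = np.random.default_rng(0)
n, p, effect = 100_000, 0.3, 0.2          # illustrative values only
dosage = rng.binomial(2, p, size=n)       # 0, 1 or 2 copies: Hardy-Weinberg genotypes
phenotype = effect * dosage + rng.normal(0.0, 1.0, size=n)

slope, explained_var = dosage_regression(dosage, phenotype)
expected_var = 2 * p * (1 - p) * effect**2  # additive variance 2pq * effect^2
print(f"slope ~ {slope:.3f}; explained variance ~ {explained_var:.4f} (expected {expected_var:.4f})")
```

For an additive locus, the slope estimates the allele effect and the explained variance approaches the single-locus additive variance $2p(1-p)\,\text{effect}^2$.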
For example, height in humans is a classical quantitative trait that has been studied for well over a century as a model for investigating the genetic basis of complex traits (5,9), and its heritability is well known to be around 80% (5,10,11). Yet this phenotype remains controversial (12), as current association methods have not been able to fully recover the measured heritability (13,14). Whilst some attribute this discrepancy to restricted sample sizes or to the involvement of an ill-defined environment (7,15), one may wonder whether Fisher's method is accurate enough to detect small gene effects.
Another issue concerns the role of microstate variances. Whilst Fisher's method uses the distribution density functions of microstates to detect differences between averages, it is still unclear how the variances of microstates are involved in genotype-phenotype associations, beyond their role in heredity (12).
Altogether, these results suggest that current genotype-phenotype mapping methods, based on Fisher's work, i.e., the method of averages, may be limited in their ability to capture the full potential of genotype-phenotype associations. R.A. Fisher's work is based on statistics, and by definition statistics deals with the measurement of uncertainties (16). To draw inferences from the comparison of data, one needs a method whose accuracy is understood, including ways of measuring the uncertainty in data values. In this context, statistics is the science of collecting, analysing and interpreting data, whilst probability, defined through relative frequencies, is central to determining the validity of statistical inferences. In practice, the use of frequentist probability, and the resulting binning or categorization of data, is justified when experimental measurements are imprecise. For example, measuring a continuous phenotype such as the height of individuals with a ruler with centimetre graduations, i.e., to the nearest centimetre, warrants the use of frequentist probability. In this case, a frequency table of phenotype values can be defined through 1 cm-wide bins or categories, from which probability density functions can be deduced to support statistical inferences.
However, this approach becomes problematic when phenotype values can be measured with very high precision, for example using highly advanced imaging techniques or biosensing technologies (17). In this case, each individual measured could return a unique phenotype value. If every phenotype value is unique, how can 'randomness in the data' be defined, and frequentist probability used, to draw inferences?
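The contrast between the two situations can be made concrete with a small sketch: the same simulated heights are binned at 1 cm resolution and at a much finer resolution, and the number of occupied bins and the largest bin count are compared. The population mean and standard deviation (170 cm and 7 cm), the sample size and the fine resolution are illustrative assumptions.

```python
import numpy as np

def bin_counts(values, resolution):
    """Round each value to the given resolution and count individuals per bin."""
    bins = np.round(values / resolution).astype(np.int64)
    _, counts = np.unique(bins, return_counts=True)
    return counts

rng = np.random.default_rng(1)
heights_cm = rng.normal(170.0, 7.0, size=1000)   # illustrative population

coarse = bin_counts(heights_cm, 1.0)    # ruler with 1 cm graduations
fine = bin_counts(heights_cm, 1e-6)     # hypothetical very high-precision device

print(len(coarse), coarse.max())  # a few dozen bins, many individuals per bin
print(len(fine), fine.max())      # typically about 1000 bins with one individual each
```

At 1 cm resolution a frequency table is meaningful; at the fine resolution almost every bin holds a single individual, so relative frequencies carry no information about the shape of the distribution.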
If this happens, only two options are possible. The first is to create categories artificially and bin or group the data; the second is to increase the population size until categories can be defined that match the required precision. Creating artificial imprecision is clearly a step backwards in an era of advanced measurement technologies, and increasing the number of individuals seems to be the way forward. However, even if the latter option were chosen, there is no guarantee that any population will be large enough to match the precision achieved by biosensing technologies, namely, to permit the creation of categories. This implies that the precision that can be exploited when frequentist probability is used is always bounded by the population size, even if that population were the entire world population. Figure 1 exemplifies the difficulty of extracting genotype-phenotype associations with current methods when the sample size is restricted.
There is, therefore, a need for new methods that make full use of the information generated by accurate and highly precise phenotyping and that do not require categorization of the data. In fact, this problem is equivalent to resolving genotype-phenotype mapping in a finite population whose phenotype values are measured precisely enough to rule out the possibility that two phenotype values are identical.
Taking this challenge as the starting point, a new and relatively simple method of extracting information for genotype-phenotype mapping can be defined. Whilst this method is remarkably simple when explained in lay terms, its theoretical framework requires a new concept that we call the 'phenotypic field'. It will also be shown that phenotypic fields can be defined in the context of Fisher's theory.
The paper is structured as follows. A theoretical framework is provided with a particular focus on small gene effects. It is shown that the method converges rapidly, ruling out the need for a large population to extract precise genotype-phenotype information. Finally, the application of the method is illustrated using simulated and real data.

2. Heuristic presentation of FIT-GWA: Introduction to the genetic path method

Let us consider a population of 'N' genotyped individuals in which the phenotype of interest has been measured precisely enough that each individual has a unique phenotype value.
For diploid organisms, such as humans, and for a binary (bi-allelic, A or a) genetic marker, any microstate (genotype) can take only three values, which we write as '+1', '0' and '-1', corresponding to genotypes aa, Aa and AA, respectively. To develop the model, we consider two configurations, illustrated with the following setting. The individuals are horses in a yard, and adjacent to this yard there are 'N' unnamed, aligned paddocks numbered i = 1 … N (Fig.2A). The variable 'i' will be referred to as the (paddock) position.
In the first configuration, the horses are allocated randomly to paddocks, that is, without any information about phenotype values. As a result, when individuals are selected at random with respect to phenotype, their genetic microstates taken together form a disordered string of '+1', '0' and '-1'. The cumulative sum of the genetic microstates is noted $\theta_0(i)$, where 'i' is the paddock number or, equivalently, the position in the string of microstates. Given the random allocation, the probabilities of finding '+1', '0' or '-1' as the genetic microstate at any position 'i' are $\omega_{+}$, $\omega_{0}$ and $\omega_{-}$, respectively, and the resulting (expected) cumulative sum is:

$$\theta_0(i) = (+1\cdot\omega_{+} + 0\cdot\omega_{0} - 1\cdot\omega_{-})\,i = (\omega_{+} - \omega_{-})\,i .$$

$\theta_0(i)$ is therefore a straight line. We call $\theta_0(i)$ the 'default genetic path' (Fig.2B&C).
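As a quick numerical check that the default genetic path is a straight line, the sketch below shuffles a fixed set of microstates many times, as if allocating the same horses to paddocks at random repeatedly, and compares the average cumulative sum with $(\omega_{+}-\omega_{-})\,i$. The microstate counts (Hardy-Weinberg proportions for an allele frequency of 0.3) and the number of shuffles are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative microstate counts: +1 (aa), 0 (Aa), -1 (AA)
microstates = np.array([+1] * 90 + [0] * 420 + [-1] * 490)
N = microstates.size
omega_plus = np.mean(microstates == +1)
omega_minus = np.mean(microstates == -1)

# Average cumulative sum over many random allocations to paddocks
n_shuffles = 2000
mean_path = np.zeros(N)
for _ in range(n_shuffles):
    mean_path += np.cumsum(rng.permutation(microstates))
mean_path /= n_shuffles

i = np.arange(1, N + 1)
default_path = (omega_plus - omega_minus) * i   # theta_0(i)
print(np.max(np.abs(mean_path - default_path)))  # small compared with |theta_0(N)| = 400
```

Any single random allocation fluctuates around this line; it is the average over allocations that is exactly linear, and every allocation ends at the same value at $i = N$.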
In the second configuration, the horses are ranked by phenotype value. For example, if the phenotype is height, the smallest horse is allocated to the first paddock and the tallest horse to the last one. The new cumulative sum of microstates is then calculated at each position in the microstate string. If an association exists between the genome position considered and the phenotype, one may expect a change in the configuration of the string of microstates and in the resulting cumulative sum.
Thus, the only difference between the first and second configurations is the resulting shape of the cumulative sum of genetic microstates.
As a result, the signature of a gene interacting with the phenotype is the difference between the two genetic paths, $\theta(i) - \theta_0(i)$, which obeys a conservation relation: the two paths must meet at $i = N$, i.e., $\theta(N) = \theta_0(N)$, because they contain the same numbers of '+1', '0' and '-1' (Fig.2B&C).
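The two configurations can be compared in a few lines of code. The sketch below is a minimal illustration rather than the authors' implementation: it takes a vector of microstates and a vector of distinct phenotype values, builds the phenotype-ranked genetic path $\theta(i)$ and the default path $\theta_0(i) = (\omega_{+}-\omega_{-})\,i$, and returns their difference. The function name `genetic_path_difference`, the additive simulation and its parameter values are assumptions for illustration.

```python
import numpy as np

def genetic_path_difference(microstates, phenotype):
    """Return (sorted phenotype, theta(i) - theta_0(i)) for one genome position.

    microstates: array of +1, 0, -1 (aa, Aa, AA), one per individual.
    phenotype:   array of phenotype values, assumed all distinct.
    """
    microstates = np.asarray(microstates, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    order = np.argsort(phenotype)                 # rank individuals by phenotype
    theta = np.cumsum(microstates[order])         # phenotype-ranked genetic path
    i = np.arange(1, microstates.size + 1)
    theta_0 = microstates.mean() * i              # default path (omega_+ - omega_-) * i
    return phenotype[order], theta - theta_0

rng = np.random.default_rng(3)
N, q, a = 1000, 0.3, 0.3                          # illustrative values
microstates = rng.choice([+1, 0, -1], size=N, p=[q**2, 2*q*(1-q), (1-q)**2])
phenotype = a * microstates + rng.normal(size=N)  # additive effect, unit noise

omega_sorted, delta_theta = genetic_path_difference(microstates, phenotype)
print(delta_theta[-1])                        # conservation: ~0, both paths meet at i = N
print(omega_sorted[np.argmin(delta_theta)])   # phenotype value where the (negative) gap is deepest
```

Because microstate '+1' carries the higher mean in this simulation, those individuals accumulate at the end of the ranked string, so the ranked path first falls below the default line before rejoining it at $i = N$.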
Using the terminology of physics, one could say that $\theta(i) - \theta_0(i) \neq 0$ because the phenotype acts as a 'field' that changes the configuration of the string of microstates. More precisely, since no information on the phenotype is available in the first configuration, the straight-line shape of the cumulative sum arises from the random ordering of genetic microstates, which corresponds to maximal entropy of the set of genetic microstates. When an association exists between genotype and phenotype, the signature is a curved cumulative sum; in this case, the phenotypic 'field' is involved in ordering the genetic microstates.
The core of the present study is to model $\theta(i) - \theta_0(i)$ and to demonstrate how genotype-phenotype associations can be inferred from it.
3. The genetic path model: mathematical principles underlying the method

Differential expression of the genetic path in the space of phenotype values
We assume that the phenotype values are measured precisely enough that each individual has a unique phenotype value, noted $\Omega_i$. As the population is composed of 'N' individuals, and in order to use the continuum limit, one defines $\hat\imath \stackrel{\text{def}}{=} i/N$ as the new position and $\Omega_{\hat\imath}$ as its corresponding phenotype value. Thus $\Omega_{1/N}$ and $\Omega_{1}$ are the smallest and largest phenotype values, respectively.
As a result, the cumulative sum of the presence probabilities of genetic microstates up to position $i$ can be written as

$$\theta(i) = \sum_{j=1}^{i}\left[\omega_{+}(j) - \omega_{-}(j)\right],$$

where $\omega_{+}(j)$ and $\omega_{-}(j)$ are the presence probabilities of microstates '+1' and '-1' at position $j$. In the continuum limit, this sum becomes an integral over $\hat\jmath$. As the ranking of phenotype values was introduced to define $\theta(\hat\imath)$, the genetic path can also be expressed as a function of phenotype values, $\hat\theta(\Omega_{\hat\imath})$, where the hat is added to denote functions of phenotype values.
The different elements leading to the formation of a genetic path can now be addressed.

Entropy of genetic paths
Keeping the notation $\omega_{+}$, $\omega_{0}$ and $\omega_{-}$ for the genetic microstate frequencies at a given genome position across the population, we aim to determine expressions for $\omega_{+}(\hat\imath)$, $\omega_{0}(\hat\imath)$ and $\omega_{-}(\hat\imath)$, given the information obtained by ordering the genotypes as a function of phenotype values along the $\hat\imath$-axis.
The default genetic path $\theta_0$, a straight line, is defined by the absence of information on phenotype values. This is equivalent to an absence of association between the genetic microstates and the phenotype values and results in an apparent disordering of genetic microstates.
One way to measure this disordering is to use the entropy of the string of genetic microstates at the genome position considered, noted $S_0$ for the default configuration. When information about phenotype values and their ranking is available, and when the genome position considered is associated with the phenotype, $S_0$ is transformed into $S$. As a result, the entropy difference $S - S_0$, when non-null, indicates whether the genome position is associated with the phenotype values. Thus, $S - S_0$ can be thought of as a 'transformation' in the physical/thermodynamic sense, namely, an entropy difference that must be balanced by a term linked to the association (or interaction) between the genetic microstates and the phenotype values.
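Because the entropy expressions themselves are not reproduced here, the sketch below assumes the Shannon (Gibbs) form $-\sum_k \omega_k \ln \omega_k$, applied to the population frequencies for $S_0$ and to local frequencies in consecutive windows of the ranked string for $S$. The windowed estimator, the window width and all parameter values are assumptions for illustration only. It shows the qualitative point of the paragraph above: ranking by an associated phenotype lowers the local entropy, whereas a random ordering barely does.

```python
import numpy as np

def shannon_entropy(microstates):
    """Plug-in Shannon entropy (nats) of the +1/0/-1 frequencies in a string."""
    _, counts = np.unique(microstates, return_counts=True)
    freqs = counts / counts.sum()
    return -np.sum(freqs * np.log(freqs))

def mean_window_entropy(microstates, window):
    """Average entropy over consecutive, non-overlapping windows (assumed estimator)."""
    n_windows = microstates.size // window
    return np.mean([shannon_entropy(microstates[k * window:(k + 1) * window])
                    for k in range(n_windows)])

rng = np.random.default_rng(4)
N, a, window = 2000, 1.0, 100                      # illustrative values
microstates = rng.choice([+1, 0, -1], size=N, p=[0.25, 0.5, 0.25])
phenotype = a * microstates + rng.normal(size=N)

S0 = shannon_entropy(microstates)                  # no information on phenotype
S_ranked = mean_window_entropy(microstates[np.argsort(phenotype)], window)
S_shuffled = mean_window_entropy(rng.permutation(microstates), window)

gain_ranked = S0 - S_ranked       # entropy lost when ranking by phenotype
gain_shuffled = S0 - S_shuffled   # only a small finite-window bias
print(gain_ranked, gain_shuffled)
```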

Introduction of phenotypic fields as a source for genotype-phenotype associations
As $S - S_0$ is linked to the information gained on phenotype values, and given the existence of three distinct genetic microstates, one can define three distinct functions, the phenotypic fields $u_{+}(\Omega)$, $u_{0}(\Omega)$ and $u_{-}(\Omega)$, that are fundamentally related to changes in the phenotype-associated genetic path.
In this context, the entire genetic path can be described with a function $E$ representing the sum of interactions between each of the genetic microstates and the phenotypic fields. There are two different but mathematically equivalent ways to view phenotypic fields. The first is to consider that the fields appear only because of the ranking of individuals: the set of microstates changes configuration because the fields are 'switched on'. This implies that, for genome positions that are not involved in the formation of the phenotype considered, the switch has no effect, namely the fields are null. In this view, one can consider the equivalence $S - S_0 \sim E$.
The second way is to consider that the fields are always switched on for all genome positions, but that some 'work' is provided so as to transform a 'phenotype-responding genetic path' into a non-responding one. In this view, the sum of interactions $E_0$ defining the default, i.e., non-responding, genetic path must also be defined. A connection between $S - S_0$ and $E - E_0$ can then be postulated through the existence of a genotype-phenotype association constant $s$, giving the equivalence $S - S_0 = s(E - E_0)$, where $s$ sets the magnitude of the interaction or association between genotype and phenotype. It is this second view that we use throughout, and it defines the relation to optimize. This relation is subject to the conservation of genetic microstates at the genome position considered and to the conservation of probability, $\omega_{+}(\Omega) + \omega_{0}(\Omega) + \omega_{-}(\Omega) = 1$. The Euler-Lagrange method can then be used to determine the optimal configuration of $\omega_{+}(\Omega)$, $\omega_{0}(\Omega)$ and $\omega_{-}(\Omega)$ when $s$ is fixed and the phenotypic fields are imposed. Defining $\alpha_{+}$, $\alpha_{0}$ and $\alpha_{-}$ as the Lagrange multipliers for the conservation of genetic microstates, the relation is optimised with respect to the genetic microstate frequencies. Using the conservation of probability, $\omega_{0}(\Omega)$ can be replaced by $1 - \omega_{+}(\Omega) - \omega_{-}(\Omega)$, and a variational calculation on the genetic microstate frequencies leads to two conditions, in which $\delta\omega_{+}(\Omega)$ and $\delta\omega_{-}(\Omega)$ are small variations in the presence probabilities of microstates '+1' and '-1'; the variation $\delta\omega_{0}(\Omega)$ can be deduced using $\omega_{0}(\Omega) = 1 - \omega_{+}(\Omega) - \omega_{-}(\Omega)$.
Making use of the conditions $\omega_{+}(\Omega) = \omega_{+}$, $\omega_{0}(\Omega) = \omega_{0}$ and $\omega_{-}(\Omega) = \omega_{-}$ when $s = 0$, the optimal microstate frequencies are finally obtained. To make the asymmetries of the problem more apparent, the following quantities are defined for the genetic microstates:

$$\Delta\omega \stackrel{\text{def}}{=} \omega_{+} - \omega_{-}, \qquad \bar\omega \stackrel{\text{def}}{=} \omega_{+} + \omega_{-} = 1 - \omega_{0},$$

together with the corresponding difference $\Delta u$ and sum of the phenotypic fields. The difference and sum of $\omega_{+}(\Omega)$ and $\omega_{-}(\Omega)$ can then be rewritten in terms of these quantities. Noting that $-1 \le \Delta\omega/\bar\omega \le +1$, a new phenotype value $\Omega^{*}$ is defined by setting $\tanh[s\Delta u(\Omega^{*})] \stackrel{\text{def}}{=} \Delta\omega/\bar\omega$, after which the difference and sum of $\omega_{+}(\Omega)$ and $\omega_{-}(\Omega)$ can be rewritten again. The new variable $\Omega^{*}$ is thus the phenotype value at which $\tanh[s\Delta u]$ equals the population ratio $\Delta\omega/\bar\omega$.

The meaning of the constant $\alpha$ can be related to the Hardy-Weinberg law. Based on random mating in a population, the Hardy-Weinberg law provides a relationship between genotype frequencies of the form $p^2 + 2pq + q^2 = 1$, where $p^2$ and $q^2$ are the genotype frequencies of genetic microstates '+1' and '-1', i.e., the homozygote genotypes aa and AA, respectively, and $2pq$ is the genotype frequency of genetic microstate '0', i.e., the heterozygote genotype Aa. In our case, this corresponds to replacing $p^2$, $q^2$ and $2pq$ by $\omega_{+}$, $\omega_{-}$ and $\omega_{0}$, respectively. As a result, the Hardy-Weinberg law imposes $\alpha = 1$, and $\alpha \neq 1$ corresponds to a deviation from the law. Furthermore, this term is expected to remain stable under any change of allele or genotype frequencies, suggesting that, genetically, any change in $\Delta\omega$ is to some extent compensated by a corresponding change in $\omega_{0}$.
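Under Hardy-Weinberg proportions the microstate frequencies satisfy $\omega_0^2 = 4\,\omega_{+}\omega_{-}$, because $(2pq)^2 = 4p^2q^2$. A minimal sketch of a check that does not depend on allele frequency is therefore the ratio $\omega_0/(2\sqrt{\omega_{+}\omega_{-}})$, which equals 1 for every allele frequency at Hardy-Weinberg equilibrium and departs from 1 otherwise. Whether this ratio coincides exactly with the Hardy-Weinberg constant discussed above is an assumption; it is shown here only because it shares the stated properties. The helper name `hw_ratio` is hypothetical.

```python
import numpy as np

def hw_ratio(omega_plus, omega_zero, omega_minus):
    """omega_0 / (2*sqrt(omega_+ * omega_-)): equals 1 at Hardy-Weinberg proportions."""
    return omega_zero / (2.0 * np.sqrt(omega_plus * omega_minus))

for p in (0.5, 0.3, 0.1):
    q = 1.0 - p
    # '+1' <-> p**2, '0' <-> 2pq, '-1' <-> q**2, following the mapping in the text
    print(p, hw_ratio(p**2, 2 * p * q, q**2))   # 1.0 (up to rounding) for every p

# A departure from Hardy-Weinberg, e.g. a heterozygote deficit
print(hw_ratio(0.30, 0.30, 0.40))               # about 0.43, i.e. different from 1
```

The ratio stays at 1 when the allele frequency changes because any change in $\Delta\omega$ is matched by a change in $\omega_0$, which is the compensation described in the text.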
We can now turn to the full expression of the genetic paths difference in the space of phenotype values.

3.4 Expression of the genetic paths difference and conservation of genetic microstate frequencies
The phenotype-associated genetic path is obtained by integrating Eq.5 over the phenotype values, as seen above (Eq.2). The default genetic path is deduced by considering that the difference in presence probabilities between genetic microstates '+1' and '-1' is constant, i.e., equal to $\Delta\omega$ at every phenotype value.
By rewriting $\Delta\omega$ as $\bar\omega\,(\Delta\omega/\bar\omega)$, where $\Delta\omega/\bar\omega = \tanh[s\Delta u(\Omega^{*})]$ defines the phenotype value $\Omega^{*}$ introduced above, and deducing $\omega_{0}$ from the Hardy-Weinberg constant $\alpha$, $\Delta\omega$ can be expressed as a function of $s\Delta u(\Omega^{*})$ and $\alpha$. As a result, the difference between the phenotype-associated and default genetic paths can be expressed, in the continuum limit, as a function of the phenotypic fields. The conservation of genetic microstates must be added whatever the genetic path taken, i.e., $\Delta\theta(\Omega_{1}) = 0$. Therefore, as $\alpha$ is constant when a single genome position is considered, the genetic paths difference can be re-expressed entirely using two independent reduced phenotypic fields (Eq.11). Eq.11 demonstrates that $\Delta\theta(\Omega)$ is sensitive to the probability density functions involved as a whole, and not just to their average values. That is, both the variances of microstates and their average values affect genotype-phenotype associations.
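To see this numerically, the sketch below (an illustration under assumed parameters, not a result of the paper) gives the three microstates the same phenotype mean, so that any method based only on averages detects nothing, but gives microstates '+1' and '-1' different standard deviations. The expected genetic paths difference is computed from the normal cumulative distribution $F_k$ of each microstate class, using $\theta(\Omega) = N\sum_k k\,\omega_k F_k(\Omega)$ and $\theta_0(\Omega) = N(\omega_{+} - \omega_{-})F(\Omega)$, where $F$ is the population distribution; the helper names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, using math.erf element-wise."""
    return 0.5 * (1.0 + _erf((x - mean) / (sd * math.sqrt(2.0))))

def expected_path_difference(omega_grid, freqs, means, sds, N=1000):
    """Expected theta - theta_0 for N individuals at each phenotype value.

    freqs, means, sds are dicts keyed by microstate (+1, 0, -1).
    theta(Omega)   = N * sum_k k * omega_k * F_k(Omega)
    theta_0(Omega) = N * (sum_k k * omega_k) * F(Omega), F the mixture CDF.
    """
    cdfs = {k: normal_cdf(omega_grid, means[k], sds[k]) for k in freqs}
    mixture_cdf = sum(freqs[k] * cdfs[k] for k in freqs)
    drift = sum(k * freqs[k] for k in freqs)          # omega_+ - omega_-
    theta = sum(k * freqs[k] * cdfs[k] for k in freqs)
    return N * (theta - drift * mixture_cdf)

grid = np.linspace(-5.0, 5.0, 201)
freqs = {+1: 0.25, 0: 0.5, -1: 0.25}
means = {+1: 0.0, 0: 0.0, -1: 0.0}                    # no difference between averages

same_sd = expected_path_difference(grid, freqs, means, {+1: 1.0, 0: 1.0, -1: 1.0})
diff_sd = expected_path_difference(grid, freqs, means, {+1: 1.5, 0: 1.0, -1: 0.7})

print(np.abs(same_sd).max())   # 0: equal means and variances leave no signature
print(np.abs(diff_sd).max())   # clearly non-zero although all means are equal
```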
Note that in Eq.11 the integration interval is unchanged. However, the convergence in

The meaning of $s\Delta u(\Omega)$ can be addressed using Fisher's approach.

Definition of Fisher's fields
In his seminal paper (5), Fisher hypothesized, for a population assumed infinite so that the normal distribution can be used, that the genetic variance $\sigma_A^2$ is much smaller than the phenotypic variance $\sigma^2$, and that the variances of the microstate distribution density functions for each gene are similar to the phenotypic variance. Whilst this hypothesis can be understood intuitively when all distribution density functions nearly overlap, it can also be demonstrated using Eq.9b. Indeed, assuming $\sigma_A^2 \ll \sigma^2$ implies $\sigma^2 - \sigma_A^2 \sim \sigma^2$ and therefore $\sigma^2 \sim \omega_{+}\sigma_{+}^2 + \omega_{0}\sigma_{0}^2 + \omega_{-}\sigma_{-}^2$. As $\omega_{+} + \omega_{0} + \omega_{-} = 1$, setting $\sigma_{+} \sim \sigma_{0} \sim \sigma_{-} \sim \sigma$, as Fisher did, is one valid solution.
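The variance relation used above appears to be the law of total variance: the phenotypic variance is the frequency-weighted sum of within-microstate variances plus the variance of the microstate means, which is the genetic variance (purely additive here, since dominance is set to zero). A short numerical check with illustrative parameters (allele frequency 0.3, $a = 0.2$, unit within-microstate standard deviations) follows; the Monte Carlo sample size is also an assumption.

```python
import numpy as np

freqs = np.array([0.09, 0.42, 0.49])      # omega_+, omega_0, omega_- (Hardy-Weinberg, q = 0.3)
means = np.array([0.2, 0.0, -0.2])        # microstate means: mu + k*a with a = 0.2, d = 0
sds = np.array([1.0, 1.0, 1.0])           # equal within-microstate SDs, as Fisher assumed

overall_mean = np.sum(freqs * means)
within = np.sum(freqs * sds**2)                        # sum_k omega_k sigma_k^2
genetic = np.sum(freqs * (means - overall_mean)**2)    # variance of microstate means
total = within + genetic                               # phenotypic variance sigma^2

print(total, within, genetic)                          # genetic = 2pq a^2 = 0.0168
print(np.isclose(total - genetic, within))             # sigma^2 - sigma_A^2 = weighted within variances

# Monte Carlo confirmation of the decomposition
rng = np.random.default_rng(5)
k = rng.choice(3, size=200_000, p=freqs)
sample = rng.normal(means[k], sds[k])
print(sample.var())                                    # close to total
```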
However, the relation $\sigma^2 \sim \omega_{+}\sigma_{+}^2 + \omega_{0}\sigma_{0}^2 + \omega_{-}\sigma_{-}^2$ is the equation of an ellipsoid in the microstate standard deviations, so an infinite number of solutions are, in theory, possible, depending on the variances of the microstates (see 3.5.4., i.e., the definition of fields linked to the variance of microstates, below). In Eq.12a and Eq.12b, the term $\delta a_{+} - \delta a_{-} = a_{+} - a_{-} = 2a$ (see Fig.1A) is known as the 'gene effect' in GWAS. In Eq.12c, the term $2\delta a_{0} - \delta a_{+} - \delta a_{-} = 2a_{0} - a_{+} - a_{-} = d$ (see Fig.1A) is the dominance as defined in GWAS. As it turns out, Fisher considered $d \sim 0$.
Altogether, these results demonstrate that Fisher's theory can be described by the minimalist model above (SM2 in supplementary materials). Using these fields, it is also possible to determine a generic solution to Eq.8 (see SM4 in supplementary materials).

Implication for small gene effects
Complex traits involve genes with very small effects that are difficult to characterise. The aim now is to determine the resulting genetic paths difference in this case, i.e., when a → 0.
As probability density functions have been used, the integration interval can be altered using the convergence property of distributions. In this context, the conservation of genetic microstates (Eq.8) can be written using Fisher's fields, from which it follows that small gene effects imply $\delta\Omega/\sigma \ll 1$.
Using this result, the genetic paths difference can be expanded for small gene effects. Assuming that $P(\Omega_{1/N}) \ll 1$ and that $P(\Omega)$ is normally distributed, one obtains Eq.13 at leading order. Eq.13 shows that, in the context of Fisher's theory, a small gene effect corresponds to a genetic paths difference that takes the symmetric shape of the phenotype distribution, with an amplitude proportional to the gene effect.
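The leading-order behaviour can be made plausible with a short first-order expansion. This is a sketch under the simplest Fisher-type assumptions (normal microstate distributions with a common standard deviation $\sigma$, means $\mu + k a$ for microstate $k \in \{+1, 0, -1\}$, no dominance), not a reproduction of the authors' Eq.13. Write $F_k$ for the cumulative distribution of microstate $k$, $G$ and $g$ for the cumulative distribution and density of a normal law with mean $\mu$ and standard deviation $\sigma$, and $F = \sum_k \omega_k F_k$ for the population distribution, so that phenotype value $\Omega$ sits at position $i = N F(\Omega)$ in the ranked string. The expected paths are then

$$
\theta(\Omega) = N \sum_k k\,\omega_k F_k(\Omega), \qquad
\theta_0(\Omega) = N \Big(\sum_k k\,\omega_k\Big) F(\Omega).
$$

For small $a$, $F_k(\Omega) = G(\Omega) - k\,a\,g(\Omega) + O(a^2)$, hence

$$
\Delta\theta(\Omega) = -N a \Big[\sum_k k^2 \omega_k - \Big(\sum_k k\,\omega_k\Big)^2\Big] g(\Omega) + O(a^2)
= -N a \left(\bar\omega - \Delta\omega^2\right) g(\Omega) + O(a^2),
$$

with $\bar\omega = \omega_{+} + \omega_{-}$ and $\Delta\omega = \omega_{+} - \omega_{-}$. Under Hardy-Weinberg proportions $\bar\omega - \Delta\omega^2 = 2pq$, so $\Delta\theta(\Omega) \approx -2pq\,N a\, g(\Omega)$: to leading order, the genetic paths difference has the shape of the (normal) phenotype density, is extreme at its mean, and has an amplitude proportional to the gene effect. The negative sign simply reflects the convention that microstate '+1' carries the higher mean.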

Fields linked to the variance of microstates
The involvement of the variances of microstate distribution functions in genotype-phenotype associations is a highly debated matter; see (12) and references therein.
As noted above, the expression of the genetic paths difference considers the distribution density function as a whole, including the role of variances. In this context, Eq.9b provides a relation between variances in the form of an ellipsoid. Whilst assuming a single variance for all microstate and phenotype distributions, as Fisher did, is plausible, other solutions exist that would not violate Eq.9b.
In this context, let us imagine that the gene effect and dominance are null but that the

Illustration of the application of FIT-GWA using simulated data
Our intention is to illustrate how FIT-GWA can be applied using simulated data and to assess, at least qualitatively, its sensitivity in extracting information.
Data were simulated according to the quantitative genetic models defined by Falconer and Mackay (1996) (18). A single bi-allelic quantitative trait locus (QTL) associated with a continuous phenotype was modelled, with additive allele effect $a$ and allele frequencies $p$ and $q$, where $p + q = 1$. The simulation parameters were the number of individuals sampled, N = 1000; the number of simulation replicates, n = 1000; the allele frequency, p; the additive allele effect, a; and the dominance, d. Note that the number of simulation replicates makes it possible to determine the best outcomes from Fisher's theory. Whilst the theory provided in this paper is general, the simulated genotypes were restricted to Hardy-Weinberg proportions: of N individuals, $Np^2$ had genotype AA (microstate -1), $2pqN$ had genotype Aa (microstate 0) and $Nq^2$ had genotype aa (microstate +1). The allele effect, a, is defined as half the difference between the means of the +1 and -1 genotypes (microstates), and d is the position of the mean of the 0 genotype (microstate) (Figure 1). Dominance is measured as the deviation of the mean of microstate 0 from the midpoint between the means of microstates +1 and -1. For the purposes of the simulation, the dominance d was set to 0, i.e., the mean of microstate 0 was midway between the means of microstates +1 and -1.
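A minimal sketch of this simulation set-up for a single replicate follows. The normally distributed residual with standard deviation `sigma_e`, the function names `simulate_locus` and `path_difference`, and the specific parameter values are assumptions made for illustration; the genotype counts follow the Hardy-Weinberg allocation described above ($Np^2$ in microstate −1, $2pqN$ in microstate 0, $Nq^2$ in microstate +1), and the microstate means are $\mu - a$, $\mu + d$ and $\mu + a$.

```python
import numpy as np

def simulate_locus(N, p, a, d=0.0, mu=0.0, sigma_e=1.0, rng=None):
    """Simulate one bi-allelic QTL with Hardy-Weinberg genotype counts.

    Microstates: -1 (AA, ~N p^2), 0 (Aa, ~2pqN), +1 (aa, ~N q^2).
    Means: mu - a, mu + d, mu + a; residual ~ Normal(0, sigma_e) (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    q = 1.0 - p
    n_minus = round(N * p**2)
    n_plus = round(N * q**2)
    n_zero = N - n_minus - n_plus                 # heterozygotes absorb rounding
    microstates = np.repeat([-1, 0, +1], [n_minus, n_zero, n_plus])
    means = np.where(microstates == 0, mu + d, mu + a * microstates)
    phenotype = means + rng.normal(0.0, sigma_e, size=N)
    return microstates, phenotype

def path_difference(microstates, phenotype):
    """theta(i) - theta_0(i) after ranking individuals by phenotype."""
    ranked = microstates[np.argsort(phenotype)]
    i = np.arange(1, ranked.size + 1)
    return np.cumsum(ranked) - ranked.mean() * i

rng = np.random.default_rng(6)
microstates, phenotype = simulate_locus(N=1000, p=0.2, a=0.1, rng=rng)
delta_theta = path_difference(microstates, phenotype)
extreme = np.sort(phenotype)[np.argmax(np.abs(delta_theta))]
print(extreme, phenotype.mean())   # position of the extreme vs the phenotype mean
```

With `sigma_e = 1` and a small `a`, the phenotypic standard deviation is close to 1, so `a` approximates the gene effect a/σ used in the text; repeating the call with different seeds gives the replicate-to-replicate spread of the extreme position.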
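The additive genetic variance due to the quantitative trait locus, $\sigma_A^2$, is defined as (18)

$$\sigma_A^2 = 2pq\,[a + d(p - q)]^2,$$

where the sign of the dominance term follows the allele labelling above (microstate +1, whose mean is shifted by $+a$, is the aa homozygote with frequency $q^2$); for $d = 0$, as simulated here, $\sigma_A^2 = 2pqa^2$. Note that the standard deviations arising from the genotype-phenotype simulations were not taken into account in the analysis that follows. Instead, the theoretical analysis of the convergence of the genetic paths difference method and its self-consistency, and the sensitivity of the method to detect genotype-phenotype associations using simulations, are reported in SM5 and SM6, respectively (see supplementary materials).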
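The single-locus additive variance can be computed directly. This sketch uses Falconer and Mackay's own labelling, in which `p_plus` is the frequency of the allele whose homozygote has mean $+a$ and the average effect of an allele substitution is $a + d(q - p)$ in that labelling; in the labelling of the simulations above, microstate +1 (aa) has frequency $q^2$, so `p_plus` corresponds to $q$ there (0.8 when p = 0.2). Expressing `a` in phenotypic standard deviation units makes the result the proportion of phenotypic variance explained. The helper name `additive_variance` is hypothetical.

```python
def additive_variance(p_plus, a, d=0.0):
    """Additive genetic variance of one locus, Falconer & Mackay style.

    p_plus: frequency of the allele whose homozygote has mean +a (Falconer's p).
    alpha = a + d*(q - p) is the average effect of an allele substitution.
    """
    q = 1.0 - p_plus
    alpha = a + d * (q - p_plus)
    return 2.0 * p_plus * q * alpha**2

# Proportion of phenotypic variance explained when a is given in phenotypic SD units (d = 0)
for p_plus in (0.5, 0.8):
    for a_over_sigma in (0.5, 0.1, 0.01):
        print(p_plus, a_over_sigma, additive_variance(p_plus, a_over_sigma))
```

Because the variance scales with $(a/\sigma)^2$, a tenfold smaller gene effect explains a hundredfold smaller share of the phenotypic variance, which is why small effects are hard to detect by regression.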
For reference, Table 1 shows how the genetic variance, the gene effect, i.e., a/σ, and the allele frequency are numerically related in the GWAS framework. Similarly, Figures 3A and 3B show, in the context of GWAS and for the allele frequencies p = 0.5 (∆ω = 0) and p = 0.2 (∆ω = −0.6) used below as examples, the relationship between study power, gene effect and sample size as described in (20). Figure 3 shows that 1000 individuals would not achieve 80% power unless the gene effect is large enough, i.e., a/σ ≥ 0.5.
Using simulated data, we can now represent the genetic paths difference and its log. In Figure 4C, the profile of the phenotype distribution density function is recovered, with an amplitude that decreases as a/σ decreases. The red vertical line represents the average phenotype value. With the simulated data, Figure 4D shows that a difference between the genetic paths obtained for gene effects a/σ ~ 0.1 and a/σ ~ 0.01 can be seen relatively clearly.
One may then compare how perceptible associations are using the genetic paths difference by comparing Figures 1B&1C (method

However, the shift of the phenotype value at which $\Delta\theta(\Omega)$ is maximal is of interest, as Eq.13 shows that for small gene effects the genetic paths difference should be proportional to the phenotype distribution; namely, the phenotype value at which the genetic paths difference is extreme should be the average phenotype value.
Thus, to better visualize the impact of the gene effect on the position of the phenotype value $\Omega_{\Delta\theta}$ at which the genetic paths difference is maximal, a set of simulations was also performed for allele frequencies p ∈ {0.5; 0.4; 0.3; 0.2; 0.1} over log-ranging values of gene effects (Fig.5C). Note that the results for p ∈ {0.5; 0.6; 0.7; 0.8; 0.9} can be deduced by symmetry around the average phenotype value.
Whilst the standard deviations obtained for $\Omega_{\Delta\theta}$ were not always negligible, typically between 0.5 and 1 phenotypic standard deviation for small gene effects, Figure 5C shows a trend toward the average phenotype value for small gene effects. Indeed, below a simulated gene effect of a/σ~10 , the average value of $\Omega_{\Delta\theta}$ was remarkably similar to the average phenotype value, marked by the horizontal dashed line in Fig.5C.
To confirm this trend for small gene effects, i.e., a/σ ≤ 0.1, we varied the population size from N = 10 to N = 10 to detect potential variations in $\Omega_{\Delta\theta}$ linked to the simulations. The results, summarized in Fig.5D, show that the only difference was a reduction in the standard deviations obtained for $\Omega_{\Delta\theta}$ for simulated gene effects between 0.01 and 0.1 (see the arrows in Fig.5C and Fig.5D pointing to different magnitudes of the standard deviations). Namely, the initial symmetry of the phenotype distribution density function reappears, as expected from Eq.13.
Then, fitting the logarithm of the genetic paths difference with a quadratic function of $\Omega$ and identifying the coefficients with Eq.13, one expects for small gene effects that the fitted coefficients recover the phenotype average, the phenotype variance and an amplitude related to the gene effect (Tables 2 and 3). Thus, recalling that the phenotype average and variance of the modelled population are 68.17 inch and 16.24 inch², respectively, Tables 2 and 3 show that fitting the genetic paths difference as a function of phenotype values with a quadratic curve recovers the magnitude of the average and variance of the phenotype used in the simulations for most log-scale values of the gene effect. Furthermore, the amplitude of the genetic paths difference is also indicative of the gene effects involved.
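Assuming that the quadratic is fitted to $\ln|\Delta\theta(\Omega)|$, the identification is straightforward: a profile $A\exp[-(\Omega-\mu)^2/(2\sigma^2)]$ gives $\ln|\Delta\theta| = c_0 + c_1\Omega + c_2\Omega^2$ with $\mu = -c_1/(2c_2)$, $\sigma^2 = -1/(2c_2)$ and $\ln A = c_0 - c_1^2/(4c_2)$. The sketch below checks these identities on a noise-free profile built from the population mean and variance quoted above (68.17 inch and 16.24 inch²); the amplitude, the fitting range and the helper name `fit_log_quadratic` are arbitrary choices for illustration.

```python
import numpy as np

def fit_log_quadratic(omega, delta_theta):
    """Fit ln|delta_theta| = c0 + c1*omega + c2*omega**2; return (mean, variance, amplitude)."""
    c2, c1, c0 = np.polyfit(omega, np.log(np.abs(delta_theta)), deg=2)
    mean = -c1 / (2.0 * c2)
    variance = -1.0 / (2.0 * c2)
    amplitude = np.exp(c0 - c1**2 / (4.0 * c2))
    return mean, variance, amplitude

mu, var, A = 68.17, 16.24, 25.0                       # A is an arbitrary amplitude
omega = np.linspace(mu - 2 * np.sqrt(var), mu + 2 * np.sqrt(var), 101)
profile = -A * np.exp(-(omega - mu)**2 / (2 * var))   # small-effect shape (negative)

mean_hat, var_hat, amp_hat = fit_log_quadratic(omega, profile)
print(mean_hat, var_hat, amp_hat)   # approximately 68.17, 16.24, 25.0
```

With simulated data the profile is noisy and its tails approach zero, so in practice the fit would be restricted to the range where $|\Delta\theta|$ is clearly non-zero.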

Discussion
In his seminal paper (5), Fisher provided a synthesis between the inheritance of continuous traits and the Mendelian scheme of inheritance using statistics and probability. His theory has become a landmark in genetics and heredity. Whilst statistics is a natural field to employ when dealing with large data sets, the interpretation of data, as well as the inferences that can be drawn, rely fundamentally on the probability of occurrence of observables.
Mathematically, the probability of a particular event is only fully defined once the following are known: (i) the set of all possible outcomes and (ii) the process by which the event is generated. Without this information, only conjectures can be made by interpolating data. Thus, using a normal distribution as an underlying template for the occurrence of a measurable variable, as Fisher did for the phenotype and gene microstates, comes with limitations.
The first limitation, as mentioned in the introduction, is that the normal distribution expressed in the continuum limit is a probability density function that originates from the creation of categories. Creating categories implies losing information, which in turn defines the notion of 'randomness in the data'. Whilst randomness in the data can be linked to genuine imprecision in measurements, there is no justification for using such a method when precision is potentially available.
The second limitation is linked to the first in the sense that, if randomness in the data is considered a nuisance, then the only meaningful parameter is often the average. However, the fact that an average can be calculated from any data set does not make it the most pertinent parameter. Inspired by statistical physics (21) suggests that adding the notion of 'environment' is not required at this stage. The same consequence also applies to heritability, whose definition, in a broad sense, is given by the ratio between the genetic and phenotypic variances. Finally, Eq.9a and Eq.9b also demonstrate that it is not possible to fully dissociate variances and averages; as a result, any genotype-phenotype statistical association must consider both, not only averages. This point is visible when the fields linked to microstate variances are considered. Therefore, the paradigm used by Fisher is limited and the method of averages too restrictive.
FIT-GWA is a method that circumvents this set of conceptual difficulties by using the information contained in genotype and phenotype in a different way. FIT-GWA can be used with Fisher's assumptions to recover key concepts from quantitative genetics, including: (i) the Hardy-Weinberg coefficient locally, (ii) the Hardy-Weinberg coefficient at the population level, (iii) the gene effect, (iv) the dominance and (v) the fact that small gene effects involve common allele frequencies (22). In addition, FIT-GWA has enabled us to define new parameters linked to the microstate variances, i.e., the pseudo-gene effect and pseudo-dominance, which will probably help to resolve controversies (12). Therefore, taken as a whole, the coarse-grained version of FIT-GWA generalizes Fisher's method.
Finally, applying FIT-GWA to simulated data under Fisher's assumptions shows its sensitivity for extracting information on genotype-phenotype associations when sample sizes and gene effects are small. SM7 provides an example of the application of FIT-GWA to real data (see supplementary materials).

6. Conclusion
A century ago, Fisher presented a statistical method to map genotype and phenotype that was essentially based on the measurement of uncertainty. Here we present a method that takes as its paradigm the fact that certainty can exist, given the possibility of measuring phenotype and genotype with very high precision. In an associated paper in preparation, we will present a theoretical methodology based on Shannon's information that enables the significance of correlations in real genotype-phenotype data to be quantified. To conclude, this new method opens a new way to analyse genotype-phenotype mappings.
Data accessibility. The datasets are available upon request.

phenotype association. However, the method of averages discards the information available in the spread of the data for each genotype. The method we suggest makes use of this information to describe genotype-phenotype association. Note that dominance can occur (d ≠ 0), in which case the method of averages requires a linear regression of the genotype means, weighted by their frequencies, on the number of alleles to provide a new intercept and slope (see (7)).

The only difference will concern their positioning in the string. If the genome position is related to the phenotype, then a migration of genetic microstates should occur between the two configurations, driven by the knowledge of the phenotype values. (C) In this context, a very simple way to represent whether the genome position is related to the phenotype is to calculate the cumulative sum of genetic microstates in either configuration. The random allocation, or absence of information on the phenotype, should return (on average) a straight line. In contrast, when the genome position 'interacts' with the phenotype and information on the phenotype is available, changes in the shape of the curve are expected due to the 'migration' of certain microstates in the string (B). One can then suggest that the cumulative sum 'responds' to the information available on the phenotype or, equivalently, that the phenotype acts as a field on the genetic microstates to generate this 'migration'. The cumulative sum of genetic microstates is called the 'genetic path' in the present work, and it is its variations that need to be quantified to determine the magnitude of the association between the phenotype and the genotype.