Sensei: How many samples to tell evolution in single-cell studies?

Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining the sample sizes needed for ascertaining changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells that are required to ascertain such changes between two groups of samples in single-cell studies. Sensei expands the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design via https://kchen-lab.github.io/sensei/table_beta.html.

human diseases. For instance, the human immune system requires constant trafficking of different cell types to disease sites to mount innate and acquired immune responses [3,4,6]. In addition, the immune system has resident cells present in almost all organs [7,8]. Observing temporal dynamics within the immune cell compartment is critical to understanding processes such as autoimmunity [6,9,10], susceptibility to infections [6,8], and development of cancers [3,4,6]. Changes in the abundance of specific immune cell types within the tumor microenvironment (TME) over time reflect the evolution of cancer across the successive stages of premalignancy, invasion, local recurrence and distant metastatic spread [5,11,12]. Differences in TME composition are also reflective of different subtypes of tumors associated with different coevolving immune responses, thus reflecting two of the hallmarks of cancer: evasion of immune detection and tumor-promoting inflammation [13]. Therefore, these pieces of information are critical to understanding the role of the immune system during cancer evolution and metastasis and to developing immune interception strategies for both cancer prevention and treatment [2].
For example, the intestinal mucosa is populated by intra-epithelial lymphocytes and mucosa-associated lymphoid tissue. Proportions of T cells may vary in mucosa specimens obtained from healthy individuals at average risk for colon cancer development (general population) compared to individuals at high risk as a consequence of genetic predisposition due to an inherited condition such as Lynch syndrome. Lynch syndrome is the most frequent hereditary syndrome predisposing to the development of colorectal cancer and is secondary to the presence of germline mutations in one of the DNA mismatch-repair (MMR) genes. The deficiency of this mechanism leads to the accumulation of hundreds of point mutations and insertion-deletion loops (indels) that generate hypermutant neoplastic lesions [14]. These mutations constitute antigenic peptides (also known as neoantigens) that are recognized by the immune system, thus leading to an activation of different immune cell populations. Therefore, studying changes in immune cell proportions at single-cell resolution could help understand the immune response triggered at the intestinal level, thus helping to envision strategies to enhance it to prevent cancer or to decrease it to treat conditions such as inflammatory bowel disease [15]. This type of study would require the use of multi-color flow cytometry [16] and also intersects with microbiome [17] datasets, but it can now be accomplished with much higher accuracy due to the rise of single-cell RNA-sequencing (scRNA-seq) and single-cell ATAC (assay for transposase-accessible chromatin) sequencing (scATAC-seq) [18,19]. To observe and confirm cell type differences, samples from multiple research participants will need to be collected and sequenced; thus, accurately estimating the adequate sample size is critical for the feasibility and success of these types of studies due to the current high cost of these technologies.
On the other hand, an insufficient number of samples can lead to a false-negative result [20].
Various sources of variability can complicate the ascertainment of cell type abundance. Sample preparation and single-cell sequencing reactions can introduce undesirable technical biases and variations [21]. For example, cell types that are hard to harvest intact, such as neurons and adipocytes, may be disproportionately underrepresented. As for single-cell profiling, scRNA-seq can introduce dropouts of lowly expressed genes, low total gene counts per cell, and high bias for 3' coverage [5], while scATAC-seq can be confounded by sampling efficiency, resulting in highly sparse profiling [22]. Furthermore, mass cytometry brings its own challenges as it is susceptible to oxidization and signal spillover [23]. All of these factors often lead to uncertainty in cell typing and, therefore, need to be properly accounted for in sample size estimation before the experiments are performed. Moreover, selection of the type of platform relies on the number of cells that can be assayed, ranging vastly from 100 to 10,000 [5], and on the fact that on many occasions few cells remain after performing quality control. In general, a limited number of cells leads to underrepresentation of cell types and drift in their proportions. Therefore, a method that considers these factors is urgently needed.
However, it is challenging to capture the effects of these factors in a mathematical model. Several approaches have utilized statistical models to estimate the number of cells that are required for a single-cell study. One example is "Howmanycells" (https://satijalab.org/howmanycells), which uses negative binomial distributions to estimate how many cells must be assayed in total to ensure sufficient representation of a given cell type, assuming that the numbers of cells in different cell types are mutually independent. However, if the proportion of one cell type rises, the proportions of other cell types must fall. Accordingly, SCOPIT [24] uses a Dirichlet-multinomial distribution to add negative correlations between cell types. Nevertheless, the authors of SCOPIT have verified that calculations based on the independence assumption are very similar to those of SCOPIT, only off by a maximum of one cell [24]. Further improvement in modeling is possible, but it will likely result in non-analytical solutions. Also, validating the accuracy of more sophisticated models will be unrealistic, as it requires datasets providing impractical and, most of the time, unfeasible numbers of technical replicates.
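To make the single-sample question concrete, the following is a minimal Python sketch of a "how many cells" calculation. It assumes plain binomial sampling of cells and treats the cell type in isolation; it is an illustration of the idea, not the implementation used by "Howmanycells" or SCOPIT, and the function names are hypothetical.

```python
from math import comb


def prob_at_least(n_cells, proportion, min_count):
    """P[X >= min_count] for X ~ Binomial(n_cells, proportion)."""
    p_below = sum(
        comb(n_cells, k) * proportion**k * (1 - proportion) ** (n_cells - k)
        for k in range(min_count)
    )
    return 1.0 - p_below


def cells_needed(proportion, min_count, confidence=0.95):
    """Smallest number of assayed cells that yields at least `min_count`
    cells of the type with probability >= `confidence` (one sample)."""
    n_cells = min_count
    while prob_at_least(n_cells, proportion, min_count) < confidence:
        n_cells += 1
    return n_cells


if __name__ == "__main__":
    # Illustrative proportions only; rarer cell types need more cells.
    for proportion in (0.01, 0.05, 0.20):
        print(proportion, cells_needed(proportion, min_count=10))
```

This answers how many cells to assay in one sample; it says nothing about how many biological samples are needed, which is the question Sensei addresses.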
Most importantly, those previous approaches were designed to estimate the number of cells in a single biological sample, but not to estimate the number of biological samples that are required to ascertain changes in cell type abundance across biological conditions, a very different goal.
For biological sample size estimation, the legacy sample size estimation approach for the t-test (Methods) does not factor in the variance introduced by insufficient number of cells. Thus, the estimation can be over-optimistic, especially for rare cell types.
Here, we present a new approach, Sensei, to provide accurate estimation of the sample size (or, equivalently, statistical power or false negative rate) for a variety of single-cell studies.
Sensei takes into consideration both the number of samples and the number of cells within a unified mathematical framework and accounts for the abovementioned variabilities. We validate the accuracy of Sensei using multiple datasets and demonstrate Sensei's utility in a wide range of study settings that can impact broadly on both cancer prevention and treatment. We have also developed an online web application making Sensei accessible for clinical and basic science researchers during a study design.

SENSEI
The framework of Sensei to model a controlled clinical study is illustrated in Fig. 1. The study design includes a control group and a case group of participants of certain sizes (Fig. 1a). The proportion of a cell type, T cell as an example hereafter, in a specific tissue varies among participants. While a level of difference is expected between the means of the T cell proportions in the two groups, within-group variances blur it, thus making a statistical test necessary for ascertainment. Because a proportion falls between 0 and 1, Sensei uses a beta distribution to model the true proportion of T cells in each group, which parametrizes the difference between groups and the variance among participants within each group (Fig. 1b). For studies involving matched pairs of specimens, e.g., autologous samples from one group of participants, additional statistical power can be acquired from modeling the positive correlation of proportions of each cell type between pairs of samples.
From each participant, a biopsy of a tissue of interest is extracted, dissociated, and assayed using one of the single-cell profiling protocols. The single-cell profile is analyzed in silico and the cells are clustered and classified into cell types (Fig. 1c). Two types of technical variations are introduced in this step. Firstly, a major source of variation is limited cell number, especially for rare cell types, which reduces the statistical power of a study. To model it, we assume that profiled cells are chosen randomly from the population, i.e., all cells in the tissue of interest, which is consistent with SCOPIT [24] and "Howmanycells". Because the total number of cells in the tissue (population) is typically larger than that assayed in a single-cell experiment by several orders of magnitude, the number of sampled cells from a specific cell type would closely follow a binomial distribution, given its true proportion in the population (Fig. 1d). Secondly, sample preparation, sequencing and data analysis also raise uncertainty, which is highly complex and may not be modeled analytically. Precise modeling would require exhaustive quantification of a specific protocol, which is not readily available. Thus, we factor such variances into the beta distributions (Fig. 1b) mentioned above, which is consistent with the empirical understanding in the field [25]. The conjugacy of the beta distribution and the binomial distribution facilitates such modeling, allowing for efficient computation. Also factored in is the correlation between paired samples, if applicable.
After cell types are identified, assuming that the proportions are approximately normally distributed, the t-test, one of the most widely used statistical tests [26,27], can be applied to ascertain the between-group difference. Indeed, the observed skewness and kurtosis of cell type proportions validate the assumption of normality (Supplementary Text) [28] and justify the use of the t-test. The t-statistic is calculated and compared with a critical value that corresponds to a significance level (also referred to as the false positive rate or type I error rate, 0.05 and 0.01 being the typical choices) (Fig. 1e). Sensei estimates the false negative (type II error) rate by inferring the distribution of the t-statistic and calculating the probability of it failing to reach the critical value (Fig. 1f). The correlation of samples in the paired test (Fig. 1b) is also accounted for.
Sensei is implemented as a web application powered by JavaScript, and as a Python package.
Required as input are the sample sizes, cell numbers, estimated cell proportions, false positive rate and the type of t-test (Fig. 1g). Output is a table of false negative rates for various sample sizes for researchers to identify feasible study designs (Fig. 1h). An example for performing a paired t-test is shown in Supplementary Figure 1. Mathematical modeling is detailed in Methods.

Because Sensei's analytical solution includes necessary approximations (Methods, Equation 7 and Equation 10), we performed a simulation experiment to validate that Sensei accurately estimates the sample size for ideal beta-binomial distributions. We simulated 10,000 datasets using the beta-binomial model that Sensei aims to approximate (Methods). We set sample sizes $M_0, M_1 = 5$ to $12$, cell numbers per sample $N_0, N_1 = 1{,}000$, mean proportions $\mu_0 = 0.03$, $\mu_1 = 0.05$, and standard deviations $\sigma_0 = 0.015$, $\sigma_1 = 0.01$ for control and case samples, respectively. We performed a one-sided unpaired t-test with a significance level of $\alpha = 0.05$ on each dataset and counted the number of negative results to determine the false negative rate. We then used Sensei to estimate the false negative rates with the same parameters. For comparison, we also applied to the same data the legacy t-test approach, which makes predictions assuming a normal distribution instead of the beta-binomial distribution. As shown in Fig. 2a, the estimation error of Sensei against the simulated ground truth (7.9% on average) is much smaller than that of the legacy approach (38.2%). The latter tends to be over-optimistic, because it does not account for insufficiency in cell number, which has relatively large effects on such a rare cell type.
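The simulation procedure described above can be sketched in Python as follows. This is a hedged illustration, not the authors' script: it assumes NumPy and SciPy are available, interprets the two dispersion parameters as standard deviations of the beta distributions, uses a one-sided Welch test, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def beta_params(mean, sd):
    """Beta shape parameters (a, b) from a mean and a standard deviation."""
    total = mean * (1 - mean) / sd**2 - 1  # a + b
    return mean * total, (1 - mean) * total


def simulated_false_negative_rate(m0, m1, n0, n1, mu0, sd0, mu1, sd1,
                                  alpha=0.05, n_datasets=2000, seed=0):
    """Fraction of simulated beta-binomial datasets in which a one-sided
    Welch t-test (case > control) fails to reach significance."""
    rng = np.random.default_rng(seed)
    a0, b0 = beta_params(mu0, sd0)
    a1, b1 = beta_params(mu1, sd1)
    # True proportion per participant, then observed proportion per sample.
    p0 = rng.beta(a0, b0, size=(n_datasets, m0))
    p1 = rng.beta(a1, b1, size=(n_datasets, m1))
    obs0 = rng.binomial(n0, p0) / n0
    obs1 = rng.binomial(n1, p1) / n1
    v0 = obs0.var(axis=1, ddof=1) / m0
    v1 = obs1.var(axis=1, ddof=1) / m1
    t = (obs1.mean(axis=1) - obs0.mean(axis=1)) / np.sqrt(v0 + v1)
    # Welch-Satterthwaite degrees of freedom, one value per dataset.
    df = (v0 + v1) ** 2 / (v0**2 / (m0 - 1) + v1**2 / (m1 - 1))
    t_crit = stats.t.ppf(1 - alpha, df)
    return float(np.mean(t <= t_crit))


if __name__ == "__main__":
    for m in range(5, 13):
        rate = simulated_false_negative_rate(m, m, 1000, 1000,
                                             0.03, 0.015, 0.05, 0.01)
        print(m, round(rate, 3))
```

The default of 2,000 datasets keeps the sketch fast; raising `n_datasets` to 10,000 mirrors the setting in the text at the cost of run time.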
Because real tissue data may not follow exactly the distributions assumed in the simulation, to further assess the accuracy of Sensei, we evaluated it on a breast cancer dataset, which contains 144 tumor samples and 46 juxta-tumoral samples [29]. The proportions of T cells are available as ground truth for each sample, with an average of 56% in the tumor samples and 42% in the juxta-tumoral samples (p-value = $6.6 \times 10^{-6}$, two-sided t-test). We considered the tumor samples as the case group and the juxta-tumoral samples as the control group and assumed that a study plans to involve 12 to 20 participants per group to ascertain a change in T cell abundance. For each combination of sample sizes of both groups, we obtained the estimation from Sensei and the legacy approach and compared them with a simulated dataset generated according to the original data (Fig. 2b, Methods). A very high degree of consistency can be observed between the "Sensei" and the "Simulation" results (Fig. 2b). For 100 cells per sample, Sensei more than halved the average error of the "Legacy" approach (2.5% versus 6.6%). Because T cell is
With Sensei validated, we comprehensively examined datasets from current large-scale cancer genomic studies that have over 30 cancer samples [2]. We applied Sensei to estimate how many samples are required to detect compositional changes in over 20 cell types in a particular cancer type. Our results can be utilized as a guideline for designing preclinical studies and clinical trials in a variety of settings.

SAMPLES
Changes in tumor clonal fractions have been widely used to track cancer evolution dynamics [30-32]. Equally important are changes in immune cell abundance in the TME [33]. In many studies, case and control samples are collected from different groups of patients. Thorsson et al. [2] deconvolved bulk RNA-seq data from TCGA data (Supplementary Figure 3
Incidentally, a CyTOF study of LIHC (liver hepatocellular carcinoma) is available, involving 12 tumor samples and 7 normal tissue samples [34]. Sensei estimated a power of 75% for identifying an increase in regulatory T cells (Tregs) using the study sample size. Indeed, the study successfully detected an increase in Tregs at a statistical significance level between 0.05 and 0.01.
Similarly, we calculated the sample size needed for studying cancer progression from primary tumors to recurrent tumors, since differences in the TME may indicate cancer metastasis and treatment resistance [35]. We used a dataset from a study assessing 13 samples of glioblastoma multiforme (GBM) and 18 of low grade glioma (LGG). Unlike tumor versus normal studies, the difference between recurrent and primary tumors is generally more subtle (Fig. 3b). This is relevant, as previous studies have detected a significant decrease in monocyte proportions over malignant transformation of glioma [36]. A decrease in neutrophil proportion, which is known to be negatively correlated with glioma grade [37], also requires relatively modest sample sizes to detect. For GBM, Sensei predicts that a study design with 80% power needs at least 37 samples per group for dendritic cells, and more for other cells (Fig. 3b). A recent pivotal single-cell study found that 13 primary and 3 recurrent GBM samples are likely insufficient to ascertain changes in immune cell types [38]. Consistently, Sensei predicts a power of only 33% for dendritic cells, 9% for T cells, and even less for other cell types in such a setting. It should be noted that the data for recurrent tumors are limited in TCGA. Thus, more pilot experiments may be advised for designing related studies.
Cancer heterogeneity is driven by both genetics and epidemiology. Pan-cancer studies that categorize tumors based on shared genetic and/or epidemiological features are often performed [39,40]. For example, patients with Lynch syndrome or inflammatory bowel disease often develop colorectal cancers displaying high-level microsatellite instability (MSI-H), while sporadic tumors more frequently display microsatellite stability (MSS). Molecular subtyping based on microsatellite instability is used not only in colorectal cancers, but also in other cancers such as endometrial and stomach tumors. Multiple clinical studies have shown that immune checkpoint blockade therapy is more effective on MSI-H cancers, potentially because of a higher T cell infiltration rate compared to MSS cancers [41,42]. To extrapolate those findings to a wider variety of cancer types, it is important to have a study design that can ensure the ascertainment of immune cell abundance.
As an example, we selected a set of MSI-H and MSS tumor samples in TCGA produced by Hause et al. [43]. In this dataset, MSI-H tumors comprise approximately 30% of uterine corpus endometrial carcinoma (UCEC), 20% of colon adenocarcinoma (COAD), 20% of stomach adenocarcinoma (STAD), and a much lower share of other cancer types (Supplementary Figure 8a).
Using the cell type abundance deconvolved from the bulk RNA expression data [2] and the microsatellite instability labels obtained from genomic testing [43], we summarized the immune cell type abundance for the three cancer types (Supplementary Fig. 8b). For one-tailed unpaired Welch's t-test at a significance level of 0.05 with 80% power, the sample sizes estimated by Sensei

TESTING CELL TYPE ABUNDANCE DIFFERENCE IN PAIRED CANCER SAMPLES
Paired studies involve the use of autologous samples from the same patients and can reveal more pathologically relevant changes in cell type abundance. It is an ideal way for assessing differences in the TME between not only primary and metastasis/recurrent tumors, but also primary and adjacent normal samples [44,45].

DESIGNING A PRECANCER CLINICAL TRIAL FOR CANCER PREVENTION
Effective eradication of cancer relies on not only treatment but also prevention [47]. The AACR White Paper for cancer prevention [48] calls for acquisition of more longitudinal data from precancer samples to facilitate the modeling of progression and regression of pre-cancerous lesions. Sensei can be of great use in designing such studies. We used Sensei to design a randomized, placebo-controlled clinical trial involving patients diagnosed with Lynch syndrome, as a continuation of a pilot study [15]. The objective of the study is to evaluate whether the experimental intervention leads to recruitment and/or activation of immune cells in colorectal mucosa. Participants will be randomized to receive placebo or the experimental drug for a total of 12 months. After the treatment period, colorectal tissue samples will be collected. The percentage of immune cells within the mucosa will be measured by scRNA-seq to determine whether there are significant differences between the mean percentage of immune cells in the experimental treatment arm and that in the placebo arm. Based on in silico deconvolution of bulk RNA-seq data from untreated colorectal mucosa, we estimated that the immune cell population is approximately 18.6% at baseline, with a standard deviation of around 5%. We hypothesized that the population will increase by 10 percentage points to 28.6% in the experimental arm and that the standard deviation will remain the same. Based on these pieces of information, Sensei estimated that for a one-sided t-test, 6 samples in each group are needed to yield a false negative rate $\beta = 0.062 \le 0.1$ if 1,000 cells are collected in each specimen (Table 1). Furthermore, if as many as 5,000 cells are collected in each specimen, then 5 samples in each group will be enough to achieve $\beta = 0.1$, i.e., 90% power.
It should be noted that this experiment compares pre-cancerous tissues in a placebo-controlled study, which is different from comparing tumor with normal tissues in TCGA. Thus, the estimated sample size is different compared to that of COAD in Fig. 3a. Sensei can be broadly utilized in clinical trial design, as estimations of the prior parameters are often available from preclinical/pilot studies.
The estimation of Sensei is based on several assumptions on existing single-cell profiling protocols. Firstly, the profiled cells are assumed to be chosen at random from the tissue of interest, which leads to the assumption of a binomial distribution. An experimental validation of this assumption would require a large number of technical replicates profiled from the same biological sample, which is neither currently available nor practically viable. Notably, the same assumption is adopted by SCOPIT [24] and "Howmanycells" and appears widely accepted.

DISCUSSION
Choosing the beta distribution conveniently models cell type proportion among participants and greatly facilitates efficient computation via beta-binomial conjugacy. Further, the beta distribution can be uniquely determined by a mean and a standard deviation, which are widely accessible from preclinical studies. Like the beta distribution, its multidimensional extension, the Dirichlet distribution, has been used in similar contexts [24]. More realistic modeling is possible, should more prior knowledge about biological variances, technical noise, and experimental biases become available, although an analytical solution may not exist. The power estimation will then be based on sampling, which requires substantially more computational resources. These pieces of information may not become clear until experimental protocols become standardized and large single-cell atlases are completed [48-50].
In rare cases where the assumptions are violated, researchers should be able to observe a large skew in the distribution during data analysis. In those cases, new single-cell-aware power estimation methods based on the non-parametric Wilcoxon rank-sum test might be more advisable [27,28]. Sensei also assumes that the ascertainment bias within individual studies is consistent and well-controlled across experiments, i.e., equally applied to all the study samples, and that there are no significant batch effects, or that the batch effects have been alleviated by other systematic approaches. If severe batch effects are expected across samples, stratified sampling, stratified tests [51], and corresponding power estimation methods [52] should be used.
The effect of the total number of cells in each sample on the false negative rate is generally minimal when the number of cells is greater than 1,000. Only for rare cell types (<5% proportion) will further increasing the number of cells become necessary to ensure statistical power. Our model assumes that $N_i$ is the same for all samples in group $i$, which is a reasonable simplification because the number of cells generated by an assay is usually consistent in a systematic study. We have shown that the t-test is appropriate for most cell types, based on the TCGA data. We have also shown that the correlations of cell type proportions between paired samples are positive for many cell types, which empowers the paired test. We also provide a guideline for setting the parameters including mean, variance, and correlation in Sensei.
We expect that Sensei, with the rich information we summarized from various datasets including normal/tumor, primary/metastasis/recurrent tumor, and pre-/post-treatment data, will meet the demand of many projects that are being planned, such as those in the Human Tumor Atlas Network [53], the Pre-Cancer Atlas [48,50], and clinical trials. Similar single-cell studies are on the rise at present. For example, even for colorectal carcinoma, where a relatively large cohort of data has been collected, more samples of colorectal adenoma are still needed to study the recruitment of immune cells throughout the lesion and to find interventions that intercept premalignancy and prevent cancer [47]. In turn, data collected from these projects will inform Sensei to provide more realistic estimates.

CONCLUSIONS
This study reports a user-friendly web application for estimating sample size and statistical power in studies that apply single-cell profiling technologies to compare cell composition across samples. Both the number of participants and the number of cells per sample are taken into consideration. With an emphasis on cancer evolution, our results provide a guideline for designing studies to ascertain changes in cell type abundance among normal/tumor, primary/metastasis/recurrent tumor, and pre-/post-treatment conditions. We expect that Sensei will have applications in different single-cell studies involving differential abundance analysis.

BETA-BINOMIAL MODELING OF SENSEI
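We assume that the study design includes $M_0$ and $M_1$ participants in the control and case groups, respectively. The input parameters are listed in Table 2.

Table 2
$M_0$, $M_1$: Number of participants in the control and case groups
$N_0$, $N_1$: Number of cells in each sample
$\mu_i$, $\sigma_i$: Mean and standard deviation of proportions for each beta distribution
$\alpha$: False positive rate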
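We assume that in the tissue to be studied, the true proportion of the cell type of interest is $p_{ij}$.
For the $j$-th participant in group $i$, we denote the total number of such cells among the $N_i$ assayed cells as $X_{ij}$, which is a random variable and has the following conditional distribution,

$$X_{ij} \mid p_{ij} \sim \mathrm{Binomial}(N_i, p_{ij}). \tag{1}$$

Because $p_{ij}$ is largely unknown in real cases, we model $p_{ij}$ using the conjugate prior of the binomial distribution,

$$p_{ij} \sim \mathrm{Beta}(a_i, b_i). \tag{2}$$

Therefore, the cell number $X_{ij}$ has the beta-binomial distribution,

$$X_{ij} \sim \mathrm{BetaBinomial}(N_i, a_i, b_i). \tag{3}$$

It is worth mentioning that the beta-binomial distribution has been applied to modeling in compositional analysis [25,55]. It is also a simplified version of the Dirichlet-multinomial distribution used in sample size calculation [24,56]. The $a_i$ and $b_i$ can be reparametrized from the user-defined mean and standard deviation $\mu_i$ and $\sigma_i$. Formally,

$$a_i = \mu_i \left( \frac{\mu_i (1 - \mu_i)}{\sigma_i^2} - 1 \right), \qquad b_i = (1 - \mu_i) \left( \frac{\mu_i (1 - \mu_i)}{\sigma_i^2} - 1 \right). \tag{4}$$

Practically, we require the resulting $a_i$ and $b_i$ to be both greater than 1 so that the beta distribution is unimodal. Using the properties of the beta-binomial distribution, we can get

$$\mathrm{E}[X_{ij}] = \frac{N_i a_i}{a_i + b_i}, \qquad \mathrm{Var}[X_{ij}] = \frac{N_i a_i b_i (a_i + b_i + N_i)}{(a_i + b_i)^2 (a_i + b_i + 1)}. \tag{5}$$

The corresponding cell type proportion is defined as $\hat{p}_{ij} = X_{ij} / N_i$, which follows a scaled beta-binomial distribution. Thus,

$$\mathrm{E}[\hat{p}_{ij}] = \frac{a_i}{a_i + b_i}, \qquad \mathrm{Var}[\hat{p}_{ij}] = \frac{a_i b_i (a_i + b_i + N_i)}{N_i (a_i + b_i)^2 (a_i + b_i + 1)}. \tag{6}$$

We now assume that the scaled beta-binomial distribution can be approximated by a normal distribution,

$$\hat{p}_{ij} \sim \mathcal{N}\left( \frac{a_i}{a_i + b_i},\ \frac{a_i b_i (a_i + b_i + N_i)}{N_i (a_i + b_i)^2 (a_i + b_i + 1)} \right). \tag{7}$$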
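The reparametrization and the moments of the observed proportion can be checked numerically with a short, standard-library-only Python sketch. The function names are hypothetical and the example values are illustrative; the formulas are the standard method-of-moments expressions for the beta distribution and the beta-binomial variance scaled by the squared cell number.

```python
def beta_from_mean_sd(mean, sd):
    """Beta shape parameters (a, b) from a mean and a standard deviation.

    Raises ValueError when either shape parameter is not greater than 1,
    i.e. when the beta distribution would not be unimodal.
    """
    if not 0 < mean < 1 or sd <= 0:
        raise ValueError("mean must be in (0, 1) and sd must be positive")
    total = mean * (1 - mean) / sd**2 - 1  # a + b
    a, b = mean * total, (1 - mean) * total
    if a <= 1 or b <= 1:
        raise ValueError("a and b must both exceed 1 (unimodal beta)")
    return a, b


def proportion_moments(mean, sd, n_cells):
    """Mean and variance of the observed proportion X / n_cells when
    X follows a beta-binomial distribution."""
    a, b = beta_from_mean_sd(mean, sd)
    expected = a / (a + b)
    variance = a * b * (a + b + n_cells) / (
        n_cells * (a + b) ** 2 * (a + b + 1)
    )
    return expected, variance


if __name__ == "__main__":
    # The observed variance exceeds sd**2 because of finite cell numbers.
    for n_cells in (100, 1000, 10000):
        print(n_cells, proportion_moments(0.05, 0.01, n_cells))
```

The variance returned by `proportion_moments` equals `sd**2 * (1 + (a + b) / n_cells)`, which makes the extra contribution of a limited cell number explicit.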
The approximation is justified by the fact that the L1 distance between the scaled beta-binomial distribution and Equation 7 is sufficiently small, especially for large $\mu$ and small $\sigma$ (Supplementary Figure 11a). We experimented on a few examples. For $\mu = 0.3$, $\sigma = 0.2$, the underlying beta distribution is skewed to the left and deviates from a normal distribution. That results in a slightly imprecise, but still largely acceptable, normal approximation (Supplementary Figure 11b). For $\mu = 0.5$, $\sigma = 0.1$, the beta distribution itself is already close to a normal distribution, and the generated $\hat{p}$ can be perfectly approximated by a normal distribution (Supplementary Figure 11b).
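For a two-sided test, the null hypothesis is formulated as

$$H_0: \mathrm{E}[\hat{p}_{0j}] = \mathrm{E}[\hat{p}_{1j}], \tag{8}$$

where $\hat{p}_{ij}$ denotes the cell proportion in sample $j$ from group $i$. For a one-sided test, the "=" is substituted by "<" or ">". Thus, for a t-test allowing different variances in two samples [57], the t-value in Welch's t-test follows a noncentral t-distribution, i.e.,

$$t = \frac{\bar{p}_1 - \bar{p}_0}{\sqrt{s_0^2 / M_0 + s_1^2 / M_1}}, \tag{9}$$

where $\bar{p}_i$ and $s_i$ are the sample mean and sample standard deviation of $\hat{p}_{ij}$, which are random variables. The distribution of $t$ can be approximated by

$$t \sim T_\nu + \frac{\mathrm{E}[\hat{p}_1] - \mathrm{E}[\hat{p}_0]}{\sqrt{\mathrm{Var}[\hat{p}_0] / M_0 + \mathrm{Var}[\hat{p}_1] / M_1}}, \tag{10}$$

where $T_\nu$ follows Student's t-distribution with $\nu$ degrees of freedom and the second term is a constant. The degree of freedom, $\nu$, is calculated as

$$\nu = \frac{\left( \mathrm{Var}[\hat{p}_0] / M_0 + \mathrm{Var}[\hat{p}_1] / M_1 \right)^2}{\dfrac{\left( \mathrm{Var}[\hat{p}_0] / M_0 \right)^2}{M_0 - 1} + \dfrac{\left( \mathrm{Var}[\hat{p}_1] / M_1 \right)^2}{M_1 - 1}}, \tag{11}$$

which degrades to $(M_0 + M_1 - 2)$, the same as Student's t-test, when $\mathrm{Var}[\hat{p}_0] = \mathrm{Var}[\hat{p}_1]$ and $M_0 = M_1$ [57]. Thus, the false negative rate can be calculated as

$$\beta = \mathbb{P}[t < t^*] = F_{t,\nu}\left( t^* - \frac{\mathrm{E}[\hat{p}_1] - \mathrm{E}[\hat{p}_0]}{\sqrt{\mathrm{Var}[\hat{p}_0] / M_0 + \mathrm{Var}[\hat{p}_1] / M_1}} \right), \tag{12}$$

where $F_{t,\nu}$ is the CDF of the Student's t-distribution with $\nu$ degrees of freedom. $t^* = t_{1-\alpha/2,\nu}$, as $2\mathbb{P}[t \ge t^*] < \alpha$, for a two-sided test [58], or $t^* = t_{1-\alpha,\nu}$ for a one-sided test.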
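As a hedged illustration of the unpaired calculation, the sketch below computes a one-sided false negative rate (case greater than control) from the approximation of the t-statistic by a central t variable plus a constant, with Welch-Satterthwaite degrees of freedom. It assumes SciPy is available; the function names are hypothetical, this is not the code of the Sensei package, and its numbers are not guaranteed to match the web application.

```python
from math import sqrt

from scipy import stats


def proportion_variance(mean, sd, n_cells):
    """Variance of the observed proportion under the beta-binomial model."""
    total = mean * (1 - mean) / sd**2 - 1  # a + b
    a, b = mean * total, (1 - mean) * total
    return a * b * (total + n_cells) / (n_cells * total**2 * (total + 1))


def false_negative_rate(m0, m1, n0, n1, mu0, sd0, mu1, sd1, alpha=0.05):
    """Approximate type II error of a one-sided Welch t-test (case > control)."""
    v0 = proportion_variance(mu0, sd0, n0) / m0
    v1 = proportion_variance(mu1, sd1, n1) / m1
    shift = (mu1 - mu0) / sqrt(v0 + v1)  # the constant added to the t variable
    df = (v0 + v1) ** 2 / (v0**2 / (m0 - 1) + v1**2 / (m1 - 1))
    t_crit = stats.t.ppf(1 - alpha, df)
    return float(stats.t.cdf(t_crit - shift, df))


if __name__ == "__main__":
    # Illustrative parameters: a rare cell type, 1,000 cells per sample.
    for m in range(5, 13):
        print(m, round(false_negative_rate(m, m, 1000, 1000,
                                           0.03, 0.015, 0.05, 0.01), 3))
```

Scanning the printed rates for the smallest sample size below a target (for example 0.2 for 80% power) gives a candidate design under these assumptions.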

PAIRED TEST
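Paired samples are usually collected from normal and malignant tissues, or primary and recurrent/metastatic tumors. Longitudinal data from one patient, such as pre-treatment and post-treatment samples, also form paired samples. In such cases, a paired test can exploit the correlation between paired samples to improve the statistical power. Sensei has a functionality to help design studies with paired samples. In addition to the parameters of the unpaired test, we naturally require the sample sizes $M_0$ and $M_1$ to be the same (denoted as $M$) and require one more parameter, $\rho = \mathrm{corr}(p_0, p_1)$, the correlation of the true proportions of cells between the two conditions in the paired study. Note that the cell numbers of the cell type, $X_0$ and $X_1$, depend solely on $p_0$ and $p_1$, respectively.
Thus, they are conditionally independent given $p_0$ and $p_1$. Consequently, we can use the law of total covariance to derive

$$\mathrm{cov}(X_0, X_1) = \mathrm{E}[\mathrm{cov}(X_0, X_1 \mid p_0, p_1)] + \mathrm{cov}(\mathrm{E}[X_0 \mid p_0], \mathrm{E}[X_1 \mid p_1]) = N_0 N_1 \rho \sigma_0 \sigma_1. \tag{13}$$

Thus, we have the distribution of the cell numbers and proportions

$$\begin{pmatrix} \hat{p}_0 \\ \hat{p}_1 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mathrm{E}[\hat{p}_0] \\ \mathrm{E}[\hat{p}_1] \end{pmatrix}, \begin{pmatrix} \mathrm{Var}[\hat{p}_0] & \mathrm{cov}(\hat{p}_0, \hat{p}_1) \\ \mathrm{cov}(\hat{p}_0, \hat{p}_1) & \mathrm{Var}[\hat{p}_1] \end{pmatrix} \right), \tag{14}$$

where $\mathrm{cov}(\hat{p}_0, \hat{p}_1) = \rho \sigma_0 \sigma_1$.
$\mathrm{E}[\cdots]$ and $\mathrm{Var}[\cdots]$ remain the same as those in the unpaired test. Note that $\mathrm{corr}(\hat{p}_0, \hat{p}_1)$ is in fact

$$\mathrm{corr}(\hat{p}_0, \hat{p}_1) = \frac{\rho \sigma_0 \sigma_1}{\sqrt{\mathrm{Var}[\hat{p}_0]\, \mathrm{Var}[\hat{p}_1]}}, \tag{15}$$

which approaches $\rho$ when the numbers of cells, $N_0$ and $N_1$, are large. The difference between a pair of samples is

$$\Delta_j = \hat{p}_{1j} - \hat{p}_{0j}. \tag{16}$$

Thus, the paired t-statistic can be calculated as

$$t = \frac{\bar{\Delta}}{s_\Delta / \sqrt{M}}, \tag{17}$$

where $\bar{\Delta}$ and $s_\Delta$ are the sample mean and sample standard deviation of $\Delta$. Thus, $t$ satisfies

$$t \sim T_\nu + \frac{\mathrm{E}[\hat{p}_1] - \mathrm{E}[\hat{p}_0]}{\sqrt{\left( \mathrm{Var}[\hat{p}_0] + \mathrm{Var}[\hat{p}_1] - 2\,\mathrm{cov}(\hat{p}_0, \hat{p}_1) \right) / M}}. \tag{18}$$

It can be observed that the t-statistic will be the same as in the unpaired test when the covariance is zero, and even smaller should the covariance be negative. In other words, the paired test needs a positive correlation to gain statistical power. Also note that the paired t-test does not assume equal variances. Finally, the false negative rate is

$$\beta = F_{t,\nu}\left( t^* - \frac{\mathrm{E}[\hat{p}_1] - \mathrm{E}[\hat{p}_0]}{\sqrt{\left( \mathrm{Var}[\hat{p}_0] + \mathrm{Var}[\hat{p}_1] - 2\,\mathrm{cov}(\hat{p}_0, \hat{p}_1) \right) / M}} \right), \tag{19}$$

where $t^* = t_{1-\alpha/2,\nu}$ for a two-sided test, or $t^* = t_{1-\alpha,\nu}$ for a one-sided test, and $\nu = M - 1$.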
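The paired calculation can be sketched in the same way. The Python example below is an illustration under stated assumptions: a one-sided test (second condition greater than the first), the covariance of the observed proportions taken as the correlation of the true proportions times the two beta standard deviations, and SciPy available. Function names are hypothetical and do not reflect the Sensei package.

```python
from math import sqrt

from scipy import stats


def proportion_variance(mean, sd, n_cells):
    """Variance of the observed proportion under the beta-binomial model."""
    total = mean * (1 - mean) / sd**2 - 1  # a + b
    a, b = mean * total, (1 - mean) * total
    return a * b * (total + n_cells) / (n_cells * total**2 * (total + 1))


def paired_false_negative_rate(m, n0, n1, mu0, sd0, mu1, sd1, rho, alpha=0.05):
    """Approximate type II error of a one-sided paired t-test on m pairs."""
    var_diff = (
        proportion_variance(mu0, sd0, n0)
        + proportion_variance(mu1, sd1, n1)
        - 2 * rho * sd0 * sd1  # covariance of the paired observed proportions
    )
    shift = (mu1 - mu0) / sqrt(var_diff / m)
    df = m - 1
    t_crit = stats.t.ppf(1 - alpha, df)
    return float(stats.t.cdf(t_crit - shift, df))


if __name__ == "__main__":
    # Illustrative values: a positive correlation lowers the false negative rate.
    for rho in (-0.5, 0.0, 0.5, 0.8):
        print(rho, round(paired_false_negative_rate(
            8, 1000, 1000, 0.186, 0.05, 0.236, 0.05, rho), 3))
```

The loop shows the point made in the text: the paired design only gains power over an uncorrelated design when the correlation is positive.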

LEGACY SAMPLE SIZE ESTIMATION
We refer to the sample size estimated directly from the mean, variance, and correlation, without the beta-binomial modeling in Equation 5 and Equation 13, as the legacy estimate. Consequently, the effect of the number of cells is not accounted for. This effectively assumes an infinite number of cells.
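How much the legacy estimate underestimates the variance can be illustrated with a small standard-library Python sketch. It computes the ratio between the variance of the observed proportion under the beta-binomial model and the beta variance alone; the ratio tends to 1 as the number of cells grows, which is the "infinite number of cells" limit. The function name and the example parameters are illustrative assumptions.

```python
def variance_inflation(mean, sd, n_cells):
    """Ratio Var[observed proportion] / sd**2 under the beta-binomial model.

    Algebraically this equals 1 + (a + b) / n_cells, where a and b are the
    beta shape parameters implied by `mean` and `sd`.
    """
    concentration = mean * (1 - mean) / sd**2 - 1  # a + b
    return 1 + concentration / n_cells


if __name__ == "__main__":
    # A rare, tightly distributed cell type versus a common, dispersed one.
    for n_cells in (100, 1000, 10000):
        rare = variance_inflation(0.03, 0.015, n_cells)
        common = variance_inflation(0.5, 0.1, n_cells)
        print(n_cells, round(rare, 3), round(common, 3))
```

A ratio well above 1 signals a setting in which ignoring the number of cells would make the power estimate over-optimistic.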

SMALLEST EFFECT SIZE OF INTEREST AND TWO ONE-SIDED T-TEST FOR EQUIVALENCE
Being scientifically significant is usually different from being statistically significant. For example, when enough samples are collected, even a 0.01% change in the proportion of a cell type can be statistically significant. However, the difference may be too small to induce any actual effect, and thus is rarely considered biologically interesting (i.e., not scientifically significant). The smallest effect size of interest (SESOI) is a way to build a threshold of scientific significance into a statistical test [59]. Instead of performing a t-test on the experimental group against the control group directly, it translates the control group by the SESOI, the level to be considered biologically interesting, by adding or subtracting a constant from the control group. The SESOI can also be used on the opposite side, to conclude with statistical significance that the change in cell type abundance does not exceed the SESOI. We provide sample size estimation for the t-test with SESOI in Sensei.
If two t-tests with SESOI find that the difference is statistically significantly within a range that is considered negligible in terms of biology, the proportion can be claimed to be effectively unchanged. This approach is formally called the two one-sided t-test (TOST) for equivalence [59].
Sensei can also estimate the sample size for TOST.
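The shifted test and the TOST procedure can be sketched on observed proportions as follows. This is a generic Python illustration with Welch's t-test, assuming NumPy and SciPy are available; it shows the tests themselves rather than Sensei's sample size estimation for them, and the function names and example data are hypothetical.

```python
import numpy as np
from scipy import stats


def welch_shifted_p(case, control, shift=0.0, alternative="greater"):
    """One-sided Welch p-value for mean(case) - mean(control) vs. `shift`.

    "greater" tests whether the difference exceeds `shift`;
    "less" tests whether the difference is below `shift`.
    """
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    v1 = case.var(ddof=1) / case.size
    v0 = control.var(ddof=1) / control.size
    t = (case.mean() - control.mean() - shift) / np.sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0**2 / (control.size - 1) + v1**2 / (case.size - 1))
    if alternative == "greater":
        return float(stats.t.sf(t, df))
    return float(stats.t.cdf(t, df))


def tost_equivalent(case, control, sesoi, alpha=0.05):
    """True when both one-sided tests place the difference inside
    (-sesoi, +sesoi) at significance level alpha."""
    p_lower = welch_shifted_p(case, control, -sesoi, "greater")
    p_upper = welch_shifted_p(case, control, +sesoi, "less")
    return max(p_lower, p_upper) < alpha


if __name__ == "__main__":
    control = [0.20, 0.21, 0.19, 0.20, 0.22, 0.18]  # made-up proportions
    case = [0.205, 0.195, 0.21, 0.20, 0.19, 0.215]
    print(tost_equivalent(case, control, sesoi=0.05))
```

Subtracting `shift` from the observed difference is equivalent to translating the control group by the SESOI before running an ordinary one-sided test.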

MEAN, VARIANCE, CORRELATION, AND THEIR CONFIDENCE INTERVALS
The correlation and its confidence interval are obtained by standard methods [60], i.e., for cell type proportions in matched pairs $\{(\hat{p}_{0j}, \hat{p}_{1j})\}$, $j = 1 \dots M$, the sample correlation coefficient and its confidence interval are computed. Sensei may use the sample mean $\bar{p}$ and sample standard deviation $s$ as input directly because they are the maximum likelihood estimates of parameters of a beta distribution. The confidence intervals may help evaluate the reliability of the prior knowledge. Note that the confidence limits may exceed [0, 1] in some cases, and we cut them to 0 or 1 in such cases. As a footnote, the complementary log-log transform may be used to confine the limits, but it also skews the values and complicates interpretation. Bootstrap may also be used to construct the confidence interval.
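One common way to obtain these quantities is sketched below in standard-library Python. It assumes the Fisher z-transform for the correlation interval and a normal-approximation interval for the mean, clipped to [0, 1]; whether these match the exact formulas in the cited reference is an assumption, and the function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean, stdev


def correlation_ci(x, y, level=0.95):
    """Sample correlation and its Fisher z-transform confidence interval.

    Requires more than 3 pairs and a correlation strictly inside (-1, 1).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / sqrt(n - 3)
    z = atanh(r)
    return r, tanh(z - half_width), tanh(z + half_width)


def mean_ci(proportions, level=0.95):
    """Sample mean with a normal-approximation interval clipped to [0, 1]."""
    centre = mean(proportions)
    half_width = (NormalDist().inv_cdf(0.5 + level / 2)
                  * stdev(proportions) / sqrt(len(proportions)))
    return centre, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    primary = [0.10, 0.20, 0.30, 0.40, 0.50]  # made-up paired proportions
    recurrent = [0.12, 0.18, 0.33, 0.38, 0.52]
    print(correlation_ci(primary, recurrent))
    print(mean_ci(primary))
```

With few pairs the correlation interval is wide, which is a useful warning before feeding a pilot-study correlation into a paired design.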

SIMULATION STUDY BASED ON T CELL ABUNDANCE IN BREAST CANCER DATA
The breast cancer dataset contains 144 tumor samples and 46 juxta-tumoral samples [29]. The proportions of T cells were available as ground truth for each sample, with an average of 56% in the tumor samples and 42% in the juxta-tumoral samples. We considered the tumor samples as the experimental group and the juxta-tumoral samples as the control group. Because the proportions of T cells are significantly different (p-value = $6.6 \times 10^{-6}$, two-sided t-test) between the two groups, we assume that a true difference exists. We use the calculated mean and standard deviation as the input of Sensei. To validate Sensei's accuracy, we randomly drew $M_0$ and $M_1$ samples from the juxta-tumoral and tumor samples, respectively. If we were to perform single-cell assays on these samples, we would observe $X_{ij}$ T cells in each sample, according to a binomial distribution parameterized by $N_i$ and $p_{ij}$ ($i = 0, 1$). The binomial distribution is a reasonable assumption since a tissue sample often contains millions of cells, which is several orders of magnitude higher than $N_i$. We then perform a one-tailed unpaired t-test between the set of $\{\hat{p}_{0j}\}$ and that of $\{\hat{p}_{1j}\}$ at $\alpha = 0.05$, and record a true positive when the test is positive, and a false negative otherwise. We estimate the false negative rate by repeating the above process 1,000 times for each combination of $M_0$ and $M_1$.
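The resampling procedure can be sketched in Python as below. Because the breast cancer proportions are not reproduced here, the sketch generates hypothetical stand-in pools of 46 and 144 proportions with means near 42% and 56%; the spread of 0.2 is an arbitrary assumption, as are the function names. It also assumes draws without replacement, a one-sided Welch test, and that NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats


def make_hypothetical_pools(seed=0):
    """Stand-in pools of per-sample T cell proportions (NOT the real data)."""
    rng = np.random.default_rng(seed)

    def draw(mean, sd, size):
        total = mean * (1 - mean) / sd**2 - 1
        return rng.beta(mean * total, (1 - mean) * total, size=size)

    return draw(0.42, 0.2, 46), draw(0.56, 0.2, 144)


def resampled_false_negative_rate(control_pool, case_pool, m0, m1, n_cells,
                                  alpha=0.05, n_repeats=1000, seed=0):
    """Draw samples from the pools, add binomial cell sampling, and count
    how often a one-sided Welch t-test (case > control) is not significant."""
    rng = np.random.default_rng(seed)
    misses = 0
    for _ in range(n_repeats):
        p0 = rng.choice(control_pool, size=m0, replace=False)
        p1 = rng.choice(case_pool, size=m1, replace=False)
        obs0 = rng.binomial(n_cells, p0) / n_cells
        obs1 = rng.binomial(n_cells, p1) / n_cells
        v0 = obs0.var(ddof=1) / m0
        v1 = obs1.var(ddof=1) / m1
        t = (obs1.mean() - obs0.mean()) / np.sqrt(v0 + v1)
        df = (v0 + v1) ** 2 / (v0**2 / (m0 - 1) + v1**2 / (m1 - 1))
        if t <= stats.t.ppf(1 - alpha, df):
            misses += 1
    return misses / n_repeats


if __name__ == "__main__":
    control_pool, case_pool = make_hypothetical_pools()
    for m in (12, 16, 20):
        print(m, resampled_false_negative_rate(control_pool, case_pool,
                                               m, m, n_cells=100))
```

Replacing the stand-in pools with the real per-sample proportions would reproduce the design of the experiment; the printed rates from the stand-in data carry no biological meaning.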

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.

CONSENT FOR PUBLICATION
Not applicable.

AVAILABILITY OF DATA AND MATERIALS
The datasets analyzed during the current study are available with the original publications:

COMPETING INTERESTS
Dr. Vilar has a consulting and advisory role with Janssen Research and Development, and Recursion Pharma. The rest of the authors declare that they have no competing interests.

AUTHORS' CONTRIBUTIONS
All authors conceptualized the research. SL and JD conceived the statistical model. SL

ACKNOWLEDGEMENTS
We thank Alex Davis and Nicholas Navin for their comments.