DeepKin: precise estimation of in-depth relatedness and its application in UK Biobank

Accurately estimating relatedness between samples is crucial in genetics and epidemiological analysis. Using genome-wide single nucleotide polymorphisms (SNPs), it is now feasible to measure realized relatedness even in the absence of pedigree. However, the sampling variation in SNP-based measures and factors affecting method-of-moments relatedness estimators have not been fully explored, whilst static cut-off thresholds have traditionally been employed to classify relatedness levels for decades. Here, we introduce the deepKin framework as a moment-based relatedness estimation and inference method that incorporates data-specific cut-off threshold determination. It addresses the limitations of previous moment estimators by leveraging the sampling variance of the estimator to provide statistical inference and classification. Key principles in relatedness estimation and inference are provided, including inferring the critical value required to reject the hypothesis of unrelatedness, which we refer to as the deepest significant relatedness, determining the minimum effective number of markers, and understanding the impact on statistical power. Through simulations, we demonstrate that deepKin accurately infers both unrelated pairs and relatives with the support of sampling variance. We then apply deepKin to two subsets of the UK Biobank dataset. In the 3K Oxford subset, tested with four sets of SNPs, the SNP set with the largest effective number of markers and correspondingly the smallest expected sampling variance exhibits the most powerful inference for distant relatives. In the 430K British White subset, deepKin identifies 212,120 pairs of significant relatives and classifies them into six degrees. 
Additionally, cross-cohort significant relative ratios among 19 assessment centers located in different cities are geographically correlated, while within-cohort analyses indicate both an increase in close relatedness and a potential increase in diversity from north to south throughout the UK. Overall, deepKin presents a novel framework for accurate relatedness estimation and inference in biobank-scale datasets. For biobank-scale application we have implemented deepKin as an R package, available in the GitHub repository (https://github.com/qixininin/deepKin).


Introduction
Detecting relationships among samples is fundamental in genetics and epidemiological analysis, particularly in the context of genome-wide association studies (GWAS) and polygenic risk scores (PRS). Conventionally, relatedness has been estimated from pedigrees, which gives the expected level of genetic similarity. With the abundance of genome-wide single nucleotide polymorphism (SNP) data, we can now measure realized relatedness, which explicitly captures the actual relationship. However, SNP data introduce their own complexity through the variety of genotyping technologies, quality control (QC) procedures, and linkage disequilibrium (LD). Consequently, interpreting relatedness estimated from genome-wide SNPs can be intricate.
Moment-based estimators are often preferred despite their lower precision, because they are computationally efficient (Speed and Balding, 2015). Few factors influencing genome-wide relatedness estimation have been studied, although Hill and Weir explored this question within the framework of linkage analysis (Hill and Weir, 2011). They examined the variation for various pairwise relationships as a consequence of Mendelian sampling and linkage. In contrast, current practice with SNP-based measures sits within the framework of population-based genome-wide association, and the variation of these measures has not been explored. The sampling variance of SNP-based measures depends on the LD of the SNP data and on the level of relatedness. For example, the sampling variance of estimated relatedness can be considerably larger when using more correlated SNPs (due to LD), which reduces the statistical power to detect related pairs that deviate significantly from unrelatedness. Although the factors affecting the variation of method-of-moments relatedness estimators have not been fully explored, static cut-offs have been commonly adopted for inferring relatedness, for example on kinship coefficients and IBD coefficients (Manichaikul et al., 2010; Ramstetter et al., 2017).
Neglecting the sampling variance in inference can lead to false positives and misclassifications.
Because the sampling variance is data-specific, applying static cut-off thresholds without considering it can generate spurious inferences of relatedness for pairs that are not significantly different from unrelated pairs (Ramstetter et al., 2017).
In this study, we present a novel moment-based framework for genome-wide relatedness inference, called deepKin. In contrast to previous moment estimators such as KING, deepKin offers statistical inference on relatedness, supported by the following features. One feature is its ability to evaluate and provide the sampling variance of estimated relatedness. By leveraging its asymptotic distribution, deepKin facilitates the computation of p-values for each pair, enabling the assessment of significant deviations from unrelatedness. Furthermore, three key principles emerge for relatedness estimation and inference: I) deepKin determines the critical value that separates significant relatedness estimates from insignificant ones based on the sampling variance, which we refer to as the deepest significant relatedness; II) it identifies the minimum effective number of markers required for relatives of the target degree to be significantly different from unrelated pairs; and III) given the target degree of relatedness, it provides the amount of statistical power gained or compromised. We verified the performance of deepKin through simulations and demonstrated its effectiveness using the UK Biobank dataset. We also implemented deepKin estimation and the new inference framework in an R package named "deepKin", which is available in the GitHub repository (https://github.com/qixininin/deepKin).

Materials and Methods
One aim of this study is to explore the sampling variance of moment-based genetic relatedness. We propose the deepKin estimator, which resembles KING's original estimator but differs in the choice of genotype scaling factors. Two forms of the genetic relationship matrix (GRM) can be found in Speed and Balding's review (see Eq 8 and Eq 9 in their review) (Speed and Balding, 2015) or in VanRaden (VanRaden, 2008), which are

$$G_{ij} = \frac{\sum_{k=1}^{M}(x_{ik}-2\hat{p}_k)(x_{jk}-2\hat{p}_k)}{\sum_{k=1}^{M}2\hat{p}_k(1-\hat{p}_k)} \quad \text{and} \quad G_{ij} = \frac{1}{M}\sum_{k=1}^{M}\tilde{x}_{ik}\tilde{x}_{jk}.$$

Here, $x_{ik}$ and $x_{jk}$ are the genotypic values (0, 1, or 2, according to the number of reference alleles) at the $k$-th locus for individuals $i$ and $j$, $\tilde{x}_{ik} = (x_{ik}-2\hat{p}_k)/\sqrt{2\hat{p}_k(1-\hat{p}_k)}$ is the standardized genotype, $\hat{p}_k$ is the allele frequency at the $k$-th locus, and $M$ is the total number of variants.

KING's estimator
The original kinship estimator of KING (Equation 5 in the original publication; Manichaikul et al., 2010) is

$$\hat{\phi}_{ij} = \frac{1}{2} - \frac{1}{4}\,\frac{\sum_{k=1}^{M}(x_{ik}-x_{jk})^2}{\sum_{k=1}^{M}2\hat{p}_k(1-\hat{p}_k)}.$$

Eq 1
Here the definition of $\hat{\phi}_{ij}$ is the same as that in KING's paper, but the expectation of KING's estimator is only half of the relatedness $\theta$. Therefore, in the following text, we use twice KING's estimator ($\hat{\theta}_K = 2\hat{\phi}_{ij}$) to make a clearer comparison (Table 1). $\theta$ has expected values of 1, 0.5, 0.25, and 0.125 for zero- (monozygotic twin), first- (full sib, or parent-offspring), second- (half sib, or grandparent-grandchild), and third-degree (first cousin, or great grandparent-great grandchild) relatives, respectively. However, when measured through genome-wide similarity, the realized value of $\theta$ can take any value between 0 and 1. Note that the sampling variance of $\hat{\theta}_K$ has not been explored since the estimator was proposed, and the absence of a sampling variance prevents precise statistical testing.

deepKin estimator
Inspired by the construction of the two GRMs with different scaling factors, and in contrast to Eq 1, the deepKin estimator ($\hat{\theta}$) is constructed as

$$\hat{\theta} = 1 - \frac{1}{2M}\sum_{k=1}^{M}\frac{(x_{ik}-x_{jk})^2}{2\hat{p}_k(1-\hat{p}_k)}.$$

Eq 2

After some rearrangement, Eq 2 can be decomposed into three parts,

$$\hat{\theta} = 1 - \frac{1}{2}\left(G_{ii} + G_{jj}\right) + G_{ij},$$

where $G_{ii}$, $G_{jj}$, and $G_{ij}$ are elements of the GRM built from standardized genotypes, which has a computational cost of $O(n^2M)$ for $n$ individuals.
This decomposition gives the relationship between the GRM and deepKin: $\hat{\theta}$ for any pair of individuals can be obtained quickly by combining these three GRM elements. The computational cost of deepKin is consequently $O(n^2M)$, which is identical to the cost of constructing the GRM.
We first give the expectation and variance of deepKin under the assumption of a binomial distribution at one locus, namely the "single locus model" (see Appendix I). The expectation and variance of deepKin at a single marker locus $k$ ($\hat{\theta}_k$) are given by Eq 3. In practice, however, we have many variants that are often in LD; therefore, we derive the expectation and variance of deepKin under the "multiple loci model" (see Appendix II). We assume that the aggregated standardized genotypes follow the normal distribution with expectation $E(\tilde{x}_{ik}) = E(\tilde{x}_{jk}) = 0$ and variance $\mathrm{var}(\tilde{x}_{ik}) = \mathrm{var}(\tilde{x}_{jk}) = 1$, which meets the requirement of Isserlis's theorem (Isserlis, 1918). After recursively applying Isserlis's theorem, we are able to derive the expectation and asymptotic variance of $\hat{\theta}$, which are

$$E(\hat{\theta}) = \theta, \qquad \mathrm{var}(\hat{\theta}) = \frac{2(1-\theta)^2}{m_e}.$$

Eq 4
The sampling variance of $\hat{\theta}$ between two individuals depends on the true relatedness ($\theta$, with a smaller sampling variance for more related samples) and on the global LD between variants, summarized by the effective number of markers $m_e = M^2 / \sum_{k_1=1}^{M}\sum_{k_2=1}^{M} r^2_{k_1k_2}$, in which $r^2_{k_1k_2}$ is the squared correlation between loci $k_1$ and $k_2$ (see more discussion on $m_e$ below). $\mathrm{var}(\hat{\theta})$ depends on the underlying normal distribution for the aggregated variants and may be disrupted by low-frequency variants; therefore, we restrict MAF to at least 0.05. For details of deriving the expectations and variances under the "single locus model" and the "multiple loci model", please refer to Appendix I and II. The "single locus model" is exact but is limited to independent variants, while the "multiple loci model" is asymptotic but takes into account the global LD between the $M$ variants. Therefore, the multiple loci model is preferred in real data analysis and has been adopted for the subsequent relatedness inferences. Considering the context of Eq 3 and Eq 4, a negative $\hat{\theta}$ can be attributed to sampling variance, but it may also imply possible diversity among the samples.
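The GRM decomposition described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the deepKin R package: it assumes the estimator takes the form $\hat{\theta}_{ij} = 1 - (G_{ii}+G_{jj})/2 + G_{ij}$ with genotypes scaled by sample allele frequencies, and an asymptotic variance of $2(1-\theta)^2/m_e$. All function names are hypothetical.

```python
import numpy as np


def standardize(genotypes):
    """Scale a 0/1/2 genotype matrix (individuals x variants) by sample allele frequency."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                  # sample allele frequency per variant
    keep = (p > 0.0) & (p < 1.0)              # monomorphic variants cannot be scaled
    x, p = x[:, keep], p[keep]
    return (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


def relatedness_from_grm(genotypes):
    """All pairwise scores via theta_ij = 1 - (G_ii + G_jj) / 2 + G_ij (assumed form)."""
    z = standardize(genotypes)
    grm = z @ z.T / z.shape[1]                # GRM from standardized genotypes
    diag = np.diag(grm)
    return 1.0 - 0.5 * (diag[:, None] + diag[None, :]) + grm


def relatedness_direct(genotypes, i, j):
    """Same score for one pair, from the scaled squared genotype difference."""
    z = standardize(genotypes)
    return 1.0 - np.mean((z[i] - z[j]) ** 2) / 2.0


def asymptotic_variance(theta, m_e):
    """Assumed multiple-loci sampling variance, 2 * (1 - theta)^2 / m_e."""
    return 2.0 * (1.0 - theta) ** 2 / m_e
```

Under these assumptions, two identical genotype rows give a score of 1, and the GRM route agrees with the direct pairwise formula, which is the point of the decomposition: once the GRM exists, every pairwise score costs three look-ups.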

Relatedness inference: deepest significant estimation
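The deepKin framework provides statistical inference for the estimated relatedness score. To test whether a pair of individuals is related, we use the null distribution of unrelated pairs, $N(0, 2/m_e)$, which follows from Eq 4 with $\theta = 0$. At a Type-I error rate of $\alpha$, the critical value is

$$\theta_c = z_{1-\alpha}\sqrt{2/m_e},$$

Eq 5

where $z_{1-\alpha}$ is the quantile of the standard normal distribution. The critical value is considered the deepest significant estimation, and $t = \log_{1/2}\theta_c$ is the deepest significant degree, obtained by log-transformation from relatedness to relationship degree.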
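As a worked illustration of the critical value, the sketch below assumes the null distribution $N(0, 2/m_e)$, a one-sided test, and Bonferroni correction over all pairs. The function names are hypothetical and only the Python standard library is used.

```python
from math import erfc, log2, sqrt
from statistics import NormalDist


def deepest_significant(m_e, n_pairs, family_alpha=0.05):
    """Return (critical relatedness, deepest significant degree) under N(0, 2/m_e)."""
    alpha = family_alpha / n_pairs                 # Bonferroni-corrected Type-I error
    z = -NormalDist().inv_cdf(alpha)               # upper quantile z_{1-alpha}
    theta_c = z * sqrt(2.0 / m_e)
    return theta_c, -log2(theta_c)                 # degree t solves (1/2)^t = theta_c


def p_value_unrelated(theta_hat, m_e):
    """One-sided p-value for H0: the pair is unrelated (theta = 0)."""
    z = theta_hat / sqrt(2.0 / m_e)
    return 0.5 * erfc(z / sqrt(2.0))               # upper normal tail, stable for large z


# Values reported for the UKB white British subset: n = 427,287 and m_e = 56,945.
n = 427_287
theta_c, degree = deepest_significant(56_945, n * (n - 1) // 2)
print(round(theta_c, 4), round(degree, 3))
```

With the sample size and $m_e$ reported for the UKB white British subset, this calculation gives a deepest significant degree close to the 4.567 quoted in the results.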

Relatedness inference: classification
Although the relatedness score is continuous, we classify pairs into discrete classes for easier interpretation, as is often done in real data analysis (Table 1). Since Eq 4 gives the sampling variance for any relatedness, we can make a concrete inference for any observed significant relatedness lying between degrees $t$ and $t+1$ using hypothesis testing: one test takes degree $t$ as the null hypothesis and the other takes degree $t+1$.
By comparing these two p-values, we are able to infer the classification for any observed $\hat{\theta}$.
Intuitively, there exists a specific point of relatedness where the two p-values are equal. At this crossover point, the relatedness value can be used as a new boundary for direct classification (Appendix III).
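The two-p-value comparison can be illustrated as follows. This sketch is an assumption-laden reading of the procedure, not the derivation in Appendix III: it takes each p-value to be a one-sided normal tail probability, with standard deviation $(1-\theta_t)\sqrt{2/m_e}$ under degree $t$, and assigns the pair to the degree with the larger p-value. Under these assumptions the crossover point has a closed form that does not depend on $m_e$. Function names are hypothetical.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2.0))


def classify_between(theta_hat, t, m_e):
    """Assign theta_hat to degree t or t + 1 (t >= 1) by comparing two p-values."""
    near, far = 0.5 ** t, 0.5 ** (t + 1)
    se = sqrt(2.0 / m_e)
    # P(estimate <= theta_hat | degree t): evidence against the closer degree
    p_near = upper_tail((near - theta_hat) / ((1.0 - near) * se))
    # P(estimate >= theta_hat | degree t + 1): evidence against the more distant degree
    p_far = upper_tail((theta_hat - far) / ((1.0 - far) * se))
    return (t if p_near >= p_far else t + 1), p_near, p_far


def crossover(t):
    """Relatedness at which the two p-values are equal (t >= 1)."""
    near, far = 0.5 ** t, 0.5 ** (t + 1)
    return (near * (1.0 - far) + far * (1.0 - near)) / (2.0 - near - far)
```

For example, between first and second degree this assumed boundary is 0.4, compared with the geometric-mean cut-off of 0.354 used as a static threshold.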

Derived guidelines from deepKin
Two parameters are involved in the estimation of relatedness with deepKin: the sample size ($n$) and the number of markers ($M$). However, it is actually $m_e$ rather than $M$ that acts as the indicative parameter in relatedness inference and affects the performance of deepKin. Making relatedness inference for $n(n-1)/2$ pairs of samples is a decision-making process. Our subsequent presentation follows the framework of power calculation, which considers Type-I ($\alpha$) and Type-II ($\beta$) error rates and offers preliminary yet practical guidelines.

Guideline I: thrifty choice for markers conditional on a target degree
Based on the expectation and variance of the deepKin estimator under the "multiple loci model", we can estimate the minimum $m_e$ required to distinguish relatives of the target degree $t$, with relatedness $\theta_t = (1/2)^t$, from unrelated pairs, given the Type-I and Type-II error rates ($\alpha$ and $\beta$),

$$m_e \ge 2\left[\frac{z_{1-\alpha} + z_{1-\beta}(1-\theta_t)}{\theta_t}\right]^2.$$

Eq 6

Here, $z_{1-\alpha}$ and $z_{1-\beta}$ are quantiles of the standard normal distribution. For more details see Appendix IV. In particular, we set $\alpha$ under experiment-wise control after Bonferroni correction, so that the corresponding $\alpha$ is divided by the total number of comparisons $N$. We give a concrete numerical example as a simple illustration. Suppose that we have 500,000 individuals, which generates $N \approx 1.25 \times 10^{11}$ pairs of comparisons. To detect relatives up to the third degree from unrelated pairs at a Type-I error rate of $\alpha = 0.05/N$ ($z_{1-\alpha} = 7.161$) and a Type-II error rate of $\beta = 0.1$ ($z_{1-\beta} = 1.282$), $m_e$ based on Eq 6 should be no less than 8,780. A thrifty choice of variant set can therefore greatly reduce the financial cost of genotyping and computation. This gives the first guideline: selecting only a subset of the genetic variants with the required $m_e$ is sufficient for detecting a certain degree of relatedness in $N$ comparisons. Consequently, $m_e/M$ reflects the balance between statistical power and computational cost through an economical choice of markers.
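The numerical example can be checked with a short calculation. The sketch below assumes the minimum $m_e$ takes the form $2[(z_{1-\alpha} + z_{1-\beta}(1-\theta_t))/\theta_t]^2$ with $\theta_t = (1/2)^t$, a one-sided Bonferroni-corrected $\alpha$, and a hypothetical function name.

```python
from math import ceil
from statistics import NormalDist


def min_effective_markers(degree, n_pairs, family_alpha=0.05, beta=0.1):
    """Smallest m_e that separates relatives of the given degree from unrelated pairs."""
    theta_t = 0.5 ** degree                         # expected relatedness at this degree
    nd = NormalDist()
    z_alpha = -nd.inv_cdf(family_alpha / n_pairs)   # z_{1-alpha}, Bonferroni-corrected
    z_beta = nd.inv_cdf(1.0 - beta)                 # z_{1-beta}
    return 2.0 * ((z_alpha + z_beta * (1.0 - theta_t)) / theta_t) ** 2


# Third-degree relatives among 500,000 individuals.
n = 500_000
print(ceil(min_effective_markers(3, n * (n - 1) // 2)))
```

Under these assumptions the result is within a marker or two of the 8,780 quoted above, and the same function gives about 17,881 for fourth-degree relatives among 40,000 comparisons, matching the simulation design described later.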

Guideline II: the statistical power conditional on target degree and $m_e$
As $\alpha$ is often fixed in a study, for example after Bonferroni correction, we are able to determine how much power ($1-\beta$) would be compromised or gained for any target relatedness ($\theta_t$),

$$1-\beta = \Phi\left(\frac{\theta_t\sqrt{m_e/2} - z_{1-\alpha}}{1-\theta_t}\right),$$

Eq 7

where $\Phi$ is the standard normal cumulative distribution function. For the same example above, if the target degree is 5, the power increases to 0.925; and if the target degree is 6, the power is rather reduced to 0.002. To increase statistical power, a practical approach is to increase the effective number of markers. When the effective number of markers increases, the power to detect the target relatedness increases (Figure S1). For more real data examples please refer to Table S1.
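The dependence of power on the target degree and on $m_e$ can be explored with the sketch below. It assumes the power is the probability that an estimate drawn from $N(\theta_t, 2(1-\theta_t)^2/m_e)$ exceeds the critical value $z_{1-\alpha}\sqrt{2/m_e}$. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_for_degree(degree, m_e, alpha):
    """Power to declare relatives of the given degree significant at level alpha."""
    theta_t = 0.5 ** degree
    nd = NormalDist()
    z_alpha = -nd.inv_cdf(alpha)                      # z_{1-alpha}
    signal = theta_t * sqrt(m_e / 2.0) - z_alpha
    if theta_t >= 1.0:                                # degree 0: the assumed variance is zero
        return 1.0 if signal > 0 else 0.0
    return nd.cdf(signal / (1.0 - theta_t))


n = 500_000
alpha = 0.05 / (n * (n - 1) // 2)
for m_e in (5_000, 8_781, 20_000):
    print(m_e, round(power_for_degree(3, m_e, alpha), 3))
```

By construction, evaluating this function at the minimum $m_e$ from Guideline I returns a power of about $1-\beta$, and the power rises as $m_e$ grows.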

On the effective number of markers ($m_e$)
The above analysis relies on the effective number of markers $m_e$, defined as $m_e = M^2 / \sum_{k_1=1}^{M}\sum_{k_2=1}^{M} r^2_{k_1k_2}$, which describes the average LD between genome-wide variants (Chen, 2014; Visscher et al., 2014; Zhou, 2017). Intuitively, $m_e$ can be calculated directly from all squared correlations between variant pairs (for example with the --r2 command in PLINK), but the computational cost is $O(nM^2)$, which soon becomes unaffordable when $n$ and, especially, $M$ are large. At the same time, $m_e$ is needed as a hyperparameter in determining the potential of the data, as characterized by Eq 5-7. Here, we propose a pair of estimators for $m_e$.
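For a small marker set, the definition can be evaluated directly. The sketch below assumes $m_e = M^2/\sum r^2$ summed over all ordered variant pairs including the diagonal, uses Pearson correlations of the 0/1/2 genotype columns, and requires every variant to be polymorphic. The function name is hypothetical.

```python
import numpy as np


def effective_markers_direct(genotypes):
    """m_e = M^2 / sum of squared correlations over all variant pairs."""
    x = np.asarray(genotypes, dtype=float)
    r = np.corrcoef(x, rowvar=False)          # M x M correlation matrix between variants
    return x.shape[1] ** 2 / np.sum(r ** 2)


# Two uncorrelated variants behave like two independent markers ...
independent = np.array([[0, 0], [0, 2], [2, 0], [2, 2]])
# ... while three perfectly correlated variants behave like a single marker.
duplicated = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [1, 1, 1]])
print(effective_markers_direct(independent), effective_markers_direct(duplicated))
```

The correlation matrix has $M^2$ entries, which is why this direct route stops being practical for genome-wide marker sets.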

Estimator I: GRM-based estimator
We first introduce the GRM-based estimator (Chen, 2014),

$$\hat{m}_e = \frac{1}{\mathrm{var}(G_{off})},$$

Eq 8

in which $G_{off}$ refers to the off-diagonal elements of the GRM. $\hat{m}_e$ estimated from Eq 8 is asymptotically normal, and its sampling variance can be evaluated analytically. The computational cost is $O(n^2M)$, which is often smaller than $O(nM^2)$. A more detailed investigation of the relationship between genome-wide LD and the GRM-based $m_e$ estimator can be found in Huang et al. (Huang et al., 2023).
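A GRM-based estimate can be sketched as follows, assuming the estimator is the reciprocal of the variance of the off-diagonal GRM elements and that genotypes are standardized by sample allele frequencies. The simulated data, seed, and function name are illustrative only.

```python
import numpy as np


def effective_markers_grm(genotypes):
    """Estimate m_e as 1 / var(off-diagonal GRM elements) (assumed form)."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                          # sample allele frequencies
    keep = (p > 0.0) & (p < 1.0)                      # drop monomorphic variants
    x, p = x[:, keep], p[keep]
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    grm = z @ z.T / z.shape[1]
    off_diag = grm[np.triu_indices_from(grm, k=1)]    # each pair counted once
    return 1.0 / np.var(off_diag)


# Unrelated individuals and independent variants: the estimate should be near M.
rng = np.random.default_rng(1)
n, m = 200, 400
maf = rng.uniform(0.05, 0.5, size=m)
genotypes = rng.binomial(2, maf, size=(n, m))
print(round(effective_markers_grm(genotypes)))
```

With independent variants there is no LD to reduce the effective number, so the estimate sits close to the number of simulated markers.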

Estimator II: randomization-based estimator
The second estimator is based on a randomization algorithm, which reduces the computational cost of estimating $m_e$ from $O(n^2M)$ to $O(nMB)$. Here, $B$ is the number of iterations, and $B > 100$ is sufficient in practice. The randomization method uses an $n \times B$ matrix whose elements are randomly sampled from the standard normal distribution. A randomized statistic is calculated from this matrix and the standardized genotype matrix $\tilde{X}$, and it yields the empirical randomization-based estimator of $m_e$, whose sampling variance can be estimated from the $B$ rounds of iteration. The randomized estimation is inspired by the randomized estimation of heritability (Wu and Sankararaman, 2018).
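One way to realize such a randomized estimate is a Hutchinson-type trace estimator, sketched below. This is an illustration of the general idea and not necessarily the statistic used by deepKin: it assumes that the mean squared off-diagonal GRM element approximates $1/m_e$ and estimates $\mathrm{tr}(G^2)$ from random probes without ever forming the $n \times n$ GRM. All names are hypothetical.

```python
import numpy as np


def effective_markers_randomized(genotypes, n_iter=200, seed=0):
    """Randomized m_e estimate from tr(G^2), costing O(n * M * n_iter)."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)                        # drop monomorphic variants
    x, p = x[:, keep], p[keep]
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    n, m = z.shape
    rng = np.random.default_rng(seed)
    probes = rng.standard_normal((n, n_iter))           # n x B standard normal matrix
    g_probes = z @ (z.T @ probes) / m                   # G @ probes without building G
    trace_g2 = np.mean(np.sum(g_probes ** 2, axis=0))   # E||G u||^2 = tr(G^2)
    diag_sq = np.sum((np.sum(z ** 2, axis=1) / m) ** 2) # exact sum of G_ii^2
    return n * (n - 1) / (trace_g2 - diag_sq)           # inverse mean squared off-diagonal


rng = np.random.default_rng(1)
n, m = 200, 400
maf = rng.uniform(0.05, 0.5, size=m)
genotypes = rng.binomial(2, maf, size=(n, m))
print(round(effective_markers_randomized(genotypes)))
```

The two matrix products involving the probes are the only heavy operations, which is where the reduction from $O(n^2M)$ to $O(nMB)$ comes from in this sketch.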
In the simulations below, $m_e$ was estimated using the GRM-based estimator. Even though the GRM is calculated during deepKin estimation, so that $m_e$ could be estimated from it, the randomization-based estimator still offers a faster approximation and thus a faster evaluation of the guidelines, which allows a faster decision on SNP set selection.
The $m_e$ used to calculate the asymptotic variance was estimated with the GRM-based estimator. Ten repeats were used to obtain the standard deviation of each variance estimate.

Data-based threshold determination
We showed the difference in the inference frameworks between KING and deepKin in simulation.
Only unrelated individuals ($n$ = 2,000) were simulated, using different numbers of markers ($M$ = 1,000, 5,000, 10,000, and 50,000). The genotype simulation followed the LD scenario described above. MAF was sampled from a uniform distribution U(0.05, 0.5) and the LD parameter was sampled from a uniform distribution U(0.1, 0.2); correspondingly, $m_e$ = 978, 4,896, 9,782, and 48,967 as estimated by the GRM-based estimator. We applied KING's inference criteria of 0.707 (the geometric mean of 1 and 0.5, Table 1) for monozygotic twins and duplicate samples, and 0.354, 0.177, 0.088, and 0.044 for first-, second-, third-, and fourth-degree relatedness, respectively. We calculated the deepest significant relatedness estimation ($\theta_c$) from Eq 5 based on $m_e$ at a significance level of $\alpha$ = 0.05/1,999,000. This enables data-specific thresholds for relatedness inference.
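The contrast between static and data-specific thresholds in this design can be previewed numerically. The sketch assumes the critical value $z_{1-\alpha}\sqrt{2/m_e}$ and lists, for each simulated $m_e$, the static cut-offs that fall below it and therefore inside the null distribution of unrelated pairs. Names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_relatedness(m_e, alpha):
    """Smallest estimate that is significantly different from unrelatedness."""
    return -NormalDist().inv_cdf(alpha) * sqrt(2.0 / m_e)


alpha = 0.05 / 1_999_000                          # 2,000 individuals, all pairs
static_cutoffs = {"third-degree": 0.088, "fourth-degree": 0.044}

for m_e in (978, 4_896, 9_782, 48_967):
    theta_c = critical_relatedness(m_e, alpha)
    unsupported = [name for name, cut in static_cutoffs.items() if cut < theta_c]
    print(m_e, round(theta_c, 4), unsupported)
```

Under this assumption the third-degree cut-off is unsupported for the two smallest marker sets and the fourth-degree cut-off for the three smallest, which is the same pattern reported in the simulation results.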

Relatedness inference based on p-values
To validate the role of p-values in inferring relatedness, we simulated various related pairs in two cohorts. We simulated 200 individuals each for cohort 1 and cohort 2 ($n_1 = n_2 = 200$). Between cohort 1 and cohort 2, we generated 10 pairs of related samples up to the fourth degree. MAF was sampled from a uniform distribution U(0.05, 0.5) and the LD parameter was sampled from a uniform distribution U(0.1, 0.2). The numbers of markers were $M$ = 133, 687, 3,086, 13,050, and 53,643, corresponding to identical, first-, second-, third-, and fourth-degree relatives, respectively. Taking the fourth degree as an example, the number of markers was determined as follows. The minimum $m_e$ based on Eq 6 for the fourth degree was 17,881 at a Type-I error rate of $\alpha$ = 0.05/40,000 and a Type-II error rate of $\beta$ = 0.1. To make sure that the actual $m_e$ of the simulated markers met this requirement, we simulated three times as many markers as the minimum $m_e$, $M = 3m_e$ = 53,643. The actual $m_e$ of the 53,643 markers was 18,612. All sets of selected markers met the target minimum $m_e$. $m_e$ was estimated with the GRM-based estimator. The deepest significant relatedness estimations were calculated from $m_e$ based on Eq 5 at a significance level of 0.05/40,000.

Oxford demo for the proof of principle
We drew 3,000 UK Biobank (UKB) participants from Oxford and analyzed four sets of SNPs. We estimated $m_e$ for each of the four SNP sets using the randomization method. To further describe how the choice of SNP set changes $m_e$, we examined a total of 13 conditions, 6 of which used randomly selected markers of different sizes ($M$ = 5,000, 10,000, 50,000, 100,000, 150,000, and 200,000) and 7 of which used different pruning thresholds ($r^2 < 0.05$, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.8 in a 50-variant window with a 5-variant shift). All conditions were applied to SNP set 2.

Relatedness in UKB white British ancestry subset
We considered a subset of 427,287 participants with self-reported white British ancestry from 19 assessment centers with a sample size greater than 5,000. Leeds had the largest sample size of 39,707, while Bury had the smallest sample size of 7,701. A total of 72,016 imputed SNPs remained after QC (Figure S3). The inclusion criteria for autosomal variants were: i) MAF > 0.05; ii) HWE test p-value > 1e-7; iii) no locus missingness; iv) $r^2 < 0.1$ in a 50-variant window with a 5-variant shift. To explore the geographical connection between relatives, we included the grid coordinates of the 19 assessment centers, which were downloaded from the UKB website (https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=11002).

Simulation results
We first validated the variance derived from deepKin in two models, the "single locus model" and the "multiple loci model". By simulating paired individuals with relatedness up to the third degree and calculating their relatedness estimates using KING-homo and deepKin, we obtained their observed variances, together with the expected variances derived under the "single locus model" (Figure 1 A-D). The observed variances of KING-homo and deepKin did not differ significantly from the expected ones at all degrees and in all MAF scenarios. When a more general context was introduced, in which MAFs were randomly assigned and different conditions of LD were considered, the observed variances resembled the expected variances derived under the "multiple loci model" at all degrees of relatedness and fluctuated across MAF scenarios (Figure 1 E-L). The observed variances of KING-homo and deepKin showed no significant difference when the lower boundary of MAF was above 0.1. It is worth noting that the inclusion of low-frequency variants (MAF < 0.05) significantly increased the sampling variance. With common variants, the expected variance is often conservative relative to the observed one.

Figure 1. MAF ranges: (0.05, 0.5), (0.10, 0.5), (0.15, 0.5), (0.20, 0.5), (0.25, 0.5), (0.30, 0.5), and (0.35, 0.5). The LD parameter is sampled from a uniform distribution U(0.1, 0.2) or U(0.5, 0.8) to represent low LD (E-H) and high LD (I-L). A total of 2,000 first-degree, second-degree, third-degree, and unrelated pairs are simulated and evaluated. All scenarios are performed in 10 repeats. 95% confidence intervals are given as error bars.
We then performed simulations to show the difference in relatedness inference between KING and deepKin (Figure 2). Only unrelated individuals were simulated. deepKin took into account the distribution under the null hypothesis, which depends on the effective number of markers of the data, and provided dynamic cut-offs for relatedness inference, under which all unrelated pairs were inferred as "insignificant" in all scenarios. The quantile plots for the p-values of deepKin showed that the observed distribution matched our expectation. However, the fixed thresholds of KING could lead to false positives, either because the number of markers was limited or because the target degree was too distant. For instance, the lower boundary for third-degree (0.088) was within the distribution of unrelated individuals when $M$ = 1,000 and 5,000, and the lower boundary for fourth-degree (0.044) was within the distribution of unrelated individuals when $M$ = 1,000, 5,000, and 10,000. We also demonstrated the distribution of p-values for different degrees of relatives (identical, first-, second-, third-, and fourth-degree) in simulation (Figure 3). The closer the relatives were, the more significant their p-values were. When the actual $m_e$ met the requirement for the target degree at the experiment-wise Type-I error rate of 0.05 and Type-II error rate of 0.1 based on guideline I, all target relatives were clearly separated from unrelated pairs by the significance thresholds.

UKB Oxford demo
We first described how $m_e$ fluctuated with different choices of markers in the Oxford demo (Figure S4). A total of 13 conditions were generated by two procedures, random selection or LD pruning. When variants were randomly selected along the genome, $m_e$ increased sharply at first and then slowly, eventually approximating that of all QCed variants. For example, the number of markers and the effective number of markers were $M$ = 298,211 and $m_e$ = 7,634 for all SNPs.
However, when the threshold of LD pruning was $r^2 < 0.4$, the number of markers and the effective number of markers were $M$ = 91,670 and $m_e$ = 41,000. Based on the four sets of SNPs, we demonstrated the role of $m_e$ in affecting the sampling variance of relatedness estimation, and subsequently relatedness inference, using 3,000 Oxford samples of white British ancestry. deepKin took $m_e$ into consideration and calculated the deepest significant relatedness supported by each SNP set (Table 2). The deepest significant degrees supported by SNP sets 1-3 were closer than the fourth degree, while SNP set 4, which had the largest $m_e$, supported a deepest significant degree of up to 4.409. Accordingly, SNP set 4 discovered a total of 57 significantly related pairs, while SNP sets 1-3 discovered 37, 32, and 38 (see Data S1). We showed the expected power and classification p-values using these four sets of SNPs (Figure 4A and 4B). SNP set 4, which had the largest $m_e$ and the smallest expected sampling variance, offered the most powerful inference for distant relatives and was the most reliable one in relatedness classification. Taking SNP sets 2 and 4 as examples, deepKin estimates were very consistent with the "KING-robust" estimates at positive values (Figure 4C and 4D). No additional relatives were found by deepKin under the two different SNP sets. However, casual use of KING's relative inference cut-offs, which are constant regardless of the SNP set, might lead to substantial differences (Table 2).
To further investigate relatedness classification, we compared the numbers of pairs assigned to each degree of relationship by deepKin and KING (Table 2). The numbers of relative pairs were consistent among the four SNP sets for identical and first-degree pairs (0 and 17, respectively) using either deepKin or KING. They were also consistent for second- and third-degree pairs: 4 or 5 for second-degree and 8 to 11 for third-degree. However, for fourth-degree relatives, the number of pairs discovered by KING was inflated and fluctuated dramatically among the four SNP sets: 2,769, 20,090, and 7,521 using SNP sets 1-3, and 209 using SNP set 4. As deepKin only performed classification on significant signals, no abnormal inflation of distant relatives should be observed. Simple inference based on static criteria without considering the sampling variance could introduce unexpected false positives, at least for distant relatives who might be slightly related but were not fully supported by the data. The calculation of the deepest significant relatedness by deepKin offered a safeguard for each data set and supported further relatedness classification.
As the GRM is often used for sample QC in GWAS, we also investigated the number of pairs that exceeded the usual cut-offs for the GRM (Table S2). Using different SNP sets, the numbers of excluded relative pairs ranged from 60 to 2,381 and from 655 to 72,032 using the GRM's fixed cut-offs of >0.05 and >0.025, respectively. These results provided a comprehensive understanding of how deepKin interacted with both the data itself and user-specified QC. As low-frequency variants might violate the asymptotic sampling variance of deepKin (Eq 4), we examined the sampling variance of deepKin estimates in the Oxford demo under different MAF thresholds (0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, and 0.35) (Figure S5). $M$ was the number of variants that remained after applying the MAF threshold and $m_e$ was the effective number of markers calculated with the randomization method. There was no clear gap between the observed histogram and the asymptotic normal distribution with $\sigma^2 = 2/m_e$ when MAF thresholds were above 0.05. We strongly suggest that low-frequency variants be removed during quality control.

UKB white British ancestry subset
We then applied deepKin to the entire subset of UKB participants with white British ancestry, which comprised $n$ = 427,287 individuals from 19 assessment centers. After basic QC and LD pruning, the $m_e$ of the 72,016 markers was 56,945. We used deepKin to estimate relatedness for $N \approx 9.13 \times 10^{10}$ pairs, which took approximately 193 hours using 48 threads on two Intel(R) Xeon(R) E5-2690 v3 CPUs @ 2.60GHz. A total of 232,552 (54.4%) UKB participants with white British ancestry were inferred to be related to at least one other person in the subset at a significance level of 0.05/$N$, forming a total of 212,120 statistically significant related pairs (Figure 5A). Based on Eq 5, this SNP set supported a deepest significant degree of $t$ = 4.567 (95% CI: 4.565~4.569). These 212,120 significant relatedness estimates were classified into six degrees (Table S3), including 162 identical pairs/monozygotic twins, 25,699 pairs of first-degree relatives, 9,455 pairs of second-degree relatives, and 53,221 pairs of third-degree relatives. After dividing by the total number of comparisons $N$, these categories were equivalent in form to, though different in number from, the relative components reported in the original UKB report (Bycroft et al., 2018). In addition, a total of 129,930 (30.4%) participants with white British ancestry were related up to at least the third degree, whereas this ratio for all participants was 30.3% in Bycroft et al. (Bycroft et al., 2018). Moreover, since the deepest significant degree was deeper than the third degree, we also reported 91,977 and 31,606 pairs of fourth- and fifth-degree relatives. Pairs of related individuals within the UKB white British subset formed networks of related individuals. While in most cases these were networks of size two or three, there were also many groups of size four or larger in the subset (Figure 5B). If we only considered related individuals up to the third degree, the largest group size was reduced to 10 (data not shown).
We also analyzed cross-cohort relatedness by assigning participants to 19 cohorts based on their assessment centers. As anticipated, we discovered varying numbers of significant cross-cohort relatives, with a notable pattern: the closer two centers were geographically, the more significant relatives were identified (Figure 5C and 5D). We define the proportion of cross-cohort significant relatives as the ratio between the number of significant cross-cohort relatives and the total number of comparisons between the two cohorts. We observed several cohort pairs with a high proportion of cross-cohort significant relatives, including Manchester and Bury, Glasgow and Edinburgh, and Newcastle and Middlesbrough. The proportions for these three pairs were significantly higher than those of the other top 20 most related pairs. Conversely, certain cohort pairs, such as Glasgow and Cardiff, exhibited a low proportion of cross-cohort significant relatives, likely due to their considerable geographical distance from each other. Specifically, the average grid distances for the 20 pairs with the highest proportions and the 20 pairs with the lowest proportions were 525 and 3,463, respectively (Figure 5C and Figure S6).
In each cohort, we considered its within-cohort average relatedness score ($\bar{\theta}$) and average significant relatedness score ($\bar{\theta}_{sig}$), and we observed another notable pattern: the more diverse the population, the closer the relatives are within each cohort (Table 3 and Figure 5E). Notably, Glasgow ($\bar{\theta}$ = 0.00968) and Edinburgh ($\bar{\theta}$ = 0.00688), both located in Scotland, exhibited higher levels of relatedness between individuals than the other cohorts. On the other hand, Barts displayed the lowest average relatedness ($\bar{\theta}$ = -0.00160), suggesting potential population diversity and even a slight distinction in population structure (Figure S7). Intriguingly, Figure 5E revealed a clear trend: a negative correlation (-0.809) between $\bar{\theta}$ and $\bar{\theta}_{sig}$. In particular, Barts, with the lowest $\bar{\theta}$, showed the highest $\bar{\theta}_{sig}$ of 0.262, while Glasgow, with the highest $\bar{\theta}$, had the lowest $\bar{\theta}_{sig}$ of 0.171.
This trend remained significant even after adjusting for sample size or the number of significant pairs within each cohort.These findings robustly indicated both an increase in close relatedness and a potential increase in diversity from north to south across the 19 cohorts.

Discussion
The major difference between KING and deepKin is that deepKin supplies elements missing from KING, such as the sampling variance, so that more sophisticated statistical inference can be conducted. It is known that the resolution of relationship inference depends on the sampling variance of the estimator. As previously demonstrated for IBD sharing inference, the sampling variance has been carefully calculated for relatives as defined in a traditional pedigree (Hill and Weir, 2011). However, this general principle has not been fully established in the current framework of genome-wide relatedness measures using SNP data, even though KING was proposed more than a decade ago and is widely implemented in the KING software and other popular GWAS tools such as PLINK. In this study, we present a moment-based estimator, deepKin, whose sampling variance has been analysed to integrate characteristics of the data, such as MAF and LD, eventually leading to a practically useful asymptotic sampling variance. The availability of the sampling variance provides a rigorous inference framework for moment-based relatedness estimators. Therefore, deepKin can assign a p-value to each pair, perform statistical inference accordingly, classify pairs as "unrelated" or "related", and refine the classification of each related pair into t-degree relatives.
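The inference step described above can be sketched as a one-sided z-test on the estimated relatedness score. This is a minimal sketch under stated assumptions: the sampling variance under the null hypothesis of unrelatedness is supplied by the caller (the paper's asymptotic variance formula is not reproduced here), the test is taken to be one-sided, and the function names are illustrative rather than part of the deepKin R package.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def relatedness_p_value(theta_hat: float, null_variance: float) -> tuple[float, float]:
    """z score and one-sided p-value for H0: the pair is unrelated (theta = 0).

    null_variance is the sampling variance of the estimator under H0; it is an
    input here, not derived (assumption of this sketch).
    """
    if null_variance <= 0:
        raise ValueError("null_variance must be positive")
    z = theta_hat / null_variance ** 0.5
    p = _STD_NORMAL.cdf(-z)  # upper-tail probability, written via symmetry for accuracy
    return z, p


def critical_relatedness(alpha: float, null_variance: float) -> float:
    """Smallest relatedness score that rejects H0 at one-sided level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    if null_variance <= 0:
        raise ValueError("null_variance must be positive")
    return -_STD_NORMAL.inv_cdf(alpha) * null_variance ** 0.5
```

A pair would be called "related" when its p-value falls below the chosen significance level, or equivalently when its estimated score exceeds the critical value, which plays the role of the deepest significant relatedness.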
Throughout the work, the effective number of markers plays a pivotal role in uncovering the sampling variance of the deepKin estimator under the "multiple loci model". The parameter $m_e$ is a generic population parameter that characterizes the global LD of GWAS markers (Huang et al., 2023). In previous studies on the variation of IBD sharing, the counterpart of $m_e$ is the length of the genome in Morgans, as used in linkage-style analysis (Hill and Weir, 2011).
Although we have demonstrated in principle the theoretical merits and utility of deepKin, several questions should be taken into account for its further development. When comparing deepKin with KING, our analysis primarily focused on KING-homo, which performs relatedness estimation and inference within a homogeneous population. deepKin prefers the strategy of selecting non-ancestry-informative (non-AIM) markers to mitigate the potential influence of differentiated allele frequencies. The influence of population stratification on $m_e$, and thus on the sampling variance of relatedness estimation, is still under investigation. A further question concerns the union of unilineal and bilineal relatives when disentangling the variance of the estimator. In most application studies based on genome-wide similarity measures, the first degree is usually as far as they go in distinguishing unilineal relatives (parent-offspring) from bilineal relatives (full siblings). This study will be helpful in constructing a more systematic framework for statistical inference on genome-wide similarity measures.

Appendix I Variance of deepKin under the single locus model
We derived the variance of deepKin in the context of a binomial distribution. The rearranged probabilities and relatedness scores are: [equation not recoverable]
Therefore, the z score and corresponding p-value on $\hat{\theta}$ are [equation not recoverable], where $\Phi$ is the standard normal cumulative distribution function. If we assume a significance level of $\alpha$, the critical value ( [...]
[...] representing a different effective number of markers. The first set had 693,666 imputation loci after QC, and the distribution of MAF is shown in Figure S2. The inclusion criteria for autosomal variants were: i) minor allele frequency (MAF) > 0.05; ii) Hardy-Weinberg equilibrium (HWE) test p-value > 1e-7; iii) no locus missingness. The second SNP set had the same inclusion criteria as SNP set 1, except that the MAF threshold was increased from 0.05 to 0.2, which resulted in 298,221 SNPs. The third SNP set excluded variants with high population differentiation from SNP set 2, leaving 237,642 variants, often called non-ancestry-informative markers (non-AIM) (Chen et al., 2016). The fourth SNP set was obtained by LD pruning SNP set 2 ($r^2$ < 0.1 in a 50-variant window, shifting the window by 5 variants), which reduced the number of variants to 36,425.

Figure 2
Figure 2 Different ways of relatedness inference between KING and deepKin. (A, C, E, and G) Histograms of relatedness scores estimated by deepKin for unrelated individual pairs using different numbers of markers ($m$ = 1,000, 5,000, 10,000, and 50,000). Only unrelated individuals ($n$ = 2,000) were simulated. The black line indicates the deepest significant relatedness calculated at the significance level of $\alpha$ = 0.05/1,999,000. Colored lines indicate KING's lower boundaries of 0.707 (blue), 0.354 (yellow), 0.177 (green), 0.088 (red), and 0.044 (purple) for zero-, first-, second-, third-, and fourth-degree relatedness. Lower boundaries smaller than the deepest significant relatedness supported by deepKin are plotted as dashed lines. The grey background indicates the observed relatedness range. (B, D, F, and

Figure 3
Figure 3 The distribution of p-values for different degrees of relatives in simulation, up to the fourth degree. The plot shows the estimated relatedness scores and the corresponding p-values for unrelated (circle) and related (cross) pairs. The horizontal line indicates the significance level of $\alpha$ = 0.05/40,000. The vertical dashed line indicates the corresponding deepest significant relatedness based on Eq 5.

Figure 4
Figure 4 Oxford demonstration. (A) The expected power of detecting different degrees of relatedness using four SNP sets. The dashed grey line indicates 90% power. We assume a Type I error rate of $\alpha = 0.05/N$ and a Type II error rate of $\beta$ = 0.1, where $N$ = 4,498,500. (B) The expected p-values for relatedness classification using four SNP sets. p-values under the two null hypotheses on the adjacent $t$ (solid curves) and $t + 1$ (dashed curves) degrees are plotted. (C and D) Scatter plots of the relatedness estimated by KING-robust and deepKin for all individual pairs in the Oxford 3K demo using SNP set 2 (C) and SNP set 4 (D). Colors indicate the number of pairs that fall within the range of each hexagonal

Figure 5
Figure 5 Relatedness in the UKB white British ancestry subset ($n$ = 427,287). (A) The histogram shows the distribution of relatedness estimation for 212,120 pairs of significant

Figure S2 Figure S3 Figure S4 Figure S5 Figure S6 Figure S7
Figure S2 The histogram of MAF distribution in the Oxford 3K demo after quality control (MAF > 0.01; HWE test p-

Table 1 Degrees of relatedness with expected relatedness scores (𝜽) and kinship coefficients (𝝓).
Includes example relationships for each degree and the corresponding lower boundaries from KING. The table does not include all possible relationship types for these degrees of relatedness. Note that this table shows relatedness up to the t-th degree. KING's original publication considered relatedness only up to the third degree, and any relatedness score < 0.088 was considered unrelated.
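The KING lower boundaries quoted in this paper (0.707, 0.354, 0.177, 0.088, and 0.044 for degrees zero to four) are consistent with the closed form $2^{-(t+0.5)}$, and the expected relatedness score for degree $t$ with $(1/2)^t$. The sketch below assumes these closed forms and applies them as static cut-offs. The function names are illustrative and do not come from KING or the deepKin package.

```python
from typing import Optional


def expected_relatedness(t: int) -> float:
    """Expected relatedness score for degree-t relatives, assumed to be (1/2)^t."""
    return 0.5 ** t


def king_lower_boundary(t: int) -> float:
    """Static lower boundary for degree t, assumed to be 2^-(t + 0.5)."""
    return 2.0 ** -(t + 0.5)


def classify_degree(theta_hat: float, max_degree: int = 3) -> Optional[int]:
    """Return the closest degree whose lower boundary theta_hat exceeds.

    Returns None when the score is below the boundary of max_degree, i.e. the
    pair is treated as unrelated under static cut-offs.
    """
    for t in range(max_degree + 1):
        if theta_hat > king_lower_boundary(t):
            return t
    return None


# Lower boundaries for degrees 0 to 4, rounded to three decimals.
boundaries = [round(king_lower_boundary(t), 3) for t in range(5)]
```

With the default `max_degree=3`, any score below about 0.088 is returned as unrelated, which mirrors the static rule attributed to KING's original publication in the note above.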

Table Notes :
3,000 individuals and a total of ~4.5 million comparisons are considered. $t$ is the deepest significant degree that the data would support detecting from unrelated individuals. * $\theta_t$ is the deepest significant relatedness score corresponding to $t$, $\theta_t = (1/2)^t$.
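The number of comparisons quoted in these notes follows from the number of unordered pairs among the samples, and the per-pair significance levels used in the figure captions (for example 0.05/1,999,000 for 2,000 simulated individuals) divide 0.05 by that count. A minimal sketch, assuming all pairs within a single sample are tested; the function names are illustrative.

```python
def n_pairs(n: int) -> int:
    """Number of unordered pairs among n individuals: n(n - 1)/2."""
    if n < 2:
        raise ValueError("need at least two individuals")
    return n * (n - 1) // 2


def per_pair_alpha(n: int, family_alpha: float = 0.05) -> float:
    """Per-comparison significance level after dividing by the number of pairs."""
    return family_alpha / n_pairs(n)


oxford_pairs = n_pairs(3000)      # the 3,000-individual Oxford subset
simulated_pairs = n_pairs(2000)   # the 2,000 simulated unrelated individuals
```

For the 3,000-individual subset this gives 4,498,500 comparisons, the ~4.5 million quoted above.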

Table 3 Summary for within-cohort relatedness for 19 UKB assessment centers.
$\bar{\theta}$ is the average relatedness score of $N$ pairs within each cohort; $\bar{\theta}_{sig}$ is the average relatedness score of $N_{sig}$ significant pairs within each cohort.
The deepKin guidelines may provide a general solution for choosing optimized SNPs to search for relatedness powerfully, since a larger $m_e$ gives more power to detect deeper relatedness. They can be applied in more specific scenarios, such as designing optimized SNP chips for relatedness assessment in forensic applications. SNP-based measures of genome similarity are highly sensitive to the minor allele frequencies in the SNP set. MAFs are influenced by factors such as the choice of SNP genotyping technology and quality control procedures. Although the expected variance under the single locus model showed high precision in simulation, it is not practical to assume the same MAF for all variants. For now, we extended our derivation to a multiple loci model by assuming a normal distribution of the genotypes after locus-wise standardization, but this assumption may be violated when low-frequency variants are included. We suggest removing low-frequency variants to avoid false positives. Our results demonstrated that the asymptotic variance based on the multiple loci model performed well in both simulations and real dataset applications; future work should focus on extending the binomial assumption to multiple loci, taking both the distribution of MAF and LD into consideration.
Consider two diploid individuals, each with two alleles ($A$ and $a$) at one locus. The allele frequencies are $p$ and $q$ for alleles $A$ and $a$, respectively. For a pair of relatives, the probabilities of two alleles conditional on the probability of relatedness $\theta$ are: [table not recoverable]. The above table can be applied to their respective second allele pair. The probabilities of the sixteen genotype combinations between two individuals are filled in this 4 × 4 table, which is symmetric:

Table S1
Real dataset examples for applying the deepKin guidelines.

Table notes :
$n$ is the sample size; $m$ is the number of markers; $m_e$ is the effective number of markers, estimated by randomized estimation. We assume a Type I error rate of $\alpha = 0.05/N$ and a Type II error rate of $\beta$ = 0.1, where $N$ = 40,000.

Table S2
QC details for the four SNP sets and the numbers of pairs exceeding relatedness cut-offs, based on two conventional GRM cut-offs, in the Oxford demo.

Table Note :
3,000 individuals and a total of ~4.5 million comparisons are considered. * $t$ is the deepest significant degree that the data would support detecting from unrelated individuals.
Table S3 Summary of the significant related pairs in the UKB British white dataset ($\alpha = 0.05/N$, $N = 9.13 \times 10^{10}$).
* $m_e$ is estimated by randomized estimation.