Leveraging correlations between polygenic risk score predictors to detect heterogeneity in GWAS cohorts

Evidence from both GWAS and clinical observation has suggested that certain psychiatric, metabolic, and autoimmune diseases are heterogeneous, comprising multiple subtypes with distinct genomic etiologies and Polygenic Risk Scores (PRS). However, the presence of subtypes within many phenotypes is frequently unknown. We present CLiP (Correlated Liability Predictors), a method to detect heterogeneity in single GWAS cohorts. CLiP calculates a weighted sum of correlations between SNPs contributing to a PRS on the case/control liability scale. We demonstrate mathematically and through simulation that among i.i.d. homogeneous cases, significant anti-correlations are expected between otherwise independent predictors due to ascertainment on the hidden liability score. In the presence of heterogeneity from distinct etiologies, confounding by covariates, or mislabeling, these correlation patterns are altered predictably. We further extend our method to two additional association study designs: CLiP-X for quantitative predictors in applications such as transcriptome-wide association, and CLiP-Y for quantitative phenotypes, where there is no clear distinction between cases and controls. Through simulations, we demonstrate that CLiP and its extensions reliably distinguish between homogeneous and heterogeneous cohorts when the PRS explains as low as 5% of variance on the liability scale and cohorts comprise 50, 000 − 100, 000 samples, an increasingly practical size for modern GWAS. We apply CLiP to heterogeneity detection in schizophrenia cohorts totaling > 50, 000 cases and controls collected by the Psychiatric Genomics Consortium. We observe significant heterogeneity in mega-analysis of the combined PGC data (p-value 8.54e-4), as well as in individual cohorts meta-analyzed using Fisher’s method (p-value 0.03), based on significantly associated variants.

For heterogeneous cases comprising disjoint sub-phenotypes (E,F,G), associated SNP subsets S 1 and S 2 pertain to two independent PRSs, and passing the threshold of at least one of these PRSs is sufficient to select a case (E). Genotypes sampled from this model produce a mixture of positive and negative correlations. 7 (CLiP-X), and also with quantitative phenotypes for which there is no definition of a "case" (CLiP-Y). Next 135 we describe the generative process for simulations of homogeneous and heterogeneous PRS data used to test 136 the performance of these methods.
where 148 w j = p j 1 − p j (γ j − 1) (γ j − 1)p j + 1 The score S het is a weighted sum of difference in correlation between cases and controls, to account for To generate cases and controls, we iteratively generate batches of transcripts by random sampling, and 181 compile those that pass or fail the threshold cutoff into case and control cohorts. We generate heterogeneous 182 cohorts, by concatenating simulated cases and controls, with the fraction of cases π set to 0.5 for simplicity. 183 We reasoned that this procedure was a conservative representation of a large number of possible heterogeneity 184 scenarios including those with multiple independent sub-phenotypes. If the PRSs of these sub-phenotypes 185 are independent, then a large number of correlations between predictors will be evaluated close to zero, Given N × L matrices of quantitative expression measurements Z among cases and Z 0 among controls, we would like to determine whether Z comprises a homogeneous or heterogeneous set of cases as generated in Supplemental Algorithm 1. When Z is heterogeneous, we assume the individuals in Z can be assigned to one of two subtypes: one sampled according to the liability threshold model for the simulated phenotype, and one sampled randomly as controls. For a given predictor indexed by j ∈ [1, . . . , L], assume Z ij is sampled according to a mean and variance specific to the subtype of individual i, denoted by Z + i· for the case subtype and Z − i· for the control subtype. The distribution of the variables need not be discrete or even normally distributed, as the heterogeneity score is computed from correlations, which in turn rely only on the mean and variance of the input variables. Therefore the score can be calculated assuming any probability distribution provided that the mean and standard deviation are obtainable. For an arbitrary probability distribution D parameterized by its mean and standard deviation, we have: Assume that the proportion of individuals belonging to the group + is π. For a homogeneous group of 192 case, π = 0, and our simulations assume π = 0.5, but in practice this proportion is unknown. Incorporating where A π (x, y) = πx + (1 − π)y. We would like to make use of these expectations over correlations by incorporating them as weights in the 199 heterogeneity score as in Han et al. [30]. As predictors with high mean differences between subgroups and 200 high effects are expected to contribute more signal to the score, weighting them higher than other predictors 201 will increase power to detect heterogeneity. Therefore, we would like to define a set of weights w ij for each 202 expected r ij .

203
We derive the weights for continuous variables in an analogous manner to Han  Note.
The same weights defined in Han et al. [30] for Bernoulli variables is a special case of this general 216 formulation. These weights can now be substituted into the heterogeneity score.

217
In practice we do not know the value of µ + j because the membership of individuals in each of the 218 subsets is unknown. However, we do know the mean values of the heterogeneous case group which we denote 219 as µ j . We can use this value as an approximation for µ + j , and calculate an approximate weight: We can also quantify the errors we are making by this approximation. We have the following rela-221 12 tionship for any distribution of the genotype random variables: The approximation in Eq. 10 will attenuate the magnitude of µ + j with respect to the true value of the 223 weight. However, we also see that: As each weight is scaled by a constant factor, their relative magnitudes are unchanged. Consequently, 225 the heterogeneity score for continuous input variables does not change after this approximation. Thus we 226 can still achieve optimal estimates of heterogeneity despite lacking access to the true mean for the underlying 227 case subgroup.

229
The basic CLiP test for heterogeneity relies on differential enrichment of SNP effect sizes or odds ratios 230 across subtypes, and thus requires ascertainment for cases. But one can presume that heterogeneity exists 231 in quantitative phenotypes as well; e.g., are there distinct genetic mechanisms predisposing individuals 232 to being tall? But extending this method to quantitative phenotypes presents a challenge as there is no 233 dichotomous delineation between cases and controls. A naive solution may be to pick an arbitrary z-234 score as a threshold and denote samples who score higher as "cases" and those lower as "controls." This 235 introduces a trade-off between sample size and signal specificity, as lowering this threshold provides more 236 samples for the correlation analysis but also introduces more control-like samples which will attenuate SNP 237 associations, and the correlations themselves. A more principled method would allow for the inclusion of 238 all continuous samples, but give higher weight to those with large polygenic SNP scores. Thus we propose 239 to score heterogeneity by a weighted correlation with polygenic risk scores serving as a measure of the 240 importance of a sample in the case set. These weights determine the degree to which individuals count as a 241 "case", and therefore their contribution to the total heterogeneity score of the genotype matrix. Artificially We define a weight over individuals such that those with higher phenotype values contribute more strongly to the heterogeneity score. For a cohort of N individuals let X ij ∈ {0, 1, 2} be the number of risk alleles of SNP j in individual i, and let Y = (y 1 , . . . , y N ) values of quantitative trait 1 for the respective individuals. We introduce a normalized weight vector across the N individuals defined as φ ∈ R N such that ∀i, φ i ≥ 0 and where the weight values would reflect normalized scaling of the trait φ i = F (yi) j F (yj ) by a monotone function F. Dichotomous, case/control weighting is the special case of: SNP j, we define the weighted mean value across N individuals as: Between two SNPs j and k, we define the weighted covariance as: 15 We define the weighted correlation matrix R φ for any weighting φ as: The heterogeneity score tallies the entries of the upper-triangular correlation matrix for the phenotype-276 weighted individuals R φ(F ) . As we now lack a held-out set of controls to cancel the contribution of correla-277 tions unrelated to the phenotype, we instead calculate a conventional correlation uniformly weighted across . Additionally, we introduce a scaling factor of ( the change in variance resulting from re-weighting the correlation according to individual weights φ i . These 280 changes produce the following preliminary heterogeneity score for quantitative phenotypes: . The contribution of SNP j to the heterogeneity score is then scaled by where is a weighted generalization of an odds ratio. where given case allele frequency p + j , control allele frequency p 0 j , and sample odds ratio .

292
Combining these intermediate calculations, the heterogeneity test statistic for continuous phenotypes 293 is: For high N , this test statistic approaches the standard normal distribution, and can be evaluated as 295 a z-score hypothesis test.

296
Note that even when applying a dichotomous weighting scheme, dividing the cohort with quantita-297 tive phenotypes into artificial cases and controls, CLiP-Y still differs slightly from a direct application of 298 the case/control score. If a dichotomous weight function produces N φ artificial cases, the scaling factor where K is the total number of cohorts and p i is the p-value of the CLiP heterogeneity score for 317 cohort i. The p-value of this test statistic is evaluated on a chi-square distribution with 2K degrees of 318 freedom. Additionally, we calculated the meta-analysis Z-score of the CLiP score in a manner analogous to 319 the conventional GWAS approach, but with a 1-tail test for highly positive scores only. The meta-analysis 320 Z score is calculated according to where Z CLiP is the CLiP Z-score evaluated against the expected score with a standard deviation of where V is the disease prevalence (0.01 for schizophrenia), and F Logistic (x) =  the multiplicative model. We simulated individual data using 100 independent SNPs whose effects were standard-normally distributed. Across logistic and multiplicative models, the same odds ratios are ascribed 356 to each SNP to facilitate comparison. Also, for simplicity, we assumed a variance explained of 1 to better 357 observe the resulting correlation signal. We evaluated the dichotomous-trait score relative to sample sizes 358 for both homogeneous and heterogeneous groups. The result is shown in Figure 2A, where it is apparent 359 that the behavior of the two models diverges drastically. In particular, when a logistic model is assumed the   Table S2. Thresholded models for case/control data such as logistic and probit (liability threshold) regression produce negative correlations between predictors, while the simpler multiplicative (Risch) model does not. Here case/control cohorts are generated from logistic or Risch models with 10 diploid SNPs with allele frequency 0.5 and OR 1.16 (set to keep Risch probabilities ≤ 1). As described in [30], Risch model cases exhibit no correlations in homogeneous cases, and positive correlations in heterogeneous cases, producing zero and positive heterogeneity scores, respectively. However, in thresholded models, negative correlations in homogeneous models produce negative scores. This negative bias in homogeneous scores is unaccounted for in the method by [30], significantly increasing the probability of type II errors. (B) Heterogeneity scores (y-axis) on simulated case/control cohorts as a function of sample size (x-axis) with a fixed variance explained of 0.034 as in [31]. Simulations are run with a PRS of 100 SNPs with total variance explained of 0.034. Heterogeneous cohorts (Green) are equal-proportion mixtures of controls (Black) and homogeneous cases (Red). The expected homogeneous score (Blue) is calculated from effect sizes and allele frequencies of PRS SNPs only, and should be used as the true null score in CLiP. (C) Heterogeneity scores (y-axis) as a function of variance explained (x-axis) with a fixed sample size of 30,000 cases and 30,000 controls.
We also evaluated the performance of CLiP when heterogeneity consists of multiple potentially inde-386 pendent sub-phenotypes, each with a distinct PRS, such that an individual is considered to be a case when 387 it is a case for one or more of these sub-phenotypes. Discovering heterogeneity in these cohorts is more 388 challenging because correlations between SNPs involved in different sub-phenotype PRSs are expected to 389 be zero rather than positive, and if there are any SNP associations shared between sub-phenotypes, nega-390 21 tive correlations will be expected between them despite the presence of distinct sub-phenotypes. We tested 391 the performance of CLiP by fixing the number of cases and controls at 50,000 each, the total number of 392 SNPs at 100, and the total variance explained at 0.05, while varying the number of sub-phenotypes and the 393 fraction of SNPs that are shared across all sub-phenotypes. When this fraction is zero, the sub-phenotypes 394 are completely independent, and the SNPs are divided into mutually exclusive subsets associated with each 395 sub-phenotype. When the fraction is non-zero, that fraction of SNPs has the same effect size across all 396 sub-phenotypes. Results of these simulations are shown in Figure 3B as well as Supplemental Table S1.

397
Note that by dividing associated SNPs into associations with particular sub-phenotypes, the total variance 398 explained for each sub-phenotype is reduced, and the observed variance explained of the entire heterogeneous 399 cohort will be lower in a simple linear regression. Step Function Thresholds linear −log(1 − x) deg-2 polyn deg-4 polyn deg-6 polyn sigmoid step Figure 4: A Learned weight functions φ(x) for scoring heterogeneity in quantitative phenotypes. A local search over polynomial coefficients is performed such that the resulting function maximizes the difference between the heterogeneity scores of simulated samples of homogeneous and heterogeneous cohorts. B Tests for heterogeneity in quantitative phenotypes using multiple weighting functions over individuals, including those in A, as a function of variance explained by the PRS. Plotted are scores of heterogeneous cohorts minus the expected score of a homogeneous cohort over 20 trials. One Hundred SNPs are simulated with cohorts of 100,000 individuals. C Tests for heterogeneity in quantitative phenotypes using multiple weight functions. Plotted are mean scores of heterogeneous cohorts minus the expected score of a homogeneous cohort over 20 trials, as a function of sample size. One hundred SNPs are simulated with a total variance explained of 0.1. For comparison these scores are plotted on the same Y-axis as scores generated from step function weights at various thresholds on the percentile scale of a standard normal quantitative phenotype distribution. For each of these step function scores, the expected homogeneous score is estimated by the mean of 20 sampled homogeneous cohorts, to limit computation time.  Table 1. Generally, we observe more positive heterogeneity scores for 452 larger cohorts, though only three pass a significance p-value threshold of 0.05. The scores in Table 1   showed that the existing score assumes independence between SNPs that is absent from modern models. It 478 therefore fails to account for a negative bias in the score due to negatively correlated SNPs, implying an 479 excess of false negatives and motivating recalibration.

480
The real data results presented in this manuscript only consider PRS based on SNPs whose association quantitative GWAS, and CLiP-X to TWAS is still limited by related issues of small association signals or low 486 variance explained by significant predictors. CLiP-X with measured expression data requires larger cohorts 487 that typically assembled, as mega-analysis is often hampered by batch effects.

488
Given the above extensions to the correlation-based framework, it can now be applied broadly across 489 many different traits to look for genotypic heterogeneity among cases in diseases with previously reported 490 pleiotropic effects. While detection of heterogeneity signifies that the involved SNPs cannot be considered 491 strictly pleiotropic, heterogeneity also suggests that distinct subgroups of significant size for a separate 492 phenotype must exist among the cases for a particular primary phenotype under study. As incidence rates for 493 most diseases are low, detection of significant heterogeneity may suggest a higher degree of comorbidity than 494 is expected at random. Therefore, among disease pairs for which heterogeneity is discovered, identifying the 495 particular subgroups underlying an elevated heterogeneity score may lead to further insights into pleiotropic