Abstract
As more and more large psychiatric genetic cohorts are becoming available, more and more independent investigations into the underlying genetic architecture are performed, and an expanding set of replicates for estimates of key genetic parameters, namely, liability scale SNP heritability and genetic correlations – is amassing in the literature. In recent work, we published a set of SNP-heritability and genetic correlation estimates for major psychiatric disorders using data from the iPSYCH case-cohort study, and presented them alongside estimates gleaned from large, independently collected, analyzed and published meta-studies of the same disorders. Although in the broadest sense the estimates from iPSYCH and external meta-studies were concordant, and requiring strict statistical significance could reject the null hypothesis for few pairs, there were enough subtle trends in the differences to warrant further investigation. In this work, we consider a set of factors related to sample ascertainment, including the lifetime risks for disorders for the sampled populations, the use of age censored or partially screened controls, the sampling of extreme cases and controls, and diagnostic error rates, and attempt to assess their potential contributions to estimates of genetic parameters in the context of the difference trends observed in our previous work.
Introduction
The underlying theory for the estimation of heritability on the liability scale1-3 is based on a model that defines the case-control status of individuals in the study as simple partitioning of a sample from a single population into two groups (cases and controls) at a threshold in unobserved liability that is defined by the lifetime risk for becoming a case. Whether real disease data conform to this assumption cannot be proven, but with data currently available the model seems robust and cannot be rejected. Given the model, transformations can be made from statistics estimated from the data to generate estimates of parameters on the liability scale, which at least allow benchmarking under a comparable scale across different scenarios. However, sometimes data collected for GWAS may be ascertained in such a way that they violate the assumptions inherent in the transformations. A series of recent papers have shown these deviations can introduce bias under the liability-scale SNP heritability framework, if the ‘standard’ transformations are applied, and provide revised transformations to account for sample ascertainment. In the following we discuss the implications of these papers to provide a qualitative investigation into the bounds on potential contributors to differences trends observed in the genetic parameters presented in Schork et al4.
Baseline Difference Trends in Published Genetic Parameters
In Tables 1 and 2 we briefly summarize describe the key estimates presented in Figure 1 of Schork et al4 which we examine under a closer lens throughout, and their underlying sample sizes. All estimates in these tables have been published previously in Schork et al4, which itself culled GCTA GREML estimates of SNP heritability for external meta-studies from Lee et al5. LD-Score regression (LDSC)6,7 estimates from Schork et al4 used published meta-study GWAS summary statistics that have been studied widely: eXDX8, eADHD9, eAFF10, eANO11, eASD12, eBIP13, and eSCZ14. We present these published estimates to simply note the presence of some difference trends in estimates of genetic parameters in iPSYCH and published meta-studies for the same disorder or disorder pairs. What follows is an assessment of the potential contribution of ascertainment related factors to these observed (potential) differences.
Estimates of Population Lifetime Risk
Whether estimated by GREML or LD-score regression, SNP-heritability15 is the proportion of phenotypic variance explained by common SNP-variation from unrelated subjects. For quantitative traits the phenotypic variance is estimated well from a sample drawn from the population. For binary case-control (0/1) data the phenotypic variance in a sample is P(1-P), where P is the proportion of the sample that are cases, or the lifetime risk. Hence, the estimate for the variance explained by all SNPs is made on the “observed scale” (SNP ). This estimate is then transformed to be an estimate on the liability scale (SNP ) according to parameters describing the underlying population distribution in liability: 1) K, the lifetime risk of the disorder and 2) z, the height of the standard normal curve at the liability quantile (the case threshold) corresponding to a tail probability of K, and here assuming that our case-control sample has been randomly drawn from the population and everyone in the population who will get the disease has the disease, P=K (i.e. no age censoring).
Often GWAS data are not consistent with the assumptions of this simple transformation because they are typically ascertained for cases at levels well above the proportion expected given the population lifetime risk, which will produce biased (i.e., inflated or deflated) estimates15,16. Lee et al15 propose corrections to the transformation applied to estimates of , the magnitude of which depends on P for the sample. To emphasize that the estimated is dependent on the properties of the case-control sample, rather than the population, the estimate from ascertained data is called . The updated transformation accounts for ascertainment through its effect on the variance in the population K(l-K) relative to the variance in the sample P(1-P).
If an inaccurate lifetime risk is used for the above transformation, then an inaccurate heritability estimate will result. True lifetime risk estimates for a sampled population can be surprisingly hard to find, and so sensitivity analyses can be conducted considering a plausible range of K. The impact is usually trivial when K is 0.01 (with a plausible range of 0.005 to 0.02) relative to the assumptions of the model, but can become less trivial when K is 0.1 with a plausible range from 0.05 to 0.15. Although in Lee et al5, the study from meta-study GCTA estimates were obtained, this is discussed, errors in estimates of K may not be regularly accounted for a source of variability in estimates, intuitively or statistically (e.g. via adjusted standard errors). Estimating heritability in meta-studies aggregating multiple case-control samples from different populations using different case definitions requires assuming a risk that is not well defined, limiting the objective precision of SNP-heritability estimates. iPSYCH has the unique advantage of published lifetime risk estimates17 calculated for the same diagnostic criteria and within the same population SNP-heritability is studied in (XDX, 0.349; ADHD, 0.032; AFF, 0.128; ANO, 0.010; ASD, 0.013; BIP, 0.016; SCZ, 0.017), which provide similar, but not identical numbers to those emphasized in Lee et al (ADHD, 0.05; AFF, 0.15; ASD, 0.01; BIP, 0.01; SCZ, 0.01). In Figure Set 1 below, we show the effect of using different lifetime prevalence estimates on the estimates of SNP-heritability for each diagnosis in iPSYCH and for the external meta-studies. These data show that, especially for rare diseases where the immediate slope of the liability heritability versus assumed lifetime risk curve is steepest, assuming the wrong lifetime risk just slightly could change the estimate of heritability by as much as a standard deviation (half of the confidence interval). The iPSYCH data is linked with unambiguously appropriate estimates of lifetime risk, although these are still subject to sampling variance3 of K(1−K)/z2N, where N is the size of the population from which the estimates are made. So, for a single birth cohort year from Denmark (population 5.7M, ~61,000 births per year), the standard error on the estimate of K =0.01 is 0.015, but reducing to a standard error of 0.0015 when 10 years of birth cohort data are used. For the external meta-studies, accurate estimates of K are not available, as the aggregated data may come from different populations, with different phenotype definitions, and thus different lifetime risks, introducing uncertainty. In Lee et al5, the plausible ranges of K they consider can result in estimates of heritability for the upper and lower bounds of K that are as much as two standard deviations apart (the width of the confidence intervals barely overlap). This additional source of variability, which is not typically reflected in the estimated standard errors and confidence intervals, and the likelihood of its effect in both cohorts should be considered intuitively when thinking about estimates of heritability from iPSYCH and external studies. Estimation of genetic correlations are scale independent15, so these estimates are not likely to be susceptible to misestimation of K.
Use of (Partially) Screened Controls
One assumption of the standard transformation equation of SNP-heritability estimates from case-control to liability scale, i.e., to is that control samples are completely screened for the condition being studied18. This is not always the case, for example when a sample from the general population is used as controls and diagnoses are unavailable. The work by Peyrot et al18 shows how using unscreened controls leads to a downward bias in the estimation of heritability and derives a correction factor. The correction adjusts for the proportion of the controls that are expected to be unscreened cases, which they call false controls (F = Nfalse controls / (Nfalse controls + Ntrue controls)) and extends the transformation above.
In addition, if control subjects are young enough that they have not expressed their lifetime risk, then they can be viewed as “partially screened” – they have been screen for disease only up to their current age. In a large enough sample of young individuals, a (substantial) proportion will be expected to develop a disorder later in life. This could be especially pertinent for the iPSYCH cohort given its youth (all controls born between 1981 and 2005) relative to external studies which may employ older controls.
We can again take advantage of the extensive epidemiological work that preceded the iPSYCH study17, which not only provides estimates lifetime risk, but also gender specific cumulative risk estimates across the lifespan (Supplementary Tables of Pedersen et al17). We leverage these estimates to explore the potential under-estimation of heritability in iPSYCH due to the youth of the subjects, assuming the framework proposed in Peyrot et al18. For each control subject, we define their remaining risk (Kage+) for a disorder as the lifetime risk of the disorder (K), minus the cumulative risk up to their age (Kage−), conditional on their gender. This number can be seen as a rough estimate of the probability that a given control will convert to being a case at some point in their life. For the control group for each disorder, we estimate the expected number of false controls (Nfalse controls) by summing these probabilities across all control individuals for the disorder. We set a plausible range for these numbers by estimating the expected number of false controls using the upper and lower 95% confidence intervals for the lifetime cumulative risk estimates (Supplementary Tables of Pedersen et al17). We further extend this range to account the smaller sample in iPSYCH, relative to Pedersen et al. We extend the lower and upper bounds by two standard deviations of binomial distributions where the probability is defined the by the estimated number of false controls over the total number of controls (F) and the number of draws by the total number of controls. Our estimates for the expected number and proportion of false controls are described in Table 3 below.
In Figure Set 2 below we describe the potential effects of having a relatively young control cohort on our estimates of SNP heritability. These data show that in general the youth of the iPSYCH cohort is not expected to have introduced substantial downward bias in our estimates of SNP-heritability, as even in the worst-case scenario, the prevalence of any single mental disorder is not sufficient to result in large numbers of false controls. An exception for this trend may be for AFF, where the prevalence is relatively higher and there is an appreciable proportion of risk that appears late in life17. The effect on XDX estimates also appear large, however, these numbers may be less accurate because an implicit assumption of the correction factor is that the misclassified cases are equivalent genetically to the identified cases, which may not hold in this case. For a definition of XDX that is “all mental disorders,” this will not hold because our definition of XDX is enriched for the iPSYCH ascertained diagnoses (i.e., it is not perfectly representative of the population of all psychiatric patients). Also, any changes in genetic architecture, (e.g. the magnitude of heritability or levels of genetic heterogeneity) that are a function of age of onset would result in differences between the observed cases and false controls and would be expected to alter these descriptions.
Unfortunately, the necessary data for the external studies are not available to explore these transformations, but we would expect the effect to be reduced as iPSYCH is exceptional in its youth. This means the youth of iPSYCH would likely lead to exaggerated differences between iPSYCH and external studies of heritability (assuming the true values are the same). However, the contribution of this potential downward bias (which is already modest) is expected to be further tempered by the same (but less severe) age censoring in published data. We are not aware of a published framework for investigating the impact of unscreened or partially screened controls on estimates of SNP-based genetic correlation and see this as an avenue for future research.
Extreme sampling of cases and/or controls
Another assumption of the standard transformation from to is that cases and controls are a random sample of a population’s cases and controls, with respect to the single liability partitioning. Data collected for GWAS, as used for heritability estimation in the external studies, may not only be ascertained for extra cases, but also often employ extreme cases and/or controls, which could violate this assumption. Extreme controls can be defined as ascertained control subjects with, on average, less genetic liability for the disorder being considered than would be expected from a random sample of the underlying population of unaffected subjects. In practice, this could arise when subjects are selected for an absence of family history, to be free of specific disorders that are genetically correlated with the outcome of interest, or for exceptional mental health, in general. It is known that in the Psychiatric Genetics Consortium (PGC, who published most of the external studies we cite), for example, some contributing cohorts use the same control individuals for multiple disorders or select for other aspects of psychiatric health within the control samples. Extreme cases can be similarly defined as ascertained cases with, on average, more genetic liability than would be expected in a representative sample of cases from the underlying population. In GWAS, cases may be ascertained for prevalent cases, treatment resistant cases, disease severity, or archetypical presentation. If any of these features are correlated with increased genetic liability, as, for example, has been recently suggested for schizophrenia19,20, an upward bias in genetic liability relative to a random selection cases is expected and it is expected to translate to an upward bias in heritability.
This issue was introduced formally in a recent commentary21 describing the mis-estimation of heritability for male pattern baldness that was a result of censoring cases of “rather dubious baldness.” Yap et al21 suggested that this censoring resulted in case and control populations that were extreme with respect to genetic liability by unintentionally censoring an intermediate portion of the underlying genetic liability distribution. As a result, the original paper22 produced a SNP-heritability estimate biased substantially upwards, by approximately 50% of the corrected value. To counter this, Yap et al (Supplementary Methods)21 derive a transformation for observed scale heritability that takes into account ascertainment of extreme cases and/or extreme controls by decoupling the lifetime risk of being a case from that of remaining a control. In the original transformations proposed by Lee et al15, cases represent the upper K proportion of the population liability and controls the lower 1-K proportion, reflecting the notion that the single partitioning of the underlying population is assumed to be mirrored in the study sample. Yap et al21, drawing on prior work of Gianola23 and Golan et al16, generalize this to reflect ascertainment of case and control samples that represent more extreme proportions of the underlying liability, which may include a “gap” due to intermediate individuals being excluded from the study. This model uses two independent thresholds on the underlying liability scale, defining the lifetime risk of being an extreme case (KU) and extreme control (KL) independently, with the definition of ZU and ZL the height of the standard normal at the quantiles defined by these tail probabilities. When Ku = K, and KL=1-K, their correction reduces to the equation from Lee et al15 above.
In iPSYCH, the data should be fairly, and perhaps uniquely, free from potential extreme sampling ascertainment bias because of how case and control populations have been drawn from the broader Danish population and aggregated for analysis (see Pedersen et al24 for a description of the sampling scheme). The iPSYCH groups were defined to mirror a simple, single partitioning of underlying population liability. Controls are random samples of the population of unaffected subjects, without additional censoring, and cases include all diagnosed individuals, regardless of features that may reflect genetic severity. The extent of extreme sampling bias in the external meta-studies is hard to determine because the contributing cohorts will have different criteria for inclusion, and the censoring of controls or ascertainment for cases may not be directly on the genetic liability for the disorder, as was the case for the baldness study, obscuring appropriate estimates for KU and KL. To illustrate, consider screening schizophrenia controls additionally for bipolar disorder. Because of the high genetic correlation that is more or less accepted between these disorders, the remaining control population will be depleted for the portion of genetic risk that is shared between the disorders. In essence, this censors individuals that are expected to be intermediate with respect schizophrenia liability, similar in concept to the example from Yap et al. However, because the genetic correlation between the disorders is not one, they are not directly selected to be depleted of schizophrenia liability, and so the proportion of the population being represented by the screened controls is not simply 1-KSCZ-KBIP. The lower tail proportion (KL) of the population represented by sampling bipolar censored schizophrenia controls may be difficult to define but should certainly be less than the assumed l-KSCZ The same complications arise when oversampling severe, prevalent or archetypical cases. The proportion of the underlying population with respect to liability represented by ascertained subjects, KU, is surely less than K (i.e., they are likely more extreme with respect to underlying liability than an average case), but the correct choice for KU requires knowing the relationship between the ascertained features and underlying liability. Investigating this fully in the context of psychiatric disorders requires additional data from external studies and the development of novel analytic paradigms. As such, we can only attempt to provide crude benchmarks for this effect in hope to motivate more thorough future work.
In order to qualitatively explore this phenomenon, we again take advantage of the ascertainment in iPSYCH, providing unselected cases and controls. We benchmark a plausible bound on the value for KL in ascertained GWAS data, by estimating the SNP-heritability for each disorder after censoring the control population for all other psychiatric diagnoses, and comparing the resulting SNP-heritability with the estimate obtained when using the appropriate uncensored controls. In Figure Set 3 below (lightest grey bars) we show a universal overestimate, albeit modest, of heritability when using extreme controls created by censoring subjects with additional diagnoses, consistent with the concepts described in Yap et al21. By assuming the true heritability (h2) is equal to the estimate computed with uncensored controls, and, due to the iPSYCH design where our cases should not be ascertained for higher than expected liability (i.e., Ku=K), we can solve equation 4 above (taken from Yap et al21) for KL. This estimates the proportion of the population for, say, schizophrenia liability, that are being sampled from when controls are indirectly censored by excluding subjects with any other psychiatric diagnoses. We provide benchmark estimates of KL for each disorder in Table 4, below. Consistent with intuition, the difference between KL and 1-K when controls are censored for secondary conditions is largest for phenotypes with the largest genetic correlation with the remaining diagnoses (SCZ, BIP, AFF; see Schork et al4 Supplementary Figure 4) and smallest for those with less genetic correlation (ANO; see Schork et al4 Supplementary Figure 4).
To explore a possible bound on the values for KU in ascertained GWAS data, we censored cases according to a plausible proxy for severity, the total number of hospital contacts an individual experienced for the disorder, and re-estimated SNP-heritability using the iPSYCH data. Frequency and length of hospitalization was recently shown to correlate positively with genetic liability for schizophrenia19 and it has been supposed that clinically ascertained research cohorts may be enriched for prevalent cases which are noted by more frequent and longer duration of hospital contacts, among other potentially exaggerating features19,25. Because the exact function defining the relationship between liability and this proxy for severity is unknown, and may vary across disorders, we estimated the SNP-heritability censoring cases for greater than the 0.25, 0.5 and 0.75 quantiles in number of hospital contacts (data not shown) and report the largest SNP-heritability from these three case definitions. This will surely over-fit the effect of censoring on hospital contacts, but provides us an estimate that we can use as an intuitive, albeit imperfect, attempt to provide a worst-case upper bound of the effect. As such, our estimates should be treated as a speculative demonstration of the underlying concept, with limitations that are left for future work.
In Figure Set 3 below (medium grey bars) we suggest that, relative to unselected cases, using severity enriched cases can inflate estimates of liability scale heritability, relative to unascertained cases. As above, we similarly use equation 4 to back calculate the necessary KU to return the heritability estimate provided in the unascertained scenario, providing a benchmark estimate for bounding potential impact of sampling extreme cases. In Figure Set 3 (dark grey bars) we show how using both extreme cases and controls compounds the effect introduced by each individual factor, as expected. Intuitively, we can interpret the estimates of KL and KU as demonstrating that extreme sampling of cases and controls can lead to a “gap” in a study’s representative of the underlying population liability, which should follow the implications of Yap et al.
In Figure Set 4 below we show the corrected external study SNP heritability estimates that attempt to account for various levels of extreme sampling of controls or cases, independently, adapting the framework of Yap et al21. These corrections using the bounds for KL and KU we derived from resampling iPSYCH cases or controls suggest the heritability estimates could potentially be fairly substantially biased upwards (black dot and confidence interval vs. the grey dot and confidence intervals, along the continuous colored lines). In Figure Set 5 below we show corrections for nine permutations assuming both extreme cases (KU = K*p, p=0.1, 0.5, 0.75) and extreme controls (KL = (1-K)*p, p=0.75, 0.85, 0.95), where our scaling factors, p, were chosen to span the plausible ranges gleaned from Table 4, below. Even positing among the weakest scenarios is enough to remove the differences between external estimates and those made in iPSYCH for some disorders (ADHD, AFF, ANO, XDX), although quite strong selection in both directions is needed for others (ASD, BIP, SCZ).
As was described by Yap et al, when cases and controls have been selected to occupy more extreme portions of the underlying population liability, substantial bias in heritability can arise, however, precise estimates of the size of the effect require precise study into the strength of extreme sampling. We emphasize that this exercise and our attempts to bound potential overestimation from external estimates of SNP-heritability for psychiatric conditions in data that has been ascertained for GWAS are limited and should be seen as speculative. However, we do feel it suggests at least some overestimation of heritability should be expected in the published studies, given our knowledge of their ascertainment schemes. More precise investigations into the extent of overestimations, and whether the effects have a substantial impact on our broader conceptualizations of the genetic architecture of psychiatric conditions, is a topic for future research, as is the effect of extreme sampling on SNP-based genetic correlations, for which we are unaware of an established framework for investigating.
Diagnostic Errors
Our interpretation of statistics associated with a disease (observed disease) may differ from our belief of the disease (true disease). When cases are ascertained through noisy diagnoses, the estimated of SNP-heritability from the observed disease are typically downward biased compared to the true disease26, of course ascertaining on true disease status may never be achievable. For example, if some proportion of cases of one disorder are instead diagnosed with a different disorder, and/or vice versa, then estimates of genetic correlations between these two disorders will be inflated26. Wray et al.26 introduced a framework for exploring the expected impact of various levels of diagnostic error for two hypothetical disorders, A and B, in terms of unstandardized additive genetic variances and covariance for the diagnosed (σg,DA2, σg,DΒ2, σg,DA,DB) and true disorders (, , σg,ΤΑ,TΒ), and the proportions of cases correctly diagnosed (MTA, MTB) and misdiagnosed (MFA=1-MTA, MFB=1-MTB). Antilla et al27 re-derived the single variance component (e.g., σg,DA2) equation in terms of liability scale heritability (so σg,DA2 = hDA2 as the liability variance is taken to be one) and defining the covariance term (σTA,TB) using the genetic correlation (e.g., rG,TA,TB=σg,TA,TB*(hTA2,hTB2)−0.5), giving an equation for describing the relationship between the true heritability and that estimated from diagnosed individuals.
Under the simplifying assumption that for a given pair of disorders, the misclassification occurs only between the two disorders for which the correlation is computed, such that the misdiagnosed proportion of those diagnosed with disorder A (MFA) in actuality have disorder B, we can use the same notation to rewrite the equations for σg,DΑ,DΒ from Wray et al26 in terms of genetic correlations and liability scale heritability.
From our knowledge of the external meta-studies, the contributing cohorts are predominantly ascertained according to research diagnoses, while in in iPSYCH we aggregate register records of clinical diagnoses. Although psychiatric diagnoses in the Danish register have been shown to be highly reliable in multiple validation studies (e.g.,28-31) the diagnoses in iPSYCH cohort, perse, have not been systematically re-assessed and the trends towards lower SNP-heritability and higher SNP-based genetic correlations could, in theory, be taken as evidence for a relative increase in misdiagnosis when compared to the rate in external studies. Previous work in the Danish health registers32 has suggested that ~15% of patients first diagnosed with BIP will be subsequently diagnosed with SCZ and ~6% initially diagnosed with SCZ will be subsequently diagnosed with BIP, numbers that more or less agree with an independent study33 conducted in the U.S. (~15% and ~4%). We can speculate whether these shifts represent initial misdiagnoses, disease progression, comorbidity, or some other factor, and the extent to which they also affect external meta-studies (an un-measureable phenomenon, given the lack of lifetime data). Regardless, we can anchor our investigations into the magnitude of effects of potential misdiagnosis on a range with some empirical support (~5-15%), which may represent an upper bound given the similarities between the symptoms of SCZ and BIP, relative to other disorders, and because we attribute the entire cause of these later diagnoses to initial misdiagnoses.
In Figure Set 6, below, we investigate the potential effects of various levels misdiagnosis on estimates of SNP-heritability. In order to predict a “true” SNP-heritability from a “diagnosed” SNP-heritability , we need to specify not only the misdiagnosis rate (MFA), but also the composite SNP-heritability of the disorders the misdiagnosed patients truly have (, 0 if they should be a control), and the genetic correlation between the true patients and misdiagnosed patients (rG,TA,FA). We consider three values for each which span the range of estimates in Table 1 (; ). Under the intuition that a patient diagnosed is unlikely to be a true control but may have been assigned the wrong disorder, we place special emphasis on the dashed blue and purple lines, which consider the misdiagnosed patients to be a group that has moderate genetic correlation with the disorder of interest, consistent with an “average psychiatric patient.” These data suggest that misdiagnosis rates ranging from 5-15% can lead to a modest under estimation of SNP-heritability, but unreasonably high misdiagnosis rates (>40%) are needed to rectify the largest differences. For some disorders and parameter scenarios (e.g., BIP, purple lines), misdiagnosis alone cannot, even in theory, explain the observed differences in SNP-heritability, no matter the proposed rate.
Unlike the previous examples, the effects of misdiagnosis on SNP-based genetic correlations has been described analytically, albeit only for the scenario where cases from the two disorders for which genetic correlations are being estimated are misdiagnosed for each other, only. In Figure Set 7, below, we show the levels of inflation that are expected for a given true value of genetic correlation (green bars), when various proportions of the two hypothetical disorders are misclassified as the other (grey bars). When true genetic correlations are low (0 or 0.2) and misdiagnosis rates are high (0.15), the genetic correlations can be exaggerated by nearly 0.40 above their true value. When true genetic correlations are more moderate and misdiagnosis rates lower (0.05), the expected inflation is more modest.
Discussion
In this work we have, where established frameworks were available, attempted to provide context for the potential sources of uncertainty in SNP-based heritability and genetic correlation estimates that could contribute to subtle difference trends between iPSYCH and external meta-study reports. In doing so, we hope to emphasize a few points. First, strong interpretations of exact point estimates for SNP-heritability and genetic correlations may impose more precision than the underlying models and concepts beget. We see these quantities as providing broad-scale benchmarks for more qualitative claims regarding the amount of genetic variation remaining to be discovered by GWAS, general importance of common SN P variation and the plausibility of pleiotropic signals among risk variants. Second, ascertainment of cases and controls may be an important consideration when estimating SNP-heritability and one that could be studied further. Here, we consider two sets of studies, iPSYCH and external meta-studies, which were ascertained according to different sampling schemas (registers vs. predominantly clinical ascertainment) and with different primary goals in mind (epidemiological validity vs. GWAS power). As we have attempted to demonstrate, the differences in these schemas may make the individuals studies susceptible to different sources of bias that may not be a part of the current intuition when considering estimates of SNP-heritability and genetic correlations. More work in this area may be needed as large genetic cohorts with different ascertainment schema emerge.
We can summarize the totally of these demonstrations by integrating them with our own intuition about the most likely consequences of differences in ascertainment between iPSYCH and external studies. Taken together, we could speculate that the variability in SNP-heritability estimates from the meta studies is likely underestimated due to variability in the assumed lifetime risk that is not accounted for, may be underestimated modestly in iPSYCH due to the relatively younger age of control subjects and potentially higher incidence of misdiagnosis, and overestimated in external studies due to the use of extreme cases and controls. In terms of genetic correlations, we are unaware of frameworks for exploring the potential effects of age or severity censoring, but increased misdiagnosis rates could lead to upwardly biased estimates of genetic correlations. These conclusions should be viewed against the background of the inherent lack of precision in the models and concepts of SNP-heritability and genetic correlations, large sampling variances for the estimators, expectations for real differences when genetic parameters are estimated in different populations, and the need for future development of frameworks for further studying the effects of ascertainment on genetic parameters.