Abstract
Using genome-wide data from 697,828 research participants from 23andMe and UK Biobank, we increase the number of identified loci associated with being a morning person, a behavioural indicator of a person’s underlying circadian rhythm, from 24 to 351. Using data from 85,760 individuals with activity-monitor derived measures of sleep timing we show that the chronotype loci influence sleep timing: the mean sleep timing of the 5% of individuals carrying the most “morningness” alleles was 25 minutes earlier than the 5% carrying the fewest. The loci were enriched for genes involved in circadian regulation, cAMP, glutamate and insulin signalling pathways, and those expressed in the retina, hindbrain, hypothalamus, and pituitary. We provide evidence that being a morning person is causally associated with better mental health but does not appear to affect BMI or Type 2 diabetes. This study offers new insights into the biology of circadian rhythms and links to disease in humans.
Introduction
Circadian rhythms are fundamental cyclical processes that occur in most living organisms, including humans. These daily cycles affect a wide range of molecular and behavioural processes, including hormone levels, core body temperature and sleep-wake patterns1. Chronotype, often referred to as circadian preference, describes an individual’s proclivity for earlier or later sleep timing and is a physical and behavioural manifestation of the coupling between internal circadian cycles and the need for sleep, driven by sleep homeostasis. Significant natural variation exists amongst the human population with chronotype often measured on a continuous scale2, though individuals are often separated into “morning people” (or “larks”) who prefer going to bed and waking earlier, “evening people” (or “owls”) who prefer a later bedtime and later rising time, and “intermediates” who lie between the two extremes3,4. Age and gender, as well as environmental light levels explain a substantial proportion of variation in chronotype, but genetic variation is also an important contributor5,6,7,8.
There is evidence that alterations to circadian timing are linked to disease development, particularly metabolic and psychiatric disorders9,10. Animal model studies have shown that mutations in, and altered expression of, key circadian rhythm genes can cause obesity, hyperglycaemia and defective beta-cell function leading to diabetes11–13. In humans, there are many reported associations between disrupted circadian rhythms and disease14,15, but the evidence for a causal role of chronotype on disease is limited16. For example, evening people have an increased frequency of obesity17, Type 2 diabetes18 and depression19 independent of sleep disturbance, and studies of shift workers show an increased risk of diabetes, depression and other diseases20. However, these associations could be explained by reverse causality (diseases affecting sleep patterns or dictating job options) or confounding (common risk factors influencing both chronotype and disease). Genetic analyses identifying variants robustly associated with putative risk factors, such as chronotype, can improve causal understanding by providing genetic instruments for use in Mendelian Randomization (MR) analyses21–23, which minimise the effect of both reverse causality and bias caused by confounding. Identifying genetic variants associated with chronotype and sleep timing will also provide new insights into the biological processes underlying circadian rhythms and sleep homeostasis.
Three previous genome-wide association studies (GWAS)24–26, using a maximum of 128,286 individuals, identified a total of 24 independent variants associated with self-report chronotype. In this study, we performed a GWAS meta-analysis of a substantially expanded set of 697,828 individuals, including 248,098 participants from 23andMe Inc., a personal genetics company, and 449,734 participants from UK Biobank27,28. In addition to confirming an enrichment of circadian rhythm and brain expressed genes at chronotype-associated loci and genetic correlation with mental health disorders25,26, in this substantially enlarged GWAS study we identify 327 additional chronotype-associated loci and demonstrate that the chronotype-associated variants are associated with objective measures of sleep timing, but not sleep duration or quality, in 85,760 UK Biobank participants. By fine-mapping the genetic associations at all loci, we identify 10 coding variants with a high likelihood of being the causal variant, providing new targets for biological investigation of chronobiology and go on to provide evidence of a causal link between chronotype and mental health.
Results
351 loci associated with morning chronotype from a GWAS meta-analysis including 697,828 individuals
We performed a GWAS of self-report chronotype (Table 1) using 11,977,111 imputed variants in 449,734 individuals of European ancestry from the UK Biobank and meta-analysed with summary statistics from a self-report morningness GWAS using 11,947,421 variants in 248,098 European-ancestry 23andMe research participants. We identified 351 independent loci at P<5×10-8, of which 258 reached P<6×10-9, a correction for the significance threshold based on permutation testing (Supplementary Methods). Of the 351 loci, 24 had been previously reported in earlier GWAS of chronotype24–26 and 327 were novel associations. The primary meta-analysis, based on sample size, and individual study results are shown in Figure 1 and Supplementary Table 1. Conditional analysis identified 49 loci with multiple independent signals (Supplementary Table 2). A sensitivity analysis was performed in the UK Biobank data alone, excluding shift workers and those either on medication or with disorders affecting sleep (see the Methods section and Supplementary Methods for details). Effect sizes were similar to those in the full UK Biobank GWAS (Supplementary Table 1 and Supplementary Figure 1).
Known circadian genes amongst associated loci
Well-documented circadian rhythm genes were among the most strongly associated loci (Supplementary Table 1). These genes included the previously reported loci containing RGS16, PER2, PER3, PIGK/AK5, INADL, FBXL3, HCRTR2 and HTR624–26, and newly associated loci containing known circadian rhythm genes PER1, CRY1 and ARNTL (Supplementary Figure 2). At the PER3 locus, two highly correlated low frequency missense variants (rs150812083 and rs139315125, MAF=0.5%), previously reported to be a monogenic cause of familial advanced sleep phase syndrome29, were associated with self-reported morningness (OR=1.44 for minor allele; P=2×10-38) but with a lower magnitude of effect on sleep timing than expected in the activity-monitor derived measures of chronotype, advancing sleep timing (as measured by time of minimum activity) by only 8 minutes (95% CI: 4, 13, P=4.3×10-4) as opposed to the average 4.2 hours reported in the previous study29.
Chronotype associated variants affect objective sleep timing, but not quality or duration
Self-report assessments of sleep and chronotype can be subject to reporting bias30–33. To assess and quantify the effect of the genetic associations on objective measures of sleep timing, duration and quality, we tested the association of the chronotype-associated variants with sleep estimates derived from the UK Biobank activity monitor data. Derived phenotypes included sleep timing, efficiency and duration. Timing was determined by timings of midpoint of sleep, the least active 5 hours of the day (L5 timing) and midpoint of the most active 10 hours of the day (M10 timing). Summary statistics of these derived phenotypes and their associations with self-report morningness are presented in Supplementary Table 3, and their associations with the newly identified chronotype SNPs are provided in Supplementary Table 4. To avoid inflation of associations due to overlapping samples, we performed an additional GWAS meta-analysis of self-reported morningness excluding all UK Biobank individuals with activity monitor data. Of the 292 lead chronotype variants reaching P<5×10-8 from this meta-analysis that were available in the UK Biobank imputed genotype data, 258 had a consistent direction of effect for sleep midpoint (binomial test P=3.8×10-44), 262 with L5 timing (binomial test P=9.3×10-48) and 260 with M10 timing (binomial test P=6.4×10-46). A genetic risk score (GRS) of these 292 variants was associated with earlier sleep midpoint, L5 timing and M10 timing (P=4×10-128, P=1×10-182 and P=7×10-130 respectively). There was little evidence of association between the chronotype GRS and the activity monitor sleep phenotypes that estimate sleep duration and fragmentation (Supplementary Table 5), indicating a specific effect of the chronotype SNPs on sleep timing and circadian metrics. Limiting the analysis to the 109 lead variants identified from the independent 23andMe GWAS provided similar results (Supplementary Table 5). Using the activity-monitor derived estimates of sleeping timing, the 5% of individuals carrying the most morningness alleles at the 292 associated loci had L5 timing shifted earlier, on average, by 25.1 minutes (95% CI: 22.5, 27.6) compared to the 5% carrying the fewest morningness alleles: a mean L5 time of 03:06 rather than 03:32. The data suggests that self-report chronotype strongly relate to an individual’s sleep timing and therefore provide valid instruments for MR and compelling insights into circadian biology.
Circadian rhythm, neuron development and cell signalling pathways, and brain and retina expressed genes are strongly enriched at the associated loci
To identify biological pathways and tissues enriched for genes at the associated loci, we used MAGMA34, implemented as part of the FUMA GWAS35 platform (Figures 2-3. Supplementary Tables 6-7). Because of the variety of methods available and databases employed, we also performed secondary gene-set and tissue enrichment using the software packages PASCAL36, MAGENTA37 and DEPICT38 (Supplementary Tables 8-11). We identified strong enrichment in circadian rhythm and circadian clock pathways as with previous morningness GWAS24–26. We also identified multiple pathways that correspond to (central) nervous system and brain development, components of neuronal cells such as synapses, axons and dendrites, as well as neurogenesis. There was clear enrichment in all types of brain tissue (Figure 4, Supplementary Tables 7 and 11), in behavioural pathways, containing genes responsible for mediating behavioural responses to internal and external stimuli, and in retinal tissue (Supplementary Table 11). The genes in the associated loci were also enriched in multiple pathways relating to the regulation and metabolism of cyclic nucleotides, such as cAMP and cGMP, as well as pathways involved in G-protein signalling and activation. The NMDA glutamate signalling pathway was also enriched and MAGMA mapped genes in this pathway include NRXN1 and RELN, which have been shown to influence risk of schizophrenia39,40, but for which there is limited evidence of a role in circadian rhythm regulation.
Fine-mapping identifies likely causal variants and genes
To highlight putative causal variants and genes, we fine-mapped the associated loci using FINEMAP41. FINEMAP uses a shotgun stochastic search to identify the most plausible causal variant configuration given the GWAS association statistics and local LD patterns and outputs posterior probabilities for each variant configuration being causal. Forty-two loci had a single variant with a probability of >50% of being causal (Supplementary Table 12). Annotation of these likely causal variants identified 10 coding variants. These include a low frequency missense variant in RGS16 (MAF=3%, morningness OR=1.26 for minor allele), previously associated with chronotype25 and the most strongly associated with morningness in this study, and missense variants in the INADL, HCRTR2, PLCL1 and CLN5 genes, all four genes having been identified in previous GWAS24,26. Fine-mapping also identified missense variants in PCYOX1 and SKOR2, and a stop gain variant in the MADD gene, as likely causal variants in these loci, highlighting new candidate genes for chronotype. The MADD stop gain variant (rs35233100) has previously been associated with levels of proinsulin42, providing a potential link between insulin secretion and chronotype. To gain further insight into additional genes that may play a role in determining chronotype, we annotated the putative causal variants using the GTEx eQTL database (Supplementary Table 12). There were 90 variants across 51 loci that were eQTLs for one or more genes, with a total of 208 mapped genes. As an example, this included a putative causal variant in the promoter of FBXO3 which represents the strongest eQTL for FBXO3. FBXO3 is in the ubiquitin-proteosome pathway; protein (de)ubiquitination has been shown to be involved with the degradation of several core clock genes43,44, influencing the build-up up these proteins and the pace of the circadian clock45,46. FBXO3 expression has been shown to be altered by light treatment and to demonstrate rhythmic expression46.
Integration of GWAS and SCN-enrichment data highlights potential circadian clock genes at associated loci
The suprachiasmatic nucleus is a small region of the brain, consisting of around 20,000 neurons, that is integral to maintaining circadian rhythms in humans and is a likely mechanism of action for at least some of the associated genes and loci. Indeed, the associated loci included many key mammal SCN clock genes including PER1, PER2, PER3, CRY1, FBXL3 and ARNTL (Supplementary Table 13). To identify new genes important in setting and modulating circadian rhythms in the SCN, we assessed expression, enrichment and fluctuation of proximal or eQTL-mapped genes using expression data from the mouse SCN. We cross-referenced all mapped genes at the fine-mapped loci against whether there was evidence for enrichment of expression in the SCN compared to other brain tissue47,48 and whether the genes demonstrated evidence of fluctuation in expression over the 24-hour cycle47 (Supplementary Table 12). We also annotated the genes against a set of 343 putative clock genes identified from RNAi knockdown experiments a human cellular clock model49 (Supplementary Table 12). Of the 22.5% of all genes tested that were enriched in the SCN48, 28.0% of the 804 genes mapped using MAGMA and present in the enrichment analysis, were enriched in the SCN representing a significant excess (P=2×10-4). As a negative control, we tested the enrichment of MAGMA-mapped genes for several unrelated GWAS phenotypes, finding no significant excess of SCN-enriched genes (all P>0.05) (Supplementary Table 14). Similar enrichment was found for those chronotype genes fluctuating in the SCN, but no significant excess from the RNAi knockdown study. Enriched and fluctuating genes from the fine-mapping efforts include known circadian genes such as FBXL3 and putative genes such as LSM7 and VIP. LSM7 encodes core components of the spliceosomal U6 small nuclear ribonucleoprotein complex for which some previous studies have suggested a role in circadian timing49,50. VIP encodes a vasoactive peptide hormone that lowers arterial blood pressure and relaxes muscles of the stomach and trachea. Evidence from mouse models indicates that it has a role in generating and light-entrainment of circadian oscillations51.
Chronotype is heritable and demonstrates strong genetic correlation with several psychiatric traits
As a strategy to prioritise traits for subsequent causal analyses, as previous studies have shown a strong correlation between genetic and phenotypic correlations52,53, and to identify genetic overlap between chronotype and other diseases and traits, we performed LD score regression analyses against a range of other diseases and traits where GWAS summary statistics are publicly available (Supplementary Table 15). We estimated the heritability of chronotype to be 13.7% (95% CI 13.3%, 14.0%), as calculated by BOLT-REML in the UK Biobank data alone, which is towards the lower end of previously reported figures (12-21%)24–26. The most genetically correlated trait was subjective well-being, which was positively correlated with being a morning person (rG=0.17, P=6×10-9). Psychiatric traits schizophrenia (rG=-0.11, P=1×10-7), depressive symptoms (rG=- 0.16; P=2×10-6), major depressive disorder (rG=-0.19; P=3×10-5) and intelligence (rG=-0.11; P=8×10-6) were all negatively correlated with the morning chronotype. Metabolic traits fasting insulin (rG=-0.09, P=0.03) and HOMA-IR (rG=-0.12, P=0.009) were negatively correlated with being a morning person but did not reach our Bonferroni-corrected significance threshold. BMI (rG=0.007, P=0.74) and T2D (rG=0.02, P=0.60) were not genetically correlated with morningness.
Mendelian randomisation (MR) analyses provide evidence for a causal link between chronotype and mental health
Genetic correlations do not allow for magnitudes of causality to be determined between an exposure and an outcome. We therefore performed two-sample Mendelian Randomisation (MR) analyses against the five psychiatric traits that showed evidence of a genetic correlation to estimate causal effects. Because of the extensive literature on the link between chronotype and metabolic disease and because the well-known SNP in FTO (rs1558902) previously associated with higher BMI54,55 was also associated with being a morning person (OR=1.04, P= 4.9×10-32), we also performed two-sample MR against the metabolic phenotypes BMI, type 2 diabetes and fasting insulin levels. For individual instrument effects on chronotype, log odds ratios (representing liability for morningness) from the secondary morning person meta-analysis were used, as no effect sizes were obtained in the primary meta-analysis. With chronotype as an exposure, we implemented the R package TwoSampleMR56 to report causal associations of chronotype on these eight outcomes (Supplementary Table 16). We saw evidence that being a morning person confers a liability to lower risk of schizophrenia and greater subjective well-being, with a genetically determined unit log odds increase in self-report morningness being associated with a liability for reduced schizophrenia (odds ratio of 0.89 (0.82, 0.96); IVW P=0.004) and higher subjective well-being (0.04SD (0.02, 0.06); IVW P=5×10-5), and with good agreement amongst the different MR methods (Figures 5-6). There was suggestive evidence that morningness decreases the liability of depression: one-unit log odds increase in morningness was associated with an odds ratio of 0.65 (0.44, 0.95; IVW P=0.03) for major depressive disorder and 0.02 SD lower (0.002, 0.04; IVW P=0.03) for depressive symptoms (Supplementary Figures 3-4) but these did not reach our multiple testing threshold of Pbonf=0.005. There was no strong statistical evidence that chronotype was causally associated with BMI, fasting insulin and risk of type 2 diabetes (IVW P>0.1), as previously reported24–26.
No evidence that schizophrenia or depression influence morningness
To assess whether our genetically-correlated phenotypes were causally influencing chronotype, we performed two-sample Mendelian Randomisation analyses with chronotype as the outcome. Owing to a limited number of genetic instruments, of the original five genetically-correlated psychiatric phenotypes we were able to test only schizophrenia and major depressive disorder, in addition to the metabolic phenotypes BMI, insulin secretion and type 2 diabetes (Supplementary table 17). We observed only weak evidence of liability effects of type 2 diabetes (IVW P=0.01), insulin secretion (IVW P=0.04) and BMI (IVW P=0.05) on chronotype. Despite strong genetic correlations with chronotype, we see no strong evidence that schizophrenia (IVW P=0.07) or major depressive disorder (IVW P=0.62) causally influence liability for morningness.
Discussion
Using data from 697,828 individuals, we have performed the largest GWAS study of chronotype and expanded the number of chronotype-associated loci from 24 to 351. Using activity monitor data from 85,760 we show that these variants are associated with measures of objective sleep timing. We confirm previously reported enrichment of circadian rhythm pathways and retina and brain expressed genes at associated loci, and demonstrate further enrichment of genes in the cAMP, cGMP, NMDA and insulin signalling pathways as well as those in pituitary gland tissue and the SCN. We fine-map the loci and provide target genes for other researchers to perform in depth functional investigation into chronobiology. We have provided more accurate genetic correlation estimates of chronotype with a range of traits and disease and provide some evidence for a causal link between chronotype and mental health.
We have found evidence that the natural variation in circadian preference amongst the human population can be ascribed to several different mechanisms. Given the prominence of genetic variants in or near multiple core circadian rhythm genes (PER1, PER2, PER3, CRY1, FBXL3, ARNTL), we infer that some of the variation is attributed to subtle differences in the biochemical feedback mechanism of the circadian clock. This is supported by evidence of the chronotype-associated loci being enriched in the SCN, suggesting that variants that also subtly affect the modification and regulation of the circadian clock contribute to the population variation of chronotype. Entrainment of circadian rhythms through external stimuli such as light and temperature is well-known but through this study and previous GWAS efforts, we find that an individual’s chronotype is also influenced by variants in genes important in the correct formation and functioning of retinal ganglion cells (RGS16 and INADL), highlighting that some natural variation could be explained by better detection and communication (to the SCN) of external light signals. Variants in genes with known roles in appetite regulation (FTO), insulin secretion (MADD) and even nicotine and caffeine metabolism (CYP2A6) point to other processes that impact an individual’s chronotype, though it is unclear whether the effect of these on chronotype are mediated through the modulation of the circadian clock or by other means, such as through sleep-wake homeostasis.
Reported observational associations of chronotype with metabolic diseases are particularly strong57,58, but we found no evidence for a causal effect of morningness on type 2 diabetes, BMI or insulin levels and could exclude the observational association effect sizes. One possibility which future studies should investigate is whether circadian misalignment, rather than chronotype itself, is more strongly associated with disease outcomes. For example, are individuals who are genetically evening people but have to wake early because of work commitments particularly susceptible to obesity and diabetes?
There are clear epidemiological associations reported in the literature between mental health traits and chronotype, with mental health disorders typically being overrepresented in evening types59–61, and in this study we show that morningness is negatively genetically correlated with both depression and schizophrenia, and positively correlated with wellbeing. Previous studies have found a link between schizophrenia and circadian dysregulation and misalignment62,63 with schizophrenics displaying greater variation in sleep and activity timing and misaligned melatonin and sleep cycles, but no evidence exists for the effect of chronotype on schizophrenia risk. Our Mendelian Randomisation analyses support a causal role of eveningness on increased risk of schizophrenia, though the statistical significance is not overwhelming strong. We do not find evidence of schizophrenia causally influencing chronotype. However, several of the mapped genes at the chronotype associated loci are well known schizophrenia loci such as NRXN1 (as well at NRXN2 and NRXN3) and RELN39,40 and subsequent studies will be necessary to understand the shared biological mechanisms between chronotype and schizophrenia risk.
Chronotype is influenced by circadian rhythms and innate sleep homeostatic mechanisms but is also dependent on societal pressures. It is also a self-report measure which means the interpretation of the phenotype and the genetic association is complicated. In this study, however, we show, using objective measures derived from activity monitor data that these chronotype variants do affect objectives measures of sleep timing, but not other aspects of sleep including duration and timing, providing evidence that we are identifying biologically meaningful associations and allowing us to quantity the effect of these variants on sleep timing.
The response to UK Biobank participation was <5% and this has resulted in selection for healthier individuals, which may introduced bias into our analyses, including in GWAS and MR64. Here GWAS results replicated those of 23andMe, a study that may also suffer from selection bias but of a different nature to UK Biobank. Adopting two-sample MR we attempted to maximise statistical power by using publicly available aggregated data based on consortia of studies that had considerably greater response rates, and avoided winner’s curse which can lead to underestimation of causal effects65. MR of a binary (or other broad category) exposure that is derived from an underlying continuous trait, as is the case with chronotype, may be biased by horizontal pleiotropy from within-category variation in the trait that cannot be identified by alternative MR methods, such as MR-Egger. As effect sizes for MR analyses were derived from log odds ratios in the secondary morning person meta-analysis, there may be the possibility of undetected pleiotropy and so our findings should therefore be treated with some caution.
In conclusion, we have identified 327 novel loci that regulate circadian rhythms and sleep timing in humans and provide new insights into the association of chronotype with disease.
Materials and Methods
Cohorts
The UK Biobank is described in detail elsewhere27. We used data on 451,454 individuals from the full UK Biobank data release that we identified as White European and that had genetic data available. To define a set of White Europeans, we performed Principal Components Analysis (PCA) in the 1000 Genomes (1KG) reference panel using a subset of variants that were of a high quality in the UK Biobank. We projected these principal components into the set of related UK Biobank participants to avoid the relatedness confounding the principal components. We then adopted a k-means clustering approach to define a European cluster, initialising the ethnic centres defined by the population-specific means of the first four 1KG principal components. This analysis was performed only within individuals self-reporting as “British”, “Irish”, “White” or “Any other white background”. Because association analyses are performed using linear mixed-model (LMM) method, we included related individuals.
We used summary statistics from a morning chronotype GWAS performed by 23andMe of 248,100 (Ncase=120,478; Ncontrol=127,622) participants with a minimum of 97% European ancestry. GWAS analysis was performed in a maximal set of unrelated participants, where pairs of individuals were considered related if they shared 700cM IBD of genomic segments, roughly corresponding to first cousins in an outbred population. The 23andMe cohort is described in more detail elsewhere24.
Activity Monitor Data
A subset of the UK Biobank cohort was invited to wear a wrist-worn activity monitor for a period of one week. Individuals were mailed the device and asked to wear it continuously for seven days, including while bathing, showering and sleeping. In total, 103,720 participants returned their activity monitor devices with data covering at least three complete 24-hour periods. We downloaded the raw activity monitor data (data-field 90001) for these individuals, in the form of binary Continuous Wave Accelerometer (cwa) files. Further information, along with details of centrally-derived variables, is available elsewhere66. Detailed protocol information can be found online at http://biobank.ctsu.ox.ac.uk/crystal/docs/PhysicalActivityMonitor.pdf and a sample instruction letter at http://biobank.ctsu.ox.ac.uk/crystal/images/activity_invite.png (UKB Resources 131600 and 141141, respectively; both accessed January 30th 2018). We converted the .cwa files to .wav format using the open-source software “omconvert”, recommended by the activity monitor manufacturers Axivity, which is available online (see https://github.com/digitalinteraction/openmovement/tree/master/Software/AX3/omconvert). To process the raw accelerometer data, we used the freely available R package “GGIR” (v1.5-12)67,68. The list of our GGIR settings is provided in Supplementary File 1 and the full list of variables produced by GGIR can be found in the CRAN GGIR reference manual (see https://cran.r-project.org/web/packages/GGIR/GGIR.pdf).
Genotyping and quality control
The 23andMe cohort was genotyped on one of four custom arrays: the first two were variants of the Illumina HumanHap550+ BeadChip (Ncase=4,966; Ncontrol=5,564), the third a variant of the Illumina OmniExpress+ BeadChip (Ncase=53,747; Ncontrol=61,637) and the fourth a fully custom array (Ncase=61,765; Ncontrol=60,421). Successive arrays contained substantial overlap with previous chips. These genotypes were imputed to ∼15.6M variants using the September 2013 release of the 1000 Genomes phase 1 reference panel. For analyses, we used ∼11.9M imputed variants with imputation r2 ≥ 0.3, MAF ≥ 0.001 (0.1%) and that showed no sign of batch effects.
The UK Biobank cohort was genotyped on two almost identical arrays. The first ∼50,000 samples were genotyped on the UK BiLEVE array and the remaining ∼450,000 samples were genotyped on the UK Biobank Axiom array in two groups (interim and full release). A total of 805,426 directly-genotyped variants were made available in the full release. These variants were centrally imputed to ∼93M autosomal variants using two reference panels: a combined UK10K and 1000 Genomes panel and the Haplotype Reference Consortium (HRC) panel. For all analyses, we used ∼12.0M Haplotype Reference Consortium (HRC) imputed variants with an imputation r2 ≥ 0.3, MAF ≥ 0.001 (0.1%) and with a Hardy– Weinberg equilibrium P>1×10-12. We excluded non-HRC imputed variants on advice from the UK Biobank imputation team. Further details on the UK Biobank genotyping, quality control and imputation procedures can be found elsewhere28.
Self-Report Phenotypes
Morning Person (23andMe)
Responses to two identical questions were used to define the dichotomous morning person phenotype in the 23andMe cohort, with one question having a wider selection of neutral options. More details are given in Supplementary Table 2 of the 23andMe morning person GWAS24. Morning people were coded as 1 (cases; N=120,478) and evening people were coded as 0 (controls; N=127,622).
Chronotype (UK Biobank)
The UK Biobank collected a single self-reported measure of Chronotype (“Morning/evening person (chronotype)”; data-field 1180). Participants were prompted to answer the question “Do you consider yourself to be?” with one of six possible answers: “Definitely a ‘morning’ person”, “More a ‘morning’ than ‘evening’ person”, “More an ‘evening’ than a ‘morning’ person”, “Definitely an ‘evening’ person”, “Do not know” or “Prefer not to answer”, which we coded as 2, 1, −1, −2, 0 and missing respectively (distribution summarised in Table 1). Prior to association testing, we adjusted the phenotype for age, gender and study centre (categorical). Of the 451,454 white European participants with genetic data, 449,734 were included in the GWAS (had non-missing phenotype and covariates).
Morning Person (UK Biobank)
In order to provide interpretable odds ratios for our genome-wide significant variants, we also defined a binary phenotype using the same data-field as for Chronotype. Participants answering “Definitely an ‘evening’ person” and “More an ‘evening’ than a ‘morning’ person” were coded as 0 (controls) and those answering “Definitely a ‘morning’ person” and “More a ‘morning’ than ‘evening’ person” were coded as 1 (cases). Participants answering “Do not know” or “Prefer not to answer” were coded as missing. A total of 403,195 participants were included in the GWAS (252,287 cases and 150,908 controls).
Activity monitor Phenotypes
Identifying the sleep period window
The software package GGIR68,69 produces quantitative and timing measures relating to both activity levels and sleep patterns, with a day-by-day breakdown, as well averages across the period of wear. A new algorithm, implemented in version 1.5-12 of the GGIR R package and validated using PSG in an external cohort70, allows for detection of sleep periods without the use of a sleep diary and with minimal bias. Briefly, for each individual, median values of the absolute change in z-angle (representing the dorsal-ventral direction when the wrist is in the anatomical position) across 5-minute rolling windows were calculated across a 24-hour period, chosen to make the algorithm insensitive to activity monitor orientation. The 10th percentile was incorporated into the threshold to distinguish movement from non-movement. Bouts of inactivity lasting ≥30 minutes are recorded as inactivity bouts. Inactivity bouts that are <60 minutes apart are combined to form inactivity blocks. The start and end of longest block defines the start and end of the sleep period time-window (SPT-window).
Activity monitor exclusions and adjustments
The UK Biobank made multiple activity monitor data-quality variables available. From our activity monitor phenotypes, we excluded 4,925 samples with a non-zero or missing value in data field 90002 (“Data problem indicator”). We then excluded any individuals with the “good wear time” flag (field 90015) set to 0 (No), “good calibration” flag (field 90016) set to 0 (No), “calibrated on own data” flag (field 90017) set to 0 (No), “data recording errors” (field 90182) > 788 (Q3 + 1.5×IQR) or a non-zero count of “interrupted recording periods” (field 90180). Phenotypes determined using the SPT-window (all phenotypes except L5 and M10 timing) had additional exclusions based on short (<3 hours) and long (>12 hours) mean sleep duration and too low (<5) or too high (>30) mean number of sleep episodes per night (see below). These additional exclusions were to ensure that individuals with extreme (outlying), and likely incorrect, sleep characteristics were not included in any subsequent analyses. Prior to association testing, we adjusted all phenotypes for age activity monitor worn (derived from month and year of birth and date activity monitor worn), gender, season activity monitor worn (categorical; winter, spring, summer or autumn; derived from date activity monitor worn) and number of valid measurements (SPT-windows for sleep phenotypes, number of valid days for diurnal inactivity or number of L5 or M10 detections).
Sleep midpoint (UK Biobank)
Sleep midpoint was calculated as the time directly between the start and end of the SPT-window and is defined as the number of hours elapsed since midnight at the start of the calendar day on which the STP-window started (e.g. 02:30 = 26.5; 23:45 = 23.75) with a cut-off at midday (12:00 and 36:00). This takes account of participants whose sleep midpoint occurs before midnight. Our sleep midpoint phenotype represented the average of each participant over all their valid SPT-windows. After exclusions and adjustments, 84,810 participants had valid sleep midpoint, covariates and genetic data.
L5 and M10 timing (UK Biobank)
L5 and M10 refer to the least-active five and the most-active ten hours of each day, and are commonly studied measures relating to circadian activity and sleep. L5 (M10) defines a five-hour (ten-hour) daily period of minimum (maximum) activity, as calculated by means of a moving average with a five-hour (ten-hour) window. As with sleep midpoint, we defined our L5 (M10) timing phenotype as the number of hours elapsed from the previous midnight to the L5 (M10) midpoint, averaged over all valid wear days. Of the 103,711 participants with activity monitor data, there were 85,205 and 85,670 with valid L5 and M10 timing measures respectively, covariates and genetic data. Basic summaries of these and other raw activity monitor phenotypes are given in Supplementary Table 3.
Sleep duration (UK Biobank)
Sleep episodes within the SPT-window were defined as periods of at least 5 minutes with no change larger than 5° associated with the z-axis of the activity monitor, as described previously68. The summed duration of all sleep episodes provided the sleep duration for a given SPT-window. We took both the mean and standard deviation of sleep duration across all valid SPT-windows to provide a measure of average sleep quantity and a measure of variability. After exclusions and adjustments, we had 85,449 (84,441) participants with valid sleep duration mean (SD), covariates and genetic data.
Sleep efficiency (UK Biobank)
This was calculated as a ratio of sleep duration (defined above) to SPT-window duration. The phenotype represented the mean across all valid SPT-windows and after exclusions and adjustments, left us with 84,810 participants with valid sleep efficiency, covariates and genetic data.
Number of sleep episodes (UK Biobank)
This is defined as the number of sleep episodes of at least 5 minutes separated by at least 5 seconds of wakefulness within the SPT-window. The phenotype represents the mean across all SPT-windows and can be interpreted as a measure of sleep disturbance or fragmentation. After exclusions and adjustments, we had 84,810 participants with valid sleep efficiency, covariates and genetic data.
Diurnal inactivity duration (UK Biobank)
The total daily duration of estimated bouts of inactivity that fall outside of the SPT-window. This comprises the total length of periods of sustained inactivity (>5 minutes) and captures sleep (naps) but does not include other inactivity such as sitting and reading or watching television, which typically involve a detectable level of movement. This variable likely captures some non-sleep rest as it is impossible to separate these without detailed activity diaries. The phenotype is calculated as the mean across all valid days and, after exclusions and adjustments, we were left with 84,757 participants with a valid measure, covariates and genetic data.
Genome-wide association analysis
We performed all association test using BOLT-LMM71 v2.3, which applies a linear mixed model (LMM) to adjust for the effects of population structure and individual relatedness, and allowed us to include all related individuals in our white European subset, boosting our power to detect associations. This meant a sample size of up to 449,734 individuals, as opposed to the set of 379,768 unrelated individuals. BOLT-LMM approximates relatedness within a cohort by using LD blocks and avoids the requirement of building a genetic-relationship matrix (GRM), with which calculations are intractable in cohorts of this size. From the ∼805,000 directly-genotyped (non-imputed) variants available, we identified 524,307 “good-quality” variants (bi-allelic SNPs; MAF≥1%; HWE P>1×10-6; non-missing in all genotype batches, total missingness<1.5% and not in a region of long-range LD72) that BOLT-LMM used to build its relatedness model. For LD structure information, we used the default 1000 Genomes LD-Score table provided with the software. We forced BOLT-LMM to apply a non-infinitesimal model, which provides better effect size estimates for variants with moderate to large effect sizes, in exchange for increased computing time. Prior to association testing, continuous phenotypes were first adjusted for relevant covariates, as indicated above, and at runtime we included “release” (categorical; UKBiLEVE array, UKB Axiom array interim release and UKB Axiom array full release) as a further covariate. The binary morning person phenotype was adjusted at runtime for age, gender, study centre and release.
In the 23andMe morning person GWAS, the summary statistics were generated through logistic regression (using an additive model) of the phenotype against the genotype, adjusting for age, gender, the first four principal components and a categorical variable representing genotyping platform. Genotyping batches in which particular variants failed to meet minimum quality control were not included in association testing for those variants, resulting in a range of sample sizes over the whole set of results. A AGC of 1.325 was reported for this GWAS. Lead variants for the 23andMe only morning person GWAS are provided in Supplementary Table 18.
Sensitivity Analysis
To avoid issues with stratification, we performed a sensitivity GWAS, in UK Biobank alone, to assess whether any of the associations were driven by a subset of the cohort with specific conditions. We excluded those reporting shift or night shift work at baseline, those taking medication for sleep or psychiatric disorders and those with either with a HES ICD10 or self-reported diagnosis of depression, schizophrenia, bipolar disorder, anxiety disorders or mood disorder (see Supplementary Methods for further details). Results for the 341 lead chronotype variants available in the UK Biobank are provided in Supplementary Table 1 alongside the main meta-analysis results.
Meta-analysis of GWAS results
Meta-analysis was performed using the software package METAL73. To obtain the largest possible sample size, and thus maximising statistical power, we performed a sample-size meta-analysis, using the results from the UK Biobank chronotype GWAS and the 23andMe morning person GWAS. Genomic control was not performed on each set of summary statistics prior to meta-analysis but instead the meta-analysis chi-squared statistics were corrected using the LD score intercept (ILDSC = 1.0829), calculated by the software LDSC, as using λGC is considered overly conservative and the LD score intercept better captures inflation due to population stratification74. For interpretable results, we reported the odds ratio from a secondary effect size meta-analysis between our dichotomous UK Biobank morning person GWAS and the 23andMe morning person GWAS. The primary chronotype sample-size meta-analysis produced results for 15,880,941 variants in up to 697,828 individuals, with the secondary effect-size morning person meta-analysis producing results for 15,880,664 variants in up to 651,295 individuals (372,765 cases and 278,530 controls).
Post-GWAS analyses
Pathway analysis and tissue-enrichment
We used MAGENTA37, DEPICT38, PASCAL36 and MAGMA34 to perform pathway and tissue enrichment. For MAGENTA and DEPICT, we included all variants from the meta-analysis, whereas for PASCAL, we included only those with an RSID as the software assigns variants to genes using their RSID. For the MAGENTA analysis, we used upstream and downstream limits of 110Kb and 40Kb to assign variants to genes by position, we excluded the HLA region from the analysis and set the number of permutations for gene-set enrichment analysis (GSEA) to 10,000. For DEPICT, we used the default settings and the annotation and mapping files provided with the software. As each of the four pieces of software adopts a different gene prioritisation method or relies on different databases, we included results from all four to cover all bases and to allow for better comparison with other studies, where only a single method may have been used. Briefly PASCAL corrects for the effect of LD blocks by accounting for the LD structure between associated variants, MAGENTA uses distance-based mapping but allows the user to set the upstream and downstream distances for inclusion, DEPICT makes use of large-scale data on gene co-regulation to prioritise genes before calculating enrichment in its own reconstituted gene sets and MAGMA, the most recent method (and implemented in the FUMA GWAS35 platform), claims greater statistical power to detect enriched gene sets than methods such as MAGENTA and PASCAL, without affecting the type 1 error rate. By using multiple methods and looking for consistency, we provide more compelling evidence of enrichment in specific pathways and tissues.
Genetic correlation and heritability analyses
We used the LD Score Regression (LDSC) software, available at https://github.com/bulik/ldsc/, to quantify the genetic overlap between the trait of interest and 222 traits with publicly available GWA data. Details of methodology are available elsewhere74. We considered any correlation as statistically significant if it had a Bonferroni corrected P<0.05.
Fine-mapping association signals
Fine-mapping analyses were performed using FINEMAP v1.141 using a shotgun stochastic approach, allowing up to 20 causal SNPs at each locus and by focussing on a 1Mb (±500Kb) region around each index variant. As FINEMAP assumes a fixed sample size for all variants, we excluded variants not present in both the UK Biobank and 23andMe data, and to make the LD calculations more tractable we excluded variants with P>0.01 to limit the total number of variants at each locus. We constructed an LD matrix for each locus by calculating the Pearson correlation coefficient for all pairs of variants using dosages derived from the unrelated European-ancestry subset of the UK Biobank imputed genotype probabilities (N=379,769). A variant was considered to be causal if its log10 Bayes factor was 2 or larger, a limit recommended by the FINEMAP documentation (http://www.christianbenner.com/index_v1.1.html).
Alamut annotation, eQTL mapping and circadian enrichment analyses
We annotated variants identified by FINEMAP as likely to be causal using Alamut Batch v1.8 (Interactive Biosoftware, Rouen, France) with genome assembly GRCh37 and all options set to default. We retained only the canonical (longest) transcript for each variant and reported the variant location and coding effect (if applicable) in this transcript. To identify whether variants were cis-eQTLs for nearby genes, we performed a lookup of our variants in the GTEx single-tissue cis-eQTL dataset (v7), accessed at the GTEx portal (https://www.gtexportal.org/home/datasets) on 13/07/18, for significant associations. A variant was reported as an eQTL for a gene if the variant-gene association was significant (q-value ≤ 0.05) for one or more brain or non-brain tissues.
With the aim of highlighting genes that have a role in regulating the internal circadian clock, we cross-referenced the genes identified by eQTL mapping, in addition to the two nearest genes (within 1Mb), with catalogues from three gene expression studies. Firstly, we used data from an RNAi screen of circadian clock modifiers49, in which a genome-wide scan was performed on the effects of single-gene knockouts on the amplitude and period of the circadian expression. Secondly, we used data from a study of gene expression in SCN tissue over a 24-hour light/dark cycle47 to identify whether our genes exhibit fluctuating expression in SCN tissue and whether the genes show enriched expression in the SCN compared to other tissues. Finally, we used data from a meta-analysis of gene expression in the SCN48 to investigate whether the genes were preferentially expressed in the SCN when compared to other brain tissues.
Mendelian Randomisation analyses
We undertook MR analyses to explore both the effect of chronotype on different outcomes and the effect of different exposures on chronotype as an outcome. These two-sample MR analyses can be summarised by:
Chronotype exposure using the 351 variants and effect sizes discovered in this meta-analysis against the five significant psychiatric outcomes from the genetic correlation analyses and three metabolic outcomes, using summary data from published GWAS (Supplementary Table 16).
Two of the five significant psychiatric exposures from the genetic correlation analyses and four metabolic exposures, all using variants from published GWAS, against chronotype as an outcome, using summary data from this meta-analysis (Supplementary Table 17).
In both analyses, we tested four MR methods:
Analysis 1 (chronotype exposure) was performed using the R package TwoSampleMR using aggregated summary statistics available through the MR-Base platform56. We implemented the four MR methods listed above and also included the MR-Egger bootstrap to provide better estimates of the effect sizes and standard errors as compared to the MR-Egger method. We used data from published GWAS to test the effect of chronotype on the following exposures: schizophrenia77, major depressive disorder78, depressive symptoms79, subjective wellbeing79, PGC cross-disorder traits80, fasting insulin81, BMI55,82 and T2D83,84. To provide meaningful effect sizes for MR analyses, we used betas from the secondary effect size meta-analysis of the dichotomous UK Biobank and 23andMe morning person GWAS.
For analysis 2 (chronotype outcome) we applied the four MR methods listed above, utilising a custom pipeline. Using data from published GWAS, we tested whether chronotype is influenced by the following exposures: schizophrenia77, major depressive disorder78, insulin secretion85, favourable adiposity86, BMI55 and T2D87. As with analysis 1, chronotype effect sizes represented morningness liability and were taken from the secondary morning person meta-analysis.
We used the inverse variance weighted approach as our main analysis method and MR-Egger, weighted median estimation and penalised weighted median estimation as sensitivity analyses in the event of unidentified pleiotropy of our genetic instruments. MR results may be biased by horizontal pleiotropy, i.e. where the genetic variants that are robustly related to the exposure of interest (here chronotype) independently influence the outcome, through association with another risk factor for the outcome. IVW assumes that there is either no horizontal pleiotropy (under a fixed effect model) or, if implemented under a random effects model after detecting heterogeneity amongst the causal estimates, that:
The strength of association of the genetic instruments with the risk factor is not correlated with the magnitude of the pleiotropic effects
The pleiotropic effects have an average value of zero
MR-Egger provides unbiased causal estimates if just the first condition above holds, by estimating and adjusting for non-zero mean pleiotropy. However, MR-Egger requires that the InSIDE (Instrument Strength Independent of Direct Effect) assumption88 holds, in that it needs the pleiotropy of the genetic instruments to be uncorrelated with the instruments’ effect on the exposure. The weighted median approach is valid if less than 50% of the weight in the analysis stems from variants that are pleiotropic (i.e. no single SNP that contributes 50% of the weight or a number of SNPs that together contribute 50% should be invalid because of horizontal pleiotropy). Given these different assumptions, if all methods are broadly consistent this strengthens our causal inference. Additional care should be taken interpreting results from binary exposures or outcomes, as these MR methods assume that horizontal pleiotropy due to within-category variation of dichotomous or categorical traits is negligible.
Data availability
Summary statistics for the top 10,000 chronotype meta-analysis variants are provided in Supplementary Table 13 and the full set of UK Biobank-only chronotype and morning person GWAS summary statistics can be found at http://www.t2diabetesgenes.org/data/. Full meta-analysis summary statistics can be requested directly from 23andMe Inc. (see https://research.23andme.com/collaborate/#publication).
Funding Information
S.E.J. is funded by the Medical Research Council (grant: MR/M005070/1) M.A.T., M.N.W. and A.M. are supported by the Wellcome Trust Institutional Strategic Support Award (WT097835MF). A.R.W., T.M.F and H.Y. are supported by the European Research Council grants: SZ-245 50371-GLUCOSEGENES-FP7-IDEAS-ERC and 323195. R.M.F. is a Sir Henry Dale Fellow (Wellcome Trust and Royal Society grant: 104150/Z/14/Z). R.B. is funded by the Wellcome Trust and Royal Society grant: 104150/Z/14/Z. J.T. is funded by a Diabetes Research and Wellness Foundation Fellowship. The funders had no influence on study design, data collection and analysis, decision to publish, or preparation of the manuscript. JB and DAL work in a Unit that receives support from the University of Bristol and UK Medical Research Council (MC_UU_00011/3 and MC_UU_00011/6, respectively). DAL’s contribution to this work is also supported by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement (Grant number 669545; DevelopObese). EMB is supported by grants from the National Health and Medical Research Council of Australia 1145645, 1078901 and 1087889
Duality of Interest
DAL receives support from Roche Diagnostics and Medtronic for research that is unrelated to this study. MKR reports receiving honoraria and consulting fees from Novo Nordisk, Ascensia, Cell Catapult and Roche Diabetes Care.
Supplementary Figure 1. Scatter plot of UK Biobank chronotype GWAS effect sizes for all 341 lead variants present in UK Biobank, against their effect sizes in the UK Biobank chronotype sensitivity GWAS. The dashed line indicates identical effect in both analyses.
Supplementary Figure 2. FUMA regional plots identifying LD patterns between chronotype-associated variants in the a) RGS16, b) PER2, c) PER3, d) PIGK/AK5, e) INADL, f) HCRTR2, g) HTR6, h) PER1, i) CRY1 and j) ARNTL loci. LD r2 of each variant with the lead is indicated by the colour of the points, with mapped genes highlighted beneath in red.
Supplementary Figure 3. Scatter plot of Chronotype meta-analysis variants and their effects on major depressive disorder in the PGC GWAS78 (outcome) versus odds of being a morning person (exposure). Lines identify the slopes of the five methods tested. Log odds (and SEs) for morningness were taken from the secondary effect-size meta-analysis.
Supplementary Figure 4. Scatter plot of Chronotype meta-analysis variants and their effects on depressive symptoms in the SSGAC GWAS79 (outcome) versus odds of being a morning person (exposure). Lines identify the slopes of the five methods tested. Log odds (and SEs) for morningness were taken from the secondary effect-size meta-analysis.
Supplementary File 1. R wrapper script used to call GGIR to process the converted (.wav) UK Biobank activity monitor files. All settings, except file locations, are identical to those used in this study.
Acknowledgements
This research has been conducted using the UK Biobank Resource (application 9072). We would like to thank the research participants and employees of 23andMe for making this work possible. We also wish to acknowledge the Genotype-Tissue Expression (GTEx) Project, supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS, for providing access to the data we required.
Contributor list for the 23andMe Research Team: Michelle Agee, Babak Alipanahi, Adam Auton, Robert K. Bell, Katarzyna Bryc, Sarah L. Elson, Pierre Fontanillas, Nicholas A. Furlotte, David A. Hinds, Karen E. Huber, Aaron Kleinman, Nadia K. Litterman, Jennifer C. McCreight, Matthew H. McIntyre, Joanna L. Mountain, Elizabeth S. Noblin, Carrie A.M. Northover, Steven J. Pitts, J. Fah Sathirapongsasuti, Olga V. Sazonova, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Vladimir Vacic, and Catherine H. Wilson.
Footnotes
↵* Equal Contributions
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.
- 32.
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵