Abstract
Expression quantitative trait loci (eQTL) data have proven important for linking non-coding loci to protein-coding genes. But eQTL studies rarely measure microRNAs (miRNAs), small noncoding RNAs known to play a role in human brain development and neurogenesis. Here, we performed small-RNA sequencing across 212 mid-gestation human neocortical tissue samples, profiled 907 miRNAs, discovered 111 novel early brain-expressed miRNAs, and identified 85 local-miRNA-eQTLs. Colocalization of miRNA-eQTLs with GWAS summary statistics yielded one robust colocalization of miR-4707-3p expression with educational attainment and head size phenotypes, where the miRNA expression-increasing allele was associated with decreased head size. Exogenous expression of miR-4707-3p in primary human neural progenitor cells led to increased proliferative and neurogenic gene markers, indicating miR-4707-3p modulates progenitor proliferation. Integrating miRNA-eQTLs with existing GWAS yielded discovery of a miRNA modulating developmental fate decisions that alter human brain size.
Introduction
Genome-wide association studies (GWAS) have identified many genetic loci influencing human behavior, cognition, and brain structure1–5. Expression quantitative trait loci (eQTL) data are often used to link non-coding brain-trait associated loci with genes that putatively mediate their effects6–8. Brain eQTL studies are most often conducted in bulk adult post-mortem tissue and are focused on measuring mRNA expression levels from protein-coding genes. Though these methods have been successful in linking a subset of non-coding brain-trait associated loci to genes, there may be multiple mechanisms by which a single locus influences a complex trait and many loci are still unlinked to genes9–14. This suggests other types of RNAs, unmeasured in previous eQTL studies, may be mediating the genetic associations.
MicroRNAs (miRNAs) are poorly measured in standard eQTL studies because the library preparation methods effectively remove small RNAs. Library preparation methods have been developed that specifically measure small RNAs and allow measurement of miRNA expression in the large sample sizes necessary for miRNA-eQTL studies. To date, relatively few miRNA-eQTL studies have been published, and those that have are often underpowered or were conducted in tissues not directly implicated in brain-related phenotypes15–19. Given evidence that miRNAs are strongly involved in fate decisions during neurogenesis and brain development, there is an increasing need to understand the genetic basis by which miRNAs are regulated20–23.
Previous studies have found enrichment of brain structure and cognition GWAS heritability within regulatory elements active during mid-fetal development1,24,25. Mapping mRNA-eQTLs in human mid-gestation cortical tissue or neural progenitor cells derived from that tissue has revealed novel developmentally-specific colocalizations with brain structure and cognitive traits8,9. These findings are consistent with the radial unit hypothesis, which posits that increases in size of the neural progenitor pool, present only in mid-gestation, lead to increases in the size of the cortex26,27. The discovery of miRNA-eQTLs during prenatal cortical development may highlight additional molecular mechanisms by which non-coding loci influence brain-related traits.
In this study, we performed a local-miRNA-eQTL analysis in 212 mid-gestation human cortical tissue donors to discover the common genetic variation associated with expression of nearby miRNAs (Figure 1A). We identified 85 local-miRNA-eQTLs (variant - miRNA pairs) associated with expression of 70 miRNAs. These miRNAs were often found within host mRNAs (49 of 70 miRNAs), but the genetic signal associated with miRNA expression was seldom colocalized with a signal associated with mRNA expression (observed only in 3 of 49 loci). One robust colocalization was detected between a miRNA-eQTL for miR-4707-3p expression and GWAS signals for educational attainment and head size phenotypes. Experimental manipulation of miR-4707-3p expression within primary human neural progenitors during proliferation showed miR-4707-3p increased the number of proliferating cells. This example confirms the utility of miRNA-eQTLs in understanding how genetic variation may influence brain-related traits through regulation of miRNA expression.
Results
MicroRNA expression profiling
We profiled the expression of miRNAs across 223 fetal cortical tissue samples from donors between 14 and 21 gestation weeks using high-throughput small-RNA-sequencing. We used a specialized miRNA quantification algorithm, implemented in the miRge 2.0 package28, to measure the expression of known miRNAs cataloged in miRBase release 22 (March 2018)29. Combined with total-RNA-sequencing using an rRNA depletion based library preparation in these same samples, collected in a previous study9, we used rigorous quality control criteria (Methods) to eliminate 11 samples that were possible swaps, mixtures, or expression outliers (Extended Data Figure 1A-C). Following batch effect correction for known technical confounding variables (Extended Data Figure 1A-B), a principal component analysis (PCA) on miRNA expression across the 212 remaining samples revealed the primary variation between samples was driven by gestation week at time of sample collection (Figure 1B). This finding is consistent with tissue composition differences between tissue from early gestation, composed of a greater proportion of neural progenitor cells, and late gestation, composed of a greater proportion of neurons30,31. This analysis shows that after controlling for known technical variation, miRNA expression captures expected biological variability across these samples.
We conducted a differential expression analysis to identify the miRNAs changing across gestation weeks (Figure 1C). We found 269 miRNAs positively correlated and 246 miRNAs negatively correlated with gestation week (false discovery rate (FDR) < 10%, Supplementary Table 2). Examples include miR-92b, which has known roles in maintaining stem cell proliferation and is more highly expressed in neural progenitors relative to differentiated neurons32,33. By contrast, miR-124 is known to be more highly expressed in post-mitotic neurons relative to neural progenitors and plays a role in promoting neuronal differentiation34,35. Consistent with these roles, we observed miR-92b was significantly upregulated in early gestation week samples, and miR-124 was significantly upregulated in late gestation week samples (Figure 1D and 1F). Furthermore, validated mRNA targets of miR-92b (SMAD7, TSC1, PER2, and CDKN1C) show differential expression between early and late gestation week samples consistent with targeting and downregulation of mRNA by a miRNA (Figure 1E)36–41. Validated mRNA targets of miR-124 (RHOA, PTBP1, ACTL6A, and SP1) also show consistent expression patterns in early and late gestation week samples (Figure 1G)34,42–45. This differential expression analysis and the expression patterns of known miRNAs and mRNAs are expected given known cell-type compositions of cortical tissue during neuronal differentiation over developmental time.
In addition to quantifying the expression of known miRNAs in miRBase release 22, we quantified the expression of recently discovered miRNAs from studies by Friedländer et al46 (72 miRNAs from the Friedländer dataset were detected in this study) and Nowakowski et al20 (7 miRNAs from the Nowakowski dataset were detected in this study). Finally, using two annotation packages, miRge 2.028 and miRDeep247, we discovered 111 putatively novel miRNAs that were not previously annotated in miRBase release 22, Friedländer et al, nor Nowakowski et al (Supplementary Table 1). Novel miRNAs discovered in this study showed sequencing read coverage plots consistent with known miRNAs and many were differentially expressed between early and late gestation week samples (Extended Data Figure 2). This represents a novel resource of miRNAs that may not have been previously detected due to unique expression of these miRNAs in developing brain tissue or lower read depth and sample size obtained in previous studies. Though these novel miRNAs have characteristic sequencing read patterns consistent with known miRNAs, they require validation to demonstrate they function as miRNAs48.
Local-miRNA-eQTLs
Genotyping information from each of the 212 remaining donors revealed a sample of diverse ancestry (Extended Data Figure 1E). Following TOPMed mixed-ancestry imputation, 12.4 million genetic variants were combined with the expression of 907 known and novel miRNAs across 212 fetal cortical tissue samples to perform a local-miRNA-eQTL analysis (Figure 1A)49,50. To control for population stratification in our association testing, we used a mixed-effects linear model which included a kinship matrix as a random effect and 10 genotype principal components (PCs) as fixed effects51,52. We included 10 miRNA-expression PCs, technical variables such as sequencing-pool and RNA integrity number (RIN score), and the biological variables of sex and gestation week as additional fixed effect covariates in the model.
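As a rough sketch of the association model described above, the expression of a single miRNA across donors can be written in the following form. The notation is ours and is an assumption introduced for illustration; it is not taken from the original analysis.

```latex
\[
\begin{aligned}
\mathbf{y} &= \mathbf{g}\,\beta + \mathbf{X}\boldsymbol{\alpha} + \mathbf{u} + \boldsymbol{\varepsilon},\\
\mathbf{u} &\sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_g^{2}\,\mathbf{K}\right),\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_e^{2}\,\mathbf{I}\right)
\end{aligned}
\]
```

Here y is the vector of expression values for one miRNA, g is the genotype vector at the tested variant with effect β, X holds the fixed-effect covariates (10 genotype PCs, 10 miRNA-expression PCs, sequencing pool, RIN, sex, and gestation week) with effects α, and K is the kinship matrix that defines the covariance of the random effect u.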
Following stringent local and global multiple testing correction using a hierarchical multiple comparisons threshold (FDR < 5%, see Methods), we identified 70 miRNAs with a local-eQTL, hereafter referred to as emiRs. Beyond these 70 primary eQTLs, 14 loci had secondary eQTLs and one had a tertiary eQTL, for a total of 85 conditionally-independent local-miRNA-eQTLs (Figure 2A). Of the 70 emiRs, one miRNA was cataloged in Friedländer et al, two in Nowakowski et al, and eight were novel miRNAs discovered within this fetal tissue dataset (Supplementary Table 3). To assess enrichments in functionally annotated genomic regions, we also used a relaxed, global-only, multiple testing correction threshold (see Methods) which increased the number of local-miRNA-eQTLs to 200 across 153 emiRs (153, 30, 13, 3, and 1 eQTLs of degree primary, secondary, tertiary, quaternary, and quinary respectively). Discovery of these local-miRNA-eQTLs shows that genetic variation influences miRNA expression in the developing cortex, including the expression of previously unannotated miRNAs.
To characterize whether these miRNA-eQTLs are found in functionally annotated regions of the genome, we assessed whether eQTL signals were enriched in chromatin annotations derived from fetal tissue and previously separated by male and female sex53. We identified significant enrichments of miRNA-eQTLs within active transcription start sites (TssA) and chromatin associated with strong transcription (Tx), weak transcription (TxWk), enhancers (Enh), and ZNF genes and repeats (ZNF/Rpts). There was also a significant depletion of miRNA-eQTLs within quiescent chromatin (Quies) (Figure 2B). These enrichments were robust to either stringent or relaxed multiple testing correction methods used to declare significant eQTLs (Extended Data Figure 3). Enrichment of miRNA-eQTL signals within transcribed chromatin is expected given that most miRNAs are found within hosts or immediately adjacent to genes54.
Colocalization of miRNA-eQTLs with mRNA-e/sQTLs
Since over 50% of miRNAs are found within host genes, we classified the miRNA-eQTLs based on whether the emiR is located within a host gene or intergenically54. Of the 70 emiRs at the stringent significance threshold, 49 are located within a host gene (100 of 153 emiRs are within hosts using the relaxed threshold). We found that miRNA-eQTLs are often close to their emiRs, and this trend is consistent whether or not the emiR is within a host gene (Figure 3A).
To further characterize these miRNA-eQTLs, we conducted a colocalization analysis to discover if the same genetic variants regulating miRNA expression also regulate mRNA expression and splicing. mRNA-eQTLs and mRNA-sQTLs were discovered in an expanded set of fetal tissue samples largely overlapping with those samples used in our miRNA-eQTL analysis8,9. We found 17 and 12 colocalizations with eQTLs and sQTLs, respectively (Supplementary Table 4). For emiRs within a host mRNA, we observed that miRNA expression was often positively correlated with mRNA expression (Figure 3B). Of these emiRs within hosts, we found 3 with mRNA-eQTL colocalizations and 4 with mRNA-sQTL colocalizations. Interestingly, expression of the few emiRs with a co-localized mRNA-eQTL was positively correlated with expression of their mRNA hosts, while expression of the few emiRs with a co-localized mRNA-sQTL was negatively correlated with their host mRNA expression.
This phenomenon is highlighted by a colocalization between a miRNA-eQTL of hsa-miR-1307-5p and an mRNA-sQTL for ATP5MK (Figure 3C). Hsa-mir-1307 sits within exon three of the 5’ UTR of ATP5MK. In our dataset, we found evidence for five distinct intron excisions within the 5’ UTR of ATP5MK (labeled SpliceA-SpliceE). Intron excision was quantified as percent spliced in (PSI) and normalized across all junctions in this cluster. Among these splice sites, we observed an association between genotypes (rs7911488 genotypes A and G) and splice site utilization at SpliceA and SpliceD. This same variant was associated with expression of miR-1307-5p (Figure 3D). The G allele was associated with an increased utilization of SpliceA and SpliceD, while these same samples showed a decreased expression of miR-1307-5p. These data are consistent with biogenesis of miR-1307-5p from an exon of its host gene, ATP5MK. Removal of exon three of ATP5MK results in more miR-1307-5p. In this case, the influence of common genetic variation on both splicing and miRNA expression led to an understanding of the miRNA biogenesis.
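The cluster-level normalization of intron excision described above can be illustrated with a minimal sketch. The function name and the read counts are hypothetical, and the sketch only shows the general idea of expressing each junction as its share of reads within one cluster; it is not the quantification software used in the study.

```python
def cluster_psi(junction_counts):
    """Return each junction's share of the reads in its cluster (0-1 scale).

    junction_counts maps a junction label to its read count in one sample.
    """
    total = sum(junction_counts.values())
    if total == 0:
        # No reads in the cluster: the proportion is undefined for this sample.
        return {name: float("nan") for name in junction_counts}
    return {name: count / total for name, count in junction_counts.items()}


# Made-up read counts for the five junctions of one cluster in one sample.
counts = {"SpliceA": 30, "SpliceB": 10, "SpliceC": 20, "SpliceD": 25, "SpliceE": 15}
psi = cluster_psi(counts)
print(psi)
```

Because the values within a cluster sum to one, an increase in the usage of one junction necessarily reduces the share of the others, which is why genotype effects are interpreted per cluster rather than per junction in isolation.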
miRNA-eQTL tissue specificity
We next sought to quantify the degree to which our fetal cortical tissue miRNA-eQTLs are distinct from miRNA-eQTLs discovered in other tissues. We compared our brain miRNA-eQTLs (70 emiRs at the stringent testing correction threshold) to those of a large eQTL analysis in blood15. Of the 76 and 70 total emiRs in blood and brain tissue respectively, most are unique to a given tissue (65 and 54 are unique to blood and brain respectively; Figure 4A). There are only 11 miRNAs that are emiRs in both blood and brain tissue. Of these emiRs present in both blood and brain, only three of these eQTLs have a colocalized genetic signal, implying the same causal variants affect miRNA expression in both tissues. The remaining eight emiRs found in both tissues do not share causal variants and therefore have tissue-specific genetic mechanisms regulating expression of these miRNAs. This shows that miRNA-eQTLs between blood and brain are highly tissue-specific, and the genetic variants regulating expression of shared emiRs can also be tissue-specific.
To further characterize the tissue-specificity of miRNA-eQTLs, we calculated the fraction of brain variant/miRNA pairs that are true associations within blood miRNA-eQTLs (π1)55. Of the 76 blood emiRs, 52 were expressed at sufficient levels within our brain samples to test for genetic association. The fraction of blood associations also found in brain was 0.23 (+/- 0.2 95% conf. int.) (Figure 4B). By comparison, the fraction of mRNA-eQTLs from blood tissue that are true associations in brain was 0.51 (+/- 0.04 95% conf. int.)56. This provides further evidence that miRNA-eQTLs have stronger tissue specificity as compared to mRNA-eQTLs.
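The π1 statistic used above can be sketched as one minus an estimate of the proportion of true null hypotheses (π0). The example below uses a single tuning value λ for simplicity, which is an assumption on our part: published implementations typically smooth the estimate over a range of λ values, and the confidence intervals quoted in the text are not reproduced here. The function name and the p-values are hypothetical.

```python
def pi1_estimate(p_values, lam=0.5):
    """Estimate the fraction of true associations as 1 - pi0.

    pi0 is estimated from the share of p-values above `lam`, which are
    assumed to come almost entirely from true null tests.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    n_above = sum(1 for p in p_values if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))  # cap at 1 so pi1 is never negative
    return 1.0 - pi0


# Made-up p-values for blood variant/miRNA pairs re-tested in brain.
example_p_values = [0.001, 0.002, 0.004, 0.01, 0.02, 0.03, 0.6, 0.7, 0.8, 0.9]
print(round(pi1_estimate(example_p_values), 3))
```

In this toy input, four of ten p-values exceed 0.5, so π0 is estimated as 4 / (10 × 0.5) = 0.8 and π1 as 0.2.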
miR-4707-3p is implicated in brain size and cognitive ability
To determine if our miRNA-eQTLs may explain molecular mechanisms underlying disease risk imparted via GWAS loci associated with brain disorders or inter-individual differences in brain traits, we performed colocalization analyses between our 85 stringently-defined local-miRNA-eQTLs and 21 GWAS summary statistics (Supplementary Table 4). We discovered one robust colocalization between an eQTL for miR-4707-3p and multiple brain phenotypes, including educational attainment, head size, and a subthreshold association for cortical surface area (Figure 5A). The eQTL for miR-4707-3p expression (rs4981455, alleles A/G) also co-localizes with an mRNA-eQTL for HAUS4 expression. Hsa-mir-4707 is located within the 5’ UTR of the HAUS4 gene. Despite this, and in contrast to the above example for mir-1307, the allele associated with increased expression of miR-4707-3p is also associated with increased expression of its host gene, HAUS4 (Figure 5B and 5C). The index variant, rs4981455, is in high linkage disequilibrium (LD; r2 > 0.99) with another variant (rs2273626, alleles C/A) which is within the “seed” sequence of miR-4707-3p (Extended Data Figure 4). The A allele at rs2273626, corresponding to index variant allele G, would most likely change miR-4707-3p targeting. However, we did not detect any miR-4707-3p expression in samples with the G/G genotype at the index variant; because miR-4707-3p-G is not expressed, we did not study its altered targeting (Figure 5B, Extended Data Figure 4B). This finding is unlikely to be caused by reference mapping bias, because the miRNA quantification algorithm we used, miRge 2.0, accounts for common genetic variants within mature miRNA sequences (Methods)28. We also performed an allele-specific expression analysis for donors that were heterozygous at rs2273626 (Extended Data Figure 4C). rs2273626-A showed consistently lower expression, providing further support for the detected miRNA-eQTL (p = 3.63×10⁻¹⁴, using a paired, two-sided t-test). We detected only one read containing the A allele in three rs2273626 heterozygote donors. This indicates that miR-4707-3p is either expressed at levels too low for detection or not at all in chromosomes harboring the rs4981455 G allele.
In addition to the HAUS4 mRNA-eQTL colocalization, the miRNA-eQTL for miR-4707-3p expression is also co-localized with a locus associated with educational attainment (Figure 5A)2. In this case, the alleles associated with increased expression of miR-4707-3p are also associated with decreased educational attainment. We also highlight here associations to global cortical surface area (GSA)1. Although this locus does not have any genome-wide significant associations to GSA, a pattern of decreased p-values within the same LD block associated with miR-4707-3p expression implies that this locus may hold a significant association in future GSA GWAS with increased sample size. Variants associated with increased expression of miR-4707-3p are associated with decreased GSA. Supporting the association to cortical surface area, variants associated with increased miR-4707-3p expression are also co-localized with variants associated with decreased head size, a phenotype highly correlated with cortical surface area (data not shown from a publication in-preparation)4. This evidence suggests that the genetic risk for decreased head size and decreased cognitive abilities may be mediated through increased expression of miR-4707-3p in developing human cortex. Few publications imply a known function for miR-4707-3p; however, HAUS4 is known to play a role in mitotic spindle assembly during cell division and is a potent regulator of proliferation57–59. Unifying these observations leads us to a hypothesis, consistent with the radial unit hypothesis26, whereby increased expression of miR-4707-3p may influence neural progenitor fate decisions during fetal cortical development, ultimately leading to a decreased cortical surface area.
miR-4707-3p modulates proliferation in phNPCs
Given the genetic evidence implicating miR-4707-3p in cortical development and size, we next asked whether increased expression of miR-4707-3p in primary human neural progenitor cells (phNPCs), which model neurogenesis, influenced proliferation or cell fate decisions (Figure 6A)60. Using lentiviral transduction, we exogenously expressed mir-4707 in phNPCs derived from two genetically distinct donors. The cells were cultured in media with growth factors that retain the phNPCs in a proliferative state25. After confirming over-expression of miR-4707-3p (Figure 6C), we measured proliferation using an EdU pulse to label cells in S-phase of the cell cycle. At eight days post-transduction, we observed an increase in the number of EdU positive nuclei in samples which over-expressed miR-4707-3p, which indicates this miRNA causes an increased rate of proliferation (Figure 6C). Increases in proliferation could be due to either more neurogenic or more self-renewal fate decisions. Investigating further, we measured gene expression for a set of proliferation markers, progenitor markers, and neuronal markers in a time course experiment in phNPCs transduced with our expression construct (Figure 6D). At four, six, and eight days post transduction, we observed increased expression of the proliferation markers, KI67 and CCND1, which corroborate our findings using the EdU assay. We also observed an increase in the progenitor markers, PAX6 and SOX2, as well as the neuronal markers, DCX and TUJ1 (Beta-Tubulin III).
Discussion
Using small-RNA-sequencing, we reveal robust miRNA expression across cortical tissue during mid-gestation, a stage and tissue that have not been captured in previous eQTL studies using standard RNA-sequencing techniques. In addition to the known roles of miR-92b and miR-124 in progenitor proliferation and neurogenesis, our differential expression analysis shows many other miRNAs likely play crucial roles in cortical development33,34. We were also able to find more than 100 likely novel miRNAs and further evidence of recently discovered miRNAs within these tissue samples. These novel miRNAs have sequencing read coverage characteristic of known miRNAs, and they are differentially expressed across gestation weeks like miRNAs with known roles in neurogenesis. Investigating how these known and novel miRNAs function during neuronal differentiation may yield new gene regulatory mechanisms involved in human neurogenesis.
We also discovered 85 local-miRNA-eQTLs in a tissue-type and at a developmental time point with known influence on brain structure and cognitive traits. Despite many emiRs residing within host genes, miRNA-eQTLs seldom colocalize with mRNA-eQTLs for their host mRNAs. This implies a regulatory mechanism by which miRNA expression is largely independent of that which governs host mRNA expression, highlighting the unique information gained from miRNA-eQTLs that would otherwise be missed in standard mRNA-eQTL analyses. We found that the small subset of miRNA-eQTLs which colocalize with their host mRNA-eQTLs have positively correlated expression, which would indicate a common genetic regulatory mechanism governing expression of both RNAs, whereas miRNA-eQTLs which colocalize with a host mRNA-sQTL have negatively correlated expression. The genetic regulatory mechanisms governing miRNA biogenesis and expression uncovered by our eQTL analysis are mostly distinct from those found in current mRNA-eQTL datasets. These miRNA-eQTLs will be a continued resource in the pursuit of describing the genetic risk loci uncovered by current and future GWAS for brain traits and disorders.
In contrast to mRNA-based eQTLs, miRNA-based eQTLs appear to map less frequently to known brain disease loci. The lack of colocalizations is highlighted when compared to mRNA-eQTLs in the same tissue, where 844 colocalizations with brain traits and disorders were discovered across 18,667 mRNA-eQTLs despite a similar sample size (n=235 for mRNA-eQTLs vs n=212 for miRNA-eQTLs)8,9. This suggests that genetically regulated miRNAs either may not be a major contributor to neuropsychiatric disorders or current GWAS are underpowered to detect loci mediated through miRNA expression. We suspect that this may reflect the effect of purifying selection on miRNAs. MiRNAs are known to have broad downstream regulatory effects across hundreds or thousands of targeted mRNAs, and therefore the genetic mechanisms regulating miRNA expression may be more tightly regulated than for mRNA expression, as has previously been shown for transcription factors61,62. Rare variants, less subject to the influences of selective pressure, may be governing miRNA expression which this study did not have the power or methodology (i.e., whole genome or exome sequencing) to detect.
Nevertheless, we did find one colocalization between a miRNA-eQTL for miR-4707-3p expression and GWAS signals for head size phenotypes and educational attainment. This revealed a possible molecular mechanism by which expression differences in this miRNA may influence brain size and cognition. Experimental over-expression of miR-4707-3p in proliferating phNPCs showed an increase in both proliferative and neurogenic gene markers. These findings are consistent with the radial-unit hypothesis which would explain a decreased cortical surface area by depletion of the neural progenitor pool of cells through increased neurogenesis26,27 (Figure 6E).
An interesting feature of this genomic locus is the presence of both a miRNA-eQTL, for miR-4707-3p, and an mRNA-eQTL, for HAUS4. Although yet to be experimentally tested, the known effects of the HAUS4 gene on cell proliferation imply that increased expression of HAUS4 in neural progenitors would most likely lead to increased proliferation58,59. It is not yet known whether miR-4707-3p and HAUS4 have similar or opposing influences on fate decisions during neurogenesis. Furthermore, we did not detect expression of miR-4707-3p in samples with the genotype G/G at the eQTL index variant rs4981455. This implies that the presence of the A allele turns on miR-4707-3p expression and individuals with the G/G genotype have no miR-4707-3p expression in developing cortical tissue. This further highlights the utility of studying miRNA-eQTLs, as uncovering only the mRNA-eQTL at this locus would not reveal the full genetic mechanism leading to inter-individual differences in the head size and cognitive phenotypes.
Here we highlight one example of how miRNA expression leads to differences in brain size and cognition through altered neurogenesis during cortical development. We have yet to uncover the specific regulatory effects of miR-4707-3p (which genes are targeted or which pathways disrupted), but the effect on cellular behavior is clear. The lengthening of neurogenesis and associated expansion of the brain are hallmarks of the evolutionary differences between humans and other mammals63–66. Comparative genomic studies have revealed that human-specific gene regulatory differences in developing neocortex are associated with neurogenesis, brain complexity, and disease, and that primate-specific miRNAs play a role in post-transcriptionally regulating gene expression associated with these developmental processes67–71. Here, we show that miRNAs also play a role in differences in brain size between humans. Continued work on predictive miRNA targeting algorithms and on experimental methods to uncover miRNA regulatory networks will be crucial to further understanding the molecular pathways that lead to brain size or cognitive differences between individuals.
Methods
Tissue Procurement
Human prenatal cortical tissue samples were obtained from the UCLA Gene and Cell Therapy Core following IRB regulations for 223 genetically distinct donors (96 females, 127 males, 14-21 gestation weeks) following voluntary termination of pregnancy. Tissue samples were flash frozen after collection and stored at −80 °C. These tissue samples overlapped with those used in a previous mRNA-eQTL study. Cortical tissue samples from an additional 3 donors were microdissected into germinal zone and cortical plate sections as previously described28, yielding 17 more tissue samples which were used for novel miRNA discovery but withheld from the miRNA-eQTL analysis.
Library Preparation and Sequencing
Total RNA was extracted using miRNeasy-mini (QIAGEN 217004) kits or was extracted using trizol with glycogen followed by column purification. Library preparation for small-RNA was conducted using TruSeq Small RNA Library Prep Kits (Illumina RS-200). RNA libraries were randomized into eight pools and run across eight lanes of an Illumina HiSeq2500 sequencer at 50 base-pair, single-end reads to a mean sequencing depth of 11.7 million reads per sample. mRNA library preparation and sequencing was previously described9. Briefly, library preparation for total RNA was conducted via TruSeq Stranded RNA Library Prep Kits (Illumina 20020597) with Ribozero Gold ribosomal RNA depletion. Libraries were sequenced with 50 base-pair, paired-end reads to a mean read depth of 60 million reads per sample.
MicroRNA Expression Analysis
Small RNA-sequencing FASTQ files were used as input to the miRge 2.028 annotation workflow to quantify expression of known miRNAs from miRBase release 22 (March 2018)29. Briefly, sequencing reads were first quality controlled, adaptors removed, and collapsed into unique reads. The reads were then annotated against libraries of mature miRNA, hairpin miRNA, tRNA, snoRNA, rRNA, and mRNA. The miRge 2.0 workflow protects against reference mapping bias by incorporating common genetic variants into the mature miRNA sequence library and allowing zero mismatched bases on the first pass of read annotation. A second pass of unannotated reads allows for mismatched bases to identify isomiRs. Unmapped reads were then used as input to a second annotation pipeline to quantify expression of recently discovered novel miRNAs from Friedländer et al46 and Nowakowski et al20. Bowtie v1.2.272 was used to map reads to the UCSC hg38 reference genome using the following options: -v 2 -5 1 -3 2 --norc -a --best -S --chunkmbs 512. Mapped reads were then counted using featureCounts73 against a custom GTF file including the Friedländer and Nowakowski novel miRNAs using the following options: -s 0 -M -f -O.
To test for allele-specific expression of miR-4707-3p containing variants at rs2273626, we used Bowtie with a modified reference genome which only included miR-4707-3p mature sequence. Sequencing reads were allowed to map to either AGCCCGCCCCAGCCGAGGTTCT (reference allele C, complementary strand G) or AGCCCTCCCCAGCCGAGGTTCT (alternate allele A, complementary strand T) with no mismatches using the Bowtie options: -n 0 --norc --all -S. Mapped reads were then counted with featureCounts as above. Read counts, specific to each genotype, from samples heterozygous at rs2273626 were plotted in Extended Data Figure 4C.
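The logic of assigning reads to one of the two rs2273626 alleles can be illustrated with a simplified sketch. It uses exact substring matching against the two mature sequences given above rather than Bowtie alignment, so it is an approximation of the idea and not the pipeline used in the study. The function name and the example reads are hypothetical.

```python
REF_ALLELE_SEQ = "AGCCCGCCCCAGCCGAGGTTCT"  # reference allele C (complementary strand G)
ALT_ALLELE_SEQ = "AGCCCTCCCCAGCCGAGGTTCT"  # alternate allele A (complementary strand T)


def count_allele_reads(reads):
    """Count reads that match exactly one allele-specific mature sequence.

    A read is informative only if it matches one sequence with no mismatches
    and does not also match the other, i.e. it overlaps the variant position.
    """
    counts = {"ref": 0, "alt": 0, "unassigned": 0}
    for read in reads:
        in_ref = read in REF_ALLELE_SEQ
        in_alt = read in ALT_ALLELE_SEQ
        if in_ref and not in_alt:
            counts["ref"] += 1
        elif in_alt and not in_ref:
            counts["alt"] += 1
        else:
            # Either no exact match, or the read does not cover the variant.
            counts["unassigned"] += 1
    return counts


# Made-up reads from one heterozygous sample.
example_reads = [
    REF_ALLELE_SEQ,
    REF_ALLELE_SEQ,
    ALT_ALLELE_SEQ,
    "CCAGCCGAGGTTCT",      # does not span the variant, so it matches both alleles
    "AGCCCGCCCCAGCCGAGG",  # truncated read that still carries the reference base
]
counts = count_allele_reads(example_reads)
print(counts)
```

Reads that do not overlap the variant position are left unassigned because they carry no information about which chromosome they came from.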
We also defined an additional set of novel miRNAs discovered within our 240 sample dataset using miRge 2.0 and miRDeep247 prediction pipelines. Putatively novel miRNAs predicted using miRge 2.0 (all predictions) and miRDeep2 (predictions with a score greater than zero) were removed if they overlapped with each other, known miRNAs from miRBase release 22, or recently discovered miRNAs from the Friedländer and Nowakowski datasets. Sequencing read coverage plots for each novel miRNA annotation were also created to visually inspect each annotation. Putatively novel miRNAs without a characteristic 5’ and 3’ mapping pattern (Extended Data Figure 2) were removed. Uniquely novel annotations, which passed visual inspection, were compiled and used to create a custom GTF file for use in the above annotation pipeline to quantify novel miRNA expression. Read counts of miRNAs from miRBase release 22, Friedländer et al, Nowakowski et al, and putatively novel annotations from this study were combined into one count matrix for use in downstream analyses.
mRNA Expression Analysis
Total RNA-sequencing FASTQ files were first filtered and adapter trimmed using trim_galore and the following options: --length 20 --stringency 5. Filtered and trimmed reads were mapped to GRCh38 using STAR v2.5.4b74. Mapped reads were counted using featureCounts73 against the Ensembl GRCh38.p7 human gene annotations using the following options: -T 4 -p -t exon. Count data for each sample were combined into a count matrix for downstream analyses.
Genotyping
Genomic DNA was isolated using DNeasy Blood and Tissue Kit (QIAGEN 69504), and genotyping was performed on either HumanOmni2.5 or HumanOmni2.5Exome (Illumina) platforms in eight batches. SNP genotypes were exported and processed into PLINK format using PLINK v1.975. Quality control and pre-processing of genotypes were also done using PLINK v1.9 as previously described25. Briefly, SNPs were filtered based on Hardy-Weinberg equilibrium, minor allele frequency, individual missing genotype rate, and variant missing genotype rate (plink --hwe 1e-6 --maf 0.01 --mind 0.1 --geno 0.05), yielding a total of 1,760,704 genotyped SNPs.
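To make the per-variant thresholds concrete, the following sketch computes minor allele frequency and missingness from a list of genotype calls and applies the 0.01 and 0.05 cutoffs. It is an illustration only: the function names are hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2, or `None` for missing), and the Hardy-Weinberg test and per-individual missingness filter are omitted.

```python
def variant_stats(genotypes):
    """Return (missing_rate, minor_allele_frequency) for one variant.

    genotypes: list of alternate-allele counts (0, 1, 2) or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return missing_rate, min(alt_freq, 1 - alt_freq)


def keep_variant(genotypes, min_maf=0.01, max_missing=0.05):
    """Keep a variant whose MAF is at least min_maf and whose missing
    genotype rate does not exceed max_missing."""
    missing_rate, maf = variant_stats(genotypes)
    return maf >= min_maf and missing_rate <= max_missing
```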
Imputation
Sample genotypes were prepared for imputation using the McCarthy Group’s HRC-1000G imputation preparation tool (https://www.well.ox.ac.uk/~wrayner/tools/): perl HRC-1000G-check-bim.pl -b AllSamplesQC.bim -f AllSamplesQC.frq -r 1000GP_Phase3_combined.legend -g -p ALL. This tool produces a script to separate genotype files by chromosome, filter variants to include only reference SNPs, and convert the PLINK filesets into VCF files. Compressed VCF files were then uploaded to the Michigan Imputation Server for use in the Minimac4 imputation pipeline49,76. TOPMed Freeze5 was used as the reference panel for imputation50. Imputed genotypes were filtered for an imputation quality score (R2) greater than 0.3, Hardy-Weinberg equilibrium p-value greater than 1e-6, and a minor allele frequency greater than 0.01.
Sample Quality Control
Sample sex was called from the genotype data using PLINK v1.9 based on X chromosome heterozygosity. Sex was confirmed by checking expression of XIST within the mRNA-sequencing data. Of the 223 samples, two were declared female using the genotype data but male by XIST expression and were excluded from downstream analysis. We further sought to detect sample swaps or mixtures by evaluating the consistency of genotypes called via genotyping and those that can be detected by RNA-sequencing using VerifyBamID v1.1.377. We detected four samples that were mixtures or possible sample swaps with their assigned genotype data (FREEMIX > 0.04 or CHIPMIX > 0.04). Finally, five samples were identified as outliers via PCA after accounting for known technical confounders (Extended Data Figure 1C). A total of 212 samples (93 females, 119 males, 14-21 gestation weeks) were used in the miRNA-eQTL analysis.
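The exclusion logic above can be summarized in a short sketch. The function `flag_sample`, its argument names, and the reason labels are hypothetical; only the 0.04 cutoff and the sample counts come from the text.

```python
def flag_sample(genotype_sex, xist_sex, freemix, chipmix, max_mix=0.04):
    """Return a list of reasons to exclude a sample (empty list if it passes)."""
    reasons = []
    # Sex called from genotypes must agree with sex inferred from XIST expression.
    if genotype_sex != xist_sex:
        reasons.append("sex_mismatch")
    # VerifyBamID estimates above the cutoff suggest a mixture or a swap.
    if freemix > max_mix or chipmix > max_mix:
        reasons.append("possible_mixture_or_swap")
    return reasons


# Sample accounting from the text: 223 samples, minus 2 sex mismatches,
# 4 mixtures or swaps, and 5 expression PCA outliers.
N_RETAINED = 223 - 2 - 4 - 5
```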
PCA and Differential miRNA Expression Analysis
Principal component analysis (PCA) was performed using the prcomp() function within the stats package of the R software language78. Only miRBase release 22 expressed miRNAs with at least 10 counts across at least 10 samples were included when doing PCA. MiRNA expression was first normalized using the variance-stabilizing transformation (VST) function within DESeq279. To correct for known batch effects (sequencing-pool and RNA-purification method; Extended Data Figure 1), we used the removeBatchEffect() function within the limma package on the VST transformed expression matrix80. While removing batch effects, we preserved the effect of gestation week using the design option within removeBatchEffect(). After known batch effects were removed, PCA was repeated to confirm removal of technical variation across samples and to identify expression outliers (Extended Data Figure 1C).
Differential expression analysis was conducted on the 212 samples which survived quality control filtering using DESeq279. All known and novel miRNAs that passed the above expression threshold were used in the analysis. Gestation week was used as the treatment variable while controlling for technical confounding variables: sequencing-pool, RNA-purification method, RNA-integrity number (RIN), and RNA concentration after extraction. Differentially expressed miRNAs with a Benjamini-Hochberg adjusted p-value < 0.1 were deemed significant (false discovery rate (FDR) < 10%). For visualization of differentially expressed miRNAs (Figure 1C), log2(fold change) values were shrunk using the apeglm method81.
local-miRNA-eQTL Mapping
We conducted local-eQTL mapping using tissue samples from 212 donors. A total of 866 miRNAs with an expression of at least 10 counts across at least 10 samples were included in this analysis. Since expressed miRNAs can originate from more than one genomic locus, associations were conducted at a total of 907 genomic loci. MiRNA counts were normalized using a variance-stabilizing transformation function within DESeq279. Normalized expression values were adjusted using a linear model accounting for population stratification as well as known and unknown confounders. Known confounders included: sequencing pool, RNA purification method, RNA integrity number (RIN), sex, and donor gestation week. Unobserved confounding variables on miRNA expression were controlled using the first 10 principal components from a PCA. Population stratification was controlled using the first 10 principal components from a genotype PCA using PLINK v1.9.
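The expression threshold can be sketched as follows, under the assumed reading that a miRNA must have at least 10 counts in each of at least 10 samples. The function name `is_expressed` is hypothetical.

```python
def is_expressed(counts, min_count=10, min_samples=10):
    """Return True if at least min_samples samples each have >= min_count reads.

    counts: per-sample read counts for a single miRNA.
    """
    n_passing = sum(1 for c in counts if c >= min_count)
    return n_passing >= min_samples
```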
Association testing between variants and residual miRNA expression (after adjusting for confounders) was done using a linear mixed-effects model as implemented in the EMMAX package51. To further control for cryptic relatedness and population stratification, we included an identity-by-state kinship matrix, also constructed using EMMAX (emmax-kin -v -h -s -d 10) by excluding variants located on the same chromosome as the variants tested in the association analysis (MLMe method52). To prevent a single outlier from driving the association results, variants were filtered before association testing to include only those that did not have exactly one homozygous-minor sample and that had more than one heterozygous sample. Variants within 1 Mb upstream of the mature miRNA start position or 1 Mb downstream of the mature miRNA end position were tested for association. Using EMMAX, imputed variant dosages were used for association testing (emmax -v -d 10 -Z -t [variant_dosages] -k [kinship_mat] -o [output_file] -p [expression_file]).
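The two variant-selection rules in this paragraph, the genotype-count filter and the 1 Mb local window, can be written as a minimal sketch. The function names are hypothetical, and positions are assumed to be base-pair coordinates on the same chromosome.

```python
def passes_genotype_count_filter(n_hom_minor, n_het):
    """Reject variants with exactly one homozygous-minor sample, and require
    more than one heterozygous sample."""
    return n_hom_minor != 1 and n_het > 1


def in_local_window(variant_bp, mir_start, mir_end, window=1_000_000):
    """True if the variant lies within `window` bp of the mature miRNA
    (from start - window through end + window)."""
    return mir_start - window <= variant_bp <= mir_end + window
```

Because the window is symmetric around the mature miRNA, the strand of the miRNA does not change which variants are tested.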
For multiple-testing adjustment we employed a two-stage analysis which accounts for linkage disequilibrium between the variants tested (local adjustment) and the total number of miRNAs tested (global adjustment). First, p-values were adjusted locally using the eigenMT package82. Locally adjusted p-values were then further adjusted using the Benjamini-Hochberg multiple testing correction for a 5% false discovery rate (FDR < 0.05) across the 907 genomic loci tested. This yielded a nominal p-value of 1.434×10⁻⁶ as the stringent threshold below which variants were declared significantly associated with miRNA expression. We also declared a relaxed threshold using only global adjustment with a 5% FDR across the 6.3 million independent association tests, which yielded a nominal p-value threshold of 2.804×10⁻⁵.
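The global adjustment step uses the Benjamini-Hochberg procedure, which can be sketched in plain Python as below. This covers only the second stage; the local eigenMT adjustment is not reproduced, and the input is assumed to be one p-value per test (for example, one locally adjusted p-value per locus). The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Tests whose adjusted value falls below 0.05 would be declared significant at a 5% FDR.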
To declare conditionally independent local-miRNA-eQTLs, primary eQTLs were first defined as the most significant variant/miRNA pair for each expressed miRNA with at least one variant below the given nominal p-value threshold. An emiR was defined as a miRNA that has at least one variant associated with it. For each emiR, variant association testing was repeated for variants within the original 2 Mb window with the genotypes of the primary eQTL added to the association equation. The most significant variants below the original nominal p-value threshold (1.434×10⁻⁶ for stringent or 2.804×10⁻⁵ for relaxed) within this secondary analysis (if any) were defined as secondary eQTLs. The process was repeated (with the inclusion of primary and secondary genotypes in the association equation to find tertiary eQTLs, primary, secondary, and tertiary to find quaternary eQTLs, etc.) until no variants remained below the nominal p-value threshold.
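The iterative conditional procedure can be outlined as a forward-selection loop. In this sketch the association test itself is abstracted behind a hypothetical callable, `run_association`, which is assumed to return nominal p-values for all variants in the window after conditioning on the variants already selected; it stands in for the EMMAX step and is not an actual interface of that package.

```python
def conditional_eqtls(run_association, threshold):
    """Return conditionally independent eQTL variants for one emiR.

    run_association(conditioned_on) -> {variant_id: nominal_p}, where
    conditioned_on is a tuple of variant ids already in the model.
    The position in the returned list gives the degree
    (index 0 = primary, 1 = secondary, and so on).
    """
    selected = []
    while True:
        pvals = run_association(tuple(selected))
        candidates = {
            v: p for v, p in pvals.items()
            if v not in selected and p < threshold
        }
        if not candidates:
            return selected
        # Add the most significant remaining variant to the model.
        selected.append(min(candidates, key=candidates.get))
```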
Colocalization Analysis
Colocalization of local-miRNA-eQTLs with brain-relevant trait GWAS summary statistics (Supplementary Table 4), blood local-miRNA-eQTLs15, and fetal brain local-mRNA-e/sQTLs from a largely overlapping sample9 was done by first finding overlap between the set of variants in LD (r² ≥ 0.8) with each local-miRNA-eQTL index variant and the set of variants in LD (r² ≥ 0.8) with each trait GWAS, disorder GWAS, eQTL, or sQTL index variant. LD for local-miRNA-eQTL, local-mRNA-eQTL, and local-mRNA-sQTL variants was calculated using the genotypes of the mixed-ancestry samples within each study. LD for GWAS and blood cis-miRNA-eQTL variants was calculated from 1000 Genomes phase 3 European genotypes83. After overlaps were detected, colocalization was confirmed by conditional analysis which incorporated the genotypes for a given overlapping index variant into the local-miRNA-eQTL association equation. A resultant increase in p-value beyond the stringent or relaxed p-value threshold confirmed a colocalization between the local-miRNA-eQTL index variant and the given trait GWAS, disorder GWAS, or QTL index variant.
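The two-step logic, LD-based overlap followed by conditional confirmation, can be sketched as below. The data layout (a dictionary from each index variant to the set of variants in LD with it, including itself) and both function names are assumptions made for illustration.

```python
def ld_overlap(mirqtl_proxies, other_proxies):
    """Find index-variant pairs whose LD proxy sets share at least one variant.

    Each argument maps an index variant id to the set of variant ids in LD
    (r2 >= 0.8) with it, including the index variant itself.
    """
    hits = []
    for mir_index, mir_set in mirqtl_proxies.items():
        for other_index, other_set in other_proxies.items():
            shared = mir_set & other_set
            if shared:
                hits.append((mir_index, other_index, sorted(shared)))
    return hits


def colocalization_confirmed(conditional_p, threshold):
    """Confirmed if conditioning on the other index variant pushes the
    miRNA-eQTL p-value above the significance threshold."""
    return conditional_p > threshold
```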
eQTL Enrichment Analysis
Enrichment of local-miRNA-eQTLs within functionally annotated genomic regions was done using GARFIELD v284 in order to control for the distance to transcription start sites, LD, minor allele frequency (MAF) of the tested variants, and the number of effective tests across multiple annotations. Functional annotations were derived from the Roadmap Epigenomics Project53 for male and female fetal brain (E081 and E082), using the ChromHMM Core 15-state model85. MAF and LD for the variants were derived from the 212 mixed-ancestry samples in this study using PLINK v1.9. Minimum local-miRNA-eQTL p-values were used in cases where multiple association tests to different miRNAs were performed at a given variant. Only p-values surviving the stringent significance threshold were used for Figure 2, while other thresholds, including the relaxed threshold, can be seen in Extended Data Figure 3.
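Collapsing multiple association tests to a single p-value per variant amounts to a grouped minimum, sketched here. The input format, a list of (variant, miRNA, p-value) tuples, and the function name are assumptions.

```python
def min_p_per_variant(associations):
    """Return {variant_id: smallest p-value across all miRNAs tested}.

    associations: iterable of (variant_id, mirna_id, p_value) tuples.
    """
    best = {}
    for variant, _mirna, p in associations:
        if variant not in best or p < best[variant]:
            best[variant] = p
    return best
```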
Comparison to Blood miRNA-eQTLs
To assess the cell-type specificity of miRNA-eQTLs we calculated the π1 statistic55. Blood miRNA-eQTLs were first defined as the emiR-variant pair with the lowest p-value for the 76 emiRs found in the blood miRNA-eQTL analysis15. Of the 76 emiRs, 52 of these miRNAs were expressed in brain tissue (at least 10 counts in at least 10 samples). At the 52 blood miRNA-eQTLs, nominal p-values from brain miRNA-eQTL association analysis were used to compute the π0 value using the qvalue() function in the qvalue package86. The π1 statistic was then defined as 1 - π0. To estimate the standard error, we performed 100 bootstrap resamplings and computed a 95% confidence interval for each π1 statistic. An analogous calculation was done using mRNA-eQTLs from an overlapping cortical tissue dataset9 and whole blood mRNA-eQTLs reported by GTEx56.
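A simplified version of the π1 calculation is sketched below. It uses Storey's estimator of π0 at a single fixed λ rather than the smoothed estimate over a range of λ values that the qvalue package computes by default, and it uses a percentile bootstrap interval, so its output will not match the package exactly. All function names, the choice λ = 0.5, and the fixed random seed are assumptions.

```python
import random


def pi0_fixed_lambda(pvalues, lam=0.5):
    """Storey's pi0 estimate at one lambda: the share of p-values above lam,
    rescaled by (1 - lam) and capped at 1."""
    m = len(pvalues)
    pi0 = sum(1 for p in pvalues if p > lam) / (m * (1 - lam))
    return min(pi0, 1.0)


def pi1(pvalues, lam=0.5):
    """Estimated fraction of non-null tests, defined as 1 - pi0."""
    return 1.0 - pi0_fixed_lambda(pvalues, lam)


def bootstrap_pi1_interval(pvalues, n_boot=100, lam=0.5, seed=0):
    """Approximate 95% percentile interval for pi1 from n_boot resamples."""
    rng = random.Random(seed)
    estimates = sorted(
        pi1(rng.choices(pvalues, k=len(pvalues)), lam) for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(0.975 * (n_boot - 1))]
    return lower, upper
```

Here `pvalues` would hold the brain nominal p-values at the 52 blood miRNA-eQTLs.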
Lentiviral Vector Cloning and Virus Production
To create the miR-4707 expression vector, the tetracycline-inducible, lentiviral expression vector pTRIPZ (ThermoFisher RHS4750) was modified by replacing the sequence for red fluorescent protein (RFP) with the sequence for enhanced green fluorescent protein (EGFP) between the AgeI and ClaI restriction enzyme cut sites. The stem-loop sequence for hsa-mir-4707 (miRBase release 22) was inserted into the multiple cloning site of pTRIPZ-EGFP using XhoI and EcoRI restriction enzymes.
pTRIPZ-mir-4707-EGFP lentivirus was produced in HEK293 cells. HEK293 cells were cultured to 90-95% confluency in DMEM (Life Technologies 11995081) supplemented with 10% FBS (Sigma-Aldrich F2442) and 1x Antibiotic-Antimycotic (Life Technologies 15240096) in 10 cm tissue culture treated plates. Cells were triple transfected with 10 μg transfer plasmid, 7.5 μg psPAX2 (Addgene plasmid #12260), and 2.5 μg pMD2.G (Addgene plasmid #12259) using FUGENE HD (Promega E2311). After 24 hours, media was replaced with 12 mL of 1x proliferation base media without the growth factors EGF, FGF, LIF, and PDGF (see progenitor cell culture protocol below). At 24 hours post media change, culture supernatant was filtered through a 0.45 μm syringe filter, aliquoted, and stored at −80°C in single-use aliquots. Lentivirus was titered using the qPCR Lentivirus Titration Kit (Applied Biological Materials LV900).
Primary Human Neural Progenitor Cell Culture
Primary human Neural Progenitor Cells (phNPCs) derived from developing cortical tissue were cultured as described previously8,25,60. Two donor phNPC lines were used: Donor 54 (D54, gestation week 15.5, male, genotype G/G at rs4981455) and Donor 88 (D88, gestation week 14, male, genotype A/A at rs4981455). PhNPCs were grown on tissue culture treated plates that were coated with Growth Factor Reduced Matrigel (Corning 354230) at 50 μg/mL in 1xPBS at 37°C for 1 hr. To maintain phNPCs in a proliferative state, they were cultured in 1x proliferation media consisting of Neurobasal A (Life Technologies 10888-022) with 100 μg/ml primocin (Invivogen, ant-pm-2), 10% BIT 9500 (Stemcell Technologies 09500), 1x glutamax (Life Technologies 35050061), 1 μg/ml heparin (Sigma-Aldrich, H3393-10KU), 20 ng/ml EGF (Life Technologies PHG0313), 20 ng/ml FGF (Life Technologies PHG0023), 2 ng/ml LIF (Life Technologies PHC9481), and 10 ng/ml PDGF (Life Technologies PHG1034). PhNPC lines were split once per week using 0.05% Trypsin-EDTA (Life Technologies 25300062) into 1x proliferation media. Every two or three days, half of the culture media was replaced with 2x proliferation media: Neurobasal A with 100 μg/ml primocin, 10% BIT 9500, 1x glutamax, 1 μg/ml heparin, 40 ng/ml EGF, 40 ng/ml FGF, 4 ng/ml LIF, and 20 ng/ml PDGF.
For phNPC proliferation experiments (Figure 6), cells were plated at 4×10⁵ and 2×10⁴ cells/well in Matrigel-coated 6-well (for RNA extractions: Corning 3516) and 96-well (for immunocytochemistry: Corning 3598) plates respectively with 1x proliferation media. At 24 hrs post plating, cells were transduced with pTRIPZ-mir-4707-EGFP or control (pTRIPZ-EGFP) at 20 IU/cell in 1x proliferation media with 1 μg/ml doxycycline to express the miRNA (Sigma-Aldrich D9891). Media was changed at 24 hrs post transduction with 1x proliferation media with doxycycline, and plates were fed every two days by changing half of the culture media with 2x proliferation media with 2 μg/ml doxycycline. At 8 days post transduction, 6-well plates were used for RNA extraction, and 96-well plates were EdU labeled and fixed for immunofluorescence staining.
RNA Extraction and qPCR for miRNA and mRNA Expression
RNA was extracted from 6-well plates using miRNeasy Mini Kits (QIAGEN 217004) with the inclusion of an on-column DNase digestion (QIAGEN 79254). Following elution with RNase-Free water, RNA concentration and quality was assessed with a NanoDrop ND-1000 Spectrophotometer. For cDNA synthesis of mRNA, iScript cDNA Synthesis Kits (Bio-Rad 1708891) were used with an input of 200 ng of total RNA. For cDNA synthesis of miRNA, TaqMan Advanced miRNA cDNA Synthesis Kits (ThermoFisher Scientific A28007) were used with 10 ng of total RNA input. Delta cycle threshold values (ΔCt) were calculated using the housekeeping gene EIF4A2, and fold change was calculated using control samples:
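The conventional 2^(-ΔΔCt) formulation is consistent with this description and is given here as an assumption about the form of the calculation; the symbols are introduced for illustration only:

```latex
\begin{align}
\Delta C_t &= C_t^{\text{target}} - C_t^{\text{housekeeping}} \\
\Delta\Delta C_t &= \Delta C_t^{\text{sample}} - \overline{\Delta C_t}^{\,\text{control}} \\
\text{fold change} &= 2^{-\Delta\Delta C_t}
\end{align}
```

Here the housekeeping gene is EIF4A2 for mRNA assays, and the overline denotes the mean ΔCt across control samples.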
To assay expression of mRNA, SsoAdvanced Universal SYBR Green Supermix (Bio-Rad 1725271) was used in 10 μL reactions on a 384-well plate. Reactions contained 2 μL of template cDNA (iScript reaction diluted 1:5 with water) and 500 nM of each forward and reverse primer (Supplementary Table 5). Primers were chosen from the PrimerBank database87. Reactions were placed in a QuantStudio 5 (Applied Biosystems) thermocycler and cycled for 40 cycles according to the SsoAdvanced protocol. To assay expression of miRNA, TaqMan Advanced miRNA Assays (ThermoFisher A25576) were used with probes against hsa-miR-361-5p as a housekeeping control and hsa-miR-4707-3p (Assay ID: 478056 and 479946). Reactions consisted of 2.5 μL of template cDNA (miRNA cDNA synthesis reaction diluted 1:10 with water), 0.5 μL of probe, and 5 μL of TaqMan Fast Advanced Master Mix (ThermoFisher 4444557) in a 10 μL reaction. Reactions were placed in a QuantStudio 5 thermocycler and cycled for 45 cycles according to the TaqMan Advanced miRNA Assay protocol. Delta cycle threshold values were calculated using hsa-miR-361-5p expression as the housekeeping gene, and fold change was calculated using control samples with the above equations. Technical replicates were defined as independent wells on a 6-well plate. Technical replicates ranged from three to six; see figure legends for the number of technical replicates in each experiment. Significant differences were defined as a p-value < 0.05 from a two-sided t-test.
EdU Assay, Immunofluorescence Labeling and Imaging
To measure proliferating phNPCs we measured incorporation of 5-ethynyl-2’-deoxyuridine (EdU) into newly synthesized DNA using a Click-iT EdU Imaging Kit with Alexa Fluor 647 (Invitrogen C10340). At 8 days post transduction, phNPCs in 1x proliferation media in 96-well plates were labeled with the addition of 10 μM EdU at 37 °C for 2 hours. After the 2 hour incubation, cells were immediately fixed with 3.7% Paraformaldehyde (PFA) in PBS (FisherScientific 50-980-487) for 15 minutes. Fixed cells were then permeabilized with 0.5% Triton X-100 (Sigma-Aldrich T8787) for 20 minutes. Staining with Alexa Fluor 647 was conducted using the Click-iT EdU Imaging Kit protocol.
Fixed and permeabilized phNPCs in 96-well plates were blocked with 10% goat serum (MP Biomedicals 0219135680) in PBST: 0.02% Tween-20 (FisherScientific BP337500) in 1x PBS, for 1 hr at room temperature. Cells were labeled with primary antibody in 3% goat serum/PBST at 4 °C overnight using 1:500 chicken-anti-GFP (FisherScientific AB16901). After overnight incubation, plates were washed three times for 5 minutes each with PBST. Secondary antibody labeling was done at 1:1000 dilution in 3% goat serum/PBST at room temperature for 1 hr using goat-anti-chicken-AF488 (ThermoFisher A11011). After 1 hr incubation, a 1:1000 dilution of DAPI (ThermoFisher 62248) in PBST was added to the secondary antibody solution for 10 min at room temperature. Plates were then washed three times for 5 minutes each with PBST. Plates were stored at 4°C in 0.04% sodium azide (Sigma-Aldrich S2002) in PBS until imaging.
Plates were imaged using a Nikon Eclipse Ti2 microscope set up for high content image acquisition. Each well was imaged with 4 non-overlapping fields of view using 10x magnification and a 0.3 numerical aperture using 3 filter sets: DAPI (ex: 325-375, em: 435-485), GFP (ex: 450-490, em: 500-550), and AF647 (ex: 625-655, em: 665-715). Image sets were fed through an automated CellProfiler88 pipeline to quantify the total number of nuclei, GFP-positive nuclei, and AF647 (EdU)-positive nuclei in each image. Technical replicates were defined as independent wells of a 96-well plate. Each condition had 14 replicate wells. Significant differences were defined as a p-value < 0.05 from a two-sided t-test.
Supplementary Tables
Supplementary Table 1: Expressed known and novel miRNAs
UNIQUE_NAME: combination of primary-miRNA and mature-miRNA name to create a unique identifier for miRNAs with the same sequence that originate from multiple genomic loci.
ID: miRBase ID, NAME for non-miRBase miRNAs.
ALIAS: miRBase alias, NAME for non-miRBase miRNAs.
NAME: miRBase name, for non-miRBase miRNAs the identifier given in the respective publications or identifier given by miRge2.0 or miRDeep2 packages.
DERIVES_FROM: ID of primary-miRNA.
DERIVES_FROM_NAME: NAME of primary-miRNA.
SOURCE: miRBase_v22 for known miRNAs cataloged in miRBase release 22. Friedlander2014 or Nowakowski2018 for novel miRNAs reported in those respective publications. miRge or miRDeep2 for novel miRNAs discovered in this study.
TYPE: miRNA for mature miRNAs in miRBase, miRNA_putative_mature or miRNA_putative_star for novel miRNAs.
SCORE: NA for miRBase and Nowakowski2018 miRNAs. Friedlander2014 miRNAs reported a confidence score 1-4. miRge quality scores 0-1. miRDeep2 scores >= 0.
CHR: chromosome where the mature miRNA is located.
START_hg38: start base-pair position for mature miRNA using hg38.
END_hg38: end base-pair position for the mature miRNA using hg38.
WIDTH: base-pair width of the mature miRNA.
STRAND: genomic strand of the mature miRNA.
SEQUENCE: miRNA sequence.
MEAN_VST_EXPRESSION: mean variance-stabilizing transformation expression.
Supplementary Table 2: Differentially expressed known and novel miRNAs.
BASE_MEAN: mean expression as reported by DESeq2.
LOG2_FOLD_CHANGE: log2 transformed fold-change as reported by DESeq2 on the treatment variable, gestation week. Positive values indicate enrichment within late gestation week tissues. Negative values indicate enrichment within early gestation week tissue.
LFC_SE: standard error on fold change as reported by DESeq2.
PVALUE: p-value as reported by DESeq2.
PADJ: Benjamini-Hochberg adjusted p-value as reported by DESeq2.
SIGNIFICANT: logical, TRUE if PADJ is below 0.1.
NAME: same as in Supplementary Table 1.
ID: same as in Supplementary Table 1.
ALIAS: same as in Supplementary Table 1.
DERIVES_FROM: same as in Supplementary Table 1.
SOURCE: same as in Supplementary Table 1.
TYPE: same as in Supplementary Table 1.
SCORE: same as in Supplementary Table 1.
SEQUENCE: same as in Supplementary Table 1.
Supplementary Table 3: Mid-gestation cortical tissue miRNA-eQTLs.
eQTL: unique eQTL identifier, combination of emiR and eSNP.
emiR: same as UNIQUE_NAME in Supplementary Table 1.
eSNP: unique variant identifier, combination of chromosome, base-pair position, and variants.
BETA: eQTL association effect size after fitting to the linear mixed model using EMMAX.
P: nominal p-value on linear mixed model fitting using EMMAX.
DEGREE: degree to which the eQTL is conditionally-independent.
SNP_CHR: variant chromosome.
SNP_BP_hg38: variant base-pair position using hg38.
EFFECT_ALLELE: effect allele used in the linear mixed model association by EMMAX.
REF: reference allele using hg38.
ALT: alternate allele using hg38.
ALT_CTS: alternate allele counts. Summed allelic dosage across all samples in the analysis.
OBS_CT: total allele counts. Number of samples x2.
A1: A1 allele as defined by plink1.9, usually minor allele.
A2: A2 allele as defined by plink1.9, usually major allele.
A1_HOM_COUNT: number of homozygous A1 samples.
HET_COUNT: number of heterozygous samples.
A2_HOM_COUNT: number of homozygous A2 samples.
miR_CHR: miRNA chromosome.
miR_START_hg38: miRNA start position using hg38.
miR_END_hg38: miRNA end position using hg38.
miR_WIDTH: miRNA width in base pairs.
miR_STRAND: miRNA genomic strand.
SOURCE: same as in Supplementary Table 1.
TYPE: same as in Supplementary Table 1.
ID: same as in Supplementary Table 1.
ALIAS: same as in Supplementary Table 1.
NAME: same as in Supplementary Table 1.
DERIVES_FROM: same as in Supplementary Table 1.
DERIVES_FROM_NAME: same as in Supplementary Table 1.
SEQUENCE: same as in Supplementary Table 1.
SIGNIFICANCE: label for significance threshold used to define significant eQTLs, eigenMT_fdr5percent is the stringent threshold and fdr5percent is the relaxed threshold, see Methods.
NOM_P_VALUE_THRESHOLD: nominal p-value threshold used to define significant eQTLs.
Supplementary Table 4: Colocalizations
Sheet1: miRNA-eQTL/mRNA-eQTL colocalizations. Columns with .mirQTL suffix refer to the miRNA-eQTL analysis while .mQTL refer to the mRNA-eQTL analysis8,9.
eQTL.mirQTL: unique miRNA-eQTL identifier, combination of emiR and eSNP.
emiR.mirQTL: same as UNIQUE_NAME in Supplementary Table 1.
eSNP.mirQTL: unique variant identifier, combination of chromosome, base-pair position, and variants.
BETA.mirQTL: eQTL association effect size after fitting to the linear mixed model using EMMAX.
P.mirQTL: nominal p-value on linear mixed model fitting using EMMAX.
DEGREE.mirQTL: degree to which the eQTL is conditionally-independent.
SNP.CHR.mirQTL: variant chromosome.
SNP.BP.hg38.mirQTL: variant base-pair position using hg38.
EFFECT.ALLELE.mirQTL: effect allele used in the linear mixed model association by EMMAX.
REF.mirQTL: reference allele using hg38.
ALT.mirQTL: alternate allele using hg38.
SIGNIFICANCE.mirQTL: label for significance threshold used to define significant eQTLs, eigenMT_fdr5percent is the stringent threshold and fdr5percent is the relaxed threshold, see Methods.
SNP.mQTL: variant identifier.
ENSG.mQTL: Ensembl gene ID.
BETA.mQTL: mRNA-eQTL beta value.
P.mQTL: mRNA-eQTL p-value.
CHR.mQTL: variant chromosome.
BP.mQTL: variant base-pair position on hg38.
RANK.mQTL: conditional analysis rank.
BETA.CONDITIONAL.mQTL: beta after conditioning.
P.CONDITIONAL.mQTL: p-value after conditioning.
ALLELE_MINOR.mQTL: minor allele in the mRNA-eQTL dataset.
ALLELE_MAJOR_EFFECT.mQTL: effect allele. Major allele in the mRNA-eQTL dataset.
eQTL.mQTL: unique mRNA-eQTL identifier, combination of SNP.mQTL and ENSG.mQTL.
Sheet2: miRNA-eQTL/mRNA-sQTL colocalizations. Columns with .mirQTL suffix refer to the miRNA-eQTL analysis while .mQTL refer to the mRNA-sQTL analysis8,9.
eQTL.mirQTL: see Supplementary Table 4, Sheet 1.
emiR.mirQTL: see Supplementary Table 4, Sheet 1.
eSNP.mirQTL: see Supplementary Table 4, Sheet 1.
BETA.mirQTL: see Supplementary Table 4, Sheet 1.
P.mirQTL: see Supplementary Table 4, Sheet 1.
DEGREE.mirQTL: see Supplementary Table 4, Sheet 1.
SNP.CHR.mirQTL: see Supplementary Table 4, Sheet 1.
SNP.BP.hg38.mirQTL: see Supplementary Table 4, Sheet 1.
EFFECT.ALLELE.mirQTL: see Supplementary Table 4, Sheet 1.
REF.mirQTL: see Supplementary Table 4, Sheet 1.
ALT.mirQTL: see Supplementary Table 4, Sheet 1.
SIGNIFICANCE.mirQTL: see Supplementary Table 4, Sheet 1.
snp.mQTL: unique variant identifier, combination of chromosome, base-pair position, and variants.
intron.mQTL: unique intron identifier, combination of chromosome, base-pair start and end positions, and cluster identifier.
beta.mQTL: mRNA-sQTL beta value.
pvalue.mQTL: mRNA-sQTL p-value.
chr.mQTL: sQTL chromosome.
rank.mQTL: degree to which the sQTL is conditionally independent.
cond.beta.mQTL: mRNA-sQTL beta value at conditional rank.
cond.pval.mQTL: mRNA-sQTL p-value at conditional rank.
clusterID.mQTL: unique cluster identifier.
gene.mQTL: gene symbol.
ensemblID.mQTL: Ensembl gene ID.
transcripts.mQTL: Ensembl transcript ID.
BP.mQTL: sQTL base-pair.
rsid.mQTL: rsid of sQTL.
Sheet3: miRNA-eQTL brain/blood colocalizations
eQTL.mirQTL: see Supplementary Table 4, Sheet 1.
emiR.mirQTL: see Supplementary Table 4, Sheet 1.
eSNP.mirQTL: see Supplementary Table 4, Sheet 1.
BETA.mirQTL: see Supplementary Table 4, Sheet 1.
P.mirQTL: see Supplementary Table 4, Sheet 1.
DEGREE.mirQTL: see Supplementary Table 4, Sheet 1.
SNP.CHR.mirQTL: see Supplementary Table 4, Sheet 1.
SNP.BP.hg38.mirQTL: see Supplementary Table 4, Sheet 1.
EFFECT.ALLELE.mirQTL: see Supplementary Table 4, Sheet 1.
REF.mirQTL: see Supplementary Table 4, Sheet 1.
ALT.mirQTL: see Supplementary Table 4, Sheet 1.
SIGNIFICANCE.mirQTL: see Supplementary Table 4, Sheet 1.
NOM.P.VALUE.THRESH.mirQTL: see Supplementary Table 4, Sheet 1.
snpID.bloodQTL: rsid for blood eQTL.
Estimate.bloodQTL: blood eQTL beta.
Pval.bloodQTL: blood eQTL p-value.
hsa_miR_name.bloodQTL: miRBase miRNA name.
effect.bloodQTL: effect variant for blood eQTL.
noneffect.bloodQTL: non-effect variant for blood eQTL.
Sheet4: GWAS data sources
TRAIT: trait or disorder name.
PMID: PubMed ID for published article associated with each dataset.
DATA_LINK: link to data download site.
Supplementary Table 5: qPCR primers
GENE: gene name.
NCBI_GENE_ID: NCBI gene ID.
PRIMER_BANK_ID: primer bank ID.
AMPLICON_SIZE: distance between forward and reverse primer on gene transcript.
FORWARD_PRIMER: forward primer.
REVERSE_PRIMER: reverse primer.
Author Contributions
MJL and JLS prepared the manuscript. LTU and JLS collected the samples. MJL and JLS designed the bioinformatic analyses. MJL, JLS, and LTU designed the validation experiments. MJL performed the bioinformatic analyses and validation experiments. NA performed the mRNA-e/sQTL analysis. DL preprocessed the genotype data. NA, OK, DL, and JMW supported bioinformatic analyses and validation experiments. DHG and JLS secured funding to support the work. JLS oversaw the work.
Acknowledgements
This work was supported by the National Institutes of Health (R01MH120125, R01MH118349, U54EB020403, R00MH102357 to JLS). Tissue was collected from the UCLA CFAR (5P30 AI028697). RNA-seq libraries were sequenced by the UCLA Neuroscience Genomics Core. This work was supported by a grant from the National Institute of General Medical Sciences under award 5T32GM067553-13 to MJL. The UNC Neuroscience microscopy core is supported by the NICHD (U54HD079124).