Abstract
Amniocentesis is typically performed to identify large chromosomal abnormalities within the fetus. Here we demonstrate that it is feasible to generate an accurate whole genome sequence (WGS) of a fetus from an amniotic sample. DNA from the cells and from the amniotic fluid was isolated and sequenced from 31 amniocenteses. Concordance of variant calls between the two DNA sources and with parental libraries was high. Two fetal genomes were found to harbor potentially detrimental variants in CHD8 and LRP1; variations in these genes have been associated with Autism Spectrum Disorder (ASD) and Keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. In this study, we demonstrate for the first time the sequencing of the whole genome of fetuses from amniotic fluid and show that much more information than large chromosomal abnormalities can be gained from an amniocentesis.
Introduction
Amniocentesis is a common procedure performed on over 200,000 women a year in the United States alone. It is currently performed on women who are considered to be at a higher risk for pregnancy complications because of their advanced age or to further investigate an abnormal blood or ultrasound test result (e.g., suspected Down syndrome as a result of an extra chromosome 21 detected by noninvasive prenatal testing (NIPT)). The procedure involves the insertion of a needle through the wall of the uterus and the amniotic sac to collect approximately 20 ml of amniotic fluid. Cells from the fluid are collected through centrifugation, cultured, and after approximately two weeks analyzed by fluorescent in situ hybridization (FISH) or a microarray to detect abnormal chromosomal copy number changes or large chromosomal structural rearrangements. In some cases a small number of specific genes are examined for single to multi-base changes. These tests have become the gold standard for detecting Down syndrome and several other serious birth defects because they have a low false positive rate; however, they are unable to detect the majority of birth defects. Currently, there are over 5,000 genes associated with a genetic disease for which a gene test is offered according to GeneTests (www.genetests.org). In addition, recent studies have defined large sets of genes in which coding variants are associated with Autism (Michaelson et al. 2012; O’Roak et al. 2012; Sanders et al. 2012; De Rubeis et al. 2014; Iossifov et al. 2014), severe intellectual disability (Gilissen et al. 2014), and other congenital disabilities (de Ligt et al. 2012; Veltman and Brunner 2012; Epi et al. 2013; Yang et al. 2013; Al Turki et al. 2014; Fromer et al. 2014; McCarthy et al. 2014; Purcell et al. 2014; Deciphering Developmental Disorders 2015), and a large population study (Lek et al. 2016) discovered thousands of genes that are intolerant to coding variants, adding to the set of genes that could be disease causing. Taken together, these data provide strong evidence that there are probably thousands of genes for which coding changes or complete loss of function of the gene product are incompatible with life or could result in a serious disease phenotype. This suggests that testing for large chromosomal changes or examining just a few genes is insufficient for detecting most disease-causing genetic defects.
In this study, we developed a modified workflow to enable the WGS analysis of cell-free DNA (cfDNA) from amniotic fluid. For each amniotic sample, we isolated DNA from both the amniotic fluid and the cell pellet. We also collected DNA from the blood of each parent. WGS libraries were made and sequenced for each DNA sample, providing a rich set of genomic data for reproducibility analyses and clinical variant annotation.
Results
Determination of quality of fetal genome data
28 cfDNA and all 31 cell pellet DNA samples yielded high quality data. Coverage across each sample was excellent, with a confident call for both alleles made for ~97% of the genome and ~98% of the exome from both DNA sources (Fig. 1a and b and Table S1). For about half of the cell pellet DNA samples an LFR (Peters et al. 2012) library could successfully be made. These libraries also showed good coverage, with an average of ~96% of the genome and exome called confidently. Approximately 4 million variants per library were called, similar to previous studies on Asian genomes (Genomes Project et al. 2015) (Fig. 1b). For most amniocentesis samples a standard library was made from both the cfDNA and the cell pellet, enabling comparisons of variant calls between each library (Table S2). In general, over 96% of calls were shared between pairwise comparisons at all locations for which both libraries were covered with sufficient reads (Fig. 2a). Additionally, since both parents were sequenced, the fetal genome from each library could be compared with parental calls at each locus as further confirmation that the correct variants were being called at all positions. This showed that ~99% of calls were consistent with variant calls made in the parents (Fig. 2b). Taken together, these results suggest that high quality fetal genomes can be generated from either cfDNA in the amniotic fluid or high molecular weight DNA isolated from the cell pellet.
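The two comparisons described above (agreement between libraries, and consistency of fetal calls with parental calls) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the representation of a genotype as a pair of alleles keyed by (chromosome, position), and the example calls are all hypothetical.

```python
from itertools import product

def concordance(calls_a, calls_b):
    """Fraction of sites called in both libraries whose genotypes agree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    # Allele order within a genotype is irrelevant, so compare sorted alleles.
    same = sum(1 for site in shared
               if sorted(calls_a[site]) == sorted(calls_b[site]))
    return same / len(shared)

def mendelian_consistent(child, mother, father):
    """True if the child's genotype can be built from one maternal
    allele and one paternal allele."""
    return any(sorted((m, f)) == sorted(child)
               for m, f in product(mother, father))

# Hypothetical calls from two libraries of the same fetus.
cfdna = {("chr1", 100): ("A", "G"),
         ("chr1", 200): ("C", "C"),
         ("chr2", 50): ("T", "T")}
pellet = {("chr1", 100): ("G", "A"),
          ("chr1", 200): ("C", "T"),
          ("chr3", 10): ("A", "A")}

print(concordance(cfdna, pellet))
print(mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
```

Only sites called in both libraries enter the denominator, mirroring the restriction above to locations covered with sufficient reads in both libraries.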
DNM analysis
Previous studies (Gilissen et al. 2014; Peters et al. 2015; Yuen et al. 2015) using Complete Genomics’ genome data have shown that de novo mutations (DNMs) can be detected with a low false positive error rate using appropriate filters (Methods and Materials and Supplementary Methods). Following similar analysis steps, we found approximately 65, 65, and 50 DNMs per fetal genome in the STD libraries from a cell pellet, in the cfDNA libraries, and in the LFR libraries, respectively (Table S3). Pairwise comparisons between cfDNA and cell pellet libraries demonstrated that approximately 88% of DNMs were shared between libraries for each amniocentesis sample (Fig. 2c).
To determine the accuracy of DNM calls and confirm that DNMs identified in the fetal genome were present in the child’s genome, we contacted and collected buccal samples from 13 of the study participants. Potential DNMs were randomly selected for confirmation and several hundred base pairs surrounding each candidate DNM were amplified by PCR. 175 regions were successfully amplified and Sanger sequenced. Of these, 162 (92.5%) were found to harbor the potential DNM (Table S4). Candidates shared between two replicate libraries had a much higher confirmation rate (99.1%), in agreement with inherited variant rates (Fig. 2), suggesting that using replicate libraries to confirm inherited or de novo variant calls is a robust method of evaluation. Potential DNMs were further confirmed to be true DNMs by Sanger sequencing the DNA of each parent. Only 3 of the 89 potential DNMs for which Sanger sequencing was successful for both parents were found to be inherited (Table S4). This suggests the overall false negative rate of our sequencing process is quite low (~3.4%).
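The two proportions quoted above follow directly from the reported counts. The short sketch below recomputes them; the helper name `percent` is ours, and only the counts come from the text. The first quotient works out to 92.57%, which the text reports as 92.5%.

```python
def percent(part, total):
    """Return part/total expressed as a percentage."""
    return 100.0 * part / total

# Counts taken from the text above.
confirmed_in_child = percent(162, 175)  # candidate DNMs seen in the child's buccal DNA
found_in_parent = percent(3, 89)        # candidates that parental Sanger sequencing showed to be inherited

print(f"confirmed in child: {confirmed_in_child:.2f}%")
print(f"inherited after parental Sanger sequencing: {found_in_parent:.2f}%")
```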
The average age of mothers and fathers in our cohort at the time of the amniocentesis procedure was 37.8 and 43.3, respectively (Table S2). It has previously been shown that older fathers contribute a higher number of DNMs to their children than younger fathers (Kong et al. 2012; Jiang et al. 2013). To examine whether this correlation could be seen in our cohort, we plotted the total number of DNMs by maternal and paternal age. This resulted in the previously described pattern of an increase in the total number of DNMs with increasing paternal age. In our cohort paternal age contributed approximately 1.3 DNMs per additional year of age (Fig. 3a). Plotting the same data by maternal age did not show the same pattern; instead, older mothers appeared to contribute fewer DNMs to their children (Fig. 3b). However, this may be due to the small sample size and the trend in our cohort that the oldest fathers tended to have the youngest wives (Fig. S1). Analysis of the base spectrum of DNMs did not differ significantly from that of inherited variants (Fig. S2). For those samples with LFR data the parental origin of most DNMs could be determined. This analysis showed the expected pattern of approximately 1.6X more DNMs from the father (Table S5).
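A per-year contribution such as the one reported above is the slope of a line fitted to DNM count against parental age. The sketch below assumes an ordinary least-squares fit, which the text does not specify; the function name and the ages and counts are hypothetical and are not data from this cohort.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical paternal ages (years) and DNM counts, for illustration only.
paternal_age = [31, 36, 40, 44, 49, 53]
dnm_count = [52, 57, 66, 68, 77, 80]

slope, intercept = least_squares(paternal_age, dnm_count)
print(f"additional DNMs per year of paternal age: {slope:.2f}")
```

With a cohort of this size the slope is sensitive to individual families, and correlated maternal and paternal ages can confound a single-variable fit, as noted above for the maternal trend.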
Detection of copy number variants (CNVs) and structural variants (SVs)
Most of the women in this study were referred for an amniocentesis as a result of a positive noninvasive prenatal test (NIPT) for abnormal chromosomal copy number. For our WGS test to be useful, we should be able to detect these large-scale changes in structure and chromosomal copy number. To determine this, we first compared our read coverage across all chromosomes to karyotyping results from the amniocentesis procedure. Two fetuses were found to carry an extra copy of chromosome 21 by karyotyping; this was also confirmed in our read coverage data for libraries made from cfDNA and cell pellet DNA (Fig. S3). In addition, there were three fetal genomes with known benign polymorphisms in heterochromatin and satellite DNA that were poorly covered by our WGS reads. This suggests that for many types of structural changes karyotyping will be necessary until WGS can be improved in these difficult-to-sequence parts of the genome.
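Detecting a whole-chromosome gain from read coverage can be sketched as follows. This is an illustrative simplification, not the analysis behind Fig. S3: it assumes mean read depth has already been computed per autosome, uses the median across chromosomes as the two-copy baseline, and applies an arbitrary tolerance. All names and depth values are hypothetical.

```python
from statistics import median

def copy_number_estimates(mean_depth_by_chrom, ploidy=2):
    """Scale each chromosome's mean depth so the median chromosome equals `ploidy` copies."""
    baseline = median(mean_depth_by_chrom.values())
    return {chrom: ploidy * depth / baseline
            for chrom, depth in mean_depth_by_chrom.items()}

def flag_aneuploid(estimates, ploidy=2, tolerance=0.3):
    """Return chromosomes whose estimated copy number departs from `ploidy` by more than `tolerance`."""
    return sorted(chrom for chrom, cn in estimates.items()
                  if abs(cn - ploidy) > tolerance)

# Hypothetical mean depths for a few autosomes.
depths = {"chr1": 40.0, "chr2": 40.0, "chr21": 60.0, "chr22": 40.0}

estimates = copy_number_estimates(depths)
print(flag_aneuploid(estimates))
```

Sex chromosomes and poorly covered regions such as heterochromatin would need separate handling in any real analysis.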
However, many smaller changes (< 1 Mb) are difficult for karyotyping or array CGH to detect, but should be much easier for WGS. It is difficult to know what the ground truth is for CNVs and SVs in each genome, but having multiple replicates for each fetal sample and parental genome data enables us to assess the reproducibility of our assay. As with small variants, we compared CNVs and SVs between replicate libraries and also compared them to parental genome calls (Tables S6 and S7). Over 94% of CNVs and 96% of SVs were found in at least one of the parents (Fig. 4a). In addition, ~66% of CNVs and ~74% of SVs overlapped with CNVs and SVs identified as part of the 1KG project (Sudmant et al. 2015) (Fig. 4b). Of those CNVs and SVs that were inherited, 97% and 85% were called between replicate libraries, respectively (Fig. 4c). In general, this demonstrates a high level of reproducibility and suggests that most CNV/SV calls are true positives.
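Comparing CNV or SV calls between libraries requires a rule for deciding when two intervals represent the same event. The sketch below assumes a 50% reciprocal-overlap criterion, a common convention that is not stated in the text; the function names, threshold, and intervals are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals given as (chrom, start, end)."""
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))

def fraction_reproduced(calls, other_calls, threshold=0.5):
    """Fraction of `calls` matched by at least one interval in `other_calls`."""
    if not calls:
        return 0.0
    hits = sum(1 for call in calls
               if any(reciprocal_overlap(call, other) >= threshold
                      for other in other_calls))
    return hits / len(calls)

# Hypothetical CNV calls from two replicate libraries.
library_1 = [("chr1", 100, 200), ("chr2", 0, 100)]
library_2 = [("chr1", 150, 250)]

print(fraction_reproduced(library_1, library_2))
```

The same function can be pointed at parental calls or at a population call set to obtain the other proportions discussed above.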
A total of 8 de novo CNV/SVs of greater than 1 kb were identified within the fetal genomes, excluding the trisomy 21 found in the fetal genomes from families 21 and 22 (Tables S8 and S9). The largest identified was 14 kb. Based on previous studies, de novo CNVs larger than 100 kb are rare in healthy individuals (Sebat et al. 2007; Xu et al. 2008; Conrad et al. 2010; Itsara et al. 2010; Oskoui et al. 2015; Acuna-Hidalgo et al. 2016).
Analysis of disease related genes
There is a growing list of databases that track the association of specific variants with disease. We searched our list of variants for entries in Clinvar (Landrum et al. 2014), a well-known database for associating genomic variants with disease. On average, each fetal genome contained ~1 pathogenic or likely pathogenic variant with assertion criteria and no conflicting interpretations. Most of the potentially disease-causing variants appear to act in a recessive manner (Table S10) and no homozygous or compound heterozygous variants with these criteria were discovered. However, this means that on average each child is a carrier for a potentially serious disease. We identified variants associated with such diseases as severe combined immunodeficiency, limb-girdle muscular dystrophy, and a predisposition to breast cancer. In addition, we found 6 children with different autosomal recessive deafness carrier alleles in the genes GJB2 and TMPRSS3; these alleles are known to be more prevalent in Asian populations. None of the DNMs identified in our study were found in Clinvar.
To further analyze potential disease-causing variants that were not found in Clinvar, we determined the Combined Annotation Dependent Depletion (CADD) (Kircher et al. 2014), SIFT (Kumar et al. 2009), and Polyphen2 (Adzhubei et al. 2010) scores for all rare coding inherited variants (Table S11) and DNMs (Table 1). We also used the ExAC (Lek et al. 2016) database to identify those genes with high pLI and missense Z-scores with variants and/or CNVs/SVs (Tables S8-S9). As a control, these steps were repeated on the genomes of healthy Asian participants of the Personal Genome Project (PGP) (Mao et al. 2016). Based on this analysis the majority of variants appeared to be benign (Fig. 5). There were, however, a small number that merited further examination based on their scores, notably a detrimental DNM in CHD8 in the fetal genome of FAM12. Mutations in this gene have recently been described as being one of the more common causes of ASD and define a particular subtype of the disease (Bernier et al. 2014). Contact with the family of this now 2-year-old boy revealed that he does show at least one of the common phenotypes, macrocephaly, but at this time he does not show symptoms of ASD, nor has he been evaluated for them. In the fetal genome of FAM26 two different heterozygous missense variants, one from each parent, were identified in the gene LRP1 (Table S11). Both are predicted to be detrimental by Polyphen2 and SIFT and have CADD scores above 24. Both variants are listed in ExAC, but are rare, one having been found in only two individuals in the database. In addition, the missense Z-score for this gene is 10.62, suggesting that it is highly intolerant to variation. Variants in this gene have been associated with Keratosis pilaris atrophicans, a skin disease that is not expected to severely affect the health of this child; however, no additional information about whether this child shows any symptoms of the disease is available.
The remainder of the children, as predicted by this genetic screen, have not been reported to have any serious illness.
Analysis of variants in genes targeted by drugs
Apart from diagnosing serious diseases there are other phenotypes that are important to identify in these fetal genomes. A tragic example is the case of a child who died from respiratory depression associated with excessive levels of morphine in the blood after elective adenotonsillectomy. It was later determined that he had a duplication of CYP2D6, making him an ultrafast metabolizer of codeine and explaining the increased levels of its metabolite, morphine. He would likely have been prescribed a different dose or drug for pain management had this information been known (Ciszkowski et al. 2009). WGS analysis of amniotic material could potentially eliminate these types of severe interactions between a drug and an individual’s genetics from birth.
Each fetal genome in our study was analyzed against a list of potential drug interactions cataloged in the DrugBank database (Law et al. 2014). This resulted in over 400 coding variants per fetal genome in genes that are known targets of drugs. Analysis of Asian genomes and other ethnic groups from the 1KG project resulted in a similar number of coding drug target variants (Fig. S4). The vast majority of these variants would not be expected to alter the protein product of these genes in such a way as to cause a serious adverse drug reaction. However, we discovered 381 instances of a variant with a low frequency in the population that resulted in complete loss of one copy of a gene listed in the DrugBank database in at least one of our fetal genomes. Again, it is unclear what effect, if any, these variants would have, and improvements in our understanding of the interaction between drugs and specific variants will be necessary before this type of data can be fully utilized.
Currently, there are a few well-known gene-drug interactions that we can investigate, specifically those involving the cytochrome P450 (CYP450) family, which metabolizes most drugs, and the genes involved in severe reactions to anesthesia. The results of this analysis are summarized in Table 2. Importantly, we discovered that a large number of these children had at least one copy of an inactive or reduced activity CYP450. There are a number of drugs, such as warfarin, where dosing would be altered based on this information. In addition, we identified 4 rare damaging variants in RYR1 and one in CACNA1S. Variants in these genes have been associated with malignant hyperthermia (MH), a serious and sometimes fatal response to anesthesia (Rosenberg et al. 2015). While it is unlikely that all 5 individuals are at risk for MH, this information would alert an anesthesiologist to utilize additional precautions and avoid MH triggering medications during the management of anesthetic care. A caffeine halothane contracture test on a muscle biopsy would also be recommended to confirm MH.
Discussion
In this study, we demonstrate for the first time the complete WGS analysis of amniotic samples from pregnant women. We show that up to 97% of the fetal genome can be confidently called using either DNA isolated from the fetal cell pellet or the amniotic fluid with virtually no difference in quality or coverage between the two sources. This is an important discovery as the leftover amniotic fluid is considered a waste product, suggesting that WGS could be added to amniocentesis testing without interfering with the current standard of care tests. We also demonstrate that LFR libraries of high quality can be made from DNA isolated from the cell pellet, allowing for haplotyping in these samples. We also discovered within the fetal genome almost all CNVs and SVs found in the parental genomes, and many of these also overlapped with 1KG project samples. We identify 65 DNMs per genome from both cell pellet and cfDNA sources, in agreement with previous studies (Acuna-Hidalgo et al. 2016), and show that most of these are shared between the two libraries. In our data we find a previously described (Kong et al. 2012; Jiang et al. 2013) trend towards more DNMs in the genomes of fetuses with older fathers. Importantly, we find that over 92% of the DNMs identified by sequencing a single library from either the cell pellet or cfDNA exist in the newborn child, indicating that this type of analysis is accurate and that the fetal genome is sufficiently predictive of the genome of the child.
In this cohort we discovered a single fetus with a DNM in CHD8. At 2 years of age it is already evident that this child has macrocephaly. Importantly, damaging DNMs in CHD8 are one of the most common causes of simplex cases of ASD and 80% of individuals with ASD and a DNM in CHD8 also display macrocephaly (Bernier et al. 2014). This suggests that the child in our study should be monitored for development of ASD. We also identified a child with compound heterozygous detrimental variants in LRP1, which could cause Keratosis pilaris atrophicans, although we were unable to obtain any additional information about the health of this child beyond that she was born healthy. We also found that many of the children in this study are carriers for a severe disease and importantly identified many that could have potential drug dosage issues due to reduced CYP450 activity. Finally, we identified 5 individuals we would consider to be at risk for MH due to rare variants in RYR1 and CACNA1S. For those individuals, we would suggest that they undergo additional testing before being given general anesthesia or avoid anesthetic medications that are contraindicated for MH susceptible patients.
Importantly, through this analysis we show that much more information can be acquired from a routine amniocentesis procedure. At the current cost of about $1,000 we suggest the process we have described here should be considered as an additional analysis that can augment current karyotyping methods. This type of additional information has the potential to identify many of the causes of serious birth defects that currently go undetected.
Materials and Methods
Sample collection and processing
The institutional review boards of BGI-Shenzhen and the Peking University Shenzhen Hospital provided approval for this study. For each amniocentesis procedure 4 ml of amniotic fluid were sampled and frozen. Frozen amniotic fluid was thawed and centrifuged at 500X g for 5 minutes to pellet cells. The pellet was washed in PBS, followed by cell lysis, and purification by dialysis using a RecoverEase DNA isolation kit (Agilent Technologies, Santa Clara, CA). Approximately 100-1000 ng of high molecular weight genomic DNA were collected from each sample. Samples were concentrated using Microcon 30 kDa columns (Merck Millipore, Billerica, MA) to 105 ul. 5 ul and 100 ul were used for LFR library (Peters et al. 2012) and standard library construction (Drmanac et al. 2010), respectively, as previously described.
cfDNA was isolated from 3 ml of the remaining amniotic fluid supernatant from each sample using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany) and following the manufacturer’s protocol. Samples were eluted in 40 ul TE buffer, which yielded between 80 and 1000 ng of DNA. Samples were brought to 100 ul with the addition of TE buffer and sheared by an E220 instrument (Covaris, Woburn, MA). Sheared samples were processed directly following a modified version of Complete Genomics’ (Mountain View, CA) standard library construction (Drmanac et al. 2010). Due to low starting material several purification steps were eliminated.
High molecular weight DNA was isolated from parental blood samples using a dialysis method as previously described (Peters et al. 2012) and processed using Complete Genomics’ standard library process (Drmanac et al. 2010). Both standard and LFR libraries were sequenced on Complete Genomics’ nanoarray platform. Sequence read mapping and variant calling were performed using Complete Genomics’ custom analysis pipeline (Carnevali et al. 2012). A full description of variant annotation and other analyses using whole genome data can be found in the supplementary materials.
DNM analysis
DNMs in each fetal genome were first identified using the calldiff algorithm in CGA™ Tools (http://www.completegenomics.com/public-data/analysis-tools/cgatools/). Potential DNMs were screened against databases from the 1,000 genomes project (1KG) (Genomes Project et al. 2015), the Personal Genome Project (PGP) (Mao et al. 2016), the Wellderly Project (Erikson et al. 2016), and a Complete Genomics internal variant database to remove any variants that were false negatively called in the parental genomes.
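The logic of this screen can be sketched in a few lines. This is not the calldiff algorithm from CGA Tools; it is a simplified illustration in which each call set is a set of (chromosome, position, reference allele, alternate allele) tuples. The function name and all example variants are hypothetical.

```python
def candidate_dnms(child, mother, father, known_variants):
    """Child variants absent from both parents and from population databases.

    A variant already catalogued in a population database is more likely to be
    an inherited variant that was missed in a parent than a true de novo event.
    """
    parental = mother | father
    return sorted(v for v in child
                  if v not in parental and v not in known_variants)

# Hypothetical call sets.
child = {("chr1", 1000, "A", "G"),
         ("chr2", 2000, "C", "T"),
         ("chr3", 3000, "G", "A"),
         ("chr4", 4000, "T", "C")}
mother = {("chr1", 1000, "A", "G")}
father = {("chr2", 2000, "C", "T")}
population_db = {("chr3", 3000, "G", "A")}

print(candidate_dnms(child, mother, father, population_db))
```

A production filter would also need to account for read depth and call confidence at the same site in each parent, which this sketch omits.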
Sanger sequencing confirmation
To confirm DNMs, buccal samples were collected from newborns using Omni swabs (GE Healthcare Life Sciences, Chicago, IL) and DNA was isolated using a QIAamp DNA Mini kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. High molecular weight DNA from the parents, isolated for standard library construction, and from NA12878 was used to confirm that DNMs were truly de novo and not false negatively called in the parents. PCR primers were designed to encompass approximately 250 base pairs on either side of the candidate DNM using Primer3 (Untergasser et al. 2012). Forward and reverse primers contained M13 forward and reverse primer sequences, respectively, to enable common primers to be used in Sanger sequencing. 1 ng of genomic DNA was used per PCR using AccuPrime Taq DNA polymerase (Thermo Fisher Scientific, Waltham, MA) following the manufacturer’s protocol. Sanger sequencing with both forward and reverse M13 primers was performed by McLab (South San Francisco, CA) and analyzed using Mutation Surveyor (SoftGenetics, State College, PA) with manual inspection of all results.
Data access
Reads and mappings data have been submitted to the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap/) under accession ID phs001283.v1.p1.
Author contributions
B.A.P., R.D., and F.C. conceived the study. Y.D., W.X., and F.C. collected the amniocentesis samples. B.A.P., R.C., and R.Y.Z. developed the lab processes and made the libraries for sequence analysis. Q.M., N.G., Z.L., H.X., Q.S., E.E.P., and B.A.P. performed analyses. B.A.P., W.X., F.C., and R.D. coordinated the study. B.A.P., R.C., and Q.M. wrote the paper. All authors contributed to revision and review of the manuscript.
Supplemental Materials
Supplemental Methods
Supplemental Table S1. Summary statistics of all genomes.
Supplemental Table S2. Library IDs of each sample.
Supplemental Table S3. DNMs.
Supplemental Table S4. DNMs confirmed by Sanger sequencing.
Supplemental Table S5. LFR phasing of DNMs.
Supplemental Table S6. CNV counts.
Supplemental Table S7. SV counts.
Supplemental Table S8. CNVs.
Supplemental Table S9. SVs.
Supplemental Table S10. Clinvar variants.
Supplemental Table S11. Rare variants.
Supplemental Fig. S1. Paternal versus Maternal Age.
Supplemental Fig. S2. Inherited and de novo single base spectrum changes.
Supplemental Fig. S3. Chromosomal copy number analysis.
Supplemental Fig. S4. Gene drug interactions.
Supplemental Fig. S5. PLINK analysis.
Supplemental Fig. S6. Principal component analysis of samples.
Acknowledgments
We would like to acknowledge the ongoing contributions and support of all Complete Genomics and BGI-Shenzhen employees, in particular the many highly skilled individuals that work in the libraries, reagents, and sequencing groups that make it possible to generate high quality whole genome data. This work was supported in part by the Shenzhen Municipal Government of China Peacock Plan NO.KQTD20150330171505310. Employees of BGI and Complete Genomics have stock holdings in BGI.