Chasing a moving target: Detection of mitochondrial heteroplasmy for clinical diagnostics

Clinical interpretation of human mitochondrial DNA (mtDNA) variants has been challenging for technical and biological reasons but the involvement of dysfunctional mitochondria in many diseases makes it imperative to have a validated assay for detecting pathogenic variants. We have tested several methods to identify those best suited to detect and confirm mtDNA variants. The choice of methods is dependent on the amount of DNA available for testing and the sensitivity required for detecting low-level heteroplasmies. There is a tradeoff between a polymerase’s ability to amplify small amounts of DNA and its ability to generate accurate sequence. We report a simple method to measure heteroplasmy levels of large deletions from NGS data alone without need for qPCR or other methods. Use of HapMap samples for standardization needs to be done with caution as most have novel heteroplasmic sites that have arisen during immortalization/cell culture processes. Different batches of DNA can have variable sequence. In contrast, we observed no de novo heteroplasmies in healthy mother-child pairs studied using blood or saliva though the frequency of pre-existing heteroplasmies often changed dramatically across generations. Long-read nanopore sequencing of individuals with two heteroplasmies suggested a random distribution of variants on single molecules but technical artifacts prevent certainty on this finding. Urine provides an additional readily accessible source of mtDNA that can be used for bone marrow transplant recipients whose saliva/blood mtDNA may be contaminated by the BMT donor’s mtDNA. We have characterized cells suspended in urine via expression profiling and shown them to be primarily mucosal cells that are independent of blood. Understanding the pitfalls of the various mtDNA sequencing methods allows development of reliable and accurate tests suitable for clinical diagnostics. Author Summary Mitochondrial DNA is important for many diseases but it is present at many copies per cell so is harder to check for mutations compared to nuclear DNA. We have studied mitochondrial DNA in different ways to see how it changes across generations and in different locations in the body. The tests need to be much more sensitive than nuclear DNA tests so that we can detect mutations down to 1%. We have shown that mitochondrial DNA changes when cell lines are used but saliva, blood and cells in the urine can all be used for testing. Cells in the urine originate as mucosal cells and are independent of blood. We developed a new method for analyzing large deletions that means sequencing data alone can be used for measuring the frequency of deletions. We also followed a family with two variable sites to better understand how mitochondrial DNA changes from mother to child. In some children, the variants stayed the same while, in others, variants disappeared.


Introduction
Mitochondrial DNA (mtDNA) has been extensively studied because its unique inheritance and variant patterns have important implications for evolutionary biology, genealogy, forensics, human disease, and other fields (1--5). mtDNA is less than 0.001% the length of the remainder of the human genome, but contains 37 tightly packed genes, all of which are essential for life. The number of mitochondria per cell varies widely among different cell types (6). The unique maternal inheritance pattern and high mutation rate can lead to a continuous range of variant frequencies within cells/tissues with such variant sites referred to as heteroplasmic (7). The heteroplasmy can be inherited or occur de novo and its frequency can change with age as well as across tissues, making it challenging to understand how any given variant affects health and disease. Generally, a pathogenic variant needs to reach a frequency of >60% (deletions) or >80% (SNVs) in order to generate a phenotypic effect (8) although there are exceptions to this (9). Heteroplasmy levels may also change with age, an effect that has complicated attempts to correct mitochondrial mutations (10).
Metabolically--active cells like muscle, brain, and kidney have more mitochondria than other tissues and these cells are generally more affected by mitochondrial dysfunction. Determining whether mitochondrial variants are functionally relevant is more difficult than with nuclear variants due to the uncertain impact of heteroplasmy and the possibility that heteroplasmy varies across tissues.
Comparisons across different tissues have shown significant variation in heteroplasmy frequency with the greatest variant accumulation seen in liver, kidney, and skeletal muscle (11,12). Correlating heteroplasmy variation in different tissues with disease severity induced by the A3243G mutation concluded that heteroplasmy in urinary cells is the best predictor of disease (13,14). These observations have led to the consensus recommendation that blood or tissue measurement of heteroplasmy levels be followed up by assessment in urine cells (15). Assigning disease relevance to mitochondrial mutations is even further complicated by the fact that hundreds of nuclear--encoded genes are known to affect mitochondrial function (16). Mutations in nuclear genes have the potential to cause either Primary Mitochondrial Disease (PMD) or Secondary Mitochondrial Dysfunction (SMD) (3). Only recently has it been possible to properly assess heteroplasmy. The original methodology for detecting it, Sanger sequencing, is poorly suited for the task. Sanger is unreliable at detecting low but potentially biologically--relevant rates of heteroplasmy. As a result, many early studies set the minimal detection rate for heteroplasmies at 20% or higher. Furthermore, the difficulty of Sanger sequencing the whole mitochondrial genome led many to focus only on selected segments leading to a limited view of the potential impact of functional variants. The introduction of massively parallel NGS technologies with their digital determination of base calls has allowed both easy sequencing of whole mitochondrial genomes as well as more accurate determination of heteroplasmy down to very low levels (17).
However, there are limits to even these technologies because the inherent error 6 rates in single NGS reads and the need for amplification during and sometimes prior to sequencing can introduce technical artifacts. Superimposed on these technical issues is the biological issue that DNA fragments with high homology to mtDNA pepper the nuclear genome (NuMTs) and can lead to erroneous alignment of nuclear reads to mtDNA (18). The exact boundaries of NuMTs vary depending on how one defines the degree of identity required. They are typically identified by BLAST versus the nuclear genome and, in some cases, results are restricted to specific thresholds such as >500bp and >80% identity (19). All these issues can lead to technology--specific errors in determination of heteroplasmy levels.
To ensure that calling of mitochondrial mutations and heteroplasmy can be achieved with clinical quality, we have examined mtDNA using a variety of sequencing technologies, sample preparation methodologies, and tissue sources and developed a method for determining heteroplasmy for large deletions using only NGS data. We have examined the cells found in the urine to better understand their source and relevance. We find that careful choice of methods is required if sensitive and accurate sequencing is to be achieved.

Results
mtDNA sequencing is challenging for many reasons, making it useful to compare independent methods in order to obtain optimal results. We have used two different DNA enrichment methods, hybridization capture and amplification, to select mtDNA for sequencing. Both hybridization capture and short--range PCR have the same disadvantage that the probes/primers cover only relatively short regions and thus are prone to capturing/amplifying homologous nuclear regions. When testing hybridization capture, we used probes developed by Agilent to capture mtDNA at the same time as the whole exome so that all nuclear coding regions relevant to mitochondrial function could be sequenced simultaneously with the mtDNA (20).
The ratio of mitochondrial probes to nuclear probes was titrated to generate sufficient coverage across the whole mtDNA genome while minimizing coverage loss of nuclear exomic regions. While these capture probes could potentially pull down NuMT sequences as well as mtDNA, the high ratio of mitochondrial to nuclear DNA in cells ensures that NuMT sequences should be, at most, a minor fraction of the total sequence reads. If NuMTs were to be found, they would appear as variants with low heteroplasmy. Thus, NuMT contamination could limit the sensitivity of heteroplasmy detection when hybridization capture is used but the impact would be limited to low level heteroplasmy (<5%).
We found that different methods of extracting DNA from the same blood samples yielded different ratios of mitochondrial:nuclear DNA, thus necessitating different ratios of mitochondrial baits depending on which method was used for extraction.
When manual extraction of DNA from blood with a Qiagen column was used, there was 3x as much mitochondrial coverage (~300x) as nuclear coverage (~100x) with a 1:50 mitochondrial:nuclear bait ratio. However, DNA extraction with an automated Autogen FlexSTAR yielded less mitochondrial coverage (~100x) relative to nuclear DNA (~100x). While 100x coverage would be sufficient for heteroplasmy detection of >5% if coverage were perfectly even, the variation in coverage across the mitochondrial genome meant that there would be limited sensitivity in some regions. As a result, bait coverage for samples prepared in an automated manner was adjusted to 1:10, which produced sufficient mitochondrial coverage (~300x mito vs. 100x exomic) for sensitive heteroplasmy detection.
While it is desirable to obtain samples from affected tissues, it is not always possible as tissues like brain and heart are generally inaccessible to sampling. Saliva, buccal swabs, and blood are more readily obtained but may not always provide independent information as it has been found that a substantial amount of DNA in saliva/buccal swabs can originate from blood cells. When buccal DNA from individuals who have received bone marrow transplants (BMT) was examined, typically 10--30% (21) of the DNA was from the BMT donor's blood and, on occasion, up to 100% of the DNA could be from the BMT donor. Thus, sequence determined from buccal or saliva cells may be uninformative for understanding mtDNA variants.
Another readily accessible source of cellular DNA is urine where cells can accumulate from different sources. Using qPCR, we compared the relative amounts of mtDNA/nuclear DNA in selected tissues. As shown in Table 1, the amount of mtDNA in urine cells is far higher than found in blood or saliva though less than that found in liver, muscle, and kidney.  Sample 1  67  104  221  2323  569       Sample 2  92  109  151  448  182       Other       255  3350  7231  3350  1710 When amplification is used for mtDNA selection rather than bait--based capture, care must be taken to ensure proper primer design. Primers have been designed to generate short amplicons so that even degraded DNA could be sequenced (22), up through very long amplicons that require fully intact mtDNA (23). We have evaluated amplification using both short (<500bp) and long (>8kb) amplicons to compare performance.
HapMap DNA NA12878 was amplified with short amplicons using the Ion AmpliSeq Mitochondrial panel kit followed by sequencing with the Ion Torrent PGM (22). In addition to the risk that short--amplicons could permit NuMT contamination, it was found that sequencing uniformity was poor. In a sample where median coverage was 641x, more than 5% of the mtDNA was covered at less than 20x. At the other end of the spectrum, the top 5% of the genome was covered at >35,000x and more than 2% was covered at >100,000x. These coverage extremes make it difficult to get high quality sequence over all regions, particularly if low level heteroplasmy detection is desired. Such extremes may be permissible if complete coverage is not necessary as is the case in many forensic and genealogical applications, but it is not acceptable if one needs to detect all rare variants as in clinical applications. For this reason, short--amplicon PCR was not explored further for clinical diagnostics.
Long--range PCR is advantageous for avoiding NuMT contamination and achieving more uniform coverage but makes it more challenging to obtain robust amplification. Primers need to be designed so as to be specific for mtDNA rather than NuMTs and they must also avoid common variants to prevent unequal amplification. There are only one or two NuMTs >6000 bp (18,19), so appropriately designed longer amplicons can avoid amplifying these potential artifacts. Because of the high mutation rate in mtDNA (>400 variant sites having population frequencies >1%), it is challenging to avoid all such positions while still maximizing mismatches with NuMTs. Two sets of four primers ( Table 2) were identified that meet these criteria and these primer sets amplify the entire mtDNA with just two amplicons each. This minimizes coverage variation and eliminates the chance of inadvertently amplifying known NuMTs. By using two independent primer sets, the chance of allele dropout due to SNVs or deletions is minimized. Of the eight primers we have validated, one is identical and one is highly overlapping with previously published primers (24). Two of the eight primers have only a single change relative to reference NuMTs while all others have 2--5 nucleotide changes from the most homologous NuMT region. None of the primer pairs should bind within a single NuMT. The most common SNV within each primer site is listed in Table 2 and no primer overlaps a SNV that has a frequency of >1% based on Mitomap.org. This combination of features provides robust and specific amplification for all samples tested and from a variety of tissue sources (Fig 1). The resultant DNA is readily sequenced via Sanger or NGS methods. In addition to the challenges created by mtDNA/nuDNA sequences, there is also the challenge of sequence quality when the need for extensive amplification is combined with the desire to measure heteroplasmy down to levels of 1% or less. As the limits for detection are pushed lower, the possibility of amplification and/or sequencing artifacts increases. As shown in Fig 2,  to obtain when attempting to amplify those samples.

Fig 2. Titration of Input DNA with different DNA polymerases
Long--range PCR fragments were amplified individually and run on a TapeStation. The primers listed in Table 2 were used with varying amounts of input DNA (1--50ng) and either Q5 or Takara LA polymerase. Lanes were not normalized for intensity. Lanes from multiple runs have been spliced together in this figure.
While effective amplification is obviously a prerequisite for sequencing, it is also important to consider how the different polymerases cause sequence errors that could be significant at low heteroplasmy levels. While lower amounts of DNA can be used with Takara LA, it lacks proofreading activity so is more error--prone than Q5 (25,26). This can lead to a higher error baseline that could be mistaken for low--level heteroplasmy. To test this, 50 ng samples were amplified with the two enzymes and sequencing results compared. When a 5% heteroplasmy cutoff is used, the results with the enzymes are identical. However, if one examines the 1--5% variant range, there are more putative low heteroplasmy variants with Takara than Q5 (typically To determine how well the amplicons generated by these primers performed in NGS and to assess reproducibility, we sequenced DNA from seven families, the NIST forensic standards (29), and other individuals. The NIST samples were originally characterized by Sanger sequencing but they were subsequently resequenced using NGS in order to better detect heteroplasmy (30). For the NIST and all other samples   To test this, an independent source (NIST) of NA12878 was examined. The three heteroplasmies found in the original Coriell sample were found in the NIST sample with two of them at slightly different frequencies (Fig 3). In addition, a novel heteroplasmic site was observed, C6518T, at 2% frequency.
The second familial set was sequenced from saliva--derived DNA so potential cell culture artifacts do not limit its interpretation. This family was selected based on its longevity, availability of multiple generations, and the observation of two heteroplasmic sites, T3290C and T16075C, in one of two nonagenarian siblings (A1). The centenarian mother of the nonagenarian siblings died before her DNA could be obtained. The second sibling, A2, had neither heteroplasmic site but was identical to the other sibling at all other mitochondrial positions (T2e haplogroup).
Three A1 children, B1, B2, and B3, were also sequenced. B1 had the same two heteroplasmic sites with T16075C found at a very similar frequency as her mother ( Fig 4) while T3290C had increased frequency. Both heteroplasmic site frequencies in B1's child (C1) were similar to the mother. B2, in contrast, had even higher T3290C than the mother, A1, while the T16075C heteroplasmy had disappeared.
T16075C was also absent in both of B2's children while T3290C frequency was still high and drifted in opposite directions in the two children. The third child, B3, lost the other heteroplasmy, T3290C, while T16075C increased in frequency. This multigenerational study shows the dynamic nature of heteroplasmy variation.
The different frequencies of the two heteroplasmies in the longevity family and the fact that both disappear independently suggested that they might reside on separate mtDNA molecules. Short read sequencing is unable to confirm this. Use of a long read technology like Oxford Nanopore has been used previously with mtDNA to assess contamination issues but the higher intrinsic error rate limited its ability to detect low frequency mutations (32). However, once heteroplasmy frequencies are known via short--read sequencing, it may be possible to determine whether variants are on the same molecule as long as they can be amplified on the same fragment.
Because the two heteroplasmies in this family are located on different fragments with our standard long range PCRs, alternative primers were chosen to provide a single fragment containing both sites. After amplification and ligation of barcodes according to Oxford Nanopore protocols, the resultant DNA was sequenced using a  This fragment is identically sized in both samples so should amplify identically with both mtDNAs. As shown in Table 4

Table 4 Heteroplasmy in Deletion Mixtures
The two deletion samples (Haplogroups H11a and L2c2b2) were mixed with HapMap sample NA24631 (F2b1). Each mixture was then amplified using primers 1/2 and 3/4 and sequenced. Because there were many differences in sequence between the mtDNAs, there were many apparent heteroplasmies. These were averaged over 3 regions: all of Fragment 2 and two segments of Fragment 1, both inside and outside the deletion region. This allows heteroplasmy of the original deletion sample to be determined. The overlapping segments of Fragments 1 and 2 were omitted from the analysis. The heteroplasmic site C12969T in NA24631 was not used in the analysis. G10310A was omitted from the 10/15 analysis because it was a significant outlier that could have been impacted by alignment issues near the deletion breakpoint. For many variant sites, the test sample varied from the reference and NA24631 had the reference base. For consistency in fractions, these sites were treated as if NA24631 was non--reference even though it actually matched the reference sequence. In order to determine the true heteroplasmy of deletions in the patient samples, shown in Supplementary Fig 2 where  Because urine cells provide a useful, potentially independent tissue source of mtDNA, we sought to better understand the tissue origin of these cells to ensure they were not derived from blood like saliva DNA has been shown to be in some  Table 5. Genes are sorted by urine/UHR expression ratio.  The expression profiles of the listed urine--selective genes were compared to the expression levels in 51 tissues found in the GTEx database (gtexportal.org/).

Deletion
Because the GTEx data was generated using classical RNA Seq and our data was generated using DGE that is independent of transcript length, the normalized units Similarities in tissue gene expression patterns were examined in two ways. To look at overall similarities, all genes with >1 RPM in at least one tissue were compared pairwise across all tissues (Table 6A). Not unexpectedly, the highest correlation was kidney medulla versus kidney cortex with r=0.991 while the least similar pair was muscle versus kidney medulla (0.048). The UHR sample, intended to be representative of all tissues, did indeed have the smallest range of r values (0.339--0.716). Saliva, blood, and urine had modest similarity (0.510--0.608). Rather than looking at >18K genes with expression levels > 1 RPM in any tissue, it is also instructive to look at only the tissue specific genes (Table 6B). Genes with either >100x or >10x expression in each tissue relative to UHR were compared. Genes with >100x expression ranged from a low of 19 with lung FFPE to 149 for saliva.
Similarly, there was a low of 412 genes with >10x expression in lung FFPE ranging to a high of 1031 in HBR. The highest degree of sharing of >100x genes was between urine and saliva (56 out of a possible 110) and the highest degree of sharing of >10x genes was between blood and saliva (348 out of a possible 564). These comparisons show that there are similarities among the different fluids evaluated but there are also sufficient differences to indicate that blood cells do not contribute substantially to saliva and urine signals in these healthy individuals.
To confirm that sequencing of urine mtDNA yielded the same result as saliva DNA, multiple samples were sequenced including one of the samples with heteroplasmies.
Because of the issues with effective amplification/sequencing of urine mtDNA due to its low yield, low--level heteroplasmies (up to 3%) were ignored. The single heteroplasmic sample for which both saliva and urine were available yielded very similar frequencies comparing the two (T3290C: 0.34 vs 0.37; T16075C: 0.19 vs 0.23) and homoplasmic variants were identical.

Discussion
When DNA sequencing was restricted to Sanger methodology, the ability to sequence the whole mitochondrial genome and detect low--level heteroplasmies was very limited. The NIST forensic standards demonstrate this clearly where NGS analysis identified two heteroplasmies in SRM--9947A that were missed by Sanger sequencing (30). Despite this, Sanger sequencing of mtDNA could still be used effectively for forensic and genealogical purposes because these applications do not generally require information from the whole mitochondrial genome. However, a complete knowledge of mtDNA is required in order to detect potential causal variants in disease. While the advent of NGS technologies opened up mtDNA to a more complete analysis, technical aspects of mtDNA sequencing remain more challenging than with nuclear DNA and their impact needs to be understood in order to obtain results accurate and reproducible enough for clinical diagnostics.
For example, knowing that heteroplasmy can vary significantly across both tissues and generations affects how one chooses which samples to examine and how to sequence them. In order to reliably achieve detection of heteroplasmies down to 1% or less, care must be taken to choose sample preparation methods and enzymes that are highly accurate or the background error rate will obscure true heteroplasmies.  In families with pre--existing heteroplasmy, the frequency usually changed, sometimes substantially, but no new sites were observed. There are few literature estimates of the rate of de novo heteroplasmy generation among healthy individuals because there are few studies that have examined the whole mtDNA at sufficient depth and accuracy to provide solid numbers. One study with sufficient coverage depth (~20,000x) used the number of heteroplasmies and the rate of heteroplasmic drift among mother/child pairs to allow calculation of de novo variants to about 1 per 100 healthy births (39). This rate is consistent with our findings of no de novo variants in saliva/blood samples but is not consistent with a natural cause for the high de novo rate observed in HapMap samples.
Even with the artifactual heteroplasmies in the HapMap samples, the varied levels are useful for assessing the ability of various protocols to generate reproducible results. In theory, each method should give identical results but each is also subject to its own potential technical artifacts that could lead to discrepancies. Using these sensitive methods, it is possible to follow inherited heteroplasmies across generations and detect it down to 1% or less. The ability to detect such low levels can be important. The mother of proband 2 ( The large number of SNV differences between the spiked sample and the test sample enhances the accuracy of these measurements. There were 33 and 70 differences in the two pairs of samples, so the variant frequencies can be averaged for higher accuracy. The qPCR and NGS methods agree reasonably well for heteroplasmy determination in these samples. As can be seen in Supplementary Fig 2, the variant frequencies can also be used to generate a low resolution mapping of the deletion.
The magnitude of the amplification bias can also be estimated. Based on the excess heteroplasmy due to the deletion fragments in the amplified fragment outside the deletion region, one can estimate an overrepresentation of 6.8x for the short Common deletion fragment and 11x for the short 10/15 deletion fragment using these amplification conditions. The larger effect with 10/15 is expected because its deletion is larger. This method of determining heteroplasmy for large deletions is attractive for use in a clinical setting because it relies only on NGS data without the need for other technologies like qPCR, thus simplifying the workflow.
Most mtDNA assays use either saliva or blood as a source of the DNA. Based on studies with bone marrow donors/recipients, a significant fraction of DNA in saliva could be derived from blood (21). Thus, saliva may not provide a fully independent view of mutations that may be occurring across tissues. Most other relevant tissues are inconvenient or dangerous to sample so we examined the easily accessible cells found in urine for their suitability. As shown in Table 1, mitochondrial content is much higher in urine cells than blood or saliva. To ensure that these cells were not primarily derived from blood, RNA expression studies were carried out to assess the tissue origin of cells in the urine. Many of the top 100 genes expressed in whole blood were expressed poorly in urine including some with expression < 1 RPM (ALAS2, HLA--B, CD53) or 1--10 RPM (DEFA3, NKG7, RHOG, SERPINA1). Nearly all genes expressed at high levels in blood and urine are highly expressed in most other tissues also. The only exceptions to this are S100A8 and S100A9, which are highly expressed in blood and urine cells but not most other tissues. However, both are also highly expressed in esophageal mucosal cells. Though the cells in urine likely come from a variety of sources, the best match in the GTEx database is clearly mucosal cells so these likely contribute the largest fraction of DNA to the analysis.
While there may be a minor contribution from blood to the sample, it must be a small fraction of the mix so urine provides the independent source that is desirable for analysis of mtDNA. While the urine cells provide the desired independent source of DNA, this DNA is limited by its very low yield. While it is easy to get micrograms of DNA from blood or saliva, nanograms is the relevant unit for urine DNA. This limits the methods that can be used for sequencing and the higher level of amplification required means low--level heteroplasmy will be more difficult to detect due to higher background noise.
The ability to generate accurate results for clinical diagnostic sequencing is important so that families may use that information in planning and treatment.
Efforts to prevent transmission of pathogenic mitochondrial variants have begun with a variety of strategies. Selection of embryos with low levels of pathogenic variants has had limited success thus far (10) and efforts to generate "three parent" children have begun but are being subjected to a high degree of regulatory and ethical scrutiny due to the uncertainties surrounding the process. Families making these difficult choices should be able to rely on accurate genetic information on which to base their decisions. High quality, clinical mtDNA sequence is achievable but care must be taken to avoid artifacts that can compromise the data.

Materials and Methods
Oligonucleotide primers for sequencing and amplification were obtained from IDT. instructions as described previously (42). Oxford Nanopore libraries were prepared with input of 1µg of amplified long--range PCR product using R9 Ligation Sequencing 1D Kit with Native Barcoding, omitting the fragmentation and DNA repair steps.
These samples were loaded on R9 Spot--on Flo--min107 flowcells. MinKNOW software was used for instrument control and basecalling. Barcodes were de-convoluted using EPI2ME agent. Sequences were searched for specific variant sites using 8 or 9mers.
RNA was extracted from tissues using the Qiagen RNA Mini kit, from blood using the