Detection of EGFR deletion using unique RepSeq technology

We report a novel sequencing technology, RepSeq (Repetitive Sequence), that offers high sensitivity, high specificity and a quick turnaround time. It was developed by modifying traditional Sanger sequencing in several respects. First, a homopolymer tail is added to the PCR primer(s), which makes the electropherograms much easier to interpret than those of traditional Sanger sequencing. Second, an indicator nucleotide is added at the 5' end of the homopolymer tail. In the presence of a deletion, the position of the indicator nucleotide relative to that of the wild type confirms the deletion, while the wild-type indicator serves as an internal control. Furthermore, the design of the PCR and/or sequencing primers selectively enriches mutant alleles, which increases sensitivity and specificity significantly. Based on serial dilution studies, the analytical lower limit of detection was 1.47 copies. A total of 89 samples were tested for EGFR exon 19 deletion, of which 21 were normal blood samples and 68 were samples previously tested by either pyrosequencing or the TruSeq Next Generation Sequencing Cancer Panel. There was 100% concordance among all the samples tested. RepSeq technology overcomes the shortcomings of Sanger sequencing and offers an easy-to-use sequencing method for personalized precision medicine.


The detection of somatic mutations in the epidermal growth factor receptor (EGFR) is key to choosing first-line targeted therapies for patients with late-stage non-small cell lung cancer (NSCLC) (1,2,3). EGFR deletion mutations constitute a key component for [...] imposes five challenges: being a hypervariable region, the deletion could be anywhere in the above-mentioned region; there could be multiple deletions of varying numbers of nucleotides on the same allele; the deletion could be homozygous or heterozygous; the copy number of the deletion alleles could be low; and finally, the mutant allele is usually mixed with the wild-type allele at different ratios. RepSeq technology has been developed to address most of these challenges. We have selected the most common exon 19 deletion, EGFR L747_A750, which offers the choice of treatment with afatinib, gefitinib, or erlotinib (4) [...] (7). Such an overlap generates scrambled nucleotide sequences that can be difficult to decipher when calling the nucleotide sequence. As an alternative, pyrosequencing is used routinely to detect the EGFR L747_A750 deletion in FFPE samples (8). The nucleotide readouts from pyrosequencing require experience to call the results with confidence. Unlike germline mutations, where the copy number is high, EGFR L747_A750 is usually a somatic mutation, and different samples have different mutant-to-wild-type ratios, adding another level of complexity to detecting mutants in the presence of an abundance of wild-type EGFR.

RepSeq platform technology

The RepSeq process includes extraction of total DNA from FFPE samples, followed by amplicon sequencing and analysis by capillary electrophoresis. If the sample carries a somatic deletion, two amplicons will be generated, one carrying the mutant with the deletion and the other carrying the wild type.

Purified PCR products will be simultaneously sequenced using both wild type and [...] be distinguishable, at the distal end, the 'C' signal from the mutant will be among the TAA repeats, and hence could be detected. [...] water. The cleaned products were injected for 16 seconds into an ABI 3130xl Genetic Analyzer, and the electropherogram was analyzed using Sequencing Analysis Software 6.0.

If there is no deletion, and only the selective sequencing primer is used, the single nucleotide sequence of the wild type will be displayed in the sequencing result with its specific read sequences (Figure 1C). If the sample carries cells with the EGFR deletion, there will be two nucleotide sequences, one from the wild type and the other from the EGFR deletion, and hence the nucleotide signals from both wild type and mutant will overlap for the most part. Although the overlapping sequences at the beginning of the electropherogram may be scrambled and unreadable, the nucleotide sequences at the distal end will display the detection region, which is made up of the adenosine-thymidine-thymidine repetitive homopolymer sequence with cytosine at its distal end (Figure 1D). Since the nucleotide sequence from the deletion will be shorter, the [...] of diluted stock were used in a PCR reaction and the lower limit of detection was determined to be 1.47 copies per assay (Table 1). [...] (Table 2). The wild type sequencing primer (selective) generated a wild type nucleotide sequence when tested with wild type template HD 709 (Figure 1C). The wild type sequencing primer (non-selective) generated a wild type nucleotide sequence and a deletion sequence with the heterozygous template (HD 251), which carries 50% wild type allele (Figure 1E). The deletion sequencing primer generated a deletion-specific nucleotide sequence when tested with the heterozygous template (HD 251) (Figure 1F). However, the mutant sequencing primer did not generate any nucleotide sequence when tested with 100% wild type template HD 709 (data not shown). Further, when the mutant and wild type (selective) sequencing primers were both included with wild type template (HD 709), only wild type-specific and no deletion-specific sequences were generated (Figure 1G). When both the wild type (selective) and the deletion sequencing primers were tested with the heterozygous template (HD 251), the expected nucleotide sequences were generated, with the indicator signal for the deletion moving to the left of that of the wild type (Figure 1D).
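The leftward shift of the deletion indicator can be illustrated with a toy model. The following is only a sketch under stated assumptions: the 30-nucleotide template, the 12-nucleotide deletion, the number of repeats and all function names are hypothetical and are not taken from the assay or from the EGFR sequence. It shows why the indicator of a shorter (deleted) read lands inside the repeat region of the wild-type read, where a 'C' stands out against the T and A signals.

```python
# Toy model of the RepSeq read-out; sequences and lengths are hypothetical.
TAIL_REPEAT = "TAA"   # repeat unit as it appears in the read
N_REPEATS = 5         # assumed number of repeats in the tail
INDICATOR = "C"       # indicator base read after the repeats


def build_read(template: str) -> str:
    """Append the repetitive tail and the indicator base to a template read."""
    return template + TAIL_REPEAT * N_REPEATS + INDICATOR


def apply_deletion(seq: str, start: int, length: int) -> str:
    """Remove `length` bases from `seq`, beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def indicator_index(read: str) -> int:
    """0-based position of the indicator, which is the last base of the read."""
    return len(read) - 1


# Made-up 30-nt template (NOT the EGFR exon 19 sequence).
wild_type = "ACGTGGCTTAGGCCATTCGAGGTCAATGCC"
mutant = apply_deletion(wild_type, start=10, length=12)

wt_read = build_read(wild_type)
mut_read = build_read(mutant)

wt_pos = indicator_index(wt_read)
mut_pos = indicator_index(mut_read)
shift = wt_pos - mut_pos  # equals the deletion length in this model

# Base of the wild-type read at the position where the mutant indicator falls.
wt_base_under_indicator = wt_read[mut_pos]

print(f"wild-type indicator at {wt_pos}, mutant indicator at {mut_pos}")
print(f"shift = {shift} nt; wild-type base there = {wt_base_under_indicator}")
```

In this model the mutant indicator falls within the wild-type repeats only when the deletion is shorter than the tail, which is one reason the tail length is treated as a free parameter here.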

The human genomic controls with 50% mutant allele template generated both the wild type sequence and the deletion sequence. In addition, twenty-one normal blood samples were tested for the EGFR L747_A750 deletion by RepSeq technology using both deletion and wild type [...] Of the twenty-four samples that were tested by pyrosequencing, both methods detected 15 EGFR L747_A750 negatives and nine EGFR L747_A750 positives (Table 3). [...] thirty-seven EGFR L747_A750 negatives that were detected by TruSeq [...] negatives (Table 3).
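The concordance between two methods can be summarised from a 2x2 agreement table. The sketch below uses the pyrosequencing comparison reported above (nine samples positive by both methods, 15 negative by both, no discordant calls); the function name and the choice of Cohen's kappa as a summary statistic are illustrative assumptions, not part of the original analysis.

```python
def agreement_stats(both_pos: int, both_neg: int,
                    only_a_pos: int, only_b_pos: int) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two binary methods A and B."""
    total = both_pos + both_neg + only_a_pos + only_b_pos
    observed = (both_pos + both_neg) / total

    # Marginal positive rates of each method.
    a_pos = (both_pos + only_a_pos) / total
    b_pos = (both_pos + only_b_pos) / total

    # Agreement expected by chance from the marginals.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)

    # Kappa is undefined when chance agreement is 1; report 1.0 in that case.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa


# RepSeq vs pyrosequencing counts as stated in the text.
observed, kappa = agreement_stats(both_pos=9, both_neg=15,
                                  only_a_pos=0, only_b_pos=0)
print(f"observed agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

With no discordant calls, both the observed agreement and kappa reach their maximum, which corresponds to the 100% concordance reported for this subset.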

RepSeq features

Similar to Sanger sequencing and pyrosequencing, the electropherograms for the wild type and the deletion are generated from the same set of PCR primers. However, unlike Sanger sequencing and pyrosequencing, RepSeq uses two sequencing primers: one for the wild type and a deletion-specific primer that spans the deletion region.

This unique feature raises the signal intensity of the deletion to a level on par with that of the wild type and therefore increases the test sensitivity. The lower PCR primer carries a three-nucleotide (adenosine-thymidine-thymidine) repetitive sequence with a guanine nucleotide at its 5' end as an indicator of one end of the PCR product. Such a design makes sequencing data interpretation much easier and faster.

This study used FFPE samples from late-stage lung cancer, in which samples positive for the EGFR L747_A750 deletion have an abundance of copies of the deletion; hence, there was an acceptable level of concordance among the three methodologies tested. Since RepSeq has a very low limit of detection compared with the other two methods, it is expected to play a significant role in liquid biopsy and in the detection of EGFR L747_A750 in early stages (< stage IV).
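At copy numbers near the reported limit of detection, sampling alone limits how often a reaction contains any mutant template. As a hedged illustration only, and assuming template copies are distributed across reactions according to a Poisson distribution (an assumption not stated in the study), the fraction of reactions expected to contain at least one copy can be sketched as follows; the function name is hypothetical.

```python
import math


def prob_at_least_one_copy(mean_copies: float) -> float:
    """P(at least one template copy in a reaction) under a Poisson model."""
    if mean_copies < 0:
        raise ValueError("mean_copies must be non-negative")
    return 1.0 - math.exp(-mean_copies)


# Mean input equal to the reported lower limit of detection (1.47 copies).
p_lod = prob_at_least_one_copy(1.47)
print(f"P(>=1 copy at a mean of 1.47 copies) = {p_lod:.3f}")
```

Under this assumption roughly three quarters of reactions at a mean input of 1.47 copies would contain any template at all, which is worth keeping in mind when planning replicates for low-input samples such as liquid biopsies.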

There are additional deletions that are clinically significant. A second-generation RepSeq-EGFR assay will have a combination of sequencing primers covering additional deletions. The [...]

In summary, the EGFR RepSeq assay produces an easy-to-read electropherogram for detecting a mutation in the presence of wild type, with enrichment using allele-specific primers. Further, RepSeq also contains built-in features that address troubleshooting due to variations in sample matrix.