A protocol for locating and counting transgenic sequences in laboratory animals using a map-then-capture (MapCap) sequencing workflow: procedure and application of results

Transgenic rodent models of human diseases have been widely used over the past 50 years and are a mainstay of many biomedical research programs. Often, the sequence of the transgenic DNA segment is carefully designed, but its incorporation into the host genome is less well understood. Structural variation and insertional mutagenesis may occur at transgenic insertion sites. Here, we present a robust workflow that identifies the transgene locus by selective Illumina sequencing and then enriches that locus by Cas9-mediated target DNA capture. Applied to a murine model of human amylin-induced type II diabetes, it identified the start and end sites of a large transgenic insertion. Enriched sequences were read by Oxford Nanopore long-read sequencing. Although the insertion was too long to be spanned by a single read, the method provided multiple insights relevant to the animal model: a minimum number of forward- and reverse-facing transgene copies, as well as characterization of an inversion point within the insertion site. The insertion start point, which contains both murine and human DNA, was used to distinguish animals hemizygous for the transgenic insertion from homozygous animals. This identification can be performed early in the rodent life cycle, before maturation (i.e., before breeding age), which allows colony phenotypes to be managed and eliminates the need to “genotype by phenotype” later on (onset of amylin-induced type II diabetes does not occur until ~8-10 weeks of age in this model). We further confirmed that our homozygous diabetic mice behave like colonies established in other labs, and we present full antibody and fluorescent-staining protocols (available in SI). Lastly, our genotyping revealed a previously unrecognized group within our colony: non-diabetic homozygous mice. Indeed, only 37% of homozygous mice bred in our colony became diabetic.
AUTHOR SUMMARY (for a broad audience): Transgenic rodent models are important for studying human diseases. When creating a new rodent model, researchers may insert new DNA into a well-characterized background genome. However, it is often not known where the new DNA was incorporated, how many times it was incorporated, or whether any coding sequences or regulatory elements within the native DNA were disrupted. Here, we have developed a method to characterize transgenic animals and have applied it to a popular model for studying human amylin-induced type II diabetes.

INTRODUCTION:
Understanding mutated genomes of transgenic animals.
Almost 50 years have passed since the first successful introduction of transgenic material into a mouse (1)(2)(3)(4), and to this day transgenic mouse alleles remain an indispensable biomedical tool, from basic research to the development of preclinical therapeutics. Transgenic mouse alleles are traditionally created by microinjecting recombinant DNA into the pronuclei of fertilized eggs and then identifying animals in which the transgenic fragment has integrated at one or more random loci of the genome. Neither the number nor the location of inserted segments is controlled; native genes may therefore be disrupted, and expression levels of the transgenic DNA vary widely among these models (5,6). Continued breeding of mouse lines often results in mutagenesis over time, occasionally with phenotype-enhancing or phenotype-suppressing effects (7). Although more precise genome-editing strategies have since become available, the transgenic method continues to reliably generate many animal models of human diseases, some of which have been commercialized. While the phenotypes of these alleles are often carefully described, precise molecular characterization of transgenic alleles is seldom reported. Indeed, where such characterization has been reported, transgenic alleles are often more complex than expected (8). One widely available example of an animal model produced in this fashion is the RIPHAT hIAPP +/- mouse (FVB/N-Tg(Ins2-IAPP)RHFSoel/J, The Jackson Laboratory stock no. 008232). Phenotypically, these mice are highly valuable: when bred to homozygosity, some mice develop human amylin-induced type II diabetes (T2D) after ~10 weeks of age (the penetrance of T2D has not previously been defined). Molecularly, the location of the transgenic DNA has not been known, although PCR with primers designed against the non-native transgene promoter sequence readily distinguishes animals carrying the transgenic allele from those that do not.
However, this method fails to distinguish hemizygous from homozygous transgenic animals (SI-1), because it reports only whether the transgene is present, not how many transgenic alleles an animal carries. The distinction between hemi- and homozygous mice is typically possible only after the mice become diabetic (using phenotype to infer genotype). This approach is costly (all transgenic mice must be raised to ~10 weeks of age before their phenotype is known), and it further assumes that the mature phenotype perfectly reflects genotype (i.e., 100% penetrance).
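The limitation is easiest to see by tabulating expected PCR results. The minimal sketch below contrasts a transgene-internal PCR with a junction-spanning, three-primer design of the kind a known insertion start point makes possible. The three-primer layout and the band names are assumptions for illustration, not the assay reported in SI. The design relies on the insertion being far too long to amplify across, so the primer pair in the flanking mouse DNA yields a product only from an uninterrupted wild-type allele.

```python
# Hypothetical three-primer junction assay (assumed layout, for illustration):
#   wt_band:       forward + reverse primers in mouse DNA flanking the insertion
#                  site; amplifies only an allele WITHOUT the (very long) insert.
#   junction_band: mouse-flank primer + transgene-internal primer; amplifies
#                  only an allele WITH the insert.
#   internal_band: two transgene-internal primers; present whenever any
#                  transgenic allele is present, regardless of zygosity.

def call_genotype(wt_band: bool, junction_band: bool) -> str:
    """Infer zygosity from the presence/absence of the two junction-assay bands."""
    if wt_band and junction_band:
        return "hemizygous"       # one wild-type allele + one transgenic allele
    if junction_band:
        return "homozygous"       # no uninterrupted wild-type allele left
    if wt_band:
        return "wild type"        # no transgenic allele
    return "failed reaction"      # neither band: repeat the PCR


def call_from_internal_only(internal_band: bool) -> str:
    """A transgene-internal PCR reports presence or absence, never zygosity."""
    return "transgenic (zygosity unknown)" if internal_band else "wild type"


for bands in [(True, True), (False, True), (True, False)]:
    print(bands, "->", call_genotype(*bands))
```

Because a hemizygous and a homozygous animal both give the internal band, only the presence or absence of the wild-type band separates them.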
While detailed molecular characterizations of transgenic alleles are infrequent, our ability to study the whole genomes of organisms has improved greatly. These improvements include substantial decreases in the cost of short-read sequencing and the development of powerful long-read sequencing technologies, which allow transgenes to be mapped and characterized, respectively (9,10). In this study, we use these technologies to genetically characterize a colony of RIPHAT hIAPP (+/+) mice and study their resulting phenotypes. Here we report a map-then-capture sequencing workflow (hereafter referred to as 'MapCap') that simplifies the characterization of transgenic alleles. Briefly, the first step uses selective amplification of transgene-containing Illumina inserts to map the locus of integration. The second step uses Cas9 to selectively sequence that locus with Oxford Nanopore sequencing. This method of molecularly characterizing transgenic mouse alleles can be applied to any transgenic mouse model with a known transgenic sequence, even if the insertion locus is unknown. The method yields the insertion locus, the structure of the insertion, and a rapid PCR-based genotyping assay to distinguish alleles with and without the transgene.
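The mapping step rests on a simple idea: a read amplified from a transgene-specific primer runs through the end of the transgene and into host DNA, so the bases immediately past a known transgene end identify the host flank. The sketch below illustrates that idea only, assuming adapter-trimmed, error-free reads and exact matching; `transgene_end` is a hypothetical stand-in for the terminal construct sequence. In practice the extracted flanks would be aligned to the mouse reference genome with a short-read aligner (for example BWA or Bowtie2) rather than compared by exact prefix.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_flanks(reads, transgene_end, min_flank=20):
    """Return the sequence found immediately downstream of transgene_end.

    Each read is checked as given, then reverse-complemented, because a read
    can come from either strand. Reads whose flank is shorter than
    min_flank are skipped.
    """
    anchor = transgene_end.upper()
    flanks = []
    for read in reads:
        for s in (read.upper(), revcomp(read)):
            i = s.find(anchor)
            if i != -1:
                flank = s[i + len(anchor):]
                if len(flank) >= min_flank:
                    flanks.append(flank)
                break  # anchor found in this orientation; do not double count
    return flanks


def consensus_junction(flanks, k=20):
    """Most common first-k bases across flanks, with its supporting read count."""
    counts = Counter(f[:k] for f in flanks if len(f) >= k)
    return counts.most_common(1)[0] if counts else None


# Hypothetical toy data: one read per strand plus one unrelated read.
end = "GATTACAGATTACA"
host = "ACGTTGCAACGTTGCAACGTTT"
read = "CCC" + end + host
print(consensus_junction(extract_flanks([read, revcomp(read), "TTTTTTTT"], end)))
```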

DESCRIPTION OF THE METHOD:
DNA sequencing.
DNA was isolated from the tails and blood of wild-type, hemizygous, and homozygous mice and provided to the UW-Madison Biotechnology Center DNA sequencing facility (double-blinded to sample ID). The transgene target sequence was designed according to the patent submitted by Soeller et al. (US patent 6187991 B1) describing the RIPHAT transgenic construct (US 6187991 B1 sequence ID no. 7, encompassing the rat insulin II promoter and 5' untranslated leader, the IAPP coding region, albumin intron I, and the GAPDH polyadenylation region; see SI for full primer design). Illumina and Oxford Nanopore sequencing were performed and analyzed at the UW-Madison Biotechnology Center (UWBC) using the manufacturers' protocols and a Cas9 enrichment protocol developed internally at UWBC.
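Primers against a construct such as this one are usually screened for a few basic properties before ordering. The sketch below computes GC content, a Wallace-rule melting temperature, and the presence of a 3' GC clamp. The acceptance range and the example primer are assumptions for illustration; they are not the criteria or primers used in this study (those are in SI), and the Wallace rule is only a rough estimate compared with nearest-neighbor Tm models.

```python
# Rule-of-thumb GC range assumed for illustration; not this study's criteria.
GC_RANGE = (0.40, 0.60)


def gc_fraction(primer: str) -> float:
    """Fraction of G or C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm (deg C) ~ 2*(A+T) + 4*(G+C); a rough, short-oligo estimate."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def has_gc_clamp(primer: str) -> bool:
    """True if the 3' terminal base is G or C."""
    return primer.upper()[-1] in "GC"


def check_primer(primer: str) -> dict:
    """Summarize basic primer properties in one dictionary."""
    gc = gc_fraction(primer)
    return {
        "gc_fraction": round(gc, 2),
        "tm_wallace_c": wallace_tm(primer),
        "gc_ok": GC_RANGE[0] <= gc <= GC_RANGE[1],
        "gc_clamp": has_gc_clamp(primer),
    }


# Hypothetical primer sequence, not one from this study.
print(check_primer("ACGTTGCAAGCTTGACCTAG"))
```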

Mice.
Breeding pairs were purchased from The Jackson Laboratory (stock no. 008232). Initial sequencing was performed on tail-snip DNA from 18-week-old homozygous mice. Follow-up rounds of sequencing were performed on blood DNA from 10-week-old wild-type FVB, hemizygous, and homozygous mice. Mice were handled and sacrificed according to approved UW-Madison Research Animal Resources and Compliance (RARC) protocols.

Illumina sequencing of transgene insertion site.
Identification of the DNA junction between the integrated RIPHAT hIAPP transgene and the mouse genome was performed following a modified High-throughput Insertion Tracking by Deep Sequencing (HITS) method (11). Libraries were prepared using TruSeq Nano DNA Library Prep (Illumina). DNA was fragmented to an average size of 400 bp using Covaris
First, we wanted to define both the chromosomal location and the insertion site of the RIPHAT hIAPP transgene. Genomic DNA was isolated from tail punches of heterozygous mice, and libraries were prepared with TruSeq Nano DNA Library Prep. Transgene-specific primers were designed from the plasmid sequence used to generate the RIPHAT transgene. A second primer, complementary to the Illumina adapters (Figure 1A), allowed selective amplification of only those inserts containing the transgene. Mapping this library back to the mouse reference genome yielded the precise insertion location of the transgene.
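Once the insertion site is known, the Cas9 enrichment step needs guide RNAs that cut in the mouse DNA flanking the insertion. The sketch below lists candidate SpCas9 target sites (a 20-nt protospacer immediately 5' of an NGG PAM) on both strands of a flanking sequence. This is a generic SpCas9 site scan, assumed here for illustration; it is not the UWBC in-house guide design, and a real design would also score on-target efficiency and screen the whole genome for off-target matches.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for every NGG-adjacent site.

    `start` is the 0-based + strand coordinate of the protospacer's leftmost
    base. Protospacer and PAM are reported 5'->3' on their own strand.
    """
    seq = seq.upper()
    sites = []
    # + strand: protospacer then NGG. The lookahead lets overlapping hits match.
    for m in re.finditer(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))", seq):
        sites.append(("+", m.start(), m.group(1), m.group(2)))
    # - strand: CCN then protospacer on the + strand equals NGG on the - strand.
    for m in re.finditer(rf"(?=(CC[ACGT])([ACGT]{{{spacer_len}}}))", seq):
        sites.append(("-", m.start() + 3, revcomp(m.group(2)), revcomp(m.group(1))))
    return sites


# Hypothetical flanking sequence, not the chromosome 15 locus from this study.
flank = "TTGACCAGTCAGGTACCATGGCTAGCATGCAAGGTTCCAGTAGG"
for site in find_spcas9_sites(flank):
    print(site)
```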
After mapping the transgene to its precise location within the mouse genome, we performed a Cas9-enrichment long-read DNA assessment (Figure 1B), as shown in Figure 1C
high-quality reads overlap but do not overlap the third read. Lastly, Figure 2F displays a final scenario in which none of the reads overlap. In all scenarios, there is a minimum of 26 copies of the transgene at the insertion site, indicating that homozygous mice, which carry the insertion on both alleles, harbor at least 52 copies of the transgenic construct. Our results place the transgenic construct in a noncoding region of chromosome 15; its proximity to the two nearest-neighbor coding sequences is displayed in Figure 2G.
Using PCR primers complementary to the rat insulin 2 promoter, it is not possible to distinguish the mice in our colony (SI-1). This is presumably due to high sequence similarity between the rat and mouse insulin promoter sequences, so wild-type mice also yield the ~500 bp band (SI-1). Using primers complementary to human GAPDH, located near the end of the transgenic sequence, it is possible to distinguish wild-type mice from transgenic mice, but not hemizygous from homozygous mice (SI-1). Shown in Figure 3
Only RIPHAT hIAPP(+/+) homozygous mice that reached blood glucose levels >300 mg/dl at 12 weeks of age were used in the studies described below. Islets from both wild-type and RIPHAT hIAPP(+/+) mice are amylin-positive when stained with anti-hIAPP antibodies (T4157 polyclonal antibody raised against hIAPP amino acids 25-37, Peninsula Laboratories International), which in our experiments recognize both mIAPP and hIAPP (SI-2, A2 and A4). However, in contrast to wild-type mice, islets in RIPHAT hIAPP(+/+) mice show a dramatic loss of β-cells, as judged by reduced insulin immunoreactivity (SI-2, A1 and A3).
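The copy-number and orientation results reported above can be illustrated with a small sketch. The text does not state how copies were counted, so the approach here is an assumption: locate a construct-specific marker (hypothetical; it must not be its own reverse complement) in a long read, record whether each hit is forward or reverse-complement, and flag the positions where orientation switches, which bracket an inversion point. Exact matching only works on error-free toy reads; real nanopore reads would need error-tolerant alignment, for example with minimap2. The last function encodes the arithmetic from the text: at least 26 copies per allele gives at least 52 in a homozygous animal.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(s: str, sub: str):
    """0-based start positions of non-overlapping occurrences of sub in s."""
    hits, i = [], s.find(sub)
    while i != -1:
        hits.append(i)
        i = s.find(sub, i + len(sub))
    return hits


def orient_copies(read: str, marker: str):
    """Sorted (position, '+' or '-') for each forward or reverse-complement hit."""
    marker = marker.upper()
    if marker == revcomp(marker):
        raise ValueError("palindromic marker cannot resolve orientation")
    read = read.upper()
    fwd = [(p, "+") for p in find_all(read, marker)]
    rev = [(p, "-") for p in find_all(read, revcomp(marker))]
    return sorted(fwd + rev)


def switch_points(hits):
    """(left, right) hit positions bracketing each change in copy orientation."""
    return [(a[0], b[0]) for a, b in zip(hits, hits[1:]) if a[1] != b[1]]


def min_copies_homozygous(min_copies_per_allele: int) -> int:
    """A homozygous animal carries the insertion on both alleles."""
    return 2 * min_copies_per_allele


# Toy read: two forward copies, then one inverted copy (hypothetical marker).
m = "ACCGGTTAAC"
toy_read = "TT" + m + "TT" + m + "TT" + revcomp(m) + "TT"
hits = orient_copies(toy_read, m)
print(hits, switch_points(hits), min_copies_homozygous(26))
```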
Islet amyloid, detected with Thioflavin T (SI-2, B1, B2, and B3) and with Congo red under polarized light (SI-2, C3), was present in RIPHAT hIAPP(+/+) mice but not in wild-type mice (SI-2, B4 and C4 for Thioflavin T and Congo red under polarized light, respectively). These results suggest that expression of human IAPP leads to amyloid formation and, through the loss of insulin-producing β-cells, impairs normal islet function, in agreement with previous work on these mice (16)(17)(18).
DISCUSSION AND CONCLUSIONS:
In summary, we have developed a robust method to genotype colonies of transgenic animals and applied it to a colony of wild-type and RIPHAT hIAPP(+/- and +/+) mice. We also characterized the animals using established methods, such as blood glucose tracking and fluorescence microscopy. In doing so, we observed that the prevalence of type II diabetes among homozygous animals is only 37%, and we discovered a previously unidentified subset of animals: mice homozygous for the transgenic insertion that did not develop type II diabetes.
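Once zygosity is known from genotyping rather than inferred from phenotype, penetrance becomes a simple proportion that can be reported with an uncertainty interval. The sketch below assumes that the >300 mg/dl blood glucose cutoff at 12 weeks used above also defines "diabetic" for penetrance, and it attaches a Wilson score interval, which is our choice for illustration. The colony counts behind the 37% figure are not given in this section, so the glucose readings below are invented placeholders, not data from this study.

```python
import math

# Cutoff from the text (>300 mg/dl at 12 weeks); using it to define
# "diabetic" for penetrance is an assumption of this sketch.
DIABETES_THRESHOLD_MG_DL = 300


def count_diabetic(glucose_mg_dl: dict):
    """Return (number diabetic, number tested) for mouse ID -> glucose (mg/dl)."""
    k = sum(1 for g in glucose_mg_dl.values() if g > DIABETES_THRESHOLD_MG_DL)
    return k, len(glucose_mg_dl)


def wilson_interval(k: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for the proportion k/n."""
    if n == 0:
        raise ValueError("no animals tested")
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical readings for genotyped homozygous mice, NOT data from this study.
example = {"m01": 452, "m02": 168, "m03": 311, "m04": 149}
k, n = count_diabetic(example)
low, high = wilson_interval(k, n)
print(f"penetrance {k}/{n} = {k / n:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

With small colonies the interval is wide, which is worth reporting alongside any single penetrance percentage.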
We expect that other transgenic animal models may also have uncharacterized penetrance if the animals are characterized only by phenotype. This method can improve our understanding of the genotype-phenotype relationship in animal models of human diseases.