A macromutation eliminates colour patterning in captive butterflies

Captive populations often harbor variation that is not present in the wild due to artificial selection. Recent efforts to map this variation have provided insights into the genetic and molecular basis of variation. Heliconius butterflies display a large array of pattern variants in the wild and the genetic basis of these patterns has been well-described. Here we sought to identify the genetic basis of an unusual pattern variant that is instead found in captivity, the ivory mutant, in which all scales on both the wings and body become white or yellow. Using a combination of autozygosity mapping and coverage analysis from 37 captive individuals, we identify a 78kb deletion at the cortex wing patterning locus as the ivory mutation. This deletion is undetected among 458 wild Heliconius genomes samples, and its dosage explains both homozygous and heterozygous ivory phenotypes found in captivity. The deletion spans a large 5’ region of the cortex gene that includes a facultative 5’UTR exon detected in larval wing disk transcriptomes. CRISPR mutagenesis of this exon replicates the wing phenotypes from coding knock-outs of cortex, consistent with a functional role of ivory-deleted elements in establishing scale color fate. Population demographics reveal that the stock giving rise to the ivory mutant has a mixed origin from across the wild range of H. melpomene, and supports a scenario where the ivory mutation occurred after the introduction of cortex haplotypes from Ecuador. Homozygotes for the ivory deletion are inviable, joining 40 other examples of allelic variants that provide heterozygous advantage in animal populations under artificial selection by fanciers and breeders. Finally, our results highlight the promise of autozygosity and association mapping for identifying the genetic basis of aberrant mutations in captive insect populations.


Introduction
Captive populations can harbour variation that is not observed in the wild. Aside from domestication for agricultural or commercial yield, animals and plants have often been selectively bred for 'beautiful', 'interesting' or 'unnatural' aesthetic traits, especially colour and pattern variants (1,2). In recent decades, the genetic basis of artificially selected variation has begun to be mapped in many organisms used in agriculture, floriculture and the pet trade, including the genetic mapping of a large number of genes responsible for flower colour variation (3,4), melanin-based coat colour in domestic mammals (1), plumage patterns and colours in birds (5,6), and chromatophore distribution in squamates and fishes (7). These variants can affect genes involved in the enzymatic production, transport and deposition of pigments like the melanin and carotenoid pathways (8,9), or can be caused by genes that affect signalling or cell type differentiation (as with chromatophore distribution in squamates (10)). The extensive set of artificial and domesticated genetic variants has been used repeatedly both for studies of genotype-to-phenotype relationship, and for understanding the genetics of disease states in humans (11).
Notably, several cases have been identified where a gene underlying an artificially selected variant is also responsible for natural variation in other populations or species. This is the case with BCO2, where an artificially selected protein coding variant in cattle affects milk colour (12), and a wild type regulatory variant in wall lizards causes changes to ventral scale colour (13). The gene Agouti has repeatedly been mapped in cases of pigment differences, both in cases of domestication [e.g. in the chicken (14), horse (15), and rabbit (16)], and in natural variation (e.g. in warblers (17), humans (18) and snowshoe hare (19)). This reuse of hotspot genes in wild and captive populations can even be observed at the within-species level; multiple separate variants in agouti have been linked to pigment variation in sheepboth in wild and captive populations, and caused by protein coding mutations, cis-regulatory mutations and copy number variation (20,21). As such, studies of domesticated variation can inform our understanding of natural variation too, both on macro-and micro-evolutionary scales As the primary selective force in captive populations is the eye of the selector, one may often observe variation that is not seen in nature because it is not optimally fit (22)(23)(24)(25). Fanciers and breeders have purposefully selected for variation that causes deleterious effects in combination with colour and pattern differences, applying a regime of selection that favours the phenotype of interest over fitness. This applies especially to distinctive colouration, which can be straightforward to observe and maintain in a captive stock. Examples include coat color variants for Merle dogs, where homozygotes for a retrotransposon insertion in the SILV gene have an increased risk of deafness and blindness (26,27), and in overo horses, where foals homozygous for a mutated Endothelin Receptor B (EDNRB) develop Lethal White Syndrome (28). "Lemon frost" geckos have been selected for their unique colour, but exhibit an increased risk of iridophoroma, the formation of tumours from iridophores (analogous to melanoma) (7). Such reduced fitness has also been identified in floriculture, where white petunias with mutations to the AN1 gene have deficient vacuole acidification and a weakened seed coat (29), though an artificially selected mutation to the same gene causing a similar effect in the morning glory does not appear to carry the same deleterious fitness effects (3). Whether they are the direct targets of selective breeding, or collateral effects of inbreeding and bottlenecks, deleterious mutations that would be purged by natural selection in the wild are common in captive populations (30).
The majority of work mapping artificially selected or domesticated variation in animals has taken place in vertebrates, with the notable exceptions of the economically important silkworm (31). Heliconius butterflies, a model system for the genetic study of color pattern adaptations in the wild (32,33), have also been maintained in captivity by butterfly breeders continuously since at least the mid-20th century, including at notable tourist attractions like Butterfly World in Florida. There, they have been selectively bred for hundreds of generations with special attention paid to pattern variations that deviate from phenotypes normally observed in the wild. Over an extended period of inbreeding, selection and occasional out-crossing, novel color variations have appeared in this captive population, including the 'Piano Keys' (PK) pattern, and a deleterious mutation that occurred within the PK genetic stock, here dubbed ivory (Fig 1).
The genetics of wing patterning have been extensively studied in Heliconius (34). Linkage and association mapping have pinpointed regulatory regions around a small toolkit of genes as being responsible for much of the pattern variation seen in wild populations, including the transcription factor optix (35)(36)(37)(38), the signaling ligand WntA (39,40), and cortex, a cdc20 homolog with a currently-unidentified function (41,42). We used whole genome resequencing, association mapping and autozygosity mapping to determine that the ivory mutation is caused by a large deletion at the patterning gene cortex, which likely occurred de novo in the captive population.

Stock history
The captive population of Heliconius melpomene was initially by JRG Turner and others from around Central and South America, and were transferred from his genetic research stocks at the University of Leeds to the former London Butterfly House (LBH) initiated by Clive Farrell 1981 as a multi-race hybrid population. They were then acquired from Tom Fox of LBH by R. Boender at the MetaScience Butterfly Farm in Florida about 1985. These hybrid stocks formed a core of Boender's inhouse hybrid Heliconius display stock when Butterfly World (BW) opened in 1988. Only one known introduction of wild-caught H. melpomene occurred to this stock, of H. m. cythera collected in the early 1980's by R. Boender and T. Emmel around Tinalandia, Ecuador. After this introduction, Boender began selection for a novel pattern that is termed "Piano Key" in this paper. The exact mix of races that were transferred from LBH to BW are not known, but contain pattern alleles that are characteristic of H. melpomene races of Suriname and the Guianas (LEG, personal obs). Occasional butterflies with more extensive regions of yellow or white scales were identified (Fig. 1A centre). Other than the wing pattern differences, both forms of butterflies exhibit typical feeding, mating, egg-laying and flight behaviours to other insectary-reared Heliconius (e.g. S1 Movie). However, the offspring of a mating of two pale H. m. BWPK will include completely white-winged and -bodied butterflies (Fig 1C). We dub these butterflies 'ivory' in reference to their emergence in the 'Piano Keys' stock, and inferred that the pale morph of H.m. BWPK represents a heterozygous state for a mutation that causes the ivory phenotype in the homozygous state.
The ivory butterflies have melanin pigments in their body cuticle and eyes, indicating that their capacity to synthesize and deposit melanin has not been perturbed. However, all scales on the wings and body are white or yellow, with no black or red scales, with the exception of a 1 mm patch of red at the base of the hindwing in some individuals. This is in stark contrast to wild-type or BWPK butterflies which have black scales covering most of the body. Unlike the other BWPK butterflies, ivory homozygote butterflies do not fly and have not been observed to successfully mate or lay eggs. The wings appear to be of a normal strength and sturdiness, but flight is weak and will not occur when prompted by dropping (S1 Movie).
In order to investigate the genetic basis of these wing patterns, we sequenced 37 butterflies from the UT Austin H. m. BWPK stock to an average coverage of 13 x, including 11 Dark BWPKs, 9 Pale BWPKs and 10 ivory butterflies ( Fig 1A).

Autozygosity mapping identifies associated SNPs
We used a combination of GWAS and patterns of SNP autozygosity to determine the locations of regions associated with Mendelian pattern variants ( Fig 2). As proof of principle, we mapped the Dennis pattern element which was previoulsy described as a cis-regulatory element of the gene optix. GWAS for presence vs absence of Dennis gave a narrow association peak centred near optix, in a region previously identified as associated with the Dennis pattern element (38) (S1 Fig). We then examined the inheritance patterns of SNPs, filtering for SNPs where butterflies with no Dennis element were homozygous for the reference allele, and individuals with the Dennis element were either heterozygous or homozygous for an alternate allele. In total, 3430 SNPs matched this pattern of inheritance, with 3425 in a cluster centred on the gene optix, indicating that autozygosity mapping is appropriate for mapping pattern elements in this data set.
GWAS for ivory gave a broad association peak on chromosome 15 ( Fig 2D). We expected dark morph BWPK butterflies to carry the reference allele, pale morphs to be heterozygous, and ivory morphs to be homozygous for a non-reference allele. We expected that at the causative locus, this allelic segregation pattern should be fixed in every individual. This was the case at just 131 SNPs in the whole genome, all within a 9 kb interval on chromosome 15, within the broad association peak (Fig 2D, blue marks). These SNPs sit within the first intron of the gene cortex. Cortex is a switch gene necessary for differentiation into black Type II and red Type III scales (41,42), and has previously been linked to melanic switches in many species, including in industrial melanism in the peppered moth (54,55).
We noted that in [ivory WT/-] butterflies, scales in red regions exhibited aberrant morphologies (Fig 2E-F). Granular clumps of red pigments could be observed in the body of some scales, whereas wild type scales are solidly-pigmented with no granularity. Additionally, we observed scales that were curled at the edges rather than flat. Similar scale phenotypes occur in the wings of cortex crispant butterflies (Fig 2E-F), thus supporting a role for cortex loss-of-function in the ivory colour phenotypes (42).

Ivory butterflies carry a large deletion including the cortex promoter
Many phenotypes involve structural genomic variation, both under artificial selection (56), and in natural adaptation (57,58). As such, we checked read depth across this region in all our sequenced samples. Immediately adjacent to the block of fixed SNPs identified by autozygosity mapping, we found a large region where read depth in [ivory -/-] butterflies dropped to zero, while in [ivory WT/-] it dropped by half. This indicated that a large deletion spans the first exon and promoter of cortex (Fig 3A).
To determine the precise breakpoints of this deletion, we de novo assembled short read data for all [ivory WT/WT] and [ivory -/-] individuals, and BLASTed the resulting contigs against the cortex region. This allowed us to recover individual contigs from some individuals that bridged the two ends of the deletion, giving a precise breakpoint at positions Hmel215003o:1494902-1573177, with a length of 78,275 bp relative to the reference genome ( Fig 3B).

The ivory deletion includes one of the two cortex transcription start sites
The ivory deletion contains the only annotated transcription start site (TSS) and promoter for cortex. While butterflies carrying this mutation can complete development, even with effects in scale development across the whole organism and the inability to fly, coding knockouts of cortex (including clonal G 0 crispants) have been observed to have high levels of embryonic lethality, with very few crispant clones observed (42).
The ability of ivory mutants to survive in spite of the loss of the promoter, as well as the presence of a long first intron in cortex, led us to hypothesise that there may be a second, alternate TSS. In order to determine if this is the case, we mapped RNAseq from H. melpomene for a variety of samples including larval and pupal wings (59), adult ovaries (43), and embryos, adult head and adult abdomen (60), and looked at splicing and alignment of the 5' end of the transcript. No expression was detected in adult head or abdomen, but in embryos, ovaries, and pupal wings, transcription initiates at position Hmel215003o:1424680 in the third annotated exon, which is adjacent to the coding sequence (the 'proximal promoter') ( Fig 4A, S2-3 Figs). Expression in larval wings initiates at the annotated TSS (the 'distal promoter'), and the exon with the proximal promoter is skipped. In pupal wings, transcription again initiated at the proximal promoter. This indicates that cortex has two alternative TSSs, a proximal promoter used in multiple tissues and a distal promoter specific to the larval wing. Only the distal promoter is included in the ivory deletion.

Mutagenesis of the distal promoter phenocopies cortex null effects
We reasoned that if the loss of the distal promoter causes the ivory phenotype, knocking out the promoter with CRISPR/Cas9 should phenocopy the effects of the ivory deletion. We generated G 0 mosaic knockouts (crispants) of the promoter in the related species Heliconius erato (Fig 4B-D). Resulting butterflies had extensive yellow/white clones on the wings, however several failed to emerge from the pupa, and a low survival rate and penetrance were observed ( Table 1), suggesting that the few wing crispants obtained are rare mosaic escapers of a loss-of-function experiment with lethal effects (61). Thus while the pleiotropic effects of deleting the TSS were reminiscent of the previously reported cortex protein coding knockouts, which also exhibited low survival and penetrance (42), this experiment did not fully recapitulate the ability of the ivory deletion allele to generate large wing clones. This difference in pleiotropy may be due to the use of H. erato in the CRISPR assay, and further experiments are needed to test the functionality of deleted elements within a H. melpomene stock. However, we conclude from these experiments that the disruption of 5' distal elements of cortex are necessary for normal color scale patterning, consistent with a causal role of the 78 kb deletion in underlying the ivory phenotypes.

Demographic origins
The H. m. BWPK stock was kept in insectaries for 30 years (c. 400 generations) before producing the ivory mutation. The precise history and ancestry of the stock is not recorded, but we do know that the original stock was generated with a mix of butterflies from multiple locations. after the addition of butterflies from the vicinity of Tinalandia, Ecuador in the 1980s. The first pale PK (i.e. heterozygotes for ivory) appeared in the PK culture at Butterfly World after 2013, and were shared with L.E.G. for study in 2014-15.
Whole genome PCA clustering of a geographic spread of samples of H. melpomene, as well as sister taxa H. cydno and H. timareta, replicates the geographic and species clustering previously described by Martin et al. (62), with the addition of a distinct "Atlantic" group within H. melpomene (S1 Table). H.m. BWPK cluster together near H. melpomene from the west of the Andes. Similarly, PCA of just the cortex locus, using many more individuals generated by selective sequencing by Moest et al (63) (S3 Table), also places BWPK closest to western H. melpomene.
Given the mixed ancestry of H. m BWPK, and the observation that ivory patterns emerged after introduction of butterflies from Ecuador, we examined heterogeneity in genome-wide ancestry by building phylogenetic trees in windows of 50

Discussion
Using whole genome sequencing of a set of related individuals, we were able to map the ivory mutation to a large deletion that removed the distal promoter of the wing patterning gene cortex. Cortex was first identified as a switch between melanic and non-melanic patterns in both Heliconius and the peppered moth Biston betularia, and has later been mapped as a switch between melanic and non-melanic scale types in a number of other species, including three other geometrid moths (55), and Papilio butterflies (64), and also has a role in wing pattern polyphenism in J. coenia (65). The list of genes that have been identified as the targets of selection in butterfly wing pattern variation includes transcription factors like Optix and Bab (66,67), signalling pathway components like WntA (68), or terminal effector genes like Yellow (69), all of which have molecular functions that support a role in developmental patterning or pigment synthesis (34). Cortex, on the other hand, has no clear functional association with colour pattern or cell differentiation, and yet has been mapped in a very broad phylogenetic spread of species, suggesting its function may be ancestral in the Lepidoptera. The gene does not appear to have a role in Drosophila wing development or patterning (41), limiting our ability to make inferences about its molecular function and interactions. In identifying this large deletion, this study provides novel insight into the genetics of this hotspot locus, and highlights the potential utility of looking at insectary mutants to understand butterfly wing development.
The loss of all black and red scales in the ivory mutant supports a model in which white/yellow scales are the 'default' state during scale cell differentiation, with scale cell precursors failing to differentiate into red or black cell types. As such, when comparing pattern homologies between butterflies, Heliconius white/yellow regions can be considered as the 'background', and black and red regions can be considered as 'pattern elements' layered on top of this background. This model was initially proposed based on observation and ultrastructure and pigmentation of scales (70), as well as on the analysis of pattern homologies of the Heliconiini (71).

Structural variation and mutations
Large deletions have repeatedly been observed in cases of both natural variation and domestication, with 29 deletions affecting regulatory regions which are larger than 1 kb listed on GepheBase (72). Recurrent deletions of a pitx1 enhancer in sticklebacks, up to 8 kb in length, have caused convergent pelvic reduction (73), and the deletion of a 60.7 kb regulatory region at the AR gene in humans, which removes enhancers present in chimpanzee and conserved in other mammals, leads to loss of penile spines (74). Similarly large deletions have also been confirmed in the genetic basis of domestication phenotypes, including a 44 kb deletion, also at pitx1, causing feathered feet in chickens (75), and a 141 kb deletion upstream of agouti/ASIP in Japanese quail (76).
The large deletion that causes ivory is a form of structural variation. Other structural variants affecting this locus have been found in wild populations of Heliconius; multiple inversions have formed a wing pattern supergene in H. numata, and an independent inversion, similar in size and position, has been found in multiple species of the erato clade (77,78). It is possible that this genomic region will prove to be prone to a higher-than-average level of structural variation, or, conversely, that the region is more tolerant to structural variation than other parts of the genome.

Regulatory consequences of the ivory mutation
Large deletions have the potential to remove a large section of cis-regulatory sequence, as well as transcribed sequences. The ivory deletion causes the loss of one of two promoters. As many as 40% of developmentally expressed genes in Drosophila have two promoters which can cause distinct regulatory programs (79), and multiple promoters are also common in human genes (80). This both increases the complexity of gene regulatory interactions and increases the number of transcript isoforms per gene -for example, an alternate promoter in the Drosophila gene Zfh1 creates an isoform that has a shorter 5' UTR which is missing miRNA seed sites on its 5'UTR, permitting differential degradation of the mRNA by miR-8 (81). We determined that the ivory deletion contains one annotated promoter of cortex, but that in some contexts during development, another transcription start site is used, much closer to the translation start site. Alternate promoter usage is likely to be common and widespread in animals.
We expect that the use of two separate, context-specific promoters at cortex will create a hierarchical regulatory organisation, where CREs used in different tissues or at different times can loop to different promoters. Additionally, transcripts from each of the two promoters contain a different arrangement of 5' non-coding exons, giving them different UTRs, possibly leading to differential translational regulation or degradation of the mRNA.
The ivory deletion does not include protein-coding sequence, which led us to hypothesise that by removing just one promoter, cortex would still be expressed in tissues that use the alternate promoter, and that therefore ivory mutants would bypass the pleiotropic, highly lethal effects observed in CRISPR experiments that target coding exons of cortex (42). However, crispants with deletions in the distal promoter in H. erato did not appear to be free of these pleiotropic effects, and exhibited high lethality and comparatively small clone sizes, dissimilar to the H. melpomene [ivory -/-] mutants. This may be due to inherent differences in the regulatory function of the distal TSS between these two species, or to the existence of other functional sequences within the deletion which might be necessary for generating the pale phenotypes. Specifically, the 78-kb deletion removes part of the 3'UTR of the neighbouring gene parn as well as two miRNAs (82), and it is also possible that a large deletion affects other cortex cis-regulatory elemnts or affect 3D chromatin interactions on a large portion of the chromosome. Additionally, the ivory butterflies carry a set of fixed SNPs adjacent to the deletion, which could contribute to the phenotype. In summary, the complete loss of black and red scales caused by the ivory deletion and by CRISPR-Cas9 mutagenesis in two Heliconius species indicates that either the distal promoter or another functional sequence within the 5' non-coding region of cortex are necessary for black and red scale development.

Heterozygote advantage of deleterious mutations in captivity
The cortex mutant phenotype has been artificially selected by a breeder in the heterozygous state, when unusually widespread yellow and white patterns emerged in the stock. The heterozygote carriers of the ivory mutation ([ivory WT/-]) do not appear to have any reduction in fitness or vigor. In contrast, homozygous ivory butterflies of either sex are unable to fly, limiting their ability to feed and reproduce -we have not observed any successful matings involving an [ivory -/-] butterfly. Those recessive effects make the mutant allele unfit for breeding in the homozygous state, resulting in a form of heterozygote advantage where mutant heterozygotes are selectively bred in spite of inviable or undesirable effects in homozygotes (83,84). We compiled similar situations from the literature using a database of gene-to-phenotype relationships (72), and found 38 analogous gene-to-trait relationships spanning domesticated mammals and birds (Table 2). Large structural variations (SVs) such as the ivory deletion accounted for 9 out of 46 derived alleles in this dataset, suggesting that the recessive deleterious effects of macromutations occasionally provide heterozygous states of interest to artificial selection. Of note, 17 out the 38 cases of captive heterozygote advantage involve selection for depigmentation traits, highlighting the trend among breeders and fanciers to select for conspicuous variants (Table 2). Finally, cases of heterozygous advantage also occur in the wild, such as in ruff birds where a large inversion haplotype provides male coloration and behavioral traits that are maintained by sexual selection in spite of being recessive lethal (85). In wild populations of the polymorphic butterfly Heliconius numata, cortex itself is situated in the center of an inversion polymorphism under balancing selection (86). Further work will be needed to determine the sub-gene level elements that are within the ivory deletion and mediate homozygous inviability. * See Gephebase (www.gephebase.org) for references. The corresponding entries can be retrieved using a search for "@HeterozygoteAdvantage". ** recommendations to outcross are recent in the affected stock. *** MSTN loss-of-function alleles exist in other bred species. Homozygotes sometimes requires assisted birthing, but are viable and artificially selected.

Conclusion
By using autozygosity mapping and association on a comparatively small pool of individuals, we were able to identify a structural mutation involved with butterfly wing pattern. This approach may prove fruitful in other studies of butterfly wing patterning or in the identification of de novo mutants, especially if combined with recent developments in whole genome sequencing from dried museum specimens (87,88). This could allow the mapping of other cases like Hindsight in Junonia coenia, or pseudozorro in Parnassius apollo (89,90). The identification and characterisation of spontaneous mutants has provided very valuable insights into the genetics of development in model organisms. Further whole genome sequencing of deleterious mutations in butterflies will help to identify parts of the genome that are functionally required for normal development, which will assist in a more complete understanding of their evolution and development.

Data availability
Whole genome sequences are accessible on the NCBI SRA under the Bioproject accession PRJNA610063.