Butterfly eyespots evolved via co-option of the antennal gene-regulatory network

Butterfly eyespots are beautiful novel traits with an unknown developmental origin. Here we show that eyespots likely originated via co-option of the antennal gene-regulatory network (GRN) to novel locations on the wing. Using comparative transcriptome analysis, we show that eyespots cluster with antennae relative to multiple other tissues. Furthermore, three genes essential for eyespot development (Distal-less (Dll), spalt (sal), and Antennapedia (Antp)) share similar regulatory connections as those observed in the antennal GRN. CRISPR knockout of cis-regulatory elements (CREs) for Dll and sal led to the loss of eyespots and antennae, and also legs and wings, demonstrating that these CREs are highly pleiotropic. We conclude that eyespots likely re-used the ancient antennal GRN, a network previously implicated also in the development of legs and wings.


Two pleiotropic CREs reveal a shared network between eyespots, antennae, and other 149 traits 150
Evidence of GRN co-option is bolstered by the identification of shared cis-regulatory elements 151 (CREs) driving the expression of genes common to both the ancestral and the novel trait 152 (eyespots). To identify putative CREs specific to wing tissue with eyespots, we used 153 contain any repetitive elements, we used CRISPR-Cas9 to disrupt its function. We designed 163 four guide RNAs along its 319 bp length to maximize the likelihood of its disruption (Fig. 3A,  164 Fig. S15). We obtained a variety of different phenotypes that were also observed when 165 targeting exons of the Dll gene using CRISPR (6): several caterpillars showed a missing or 166 necrotic thoracic leg (Fig. 3C, Fig. S16), adults were missing legs and even a hindwing (Fig.  167 3D-E), adults lacked eyespots (Fig. 3F-H), adults showed truncated antennae, pigmentation 168 defects, and loss of wing scales (Fig. 3I-J and Fig. S16-S19, Table S1), all having deletions 169 within the CRE of various sizes (Fig. S16). These findings confirm that the Dll319 CRE is 170 pleiotropic and further suggest that eyespots use the same GRN as antennae in addition to legs 171 and wings. 172 In order to confirm that the Dll319 contains a functional and pleiotropic CRE, we cloned a 917 174 bp region containing this CRE into a piggyBac-based reporter construct (22) and evaluated its 175 CRE activity in transgenic butterflies. We observed that embryos expressed the reporter gene 176 (EGFP) in antennae, mouthparts, as well as thoracic limbs, indicating that this CRE is sufficient 177 to drive gene expression both in antennae and legs ( Fig. 3K and S20). Unfortunately, the loss 178 of this line precluded us from visualizing EGFP expression in eyespots. Using this same cloned 179 region containing the Dll319 CRE, we also observed pleiotropic CRE activity in antennae, developmental stages using CRISPR-Cas9 (Fig. 4B). We obtained aberrations in caterpillar 195 horns, adult antennae, leg and chevron patterns, as well as missing eyespots and a missing 196 wing ( Fig. 4C-4F, Fig. S22

Acknowledgement: 388
We would like to thank NUS HPC for allowing access to perform the bioinformatic analysis. Butterfly eyespots evolved via co-option of the antennal gene-regulatory network 518 519 520 Suriya Narayanan Murugesan 1, *, †, Heidi Connahs 1, *, †, Yuji Matsuoka 1, †, Mainak das Gupta 1 , 521 Manizah Huq 1 , V Gowri 1 , Sarah Monroe 1 , Kevin D. wing was cut in half and the distal region was used for the library) and 2 hindwing libraries, 568 using both the proximal and distal regions of the wing. All FAIRE-enriched libraries were 569 prepared from 7-8 pooled wing tissues. Libraries were prepared by Genotypic Technology 570 (India) as 75bp pair-end reads and sequenced, using Illumina NextSeq. Raw reads were quality-571 checked and reads with phred scores >30 were retained for downstream analyses. converted to BAM files, using SAMtools-0.1.7a(30). The BAM files were converted to sorted 576 BAM, followed by removal of PCR duplicates. The final BAM files were converted to 577 BEDgraph files, using BEDtools-2.14.3(31). Peaks were called with MACS2 software(32), 578 using the aligned enriched and input (control) files with the q-value (minimum FDR) cutoff to 579 call significant regions. The command bdgcmp script was used on the enriched and input 580 BEDgraph files to generate fold-enrichment and log likelihood scores. This command also 581 removed noise from the enriched sample relative to the control. The BEDgraph files were 582 converted to BigWig files for visualization in Integrative genomic viewer (IGV). 583 584 Identifying cis-regulatory elements (CREs) for CRISPR-Cas9 experiments 585 The FAIRE-seq data were visualized using IGV. All 18 candidate CREs identified around the 586 Dll locus were blasted against the B. anynana genome in LepBase to verify whether they were 587 unique in the genome. Most of the candidate CREs were not unique and had multiple hits 588 throughout the genome. One of the unique regions, the CRE Dll319, was selected as a suitable 589 target for CRISPR knock-out. 590 591 Single guide RNA design and production 592 Single guide RNA (sgRNA) target sequences for sal were selected based on their GC content 593 (around 60%) and the number of mismatch sequences relative to other sequences in the genome 594 (> 3 sites). In addition, we selected target sequences that started with a guanidine for subsequent 595 in vitro transcription by T7 RNA polymerase. sgRNA for the Dll319 CRE were designed using 596 CRISPR Direct(33), corresponding to GGN20NGG. We designed 4 guides spanning the length 597 of the CRE (Fig. 3B, Fig. S15, and Table S2). Two guides were designed targeting the sal740 598 region (Fig. S24, and Table S2). The sgRNA templates were created by a PCR reaction with 599 overlapping primers, using Q5 polymerase (New England Biolabs). Constructs were 600 transcribed using T7 polymerase and (10X) transcription buffer (New England Biolabs), 601 RNAse inhibitor (Ribolock), NTPs (10 mM) and 600 ng of the PCR template. The final sample 602 volume was 40 μL. Samples were incubated for 16 h at 37°C and then treated with 2 μL of 603 DNAse 1 at 37°C for 15 minutes. Samples were purified by ethanol precipitation, and RNA 604 size and integrity were confirmed by gel electrophoresis.

606
Cas9 mRNA production 607 The plasmid pT3TS-nCas9n (Addgene) was linearized with XbaⅠ (NEB) and purified by 608 phenol/chloroform purification and ethanol precipitation. pT3TS-nCas9n was a gift from 609 Wenbiao Chen (Addgene plasmid # 46757; http://n2t.net/addgene:46757; 610 RRID:Addgene_46757). In vitro transcription of mRNA was performed using the 611 mMESSAGE mMACHINE T3 kit (Ambion). One microgram of linearized plasmid was used 612 as a template, and a poly(A) tail was added to the synthesized mRNA by using the Poly(A) 613 Tailing Kit (Thermo Fisher). The A-tailed RNA was purified by lithium-chloride precipitation 614 and then dissolved in RNase-free water and stored at -80°C. The Cas9 transcript was used for 615 producing sal crispants, and for the analysis of regulatory interactions among Dll, Antp, and 616 sal.

618
In vitro cleavage assay for the Dll319 CRE 619 The sgRNAs were tested using an in vitro cleavage assay. Wild-type genomic DNA was 620 amplified using primers that were at least 200 bp from the sgRNA sites. sgRNA (200 ng/μL 621 per guide), Cas9 protein (800 ng/μL) (stored in a buffer containing 300 mM NaCl, 0.1 mM 622 EDTA, 1 mM DTT, 10 mM Tris-HCl, 50% glycerol pH 7.4 at 25°C), NEB buffer 3 (1 μL) and 623 BSA (1 μL) were brought to a final volume of 10 μL with nuclease-free water and incubated 624 at 37°C. After 15 minutes of incubation, the purified amplicon (100 ng) was added to the 625 sample, which was then incubated for an additional 1-2 h at 37°C. The entire reaction volume 626 was analyzed on a 1%-agarose gel. Cas9 protein was purchased from NEB EnGen Cas9 NLS. 627 The cleavage assay confirmed that each guide successfully cleaved the PCR amplicon.

629
Embryo injections 630 Wild-type lab populations of B. anynana adults were provided with corn plants to lay eggs. 631 The eggs were collected within 1.5 h of oviposition and placed onto 1-mm-wide strips of 632 double-sided tape in plastic Petri dishes (90 mm). Cas9 protein (final concentration 800 ng/μL) 633 and sgRNA (final concentration 200 ng/μL per guide) for all 4 guides were prepared in a total 634 volume of 10 µL and incubated for 15 min at 37°C prior to injection along with 0.5 µL of food 635 dye to improve visualization of the injected sample into the embryos. For sal crispants, Cas9 636 mRNA (500 μg/μL final concentration) and sgRNA (500 μg/μL final concentration) were 637 injected along with one tenth of the volume of food dye. For sal740 CRE, eggs were injected 638 with the mix of Cas9 protein (final concentration 800 ng/μL) and sgRNA (final concentration 639 400 ng/μL per guide). The injection mixture was kept on ice after the incubation and prior to 640 injection. Embryo injections were carried out by nitrogen-driven injections through glass 641 capillary needles. Injected eggs were stored in closed Petri dishes containing cotton balls that 642 were dampened daily to maintain humidity. The hatched larvae were reared in small paper cups 643 for 1 week and then moved to corn plants to complete their development. Tables S1, S3 and 644 S4 summarize the injection results.

646
In vivo cleavage assay and genotyping of sal crispants 647 Genomic DNA was extracted with a SDS-based method from a pool of 5 injected embryos that 648 did not hatch. About 250 bp of sequence spanning the target sequence was amplified with 649 PCRBIO Taq Mix Red (PCRBIOSYSTEMS), and PCR conditions were optimized until there 650 were no smears, primer dimers, or extra bands. Primers are listed in Table S2. The PCR 651 products were purified with the Gene JET PCR purification kit (Thermo Fisher). Two hundred 652 nanograms of PCR product were denatured and re-annealed in 10x NEB2 buffer. batch of caterpillars, 19 reached adulthood (10 females and 9 males). These butterflies were 704 evenly distributed into 4 cages (~5/cage) and placed with respective wild-type males and 705 females for breeding. We were unable to observe any dsRed signal (the positive marker of 706 transgenesis driven by the 3xP3 promoter) in the eyes of the caterpillars from the F1 or F2 707 generation, despite ubiquitous dsRed signal in some 1 st -intar larvae (only) of the F1 generation, 708 which were used later for outcrossing to wild-type individuals. This ubiquitous signal was not 709 observed again in the offspring of these larvae. We collected eggs from the F3 generation and 710 dissected some embryos for EGFP antibody staining. Two out of the four dissected embryos 711 did show expression of EGFP driven by the Dll319 CRE in the embryonic antennae, 712 mouthparts, thoracic legs and pleuropodia (Fig. 3K, Fig. S10). Subsequent hemolymph PCR 713 genetic screening in individuals of the 4 th generation failed to identify additional positive 714 individuals and the line was lost.

716
Drosophila enhancer reporter assay 717 The same 917 bp sequence that contained The Dll319 CRE was directionally cloned into 718 pENTR-D, then GATEWAY cloned into the piggyPhiGUGd, the Gal4-delta version of the 719 previously reported piggyBac-based reporter construct(22). piggyPhiGUGd also has an attB 720 site, allowing phiC31 transgenesis. The detail of piggyPhiGUE will be published elsewhere 721 (Deem and Tomoyasu, unpublished). For Drosophila transgenesis, the piggyPhiGUGd Dll319 722 CRE construct was transformed into the attP2 site (68A4) through phiC31 integrase-mediated 723 transgenesis system with EGFP as a visible marker (BestGene Drosophila transgenic service). 724 Established transgenic flies were crossed with G-TRACE(35) to visualize the tissues with CRE 725 activities.

727
Antibody staining of B. anynana embryos and wings 728 Two-day-old embryos, as well as fifth-instar larval and pupal wing tissues were dissected in 729 PBS buffer under the microscope. The samples were fixed in 4% formaldehyde/Fix buffer (0.1 730 M PIPES pH 6.9, 1 mM EGTA pH 6.9, 1.0% Triton x-100, 2 mM MgSO4) for 30 min on ice. 731 The samples were washed with 0.02% PBSTx (PBS + Triton x-100) 3 times every 10 min, and 732 then blocked in 5% BSA/PBSTx for 1 h. The samples were then incubated in 5% BSA/PBSTx 733 with the primary antibody, and incubated at 4°C overnight. As primary antibodies, we used a 734 rabbit polyclonal anti-Dll antibody (at 1:200, a gift from Grace Boekhoff-Falk), a mouse 735 monoclonal anti-Antp 4C3 antibody (at 1:200; Developmental Studies Hybridoma Bank), a 736 rabbit anti-Sal antibody (at 1:20,000 for wings and pupal tissues, and 1:2,000 for embryos; de 737 Celis et al., 1999), and a rabbit anti-EGFP antibody (at 1:200; Abcam ab290) for the transgenic 738 embryos at 24h (n=4) and wt controls. For double staining, we added two primary antibodies 739 to the same tube. The wings were washed with PBSTx 3 times every 10 min. Then, we replaced 740 the PBSTx with 5% BSA/PBSTx to block for 1 hour, followed by the incubation with the 741 secondary antibody (1:200) in 5% BSA/PBSTx at 4°C for 2 h. The wings were washed with 742 PBSTx 3 times every 10 min, followed by mounting the wings in ProLong Gold Antifade 743 Mountant (Thermo Fisher). The images were taken under an Olympus FV3000 Confocal Laser 744 Scanning Microscope.

746
Sample collection and library preparation for RNA sequencing 747 In order to identify gene expression patterns specific to eyespot formation on the developing 748 wings, we extracted RNA from sixteen different tissue types at four developmental time points: 749 3-4-hour-old, 12-13-hour-old, and 24-25-hour-old embryos; T1 legs, prolegs, forewings, and 750 horns from wandering caterpillars; T1 legs, antennae, forewings, forewing margins, eyes, 751 eyespots, and two control tissues adjacent to eyespot centers from 3-h-old pupae (Fig. 1A). For 752 wing wounding experiments, we poked one wing between 17 to 18 h after pupation in two 753 different places in the M3 sector, using a fine tungsten needle with a diameter of 0.25 mm and 754 0.001 mm at the tip (FST-10130-10). We collected the wings 6 hours later, which corresponds 755 to 23-24 h after pupation (Monteiro et al 2006). We performed the experiments with four 756 biological replicates for each tissue type with 10 to 25 female individuals in each replicate 757 (both left and right tissues were used, except for the wounded pupal wings, where a single wing 758 was used) (Table S5). Total RNA was extracted in 70 µL of nuclease-free water, using Qiagen 759 RNA Plus Mini Kit. RNA quantity and integrity were measured using a Nanodrop and an RNA 760 Bleach gel (Aranda et al 2013). RNA libraries were prepared, using the Truseq stranded mRNA 761 kit from Illumina. Forty million reads were sequenced for each replicate, using Novoseq 6000 762 with 150bp paired-end and an average insert size of 250-300 bp. Library preparation and 763 sequencing were carried out at AIT Novogene, Singapore. In order to avoid batch effects, we 764 randomized the sample extraction and RNA isolation, such that two replicates of the same 765 group were never extracted at same time.

767
RNA-seq analysis 768 The raw RNA-seq data were quality-controlled and filtered. Adapter sequences and reads with 769 low quality (less than Q30) were trimmed, using bbduk scripts (ktrim=r, k=23, mink=11, 770 hdist=1, tpe, tbo, qtrim=rl, trimq=30, minlen=40). In order to remove any bacterial 771 contamination in the samples, we used the bbsplit script, which is a part of the bbmap tools 772 (36). All bacterial genomes were downloaded from NCBI (last downloaded in June 2018), and 773 the reads were mapped to the bacterial genomes, using bbmap. Only reads whose pairs also 774 passed through the filter were further analyzed. To remove any ribosomal RNA sequences from 775 the RNA-seq data, the reads were aligned to the eukaryotic rRNA database available in 776 sortmeRNA (37). The processed reads from different samples were then mapped to the BaGv2 777 genome, using hisat2 (38) (mapping statistics in Table S6), resulting in bam files that were 778 sorted by genomic positions, using samtools (30). They were used as inputs in StringTie (38) 779 to create the initial transcriptome assembly with 71,042 transcripts, which was used to annotate 780 the genome using Maker v.3 (39), resulting in 18,196 genes with 29,389 transcripts. 781 782

RNA-seq differentially expressed (DE) gene analysis 783
A read count matrix of the annotated genes was obtained for the samples using StringTie (38). 784 We used the GO terms to filter out any ribosomal genes before obtaining the read counts. This 785 approach led to the removal of 496 genes to a final set of 17,700 genes, which was used 786 throughout the analysis. Correlations between the replicate samples was analyzed using 787 DESeq2 (8) with a sample distance matrix. One of the antennal samples was removed due to 788 its poor correlation with its other biological replicates. The remaining samples were used for 789 the downstream analyses (Fig. S25).

791
Identifying eyespot-specific DE genes 792 To identify eyespot-specific genes, a pairwise DE analysis was performed between eyespot and 793 control adjacent tissues, Nes1 and Nes2, using DESeq2 (Fig. 1A, Fig. S2). Common genes 794 upregulated and downregulated between eyespot vs. Nes1 and eyespot vs. Nes2 with an 795 adjusted P-value (padj) of 0.05 were chosen as eyespot-specific DE genes (Spreadsheet S1). 796 797 RNA hierarchical sample clustering 798 In order to identify the tissue with the closest gene expression profile to eyespot, we used all 799 tissue samples except the eyespot control tissue samples. DE analysis between the multiple 800 tissues was performed, using run_DE_analysis.pl script provided in Trinity tool, using 801 DESeq2 as the method of choice for this analysis (40). Hierarchical clustering was performed 802 for the different tissues, using genes that showed a log2fold change of |2| and padj value of 803 0.001, as in Fisher et al (2020) (41). Clustering was performed using an Euclidean distance 804 matrix derived, using the DE genes for the tissues with the hclust function in R(42). The pvclust 805 package(9) in R was used to calculate the uncertainty in the hierarchical clustering with a 1000 806 bootstrap value. 807 808 ATAC-seq library preparation 809 We prepared ATAC libraries for the same set of tissues as we did for the RNA-seq experiment, 810 except for the eyespot control tissues (Table S7). Library preparation failed for a few groups 811 leading to 2 to 4 biological replicates per group. Tissues were collected, flash-frozen in liquid 812 nitrogen and stored in -80֯ C, before we extracted nuclei and prepared the libraries. We used 10 813 to 25 individuals and approximately 80,000 nuclei per replicate. Libraries were prepared as 814 described in the Omni-ATAC protocol (43) with slight modifications. Individual tissues 815 extracted at different time periods during the process were randomized and pooled into each 816 replicate before extracting the nuclei. The tissues were thawed and homogenized in 2 mL of 817 ice cold 1X homogenization buffer (HB) in a 2-mL-glass douncer. Homogenization was 818 performed by 10 strokes with pestle A, followed by 15 strokes with pestle B. The homogenized 819 mixture was left on ice for 2 min before filtering it through a 100-µm-nylon mesh filter into a 820 DNA "low bind" 2-mL Eppendorf tube (Z666556-250EA and further PCR amplified for 10 cycles. The PCR products were purified, using a Zymo-DNA 844 Clean & Concentrator-5 kit, and the DNA fragments were size-selected between 50 -1500 bp, 845 using the ProNex Size-Selective Purification System (NG2002) from Promega. The samples 846 were sequenced, using Novoseq 6000 with an average read depth of 30 million and 2x50 bp 847 paired end reads by AIT Novogene, Singapore. 848 849 850 ATAC-seq peak calling 851 ATAC reads were processed, using bbduk scripts from bbmap tools, to remove any adapters. 852 The reads were mapped to the BaGv2 genome, using bowtie with the -x 1500 and -m1 853 parameters. Only reads with insert sizes of 1500 bp or less and only those mapping to a unique 854 region of the genome were mapped. Reads mapped to the mitochondrial genome were 855 removed, using samtools idxstats, and reads marked for PCR duplicates were also removed, 856 using GATK Markduplicates. We kept only paired-end mapped reads with a phred quality 857 score of Q20 and above for downstream analysis. Because the Tn5 transposase binds to DNA 858 as a dimer and inserts adapters of 9 bp in length at the insertion sites, the start sites of the 859 mapped reads were adjusted to an offset of +4 bp in the forward strand and -5 bp in the reverse 860 strand. The bam files were converted to bed files, using Bedtools (31), and we used F-Seq (44) 861 to call peaks for each sample. Bedtools intersect was used to identify the common set of peaks 862 for each tissue type. Peaks from all samples were merged if they were separated by 50 bp, using The distal wing region was broadly mutated for Sal activity, but Dll expression was not 1133 affected. Scale 50 μm. 1134 1135