Luciferase of the Japanese syllid polychaete Odontosyllis undecimdonta

Odontosyllis undecimdonta is a marine syllid polychaete that produces bright internal and exuded bioluminescence. Despite over fifty years of biochemical investigation into Odontosyllis bioluminescence, the light-emitting small-molecule substrate and the catalyzing luciferase protein have remained a mystery. Here we describe the discovery of a bioluminescent protein fraction from O. undecimdonta, the identification of the luciferase using peptide and RNA sequencing, and the in vitro reconstitution of the bioluminescence reaction using highly purified O. undecimdonta luciferin and recombinant luciferase. Finally, we found no identifiable homologs of this luciferase in publicly available datasets, suggesting that syllid polychaetes possess a luciferase that is evolutionarily unique among all characterized luminous taxa.

Highlights
The polychaete O. undecimdonta uses a luciferin-luciferase bioluminescence system.
O. undecimdonta bioluminescence does not require additional cofactors.
The luciferase of the Japanese fireworm is 329 amino acids long.
Recombinant luciferase is not secreted when expressed in human cells.
Exogenous luciferin does not seem to penetrate cell membranes; only lysate luminesces.
The luciferase transcript is supported by full-length cDNA reads with 5' and 3' UTRs.


Introduction
Odontosyllis is a widely distributed genus of marine syllid polychaete worms noted for their striking bioluminescent courtship displays [1][2][3][4][5]. The bioluminescence (BL) of Odontosyllis is a luciferin-luciferase system [6], but the structures of the luciferin and the luciferase protein have remained unknown despite several biochemical studies since the first, by Harvey in 1931 [6][7][8][9][10][11]. More broadly, the luciferase sequences and luciferin structures are still unknown for all polychaete species in the thirteen families that contain luminous species [12].
Previous studies of the Odontosyllis bioluminescence system produced conflicting results on whether it is a soluble, oxygen-dependent luciferin-luciferase reaction [8,9] or a photoprotein system in which the light-emitting small-molecule substrate is covalently bound to the enzyme [11]. These studies used different Odontosyllis species, and the different colors of the aqueous extracts from those species make it unclear whether there are multiple bioluminescent chemistries within Odontosyllis. However, both species secrete luminescence during mating [1,4], so they presumably share a homologous bioluminescent system.
Odontosyllis undecimdonta is found in Toyama Bay, Japan, where it engages in bioluminescent surface courtship displays around the first new moon in October [13]. Recently, a protein-coding sequence from O. undecimdonta that produces a recombinant protein with luminescence activity similar to that of crude worm extract mixed with crude luciferin isolate was patented (WO2017155036A1).
Here, we describe the identification, cloning, and characterization of the O. undecimdonta luciferase. Our results also suggest that the O. undecimdonta luminescence system is a luciferin-luciferase type that requires no cofactors, despite previous reports that magnesium ions are a necessary cofactor [14].

Specimen Collection
Professor S. Inoue provided lyophilized O. undecimdonta worms, collected in 1993, which we used to develop the protein purification strategy [15]. The specimens used in this study for protein purification, MS-based transcript identification, and nucleic acid purification were collected on October 6, 2016 in Namerikawa City, Toyama Prefecture, Japan. At dusk, Odontosyllis worms were attracted to a handheld light at the surface and collected with a hand dip net. Worms were individually preserved in Invitrogen RNAlater or lyophilized for later analysis.

DNA and RNA isolation
Methods for DNA and RNA isolation and for construction of the RNA-seq and genomic DNA libraries are described in the Supplementary Information. Briefly, the O. undecimdonta transcriptome was assembled with the Trinity assembler [16] from 32,457,166 Illumina 2x150 read pairs and 343,752 Oxford Nanopore long reads. DNA from a single O. undecimdonta specimen was used to prepare both a 10X Genomics Chromium library [17] and a PCR-free library. All sequencing reads are available for download from the European Nucleotide Archive under project PRJEB26709. Individual luciferase transcripts are available under NCBI accession numbers MH350412 and MH350413.

Protein extraction from biomaterial
Five ml of phosphate buffer (5 mM sodium phosphate, pH 7.4) was added to 150 mg of lyophilized worms. The mixture was then dripped into liquid nitrogen with a 1 ml pipette to form small frozen drops, which were ground in a mortar. The frozen powder was added to 10 ml of the same phosphate buffer and incubated for 40 min in an ice bath with stirring. The suspension was then centrifuged at 40,000 g (4°C) for 40 min. The luciferase-containing supernatant was collected and further purified by anion exchange chromatography.

Ultrafiltration and concentration.
To remove contaminating proteins from the luciferase-containing fractions, we used ultrafiltration. First, the active fraction was filtered through a 50 kDa Amicon® Ultra centrifugal filter unit (Merck Millipore, Germany), and BL activity was measured in both the concentrated retentate and the permeate. Only the permeate was bioluminescent. The bioluminescent permeate was then concentrated on a 30 kDa Amicon® Ultra centrifugal filter unit (Merck Millipore, Germany); the resulting retentate possessed BL activity, whereas the permeate did not. This concentrated luciferase sample was then used for size exclusion chromatography.
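The two ultrafiltration steps bracket the apparent size of the active species: activity that passes through a membrane lies below its cutoff, and activity retained by a membrane lies above it. The minimal Python sketch below encodes that inference, assuming the nominal cutoffs can be taken at face value. In practice, molecular weight cutoff ratings are approximate, and protein shape or aggregation affects retention. The helper name `size_window` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (nominal cutoff in kDa, fraction that retained BL activity)
Step = Tuple[float, str]


def size_window(steps: Sequence[Step]) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) apparent-size bounds in kDa implied by the steps.

    None means the steps place no bound on that side.
    """
    lower: Optional[float] = None
    upper: Optional[float] = None
    for cutoff_kda, active_fraction in steps:
        if active_fraction == "permeate":
            # Activity passed through the membrane: smaller than this cutoff.
            upper = cutoff_kda if upper is None else min(upper, cutoff_kda)
        elif active_fraction == "retentate":
            # Activity was held back by the membrane: larger than this cutoff.
            lower = cutoff_kda if lower is None else max(lower, cutoff_kda)
        else:
            raise ValueError(f"unknown fraction: {active_fraction!r}")
    return lower, upper


# 50 kDa step: only the permeate was active; 30 kDa step: only the retentate.
window = size_window([(50.0, "permeate"), (30.0, "retentate")])
print(window)  # (30.0, 50.0)
```

Under these assumptions, the active luciferase would have an apparent size between roughly 30 and 50 kDa.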

Size exclusion chromatography.
The bioluminescent retentate from ultrafiltration was applied to a Superdex 200 column (Phenomenex, USA) on a Shimadzu chromatography system (Shimadzu, Japan). The column was eluted with 5 mM sodium phosphate buffer, 150 mM NaCl, pH 7.4 at a flow rate of 0.4 ml/min, and 0.5 ml fractions were collected. The solvent, fractions, and column were kept at 4°C. BL-active fractions were used in the subsequent gel electrophoresis experiments.
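At a constant flow rate, each fraction number corresponds to a fixed elution volume and time window. The sketch below assumes that collection began at injection and that the flow stayed at the stated 0.4 ml/min. `fraction_window` is a hypothetical helper. Column volume is not used because the text does not state it.

```python
FLOW_ML_PER_MIN = 0.4  # flow rate stated in the text
FRACTION_ML = 0.5      # fraction volume stated in the text


def fraction_window(n: int) -> tuple:
    """Return (start_ml, end_ml, start_min, end_min) for 1-based fraction n.

    Assumes collection began at injection and the flow rate stayed constant.
    """
    if n < 1:
        raise ValueError("fractions are numbered from 1")
    start_ml = (n - 1) * FRACTION_ML
    end_ml = n * FRACTION_ML
    return start_ml, end_ml, start_ml / FLOW_ML_PER_MIN, end_ml / FLOW_ML_PER_MIN


# Each 0.5 ml fraction takes 0.5 / 0.4 = 1.25 min to collect.
for n in (1, 10, 20):
    print(n, fraction_window(n))
```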

Denaturing polyacrylamide gel electrophoresis and amino acid sequence analysis.
SDS-PAGE of the BL-active fractions was performed on a 10-25% gradient gel according to [18]. Gels were stained either with silver, following the protocol of [19], or with Coomassie G250 using a standard protocol. Protein bands were excised from the gel and subjected to in-gel trypsinolysis [20]. LC-MS was performed on an Ultimate 3000 Nano LC System (Thermo Fisher Scientific) connected to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific). The data were analyzed with Mascot software (Matrix Science), using the O. undecimdonta transcriptome as the reference.
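In-gel trypsinolysis produces the peptides that are later matched to the transcriptome. Trypsin cleaves C-terminal to lysine (K) or arginine (R), usually not when the next residue is proline (P). The Python sketch below implements only this simplified rule. It ignores missed cleavages and other exceptions that search engines such as Mascot can model. The function name and the toy sequence are hypothetical.

```python
import re


def tryptic_peptides(protein: str, min_len: int = 1) -> list:
    """Simplified in silico trypsin digest.

    Cleaves C-terminal to K or R unless the next residue is P, and drops
    peptides shorter than min_len.
    """
    # Zero-width split point: after K/R, not followed by P (Python 3.7+).
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if len(p) >= min_len]


# Hypothetical toy sequence, not an Odontosyllis protein.
print(tryptic_peptides("MAKPRGLEKRSTYW"))  # ['MAKPR', 'GLEK', 'R', 'STYW']
```

Raising `min_len` mimics the fact that very short peptides are rarely informative in a database search.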

Molecular cloning
Four Odontosyllis luciferase candidate genes were codon-optimized for expression in mammalian cells, domesticated for compatibility with MoClo assembly [21], and ordered as linear dsDNA fragments from a commercial supplier (Twist Bioscience, USA). Molecular cloning is described in detail in the Supplementary Materials.
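MoClo Golden Gate assembly relies on the Type IIS enzymes BsaI (site GGTCTC) and BpiI (site GAAGAC). "Domestication" therefore means removing internal copies of these sites from each part so that the part is not cut during assembly. The sketch below only scans both strands for such sites. Removing them, for example by synonymous codon changes, is a separate step that is not shown. The function names and the example fragment are hypothetical.

```python
# Recognition sites of the two Type IIS enzymes used in the MoClo standard.
SITES = {"BsaI": "GGTCTC", "BpiI": "GAAGAC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_type_iis_sites(seq: str) -> list:
    """Return (enzyme, strand, 0-based start) for every site on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        for motif, strand in ((site, "+"), (revcomp(site), "-")):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical fragment with one BsaI site and one reverse-strand BpiI site.
print(find_type_iis_sites("ATGGGTCTCAAAGTCTTCTAA"))
# [('BsaI', '+', 3), ('BpiI', '-', 12)]
```

A domesticated coding sequence should return an empty list here, apart from the sites deliberately placed in the flanking overhangs.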

In vitro bioluminescence assay
The reaction was monitored at room temperature with a custom-made luminometer, Oberon-K (Krasnoyarsk, Russia). Each measurement used 100 μl of reaction mix (10 mM sodium phosphate buffer, 150 mM NaCl, pH 7.4) containing 2 μl of luciferase fraction and 2 μl of highly purified luciferin [22]. In experiments with mammalian cell lysates, the same number of cells was used for each clone in each bioluminescence assay so that the results were comparable. The involvement of additional cofactors in the O. undecimdonta bioluminescence reaction was tested with an in vitro assay containing only purified luciferase and highly purified luciferin. Because previous studies suggested that Mg2+ is involved in the Odontosyllis bioluminescence reaction (optimal concentration 30 mM [14]), we also performed the in vitro bioluminescence assay on cell lysate with 30-60 mM Mg2+.
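To reach a given Mg2+ concentration in the fixed 100 μl reaction, the volume of stock to add follows from C1·V1 = C2·V2. The sketch below assumes a hypothetical 1 M MgCl2 stock, because the stock concentration is not stated in the text. The added volume replaces an equal volume of buffer so that the total stays at 100 μl.

```python
def stock_volume_ul(final_mM: float, final_volume_ul: float, stock_mM: float) -> float:
    """Volume of stock to add so the final reaction reaches final_mM (C1*V1 = C2*V2)."""
    if final_mM > stock_mM:
        raise ValueError("stock is less concentrated than the target")
    return final_mM * final_volume_ul / stock_mM


STOCK_MGCL2_MM = 1000.0  # hypothetical 1 M MgCl2 stock (not stated in the text)
REACTION_UL = 100.0      # reaction volume stated in the text

for target in (30.0, 45.0, 60.0):
    vol = stock_volume_ul(target, REACTION_UL, STOCK_MGCL2_MM)
    print(f"{target:.0f} mM Mg2+: add {vol:.1f} ul stock")
```

With a 1 M stock, the 30-60 mM range corresponds to adding 3-6 μl per reaction.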

Protein structure and homology analysis
HMMER and the BLAST suite were used to predict structural domains and interspecies homology of the transcripts whose products were bioluminescent [23][24][25]. We also used Phobius and SignalP to detect signal peptides and transmembrane domains in the same transcripts [26,27]. Finally, we used the I-TASSER server for structure prediction [28]. See the Supplementary Materials for a detailed description of the search for homologous sequences.
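For BLAST and HMMER, the E-value is the number of hits with at least that score expected by chance in a search of a database of that size. Values near or above 1 are therefore consistent with chance. The sketch below filters hits by an E-value cutoff. The 1e-5 threshold is a hypothetical, commonly used convention, not a value from the text. The two example hits are the best matches reported in the Results (E = 0.8 and E = 3.1).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    tool: str
    target: str
    evalue: float


def significant_hits(hits, threshold=1e-5):
    """Keep hits whose E-value is at or below the threshold."""
    return [h for h in hits if h.evalue <= threshold]


# The two best hits reported in the Results section.
reported = [
    Hit("phmmer", "prokaryotic protein involved in mRNA production", 0.8),
    Hit("tblastn", "predicted gerbil transcription factor XM_021634012.1", 3.1),
]
print(significant_hits(reported))  # []
```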

Results
The isolation and purification of O. undecimdonta luciferase required ion exchange chromatography, size exclusion chromatography, and ultrafiltration (Fig. 1). The presence of luciferase in samples was monitored by an in vitro BL assay at every stage of purification. Polyacrylamide gel electrophoresis of the size exclusion chromatography fractions revealed several bands that corresponded to BL activity (Fig. 1C). These bands were excised from the gel and identified by LC-MS, using the O. undecimdonta transcriptome as the reference. The transcriptome, assembled from Illumina paired-end reads and ONT 2D reads extracted with the poretools "fwd" parameter, contained 256,027 transcripts with a median transcript length of 737 base pairs. Four transcripts were identified as potential luciferases (Fig. 2) based on coverage and the number of MS matches. Three long transcripts, c9g1i2 (990 bp), c9g1i3 (993 bp), and c9g1i6 (990 bp), showed C-terminal amino acid variation. Transcript c9g1i5 (711 bp) was homologous to these three transcripts but lacked 93 N-terminal amino acids. These four transcripts were supported by two ONT whole-cDNA reads that spanned from the 5' UTR to the 3' UTR. Non-spliced mapping of an Illumina paired-end polyA RNA-seq library also confirmed that the longest transcripts were expressed. In a BLOSUM-scored alignment, the protein products of the four transcripts were identical at 92% of sites.
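The transcript lengths above can be cross-checked against protein lengths. This sketch assumes that each reported length is a complete coding sequence ending in a stop codon, which the text does not state explicitly. Under that assumption, 990 bp corresponds to 330 codons, or 329 amino acids plus a stop codon. If c9g1i5 shares the C-terminus of c9g1i2, the coding-length difference corresponds to the N-terminal truncation. The function name is hypothetical.

```python
def protein_length(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a complete coding sequence."""
    if cds_bp % 3 != 0:
        raise ValueError(f"{cds_bp} bp is not a whole number of codons")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


# Lengths reported above; assumed to be complete CDSs ending in a stop codon.
lengths_bp = {"c9g1i2": 990, "c9g1i3": 993, "c9g1i6": 990, "c9g1i5": 711}
aa = {name: protein_length(bp) for name, bp in lengths_bp.items()}
print(aa)  # {'c9g1i2': 329, 'c9g1i3': 330, 'c9g1i6': 329, 'c9g1i5': 236}
print(aa["c9g1i2"] - aa["c9g1i5"])  # 93
```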
All four candidate DNA sequences were synthesized as linear dsDNA fragments and cloned using MoClo technology, and mammalian cells were then transfected with the resulting constructs. Lysates of cells expressing two of the four candidates (c9g1i2 and c9g1i6) produced bioluminescence when assayed with purified luciferin (Fig. 3A), and the bioluminescence spectra of these positive clones were similar to that of native O. undecimdonta worms (Fig. 3B). In contrast, lysates of cells expressing c9g1i3 and c9g1i5 were not luminous.
Intact (non-lysed) cell cultures produced no luminescence when purified luciferin was applied.
The protein product of c9g1i2 is 329 amino acids long. Phobius assigned a posterior probability of 1 to the first 21 N-terminal amino acids being a signal peptide, whereas SignalP assigned this a probability of only 0.28. The only HMMER and phmmer result for this protein was an insignificant match (E-value = 0.8) to a prokaryotic protein involved in mRNA production. I-TASSER structure and function prediction found that nine of the top ten structural homologs of the c9g1i2 protein product were adenosine deaminases/hydrolases. A tblastn search with the c9g1i2 protein product found only an insignificant match (E-value = 3.1) to a predicted gerbil transcription factor (XM_021634012.1). A blastn search returned no significant matches. BLAST searches against transcriptomes assembled from publicly available polychaete RNA-seq data also yielded no significant matches (SI results).
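A rough mass estimate connects the 329-residue sequence to the purification data. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue. This is an approximation: actual mass depends on composition, and oligomerization or modifications would change the apparent size. Both the full-length estimate and the estimate without the predicted 21-residue signal peptide fall inside the 30-50 kDa window implied by the ultrafiltration steps in the Methods. The helper name is hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


full_length = approx_mass_kda(329)      # c9g1i2 product
without_sp = approx_mass_kda(329 - 21)  # if the predicted signal peptide were removed
print(round(full_length, 1), round(without_sp, 1))  # 36.2 33.9
```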
The mixture of purified O. undecimdonta luciferase and luciferin in TBS (50 mM Tris-HCl, 150 mM NaCl, pH 8.0) was luminescent even in the absence of Mg2+ ions. Increasing the Mg2+ concentration in the reaction buffer of recombinant luciferase cell lysate did not affect the yield of the bioluminescence reaction (data not shown).

Discussion
Because fresh specimens were not available, we extracted and purified the Odontosyllis luciferase directly from lyophilized worms. We then identified the luciferase gene using classic protein purification and luciferin purification combined with recent whole-cDNA sequencing techniques. Next, we reconstituted native Odontosyllis bioluminescence in vitro using purified protein and highly purified luciferin [22] with no additional cofactors. Finally, we verified the identity of the Odontosyllis luciferase gene by showing that lysate of cells expressing the recombinant protein is luminous when mixed with purified luciferin, with an emission spectrum (λmax near 510 nm) that matches Odontosyllis in vivo luminescence.
Using purified components is important when studying bioluminescence reactions, both to rule out off-target reactions as the source of luminescence and to avoid erroneous interpretation of the results [14]. The purified protein and luciferin were luminous together, and Mg2+ did not enhance luminescence of recombinant luciferase cell lysates, so the O. undecimdonta luciferase-luciferin reaction does not appear to require additional cofactors. It is also important to note that the recombinant protein is not secreted by mammalian cells and that the luminescence reaction occurs only when cells containing the recombinant luciferase are lysed. This suggests that the highly purified O. undecimdonta luciferin is not membrane-permeable, which limits potential applications of this luciferase in optogenetics and other cell-based expression technologies.
Although the bioluminescence emitted during mating is well characterized in Odontosyllis spp., the luciferase structure and the mechanism of the luciferin-luciferase reaction remain unclear. Nevertheless, orthology searches using BLAST and HMMER found no detectable homologs of the syllid luciferase among sequenced polychaetes or other organisms in public databases. The lack of evidence for a conserved protein in the transcriptomes of other luminous polychaetes leaves open the possibility that bioluminescence evolved at least three times in the annelids. This conservative estimate counts only two unique bioluminescent systems for which the structure of the luciferin, the luciferase, or both has been determined (earthworms [29] and Odontosyllis), plus at least one event for other annelids with uncharacterized bioluminescent systems.
Because the structures of other polychaete luciferins are still unknown, the broader question of polychaete bioluminescence remains open. Identifying the O. undecimdonta luciferase sequence is a key step toward further characterizing this worm's bioluminescent system and toward screening other purified polychaete luciferins for cross-reactivity.

Acknowledgements
We thank the late Professor Shoji Inoue and Dr. Hisae Kakoi (Meijo University, Japan) for providing lyophilized Odontosyllis specimens, and the Uozu Aquarium (Toyama, Japan) for help collecting Odontosyllis specimens.

Figure 2. Supporting evidence for transcript models aligned to the c9g1i2 transcript, including the 5' and 3' UTRs. The Peptide Matches track shows unique peptide hits to any of the four transcript models that match by DNA and amino acid sequence similarity. All transcript models except c9g1i5 share the same structure; c9g1i5 lacks 93 N-terminal amino acids. The ONT cDNA Reads track shows individual Oxford Nanopore 2D cDNA reads that align to the c9g1i2 transcript. Three reads span the complete 5' UTR, coding sequence, and 3' UTR of the long isoforms (c9g1i2, c9g1i3, c9g1i6), and four additional reads support the 5' UTR of the long isoforms. The RNA-seq coverage track supports the 5' and 3' UTRs of the long isoforms, despite the 3' bias expected from polyA-selection library preparation.

Supplementary Methods
Genomic DNA isolation and sequencing

Genomic DNA Isolation
Genomic DNA of one O. undecimdonta specimen, OdonB, was isolated using the Omega Biotek E.Z.N.A. Mollusc DNA kit (product number D3373). A 30 mg sample of RNAlater-preserved tissue yielded 24 µg of DNA (80 ng/µl in 300 µl). A 1 µl aliquot of OdonB DNA was run on a 1% agarose gel at 150 V for 45 minutes and migrated above the 10 kbp band of the ladder. We did not perform pulsed-field gel electrophoresis to resolve the size distribution of DNA larger than 10 kbp.
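The nucleic acid yields reported in these supplementary methods are concentration multiplied by volume. The sketch below reproduces the DNA yield here and the RNA yields reported further on, as a consistency check. The helper name is hypothetical.

```python
def mass_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total nucleic acid mass in ug from concentration (ng/ul) and volume (ul)."""
    return conc_ng_per_ul * volume_ul / 1000.0


print(mass_ug(80, 300))   # OdonB DNA: 24.0 ug
print(mass_ug(170, 45))   # OdonA RNA: about 7.65 ug
print(mass_ug(142, 100))  # OdonC RNA: about 14.2 ug
```

The same arithmetic gives the mass of ONT library loaded onto the flow cell (8.18 ng/µl × 12 µl, about 98.16 ng).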

Genome Library Prep
We prepared both a 10X Genomics chromium DNA library [1] , as well as a PCR-free whole genome shotgun library. For the 10X Genomics chromium library preparation, we sent a sample of the DNA to the UC Davis DNA Technologies Core where they performed the library prep and a single lane of 2x150 PE sequencing on an Illumina HiSeq 4000.
To prepare a PCR-free whole genome shotgun library, we sheared 1.5 µg of DNA in 50 µl of 1x TE with a Bioruptor sonication device on the HIGH setting, using 13 cycles of 30 seconds ON/30 seconds OFF, until the mode of the DNA size distribution was 350 bp. After every five shearing cycles, we removed the tube from the Bioruptor, vortexed it, and briefly spun down the contents. We used 1 µg of sheared DNA as input for the Illumina TruSeq DNA PCR-Free library prep kit. The final library concentration was 6.68 ng/µl, and the library carries the TruSeq i7 index ACAGTG(A). This library was pooled and sent to the UC Davis DNA Technologies Core for 2x150 PE sequencing on an Illumina HiSeq 4000, which produced 39,628,963 read pairs.
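Pooling and loading Illumina libraries is usually based on molarity rather than mass concentration. The conversion divides by the molar mass of the fragments, using the common approximation of 660 g/mol per double-stranded base pair. The mean library fragment length is a hypothetical parameter here. The text reports only the 350 bp mode of the sheared input, and adapter ligation makes the final fragments longer, so the 350 bp value gives an upper bound on molarity.

```python
BP_MASS_G_PER_MOL = 660.0  # common approximation for one dsDNA base pair


def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert dsDNA mass concentration (ng/ul) to molarity (nM)."""
    # 1 ng/ul = 1 mg/l; divide by molar mass (g/mol) and scale to nmol/l.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)


for length in (350, 450, 550):  # hypothetical mean fragment lengths
    print(length, round(library_nM(6.68, length), 1))
```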

Genome Assembly
To assemble the genome, we used the 10X Genomics Supernova assembler v1.1.2. We chose not to use the PCR-free shotgun data for genome assembly because of its low predicted coverage, approximately 16x assuming a conservative genome size estimate of 700 Mbp. All computation was performed on Haddock lab computational resources at the Monterey Bay Aquarium Research Institute. We assembled the genome using 500 GB of memory and 96 cores with the command "supernova run --id <runid> --fastqs <path to fastq>/ --localmem=500" [1]. The assembly took approximately three days to complete.
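The coverage estimate follows from the expected mean depth: total sequenced bases divided by genome size. The sketch below applies it to the PCR-free run (39,628,963 read pairs of 2x150 bp) and the assumed 700 Mbp genome. It gives raw coverage before adapter trimming, quality filtering, or duplicate removal, so it is slightly higher than the approximately 16x stated above. The helper name is hypothetical.

```python
def raw_coverage(read_pairs: int, read_len_bp: int, genome_bp: float) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return read_pairs * 2 * read_len_bp / genome_bp


cov = raw_coverage(39_628_963, 150, 700e6)
print(f"{cov:.1f}x")  # 17.0x
```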
RNA sequencing protocol and transcriptome assembly

RNA Isolation
Total RNA intended for an Illumina RNA-seq library was isolated with the Trizol protocol from an RNAlater-preserved specimen (OdonA) collected at the same location, date, and time as sample OdonB. The 40 mg OdonA Trizol extraction yielded 7.65 µg of total RNA (170 ng/µl in 45 µl, quantified by NanoDrop).
We also isolated total RNA from another individual, OdonC, for Oxford Nanopore full-length cDNA sequencing, following the manufacturer's recommended protocol. OdonC was collected at the same location, date, and time as OdonA and OdonB. Approximately 30 mg of tissue yielded 14.2 µg of total RNA (142 ng/µl in 100 µl of ddH2O, quantified by Qubit).

Illumina cDNA library prep and sequencing
A template-switching Illumina RNA-seq library was prepared from OdonB total RNA at Evrogen (Moscow, Russia) with a TruSeq Stranded mRNA Library Prep Kit v2 and the i7 index ACAGTG(A). The library was sequenced at the UC Davis DNA Technologies Core on an Illumina HiSeq 4000 2x150 PE run to a depth of 32,457,166 read pairs.

cDNA Synthesis for 2D ONT Sequencing
For cDNA sequencing on the Oxford Nanopore Technologies MinION, we first synthesized cDNA from sample OdonC. All primers used in the following protocol were adapted from [2]. To 50 ng of OdonC total RNA in 8 µl, we added 2 µl of a modified SmartSeq2 oligo-dT primer (5'-/5Me-isodC/AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3') synthesized by IDT and 1 µl of 10 mM dNTPs. We mixed by vortexing and spun down briefly. We incubated this mixture at 65ºC for five minutes and snap-cooled it on a freezer block in ice. To this reaction we added 4 µl of 5x RT buffer, 1 µl of 100 mM DTT, 1 µl of RNaseOUT, and 2 µl of 10 mM strand-switching oligo (5'-AAGCAGTGGTATCAACGCAGAGTACATrGrGrG-3'). This mixture was mixed by vortexing, spun down briefly, and incubated for 2 minutes at 42ºC on a thermal cycler. Then, 1 µl of SuperScript IV enzyme (200 U/µl) was added and mixed with five 1 µl pipette strokes. The reverse transcription reaction was carried out as follows: 10 minutes at 50ºC for RT, 10 minutes at 42ºC for strand switching, and 10 minutes at 80ºC for heat inactivation.
To amplify the cDNA, we set up three PCR reactions using the RT reaction above as input. Each contained 5 µl of the RT reaction, 1.5 µl of 10 mM ISPCR primer (5′-AAGCAGTGGTATCAACGCAGAGT-3′), 18.5 µl of nuclease-free water (NFW), and 25 µl of LongAmp Taq 2x Master Mix. Reaction contents were mixed by gentle inversion and then centrifuged to remove bubbles. The PCR conditions were: one cycle of 95ºC for 10 seconds; fifteen cycles of 95ºC for 15 seconds, 64ºC for 15 seconds, and 65ºC for 500 seconds; and one cycle of 65ºC for 10 minutes.
The resulting cDNA was visualized on an agarose gel and all three amplicons were pooled together.
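The component volumes listed for the reverse transcription and PCR steps can be checked by summing them and computing the final buffer strength. In the sketch below, the RT reaction totals 20 µl, so 4 µl of 5x RT buffer gives 1x. The PCR totals 50 µl, so 25 µl of 2x LongAmp master mix also gives 1x. The helper name `check_mix` is hypothetical.

```python
def check_mix(components_ul: dict, stock_x: float, buffer_key: str) -> tuple:
    """Return (total volume in ul, final buffer strength in x) for a reaction."""
    total = sum(components_ul.values())
    return total, stock_x * components_ul[buffer_key] / total


rt_mix = {
    "OdonC RNA": 8, "oligo-dT primer": 2, "dNTPs": 1,
    "5x RT buffer": 4, "DTT": 1, "RNaseOUT": 1, "strand-switching oligo": 2,
    "SuperScript IV": 1,
}
pcr_mix = {"RT reaction": 5, "ISPCR primer": 1.5, "NFW": 18.5, "LongAmp 2x master mix": 25}

print(check_mix(rt_mix, 5, "5x RT buffer"))            # (20, 1.0)
print(check_mix(pcr_mix, 2, "LongAmp 2x master mix"))  # (50.0, 1.0)
```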

Library Prep and 2D ONT sequencing
One microgram of the pooled OdonC cDNA was used as input for the remainder of the standard Oxford Nanopore Technologies SQK-LSK208 2D strand-switching cDNA sequencing protocol. After capture of the 2D adapter-ligated library and before sequencing, the final library concentration was 8.18 ng/µl, and 98.16 ng of library (12 µl) was loaded onto the flow cell. We used a FLO-MIN106 flow cell (flow cell ID FAF06207) and controlled the sequencing run with MinKNOW v1.3.30.

Transcriptome Assembly
Adapters were trimmed from the Illumina RNA-seq reads using SeqPrep2 [4] . We then de novo assembled a transcriptome using Trinity v2.1.1 with the option --SS_lib_type FR for read directionality and the --long_reads option using all 2D reads extracted from the Albacore-basecalled Oxford Nanopore reads [5] .
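The Trinity options named above can be combined into a single command line. The Python sketch below builds (but does not run) such a command. All file names, the memory limit, and the CPU count are hypothetical. `--seqType`, `--left`, `--right`, `--SS_lib_type`, `--long_reads`, `--max_memory`, `--CPU`, and `--output` are standard Trinity options. As we understand Trinity's help text, `--long_reads` takes a FASTA file, so the ONT 2D reads are assumed here to have been converted to FASTA.

```python
import shlex


def trinity_command(left_fq, right_fq, long_reads_fa,
                    max_memory="100G", cpu=16, output_dir="trinity_out"):
    """Build (but do not run) a Trinity command line as a list of arguments."""
    return [
        "Trinity",
        "--seqType", "fq",              # paired Illumina reads in FASTQ format
        "--left", left_fq,
        "--right", right_fq,
        "--SS_lib_type", "FR",          # read orientation stated in the text
        "--long_reads", long_reads_fa,  # ONT 2D reads, assumed converted to FASTA
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output_dir,         # Trinity expects 'trinity' in this name
    ]


cmd = trinity_command("reads_R1.trimmed.fq", "reads_R2.trimmed.fq", "ont_2d_reads.fasta")
print(shlex.join(cmd))
```

Building the argument list separately from execution makes it easy to log or inspect the exact command, for example before passing it to `subprocess.run`.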

Read Mapping
Illumina RNA-seq reads were mapped to the transcripts with bwa mem [6]. Oxford Nanopore Technologies cDNA reads were mapped to the transcripts with the splice-aware aligner minimap2 [7]. Peptide matches were extracted from their source transcripts, and these short sequences were then mapped to the reference transcript with bwa aln [8]; the resulting alignments are displayed in Figure 2 of the main text. Because this procedure tolerates some amino acid mismatches when matching mass spectrometry hits to a DNA sequence, it can recover signal even when population-level amino acid diversity is high.
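The benefit of mismatch tolerance can be illustrated with a brute-force ungapped scan that counts substitutions in each window. This is not how bwa aln works internally, because bwa aln uses a Burrows-Wheeler index with backtracking. The scan only shows the idea of accepting hits with up to k mismatches, and it applies equally to nucleotide or amino acid strings. The peptide and protein below are hypothetical. Their single difference is I versus L, two isobaric residues that mass spectrometry cannot distinguish by mass alone.

```python
def approximate_matches(pattern: str, text: str, max_mismatches: int) -> list:
    """Return (start, mismatches) for every ungapped placement of pattern in text
    with at most max_mismatches substitutions (a brute-force Hamming scan)."""
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical peptide and protein; one I/L difference at the peptide's 5th residue.
protein = "MKVPEPTLDEKR"
print(approximate_matches("PEPTIDE", protein, 0))  # []
print(approximate_matches("PEPTIDE", protein, 1))  # [(3, 1)]
```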

Mammalian cell culture
HEK293NT cells were maintained in DMEM supplemented with 10% FBS and 1× penicillin/streptomycin ("fullDMEM") for all growth and passaging steps unless otherwise noted. For continuous culture, cells were grown to approximately 70-80% confluency and then split 1:12, so that they were ready to be split again 3 days later. To split cells from a 25 cm2 flask, the culture medium was gently removed and 1 ml of PBS without Mg/Ca was added to cover the cells. The PBS was then removed, and 1 ml of 0.025% trypsin in 6 mM EDTA was added to the side of the flask, not directly onto the cells. The solution was spread over the cells by gently rocking the flask several times, and the flask was incubated at 37°C for 1-2 min. The flask was then rocked to completely dislodge the cells. After gentle pipetting, 80 µl of the cell suspension was transferred to a new flask containing 5 ml of fullDMEM. To prepare cells for transfection, 40 µl of the cell suspension described above was transferred to a 2 cm cell culture dish containing 2 ml of fullDMEM. After 24 hours of incubation at 37°C with 5% CO2, cells were transfected with FuGene 6 reagent (Promega, Fitchburg, WI, USA) according to the manufacturer's protocol.
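The seeding volume relates directly to the split ratio. The sketch below assumes that the 1 ml of trypsin/EDTA is the entire cell suspension, with no medium added before transfer, which the text does not state explicitly. Under that assumption, a strict 1:12 split would use about 83 µl, and the 80 µl actually transferred corresponds to a split of about 1:12.5. The helper names are hypothetical.

```python
def transfer_volume_ul(suspension_ul: float, split_ratio: float) -> float:
    """Volume of cell suspension to seed for a given split ratio."""
    return suspension_ul / split_ratio


def implied_split(suspension_ul: float, transferred_ul: float) -> float:
    """Split ratio implied by seeding transferred_ul out of suspension_ul."""
    return suspension_ul / transferred_ul


print(round(transfer_volume_ul(1000, 12), 1))  # 83.3
print(implied_split(1000, 80))                 # 12.5
```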

Molecular cloning
All cloning was performed by Golden Gate assembly. The synthesized genes were cloned into the MoClo Level 0-SP vector (MoClo kit plasmid pICH41258). Level 1 eukaryotic expression plasmids were assembled in the MoClo kit plasmid pICH47742 as a backbone from the following parts cloned in Level 0 vectors: CMV promoter, luciferase candidate gene, a stop-codon-containing DNA part, and the SV40 polyA terminator. Prokaryotic expression plasmids were assembled with the pCOOFY plasmid (T7 promoter) as a backbone and the luciferase candidate gene as a single insert. A vector containing GFP was used as a positive control for cloning and expression.
To search for luciferase homologs, we assembled polychaete transcriptomes from publicly available data. To do this, we downloaded the forward and reverse read fastq.gz files from the European Nucleotide Archive. The confirmed luminous species included in this analysis were Chaetopterus variopedatus [14], Harmothoe extenuata [15], Harmothoe imbricata [16], and Tomopteris helgolandica [17]. All other species mentioned above may be luminous, with the exception of: Eunice spp., Pareurythoe