Discovery and Characterization of Novel Lignocellulose-Degrading Enzymes from the Porcupine Microbiome

Plant cell walls are composed of cellulose, hemicellulose, and lignin, collectively known as lignocellulose. Microorganisms degrade these components to liberate sugars that meet their metabolic demands. Using a metagenomic sequencing approach, we previously demonstrated that the microbiome of the North American porcupine (Erethizon dorsatum) is replete with novel lignocellulose-degrading enzymes. Here, we report the identification, synthesis, and partial characterization of four genes from the porcupine microbiome encoding putative novel lignocellulose-degrading enzymes: a β-xylanase, an endoxylanase, a β-glucosidase, and an α-L-arabinofuranosidase. These genes were identified via conserved catalytic domains associated with cellulose and hemicellulose degradation. We cloned the putative β-xylanase into the pET26b(+) plasmid, enabling inducible gene expression and periplasmic localization in Escherichia coli (E. coli). We demonstrated IPTG-inducible accumulation of β-xylanase protein but failed to detect xylobiose-degrading activity in a reporter assay. Alternative assays may be required to measure the activity of this putative β-xylanase. In this report, we describe how a synthetic metagenomic pipeline can be used to identify novel microbial lignocellulose-degrading enzymes, and we take initial steps toward introducing a hemicellulose-degradation pathway into E. coli to enable biofuel production from wood pulp feedstock.


Introduction
The gut microbiome comprises thousands of bacterial species encoding millions of genes with the potential to affect host physiology [1]. These microbes provide a genetic repository of digestive enzymes that could be repurposed for bioengineering applications, including biofuel production. Lignocellulosic biofuel can be produced via synthetic microbial pipelines that degrade complex lignocellulose polymers into simple fermentable sugars [2,3]. Lignocellulosic biomass consists of cellulose, hemicellulose, and lignin in a 4:3:3 ratio [4]. Lignin has a complex chemical structure that impedes chemical and/or enzymatic hydrolysis of lignocellulose. Wi et al. recently demonstrated a hydrogen peroxide pretreatment that removes lignin and thereby improves downstream biocatalytic hydrolysis of lignocellulose [7]. Like cellulose, hemicellulose can be degraded to release monosaccharides, such as xylose, that can be used for ethanol production [8]. Currently, hemicellulose sugars are not widely used for ethanol production, which lowers the biofuel yield obtained per unit of lignocellulosic biomass.
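As a rough illustration of the efficiency point above, the 4:3:3 ratio can be turned into fractions. This sketch assumes the ratio is by mass and that only cellulose and hemicellulose yield fermentable sugars; under those assumptions hemicellulose makes up 3/10 of the biomass and 3/7 (about 43%) of its polysaccharide, which is the share left unused when only cellulose is fermented.

```python
from fractions import Fraction

# Assumption: the 4:3:3 ratio [4] is by mass, and lignin yields no fermentable sugar.
RATIO = {"cellulose": 4, "hemicellulose": 3, "lignin": 3}


def mass_fraction(component: str) -> Fraction:
    """Fraction of total lignocellulosic biomass contributed by one component."""
    return Fraction(RATIO[component], sum(RATIO.values()))


def polysaccharide_share(component: str) -> Fraction:
    """Fraction of the carbohydrate pool (cellulose + hemicellulose)."""
    carbohydrate = RATIO["cellulose"] + RATIO["hemicellulose"]
    return Fraction(RATIO[component], carbohydrate)


hemi = polysaccharide_share("hemicellulose")
print(f"Hemicellulose: {mass_fraction('hemicellulose')} of biomass, "
      f"{hemi} (~{float(hemi):.0%}) of the polysaccharide pool")
# prints: Hemicellulose: 3/10 of biomass, 3/7 (~43%) of the polysaccharide pool
```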
As a hindgut fermenter, the North American porcupine, Erethizon dorsatum, has an enlarged cecum packed with microbes that aid digestion of its lignified plant diet, which includes the cambium (inner bark) of coniferous (preferred) and deciduous trees, as well as flowers [5]. Using metagenomic sequencing, the 2016 Dalhousie iGEM team discovered that the porcupine microbiome is replete with microbial enzymes with putative lignocellulose-degrading properties, and that host diet influences gut microbial diversity and metabolic function [6]. Their study compared shotgun metagenomic and 16S rRNA sequencing of carnivore microbiomes with those of herbivores such as the porcupine and beaver [6]. This comparison revealed that herbivore microbiomes contain elevated levels of cellulolytic genes relative to carnivore microbiomes [6]. This work inspired the 2017 team to continue working with the porcupine microbiome, with the long-term objective of harnessing its enzymatic potential to turn lignocellulosic biomass into biofuel. To do this, we created our own synthetic metagenomic pipeline to identify candidate cellulose- and hemicellulose-degrading enzymes from our existing datasets. We selected four novel candidate enzymes for synthesis and further study. One of these previously uncharacterized enzymes, with 75% homology to a Butyrivibrio sp. β-xylanase (NCBI protein database: CDC35707.1), was cloned and transformed into Escherichia coli (E. coli) BL21(DE3) to enable IPTG-inducible protein expression and investigation of enzymatic activity.

Methods

Identification of Open Reading Frames
Metagenomic analysis of Illumina MiSeq data was conducted using our previously published protocols [6]. FastQC and Bowtie2 were used to inspect reads for overall quality and for sequencing contaminants. Reads were trimmed to 400 bp to remove low-quality terminal sequences from further analysis. The MEGAHIT assembler processed reads in FASTQ format and assembled them into longer contigs from overlapping sequences [9]. Prodigal was used to identify open reading frames (ORFs) by searching sequences in all six reading frames (three on each DNA strand) [10]. Prodigal's '-c' option (closed ends) was used to ensure that the program only reported ORFs with both start and stop codons present. Prodigal also searched for non-canonical start codons and ribosome binding sites to identify all ORFs present in the sequencing data. Non-canonical start codons are relevant for gene searches because, although they initiate <10% of prokaryotic protein translation, their products are often overlooked in conventional searches [11,12].
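The ORF-calling logic described above can be illustrated with a deliberately simplified six-frame scanner. This is a sketch, not Prodigal's algorithm: Prodigal scores candidate genes with trained models of coding statistics and ribosome binding sites, whereas the code below only mimics three settings named in the text (six-frame search, closed ORFs as with '-c', and acceptance of the non-canonical start codons GTG and TTG). The `find_orfs` function and its 30-codon minimum length are illustrative assumptions.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # ATG canonical; GTG and TTG non-canonical
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, frame, start, end, start_codon) tuples for closed ORFs.

    Coordinates are 0-based and end-exclusive on the strand that was scanned.
    An ORF that reaches the end of the sequence without a stop codon is
    discarded, loosely mirroring Prodigal's '-c' (closed ends) behaviour.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # keep the most upstream start in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:start + 3]))
                    start = None
    return orfs
```

For example, `find_orfs("ATGAAATAG", min_codons=1)` returns a single plus-strand ORF starting with ATG, while a sequence whose open frame runs off the end without a stop codon returns no ORF.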

In Silico Protein Function Predictions
The HMMER tool phmmer was used to predict the putative functions of protein domains [13,14]. Protein domains and their possible functions were identified using the Research Collaboratory for Structural Bioinformatics Protein Data Bank [15]. E-values were calculated to compare domains identified in candidate proteins with known domains in the database [15]. Candidate proteins with the lowest E-values were queried against the Basic Local Alignment Search Tool (BLAST) database using phmmer to identify proteins with major protein domain conservation. Selected candidate genes were codon-optimized for E. coli and synthesized by Integrated DNA Technologies (IDT, Coralville, IA, USA) as gBlock gene fragments.
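Ranking candidates by E-value, as described above, can be sketched as follows. The sketch assumes phmmer was run with its `--tblout` option, whose per-target table lists the target name, target accession, query name, query accession, and full-sequence E-value as the first five whitespace-separated fields, with comment lines starting with '#'. The `parse_tblout` and `best_hits` helpers and the 1e-10 cutoff are hypothetical, not the thresholds used in this study.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query ORF, database target, full-sequence E-value)


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Read (query, target, E-value) from phmmer --tblout lines (assumed format)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        hits.append((query, target, evalue))
    return hits


def best_hits(hits: List[Hit], max_evalue: float = 1e-10, top_n: int = 4) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, most significant (lowest) first."""
    kept = [h for h in hits if h[2] <= max_evalue]
    return sorted(kept, key=lambda h: h[2])[:top_n]
```

Sorting ascending by E-value puts the least likely chance matches first, which matches the "lowest E-value" selection described in the text.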

Gene Cloning
Candidate genes were PCR-amplified from IDT gBlock gene fragments with Phusion High-Fidelity DNA Polymerase according to the manufacturer's instructions (New England Biolabs (NEB), Ipswich, MA, USA), using the primers listed in Table 1. PCR products were purified using the QIAquick gel extraction kit (Qiagen Inc., Toronto, ON, Canada). The pET26b(+) expression plasmid was used as the backbone because its pelB leader sequence translocates fusion proteins to the periplasm, from which they can be secreted into the extracellular space [16]. Thus, by fusing our putative enzymes to pelB, we aimed to increase the likelihood that they would be secreted into the extracellular space, where they could access their substrates. This plasmid was generously donated to us by Dr. Zhenyu Cheng. Candidate genes and pET26b(+) were digested with the restriction endonucleases (NEB) indicated in Table 1 for 1 hour at 37 ℃. Digested DNA was separated by electrophoresis on a 0.8% agarose gel and purified using the QIAquick gel extraction kit according to the manufacturer's instructions (Qiagen Inc.), and the digested inserts were then ligated into the digested pET26b(+) plasmid DNA using T4 DNA ligase (NEB). Ligation products were transformed into chemically competent Stbl3 E. coli via a standard heat-shock transformation method [17]. Specifically, 5 µL of ligation product was added to 50 µL of E. coli suspension in Luria-Bertani (LB) broth; following heat shock, 250 µL of LB broth was added for a 1-hour recovery, and the mixture was then plated on LB agar + 25 µg/mL kanamycin. Plates were incubated at 37 ℃ for 18-24 hours to allow transformants to grow. Colonies were picked and inoculated into 5 mL of LB broth, grown overnight to saturation, and plasmid DNA was extracted with the QIAprep Spin Miniprep Kit (Qiagen Inc.). Plasmids were screened by restriction digestion and submitted for Sanger sequencing (Genewiz, South Plainfield, NJ, USA).
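Screening plasmids by restriction digestion relies on predicting the fragment sizes expected from a circular construct. A minimal sketch of that calculation follows; the function name, plasmid length, and cut positions are hypothetical placeholders, since the actual enzymes are those listed in Table 1.

```python
def circular_fragments(plasmid_length: int, cut_positions) -> list:
    """Return fragment lengths (bp), largest first, for cuts in a circular plasmid.

    cut_positions are 0-based coordinates of the cut sites on the plasmid map.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # uncut: a single (still circular) molecule
    # Fragments between consecutive cut sites.
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    # The final fragment wraps around the end of the map back to the first cut,
    # so a single cut simply linearizes the plasmid.
    frags.append(plasmid_length - cuts[-1] + cuts[0])
    return sorted(frags, reverse=True)


# Hypothetical 6,000 bp construct cut at two sites flanking an insert.
print(circular_fragments(6000, [100, 1500]))
```

With these placeholder values the digest gives 4,600 and 1,400 bp fragments; when the two sites flank the insert, a correct clone should show a band of roughly insert size in addition to the backbone.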

Table 1. Oligonucleotide primers for PCR amplification of candidate genes

Inducible Protein Expression
The pET26b(+)-β-xylanase plasmid was transformed into BL21(DE3) E. coli to enable inducible protein expression. Selected colonies were inoculated into 5 mL of LB broth and incubated overnight at 37°C in a shaker (220 RPM). The overnight culture was diluted 1:100 in 5 mL of LB broth and incubated with shaking until the OD600 reached 0.5-0.8. Once the culture was in log phase, isopropyl β-D-1-thiogalactopyranoside (IPTG; Thermo Fisher Scientific, Waltham, MA, USA) was added to a final concentration of 0.1 mM to induce protein expression. After 4 hours of shaking incubation at 37°C, protein lysates were harvested in 2× Laemmli sample buffer. Empty vector pET26b(+) (EV) was used as a negative control. Proteins were separated by SDS-PAGE, fixed in methanol and acetic acid solution, and stained with Coomassie Brilliant Blue (Thermo Fisher Scientific).
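The IPTG addition above follows standard C1V1 = C2V2 dilution arithmetic. The sketch below assumes a 1 M IPTG stock, which is not stated in the protocol, and ignores the small volume added with the stock; `stock_volume_ul` is an illustrative helper.

```python
def stock_volume_ul(final_mM: float, culture_mL: float, stock_mM: float) -> float:
    """Volume of stock (µL) that brings culture_mL to final_mM (C1*V1 = C2*V2).

    Assumes the added stock volume is negligible relative to the culture volume.
    """
    return final_mM * culture_mL / stock_mM * 1000.0  # convert mL to µL


# Assumed 1 M (1000 mM) IPTG stock; 0.1 mM final in a 5 mL culture.
print(f"{stock_volume_ul(0.1, 5.0, 1000.0):.2f} µL of 1 M IPTG")
```

Under this assumption only 0.5 µL of stock is needed, which is hard to pipette accurately; a 100 mM working stock would instead need 5 µL.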

Xylobiose Degradation Assay
The xylobiose degradation assay was modified from a previously described method [18]. BL21(DE3) E. coli transformed with the pET26b(+)-β-xylanase vector were grown in LB broth supplemented with 25 µg/mL kanamycin to an OD600 of 0.6-0.8, and β-xylanase expression was induced with the allolactose mimetic IPTG. After 4 hours, 50 µL of culture was added to an opaque-walled 96-well plate (Thermo Fisher Scientific), followed by 50 µL of 200 µM 4-methylumbelliferyl β-glycoside of xylobiose (CMU-X) in lysis buffer (1% Triton X-100, 50 mM potassium acetate, pH 7). The plate was incubated at 37°C for 18 hours with shaking at 220 RPM. After incubation, fluorescence was measured on a Tecan Infinite M200 PRO microplate reader (excitation 365 nm, emission 450 nm). Each sample was normalized to LB broth alone. The unconjugated fluorophore, 4-methylumbelliferone, was used as a positive control, and uninduced recombinant E. coli and pET26b(+) empty vector were used as negative controls.
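Because the culture and the substrate/lysis buffer are mixed in equal 50 µL volumes, every reagent is diluted twofold in the well. The short calculation below makes the final concentrations explicit, assuming the culture contributes none of these components; the `in_well` helper is illustrative.

```python
def in_well(conc: float, reagent_ul: float, other_ul: float) -> float:
    """Final concentration after mixing reagent_ul of a solution with other_ul."""
    return conc * reagent_ul / (reagent_ul + other_ul)


# Stock concentrations in the substrate/lysis buffer, mixed 50 µL + 50 µL culture.
reagents = {"CMU-X (µM)": 200.0, "Triton X-100 (%)": 1.0, "potassium acetate (mM)": 50.0}
for name, conc in reagents.items():
    print(f"{name}: {in_well(conc, 50.0, 50.0):g}")
```

The reaction therefore contains 100 µM CMU-X, 0.5% Triton X-100, and 25 mM potassium acetate.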

Identification and Cloning of Putative Microbial Enzymes via a Synthetic Metagenomic Pipeline
Using our metagenomic sequencing pipeline to investigate porcupine fecal samples (Figure 1A), we identified four microbial genes encoding putative cellulose- and/or hemicellulose-degrading enzymes (Table 2). These genes were identified by similarity of their predicted primary amino acid sequences to conserved domains of known enzymes in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (Figure 1B). Genes encoding the predicted cellulose-/hemicellulose-degrading enzymes were synthesized and cloned into the pET26b(+) vector, in which the T7 promoter upstream of the multiple cloning site enables IPTG-inducible gene expression. Successful cloning of the putative β-xylanase was confirmed by Sanger sequencing.

Figure 1. (A) Metagenomic sequencing pipeline [6]. Reads were checked for quality and trimmed, assembled into contigs via MEGAHIT, and open reading frames were identified using Prodigal. Protein sequences of interest were identified by phmmer searches of various protein databases, and matches of interest were selected based on E-value. (B) Top candidate microbial enzymes identified by the metagenomic sequencing pipeline; putative signal sequences are shown in orange and predicted conserved protein domains are shown.

Table 2. Four candidate cellulose-/hemicellulose-degrading enzymes identified from the porcupine microbiome. The E-value is the number of matches of equal or better score expected by chance; lower values denote higher confidence.

Inducible expression of a putative β-xylanase
To test inducible expression of the putative β-xylanase, IPTG was added to log-phase E. coli cultures transformed with the pET26b(+)-β-xylanase plasmid or the pET26b(+) vector control. After 4 hours, lysates were harvested and processed for SDS-PAGE, and proteins were visualized by Coomassie Brilliant Blue staining. IPTG treatment caused accumulation of a distinct 51 kDa protein species in lysates from pET26b(+)-β-xylanase-transformed E. coli, consistent with the predicted molecular weight of the putative β-xylanase (Figure 2). This protein species was not observed in negative control lysates (pET26b(+) empty vector or uninduced cultures). Results were reproduced in biological replicates, shown to the left and right of the central molecular weight marker lane (M).
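The predicted molecular weight used to interpret the 51 kDa band can be estimated from an amino acid sequence by summing average residue masses and adding one water molecule. The sketch below is a minimal version of that calculation that ignores post-translational modifications; the example uses the pelB leader as commonly annotated for pET vectors, not the β-xylanase sequence, which is not reproduced here. Because the pelB leader is normally cleaved on export to the periplasm, the processed protein would be expected to be about this much lighter than the full-length fusion.

```python
# Average residue masses (Da) of the 20 standard amino acids (amino acid minus H2O).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water for the free N- and C-termini


def protein_mw(seq: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified polypeptide."""
    seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


PELB_LEADER = "MKYLLPTAAAGLLLLAAQPAMA"  # pelB signal peptide as commonly annotated
print(f"pelB leader: {protein_mw(PELB_LEADER) / 1000:.2f} kDa")
# prints: pelB leader: 2.23 kDa
```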

β-xylanase Enzyme Assay
β-xylanase activity was tested using substrates and a protocol developed by Hallam and Withers [18]. The assay employs CMU-X, in which a fluorophore is conjugated to xylobiose via a β-glycosidic bond; cleavage of this bond by β-xylanase releases the free fluorophore, which emits fluorescence at 450 nm. Expression of the putative β-xylanase was induced in BL21(DE3) E. coli as described above, and cells were incubated with CMU-X in lysis buffer for 18 h before fluorescence emission was measured. Baseline fluorescence from control cultures was ~100 relative fluorescence units, consistent with previous observations, whereas fluorescence from the unconjugated fluorophore control was almost 100-fold higher (Figure 3). Expression of the putative β-xylanase did not increase hydrolysis of the CMU-X substrate relative to controls.
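Comparisons like the one above are usually made on blank-corrected signals. The sketch below treats "normalized to LB broth alone" as subtraction of the LB-only reading and then expresses a sample relative to the empty-vector control; both the interpretation of normalization and the helper names are assumptions, and the readings in the example are hypothetical rather than the data in Figure 3.

```python
from statistics import mean


def blank_corrected(readings, blank):
    """Subtract the mean LB-only (blank) reading from each sample reading."""
    b = mean(blank)
    return [r - b for r in readings]


def fold_over_control(sample, control, blank):
    """Mean blank-corrected sample signal divided by that of a negative control."""
    s = mean(blank_corrected(sample, blank))
    c = mean(blank_corrected(control, blank))
    if c <= 0:
        raise ValueError("control signal must exceed the blank")
    return s / c


# Hypothetical fluorescence readings (arbitrary units), not data from Figure 3.
print(fold_over_control(sample=[250, 250], control=[150, 150], blank=[50, 50]))
```

A ratio near 1, as observed for the putative β-xylanase here, indicates no activity detectable above the negative control.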

Interpretation
Using a synthetic metagenomic pipeline, we discovered, synthesized, and initiated characterization of four novel putative enzymes from the porcupine microbiome (Figure 1; Table 2). These candidate enzymes were selected based on predicted conservation of protein domains typically associated with cellulose and/or hemicellulose degradation. Of these four candidates, we focused on an uncharacterized β-xylanase with 75% homology to a Butyrivibrio sp. β-xylanase (NCBI protein database: CDC35707.1) (Figure 4). Our synthetic metagenomic pipeline enabled rapid identification of novel putative proteins with desired properties, and it could be used by others for further mining of metagenomic sequencing datasets; for example, future iGEM teams may use it to discover novel enzymes with potentially useful properties for synthetic biology applications. Our team was the first to search the porcupine microbiome for enzymes that can be used in synthetic biology applications; our specific focus was on finding enzymes that could be harnessed to convert wood pulp waste into useful sugars for fermentation and biofuel production [6]. Over millions of years of evolution, certain animals, including termites and ruminants, and microbes, including fungi, have evolved multi-component enzyme systems to break down lignocellulosic biomass. Fungal genera including Aspergillus, Trichoderma, and Acrophialophora break down wood biomass using cellulases, ligninases, and auxiliary cellulase enzymes [19,20]. By contrast, termites and ruminants rely on their microbiomes to break down lignocellulosic biomass and liberate sugars [21,22]. For example, Fibryanti et al. cultured four bacterial species from the gut of builder termites, two of which were identified as Bacillus megaterium and Paracoccus yeei [22]. Inspired by these examples, we chose the humble porcupine as our source of microbial cellulolytic enzymes.
Like termites, fungi, and ruminants, porcupines consume cellulose-rich food sources; for porcupines, these include young softwood buds and branches [23]. We hypothesized that the porcupine microbiome is a heretofore untapped source of lignocellulose-degrading enzymes that could be exploited for industrial processes. These enzymes are likely produced by different microbes in the porcupine gut that collectively break down complex lignocellulose chains, and they would therefore be expected to work optimally at the relatively constant temperature and pH of the hindgut. We believe that careful study of the porcupine hindgut environment will be required to guide the design of bioreactors that use enzymes from the porcupine microbiome.
Our metagenomic sequencing pipeline identified a plethora of cellulose-, hemicellulose-, and auxiliary hemicellulose-degrading enzymes, but we focused on four high-confidence genes for synthesis and further characterization. Significantly more work will be required to fully characterize these putative enzymes and to begin incorporating them into pilot experiments in small-scale E. coli bioreactors. Beyond these four candidates, many more enzymes will be required to recreate the complete cellulose- and hemicellulose-degradation pathways in E. coli. Breakdown of cellulose into glucose monomers requires at least three enzymes: endoglucanase, exoglucanase, and β-glucosidase. An endoglucanase was cloned directly from Ruminiclostridium thermocellum cultured from porcupine fecal matter by the 2016 Dalhousie University iGEM Team [6]. Using our synthetic metagenomic pipeline, we identified and synthesized a putative β-glucosidase; more work will be required to identify an exoglucanase in our existing datasets. Hemicellulose is more complex than cellulose, necessitating a set of enzymes to hydrolyze the main polymer backbone and auxiliary enzymes to cleave side chains [24]. Known hemicellulose-degrading enzymes include endoxylanase and β-xylanase, which target the main polymer backbone, and α-glucuronidase and α-arabinofuranosidase, which cleave side chains [24]. In this study, we identified endoxylanase, β-xylanase, and α-arabinofuranosidase candidates, whereas an α-glucuronidase remains elusive. Once we have identified, synthesized, and validated a full set of cellulose- and hemicellulose-degrading enzymes, we plan to subclone these genes into operons to construct a complete cellulose- and hemicellulose-degrading pathway in E. coli, in which each enzyme will be shuttled to the periplasmic space for subsequent secretion. By using an E. coli strain with a mutant gluc1 importer, we aim to prevent import and utilization of the glucose and xylose end products in the bioreactor. Instead, these products will be retained in the extracellular space, allowing transfer to a second bioreactor where Saccharomyces cerevisiae can ferment the sugars into bioethanol. The whole system will be assessed for efficiency and yield for comparison against current bioreactor systems.
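The gap analysis in this paragraph can be expressed as a set difference between the enzymes each pathway requires and those already in hand. The sketch uses only the enzyme classes named above (the hemicellulose set is the minimal set named in the text, not an exhaustive one), and the data layout and `missing_enzymes` helper are illustrative.

```python
# Minimal enzyme sets named in the text for each pathway.
REQUIRED = {
    "cellulose": {"endoglucanase", "exoglucanase", "β-glucosidase"},
    "hemicellulose": {"endoxylanase", "β-xylanase",
                      "α-arabinofuranosidase", "α-glucuronidase"},
}

# Enzymes already cloned or identified, per the text.
AVAILABLE = {
    "endoglucanase",          # cloned by the 2016 Dalhousie iGEM team [6]
    "β-glucosidase",          # identified and synthesized in this study
    "endoxylanase", "β-xylanase", "α-arabinofuranosidase",  # identified here
}


def missing_enzymes(required=REQUIRED, available=AVAILABLE):
    """Map each pathway to the sorted list of enzymes still to be found."""
    return {pathway: sorted(enzymes - available)
            for pathway, enzymes in required.items()}


print(missing_enzymes())
```

Under these inputs the remaining gaps are an exoglucanase for cellulose and an α-glucuronidase for hemicellulose, matching the text.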

Functional Metagenomics as an Alternative Discovery Method
Our synthetic metagenomic pipeline is a powerful discovery tool, but because it infers function from homology to previously characterized sequences in a database, it will likely fail to identify highly divergent proteins with desirable properties. It may also identify candidate proteins that appear to have conserved functional domains but lack the function predicted by homology. By contrast, functional metagenomic library screens rely on functional assays for gene discovery. Functional metagenomics therefore complements sequence-based approaches, with greater potential for discovering truly novel genes that do not resemble those in existing databases. Recently, using functional metagenomics, Cheng et al. discovered three novel β-galactosidase enzymes, two of which had conserved domains and one of which belonged to a previously undiscovered enzyme family [25]. Although this approach is less high-throughput, we believe that creating a functional metagenomic library from porcupine microbiome DNA would be a viable way to continue searching for enzymes in the cellulose and hemicellulose degradation pathways.

Alternative Enzyme Activity Assays
Going forward, several other assays can be performed to measure the activity of the putative enzymes described in this study. β-xylanase and β-glucosidase activity can be assessed by the Somogyi-Nelson method, which measures the reducing power of sugars in solution through the reduction of cupric ions by reducing sugars such as glucose and xylose [26]. β-xylanase and β-glucosidase cleave the β-1,4 glycosidic bond of xylobiose and cellobiose, respectively, releasing monomeric xylose or glucose. Because each disaccharide carries one reducing end while its hydrolysis yields two reducing monosaccharides, enzyme activity increases the reducing-sugar signal detected by the Somogyi-Nelson method. Endoglucanase activity, in contrast, can be measured with the agar-based Congo Red staining assay, in which Congo Red binds intact carboxymethylcellulose (CMC) in the medium; endoglucanase hydrolysis of CMC leaves an unstained clear zone around active colonies, providing a qualitative measure of activity [27]. Alternatively, high-performance liquid chromatography (HPLC) with a column designed for carbohydrate analysis could be used to measure cellulose and hemicellulose degradation products [28].
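Quantitative reducing-sugar assays such as the Somogyi-Nelson method are read against a standard curve of known sugar concentrations. A minimal least-squares sketch follows; the standards, absorbance values, units, and helper names are hypothetical, and a linear response over the range of the standards is assumed.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a standard curve."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def concentration(absorbance, slope, intercept):
    """Invert the standard curve: sugar concentration for a measured absorbance."""
    return (absorbance - intercept) / slope


# Hypothetical xylose standards (µg/mL) and their absorbances.
standards = [0, 20, 40, 60, 80]
readings = [0.05, 0.25, 0.45, 0.65, 0.85]
slope, intercept = fit_line(standards, readings)
print(f"{concentration(0.55, slope, intercept):.1f} µg/mL")
```

Because the disaccharide substrate is itself a reducing sugar, the quantity that reflects enzyme activity is the increase over a substrate-only control rather than the raw reading.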
In the pilot experiments described in this report, all bacterial growth was carried out at 37℃, at neutral pH, and under aerobic conditions. While these conditions are optimal for growth of most laboratory E. coli strains, they do not match the anaerobic and acidic environment of the porcupine hindgut [29,30]. Indeed, our putative β-xylanase may have been inhibited by the aerobic assay conditions; reactive oxygen species can alter enzyme folding and function, as described for the oxygen-sensitive nitrogenase enzymes of Azotobacter and Cyanothece [31,32]. pH has also been shown to affect protein folding, stability, and function [33,34]. Future studies will systematically test different environmental conditions, particularly those that match the environment of the porcupine hindgut.