A curated list of genes that control elemental accumulation in plants

Understanding the mechanisms underlying plants’ adaptation to their environment will require knowledge of the genes and alleles underlying elemental composition. Modern genetics can quickly and cheaply indicate which regions of DNA are associated with particular phenotypes, but most genes remain poorly annotated, hindering the identification of candidate genes. To help identify candidate genes underlying elemental accumulation, we have created the known ionome gene (KIG) list: a curated collection of genes experimentally shown to change the uptake, accumulation, and distribution of elements. We have also created an automated computational pipeline to generate lists of KIG orthologs in other plant species using the PhytoMine database. The current version of KIG consists of 176 known genes covering 5 species and 23 elements, and their 1588 orthologs in 10 species. Analysis of the known genes demonstrated that most were identified in the model plant Arabidopsis thaliana, and that transporter-coding genes and genes altering the accumulation of iron and zinc are overrepresented in the current list.


Abstract-
Knowledge of the genes and alleles underlying elemental composition will be required to understand how plants interact with their environment. Modern genetics can quickly and cheaply indicate which regions of DNA are associated with the phenotype in question, but most genes remain poorly annotated, hindering the identification of candidate genes. To help identify candidate genes underlying elemental accumulation, we have created the known ionome gene (KIG) list: a curated collection of genes experimentally shown to change elemental uptake. We have also created an automated computational pipeline to generate lists of KIG orthologs in other plant species using the PhytoMine database. The current version of KIG consists of 96 known genes covering 4 species and 23 elements, and their 596 orthologs in 8 species. Most of the genes were identified in the model plant Arabidopsis thaliana, and transporter-coding genes, as well as genes that affect the accumulation of iron and zinc, are overrepresented in the current list.
Intro-
Understanding the complex relationships that determine plant adaptation will require detailed knowledge of the action of individual genes and the environment. One of the fundamental processes that plants must accomplish is to manage the uptake, distribution and storage of elements from the environment. Many different physiological, chemical, biochemical and cell biology processes are involved in moving elements, implicating thousands of genes in every plant species. Modern genetic techniques have made it easy and inexpensive to identify hundreds to thousands of loci for traits such as the elemental composition (or ionome) of plant tissues. However, moving from loci to genes is still difficult, as the number of possible candidates is still extremely large and the ability of researchers to identify a candidate gene by looking at annotations is limited by our current knowledge and inherent biases about what is worth studying (Stoeger et al. 2018).
The most obvious candidates for genes affecting the ionome in a species are orthologs of genes that have been shown to affect elemental accumulation in another species. Indeed, there are multiple examples of orthologs affecting elemental accumulation in distantly related species, such as Arabidopsis thaliana and rice (Oryza sativa), including Na+ transporters from the HKT family (Ren et al. 2005); the heavy metal transporters AtHMA3 and OsHMA3 (Chao et al. 2012, Yan et al. 2016); the E3 ubiquitin ligases BRUTUS and OsHRZs, which regulate degradation of iron uptake factors (Selote et al. 2015, Hindt et al. 2017, Kobayashi et al. 2013); and the K+ channel AKT1 (Lagarde et al. 1996). To our knowledge, no comprehensive list of genes known to affect elemental accumulation in plants exists. To address this deficiency, we sought to create a curated list of genes based on peer-reviewed literature, along with a pipeline to identify orthologs of the genes in any plant species and a method for continuously updating the list. Here we present version 0.1 of the known ionome gene (KIG) list.

Materials and Methods
Criteria for inclusion in the primary KIG list were as follows: we included functionally characterized genes from the literature that are linked to changes in the ionome. To be considered, knockout or knock-down plants for the specific gene must show consistent changes in at least one element in at least one experimental condition. Thus, we have not included genes that are linked to metal tolerance or sensitivity but do not alter the ionome. For double mutants, both genes are listed. To identify the KIG genes, we created a Google survey that was distributed to members of the ionomicshub research coordinators and advertised on Twitter and in oral presentations by the authors. We asked submitters to provide the species, gene name (or names where alleles of two genes were required for a phenotype), gene ID(s), tissue(s), element(s) altered and a DOI link for the primary literature support.
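The submission fields listed above map naturally onto a small record type. The following is a minimal sketch, not the survey's actual schema: the class name `KigSubmission`, the field names, and the checks are illustrative assumptions, and the example values are placeholders rather than a real gene or publication.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KigSubmission:
    """One survey entry; field names are hypothetical, mirroring the text."""
    species: str
    gene_names: List[str]   # two names when a double mutant is required
    gene_ids: List[str]
    tissues: List[str]
    elements: List[str]     # elements whose accumulation was altered
    doi: str

    def problems(self) -> List[str]:
        """Return a list of reasons the entry cannot be accepted as-is."""
        issues = []
        if not self.species.strip():
            issues.append("species is missing")
        if not self.gene_ids:
            issues.append("at least one gene ID is required")
        if not self.tissues:
            issues.append("at least one tissue is required")
        if not self.elements:
            # Inclusion criterion: at least one element must change.
            issues.append("at least one altered element is required")
        if not self.doi.strip():
            issues.append("a DOI for the primary literature is required")
        return issues


# Placeholder values only; not a real gene ID or DOI.
entry = KigSubmission(
    species="A. thaliana",
    gene_names=["GENE1"],
    gene_ids=["PLACEHOLDER_ID"],
    tissues=["leaf"],
    elements=[],
    doi="10.0000/placeholder",
)
print(entry.problems())
```

An entry with an empty `elements` list is flagged because a gene that alters no element does not meet the inclusion criterion, even if it affects metal tolerance.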
Creating the inferred orthologs list: The known-ionomics gene list contains known genes from the primary list and their orthologous genes inferred by inParanoid (v4.1) pairwise species comparisons. The inParanoid files were downloaded from Phytozome for each organism-to-organism combination of species in the primary list, plus Glycine max, Sorghum bicolor, Setaria italica, and S. viridis. Orthologs of the primary genes were labeled as "inferred" genes. If a primary gene was also found as an ortholog to a primary gene in another species, the status was changed to "Primary/Inferred" in both species. It is important to note that only primary genes can infer genes; inferred genes cannot infer other genes. The pipeline for transforming the primary list into the known-ionomics gene list can be found at github.com/baxterlab/KIG.
Gene Enrichment analysis: Overrepresentation analysis was performed on the primary and inferred genes in A. thaliana using the GO Consortium's web-based GO Enrichment Analysis tool, powered by the PANTHER classification system (GO ontology database released 09/08/2018) (Ashburner et al. 2000, The Gene Ontology Consortium 2017, Mi et al. 2017). We restricted overrepresentation analysis to A. thaliana because of its dominance in the known ionome list and our lack of confidence in the functional annotation of the other species in the list. An analysis performed by Wimalanathan et al. (2018) found that maize gene annotations in databases like Gramene and Phytozome lacked GO annotations other than automatically assigned electronic annotations (IEA). IEA annotations are not curated and have the least support of all the evidence codes (Harris et al. 2004). A. thaliana annotations come from a variety of evidence types, showing a higher degree of curation compared to maize (Wimalanathan et al. 2018).
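The labeling rules for inferred orthologs described above can be sketched in a few lines. This is a simplified illustration, not the code in the KIG repository: the function name `label_genes` and the input shapes are assumptions, and the ortholog pairs are toy data loosely based on the YSL example discussed in the Results, plus one clearly hypothetical G. max gene.

```python
from collections import defaultdict


def label_genes(primary, ortholog_pairs):
    """Assign a status to every gene reachable from the primary list.

    primary:        set of (species, gene) tuples from the curated list.
    ortholog_pairs: iterable of undirected ((species, gene), (species, gene))
                    pairs, e.g. parsed from pairwise ortholog tables.
    Returns (status, inferred_from).
    """
    status = {gene: "Primary" for gene in primary}
    inferred_from = defaultdict(set)
    for a, b in ortholog_pairs:
        for source, target in ((a, b), (b, a)):
            # Only primary genes can infer; inferred genes never propagate.
            if source in primary:
                inferred_from[target].add(source)
    for gene in inferred_from:
        status[gene] = "Primary/Inferred" if gene in primary else "Inferred"
    return status, dict(inferred_from)


AT, OS, GM = "A. thaliana", "O. sativa", "G. max"
primary = {(AT, "AtYSL3"), (OS, "OsYSL9"), (OS, "OsYSL16")}
pairs = [  # toy ortholog pairs for illustration only
    ((AT, "AtYSL3"), (OS, "OsYSL9")),
    ((AT, "AtYSL3"), (OS, "OsYSL16")),
    ((AT, "AtYSL2"), (OS, "OsYSL9")),
    ((AT, "AtYSL2"), (OS, "OsYSL16")),
    ((GM, "GmHypothetical1"), (AT, "AtYSL2")),  # hypothetical gene
]
status, inferred_from = label_genes(primary, pairs)
for gene in sorted(status):
    print(gene, status[gene])
```

In this toy input the hypothetical G. max gene receives no status, because its only ortholog (AtYSL2) is itself inferred rather than primary.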
We tested both the PANTHER GO-Slim and the GO complete datasets for biological process, molecular function and cellular component. The enriched terms (fold enrichment > 1 and false discovery rate < 0.05) were sorted into five categories relating to the ionome, based on their annotation terms:
1. Ion homeostasis - terms include homeostasis, stress, detoxification, or regulation of an ion.
2. Ion transport - terms specifically state transport, export, import or localization of ion(s). Does not include hydrogen ion transport.
3. Metal ion chelation - terms relating to phytochelatins, or to other chemical reactions or pathways of metal chelator synthesis.
4. Response to ions - terms vaguely state a response to ions but have no child annotation terms that offer further clarification (i.e. stress response). Broadly, this refers to any change in state or activity, such as secretion, expression, movement, or enzyme production (Carbon et al. 2009).
5. Other transport - annotations stating the transfer of anything that is not an ion (glucose, peptides, etc.).
Genes may belong to more than one category, but if they belong to a parent and a child term in the same category, they were only counted once.
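The filtering and counting rule above can be illustrated with a short sketch. It assumes the enrichment results have already been exported as rows of term, fold enrichment, FDR and gene set, and that each term has been manually assigned to one of the five categories; the function name, the row layout, and every value in the toy data are hypothetical.

```python
def count_genes_per_category(term_rows, term_category):
    """Count distinct genes per ionome category among enriched terms.

    term_rows:     list of dicts with keys "term", "fold_enrichment",
                   "fdr" and "genes" (a set of gene identifiers).
    term_category: dict mapping a GO term name to its assigned category.
    """
    genes_by_category = {}
    for row in term_rows:
        # Keep only enriched terms: fold enrichment > 1 and FDR < 0.05.
        if row["fold_enrichment"] <= 1 or row["fdr"] >= 0.05:
            continue
        category = term_category.get(row["term"])
        if category is None:
            continue
        # A set means a gene annotated to both a parent and a child term
        # in the same category is counted only once.
        genes_by_category.setdefault(category, set()).update(row["genes"])
    return {cat: len(genes) for cat, genes in genes_by_category.items()}


rows = [  # toy values, not results from the paper
    {"term": "ion transport", "fold_enrichment": 5.0, "fdr": 0.001,
     "genes": {"geneA", "geneB"}},
    {"term": "metal ion transport", "fold_enrichment": 8.0, "fdr": 0.002,
     "genes": {"geneA"}},  # child term of "ion transport"
    {"term": "cellular ion homeostasis", "fold_enrichment": 4.0, "fdr": 0.01,
     "genes": {"geneA", "geneC"}},
    {"term": "glucose transport", "fold_enrichment": 0.8, "fdr": 0.2,
     "genes": {"geneD"}},  # fails both thresholds
]
categories = {
    "ion transport": "Ion transport",
    "metal ion transport": "Ion transport",
    "cellular ion homeostasis": "Ion homeostasis",
    "glucose transport": "Other transport",
}
counts = count_genes_per_category(rows, categories)
print(counts)
```

Here geneA contributes once to "Ion transport" despite appearing under both the parent and the child term, and once more to "Ion homeostasis", since genes may belong to more than one category.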

Results
The current primary list (v0.
bioRxiv preprint doi: https://doi.org/10.1101/456384; this version posted October 31, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure 1. Number of genes for each species that are primary, inferred from other primary genes in other species, or both.
Most primary genes have orthologs in other species, which we call inferred genes. Less than 11% of primary genes in A. thaliana and O. sativa, and less than 2% in M. truncatula, lack orthologs (Table 2). G. max, S. bicolor, S. italica, and S. viridis currently contain only inferred genes (Table 2, Figure 1).
The YSL genes in A. thaliana and O. sativa are an example that provides evidence for the validity of the KIG list pipeline: AtYSL3, OsYSL9 and OsYSL16 were listed in their respective species as primary genes (Table 1) and, after the ortholog search, are annotated as primary/inferred genes referencing each other (STable1). However, AtYSL2 in A. thaliana, which was not listed as a primary gene, was inferred through both OsYSL9 and OsYSL16. Additionally, AtYSL1 in A. thaliana is not a paralog of AtYSL3 or an ortholog of OsYSL9 and OsYSL16 according to PhytoMine, and is not listed as an ortholog to either of the O. sativa YSL genes in the KIG list. Other examples include AtVIT1 and OsVIT1/OsVIT2 (Kim et al. 2006, Zhang et al. 2012) and AtMTP8 and OsMTP8.1 (Eroglu et al. 2016, Chen et al. 2013). Thus, we can reliably generate inferred genes and create a species-specific KIG list for any species in PhytoMine.
The primary list covers 23 elements (Figure 2) according to the elements reported by authors in the primary list, which is more elements than predicted by the GO term annotations for those genes. Some GO annotations for these genes mention only a portion of the elements listed by the literature in the primary list. This may be due to GO annotation evidence codes lacking curation or biological data (IEA, ND, NAS) (Wimalanathan et al. 2018), or it may be due to alterations in one element leading to alterations in other elements (Baxter et al. 2008).
A. thaliana studies seem to be driving the elements included in the list, as it is the only species to have a gene listed for each primary element. There is a bias towards elements like manganese, zinc and iron, which have 2, 3.5 and 4.5 times more associated genes than the average of 8±9 genes for other elements. Iron is also the only element with genes from all four species in the primary list. In addition to biases towards certain elements, our primary list is also skewed towards ionome genes identified in studies of above ground tissues (Figure 3). This is likely due to the difficulties in studying the elemental content of below ground tissues. All of our M. truncatula genes come from nodule studies, most likely because it is a model legume species.

Figure 3. Number of primary genes each type of tissue contributes to the known ionomics list. Above ground is a summary of anther, leaf, seed and shoot, while below ground is a summary of root and nodule.
Querying the manually curated PANTHER GO-Slim biological process database with the A. thaliana KIG list returned no significantly overrepresented terms. However, all of the A. thaliana genes in the known ionomics list were mapped to significantly (false discovery rate < 0.05) overrepresented annotation terms within the GO biological process complete database and were thus categorized into the five groups listed in the methods (Figure 4).
Even though some genes were annotated as associated with the "other transport" of glycoside, glucose, or oligopeptides, or with phloem transport, the citations that added them to our primary list show that their mutant alleles altered elemental accumulation. AtBCC1 and AtBCC2 are annotated as glycoside transporters, but were inferred as orthologs of an O. sativa gene in the primary list, from a paper finding that OsABCC1 contributes to the reduction of arsenic in rice grains (Song et al. 2014). The YSL genes and OPT3 are annotated as genes encoding oligopeptide transporters, but more specifically they encode predicted phloem-localized metal-nicotianamine complex and iron/cadmium transporters, respectively (Waters et al. 2006, Zhai et al. 2014). Lastly, NRT1.5/NPF7.3 is also annotated as encoding an oligopeptide transporter, but Li et al. (2017) identify it as a xylem-loading potassium ion antiporter.
The PANTHER GO-Slim molecular function annotation database did show a significant overrepresentation for cation transmembrane transporter activity. The results using the GO complete molecular function database supported this, with the addition of metal ion binding and cyclic nucleotide binding annotations. The genes with cyclic nucleotide binding annotations were, more specifically, cyclic nucleotide-gated ion channel genes (Gobert et al. 2006). The PANTHER GO-Slim cell component and GO complete cell component annotation databases both returned significant overrepresentation for vacuoles and the plasma membrane, both known to be critical for elemental movement and storage (need refs). The molecular function and cell component results are further evidence that our list is dominated by ion transporters.
To test how complete our list is in its current state, we searched PANTHER's biological process annotations for the number of A. thaliana genes predicted to encode elemental transporters. We found 634 genes predicted to encode elemental transporters, and only 18 of these PANTHER genes are listed in the known ionomics list. We checked these results against ThaleMine genes with the term "ion transport" in the gene name, description, or GO annotation and found only 376 genes, with 53 of these genes listed in the known ionomics list. Interestingly, 219 of the genes from ThaleMine were not found in the 634 from PANTHER.
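The counts reported above imply a few derived quantities that are useful when judging coverage. The short calculation below uses only the numbers stated in this paragraph; the variable names are illustrative, and the derived values follow from simple arithmetic rather than from any additional analysis.

```python
# Counts as reported in the text.
panther_total = 634              # PANTHER genes predicted to encode elemental transporters
panther_in_kig = 18              # of those, genes present in the known ionomics list
thalemine_total = 376            # ThaleMine genes matching "ion transport"
thalemine_in_kig = 53            # of those, genes present in the known ionomics list
thalemine_not_in_panther = 219   # ThaleMine genes absent from the PANTHER set

# Genes found by both searches, and by either search.
shared = thalemine_total - thalemine_not_in_panther
union = panther_total + thalemine_not_in_panther

# Fraction of each candidate set already captured by the list.
pct_panther = round(100 * panther_in_kig / panther_total, 1)
pct_thalemine = round(100 * thalemine_in_kig / thalemine_total, 1)

print(f"shared between searches: {shared}")
print(f"found by either search:  {union}")
print(f"PANTHER set in KIG:      {pct_panther}%")
print(f"ThaleMine set in KIG:    {pct_thalemine}%")
```

Under these numbers the two searches share 157 genes, and the list currently captures roughly 3% of the PANTHER set and 14% of the ThaleMine set, which underlines how differently the two resources delimit "ion transport" genes.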

Discussion
Here we have produced a curated list of genes known to alter the elemental composition of plant tissues. We envision several possible uses for this list: 1. Researchers can use the list to identify candidate genes in loci from QTL and GWAS experiments. 2. This list can serve as a gold standard for computational approaches. 3. The list can serve as a reading list for those interested in learning about elemental accumulation.
The list is highly enriched for transporters, genes that affect elemental accumulation in above ground tissues and genes that affect the accumulation of Fe and Zn. All of these factors, however, could be the result of human bias in the choice of research topics. For example, transporter genes became obvious candidates for studying plant mineral nutrition when disruption allele collections were produced (McDowell et al. 2013). Fe and Zn are both important nutrients and of considerable interest to the community where the ionomics approach was developed. Additionally, above ground tissues are easier to study without contamination from the soil, and such studies are therefore more prevalent.
Most entries on this list are derived from model organisms, which reflects the fact that most of our knowledge about genes that affect elemental accumulation comes from these species. A. thaliana and M. truncatula account for 65.63% of the primary genes list, and several of the genes in crop plants were found because they are orthologs of genes in the model organisms (Xu et al. 2017).
We conducted all of our analyses of GO terms in Arabidopsis, as it had the highest number of high confidence annotations. The lack of good annotations in other species highlights the value of creating curated lists like this one.
Call for more submissions: While we believe that the current list is useful, we are likely missing genes due to our lack of comprehensive knowledge of the literature. Currently, the list contains entries from only 9 people. We ask readers who know of genes that we are missing to contribute by submitting them here: https://docs.google.com/forms/d/e/1FAIpQLSdmS_zeOlxTOLmq2wB45BuSQml1LMKtKnWSatmFRGR2Q1o0Ew/viewform?c=0&w=1 or by emailing the corresponding author. KIG lists v0.1 for each of the species can be seen in STable1, and future updates to the list can be found at https://docs.google.com/spreadsheets/d/1XI2l1vtVJiHrlXLeOS5yTQQnLYq7BOHpmjuC-kUejUU/edit?usp=sharing.

Contributions:
Contributed genes: IB, FKR, FM, SC, EW, PK
Analyzed data: LW, GZ
Wrote paper: LW, FKR, IB
Edited paper: FKR, FM, SC, EW, PK, GZ, LW, IB