RT Journal Article
SR Electronic
T1 Enhancing the interoperability of glycan data flow between ChEBI, PubChem, and GlyGen
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.06.17.448729
DO 10.1101/2021.06.17.448729
A1 Rahi Navelkar
A1 Gareth Owen
A1 Venkatesh Mutherkrishnan
A1 Paul Thiessen
A1 Tiejun Cheng
A1 Evan Bolton
A1 Nathan Edwards
A1 Michael Tiemeyer
A1 Matthew P Campbell
A1 Maria Martin
A1 Jeet Vora
A1 Robel Kahsay
A1 Raja Mazumder
YR 2021
UL http://biorxiv.org/content/early/2021/07/03/2021.06.17.448729.abstract
AB Glycans play a vital role in health, disease, bioenergy, biomaterials, and biotherapeutics. As a result, there is keen interest in identifying and increasing glycan data in bioinformatics databases like ChEBI and PubChem, and in connecting them to resources at the EMBL-EBI and NCBI to facilitate access to important annotations at a global level. GlyTouCan is a comprehensive archival database that contains glycans obtained primarily through batch upload from glycan repositories, glycoprotein databases, and individual laboratories. In many instances, the glycan structures deposited in GlyTouCan may not be fully defined or may lack supporting experimental evidence and citations. Databases like ChEBI and PubChem were designed to accommodate complete atomistic structures with well-defined chemical linkages. As a result, they cannot easily accommodate the structural ambiguity inherent in glycan databases. Consequently, there is a need to organize glycan data more coherently to enhance connectivity across the major NCBI, EMBL-EBI, and glycoscience databases. This paper outlines a workflow developed in collaboration between GlyGen, ChEBI, and PubChem to improve the visibility and connectivity of glycan data across these resources.
GlyGen hosts a subset of glycans (~29,000) from the GlyTouCan database and has submitted valuable glycan annotations to the PubChem database and integrated over 10,500 (including ambiguously defined) glycans into the ChEBI database. The integrated glycans were prioritized based on links to PubChem and connectivity to glycoprotein data. The pipeline provides a blueprint for how glycan data can be harmonized between different resources. The current PubChem, ChEBI, and GlyTouCan mappings can be downloaded from GlyGen (https://data.glygen.org).
Competing Interest Statement: The authors have declared no competing interest.
Abbreviations: EMBL-EBI, EMBL-European Bioinformatics Institute; SIB, Swiss Institute of Bioinformatics; ChEBI, Chemical Entities of Biological Interest; CFG, Consortium for Functional Glycomics; GNOme, Glycan Naming Ontology; CID, PubChem Compound Identifier; SID, PubChem Substance Identifier; WURCS, Web3 Unique Representation of Carbohydrate Structures; SMILES, Simplified Molecular-Input Line-Entry System; InChI, International Chemical Identifier; IUPAC, International Union of Pure and Applied Chemistry; HCV, Hepatitis C virus; X-ref, Cross-reference