Carbohydrates are one of the major constituents of living cells. They provide mechanical stability to the cell wall and play an important role in signal transduction, cell-cell recognition, and the immunological properties of microorganisms. The importance of providing carbohydrate data to the scientific community for biomedical and immunological research can hardly be overestimated. However, in contrast to other disciplines studying the molecular basis of life, glycomics still lacks mature information-technology support. Universal integration standards and computer-assisted tools in glycomics are still in the making. Many existing carbohydrate databases focus on particular properties, use incompatible formats, do not provide complete coverage, and in most cases suffer from poor data quality.
The Carbohydrate Structure Database (CSDB) aims to close this gap through curated content and cross-database integration, thus bringing glycomics to the same level of integrity as exists in genomics and proteomics. CSDB has been continuously developed and updated since 2004. It now provides data on bacterial, archaeal, plant, fungal, and protist carbohydrates and glycoconjugates whose chemical sequence has been published. Currently, it is the only free database with primary data on carbohydrate structures from these taxonomical domains published up to 2023.
Two key features of this project are coverage and data consistency. The database contains structures of ~33K carbohydrates and glycoconjugates (including glycoproteins and glycolipids) associated with ~17K microorganisms in ~15K publications. Coverage is close to complete for glycans of microorganisms and fungi reported up to 2023, and for plant glycans reported up to 2000. The average growth is ~1000 structures per year.
CSDB stores structural, taxonomical, bibliographical, assigned NMR-spectroscopic, and other data (elucidation methods, publication abstracts, conformational, biochemical, and genetic data, etc.) on carbohydrates with a known sequence. The data came from the import and manual re-annotation of other databases (including "CarbBank"), manual and semi-automated retrospective processing of publications, and user data submissions. All data have been checked for consistency by experts in carbohydrate biochemistry prior to upload, and corrected when necessary. This makes CSDB one of the few primary glycoinformatic databases with fully curated content. A comparison of the consistency of freely available carbohydrate databases showed the high data quality of CSDB.
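The kinds of data listed above can be pictured as a single record with a few automatic sanity checks run before expert review. The following is only a minimal sketch: the class name, field names, example values, and checks are hypothetical illustrations and do not reflect the actual CSDB schema or curation procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlycanRecord:
    """Hypothetical record layout; field names are illustrative, not the CSDB schema."""
    structure: str                         # sequence in some linear notation
    taxon: str                             # source organism
    pubmed_id: Optional[int] = None        # bibliographic link, if known
    nmr_shifts_ppm: Dict[str, float] = field(default_factory=dict)  # atom label -> shift
    methods: List[str] = field(default_factory=list)                # elucidation methods


def consistency_problems(record: GlycanRecord) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    if not record.structure.strip():
        problems.append("empty structure")
    if not record.taxon.strip():
        problems.append("missing taxon")
    # Assigned NMR data should be backed by NMR among the elucidation methods.
    if record.nmr_shifts_ppm and "NMR" not in record.methods:
        problems.append("NMR shifts given but NMR not listed among methods")
    return problems


# Example with made-up values (the structure string and shift are placeholders).
record = GlycanRecord(
    structure="aDGlcp(1-4)bDGalp",
    taxon="Escherichia coli",
    nmr_shifts_ppm={"C1": 100.5},
)
print(consistency_problems(record))
```

Checks of this kind can only flag formal inconsistencies; they are no substitute for the expert review described above.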
The CSDB interface includes a web user part, an administrator part, and gateways for automated data interchange with other databases. Currently it is cross-linked with NCBI PubMed, NCBI Taxonomy, GlyTouCan, ICD-11, MonosaccharideDB, ImmuneEpitopeDB, and other resources. Users can search the database by fragments of structure, bibliography, taxonomical annotations, fragments of NMR spectra, composition data, trivial names, etc. Integration with certain glycomics projects has been achieved at the level of a programming interface and through bulk data export as a Resource Description Framework (RDF) feed. An unambiguous yet human-readable carbohydrate notation has been developed for this project, and translation tools to and from other known glycan representations are provided.
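Cross-linking of this kind amounts to turning the identifiers stored with a record into links to the external resource. The sketch below illustrates the idea for two of the listed resources, using the public URL patterns of PubMed and the NCBI Taxonomy Browser as commonly used; the record layout and function names are hypothetical and are not CSDB's actual gateway code.

```python
from urllib.parse import urlencode


def pubmed_url(pmid: int) -> str:
    """Link to a PubMed article page by its PMID."""
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def ncbi_taxonomy_url(tax_id: int) -> str:
    """Link to an NCBI Taxonomy Browser page by its taxon ID."""
    query = urlencode({"id": tax_id})
    return f"https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?{query}"


def cross_links(record: dict) -> dict:
    """Build external links for whichever identifiers the (hypothetical) record carries."""
    links = {}
    if record.get("pmid") is not None:
        links["PubMed"] = pubmed_url(record["pmid"])
    if record.get("tax_id") is not None:
        links["NCBI Taxonomy"] = ncbi_taxonomy_url(record["tax_id"])
    return links


# 562 is the NCBI taxon ID of Escherichia coli; the PMID is a dummy placeholder.
for name, url in cross_links({"pmid": 12345678, "tax_id": 562}).items():
    print(name, url)
```

Keeping one small function per external resource makes it straightforward to add further targets without touching the record format.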
CSDB serves as a glycoinformatic platform and, besides the database itself, hosts a number of services.
The CSDB is freely available at http://csdb.glycoscience.ru.
This project started as "Bacterial CSDB" at the beginning of the 21st century in the framework of the International Science and Technology Center Partner Project. Further funding came from the Russian Foundation for Basic Research, the Russian Federation President's grant committee, Deutsches Krebsforschungszentrum, and the Russian Science Foundation. My personal role in this project was general research and development, database ideology and architecture, data formats, carbohydrate encoding and notation, programming of the engine and services, web design, cross-database interfaces, coordination of literature annotation and database filling, general management, and funding acquisition.
For scholars and students: invitation to collaborate (in Russian): text, presentation.
Merged CSDB poster, 2015 (18th European Carbohydrate Symposium) (JPG, 566Kb)
Bacterial, plant and fungal CSDB poster, 2014 (6th Baltic Meeting on Bacterial Carbohydrates) (JPG, 637Kb)
Bacterial CSDB poster, 2009 (4th Baltic Meeting on Bacterial Carbohydrates) (JPG, 876Kb)
Carbohydrate databases: problems and solutions (lecture)
Ph.V. Toukach
"Supplementing the Carbohydrate Structure Database with glycoepitopes"
(Glycobiology, 2023, v. 33(7), pp. 528-531)
Ph.V. Toukach, K.S. Egorova
"Source files of the Carbohydrate Structure Database: the way to sophisticated analysis of natural glycans"
(Scientific Data, 2022, v. 9, id. 131)
S.I. Scherbinina, M. Frank, Ph.V. Toukach
"Carbohydrate Structure Database (CSDB) oligosaccharide conformation tool"
(Glycobiology, 2022, v. 32(6), pp. 460-468)
Ph.V. Toukach, K.S. Egorova
"Examining the diversity of structural motifs in fungal glycome"
(Computational and Structural Biotechnology Journal, 2022, v. 20, pp. 5466-5476)
K.S. Egorova, N.S. Smirnova, Ph.V. Toukach
"CSDB_GT, a curated glycosyltransferase database with close-to-full coverage on three most studied non-animal species"
(Glycobiology, 2021, v. 31(5), pp. 524-529)
A.Y. Bochkov, Ph.V. Toukach
"CSDB/SNFG Structure Editor: an online glycan builder with 2D and 3D structure visualization"
(Journal of Chemical Information and Modeling, 2021, v. 61(10), pp. 4940-4948)
Ph.V. Toukach, K.S. Egorova
"New features of CSDB Linear, as compared to other carbohydrate notations"
(Journal of Chemical Information and Modeling, 2020, v. 60(3), pp. 1276-1289)
V.S. Stroylov, M.P. Panova, Ph.V. Toukach
"Comparison of methods for bulk automated simulation of glycosidic bond conformations"
(International Journal of Molecular Sciences, 2020, v. 21(20), ID 7626)
S.I. Scherbinina, Ph.V. Toukach
"Three-dimensional structures of carbohydrates and where to find them"
(International Journal of Molecular Sciences, 2020, v. 21(20), ID 7702)
K.S. Egorova, Yu.A. Knirel, Ph.V. Toukach
"Expanding CSDB_GT glycosyltransferase database with Escherichia coli"
(Glycobiology, 2019, v. 29(4), pp. 285-287)
I.Yu. Chernyshov, Ph.V. Toukach
"REStLESS: automated translation of glycan sequences from residue-based notation to SMILES and atomic coordinates"
(Bioinformatics, 2018, v. 34(15), pp. 2679-2681)
K.S. Egorova, Ph.V. Toukach
"Glycoinformatics: bridging isolated islands in the sea of data"
(Angewandte Chemie International Edition, 2018, v. 57, pp. 14986-14990)
R.R. Kapaev, Ph.V. Toukach
"GRASS: semi-automated NMR-based structure elucidation of saccharides"
(Bioinformatics, 2018, v. 34(6), pp. 957-963)
Ph. Toukach, K. Egorova
"Carbohydrate Structure Database (CSDB): examples of usage"
(in "A Practical Guide to Using Glycomics Databases", ed: K.F. Aoki-Kinoshita, Springer Japan, 2017, ch. 5, pp. 75-113, ISBN 978-4-431-56452-2)
K.S. Egorova, Ph.V. Toukach
"CSDB_GT: a new curated database on glycosyltransferases"
(Glycobiology, 2017, v. 27(4), pp. 285-290)
Ph.V. Toukach, K.S. Egorova
"Carbohydrate Structure Database merged from bacterial, archaeal, plant and fungal parts"
(Nucleic Acids Research, Database Issue, 2016, v. 44(D1), pp. D1229-D1236)
K.S. Egorova, A.N. Kondakova, Ph.V. Toukach
"Carbohydrate Structure Database: tools for statistical analysis of bacterial, plant and fungal glycomes"
(Database, 2015, ID bav073)
Ph. Toukach, K. Egorova
"Bacterial, Plant, and Fungal Carbohydrate Structure Databases: daily usage"
(in "Glycoinformatics", eds: T. Lütteke, M. Frank, series: Methods in Molecular Biology, v. 1273, Springer New York, 2015, ch. 5, pp. 55-85, ISBN 978-1-4939-2342-7)
R.R. Kapaev, Ph.V. Toukach
"Improved carbohydrate structure generalization scheme for 1H and 13C NMR simulations"
(Analytical Chemistry, 2015, v. 87(14), pp. 7006-7010)
R. Ranzinger, K.F. Aoki-Kinoshita, M.P. Campbell, S. Kawano, T. Lütteke, S. Okuda, D. Shinmachi, T. Shikanai, H. Sawaki, Ph.V. Toukach, M. Matsubara, I. Yamada, H. Narimatsu
"GlycoRDF: An ontology to standardize Glycomics data in RDF"
(Bioinformatics, 2015, v. 31(6), pp. 919-925)
R.R. Kapaev, K.S. Egorova, Ph.V. Toukach
"Carbohydrate structure generalization scheme for database-driven simulation of experimental observables, such as NMR chemical shifts"
(Journal of Chemical Information and Modeling, 2014, v. 54, pp. 2594-2611)
Ph. Toukach, K. Egorova
"Bacterial, Plant, and Fungal Carbohydrate Structure Database (CSDB)"
(in "Glycoscience: Biology and Medicine", eds: T. Endo, P.H. Seeberger, G.W. Hart, C-H. Wong, N. Taniguchi, Springer Japan, 2014, ch. 29, pp. 241-250, ISBN 978-4-431-54840-9)
K.S. Egorova, Ph.V. Toukach
"Expansion of coverage of Carbohydrate Structure Database (CSDB)"
(Carbohydrate Research, 2014, v. 389, pp. 112-114)
K.F. Aoki-Kinoshita, J. Bolleman, M.P. Campbell, S. Kawano, J. Kim, T. Lütteke, M. Matsubara, S. Okuda, R. Ranzinger, H. Sawaki, T. Shikanai, D. Shinmachi, Y. Suzuki, Ph.V. Toukach, I. Yamada, N.H. Packer, H. Narimatsu
"Introducing glycomics data into the Semantic Web"
(Journal of Biomedical Semantics, 2013, v. 4, id. 39)
K.S. Egorova, Ph.V. Toukach
"Critical analysis of CCSD data quality"
(Journal of Chemical Information and Modeling, 2012, v. 52(11), pp. 2812-2814)
Ph.V. Toukach
"Bacterial Carbohydrate Structure Database 3: Principles and Realization"
(Journal of Chemical Information and Modeling, 2011, v. 51(1), pp. 159-170)
S. Herget, Ph.V. Toukach, R. Ranzinger, W.E. Hull, Y. Knirel, C.-W. von der Lieth
"Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): Characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans"
(BMC Structural Biology, 2008, v. 8, id. 35)
Ph. Toukach, H. Joshi, R. Ranzinger, Yu. Knirel, C.-W. von der Lieth
"Sharing of worldwide distributed carbohydrate-related digital resources: online connection of the Bacterial Carbohydrate Structure DataBase and GLYCOSCIENCES.de"
(Nucleic Acids Research, Database Issue, 2007, v. 35, pp. D280-D286)