1 N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow, Russia
2 National Research University Higher School of Economics, Moscow, Russia
KEYWORDS: CSDB, Carbohydrate Structure Database, CSDB Linear, carbohydrate notation, glycoinformatics
Journal of Chemical Information and Modeling, 2020, v.60(3), pp. 1276-1289
DOI: 10.1021/acs.jcim.9b00744, PMID: 31790229
The CSDB Linear notation for carbohydrate sequences, used in the Carbohydrate Structure Database (CSDB), has been improved to meet modern requirements in glycoinformatics. The new features include the ability to combine repeating and non-repeating moieties in one structure; support for carbon-carbon bonds; and the use of SMILES encodings for unambiguous chemical description of glycan structures, including aglycons and atypical components. The new capabilities of CSDB Linear, together with the older ones, allow efficient detection of errors in CSDB while avoiding the informatics problems common to human-readable notations. The CSDB Linear implementation provides translation to other carbohydrate notations and multiple procedures for checking content for errors.