1 N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow, Russia
2 National Research University Higher School of Economics, Faculty of Chemistry, Moscow, Russia
KEYWORDS: Carbohydrate Structure Database, 13C NMR errors, data quality, glycan databases, glycobiology, NMR assignment
International Journal of Biological Macromolecules, 2024, vol. 282(6), ID 137042
DOI: 10.1016/j.ijbiomac.2024.137042, PMID: 39521218
Primary structure elucidation in glycobiology is strongly affected by published NMR signals that report on structure, especially those of the 13C nucleus. The accuracy of glycan NMR simulation and the outcome of machine learning both depend on the quality of NMR signal assignments in glycan databases. As part of our work on improving data quality in the Carbohydrate Structure Database (CSDB), we carried out a systematic search for inconsistencies in published NMR data. The search was based on a bulk comparison of experimental and simulated 13C NMR chemical shifts, followed by manual analysis of the mismatches. On the basis of this analysis, CSDB was remediated by marking and correcting the NMR errors found in 272 structure elucidation reports published over the past 40 years.
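The bulk comparison step can be illustrated with a minimal sketch. It is not the procedure used for CSDB: the function name `flag_mismatches`, the 2.0 ppm threshold, the per-atom dictionary layout, and all chemical shift values are assumptions made for illustration only. The sketch compares experimental and simulated 13C shifts atom by atom and flags deviations large enough to warrant manual inspection.

```python
from math import sqrt


def flag_mismatches(experimental, simulated, threshold_ppm=2.0):
    """Compare two 13C assignments given as {atom label: shift in ppm}.

    Returns (rmsd, flagged), where flagged maps each atom whose absolute
    deviation exceeds threshold_ppm to its signed deviation (exp - sim).
    The threshold is a hypothetical value, not one taken from the paper.
    """
    shared = sorted(set(experimental) & set(simulated))
    if not shared:
        raise ValueError("the two assignments have no atoms in common")

    deviations = {atom: experimental[atom] - simulated[atom] for atom in shared}
    rmsd = sqrt(sum(d * d for d in deviations.values()) / len(shared))
    flagged = {
        atom: round(d, 2)
        for atom, d in deviations.items()
        if abs(d) > threshold_ppm
    }
    return rmsd, flagged


# Made-up shifts for a single hexose residue (illustrative only).
experimental = {"C1": 100.5, "C2": 72.4, "C3": 70.1, "C4": 74.0, "C5": 72.3, "C6": 61.5}
simulated = {"C1": 100.9, "C2": 72.1, "C3": 73.8, "C4": 70.4, "C5": 72.6, "C6": 61.7}

rmsd, flagged = flag_mismatches(experimental, simulated)
print(f"RMSD = {rmsd:.2f} ppm; atoms to inspect manually: {sorted(flagged)}")
```

With these made-up values, only C3 and C4 are flagged, and their deviations are of similar size and opposite sign, which is the pattern a swapped pair of assignments would produce. A flag of this kind only nominates a record for manual analysis; it does not by itself establish that the published assignment is wrong.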