Haroldas Bagdonas, Daniel Ungar, Jon Agirre.
Abstract
The heterogeneity, mobility and complexity of glycans in glycoproteins have been, and currently remain, significant challenges in structural biology. These aspects present unique problems to the two most prolific techniques: X-ray crystallography and cryo-electron microscopy. At the same time, advances in mass spectrometry have made it possible to gain deeper insight into precisely the information that is most difficult to recover by structure solution methods: the full-length glycan composition, including linkage details for the glycosidic bonds. These developments have given rise to glycomics. Several large-scale glycomics initiatives have stored their results in publicly available databases, some of which can be accessed through APIs. In the present work, we describe how the Privateer carbohydrate structure validation software has been extended to harness results from glycomics projects, and how this can greatly improve the validation of 3D glycoprotein structures.
Keywords: Privateer; X-ray crystallography; electron cryomicroscopy; glycoinformatics; glycomics
Year: 2020 PMID: 33093930 PMCID: PMC7554661 DOI: 10.3762/bjoc.16.204
Source DB: PubMed Journal: Beilstein J Org Chem ISSN: 1860-5397 Impact factor: 2.883
Figure 1. Comparison of the glycan features in electron density maps over a range of resolutions from selected glycoprotein structures (PDB entries: 6RI6 [19]; 6MZK [20]; 4O5I [21]). The electron density maps were obtained with X-ray crystallography. The data resolution and PDB entry IDs associated with the structures have been annotated directly on the structures. Left: a high-resolution example where the monosaccharides and their conformations can be elucidated; middle: a medium-resolution example where identification starts to become difficult; right: a low-resolution example for which all prior knowledge must be used. Despite coming from different glycoprotein structures, the glycan has the same composition in all three, and is thus assigned a single GlyTouCan ID, G15407YE.
A comparison of the structural information storage capabilities of different sequence formats used in glycobioinformatics (see note a).
| notation | multiple | repeating | alternative | linear | atomic |
| CCSD(CarbBank) | – | + | – | + | – |
| LINUCS | – | + | – | + | – |
| GlycoSuite | – | – | + | + | – |
| BCSDB | (+) | (+) | + | + | – |
| LinearCode | – | – | + | + | – |
| KCF | + | + | – | – | – |
| GlycoCT | + | + | + | – | – |
| Glyde-II | + | + | – | – | – |
| WURCS 2.0 | + | + | + | + | + |
Note a: “+” denotes that the information can be stored directly without any significant issues; “(+)” denotes that the information can be stored indirectly, or that there are some issues; “–” denotes that the sequence format cannot describe the information. This table is a simplified version of the one originally published by Matsubara et al. [52].
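The notation comparison table above can be queried programmatically. The following is a minimal sketch that transcribes the table as a Python dictionary and lists the formats marked “+” in every column. The names `COLUMNS`, `TABLE` and `fully_supported` are illustrative and not part of Privateer; the en dash of the table is written as an ASCII hyphen.

```python
# Column labels exactly as they appear in the table above.
COLUMNS = ("multiple", "repeating", "alternative", "linear", "atomic")

# "+"   = information can be stored directly
# "(+)" = stored indirectly, or with some issues
# "-"   = not available in this format
TABLE = {
    "CCSD(CarbBank)": ("-", "+", "-", "+", "-"),
    "LINUCS":         ("-", "+", "-", "+", "-"),
    "GlycoSuite":     ("-", "-", "+", "+", "-"),
    "BCSDB":          ("(+)", "(+)", "+", "+", "-"),
    "LinearCode":     ("-", "-", "+", "+", "-"),
    "KCF":            ("+", "+", "-", "-", "-"),
    "GlycoCT":        ("+", "+", "+", "-", "-"),
    "Glyde-II":       ("+", "+", "-", "-", "-"),
    "WURCS 2.0":      ("+", "+", "+", "+", "+"),
}


def fully_supported(table):
    """Return the notations marked "+" in every column."""
    return [name for name, marks in table.items() if all(m == "+" for m in marks)]


print(fully_supported(TABLE))  # ['WURCS 2.0']
```

Only WURCS 2.0 passes this filter. That fits the figure captions in this record, which report glycan sequences in WURCS notation.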
Figure 2. A roadmap of the software development project that allows structural biologists to quickly obtain detailed information about specific glycans in glycoprotein models from glycomics/glycoproteomics databases. The GlyTouCan (https://glytoucan.org/) and GlyConnect (https://glyconnect.expasy.org/) logos have been reproduced here under explicit permission from their respective authors.
Comparison of the successful glycan matches detected by Privateer in the GlyTouCan and GlyConnect databases (see note a).
| experimental technique | glycan chain length | GlyTouCan ID found | GlyTouCan ID not found | % of GlyTouCan IDs in GlyConnect | total glycan chains |
| MX | 1 | 16797 | 0 | 1% | 16797 |
| MX | 2 | 5870 | 5 | 90% | 5875 |
| MX | 3 | 2550 | 17 | 71% | 2567 |
| MX | 4 | 1012 | 21 | 80% | 1033 |
| MX | 5 | 834 | 72 | 74% | 906 |
| MX | 6 | 460 | 85 | 69% | 545 |
| MX | 7 | 345 | 55 | 77% | 400 |
| MX | 8 | 235 | 25 | 85% | 260 |
| MX | 9 | 164 | 16 | 81% | 180 |
| MX | 10 | 118 | 5 | 92% | 123 |
| MX | 11 | 20 | 5 | 85% | 25 |
| MX | 12 | 8 | 4 | 75% | 12 |
| MX | 13 | 0 | 1 | 0% | 1 |
| MX | 14 | 0 | 0 | 0% | 0 |
| MX | 15 | 2 | 0 | 0% | 2 |
| MX | 16 | 0 | 1 | 0% | 1 |
| cryo-EM | 1 | 2080 | 0 | 3% | 2080 |
| cryo-EM | 2 | 1081 | 0 | 98% | 1081 |
| cryo-EM | 3 | 439 | 0 | 96% | 439 |
| cryo-EM | 4 | 143 | 0 | 93% | 143 |
| cryo-EM | 5 | 146 | 2 | 85% | 148 |
| cryo-EM | 6 | 70 | 1 | 97% | 71 |
| cryo-EM | 7 | 45 | 0 | 100% | 45 |
| cryo-EM | 8 | 26 | 0 | 88% | 26 |
| cryo-EM | 9 | 15 | 1 | 100% | 16 |
| cryo-EM | 10 | 16 | 0 | 100% | 16 |
| cryo-EM | 11 | 4 | 0 | 100% | 4 |
| cryo-EM | 12 | 1 | 0 | 100% | 1 |
| cryo-EM | 13 | 1 | 0 | 0% | 1 |
Note a: The glycans were obtained from glycoprotein models elucidated by X-ray crystallography (MX) and cryo-EM.
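The per-technique totals implied by the table above can be checked with a few lines of Python. This sketch assumes that the third and fourth columns count glycan chains with and without a GlyTouCan match; that reading is an assumption, supported by the fact that the two columns sum to the last column in every row. The names `ROWS` and `summarise` are illustrative.

```python
from collections import defaultdict

# (technique, chain length, chains with a GlyTouCan ID,
#  chains without one, total chains), transcribed from the table above.
ROWS = [
    ("MX", 1, 16797, 0, 16797), ("MX", 2, 5870, 5, 5875),
    ("MX", 3, 2550, 17, 2567), ("MX", 4, 1012, 21, 1033),
    ("MX", 5, 834, 72, 906), ("MX", 6, 460, 85, 545),
    ("MX", 7, 345, 55, 400), ("MX", 8, 235, 25, 260),
    ("MX", 9, 164, 16, 180), ("MX", 10, 118, 5, 123),
    ("MX", 11, 20, 5, 25), ("MX", 12, 8, 4, 12),
    ("MX", 13, 0, 1, 1), ("MX", 14, 0, 0, 0),
    ("MX", 15, 2, 0, 2), ("MX", 16, 0, 1, 1),
    ("cryo-EM", 1, 2080, 0, 2080), ("cryo-EM", 2, 1081, 0, 1081),
    ("cryo-EM", 3, 439, 0, 439), ("cryo-EM", 4, 143, 0, 143),
    ("cryo-EM", 5, 146, 2, 148), ("cryo-EM", 6, 70, 1, 71),
    ("cryo-EM", 7, 45, 0, 45), ("cryo-EM", 8, 26, 0, 26),
    ("cryo-EM", 9, 15, 1, 16), ("cryo-EM", 10, 16, 0, 16),
    ("cryo-EM", 11, 4, 0, 4), ("cryo-EM", 12, 1, 0, 1),
    ("cryo-EM", 13, 1, 0, 1),
]


def summarise(rows):
    """Sum matched and unmatched chains per technique, checking each row."""
    totals = defaultdict(lambda: [0, 0])
    for technique, length, found, not_found, total in rows:
        if found + not_found != total:
            raise ValueError(f"inconsistent row: {technique}, length {length}")
        totals[technique][0] += found
        totals[technique][1] += not_found
    return {t: (f, n, f / (f + n)) for t, (f, n) in totals.items()}


for technique, (found, not_found, fraction) in summarise(ROWS).items():
    print(f"{technique}: {found} matched, {not_found} unmatched ({fraction:.1%})")
```

The consistency check inside `summarise` is the same arithmetic that motivates the column interpretation: a row whose two counts did not add up to its total would raise an error.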
Figure 3. N-linked glycans in Epstein-Barr virus major envelope glycoprotein (PDB entry: 2H6O [66]). A) A selection of the glycan chains that failed to return database IDs, with their WURCS sequences extracted from the Privateer CCP4i2 report. B) Glycan chain (right) for which GlyTouCan and GlyConnect IDs were successfully matched while the modelling errors were still present in the model. After manual fixing (left), the WURCS sequence for the glycan failed to return database IDs. Red highlighting marks the locations in the WURCS notation where the two glycans differ.
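The kind of comparison highlighted in the caption above, locating where two WURCS strings differ, can be sketched with Python's standard `difflib` module. This is not how Privateer produces its report. The two strings below are illustrative only and are not taken from PDB entry 2H6O: they describe a Man-GlcNAc-GlcNAc trisaccharide that differs only in the anomeric descriptor of the mannose. The function name `wurcs_differences` is hypothetical.

```python
from difflib import SequenceMatcher


def wurcs_differences(before, after):
    """Return (tag, offset in `before`, old text, new text) for each differing span."""
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    return [
        (tag, i1, before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Illustrative sequences (assumption): alpha- versus beta-mannose
# in the second unique residue.
as_modelled = "WURCS=2.0/2,3,2/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1a_1-5]/1-1-2/a4-b1_b4-c1"
after_fixing = "WURCS=2.0/2,3,2/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5]/1-1-2/a4-b1_b4-c1"

for tag, offset, old, new in wurcs_differences(as_modelled, after_fixing):
    print(f"{tag} at character {offset}: {old!r} -> {new!r}")
```

A character-level comparison is used here instead of splitting on "/", because "/" also occurs inside residue descriptors such as `2*NCC/3=O`.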
Figure 4. An N-linked glycan attached to Asn35 of human Toll-like receptor 4 (A: PDB entry 2z62 [68]). The model was iteratively rebuilt by PDB-Redo, as shown in steps B and C [41]. The pictures at the top depict glycoprotein models of the region of interest and electron density maps of the glycan chain (grey: 2mFo-DFc map; green and red: mFo-DFc difference density map). The pictures at the bottom depict the SNFG representations of the glycan chains, their WURCS sequences and their accession IDs in the relevant databases (taken directly from Privateer's CCP4i2 report).