Vincenzo Daponte, Catherine Hayes, Julien Mariethoz, Frederique Lisacek.
Abstract
The level of ambiguity in describing glycan structure has significantly increased with the upsurge of large-scale glycomics and glycoproteomics experiments. Consequently, an ontology-based model appears to be an appropriate solution for navigating these data. However, navigation is not sufficient, and the model should also enable advanced search and comparison. A new ontology with a logical tree structure is introduced to represent glycan structures irrespective of the precision of molecular details. The model relies heavily on the GlycoCT encoding of glycan structures. Its implementation in the GlySTreeM knowledge base was validated with GlyConnect data and benchmarked with the Glycowork library. GlySTreeM is shown to be fast, consistent, reliable and more flexible than existing solutions for matching parts of or whole glycan structures. The model is also well suited for painless future expansion.
Keywords: glycan structure; knowledge representation; ontology; pattern recognition; semantic web
Year: 2021 PMID: 35011294 PMCID: PMC8746581 DOI: 10.3390/molecules27010065
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. GlycoCT Breakdown.
Textual and graphical representation of GlcNAc using IUPAC and GlycoCT syntaxes.
| IUPAC | GlycoCT | SNFG Cartoon |
|---|---|---|
| GlcNAc | RES | |
Figure 2. The class hierarchy describing the glycan model top classes and the residue structure of the core and the bag.
Figure 3. Semantic representation of the GlcNAc building block in the GlySTreeM knowledge base: the base and the substituent are related to the same residue.
Correspondences between GlySTreeM and SKOO classes.
| GlySTreeM Class | SKOO Class |
|---|---|
| Glycan | ⊑ DomainObject |
| Residue | ⊑ DomainObject |
| Molecule | ⊑ DomainObject |
| Base | ⊑ DomainObject |
| Substituent | ⊑ DomainObject |
| Epitope | ⊑ Hypothesis |
| GlycanCore | ⊑ Proof |
| GlycanBag | ⊑ Hypothesis |
| GlycanBagItem | ⊑ Observation |
| ResidueRoot (in core structures) | ⊑ Assertion |
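The correspondences in the table above can be held in a small lookup structure. The following is only an illustrative sketch: the dictionary name `SKOO_PARENT` and the helper `classes_under` are hypothetical and not part of GlySTreeM or SKOO; the pairs themselves are transcribed from the table.

```python
# Subsumption pairs transcribed from the table above:
# GlySTreeM class -> SKOO class it is subsumed by.
# The name SKOO_PARENT is a hypothetical label used only for this sketch.
SKOO_PARENT = {
    "Glycan": "DomainObject",
    "Residue": "DomainObject",
    "Molecule": "DomainObject",
    "Base": "DomainObject",
    "Substituent": "DomainObject",
    "Epitope": "Hypothesis",
    "GlycanCore": "Proof",
    "GlycanBag": "Hypothesis",
    "GlycanBagItem": "Observation",
    "ResidueRoot": "Assertion",  # the table restricts this to core structures
}


def classes_under(skoo_class):
    """Return the GlySTreeM classes subsumed by the given SKOO class, sorted."""
    return sorted(name for name, parent in SKOO_PARENT.items() if parent == skoo_class)


hypotheses = classes_under("Hypothesis")
domain_objects = classes_under("DomainObject")
```

Grouping by SKOO class in this way makes it easy to see, for example, which GlySTreeM classes are treated as hypotheses rather than as domain objects.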
Quantitative Queries—Results.
| Query No. | GlyConnect Result | Triple Store Result | Notes |
|---|---|---|---|
| 1 | 0 | 0 | Expected result |
| 2 | 4781 | 4779 | Note 1 |
| 3 | 20 | 20 | Expected result |
| 4 | 39 | 39 | Expected result |
| 5 | 6 | 6 | Expected result |
| 6 | 5 | 5 | Expected result |
| 7 | 9 | 9 | Expected result |
| 8 | 10 | 10 | Expected result |
| 9 | 82 | 82 | Expected result |
| 10 | 4 | 4 | Expected result |
| 11 | 1285, 809, 477, 242, | 1285, 809, 477, 242, | Expected result |
| 12 | 476, 332, 235, 131, | 476, 332, 235, 131, | Expected result |
| 13 | 29 | 29 | Expected result |
| 14 | 4810 | 4808 | Note 1 and 2 |
| 15 | 4781 | 4779 | Note 1 and 2 |
| 16 | 4781 | 4779 | Note 1 and 2 |
Note 1: GlyConnect contains two structures with repeat units (structures 414 and 2371). These were omitted from the triple store.
Note 2: The total number of structures includes those with GlycoCT strings (4779) and those without (29). Only those with GlycoCT strings have a GlycanCore or a ResidueRoot.
Average real time for these queries varied between 0.07 and 0.60 s.
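The differences between the GlyConnect and triple store counts can be checked with simple arithmetic. The sketch below uses only figures from the table and notes above; the variable names are illustrative assumptions.

```python
# Figures taken from the quantitative query table and its notes.
glyconnect_total = 4810         # query 14, GlyConnect result
glyconnect_with_glycoct = 4781  # queries 2, 15 and 16, GlyConnect result
repeat_unit_structures = 2      # Note 1: structures 414 and 2371, omitted from the triple store
without_glycoct = 29            # Note 2: structures without a GlycoCT string

# Removing the two repeat-unit structures gives the triple store counts.
store_with_glycoct = glyconnect_with_glycoct - repeat_unit_structures
store_total = store_with_glycoct + without_glycoct

print(store_with_glycoct, store_total)
```

Under these assumptions, the two omitted repeat-unit structures account for every mismatch between the two result columns.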
Qualitative Queries—Results.
| Query No. | GlyConnect Result | Triple Store Result | Notes |
|---|---|---|---|
| 1 | Yes—11, 9901 | Yes—11, 9901 | Expected result |
| 2 | 3316 | 3316 | Expected result |
| 3 | 3 sections; bagitem 1, | Expected result | |
| 4 | 51 structures | 51 structures | Expected result |
| 5 | Yes, structure 3456 | Yes, structure 3456 | Expected result |
| 6 | O-linked | Same subset of | Expected result |
Average real time for these queries varied between 0.18 and 0.68 s.
Figure 4. Example of a structure extracted from GlySTreeM using a SPARQL query.
Figure 5. (a) Lewis B/Y type structure and (b) free Fuc and GlcNAc residues.
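To make the idea of matching a motif against fully or partially determined structures concrete, here is a minimal Python sketch of subtree matching on a residue tree. It is not the GlySTreeM implementation, which is queried with SPARQL; the `Residue` class, the function names and the rule that an undetermined linkage (`None`) is compatible with any linkage are all assumptions made for illustration. The Lewis X and Lewis A patterns use their standard definitions, Gal(b1-4)[Fuc(a1-3)]GlcNAc and Gal(b1-3)[Fuc(a1-4)]GlcNAc.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Residue:
    """Hypothetical tree node: a monosaccharide plus its attached branches."""
    name: str
    # Each child is (linkage, residue); a linkage of None means "not determined".
    children: List[Tuple[Optional[str], "Residue"]] = field(default_factory=list)


def matches_at(pattern: Residue, node: Residue) -> bool:
    """True if `pattern` can be embedded with its root placed on `node`."""
    if pattern.name != node.name:
        return False
    return _assign(pattern.children, node.children)


def _assign(p_children, n_children) -> bool:
    # Backtracking: each pattern branch must map to its own distinct node branch.
    if not p_children:
        return True
    (p_link, p_child), rest = p_children[0], p_children[1:]
    for i, (n_link, n_child) in enumerate(n_children):
        # Assumption: an undetermined linkage on either side is a potential match.
        link_ok = p_link is None or n_link is None or p_link == n_link
        if link_ok and matches_at(p_child, n_child):
            if _assign(rest, n_children[:i] + n_children[i + 1:]):
                return True
    return False


def contains(pattern: Residue, node: Residue) -> bool:
    """True if `pattern` occurs anywhere in the tree rooted at `node`."""
    return matches_at(pattern, node) or any(contains(pattern, c) for _, c in node.children)


lewis_x = Residue("GlcNAc", [("b1-4", Residue("Gal")), ("a1-3", Residue("Fuc"))])
lewis_a = Residue("GlcNAc", [("b1-3", Residue("Gal")), ("a1-4", Residue("Fuc"))])

# A fully determined fragment and the same topology with unknown linkages.
defined = Residue("Man", [("b1-2", Residue("GlcNAc", [
    ("b1-4", Residue("Gal")), ("a1-3", Residue("Fuc"))]))])
undetermined = Residue("Man", [("b1-2", Residue("GlcNAc", [
    (None, Residue("Gal")), (None, Residue("Fuc"))]))])
```

With these definitions, `contains(lewis_x, defined)` holds while `contains(lewis_a, defined)` does not, and both patterns are reported as potential matches for `undetermined`. A stricter policy could instead report undetermined matches separately from confirmed ones.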
Use Case 1.
|
| Identify Lewis A/B/X/Y type |
|
| Bioinformatician/Scientist. |
|
| List of 4779 glycans1 in GlycoCT. |
|
|
|
| (1) SPARQL query on GlySTreeM | (2) 116 positive matches. |
| (3) Produce 116 IUPAC codes | (4) 114 IUPAC |
| (5) Run annotate-glycan | (6) Glycowork: 53 Positive, 28 |
| (7) Validate IUPAC and rerun. | (8) Glycowork: 53 positive, 61 negative. |
| (9) Added wildcards to optional | (10) Glycowork: 64 positive, 50 negative. |
| (11) Added customised motifs. | (12) Glycowork: 102 positive, |
| (13) Refactored IUPAC condensed | (14) Glycowork: 112 positive, |
|
| |
| The 116 positive results from GlySTreeM were manually validated by | |
¹ Data from GlyConnect. ² 71 available in GlyConnect; 33 were converted using GlycanFormatConverter. ³ Two were discounted as they gave a GlycoCT validation error with the converter.
Use Case 1—Randomised dataset analysis.
| Total | Converted | Processed | Positives | TP | FP | TN | FN |
|---|---|---|---|---|---|---|---|
| 200 | 199 | 199 | 8 | 8 | 0 | 191 | 0 |
| 200 | 152 | 146 | 8 | 8 | 0 | 138 | 0 |
| 200 | 199 | 199 | 4 | 4 | 0 | 195 | 0 |
| 200 | 152 | 146 | 4 | 4 | 0 | 142 | 0 |
| 200 | 198 | 198 | 3 | 3 | 0 | 195 | 0 |
| 200 | 141 | 135 | 2 | 1 | 0 | 133 | 1 |
| 200 | 199 | 199 | 6 | 6 | 0 | 193 | 0 |
| 200 | 136 | 131 | 6 | 6 | 0 | 125 | 0 |
| 200 | 198 | 198 | 6 | 6 | 0 | 192 | 0 |
| 200 | 135 | 130 | 6 | 6 | 0 | 124 | 0 |
¹ Abbreviations: DS, dataset; TP, true positives; FP, false positives; TN, true negatives; FN, false negatives. ² One of the true positives was not converted to IUPAC with GlycanFormatConverter, so it is not present in the dataset processed by Glycowork. ³ The Glycowork false negative is structure ID 3529 in GlyConnect.
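The TP, FP, TN and FN counts above can be turned into sensitivity and specificity with a few lines of Python. This is a sketch for reading the table rather than part of the original analysis: the `Counts` class is hypothetical, and the rows are listed in the order in which they appear in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """One row of the randomised dataset table (hypothetical helper type)."""
    processed: int
    tp: int
    fp: int
    tn: int
    fn: int

    def is_consistent(self) -> bool:
        # The four outcome counts should add up to the processed structures.
        return self.tp + self.fp + self.tn + self.fn == self.processed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Rows transcribed from the table, top to bottom.
ROWS = [
    Counts(199, 8, 0, 191, 0),
    Counts(146, 8, 0, 138, 0),
    Counts(199, 4, 0, 195, 0),
    Counts(146, 4, 0, 142, 0),
    Counts(198, 3, 0, 195, 0),
    Counts(135, 1, 0, 133, 1),
    Counts(199, 6, 0, 193, 0),
    Counts(131, 6, 0, 125, 0),
    Counts(198, 6, 0, 192, 0),
    Counts(130, 6, 0, 124, 0),
]

for row in ROWS:
    print(row.processed, row.is_consistent(), row.sensitivity(), row.specificity())
```

Only the sixth row, which contains the single false negative, has a sensitivity below 1; no row reports a false positive.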
Use Case 2.
|
| Identify Lewis A/B/X/Y type |
|
| Bioinformatician/Scientist. |
|
| List of 4779 glycans |
|
|
|
| (1) SPARQL query on GlySTreeM | (2) 11 positive matches. |
| (3) Produce 11 IUPAC codes. | (4) 7 IUPAC |
| (5) Run annotate-glycan function on 7. | (6) Glycowork: IUPAC validation error. |
| (7) Validate IUPAC and rerun. | (8) Glycowork: IUPAC validation error. |
| (9) Added wildcards to optional | (10) Glycowork: IUPAC validation error. |
| (11) Added customised motifs. | (12) Glycowork: IUPAC validation error |
|
| |
| The 11 structures that were identified by GlySTreeM all contain at least one Fuc residue | |
¹ Data from GlyConnect. ² Three had no available IUPAC in GlyConnect; one did not have a Gal attached to the GlcNAc.
Figure 6. Example of false-negative structure assigned by Glycowork. (a) False-negative structure assigned by Glycowork due to an IUPAC validation error; (b) undetermined structure potentially containing the Lewis B or Y motifs.