Abstract
Spectra-structure relationships were investigated for estimating the anomeric configuration, residues and linkage types of linear and branched trisaccharides using 13C-NMR chemical shifts. For this study, 119 pyranosyl trisaccharides were used that are trimers of the α or β anomers of D-glucose, D-galactose, D-mannose, L-fucose or L-rhamnose residues bonded through α or β glycosidic linkages of types 1→2, 1→3, 1→4, or 1→6, as well as methoxylated and/or N-acetylated amino trisaccharides. Machine learning experiments were performed for: (1) classification of the anomeric configuration of the first unit, second unit and reducing end; (2) classification of the type of first and second linkages; (3) classification of the three residues: reducing end, middle and first residue; and (4) classification of the chain type. Our previous model for predicting the structure of disaccharides was incorporated into this new model, which improved its predictive power. The best results were achieved using Random Forests with 204 di- and trisaccharides for the training set; on the basis of unassigned chemical shifts, the model correctly classified 83%, 90%, 88%, 85%, 85%, 75%, 79%, 68% and 94% of the test set (69 compounds) for the nine tasks, respectively.
Year: 2012 PMID: 22456542 PMCID: PMC6268221 DOI: 10.3390/molecules17043818
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
RF, CT and CPGNN predictions of the anomeric configurations, type of linkages, residues and chain type in trisaccharides from 1D 13C-NMR descriptors.
Values are given as training set / test set.

| Model | Classes | Size | RF a Correct pred. | RF Sensitivity c | RF Specificity d | CT Correct pred. | CT Sensitivity | CT Specificity | CPGNN b Correct pred. | CPGNN Sensitivity | CPGNN Specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | 46/12 | 33/6 | 0.71/0.50 | 0.75/0.67 | 43/7 | 0.93/0.58 | 0.78/0.58 | 27/5 | 0.59/0.42 | 0.59/0.56 |
| | | 46/15 | 35/12 | 0.76/0.8 | 0.73/0.67 | 34/10 | 0.74/0.67 | 0.92/0.67 | 27/11 | 0.59/0.73 | 0.59/0.61 |
| 2 | | 46/12 | 45/12 | 0.98/1 | 0.98/0.92 | 38/5 | 0.83/0.42 | 0.80/0.67 | 30/5 | 0.65/0.42 | 0.62/0.56 |
| | | 46/15 | 45/14 | 0.98/0.93 | 0.98/1 | 37/10 | 0.81/0.50 | 0.82/0.59 | 28/11 | 0.61/0.73 | 0.64/0.61 |
| 3 | | 46/16 | 34/12 | 0.74/0.75 | 0.77/0.92 | 38/11 | 0.83/0.69 | 0.93/1 | 23/8 | 0.5/0.5 | 0.74/0.89 |
| | | 46/11 | 36/10 | 0.78/0.91 | 0.75/0.71 | 43/11 | 0.93/1 | 0.84/0.69 | 38/10 | 0.83/0.91 | 0.62/0.56 |
| 4 | | 33/13 | 30/10 | 0.91/0.77 | 0.83/1 | 28/10 | 0.85/0.77 | 0.78/0.62 | 8/4 | 0.24/0.31 | 0.53/0.8 |
| | | 27/7 | 22/7 | 0.81/1 | 0.85/0.78 | 23/4 | 0.85/0.57 | 0.72/0.57 | 18/6 | 0.67/0.86 | 0.34/0.35 |
| | | 16/4 | 11/2 | 0.69/0.5 | 0.92/1 | 8/0 | 0.5/0 | 0.67/0 | 6/3 | 0.38/0.75 | 0.28/0.6 |
| | | 16/3 | 15/3 | 0.94/1 | 0.83/0.5 | 12/3 | 0.75/1 | 1/1 | 0/0 | 0/0 | 0/0 |
| 5 | | 8/1 | 1/0 | 0.12/0 | 1/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| | | 17/12 | 9/7 | 0.53/0.58 | 1/1 | 12/8 | 0.70/0.67 | 0.86/1 | 7/6 | 0.41/0.5 | 0.32/0.86 |
| | | 51/13 | 50/13 | 0.98/1 | 0.74/0.72 | 47/11 | 0.92/0.85 | 0.75/0.73 | 39/12 | 0.76/0.92 | 0.57/0.6 |
| | | 16/1 | 13/1 | 0.81/1 | 0.93/0.5 | 13/1 | 0.81/1 | 0.87/0.25 | 0/0 | 0/0 | 0/0 |
| 6 | | 26/17 | 21/14 | 0.81/0.82 | 0.6/1 | 18/10 | 0.69/0.59 | 0.75/0.77 | 6/3 | 0.23/0.18 | 1/1 |
| | | 18/5 | 8/5 | 0.44/1 | 0.5/0.71 | 8/2 | 0.44/0.4 | 0.89/1 | 6/2 | 0.33/0.4 | 0.17/0.14 |
| | | 18/3 | 9/2 | 0.5/0.67 | 0.64/0.67 | 11/1 | 0.61/0.33 | 0.61/0.2 | 9/1 | 0.5/0.33 | 0.24/0.12 |
| | | 17/2 | 12/2 | 0.70/1 | 0.92/0.67 | 16/1 | 0.94/0.5 | 0.84/0.25 | 3/2 | 0.23/1 | 0.25/1 |
| | | 13/0 | 10/0 | 0.77/--- | 0.71/--- | 13/0 | 1/--- | 0.59/--- | 1/0 | 0.08/--- | 1/--- |
| 7 | | 26/6 | 19/5 | 0.73/0.83 | 0.61/0.56 | 20/3 | 0.77/0.5 | 0.83/0.33 | 7/3 | 0.27/0.5 | 0.78/0.6 |
| | | 19/5 | 9/4 | 0.47/0.8 | 0.64/0.8 | 8/1 | 0.42/0.2 | 0.53/0.17 | 11/2 | 0.58/0.4 | 0.34/0.2 |
| | | 19/5 | 9/4 | 0.47/0.8 | 0.43/0.8 | 12/2 | 0.63/0.4 | 0.63/0.67 | 8/0 | 0.42/0 | 0.35/0 |
| | | 12/4 | 6/3 | 0.5/0.75 | 0.6/1 | 10/3 | 0.83/0.75 | 0.53/0.75 | 7/3 | 0.58/0.75 | 0.25/0.5 |
| | | 16/7 | 9/4 | 0.56/0.57 | 0.56/0.8 | 11/4 | 0.69/0.57 | 0.73/0.8 | 0/1 | 0/0.14 | 0/1 |
| 8 | | 22/9 | 18/6 | 0.82/0.67 | 0.86/0.67 | 15/5 | 0.68/0.56 | 0.83/0.62 | 10/2 | 0.45/0.22 | 0.91/1 |
| | | 18/3 | 14/2 | 0.78/0.67 | 0.82/0.28 | 11/1 | 0.61/0.33 | 0.69/0.17 | 8/2 | 0.44/0.67 | 0.38/0.17 |
| | | 16/6 | 9/2 | 0.56/0.33 | 0.56/1 | 13/5 | 0.81/0.83 | 0.59/0.83 | 6/2 | 0.38/0.33 | 0.21/0.5 |
| | | 19/4 | 14/3 | 0.74/0.75 | 0.7/0.75 | 12/3 | 0.63/0.75 | 0.63/1 | 8/2 | 0.42/0.5 | 0.28/0.28 |
| | | 17/5 | 15/3 | 0.88/0.6 | 0.83/0.6 | 14/3 | 0.82/0.6 | 0.82/0.75 | 1/1 | 0.06/0.2 | 0.33/0.5 |
| Chain type | | 39/8 | 29/7 | 0.74/0.88 | 0.83/0.88 | 37/8 | 0.95/1 | 0.80/0.67 | 26/6 | 0.67/0.75 | 0.81/1 |
| | | 53/19 | 47/18 | 0.89/0.95 | 0.82/0.95 | 44/15 | 0.83/0.79 | 0.96/1 | 47/19 | 0.89/1 | 0.78/0.90 |
a Out-of-bag; b 10-Fold cross-validation; c Ratio of true positives to the sum of true positives and false negatives; d Ratio of true positives to the sum of true positives and false positives; 1 Anomeric configuration of the first unit; 2 Anomeric configuration of the second unit; 3 Anomeric configuration of the reducing end; 4 First linkage type; 5 Second linkage type; 6 Reducing end; 7 Middle residue; 8 First residue.
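The two ratios defined in footnotes c and d can be illustrated with a short sketch. It uses the RF training-set counts of the first two rows of the table (sizes 46 and 46, correct predictions 33 and 35). The class labels `A` and `B` are hypothetical placeholders, and the sketch assumes a two-class model, so that every misclassified member of one class counts as a false positive of the other.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Footnote c: true positives / (true positives + false negatives)."""
    return tp / (tp + fn)


def specificity_as_defined(tp: int, fp: int) -> float:
    """Footnote d: true positives / (true positives + false positives)."""
    return tp / (tp + fp)


# RF training-set counts from the first two table rows.
# "A" and "B" are hypothetical labels for the two classes.
size = {"A": 46, "B": 46}
correct = {"A": 33, "B": 35}

# False negatives: class members that were not predicted correctly.
fn = {k: size[k] - correct[k] for k in size}

# Assumption: with only two classes, the false negatives of one class
# are exactly the false positives of the other.
fp = {"A": fn["B"], "B": fn["A"]}

sens = {k: sensitivity(correct[k], fn[k]) for k in size}
spec = {k: specificity_as_defined(correct[k], fp[k]) for k in size}

for k in size:
    print(f"class {k}: sensitivity={sens[k]:.3f} specificity(d)={spec[k]:.3f}")
```

Under these assumptions the computed values agree with the tabulated 0.71/0.76 (sensitivity) and 0.75/0.73 (specificity) to within rounding.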
Mean predictability of RF, CT and CPGNN predictions of the anomeric configurations, type of linkages, residues and chain type in trisaccharides from 1D 13C-NMR.
Mean predictability (%) a

| Group | Model | Training set b, RF | Training set b, CT | Training set b, CPGNN | Test set, RF | Test set, CT | Test set, CPGNN |
|---|---|---|---|---|---|---|---|
| Anomeric configurations | 1 | 73.91 | 83.70 | 58.70 | 65.00 | 62.5 | 57.50 |
| | 2 | 97.83 | 81.52 | 63.04 | 96.67 | 54.17 | 57.50 |
| | 3 | 76.09 | 88.04 | 66.30 | 82.95 | 84.38 | 70.45 |
| Linkage types | 4 | 83.72 | 73.76 | 32.10 | 81.73 | 58.52 | 47.87 |
| | 5 | 61.18 | 61.00 | 29.41 | 64.58 | 62.82 | 35.58 |
| Residues | 6 | 64.54 | 73.78 | 26.35 | 87.25 | 45.54 | 47.74 |
| | 7 | 54.81 | 66.85 | 37.05 | 75.10 | 48.43 | 41.25 |
| | 8 | 75.55 | 71.21 | 35.08 | 60.33 | 61.44 | 43.06 |
| Chain type | | 81.52 | 88.94 | 77.67 | 91.12 | 89.47 | 87.50 |
a Average sensitivity of the trisaccharide classes; b 10-Fold cross-validation with CPGNN and out-of-bag estimation with RF on training set; 1 Anomeric configuration of the first unit; 2 Anomeric configuration of the second unit; 3 Anomeric configuration of the reducing end; 4 First linkage type; 5 Second linkage type; 6 Reducing end; 7 Middle residue; 8 First residue.
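Footnote a defines mean predictability as the average sensitivity over the classes of a model. A minimal sketch of that calculation follows, using RF counts from the per-class table earlier in this document. The function name `mean_predictability` is hypothetical. Skipping classes with no members is an assumption, made because it reproduces the tabulated test-set value for the model whose fifth class has no test compounds.

```python
def mean_predictability(correct: list[int], sizes: list[int]) -> float:
    """Average per-class sensitivity, in percent.

    Classes with size 0 are skipped (assumption: they contribute no
    sensitivity value, matching the '---' entries in the per-class table).
    """
    ratios = [c / n for c, n in zip(correct, sizes) if n > 0]
    return 100.0 * sum(ratios) / len(ratios)


# RF, two-class model with class sizes 46/12 and 46/15 (training/test).
train = mean_predictability(correct=[33, 35], sizes=[46, 46])
test = mean_predictability(correct=[6, 12], sizes=[12, 15])

# RF, five-class model whose last class has no test compounds (13/0).
test_with_empty_class = mean_predictability(
    correct=[14, 5, 2, 2, 0], sizes=[17, 5, 3, 2, 0]
)

print(f"{train:.2f} {test:.2f} {test_with_empty_class:.2f}")
```

The three results correspond to the RF entries 73.91 and 65.00 (first row) and 87.25 (test set, first row of the residues group).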
Figure 1. Representation of the classification tree derived with the CART algorithm to distinguish the reducing end anomeric configuration of 92 trisaccharides (training set).
RF predictions of the anomeric configurations, type of linkages, residues and chain type in 204 di- and trisaccharides for the training set and 69 for the test set using 1D 13C-NMR descriptors.
Values are given as training set / test set.

| Model | Classes | Size | Correct pred. | Sensitivity a | Specificity b | Mean Predictability c (%) |
|---|---|---|---|---|---|---|
| 1 | | 105/30 | 89/25 | 0.85/0.83 | 0.88/0.78 | 86.32/82.69 |
| | | 99/39 | 87/32 | 0.88/0.82 | 0.84/0.86 | |
| 2 | | 46/12 | 38/10 | 0.83/0.83 | 0.74/0.83 | 84.78/90 |
| | | 46/15 | 33/13 | 0.72/0.87 | 0.80/0.87 | |
| | | 112/42 | 112/42 | 1/1 | 1/1 | |
| 3 | | 102/39 | 84/31 | 0.82/0.79 | 0.88/0.94 | 85.78/88.08 |
| | | 102/30 | 91/28 | 0.89/0.93 | 0.83/0.78 | |
| 4 | | 33/13 | 30/10 | 0.91/0.77 | 0.79/1 | 84.99/85.38 |
| | | 27/7 | 21/7 | 0.78/1 | 0.88/0.78 | |
| | | 16/4 | 11/2 | 0.69/0.5 | 0.85/0.67 | |
| | | 16/3 | 14/3 | 0.88/1 | 0.82/0.6 | |
| | | 112/42 | 112/42 | 1/1 | 1/1 | |
| 5 | | 36/13 | 22/10 | 0.61/0.77 | 0.88/0.83 | 82.32/85.16 |
| | | 48/21 | 38/15 | 0.79/0.71 | 0.84/0.94 | |
| | | 71/26 | 69/24 | 0.97/0.92 | 0.748/0.77 | |
| | | 49/9 | 45/9 | 0.92/1 | 0.98/0.9 | |
| 6 | | 72/38 | 60/33 | 0.83/0.87 | 0.71/0.75 | 75.70/74.96 |
| | | 58/13 | 39/9 | 0.67/0.69 | 0.72/0.75 | |
| | | 44/16 | 37/7 | 0.73/0.44 | 0.86/0.7 | |
| | | 17/2 | 12/2 | 0.70/1 | 0.92/0.67 | |
| | | 13/0 | 11/0 | 0.87/--- | 0.69/--- | |
| 7 | | 26/6 | 20/5 | 0.77/0.83 | 0.69/0.56 | 62.27/79.25 |
| | | 19/5 | 9/4 | 0.47/0.8 | 0.5/0.8 | |
| | | 19/5 | 7/4 | 0.37/0.8 | 0.37/0.8 | |
| | | 12/4 | 6/3 | 0.5/0.75 | 0.67/1 | |
| | | 16/7 | 10/4 | 0.62/0.57 | 0.59/0.8 | |
| | | 112/42 | 112/42 | 1/1 | 1/1 | |
| 8 | | 74/33 | 65/31 | 0.88/0.94 | 0.86/0.76 | 79.64/67.50 |
| | | 44/7 | 31/2 | 0.70/0.28 | 0.74/0.33 | |
| | | 50/20 | 39/12 | 0.78/0.6 | 0.78/1 | |
| | | 19/4 | 14/3 | 0.74/0.75 | 0.67/0.75 | |
| | | 17/5 | 15/4 | 0.88/0.8 | 0.88/0.67 | |
| Chain type | | 39/8 | 28/7 | 0.72/0.88 | 0.82/0.88 | 86.82/94.08 |
| | | 53/19 | 47/18 | 0.89/0.95 | 0.81/0.95 | |
| | | 112/42 | 112/42 | 1/1 | 1/1 | |
a Ratio of true positives to the sum of true positives and false negatives; b Ratio of true positives to the sum of true positives and false positives; c Average sensitivity of the trisaccharides classes; 1 Anomeric configuration of the first unit; 2 Anomeric configuration of the second unit; 3 Anomeric configuration of the reducing end; 4 First linkage type; 5 Second linkage type; 6 Reducing end; 7 Middle residue; 8 First residue.
Comparison of the ten most important descriptors by RF in the trisaccharides and oligosaccharide models and the selected descriptors by CT in the trisaccharides model.
| Model | RF (trisaccharides) | RF (di- and trisaccharides) | CT |
|---|---|---|---|
| 1 | C12; C11; C9; C22; C14; C13; C23; C15; C16; C17 | C23; C19; C20; C12; C15; C18; C21; C13; C11; C22 | C12; C9 (2×); C21 |
| 2 | C15; C14; C13; C22; C16; C6; C10; C23; C17; C7 | C12; C6; C10; C9; C8; C11; C7; C21; C13; C14 | C14; C6; C22 |
| 3 | C16; C6; C21; C10; C14; C18; C11; C17; C5; C8 | C22; C18; C16; C20; C19; C17; C14; C23; C21; C6 | C16; C6 (2×); C21; C2 |
| 4 | C8; C7; C20; C6; C19; C23; C10; C14; C16; C11 | C8; C7; C6; C10; C21; C12; C11; C9; C20; C13 | C8; C20; C7; C11; C6; C17 |
| 5 | C8; C7; C19; C6; C22; C20; C5; C18; C9; C23 | C21; C13; C14; C15; C8; C22; C20; C12; C23; C19 | C8 (2×); C22; C17 |
| 6 | C6; C5; C22; C19; C20; C7; C9; C10; C14; C18 | C22; C15; C14; C16; C20; C13; C19; C21; C18; C17 | C6; C12; C5; C20; C23 |
| 7 | C6; C7; C15; C5; C16; C11; C23; C9; C10; C18 | C10; C7; C6; C9; C8; C12; C11; C21; C15; C5 | C16; C6 (2×); C7; C18; C23 |
| 8 | C7; C5; C6; C10; C15; C16; C8; C9; C11; C21 | C14; C12; C16; C15; C23; C17; C20; C5; C21; C13 | C16; C9 (2×); C7; C8; C5 |
| Chain type | C7; C23; C20; C14; C5; C8; C6; C21; C18; C12 | C8; C21; C7; C6; C9; C10; C11; C12; C14; C23 | C7; C5; C23 (2×); C8 |
1 Anomeric configuration of the first unit; 2 Anomeric configuration of the second unit; 3 Anomeric configuration of the reducing end; 4 First linkage type; 5 Second linkage type; 6 Reducing end; 7 Middle residue; 8 First residue.
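One simple way to read the comparison is to count how many of the ten top-ranked RF descriptors are shared between the trisaccharide model and the di- and trisaccharide model. The sketch below does this for the first data row of the table. The helper `parse_descriptors` and the use of a plain set intersection are illustrative assumptions, not a measure used in the source.

```python
def parse_descriptors(cell: str) -> list[str]:
    """Split a table cell such as 'C12; C11; C9' into descriptor names."""
    return [item.strip() for item in cell.split(";") if item.strip()]


# First data row of the table: ten most important RF descriptors.
tri = parse_descriptors("C12; C11; C9; C22; C14; C13; C23; C15; C16; C17")
di_tri = parse_descriptors("C23; C19; C20; C12; C15; C18; C21; C13; C11; C22")

# Descriptors ranked in the top ten by both models,
# sorted by the numeric part of the name (e.g. 'C12' -> 12).
shared = sorted(set(tri) & set(di_tri), key=lambda name: int(name[1:]))

print(f"{len(shared)} of {len(tri)} shared: {', '.join(shared)}")
```

For this row, six of the ten descriptors (C11, C12, C13, C15, C22, C23) appear in both lists; the same call can be repeated for the other rows.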