Limin Li, Wai-Ki Ching, Takako Yamaguchi, Kiyoko F Aoki-Kinoshita.
Abstract
BACKGROUND: Glycobiology pertains to the study of carbohydrate sugar chains, or glycans, in a particular cell or organism. Many computational approaches have been proposed for analyzing these complex glycan structures, which are chains of monosaccharides. The monosaccharides are linked to one another by glycosidic bonds, which can take on a variety of conformations, thus forming branches and resulting in complex tree structures. The q-gram method is one of the recent methods used to understand glycan function based on the classification of their tree structures. This method assumes that, for a given q, different q-grams share no similarity among themselves. That is, if two structures have completely different components, they are treated as completely different. From a biological standpoint, however, this is not the case. In this paper, we propose a weighted q-gram method that measures the similarity among glycans by incorporating the similarity of the geometric structures, monosaccharides and glycosidic bonds among q-grams. In contrast to the traditional q-gram method, our weighted q-gram method admits similarity among q-grams for a given q. We therefore developed new kernels for glycan structures and applied them in SVMs to classify glycans.
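The contrast between the traditional and the weighted q-gram method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of a 2-gram as a `(parent, bond, child)` tuple, and the toy similarity rule `toy_sim` are assumptions made for illustration only. The two bond scores of 0.9 are taken from the bond similarity matrix given in this record.

```python
from collections import Counter

# Assumed toy bond similarities (values taken from the bond similarity matrix).
BOND_SIM = {
    frozenset(("b1-3", "b1-4")): 0.9,
    frozenset(("a1-3", "a1-4")): 0.9,
}

def toy_sim(g, h):
    """Hypothetical q-gram similarity: 1 for identical 2-grams, the bond
    similarity when only the bond differs, and 0 otherwise."""
    if g == h:
        return 1.0
    (p1, b1, c1), (p2, b2, c2) = g, h
    if (p1, c1) != (p2, c2):
        return 0.0
    return BOND_SIM.get(frozenset((b1, b2)), 0.0)

def traditional_kernel(x, y):
    """Plain q-gram kernel: only identical q-grams contribute."""
    cx, cy = Counter(x), Counter(y)
    return sum(cx[g] * cy[g] for g in cx if g in cy)

def weighted_kernel(x, y, sim):
    """Weighted q-gram kernel: every pair of q-grams contributes,
    scaled by sim(g, h)."""
    cx, cy = Counter(x), Counter(y)
    return sum(cx[g] * cy[h] * sim(g, h) for g in cx for h in cy)

# Two hypothetical glycans, each listed as a bag of 2-grams.
x = [("Gal", "b1-4", "GlcNAc"), ("GlcNAc", "b1-2", "Man")]
y = [("Gal", "b1-3", "GlcNAc"), ("GlcNAc", "b1-2", "Man")]

print(traditional_kernel(x, y))          # counts only the shared 2-gram
print(weighted_kernel(x, y, toy_sim))    # also credits b1-4 vs b1-3
```

In this toy case the traditional kernel sees one shared 2-gram, whereas the weighted kernel additionally credits the pair that differs only in a b1-4 versus b1-3 bond. A kernel matrix built this way could then be passed to an SVM that accepts precomputed kernels.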
Year: 2010 PMID: 20122206 PMCID: PMC3009505 DOI: 10.1186/1471-2105-11-S1-S33
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics on the chemical bonds and monosaccharides in the glycan database.
| bond | occurrence | percent | monosaccharide | occurrence | percent |
|---|---|---|---|---|---|
| b1-4 | 16475 | 0.272 | GlcNAc | 13110 | 0.1833 |
| a1-3 | 7002 | 0.1156 | Gal | 12248 | 0.1712 |
| b1-3 | 6039 | 0.0997 | Man | 10632 | 0.1486 |
| a1-6 | 4802 | 0.0793 | Glc | 7446 | 0.1041 |
| b1-2 | 4051 | 0.0669 | LFuc | 3003 | 0.042 |
| a1-2 | 3974 | 0.0656 | Neu5Ac | 2682 | 0.0375 |
| a1-4 | 3734 | 0.0617 | S | 2653 | 0.0371 |
| b1-6 | 2538 | 0.0419 | GalNAc | 2601 | 0.0364 |
| a2-3 | 1692 | 0.0279 | LRha | 1606 | 0.0225 |
| -6 | 1249 | 0.0206 | Xyl | 1418 | 0.0198 |
| a2-6 | 1217 | 0.0201 | GlcA | 1135 | 0.0159 |
| -2 | 1042 | 0.0172 | GlcN | 1074 | 0.015 |
| b1- | 879 | 0.0145 | * | 999 | 0.014 |
| b1-1 | 809 | 0.0134 | Cer | 833 | 0.0116 |
| a1- | 600 | 0.0099 | P | 772 | 0.0108 |
| -4 | 585 | 0.0097 | Lgro-manHep | 589 | 0.0082 |
| -3 | 553 | 0.0091 | Asn | 545 | 0.0076 |
| - | 318 | 0.0053 | Kdo | 496 | 0.0069 |
| a1-5 | 315 | 0.0052 | Fruf | 358 | 0.005 |
| a2-8 | 224 | 0.0037 | LIdoA | 354 | 0.0049 |
| 1-3 | 223 | 0.0037 | GalA | 337 | 0.0047 |
| 1- | 220 | 0.0036 | LAraf | 309 | 0.0043 |
| 1-4 | 218 | 0.0036 | Neu5Gc | 253 | 0.0035 |
| a2-4 | 152 | 0.0025 | Galf | 237 | 0.0033 |
Occurrence gives the number of times each chemical bond or monosaccharide appears in the glycan database. Percent gives the corresponding share of all chemical bonds or monosaccharides in the database, expressed as a fraction (for example, 0.272 for b1-4).
Bond similarity. The matrix gives the pairwise similarity among chemical bonds in the glycan database. A higher score indicates that two chemical bonds are more similar to each other.
| | a1-2 | a1-3 | a1-4 | a1-6 | b1-2 | b1-3 | b1-4 | b1-6 | a2-3 | a2-6 | a2-8 | a2-9 | -6 | -3 | -4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a1-2 | 1 | | | | | | | | | | | | | | |
| a1-3 | 0.6 | 1 | | | | | | | | | | | | | |
| a1-4 | 0.6 | 0.9 | 1 | | | | | | | | | | | | |
| a1-6 | 0.6 | 0.6 | 0.6 | 1 | | | | | | | | | | | |
| b1-2 | 0.8 | 0.5 | 0.5 | 0.5 | 1 | | | | | | | | | | |
| b1-3 | 0.5 | 0.8 | 0.6 | 0.5 | 0.6 | 1 | | | | | | | | | |
| b1-4 | 0.5 | 0.6 | 0.8 | 0.5 | 0.6 | 0.9 | 1 | | | | | | | | |
| b1-6 | 0.5 | 0.5 | 0.5 | 0.8 | 0.6 | 0.6 | 0.6 | 1 | | | | | | | |
| a2-3 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 1 | | | | | | |
| a2-6 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.6 | 1 | | | | | |
| a2-8 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.6 | 0.7 | 1 | | | | |
| a2-9 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.6 | 0.7 | 0.85 | 1 | | | |
| -6 | 0 | 0 | 0 | 0.55 | 0 | 0 | 0 | 0.3 | 0 | 0 | 0 | 0 | 1 | | |
| -3 | 0 | 0.47 | 0 | 0 | 0 | 0.41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
| -4 | 0 | 0 | 0.8 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
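Because only the lower triangle of the matrix is listed, a lookup has to treat the scores as symmetric. The Python sketch below shows one way to do this, assuming symmetry holds. It copies only the first eight rows of the matrix (the a1-x and b1-x bonds), and the names `LABELS`, `LOWER` and `bond_similarity` are hypothetical.

```python
# First eight bond labels and their lower-triangular rows, copied from the matrix.
LABELS = ["a1-2", "a1-3", "a1-4", "a1-6", "b1-2", "b1-3", "b1-4", "b1-6"]
LOWER = [
    [1],
    [0.6, 1],
    [0.6, 0.9, 1],
    [0.6, 0.6, 0.6, 1],
    [0.8, 0.5, 0.5, 0.5, 1],
    [0.5, 0.8, 0.6, 0.5, 0.6, 1],
    [0.5, 0.6, 0.8, 0.5, 0.6, 0.9, 1],
    [0.5, 0.5, 0.5, 0.8, 0.6, 0.6, 0.6, 1],
]
INDEX = {name: i for i, name in enumerate(LABELS)}

def bond_similarity(a, b):
    """Look up the score for bonds a and b, assuming the matrix is symmetric."""
    i, j = INDEX[a], INDEX[b]
    if i < j:
        i, j = j, i  # always read from the stored lower triangle
    return LOWER[i][j]

print(bond_similarity("b1-3", "b1-4"))  # same score as ("b1-4", "b1-3")
print(bond_similarity("a1-2", "b1-2"))
```

Storing only the triangle mirrors how the matrix is presented and avoids entering each score twice. The remaining rows (a2-x and the partially specified bonds) can be appended in the same way.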
Glycan data. The class labels, the number of glycans in each class and the total number of glycans in each data set.
| leukemia | non-leukemia | total |
|---|---|---|
| 162 | 193 | 355 |

| cystic fibrosis | non-cystic | total |
|---|---|---|
| 104 | 118 | 222 |
Results. The results for four methods are reported: traditional q-gram, Linkage (LK) kernel method, KCaM (KM) kernel method and Linkage KCaM (LKM) kernel method.
| Leu | KM | LKM | LK | Cystic | KM | LKM | LK | ||
|---|---|---|---|---|---|---|---|---|---|
| 2-gram | 0.9578 | 0.9555 | 0.9623 | 0.9606 | 2-gram | 0.7872 | 0.7666 | 0.7581 | 0.7684 |
| 3-gram | 0.9568 | 0.9608 | 0.9621 | 0.9647 | 3-gram | 0.8220 | 0.8151 | 0.8034 | 0.7823 |
| 4-gram | 0.9499 | 0.9516 | 0.9540 | 4-gram | 0.7812 | 0.7648 | 0.7467 | ||
| 5-gram | 0.9354 | 0.9365 | 0.9311 | 5-gram | 0.7254 | 0.7530 | 0.7441 | ||
| 6-gram | 0.9300 | 0.9272 | 0.9287 | 6-gram | 0.6886 | 0.7224 | 0.7265 | ||
| 7-gram | 0.9272 | 0.9245 | 0.9181 | 7-gram | 0.5965 | 0.6088 | 0.6186 | ||
| 8-gram | 0.9086 | 0.9039 | 0.8990 | 8-gram | 0.5319 | 0.5354 | 0.5522 | ||
| 9-gram | 0.8906 | 0.8889 | 0.8875 | 9-gram | 0.4794 | 0.498 | 0.4922 | ||
| 0.9368 | 0.9441 | 0.9500 | 0.7698 | 0.7953 | 0.7645 | ||||
| Multiple | 0.9472 | 0.9591 | 0.9621 | Multiple | 0.8091 | 0.8225 | 0.7892 |