Watcharaporn Tanchotsrinon, Chidchanok Lursinsap, Yong Poovorawan.
Abstract
BACKGROUND: Human papillomavirus (HPV) genotyping is an important approach to fighting cervical cancer, because it provides information relevant to risk stratification for diagnosis and a better understanding of the relationship between HPV and carcinogenesis. This paper proposes two new feature extraction techniques, ChaosCentroid and ChaosFrequency, for predicting the HPV genotypes associated with cervical cancer. An additional, diversified set of 12 HPV genotypes (types 6, 11, 16, 18, 31, 33, 35, 45, 52, 53, 58, and 66) was studied. In the proposed techniques, HPV genomes are represented by a partitioned Chaos Game Representation (CGR). ChaosCentroid captures the structure of a sequence through the centroid of each sub-region, using the Euclidean distances among the centroids and the center of the CGR to express the relations among all sub-regions. ChaosFrequency extracts the statistical distribution of mono-, di-, or higher-order nucleotides along HPV genomes and forms a matrix of the frequency of dots in each sub-region. For performance evaluation, four types of classifiers were deployed (Multi-layer Perceptron, Radial Basis Function network, K-Nearest Neighbor, and Fuzzy K-Nearest Neighbor), and the best results from each classifier were compared with those of the NCBI genotyping tool.
Year: 2015 PMID: 25880169 PMCID: PMC4375884 DOI: 10.1186/s12859-015-0493-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
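The partitioned CGR mentioned in the abstract builds on the classic chaos game: each nucleotide moves a point halfway towards that nucleotide's corner of the unit square. The sketch below assumes the corner assignment A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0) and a start at the center of the square; the paper's own convention is not given in this record, and `cgr_points` is a hypothetical helper name. Skipping ambiguous symbols such as N is also an assumption.

```python
from __future__ import annotations

# Assumed corner assignment (one common CGR convention). Other papers use
# different assignments, which change the picture but not the method.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Return one CGR point per recognised nucleotide of `seq`."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N (assumption)
        # move halfway from the current point towards the base's corner
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2
        points.append((x, y))
    return points


print(cgr_points("ACGT"))
```

For example, `cgr_points("ACGT")` yields the points (0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125) and (0.78125, 0.40625). Plotting these points for a whole genome gives images like those in Figure 1.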
Number of genomes and minimum and maximum genome lengths (nt) for each HPV genotype in the data set

| HPV genotype | Number of genomes | Minimum length (nt) | Maximum length (nt) |
|---|---|---|---|
| 6 | 58 | 7954 | 8051 |
| 11 | 49 | 7931 | 10424 |
| 16 | 103 | 7881 | 7976 |
| 18 | 19 | 7824 | 7857 |
| 31 | 23 | 7878 | 7945 |
| 33 | 22 | 7830 | 7912 |
| 35 | 28 | 7820 | 7908 |
| 45 | 12 | 7841 | 7858 |
| 52 | 22 | 7933 | 7974 |
| 53 | 16 | 7856 | 7863 |
| 58 | 37 | 7814 | 7836 |
| 66 | 11 | 7816 | 7824 |

Figure 1. Chaos game representation (CGR) of HPV genotypes 6, 16, 18, and 31: (a) genotype 6; (b) genotype 16; (c) genotype 18; (d) genotype 31.
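The dot-density patterns visible in CGR images are what ChaosFrequency turns into numbers: the square is partitioned into sub-regions and the fraction of dots in each is recorded. A minimal sketch, assuming a 2^k × 2^k grid and the corner convention A bottom-left, C top-left, G top-right, T bottom-right; `chaos_frequency` and its parameter `k` are hypothetical names, not the paper's code. With such a grid, each cell collects the positions whose preceding k nucleotides form one particular k-mer (apart from the first k−1 positions), which is why the matrix reflects mono-, di-, or higher-order nucleotide frequencies.

```python
from __future__ import annotations

# Assumed corner assignment (one common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """CGR points: move halfway towards each nucleotide's corner."""
    x, y, points = 0.5, 0.5, []
    for base in seq.upper():
        if base in CORNERS:  # ambiguous symbols are skipped (assumption)
            cx, cy = CORNERS[base]
            x, y = (x + cx) / 2, (y + cy) / 2
            points.append((x, y))
    return points


def chaos_frequency(seq: str, k: int = 2) -> list[list[float]]:
    """Fraction of CGR dots in each cell of a 2**k x 2**k grid.

    Row 0 is the bottom of the square, column 0 its left edge.
    """
    n = 2 ** k
    counts = [[0] * n for _ in range(n)]
    points = cgr_points(seq)
    for x, y in points:
        # CGR points lie strictly inside (0, 1), so these indices are valid
        counts[int(y * n)][int(x * n)] += 1
    total = len(points) or 1  # avoid division by zero for empty input
    return [[c / total for c in row] for row in counts]


print(chaos_frequency("AAAA", k=1))
```

Here `chaos_frequency("AAAA", k=1)` returns `[[1.0, 0.0], [0.0, 0.0]]`: every dot falls in the A quadrant. Flattening the matrix gives a fixed-length feature vector regardless of genome length, which suits the length variation shown in the data set table.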
Figure 2. Distances between the centroids and the center of the CGR for HPV genotype 16 after partitioning into 2×2 sub-regions.
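Figure 2 suggests how ChaosCentroid summarizes a partitioned CGR: one centroid per sub-region, together with its Euclidean distance to the CGR center (0.5, 0.5). The sketch below is illustrative only. The feature layout (centroid x, centroid y, distance, per sub-region in row-major order) and the fallback for an empty sub-region (its geometric center) are assumptions, and `chaos_centroid` is a hypothetical name. Distances between pairs of centroids, which the abstract's wording may also cover, are not computed here.

```python
from __future__ import annotations

import math

# Assumed corner assignment (one common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """CGR points: move halfway towards each nucleotide's corner."""
    x, y, points = 0.5, 0.5, []
    for base in seq.upper():
        if base in CORNERS:  # ambiguous symbols are skipped (assumption)
            cx, cy = CORNERS[base]
            x, y = (x + cx) / 2, (y + cy) / 2
            points.append((x, y))
    return points


def chaos_centroid(seq: str, grid: int = 2) -> list[float]:
    """Per sub-region: centroid x, centroid y, distance to the CGR centre."""
    sums: dict[tuple[int, int], list[float]] = {}  # cell -> [sum_x, sum_y, count]
    for x, y in cgr_points(seq):
        cell = sums.setdefault((int(y * grid), int(x * grid)), [0.0, 0.0, 0])
        cell[0] += x
        cell[1] += y
        cell[2] += 1
    features: list[float] = []
    for row in range(grid):
        for col in range(grid):
            sx, sy, count = sums.get((row, col), (0.0, 0.0, 0))
            if count:
                cx, cy = sx / count, sy / count
            else:
                # empty sub-region: fall back to its geometric centre (assumption)
                cx, cy = (col + 0.5) / grid, (row + 0.5) / grid
            features.extend([cx, cy, math.dist((cx, cy), (0.5, 0.5))])
    return features


print(len(chaos_centroid("ACGT")))
```

With a 2×2 partition, `chaos_centroid("ACGT")` returns 12 values; the first sub-region holds the single dot (0.25, 0.25), whose distance to the center is √0.125 ≈ 0.354.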
Best results of HPV genotype prediction based on features extracted by ChaosCentroid and by ChaosFrequency, using a multi-layer perceptron neural network

| HPV genotype | ChaosCentroid accuracy (%) | ChaosCentroid sensitivity (%) | ChaosCentroid specificity (%) | ChaosCentroid MCC | ChaosFrequency accuracy (%) | ChaosFrequency sensitivity (%) | ChaosFrequency specificity (%) | ChaosFrequency MCC |
|---|---|---|---|---|---|---|---|---|
| 6 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 11 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 16 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 18 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 31 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 33 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 35 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 45 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 52 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 53 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 58 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 66 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |

Best results of HPV genotype prediction based on features extracted by ChaosCentroid and by ChaosFrequency, using a radial basis function network

| HPV genotype | ChaosCentroid accuracy (%) | ChaosCentroid sensitivity (%) | ChaosCentroid specificity (%) | ChaosCentroid MCC | ChaosFrequency accuracy (%) | ChaosFrequency sensitivity (%) | ChaosFrequency specificity (%) | ChaosFrequency MCC |
|---|---|---|---|---|---|---|---|---|
| 6 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 11 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 16 | 99.50 | 99.03 | 99.66 | 0.99 | 100.00 | 100.00 | 100.00 | 1.00 |
| 18 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 31 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 33 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 35 | 99.50 | 96.43 | 99.73 | 0.96 | 100.00 | 100.00 | 100.00 | 1.00 |
| 45 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 52 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 53 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 58 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 66 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |

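The only imperfect entries in the radial basis function results (types 16 and 35 with ChaosCentroid features) can be reproduced from one-vs-rest confusion counts. Assuming that all 400 genomes of the data set table were evaluated once and that the four columns per feature set are accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC), the figures are consistent with a single type-16 genome predicted as type 35 and a single type-35 genome predicted as type 16. These counts are inferred, not reported, and `one_vs_rest_metrics` is a hypothetical helper. The F-measure also rounds to 0.99 and 0.96 for these counts, so reading the fourth column as MCC is itself an assumption.

```python
import math


def one_vs_rest_metrics(tp: int, fn: int, fp: int, tn: int):
    """Accuracy, sensitivity, specificity (in %) and MCC for one class."""
    accuracy = 100.0 * (tp + tn) / (tp + fn + fp + tn)
    sensitivity = 100.0 * tp / (tp + fn)  # share of the class that was found
    specificity = 100.0 * tn / (tn + fp)  # share of other classes correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Inferred counts (tp, fn, fp, tn): 103 type-16 and 28 type-35 genomes out of
# 400, each class with one missed genome and one false alarm.
for name, counts in {"16": (102, 1, 1, 296), "35": (27, 1, 1, 371)}.items():
    acc, sen, spe, mcc = one_vs_rest_metrics(*counts)
    print(f"type {name}: {acc:.2f} {sen:.2f} {spe:.2f} {mcc:.2f}")
```

Run as-is, it prints `type 16: 99.50 99.03 99.66 0.99` and `type 35: 99.50 96.43 99.73 0.96`, matching the corresponding rows.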
Best results of HPV genotype prediction based on features extracted by ChaosCentroid and by ChaosFrequency, using the k-nearest neighbor technique

| HPV genotype | ChaosCentroid accuracy (%) | ChaosCentroid sensitivity (%) | ChaosCentroid specificity (%) | ChaosCentroid MCC | ChaosFrequency accuracy (%) | ChaosFrequency sensitivity (%) | ChaosFrequency specificity (%) | ChaosFrequency MCC |
|---|---|---|---|---|---|---|---|---|
| 6 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 11 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 16 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 18 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 31 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 33 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 35 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 45 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 52 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 53 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 58 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 66 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |

Best results of HPV genotype prediction based on features extracted by ChaosCentroid and by ChaosFrequency, using the fuzzy k-nearest neighbor technique

| HPV genotype | ChaosCentroid accuracy (%) | ChaosCentroid sensitivity (%) | ChaosCentroid specificity (%) | ChaosCentroid MCC | ChaosFrequency accuracy (%) | ChaosFrequency sensitivity (%) | ChaosFrequency specificity (%) | ChaosFrequency MCC |
|---|---|---|---|---|---|---|---|---|
| 6 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 11 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 16 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 18 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 31 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 33 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 35 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 45 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 52 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 53 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 58 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |
| 66 | 100.00 | 100.00 | 100.00 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 |

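Fuzzy k-NN differs from plain k-NN in returning a graded class membership rather than a single majority vote. This record does not give the paper's settings, so the sketch below follows the widely used formulation of Keller, Gray and Givens (1985) with crisp training labels and inverse-distance weights 1/d^(2/(m−1)), using assumed defaults k = 3 and m = 2. `fuzzy_knn` is a hypothetical name, and giving full membership to an exact match (to avoid division by zero) is a choice made here, not taken from the paper.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, query, k=3, m=2.0):
    """Return {label: membership} for `query` from its k nearest neighbours.

    train: list of (feature_vector, label) pairs with crisp labels.
    m > 1 is the fuzzifier; a larger m makes the distance weights more uniform.
    """
    # k nearest training vectors by Euclidean distance
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = defaultdict(float)
    total = 0.0
    for vec, label in neighbours:
        d = math.dist(vec, query)
        if d == 0.0:
            # exact match: give full membership to its label (assumption)
            return {label: 1.0}
        w = 1.0 / d ** (2.0 / (m - 1.0))
        weights[label] += w
        total += w
    return {label: w / total for label, w in weights.items()}


# Toy 2-D feature vectors (hypothetical, not HPV data)
train = [((0.0, 0.0), "6"), ((0.0, 1.0), "6"), ((5.0, 5.0), "16"), ((5.0, 6.0), "16")]
memberships = fuzzy_knn(train, (0.5, 0.5))
print(max(memberships, key=memberships.get))
```

For the toy query (0.5, 0.5), two of the three nearest neighbors carry label "6", so the script prints `6`, and about 0.99 of the membership goes to that class. The predicted genotype is the label with the highest membership; the membership values themselves indicate how confident the assignment is.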
Results of HPV genotype prediction obtained by the NCBI viral genotyping tool

| HPV genotype | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|
| 6 | 100.00 | 100.00 | 100.00 | 1.00 |
| 11 | 100.00 | 100.00 | 100.00 | 1.00 |
| 16 | 100.00 | 100.00 | 100.00 | 1.00 |
| 18 | 100.00 | 100.00 | 100.00 | 1.00 |
| 31 | 100.00 | 100.00 | 100.00 | 1.00 |
| 33 | 100.00 | 100.00 | 100.00 | 1.00 |
| 35 | 100.00 | 100.00 | 100.00 | 1.00 |
| 45 | 100.00 | 100.00 | 100.00 | 1.00 |
| 52 | 100.00 | 100.00 | 100.00 | 1.00 |
| 53 | 100.00 | 100.00 | 100.00 | 1.00 |
| 58 | 100.00 | 100.00 | 100.00 | 1.00 |
| 66 | 100.00 | 100.00 | 100.00 | 1.00 |