Amir Hossein KayvanJoo, Mansour Ebrahimi, Gholamreza Haqshenas.
Abstract
BACKGROUND: Hepatitis C virus (HCV) causes chronic hepatitis C in 2-3% of the world population and remains a major threat to human health worldwide. In the absence of an effective vaccine, therapy is the only option to combat hepatitis C. A combination of interferon-alpha (IFN-alpha) and ribavirin (RBV), alone or together with recently introduced direct-acting antivirals (DAA), is used to treat patients infected with HCV. The present study used feature selection methods (Gini Index, Chi Squared and machine learning algorithms) and other bioinformatics tools to identify genetic determinants of therapy outcome within the entire HCV nucleotide sequence.
Year: 2014 PMID: 25150834 PMCID: PMC4246553 DOI: 10.1186/1756-0500-7-565
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
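The abstract names Chi Squared among the feature selection methods. As a rough illustration only, the sketch below computes a Pearson chi-squared statistic for one attribute against therapy outcome. It assumes the attribute has already been binarized (for example, count above or below some cutoff), which the source does not specify; the function name `chi_squared_weight` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def chi_squared_weight(feature_present: Sequence[bool],
                       is_responder: Sequence[bool]) -> float:
    """Pearson chi-squared statistic for a 2x2 table: binary attribute vs. binary outcome."""
    if len(feature_present) != len(is_responder):
        raise ValueError("both sequences must have the same length")
    n = len(feature_present)
    if n == 0:
        raise ValueError("at least one sample is required")

    # observed[i][j]: i = attribute absent/present, j = non-responder/responder
    observed = [[0, 0], [0, 0]]
    for present, responder in zip(feature_present, is_responder):
        observed[int(present)][int(responder)] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty margins to avoid dividing by zero
                statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical toy data: the attribute tracks the outcome perfectly in the
# first case and is unrelated to it in the second.
associated = chi_squared_weight([True, True, False, False],
                                [True, True, False, False])
unrelated = chi_squared_weight([True, True, False, False],
                               [True, False, True, False])
```

A larger statistic indicates a stronger association between the attribute and the outcome, which is why it can serve as an attribute weight.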
Figure 1. Flowchart of data-mining processes that were applied to each of the comparative groups.
Most important nucleotide attributes that were selected by different weighting algorithms
| A: Subtype 1a (Responders vs. Non-Responders) | | Subtype 1b (Responders vs. Non-Responders) | |
|---|---|---|---|
| Attribute | No. of selective attribute weightings (out of 10) | Attribute | No. of selective attribute weightings (out of 10) |
| Count of hydrogen | 9 | Count of GC | 8 |
| Count of oxygen | 8 | | 7 |
| Count of CA | 7 | DS Count of nitrogen | 7 |
| Count of CG | 7 | Count of AU | 6 |
| Count of Cytosine | 7 | Count of GG | 5 |
| Count of Guanine | 7 | Count of Uracil | 5 |
| Count of GU | 6 | | |
| Count of UU | 5 | | |
| | 5 | | |
| Count of CC | 5 | | |

| B: Subtype 1a (Responders vs. Relapsers) | | Subtype 1b (Responders vs. Relapsers) | |
|---|---|---|---|
| Attribute | No. of selective attribute weightings (out of 10) | Attribute | No. of selective attribute weightings (out of 10) |
| Count of oxygen | 10 | | 6 |
| | 7 | Count of CA | 5 |
| Count of Uracil | 7 | Count of carbon | 5 |
| Count of nitrogen | 6 | | |
Ten algorithms (PCA, SVM, Relief, Uncertainty, Gini Index, Chi Squared, Deviation, Rule, Information Gain, and Information Gain Ratio) were used to determine the most important nucleotide attributes for distinguishing HCV subtype 1a and 1b responders from non-responders (A) and responders from relapsers (B). Nucleotide attributes common to subtypes 1a and 1b have been bolded. A: adenine, T: thymine, C: cytosine, G: guanine.
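The "No. of selective attribute weightings (out of 10)" column can be read as a vote count across the weighting algorithms. The sketch below shows one way such a tally might be computed. It assumes each algorithm yields a weight normalized to the range 0 to 1 and that an attribute counts as selected when its weight reaches a cutoff; the cutoff of 0.5, the function name `count_selections`, and the toy weights are all hypothetical and are not reported in the source.

```python
from typing import Dict


def count_selections(weights_by_algorithm: Dict[str, Dict[str, float]],
                     threshold: float = 0.5) -> Dict[str, int]:
    """Count, per attribute, how many algorithms gave it a weight >= threshold."""
    counts: Dict[str, int] = {}
    for weights in weights_by_algorithm.values():
        for attribute, weight in weights.items():
            if weight >= threshold:  # assumed selection rule, not from the source
                counts[attribute] = counts.get(attribute, 0) + 1
    return counts


# Hypothetical normalized weights from three of the ten algorithms.
toy_weights = {
    "Gini Index": {"Count of GC": 0.90, "Count of AU": 0.20},
    "Chi Squared": {"Count of GC": 0.70, "Count of AU": 0.60},
    "Relief": {"Count of GC": 0.55, "Count of AU": 0.80},
}
selection_counts = count_selections(toy_weights)
```

With all ten algorithms supplied, the resulting counts would range from 0 to 10, matching the scale of the table column.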
Figure 2. Decision tree obtained from the Parallel model run with the Gini Index criterion on the PCA dataset, which distinguishes HCV subtype 1a responder strains from non-responder strains.
Figure 3. Decision tree obtained from the Parallel model run with the Gini Index criterion on the Chi Squared dataset, which can distinguish HCV subtype 1a responder strains from relapser strains.
Figure 4. Decision tree obtained from the model run with the Gini Index criterion on the PCA dataset, which can distinguish HCV subtype 1b responder strains from non-responder strains.
Figure 5. Decision tree from the Parallel model run with the Accuracy criterion on the Rule dataset for HCV subtype 1b responder vs. relapser strains.
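Figures 2 to 4 name the Gini Index as the tree-splitting criterion. The sketch below is a minimal illustration of how a single decision-tree node could pick a threshold on one numeric attribute by minimizing weighted Gini impurity. It is not the tool or configuration used in the study; the function names and the toy attribute values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_gini_split(values: Sequence[float],
                    labels: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    n = len(values)
    distinct = sorted(set(values))
    best_threshold: Optional[float] = None
    best_impurity = float("inf")
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    if best_threshold is None:
        raise ValueError("need at least two distinct attribute values")
    return best_threshold, best_impurity


# Hypothetical attribute values (e.g. a dinucleotide count) for four strains.
toy_values = [10.0, 12.0, 20.0, 22.0]
toy_labels = ["non-responder", "non-responder", "responder", "responder"]
threshold, impurity = best_gini_split(toy_values, toy_labels)
```

A full decision tree applies this search to every attribute at every node and recurses on the two resulting subsets; the sketch covers only the single-attribute, single-node step.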
The highest values for accuracy, AUC, F-measure, precision, recall, sensitivity, and specificity for predicting responders vs. non-responders (A) and responders vs. relapsers (B) groups
| A | Subtype 1a (Responders vs. Non-Responders) | | | | Subtype 1b (Responders vs. Non-Responders) | | | |
|---|---|---|---|---|---|---|---|---|
| | Bayes | Neural Networks | SVM | Decision Trees | Bayes | Neural Networks | SVM | Decision Trees |
| Dataset | Chi Squared | SVM | Relief | PCA | SVM | SVM | Relief | Gini Index |
| Model | Naive Bayes (Kernel) | AutoMLp | SVM | DT Parallel Gini Index | Naive Bayes (Kernel) | AutoMLp | SVM | DT Random Forest Info Gain |
| Accuracy | 74.17% | 76.67% | 74.17% | 69.17% | 89.17% | 85.00% | 75.00% | 80.00% |
| AUC | 0.84 | 0.68 | 0.75 | 0.59 | 0.94 | 0.94 | 0.84 | 0.83 |
| | 0.84 | 0.68 | 0.75 | 0.83 | 0.94 | 0.94 | 0.84 | 0.85 |
| | 0.84 | 0.68 | 0.75 | 0.58 | 0.94 | 0.94 | 0.84 | 0.80 |
| F-measure | 0.78 | 0.82 | 0.80 | 0.73 | 0.92 | 0.87 | 0.80 | 0.86 |
| Precision | 0.84 | 0.82 | 0.80 | 0.80 | 0.93 | 0.94 | 0.81 | 0.87 |
| Recall | 0.73 | 0.82 | 0.88 | 0.73 | 0.93 | 0.83 | 0.83 | 0.90 |
| Sensitivity | 0.73 | 0.82 | 0.88 | 0.73 | 0.93 | 0.83 | 0.83 | 0.90 |
| Specificity | 0.85 | 0.75 | 0.60 | 0.65 | 0.85 | 0.80 | 0.50 | 0.65 |

| B | Subtype 1a (Responders vs. Relapsers) | | | | Subtype 1b (Responders vs. Relapsers) | | | |
|---|---|---|---|---|---|---|---|---|
| | Bayes | Neural Networks | SVM | Decision Trees | Bayes | Neural Networks | SVM | Decision Trees |
| Dataset | Chi Squared | SVM | Relief | PCA | SVM | SVM | Relief | Gini Index |
| Model | Naive Bayes (Kernel) | AutoMLp | SVM | DT Parallel Gini Index | Naive Bayes (Kernel) | AutoMLp | SVM | DT Random Forest Info Gain |
| Accuracy | 82.50% | 79.17% | 82.50% | 81.67% | 78.33% | 78.33% | 84.17% | 81.67% |
| AUC | 0.89 | 0.79 | 0.82 | 0.61 | 0.00 | 0.66 | | |
| | 0.89 | 0.79 | 0.82 | 0.91 | 0.85 | 0.85 | | |
| | 0.89 | 0.79 | 0.82 | 0.74 | 0.15 | 0.47 | | |
| F-measure | 0.84 | 0.86 | 0.86 | 0.84 | 0.87 | 0.87 | 0.91 | 0.89 |
| Precision | 0.92 | 0.83 | 0.90 | 0.90 | 0.78 | 0.78 | 0.84 | 0.82 |
| Recall | 0.82 | 0.92 | 0.87 | 0.80 | 1.00 | 1.00 | 1.00 | 1.00 |
| Sensitivity | 0.82 | 0.92 | 0.87 | 0.80 | 1.00 | 1.00 | 1.00 | 1.00 |
| Specificity | 0.85 | 0.55 | 0.75 | 0.85 | 0.00 | 0.00 | 0.29 | 0.14 |
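
The measures listed in the table title follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, assuming one class (for example, responders) is treated as the positive class; the source does not state which class was positive, and the function name and example counts are hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Report 0.0 for an undefined ratio instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall and sensitivity are the same quantity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for a model that labels every sample as positive:
# no false negatives and no true negatives.
all_positive = classification_metrics(tp=8, fp=2, tn=0, fn=0)
```

In this hypothetical case recall is 1.0 and specificity is 0.0 while accuracy simply equals the share of positive samples. A recall of 1.00 paired with a specificity of 0.00 is the signature of a model that assigns every sample to the positive class, so high accuracy alone does not show that the two groups were separated.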