Mahmoud M ElHefnawi, Suher Zada, Iman A El-Azab.
Abstract
BACKGROUND: Hepatitis C virus (HCV) is a worldwide health problem with no vaccine, and the only approved therapy is interferon-based treatment plus ribavirin. Predicting response to treatment has health and economic impacts, and is a multi-factorial problem involving both host and viral factors (e.g., age, sex, ethnicity, pre-treatment viral load, and the dynamics of the HCV non-structural protein NS5A quasispecies). We implement a novel approach for extracting features, including informative markers from mutations in the non-structural 5A protein (NS5A), specifically its interferon sensitivity-determining region (ISDR) and V3 region. We then apply a novel bioinformatics approach for pattern recognition on the NS5A protein and its motifs to find biomarkers for response prediction using class association rules, and compare the predictive power of the different features.
Year: 2010 PMID: 20550652 PMCID: PMC3238222 DOI: 10.1186/1743-422X-7-130
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Summary of sequence analysis and mean genetic distance
| Region | Genotype | | | | | |
|---|---|---|---|---|---|---|
| | 1a | R | 21 | 21 | 0.017 | 0.03 |
| | 1b | R | 20 | 34 | 0.017 | 0.029 |
| | 3a | R | 17 | 24 | 0.045 | 0.041 |
| | 1a | R | 21 | 3 | 0.04 | 0.4849 |
| | 1b | R | 20 | 12 | 0.054 | 0.054 |
| | 3a | R | 17 | 10 | 0.04 | 0.028 |
| | 1a | R | 21 | 6 | 0.249 | 0.216 |
| | 1b | R | 20 | 19 | 0.272 | 0.227 |
| V3 | 3a | R | 10 | 6 | 0.062 | 0.051 |
The number of sequences in each genotype and response group is shown; variable sites and mean genetic distance are calculated. The number of variable sites was extracted from the alignments, and mean genetic distances within and between groups were calculated using the MEGA program.
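As a rough illustration of the two quantities in this table, the sketch below counts variable sites in an alignment and averages pairwise distances within and between groups. It assumes a simple p-distance (the proportion of differing positions), which may not be the distance model used in MEGA for the paper; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations, product

def variable_sites(alignment):
    """Count alignment columns that contain more than one distinct character.
    Gap characters are treated like any other character here."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)

def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences,
    ignoring positions where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def mean_within(group):
    """Mean p-distance over all pairs of sequences inside one group."""
    dists = [p_distance(a, b) for a, b in combinations(group, 2)]
    return sum(dists) / len(dists)

def mean_between(group1, group2):
    """Mean p-distance over all pairs with one sequence from each group."""
    dists = [p_distance(a, b) for a, b in product(group1, group2)]
    return sum(dists) / len(dists)

# Toy aligned sequences (invented for illustration, not data from the paper).
responders = ["ACDE", "ACDF"]
non_responders = ["ACGE", "TCGE"]

n_variable = variable_sites(responders + non_responders)
within_r = mean_within(responders)
between = mean_between(responders, non_responders)
```

With real data, the two lists would hold the aligned ISDR or V3 sequences of one genotype, split by response group.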
Figure 1. Multiple sequence alignments of the ISDR and V3 regions for genotype 1b. The responder strains are labelled with resp/sr, and non-responders with nonresp/nr. Dots represent conserved positions. 1A: ISDR amino acid sequences. 1B: V3 amino acid sequences. Both sets of sequences are from responders and non-responders of genotype 1b.
Figure 2. Relative Shannon entropy between non-responders and responders in the ISDR and V3 regions of subtypes 1a, 1b and 3a. It represents the difference between the positional entropy of the responders and that of the non-responders (shown on the positive and negative scales, respectively). It was calculated using the REL Entropy tool available at the HCV LANL database; significant positional variations between the two groups are labelled in red.
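To make the quantity in Figure 2 concrete, here is a minimal sketch of a per-position Shannon entropy difference between two groups of aligned sequences. It is not the LANL tool's implementation: the sign convention (responders minus non-responders), the use of log base 2, and the function names are assumptions, and no significance testing is included.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def entropy_difference(responders, non_responders):
    """Per-position entropy of responders minus that of non-responders.
    Both groups must be aligned to the same length."""
    r_columns = zip(*responders)
    nr_columns = zip(*non_responders)
    return [column_entropy(r) - column_entropy(n)
            for r, n in zip(r_columns, nr_columns)]

# Toy two-residue alignments (invented for illustration).
responders = ["AK", "AR"]
non_responders = ["AK", "TK"]

diff = entropy_difference(responders, non_responders)
```

Under this convention, a positive value marks a position that varies more among responders, and a negative value marks one that varies more among non-responders.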
Figure 3. Comparative sequence logos for the ISDR and V3 regions. In the figure, the letters in the middle bar represent conserved positions. The totally empty positions represent variations within each group but no considerable variations between the two groups. The non-responders were set as the negative sample and the responders as the positive sample.
Summary comparison of the accuracy of different approaches used in the paper
| | | |
|---|---|---|
| Wildtype 2378T in NR subtype 1b | 50% | 69% |
| A2368T in R in subtype 1a | 100% | 52.2% |
| E2356G/D in R in subtype 1b | 76.3% | 67.3% |
| | 100% | 25% |
| Six variable sites in R | 1.7% | 100% |
| | 55% | 35% |
| Profile Hidden Markov Model | 70% | 20% |
This table compares the different methods studied as applied to the V3 region, which the paper shows to be the most important for response prediction. The positive predictive value of each method on the test set is shown.
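For reference, positive predictive value is the fraction of predicted positives that are truly positive, TP / (TP + FP). The sketch below computes it from paired label lists; the "R"/"NR" labels, the function name, and the example predictions are hypothetical and do not reproduce any figure from the table.

```python
def positive_predictive_value(predicted, actual, positive="R"):
    """Return TP / (TP + FP) for the given positive class,
    or None when no sample was predicted positive."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    if tp + fp == 0:
        return None
    return tp / (tp + fp)

# Invented predictions for four test sequences.
predicted = ["R", "R", "NR", "R"]
actual = ["R", "NR", "NR", "R"]

ppv = positive_predictive_value(predicted, actual)
```

Here two of the three sequences predicted as responders are true responders, so the value is 2/3.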