Ravit Arav-Boger, Yuval S Boger, Charles B Foster, Zvi Boger.
Abstract
A large number of CMV strains have been reported to circulate in the human population, and the biological significance of these strains is currently an active area of research. Conventional phylogenetic techniques may be limited in their ability to analyze complex genetic information. We constructed artificial neural networks (ANNs) to determine their feasibility in predicting the outcome of congenital CMV disease (defined as the presence of CMV symptoms at birth) based on two data sets: 54 sequences of the CMV gene UL144 obtained from 54 amniotic fluids of women who contracted acute CMV infection during pregnancy, and 80 sequences of 4 genes (US28, UL144, UL146 and UL147) obtained from urine, saliva or blood of 20 congenitally infected infants who displayed different outcomes at birth. When data from all four genes were used in the 20-infant set, the ANN model accurately identified outcome in 90% of cases. While US28 and UL147 had low yield in predicting outcome, UL144 and UL146 predicted outcome in 80% and 85% of cases, respectively, when used separately. The model identified specific nucleotide positions that were highly relevant to prediction of outcome. The ANN classified genotypes in agreement with classic phylogenetic analysis. We suggest that ANNs can accurately and efficiently analyze sequences obtained from larger cohorts to determine specific outcomes. The ANN training and analysis code is commercially available from Optimal Neural Informatics (Pikesville, MD).
Year: 2008 PMID: 19812782 PMCID: PMC2735958 DOI: 10.4137/bbi.s764
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Prediction of congenital CMV outcome based on an ANN model using sequence data from 4 CMV-encoded genes. Bold numbers indicate incorrect classifications.
| Asymptomatic (expected result = 0.1) | | Symptomatic (expected result = 0.9) | |
|---|---|---|---|
| Sample | ANN prediction result | Sample | ANN prediction result |
| A1 | 0.21 | S1 | 0.91 |
| A4 | 0.16 | S2 | 0.93 |
| A5 | 0.35 | S3 | 0.88 |
| A6 | | S4 | 0.71 |
| A8 | 0.09 | S5 | 0.53 |
| A9 | 0.49 | S6 | 0.9 |
| A10 | 0.2 | S7 | 0.9 |
| | | S8 | 0.93 |
| | | S10 | 0.99 |
| | | S11 | |
| | | S12 | 0.92 |
| | | S13 | 0.82 |
| | | S14 | 0.89 |
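
The scores in this table can be turned into class calls with a simple cutoff. The sketch below assumes a cutoff of 0.5, the midpoint between the two expected results (0.1 and 0.9); the record does not state which cutoff was actually used, and the names `THRESHOLD`, `call` and `tally` are illustrative. Values that are blank in the table are kept as `None` and are not guessed.

```python
# ANN outputs transcribed from the table; None marks a value that is blank there.
asymptomatic = {"A1": 0.21, "A4": 0.16, "A5": 0.35, "A6": None,
                "A8": 0.09, "A9": 0.49, "A10": 0.20}
symptomatic = {"S1": 0.91, "S2": 0.93, "S3": 0.88, "S4": 0.71, "S5": 0.53,
               "S6": 0.90, "S7": 0.90, "S8": 0.93, "S10": 0.99, "S11": None,
               "S12": 0.92, "S13": 0.82, "S14": 0.89}

# Assumption: midpoint between the expected results 0.1 and 0.9.
THRESHOLD = 0.5


def call(score):
    """Return 'S' or 'A' for a score, or None when the score is missing."""
    if score is None:
        return None
    return "S" if score >= THRESHOLD else "A"


def tally(samples, truth):
    """Return (correct calls, missing scores, total samples) for one group."""
    correct = sum(1 for s in samples.values() if call(s) == truth)
    missing = sum(1 for s in samples.values() if s is None)
    return correct, missing, len(samples)


print("Asymptomatic:", tally(asymptomatic, "A"))
print("Symptomatic:", tally(symptomatic, "S"))
```

Under this assumed cutoff, every score that is present falls on the expected side, and one score per group is missing from the table.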
Figure 1. Relevance values for UL144, UL146, UL147 and US28 (positions that were eliminated in the preprocessing are shown as 0).
Prediction performance (accuracy and AUC) of CMV outcomes based on various gene combinations.
A: asymptomatic; S: symptomatic.

| Genes analyzed | Prediction accuracy | Correctly classified | AUC |
|---|---|---|---|
| UL144, UL146, UL147, US28 | 90% | A: 6/7, S: 12/13 | 0.857 |
| UL146, UL147 | 85% | A: 5/7, S: 12/13 | 0.824 |
| UL146, US28 | 85% | A: 5/7, S: 12/13 | 0.824 |
| UL146 | 85% | A: 5/7, S: 12/13 | 0.791 |
| UL144 | 80% | A: 4/7, S: 12/13 | 0.824 |
| UL144, UL147 | 75% | A: 5/7, S: 10/13 | 0.824 |
| UL147 | 75% | A: 3/7, S: 12/13 | 0.802 |
| UL144, UL146 | 70% | A: 4/7, S: 10/13 | 0.802 |
| UL147, US28 | 70% | A: 4/7, S: 10/13 | 0.769 |
| UL144, US28 | 60% | A: 3/7, S: 9/13 | 0.725 |
| US28 | 55% | A: 1/7, S: 10/13 | 0.495 |
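
The accuracy column can be checked against the per-class counts: accuracy is the number of correctly classified infants divided by all 20 (7 asymptomatic, 13 symptomatic). The sketch below redoes that arithmetic for every row, reading the first count in each row as the asymptomatic group and the second as the symptomatic group. The helper name `accuracy` and the `rows` layout are illustrative.

```python
from fractions import Fraction

N_ASYMPTOMATIC = 7
N_SYMPTOMATIC = 13


def accuracy(a_correct, s_correct):
    """Overall accuracy as an exact fraction of all 20 infants."""
    return Fraction(a_correct + s_correct, N_ASYMPTOMATIC + N_SYMPTOMATIC)


# (gene combination, A correct, S correct, reported accuracy in percent)
rows = [
    ("UL144, UL146, UL147, US28", 6, 12, 90),
    ("UL146, UL147", 5, 12, 85),
    ("UL146, US28", 5, 12, 85),
    ("UL146", 5, 12, 85),
    ("UL144", 4, 12, 80),
    ("UL144, UL147", 5, 10, 75),
    ("UL147", 3, 12, 75),
    ("UL144, UL146", 4, 10, 70),
    ("UL147, US28", 4, 10, 70),
    ("UL144, US28", 3, 9, 60),
    ("US28", 1, 10, 55),
]

for genes, a_ok, s_ok, reported in rows:
    acc = accuracy(a_ok, s_ok)
    print(f"{genes}: {float(acc):.0%} computed, {reported}% reported")
```

Every row reproduces its reported accuracy exactly, which supports reading the second count in each row as the symptomatic group.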
Figure 2. Number of remaining inputs and area under the curve (AUC) for each iteration of the input-reduction algorithm.
Specific important inputs identified by the ANN. The causal index (9th iteration) is also reported to show the magnitude and direction of influence for each input. Note: the causal index for later iterations was identical in direction for each input and very similar in relative magnitude.
| Gene | Nucleotide | Value | Amino acid | Causal index | Iter. 9 | Iter. 10 | Iter. 11 | Iter. 12 | Iter. 13 | Iter. 14 | Iter. 15–20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UL144 | 56 | A | 19 | −6.82 | ✓ | ✓ | | | | | |
| UL144 | 66 | A | 22 | −7.08 | ✓ | ✓ | | | | | |
| UL144 | 72 | A | 24 | −6.81 | ✓ | ✓ | | | | | |
| UL144 | 108 | C | 36 | −6.90 | ✓ | ✓ | ✓ | | | | |
| UL144 | 115 | A | 39 | −6.94 | ✓ | ✓ | | | | | |
| UL144 | 116 | A | 39 | −6.82 | ✓ | ✓ | | | | | |
| UL144 | 118 | C | 40 | −7.12 | ✓ | ✓ | | | | | |
| UL144 | 119 | A | 40 | −7.01 | ✓ | ✓ | | | | | |
| UL144 | 126 | T | 42 | −6.94 | ✓ | ✓ | | | | | |
| UL144 | 140 | A | 47 | −7.01 | ✓ | ✓ | | | | | |
| UL144 | 180 | T | 60 | −7.23 | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| UL144 | 226 | G | 76 | −6.94 | ✓ | ✓ | ✓ | | | | |
| UL144 | 234 | T | 78 | −6.94 | ✓ | ✓ | | | | | |
| UL144 | 298 | T | 100 | −10.27 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| UL146 | 9 | A | 3 | −9.25 | ✓ | ✓ | ✓ | ✓ | | | |
| UL146 | 46 | A | 16 | 8.22 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| UL146 | 46 | G | 16 | −8.37 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| UL146 | 70 | A | 24 | 6.70 | ✓ | | | | | | |
| UL146 | 96 | T | 32 | −5.60 | ✓ | | | | | | |
| UL146 | 140 | G | 47 | 9.00 | ✓ | | | | | | |
| UL146 | 140 | T | 47 | −10.6 | ✓ | | | | | | |
| UL146 | 207 | A | 69 | 6.59 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| UL146 | 227 | A | 76 | −8.13 | ✓ | ✓ | | | | | |
| UL146 | 262 | C | 88 | −5.51 | ✓ | ✓ | | | | | |
| UL146 | 302 | G | 101 | −5.35 | ✓ | | | | | | |
| UL146 | 303 | A | 101 | 8.10 | ✓ | ✓ | ✓ | ✓ | | | |
| UL146 | 304 | A | 102 | 6.04 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| UL146 | 355 | C | 119 | 5.81 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| UL147 | 51 | T | 17 | 4.88 | ✓ | | | | | | |
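
The "Amino acid" column is consistent with mapping each nucleotide position to the codon that contains it. The sketch below assumes 1-based numbering from the first nucleotide of the coding sequence, which matches the nucleotide and amino acid pairs listed in the table but is not stated in the record. The function names are illustrative.

```python
def codon_index(nucleotide_pos):
    """1-based codon (amino acid) number containing a 1-based nucleotide position."""
    if nucleotide_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nucleotide_pos + 2) // 3


def codon_offset(nucleotide_pos):
    """Position of the nucleotide within its codon: 1, 2 or 3."""
    if nucleotide_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nucleotide_pos - 1) % 3 + 1


# A few (nucleotide, amino acid) pairs transcribed from the table.
pairs = [(56, 19), (108, 36), (298, 100), (9, 3), (46, 16), (355, 119), (51, 17)]

for nt, aa in pairs:
    print(nt, "-> codon", codon_index(nt), "offset", codon_offset(nt), "| table:", aa)
```

Under this mapping, adjacent positions such as 115 and 116, or 302 and 303, fall in the same codon, which is why some rows share one amino acid number.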