Abstract
The complete base sequence of the HIV-1 virus and of the GP120 ENV gene were analyzed to establish their distance from the expected neutral random sequence. A special methodology was devised for this purpose. Analyses included: a) the proportion of dinucleotides (signatures); b) homogeneity in the distribution of dinucleotides and bases (isochores), assessed by dividing the two sequences into ten and three sub-segments, respectively; c) the probability of runs of bases and no-bases according to the Bose-Einstein distribution. The analyses showed a huge deviation from the random distribution expected under neutral evolution and neutral-neighbor influence of nucleotide sites. The most significant result is the tremendous lack of CG dinucleotides (p < 10⁻⁵⁰), a selective trait of eukaryotic genomes and not of single-stranded RNA virus genomes. The results not only refute neutral evolution and neutral-neighbor influence, but also strongly indicate that any base at any nucleotide site correlates with the whole viral genome or its sub-segments. These results suggest that the evolution of HIV-1 is pan-selective rather than neutral or nearly neutral.
Keywords: non-random evolution; Bose-Einstein distribution; HIV-1; non-random sequences; pre-transcriptional evolution
Year: 2009 PMID: 21637663 PMCID: PMC3032973 DOI: 10.1590/S1415-47572009005000025
Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771
Random expected and observed overlapping di-nucleotides of HIV-1, with separation of 0, 1, 2 and 3 sites.
| 1st base | | 2nd base, 0 site separation | | | | | 2nd base, 1 site separation | | | | |
| | | A | T | G | C | Tot | A | T | G | C | Tot |
| A | O | 1107 | 680 | 941 | 607 | 3235 | 1233 | 738 | 703 | 561 | 3235 |
| T | O | 663 | 501 | 507 | 293 | 1964 | 601 | 472 | 529 | 361 | 1963 |
| G | O | 727 | 379 | 646 | 421 | 2173 | 887 | 395 | 495 | 396 | 2173 |
| C | O | 739 | 404 | 79 | 359 | 1581 | 515 | 358 | 446 | 262 | 1589 |
| Tot | | 3236 | 1964 | 2173 | 1580 | 8953 | 3236 | 1963 | 2173 | 1580 | 8952 |
| | | χ² = 445.55 (9 df), p < 10⁻⁸⁰ | | | | | χ² = 86.98 (9 df), p < 10⁻³⁰ | | | | |
| 1st base | | 2nd base, 2 sites' separation | | | | | 2nd base, 3 sites' separation | | | | |
| | | A | T | G | C | Tot | A | T | G | C | Tot |
| A | O | 1256 | 697 | 750 | 522 | 3235 | 1190 | 672 | 791 | 582 | 3235 |
| T | O | 688 | 458 | 458 | 358 | 1962 | 738 | 426 | 479 | 319 | 1962 |
| G | O | 789 | 430 | 568 | 386 | 2173 | 719 | 520 | 527 | 412 | 2173 |
| C | O | 503 | 378 | 386 | 314 | 1581 | 593 | 345 | 375 | 267 | 1580 |
| Tot | | 3236 | 1963 | 2172 | 1580 | 8951 | 3235 | 1963 | 2172 | 1580 | 8950 |
| | | χ² = 38.28 (9 df), p = 0.000016 | | | | | χ² = 20.09 (9 df), p = 0.01736 | | | | |
O = observed; E = expected; Tot = total; p = probability.
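The separated-dinucleotide counts and the CG deficit can be illustrated with a short sketch. It assumes that "separation of k sites" means pairing each base with the base k + 1 positions downstream, which agrees with the totals shrinking by one per extra site (8953, 8952, 8951, 8950). The function names and the toy sequence are hypothetical; the CG figures are the 0-separation values from the table above.

```python
from collections import Counter

def gapped_dinucleotides(seq, gap=0):
    """Count overlapping pairs (seq[i], seq[i + gap + 1]).

    gap=0 gives adjacent dinucleotides; gap=1 skips one site, and so on.
    """
    step = gap + 1
    return Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))

def expected_count(row_total, col_total, grand_total):
    """Expected pair count if the two positions were independent."""
    return row_total * col_total / grand_total

# Toy sequence (hypothetical, for illustration only).
toy = "ACGTACGGA"
counts0 = gapped_dinucleotides(toy, gap=0)  # adjacent pairs
counts1 = gapped_dinucleotides(toy, gap=1)  # one site between the two bases

# CG at 0 separation in HIV-1, using the table values:
# observed CG = 79, first-base C total = 1581,
# second-base G total = 2173, grand total = 8953.
cg_observed = 79
cg_expected = expected_count(1581, 2173, 8953)
cg_ratio = cg_observed / cg_expected

print(f"expected CG = {cg_expected:.1f}, observed/expected = {cg_ratio:.2f}")
```

Under this independence assumption, CG is observed at roughly one fifth of its expected frequency, in line with the CG shortage stressed in the abstract.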
Di- and mono-nucleotides in 10 segments of HIV-1.
| Dinucleotides of segments 1° to 10° (χ² = 327.6, 135 df; p < 10⁻²⁰) |
| Pair | 1° | 2° | 3° | 4° | 5° | 6° | 7° | 8° | 9° | 10° | Total N | Total % |
| AA | 114 | 119 | 127 | 144 | 125 | 88 | 107 | 126 | 86 | 71 | 1107 | 12.4 |
| AT | 58 | 63 | 79 | 70 | 72 | 70 | 90 | 80 | 61 | 36 | 679 | 7.6 |
| AG | 109 | 99 | 82 | 97 | 106 | 103 | 78 | 81 | 89 | 95 | 939 | 10.5 |
| AC | 55 | 46 | 56 | 56 | 49 | 49 | 47 | 53 | 41 | 53 | 505 | 5.6 |
| TA | 66 | 52 | 70 | 71 | 73 | 65 | 91 | 70 | 54 | 49 | 661 | 7.4 |
| TT | 34 | 56 | 61 | 41 | 48 | 60 | 49 | 45 | 52 | 53 | 499 | 5.6 |
| TG | 36 | 43 | 52 | 42 | 42 | 46 | 66 | 53 | 65 | 62 | 507 | 5.7 |
| TC | 21 | 28 | 32 | 25 | 24 | 35 | 26 | 32 | 37 | 33 | 293 | 3.3 |
| GA | 78 | 83 | 72 | 73 | 70 | 76 | 53 | 67 | 80 | 75 | 727 | 8.1 |
| GT | 29 | 27 | 40 | 38 | 44 | 33 | 54 | 43 | 36 | 35 | 379 | 4.2 |
| GG | 69 | 70 | 48 | 59 | 66 | 61 | 47 | 58 | 77 | 90 | 645 | 7.2 |
| GC | 55 | 40 | 24 | 32 | 39 | 44 | 40 | 37 | 53 | 55 | 419 | 4.7 |
| CA | 79 | 74 | 75 | 79 | 84 | 81 | 71 | 78 | 58 | 60 | 739 | 8.3 |
| CT | 36 | 33 | 35 | 31 | 23 | 42 | 40 | 31 | 59 | 73 | 403 | 4.5 |
| CG | 17 | 7 | 3 | 3 | 5 | 5 | 3 | 13 | 15 | 8 | 79 | 0.9 |
| CC | 38 | 54 | 38 | 33 | 24 | 36 | 32 | 27 | 31 | 46 | 359 | 4.0 |
| Tot | 894 | 894 | 894 | 894 | 894 | 894 | 894 | 894 | 894 | 894 | 8940 | 100.0 |
| Mononucleotides of segments 1° to 10° (χ² = 87.1, 27 df; p < 10⁻¹⁵) |
| Base | 1° | 2° | 3° | 4° | 5° | 6° | 7° | 8° | 9° | 10° | Total N | Total % |
| A | 337 | 328 | 344 | 367 | 352 | 310 | 322 | 341 | 278 | 255 | 3234 | 36.1 |
| T | 157 | 179 | 215 | 180 | 187 | 206 | 233 | 200 | 208 | 197 | 1962 | 21.9 |
| G | 231 | 220 | 185 | 202 | 219 | 215 | 194 | 205 | 246 | 255 | 2172 | 24.3 |
| C | 170 | 168 | 151 | 146 | 136 | 164 | 146 | 149 | 163 | 187 | 1580 | 17.7 |
| Tot | 895 | 895 | 895 | 895 | 894 | 895 | 895 | 895 | 895 | 894 | 8948 | 100.0 |
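The homogeneity statistic for the mononucleotide rows can be recomputed as a minimal sketch. It assumes the standard Pearson χ² for a contingency table, with expected counts taken from the row and column totals; the record does not state the exact procedure, and the function name is hypothetical.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square for an r x c table of counts.

    Expected counts are row_total * column_total / grand_total.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Mononucleotide counts per segment (rows A, T, G, C; columns 1 to 10),
# copied from the table above.
mono_hiv = [
    [337, 328, 344, 367, 352, 310, 322, 341, 278, 255],  # A
    [157, 179, 215, 180, 187, 206, 233, 200, 208, 197],  # T
    [231, 220, 185, 202, 219, 215, 194, 205, 246, 255],  # G
    [170, 168, 151, 146, 136, 164, 146, 149, 163, 187],  # C
]

chi2_mono, df_mono = chi_square_homogeneity(mono_hiv)
print(f"chi2 = {chi2_mono:.1f} with {df_mono} df")
```

With these counts the statistic comes out close to the reported χ² = 87.1 on 27 degrees of freedom, which suggests the reported value was obtained in much this way.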
Random expected and observed overlapping di-nucleotides of S-env, with separation of 0, 1, 2 and 3 sites.
| 1st base | | 2nd base, 0 site separation | | | | | 2nd base, 1 site separation | | | | |
| | | A | T | G | C | Tot | A | T | G | C | Tot |
| A | O | 292 | 220 | 224 | 164 | 900 | 314 | 242 | 179 | 164 | 899 |
| T | O | 222 | 151 | 184 | 79 | 636 | 202 | 162 | 160 | 112 | 636 |
| G | O | 191 | 123 | 182 | 125 | 621 | 243 | 124 | 160 | 94 | 621 |
| C | O | 195 | 142 | 31 | 101 | 469 | 141 | 107 | 122 | 94 | 469 |
| Tot | | 900 | 636 | 621 | 469 | 2626 | 900 | 635 | 621 | 469 | 2625 |
| | | χ² = 112.8 (9 df), p < 10⁻¹⁵ | | | | | χ² = 28.3 (9 df), p = 0.00085 | | | | |
| 1st base | | 2nd base, 2 sites' separation | | | | | 2nd base, 3 sites' separation | | | | |
| | | A | T | G | C | Tot | A | T | G | C | Tot |
| A | O | 350 | 200 | 202 | 147 | 899 | 305 | 213 | 226 | 155 | 899 |
| T | O | 207 | 173 | 151 | 104 | 635 | 234 | 137 | 163 | 100 | 634 |
| G | O | 205 | 135 | 157 | 124 | 621 | 201 | 166 | 126 | 128 | 621 |
| C | O | 138 | 127 | 110 | 94 | 469 | 159 | 119 | 105 | 86 | 469 |
| Tot | | 900 | 635 | 620 | 469 | 2624 | 899 | 635 | 620 | 469 | 2623 |
| | | χ² = 22.7 (9 df), p = 0.0069 | | | | | χ² = 15.4 (9 df), p = 0.0805 | | | | |
O = observed; E = expected; Tot = total; p = probability.
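To see which dinucleotides drive the 0-separation statistic for S-env, the χ² can be split into per-cell contributions (O − E)²/E. This is a sketch under the assumption that expected counts come from the marginal totals of the table above; the variable names are hypothetical.

```python
bases = "ATGC"

# Observed 0-separation counts for S-env: first base -> counts for
# second base A, T, G, C (copied from the table above).
observed = {
    "A": [292, 220, 224, 164],
    "T": [222, 151, 184, 79],
    "G": [191, 123, 182, 125],
    "C": [195, 142, 31, 101],
}

row_totals = {b: sum(counts) for b, counts in observed.items()}
col_totals = [sum(observed[b][j] for b in bases) for j in range(len(bases))]
grand_total = sum(row_totals.values())

# Contribution of each dinucleotide to the Pearson chi-square.
contribution = {}
for first in bases:
    for j, second in enumerate(bases):
        expected = row_totals[first] * col_totals[j] / grand_total
        obs = observed[first][j]
        contribution[first + second] = (obs - expected) ** 2 / expected

chi2_total = sum(contribution.values())
top_pair = max(contribution, key=contribution.get)

for pair, value in sorted(contribution.items(), key=lambda kv: -kv[1])[:4]:
    print(f"{pair}: {value:.1f}")
print(f"total chi2 = {chi2_total:.1f}, largest contributor = {top_pair}")
```

On these counts the CG cell alone accounts for about half of the total, which comes out close to the reported χ² = 112.8.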
Di- and mono-nucleotides of three segments of S-env.
| Dinucleotides of segments 1° to 3° (χ² = 67.66, 30 df; p = 0.000098) |
| Pair | 1° | 2° | 3° | Total N | Total % |
| AA | 104 | 98 | 89 | 291 | 11.1 |
| AT | 84 | 72 | 64 | 220 | 8.4 |
| AG | 66 | 85 | 73 | 224 | 8.5 |
| AC | 51 | 66 | 47 | 164 | 6.3 |
| TA | 82 | 77 | 62 | 221 | 8.4 |
| TT | 48 | 41 | 61 | 150 | 5.7 |
| TG | 72 | 50 | 61 | 183 | 7.0 |
| TC | 23 | 22 | 34 | 79 | 3.0 |
| GA | 53 | 67 | 71 | 191 | 7.3 |
| GT | 54 | 42 | 27 | 123 | 4.7 |
| GG | 48 | 64 | 70 | 182 | 6.9 |
| GC | 37 | 38 | 50 | 125 | 4.8 |
| CA | 65 | 79 | 51 | 195 | 7.4 |
| CT | 40 | 35 | 67 | 142 | 5.4 |
| CG | 6 | 12 | 13 | 31 | 1.2 |
| CC | 41 | 26 | 34 | 101 | 3.9 |
| Tot | 874 | 874 | 874 | 2622 | 100.0 |
| Mononucleotides of segments 1° to 3° (χ² = 9.899, 6 df; p = 0.1290)
| Base | 1° | 2° | 3° | Total N | Total % |
| A | 305 | 321 | 273 | 899 | 34.3 |
| T | 226 | 190 | 219 | 635 | 24.2 |
| G | 192 | 211 | 218 | 621 | 23.7 |
| C | 152 | 152 | 165 | 469 | 17.8 |
| Tot | 875 | 874 | 875 | 2624 | 100.0 |
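The p-values quoted for the S-env segment tests can be checked without a statistics library, because the χ² upper-tail probability has a closed form when the degrees of freedom are even. The sketch below uses that identity; the function name is hypothetical, and the record does not say how the authors computed their p-values.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for chi-square with even df.

    For df = 2k: P = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Values reported for the three S-env segments.
p_mono = chi2_sf_even_df(9.899, 6)   # mononucleotides, 6 df
p_di = chi2_sf_even_df(67.66, 30)    # dinucleotides, 30 df

print(f"mononucleotides: p = {p_mono:.4f}")
print(f"dinucleotides:   p = {p_di:.6f}")
```

Both results are close to the tabulated p = 0.1290 and p = 0.000098: base composition is not significantly heterogeneous across the three S-env segments, whereas dinucleotide composition is.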
Observed and expected numbers of base and no-base runs.
N = number of consecutive bases in a run; p = probability; * = p ≥ 0.05; a = 0.05 > p ≥ 0.025; b = 0.025 > p ≥ 0.01; c = 0.01 > p ≥ 0.005; d = 0.005 > p ≥ 0.001; e = p < 0.001.
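The observed side of the runs analysis can be sketched as a tally of maximal runs. This assumes that a "base run" is a maximal stretch of one given base and a "no-base run" is a maximal stretch of any other bases. The expected numbers under the Bose-Einstein distribution are not reproduced, because the formula is not given in this record. The function name and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import groupby

def run_lengths(seq, base):
    """Tally maximal runs of `base` and of non-`base` symbols in `seq`.

    Returns two Counters mapping run length N -> number of runs.
    """
    base_runs, no_base_runs = Counter(), Counter()
    for is_base, group in groupby(seq, key=lambda symbol: symbol == base):
        length = sum(1 for _ in group)
        if is_base:
            base_runs[length] += 1
        else:
            no_base_runs[length] += 1
    return base_runs, no_base_runs

# Toy sequence (hypothetical): AA | GC | AA | T | AAA | C
toy = "AAGCAATAAAC"
a_runs, non_a_runs = run_lengths(toy, "A")

print("A runs:", dict(a_runs))
print("no-A runs:", dict(non_a_runs))
```

Repeating the tally for each of the four bases would give the observed columns of a runs table; the expected columns would then have to come from the authors' Bose-Einstein model.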