P Andrew Nevarez, Christopher M DeBoever, Benjamin J Freeland, Marissa A Quitt, Eliot C Bush.
Abstract
BACKGROUND: Models of sequence evolution typically assume that different nucleotide positions evolve independently. This assumption is widely appreciated to be an over-simplification. The best-known violations involve biases due to adjacent nucleotides. There have also been suggestions that biases exist at larger scales; however, this possibility has not been systematically explored.
Year: 2010 PMID: 20843365 PMCID: PMC2945941 DOI: 10.1186/1471-2105-11-462
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data sets and sizes in alignment columns.
| | Data set | Alignment columns |
|---|---|---|
| 1. | transposon | 360,248,252 |
| 2. | transposon near gene | 53,105,001 |
| 3. | transposon far-from-gene | 347,985,192 |
| 4. | non-transposon | 340,461,195 |
| 5. | non-transposon near gene | 75,324,278 |
| 6. | non-transposon far-from-gene | 373,326,202 |
| 7. | LINE transposons | 147,409,103 |
| 8. | SINE transposons | 106,260,754 |
Top single-substitution patterns of length 2 by relative abundance
| Pattern | Relative abundance | Confidence interval |
|---|---|---|
| 1. CG→TG (CG→CA) | 7.5809 | 7.5581 - 7.6049 |
| 2. CG→CT (CG→AG) | 2.2765 | 2.2495 - 2.3054 |
| 3. CG→GG (CG→CC) | 1.9887 | 1.9605 - 2.0182 |
| 4. AT→GT (AT→AC) | 1.5444 | 1.5401 - 1.5492 |
| 5. TG→CG (CA→CG) | 1.3861 | 1.3816 - 1.3906 |
| 6. AT→TT (AT→AA) | 1.3335 | 1.3244 - 1.3433 |
| 7. TA→TT (TA→AA) | 1.2402 | 1.2297 - 1.2504 |
| 8. TG→GG (CA→CC) | 1.2087 | 1.1999 - 1.2166 |
| 9. GT→TT (AC→AA) | 1.2006 | 1.1920 - 1.2093 |
| 10. GT→AT (AC→AT) | 1.1903 | 1.1868 - 1.1943 |
| 11. TC→TG (GA→CA) | 1.1358 | 1.1279 - 1.1433 |
| 12. TT→TG (AA→CA) | 1.1141 | 1.1073 - 1.1212 |
| 13. TA→TG (TA→CA) | 1.0990 | 1.0942 - 1.1039 |
| 14. CT→GT (AG→AC) | 1.0969 | 1.0904 - 1.1033 |
| 15. GT→CT (AC→AG) | 1.0895 | 1.0816 - 1.0978 |
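The parenthetical form in each row appears to be the same substitution read on the opposite strand: the reverse complement of both the original and the substituted sequence. A minimal Python sketch of that relationship follows; `reverse_complement` and `opposite_strand` are hypothetical helper names, and `N` (any base, as used in the longer patterns of the next table) is assumed to complement to itself.

```python
# Complement of each base; "N" (any base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string that may contain N."""
    return seq.translate(COMPLEMENT)[::-1]


def opposite_strand(pattern: str) -> str:
    """Rewrite a substitution like 'CG→TG' as the same event on the other strand."""
    before, after = pattern.split("→")
    return f"{reverse_complement(before)}→{reverse_complement(after)}"


# Rows from the length-2 table, paired with their listed parenthetical forms.
listed = {
    "CG→TG": "CG→CA",
    "CG→GG": "CG→CC",
    "AT→GT": "AT→AC",
    "TA→TT": "TA→AA",
    "TC→TG": "GA→CA",
}
for pattern, parenthetical in listed.items():
    assert opposite_strand(pattern) == parenthetical, pattern
    print(pattern, "=", opposite_strand(pattern))
```

Because both strands are pooled, a pattern and its reverse complement are counted as one entry, which is why each row names two forms.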
Top 50 single-substitution patterns of length 2-5 bp, sorted by magnitude of context bias
| Pattern | Context bias | Context bias confidence interval | Relative abundance | Relative abundance confidence interval |
|---|---|---|---|---|
| 1. CG→TG (CG→CA) | 2.926e-03 | 2.907e-03 - 2.945e-03 | 7.5809 | 7.5581 - 7.6049 |
| 2. AT→GT (AT→AC) | 2.148e-04 | 2.125e-04 - 2.173e-04 | 1.5444 | 1.5401 - 1.5492 |
| 3. TG→CG (CA→CG) | 1.372e-04 | 1.351e-04 - 1.392e-04 | 1.3861 | 1.3816 - 1.3906 |
| 4. TG→TA (CA→TA) | -1.025e-04 | -1.021e-04 - -1.029e-04 | 0.6367 | 0.6341 - 0.6391 |
| 5. CT→TT (AG→AA) | -9.985e-05 | -9.947e-05 - -1.002e-04 | 0.6136 | 0.6110 - 0.6161 |
| 6. TNG→CNG (CNA→CNG) | 7.681e-05 | 7.571e-05 - 7.790e-05 | 1.5309 | 1.5252 - 1.5365 |
| 7. TT→TC (AA→GA) | -7.532e-05 | -7.497e-05 - -7.568e-05 | 0.6385 | 0.6355 - 0.6413 |
| 8. GT→AT (AC→AT) | 6.859e-05 | 6.691e-05 - 7.038e-05 | 1.1903 | 1.1868 - 1.1943 |
| 9. TC→TT (GA→AA) | -6.753e-05 | -6.697e-05 - -6.808e-05 | 0.7425 | 0.7399 - 0.7454 |
| 10. GG→GA (CC→TC) | -6.618e-05 | -6.573e-05 - -6.664e-05 | 0.7081 | 0.7049 - 0.7114 |
| 11. TT→CT (AA→AG) | -5.828e-05 | -5.770e-05 - -5.884e-05 | 0.7672 | 0.7641 - 0.7703 |
| 12. TC→CC (GA→GG) | -4.745e-05 | -4.718e-05 - -4.771e-05 | 0.6333 | 0.6296 - 0.6370 |
| 13. CG→CT (CG→AG) | 3.809e-05 | 3.676e-05 - 3.950e-05 | 2.2765 | 2.2495 - 2.3054 |
| 14. CT→CC (AG→GG) | -3.210e-05 | -3.144e-05 - -3.275e-05 | 0.8438 | 0.8402 - 0.8479 |
| 15. GNG→ANG (CNC→CNT) | 2.937e-05 | 2.857e-05 - 3.014e-05 | 1.1720 | 1.1682 - 1.1759 |
| 16. CG→GG (CG→CC) | 2.462e-05 | 2.358e-05 - 2.564e-05 | 1.9887 | 1.9605 - 2.0182 |
| 17. AT→TT (AT→AA) | 2.382e-05 | 2.294e-05 - 2.468e-05 | 1.3335 | 1.3244 - 1.3433 |
| 18. TA→TG (TA→CA) | 2.340e-05 | 2.218e-05 - 2.455e-05 | 1.0990 | 1.0942 - 1.1039 |
| 19. TNA→TNG (TNA→CNA) | -2.243e-05 | -2.214e-05 - -2.270e-05 | 0.7903 | 0.7866 - 0.7938 |
| 20. GG→AG (CC→CT) | 2.236e-05 | 2.105e-05 - 2.373e-05 | 1.0655 | 1.0620 - 1.0692 |
| 21. GNG→GNA (CNC→TNC) | 1.910e-05 | 1.839e-05 - 1.976e-05 | 1.1173 | 1.1134 - 1.1217 |
| 22. TNG→TNA (CNA→TNA) | -1.873e-05 | -1.833e-05 - -1.913e-05 | 0.8679 | 0.8648 - 0.8713 |
| 23. TNT→CNT (ANA→ANG) | -1.844e-05 | -1.802e-05 - -1.887e-05 | 0.8750 | 0.8715 - 0.8783 |
| 24. CNT→CNC (ANG→GNG) | 1.733e-05 | 1.670e-05 - 1.801e-05 | 1.1524 | 1.1478 - 1.1575 |
| 25. TG→TC (CA→GA) | -1.692e-05 | -1.660e-05 - -1.721e-05 | 0.7668 | 0.7612 - 0.7723 |
| 26. GNT→ANT (ANC→ANT) | -1.673e-05 | -1.630e-05 - -1.714e-05 | 0.8848 | 0.8814 - 0.8884 |
| 27. GT→TT (AC→AA) | 1.630e-05 | 1.552e-05 - 1.709e-05 | 1.2006 | 1.1920 - 1.2093 |
| 28. TG→GG (CA→CC) | 1.610e-05 | 1.530e-05 - 1.682e-05 | 1.2087 | 1.1999 - 1.2166 |
| 29. CT→AT (AG→AT) | -1.535e-05 | -1.500e-05 - -1.568e-05 | 0.7949 | 0.7894 - 0.8006 |
| 30. TGG→CGG (CCA→CCG) | -1.482e-05 | -1.468e-05 - -1.496e-05 | 0.7151 | 0.7116 - 0.7185 |
| 31. TA→TT (TA→AA) | 1.343e-05 | 1.273e-05 - 1.411e-05 | 1.2402 | 1.2297 - 1.2504 |
| 32. TC→TA (GA→TA) | -1.182e-05 | -1.149e-05 - -1.214e-05 | 0.8167 | 0.8109 - 0.8236 |
| 33. GNT→TNT (ANC→ANA) | 1.175e-05 | 1.135e-05 - 1.223e-05 | 1.2551 | 1.2474 - 1.2636 |
| 34. TTG→CTG (CAA→CAG) | 1.173e-05 | 1.136e-05 - 1.208e-05 | 1.2408 | 1.2347 - 1.2469 |
| 35. TT→AT (AA→AT) | -1.172e-05 | -1.147e-05 - -1.200e-05 | 0.7803 | 0.7727 - 0.7868 |
| 36. TC→TG (GA→CA) | 1.163e-05 | 1.093e-05 - 1.239e-05 | 1.1358 | 1.1279 - 1.1433 |
| 37. TNC→TNT (GNA→ANA) | -1.152e-05 | -1.099e-05 - -1.204e-05 | 0.9305 | 0.9273 - 0.9337 |
| 38. GT→GC (AC→GC) | 1.090e-05 | 9.893e-06 - 1.202e-05 | 1.0588 | 1.0539 - 1.0639 |
| 39. TT→TG (AA→CA) | 1.033e-05 | 9.639e-06 - 1.105e-05 | 1.1141 | 1.1073 - 1.1212 |
| 40. GGC→GGT (GCC→ACC) | 9.916e-06 | 9.571e-06 - 1.026e-05 | 1.2075 | 1.2018 - 1.2140 |
| 41. GC→GG (GC→CC) | -9.655e-06 | -9.383e-06 - -9.932e-06 | 0.7853 | 0.7772 - 0.7938 |
| 42. CT→GT (AG→AC) | 9.557e-06 | 8.870e-06 - 1.027e-05 | 1.0969 | 1.0904 - 1.1033 |
| 43. TT→TA (AA→TA) | -9.479e-06 | -9.164e-06 - -9.822e-06 | 0.8338 | 0.8268 - 0.8413 |
| 44. TC→GC (GA→GC) | -9.314e-06 | -9.093e-06 - -9.545e-06 | 0.7587 | 0.7509 - 0.7677 |
| 45. GNG→CNG (CNC→CNG) | 9.304e-06 | 8.933e-06 - 9.707e-06 | 1.2409 | 1.2320 - 1.2497 |
| 46. GNNG→GNNA (CNNC→TNNC) | 9.232e-06 | 8.864e-06 - 9.596e-06 | 1.1145 | 1.1104 - 1.1185 |
| 47. GG→TG (CC→CA) | -8.903e-06 | -8.530e-06 - -9.274e-06 | 0.8544 | 0.8467 - 0.8619 |
| 48. TNNG→TNNA (CNNA→TNNA) | -8.708e-06 | -8.482e-06 - -8.924e-06 | 0.8822 | 0.8787 - 0.8856 |
| 49. TG→TT (CA→AA) | -8.545e-06 | -8.105e-06 - -9.015e-06 | 0.9046 | 0.8987 - 0.9102 |
| 50. GG→CG (CC→CG) | -7.914e-06 | -7.543e-06 - -8.279e-06 | 0.8664 | 0.8596 - 0.8745 |
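The rows appear to be ordered by the absolute value of context bias rather than its signed value, and every row with negative context bias has a relative abundance below 1. The sketch below checks both observations on six rows copied from the table; the `(pattern, context bias, relative abundance)` tuple layout is an illustrative choice, not the authors' data format.

```python
# (pattern, context bias, relative abundance) for six rows of the table above.
rows = [
    ("TG→TA", -1.025e-04, 0.6367),
    ("CG→TG", 2.926e-03, 7.5809),
    ("TNG→CNG", 7.681e-05, 1.5309),
    ("AT→GT", 2.148e-04, 1.5444),
    ("CT→TT", -9.985e-05, 0.6136),
    ("GG→CG", -7.914e-06, 0.8664),
]

# Order by the magnitude of context bias, largest first.
ranked = sorted(rows, key=lambda r: abs(r[1]), reverse=True)
print([name for name, _, _ in ranked])

# Over-represented patterns (relative abundance > 1) carry positive bias,
# under-represented ones (relative abundance < 1) carry negative bias.
for name, bias, abundance in rows:
    assert (bias > 0) == (abundance > 1), name
```

The resulting order (rows 1, 2, 4, 5, 6 and 50) matches the numbering in the table.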
Figure 1. Total context bias in transposon sequences along the human lineage after the divergence from chimpanzee. Real data is shown in black, while values for a corresponding no-bias control are in orange. Context bias is greatest at the 2 bp scale, with CG→TG having the largest contribution. Total bias then drops, but remains level in 3-5 bp patterns rather than continuing to decrease.
Total context bias in transposons and non-repeats, and subsets near or far from genes.
| Data set | 2 bp | 3 bp | 4 bp | 5 bp |
|---|---|---|---|---|
| *Real data* | | | | |
| Transposon | 1.2622e-2 | 2.5405e-3 | 1.8575e-3 | 1.7409e-3 |
| Transposon, near | 1.2981e-2 | 2.7622e-3 | 2.2495e-3 | 2.4520e-3 |
| Transposon, far | 1.1799e-2 | 2.4298e-3 | 1.6990e-3 | 1.3874e-3 |
| Non-repeat | 9.0816e-3 | 2.1532e-3 | 1.3989e-3 | 1.0385e-3 |
| Non-repeat, near | 7.5286e-3 | 2.1473e-3 | 1.4786e-3 | 1.2069e-3 |
| Non-repeat, far | 9.3789e-3 | 2.2102e-3 | 1.4290e-3 | 1.0287e-3 |
| LINEs | 9.7298e-3 | 2.1798e-3 | 1.4683e-3 | 1.1465e-3 |
| SINEs | 1.8042e-2 | 3.6693e-3 | 3.6117e-3 | 4.3391e-3 |
| *No-bias control* | | | | |
| Transposon | 4.3907e-5 | 7.4125e-5 | 1.1806e-4 | 1.9323e-4 |
| Transposon, near | 7.4565e-5 | 1.7935e-4 | 3.1243e-4 | 5.0755e-4 |
| Transposon, far | 3.7193e-5 | 7.4758e-5 | 1.2081e-4 | 2.0596e-4 |
| Non-repeat | 3.1367e-5 | 7.1670e-5 | 1.1779e-4 | 1.9054e-4 |
| Non-repeat, near | 8.3869e-5 | 1.3339e-4 | 2.6857e-4 | 4.1832e-4 |
| Non-repeat, far | 4.1651e-5 | 7.3881e-5 | 1.2627e-4 | 1.8502e-4 |
| LINEs | 5.6367e-5 | 1.0365e-4 | 1.8279e-4 | 2.8490e-4 |
| SINEs | 7.2458e-5 | 1.4746e-4 | 2.4695e-4 | 3.9894e-4 |
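The table lists each data set twice. Assuming the second, much smaller block of values is the no-bias control mentioned in the Figure 1 caption (an inference from the values, not stated in the table itself), the ratio of observed to control bias gives a rough sense of how far each scale rises above the control. The sketch below computes that ratio for three rows; `signal_to_control` is a hypothetical helper.

```python
# Totals for 2, 3, 4 and 5 bp patterns, copied from the table above.
# ASSUMPTION: the second block of rows is the no-bias control from Figure 1.
observed = {
    "Transposon": (1.2622e-2, 2.5405e-3, 1.8575e-3, 1.7409e-3),
    "Non-repeat": (9.0816e-3, 2.1532e-3, 1.3989e-3, 1.0385e-3),
    "SINEs":      (1.8042e-2, 3.6693e-3, 3.6117e-3, 4.3391e-3),
}
control = {
    "Transposon": (4.3907e-5, 7.4125e-5, 1.1806e-4, 1.9323e-4),
    "Non-repeat": (3.1367e-5, 7.1670e-5, 1.1779e-4, 1.9054e-4),
    "SINEs":      (7.2458e-5, 1.4746e-4, 2.4695e-4, 3.9894e-4),
}


def signal_to_control(name: str) -> list[float]:
    """Ratio of observed to control total context bias at each pattern size."""
    return [o / c for o, c in zip(observed[name], control[name])]


for name in observed:
    print(name, [round(r, 1) for r in signal_to_control(name)])
```

Under that reading, the observed totals stay several-fold above the control at every size for these rows, although the margin narrows as patterns get longer.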
Figure 2. Comparison of context bias in different types of sequence. Bars represent 95% confidence intervals. At pattern sizes from 2-5 bp, context bias is greater in transposon sequence than in non-repetitive sequence (A). Among transposons, context bias is greater near genes than far from them (B). Conversely, context bias is never significantly greater near genes in non-repetitive sequence, with markedly higher context bias far from genes for 2 bp patterns (C). Context bias also differs between the most common classes of transposons, LINEs and SINEs (D).
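The point estimates in the table of total context bias can be compared directly with the statements about panels A, B and C. This sketch does only that; the table carries no confidence intervals, so it cannot test significance. `higher_at_each_size` is a hypothetical helper.

```python
# Real-data totals for 2, 3, 4 and 5 bp patterns, copied from the table above.
observed = {
    "Transposon":       (1.2622e-2, 2.5405e-3, 1.8575e-3, 1.7409e-3),
    "Transposon, near": (1.2981e-2, 2.7622e-3, 2.2495e-3, 2.4520e-3),
    "Transposon, far":  (1.1799e-2, 2.4298e-3, 1.6990e-3, 1.3874e-3),
    "Non-repeat":       (9.0816e-3, 2.1532e-3, 1.3989e-3, 1.0385e-3),
    "Non-repeat, near": (7.5286e-3, 2.1473e-3, 1.4786e-3, 1.2069e-3),
    "Non-repeat, far":  (9.3789e-3, 2.2102e-3, 1.4290e-3, 1.0287e-3),
}


def higher_at_each_size(a: str, b: str) -> list[bool]:
    """For each pattern size, is the point estimate for `a` larger than for `b`?"""
    return [x > y for x, y in zip(observed[a], observed[b])]


print(higher_at_each_size("Transposon", "Non-repeat"))            # panel A
print(higher_at_each_size("Transposon, near", "Transposon, far"))  # panel B
print(higher_at_each_size("Non-repeat, far", "Non-repeat, near"))  # panel C
```

Panels A and B hold at every size. For non-repeats, the far-from-gene estimate is higher only at 2 and 3 bp; the slightly larger near-gene estimates at 4 and 5 bp do not contradict the caption, whose claim concerns significance and rests on the intervals shown in the figure.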