Vera Afreixo, Carlos A C Bastos, Armando J Pinho, Sara P Garcia, Paulo J S G Ferreira.
Abstract
MOTIVATION: DNA sequences can be represented by sequences of four symbols, but it is often useful to convert the symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but they seem unrelated to any intrinsic characteristic of DNA. The objective of this work was to find a mapping scheme that is directly related to DNA characteristics and useful in discriminating between different species. Mathematical models that explore DNA correlation structures may contribute to a better knowledge of DNA and help to find a concise DNA description.
Year: 2009 PMID: 19759198 PMCID: PMC2778338 DOI: 10.1093/bioinformatics/btp546
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Inter-nucleotide distances for the gi|33286443|ref|NM_032427.1 gene of Homo sapiens.
Fig. 2. Distribution of the four nucleotide distance sequences for gene gi|33286443|ref|NM_032427.1 of H. sapiens. The histogram shows the observed distances and the solid line shows the reference distribution with parameters estimated from the data.
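The distance sequences shown in Figs 1 and 2 can be illustrated with a short sketch. It assumes that the inter-nucleotide distance is the gap, in positions, between consecutive occurrences of the same symbol; the function names and the choice to normalise by the total number of distances are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def inter_nucleotide_distances(seq):
    """Map each symbol to the list of gaps between its consecutive occurrences.

    Assumption: the distance is the difference between the positions of
    two successive occurrences of the same symbol.
    """
    last_seen = {}
    distances = {}
    for pos, sym in enumerate(seq.upper()):
        gaps = distances.setdefault(sym, [])
        if sym in last_seen:
            gaps.append(pos - last_seen[sym])
        last_seen[sym] = pos
    return distances


def relative_frequencies(dist_list, max_distance=100):
    """Relative frequency of each distance 1..max_distance.

    Assumption: frequencies are normalised by the total number of
    observed distances, including those larger than max_distance.
    """
    counts = Counter(dist_list)
    total = len(dist_list)
    if total == 0:
        return [0.0] * max_distance
    return [counts.get(k, 0) / total for k in range(1, max_distance + 1)]


if __name__ == "__main__":
    d = inter_nucleotide_distances("ACGTAACA")
    print(d["A"])                           # gaps between successive A's
    print(relative_frequencies(d["A"], 4))  # histogram over distances 1..4
```

Keeping one distance list per nucleotide mirrors the four separate sequences plotted in Fig. 2.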
List of DNA builds used for each species
| Species | Reference |
|---|---|
| | Build 36.3 |
| | Build 2.1 |
| | Build 1.1 |
| | Build 37.1 |
| | Build 4.1 |
| | Build 2.1 |
| | Build 2.1 |
| | Build 4.1 |
| | Build 1.1 |
| | Build 2.1 |
| | Build 4.1 |
| | Build 3.1 |
| | Build 4.1 |
| | NC003279 |
| | Build 1.1 |
| | Build 1.0 |
| | AGI 7.2 |
| | SGD 1 |
| | Build 1.1 |
| | Build 2.1 |
| | Build 2.1 |
| | NC000913 |
| | NC000964 |
| | NC000117 |
| | NC000908 |
| | NC004350 |
| | NC011900 |
| | NC000854 |
P-values from the Kolmogorov–Smirnov test comparing the relative frequency distributions of inter-nucleotide distances between the chromosomes of H. sapiens
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | CX | CY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.4 | 1.0 | 0.9 | 1.0 | 1.0 | 0.3 |
| C2 | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 1.0 | 0.3 | 0.9 | 0.8 | 0.9 | 1.0 | 0.2 |
| C3 | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 0.9 | 1.0 | 1.0 | 0.8 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.3 | 0.8 | 0.7 | 0.8 | 1.0 | 0.1 |
| C4 | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 1.0 | 0.3 | 1.0 | 0.9 | 1.0 | 1.0 | 0.2 |
| C5 | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.3 | 1.0 | 0.8 | 1.0 | 1.0 | 0.1 |
| C6 | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.3 | 1.0 | 0.8 | 1.0 | 1.0 | 0.2 |
| C7 | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.9 | 1.0 | 1.0 | 0.7 | 1.0 | 1.0 | 1.0 | 1.0 | 0.2 |
| C8 | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 1.0 | 0.3 | 1.0 | 1.0 | 1.0 | 1.0 | 0.2 |
| C9 | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 1.0 | 0.6 | 1.0 | 0.9 | 0.9 | 1.0 | 0.1 |
| C10 | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.8 | 1.0 | 1.0 | 0.4 | 1.0 | 1.0 | 1.0 | 1.0 | 0.2 |
| C11 | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.8 | 1.0 | 1.0 | 0.4 | 1.0 | 0.9 | 0.9 | 1.0 | 0.2 |
| C12 | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.4 | 1.0 | 0.9 | 1.0 | 1.0 | 0.2 |
| C13 | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.3 | 1.0 | 0.9 | 1.0 | 1.0 | 0.2 |
| C14 | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 0.3 | 1.0 | 0.8 | 1.0 | 1.0 | 0.1 |
| C15 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 0.8 | 1.0 | 1.0 | 0.2 | 1.0 | 0.9 | 1.0 | 1.0 | 0.1 |
| C16 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 0.8 | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 1.0 | 0.6 |
| C17 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 0.6 | 1.0 | 1.0 | 1.0 | 1.0 | 0.2 |
| C18 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 0.3 | 1.0 | 1.0 | 1.0 | 1.0 | 0.1 |
| C19 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 0.6 | 0.8 | 0.8 | 0.7 | 0.9 |
| C20 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 1.0 | 0.2 |
| C21 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 1.0 | 0.6 |
| C22 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 1.0 | 0.2 |
| CX | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 | 0.3 |
| CY | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 1.0 |
Only the first 100 distances were used.
P-values from the Kolmogorov–Smirnov test comparing the relative frequency distributions of inter-nucleotide distances between nucleotides in H. sapiens
| | A | C | G | T |
|---|---|---|---|---|
| A | 1 | 0.00032 | 0.00032 | 1 |
| C | – | 1 | 1 | 0.00058 |
| G | – | – | 1 | 0.00058 |
| T | – | – | – | 1 |
Only the first 100 distances were used.
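A pairwise comparison of this kind can be sketched as follows. The record does not state how the test was applied to relative frequencies, so this is only one plausible reading: the statistic is the largest gap between the two cumulative relative frequency curves, and the p-value uses the standard asymptotic Kolmogorov series. The sample sizes `n` and `m` are assumptions (for example, the 100 distances retained), as are the function names.

```python
import math
from itertools import accumulate


def ks_statistic(freq_a, freq_b):
    """Largest absolute gap between two cumulative relative frequency curves."""
    cdf_a = accumulate(freq_a)
    cdf_b = accumulate(freq_b)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def ks_asymptotic_p_value(d, n, m, terms=100):
    """Asymptotic two-sample p-value, Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2).

    Assumption: n and m are the effective sample sizes of the two
    distributions; the source does not say which values were used.
    """
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here, and Q(lam) equals 1 to
        # within about 1e-12 for lam below 0.2.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    a = [0.4, 0.3, 0.2, 0.1]
    b = [0.1, 0.2, 0.3, 0.4]
    d = ks_statistic(a, b)
    print(d, ks_asymptotic_p_value(d, n=100, m=100))
```

A statistics library would normally be used for the test itself; the hand-written version only makes the two steps explicit.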
Fig. 3. Relative error for the nucleotide distance distribution in the complete genome of H. sapiens. For convenience, only the first 40 distances are displayed.
Fig. 4. Relative error for the global distance distribution in the complete genome of H. sapiens. For convenience, only the first 40 distances are displayed.
Fig. 5. Relative error for the nucleotide distance distribution in the coding regions of the H. sapiens genome. For convenience, only the first 40 distances are displayed.
Fig. 6. Relative error for the global distance distribution in the coding regions of the H. sapiens genome. For convenience, only the first 40 distances are displayed.
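The relative error plotted in Figs 3 to 6 can be sketched under two assumptions that this record does not confirm: that the reference distribution is geometric (the distribution of gaps in a sequence of independent symbols), and that the relative error is (observed - reference) / reference at each distance. The parameter estimate `1 / mean` and all function names are likewise illustrative.

```python
def estimate_p(distances):
    """Estimate the geometric parameter as 1 / mean distance (assumed estimator)."""
    return len(distances) / sum(distances)


def geometric_reference(p, max_distance):
    """Geometric probabilities P(D = k) = p * (1 - p)**(k - 1), k = 1..max_distance.

    Assumption: the reference distribution is geometric.
    """
    return [p * (1.0 - p) ** (k - 1) for k in range(1, max_distance + 1)]


def relative_error(observed, reference):
    """Pointwise (observed - reference) / reference (assumed definition)."""
    return [(o - r) / r for o, r in zip(observed, reference)]


if __name__ == "__main__":
    gaps = [4, 1, 2, 1]                    # toy distance sequence
    p = estimate_p(gaps)                   # 4 / 8
    reference = geometric_reference(p, 4)
    observed = [0.5, 0.25, 0.0, 0.25]      # relative frequencies of gaps 1..4
    print(relative_error(observed, reference))
```

Positive values mark distances that occur more often than the reference predicts, negative values mark distances that occur less often.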
P-values from the Kolmogorov–Smirnov test comparing the relative frequency distributions of inter-nucleotide distances between nucleotides in H. sapiens coding regions
| | A | C | G | T |
|---|---|---|---|---|
| A | 1 | 0.26055 | 0.00058 | 0.19304 |
| C | – | 1 | 0.00058 | 0.00103 |
| G | – | – | 1 | 0.00000 |
| T | – | – | – | 1 |
Only the first 100 distances were used.
Fig. 7. Absolute value of the DFT of the relative error in the coding regions of the H. sapiens genome.
Fig. 8. Absolute value of the DFT of the relative error in the complete genome of H. sapiens.
Fig. 9. Phylogenetic tree with the species used in this study.
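The quantity plotted in Figs 7 and 8, the absolute value of the DFT of a relative error sequence, can be computed with a direct implementation of the transform. The sketch below uses only the standard library; the function name and the toy input are illustrative, and any windowing or normalisation applied in the paper is not known from this record.

```python
import cmath


def dft_magnitude(signal):
    """Absolute value of the discrete Fourier transform of a real sequence.

    Direct O(n^2) evaluation of X[k] = sum_t x[t] * exp(-2j*pi*k*t/n).
    """
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


if __name__ == "__main__":
    # Toy input with period 3, chosen only so the peak position is easy
    # to check: for n = 12 the energy sits at k = 0, 4 and 8.
    toy = [1.0, 0.0, 0.0] * 4
    for k, mag in enumerate(dft_magnitude(toy)):
        print(k, round(mag, 6))
```

For long sequences an FFT routine would be the practical choice; the direct sum is used here to keep the example self-contained.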