Wang-Ren Qiu, Xuan Xiao, Kuo-Chen Chou.
Abstract
Meiosis and recombination are the two opposite aspects that coexist in a DNA system. As a driving force for evolution that generates natural genetic variation, meiotic recombination plays a very important role in the formation of eggs and sperm. Interestingly, recombination does not occur randomly across a genome: it occurs with higher probability in some genomic regions called "hotspots" and with lower probability in so-called "coldspots". With the ever-increasing amount of genome sequence data in the postgenomic era, computational methods for effectively identifying hotspots and coldspots are urgently needed, because they can provide timely insights into the mechanism of meiotic recombination and the process of genome evolution. To meet this need, we developed a new predictor called "iRSpot-TNCPseAAC", in which a DNA sample is formulated by combining its trinucleotide composition (TNC) with the pseudo amino acid components (PseAAC) of the protein translated from the DNA sample according to the genetic code. The former incorporates the local or short-range sequence order information of the sample, while the latter incorporates its global or long-range sequence order information. Compared with the best existing predictor in this area, iRSpot-TNCPseAAC achieved higher accuracy, Matthews correlation coefficient, and sensitivity, indicating that the new predictor may become a useful tool for identifying recombination hotspots and coldspots, or at least a complementary tool to the existing methods. It has not escaped our notice that this novel approach to incorporating DNA sequence order information into a discrete model may also be used for many other genome analysis problems. The web-server for iRSpot-TNCPseAAC is available at http://www.jci-bioinfo.cn/iRSpot-TNCPseAAC.
Furthermore, for the convenience of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without having to follow the complicated mathematical equations.
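As an illustration of the trinucleotide composition (TNC) mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes that trinucleotides are counted with an overlapping sliding window and normalized to frequencies; the function name `trinucleotide_composition` and the example sequence are hypothetical.

```python
from itertools import product


def trinucleotide_composition(dna: str) -> dict[str, float]:
    """Return the normalized frequencies of the 64 trinucleotides in `dna`.

    Assumption: overlapping windows of length 3 with step 1.
    """
    seq = dna.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous bases such as N
            counts[tri] += 1
    total = sum(counts.values())
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


# Hypothetical 12-base example: 10 overlapping windows, all distinct.
tnc = trinucleotide_composition("ATGGCCATTGTA")
print(len(tnc), tnc["ATG"])
```

The result is a 64-dimensional vector, one component per trinucleotide, which captures only short-range order; the PseAAC part of the formulation is not covered by this sketch.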
Year: 2014 PMID: 24469313 PMCID: PMC3958819 DOI: 10.3390/ijms15021746
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. An illustration to show the process of meiosis and recombination in a DNA system. Adapted from [2].
Figure 2. A graph to show how a DNA codon of three nucleotides is converted to an amino acid. The characters in the first three rings from the center represent the four bases in DNA, while those in the fourth ring represent the single-letter codes of the 20 native amino acids in protein. The symbol * means the “Stop” sign.
Table 1. The conversion code of the 64 trinucleotides in DNA to the 20 amino acids in protein.
| Trinucleotide | Amino acid |
|---|---|
| AAA | Lys (K) |
| AAC | Asn (N) |
| AAG | Lys (K) |
| AAT | Asn (N) |
| ACA | Thr (T) |
| ACC | Thr (T) |
| ACG | Thr (T) |
| ACT | Thr (T) |
| AGA | Arg (R) |
| AGC | Ser (S) |
| AGG | Arg (R) |
| AGT | Ser (S) |
| ATA | Ile (I) |
| ATC | Ile (I) |
| ATG | Met (M) |
| ATT | Ile (I) |
| CAA | Gln (Q) |
| CAC | His (H) |
| CAG | Gln (Q) |
| CAT | His (H) |
| CCA | Pro (P) |
| CCC | Pro (P) |
| CCG | Pro (P) |
| CCT | Pro (P) |
| CGA | Arg (R) |
| CGC | Arg (R) |
| CGG | Arg (R) |
| CGT | Arg (R) |
| CTA | Leu (L) |
| CTC | Leu (L) |
| CTG | Leu (L) |
| CTT | Leu (L) |
| GAA | Glu (E) |
| GAC | Asp (D) |
| GAG | Glu (E) |
| GAT | Asp (D) |
| GCA | Ala (A) |
| GCC | Ala (A) |
| GCG | Ala (A) |
| GCT | Ala (A) |
| GGA | Gly (G) |
| GGC | Gly (G) |
| GGG | Gly (G) |
| GGT | Gly (G) |
| GTA | Val (V) |
| GTC | Val (V) |
| GTG | Val (V) |
| GTT | Val (V) |
| TAA | Stop (*) |
| TAC | Tyr (Y) |
| TAG | Stop (*) |
| TAT | Tyr (Y) |
| TCA | Ser (S) |
| TCC | Ser (S) |
| TCG | Ser (S) |
| TCT | Ser (S) |
| TGA | Stop (*) |
| TGC | Cys (C) |
| TGG | Trp (W) |
| TGT | Cys (C) |
| TTA | Leu (L) |
| TTC | Phe (F) |
| TTG | Leu (L) |
| TTT | Phe (F) |
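The codon table above can be applied programmatically. The sketch below is illustrative only: it assumes translation starts at the first base (reading frame 0), drops trailing bases that do not fill a codon, writes stop codons as `*`, and writes codons with ambiguous bases as `X`. How the original predictor handles stop codons and reading frames is not stated here, so these choices, like the names `CODON_TABLE` and `translate`, are assumptions.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering of the
# first, second and third codon positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` codon by codon, starting at offset `frame`."""
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Unknown codons (for example, containing N) become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical example sequence ending in a stop codon.
protein = translate("ATGGCCATTGTATAA")
print(protein)  # MAIV*
```

The translated string is what a pseudo amino acid composition would then be computed from.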
Table 2. List of the original values of the six physical-chemical properties for each of the 20 native amino acids.
| Amino acid | Hydrophobicity (a) | Hydrophilicity (b) | Side-chain mass (c) | pK1 (d) | pK2 (e) | pI (f) |
|---|---|---|---|---|---|---|
| A | 0.62 | −0.50 | 15 | 2.35 | 9.87 | 6.11 |
| C | 0.29 | −1.00 | 47 | 1.71 | 10.78 | 5.02 |
| D | −0.90 | 3.00 | 59 | 1.88 | 9.60 | 2.98 |
| E | −0.74 | 3.00 | 73 | 2.19 | 9.67 | 3.08 |
| F | 1.19 | −2.50 | 91 | 2.58 | 9.24 | 5.91 |
| G | 0.48 | 0.00 | 1 | 2.34 | 9.60 | 6.06 |
| H | −0.40 | −0.50 | 82 | 1.78 | 8.97 | 7.64 |
| I | 1.38 | −1.80 | 57 | 2.32 | 9.76 | 6.04 |
| K | −1.50 | 3.00 | 73 | 2.20 | 8.90 | 9.47 |
| L | 1.06 | −1.80 | 57 | 2.36 | 9.60 | 6.04 |
| M | 0.64 | −1.30 | 75 | 2.28 | 9.21 | 5.74 |
| N | −0.78 | 0.20 | 58 | 2.18 | 9.09 | 10.76 |
| P | 0.12 | 0.00 | 42 | 1.99 | 10.60 | 6.30 |
| Q | −0.85 | 0.20 | 72 | 2.17 | 9.13 | 5.65 |
| R | −2.53 | 3.00 | 101 | 2.18 | 9.09 | 10.76 |
| S | −0.18 | 0.30 | 31 | 2.21 | 9.15 | 5.68 |
| T | −0.05 | −0.40 | 45 | 2.15 | 9.12 | 5.60 |
| V | 1.08 | −1.50 | 43 | 2.29 | 9.74 | 6.02 |
| W | 0.81 | −3.40 | 130 | 2.38 | 9.39 | 5.88 |
| Y | 0.26 | −2.30 | 107 | 2.20 | 9.11 | 5.63 |
(a) Taken from [98];
(b) Taken from [99];
(c) Taken from any biochemistry textbook;
(d) Taken from [100] for C-COOH;
(e) Taken from [100] for NH3;
(f) Taken from [101].
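Table 3. The corresponding values obtained by the standard conversion of Equation 12 on the original values in Table 2.
| Amino acid | Hydrophobicity | Hydrophilicity | Side-chain mass | pK1 | pK2 | pI |
|---|---|---|---|---|---|---|
| A | 0.62 | −0.15 | −1.55 | 0.78 | 0.77 | −0.10 |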
| C | 0.29 | −0.41 | −0.52 | −2.27 | 2.57 | −0.64 |
| D | −0.90 | 1.67 | −0.13 | −1.46 | 0.24 | −1.65 |
| E | −0.74 | 1.67 | 0.33 | 0.01 | 0.37 | −1.61 |
| F | 1.19 | −1.19 | 0.91 | 1.87 | −0.48 | −0.20 |
| G | 0.48 | 0.11 | −2.00 | 0.73 | 0.24 | −0.13 |
| H | −0.40 | −0.15 | 0.62 | −1.94 | −1.01 | 0.65 |
| I | 1.38 | −0.82 | −0.19 | 0.63 | 0.55 | −0.14 |
| K | −1.50 | 1.67 | 0.33 | 0.06 | −1.15 | 1.56 |
| L | 1.06 | −0.82 | −0.19 | 0.82 | 0.24 | −0.14 |
| M | 0.64 | −0.56 | 0.39 | 0.44 | −0.54 | −0.29 |
| N | −0.78 | 0.22 | −0.16 | −0.03 | −0.77 | 2.20 |
| P | 0.12 | 0.11 | −0.68 | −0.94 | 2.21 | −0.01 |
| Q | −0.85 | 0.22 | 0.29 | −0.08 | −0.69 | −0.33 |
| R | −2.53 | 1.67 | 1.23 | −0.03 | −0.77 | 2.20 |
| S | −0.18 | 0.27 | −1.03 | 0.11 | −0.65 | −0.32 |
| T | −0.05 | −0.10 | −0.58 | −0.18 | −0.71 | −0.36 |
| V | 1.08 | −0.67 | −0.65 | 0.49 | 0.51 | −0.15 |
| W | 0.81 | −1.65 | 2.17 | 0.92 | −0.18 | −0.22 |
| Y | 0.26 | −1.08 | 1.43 | 0.06 | −0.73 | −0.34 |
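Equation 12 itself is not reproduced in this record. The sketch below assumes it is the usual zero-mean, unit-standard-deviation conversion over the 20 amino acids, and it uses the sample standard deviation (denominator 19) because that choice reproduces the converted hydrophilicity column to two decimals; both points are inferences from the tabulated values, not statements from the paper. The function name `standard_conversion` is hypothetical.

```python
from statistics import mean, stdev

# Original hydrophilicity values, copied from the table of original values.
HYDROPHILICITY = {
    "A": -0.50, "C": -1.00, "D": 3.00, "E": 3.00, "F": -2.50,
    "G": 0.00, "H": -0.50, "I": -1.80, "K": 3.00, "L": -1.80,
    "M": -1.30, "N": 0.20, "P": 0.00, "Q": 0.20, "R": 3.00,
    "S": 0.30, "T": -0.40, "V": -1.50, "W": -3.40, "Y": -2.30,
}


def standard_conversion(values: dict[str, float]) -> dict[str, float]:
    """Shift to zero mean and scale by the sample standard deviation."""
    data = list(values.values())
    mu = mean(data)
    sd = stdev(data)  # assumption: n - 1 denominator
    return {aa: (v - mu) / sd for aa, v in values.items()}


converted = standard_conversion(HYDROPHILICITY)
for aa in "ADGW":
    print(aa, round(converted[aa], 2))
```

After such a conversion every property has zero mean over the 20 amino acids, so properties measured on very different scales (for example side-chain mass and pK values) contribute comparably to any correlation computed from them.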
Table 4. A comparison of iRSpot-TNCPseAAC with the best existing method.
| Predictor | Test method | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| iRSpot-PseDNC (a) | Jackknife | 73.06 | 89.49 | 82.04 | 0.638 |
| iRSpot-TNCPseAAC (b) | Jackknife | 87.14 | 79.59 | 83.72 | 0.671 |
(a) From [25];
(b) This paper with λ = 5, w = 1.1, C = 32 and γ = 0.5 for the LIBSVM operation engine [107,108].
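The abstract compares predictors by sensitivity, accuracy, and Matthews correlation coefficient. As a reminder of how such scores are conventionally derived from a confusion matrix, here is a minimal sketch; the counts in the example are hypothetical and are not the benchmark results reported above, and the function name `binary_metrics` is an assumption.

```python
from math import sqrt


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion counts.

    tp/fn: hotspots predicted correctly / missed;
    tn/fp: coldspots predicted correctly / mislabeled as hotspots.
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; 0.0 is used here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for 100 hotspots and 100 coldspots.
scores = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print({name: round(value, 3) for name, value in scores.items()})
```

Unlike accuracy, MCC stays informative when the two classes have different sizes, which is one reason it is commonly reported alongside sensitivity and specificity.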
Figure 3. A semi-screenshot of the top page of the web-server iRSpot-TNCPseAAC at http://www.jci-bioinfo.cn/iRSpot-TNCPseAAC.