Maria A Korotkova, Nikolay A Kudryashov, Eugene V Korotkov.
Abstract
The concept of a phase shift of triplet periodicity (TP) was used to search for potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detecting these insertions has been developed. This approach can detect potential insertions and deletions whose lengths are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases). A new similarity measure between triplet matrices was employed to improve the sensitivity of TP phase shift detection. Sequences of 17,220 bacterial genes, each consisting of more than 1,200 bases, were analyzed, and a TP phase shift was found in ∼16% of the analyzed genes (2,809 genes), which is about 4 times more than was detected in our previous work. We propose that shifts of the TP phase may indicate shifts of the reading frame in genes after insertions of DNA fragments whose lengths are not multiples of three bases. The relationship between TP phase shifts and frame shifts in genes is discussed.
Year: 2011 PMID: 22196359 PMCID: PMC5054449 DOI: 10.1016/S1672-0229(11)60019-3
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. The influence of a single-base insertion on the TP phase shift. The first three sequences show the reading frames T1, T2 and T3, respectively. The coding sequence S with TP is shown next; the nucleotide c was inserted into this sequence at position 19. An explicit periodicity was chosen for clarity. With “fuzzy” periodicity the situation is the same as in the figure, but the periodicity is difficult to observe visually. We then construct the TP matrices M1(1, 18), M1(19, 37), M2(19, 37) and M3(19, 37). The first matrix, M1(1, 18), is constructed for the DNA region from the 1st to the 18th base. The elements m1(i, j), m2(i, j) and m3(i, j) of these matrices give the number of bases a, t, c and g (index i) at each triplet position (index j) in the reading frames T1, T2 and T3. Comparing the matrix M1(1, 18) with the matrices M1(19, 37), M2(19, 37) and M3(19, 37) shows that it is most similar to M2(19, 37). The initial phases of the matrices M1, M2 and M3 in sequence S are 1, 2 and 3, because the bases of sequence S with indices k = 1, 2 and 3 are the first bases of a triplet in the reading frames T1, T2 and T3, respectively. Therefore, there is a TP phase shift of 1 base in sequence S after position x=18 (the difference between the initial phases of the matrices M2 and M1).
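The matrix construction described in the Figure 1 caption can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names `tp_matrix` and `l1_distance`, the toy sequence, and the sum of absolute differences used to compare matrices are all assumptions made here. The similarity measure actually used in the paper is not given in this record.

```python
BASES = "atcg"  # row order of the matrix (index i in the caption)

def tp_matrix(seq, start, end, frame):
    """Count matrix for seq[start..end] (1-based, inclusive).

    Rows are the bases a, t, c, g; columns are the three triplet positions
    in reading frame `frame` (1, 2 or 3), i.e. the frame in which the base
    with index `frame` is the first base of a triplet.
    """
    m = [[0, 0, 0] for _ in BASES]
    for pos in range(start, end + 1):
        j = (pos - frame) % 3            # 0-based triplet position
        m[BASES.index(seq[pos - 1])][j] += 1
    return m

def l1_distance(a, b):
    """Sum of absolute differences (an assumed stand-in measure)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Toy sequence with explicit periodicity: 18 bases, an inserted "c" at
# position 19, then 18 more periodic bases (37 bases in total).
seq = "atg" * 6 + "c" + "atg" * 6

reference = tp_matrix(seq, 1, 18, frame=1)
distances = {k: l1_distance(reference, tp_matrix(seq, 19, 37, frame=k))
             for k in (1, 2, 3)}
best_frame = min(distances, key=distances.get)
phase_shift = best_frame - 1             # difference of initial phases
print(distances, best_frame, phase_shift)
```

For this toy sequence the region after the insertion is closest to the reference in frame T2, which corresponds to the 1-base phase shift described in the caption.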
Figure 2. Calculation of the matrix M1(x1−L1+1, x1) and the matrices M(x2+k, x2+L1+k−1), k=1, 2, 3, in sequence S. A. The positions of the regions of length L1 in sequence S. B. An example of calculating the matrix M1(x1−L1+1, x1) and the matrices M(x2+k, x2+L1+k−1) for L1=21, x1=21, x2=30. The inserted fragment begins at the 22nd base and ends at the 31st base of the DNA sequence. The inserted fragment, shown in bold, has a different TP type from the rest of the DNA sequence. The matrix M1(1, 21) differs from the matrices M1(31, 51) and M3(33, 53) but is similar to the matrix M2(32, 52).
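The window arithmetic in the Figure 2 caption can be illustrated with a short Python sketch. It uses the caption's values L1=21, x1=21, x2=30, but the sequence itself is a made-up toy (a periodic "atg" repeat with a 10-base "c" run inserted at positions 22 to 31), and the helper names and the absolute-difference comparison are assumptions, not the paper's similarity measure.

```python
BASES = "atcg"

def window_matrix(seq, start, end):
    """Count matrix for seq[start..end] (1-based, inclusive), with the
    first base of the window taken as triplet position 1."""
    m = [[0, 0, 0] for _ in BASES]
    for pos in range(start, end + 1):
        m[BASES.index(seq[pos - 1])][(pos - start) % 3] += 1
    return m

def l1_distance(a, b):
    """Sum of absolute differences (an assumed stand-in measure)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

L1, x1, x2 = 21, 21, 30                        # values from the caption
seq = "atg" * 7 + "c" * 10 + "atg" * 8         # hypothetical toy sequence

left = window_matrix(seq, x1 - L1 + 1, x1)     # M1(1, 21)
right = {k: window_matrix(seq, x2 + k, x2 + L1 + k - 1) for k in (1, 2, 3)}
distances = {k: l1_distance(left, m) for k, m in right.items()}
best_k = min(distances, key=distances.get)
print(distances, best_k)
```

Here the three right-hand windows cover positions 31 to 51, 32 to 52 and 33 to 53, and the window starting at x2+2 reproduces the left-hand matrix, as in the caption's example.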
Figure 3. The density distribution of Z12 for different values of X11. Symbols: • for X11=0; ○ for X11=6.0; ▼ for X11=12.0.
List of the prokaryotic genomes used for searching the genes with triplet periodicity shifts
| No. | Genome | No. of analyzed genes (>1,200 bp) | Q1 | Q2 | Q3 |
|---|---|---|---|---|---|
| 1 | | 611 | 5 | 30 | 3 |
| 2 | | 1,306 | 43 | 140 | 29 |
| 3 | | 885 | 42 | 108 | 34 |
| 4 | | 1,380 | 116 | 240 | 103 |
| 5 | | 937 | 50 | 145 | 35 |
| 6 | | 1,158 | 101 | 237 | 75 |
| 7 | | 444 | 16 | 49 | 14 |
| 8 | | 854 | 41 | 111 | 38 |
| 9 | | 1,566 | 38 | 162 | 29 |
| 10 | | 626 | 28 | 91 | 24 |
| 11 | | 1,187 | 109 | 227 | 86 |
| 12 | | 507 | 27 | 82 | 25 |
| 13 | | 1,183 | 175 | 286 | 141 |
| 14 | | 1,200 | 94 | 220 | 68 |
| 15 | | 1,047 | 61 | 176 | 41 |
| 16 | | 1,245 | 85 | 253 | 62 |
| 17 | | 1,084 | 119 | 252 | 98 |
| Total | | 17,220 | 1,150 | 2,809 | 905 |
Note: Q1, Q2 and Q3 are the numbers of genes longer than 1,200 bp in which a TP phase shift was revealed by the previously developed method, by the method developed in the present work, and by both methods, respectively.
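As a quick arithmetic check on the table totals, the shares can be computed directly. This is only a sketch: the variable names are chosen here for illustration and the numbers are copied from the "Total" row.

```python
# Totals copied from the "Total" row of the table.
total_genes = 17_220
q1, q2, q3 = 1_150, 2_809, 905

share_q2 = q2 / total_genes    # genes with a shift found by the present method
share_q1 = q1 / total_genes    # genes with a shift found by the earlier method
overlap_of_q1 = q3 / q1        # earlier detections also found by the present method

print(f"Q2 share: {share_q2:.1%}")
print(f"Q1 share: {share_q1:.1%}")
print(f"Q3 as a share of Q1: {overlap_of_q1:.1%}")
```

The Q2 share comes out at roughly 16%, in line with the figure quoted in the abstract.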
Figure 4. Dependence of mZ12 on x1 (A) and x2 (B) for the gene encoding transaldolase B from the genome of E. coli (b0008 in the KEGG database).
Figure 5. Dependence of mZ12 on x1 (A) and x2 (B) for the gene encoding transaldolase B from the genome of E. coli (b0008 in the KEGG database) with an insertion of 298 nucleotides after the 300th nucleotide.