Abstract
BACKGROUND: Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands.
Year: 2004 PMID: 15113412 PMCID: PMC394314 DOI: 10.1186/1471-2105-5-22
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Selectivity of four methods for the identification of compositionally atypical genes. Two sets were analysed: genes VCA0010 to VCA0230 (control group) and genes VCA0271 to VCA0491 (belonging to the integron island) from chromosome two of V. cholerae. For each gene, the indicators codon usage contrast (CU), δ* difference, dicodon usage (DC) and h(gene) (as introduced here) were determined as described, and the values were accumulated set-wise in histograms. Any position on a curve gives, on the two axes, the fraction of genes below the corresponding cut-off value.
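The curve construction described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy scores, the use of a strict "below" comparison, and the assignment of the control set to the first coordinate are all assumptions.

```python
from bisect import bisect_left


def fraction_below(sorted_scores, cutoff):
    """Fraction of scores strictly below the cut-off (input must be sorted)."""
    return bisect_left(sorted_scores, cutoff) / len(sorted_scores)


def selectivity_curve(control_scores, island_scores):
    """Return (cutoff, control_fraction, island_fraction) for every observed score.

    Each tuple is one point of the curve: the two fractions are the
    coordinates on the two axes for a shared cut-off value.
    """
    control = sorted(control_scores)
    island = sorted(island_scores)
    cutoffs = sorted(set(control) | set(island))
    return [
        (c, fraction_below(control, c), fraction_below(island, c))
        for c in cutoffs
    ]


# Toy indicator values (made up for illustration, not taken from the paper).
control = [0.1, 0.2, 0.3, 0.4]
island = [0.3, 0.5, 0.6, 0.7]
curve = selectivity_curve(control, island)
for cutoff, f_control, f_island in curve:
    print(f"{cutoff:.1f}  control={f_control:.2f}  island={f_island:.2f}")
```

An indicator separates the two sets well when, for some cut-off, one fraction is already close to 1 while the other is still close to 0.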
Figure 2. Plot of GCB-scores versus CU-contrast values for all genes of E. coli K-12 and the classification of compositionally atypical genes. For all genes of the genomic data set, the two parameters were determined, converted to z-values and plotted as small dots. A high GCB-score is an indicator of adaptation to translational efficiency. Genes annotated as putatively alien according to the classification CA and/or by using the MPW approach were labelled. The set CA AND MPW consists of those genes identified as compositionally atypical by both methods.
Fraction of compositionally atypical genes in microbial genomes. The numbers in the column CU are as in [21]; column MPW gives the fraction of CA genes as determined by the MPW approach described here.
| CU | MPW |
| | 15.7 |
| | 14.8 |
| | 14.2 |
| | 13.3 |
| | 13.0 |
| | 12.9 |
| 11.6 | 12.9 |
| | 12.9 |
| | 12.6 |
| 12.8 | 12.5 |
| | 11.9 |
| | 10.9 |
| | 10.9 |
| | 10.8 |
| | 10.6 |
| | 10.2 |
| | 10.2 |
| 7.5 | 10.1 |
| | 9.7 |
| | 9.4 |
| | 9.0 |
| | 8.9 |
| | 8.5 |
| | 8.1 |
| | 7.9 |
| | 7.7 |
| | 7.7 |
| | 7.4 |
| | 7.4 |
| | 6.8 |
| | 6.4 |
| | 6.3 |
| 3.2 | 6.1 |
| | 6.1 |
| 6.4 | 6.0 |
| | 5.9 |
| | 6.1 |
| 4.5 | 5.9 |
| 16.6 | 5.6 |
| | 5.4 |
| | 5.1 |
| | 5.0 |
Genomic islands in the genome of Bacillus subtilis.
| PHX: ribosomal proteins | 108–155 | |||
| P1 prophage | 202–220 | 202–223 | 202–213 | |
| Surfactin | 402–410 | |||
| P2 prophage | 529–570 | 529- | 555–567 | Bacteria |
| 570–600 | -587 | -- | Bacteria, | |
| P3 prophage | 651–664 | 653–664 | -- | |
| Site-specific recombinase | 738–747 | 737–746 | -- | |
| 752–782 | ||||
| Multidrug-efflux transporter | 818–822 | -- | ||
| -- | 1124–1130 | -- | ||
| P4 prophage | 1262–1270 | 1275–1280 | -- | Bacteria |
| PBSX prophage (1320–1348) | -- | -- | ||
| -- | 1397–1399 | 1385–1424 | ||
| -- | 1442–1447 | -- | ||
| -- | 1478–1482 | -- | ||
| P5 prophage | 1879–1891 | 1879–1901 | -- | Bacteria, |
| -- | 2038–2041 | -- | ||
| P6 prophage | 2046–2073 | 2050–2060 | ||
| SPβ prophage | 2151–2286 | 2152–2286 | -- | Bacteria, |
| Skin prophage | 2652–2701 | 2652- | 2654–2701 | Bacteria, |
| P7 prophage | 2707–2756 | -2747 | 2725–2735 | Bacteria, |
| Competence | 3253–3257 | 3252–3257 | -- | |
| Arsenic resistance regulator | 3463–3467 | 3462–3469 | ||
| PHX: | 3475–3482 | |||
| -- | -- | 3608–3634 | ||
| Cell wall synthesis | 3658–3685 | 3658–3684 | 3665–3672 | Bacteria, |
| Nitrate reductase | 3819–3831 | |||
| 4009–4022 | Bacteria | |||
| ABC transporter | 4123–4134 | 4122–4139 | -- | Bacteria |
| ABC transporter | 4171–4176 | 4168- | 4170–4176 | Bacteria, gamma subdivision |
| Streptothricin, tetracycline, mercury regul. | 4184–4190 | -4193 | 4189–4190 | Bacteria |
Numbers give positions on the chromosome in kb. The values in the columns HMM and Repeats are taken from [24]. The column "Putative Source" lists predictions generated by SIGI.
Figure 3. Summary view of SIGI's annotation for the genome of S. agalactiae. Each symbol labels a single gene (product). Meaning of the characters: "R" tRNA gene, "x" or "X" two levels of bias in putatively highly expressed genes, "I" integrase, "T" transposase, "H" hypothetical protein identified as CA, "G" a gene annotated with a function and identified as CA, "." a gene classified as unsuspicious.
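The one-character-per-gene encoding of this summary view could be produced along the following lines. The symbols are those listed in the caption. Everything else is an assumption for illustration: the category names, the rule that each gene carries exactly one category, and the line width. The caption does not say which of "x" and "X" marks the stronger bias, so the two keys are deliberately neutral.

```python
# Hypothetical category names mapped to the symbols listed in the caption.
SYMBOL_FOR_CATEGORY = {
    "trna": "R",
    "phx_level_x": "x",      # one of the two PHX bias levels
    "phx_level_X": "X",      # the other PHX bias level
    "integrase": "I",
    "transposase": "T",
    "ca_hypothetical": "H",  # hypothetical protein identified as CA
    "ca_annotated": "G",     # annotated function, identified as CA
    "unsuspicious": ".",
}


def summary_view(categories, width=60):
    """Encode genes (in chromosomal order) as symbols, wrapped to fixed-width lines.

    The default width of 60 is an arbitrary choice, not taken from the paper.
    """
    symbols = "".join(SYMBOL_FOR_CATEGORY[c] for c in categories)
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]


# Made-up gene order for illustration.
genes = [
    "trna", "unsuspicious", "unsuspicious", "integrase",
    "ca_hypothetical", "ca_annotated", "unsuspicious", "phx_level_X",
]
for line in summary_view(genes, width=4):
    print(line)
```

In such a view a genomic island would show up as a run of "H" and "G" characters, often next to "R", "I" or "T".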
Combinations of acceptor and donor species for the generation of random sequences mimicking the amelioration process.
| 63 | |||||||||
| 52 | |||||||||
| 5.0 | |||||||||
| 42 | |||||||||
| 31 | |||||||||
The column Acceptor gives the name of the "accepting" species. Columns I to VIII list the names of the species selected as donors and the Manhattan distance to the acceptor's codon frequency table.
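The Manhattan distance between two codon frequency tables is the sum of the absolute per-codon differences. A minimal sketch, assuming each table is a dictionary from codon to frequency; the function name and the three-codon toy tables are illustrative only, and the normalisation used in the paper (for example per codon or per thousand codons) is not stated here, so absolute values will differ from those in the table.

```python
def manhattan_distance(freq_a, freq_b):
    """Sum of absolute frequency differences over all codons in either table.

    A codon missing from one table is treated as having frequency 0.
    """
    codons = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in codons)


# Toy tables restricted to three alanine codons (made-up frequencies).
acceptor = {"GCU": 0.50, "GCC": 0.25, "GCA": 0.25}
donor = {"GCU": 0.25, "GCC": 0.50, "GCA": 0.25}

print(manhattan_distance(acceptor, donor))
```

A small distance means donor and acceptor have similar codon usage, which makes the donor harder to tell apart from the acceptor on compositional grounds.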
SIGI's performance in predicting the donor genome for synthetic genes modelling the amelioration process.
| 0.05 | 0.50 7/2 | 0.50 69/1 | 0.50 34/11 | 0.50 26/1 | 0.50 115/0 | 0.10 1/2 | 0.50 54/89 | 0.50 51/52 | |
| 0.10 | 0.75 286/16 | 0.75 329/3 | 0.50 82/17 | 0.50 298/7 | 0.50 360/7 | 0.50 399/6 | 0.50 154/251 | 0.50 193/265 | |
| 0.25 | 0.75 360/23 | 0.75 359/3 | 0.50 90/20 | 0.95 485/8 | 0.25 58/8 | 0.50 482/7 | 0.50 161/259 | 0.50 207/277 | |
| 0.95 476/13 | 0.95 486/2 | 0.95 489/7 | 0.95 485/8 | 0.95 497/1 | 0.95 499/0 | 0.95 499/0 | 0.95 302/198 | ||
| 0.05 | 1.00 32/6 | 0.50 38/0 | 0.50 69/1 | 0.95 10/2 | 0.75 73/1 | 0.90 88/3 | 0.75 45/3 | 0.75 133/7 | |
| 0.10 | 0.95 349/39 | 0.75 455/1 | 0.75 454/8 | 0.50 392/6 | 1.00 444/4 | 0.90 455/6 | 0.50 433/21 | 0.75 446/27 | |
| 0.25 | 1.00 378/69 | 0.75 484/1 | 0.75 490/8 | 0.50 471/8 | 1.00 496/4 | 0.90 492/8 | 0.50 473/27 | 0.75 469/31 | |
| 0.95 386/44 | 0.95 499/0 | 0.95 497/2 | 0.95 497/2 | 0.95 498/2 | 0.95 495/5 | 0.95 499/1 | 0.95 499/1 | ||
| 0.05 | 0.75 134/1 | 0.50 1/0 | 0.75 5/0 | 0.50 8/0 | 0.75 98/2 | 0.75 58/1 | 0.50 23/11 | 0.90 303/84 | |
| 0.10 | 0.75 389/1 | 0.95 358/2 | 0.75 137/0 | 0.50 120/1 | 0.75 455/8 | 0.75 440/5 | 0.50 258/153 | 0.90 381/110 | |
| 0.25 | 0.75 411/1 | 0.95 466/2 | 0.50 23/0 | 0.50 177/1 | 0.75 484/9 | 0.75 494/6 | 0.50 324/176 | 0.90 388/112 | |
| 0.95 498/0 | 0.95 428/0 | 0.95 500/0 | 0.95 494/6 | 0.95 496/4 | 0.95 476/26 | 0.95 460/40 | |||
| 0.05 | 1.00 20/7 | 0.50 5/1 | 0.75 134/42 | 0.50 46/37 | 0.75 157/6 | 0.50 50/32 | 0.25 60/20 | 0.75 121/242 | |
| 0.10 | 1.00 60/39 | 0.00 4/2 | 0.75 335/137 | 0.50 309/148 | 0.75 452/19 | 0.50 299/167 | 0.25 312/151 | 0.75 181/306 | |
| 0.25 | 1.00 79/44 | 0.00 7/2 | 0.75 354/146 | 0.50 338/162 | 0.50 472/23 | 0.25 378/122 | 0.25 339/161 | 0.75 188/312 | |
| 0.95 66/35 | 0.95 155/0 | 0.95 405/95 | 0.95 495/5 | 0.95 494/6 | 0.95 496/4 | 0.95 470/30 | 0.95 405/95 | ||
For each pair of donor and acceptor, the worst case is given for three values of AMELI. Each entry lists the fraction FRAC and the number of correct/incorrect predictions generated for a dataset consisting of 500 sequences. The last line of each group gives the number of correct/incorrect predictions for a FRAC value of 0.95.
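The table entries follow the pattern "FRAC correct/incorrect", for example "0.50 7/2". The sketch below parses such an entry and reports the share of correct predictions among those that were made. The class and function names are hypothetical, and the derived share is an illustration rather than a statistic reported in the paper.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    frac: float      # the FRAC value of the entry
    correct: int     # number of correct donor predictions
    incorrect: int   # number of incorrect donor predictions


def parse_entry(cell: str) -> Entry:
    """Parse a cell such as '0.50 7/2' into its three components."""
    frac_text, counts = cell.split()
    correct, incorrect = counts.split("/")
    return Entry(float(frac_text), int(correct), int(incorrect))


def share_correct(entry: Entry) -> Optional[float]:
    """Correct predictions divided by all predictions made; None if there were none."""
    total = entry.correct + entry.incorrect
    return entry.correct / total if total else None


for cell in ["0.50 7/2", "0.75 286/16", "0.95 499/1"]:
    entry = parse_entry(cell)
    print(cell, "->", entry, share_correct(entry))
```

Note that the two counts of an entry often sum to far fewer than 500, so the share of correct predictions should be read together with the number of predictions made.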