Abstract
BACKGROUND: The origin of microbial ORFans, ORFs having no detectable homology to other ORFs in the databases, is one of the unexplained puzzles of the post-genomic era. Several hypotheses on the origin of ORFans have been suggested in the last few years, most of which are based on selected, relatively small subsets of ORFans. One of these hypotheses is that ORFans have been acquired through lateral transfer from viruses. Here we carry out a comprehensive, genome-wide study on the origins of ORFans to quantify the strength of current evidence supporting this hypothesis.
Year: 2006 PMID: 16914045 PMCID: PMC1559721 DOI: 10.1186/1471-2148-6-63
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. U-value histogram for all the 818,906 ORFs in 277 prokaryote genomes. The U-value is a measure of the "conservation" of each ORF (see Methods); U = 0 means the ORF is unique to a single organism, i.e. a singleton or paralogous ORFan. 9.1% of all ORFs have U = 0. The left tail 0.0 < U <= 0.1 (4.3% of all ORFs) corresponds to orthologous ORFans, ORFs with homologs only in closely related organisms. Notice the uneven distribution of U, with its long left tail and the very high peak at U = 0.
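The thresholds in the Figure 1 caption can be read as a simple classification rule. The following is a minimal sketch of that rule, not the authors' code: the function names and category labels are hypothetical, the U-values are assumed to be already computed (their definition is in Methods and is not reproduced here) and non-negative, and treating every ORF with U > 0.1 as a non-ORFan is an assumption.

```python
from collections import Counter


def classify_orf(u: float) -> str:
    """Assign an ORF to a category from its U-value (hypothetical labels)."""
    if u == 0.0:
        # U = 0: unique to a single organism (singleton or paralogous ORFan).
        return "singleton_or_paralogous_orfan"
    if 0.0 < u <= 0.1:
        # Left tail: homologs only in closely related organisms.
        return "orthologous_orfan"
    # Assumption: everything above the 0.1 cutoff counts as a non-ORFan.
    return "non_orfan"


def category_percentages(u_values):
    """Return the percentage of ORFs falling in each category."""
    counts = Counter(classify_orf(u) for u in u_values)
    total = len(u_values)
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Applied to the full set of U-values, `category_percentages` would be expected to give figures comparable to the 9.1% and 4.3% quoted in the caption, provided the inputs match the paper's U definition.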
Figure 2. Percentage of microbial ORFs having homologs in viruses for 277 prokaryote genomes. The y-axis shows each of the 277 genomes, grouped according to NCBI's taxonomy classification. For each genome, two percentages are shown: red corresponds to ORFans-VH% (percentage of ORFans having homologs in viruses) and blue corresponds to non-ORFans-VH% (percentage of non-ORFans having homologs in viruses). The 24 major clade names are shown, with the number of organisms in each clade given in parentheses. The 24 phylogenetic clades are marked alternately with a grey background and no background. For the species names and taxonomies, please refer to Additional file 1. The inset shows the average values of ORFans-VH% and non-ORFans-VH% in various groups. "Total" corresponds to the averages over all 277 genomes taken together, "Non_Firm_Gamma (Others)" corresponds to the averages over the 148 non-Firmicutes, non-Gamma-proteobacteria genomes, and "Firm" corresponds to the 66 Firmicutes in the database. The remaining groups in the inset correspond to the major clades (those with at least 10 genomes). The figure shows that, except for some Firmicutes, ORFans-VH% is much smaller than non-ORFans-VH%, suggesting that the current homology-based evidence for a viral origin of ORFans is weak at best.
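The two per-genome quantities plotted in Figure 2 are plain percentages. The sketch below illustrates how they could be computed for one genome; the `Orf` record and its field names are hypothetical, and the ORFan status and viral-homolog flags are assumed to come from an upstream homology search that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    is_orfan: bool           # assumed precomputed, e.g. from the U-value
    has_viral_homolog: bool  # assumed precomputed from a homology search


def vh_percentages(orfs):
    """Return (ORFans-VH%, non-ORFans-VH%) for the ORFs of one genome.

    A value is None when the corresponding group is empty.
    """
    def pct(group):
        if not group:
            return None
        hits = sum(1 for o in group if o.has_viral_homolog)
        return 100.0 * hits / len(group)

    orfans = [o for o in orfs if o.is_orfan]
    non_orfans = [o for o in orfs if not o.is_orfan]
    return pct(orfans), pct(non_orfans)
```

Comparing the two returned values genome by genome corresponds to the red-versus-blue comparison described in the caption.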