Alexander Bolshoy, Tatiana Tatarinova.
Abstract
In this paper we present a novel method for ranking genomes according to gene lengths. The main outcomes described are the formulation of the genome ranking problem, a presentation of relevant approaches to solving it, and a demonstration of preliminary results from ordering prokaryotic genomes. Using a subset of prokaryotic genomes, we attempted to uncover factors affecting gene length. We have demonstrated that hyperthermophilic species have shorter genes than mesophilic organisms, which probably means that environmental factors affect gene length. Moreover, these preliminary results show that environmental factors group evolutionarily distant species together in the ranking.
Keywords: adaptation; clustering; dimension-reduction techniques; evolution of prokaryotes; factor analysis; machine learning; orthologs; ranking; rating
Year: 2012 PMID: 23300345 PMCID: PMC3528112 DOI: 10.4137/BBI.S10525
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Table 1. Following an example from Hochbaum et al.20
| | Gene families a, b, c (available values) | Average | A-ranking | Intuitive ranking |
|---|---|---|---|---|
| A | 800, 100 | 450 | 2 | 1 |
| B | 200, 600 | 400 | 1 | 3 |
| C | 900, 100, 500 | 500 | 3 | 2 |
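The averaging step behind the table above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the A-ranking orders rows by ascending average of whatever values are available, which is consistent with the numbers shown. The names `available` and `average_ranking` are hypothetical, and ties are not handled.

```python
from statistics import mean

# Values available for each row, copied from the table above.
# A missing gene family is simply absent from the list; the sketch
# does not need to know which family (a, b or c) is missing.
available = {
    "A": [800, 100],
    "B": [200, 600],
    "C": [900, 100, 500],
}


def average_ranking(values_by_row):
    """Rank rows by the mean of their available values (1 = smallest mean).

    Assumption: ranking by ascending average; ties are broken by
    insertion order, which a real method would need to treat explicitly.
    """
    averages = {name: mean(vals) for name, vals in values_by_row.items()}
    ordered = sorted(averages, key=averages.get)
    ranks = {name: position + 1 for position, name in enumerate(ordered)}
    return averages, ranks


averages, ranks = average_ranking(available)
print(averages)
print(ranks)
```

Because each average is taken over a different set of gene families, the resulting order can disagree with the intuitive ranking column, which is the point the example makes about incomplete data.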
Figure 1. Histogram of the number of genomes contained in each COG.
Table 2. List of genomes in ranking order.
| Rank | Domain | Name |
|---|---|---|
| 1 | Archaea | Archaeoglobus fulgidus dsm 4304 |
| 2 | Bacteria | Thermotoga sp. rq2 |
| 3 | Archaea | Thermococcus onnurineus na1 |
| 4 | Archaea | Thermoplasma volcanium gss1 |
| 5 | Bacteria | Thermotoga neapolitana dsm 4359 |
| 6 | Archaea | Thermoplasma acidophilum dsm 1728 |
| 7 | Archaea | Pyrococcus abyssi ge5 |
| 8 | Bacteria | Aquifex aeolicus vf5 |
| 9 | Bacteria | Campylobacter concisus 13826 |
| 11 | Archaea | Thermococcus sibiricus mm 739 |
| 12 | Archaea | Pyrococcus horikoshii ot3 |
| 12 | Bacteria | Campylobacter curvus 525.92 |
| 13 | Bacteria | Helicobacter felis atcc 49179 |
| 13 | Bacteria | Dictyoglomus thermophilum h-6–12 |
| 15 | Bacteria | Streptococcus pneumoniae p1031 |
| 16 | Bacteria | Streptococcus agalactiae a909 |
| 18 | Bacteria | Bacillus cereus atcc 14579 |
| 19 | Bacteria | Mycoplasma pulmonis uab ctip |
| 20 | Bacteria | Bacillus cytotoxicus nvh 391–98 |
| 20 | Bacteria | Listeria monocytogenes serotype 4b |
| 20 | Archaea | Methanosalsum zhilinae dsm 4017 |
| 21 | Bacteria | Streptococcus agalactiae 2603v/r |
| 22 | Bacteria | Caldicellulosiruptor bescii dsm 6725 |
| 25 | Bacteria | Bacillus amyloliquefaciens dsm 7 |
| 26 | Bacteria | Mycoplasma fermentans m64 |
| 27 | Bacteria | Rickettsia canadensis str. mckiel |
| 28 | Bacteria | Ureaplasma parvum serovar 3 |
| 29 | Bacteria | Francisella sp. tx077308 |
| 29 | Bacteria | Streptococcus zooepidemicus |
| 30 | Bacteria | Melissococcus plutonius atcc 35311 |
| 30 | Bacteria | Mycoplasma leachii pg50 |
| 31 | Bacteria | Bacillus pumilus safr-032 |
| 32 | Bacteria | Pediococcus pentosaceus atcc 25745 |
| 34 | Bacteria | Mycoplasma genitalium g37 |
| 35 | Bacteria | Enterococcus faecalis v583 |
| 39 | Bacteria | Legionella pneumophila str. paris |
| 40 | Bacteria | Natranaerobius thermophilus |
| 40 | Bacteria | Brachyspira pilosicoli 95/1000 |
| 40 | Bacteria | Ruminococcus albus 7 |
| 41 | Bacteria | Bacillus thuringiensis str. al hakam |
| 41 | Bacteria | Brevibacillus brevis nbrc 100599 |
| 41 | Bacteria | Geobacter uraniireducens rf4 |
| 42 | Bacteria | Geobacter lovleyi sz |
| 44 | Bacteria | Neisseria meningitidis 053442 |
| 44 | Bacteria | Coxiella burnetii rsa 331 |
| 45 | Bacteria | Mycoplasma pneumoniae m129 |
| 46 | Bacteria | Maribacter sp. htcc2170 |
| 47 | Bacteria | Laribacter hongkongensis hlhk9 |
| 48 | Bacteria | Pseudogulbenkiania sp. nh8b |
| 51 | Bacteria | Zobellia galactanivorans |
| 52 | Bacteria | Dechloromonas aromatica rcb |
| 53 | Bacteria | Sodalis glossinidius str. ‘morsitans’ |
| 54 | Bacteria | Erwinia amylovora atcc 49946 |
| 55 | Archaea | Halalkalicoccus jeotgali b3 |
| 55 | Bacteria | Escherichia coli bw2952 |
| 55 | Bacteria | Gramella forsetii kt0803 |
| 55 | Bacteria | Lactobacillus gasseri atcc 33323 |
| 55 | Bacteria | Borrelia turicatae 91e135 |
| 60 | Bacteria | Klebsiella variicola at-22 |
| 60 | Bacteria | Candidatus riesia pediculicola usda |
| 61 | Bacteria | Salmonella enterica subsp. arizonae |
| 61 | Bacteria | Eubacterium eligens atcc 27750 |
| 62 | Bacteria | Sphingobacterium sp. 21 |
| 63 | Bacteria | Methylomonas methanica mc09 |
| 63 | Bacteria | Dyadobacter fermentans dsm 18053 |
| 64 | Bacteria | Yersinia enterocolitica subsp |
| 67 | Bacteria | Chlamydophila pneumoniae ar39 |
| 68 | Bacteria | Cronobacter turicensis z3032 |
| 69 | Bacteria | Spirochaeta smaragdinae dsm 11293 |
| 71 | Bacteria | Yersinia pseudotuberculosis pb1/+ |
| 72 | Bacteria | Pelodictyon phaeoclathratiforme |
| 72 | Bacteria | Tropheryma whipplei tw08/27 |
| 73 | Bacteria | Xanthomonas oryzae |
| 73 | Bacteria | Desulfovibrio vulgaris |
| 74 | Bacteria | Dinoroseobacter shibae dfl 12 |
| 75 | Bacteria | Acidiphilium cryptum jf-5 |
| 77 | Bacteria | Thauera sp. mz1t |
| 77 | Bacteria | Magnetococcus marinus mc-1 |
| 78 | Bacteria | Prosthecochloris aestuarii dsm 271 |
| 79 | Bacteria | Anaerolinea thermophila uni-1 |
| 81 | Bacteria | Sinorhizobium meliloti 1021 |
| 82 | Bacteria | Bordetella petrii dsm 12804 |
| 84 | Bacteria | Chloroflexus aggregans dsm 9485 |
| 84 | Bacteria | Arcanobacterium haemolyticum |
| 85 | Bacteria | Corynebacterium glutamicum r |
| 87 | Bacteria | Cyanothece sp. pcc 7822 |
| 87 | Bacteria | Starkeya novella dsm 506 |
| 87 | Bacteria | Gluconacetobacter diazotrophicus |
| 88 | Bacteria | Rhodopseudomonas palustris dx-1 |
| 90 | Bacteria | Rhodospirillum centenum sw |
| 91 | Bacteria | Xanthobacter autotrophicus py2 |
| 92 | Bacteria | Mycobacterium leprae br4923 |
| 93 | Bacteria | Intrasporangium calvum dsm 43043 |
| 94 | Bacteria | Streptomyces scabiei 87.22 |
| 94 | Bacteria | Streptomyces griseus subsp. |
| 96 | Bacteria | Burkholderia rhizoxinica hki 454 |
| 97 | Bacteria | Haliangium ochraceum dsm 14365 |
| 98 | Bacteria | Salinibacter ruber m8 |
| 99 | Bacteria | Rothia dentocariosa atcc 17931 |
| 100 | Bacteria | Bifidobacterium animalis |
Figure 2. Number of COGs that each genome contains. Genomes are ordered as in Table 2.
Figure 3. Comparison of rankings produced by two different incomplete subsets of COGs.