Tatiana V Tatarinova, Inna Lysnyansky, Yuri V Nikolsky, Alexander Bolshoy.
Abstract
BACKGROUND: The length of a protein sequence is largely determined by its function. In certain species, it may also be affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002). It is now generally accepted that in bacterial and archaeal genomes the distributions of protein length differ between sequences with and without homologs. In this study, we examine this postulate by conducting a comprehensive analysis of all annotated prokaryotic genomes and by focusing on certain exceptions.
Year: 2016 PMID: 26747447 PMCID: PMC4706650 DOI: 10.1186/s13062-015-0104-3
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Fig. 1. Histograms of protein lengths of Coxiella burnetii (a), Mycobacterium leprae (b), Chlamydia trachomatis (c), Rickettsia prowazekii (d) and all other prokaryotes tested in this study (e, f). The X axis corresponds to protein length intervals ((0,100], (100,200], etc.), while the Y axis shows the relative frequency of ORFans (proteins without homologs) and HHPs (proteins with homologs) among the genomes. Panels (e) and (f) show, respectively, a bar plot and a relative frequency plot with a smaller bin size for all prokaryotes.
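The binning behind these histograms can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes that each class (ORFans or HHPs) is normalized by its own protein count, and the names `bin_index` and `relative_frequencies`, as well as the example lengths, are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BIN_WIDTH = 100  # interval width in amino acids: (0,100], (100,200], ...


def bin_index(length: int, width: int = BIN_WIDTH) -> int:
    """Return k such that `length` lies in the interval (k*width, (k+1)*width]."""
    if length <= 0:
        raise ValueError("protein length must be a positive number of residues")
    return (length - 1) // width


def relative_frequencies(lengths: Iterable[int], width: int = BIN_WIDTH) -> Dict[int, float]:
    """Share of proteins per interval, keyed by the interval's lower bound.

    Assumption: frequencies are normalized within one class (e.g. ORFans only).
    """
    lengths = list(lengths)
    counts = Counter(bin_index(n, width) for n in lengths)
    return {k * width: counts[k] / len(lengths) for k in sorted(counts)}


# Hypothetical lengths (aa), only to illustrate the call.
orfan_lengths = [80, 95, 150, 210]
hhp_lengths = [120, 250, 310, 380]
print(relative_frequencies(orfan_lengths))  # {0: 0.5, 100: 0.25, 200: 0.25}
print(relative_frequencies(hhp_lengths))    # {100: 0.25, 200: 0.25, 300: 0.5}
```

Using `(length - 1) // width` makes the upper bound of each interval inclusive, so a 100-aa protein falls in (0,100] and a 101-aa protein in (100,200]. The same binning applies to the single-genome histograms in Fig. 4.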
Fig. 2. The difference between the mean lengths of HHPs and ORFans for 1484 prokaryotic genomes. The X axis corresponds to the difference between the mean lengths of HHPs and ORFans, and the Y axis shows the relative frequency of genomes with a given length difference. Across all prokaryotic genomes, HHPs are longer than ORFans by 144 amino acids on average. For the Mycoplasmataceae genomes, the average difference is only 14 amino acids, and in 17 of the 37 Mycoplasmataceae genomes ORFans are, on average, longer than HHPs.
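A per-genome statistic like the one plotted in Fig. 2 can be sketched as follows. The sketch assumes each genome's proteins have already been split into HHPs and ORFans, and that the reported average is taken over genomes rather than over pooled proteins; the helper names and the two example genomes are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def length_difference(hhp_lengths: List[int], orfan_lengths: List[int]) -> float:
    """Mean HHP length minus mean ORFan length (positive when HHPs are longer)."""
    return mean(hhp_lengths) - mean(orfan_lengths)


def summarize(genomes: Dict[str, Tuple[List[int], List[int]]]) -> Tuple[float, List[str]]:
    """Average the per-genome differences and list genomes whose ORFans are longer."""
    diffs = {name: length_difference(hhp, orf) for name, (hhp, orf) in genomes.items()}
    average = mean(list(diffs.values()))
    orfans_longer = sorted(name for name, d in diffs.items() if d < 0)
    return average, orfans_longer


# Hypothetical genomes: (HHP lengths, ORFan lengths) in amino acids.
genomes = {
    "genome_a": ([300, 400], [200, 250]),  # difference +125
    "genome_b": ([350, 370], [400, 420]),  # difference -50
}
average, orfans_longer = summarize(genomes)
print(average, orfans_longer)  # 37.5 ['genome_b']
```

A negative difference marks a genome of the kind highlighted for Mycoplasmataceae, where ORFans are on average longer than HHPs.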
List of atypical genomes showing, for each genome, the average length and number of HHPs, the average length and number of ORFans, and the difference between the average lengths of HHPs and ORFans
| No. | Species | Average length of HHPs, aa | Number of HHPs | Average length of ORFans, aa | Number of ORFans | Difference between average lengths of HHPs and ORFans, aa |
|---|---|---|---|---|---|---|
| 1. | | 366 | 416 | 472 | 230 | −106 |
| 2. | | 350 | 384 | 450 | 91 | −100 |
| 3. | | 375 | 443 | 449 | 214 | −74 |
| 4. | | 349 | 691 | 417 | 232 | −68 |
| 5. | | 368 | 489 | 429 | 274 | −61 |
| 6. | | 359 | 413 | 410 | 196 | −51 |
| 7. | | 367 | 1564 | 417 | 435 | −50 |
| 8. | | 353 | 699 | 395 | 241 | −42 |
| 9. | | 358 | 450 | 400 | 183 | −42 |
| 10. | | 352 | 464 | 387 | 194 | −35 |
| 11. | | 375 | 437 | 410 | 254 | −34 |
| 12. | | 363 | 441 | 394 | 173 | −30 |
| 13. | | 352 | 699 | 382 | 249 | −30 |
| 14. | | 360 | 420 | 387 | 272 | −27 |
| 15. | | 363 | 490 | 387 | 199 | −24 |
| 16. | | 391 | 471 | 413 | 186 | −23 |
| 17. | | 369 | 378 | 383 | 145 | −14 |
| 18. | | 492 | 51 | 500 | 53 | −8 |
| 19. | | 384 | 658 | 390 | 379 | −5 |
| 20. | | 377 | 2835 | 374 | 898 | 2 |
| 21. | | 358 | 474 | 351 | 176 | 7 |
| 22. | | 366 | 522 | 354 | 291 | 11 |
| 23. | | 332 | 82 | 318 | 21 | 13 |
| 24. | | 348 | 1513 | 335 | 704 | 14 |
| 25. | | 286 | 356 | 269 | 184 | 16 |
| 26. | | 363 | 479 | 345 | 180 | 17 |
| 27. | | 384 | 619 | 361 | 303 | 23 |
| 28. | | 353 | 475 | 329 | 267 | 25 |
| 29. | | 371 | 526 | 343 | 239 | 28 |
| 30. | | 348 | 678 | 320 | 247 | 28 |
| 31. | | 379 | 560 | 350 | 222 | 30 |
| 32. | | 367 | 445 | 334 | 203 | 32 |
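The difference column does not always equal the difference of the two rounded averages in the same row (for row 11, 375 − 410 = −35, whereas −34 is reported). The sketch below transcribes the average lengths from the table and checks that every discrepancy is at most 1 aa, which is what one would expect if the difference was computed from unrounded means and only then rounded. That rounding explanation is an assumption, not something the table states; the name `residuals` is hypothetical.

```python
# (average HHP length, average ORFan length, reported difference), all in aa,
# transcribed row by row (rows 1-32) from the table of atypical genomes.
ROWS = [
    (366, 472, -106), (350, 450, -100), (375, 449, -74), (349, 417, -68),
    (368, 429, -61), (359, 410, -51), (367, 417, -50), (353, 395, -42),
    (358, 400, -42), (352, 387, -35), (375, 410, -34), (363, 394, -30),
    (352, 382, -30), (360, 387, -27), (363, 387, -24), (391, 413, -23),
    (369, 383, -14), (492, 500, -8), (384, 390, -5), (377, 374, 2),
    (358, 351, 7), (366, 354, 11), (332, 318, 13), (348, 335, 14),
    (286, 269, 16), (363, 345, 17), (384, 361, 23), (353, 329, 25),
    (371, 343, 28), (348, 320, 28), (379, 350, 30), (367, 334, 32),
]


def residuals(rows):
    """Difference of the rounded averages minus the reported difference, per row."""
    return [(hhp - orfan) - reported for hhp, orfan, reported in rows]


res = residuals(ROWS)
# If both averages and the difference were each rounded to the nearest integer
# from unrounded values, no residual can exceed 1 aa in absolute value.
print(max(abs(r) for r in res))  # 1
```

Each rounding shifts a value by at most 0.5 aa, so the three roundings together can shift an integer residual by at most 1 aa.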
Number of sequenced and annotated genomes for the selected set of bacterial species
| Genus | Total number of sequenced genomes | Number of genomes with assigned COGs |
|---|---|---|
| | 9 | 4 |
| | 6 | 5 |
| | 2 | 2 |
| | 7 | 4 |
| | 68 | 29 |
| | 3 | 3 |
| | 2 | 1 |
| | 6 | 0 |
| | 3 | 1 |
| | 3 | 2 |
| | 0 | 0 |
| | 0 | 0 |
| | 0 | 0 |
Fig. 3. Mean (a) and median (b) ORFan length vs. average HHP length for eight selected groups of prokaryotes. Each point represents a genome. Family Mycoplasmataceae (pink), family Mycobacteriaceae (red), genus Agrobacterium (blue), genus Bacillus (green), genus Anaplasma (orange), genus Ehrlichia (dark green), genus Neorickettsia (black) and genus Campylobacter (grey) are shown. The regression line shows the relationship between mean HHP length and mean ORFan length across 1484 annotated prokaryotic genomes.
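The caption does not say how the regression line was fitted. Assuming ordinary least squares with mean ORFan length as the response and mean HHP length as the predictor, the fit and a per-genome residual can be sketched as below; the function names and the three data points are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual(x: float, y: float, slope: float, intercept: float) -> float:
    """Observed minus fitted ORFan length; positive means longer ORFans than the trend predicts."""
    return y - (intercept + slope * x)


# Hypothetical per-genome means (aa): x = mean HHP length, y = mean ORFan length.
mean_hhp = [300, 350, 400]
mean_orfan = [200, 230, 260]
slope, intercept = ols_fit(mean_hhp, mean_orfan)
print(slope, intercept)
```

Under this assumption, genomes of a highlighted group that sit above the line (positive residual) have longer ORFans than the cross-genome trend would predict from their HHP lengths.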
Fig. 4. Histograms of protein lengths of (a) M. genitalium (M. genitalium G37 uid57707, NC_000908) and (b) M. hyopneumoniae (M. hyopneumoniae 232 uid58205, NC_006360). X axis labels correspond to the protein length intervals (0,100], (100,200], etc. The Y axis shows the relative frequency of proteins with a given length in a genome.
Fig. 5. Genomic GC content (a) and genic GC3 content (b) in annotated species of Mycoplasmataceae. Grey histograms correspond to all prokaryotes, while red histograms correspond to selected Mycoplasma species. The horizontal axis shows GC content (a) or GC3 content (b), and the vertical axis shows the number of prokaryotic genomes with a given content.
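The two panels of Fig. 5 rest on different base counts: GC content over a whole genome sequence, and GC3 content over the third positions of codons in genes. A minimal Python sketch of both quantities follows. It assumes plain nucleotide strings as input, in-frame coding sequences, and pooling of third positions across all genes (stop codons included, which some GC3 definitions exclude); this is one common convention, not necessarily the one used for the figure, and the function names and toy sequences are hypothetical.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are skipped."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_content(coding_sequences: Iterable[str]) -> float:
    """G+C fraction at third codon positions, pooled over all coding sequences.

    Assumptions: each sequence starts in frame, trailing partial codons are
    dropped, and non-ACGT symbols at third positions are skipped.
    """
    third_positions = []
    for cds in coding_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore an incomplete last codon
        third_positions.extend(cds[i] for i in range(2, usable, 3) if cds[i] in "ACGT")
    if not third_positions:
        raise ValueError("no usable third codon positions")
    return sum(b in "GC" for b in third_positions) / len(third_positions)


# Hypothetical toy sequences, not real Mycoplasma data.
print(gc_content("ATGCNN"))                      # 0.5
print(gc3_content(["ATGAAACCG", "ATGGCTTAA"]))  # 0.5
```

Because synonymous third positions are under weaker selection than the first two, GC3 usually tracks mutational bias more closely than overall GC content, which is why the two are often shown side by side.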