
The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

Shibu Yooseph, Granger Sutton, Douglas B Rusch, Aaron L Halpern, Shannon J Williamson, Karin Remington, Jonathan A Eisen, Karla B Heidelberg, Gerard Manning, Weizhong Li, Lukasz Jaroszewski, Piotr Cieplak, Christopher S Miller, Huiying Li, Susan T Mashiyama, Marcin P Joachimiak, Christopher van Belle, John-Marc Chandonia, David A Soergel, Yufeng Zhai, Kannan Natarajan, Shaun Lee, Benjamin J Raphael, Vineet Bafna, Robert Friedman, Steven E Brenner, Adam Godzik, David Eisenberg, Jack E Dixon, Susan S Taylor, Robert L Strausberg, Marvin Frazier, J Craig Venter.

Abstract

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.


Year:  2007        PMID: 17355171      PMCID: PMC1821046          DOI: 10.1371/journal.pbio.0050016

Source DB:  PubMed          Journal:  PLoS Biol        ISSN: 1544-9173            Impact factor:   8.029


