UniRef: comprehensive and non-redundant UniProt reference clusters.

Baris E Suzek, Hongzhan Huang, Peter McGarvey, Raja Mazumder, Cathy H Wu.

Abstract

MOTIVATION: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences.
RESULTS: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis.
AVAILABILITY: UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
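
The UniRef100 step described under RESULTS (combining identical sequences and subfragments into one entry) and the quoted size reductions can be illustrated with a toy sketch. This is not the UniRef production pipeline: the function names, the toy accessions and sequences, and the choice of the longest member as representative are all assumptions made for illustration, and the quadratic scan would not scale to millions of sequences.

```python
def collapse_identical_and_subfragments(seqs):
    """Group identical sequences and contiguous subfragments into one cluster.

    seqs: dict mapping accession -> amino-acid sequence (toy input).
    Returns a dict mapping representative accession -> list of member accessions.
    Assumption: the longest sequence in a group acts as its representative.
    """
    # Longest first, so every possible container is seen before its fragments.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    clusters = {}
    for acc, seq in ordered:
        for rep in clusters:
            # Substring test covers both identity and subfragment membership.
            if seq in seqs[rep]:
                clusters[rep].append(acc)
                break
        else:
            # No existing representative contains this sequence: new cluster.
            clusters[acc] = [acc]
    return clusters


def size_reduction(n_source, n_clusters):
    """Percentage reduction in entry count relative to the source set."""
    return 100.0 * (1 - n_clusters / n_source)


# Hypothetical accessions and sequences, for illustration only.
toy = {
    "A1": "MKTAYIAKQR",
    "A2": "MKTAYIAKQR",  # identical to A1
    "A3": "TAYIAK",      # subfragment of A1
    "B1": "MSDNELLQWV",  # unrelated
}
clusters = collapse_identical_and_subfragments(toy)
print(clusters)                            # {'A1': ['A1', 'A2', 'A3'], 'B1': ['B1']}
print(size_reduction(len(toy), len(clusters)))  # 50.0
```

Building UniRef90 or UniRef50 additionally requires an alignment-based identity measure and a clustering heuristic, which this sketch deliberately leaves out.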

Year:  2007        PMID: 17379688     DOI: 10.1093/bioinformatics/btm098

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  500 in total

Review 1.  Bioinformatics for personal genome interpretation.

Authors:  Emidio Capriotti; Nathan L Nehrt; Maricel G Kann; Yana Bromberg
Journal:  Brief Bioinform       Date:  2012-01-13       Impact factor: 11.622

2.  Complete genome sequence of the photosynthetic purple nonsulfur bacterium Rhodobacter capsulatus SB 1003.

Authors:  Hynek Strnad; Alla Lapidus; Jan Paces; Pavel Ulbrich; Cestmir Vlcek; Vaclav Paces; Robert Haselkorn
Journal:  J Bacteriol       Date:  2010-04-23       Impact factor: 3.490

3.  SIFT missense predictions for genomes.

Authors:  Robert Vaser; Swarnaseetha Adusumalli; Sim Ngak Leng; Mile Sikic; Pauline C Ng
Journal:  Nat Protoc       Date:  2015-12-03       Impact factor: 13.491

4.  Improving prediction of helix-helix packing in membrane proteins using predicted contact numbers as restraints.

Authors:  Bian Li; Jeffrey Mendenhall; Elizabeth Dong Nguyen; Brian E Weiner; Axel W Fischer; Jens Meiler
Journal:  Proteins       Date:  2017-04-01

5.  Towards completion of the Earth's proteome.

Authors:  Carolina Perez-Iratxeta; Gareth Palidwor; Miguel A Andrade-Navarro
Journal:  EMBO Rep       Date:  2007-12       Impact factor: 8.807

6.  Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm.

Authors:  Antonio Gómez; Juan Cedano; Jordi Espadaler; Antonio Hermoso; Jaume Piñol; Enrique Querol
Journal:  Protein J       Date:  2008-02       Impact factor: 2.371

7.  Development of ChillPeach genomic tools and identification of cold-responsive genes in peach fruit.

Authors:  Ebenezer A Ogundiwin; Cristina Martí; Javier Forment; Clara Pons; Antonio Granell; Thomas M Gradziel; Cameron P Peace; Carlos H Crisosto
Journal:  Plant Mol Biol       Date:  2008-07-27       Impact factor: 4.076

8.  A fast Peptide Match service for UniProt Knowledgebase.

Authors:  Chuming Chen; Zhiwen Li; Hongzhan Huang; Baris E Suzek; Cathy H Wu
Journal:  Bioinformatics       Date:  2013-08-19       Impact factor: 6.937

Review 9.  Sequencing and beyond: integrating molecular 'omics' for microbial community profiling.

Authors:  Eric A Franzosa; Tiffany Hsu; Alexandra Sirota-Madi; Afrah Shafquat; Galeb Abu-Ali; Xochitl C Morgan; Curtis Huttenhower
Journal:  Nat Rev Microbiol       Date:  2015-04-27       Impact factor: 60.633

10.  Functional identification of valerena-1,10-diene synthase, a terpene synthase catalyzing a unique chemical cascade in the biosynthesis of biologically active sesquiterpenes in Valeriana officinalis.

Authors:  Yun-Soo Yeo; S Eric Nybo; Amar G Chittiboyina; Aruna D Weerasooriya; Yan-Hong Wang; Elsa Góngora-Castillo; Brieanne Vaillancourt; C Robin Buell; Dean DellaPenna; Mary Dawn Celiz; A Daniel Jones; Eve Syrkin Wurtele; Nick Ransom; Natalia Dudareva; Khaled A Shaaban; Nidhi Tibrewal; Suman Chandra; Troy Smillie; Ikhlas A Khan; Robert M Coates; David S Watt; Joe Chappell
Journal:  J Biol Chem       Date:  2012-12-14       Impact factor: 5.157
