
Uniclust databases of clustered and deeply annotated protein sequences and alignments.

Milot Mirdita, Lars von den Driesch, Clovis Galiez, Maria J Martin, Johannes Söding, Martin Steinegger.

Abstract

We present three clustered protein sequence databases, Uniclust90, Uniclust50 and Uniclust30, and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Owing to the high sensitivity of HHblits, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
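
To make the phrase "clustering at 90%, 50% and 30% pairwise sequence identity" concrete, the following is a minimal toy sketch, not the MMseqs2 pipeline. It assumes the sequences are already aligned to equal length and uses a simple greedy, representative-based assignment. The function names `pairwise_identity` and `greedy_cluster` and the four toy sequences are hypothetical and chosen only for illustration; the real pipeline works on unaligned sequences and is considerably more involved.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap.

    Assumption: both sequences are pre-aligned to the same length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    return sum(x == y for x, y in aligned) / len(aligned)


def greedy_cluster(seqs: dict[str, str], min_identity: float) -> dict[str, list[str]]:
    """Toy greedy clustering: each sequence joins the first representative it
    matches at >= min_identity, otherwise it becomes a new representative."""
    clusters: dict[str, list[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if pairwise_identity(seqs[rep], seq) >= min_identity:
                clusters[rep].append(name)
                break
        else:
            # No existing representative was similar enough.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, already aligned (no gaps needed here).
toy = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",  # 9 of 10 residues identical to seqA
    "seqC": "MKTAYLSRQK",  # 6 of 10 residues identical to seqA
    "seqD": "GHWPNDECFV",  # no residues shared with seqA or seqC
}

for threshold in (0.9, 0.5, 0.3):
    print(threshold, greedy_cluster(toy, threshold))
```

In this toy data, lowering the threshold from 0.9 to 0.5 merges `seqC` into the cluster represented by `seqA`, which mirrors how a lower identity level yields fewer, broader clusters.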

Year: 2016   PMID: 27899574   PMCID: PMC5614098   DOI: 10.1093/nar/gkw1081

Source DB: PubMed   Journal: Nucleic Acids Res   ISSN: 0305-1048   Impact factor: 16.971


