Uniclust databases of clustered and deeply annotated protein sequences and alignments.


Milot Mirdita, Lars von den Driesch, Clovis Galiez, Maria J Martin, Johannes Söding, Martin Steinegger.

Abstract

We present three clustered protein sequence databases (Uniclust90, Uniclust50 and Uniclust30) and three databases of multiple sequence alignments (MSAs; Uniboost10, Uniboost20 and Uniboost30) as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains and proteins in the PDB, using our HHblits homology detection tool. Owing to the high sensitivity of HHblits, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation and annotations. Uniclust is updated every two months with the new UniProt release.
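To make the idea of clustering at a pairwise identity threshold concrete, the following is a minimal toy sketch in Python. It is not the MMseqs2 algorithm or the Uniclust pipeline: the function names (`ungapped_identity`, `greedy_cluster`), the toy sequences and the ungapped, position-by-position identity measure are all assumptions made for illustration. Real tools align sequences and apply coverage criteria before deciding cluster membership.

```python
from typing import Dict, List


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter sequence.

    Assumption: no alignment is performed; residues are compared
    position by position, which is only meaningful for toy data.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: Dict[str, str], min_identity: float) -> Dict[str, List[str]]:
    """Greedy representative-based clustering (hypothetical helper).

    Sequences are visited longest first (ties broken by name). Each one
    joins the first existing representative it matches at or above
    min_identity; otherwise it becomes a new representative.
    """
    clusters: Dict[str, List[str]] = {}
    for name in sorted(seqs, key=lambda k: (-len(seqs[k]), k)):
        for rep in clusters:
            if ungapped_identity(seqs[rep], seqs[name]) >= min_identity:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Made-up toy sequences, not taken from UniProtKB.
toy = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYIAKQK",  # 9 of 10 positions identical to "a"
    "c": "MKTAYLLKQK",  # 7 of 10 positions identical to "a"
    "d": "GGGGGGGGGG",  # shares no position with "a"
}

if __name__ == "__main__":
    for threshold in (0.9, 0.5, 0.3):
        print(threshold, greedy_cluster(toy, threshold))
```

In this toy data, lowering the threshold from 0.9 to 0.5 merges sequence "c" into the cluster represented by "a", which mirrors how lower-identity databases contain fewer, broader clusters than higher-identity ones.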


Year: 2016; PMID: 27899574; PMCID: PMC5614098; DOI: 10.1093/nar/gkw1081

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
