Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D.

Minglei Yang, Wenliang Zhang, Guocai Yao, Haiyue Zhang, Weizhong Li.

Abstract

Iterative homology search is widely used to identify remotely related proteins. Our previous study found that query-seeded iterative sequence search reduces homologous over-extension errors and greatly improves selectivity. However, iterative homology search remains challenging for protein function prediction. More sensitive scoring models are needed to improve the predictive performance of alignment methods, and alignment annotation with better visualization is needed for result interpretation. Here we report PSISearch2D, an open-source application that runs query-seeded iterative sequence searches to detect remotely related proteins. PSISearch2D retrieves domain annotation from Pfam, UniProtKB, CDD and PROSITE for the resulting hits and displays combined domain and sequence alignments in novel visualizations. A newly defined scoring model, the C-value, re-orders hits by considering sequence and domain alignments together. Benchmarking with the C-value indicates that PSISearch2D outperforms the original PSISearch2 tool in both accuracy and specificity. PSISearch2D also improves the characterization of unknown proteins in remote protein detection. In our evaluation tests, PSISearch2D provided annotation for 77 695 of 139 503 unknown bacterial proteins and 140 751 of 352 757 unknown virus proteins in UniProtKB, about 2.3-fold and 1.8-fold more characterization than the original PSISearch2, respectively. Together with an auto-iteration mode for handling large-scale data and optional programs for global and local sequence alignment, PSISearch2D enhances remotely related protein search.
© The Author(s) 2019. Published by Oxford University Press.
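
The annotation counts reported in the abstract can be read as coverage fractions. The short sketch below only redoes that arithmetic. The function name `coverage` and the dictionary `REPORTED` are illustrative names, not part of PSISearch2D, and the counts are copied from the abstract.

```python
# Illustrative arithmetic only: counts are taken from the abstract,
# while `coverage` and `REPORTED` are hypothetical names for this sketch.

def coverage(annotated: int, total: int) -> float:
    """Return the fraction of proteins that received annotation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return annotated / total


# (annotated, total) unknown proteins in UniProtKB, as reported in the abstract.
REPORTED = {
    "bacteria": (77695, 139503),
    "virus": (140751, 352757),
}

for group, (annotated, total) in REPORTED.items():
    print(f"{group}: {annotated} of {total} = {coverage(annotated, total):.1%}")
```

Under these counts, roughly 56% of the unknown bacterial proteins and 40% of the unknown virus proteins received annotation.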

Year:  2019        PMID: 31317184      PMCID: PMC6637259          DOI: 10.1093/database/baz092

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


