
Benchmarking homology detection procedures with low complexity filters.

Kristoffer Forslund, Erik L L Sonnhammer.

Abstract

BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions have been suggested, but a detailed comparison between them on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low-complexity sequences.
RESULTS: Here we introduce an alternative benchmarking strategy based on Pfam domains and clans in whole-proteome data sets. This gives a realistic level of low-complexity sequences. We used it to evaluate all six built-in BLAST low-complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.
CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.
AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz
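
The sensitivity and false positive rate compared above can be illustrated with a minimal sketch. It is not the authors' benchmark code (their scripts are in Perl and their exact criteria are not given in the abstract). It assumes a hypothetical mapping `clan_of` from sequence identifier to a Pfam clan label, counts a reported hit as a true positive when query and subject share a clan and as a false positive when they are annotated with different clans, and ignores hits involving unannotated sequences. The function name `evaluate_hits` and the toy identifiers are illustrative only.

```python
from collections import Counter


def evaluate_hits(hits, clan_of):
    """Return (sensitivity, false_positive_rate) for ordered (query, subject) hits.

    Assumption for illustration: two sequences are homologous exactly when
    they carry the same clan label in `clan_of`.
    """
    n = len(clan_of)
    clan_sizes = Counter(clan_of.values())
    # Ordered pairs of distinct sequences that share a clan (possible true hits).
    positives = sum(size * (size - 1) for size in clan_sizes.values())
    # Ordered pairs of distinct sequences in different clans (possible false hits).
    negatives = n * (n - 1) - positives

    tp = fp = 0
    seen = set()
    for query, subject in hits:
        if query == subject:
            continue  # self-hits carry no information
        if query not in clan_of or subject not in clan_of:
            continue  # unannotated sequences cannot be classified
        if (query, subject) in seen:
            continue  # count each ordered pair once
        seen.add((query, subject))
        if clan_of[query] == clan_of[subject]:
            tp += 1
        else:
            fp += 1

    sensitivity = tp / positives if positives else 0.0
    false_positive_rate = fp / negatives if negatives else 0.0
    return sensitivity, false_positive_rate


# Toy data: two clans with two sequences each; "x" is unannotated.
clan_of = {"a": "CL1", "b": "CL1", "c": "CL2", "d": "CL2"}
hits = [("a", "b"), ("b", "a"), ("a", "c"), ("a", "a"), ("a", "x"), ("a", "b")]
print(evaluate_hits(hits, clan_of))  # (0.5, 0.125)
```

Running the same evaluation on the hit lists produced under each filter setting would give one (sensitivity, false positive rate) pair per setting, which is the kind of trade-off the conclusion describes.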

Year: 2009    PMID: 19620098    DOI: 10.1093/bioinformatics/btp446

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  5 in total

1.  Gentle masking of low-complexity sequences improves homology search.

Authors:  Martin C Frith
Journal:  PLoS One       Date:  2011-12-19       Impact factor: 3.240

2.  eggNOG v4.0: nested orthology inference across 3686 organisms.

Authors:  Sean Powell; Kristoffer Forslund; Damian Szklarczyk; Kalliopi Trachana; Alexander Roth; Jaime Huerta-Cepas; Toni Gabaldón; Thomas Rattei; Chris Creevey; Michael Kuhn; Lars J Jensen; Christian von Mering; Peer Bork
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971

3.  Benchmarking the next generation of homology inference tools.

Authors:  Ganapathi Varma Saripella; Erik L L Sonnhammer; Kristoffer Forslund
Journal:  Bioinformatics       Date:  2016-06-01       Impact factor: 6.937

4.  InParanoid 7: new algorithms and tools for eukaryotic orthology analysis.

Authors:  Gabriel Ostlund; Thomas Schmitt; Kristoffer Forslund; Tina Köstler; David N Messina; Sanjit Roopra; Oliver Frings; Erik L L Sonnhammer
Journal:  Nucleic Acids Res       Date:  2009-11-05       Impact factor: 16.971

5.  The challenge of increasing Pfam coverage of the human proteome.

Authors:  Jaina Mistry; Penny Coggill; Ruth Y Eberhardt; Antonio Deiana; Andrea Giansanti; Robert D Finn; Alex Bateman; Marco Punta
Journal:  Database (Oxford)       Date:  2013-04-19       Impact factor: 3.451
