Enhanced protein domain discovery by using language modeling techniques from speech recognition.

Lachlan Coin, Alex Bateman, Richard Durbin.

Abstract

Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids. To increase word accuracy in speech recognition, language models are used to capture the information that certain word combinations are more likely than others, thus improving detection based on context. However, to date, these context techniques have not been applied to protein domain discovery. Here we show that the application of statistical language modeling methods can significantly enhance domain recognition in protein sequences. As an example, we discover an unannotated Tf_Otx Pfam domain on the cone rod homeobox protein, which suggests a possible mechanism for how the V242M mutation on this protein causes cone-rod dystrophy.
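The idea of borrowing a language model for domain context can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple first-order (bigram) model over the order of domains in a protein, add-one smoothing, and a context bonus added to a per-domain HMM bit score. The training architectures, the domain names other than Tf_Otx, the `<s>` start marker, the function names, and the 5.0-bit score are all made up for illustration.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical marker for the start of a protein


def train_bigram(architectures, alpha=1.0):
    """Estimate smoothed domain probabilities from toy domain orders.

    Returns two functions: P(domain | previous domain) and a
    context-free background P(domain), both with add-alpha smoothing.
    """
    vocab = sorted({dom for arch in architectures for dom in arch})
    pair_counts = Counter()   # counts of (previous, current) domain pairs
    prev_counts = Counter()   # how often each domain (or START) precedes another
    unigram = Counter()       # how often each domain occurs at all
    for arch in architectures:
        prev = START
        for dom in arch:
            pair_counts[(prev, dom)] += 1
            prev_counts[prev] += 1
            unigram[dom] += 1
            prev = dom
    total = sum(unigram.values())

    def cond_prob(dom, prev):
        return (pair_counts[(prev, dom)] + alpha) / (
            prev_counts[prev] + alpha * len(vocab)
        )

    def background_prob(dom):
        return (unigram[dom] + alpha) / (total + alpha * len(vocab))

    return cond_prob, background_prob


def context_adjusted_score(hmm_bits, dom, prev, cond_prob, background_prob):
    """Add the context log-odds (in bits) to a per-domain HMM bit score."""
    return hmm_bits + math.log2(cond_prob(dom, prev) / background_prob(dom))


# Made-up training data: each list is the domain order of one protein.
toy_architectures = [
    ["Homeobox", "Tf_Otx"],
    ["Homeobox", "Tf_Otx"],
    ["Homeobox"],
    ["SH3", "SH2", "Pkinase"],
]
cond_prob, background_prob = train_bigram(toy_architectures)

weak_hit_bits = 5.0  # hypothetical HMM score for a weak Tf_Otx match
adjusted = context_adjusted_score(
    weak_hit_bits, "Tf_Otx", "Homeobox", cond_prob, background_prob
)
unlikely = context_adjusted_score(
    weak_hit_bits, "Pkinase", "Homeobox", cond_prob, background_prob
)
print(f"Tf_Otx after Homeobox:  {adjusted:.3f} bits")
print(f"Pkinase after Homeobox: {unlikely:.3f} bits")
```

In this toy setting a weak Tf_Otx hit that follows a homeobox domain gains score, while a domain that was never observed in that context loses score, which is the same kind of contextual reweighting that language models provide for word sequences in speech recognition.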

Year:  2003        PMID: 12668763      PMCID: PMC404693          DOI: 10.1073/pnas.0737502100

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205

