Molecular sequence accuracy and the analysis of protein coding regions.

D J States, D Botstein.

Abstract

Molecular sequences, like all experimental data, have finite error rates. The impact of errors on the information content of molecular sequence data is dependent on the analytic paradigm used to interpret the data. We studied the impact of nucleic acid sequence errors on the ability to align predicted amino acid sequences with the sequences of related proteins. We found that with a simultaneous translation and alignment algorithm, identification of sequence homologies is resilient to the introduction of random errors. Proteins with greater than 30% sequence identity can be reliably recognized even in the presence of 1% frameshifting (insertion or deletion) error rates and 5% base substitution rates. Incorporation of prior knowledge about the location and characteristics of errors improves tolerance to error of amino acid sequence alignments. Similarly, inclusion of prior knowledge of biased codon utilization by yeast (Saccharomyces cerevisiae) allows reliable detection of correct reading frames in yeast sequences even in the presence of 5% substitution and 1% frameshift errors.
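
The error model described in the abstract (random base substitutions plus frameshifting insertions or deletions at fixed per-base rates) can be illustrated with a minimal sketch. This is not the authors' code: the function name `introduce_errors`, the even split of the frameshift rate between insertions and deletions, and the uniform choice of replacement bases are all assumptions made for illustration. The default rates mirror the 5% substitution and 1% frameshift figures quoted in the abstract.

```python
import random

BASES = "ACGT"


def introduce_errors(seq, sub_rate=0.05, indel_rate=0.01, rng=None):
    """Return a copy of `seq` with random sequencing-style errors.

    Hypothetical helper, not taken from the paper. Each input base is,
    independently, deleted (indel_rate / 2), preceded by an inserted
    random base (indel_rate / 2), substituted (sub_rate), or kept.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for base in seq:
        r = rng.random()
        if r < indel_rate / 2:
            # Deletion: drop this base, shifting the downstream reading frame.
            continue
        if r < indel_rate:
            # Insertion: emit one random base ahead of the original base.
            out.append(rng.choice(BASES))
            out.append(base)
            continue
        if r < indel_rate + sub_rate:
            # Substitution: replace with one of the other three bases.
            out.append(rng.choice([b for b in BASES if b != base]))
            continue
        out.append(base)
    return "".join(out)


if __name__ == "__main__":
    original = "ATGGCTAAAGGTGAATTACGT" * 5
    noisy = introduce_errors(original, rng=random.Random(0))
    print(len(original), len(noisy))
```

A corrupted sequence produced this way could then be passed to a translating alignment tool to observe how well homologies are recovered as the error rates increase.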

Year:  1991        PMID: 2062834      PMCID: PMC51908          DOI: 10.1073/pnas.88.13.5518

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


References (14 in total)

1.  Sequencing of megabase plus DNA by hybridization: theory of the method.

Authors:  R Drmanac; I Labat; I Brukner; R Crkvenjakov
Journal:  Genomics       Date:  1989-02       Impact factor: 5.736

2.  Cloning of the proteinase that facilitates infection by schistosome parasites.

Authors:  G R Newport; J H McKerrow; R Hedstrom; M Petitt; L McGarrigle; P J Barr; N Agabian
Journal:  J Biol Chem       Date:  1988-09-15       Impact factor: 5.157

3.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

4.  Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs.

Authors:  T Ikemura
Journal:  J Mol Biol       Date:  1982-07-15       Impact factor: 5.469

5.  Establishing homologies in protein sequences.

Authors:  M O Dayhoff; W C Barker; L T Hunt
Journal:  Methods Enzymol       Date:  1983       Impact factor: 1.600

6.  Sequencing end-labeled DNA with base-specific chemical cleavages.

Authors:  A M Maxam; W Gilbert
Journal:  Methods Enzymol       Date:  1980       Impact factor: 1.600

7.  Identification of common molecular subsequences.

Authors:  T F Smith; M S Waterman
Journal:  J Mol Biol       Date:  1981-03-25       Impact factor: 5.469

8.  Codon catalog usage is a genome strategy modulated for gene expressivity.

Authors:  R Grantham; C Gautier; M Gouy; M Jacobzone; R Mercier
Journal:  Nucleic Acids Res       Date:  1981-01-10       Impact factor: 16.971

9.  Recognition of protein coding regions in DNA sequences.

Authors:  J W Fickett
Journal:  Nucleic Acids Res       Date:  1982-09-11       Impact factor: 16.971

10.  DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Effect of pyrophosphorolysis and metal ions.

Authors:  S Tabor; C C Richardson
Journal:  J Biol Chem       Date:  1990-05-15       Impact factor: 5.157

Cited by (11 in total)

1.  Finding errors in DNA sequences.

Authors:  J Posfai; R J Roberts
Journal:  Proc Natl Acad Sci U S A       Date:  1992-05-15       Impact factor: 11.205

2.  Corruption of genomic databases with anomalous sequence.

Authors:  E D Lamperti; J M Kittelberger; T F Smith; L Villa-Komaroff
Journal:  Nucleic Acids Res       Date:  1992-06-11       Impact factor: 16.971

3.  Sequence alignment by cross-correlation.

Authors:  Alan L Rockwood; David K Crockett; James R Oliphant; Kojo S J Elenitoba-Johnson
Journal:  J Biomol Tech       Date:  2005-12

4.  A hidden Markov model that finds genes in E. coli DNA.

Authors:  A Krogh; I S Mian; D Haussler
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

5.  PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

Authors:  E Birney; J D Thompson; T J Gibson
Journal:  Nucleic Acids Res       Date:  1996-07-15       Impact factor: 16.971

6.  Assignment of position-specific error probability to primary DNA sequence data.

Authors:  C B Lawrence; V V Solovyev
Journal:  Nucleic Acids Res       Date:  1994-04-11       Impact factor: 16.971

7.  Highly improved homopolymer aware nucleotide-protein alignments with 454 data.

Authors:  Fredrik Lysholm
Journal:  BMC Bioinformatics       Date:  2012-09-12       Impact factor: 3.169

8.  Error and error mitigation in low-coverage genome assemblies.

Authors:  Melissa J Hubisz; Michael F Lin; Manolis Kellis; Adam Siepel
Journal:  PLoS One       Date:  2011-02-14       Impact factor: 3.240

9.  Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST.

Authors:  E Michael Gertz; Yi-Kuo Yu; Richa Agarwala; Alejandro A Schäffer; Stephen F Altschul
Journal:  BMC Biol       Date:  2006-12-07       Impact factor: 7.431

10.  Having a BLAST with bioinformatics (and avoiding BLASTphemy).

Authors:  A Pertsemlidis; J W Fondon
Journal:  Genome Biol       Date:  2001-09-27       Impact factor: 13.583
