
Adjust quality scores from alignment and improve sequencing accuracy.

Ming Li, Magnus Nordborg, Lei M Li.

Abstract

In shotgun sequencing, statistical reconstruction of a consensus from alignment requires a model of measurement error. Churchill and Waterman proposed one such model and an expectation-maximization (EM) algorithm to estimate sequencing error rates for each assembly matrix. Ewing and Green defined Phred quality scores for base-calling from sequencing traces by training a model on a large amount of data. However, sample preparations and sequencing machines may work under different conditions in practice and therefore quality scores need to be adjusted. Moreover, the information given by quality scores is incomplete in the sense that they do not describe error patterns. We observe that each nucleotide base has its specific error pattern that varies across the range of quality values. We develop models of measurement error for shotgun sequencing by combining the two perspectives above. We propose a logistic model taking quality scores as covariates. The model is trained by a procedure combining an EM algorithm and model selection techniques. The training results in calibration of quality values and leads to a more accurate construction of consensus. Besides Phred scores obtained from ABI sequencers, we apply the same technique to calibrate quality values that come along with Beckman sequencers.
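For orientation, Phred relates a quality score Q to the estimated probability p that a base call is wrong by Q = -10 log10(p). The sketch below illustrates the general idea of calibrating such scores with a logistic model that takes the quality score as its covariate. It is not the authors' procedure: the abstract describes training by EM combined with model selection, where the true base is unobserved, and models base-specific error patterns. This sketch instead assumes a hypothetical labeled set in which each call is already known to be right or wrong, uses a single covariate, and fits by Newton's method. All function names and the toy data are illustrative assumptions.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)

def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(scores, is_error, iterations=25):
    """Fit P(error | Q) = sigmoid(a + b * Q) by Newton's method.

    Assumes labeled calls (1 = miscalled, 0 = correct); this is a
    simplification, not the EM training described in the abstract.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for q, y in zip(scores, is_error):
            p = sigmoid(a + b * q)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g_a += y - p
            g_b += (y - p) * q
            # Negative Hessian (Fisher information).
            h_aa += w
            h_ab += w * q
            h_bb += w * q * q
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0.0:
            raise ValueError("need at least two distinct quality scores")
        # Newton step: solve the 2x2 system H * delta = g by hand.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

def calibrated_phred(q, a, b):
    """Quality score implied by the fitted error probability at score q."""
    return error_prob_to_phred(sigmoid(a + b * q))

# Toy data (invented for illustration): 100 calls at Q=10 with 20 errors,
# 100 calls at Q=20 with 2 errors. Nominal error rates would be 0.1 and 0.01.
scores = [10] * 100 + [20] * 100
is_error = [1] * 20 + [0] * 80 + [1] * 2 + [0] * 98

a, b = fit_logistic(scores, is_error)
for q in (10, 20):
    print(q, round(calibrated_phred(q, a, b), 2))
```

In this toy set the observed error rates are twice the nominal ones, so the calibrated scores come out lower than the reported scores. That is the kind of adjustment the abstract refers to when it says quality scores need to be adapted to the conditions of a particular run.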

Year:  2004        PMID: 15459287      PMCID: PMC521663          DOI: 10.1093/nar/gkh850

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


References (5 in total)

1.  The accuracy of DNA sequences: estimating sequence quality.

Authors:  G A Churchill; M S Waterman
Journal:  Genomics       Date:  1992-09       Impact factor: 5.736

2.  Assessing the quality of the DNA sequence from the Human Genome Project. (Review)

Authors:  A Felsenfeld; J Peterson; J Schloss; M Guyer
Journal:  Genome Res       Date:  1999-01       Impact factor: 9.043

3.  Base-calling of automated sequencer traces using phred. I. Accuracy assessment.

Authors:  B Ewing; L Hillier; M C Wendl; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

4.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

5.  The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences.

Authors:  J Parkhill; B W Wren; K Mungall; J M Ketley; C Churcher; D Basham; T Chillingworth; R M Davies; T Feltwell; S Holroyd; K Jagels; A V Karlyshev; S Moule; M J Pallen; C W Penn; M A Quail; M A Rajandream; K M Rutherford; A H van Vliet; S Whitehead; B G Barrell
Journal:  Nature       Date:  2000-02-10       Impact factor: 49.962

Cited by (15 in total, 10 listed)

1.  Inference of population genetic parameters in metagenomics: a clean look at messy data.

Authors:  Philip L F Johnson; Montgomery Slatkin
Journal:  Genome Res       Date:  2006-09-05       Impact factor: 9.043

2.  Efficient frequency-based de novo short-read clustering for error trimming in next-generation sequencing.

Authors:  Wei Qu; Shin-Ichi Hashimoto; Shinichi Morishita
Journal:  Genome Res       Date:  2009-05-13       Impact factor: 9.043

3.  SEME: a fast mapper of Illumina sequencing reads with statistical evaluation.

Authors:  Shijian Chen; Anqi Wang; Lei M Li
Journal:  J Comput Biol       Date:  2013-11       Impact factor: 1.479

4.  ComB: SNP calling and mapping analysis for color and nucleotide space platforms.

Authors:  Tade Souaiaia; Zach Frazier; Ting Chen
Journal:  J Comput Biol       Date:  2011-05-12       Impact factor: 1.479

5.  PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.

Authors:  Peizhou Liao; Glen A Satten; Yi-Juan Hu
Journal:  Genet Epidemiol       Date:  2017-05-31       Impact factor: 2.135

6.  Next generation sequencing technologies and the changing landscape of phage genomics.

Authors:  Jochen Klumpp; Derrick E Fouts; Shanmuga Sozhamannan
Journal:  Bacteriophage       Date:  2012-07-01

7.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

8.  Error and error mitigation in low-coverage genome assemblies.

Authors:  Melissa J Hubisz; Michael F Lin; Manolis Kellis; Adam Siepel
Journal:  PLoS One       Date:  2011-02-14       Impact factor: 3.240

9.  Next generation sequence analysis and computational genomics using graphical pipeline workflows.

Authors:  Federica Torri; Ivo D Dinov; Alen Zamanyan; Sam Hobel; Alex Genco; Petros Petrosyan; Andrew P Clark; Zhizhong Liu; Paul Eggert; Jonathan Pierce; James A Knowles; Joseph Ames; Carl Kesselman; Arthur W Toga; Steven G Potkin; Marquis P Vawter; Fabio Macciardi
Journal:  Genes (Basel)       Date:  2012-08-30       Impact factor: 4.096

10.  FadE: whole genome methylation analysis for multiple sequencing platforms.

Authors:  Tade Souaiaia; Zheng Zhang; Ting Chen
Journal:  Nucleic Acids Res       Date:  2012-09-10       Impact factor: 16.971
