
SEME: a fast mapper of Illumina sequencing reads with statistical evaluation.

Shijian Chen, Anqi Wang, Lei M Li.

Abstract

Mapping reads to a reference genome is a routine yet computationally intensive task in research based on high-throughput sequencing. In recent years, Illumina sequencing reads have become longer and their quality scores higher. According to our calculation, when a close reference genome is available, this allows a perfect k-mer seed match for almost all reads while maintaining reasonable specificity. Our other observation is that the majority of reads contain at most one short INDEL polymorphism. Based on these observations, we propose a fast mapping approach, referred to as "SEME," which has two core steps: first, it scans a read sequentially in a specific order for an exact k-mer seed match; next, it extends the alignment on both sides, allowing at most one short INDEL on each side, using a novel method called the "auto-match function." We decompose the evaluation of sensitivity and specificity into two parts corresponding to the seed and extension steps, and the composite result provides an approximate overall reliability estimate for each mapping. We compare SEME with some existing mapping methods on several datasets, and SEME shows better performance in terms of both running time and mapping rate.
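The abstract names SEME's two core steps but does not specify the seed scan order or how the auto-match function works. The Python sketch below illustrates a generic seed-and-extend mapper under explicitly assumed choices: non-overlapping k-mer seeds tried left to right, a hash-table k-mer index, and a brute-force extension that tries a single indel of up to `max_indel` bases at every breakpoint, scored as mismatches plus a fixed `gap_cost`. All names and parameters (`seed_hit_probability`, `build_index`, `extend`, `map_read`, `max_indel`, `gap_cost`) are hypothetical, and the extension routine is a simple substitute, not the paper's auto-match function. `seed_hit_probability` shows the kind of calculation behind the claim that almost all reads contain an exact seed, assuming independent base errors at Phred rates.

```python
import random
from collections import defaultdict

# Hypothetical seed-and-extend sketch; scan order, scoring, and the extension
# routine are assumptions and NOT the published SEME "auto-match function".


def seed_hit_probability(quals: list[int], k: int) -> float:
    """P(at least one non-overlapping k-mer seed is error-free), assuming
    independent base errors at the Phred rate 10^(-Q/10)."""
    p_all_fail = 1.0
    for s in range(0, len(quals) - k + 1, k):
        p_clean = 1.0
        for q in quals[s:s + k]:
            p_clean *= 1.0 - 10 ** (-q / 10)
        p_all_fail *= 1.0 - p_clean
    return 1.0 - p_all_fail


def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Hash table from every reference k-mer to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return dict(index)


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extend(read_part: str, ref_part: str, max_indel: int, gap_cost: int) -> tuple[float, int]:
    """Align all of read_part to a prefix of ref_part with at most one indel.

    Returns (cost, shift): cost = mismatches (+ gap_cost if an indel is used);
    shift = reference bases consumed minus read bases consumed
    (shift > 0: deletion from the read, shift < 0: insertion in the read).
    """
    n = len(read_part)
    best: tuple[float, int] = (float("inf"), 0)
    if len(ref_part) >= n:
        best = (mismatches(read_part, ref_part[:n]), 0)
    for size in range(1, max_indel + 1):
        for p in range(n + 1):  # brute force: try the indel at every breakpoint
            options = []
            if p + size <= n and len(ref_part) >= n - size:  # read[p:p+size] inserted
                mm = mismatches(read_part[:p], ref_part[:p]) + mismatches(
                    read_part[p + size:], ref_part[p:n - size])
                options.append((mm, -size))
            if len(ref_part) >= n + size:  # ref_part[p:p+size] skipped by the read
                mm = mismatches(read_part[:p], ref_part[:p]) + mismatches(
                    read_part[p:], ref_part[p + size:n + size])
                options.append((mm, size))
            for mm, shift in options:
                if mm + gap_cost < best[0]:  # strict: prefer no indel on ties
                    best = (mm + gap_cost, shift)
    return best


def map_read(read: str, ref: str, index: dict[str, list[int]], k: int,
             max_indel: int = 3, gap_cost: int = 2):
    """Return (cost, ref_start, left_shift, right_shift), or None if unmapped.

    Seeds are tried at read offsets 0, k, 2k, ... (an assumed order); the first
    seed with any exact hit is extended on both sides and its best hit is kept.
    """
    for s in range(0, len(read) - k + 1, k):
        best = None
        for pos in index.get(read[s:s + k], []):
            # Left flank: reverse both strings so extend() grows away from the seed.
            left = extend(read[:s][::-1], ref[max(0, pos - s - max_indel):pos][::-1],
                          max_indel, gap_cost)
            right_len = len(read) - s - k
            right = extend(read[s + k:], ref[pos + k:pos + k + right_len + max_indel],
                           max_indel, gap_cost)
            cost = left[0] + right[0]
            if cost < float("inf") and (best is None or cost < best[0]):
                best = (cost, pos - s - left[1], left[1], right[1])
        if best is not None:
            return best
    return None


# Usage on a random reference: one exact read and one read with a 2-bp deletion.
rng = random.Random(0)
ref = "".join(rng.choice("ACGT") for _ in range(2000))
index = build_index(ref, 20)
exact = map_read(ref[1200:1300], ref, index, 20)
deleted = map_read(ref[500:570] + ref[572:602], ref, index, 20)
print(exact, deleted)
```

Trying every breakpoint costs on the order of n² × `max_indel` character comparisons per flank, which is acceptable for illustration but likely far slower than a purpose-built extension method. The sketch also omits the seed and extension sensitivity and specificity evaluation that the abstract describes.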


Year:  2013        PMID: 24195707      PMCID: PMC3822393          DOI: 10.1089/cmb.2013.0111

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


