
Fast and memory efficient approach for mapping NGS reads to a reference genome.

Sanjeev Kumar, Suneeta Agarwal.

Abstract

Next-generation sequencing (NGS) machines such as Illumina and Solexa can generate millions of short reads from a given genome in a single run. Aligning these reads to a reference genome is a core step in NGS data analysis tasks such as genetic variation analysis and genome re-sequencing. A new approach is therefore needed that is efficient in both memory and time for aligning these enormous numbers of reads to the reference genome. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require large amounts of memory for whole-reference-genome indexing and read alignment. The gapped-alignment versions of these techniques are also 20-40% slower than their respective ungapped versions. In this paper, an efficient approach, WIT, is proposed for reference genome indexing and read alignment using the Burrows-Wheeler Transform (BWT) and the Wavelet Tree (WT). It supports both exact and approximate alignment. Experimental work shows that WIT performs best for protein sequence indexing. For indexing, WIT requires 0.6 N space (where N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25 N and 5 N. Experiments also show that, even with such a small index, the alignment time of WIT is comparable to that of BWA, Subread, Kart, and Minimap2. Two further alignment measures, accuracy and confidentiality, are also shown experimentally to be better than those of Minimap2. The source code of WIT is available at http://www.algorithm-skg.com/wit/home.html.
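The abstract names the two building blocks of WIT's index, the BWT and the wavelet tree, and states that the index supports both exact and approximate alignment. The sketch below shows, under stated assumptions, how those pieces fit together in a generic textbook FM-index; it is not WIT's implementation. The names `WaveletTree`, `build_bwt`, `FMIndex`, `locate`, and `locate_approx` are hypothetical. To keep it short, the suffix array is built by naive sorting, the wavelet tree's bit vectors are stored as plain Python prefix-count lists (so it does not achieve anything like a 0.6 N index), the full suffix array is kept for locating hits, and "approximate" means substitutions only (no gaps).

```python
"""Toy FM-index over a wavelet tree: exact and k-mismatch backward search.

Hypothetical illustration only; not WIT's code. Bit vectors are kept as
Python prefix-count lists, so this sketch is NOT space efficient.
"""


class WaveletTree:
    """Balanced wavelet tree answering rank(c, i) = count of c in seq[:i]."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.leaf = len(self.alphabet) <= 1
        if self.leaf:
            return
        mid = len(self.alphabet) // 2
        self.left_set = set(self.alphabet[:mid])
        # Bit 0: symbol is in the left half of the alphabet; bit 1: right half.
        # ones[i] = number of 1-bits among the first i symbols. A real index
        # would use a compressed bit vector with a rank directory instead.
        self.ones = [0]
        for c in seq:
            self.ones.append(self.ones[-1] + (c not in self.left_set))
        self.left = WaveletTree([c for c in seq if c in self.left_set],
                                self.alphabet[:mid])
        self.right = WaveletTree([c for c in seq if c not in self.left_set],
                                 self.alphabet[mid:])

    def rank(self, c, i):
        if self.leaf:
            return i if self.alphabet and c == self.alphabet[0] else 0
        if c in self.left_set:
            return self.left.rank(c, i - self.ones[i])  # zeros before i
        return self.right.rank(c, self.ones[i])


def build_bwt(text):
    """Return (BWT, suffix array) of text + '$' using naive suffix sorting."""
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in sa), sa


class FMIndex:
    def __init__(self, text):
        self.bwt, self.sa = build_bwt(text)
        self.wt = WaveletTree(self.bwt)
        # C[c] = number of symbols in the text that are smaller than c.
        self.C, total = {}, 0
        for c in sorted(set(self.bwt)):
            self.C[c] = total
            total += self.bwt.count(c)

    def _step(self, c, lo, hi):
        """LF-mapping: narrow the SA interval [lo, hi) by prepending c."""
        return self.C[c] + self.wt.rank(c, lo), self.C[c] + self.wt.rank(c, hi)

    def locate(self, pattern):
        """Exact matching by backward search; returns sorted start positions."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo, hi = self._step(c, lo, hi)
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])

    def locate_approx(self, pattern, k):
        """Start positions matching pattern with at most k substitutions."""
        hits = set()
        alphabet = [c for c in self.C if c != "$"]

        def extend(j, lo, hi, mism):
            if lo >= hi:
                return
            if j < 0:
                hits.update(self.sa[lo:hi])
                return
            for c in alphabet:  # branch on every symbol, charging mismatches
                cost = mism + (c != pattern[j])
                if cost <= k:
                    extend(j - 1, *self._step(c, lo, hi), cost)

        extend(len(pattern) - 1, 0, len(self.bwt), 0)
        return sorted(hits)


if __name__ == "__main__":
    idx = FMIndex("ACGTACGTTACG")
    print(idx.locate("ACG"))            # [0, 4, 9]
    print(idx.locate_approx("GTA", 1))  # [2, 6, 7]
```

The wavelet tree is what lets the BWT answer `rank` queries over a multi-symbol alphabet with one bit vector per tree level, which is why it suits both DNA and larger alphabets such as protein sequences. Extending `extend` with insertion and deletion branches would turn this substitution-only search into a gapped one, at the cost of a larger search tree.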

Keywords: Indexing; Burrows-Wheeler transform; genome; read alignment; suffix array; wavelet tree

Year:  2019        PMID: 31057068     DOI: 10.1142/S0219720019500082

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122
