Progressive approach for SNP calling and haplotype assembly using single molecular sequencing data.

Fei Guo, Dan Wang, Lusheng Wang.

Abstract

Motivation: Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. New technologies can provide Single Molecular Sequencing (SMS) data that cover about 90% of positions over chromosomes. However, SMS data have a higher error rate than the 1% error rate of short reads. As a result, SNP calling and haplotype assembly using SMS reads are very difficult, and most existing technologies do not work properly on SMS data.
Results: In this paper, we develop a progressive approach for SNP calling and haplotype assembly that works very well for SMS data. Our method can handle more than 200 million non-N bases on Chromosome 1 with millions of reads and more than 100 blocks, each of which contains more than 2 million bases and more than 3K SNP sites on average. Experimental results show that the false discovery rate and false negative rate of our method are 15.7% and 11.0% on NA12878, and 16.5% and 11.0% on NA24385. Moreover, the overall switch errors of our method are 7.26 and 5.21, with an average of 3378 and 5736 SNP sites per block, on NA12878 and NA24385, respectively. Here, we demonstrate that SMS reads alone can generate a high-quality solution for both SNP calling and haplotype assembly.
Availability and implementation: Source code and results are available at https://github.com/guofeieileen/SMRT/wiki/Software.
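The abstract reports a false discovery rate, a false negative rate and switch errors without defining them. The sketch below uses the common textbook definitions, which may differ from the exact ones used in the paper. In particular, the abstract does not say whether the switch error figures are counts, rates or percentages. The function names `snp_calling_rates` and `switch_errors` and the toy inputs are hypothetical and are not taken from the authors' software.

```python
def snp_calling_rates(called, truth):
    """Return (FDR, FNR) for called vs. true SNP positions.

    Assumed definitions (not confirmed by the abstract):
      FDR = FP / (TP + FP), FNR = FN / (TP + FN).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called sites that are real SNPs
    fp = len(called - truth)   # called sites that are not real SNPs
    fn = len(truth - called)   # real SNPs that were missed
    fdr = fp / (tp + fp) if called else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    return fdr, fnr


def switch_errors(predicted, truth):
    """Count switch errors within one haplotype block.

    Both arguments are equal-length strings of '0'/'1' alleles for one
    haplotype over the same heterozygous sites. A switch error is counted
    wherever the predicted phase flips relative to the truth between two
    adjacent sites.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    agrees = [p == t for p, t in zip(predicted, truth)]
    return sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)


# Toy data: 4 called sites, 5 true sites, 3 in common.
fdr, fnr = snp_calling_rates({100, 250, 400, 900}, {100, 250, 400, 700, 800})
print(fdr, fnr)

# Toy block of 5 sites: the phase flips once, after the second site.
n_switch = switch_errors("00110", "00001")
print(n_switch, n_switch / (5 - 1))
```

Dividing the switch count by the number of adjacent site pairs in the block gives a per-block switch rate. How such values are aggregated across blocks is a further choice the abstract does not specify.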

Year:  2018        PMID: 29474523     DOI: 10.1093/bioinformatics/bty059

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


