
A statistical variant calling approach from pedigree information and local haplotyping with phase informative reads.

Kaname Kojima, Naoki Nariai, Takahiro Mimori, Mamoru Takahashi, Yumi Yamaguchi-Kabata, Yukuto Sato, Masao Nagasaki.

Abstract

MOTIVATION: Variant calling from genome-wide sequencing data is essential for analyzing disease-causing mutations and elucidating disease mechanisms. However, variant calling in low coverage regions is difficult because of sequence read errors and mapping errors. Hence, variant calling approaches that are robust to low coverage data are needed.
RESULTS: We propose a new variant calling approach that uses pedigree information and haplotyping based on phase informative reads, that is, sequence reads spanning two or more heterozygous positions. In our approach, genotyping and haplotyping (assigning each read to a haplotype based on phase informative reads) are performed simultaneously. Therefore, positions with weak evidence for heterozygosity can be rescued by phase informative reads, and the rescued positions in turn contribute to haplotyping in a synergistic way. In addition, pedigree information supports more accurate haplotyping as well as genotyping, especially in low coverage regions. Although heterozygous positions are useful for haplotyping, homozygous positions are not informative and weaken the information from heterozygous positions, because the majority of positions are homozygous. We therefore introduce latent variables that determine the zygosity at each position in order to filter out homozygous positions for haplotyping. In a performance evaluation using sequencing data from a parent-offspring trio, our approach outperforms existing approaches in accuracy, measured as agreement with single nucleotide polymorphism (SNP) array genotyping results. A performance analysis that considers the distance between variants also showed that the use of phase informative reads is effective for accurate variant calling, and further performance improvement is expected with longer sequence reads. CONTACT: kojima@megabank.tohoku.ac.jp.
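
The abstract describes its method only in prose. As a rough, hypothetical illustration (not the authors' statistical model, which infers genotypes, zygosity and read-to-haplotype assignments jointly), the Python sketch below shows some of the ingredients in isolation: dropping positions that look homozygous before phasing, selecting phase informative reads that cover two or more heterozygous positions, and assigning a read to one of two haplotypes by counting allele mismatches. It also includes a hard Mendelian consistency check for a trio as a simplified stand-in for pedigree information. The read and phase representations, the function names, the 0.5 threshold and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical read representation: genomic position -> observed allele.
Read = Dict[int, str]
# Hypothetical phase: heterozygous position -> (allele on hap 0, allele on hap 1).
Phase = Dict[int, Tuple[str, str]]


def het_positions(p_het: Dict[int, float], threshold: float = 0.5) -> List[int]:
    """Keep positions whose heterozygosity probability exceeds `threshold`.

    A crude stand-in for the zygosity latent variables: homozygous-looking
    positions are excluded so they do not weaken the phasing signal.
    """
    return sorted(pos for pos, p in p_het.items() if p > threshold)


def phase_informative(reads: List[Read], het: List[int]) -> List[Read]:
    """Return the reads that cover two or more heterozygous positions."""
    het_set = set(het)
    return [r for r in reads if len(het_set.intersection(r)) >= 2]


def assign_read(read: Read, phase: Phase) -> int:
    """Assign a read to haplotype 0 or 1 by counting allele mismatches.

    Returns -1 when both haplotypes fit equally well (no phase signal).
    """
    shared = [pos for pos in read if pos in phase]
    mismatches = [sum(read[pos] != phase[pos][h] for pos in shared) for h in (0, 1)]
    if mismatches[0] == mismatches[1]:
        return -1
    return 0 if mismatches[0] < mismatches[1] else 1


def mendelian_consistent(child: Tuple[str, str],
                         mother: Tuple[str, str],
                         father: Tuple[str, str]) -> bool:
    """True if the child's genotype can be formed from one maternal
    and one paternal allele (a hard trio constraint)."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# Toy example data (invented for illustration only).
p_het = {100: 0.95, 130: 0.40, 150: 0.90, 400: 0.02}
het = het_positions(p_het)
phase: Phase = {100: ("A", "G"), 150: ("C", "T")}
reads: List[Read] = [
    {100: "A", 130: "C", 150: "C"},  # spans 100 and 150 -> informative
    {100: "G", 150: "T"},            # spans 100 and 150 -> informative
    {150: "T", 160: "A"},            # only one heterozygous position
    {400: "G", 410: "T"},            # no heterozygous position
]
informative = phase_informative(reads, het)
print([assign_read(r, phase) for r in informative])  # [0, 1]
print(mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))  # True
```

In the toy run, position 130 falls below the threshold, so only positions 100 and 150 are used; the first two reads are phase informative, the first matches haplotype 0 at both positions and the second matches haplotype 1. The hard threshold and hard Mendelian filter only keep the example short; a probabilistic approach such as the one summarized above would instead weigh these quantities as probabilities.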


Year:  2013        PMID: 24002111     DOI: 10.1093/bioinformatics/btt503

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  4 in total

1.  FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies.

Authors:  Ren-Hua Chung; Wei-Yun Tsai; Chen-Yu Kang; Po-Ju Yao; Hui-Ju Tsai; Chia-Hsiang Chen
Journal:  PLoS Comput Biol       Date:  2016-06-06       Impact factor: 4.475

2.  Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree.

Authors:  Kaname Kojima; Yosuke Kawai; Naoki Nariai; Takahiro Mimori; Takanori Hasegawa; Masao Nagasaki
Journal:  BMC Genomics       Date:  2016-08-31       Impact factor: 3.969

3.  STR-realigner: a realignment method for short tandem repeat regions.

Authors:  Kaname Kojima; Yosuke Kawai; Kazuharu Misawa; Takahiro Mimori; Masao Nagasaki
Journal:  BMC Genomics       Date:  2016-12-03       Impact factor: 3.969

4.  geck: trio-based comparative benchmarking of variant calls.

Authors:  Péter Kómár; Deniz Kural
Journal:  Bioinformatics       Date:  2018-10-15       Impact factor: 6.937

