Daniel Tello, Juanita Gil, Cristian D Loaiza, John J Riascos, Nicolás Cardozo, Jorge Duitama.
Abstract
MOTIVATION: Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features of modern production pipelines for genetic-based diagnosis in medicine and for genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features.
Year: 2019 PMID: 31099384 PMCID: PMC6853766 DOI: 10.1093/bioinformatics/btz275
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic procedure for indel realignment and haplotype clustering. (A) Flowchart giving an overview of the process. (B) Voting mechanism to select the most likely start of a given indel. Blue bars represent aligned reads. The numbers on top are the counts of reads supporting each possible start position. (C) Realignment of reads supporting indel events according to the results of the voting procedure, and clustering of candidate haplotypes to discover and genotype small indels. Reads that do not span the indel (marked with X) are soft-clipped.
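The voting step in panel (B) can be pictured with a short sketch. This is only an illustration under stated assumptions, not NGSEP's implementation: the function name `vote_indel_start`, the input format (one candidate start position per supporting read) and the tie-breaking rule (leftmost position wins) are all hypothetical.

```python
from collections import Counter


def vote_indel_start(read_starts):
    """Pick the indel start position supported by the most reads.

    read_starts: iterable of ints, one candidate start per supporting read
                 (hypothetical input format, chosen for this sketch).
    Returns a (position, votes) tuple. Ties are broken by taking the
    leftmost position, which is an assumption of this example.
    """
    counts = Counter(read_starts)
    if not counts:
        raise ValueError("no reads support this indel")
    # Order candidates by most votes first, then by smallest position.
    position, votes = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return position, votes
```

For instance, with five reads proposing starts 101, 102, 102, 103 and 102, the sketch selects position 102 with three votes; reads would then be realigned to that start as described in panel (C).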
Fig. 2. Comparison of different tools for variant discovery from reads taken from an F1 pool of segregants derived from two yeast haploid strains. Results are broken down by variant type (SNVs, Indels and STRs) and gold standard genotype (homozygous variant or heterozygous). The proportion of FPPM is used as a measure of specificity. Curves are obtained by varying the minimum genotype quality filter (GQ field in the VCF file) from 0 to 90.
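The way such curves are built, by sweeping a minimum GQ threshold and recomputing the metrics at each value, can be sketched as follows. This is a minimal illustration and not the evaluation code behind the figure: the function name `gq_curve`, the input format (pairs of GQ value and a flag saying whether the call agrees with the gold standard), the step of 10 between thresholds and the use of a raw false-positive count in place of the FPPM measure are all assumptions.

```python
def gq_curve(calls, n_gold, thresholds=range(0, 91, 10)):
    """Compute one (threshold, sensitivity, false_positives) point per GQ cutoff.

    calls:      list of (gq, is_correct) pairs, one per called variant
                (hypothetical format; is_correct means the call matches
                the gold standard).
    n_gold:     number of variants in the gold standard.
    thresholds: minimum GQ values to sweep (0 to 90 in steps of 10 here,
                where the step size is an assumption).
    """
    if n_gold <= 0:
        raise ValueError("n_gold must be positive")
    points = []
    for t in thresholds:
        # Keep only the calls that pass the minimum-GQ filter.
        kept = [ok for gq, ok in calls if gq >= t]
        true_pos = sum(kept)
        false_pos = len(kept) - true_pos
        points.append((t, true_pos / n_gold, false_pos))
    return points
```

Raising the threshold can only remove calls, so sensitivity and the false-positive count are both non-increasing along the sweep; plotting one against the other gives a curve of the kind shown in the figure.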
Fig. 3. (A) Comparison of different tools for variant discovery from reads taken from real WGS data of the HapMap human individual NA12878. Results are broken down by region type (single copy or repetitive), variant type (SNVs, Indels and STRs) and genotype in the PlatGen gold standard (homozygous or heterozygous). The proportion of FPPM is used as a measure of specificity. Curves are obtained by varying the minimum genotype quality filter (GQ field in the VCF file) from 0 to 90. (B) Runtime in minutes taken by the six tools to analyze the evaluated human samples. WGS, WES1 and WES2 correspond to one WGS and two WES datasets taken from the human HapMap individual NA12878. SynDip corresponds to the synthetic diploid individual developed by Li
Fig. 4. Comparison of different tools for population variant discovery and genotyping using reads taken from real GBS experiments on two different biparental populations of cassava (left panels) and rice (right panels). In the absence of a gold standard, the total number of genotype calls is used as a measure of sensitivity and the number of errors inferred from the population structure is used as a measure of specificity. Curves are obtained by varying the minimum genotype quality filter (GQ field in the VCF file) according to the observed distribution of GQ values for each tool.