Claudia Perea¹, Juan Fernando De La Hoz¹, Daniel Felipe Cruz¹,², Juan David Lobaton¹, Paulo Izquierdo¹, Juan Camilo Quintero¹,³, Bodo Raatz¹, Jorge Duitama⁴.
Abstract
BACKGROUND: The recent development and availability of different genotyping-by-sequencing (GBS) protocols has provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes to generate sequencing fragments at predictable and reproducible sites. This makes it possible to genotype thousands of genetic markers in populations of hundreds of individuals. Because GBS protocols achieve parallel genotyping through high-throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for the analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data.
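To illustrate why digestion with a known enzyme yields fragments at predictable sites, here is a minimal in-silico digestion sketch. It assumes ApeKI (recognition site GCWGC, cut after the first G, with W = A or T) purely as an example enzyme; the record above does not say which enzymes were used, and the sketch scans only the top strand and ignores sticky-end overhangs.

```python
import re

# Illustrative enzyme only: ApeKI recognizes GCWGC (W = A or T) and cuts G^CWGC.
# The lookahead (?=...) lets overlapping sites be found by finditer.
SITE_REGEX = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # cut falls between the leading G and the rest of the site


def digest(seq: str) -> list[str]:
    """Split seq at every recognition site (top strand only) and return the fragments."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in SITE_REGEX.finditer(seq)]
    fragments = []
    prev = 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    # Drop empty pieces defensively (e.g. if a cut coincided with a sequence end).
    return [f for f in fragments if f]


if __name__ == "__main__":
    print(digest("AAGCAGCTTTTGCTGCAA"))
```

Because the cut positions depend only on the sequence and the motif, the same fragments are produced in every individual that shares those sites, which is what makes the sequenced loci reproducible across a population.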
Keywords: Bioinformatics; GBS; NGSEP; SNP calling; Sequencing
Year: 2016 PMID: 27585926 PMCID: PMC5009557 DOI: 10.1186/s12864-016-2827-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 NGSEP wizard. One-step wizard to obtain population variability datasets.
Fig. 2 Minor allele frequency (MAF) and observed heterozygosity (H) distributions. Statistics on filtered SNPs obtained by running the four discovery pipelines compared in this study on the K family GBS data. (a) Distribution of observed heterozygosity. (b) MAF distribution in SNPs useful to build a genetic map (categories 2 and 3; see Methods for details). (c) MAF distribution in highly heterozygous SNPs (category 4). (d) Percentage of filtered SNPs useful to build a genetic map that appear in the filtered (upper chart) and unfiltered (lower chart) datasets obtained by running each method.
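The two per-SNP statistics plotted in Fig. 2 can be computed from diploid genotype calls roughly as follows. This is a minimal sketch using the standard textbook definitions; the genotype encoding (count of alternative alleles, 0/1/2, with None for a missing call) is an assumption, not the NGSEP data model.

```python
from typing import Optional, Sequence

# Assumed encoding: number of alternative alleles in a diploid call (0, 1 or 2); None = missing.
Genotype = Optional[int]


def observed_heterozygosity(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls at one SNP that are heterozygous."""
    called = [g for g in calls if g is not None]
    if not called:
        return float("nan")
    return sum(1 for g in called if g == 1) / len(called)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the less common allele among non-missing diploid calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return float("nan")
    alt_freq = sum(called) / (2 * len(called))  # two allele copies per call
    return min(alt_freq, 1.0 - alt_freq)


if __name__ == "__main__":
    snp = [0, 0, 1, 1, None]
    print(observed_heterozygosity(snp), minor_allele_frequency(snp))
```

Missing calls are excluded from both denominators, so a SNP with many missing genotypes is summarized only over the individuals that were actually genotyped.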
Fig. 3 Quality assessment for cassava F1 families. Top panels: number of genotype calls in SNPs classified in the categories useful to build a genetic map (C2 and C3; see Methods for details), contrasted with the number of segregation errors identified in those categories in (a) the K family and (d) the NxA family. Middle panels: number of genotype calls in SNPs segregating the two parents (C4), contrasted with the number of (false) homozygous genotypes called in SNPs catalogued in this category in (b) the K family and (e) the NxA family. Bottom panels: number of genotype calls in SNPs classified in categories C2 and C3, contrasted with the number of genotyping errors identified in SNPs predicted to be monomorphic in (c) the K family and (f) the NxA family. For each pipeline, the dots represent data points obtained by filtering genotype calls at different minimum quality scores. Values in all panels are in thousands of genotype calls.
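Counting segregation errors in an F1 family amounts to checking each progeny call against the genotypes that the two parental genotypes can produce under Mendelian inheritance. The sketch below is a generic diploid check under that assumption; it does not reproduce the paper's C2/C3/C4 category definitions (those are in the Methods), and the 0/1/2 encoding is assumed.

```python
from typing import Optional, Sequence

# Assumed encoding: 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative.


def transmissible_alleles(genotype: int) -> set[int]:
    """Alleles (0 = ref, 1 = alt) a diploid parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def expected_offspring(parent1: int, parent2: int) -> set[int]:
    """Progeny genotypes compatible with Mendelian segregation from the two parents."""
    return {
        a + b
        for a in transmissible_alleles(parent1)
        for b in transmissible_alleles(parent2)
    }


def count_segregation_errors(
    parent1: int, parent2: int, progeny: Sequence[Optional[int]]
) -> int:
    """Number of non-missing progeny calls that the parents could not have produced."""
    allowed = expected_offspring(parent1, parent2)
    return sum(1 for g in progeny if g is not None and g not in allowed)


if __name__ == "__main__":
    # Parents hom-ref x het: progeny can only be 0 or 1, so the "2" is an error.
    print(count_segregation_errors(0, 1, [0, 1, 2, None, 1]))
```

For parents homozygous for opposite alleles (0 x 2), every progeny call must be heterozygous, so any homozygous call is flagged; this is one way false homozygous calls can be detected at SNPs where the parents differ.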
Fig. 4 Quality assessment for the bean MAGIC population. (a) Total number of genotype calls obtained from sequencing data for the bean MAGIC population, contrasted with the number of heterozygous genotype calls. For each pipeline, the dots represent data points obtained by filtering genotype calls at different minimum quality scores. (b) Total number of SNPs obtained in the same experiments as in (a), as a function of the number of SNPs with observed heterozygosity larger than 0.05. (c) Distribution of observed heterozygosity for datasets obtained with the four pipelines compared in this study. (d) Distribution of imputed genotype calls for different datasets obtained with NGSEP and imputed with NGSEP and with Beagle. For each dataset, the green line represents the percentage of the total dataset made up of imputed genotype calls.
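Each dot in Figs. 3 and 4a corresponds to one minimum quality threshold applied to the genotype calls. A minimal sketch of that sweep is shown below; the `Call` record and its per-call `quality` field are hypothetical stand-ins for a genotype quality score, not NGSEP's actual data structures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Call:
    genotype: int  # assumed encoding: 0, 1 or 2 alternative alleles
    quality: int   # hypothetical per-call genotype quality score


def sweep_quality(
    calls: Sequence[Call], thresholds: Sequence[int]
) -> list[tuple[int, int, int]]:
    """For each minimum quality, return (threshold, calls kept, heterozygous calls kept)."""
    results = []
    for q in thresholds:
        kept = [c for c in calls if c.quality >= q]
        het = sum(1 for c in kept if c.genotype == 1)
        results.append((q, len(kept), het))
    return results


if __name__ == "__main__":
    calls = [Call(0, 10), Call(1, 15), Call(1, 40), Call(2, 50)]
    for row in sweep_quality(calls, [0, 20, 45]):
        print(row)
```

Plotting the second element of each tuple against the third traces one curve per pipeline: raising the threshold removes calls, and in a highly inbred population such as a MAGIC panel a good pipeline should shed heterozygous (likely erroneous) calls faster than total calls.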