Davoud Torkamaneh, Jérôme Laroche, Maxime Bastien, Amina Abed, François Belzile.
Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have considerably accelerated the investigation of genome composition and function. Genotyping-by-sequencing (GBS) is a genotyping approach that uses NGS to scan a genome rapidly and economically. It has been shown to allow the simultaneous discovery and genotyping of thousands to millions of SNPs across a wide range of species. For most users, the main challenge in GBS is the bioinformatics analysis of the large amount of sequence information derived from sequencing GBS libraries in order to call alleles at SNP loci. Herein we describe a new GBS bioinformatics pipeline, Fast-GBS, designed to provide highly accurate genotyping, to require modest computing resources and to offer ease of use.
Keywords: Bioinformatics pipeline; GBS; Genotype accuracy; NGS; SNP
Year: 2017 PMID: 28049422 PMCID: PMC5210301 DOI: 10.1186/s12859-016-1431-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of species genotyped using a GBS approach and analyzed using Fast-GBS. For the three different species used in this work, relevant characteristics (ploidy, genome size, reproduction mode and chromosome number) influencing GBS analysis are shown
| Name | Species | Ploidy | Genome size (Mb) | Mode of reproduction | Number of chromosomes |
|---|---|---|---|---|---|
| Soybean | | Paleotetraploid | 1,100 | Selfing | 20 |
| Barley | | Diploid | 5,300 | Selfing | 7 |
| Potato | | Autotetraploid | 844 | Clonal | 12 |
Fig. 1 Schematic representation of the analytical steps in the Fast-GBS pipeline. The main steps in the analytical process are indicated in the central portion of the diagram, while the different software tools used are indicated to the left and inputs and outputs of each step to the right
Number of variants detected among 24 soybean, barley, and potato samples. The sequencing platform, number of reads, filtering options, and genotype accuracy for each dataset are also provided
| Name | Sequencing platform | Restriction enzyme | Number of reads | minNR | MinMAF | MaxMD (%) | Number of variants | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| Soybean | Illumina | | 42 M | 2 | 0.04 | 80 | 35 k | 98.7 |
| Barley | Ion Torrent | | 72 M | 2 | 0.04 | 80 | 32 k | 95.2 |
| Potato | Illumina | | 43 M | 11 | 0.04 | 20 | 38 k | 94.0 |

Filtering options: minNR, minimum number of reads to call a variant (depth); MinMAF, minimum minor allele frequency; MaxMD, maximum missing data allowed
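
The three filtering options defined above (minNR, MinMAF, MaxMD) can be illustrated with a minimal Python sketch. This is not the Fast-GBS implementation: the function name `passes_filters`, the `(alt_count, depth)` representation of a genotype call, and the choice to treat low-depth calls as missing data are all assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple

# One sample's call at one SNP: (alternate-allele count, or None if uncalled; read depth).
# This representation is hypothetical, chosen only for illustration.
Call = Tuple[Optional[int], int]


def passes_filters(
    calls: Sequence[Call],
    min_nr: int,
    min_maf: float,
    max_md_pct: float,
    ploidy: int = 2,
) -> bool:
    """Return True if a SNP survives the three filters (illustrative logic only)."""
    if not calls:
        return False
    # minNR: assume a genotype backed by fewer than min_nr reads counts as missing.
    kept = [alt for alt, depth in calls if alt is not None and depth >= min_nr]
    # MaxMD: reject the SNP if too large a share of samples is missing.
    missing_pct = 100.0 * (len(calls) - len(kept)) / len(calls)
    if missing_pct > max_md_pct or not kept:
        return False
    # MinMAF: frequency of the rarer allele among the retained genotypes.
    alt_freq = sum(kept) / (ploidy * len(kept))
    return min(alt_freq, 1.0 - alt_freq) >= min_maf
```

Under these assumptions, a SNP would be tested against the soybean settings in the table with `passes_filters(calls, min_nr=2, min_maf=0.04, max_md_pct=80)`, and against the potato settings with `min_nr=11`, `max_md_pct=20` and `ploidy=4`.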