Nicholas A Tinker, Wubishet A Bekele, Jiro Hattori.
Abstract
Genotyping-by-sequencing (GBS) and related methods are based on high-throughput short-read sequencing of genomic complexity reductions, followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available and operates with minimal user investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual "passport" for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species.
Keywords: genotyping-by-sequencing (GBS); haplotype; pipeline; polyploidy; single nucleotide polymorphism (SNP)
Year: 2016 PMID: 26818073 PMCID: PMC4825656 DOI: 10.1534/g3.115.024596
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Flow chart showing input files (green), output files (blue), and dependencies (connecting lines) associated with the ‘Haplotag’ GBS discovery software. Default file names are shown in yellow, and normally carry the extension “.txt” in the Windows file system. Three alternative pipelines (A, B, and C) are available, with the required input labeled for each. The cluster discovery pipeline (A) and the haplotype discovery pipeline (B) start by aligning a complete inventory of tags (A), or a reduced inventory of tags from prior work (B), to produce clusters. In (B), the complete inventory is then realigned against this template to increase the sampling of new haplotypes. A complete tag-by-taxa matrix of tag counts (HTBT) is then formed for all tags belonging to clusters of two or more tags. Other output files are then created based on haplotype model fitting. In the production pipeline, only the files labeled (C) are required, since genotyping is based on counting copies of haplotype-tags listed in the output files from previous discovery work.
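The matrix-building step described in the caption can be illustrated with a minimal Python sketch. It is not Haplotag's implementation: the function name `build_tag_by_taxa`, the dictionary-based inputs, and the toy tags are all hypothetical, and clusters are assumed to have been assigned already by an earlier alignment step.

```python
from collections import defaultdict

def build_tag_by_taxa(tag_counts, tag_to_cluster, min_cluster_size=2):
    """Form a tag-by-taxa count matrix for tags in clusters of >= min_cluster_size tags.

    tag_counts:     {taxon: {tag: count}}  (hypothetical input layout)
    tag_to_cluster: {tag: cluster_id}      (assumed output of a prior alignment step)
    """
    # Group tags by the cluster they were assigned to.
    clusters = defaultdict(list)
    for tag, cluster_id in tag_to_cluster.items():
        clusters[cluster_id].append(tag)

    # Keep only tags from clusters with at least min_cluster_size members.
    kept_tags = sorted(
        tag
        for members in clusters.values()
        if len(members) >= min_cluster_size
        for tag in members
    )

    # One row per kept tag, one column per taxon; missing counts become 0.
    taxa = sorted(tag_counts)
    matrix = {tag: [tag_counts[t].get(tag, 0) for t in taxa] for tag in kept_tags}
    return taxa, matrix

# Toy data: two tags share cluster 1, one tag sits alone in cluster 2.
tag_counts = {
    "line1": {"AAC": 3, "GGG": 5},
    "line2": {"AAT": 2, "GGG": 1},
}
tag_to_cluster = {"AAC": 1, "AAT": 1, "GGG": 2}
taxa, matrix = build_tag_by_taxa(tag_counts, tag_to_cluster)
```

In this toy case the singleton tag `GGG` is dropped because its cluster holds only one tag, so it cannot represent an alternative haplotype.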
Figure 2. Passport file produced by Haplotag from simulated demonstration files. Here, six tags (potential haplotypes) are identified at the top. After model fitting by population-based filtering, two locus-models are selected. When Haplotag is run in ‘verbose’ mode, the details of model selection are written in a separate file (see File S2). Locus-1 contains three haplotypes and Locus-2 contains two. SNP positions are identified by color. The table at the bottom of the passport shows the tag counts at the presumed haplotypes within each locus. Counts greater than or equal to one are shaded, indicating that they are scored as “present”.
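The presence scoring described for the passport table (a count of one or more is scored as “present”) can be sketched as follows. This is an illustration under stated assumptions, not Haplotag's code: the function names, the data layout, and the toy counts are hypothetical.

```python
def score_presence(haplotype_counts, min_count=1):
    """Score each haplotype as present (True) in a taxon when its tag count >= min_count."""
    return {
        hap: [count >= min_count for count in counts]
        for hap, counts in haplotype_counts.items()
    }

def haplotypes_per_taxon(presence, taxa):
    """List, for each taxon, the haplotypes scored as present at one locus."""
    return {
        taxon: sorted(hap for hap, flags in presence.items() if flags[i])
        for i, taxon in enumerate(taxa)
    }

# Toy locus with three haplotypes counted across three taxa (hypothetical numbers).
taxa = ["line1", "line2", "line3"]
locus_counts = {"hap1": [4, 0, 0], "hap2": [0, 7, 0], "hap3": [0, 1, 0]}

presence = score_presence(locus_counts)
calls = haplotypes_per_taxon(presence, taxa)
```

A taxon with no haplotype present at a locus (here `line3`) would simply have no call at that locus in this sketch.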
Comparison of GBS data analysis using UNEAK vs. Haplotag software
| Software | Mode | Time (hr:min) | Haplotype loci passing population filter (50% genotype completeness) | Haplotype loci passing population filter (80% genotype completeness) | SNP loci passing population filter (50% genotype completeness) | SNP loci passing population filter (80% genotype completeness) | SNP loci (80%) duplicated by alternate pipeline: number | SNP loci (80%) duplicated by alternate pipeline: percent |
|---|---|---|---|---|---|---|---|---|
| UNEAK | NA | 6:05 | NA | NA | 12,780 | 4,260 | 4,204 | 99% |
| Haplotag | Cluster discovery | 6:54 | 29,421 | 11,950 | 43,378 | 17,117 | 4,108 | 24% |
| Haplotag | Production | 0:11 | 24,412 | 7,343 | 31,685 | 8,872 | NA | NA |
Number and percent of SNP-based loci called by UNEAK that were duplicated by Haplotag at the same 80% filtering level.
Times for Haplotag runs do not include the 6 hr required for generation of tag-count files using UNEAK.
Number and percent of SNP-based loci called by Haplotag that were duplicated by UNEAK at the same 80% filtering level.
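The duplication percentages in the table can be checked against the reported counts, assuming each percentage is the number of duplicated loci divided by that pipeline's own count of SNP loci at the 80% level. The helper name `percent_duplicated` is hypothetical.

```python
def percent_duplicated(n_duplicated, n_total):
    """Percentage of a pipeline's loci that were also called by the alternate pipeline."""
    return 100.0 * n_duplicated / n_total

# UNEAK: 4,204 of its 4,260 SNP loci (80%) were duplicated by Haplotag.
uneak_pct = percent_duplicated(4204, 4260)

# Haplotag (cluster discovery): 4,108 of its 17,117 SNP loci (80%) were duplicated by UNEAK.
haplotag_pct = percent_duplicated(4108, 17117)

print(round(uneak_pct), round(haplotag_pct))  # prints: 99 24
```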