Koen Herten, Matthew S Hestand, Joris R Vermeesch, Jeroen K J Van Houdt.
Abstract
BACKGROUND: Massively parallel sequencing is a powerful tool for variant discovery and genotyping. To reduce costs, restriction enzyme-based reduced representation libraries can be sequenced instead of whole genomes. This technology is generally referred to as Genotyping By Sequencing (GBS). Specific bioinformatic tools are needed to deal with GBS experimental design and initial processing.
Year: 2015 PMID: 25887893 PMCID: PMC4359581 DOI: 10.1186/s12859-015-0514-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Library types with in-line barcode. The GBSX demultiplexer can handle barcode-only sequences, sequences starting with a barcode and restriction enzyme site (RS), and those that also end with an RS and/or common adapter.
Figure 2. The importance of a good barcode design. This image shows two barcodes, each completed with the restriction enzyme recognition site of ApeKI. If these barcodes are used to demultiplex GBS or RAD data generated with the ApeKI enzyme, the software will recognize the correct barcode (sample) because of the Hamming distance between the barcodes. When the barcodes are used with another enzyme, or with no enzyme, the distance between them will be different. This could result in reads being assigned to the wrong sample.
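The effect described in the Figure 2 caption can be illustrated with a small sketch. The barcodes `ACGT` and `ACGTCA` are hypothetical and are not taken from the paper. The sketch also assumes that ApeKI reads carry the remnant `CAGC` or `CTGC` directly after the barcode, and it compares sequences only over their shared prefix length, which is a simplification and not necessarily how GBSX computes distances.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count mismatches over the shared prefix of two sequences."""
    # zip() stops at the shorter sequence, so only overlapping positions count.
    return sum(x != y for x, y in zip(a, b))


def min_distance(bc1: str, bc2: str, site_remnants=("",)) -> int:
    """Smallest Hamming distance between two barcodes after appending
    every combination of the possible restriction site remnants."""
    return min(
        hamming(bc1 + s1, bc2 + s2)
        for s1, s2 in product(site_remnants, repeat=2)
    )


# Hypothetical barcodes of different length: one is a prefix of the other.
short_bc, long_bc = "ACGT", "ACGTCA"

# Assumed ApeKI remnants (CWGC with W = A or T).
apeki = ("CAGC", "CTGC")

print(min_distance(short_bc, long_bc))         # no enzyme: 0
print(min_distance(short_bc, long_bc, apeki))  # with ApeKI remnant: 2
```

Without an enzyme the shorter barcode is indistinguishable from the start of the longer one, while the appended site remnant separates the two by two mismatches in this example.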
Demultiplexing statistics as an accumulation of ten synthetic data sets, totaling 12,579,549 synthetic reads
| Tool | RE Site | Incorrectly demultiplexed reads | Incorrect per million total reads | Demultiplexed reads | % of total reads |
|---|---|---|---|---|---|
| GBSX GBS | Y | 7 | 0.6 | 12,394,916 | 98.53% |
| GBSX RAD | Y | 7 | 0.6 | 12,394,916 | 98.53% |
| GBSX NA | N | 54 | 4.3 | 12,410,223 | 98.65% |
| Stacks process_radtags | Y | 0 | 0.0 | 7,756,281 | 61.66% |
| Stacks process_shortreads | N | 0 | 0.0 | 10,529,102 | 83.70% |
| Sabre | N | 54 | 4.3 | 12,410,223 | 98.65% |
| | | | | | |
| GBSX GBS | Y | 7 | 0.6 | 12,394,916 | 98.53% |
| GBSX RAD | Y | 7 | 0.6 | 12,394,916 | 98.53% |
| GBSX NA | N | 54 | 4.3 | 12,410,223 | 98.65% |
| Stacks process_radtags | Y | 0 | 0.0 | 7,820,903 | 62.17% |
| Stacks process_shortreads | N | 0 | 0.0 | 10,530,753 | 83.71% |
| Sabre | N | 54 | 4.3 | 12,410,223 | 98.65% |
NA indicates an enzyme was not provided to GBSX for demultiplexing. The RE Site column indicates if the reads did or did not contain the ApeKI restriction enzyme sequence.
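The rate values in the demultiplexing table can be reproduced from the read counts and the total of 12,579,549 synthetic reads given in the caption. The sketch below assumes that the small counts (7, 54) are expressed per million total reads and that the large counts are expressed as a percentage of total reads. This reading is inferred from the numbers themselves; the function names are illustrative only.

```python
TOTAL_READS = 12_579_549  # total synthetic reads stated in the table caption


def per_million(count: int, total: int = TOTAL_READS) -> float:
    """Express a read count as a rate per million total reads."""
    return count / total * 1_000_000


def percent(count: int, total: int = TOTAL_READS) -> float:
    """Express a read count as a percentage of total reads."""
    return count / total * 100


# GBSX GBS row: 7 reads in the small-count column, 12,394,916 demultiplexed.
print(round(per_million(7), 1))        # 0.6
print(round(percent(12_394_916), 2))   # 98.53

# GBSX NA / Sabre rows: 54 reads in the small-count column.
print(round(per_million(54), 1))       # 4.3
```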
Figure 3. Comparison of the tools and their options by the percentage of demultiplexed reads and of correctly trimmed reads out of total reads, for paired-end and single-end data. The GBSX option NA indicates an enzyme was not provided for demultiplexing. Sabre does not perform trimming and therefore has no trimming values in the plot.
Trimming statistics on demultiplexed reads from Table
| Tool | RE Site | Correctly trimmed reads | % of demultiplexed reads | Incorrectly trimmed per thousand demultiplexed reads |
|---|---|---|---|---|
| GBSX GBS | Y | 12,381,497 | 99.89% | 1 |
| GBSX RAD | Y | 12,313,669 | 99.34% | 7 |
| GBSX NA | N | 12,364,365 | 99.63% | 4 |
| Stacks process_radtags | Y | 7,407,377 | 95.50% | 45 |
| Stacks process_shortreads | N | 7,190,394 | 68.29% | 317 |
| | | | | |
| GBSX GBS | Y | 11,878,172 | 95.83% | 42 |
| GBSX RAD | Y | 10,821,969 | 87.31% | 127 |
| GBSX NA | N | 10,835,239 | 87.31% | 127 |
| Stacks process_radtags | Y | 7,421,337 | 94.89% | 51 |
| Stacks process_shortreads | N | 7,191,440 | 68.29% | 317 |
The RE Site column indicates whether the reads contained the ApeKI restriction enzyme sequence. Sabre does not perform read trimming and is therefore not included in this comparison.
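The trimming values appear to be computed relative to the number of demultiplexed reads reported for the same tool in the demultiplexing table, not relative to the total read count. The sketch below checks that reading against two rows. Treating the last column as the number of incorrectly trimmed reads per thousand demultiplexed reads is an inference from the numbers, and `trimming_rates` is a hypothetical helper.

```python
def trimming_rates(trimmed: int, demultiplexed: int) -> tuple[float, int]:
    """Return (percentage correctly trimmed, incorrectly trimmed per thousand),
    both relative to the number of demultiplexed reads."""
    fraction = trimmed / demultiplexed
    return round(fraction * 100, 2), round((1 - fraction) * 1000)


# GBSX GBS, first block: 12,381,497 trimmed of 12,394,916 demultiplexed.
print(trimming_rates(12_381_497, 12_394_916))  # (99.89, 1)

# Stacks process_shortreads, first block: 7,190,394 of 10,529,102.
print(trimming_rates(7_190_394, 10_529_102))   # (68.29, 317)
```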