Xiaowen Sun, Dongyuan Liu, Xiaofeng Zhang, Wenbin Li, Hui Liu, Weiguo Hong, Chuanbei Jiang, Ning Guan, Chouxian Ma, Huaping Zeng, Chunhua Xu, Jun Song, Long Huang, Chunmei Wang, Junjie Shi, Rui Wang, Xianhu Zheng, Cuiyun Lu, Xiaowu Wang, Hongkun Zheng.
Abstract
Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping. We call it specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) reduced representation strategy to reduce sequencing costs; iii) pre-designed reduced representation scheme to optimize marker efficiency; and iv) double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between common carp genetic map and zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.
Year: 2013 PMID: 23527008 PMCID: PMC3602454 DOI: 10.1371/journal.pone.0058700
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. SLAF-seq flowchart.
i) Pre-design scheme for SLAF selection using training data. The reduced representation design must be decided based on marker efficiency characteristics, which include random distribution throughout the genome, uniqueness in the genome, and consistent amplification efficiency among selected markers. A pilot experiment was performed to evaluate the amplification efficiency based on the pre-designed scheme. ii) SLAF-seq library construction. Genomic DNA was digested by groups of enzymes designed for individuals. Double barcodes were added in two rounds of PCR to discriminate each individual and to facilitate the pooling of samples for size selection, which maintained consistent fragment size among individuals. iii) Deep sequencing of the pooled reduced representation libraries (RRLs) with the Illumina paired-end sequencing protocol, followed by genotype definition and validation by software.
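The pre-design step in i) can be illustrated with a small in silico digestion. The sketch below is not the authors' software. It assumes the HaeIII and MseI enzyme pair listed in the pilot summary, uses their standard recognition sites (GG^CC and T^TAA), and counts fragments that fall in a chosen size window. The function names and the toy sequence are hypothetical. Terminal fragments are kept for simplicity, although they carry only one enzyme-cut end.

```python
import re

# Recognition site and cut offset within the site for each enzyme:
# HaeIII cuts GG^CC (blunt), MseI cuts T^TAA. Both sites are palindromic,
# so scanning the forward strand is sufficient.
ENZYMES = {"HaeIII": ("GGCC", 2), "MseI": ("TTAA", 1)}


def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted cut coordinates, including both sequence ends."""
    seq = seq.upper()
    cuts = {0, len(seq)}
    for site, offset in enzymes.values():
        # A lookahead pattern also finds overlapping occurrences of a site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def predicted_fragments(seq, min_len, max_len):
    """Return (start, end) pairs for fragments with min_len <= length <= max_len."""
    cuts = cut_positions(seq)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if min_len <= b - a <= max_len]


if __name__ == "__main__":
    toy = "AAAAGGCCAAAATTAAAAAA"  # hypothetical 20 bp sequence
    print(cut_positions(toy))
    print(predicted_fragments(toy, 7, 7))
```

Running such a digest over a reference or training genome with different enzyme pairs and size windows gives the expected number of fragments per design, which can then be compared against a pilot sequencing run.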
Pilot SLAF-seq data summary in rice and soybeans.
| Genome information | Rice | Soybeans |
| Genome size (Mb) | 382.79 | 950.07 |
| % of repeats in genome | 39.11% | 42.96% |
| GC content | 43.56% | 34.67% |
| | | |
| Enzymes | HaeIII+MseI | HaeIII+MseI |
| Expected SLAF size range (bp) | 384–454 | 369–439 |
| Expected no. of SLAFs | 21,074 | 76,970 |
| SLAF density per 100 kb | 5.51 | 8.10 |
| Simplification ratio | 0.33% | 0.49% |
| % of SLAFs in repeats | 17.78% | 25.32% |
| | | |
| Total reads | 546,271 | 1,341,599 |
| Observed no. of SLAFs | 25,433 | 83,055 |
| Matched no. of SLAFs | 19,005 | 61,056 |
| Matched % of SLAFs | 90.18% | 79.32% |
| Reads % in matched SLAFs | 82.48% | 65.86% |
| SLAF average depth | 18.95 | 12.11 |
| SLAFs per 100 kb | 6.64 | 8.74 |
| Simplification ratio | 0.40% | 0.52% |
| % of SLAFs in repeats | 19.49% | 25.34% |
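Several derived rows in this table can be recomputed from the raw counts. The sketch below uses only values from the table and assumes that density is the SLAF count divided by genome size in 100 kb units, and that the matched percentage is taken relative to the expected number of SLAFs. Both assumptions reproduce the reported figures. The dictionary layout and function names are illustrative.

```python
# Counts and genome sizes copied from the pilot summary table.
pilot = {
    "rice": {"genome_mb": 382.79, "expected": 21074, "observed": 25433, "matched": 19005},
    "soybean": {"genome_mb": 950.07, "expected": 76970, "observed": 83055, "matched": 61056},
}


def per_100kb(n_slafs, genome_mb):
    """SLAFs per 100 kb; 1 Mb contains ten 100 kb windows."""
    return n_slafs / (genome_mb * 10)


def summarize(row):
    """Recompute the derived rows for one species."""
    return {
        "expected_density": round(per_100kb(row["expected"], row["genome_mb"]), 2),
        "observed_density": round(per_100kb(row["observed"], row["genome_mb"]), 2),
        # Assumption: matched % is relative to the expected SLAF count.
        "matched_pct": round(100 * row["matched"] / row["expected"], 2),
    }


if __name__ == "__main__":
    for species, row in pilot.items():
        print(species, summarize(row))
```

Under these assumptions the recomputed densities and matched percentages agree with the table to two decimal places for both species.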
Figure 2. Pilot SLAF-seq data analysis using rice and soybeans.
(a) and (b) Insert size distribution of SLAFs. SLAF length was found to cluster tightly around a mean of 430 bp, with 85% of SLAFs in the centermost 50 bp. (c) and (d) Distribution of SLAFs on the chromosomes. SLAFs were evenly distributed on the chromosomes in rice and soybeans. The gap in the middle was caused by the absence of centromere sequences. (e) and (f) Customized SLAF density design. In the rice pilot case, the density was designed using 20 kb per SLAF. In soybeans, 40 kb per SLAF was used. Both rice and soybean pilot SLAF data were found to be consistent with theoretical predictions.
SLAF-seq data summary for common carp F1 population.
| No. of reads | 103,800,295 |
| Reads in high quality SLAFs | 48,192,694 |
| Reads in repeated SLAFs | 29,591,938 |
| Reads in low depth SLAFs | 26,015,663 |
| | |
| No. of SLAFs | 50,530 |
| Average SLAF depth | 954 |
| Average depth in parents | 52.37 |
| Average depth in individuals | 4.99 |
| | |
| No. of polymorphic SLAFs | 10,662 |
| Average depth in parents | 57.33 |
| Average depth in individuals | 4.28 |
| No. of SNPs | 13,291 |
| No. of InDels | 1,483 |
| SNP ratio per kb | 4.38 |
| InDel ratio per kb | 0.49 |
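The read counts in this table are internally consistent, which a short check makes explicit. The sketch below uses only values from the table. It assumes that the average SLAF depth is the number of reads in high-quality SLAFs divided by the number of SLAFs. The table does not state this, but the arithmetic matches. Variable names are illustrative.

```python
# Values copied from the common carp summary table.
reads = {
    "high_quality": 48_192_694,
    "repeated": 29_591_938,
    "low_depth": 26_015_663,
}
total_reads = 103_800_295
n_slafs = 50_530

# The three read categories partition the full read set.
assert sum(reads.values()) == total_reads

# Share of all reads in each category, in percent.
fractions = {name: round(100 * count / total_reads, 2) for name, count in reads.items()}

# Assumption: average SLAF depth = high-quality reads / number of SLAFs.
avg_depth = round(reads["high_quality"] / n_slafs)

if __name__ == "__main__":
    print(fractions)
    print(avg_depth)
```

Roughly 46% of reads fall in high-quality SLAFs, with the remainder split between repeated and low-depth SLAFs, and the recomputed average depth rounds to the reported 954.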
Figure 3. Genotyping quality in common carp F1 population.
The genotyping quality score was used to select qualified markers and individuals for subsequent analysis. This is a dynamic optimization process. We counted low-quality genotypes for each SLAF marker and for each individual, and deleted the worst marker or individual. We repeated this process, deleting an individual or a marker each time, until the average genotyping quality score of all SLAF markers reached the cutoff value, which was 13. (a) Detailed genotyping quality of SLAF-seq data. (b) Cumulative quality score distribution of 7559 markers in 166 individuals.
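The iterative filtering described in this caption can be sketched as a greedy loop. This is a minimal illustration, not the authors' implementation. The per-genotype threshold `low_quality`, the tie-break that removes a marker before an individual, and the nested-dictionary input format are all assumptions. Only the stopping rule (mean score reaching a cutoff of 13) comes from the caption.

```python
def filter_by_quality(scores, cutoff=13.0, low_quality=13.0):
    """Greedily drop the worst marker or individual until the mean score >= cutoff.

    scores: dict mapping marker -> dict mapping individual -> quality score.
    low_quality: assumed per-genotype threshold below which a call is "low quality".
    Returns (kept_markers, kept_individuals) as sorted lists.
    """
    markers = set(scores)
    individuals = set(next(iter(scores.values()))) if scores else set()

    def mean_score():
        values = [scores[m][i] for m in markers for i in individuals]
        return sum(values) / len(values)

    while markers and individuals and mean_score() < cutoff:
        # Fraction of low-quality genotypes per marker and per individual.
        marker_bad = {
            m: sum(scores[m][i] < low_quality for i in individuals) / len(individuals)
            for m in markers
        }
        indiv_bad = {
            i: sum(scores[m][i] < low_quality for m in markers) / len(markers)
            for i in individuals
        }
        # Sorting first makes the choice deterministic when fractions tie.
        worst_marker = max(sorted(marker_bad), key=marker_bad.get)
        worst_indiv = max(sorted(indiv_bad), key=indiv_bad.get)
        # Assumed tie-break: remove the marker when both are equally bad.
        if marker_bad[worst_marker] >= indiv_bad[worst_indiv]:
            markers.remove(worst_marker)
        else:
            individuals.remove(worst_indiv)
    return sorted(markers), sorted(individuals)


if __name__ == "__main__":
    toy = {  # hypothetical scores: 3 markers x 3 individuals
        "m1": {"a": 20, "b": 20, "c": 0},
        "m2": {"a": 18, "b": 19, "c": 0},
        "m3": {"a": 1, "b": 2, "c": 1},
    }
    print(filter_by_quality(toy))
```

Each pass removes exactly one marker or one individual, so the loop always terminates. Comparing fractions rather than raw counts keeps markers and individuals on the same scale when their numbers differ.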
Figure 4. Genetic map validation by recombination mapping.
Every two rows represent one genome in a CP population comprising 211 progeny and 2 parents. Columns correspond to chromosomes. Red and blue shading indicate maternal and paternal haplotypes, respectively. Pink shading indicates ambiguous haplotypes, and grey shading indicates missing data. Only 1.51% of the markers were found in small recombination blocks.
Figure 5. Comparative study between common carp linkage map and zebrafish chromosomes.
Two common carp linkage groups corresponded to one zebrafish chromosome.