Solen Rocher¹, Martine Jean², Yves Castonguay¹, François Belzile².
Abstract
Genotyping-by-sequencing (GBS) is a relatively low-cost, high-throughput genotyping technology based on next-generation sequencing that is applicable to orphan species with no reference genome. Combining genome complexity reduction with multiplexing through DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes from each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa subsp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline, which is designed for species with no reference genome. After whole dataset-level filters were applied, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% of them had a significant match on the syntenic Medicago truncatula genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed by 454 sequencing of a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when additional quality filtering was applied. Principal component analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS differentiates ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids.
Year: 2015 PMID: 26115486 PMCID: PMC4482585 DOI: 10.1371/journal.pone.0131918
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
PCR primers with respective Tm (°C) for the amplification of genome regions of M. sativa covering SNP loci identified with GBS.
| Location (Mt4.0) | GBS SNP loci | Predicted size (bp) | Observed size (bp) | F primer | R primer | Tm (°C) |
|---|---|---|---|---|---|---|
| chr2:10386991-10387501 | TP67636 | 511 | 500 | | | 60 |
| chr7:22184910-22185498 | TP7278 | 587 | 587 | | | 59 |
| chr2:44865228-44865752 | TP80194, TP79240 | 525 | 520 | | | 60 |
| chr4:40624202-40624738 | TP91313 | 537 | 520 | | | 60 |
| chr1:15783990-15783448 | TP32628 | 543 | 520 | | | 63 |
| chr3:38223672-38224218 | TP47889 | 547 | 546 | | | 65 |
| chr2:11380116-11380658 | TP61949, TP14949 | 543 | 550 | | | 59 |
| chr1:38893124-38893685 | TP31029 | 562 | 560 | | | 65 |
| chr7:30091020-30091543 | TP46847 | 550 | 550 | | | 59 |
| chr7:46607532-46608181 | TP17289 | 563 | 550 | | | 53 |
| chr5:8617244-8617764 | TP1933, TP26408 | 520 | 550 | | | 59 |
Predicted and observed size of amplified fragments and their location on the Medicago truncatula genome (v4.0) are indicated.
Summary of GBS analysis of alfalfa samples with UNEAK for the complete 96-sample dataset and for the subset of 72 samples with > 1 million reads.
| | 96 samples | 72 samples |
|---|---|---|
| | 396,675,286 | |
| | 371,308,770 | 363,877,063 |
| | 15,467,219 | 15,199,465 |
| | 1,899,657 | 1,867,892 |
| | 645,553 | 636,705 |
| | 97,508 | 95,775 |
| | 73,437 | 72,438 |
| | 7,438 | 11,694 |
Summary statistics of GBS analysis of two alfalfa populations with UNEAK using whole dataset-level filters and genotype-level filters.
| | Whole dataset-level filters (1) | | Genotype-level filters (2) | |
|---|---|---|---|---|
| | ATF0 | ATF5 | ATF0 | ATF5 |
| SNP loci | 11,694 | | 2,732 | |
| Samples | 35 | 37 | 35 | 37 |
| Total reads | 19,967,228 | 18,548,856 | 12,469,118 | 11,701,875 |
| Mean reads per sample | 570,492 | 501,320 | 358,053 | 317,715 |
| Mean reads per SNP locus | 1,707 | 1,586 | 4,587 | 4,303 |
| Genotype frequencies | | | | |
| Homozygous | 0.78 | 0.79 | 0.67 | 0.67 |
| Heterozygous | 0.22 | 0.21 | 0.33 | 0.33 |
| Missing | 0.26 | 0.27 | 0.27 | 0.28 |
The total number of reads in the ATF0 and ATF5 populations and the mean read counts per sample and per SNP locus are reported. Homozygous, heterozygous and missing genotype frequencies were calculated for each population (additional descriptive statistics are presented in S2 Table).
(1) 72 samples with > 1 million reads, minor allele frequency (MAF) > 0.05, mnC > 0.5;
(2) 72 samples with > 1 million reads, MAF > 0.05, mnC > 0.5, read count (RC) ≥ 11 for homozygous genotypes, RC ≥ 2 for each allele (A1 and A2) for heterozygous genotypes, and 0.1 ≤ RC(A1) / (RC(A1) + RC(A2)) ≤ 0.9.
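The two filter levels in these footnotes can be sketched in Python as below. This is a minimal illustration, not the UNEAK implementation: the function names are hypothetical, mnC is read as the minimum call rate (fraction of samples with a call), MAF is assumed to be computed from diploid calls, and a homozygous call is assumed to require zero reads of the other allele, with read pairs that meet neither rule set to missing.

```python
from typing import Optional, Sequence

# Thresholds from footnote (2).
MIN_HOMOZYGOUS_RC = 11            # RC >= 11 for a homozygous call
MIN_ALLELE_RC_HET = 2             # RC >= 2 for each allele in a heterozygous call
MIN_RATIO, MAX_RATIO = 0.1, 0.9   # bounds on RC(A1) / (RC(A1) + RC(A2))


def call_genotype(rc_a1: int, rc_a2: int) -> Optional[str]:
    """Genotype-level call for one sample at one SNP locus.

    Returns 'A1', 'A2', 'H' or None (missing). Assumption: a homozygous call
    needs zero reads of the other allele; anything else that fails the
    heterozygous rule is treated as missing.
    """
    total = rc_a1 + rc_a2
    if total == 0:
        return None
    ratio = rc_a1 / total
    if (rc_a1 >= MIN_ALLELE_RC_HET and rc_a2 >= MIN_ALLELE_RC_HET
            and MIN_RATIO <= ratio <= MAX_RATIO):
        return "H"
    if rc_a2 == 0 and rc_a1 >= MIN_HOMOZYGOUS_RC:
        return "A1"
    if rc_a1 == 0 and rc_a2 >= MIN_HOMOZYGOUS_RC:
        return "A2"
    return None


def passes_dataset_filters(calls: Sequence[Optional[str]],
                           min_maf: float = 0.05,
                           min_call_rate: float = 0.5) -> bool:
    """Whole dataset-level filter for one SNP locus (MAF > 0.05, mnC > 0.5).

    Assumptions: mnC is the fraction of samples with a non-missing call, and
    MAF is computed from diploid calls (A1 or A2 = two copies, H = one each).
    """
    called = [c for c in calls if c is not None]
    if not called:
        return False
    call_rate = len(called) / len(calls)
    a1_copies = sum(2 if c == "A1" else 1 if c == "H" else 0 for c in called)
    freq_a1 = a1_copies / (2 * len(called))
    maf = min(freq_a1, 1 - freq_a1)
    return maf > min_maf and call_rate > min_call_rate


if __name__ == "__main__":
    reads = [(15, 0), (8, 0), (6, 5), (0, 12)]
    calls = [call_genotype(a1, a2) for a1, a2 in reads]
    print(calls)                          # ['A1', None, 'H', 'A2']
    print(passes_dataset_filters(calls))  # True
```

Keeping the genotype call separate from the locus filter mirrors the two levels in the table: the locus filter decides which SNP loci survive, and the per-sample call decides which genotypes within a surviving locus are trusted.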
Fig 1. Observed frequency distribution of total read counts (A1 + A2) supporting each genotype call in the ATF0 and ATF5 populations.
The 5% of genotype calls supported by > 200 reads are not shown.
GBS SNP loci with significant homology (E value < 1 × 10⁻⁸) to the Medicago truncatula reference genome (v4.0).
| No. of hits | Chromosome | Size (bp) | No. of SNP loci |
|---|---|---|---|
| 1 | Mt1 | 52,787,282 | 949 |
| | Mt2 | 45,459,969 | 748 |
| | Mt3 | 55,424,720 | 853 |
| | Mt4 | 56,509,316 | 967 |
| | Mt5 | 43,527,414 | 682 |
| | Mt6 | 34,898,058 | 241 |
| | Mt7 | 48,921,887 | 704 |
| | Mt8 | 45,078,774 | 720 |
| | Scaffolds | | 88 |
| >1 | | | 1,110 |
| Total | Located | | 7,062 |
| | Unlocated | | 4,632 |
Counts of SNP loci with a single hit on each chromosome and the number of SNP loci with multiple hits are shown. The numbers of located and unlocated SNP loci among the 11,694 SNP loci are also indicated.
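The classification behind this table can be sketched as follows. Assumptions: each SNP locus tag has already been aligned to Mt4.0 with a similarity-search tool that reports an E value per hit (the tool is not named here), hits are supplied as hypothetical `(locus_id, subject_id, e_value)` tuples, and the cutoff in the caption is read as 1 × 10⁻⁸. The last lines recompute the located total from the counts in the table.

```python
from collections import Counter
from typing import Iterable, Tuple

E_VALUE_CUTOFF = 1e-8  # caption threshold, read as 1 x 10^-8 (assumption)


def classify_loci(loci: Iterable[str],
                  hits: Iterable[Tuple[str, str, float]]) -> Counter:
    """Tally SNP loci by their significant hits on the reference.

    Each locus lands in one class: the subject (chromosome or scaffold) of
    its single significant hit, '>1' for several significant hits, or
    'unlocated' when no hit passes the cutoff.
    """
    n_hits: Counter = Counter()
    subject_of = {}
    for locus, subject, e_value in hits:
        if e_value < E_VALUE_CUTOFF:
            n_hits[locus] += 1
            subject_of[locus] = subject
    tally: Counter = Counter()
    for locus in loci:
        n = n_hits[locus]  # Counter returns 0 for loci with no significant hit
        if n == 0:
            tally["unlocated"] += 1
        elif n == 1:
            tally[subject_of[locus]] += 1
        else:
            tally[">1"] += 1
    return tally


# Consistency check on the counts reported in the table above.
SINGLE_HITS = {"Mt1": 949, "Mt2": 748, "Mt3": 853, "Mt4": 967,
               "Mt5": 682, "Mt6": 241, "Mt7": 704, "Mt8": 720,
               "Scaffolds": 88}
MULTIPLE_HITS = 1110
UNLOCATED = 4632

located = sum(SINGLE_HITS.values()) + MULTIPLE_HITS
total = located + UNLOCATED
print(located, total, f"{100 * located / total:.1f}% located")  # 7062 11694 60.4% located
```

The recomputed share of located loci agrees with the "about 60%" reported in the abstract.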
Fig 2. Number of 454 sequences retained for each of 11 amplicons covering 14 SNP loci identified with UNEAK in eight plant samples.
Results of 454 sequencing of regions covering 14 SNP loci identified with UNEAK.
| Tag Pair | TP67636 | TP7278 | TP80194 TP79240 | TP91313 | TP32628 | TP47889 | TP61949 TP14949 | TP31029 | TP46847 | TP17289 | TP1933 TP26408 | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nb of 454 sequences | 11,172 | 9,647 | 7,960 | 7,939 | 6,055 | 4,521 | 3,759 | 3,591 | 3,266 | 2,412 | 362 | |||
| % of 454 sequences supporting GBS SNP loci | 48% | 40% | 53% | 68% | 8% | 78% | 59% | 75% | 70% | 83% | 37% | |||
| 15% | 21% | 5% | 24% | 19% | 20% | |||||||||
| SNP / 400 bp | 31 | 43 | 8 | 12 | 16 | 6 | 6 | 8 | 9 | 5 | 4 | |||
| Total 454 haplotypes | 8 | 9 | 9 | 10 | 7 | 11 | 6 | 10 | 7 | 11 | 5 | 7 | 8 | 8 |
| GBS-like haplotypes | 7 | 5 | 8 | 10 | 5 | 2 | 6 | 10 | 6 | 9 | 5 | 7 | 7 | 8 |
| Maximum GBS-like haplotypes per plant sample | 4 | 2 | 5 | 3 | 1 | 3 | 4 | 5 | 4 | 4 | 4 | |||
The number of 454 sequences retained for each amplicon and the proportion of those sequences with a perfect 64-bp alignment with the GBS read are presented. The number of SNPs in the first 400 bp of the 454 sequences is also indicated. The total number of haplotypes and the maximum number of haplotypes in individual plant samples were determined using the SNPs present in those 400 bp.
(1) Contains both SNP loci.
(2) Contains a single SNP locus.
(3) Haplotypes with 64-bp identity with GBS alleles.
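Two quantities in this table, the share of 454 sequences that are GBS-like and the number of SNPs in the first 400 bp, can be sketched as below. The sketch assumes the 454 reads are already in the same orientation as the GBS tags and, for the SNP count, already aligned to a common coordinate system; `gbs_like_fraction` and `count_snps` are hypothetical helpers, and an exact substring match stands in for the "perfect 64 bp alignment".

```python
from typing import Iterable, Sequence


def gbs_like_fraction(reads_454: Sequence[str], gbs_alleles: Iterable[str]) -> float:
    """Share of 454 reads that contain one of the GBS allele sequences exactly.

    In the study each GBS allele is a 64-bp tag; an exact substring match is
    used here as a stand-in for a perfect 64-bp alignment.
    """
    alleles = [a.upper() for a in gbs_alleles]
    if not reads_454:
        return 0.0
    matches = sum(1 for read in reads_454
                  if any(allele in read.upper() for allele in alleles))
    return matches / len(reads_454)


def count_snps(aligned_reads: Sequence[str], window: int = 400) -> int:
    """Count variable columns in the first `window` positions of aligned reads.

    Gaps ('-') and ambiguous bases ('N') are ignored when deciding whether a
    column is variable.
    """
    if not aligned_reads:
        return 0
    width = min(window, min(len(r) for r in aligned_reads))
    snps = 0
    for i in range(width):
        bases = {r[i].upper() for r in aligned_reads} - {"-", "N"}
        if len(bases) > 1:
            snps += 1
    return snps


if __name__ == "__main__":
    reads = ["AAACGTAAA", "TTTTTTTTT", "ACGT"]
    print(gbs_like_fraction(reads, ["acgt"]))
    print(count_snps(["ACGT", "ACTT", "AC-T"]))
```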
Fig 3. Haplotypes identified with 454 sequences covering TP91313 in eight plant samples.
Haplotypes defined by 454 sequences with a perfect or an imperfect match to the 64-bp GBS sequence are listed separately. SNP positions are based on the M. truncatula reference sequence. SNPs included in the UNEAK tag pair (TP) are highlighted in bold. Read counts (RC) of the GBS alleles (A1 and A2) and of the 454 sequences covering each haplotype in the eight genotyped plant samples are indicated. Cumulative numbers of A1-like and A2-like reads are also presented. SNPs with RC ≥ 5% in individual plant samples were used to define haplotypes. Haplotypes with a frequency < 5% in all individual plant samples are not shown, but the total read counts supporting those other haplotypes are reported. Haplotypes corresponding to each of the 14 GBS loci are presented in S3 Table.
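The 5% reporting rule for haplotypes can be sketched as follows, assuming haplotypes have already been reduced to strings of alleles at the retained SNP positions and counted per plant sample. `summarise_haplotypes` is a hypothetical name, and the sketch covers only the haplotype-level rule, not the per-sample SNP-level 5% threshold used to define the haplotypes in the first place.

```python
from collections import Counter
from typing import Dict, Mapping

MIN_FREQ = 0.05  # 5% threshold from the caption


def summarise_haplotypes(counts: Mapping[str, Counter],
                         min_freq: float = MIN_FREQ) -> Dict[str, Counter]:
    """Keep haplotypes reaching `min_freq` in at least one sample; pool the rest.

    `counts` maps each plant sample to a Counter of haplotype string -> read
    count. Haplotypes below `min_freq` in every sample are merged into 'other'.
    """
    kept = set()
    for hap_counts in counts.values():
        total = sum(hap_counts.values())
        for hap, n in hap_counts.items():
            if total and n / total >= min_freq:
                kept.add(hap)
    summary: Dict[str, Counter] = {}
    for sample, hap_counts in counts.items():
        row: Counter = Counter()
        for hap, n in hap_counts.items():
            row[hap if hap in kept else "other"] += n
        summary[sample] = row
    return summary


if __name__ == "__main__":
    counts = {"S1": Counter({"AG": 90, "TG": 8, "AC": 2}),
              "S2": Counter({"TG": 50, "TC": 49, "AC": 1})}
    for sample, row in summarise_haplotypes(counts).items():
        print(sample, dict(row))
```

In this toy input, "AC" stays below 5% in both samples and is pooled, while "TG" is kept everywhere because it passes in at least one sample.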
Fig 4. Example of a comparison of GBS and 454 sequencing of TP61949 in eight plant samples.
A) GBS and 454 read counts of each allele (A1|A2); B) predicted tetraploid allelic ratios (convergent ratios in green, discordant ratios in red); C) predicted bi-allelic genotypes (A1, A2 and H) before genotype-level filtration; and D) after genotype-level filtration of GBS data for minimum read counts (11 reads for homozygous genotypes, 2 reads of each allele for heterozygous genotypes, 0.1 as minimum minor allele frequency). Genotype calls are shown as concordant between the two sequencing methods (green), discordant (red for GBS homozygotes, orange for GBS heterozygotes) or missing (white), before and after genotype-level filtration for minimum read counts. A complete representation of the validation results for the 14 SNP loci in the eight plant samples is provided in S3 Fig.
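Panel B compares tetraploid allelic ratios, and the abstract notes that sequencing depth was not sufficient for accurate tetraploid dosage. A generic binomial model makes the depth problem concrete: with dosage d of allele A1 (0 to 4 copies), the expected A1 read fraction is d/4, blurred by a per-read error rate. This is a textbook likelihood sketch, not the prediction method used for the figure; the 1% error rate and the function names are assumptions.

```python
import math
from typing import List, Tuple


def dosage_log_likelihoods(rc_a1: int, rc_a2: int,
                           error: float = 0.01) -> List[float]:
    """Binomial log-likelihood of A1 dosage 0..4 in a tetraploid.

    Dosage d gives an expected A1 read fraction of d/4, mixed with a per-read
    error rate (assumed 1%). The binomial coefficient is omitted because it is
    identical for every dosage.
    """
    if not 0 < error < 0.5:
        raise ValueError("error must lie strictly between 0 and 0.5")
    lls = []
    for d in range(5):
        p = (d / 4) * (1 - error) + (1 - d / 4) * error
        lls.append(rc_a1 * math.log(p) + rc_a2 * math.log(1 - p))
    return lls


def call_dosage(rc_a1: int, rc_a2: int, error: float = 0.01) -> Tuple[int, float]:
    """Return the most likely dosage and its log-likelihood margin over the runner-up."""
    lls = dosage_log_likelihoods(rc_a1, rc_a2, error)
    ranked = sorted(range(5), key=lambda d: lls[d], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, lls[best] - lls[runner_up]


if __name__ == "__main__":
    for a1, a2 in [(3, 1), (30, 10)]:
        dose, margin = call_dosage(a1, a2)
        print(f"{a1}:{a2} reads -> dosage {dose}, margin {margin:.2f}")
```

With 3 A1 and 1 A2 reads, dosage 3 beats dosage 2 by only about half a log-likelihood unit; the same 3:1 ratio at 40 reads separates them by roughly five units, which is why low-depth loci support diploid calls more readily than tetraploid dosage.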
Observed consistency of genotype calls obtained with GBS and 454 sequencing of 14 SNP loci in eight plant samples.
| Status | GBS | 454 | Tetraploid genotype calls | Diploid genotype calls before correction | Diploid genotype calls after correction |
|---|---|---|---|---|---|
| Concordant | Homozygous | Homozygous | 45 | 44 | 36 |
| | Heterozygous | Heterozygous | 12 | 41 | 37 |
| | Total | | 57 (51%) | 85 (75%) | 73 (65%) |
| Discordant | Homozygous | Heterozygous | 22 | 17 | 4 |
| | Heterozygous | Homozygous | 7 | 8 | 8 |
| | Heterozygous | Heterozygous (different ratio) | 24 | - | - |
| | Total | | 53 (47%) | 25 (23%) | 12 (11%) |
| Missing | | | 2 (2%) | 2 (2%) | 27 (24%) |
The numbers of homozygous and heterozygous genotype calls that are concordant or discordant between the two sequencing methods, or missing, are indicated for tetraploid allelic dosage and for diploid genotype calls before and after genotype-level filtration. Percentages of concordant, discordant and missing observations are given in parentheses.
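The concordance categories in this table reduce to a simple tally once each sample-locus pair has a GBS call and a 454 call. Below is a minimal sketch that treats the 454 call as the reference and counts a pair as missing when either call is absent (an assumption; the table may define missing on the GBS side only). `compare_calls` and `percentages` are hypothetical helpers, and the last lines reproduce the percentages of the diploid after-correction column from its counts.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def compare_calls(gbs_calls: Sequence[Optional[str]],
                  ref_calls: Sequence[Optional[str]]) -> Counter:
    """Tally concordant, discordant and missing GBS calls against 454 calls."""
    if len(gbs_calls) != len(ref_calls):
        raise ValueError("call lists must have the same length")
    tally: Counter = Counter()
    for gbs, ref in zip(gbs_calls, ref_calls):
        if gbs is None or ref is None:
            tally["missing"] += 1
        elif gbs == ref:
            tally["concordant"] += 1
        else:
            tally["discordant"] += 1
    return tally


def percentages(tally: Counter) -> Dict[str, int]:
    """Whole-number percentage of each category."""
    total = sum(tally.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total) for status, n in tally.items()}


# Diploid calls after correction, counts from the table above (112 calls).
after = Counter(concordant=73, discordant=12, missing=27)
print(percentages(after))  # {'concordant': 65, 'discordant': 11, 'missing': 24}
```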
Fig 5. 3D representation of the PCA of 72 plant samples (35 ATF0 and 37 ATF5) genotyped with two SNP loci datasets.
A) 72 plant samples genotyped with 11,694 SNP loci; B) 72 plant samples genotyped with the 2,732 SNP loci retained after genotype-level filtration for minimum read counts; C) cumulative proportion of variance explained by the first three components in the two SNP loci datasets.
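Panels A to C can be approximated with a plain PCA on a samples × loci genotype matrix. The sketch below assumes a numeric coding of 0, 1 and 2 for A2, H and A1 calls, replaces missing calls with the locus mean, and uses NumPy's SVD; the coding, the imputation and the function name are assumptions rather than the authors' documented procedure.

```python
import numpy as np


def pca_cumulative_variance(genotypes, n_components: int = 3):
    """PCA of a samples x loci genotype matrix.

    Assumed coding: 0 = A2, 1 = H, 2 = A1, np.nan = missing. Missing calls are
    replaced by the locus mean, so every locus needs at least one call.
    Returns the first `n_components` PC scores and the cumulative proportion
    of variance they explain.
    """
    g = np.array(genotypes, dtype=float)            # copy, input left untouched
    col_means = np.nanmean(g, axis=0)
    missing = np.isnan(g)
    g[missing] = col_means[np.nonzero(missing)[1]]  # mean imputation per locus
    centred = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    var = s ** 2
    if var.sum() == 0:
        raise ValueError("no genotypic variation among samples")
    scores = u[:, :n_components] * s[:n_components]
    cumulative = np.cumsum(var / var.sum())[:n_components]
    return scores, cumulative


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 72 samples, 500 loci, about 20% missing calls.
    g = rng.integers(0, 3, size=(72, 500)).astype(float)
    g[rng.random(g.shape) < 0.2] = np.nan
    scores, cumulative = pca_cumulative_variance(g)
    print(scores.shape, np.round(cumulative, 3))
```

The three score columns correspond to the axes of a 3D plot like panels A and B, and the cumulative vector corresponds to panel C.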