Reeta Sharma, Benoit Goossens, Célia Kun-Rodrigues, Tatiana Teixeira, Nurzhafarina Othman, Jason Q Boone, Nathaniel K Jue, Craig Obergfell, Rachel J O'Neill, Lounès Chikhi.
Abstract
High-throughput sequencing technologies are being applied to an increasing number of model species with a high-quality reference genome. The application and analysis of whole-genome sequence data in non-model species with no prior genomic information are currently under way. Recent sequencing technologies provide new opportunities for gathering genomic data in natural populations, laying the empirical foundation for future research in the field of conservation and population genomics. Here we present the case study of the Bornean elephant, which is the most endangered subspecies of Asian elephant and exhibits very low genetic diversity. We used two different sequencing platforms, the Roche 454 FLX (shotgun) and the Illumina GAIIx (restriction site associated DNA, RAD), to evaluate the feasibility of the two methodologies for the discovery of de novo markers (single nucleotide polymorphisms, SNPs, and microsatellites) using low-coverage data. Approximately 6,683 (shotgun) and 14,724 (RAD) SNPs were detected within our elephant sequence dataset. Genotyping of a representative sample of 194 SNPs resulted in a SNP validation rate of ~83 to 94%, and 17% of the loci were polymorphic with low diversity (H_o = 0.057). Different numbers of microsatellites were identified through the shotgun (27,226) and RAD (868) techniques. Out of all di-, tri-, and tetra-nucleotide microsatellite loci, 1,706 loci had sufficient flanking regions (shotgun), while only 7 were found with RAD. All microsatellites were monomorphic in the Bornean elephant but polymorphic in another elephant subspecies. Despite the different sample sizes and the well-known differences between the two platforms in sequence length and throughput, the two approaches showed a high validation rate.
The approaches used here for marker development in a threatened species demonstrate the utility of high-throughput sequencing technologies as a starting point for the development of genomic tools in a non-model species, and in particular for a species with low genetic diversity.
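The abstract counts di-, tri-, and tetra-nucleotide microsatellite loci and keeps those with sufficient flanking sequence. The record does not say which tool was used, so the following is only a minimal sketch of how such a scan could look. The function name `find_ssrs`, its parameters, and the regular-expression approach are illustrative assumptions, not the authors' pipeline.

```python
import re


def find_ssrs(seq, min_repeats=3, max_unit=4, min_flank=0):
    """Return (unit, repeat_count, start, end) for perfect tandem repeats.

    Hypothetical helper for illustration only: it scans for motifs of
    1..max_unit bases repeated at least min_repeats times and keeps a hit
    only if min_flank bases are available on both sides.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-base motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are themselves repeats of a shorter motif,
            # e.g. "AA" (mono-nucleotide) or "CACA" (di-nucleotide).
            if any(unit == unit[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            start, end = m.start(), m.end()
            if start >= min_flank and len(seq) - end >= min_flank:
                hits.append((unit, (end - start) // k, start, end))
    return hits


if __name__ == "__main__":
    # Toy sequence: 6 bp flank, (CA)5 repeat, 6 bp flank.
    example = "TGACTG" + "CA" * 5 + "GTCAGT"
    for hit in find_ssrs(example, min_repeats=3, min_flank=6):
        print(hit)
```

Requiring a minimum flank on both sides mirrors the idea that a locus is only useful for primer design if enough sequence surrounds the repeat; the actual flank length needed depends on the primer design criteria, which are not given here.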
Year: 2012 PMID: 23185354 PMCID: PMC3504023 DOI: 10.1371/journal.pone.0049533
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary statistics of data obtained using two different sequencing approaches: Illumina (RAD-seq) and 454 (shotgun).
Per-sample values (Illumina RAD-sequencing samples 1 to 8; 454 shotgun sequencing samples A and B):
| Feature | RAD 1 | RAD 2 | RAD 3 | RAD 4 | RAD 5 | RAD 6 | RAD 7 | RAD 8 | 454 shotgun (samples A, B) |
| Reads (millions) | 1.84 | 2.32 | 2.54 | 2.59 | 2.17 | 1.05 | 2.53 | 1.26 | A: 0.51; B: 0.64 |
| Aligned reads between samples (%) | – | – | – | – | – | – | – | – | 219,157 (19.17%) |
| Mb of sequence | 221.13 | 278 | 305.01 | 311.12 | 259.92 | 125.74 | 303.47 | 150.95 | – |
| Total number of RAD tags produced (approximate) | 33,698 | 32,330 | 32,937 | 35,014 | 34,815 | 23,532 | 37,362 | 28,573 | – |
| Total Mb of sequence after contig construction | – | 2.88 | – | 2.68 | – | – | 2.07 | 1.50 | – |
| Contigs assembled | – | 10,008 | 10,110 | 10,352 | – | – | 7,918 | 5,461 | 16,857 (>100 bp) |
| Average contig length (bp) | – | 288 | – | 259 | – | – | 262 | 275 | 328.13 |
| N50 | – | 320 | – | 279 | – | – | 281 | 302 | 815 |
| Contig length range (bp, min–max) | – | 150–560 | – | 150–457 | – | – | 150–544 | 150–527 | 100–6,407 |
| Reads mapped to | – | – | – | – | – | – | – | – | A: 497,169 (97.6%); B: 617,930 (97.2%) |
Per-platform values:
| Feature | Illumina RAD-sequencing | 454 shotgun sequencing |
| Average depth of coverage per base of contigs | 12.9x | 8x |
| Putative SNPs | 14,724 | 6,683 |
| Total number of homozygotes (monomorphic) | 9,676 | – |
| Total number of heterozygotes | 5,048 (34%) | – |
| Transitions and transversions (Ts/Tv ratio) | 1.61 | 1.52 |
| Candidate loci containing SNPs | 20% (2,100 out of 10,352) | 10% (1,753 out of 16,857) |
| Loci suitable for Sequenom assay with >q20 | 518 (24.6%), 19 | 1,695 (96.6%), 52 |
| SNP density | 0.00081 | 0.00056 |
| Validation of SNPs (genotyping success rate across five plexes) | 86–95% (plex1–plex4) | 91% (plex5) |
| Polymorphic loci( | 28 ( | 5 ( |
| Number of contigs containing microsatellite loci | 837 | 18,195 |
| Number of SSRs identified (mono-, di-, tri-, and tetra-nucleotides) | 868 (844 mono- and 24 di-nucleotides) | 9,038 (18,188, 7,241, 1,471, 326) |
| Potential amplifiable loci (with ≥3 repeats) | 7 (29%) | 1,706 (18.8%) |
Note that the eight elephant samples (1 to 8) used in Illumina RAD-sequencing are different from the two samples (A and B) used in 454 shotgun sequencing.
N50: weighted median statistic such that 50% of the entire assembly is contained in contigs equal to or greater than this value.
>q20: less than a 1% chance (error probability 0.01) that a base was wrongly called.
identified in elephant sample 4.
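The >q20 threshold in the footnotes is a Phred-scaled base quality. Under the standard Phred definition, Q = -10 · log10(p), where p is the probability that a base call is wrong. The short sketch below converts between the two; the function names are illustrative and not taken from the paper.

```python
import math


def phred_to_error_prob(q):
    """Error probability p for a Phred quality score Q, using p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score Q for an error probability p, using Q = -10*log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.4f} ({p * 100:.2f}%)")
```

On this scale, Q20 corresponds to an error probability of 0.01, that is, one wrong call in 100 bases.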
Figure 1. Histogram of contig size distribution.
Length distribution of contigs assembled from 454 shotgun sequencing (A) and RAD-sequencing (B) using the Illumina platform. The average contig length in the 454 dataset (n = 2) was 328.13 bp, with lengths ranging from 100 to 6,407 bp. The average contig length in the RAD-sequencing dataset (n = 8) was between 259 bp (sample 4) and 288 bp (sample 2). The RAD contig lengths ranged from 150 to 560 bp.
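The contig statistics reported here (average length, length range, N50) can all be derived from a list of contig lengths. The following is a minimal sketch using the common definition of N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function names and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the summed assembly length; the
    length of the contig that crosses that point is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def contig_summary(lengths):
    """Return the mean, minimum, maximum, and N50 of contig lengths."""
    return {
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Toy contig lengths in bp, for illustration only.
    toy_lengths = [100, 200, 300, 400, 500]
    print(contig_summary(toy_lengths))
```

Because N50 is weighted by length, it is usually larger than the mean when a few long contigs dominate the assembly, which is consistent with the 454 assembly reporting an N50 of 815 bp against a mean of 328.13 bp.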