Yun Xia, Wei Luo, Siqi Yuan, Yuchi Zheng, Xiaomao Zeng.
Abstract
BACKGROUND: Although microsatellite loci are now frequently isolated using next-generation sequencing (NGS) techniques, marker development remains difficult because the subsequent polymorphism screening requires a substantial amount of time. Selecting appropriate polymorphic microsatellites is a critical issue for ecological and evolutionary studies. However, the extent to which assembly strategy, read length, sequencing depth, and library layout affect microsatellite marker development remains unclear. Here, we use genome skimming of six frog species and transcriptome sequencing of two frog species to develop microsatellite markers, and investigate the effect of different isolation strategies on the yield of microsatellites.
Keywords: Amphibians; Genome assembly; MiSeq; Next-generation sequencing; Population genetics; Transcriptome
Year: 2018 PMID: 30526480 PMCID: PMC6286531 DOI: 10.1186/s12864-018-5329-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Overview of the sequencing data from Illumina next-generation sequencing for each species of frog
| NGS source | Species | Total sequence reads (million) | Total no. of bases (gigabases) |
|---|---|---|---|
| Genomic (a) | | 7.01 | 4.21 |
| | | 10.19 | 6.11 |
| | | 6.93 | 4.16 |
| | | 6.63 | 3.98 |
| | | 8.28 | 4.97 |
| | | 7.28 | 4.37 |
| Transcriptomic (b) | | 52.28 | 5.04 |
| | | 73.25 | 7.32 |
a Genomic samples were sequenced using the Illumina MiSeq (PE300) system
b Transcriptomic samples were sequenced using the Illumina HiSeq 2000 (PE125) system
Fig. 1 Statistics and assembly results for genome skimming of six frog species using SOAPdenovo2 with different k-mer sizes and Trinity. a Total number of contigs; b Average length of contigs; c Total number of microsatellites isolated using each assembler; d Maximum repeat region length of microsatellites for each assembler; e Mean repeat region length of microsatellites for each assembler
Assembly and microsatellite loci detection statistics for transcriptome sequencing using Trinity
| Statistic | | |
|---|---|---|
| Total number of contigs | 121,416 | 278,803 |
| Average length of contigs | 958 | 919 |
| N50 | 1953 | 2080 |
| Total number of SSRs | 958 | 1554 |
| Max RRL (a) | 25 | 32 |
| Mean RRL | 16 | 15.7 |
| Dinucleotide (%) | 716 (74.74) | 1174 (75.55) |
| Trinucleotide (%) | 226 (23.59) | 364 (23.42) |
| Tetranucleotide (%) | 15 (1.57) | 14 (0.91) |
| Pentanucleotide (%) | 1 (0.10) | 1 (0.06) |
| Hexanucleotide (%) | 0 | 1 (0.06) |
a RRL: repeat region length, i.e., the length of the repeat units
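The statistics in this table (SSR counts per motif class, maximum and mean RRL) can be illustrated with a minimal sketch. It is not the detection tool used in the study, which this record does not name: the regular-expression search for perfect tandem repeats, the `min_repeats=5` threshold, the function names, and the toy contig are all assumptions made for illustration.

```python
import re
from collections import Counter

# Motif length -> class name, matching the row labels of the table.
MOTIF_CLASS = {2: "dinucleotide", 3: "trinucleotide", 4: "tetranucleotide",
               5: "pentanucleotide", 6: "hexanucleotide"}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (rejects 'AA', 'ACAC')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=5):
    """Find perfect tandem repeats with motif length 2-6 (threshold is an assumption)."""
    seq = seq.upper()
    # Lazy motif group tries the shortest motif first, then requires further copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if not is_primitive(motif):
            continue  # skip homopolymer runs reported as 'AA' repeats
        rrl = m.end() - m.start()  # repeat region length in bp
        hits.append({"start": m.start(), "motif": motif,
                     "repeats": rrl // len(motif), "rrl": rrl})
    return hits


def summarize(hits):
    """Total loci, max/mean RRL, and counts per motif class."""
    rrls = [h["rrl"] for h in hits]
    return {
        "total": len(hits),
        "max_rrl": max(rrls, default=0),
        "mean_rrl": sum(rrls) / len(rrls) if rrls else 0.0,
        "by_class": dict(Counter(MOTIF_CLASS[len(h["motif"])] for h in hits)),
    }


# Toy contig (hypothetical): an (AC)8 locus, an (ATG)5 locus, and a poly-A run.
contig = "TTG" + "AC" * 8 + "GGT" + "ATG" * 5 + "C" + "A" * 12
hits = find_ssrs(contig)
summary = summarize(hits)
print(summary)
```

On real assemblies the same summary would be computed over every contig; imperfect or compound repeats, which dedicated SSR finders usually handle, are outside the scope of this sketch.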
Effects of data quantity on microsatellite discovery based on genomic data (k-mer size = 65)
| Statistic | Full | 1/2 | 1/4 | 1/8 | 1/16 | Full | 1/2 | 1/4 | 1/8 | 1/16 |
|---|---|---|---|---|---|---|---|---|---|---|
| Data in megabases (Mb) | 5200 | 2600 | 1300 | 650 | 327 | 4700 | 2400 | 1200 | 580 | 294 |
| Microsatellite loci | 11,116 | 6511 | 3665 | 1929 | 1043 | 17,942 | 10,907 | 6317 | 3448 | 1883 |
| Max RRL (a) (bp) | 64 | 64 | 60 | 60 | 60 | 66 | 66 | 65 | 65 | 63 |
| Loci with RRL > 40 bp | 241 | 138 | 74 | 36 | 23 | 940 | 561 | 345 | 196 | 118 |
| Mean RRL | 18.8 | 18.8 | 18.7 | 18.7 | 18.6 | 21.4 | 21.3 | 21.3 | 21.3 | 21.2 |
| Dinucleotides | 8794 | 5171 | 2871 | 1517 | 836 | 11,527 | 7123 | 4215 | 2272 | 1289 |
| Trinucleotides | 1640 | 943 | 550 | 300 | 145 | 4891 | 2832 | 1575 | 889 | 447 |
| Tetranucleotides | 609 | 358 | 219 | 98 | 55 | 1361 | 854 | 460 | 266 | 140 |
| Pentanucleotides | 55 | 29 | 18 | 11 | 5 | 127 | 76 | 54 | 18 | 6 |
| Hexanucleotides | 18 | 10 | 7 | 3 | 2 | 36 | 22 | 13 | 3 | 1 |
a RRL: repeat region length, i.e., the length of the repeat units
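One way to read this table is to ask how many loci survive each halving of the data and how the yield per megabase changes. The sketch below does that arithmetic on the values from the "Data in megabases" and "Microsatellite loci" rows. The labels `group_1` and `group_2` are placeholders for the two five-column blocks, because the species headers were not preserved in this record; the helper names are likewise hypothetical.

```python
# Values copied from the table rows; keys are placeholder labels (assumption).
DATA_MB = {
    "group_1": [5200, 2600, 1300, 650, 327],
    "group_2": [4700, 2400, 1200, 580, 294],
}
LOCI = {
    "group_1": [11116, 6511, 3665, 1929, 1043],
    "group_2": [17942, 10907, 6317, 3448, 1883],
}


def retention_per_halving(counts):
    """Fraction of loci retained at each step from one subset to the next smaller one."""
    return [smaller / larger for larger, smaller in zip(counts, counts[1:])]


def loci_per_mb(counts, megabases):
    """Loci recovered per megabase of input data for each subset."""
    return [n / mb for n, mb in zip(counts, megabases)]


for group in LOCI:
    retained = retention_per_halving(LOCI[group])
    density = loci_per_mb(LOCI[group], DATA_MB[group])
    print(group,
          "retained:", [round(r, 3) for r in retained],
          "loci/Mb:", [round(d, 2) for d in density])
```

In both blocks each halving of the input keeps more than half of the loci, so the number of loci per megabase rises as the data set shrinks, while the mean RRL row stays nearly constant.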
Fig. 2 Relation of the k-mer sizes to the data quantity (a-d) and to the read lengths (e, f) from genomic datasets. a Total number of contigs from data quantity simulation of Amolops mantzorum; b Total number of identified microsatellites from data quantity simulation of A. mantzorum; c Total number of contigs from data quantity simulation of Quasipaa boulengeri; d Total number of identified microsatellites from data quantity simulation of Q. boulengeri; e Total number of contigs from read length simulation of A. chunganensis; f Total number of identified microsatellites from read length simulation of A. chunganensis
Fig. 3 Effects of read length on microsatellite development from genomic datasets of Amolops chunganensis (k-mer size = 65). a The shorter the read length, the fewer the microsatellite loci identified; b The number of dinucleotides, trinucleotides and tetranucleotides identified using different read lengths
Fig. 4 Comparison of the polymorphism between transcriptomic and genomic microsatellite loci within Amolops mantzorum and Quasipaa boulengeri. a The number of alleles (Na); b Observed heterozygosity (Ho)
Fig. 5 Relation of the number of repeat units to the polymorphism for genomic and transcriptomic microsatellite loci in two frog species. a Amolops mantzorum; b Quasipaa boulengeri