Jeremy C Andersen, Nicholas J Mills.
Abstract
BACKGROUND: Illumina sequencing, with its high number of reads and low cost per base pair, is an attractive technology for developing molecular resources for non-model organisms. While many software packages have been developed to identify short tandem repeats (STRs) from next-generation sequencing data, these methods do not tell the investigator whether candidate loci are polymorphic in the target populations.
Year: 2014 PMID: 25281214 PMCID: PMC4195870 DOI: 10.1186/1471-2164-15-858
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
STR results from
| | A) Phobos | | | | B) MSATCOMMANDER | | | | C) iMSAT | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| repeats | di | tri | tetra | penta | di | tri | tetra | penta | di | tri | tetra | penta |
| 5 | 4132 | 1612 | 262 | 38 | 5338 | 3496 | 657 | 44 | 772 | 500 | 93 | 18 |
| 6 | 3737 | 3317 | 683 | 38 | 2090 | 1898 | 218 | 4 | 1718 | 837 | 145 | 8 |
| 7 | 1751 | 1788 | 229 | 6 | 1243 | 980 | 59 | 2 | 1181 | 717 | 80 | 1 |
| 8 | 1104 | 958 | 64 | 2 | 762 | 417 | 28 | 1 | 765 | 426 | 29 | 1 |
| 9 | 616 | 379 | 29 | 1 | 411 | 151 | 10 | 0 | 410 | 178 | 9 | 0 |
| 10 | 355 | 133 | 12 | 0 | 240 | 46 | 9 | 0 | 228 | 72 | 1 | 0 |
| 11 | 194 | 43 | 8 | 0 | 134 | 19 | 2 | 0 | 177 | 41 | 2 | 0 |
| 12 | 105 | 20 | 3 | 0 | 60 | 22 | 0 | 0 | 129 | 17 | 1 | 0 |
| 13 | 52 | 18 | 0 | 0 | 29 | 4 | 0 | 0 | 92 | 5 | 0 | 0 |
| 14 | 19 | 5 | 1 | 0 | 13 | 10 | 1 | 0 | 64 | 9 | 0 | 0 |
| 15 | 15 | 9 | 0 | 0 | 15 | 5 | 6 | 0 | 65 | 5 | 0 | 0 |
| 16 | 7 | 7 | 8 | 0 | 3 | 3 | 2 | 0 | 46 | 1 | 0 | 0 |
| 17 | 3 | 4 | 0 | 0 | 1 | 4 | 0 | 0 | 51 | 5 | 0 | 0 |
| 18 | 1 | 3 | 1 | 0 | 5 | 7 | 0 | 0 | 20 | 0 | 0 | 0 |
| 19 | 5 | 6 | 0 | 0 | 3 | 16 | 0 | 0 | 39 | 0 | 0 | 0 |
| 20 | 3 | 17 | 0 | 0 | 1 | 34 | 0 | 0 | 36 | 0 | 0 | 0 |
| 21+ | 16 | 41 | 0 | 0 | 16 | 7 | 0 | 0 | 124 | 1 | 0 | 0 |
| SUM | 12115 | 8360 | 1300 | 85 | 10363 | 7119 | 992 | 51 | 5917 | 2814 | 360 | 28 |
| Percent | 55.4 | 38.2 | 5.9 | 0.3 | 55.9 | 38.4 | 5.4 | 0.3 | 64.9 | 30.9 | 3.9 | 0.3 |
Results comparing the total numbers of discovered repeats for each pattern type (di, tri, tetra, or penta) using Phobos, MSATCOMMANDER, and iMSAT. The total numbers of repeats for each pattern are summed, and presented as a percentage of total repeats found using each software program.
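The Percent row can be reproduced from the SUM row: each pattern's total is divided by the combined total for that program. A minimal Python sketch, using the iMSAT column of the table above (the function name `pattern_percentages` is illustrative and does not come from the paper):

```python
def pattern_percentages(sums):
    """Return each pattern's share of the program's total, as a percentage
    rounded to one decimal place."""
    total = sum(sums.values())
    return {pattern: round(100 * count / total, 1) for pattern, count in sums.items()}


# SUM row for iMSAT, copied from the table above.
imsat_sums = {"di": 5917, "tri": 2814, "tetra": 360, "penta": 28}

print(pattern_percentages(imsat_sums))
# {'di': 64.9, 'tri': 30.9, 'tetra': 3.9, 'penta': 0.3}
```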
STR results from
| | A) Phobos | | | | B) MSATCOMMANDER | | | | C) iMSAT | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| repeats | di | tri | tetra | penta | di | tri | tetra | penta | di | tri | tetra | penta |
| 5 | 21729 | 7100 | 347 | 29 | 35052 | 16556 | 890 | 33 | 9 | 8 | 3 | 0 |
| 6 | 12739 | 4282 | 123 | 1 | 22177 | 9949 | 326 | 4 | 39 | 7 | 14 | 0 |
| 7 | 8973 | 2628 | 37 | 1 | 16641 | 5961 | 133 | 3 | 63 | 12 | 13 | 0 |
| 8 | 7009 | 1546 | 12 | 1 | 13107 | 3231 | 65 | 2 | 149 | 30 | 11 | 0 |
| 9 | 5416 | 848 | 6 | 0 | 9805 | 1580 | 38 | 3 | 275 | 25 | 4 | 0 |
| 10 | 3860 | 389 | 6 | 1 | 6850 | 754 | 26 | 0 | 344 | 12 | 10 | 0 |
| 11 | 2746 | 193 | 2 | 0 | 4844 | 382 | 11 | 0 | 260 | 6 | 5 | 0 |
| 12 | 1890 | 97 | 2 | 0 | 3282 | 173 | 4 | 0 | 244 | 2 | 2 | 0 |
| 13 | 1283 | 39 | 2 | 0 | 2403 | 95 | 9 | 0 | 164 | 2 | 4 | 0 |
| 14 | 937 | 14 | 2 | 0 | 1875 | 50 | 9 | 0 | 133 | 5 | 0 | 0 |
| 15 | 793 | 12 | 1 | 0 | 1518 | 32 | 7 | 0 | 100 | 1 | 0 | 0 |
| 16 | 709 | 3 | 1 | 0 | 1433 | 14 | 5 | 0 | 81 | 0 | 0 | 0 |
| 17 | 608 | 4 | 0 | 0 | 1205 | 9 | 1 | 0 | 84 | 0 | 0 | 0 |
| 18 | 577 | 3 | 0 | 0 | 1283 | 8 | 0 | 0 | 53 | 0 | 0 | 0 |
| 19 | 626 | 2 | 0 | 0 | 1228 | 10 | 0 | 0 | 44 | 0 | 0 | 0 |
| 20 | 604 | 5 | 0 | 0 | 1211 | 2 | 0 | 0 | 35 | 0 | 0 | 0 |
| 21+ | 12028 | 24 | 0 | 0 | 16100 | 18 | 0 | 0 | 125 | 0 | 0 | 0 |
| SUM | 82527 | 17189 | 541 | 33 | 146877 | 38824 | 1524 | 45 | 2202 | 110 | 66 | 0 |
| Percent | 82.3 | 17.1 | 0.5 | 0.03 | 78.4 | 20.7 | 0.8 | 0.02 | 92.6 | 4.6 | 2.8 | 0 |
Results comparing the total numbers of discovered repeats for each pattern type (di, tri, tetra, or penta) using Phobos, MSATCOMMANDER, and iMSAT. The total numbers of repeats for each pattern are summed, and presented as a percentage of total repeats found using each software program.
Characteristics of the 15 and 12 polymorphic STRs isolated from and
| Locus | Repeat motif | Fragment lengths | T (°C) | N | P HWE | GenBank accession |
|---|---|---|---|---|---|---|
| | | | | | | |
| TpMSAT01 | (ATC)14–18 | 366 – 378 | 57 | 5 | 0.260 | KC477413 |
| TpMSAT02 | (ATC)6–20 | 345 – 387 | 57 | 7 | 0.918 | KC477414 |
| TpMSAT04 | (CGA)4–10 | 475 – 493 | 57 | 7 | 0.742 | KC477415 |
| TpMSAT05 | (TGA)15–18 | 330 – 336 | 57 | 3 | 0.037 | KC477416 |
| TpMSAT07 | (CAG)4–19 | 322 – 370 | 57 | 6 | 0.324 | KC477417 |
| TpMSAT08 | (GAC)5–10 | 305 – 320 | 57 | 5 | 0.808 | KC477418 |
| TpMSAT09 | (TAC)3–9 | 294 – 312 | 57 | 5 | 0.066 | KC477419 |
| TpMSAT10 | (GCT)2–8 | 396 – 414 | 57 | 7 | 0.093 | KC477420 |
| TpMSAT11 | (TCA)4–10 | 300 – 336 | 50 | 7 | 0.480 | KC477421 |
| TpMSAT12 | (AAC)5–9 | 255 – 267 | 57 | 5 | 0.478 | KC477422 |
| TpMSAT13 | (TCA)3–16 | 422 – 461 | 57 | 8 | 0.002* | KC477423 |
| TpMSAT14 | (AAG)3–11 | 313 – 340 | 57 | 9 | 0.147 | KC477424 |
| TpMSAT16 | (TGA)12–16 | 317 – 329 | 57 | 5 | 0.273 | KC477425 |
| TpMSAT17 | (ATT)6–15 | 340 – 367 | 57 | 6 | 0.940 | KC477426 |
| TpMSAT19 | (GAA)4–13 | 260 – 287 | 57 | 6 | 0.164 | KC477427 |
| | | | | | | |
| CjMSAT01 | (TAA)10–11 | 210 – 213 | 50 | 2 | NA | KJ939575 |
| CjMSAT02 | (CAA)9–16 | 375 – 396 | 50 | 5 | 0.090 | KJ939576 |
| CjMSAT03 | (TAC)11–12 | 374 – 377 | 50 | 2 | NA | KJ939577 |
| CjMSAT04 | (TAC)18–21 | 347 – 356 | 50 | 4 | 1 | KJ939578 |
| CjMSAT05 | (ATA)10–12 | 291 – 297 | 50 | 3 | 1 | KJ939579 |
| CjMSAT08 | (TAA)0–15 | 239 – 284 | 57 | 2 | NA | KJ939580 |
| CjMSAT09 | (TAA)9–10 | 276 – 279 | 50 | 2 | NA | KJ939581 |
| CjMSAT13 | (CGT)10–18 | 264 – 288 | 50 | 7 | NA | KJ939583 |
| CjMSAT14 | (ATT)10–13 | 460 – 469 | 50 | 3 | 0.247 | KJ939584 |
| CjMSAT16 | (ATA)7–8 | 367 – 370 | 57 | 2 | NA | KJ939585 |
| CjMSAT18 | (ATT)10–12 | 320 – 326 | 50 | 3 | 1 | KJ939586 |
| CjMSAT19 | (TAC)14–15 | 318 – 321 | 50 | 2 | NA | KJ939587 |
STR name, repeat motif, fragment lengths of observed alleles, annealing temperature in degrees Celsius (T), number of observed alleles (N), P values from Hardy-Weinberg equilibrium tests (P HWE), and GenBank accession numbers.
*Indicates a significant deviation from HWE after applying Bonferroni's correction for multiple comparisons.
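A Bonferroni correction divides the significance level by the number of tests performed. The sketch below is illustrative only: it assumes a significance level of 0.05 and one test per TpMSAT locus (15 tests), neither of which is stated in the table. The P values are copied from the table above, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the loci whose P value is below alpha / number_of_tests.

    alpha=0.05 is an assumed significance level, not one given in the table.
    """
    threshold = alpha / len(p_values)
    return [locus for locus, p in p_values.items() if p < threshold]


# P HWE values for the 15 TpMSAT loci, copied from the table above.
tp_p_hwe = {
    "TpMSAT01": 0.260, "TpMSAT02": 0.918, "TpMSAT04": 0.742,
    "TpMSAT05": 0.037, "TpMSAT07": 0.324, "TpMSAT08": 0.808,
    "TpMSAT09": 0.066, "TpMSAT10": 0.093, "TpMSAT11": 0.480,
    "TpMSAT12": 0.478, "TpMSAT13": 0.002, "TpMSAT14": 0.147,
    "TpMSAT16": 0.273, "TpMSAT17": 0.940, "TpMSAT19": 0.164,
}

print(bonferroni_significant(tp_p_hwe))
# ['TpMSAT13']
```

Under these assumptions the threshold is 0.05 / 15 ≈ 0.0033, so TpMSAT05 (P = 0.037) falls below 0.05 but is not flagged, which agrees with the single asterisk in the table.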
Source populations of and
| Pop | Location | Host | Collector | Date | N | Ho | He |
|---|---|---|---|---|---|---|---|
| | | | | | | | |
| J0029 | Bethel, OR | | J Andersen and C Hedstrom | 24vi2010 | 6 | 0.544 | 0.537 |
| J0030 | McMinnville, OR | | J Andersen | 24vi2010 | 6 | 0.208 | 0.412 |
| J0001 | Durham, CA | | N Mills | 06vii2006 | 12 | 0.311 | 0.328 |
| J0008 | Tulare, CA | | N Mills | 17ix2006 | 15 | 0.271 | 0.373 |
| J0069 | Upper Lake, CA | | R Elkins | 10ix2010 | 11 | 0.312 | 0.385 |
| J0178 | Yuba City, CA | | J Andersen | 27ix2011 | 7 | 0.242 | 0.360 |
| J0179 | Escalon, CA | | J Andersen and M Labbé | 05vi2012 | 12 | 0.344 | 0.354 |
| J0188 | Newark, CA | | J Andersen and M Labbé | 30viii2012 | 10 | 0.347 | 0.384 |
| J0163 | Tehran, Iran | | P Starý | 24iii2004 | 12 | 0.321 | 0.381 |
| | | | | | | | |
| A0046 | Modesto, CA | Walnut | J Andersen and K Anderson | 7vii2010 | 9 | 0.103 | 0.100 |
| A0052 | Linden, CA | Walnut | J Andersen | 10vii2010 | 8 | 0.112 | 0.128 |
| A0073 | Upper Lake, CA | Walnut | J Andersen and M Labbé | 13ix2010 | 9 | 0.151 | 0.165 |
| A0164 | Parnac, France | Walnut | J Andersen and M Labbé | 2vi2011 | 20 | 0.068 | 0.082 |
Populations used in this study, including the number of females genotyped (N) and the average observed (Ho) and expected (He) heterozygosity.
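For readers unfamiliar with these statistics, the sketch below shows one common way to compute observed and expected heterozygosity at a single locus. It assumes diploid genotypes and the simple estimator He = 1 − Σp², without a small-sample correction. The table does not say which estimator or software the authors used or how values were averaged across loci, and the genotypes here are hypothetical, not data from the study.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 minus the sum of squared allele frequencies (no small-sample correction)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical diploid genotypes (allele sizes in bp) for four individuals.
genotypes = [(366, 369), (366, 366), (369, 372), (366, 369)]

print(observed_heterozygosity(genotypes))  # 0.75
print(expected_heterozygosity(genotypes))  # 0.59375
```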
Figure 1. iMSAT workflow diagram. Before using iMSAT, barcoded NGS sequencing libraries are produced (A) and sequenced (B), and the reads are either used to create a de novo assembly or aligned to an available reference genome (C). Then, using SAMtools and BWA, the individual sequence reads are used to create a polymorphism report (D) that includes the location of each polymorphic locus, its type (SNP or INDEL), and other quality statistics. iMSAT then uses this report and the alignment file to filter the polymorphism data based on a user-specified number of base pairs (E), identifies the STR motifs and the number of repeats (F), and outputs separate .FASTA preprocessing files for each candidate locus that can be used with primer design software (G).
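Step F, identifying STR motifs and counting repeats, can be illustrated with a regular-expression scan. This is a rough sketch and not iMSAT's actual implementation: the function name, the minimum of five repeats, and the restriction to di- through penta-nucleotide motifs are assumptions chosen to match the tables above, and the test sequence is made up.

```python
import re


def find_strs(seq, min_repeats=5, motif_sizes=(2, 3, 4, 5)):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Illustrative only; thresholds and motif sizes are assumptions.
    """
    hits = []
    for k in motif_sizes:
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Made-up sequence containing (ATC)6 and (AC)5.
sequence = "GGTT" + "ATC" * 6 + "TTGA" + "AC" * 5 + "G"

print(find_strs(sequence))
# [(4, 'ATC', 6), (26, 'AC', 5)]
```

The scan reports the leftmost phase of each repeat, so the same locus may appear as (ATC)n or (CAT)n depending on the flanking bases.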