Graeme Fox, Richard F Preziosi, Rachael E Antwis, Milena Benavides-Serrato, Fraser J Combe, W Edwin Harris, Ian R Hartley, Andrew C Kitchener, Selvino R de Kort, Anne-Isola Nekaris, Jennifer K Rowntree.
Abstract
Bespoke microsatellite marker panels are increasingly affordable and tractable to researchers and conservationists. The rate of microsatellite discovery is very high within a shotgun genomic data set, but extensive laboratory testing of markers is required for confirmation of amplification and polymorphism. By incorporating shotgun next-generation sequencing data sets from multiple individuals of the same species, we have developed a new method for the optimal design of microsatellite markers. This new tool allows us to increase the rate at which suitable candidate markers are selected by 58% in direct comparisons and facilitate an estimated 16% reduction in costs associated with producing a novel microsatellite panel. Our method enables the visualisation of each microsatellite locus in a multiple sequence alignment allowing several important quality checks to be made. Polymorphic loci can be identified and prioritised. Loci containing fragment-length-altering mutations in the flanking regions, which may invalidate assumptions regarding the model of evolution underlying variation at the microsatellite, can be avoided. Priming regions containing point mutations can be detected and avoided, helping to reduce sample-site-marker specificity arising from genetic isolation, and the likelihood of null alleles occurring. We demonstrate the utility of this new approach in two species: an echinoderm and a bird. Our method makes a valuable contribution towards minimising genotyping errors and reducing costs associated with developing a novel marker panel. The Python script to perform our method of multi-individual microsatellite identification (MiMi) is freely available from GitHub (https://github.com/graemefox/mimi).
Keywords: cost-effective marker development; high-throughput sequencing; in silico quality control; microsatellite design; polymorphic loci detection; short tandem repeat (STR)
Year: 2019 PMID: 31339632 PMCID: PMC6900094 DOI: 10.1111/1755-0998.13065
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
Figure 1. Summary statistics showing the rate at which potential microsatellite markers were successfully amplified in the laboratory, and the rate at which they were discovered to be informative. Markers were designed using both methodologies in P. miliaris and C. caeruleus. Stated values are the average for each design method, in each measure of success (amplification rate and informative loci rate). Error bars show the standard deviations. The use of MiMi results in both an increase in the rate at which markers amplify and are informative, and also a reduction in the variability at each of these measures compared to the traditional workflow.
A summary of the design methods used in each species, including the data set number (ID), species, treatment (Tx), number of individuals sequenced (N), number of PCR primers tested (Pp), number of tested PCR primers that successfully amplified in 75% of samples tested (Amp), number of amplifiable PCR primers producing informative data after capillary electrophoresis (easily interpretable and polymorphic) (Inf), percentage of amplifiable primers which were informative (Inf/Amp), percentage of total primers tested which were informative (Inf/Pp), genome size estimate (C-val), raw sequence reads per sample (Reads; mean and SD given where MiMi was applied), estimated sequence coverage (Cov), and literature reference and/or accession numbers of NGS data (REF/SRA) where applicable. All genome sizes were retrieved from the Animal Genome Size Database (http://www.genomesize.com), with the closest related species used. Panels of markers were developed in P. miliaris and C. caeruleus using both the traditional method (Castoe et al., 2015; Griffiths et al., 2016) and the MiMi method. In both species, applying the MiMi quality control process produced higher rates of amplification and of informative markers.
| ID | Species | Tx | N | Pp | Amp | Inf | Inf/Amp | Inf/Pp | C-val | Reads | Cov | REF/SRA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C. caeruleus | MiMi | 8 | 10 | 10 | 8 | 80% | 80% | 1.47 | 8 × 2,901,027 (SD ± 878,838) | 1.20X | SRX5066864 to SRX5066869 |
| 2 | P. miliaris | MiMi | 8 | 20 | 19 | 18 | 95% | 90% | 1.30 | 8 × 1,482,736 (SD ± 280,686) | 0.57X | SRX5162614 to SRX5162621 |
| 3 | | Trad. | 1 | 30 | 21 | 18 | 86% | 60% | 3.94 | 8,980,510 | 1.10X | Combe et al. ( |
| 4 | | Trad. | 1 | 30 | 26 | 17 | 65% | 57% | 3.58 | 5,309,686 | 0.74X | SRX5112421 |
| 5 | C. caeruleus | Trad. | 1 | 10 | 4 | 1 | 25% | 10% | 1.47 | 3,913,299 | 1.60X | SRX5066867 |
| 6 | P. miliaris | Trad. | 1 | 24 | 13 | 9 | 69% | 38% | 1.30 | 1,359,615 | 0.52X | SRX5162614 |
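The two percentage columns in the table above follow directly from the counts. As a minimal sketch (the function name and dictionary layout are illustrative and are not part of MiMi), the Pp, Amp and Inf values transcribed from the table can be used to recompute Inf/Amp and Inf/Pp:

```python
# Counts transcribed from the table above: data set ID -> (Pp, Amp, Inf).
counts = {
    1: (10, 10, 8),
    2: (20, 19, 18),
    3: (30, 21, 18),
    4: (30, 26, 17),
    5: (10, 4, 1),
    6: (24, 13, 9),
}


def informative_rates(pp, amp, inf):
    """Return (Inf/Amp, Inf/Pp) as percentages (hypothetical helper)."""
    return 100 * inf / amp, 100 * inf / pp


# Recompute both rates for every data set.
rates = {ds: informative_rates(*c) for ds, c in counts.items()}

for ds, (inf_amp, inf_pp) in rates.items():
    print(f"ID {ds}: Inf/Amp = {inf_amp:.1f}%, Inf/Pp = {inf_pp:.1f}%")
```

Rounding the printed values to whole percentages gives the figures reported in the table.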
The total number of potential microsatellite loci discovered using the traditional design methodology, retained after filtering with the Griffiths et al. (2016) method and retained after MiMi quality control processing
| Species | pal_finder loci | Griffiths et al. (2016) | MiMi loci |
|---|---|---|---|
| C. caeruleus | 158,147 | 4,513 (2.9%) | 302 (0.19%) |
| P. miliaris | 469,047 | 5,657 (1.2%) | 250 (0.05%) |
Figure 2. The MiMi tool was used to analyse 5,657 potential microsatellite loci discovered in P. miliaris sequence data and 4,513 discovered in C. caeruleus. Loci were filtered to just those which appeared in the sequence data of three or more individuals. The total number of loci which were successfully detected in multiple individuals, and in how many individuals they were detected, is shown below. The bar labels are the absolute number of loci that were detected in each category (number of individuals).
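The filtering step described in the caption, keeping only loci seen in three or more individuals and then counting loci per number of individuals, can be sketched as follows. This is an illustration under stated assumptions rather than MiMi's own code: the input layout (a mapping from individual to the set of locus IDs detected in it), the function name and the toy data are all hypothetical.

```python
from collections import Counter


def loci_in_min_individuals(detections, min_individuals=3):
    """Map each locus to the number of individuals it was detected in,
    keeping only loci found in at least `min_individuals` individuals.

    `detections` is an assumed layout: individual ID -> set of locus IDs.
    """
    seen = Counter()
    for loci in detections.values():
        seen.update(set(loci))  # count each locus once per individual
    return {locus: n for locus, n in seen.items() if n >= min_individuals}


# Toy data (invented for illustration only).
detections = {
    "ind1": {"locA", "locB", "locC"},
    "ind2": {"locA", "locB"},
    "ind3": {"locA", "locC"},
    "ind4": {"locA", "locB"},
}

kept = loci_in_min_individuals(detections)

# Number of retained loci per "number of individuals" category,
# i.e. the quantity plotted as bars in a figure of this kind.
histogram = Counter(kept.values())
print(kept)
print(dict(histogram))
```

In this toy input, `locC` is dropped because it appears in only two individuals.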
Potential loci are automatically filtered by the MiMi script. Loci are removed under the following conditions. Low quality alignments = loci rejected for not meeting a minimum requirement for overall alignment quality; this is indicative of multiple primer binding sites in the host genome, and of size-altering INDEL mutations in the flanking regions. Primer mutations = loci rejected because SNP or INDEL mutations were detected within the primer binding sites. Nonvariable = loci rejected because multiple reads span the microsatellite but show no variation in motif number. The remaining loci pass in one of two categories. High quality = loci passed because consistent forward and reverse primer sequences were seen in multiple individuals, multiple reads span the microsatellite with variable motif number, and there is no evidence of INDELs or multiple binding sites. Good quality = the same criteria as "High quality," but the alignment provided no information on the consistency of the reverse PCR primer or on INDEL mutations.
| ID | Species | Total | Low quality alignments | Primer mutations | Nonvariable | High quality | Good quality |
|---|---|---|---|---|---|---|---|
| 1 | C. caeruleus | 302 | 14 (4.6%) | 7 (2.3%) | 205 (67.9%) | 13 (4.3%) | 63 (20.9%) |
| 2 | P. miliaris | 250 | 102 (40.8%) | 9 (3.6%) | 101 (40.4%) | 12 (4.8%) | 26 (10.4%) |
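The five categories in the table above can be read as a simple decision cascade. The sketch below is one possible reading of the caption, not the logic of the MiMi script itself: the `LocusSummary` fields, the order of the checks and the reduction of each criterion to a single boolean are assumptions made for illustration, and the source does not state the alignment-quality threshold.

```python
from dataclasses import dataclass


@dataclass
class LocusSummary:
    """Hypothetical per-locus summary of a multi-individual alignment."""
    alignment_ok: bool        # meets the minimum overall alignment quality
    primer_mutation: bool     # SNP or INDEL seen within a primer binding site
    motif_varies: bool        # motif repeat number differs between reads
    full_span_evidence: bool  # alignment also informs on the reverse primer
                              # and on INDELs in the flanking regions


def classify(locus):
    """Assign one of the five categories; the check order is an assumption."""
    if not locus.alignment_ok:
        return "Low quality alignments"
    if locus.primer_mutation:
        return "Primer mutations"
    if not locus.motif_varies:
        return "Nonvariable"
    if locus.full_span_evidence:
        return "High quality"
    return "Good quality"


# Invented examples, one per category.
examples = [
    LocusSummary(False, False, True, True),
    LocusSummary(True, True, True, True),
    LocusSummary(True, False, False, True),
    LocusSummary(True, False, True, True),
    LocusSummary(True, False, True, False),
]
labels = [classify(x) for x in examples]
print(labels)
```

Under this reading every locus falls into exactly one category, which is consistent with the per-category counts in each row of the table summing to the row total.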