Joseph I Hoffman, Hazel J Nichols.
Abstract
An important emerging application of high-throughput 454 sequencing is the isolation of molecular markers such as microsatellites from genomic DNA. However, few studies have developed microsatellites from cDNA despite the added potential for targeting candidate genes. Moreover, developing microsatellites usually requires the evaluation of numerous primer pairs for polymorphism in the focal species. This can be time-consuming and wasteful, particularly for taxa with low genetic diversity, where the majority of primers often yield monomorphic polymerase chain reaction (PCR) products. Transcriptome assemblies provide a convenient solution: functional annotation of transcripts allows markers to be targeted towards candidate genes, while high sequence coverage in principle permits the assessment of variability in silico. Consequently, we evaluated fifty primer pairs designed to amplify microsatellites, primarily residing within transcripts related to immunity and growth, identified from an Antarctic fur seal (Arctocephalus gazella) transcriptome assembly. In silico visualization was used to classify each microsatellite as being either polymorphic or monomorphic and to quantify the number of distinct length variants, each taken to represent a different allele. The majority of loci (n = 38, 76.0%) yielded interpretable PCR products, 23 of which were polymorphic in a sample of 24 fur seal individuals. Loci that appeared variable in silico were significantly more likely to yield polymorphic PCR products, even after controlling for microsatellite length measured in silico. We also found a significant positive relationship between inferred and observed allele number. This study not only demonstrates the feasibility of generating modest panels of microsatellites targeted towards specific classes of gene, but also suggests that in silico microsatellite variability may provide a useful proxy for PCR product polymorphism.
Year: 2011 PMID: 21853104 PMCID: PMC3154332 DOI: 10.1371/journal.pone.0023283
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Screenshot of a polymorphic trinucleotide-repeat (TTG) microsatellite locus (Agt25) visualised in silico using the program Tablet [23].
The upper ‘overview window’ shows a scaled-to-fit summary of all of the reads comprising the isotig, while the main ‘display window’ shows the microsatellite and its immediate flanking regions visualised under a higher zoom. Within this window, 454 reads are shown aligned against the consensus sequence, with each read occupying a separate row. Individual bases are coloured according to nucleotide type. Pad characters, introduced by Newbler to fill any gaps in the assembly, are represented by red star symbols against a light grey background. Two distinct motif length variants can be seen, comprising six and seven repeat units respectively. The same number of alleles was detected when the locus was PCR-amplified in 24 unrelated A. gazella individuals.
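The classification described in this caption was made by eye in Tablet. As a rough programmatic analogue, the sketch below counts the longest uninterrupted run of a motif in each read and tallies the distinct run lengths. The reads, flanking sequences, function names, and the `min_reads` threshold are all hypothetical, and stripping `*` as the pad character is an assumption about how the alignment would be exported.

```python
import re
from collections import Counter


def longest_repeat_run(read: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `read`."""
    cleaned = read.replace("*", "")  # assumption: '*' marks assembly pad characters
    runs = re.findall(f"(?:{re.escape(motif)})+", cleaned)
    return max((len(run) // len(motif) for run in runs), default=0)


def length_variants(reads, motif, min_reads=1):
    """Map each observed repeat count to the number of reads supporting it."""
    counts = Counter(longest_repeat_run(read, motif) for read in reads)
    return {n: c for n, c in counts.items() if n > 0 and c >= min_reads}


# Hypothetical reads spanning a TTG repeat (not data from the study).
reads = [
    "GACA" + "TTG" * 6 + "CCAT",
    "GACA" + "TTG" * 6 + "CCAT",
    "GACA" + "TTG" * 3 + "*" + "TTG" * 3 + "CCAT",  # padded read, still six repeats
    "GACA" + "TTG" * 7 + "CCAT",
]

variants = length_variants(reads, "TTG")
is_polymorphic = len(variants) > 1  # more than one length variant observed
```

Raising `min_reads` would be one way to discount length variants supported by a single read, which could reflect sequencing error and not a true allele.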
Polymorphism characteristics of 21 microsatellite loci that amplified polymorphic and interpretable PCR products in 24 unrelated Arctocephalus gazella individuals.
| Locus | GenBank accession number | Number of alleles | HO | HE | Null allele frequency | HWE |
|---|---|---|---|---|---|---|
| Agt5 | JF746971 | 2 | 0.053 | 0.235 | 0.626 | 0.012 |
| Agt9 | JF746972 | 2 | 0.125 | 0.120 | −0.032 | 1.000 |
| Agt10 | JF746973 | 3 | 0.417 | 0.377 | −0.061 | 0.483 |
| Agt13 | JF746974 | 3 | 0.125 | 0.121 | −0.025 | 1.000 |
| Agt16 | JF746975 | 2 | 0.167 | 0.156 | −0.044 | 1.000 |
| Agt20 | JF746976 | 2 | 0.250 | 0.223 | −0.067 | 1.000 |
| Agt21 | JF746977 | 6 | 0.625 | 0.785 | 0.103 | 0.242 |
| Agt23 | JF746978 | 2 | 0.042 | 0.042 | −0.011 | NA |
| Agt24 | JF746979 | 7 | 0.727 | 0.778 | 0.022 | 0.513 |
| Agt25 | JF746980 | 2 | 0.042 | 0.042 | −0.011 | NA |
| Agt32 | JF746981 | 4 | 0.875 | 0.668 | −0.145 | 0.182 |
| Agt38 | JF746982 | 2 | 0.087 | 0.085 | −0.022 | 1.000 |
| Agt39 | JF746983 | 5 | 0.739 | 0.728 | −0.019 | 0.791 |
| Agt41 | JF746984 | 9 | 0.917 | 0.839 | −0.055 | 0.595 |
| Agt42 | JF746985 | 5 | 0.333 | 0.420 | 0.105 | 0.181 |
| Agt44 | JF746986 | 2 | 0.208 | 0.191 | −0.055 | 1.000 |
| Agt45 | JF746987 | 3 | 0.522 | 0.581 | 0.043 | 0.394 |
| Agt47 | JF746988 | 3 | 0.391 | 0.476 | 0.087 | 0.243 |
| Agt48 | JF746989 | 6 | 0.875 | 0.757 | −0.083 | 0.786 |
| Agt49 | JF746990 | 5 | 0.417 | 0.621 | 0.186 | 0.012 |
| Agt50 | JF746991 | 9 | 0.833 | 0.715 | −0.087 | 0.364 |
HO: observed heterozygosity.
HE: expected heterozygosity.
Negative null allele frequency values are normal using Chakraborty's estimator [31] when the null allele frequency is close to zero and sample sizes are small [64].
HWE: Hardy-Weinberg equilibrium P-values; these could not be calculated for loci indicated by ‘NA’ because only one individual carried the second allele.
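To illustrate why negative null allele frequencies arise, the sketch below applies the form in which Chakraborty's estimator is commonly written, r = (HE − HO) / (HE + HO), to two rows of the table. The function name is hypothetical. Because the inputs are the rounded table entries, and the published column may rest on unrounded values or a differently corrected HE, the results are not expected to reproduce the table exactly.

```python
def chakraborty_null_allele_freq(h_exp: float, h_obs: float) -> float:
    """Estimate null allele frequency as (HE - HO) / (HE + HO)."""
    return (h_exp - h_obs) / (h_exp + h_obs)


# (HE, HO) pairs copied from the rounded table entries.
agt5 = chakraborty_null_allele_freq(h_exp=0.235, h_obs=0.053)  # heterozygote deficit
agt9 = chakraborty_null_allele_freq(h_exp=0.120, h_obs=0.125)  # slight heterozygote excess

# The estimate is positive when HO < HE and negative when HO > HE.
print(f"Agt5: {agt5:.3f}, Agt9: {agt9:.3f}")
```

A small heterozygote excess, which is easily produced by sampling noise in 24 individuals, is enough to push the estimate below zero.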
Table summarizing consistency between in silico and PCR product polymorphism across 38 microsatellite loci.
| In silico | PCR products: Polymorphic | PCR products: Monomorphic | Total |
|---|---|---|---|
| Polymorphic | 22 | 8 | 30 |
| Monomorphic | 1 | 7 | 8 |
| Total | 23 | 15 | 38 |
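The counts in this table can be summarised with a few lines of arithmetic. The sketch below computes the share of loci for which the in silico call agreed with the PCR result, and runs a two-sided Fisher's exact test on the 2×2 table. Fisher's test is swapped in here purely for illustration; the study itself reports a GLM, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (as improbable) as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: in silico polymorphic / monomorphic; columns: PCR polymorphic / monomorphic.
a, b, c, d = 22, 8, 1, 7

concordance = (a + d) / (a + b + c + d)  # loci where both calls agree
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"concordance = {concordance:.3f}, Fisher P = {p_value:.4f}")
```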
Results of the Generalized Linear Model (GLM) of PCR product polymorphism (see Materials and Methods for details).
| Term | Estimate | χ2 | df | P |
|---|---|---|---|---|
| Minimum number of repeat units | 1.37 | 20.43 | 1 | <0.0001 |
| Variability | 8.23 | 17.32 | 1 | <0.0001 |
Only significant terms remaining in the reduced model are shown.
df: degrees of freedom.
Total deviance = 50.98; total explained deviance = 59.99%.
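As a plausibility check on the deviance figures, the sketch below computes the deviance of an intercept-only model for a binary response in which 23 of 38 loci were polymorphic, then derives the residual deviance implied by the stated explained percentage. It assumes that the model used a binomial error structure and that "total deviance" means the null deviance; neither is stated in this section, and the function name is hypothetical.

```python
import math


def bernoulli_null_deviance(successes: int, n: int) -> float:
    """Deviance of an intercept-only model fitted to n binary outcomes."""
    if successes in (0, n):
        return 0.0  # a constant response is fitted perfectly by the intercept
    p = successes / n
    failures = n - successes
    return -2.0 * (successes * math.log(p) + failures * math.log(1.0 - p))


null_dev = bernoulli_null_deviance(successes=23, n=38)

explained_fraction = 0.5999  # "total explained deviance = 59.99%"
residual_dev = null_dev * (1.0 - explained_fraction)

print(f"null deviance = {null_dev:.2f}, implied residual deviance = {residual_dev:.2f}")
```

Under these assumptions the null deviance agrees with the reported total of 50.98 to two decimal places.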
Results of the Generalized Linear Model (GLM) of the number of alleles (see Materials and Methods for details).
| Term | Estimate | χ2 | df | P |
|---|---|---|---|---|
| Minimum number of repeat units | 0.06 | 3.87 | 1 | 0.049 |
| Number of alleles | 0.34 | 6.88 | 1 | 0.009 |
| Number of reads differing from consensus | −0.09 | 5.21 | 1 | 0.022 |
Only significant terms remaining in the reduced model are shown.
df: degrees of freedom.
Total deviance = 24.37; total explained deviance = 57.07%.
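The P-values in the two GLM tables can be recovered from the χ2 statistics, since each term has one degree of freedom. For df = 1 the upper-tail probability of the chi-square distribution reduces to erfc(√(χ2/2)), which the sketch below evaluates with the standard library. The function name is hypothetical.

```python
import math


def chi2_upper_tail_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# χ2 values copied from the GLM tables (df = 1 for every term).
terms = {
    "polymorphism: minimum number of repeat units": 20.43,
    "polymorphism: variability": 17.32,
    "alleles: minimum number of repeat units": 3.87,
    "alleles: number of alleles": 6.88,
    "alleles: number of reads differing from consensus": 5.21,
}

p_values = {name: chi2_upper_tail_df1(x) for name, x in terms.items()}
for name, p in p_values.items():
    print(f"{name}: P = {p:.4g}")
```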
Figure 2. Relationship between the inferred number of alleles in silico and observed allele number for 21 polymorphic microsatellite loci.
Shown are fitted values from a GLM controlling for the number of repeats and the number of reads differing from the consensus sequence. The solid line shows the regression predicted by the GLM and dashed lines indicate the 95% confidence interval.