Hannah J Barbian¹, Andrew Jesse Connell¹, Alexa N Avitto¹, Ronnie M Russell¹, Andrew G Smith¹, Madhurima S Gundlapally¹, Alexander L Shazad¹, Yingying Li¹, Frederic Bibollet-Ruche¹, Emily E Wroblewski², Deus Mjungu³, Elizabeth V Lonsdorf⁴, Fiona A Stewart⁵, Alexander K Piel⁵, Anne E Pusey⁶, Paul M Sharp⁷, Beatrice H Hahn¹.
Abstract
Short tandem repeats (STRs), also known as microsatellites, are commonly used to noninvasively genotype wild-living endangered species, including African apes. Until recently, capillary electrophoresis has been the method of choice for determining the length of polymorphic STR loci. However, this technique is labor-intensive, difficult to compare across platforms, and notoriously imprecise. Here we developed a MiSeq-based approach and tested its performance using previously genotyped fecal samples from long-term studied chimpanzees in Gombe National Park, Tanzania. Using data from eight microsatellite loci as a reference, we designed a bioinformatics platform that converts raw MiSeq reads into locus-specific files and automatically calls alleles after filtering stutter sequences and other PCR artifacts. Applying this method to the entire Gombe population, we confirmed previously reported genotypes but also identified 31 new alleles that had been missed because of sequence differences and size homoplasy. The new genotypes, which increased the allelic diversity and heterozygosity in Gombe by 61% and 8%, respectively, were validated by replicate amplification and pedigree analyses. The pedigree analyses also demonstrated inheritance of the new alleles and resolved one case of ambiguous paternity. Using both singleplex and multiplex locus amplification, we also genotyped fecal samples from chimpanzees in the Greater Mahale Ecosystem in Tanzania, demonstrating the utility of the MiSeq-based approach for genotyping nonhabituated populations and for performing comparative analyses across field sites. The new automated high-throughput analysis platform (available at https://github.com/ShawHahnLab/chiimp) will allow biologists to determine wildlife population size and structure more accurately and effectively, and thus obtain information critical for conservation efforts.
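The first step of the platform, converting raw reads into locus-specific files, can be illustrated with a minimal Python sketch. It assumes that each read begins with the forward primer of the locus it was amplified from. The primer sequences and the function name `split_reads_by_locus` are hypothetical placeholders, and this is not the CHIIMP implementation.

```python
from collections import defaultdict

# Hypothetical forward primers, for illustration only (not the study's primers).
LOCUS_PRIMERS = {
    "A": "ACGTTGCA",
    "B": "TTGACCGT",
}


def split_reads_by_locus(reads, primers=LOCUS_PRIMERS):
    """Assign each read to the locus whose forward primer it starts with.

    Reads that match no primer are collected under the key None; a real
    pipeline would discard them as off-target products.
    """
    bins = defaultdict(list)
    for read in reads:
        locus = next(
            (name for name, primer in primers.items() if read.startswith(primer)),
            None,
        )
        bins[locus].append(read)
    return dict(bins)
```

Exact prefix matching keeps the sketch short. Real MiSeq data contain sequencing errors, so a production tool would more likely tolerate a few primer mismatches.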
Keywords: Pan troglodytes; high‐throughput STR genotyping; length homoplasy; parentage analysis; short tandem repeats
Year: 2018 PMID: 30250675 PMCID: PMC6145012 DOI: 10.1002/ece3.4302
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Comparison of capillary electrophoresis and MiSeq‐based genotyping results
Erroneous allele calls by capillary electrophoresis and MiSeq genotyping methods
| | Capillary electrophoresis (automatic)ᵃ | %ᶜ | Capillary electrophoresis (manual)ᵇ | %ᶜ | High‐throughput MiSeq genotyping | %ᶜ |
|---|---|---|---|---|---|---|
| Allelic dropout | 28 | 18 | 21 | 14 | 10 | 7 |
| Missing locus | 4 | 3 | 2 | 1 | 0 | 0 |
| False alleleᵈ | 3 | 2 | 1 | 1 | 0 | 0 |
| PCR stutter | 18 | 12 | 0 | 0 | 0 | 0 |
| Analysis timeᵉ | 75 min | | 120 min | | 5 min | |
ᵃ Peaks were called automatically using software. ᵇ Peaks were called manually. ᶜ The percentage of erroneous alleles was calculated for 152 loci by comparing the newly derived results to the reference genotypes (Table 1). ᵈ Locus alleles do not match the locus primer and/or motif sequence. ᵉ Hands‐on analysis time included allele length calling, binning, and individual identification.
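The percentage columns can be reproduced from the raw counts. The 152 loci in footnote c correspond to the 19 chimpanzees × 8 loci described in Figure 1. The short Python check below recomputes each percentage as count / 152 × 100, rounded to a whole number. The variable names are illustrative.

```python
# Erroneous allele calls from the table: (CE automatic, CE manual, MiSeq).
counts = {
    "Allelic dropout": (28, 21, 10),
    "Missing locus": (4, 2, 0),
    "False allele": (3, 1, 0),
    "PCR stutter": (18, 0, 0),
}
TOTAL_LOCI = 152  # footnote c; equals 19 chimpanzees x 8 loci


def percent(n, total=TOTAL_LOCI):
    """Return n as a whole-number percentage of total."""
    return round(100 * n / total)


percentages = {row: tuple(percent(n) for n in vals) for row, vals in counts.items()}
for row, pct in percentages.items():
    print(row, pct)
```

Every recomputed value matches the corresponding % column in the table.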
Figure 1. MiSeq genotyping uncovers cryptic alleles. Eight polymorphic short tandem repeat loci were amplified from the fecal DNA of 19 previously genotyped chimpanzees. (a) Histogram depicting the length (x‐axis) and read count (y‐axis) of unique sequences for one representative heterozygous locus that multiple capillary electrophoresis analyses had previously determined to be homozygous (sample 4861, locus C, Table 1). The gray box highlights the expected locus size range. The horizontal line indicates the cutoff of 5% of total filtered reads. Colored peaks indicate reads that passed the locus‐specific filters (note that a peak can consist of identically sized reads that differ in sequence content). Black reads were eliminated. Pink reads appear to be locus‐specific but did not pass the PCR artifact filters. Red reads represent the true allele sequences (180 and 181 bp in length, respectively). (b,c) Alignment images of locus‐specific allele sequences are shown for locus 1 (b) and locus C (c) (the complete data set is shown in Table 1). Allele sequences are ordered by length (indicated in bp on the right), with the frequency at which each was found in different chimpanzees indicated on the left (the x‐axis indicates the position within the alignment). Nucleotides are colored as shown, with gaps in the alignment shown in gray. The insets highlight alleles that differ in sequence content and/or length. Nucleotide substitutions are colored; dashes indicate gaps that were introduced to optimize the alignment.
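The filtering shown in panel (a) can be sketched as a small allele caller. This is a simplified Python illustration, not CHIIMP's actual decision tree, and it makes four assumptions. The reads have already been assigned to one locus. Identical sequences are counted as one candidate. Candidates below 5% of total reads are dropped, which is the cutoff in the figure. A candidate is treated as stutter if a more abundant sequence is exactly one repeat unit longer and the candidate has fewer than one-third of that sequence's reads. The names `call_alleles`, `repeat_len`, and `stutter_ratio`, and the 1/3 threshold, are all hypothetical.

```python
from collections import Counter


def call_alleles(reads, repeat_len=4, min_fraction=0.05, stutter_ratio=1 / 3):
    """Call up to two alleles from locus-specific reads (hypothetical sketch).

    1. Count identical sequences, so same-length variants stay distinct.
    2. Drop sequences below min_fraction of all reads.
    3. Drop likely stutter: a sequence exactly repeat_len bp shorter than a
       kept sequence and with fewer than stutter_ratio times its reads.
    4. Return the (at most two) most abundant remaining sequences.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    kept = {seq: n for seq, n in counts.items() if n >= min_fraction * total}

    def is_stutter(seq, n):
        return any(
            len(other) - len(seq) == repeat_len and n < stutter_ratio * m
            for other, m in kept.items()
        )

    alleles = {seq: n for seq, n in kept.items() if not is_stutter(seq, n)}
    ranked = sorted(alleles.items(), key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in ranked[:2]]
```

Because the caller counts whole sequences rather than lengths, two alleles of identical length but different content are both called. Capillary electrophoresis cannot make that distinction.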
Increased allelic and gene diversity as detected by MiSeq STR genotyping
| Locus | Number of alleles, CEᵃ | Number of alleles, MiSeqᵃ | Cryptic allelesᶜ | Gene diversity, CEᵇ | Gene diversity, MiSeqᵇ |
|---|---|---|---|---|---|
| A | 6 | 7 | 1 | 0.74 | 0.74 |
| B | 5 | 7 | 2 | 0.79 | 0.81 |
| C | 5 | 10 | 5 | 0.70 | 0.83 |
| D | 7 | 13 | 6 | 0.80 | 0.88 |
| 1 | 9 | 16 | 7 | 0.80 | 0.86 |
| 2 | 7 | 9 | 2 | 0.75 | 0.75 |
| 3 | 7 | 14 | 7 | 0.71 | 0.83 |
| 4 | 5 | 6 | 1 | 0.72 | 0.80 |
| Total/mean | 51 | 82 | 31 | 0.75 | 0.81 |
CE: capillary electrophoresis.
ᵃ Number of alleles at eight STR loci determined for 123 Gombe chimpanzees (Supporting Information Table S2). ᵇ Nine individuals were excluded from heterozygosity calculations because they had incomplete CE genotypes. ᶜ Alleles newly discovered by MiSeq genotyping.
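The sketch below shows how splitting one length class into sequence variants raises gene diversity, and how the abstract's summary figures follow from the table: 82/51 alleles is a 61% increase, and 0.81/0.75 in mean gene diversity is an 8% increase. It uses Nei's unbiased estimator, H = n/(n − 1) × (1 − Σ pᵢ²), where n is the number of gene copies. The table does not state which estimator was used, so this choice is an assumption.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity for one locus (assumed estimator).

    alleles: list of allele labels, two per diploid individual.
    H = n / (n - 1) * (1 - sum(p_i ** 2)), n = number of gene copies.
    """
    n = len(alleles)
    freqs = [c / n for c in Counter(alleles).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return 100 * (after - before) / before


# Same four gene copies, scored by length only vs. by sequence variant.
print(gene_diversity(["234", "234", "234", "234"]))
print(gene_diversity(["234-a", "234-b", "234-a", "234-b"]))
print(round(percent_increase(51, 82)), round(percent_increase(0.75, 0.81)))
```

Scored by length, the four copies look identical and H is 0. Scored by sequence, the two variants give H = 2/3.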
Allelic sequence and length differences uncovered by MiSeq‐based genotyping
| Locus | Cryptic allele | Number of apes carrying allele | Substitutions (identical length) | Indels (identical length) | Indels (1 bp length difference) | Mendelian inheritance |
|---|---|---|---|---|---|---|
| A | 157‐b | 3 | 2 | Yes | ||
| B | 204‐a | 14 | 1 | Yes | ||
| B | 231‐b | 2 | 1 | Yes | ||
| C | 181‐a | 10 | 1 | Yes | ||
| C | 181‐b | 1 | 1 | NA | ||
| C | 185‐b | 20 | 1 | Yes | ||
| C | 185‐c | 11 | 1 | Yes | ||
| C | 189‐b | 35 | 1 | Yes | ||
| D | 285‐a | 8 | 3 | 1 | Yes | |
| D | 297‐b | 8 | 2 | Yes | ||
| D | 297‐c | 7 | 1 | Yes | ||
| D | 300‐a | 14 | 1 | 1 | Yes | |
| D | 301‐b | 7 | 3 | Yes | ||
| D | 305‐b | 11 | 1 | Yes | ||
| 1 | 246‐a | 10 | 5 | Yes | ||
| 1 | 247‐b | 4 | 2 | Yes | ||
| 1 | 250‐a | 2 | 5 | Yes | ||
| 1 | 254‐a | 27 | 1 | Yes | ||
| 1 | 258‐a | 6 | 5 | Yes | ||
| 1 | 258‐b | 3 | 3 | Yes | ||
| 1 | 266‐b | 2 | 4 | Yes | ||
| 2 | 310‐b | 1 | 1 | NA | ||
| 2 | 326‐b | 6 | 1 | NA | ||
| 3 | 226‐b | 1 | 2 | NA | ||
| 3 | 230‐b | 1 | 3 | NA | ||
| 3 | 234‐b | 25 | 2 | Yes | ||
| 3 | 234‐c | 10 | 3 | 2 | Yes | |
| 3 | 234‐d | 4 | 2 | Yes | ||
| 3 | 238‐b | 3 | 2 | NA | ||
| 3 | 246‐b | 7 | 2 | Yes | ||
| 4 | 294‐a | 38 | 3 | 2 | Yes |
ᵃ Cryptic alleles were compared to the most abundant allele of the same or similar length. ᵇ Alleles found in only one chimpanzee were confirmed by repeat amplification and sequencing.
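Footnote a describes comparing each cryptic allele with the most abundant allele of the same or similar length, then counting substitutions and indels. A minimal Python sketch of that comparison uses a unit-cost (Levenshtein) global alignment with traceback. This scoring scheme is an assumption; the table does not say which aligner was used. Unit costs cannot always tell two substitutions apart from a compensating insertion plus deletion, so ties here are broken toward substitutions. An affine-gap aligner would resolve such cases differently.

```python
def count_differences(ref, alt):
    """Count substitutions and indels in a minimal unit-cost alignment.

    Builds a Levenshtein dynamic-programming table, then traces back one
    optimal path, preferring diagonal (match/substitution) moves on ties.
    """
    rows, cols = len(ref) + 1, len(alt) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == alt[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)

    subs = indels = 0
    i, j = len(ref), len(alt)
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != alt[j - 1]
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + mismatch:
            subs += int(mismatch)  # diagonal move: match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            indels += 1  # base present in ref only (deletion in alt)
            i -= 1
        else:
            indels += 1  # base present in alt only (insertion in alt)
            j -= 1
    return subs, indels
```

For example, `count_differences("GATAGATA", "GATCGATA")` reports one substitution and no indels.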
Figure 2. MiSeq genotyping uncovers increased allelic diversity and heterozygosity. (a) Alignment of four locus 3 alleles that are of identical length (234 bp) but differ in sequence content. Nucleotide substitutions are colored; dashes indicate single‐nucleotide insertions and deletions. (b) Mendelian inheritance of allele 234 in a group of related chimpanzees. Fathers and mothers are shown as squares and circles, respectively, with offspring connected by vertical lines. Both alleles are shown for each animal, with the four allelic variants highlighted in different colors. Individuals of unknown identity or genotype are left blank. (c) Increased allelic diversity resolves a previously ambiguous paternity determination. Two potential fathers with alleles of identical length (238 bp) can now be distinguished based on differences in allele sequence content (238‐a and 238‐b). Because the offspring is homozygous for allele 238‐a, the male carrying allele 238‐b can be excluded as the father.
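The exclusion logic in panel (c) can be sketched as a simple Mendelian compatibility check. At each locus, the offspring must have received one allele from the mother and the other from the candidate father. The genotypes below are hypothetical and were chosen only to mirror the 238‐a/238‐b scenario. The helper names `compatible_father` and `to_lengths` are illustrative, and the labels use an ASCII hyphen.

```python
def compatible_father(offspring, mother, candidate):
    """Return True if candidate could be the father at every shared locus.

    Each genotype maps locus -> (allele1, allele2). Loci missing from any
    genotype are skipped; mother=None means the mother is unknown.
    """
    for locus, (a, b) in offspring.items():
        if locus not in candidate or (mother is not None and locus not in mother):
            continue
        paternal = set(candidate[locus])
        if mother is None:
            ok = a in paternal or b in paternal
        else:
            maternal = set(mother[locus])
            # One child allele must be maternal and the other paternal.
            ok = (a in maternal and b in paternal) or (b in maternal and a in paternal)
        if not ok:
            return False
    return True


def to_lengths(genotype):
    """Drop the sequence-variant suffix, e.g. '238-a' -> '238' (length only)."""
    return {locus: tuple(x.split("-")[0] for x in pair) for locus, pair in genotype.items()}


# Hypothetical genotypes at locus 3.
child = {"3": ("238-a", "238-a")}
mom = {"3": ("238-a", "246-b")}
male1 = {"3": ("238-a", "234-b")}
male2 = {"3": ("238-b", "234-b")}
print(compatible_father(child, mom, male2))  # excluded using sequence variants
print(compatible_father(to_lengths(child), to_lengths(mom), to_lengths(male2)))
```

When only lengths are compared, both males remain compatible. The sequence-level labels are what allow one male to be excluded.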
Figure 3. Individual identification based on MiSeq genotyping. (a–c) Genotypes of newly collected samples (top) are compared with the genotypes of known community members, with the closest matches listed below (based on descending distance scores). Genotypes that differ by fewer than four alleles are shown in bold because they represent likely matches. Differences are highlighted in yellow. (d) Heatmap showing the relative similarity of sample genotypes (rows) to the genotypes of known individuals (columns) based on distance scores. Dark red cells indicate likely matches.
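A minimal Python sketch of this matching step follows. The caption does not define the distance score, so this sketch assumes it is the number of mismatched alleles summed over loci typed in both genotypes. Known individuals are ranked by that distance, and those differing by fewer than four alleles are flagged as likely matches, as in the caption. The function names and example genotypes are hypothetical.

```python
from collections import Counter


def allele_mismatches(g1, g2):
    """Count alleles in g1 not matched in g2, over loci typed in both.

    Genotypes map locus -> (allele1, allele2); homozygotes repeat the allele.
    """
    total = 0
    for locus in g1.keys() & g2.keys():
        shared = Counter(g1[locus]) & Counter(g2[locus])  # multiset intersection
        total += 2 - sum(shared.values())
    return total


def rank_matches(sample, known, threshold=4):
    """Rank known individuals by distance; flag those below threshold."""
    scored = sorted((allele_mismatches(sample, g), name) for name, g in known.items())
    return [(name, dist, dist < threshold) for dist, name in scored]


sample = {"A": ("157-a", "161-a"), "B": ("204-a", "208-a")}
known = {
    "ind1": {"A": ("157-a", "161-a"), "B": ("204-a", "212-a")},
    "ind2": {"A": ("153-a", "165-a"), "B": ("216-a", "220-a")},
}
print(rank_matches(sample, known))
```

A mismatch count tolerates occasional allelic dropout, the most common error in the comparison table. For that reason a near match, rather than only an exact match, is treated as a likely identification.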
MiSeq genotyping of singleplex and multiplex amplified STR loci
| | Singleplex PCR | % | One‐step multiplex PCR | % | Two‐step multiplex PCR | % |
|---|---|---|---|---|---|---|
| Allele detection | 130 | 68 | 130 | 68 | 147 | 77 |
| Incorrect allele | 1 | 0.5 | 1 | 0.5 | 4 | 2.1 |
| PCR stutter | 1 | 0.5 | 1 | 0.5 | 4 | 2.1 |
| DNA input | 24 μl | | 6 μl | | 6 μl | |
Percentages are relative to a total of 192 alleles analyzed for 12 GME chimpanzees (12 individuals × 8 loci × 2 alleles).
Figure 4. Comparison of MiSeq genotypes across chimpanzee communities. Alignment images of locus‐specific allele sequences are shown for chimpanzees from the Greater Mahale Ecosystem (GME) and Gombe. Two representative loci (locus B on the left; locus D on the right) are shown for (a) 12 chimpanzees from the GME (Supporting Information Table S3), (b) 123 chimpanzees from Gombe (Supporting Information Table S2), and (c) a combination of both. Allele sequences are ordered by length (indicated in base pairs on the right), with the frequency at which each was found in different chimpanzees indicated on the left (the x‐axis indicates the position within the alignment). Nucleotides are colored as indicated, with alignment gaps shown in gray. Arrows indicate alleles that are unique to the GME samples.
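The per-allele frequencies on the left of each alignment, and the arrows marking GME-only alleles, amount to two simple operations: counting carriers and taking a set difference of sequences. The Python sketch below assumes each community's genotypes at one locus are stored as a mapping from individual to allele pair. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def carriers(genotypes):
    """Count how many individuals carry each allele sequence at one locus.

    genotypes: dict individual -> (allele1, allele2); a homozygote counts once.
    """
    counts = Counter()
    for pair in genotypes.values():
        counts.update(set(pair))
    return counts


def unique_to(focal, reference):
    """Allele sequences carried in `focal` but by no individual in `reference`."""
    return sorted(carriers(focal).keys() - carriers(reference).keys())


gme = {"g1": ("ATAG" * 5, "ATAG" * 6), "g2": ("ATAG" * 6, "ATAG" * 6)}
gombe = {"x1": ("ATAG" * 6, "ATAG" * 7)}
print(unique_to(gme, gombe))
```

Because the comparison uses full sequences rather than lengths, an allele that matches a Gombe allele in size but differs in sequence would still be reported as unique to the GME.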
Figure 5. MiSeq‐based short tandem repeat (STR) genotyping of wild chimpanzees. (a) Schematic representation of singleplex STR amplification and MiSeq sequencing of chimpanzee fecal DNA. (b) Schematic representation of the CHIIMP analysis pipeline with decision tree and downstream data reports.