Claudius F Kratochwil, Andreas F Kautt, Sina J Rometsch, Axel Meyer.
Abstract
High-throughput DNA sequencing technologies now make it possible to sequence entire genomes relatively easily. Complete genomic information obtained by whole-genome resequencing (WGS) can aid in identifying and delineating species even if they are extremely young, cryptic, or morphologically difficult to discern and closely related. Yet, for taxonomic or conservation biology purposes, WGS can remain cost-prohibitive and too time-consuming, and often constitutes a "data overkill." Rapid and reliable identification of species (and populations) that is also cost-effective is made possible by species-specific markers that can be discovered by WGS. Based on WGS data, we designed a PCR restriction fragment length polymorphism (PCR-RFLP) assay for 19 Neotropical Midas cichlid populations (Amphilophus cf. citrinellus), which includes all 13 described species of this species complex. Our work illustrates that identification of species and populations (i.e., fish from different lakes) can be greatly improved by designing genetic markers using available "high resolution" genomic information. Yet, our work also shows that even in the best-case scenario, when whole-genome resequencing information is available, unequivocal assignments remain challenging when species or populations diverged very recently, or gene flow persists. In summary, we provide a comprehensive workflow on how to design RFLP markers based on genome resequencing data and how to test and evaluate their reliability, and we discuss the benefits and pitfalls of our approach.
Keywords: Cichlidae; GB‐RFLP; Midas cichlids; genetic markers; species identification; targeted genome sequencing
Year: 2022 PMID: 35356554 PMCID: PMC8941502 DOI: 10.1002/ece3.8751
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
FIGURE 1. The Midas cichlid species complex includes fish inhabiting nine lakes (the two great lakes and seven crater lakes) and comprises 13 described species. In this study, we designed genome‐based PCR‐RFLP (GB‐RFLP) markers for population (lake‐specific genetic markers) and species assignment (species‐specific genetic markers).
FIGURE 2. Workflow of the whole‐genome resequencing‐based design of PCR‐RFLP markers (GB‐RFLP). (a) Markers were designed based on genetic differentiation (F_ST) or genome‐wide genotype‐phenotype association (GWA) data of a 453‐genome dataset using pairwise ingroup–outgroup comparisons. (b) Variants were screened for RFLPs with one allele (but not the other) being cut by a restriction enzyme. (c) Target variants were filtered based on the presence of additional restriction sites and additional SNPs within the restriction site. Quality control was performed by plotting allele frequencies and haplotype networks. (d) Likelihoods that a genotype corresponds to a population (or not) were calculated based on population‐specific allele frequencies. Percentage of correct assignments together with false negative and false positive rates were based on bootstrapping of genotypes (informed by empirical allele frequencies in the genomic dataset). (e) Primers were designed based on 801‐bp sequences (core SNP ± 400 bp) in a way that the restriction enzyme would generate two fragments with a ~1:2 length ratio. (f) PCR conditions were tested and optimized. (g) Restriction digest was performed on 8 ingroup samples and 5–8 outgroup samples. Genotypes were determined by visual inspection. These data were used to calculate the number of correct assignments as well as rates of false positives and negatives (as for the bootstrapping dataset in d).
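Steps (b), (c), and (e) of this workflow can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`find_sites`, `screen_snp`), the small enzyme dictionary, and the 60‐bp toy amplicon are assumptions made for illustration. The sketch checks whether a SNP creates or destroys exactly one restriction site, rejects amplicons with an additional site shared by both alleles, and reports the two fragment lengths so that the ~1:2 ratio can be checked. Only palindromic recognition sites are included, so scanning one strand is sufficient.

```python
import re

# Recognition sequence and cut offset (bases from the start of the site,
# top strand) for a few palindromic enzymes named in the marker table.
ENZYMES = {
    "PvuII": ("CAGCTG", 3),
    "NcoI": ("CCATGG", 1),
    "PstI": ("CTGCAG", 5),
    "HaeIII": ("GGCC", 2),
}


def find_sites(seq, site):
    """Return 0-based start positions of all (also overlapping) site matches."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def screen_snp(amplicon, snp_index, alt, enzyme):
    """Check whether exactly one allele of a SNP is cut by `enzyme`.

    Returns None if the marker is unsuitable (no allele-specific site,
    or an extra site present in both alleles); otherwise a dict with the
    allele that is cut, the two fragment lengths, and their length ratio.
    """
    site, cut = ENZYMES[enzyme]
    ref_seq = amplicon.upper()
    alt_seq = ref_seq[:snp_index] + alt.upper() + ref_seq[snp_index + 1:]
    ref_hits = find_sites(ref_seq, site)
    alt_hits = find_sites(alt_seq, site)

    shared = set(ref_hits) & set(alt_hits)      # extra, uninformative cuts
    specific = set(ref_hits) ^ set(alt_hits)    # sites present in one allele
    if shared or len(specific) != 1:
        return None

    start = specific.pop()
    cut_pos = start + cut
    fragments = sorted((cut_pos, len(ref_seq) - cut_pos))
    return {
        "cut_allele": "ref" if start in ref_hits else "alt",
        "fragments": fragments,
        "ratio": fragments[1] / fragments[0],
    }


# Toy amplicon (hypothetical): a C>T SNP at index 17 destroys a PvuII site.
toy = "A" * 17 + "CAGCTG" + "A" * 37
print(screen_snp(toy, 17, "T", "PvuII"))
# {'cut_allele': 'ref', 'fragments': [20, 40], 'ratio': 2.0}
```

In practice the same check would be run on the 801‐bp window around each candidate SNP, and primer positions would then be chosen so that the resulting fragments are easy to separate on a gel.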
FIGURE 3. Comparison of the percentage of correctly assigned samples (a, d), false negatives (b, e), and false positives (c, f) using lake‐specific GB‐RFLPs (a–c) and species‐specific GB‐RFLPs (d–f). In each plot, estimates based on bootstrapping with allele frequencies from a previously generated genomic dataset are shown on the left, and data from the GB‐RFLP analyses are shown on the right. Color code in d–f indicates the populations of Crater Lake Apoyo (red), Xiloá (blue), and the great lake species A. citrinellus and A. labiatus (gray).
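The bootstrapped estimates compared in this figure can be approximated with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' code: genotype probabilities are assumed to follow Hardy–Weinberg proportions, a sample is assigned to the ingroup when its genotype is more likely under the ingroup allele frequency than under the outgroup frequency, and the "correct" rate assumes equally many ingroup and outgroup samples. The names `genotype_probs`, `assign`, and `simulate_rates` are hypothetical.

```python
import random


def genotype_probs(p):
    """Hardy-Weinberg probabilities for 0, 1, or 2 copies of the focal allele."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def assign(copies, p_in, p_out):
    """True if the genotype is more likely under the ingroup frequency."""
    return genotype_probs(p_in)[copies] > genotype_probs(p_out)[copies]


def simulate_rates(p_in, p_out, n=10_000, seed=1):
    """Resample genotypes from allele frequencies and tally assignment errors."""
    rng = random.Random(seed)

    def draw(p):
        # Two independent allele draws give 0, 1, or 2 focal-allele copies.
        return (rng.random() < p) + (rng.random() < p)

    # Ingroup samples that are NOT assigned to the ingroup.
    false_neg = sum(not assign(draw(p_in), p_in, p_out) for _ in range(n)) / n
    # Outgroup samples that ARE assigned to the ingroup.
    false_pos = sum(assign(draw(p_out), p_in, p_out) for _ in range(n)) / n
    return {
        "false_negative": false_neg,
        "false_positive": false_pos,
        # Assumes equal numbers of ingroup and outgroup samples.
        "correct": 1 - (false_neg + false_pos) / 2,
    }


# Hypothetical frequencies: focal allele common in the ingroup, rare outside.
print(simulate_rates(p_in=0.95, p_out=0.10))
```

With a fixed allele difference (`p_in=1.0`, `p_out=0.0`) both error rates are zero; as the two frequencies approach each other, heterozygotes become ambiguous and the error rates rise, which matches the difficulty the abstract describes for recently diverged populations.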
Tested RFLP markers, their location in the reference genome, the restriction enzyme used, marker quality, and the percentage of correctly assigned individuals
| Test population | Coordinates | Enzyme | Marker quality | Ingroup genotypes (% correctly assigned) | Outgroup genotypes (% correctly assigned) |
|---|---|---|---|---|---|
| Lake-specific markers | | | | | |
| Lake Managua | Chr3:33,260,056 | HaeII | | 402/402 (99.8%); 550/402 (96.3%) | 550/550 (73.8%) |
| Lake Managua | Chr21:22,155,441 | BclI | − | 400/400 (98.9%); 529/400 (88.2%) | 529/529 (82.1%) |
| Lake Nicaragua | Chr11:21,205,346 | BsrI | + | 395/395 (98.9%); 544/395 (86.7%) | 544/544 (85.5%) |
| Lake Nicaragua | Chr17:561,899 | BsaAI | − | 400/400 (99.2%); 551/400 (89.2%) | 551/551 (88.9%) |
| Lake Apoyeque | Chr13:8,711,066 | PleI | ++ | 607/607 (100%); 607/367 (51.3%) | 367/367 (100%) |
| Lake Apoyeque | Chr11:1,877,014 | PvuII | + | 585/585 (99.7%); 585/363 (59.6%) | 363/363 (99.9%) |
| Lake Apoyo | Chr18:5,677,584 | HpaII | ++++ | 523/523 (100%); 523/362 (59.4%) | 362/362 (100%) |
| Lake Apoyo | Chr11:1,173,519 | ApoI | ++++ | 402/402 (100%); 533/402 (86.5%) | 533/533 (100%) |
| Lake As. Managua | Chr7:6,889,897 | HaeII | +++ | 480/480 (99.9%); 480/361 (84.6%) | 361/361 (99.7%) |
| Lake As. Managua | Chr23:32,083,942 | NcoI | + | 196/196 (100%); 557/196 (100%) | 557/557 (98.5%) |
| Lake As. León | Chr18:14,951,950 | BsaI | ++++ | 530/530 (100%) | 405/405 (100%) |
| Lake As. León | Chr8:21,669,333 | BceAI | ++++ | 500/500 (100%) | 500/188 (100%); 188/188 (100%) |
| Lake Masaya | Chr20:9,539,893 | BccI | ++ | 400/400 (100%); 553/400 (99.5%) | 553/553 (84.7%) |
| Lake Masaya | Chr11:20,264,687 | HaeIII | + | 492/492 (100%); 492/363 (100%) | 363/363 (82.4%) |
| Lake Tiscapa | Chr18:17,567,949 | HphI | ++++ | 411/411 (100%) | 528/528 (100%) |
| Lake Tiscapa | Chr14:11,451,235 | AvaI | ++++ | 524/524 (100%); 524/401 (100%) | 401/401 (99.8%) |
| Lake Xiloá | Chr3:13,299,069 | BsaI | ++++ | 543/543 (100%); 543/408 (97.1%) | 408/408 (99.2%) |
| Lake Xiloá | Chr14:29,887,480 | HgaI | − | 506/506 (99.6%); 506/377 (63.4%) | 377/377 (99.6%) |
| Species-specific markers | | | | | |
| | | AccI | + | 602/602 (98.5%) | 602/351 (71%); 351/351 (99.8%) |
| | | NspI | + | 348/348 (99.5%); 480/348 (57.4%) | 480/480 (99.1%) |
| | | BsmI | − | 370/370 (100%); 515/370 (96.8%) | 515/515 (94.5%) |
| | | AlwNI | − | 536/536 (99.4%); 536/389 (67%) | 389/389 (99.2%) |
| | | PstI | − | 397/397 (99.9%) | 520/397 (100%); 520/520 (100%) |
| | | AciI | ++++ | 112/112 (99.5%) | 488/112 (100%); 488/488 (100%) |
| | | ApoI | − | 538/538 (95.2%) | 538/185 (100%); 185/185 (100%) |
| | | HinfI | − | 524/524 (98.3%); 524/397 (74.7%) | 397/397 (91.5%) |
| | | BsaBI | − | 536/536 (100%); 536/197 (100%) | 197/197 (100%) |
| | | HgaI | − | 382/382 (100%); 517/382 (100%) | 517/517 (99.4%) |
| | | RsaI | − | 583/583 (100%); 583/199 (96.4%) | 199/199 (96.1%) |
| | | BbvI | + | 531/531 (99.9%) | 531/413 (100%); 413/413 (100%) |
| | | BsiEI | − | 514/514 (100%); 514/169 (86.8%) | 169/169 (99.5%) |
| | | BstAPI | ++ | 535/535 (99.3%); 535/171 (71.7%) | 171/171 (97.6%) |
| | | XcmI | + | 175/175 (100%); 563/175 (96.2%) | 563/563 (93.4%) |
| | | BspMI | − | 566/566 (99.8%); 566/185 (82.3%) | 185/185 (97.8%) |
| | | Tth111I | − | 512/512 (100%); 512/155 (100%) | 155/155 (98.5%) |
| | | AleI | − | 593/593 (99.8%) | 593/391 (53.4%); 391/391 (99.9%) |

Quality was assessed based on a combination of the predicted and tested number of correctly assigned specimens (++++: >99%, +++: >95%, ++: >90%, +: >80%, −: <80%). Ingroup means “within test population” and outgroup means “not within test population.” For lake markers (upper block), the outgroup contains all samples; for species markers (lower block), it contains only sympatric species within the same respective crater lake.
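The grading scale in the note above can be written as a small lookup. This is only a sketch of the thresholds: the source does not specify how the predicted and tested percentages were combined into one score, so the hypothetical function `quality_grade` takes a single percentage, and it assumes that a value of exactly 80% falls into the lowest class, which the note leaves undefined.

```python
def quality_grade(percent_correct):
    """Map a percentage of correctly assigned specimens to a quality grade.

    Thresholds follow the table note; the handling of exactly 80% is an
    assumption, since the note only defines >80% and <80%.
    """
    thresholds = [(99, "++++"), (95, "+++"), (90, "++"), (80, "+")]
    for cutoff, grade in thresholds:
        if percent_correct > cutoff:
            return grade
    return "\u2212"  # minus sign, as used in the table


for value in (99.8, 96.3, 91.0, 85.5, 73.8):
    print(value, quality_grade(value))
```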