| Literature DB >> 28723901 |
Luis E Hernandez-Castro (1), Marta Paterno (2,3), Anita G Villacís (4), Björn Andersson (5), Jaime A Costales (4), Michele De Noia (6), Sofía Ocaña-Mayorga (4), Cesar A Yumiseva (4), Mario J Grijalva (4,7), Martin S Llewellyn (1).
Abstract
BACKGROUND: Rhodnius ecuadoriensis is the main triatomine vector of Chagas disease (American trypanosomiasis) in southern Ecuador and northern Peru. Genomic approaches and next-generation sequencing technologies have become powerful tools for investigating population diversity and structure, which are key considerations for vector control. Here we assess whether three different 2b restriction site-associated DNA (2b-RAD) genotyping strategies provide sufficient genomic resolution in R. ecuadoriensis to tease apart microevolutionary processes, and we undertake some pilot population genomic analyses.
Year: 2017 PMID: 28723901 PMCID: PMC5536387 DOI: 10.1371/journal.pntd.0005710
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Fig 1. Map of the study area and the location of sampled communities in Ecuador.
Purple circles indicate the location of Coamine (CE), La Extensa (EX) and Chaquizhca (CQ) in Loja Province, and El Bejuco (BJ) in Manabí.
Table 1. Overview of 2b-RAD genotyping: key considerations and further reading.
| Before the protocol starts, gDNA is preferably extracted only from crushed wings, thorax, legs and head of triatomine bugs (or the vector of interest) to reduce or prevent contamination with gut bacteria or symbiotic fungi. At this stage it is crucial to obtain very pure, high-molecular-weight, RNA-free gDNA at a concentration above 25 ng/μl for optimal restriction enzyme digestion […] | |
| Once gDNA is obtained, the protocol is carried out in 3 consecutive steps: enzyme digestion, adaptor ligation, and amplification […] | |
| After library preparation, 2b-RAD tags are ready to be sequenced on Illumina platforms, which offer a range of read lengths (50–300 bp) and either single-end (forward) or paired-end (forward and reverse) sequencing […] | |
| Before genotyping, raw 2b-RAD reads undergo a quality control check with software such as FastQC, in which a per-base quality score above 28 is expected (see […]) | |
| Assembly of loci, either … | |
| At the end of the run, the data are stored in a MySQL database, which can be accessed via the STACKS export_sql.pl utility and downloaded in a compact format (TSV or XLS). During the download, the user can apply filters such as the number of SNPs per locus, the number of alleles per locus and the percentage of samples sharing a locus. At this stage, additional filtering is recommended to remove loci or samples with a large amount of missing data and to require each polymorphic locus to be shared by a minimum number of samples. | |
| Finally, the identified loci can be exported in a standard format, such as GENEPOP or STRUCTURE, using the STACKS populations program. From this point, the data are ready for conventional population genetic analyses (e.g., AMOVA, pairwise FST, principal component and principal coordinate analyses, and Bayesian clustering) or for more recent approaches such as landscape genetics […] |
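The missing-data and sample-sharing filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the STACKS implementation: it assumes genotypes have already been parsed from the exported table into a dictionary keyed by locus, with `None` marking a missing call, and the function name `filter_genotypes` and its default thresholds (at most 50% missing data per sample, at least 80% sharing per locus) are hypothetical choices.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: "A/G"-style genotype strings, None for a missing call.
Genotype = Optional[str]


def filter_genotypes(
    loci: Dict[str, List[Genotype]],
    sample_ids: List[str],
    max_sample_missing: float = 0.5,  # hypothetical default
    min_sharing: float = 0.8,         # fraction of retained samples that must be genotyped
) -> Tuple[List[str], Dict[str, List[Genotype]]]:
    """Drop samples with too much missing data, then keep only polymorphic loci
    genotyped in at least `min_sharing` of the retained samples."""
    if not loci:
        return list(sample_ids), {}
    n_loci = len(loci)
    # 1) Per-sample missingness across all loci.
    keep = [
        j for j in range(len(sample_ids))
        if sum(g[j] is None for g in loci.values()) / n_loci <= max_sample_missing
    ]
    kept_samples = [sample_ids[j] for j in keep]
    # 2) Per-locus sharing and polymorphism among the retained samples.
    kept_loci: Dict[str, List[Genotype]] = {}
    for name, genos in loci.items():
        sub = [genos[j] for j in keep]
        called = [g for g in sub if g is not None]
        if not sub or len(called) / len(sub) < min_sharing:
            continue
        alleles = {a for g in called for a in g.split("/")}
        if len(alleles) > 1:  # monomorphic loci carry no population signal
            kept_loci[name] = sub
    return kept_samples, kept_loci


samples = ["s1", "s2", "s3", "s4"]
loci = {
    "L1": ["A/A", "A/G", "G/G", None],
    "L2": ["C/C", "C/C", "C/C", None],
    "L3": [None, "T/T", None, None],
    "L4": ["A/T", "A/A", None, None],
}
print(filter_genotypes(loci, samples))
# (['s1', 's2', 's3'], {'L1': ['A/A', 'A/G', 'G/G']})
```

Samples are filtered before loci here, so a badly sequenced individual does not drag otherwise well-shared loci below the sharing threshold; reversing the order is an equally defensible choice.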
Almost a decade after the first RADseq publications [34,35], RADseq markers are increasingly used in ecological, evolutionary and conservation genomic studies of non-model organisms [40] and, more recently, are making their way into epidemiological research. Such a recent yet extensive technique cannot be covered fully in this work. Instead, we provide an overview of the steps and key considerations for setting up a 2b-RAD study, indicating further literature for each step.
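Among the downstream analyses listed in the overview table, pairwise FST can be computed directly from per-population allele counts. The sketch below uses Hudson's estimator as formulated by Bhatia et al. (2013), averaged as a ratio of sums across biallelic SNPs; it is an illustrative swap-in, not necessarily the estimator used in this study, and the allele counts in the usage example are hypothetical.

```python
from typing import Sequence, Tuple

# (alternate-allele count, total alleles sampled) for one SNP in one population
AlleleCounts = Tuple[int, int]


def hudson_fst(pop1: Sequence[AlleleCounts], pop2: Sequence[AlleleCounts]) -> float:
    """Hudson's FST over biallelic SNPs, combined as a ratio of averages
    (sum of per-SNP numerators over sum of per-SNP denominators)."""
    num = den = 0.0
    for (a1, n1), (a2, n2) in zip(pop1, pop2):
        if n1 < 2 or n2 < 2:
            continue  # the small-sample correction needs at least two alleles per population
        p1, p2 = a1 / n1, a2 / n2
        # Numerator: squared frequency difference minus the sampling-variance correction.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Denominator: probability that two alleles, one from each population, differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Two hypothetical populations, three SNPs, 20 sampled alleles each.
pop_a = [(2, 20), (10, 20), (18, 20)]
pop_b = [(15, 20), (11, 20), (4, 20)]
print(round(hudson_fst(pop_a, pop_b), 3))
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps rare variants with tiny denominators from dominating the estimate. Small negative values are expected when populations do not differ.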
Fig 2. Step-by-step 2b-RAD library and genomic data preparation for triatomine population genomic analysis.
(1) gDNA is extracted from the heads, legs and thorax of triatomine bugs. (2) The gDNA is then processed using the 2b-RAD protocol [38] and (3) the libraries are sequenced on Illumina instruments. (4) Once delivered, the data are trimmed and filtered before (5) being used in genotyping software such as STACKS [58]. (6) Genotypes are then exported from the MySQL repository and filtered if a large amount of missing data is present. (7) Finally, the polymorphic loci of interest are exported in conventional file formats for population genomic analysis. See Table 1 for an overview of the technique and for particular recommendations.
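Step (4), the read quality check, amounts to decoding Phred scores and asking whether each read position stays above the per-base threshold of 28 given in the overview table. The sketch below is a rough stand-in for the mean line of FastQC's per-base quality plot (FastQC also reports medians and quartiles). It assumes Phred+33 encoding and strictly four-line, uncompressed FASTQ records; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def fastq_qualities(path: str) -> Iterator[str]:
    """Yield the quality string (4th line) of each record in an uncompressed,
    strictly four-line FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:
                yield line.rstrip("\r\n")


def mean_quality_by_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, assuming Phred+33 encoding."""
    totals: List[int] = []
    counts: List[int] = []
    for q in quals:
        for i, ch in enumerate(q):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def low_quality_positions(means: List[float], threshold: float = 28.0) -> List[int]:
    """1-based read positions whose mean score falls below the threshold."""
    return [i + 1 for i, m in enumerate(means) if m < threshold]


# Under Phred+33, 'I' encodes Q40, '5' encodes Q20 and '#' encodes Q2.
means = mean_quality_by_position(["IIII#", "IIII5"])
print(means, low_quality_positions(means))  # [40.0, 40.0, 40.0, 40.0, 11.0] [5]
```

On real data, `mean_quality_by_position(fastq_qualities("reads.fastq"))` would scan a whole file, where `reads.fastq` is a placeholder path. Positions it flags are natural candidates for trimming before genotyping.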
Fig 3. Example of two samples processed with the 2b-RAD protocol.
1.8% agarose gel electrophoresis showing gDNA (a), digested DNA (b) and PCR product (c) in 2 samples for each of the 3 IIB-REases (AlfI, CspCI and BcgI).
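How many fragments each IIB-REase yields depends on how often its recognition site occurs, which is what the in silico predictions discussed with Fig 4 estimate. The sketch below counts sites for the three enzymes on both strands of a sequence. The recognition sequences (AlfI GCA(N)6TGC, BcgI CGA(N)6TGC, CspCI CAA(N)5GTGG) are assumed from REBASE and should be checked there before use. The helper names and the toy sequence are hypothetical.

```python
import re

# Recognition sequences assumed from REBASE (N = any base); verify before use.
SITES = {
    "AlfI": "GCANNNNNNTGC",
    "BcgI": "CGANNNNNNTGC",
    "CspCI": "CAANNNNNGTGG",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _starts(genome: str, motif: str) -> int:
    # Zero-width lookahead so that overlapping occurrences are all counted;
    # ambiguous genome bases (e.g. N) never match.
    pattern = re.compile("(?=" + motif.replace("N", "[ACGT]") + ")")
    return sum(1 for _ in pattern.finditer(genome))


def count_sites(genome: str, site: str) -> int:
    """Count recognition sites on both strands of `genome`."""
    genome = genome.upper()
    n = _starts(genome, site)
    rc = reverse_complement(site)
    if rc != site:  # non-palindromic: reverse-strand sites appear as the reverse complement
        n += _starts(genome, rc)
    return n


# Hypothetical sequence: one forward-strand and one reverse-strand BcgI site.
toy = "CGA" + "A" * 6 + "TGC" + "GCA" + "A" * 6 + "TCG"
print({enzyme: count_sites(toy, s) for enzyme, s in SITES.items()})
# {'AlfI': 0, 'BcgI': 2, 'CspCI': 0}
```

AlfI's site is its own reverse complement, so it is counted once. BcgI and CspCI need an explicit reverse-strand search. Run over a draft genome assembly, these counts give the relative site abundances that Fig 4 compares.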
Fig 4. Comparison of read depth and marker identification in R. ecuadoriensis using 3 different Type IIB restriction enzymes.
In line with in silico predictions, AlfI, which has abundant in silico restriction sites, produced fewer molecular markers than BcgI and CspCI, which have fewer sites. In the diagram, enzymes with abundant restriction sites (dark gray rectangles) in the genome (dark blue solid line; yellow squares mark SNPs) are more likely to yield fragments (light blue, green and orange rectangles) from different locations in different samples in any given experiment. Because the reads are spread over many more sites, and those sites differ among samples, read depth per locus may be insufficient, compromising polymorphic marker discovery (dark blue rectangles with a yellow square).
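The dilution argument above can be illustrated with a toy simulation. Every parameter here is hypothetical and the model is deliberately simple: each sample recovers each restriction site independently with a fixed probability, and a fixed read budget is spread uniformly over the recovered sites. SNPs are not modelled, so the result counts loci usable for genotyping rather than polymorphic markers.

```python
import random
from collections import Counter


def shared_loci(n_sites: int, n_samples: int = 10, reads_per_sample: int = 20_000,
                p_recover: float = 0.7, min_depth: int = 5,
                min_sharing: float = 0.8, seed: int = 1) -> int:
    """Toy model with hypothetical parameters: each sample recovers a random subset
    of the restriction sites, and its fixed read budget is spread uniformly over
    them. Returns how many sites reach `min_depth` in at least `min_sharing` of samples."""
    rng = random.Random(seed)
    passing = Counter()  # site -> number of samples in which it reached min_depth
    for _ in range(n_samples):
        recovered = [s for s in range(n_sites) if rng.random() < p_recover]
        if not recovered:
            continue
        depth = Counter(rng.choices(recovered, k=reads_per_sample))
        passing.update(s for s, d in depth.items() if d >= min_depth)
    return sum(1 for c in passing.values() if c / n_samples >= min_sharing)


# Same sequencing effort, few versus many restriction sites.
for n_sites in (1_000, 20_000):
    print(n_sites, shared_loci(n_sites))
```

With 1,000 sites each recovered site receives roughly 29 reads on average. With 20,000 sites the average falls below 2, so almost no site clears the depth threshold in enough samples. This mirrors the AlfI pattern qualitatively without claiming to reproduce its numbers.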
Relationship between the number of reads and the number of polymorphic loci obtained from STACKS analysis.
| IIB-REase | Subsampled reads (% of total) | Reads (millions) | Polymorphic loci (1–2 SNPs), shared by ≥90% of samples | Polymorphic loci (1–2 SNPs), shared by ≥80% of samples |
|---|---|---|---|---|
| AlfI | 25 | 0.7 | 28.7 ± 2.5 | 47 ± 3 |
| AlfI | 50 | 1.4 | 51 ± 2 | 75.3 ± 4 |
| AlfI | 75 | 2.2 | 57.3 ± 3.1 | 99 ± 1.7 |
| AlfI | 100 | 2.9 | 68 | 186 |
| BcgI | 25 | 1.5 | 50 ± 5.2 | 78.3 ± 3.1 |
| BcgI | 50 | 2.9 | 100.7 ± 6.4 | 149 ± 6.1 |
| BcgI | 75 | 4.4 | 162 ± 6.2 | 331 ± 10.4 |
| BcgI | 100 | 5.8 | 367 | 899 |
| CspCI | 25 | 1.2 | 46.3 ± 3.8 | 65 ± 6.9 |
| CspCI | 50 | 2.4 | 81.7 ± 2.1 | 154.3 ± 5.9 |
| CspCI | 75 | 3.6 | 341 ± 10.6 | 995 ± 23.4 |
| CspCI | 100 | 4.8 | 1244 | 2289 |
Values for subsampled datasets are means ± standard error.
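The subsampled rows in the table come from randomly drawing a fraction of each enzyme's reads and analysing each draw separately in STACKS. A minimal sketch of such a draw is shown below. It samples without replacement while preserving read order, and it assumes strictly four-line FASTQ records. The function names and the choice of three replicate seeds are hypothetical; the study's own subsampling tool and number of replicates are not stated here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def group_fastq(lines: Sequence[str]) -> List[Tuple[str, ...]]:
    """Group the lines of a strictly four-line FASTQ file into
    (header, sequence, '+', quality) records."""
    usable = len(lines) - len(lines) % 4  # ignore a trailing partial record
    return [tuple(lines[i:i + 4]) for i in range(0, usable, 4)]


def subsample(records: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Draw round(fraction * len(records)) records without replacement,
    preserving their original order."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    k = round(fraction * len(records))
    chosen = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in chosen]


reads = [f"read{i}" for i in range(1000)]  # stand-in for parsed FASTQ records
for fraction in (0.25, 0.5, 0.75):
    # Independent replicate draws use different seeds.
    sizes = [len(subsample(reads, fraction, seed=rep)) for rep in range(3)]
    print(fraction, sizes)  # each replicate keeps 250, 500 or 750 reads
```

Drawing an exact count, rather than keeping each read with probability equal to the fraction, makes the read numbers per replicate identical. Only the identity of the reads varies between seeds.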
Fig 5. Relationship between the number of reads and the number of polymorphic loci obtained with each Type IIB restriction enzyme.
Lines show how the number of polymorphic loci discovered by STACKS increases with the number of reads for the AlfI (magenta squares), BcgI (dark blue points) and CspCI (light blue triangles) IIB-REases. Different read abundances were obtained by randomly subsampling the dataset of each enzyme and analyzing each subsample in STACKS as an independent dataset. (A) Polymorphic loci with up to 2 SNPs shared by at least 90% of samples, with best-fit logarithmic (magenta), geometric (dark blue) and exponential (light blue) growth curves. (B) Polymorphic loci with up to 2 SNPs shared by at least 80% of samples, with best-fit geometric (magenta and dark blue) and power-law (light blue) growth curves.
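To see how steeply marker discovery scales with sequencing effort, a power law y = a·x^b can be fitted to the mean values in the reads-versus-loci table. The sketch below fits both AlfI and CspCI (80% sharing) by ordinary least squares in log-log space. This is an assumed fitting method chosen for simplicity. It is not necessarily how the curves in the figure were obtained, and it fits a power law to AlfI even though a geometric curve was reported for it in panel B, purely so the exponents can be compared.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a * x**b by ordinary least squares on log(y) versus log(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    return math.exp(my - b * mx), b


# Reads (millions) and mean polymorphic loci shared by at least 80% of samples,
# taken from the reads-versus-loci table.
reads_m = {"AlfI": [0.7, 1.4, 2.2, 2.9], "CspCI": [1.2, 2.4, 3.6, 4.8]}
loci_80 = {"AlfI": [47, 75.3, 99, 186], "CspCI": [65, 154.3, 995, 2289]}
for enzyme in reads_m:
    a, b = fit_power_law(reads_m[enzyme], loci_80[enzyme])
    print(f"{enzyme}: loci ~ {a:.1f} * Mreads^{b:.2f}")
```

An exponent well above 1, as CspCI gives, means that each additional million reads still yields proportionally more shared polymorphic loci at these depths. With only four points per enzyme, the fit is descriptive rather than predictive.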
Fig 6. Genetic clusters (K = 2) assigned by STRUCTURE.
The blue columns (1: CE, 2: EX and 3: CQ) indicate samples from Loja, and the purple column (4: BJ) indicates samples from Manabí. (A) BcgI and (B) CspCI datasets.