Jessica Hernandez-Rodriguez, Mimi Arandjelovic, Jack Lester, Cesare de Filippo, Antje Weihmann, Matthias Meyer, Samuel Angedakin, Ferran Casals, Arcadi Navarro, Linda Vigilant, Hjalmar S Kühl, Kevin Langergraber, Christophe Boesch, David Hughes, Tomas Marques-Bonet.
Abstract
Target-capture approaches have improved in recent years and have proven to be a very efficient tool for selectively sequencing genetic regions of interest. These methods have also allowed noninvasive samples such as faeces (characterized by low quantity and quality of endogenous DNA) to be used in conservation genomics, evolution and population genetic studies. Here we test different protocols and strategies for exome capture using the Roche SeqCap EZ Developer kit (57.5 Mb). First, we captured a complex pool of DNA libraries. Second, we assessed how using more than one faecal sample, extract and/or library from the same individual affects the molecular complexity of the experiment. We validated our experiments with 18 chimpanzee faecal samples collected from two field sites as part of the Pan African Programme: The Cultured Chimpanzee. The two field sites are Kibale National Park, Uganda (N = 9) and Loango National Park, Gabon (N = 9). We demonstrate that at least 16 libraries can be pooled, target enriched through hybridization, and sequenced, allowing the genotyping of 951,949 exome markers for population genetic analyses. Further, we observe that molecule richness, and thus data acquisition, increases when using multiple libraries from the same extract or multiple extracts from the same sample. Finally, repeated captures significantly decrease the proportion of off-target reads, from 34.15% after one capture round to 7.83% after two capture rounds, supporting our conclusion that two rounds of target enrichment are advisable when using complex faecal samples.
Keywords: conservation genetics; exome; next-generation sequencing; noninvasive samples; target enrichment
Year: 2017 PMID: 29058768 PMCID: PMC5900898 DOI: 10.1111/1755-0998.12728
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
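The effect of a second capture round reported in the abstract can be restated as simple arithmetic. The sketch below uses only the two off-target percentages given there (34.15% and 7.83%). Treating the on-target share as the complement of the off-target share is an assumption made here for illustration, and the variable and function names are hypothetical.

```python
# Off-target read percentages as reported in the abstract.
off_target_one_round = 34.15   # % off-target reads after one capture round
off_target_two_rounds = 7.83   # % off-target reads after two capture rounds


def on_target_pct(off_target_pct):
    """Complement of the off-target share (assumption: reads are either on- or off-target)."""
    return 100.0 - off_target_pct


# Absolute drop in percentage points between one and two capture rounds.
absolute_drop = off_target_one_round - off_target_two_rounds

# Drop relative to the one-round off-target share.
relative_drop = absolute_drop / off_target_one_round

# How many times smaller the off-target share is after two rounds.
fold_reduction = off_target_one_round / off_target_two_rounds

print(f"on-target: {on_target_pct(off_target_one_round):.2f}% -> "
      f"{on_target_pct(off_target_two_rounds):.2f}%")
print(f"absolute drop: {absolute_drop:.2f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
print(f"fold reduction in off-target share: {fold_reduction:.2f}x")
```

Under that assumption, the second round removes roughly three quarters of the off-target reads that remain after the first round.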
Figure 1. Experimental scheme. Experiment 1: Sixteen libraries were pooled and captured in triplicate. Experiment 2: Two faecal samples from a single chimpanzee, each extracted twice, resulting in four extracts. Two libraries were created from each of the extracts, resulting in eight libraries. Libraries were combined into two pools (A, B) such that each extract is present once in each pool. Each pool then underwent a single round and two double rounds of capture.
Figure 2. Sequencing data summary. (a) Averages across all experiments (N = 72) and summaries for each experimental library in (b) Experiment 1 and (d) Experiment 2. (c) Correlation between the percentage of endogenous DNA of each of the 16 libraries from Experiment 1 and the number of raw reads sequenced.
Figure 3. Variance explained, estimated as the eta‐squared statistic from ANOVA sums of squares for univariate and multivariate nested modelling in (a) Experiment 1 and (b) Experiment 2.
Figure 4. Boxplots for (a) enrichment factor (EF), (b) capture sensitivity (CS), (c) capture specificity (CSp) and (d) library complexity (LC), grouped by hybridization reaction. The hybridization reaction ids and descriptions are identified in Figure 1.
Figure 5. (a) Library richness, restricted to samples from Experiment 2: library richness curves are compared for each library, extraction and faeces from the same individual. The number of reads subsampled (in millions) is plotted against the count of reliable reads mapped to our target space on the y‐axis. The inset shows the same plot for the total of 48 million reads subsampled (6 million reads × 8 libraries). (b) Estimations of the relative fold enrichment gained if one sampled a total of N million reads not from the plotted sample, but N/2 million reads from itself and N/2 million reads from its replicate pair. Replicate pairs are as follows: (1) Lib17&18, Lib19&20, Lib21&22, Lib23&24, Extract1&Extract2, Extract3&Extract4, Faeces1&Faeces2.
Figure 6. (a) Genotype discordance dendrogram from 63 libraries (after removing libraries 1 and 9 because of their low number of reads and library 16 because of potential contamination). (b) Principal component analysis (PCA) of study samples (squares) along with 59 other chimpanzee individuals from the Great Ape Genome Project (circles).
Figure 7. (a) Dot plot illustrating the relationship between mean coverage and the percentage of the target space covered at depth 4. Data from all experiments were used (N = 72). (b) Correlation between the number of raw reads obtained and the mean coverage in all experiments.
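Figure 3 reports variance explained as the eta-squared statistic derived from ANOVA sums of squares. As a minimal sketch of what that statistic measures, the example below computes eta-squared for a single grouping factor as the between-group sum of squares divided by the total sum of squares. This is a one-way simplification, not the univariate and multivariate nested models used in the study, and the group labels and values are invented toy data.

```python
def eta_squared(groups):
    """One-way eta-squared: SS_between / SS_total.

    `groups` maps a group label to a list of numeric observations.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    # Total sum of squares: spread of every observation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    # Between-group sum of squares: spread of group means around the grand mean,
    # weighted by group size.
    ss_between = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2

    return ss_between / ss_total


# Hypothetical toy data: a capture metric measured in two hybridization reactions.
toy = {
    "hyb_A": [1.0, 2.0, 3.0],
    "hyb_B": [4.0, 5.0, 6.0],
}
eta2 = eta_squared(toy)
print(f"eta-squared = {eta2:.3f}")
```

A value near 1 means the grouping factor accounts for most of the variance in the metric, and a value near 0 means it accounts for almost none.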