Cindy F Verdu, Erwan Guichoux, Samuel Quevauvillers, Olivier De Thier, Yec'han Laizet, Adline Delcamp, Frédéric Gévaudant, Arnaud Monty, Annabel J Porté, Philippe Lejeune, Ludivine Lassois, Stéphanie Mariette.
Abstract
The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a highly variable proportion of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and the resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: a low threshold decreased the chances of detecting minor alleles, while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs suitable for use in population genetics applications.
Keywords: black locust; depth of coverage; putative paralogy filtering; restriction site‐associated DNA sequencing
Year: 2016 PMID: 28725400 PMCID: PMC5513258 DOI: 10.1002/ece3.2466
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. Photograph of Robinia pseudoacacia taken in Aquitaine (France)
Figure 2. Outline of in silico data analyses
Figure 3. Effect of the depth of coverage on theta with (A) and without (B) paralogous RAD loci and on the error rate with (C) and without (D) paralogous RAD loci. Estimates obtained with the program mlrho are shown in gray, with dotted lines indicating the nine sequenced individuals and the solid line the mean value. Estimates obtained with fast (alnpi program) are shown in black (theta and error rate), red (transition error rate), and blue (transversion error rate)
Results for single nucleotide polymorphism (SNP) detection with reads2snp software, considering RAD loci detected as paralogous (P) or nonparalogous (NP) for two minimal depths of coverage (4× and 18×)
| | 4× P | 4× NP | 18× P | 18× NP |
|---|---|---|---|---|
| Number of RAD loci | 3,451 | 85,341 | 873 | 3,453 |
| Proportion of monomorphic RAD loci (%) | 0 | 48.2 | 0 | 68.9 |
| Proportion of polymorphic RAD loci with one or two SNPs (%) | 16.4 | 31 | 20.4 | 20.3 |
| Proportion of polymorphic RAD loci with more than two SNPs (%) | 83.6 | 20.2 | 79.6 | 10.8 |
| Number of SNPs | 20,990 | 102,378 | 5,483 | 2,763 |
| Mean number of SNPs/RAD locus | 6.1 | 2.5 | 6.3 | 2.6 |
| Proportion of bi‐allelic SNPs (%) | 97.5 | 99.1 | 98.9 | 99.2 |
| Proportion of tri/tetra‐allelic SNPs (%) | 2.5 | 0.9 | 1.1 | 0.8 |
| Number of paralogous RAD loci containing a “pass” SNP | 2,983 | – | 793 | – |
| Proportion of “pass” SNPs/total SNPs (%) | 62.2 | – | 84.1 | – |
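
The counts in this table can be checked against the figures quoted in the abstract. The short Python sketch below is illustrative only: the variable and function names are not from the source, and the numbers are copied from the table. It recomputes the share of RAD loci flagged as putative paralogs at each depth threshold and the mean number of SNPs per paralogous locus.

```python
# Counts copied from the table above: (paralogous, nonparalogous) RAD loci
# for each minimal depth of coverage. Names are illustrative assumptions.
rad_loci = {"4x": (3451, 85341), "18x": (873, 3453)}
snps_in_paralogs = {"4x": 20990, "18x": 5483}


def paralog_share(depth):
    """Percentage of RAD loci flagged as putative paralogs at a given depth."""
    paralogous, nonparalogous = rad_loci[depth]
    return 100.0 * paralogous / (paralogous + nonparalogous)


def mean_snps_per_paralog(depth):
    """Mean number of SNPs per paralogous RAD locus at a given depth."""
    return snps_in_paralogs[depth] / rad_loci[depth][0]


for depth in rad_loci:
    print(depth,
          round(paralog_share(depth), 1),
          round(mean_snps_per_paralog(depth), 1))
```

With these counts, about 3.9% of loci are flagged at 4× and about 20.2% at 18×, which falls within the 1% to more than 20% range given in the abstract; the recomputed means (6.1 and 6.3 SNPs per paralogous locus) match the table row.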
Figure 4. Distribution of minor allele frequencies (MAF) for RAD loci detected in silico with a minimum depth of coverage of 4× (A) and 18× (B), as well as for the 330 SNP loci validated by individual genotyping (C). For (A) and (B), the results obtained using all RAD loci are shown in black and those obtained after the removal of paralogous RAD loci are shown in gray
Figure 5. Distribution of inbreeding coefficients for RAD loci detected in silico at 4× (A) and 18× coverage (B), as well as for the 330 SNP loci validated by individual genotyping (C). For (A) and (B), the results obtained using all RAD loci are shown in black and those obtained after the removal of paralogous RAD loci are shown in gray
Distribution of single nucleotide polymorphisms genotyped on the nine sequenced samples across the eight classes defined according to the number of clusters identified. See Materials and Methods for definitions of classes
| | 4× (%) | 4× and 18× (%) | 18× (%) |
|---|---|---|---|
| Two or three clusters (A, B, and C) | 91.2 | 87.8 | 60.0 |
| Monomorphic (D) | 7.6 | 3.7 | 6.7 |
| One heterozygote cluster (E) | 0.6 | 0.5 | 0.0 |
| Unreadable (F) or nonamplified (G) | 0.6 | 8.0 | 33.3 |
| Total (number of SNPs) | 171 | 188 | 15 |
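
The percentages and column totals in this table can be converted back into SNP counts. The Python sketch below is a rough check rather than the authors' procedure. It assumes that SNPs with two or three clusters (classes A, B, and C) are the ones counted as validated; the names are illustrative and the figures are copied from the table.

```python
# (percent with two or three clusters, total SNPs tested) per column of the
# table above. Treating classes A-C as "validated" is an assumption.
categories = {
    "4x": (91.2, 171),
    "4x and 18x": (87.8, 188),
    "18x": (60.0, 15),
}


def validated_count(percent, total):
    """Convert a rounded percentage back to the nearest whole number of SNPs."""
    return round(percent * total / 100)


validated = {name: validated_count(p, n) for name, (p, n) in categories.items()}
total_tested = sum(n for _, n in categories.values())
total_validated = sum(validated.values())
validation_rate = 100 * total_validated / total_tested

print(validated, total_validated, total_tested, round(validation_rate, 1))
```

Under that assumption the three columns give 156, 165, and 9 SNPs, for 330 out of 374 tested (about 88.2%). This is consistent with the abstract's set of 330 SNPs and its validation rate of more than 80%.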