Philippe Gayral, José Melo-Ferreira, Sylvain Glémin, Nicolas Bierne, Miguel Carneiro, Benoit Nabholz, Joao M Lourenco, Paulo C Alves, Marion Ballenghien, Nicolas Faivre, Khalid Belkhir, Vincent Cahais, Etienne Loire, Aurélien Bernard, Nicolas Galtier.
Abstract
In animals, the population genomic literature is dominated by two taxa, namely mammals and drosophilids, in which fully sequenced, well-annotated genomes have been available for years. Data from other metazoan phyla are scarce, probably because the vast majority of living species still lack a closely related reference genome. Here we achieve de novo, reference-free population genomic analysis from wild samples in five non-model animal species, based on next-generation sequencing transcriptome data. We introduce a pipeline for cDNA assembly, read mapping, SNP/genotype calling, and data cleaning, with a specific focus on the detection of hidden paralogy. In two species for which a reference genome is available, similar results were obtained whether or not the reference was used, demonstrating the robustness of our de novo inferences. The population genomic profiles of a hare, a turtle, an oyster, a tunicate, and a termite were found to be intermediate between those of human and Drosophila, indicating that the discordant genomic diversity patterns reported between these two species do not reflect a generalized vertebrate versus invertebrate gap. Average genomic diversity was generally higher in invertebrates than in vertebrates (with the notable exception of the termite), in agreement with the notion that population size tends to be larger in the former than in the latter. The ratio of non-synonymous to synonymous diversity (πN/πS), however, did not differ significantly between vertebrates and invertebrates, even though it was negatively correlated with genetic diversity within each of the two groups. This study opens promising perspectives for genome-wide population analyses of non-model organisms and for understanding the influence of population size on non-synonymous versus synonymous diversity.
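As an illustration of the diversity statistics discussed in the abstract, the sketch below computes nucleotide diversity (π) as the mean pairwise difference between sampled chromosomes, summed over biallelic SNPs and divided by the number of sites, and then the ratio πN/πS. It assumes the numbers of non-synonymous and synonymous sites are already known (estimating them is not shown). The function names and toy counts are hypothetical; this is not the authors' pipeline.

```python
from typing import Sequence, Tuple


def site_diversity(k: int, n: int) -> float:
    """Mean pairwise difference at one biallelic SNP.

    k: count of one allele among n sampled chromosomes (2 per diploid individual).
    Returns the fraction of the n*(n-1)/2 chromosome pairs that differ.
    """
    return k * (n - k) / (n * (n - 1) / 2)


def nucleotide_diversity(snp_counts: Sequence[int], n: int, n_sites: float) -> float:
    """pi: per-SNP diversity summed over SNPs, divided by the number of sites
    (monomorphic sites contribute zero but are counted in n_sites)."""
    return sum(site_diversity(k, n) for k in snp_counts) / n_sites


def pin_pis(ns_counts: Sequence[int], s_counts: Sequence[int], n: int,
            ns_sites: float, s_sites: float) -> Tuple[float, float, float]:
    """Return (piN, piS, piN/piS); site numbers are assumed to be precomputed."""
    pi_n = nucleotide_diversity(ns_counts, n, ns_sites)
    pi_s = nucleotide_diversity(s_counts, n, s_sites)
    return pi_n, pi_s, pi_n / pi_s


if __name__ == "__main__":
    # Hypothetical toy data: 10 diploid individuals, so n = 20 chromosomes.
    pi_n, pi_s, ratio = pin_pis([1], [1, 10, 5], n=20, ns_sites=300, s_sites=100)
    print(f"piN={pi_n:.5f} piS={pi_s:.5f} piN/piS={ratio:.3f}")
```

Using the pairwise-difference form k(n − k)/C(n, 2) keeps the per-SNP contribution unbiased for small samples, which matters with only 10 to 20 chromosomes per species.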
Year: 2013 PMID: 23593039 PMCID: PMC3623758 DOI: 10.1371/journal.pgen.1003457
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Illumina data sets used in this study.
| Focal species | Outgroup | #Individuals (focal+outgroup) | Megareads (all individuals) | Megabases (per individual) |
|---|---|---|---|---|
|  |  | 10+10 | 139 | 677 |
|  |  | 10+2 | 63 | 471 |
|  |  | 10+1 | 66 | 544 |
|  |  | 10+2 | 94 | 710 |
|  |  | 9+2 | 250 | 1 069 |
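The columns of the Illumina table above are linked by simple arithmetic. The sketch below derives, for each row, the total sequencing effort and the implied mean number of bases per read, assuming that "megabases per individual" is an average over all focal and outgroup individuals. This assumption and the function name are ours, not the paper's; for the first four rows the result is roughly 90 to 97 bases per read and for the last row about 47, a difference the table alone does not explain.

```python
from typing import List, Tuple

# (focal individuals, outgroup individuals, megareads over all individuals,
#  megabases per individual), copied from the rows of the table above.
ROWS: List[Tuple[int, int, int, int]] = [
    (10, 10, 139, 677),
    (10, 2, 63, 471),
    (10, 1, 66, 544),
    (10, 2, 94, 710),
    (9, 2, 250, 1069),
]


def sequencing_effort(focal: int, outgroup: int, megareads_total: float,
                      megabases_per_ind: float) -> Tuple[float, float]:
    """Return (total megabases, implied mean bases per read).

    Assumes 'megabases per individual' is an average over all focal and
    outgroup individuals, so the total is that average times the sample size.
    Megabases / megareads gives bases per read because both share the 1e6 factor.
    """
    total_mb = (focal + outgroup) * megabases_per_ind
    return total_mb, total_mb / megareads_total


if __name__ == "__main__":
    for row in ROWS:
        total_mb, bp_per_read = sequencing_effort(*row)
        print(f"{row[0]}+{row[1]}: {total_mb:.0f} Mb in total, ~{bp_per_read:.0f} bp per read")
```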
Figure 1. Main data analysis pipeline used in this study.
Robustness of population genomic statistics to SNP calling options.
| SNP calling option | #contigs | av. length | #SNPs | πS (%) | πN (%) | πN/πS | FIS |
|---|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |  |
| A. Main | 3 081 | 225 | 15 826 | 1.54±0.04 | 0.17±0.01 | 0.11±0.01 | −0.04 |
| B. High coverage | 902 | 219 | 3 578 | 1.60 | 0.12 | 0.07 | −0.02 |
| C. Reference | 2 030 | 237 | 10 314 | 1.47 | 0.14 | 0.10 | −0.03 |
| D. No paralog filter | 3 056 | 225 | 16 989 | 1.58 | 0.18 | 0.11 | −0.06 |
| E. Samtools | 2 030 | 348 | 14 515 | 1.17 | 0.14 | 0.12 | −0.02 |
|  |  |  |  |  |  |  |  |
| A. Main | 2 624 | 276 | 7 261 | 0.41±0.03 | 0.06±0.01 | 0.15±0.02 | −0.04 |
| B. High coverage | 790 | 264 | 1 611 | 0.43 | 0.05 | 0.12 | −0.05 |
| C. Reference | 1 266 | 282 | 3 063 | 0.39 | 0.04 | 0.10 | −0.04 |
| D. No paralog filter | 2 980 | 273 | 11 591 | 0.48 | 0.10 | 0.20 | −0.14 |
| E. Samtools | 1 260 | 513 | 7 297 | 0.37 | 0.10 | 0.27 | −0.03 |
|  |  |  |  |  |  |  |  |
| A. Main | 2 538 | 219 | 6 835 | 0.57 | 0.10 | 0.18 | −0.05 |
| E. Samtools | 2 752 | 207 | 6 147 | 0.38 | 0.09 | 0.24 | −0.04 |
|  |  |  |  |  |  |  |  |
| A. Main | 8 086 | 366 | 8 697 | 0.12 | 0.02 | 0.19 | 0.12 |
| E. Samtools | 6 432 | 437 | 5 524 | 0.08 | 0.02 | 0.20 | 0.13 |
|  |  |  |  |  |  |  |  |
| A. Main | 2 013 | 243 | 4 634 | 0.45 | 0.07 | 0.16 | 0.17 |
| E. Samtools | 2 147 | 225 | 4 365 | 0.37 | 0.13 | 0.34 | 0.15 |
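In the two blocks of the table above that include row D, dropping the paralog filter makes FIS more negative (−0.04 to −0.06, and −0.04 to −0.14), which is the pattern expected if collapsed paralogs produce spurious, apparently heterozygous SNPs. The sketch below shows a crude heterozygote-excess heuristic built on per-SNP FIS = 1 − Hobs/Hexp. It is a swapped-in illustration, not the paralogy detection method used in the paper's pipeline; the function names and the −0.5 threshold are hypothetical.

```python
import math
from typing import Sequence


def fis(genotypes: Sequence[int]) -> float:
    """Per-SNP F_IS = 1 - H_obs / H_exp.

    genotypes: alternative-allele count (0, 1 or 2) for each diploid individual.
    Returns NaN for a monomorphic site, where H_exp = 0.
    """
    n_ind = len(genotypes)
    p = sum(genotypes) / (2 * n_ind)
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        return math.nan
    h_obs = sum(1 for g in genotypes if g == 1) / n_ind
    return 1 - h_obs / h_exp


def flag_putative_paralog(genotypes: Sequence[int], threshold: float = -0.5) -> bool:
    """Hypothetical filter: flag SNPs with a strong heterozygote excess.

    Two collapsed paralogs fixed for different bases make every individual
    look heterozygous (F_IS = -1). The threshold is illustrative only.
    """
    f = fis(genotypes)
    return not math.isnan(f) and f < threshold


if __name__ == "__main__":
    all_het = [1] * 10                          # paralog-like pattern
    hwe_like = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]   # close to Hardy-Weinberg
    for g in (all_het, hwe_like):
        print(round(fis(g), 3), flag_putative_paralog(g))
```

A per-SNP threshold like this is easy to apply but has little power with 10 individuals; a model-based test that uses read counts per genotype would be more sensitive.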
Figure 2. Synonymous and non-synonymous site-frequency spectra in the hare Lepus granatensis.
Each histogram displays the distribution of minor allele frequency across SNPs (folded site-frequency spectrum) for a sample size of 12 chromosomes. The left-most histogram is the expected spectrum for neutral sites in a Wright-Fisher population. The other four histograms were drawn from the data, calling SNPs with either Samtools or reads2snps and separating non-synonymous (NS) from synonymous (S) positions. The number above each histogram is Tajima's D, which equals zero for the expected neutral Wright-Fisher spectrum.
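The left-most histogram and the Tajima's D values can be made concrete with a short sketch. Under the standard neutral model for a constant-size Wright-Fisher population, the expected number of SNPs whose derived allele is carried by i of n chromosomes is proportional to 1/i; folding merges classes i and n − i. Tajima's D compares the diversity-based estimate θπ with Watterson's θW = S/a1, scaled by Tajima's (1989) variance constants, so it is zero for the expected spectrum and negative when rare variants are in excess. This is an illustrative reimplementation of textbook formulas, not the code used to draw the figure; the function names are hypothetical.

```python
import math
from typing import Dict


def expected_folded_sfs(n: int) -> Dict[int, float]:
    """Expected proportion of SNPs in each minor-allele count class 1..n//2
    under the standard neutral model (unfolded expectation proportional to 1/i)."""
    xi = {i: 1.0 / i for i in range(1, n)}
    folded = {}
    for i in range(1, n // 2 + 1):
        # The middle class (i == n/2 for even n) is not merged with anything.
        folded[i] = xi[i] if 2 * i == n else xi[i] + xi[n - i]
    total = sum(folded.values())
    return {i: v / total for i, v in folded.items()}


def tajimas_d(folded_counts: Dict[int, float], n: int) -> float:
    """Tajima's D from {minor-allele count i: number of SNPs in that class}."""
    s = sum(folded_counts.values())
    pairs = n * (n - 1) / 2
    theta_pi = sum(c * i * (n - i) / pairs for i, c in folded_counts.items())
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (theta_pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    n = 12  # chromosomes, as in the figure
    expected = expected_folded_sfs(n)
    print({i: round(p, 3) for i, p in expected.items()})
    counts = {i: 100 * p for i, p in expected.items()}   # 100 SNPs, expected shape
    print("D (expected spectrum):", round(tajimas_d(counts, n), 6))
    print("D (all singletons):", round(tajimas_d({1: 10}, n), 3))
```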
Figure 3. Sampling variance of πN and πS in the turtle Emys orbicularis and the tunicate Ciona intestinalis A.
X-axis: sub-sample size (number of individuals); Y-axis: box plots of estimated synonymous (top) and non-synonymous (bottom) diversity in turtle (green) and ciona (blue).
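Figure 3 summarizes how diversity estimates spread as fewer individuals are sampled. A minimal way to generate that kind of plot is sketched below, assuming sub-samples are drawn without replacement from the genotyped individuals while the set of sites stays fixed. The genotype layout, function names, and toy data are hypothetical, and the caption does not describe the authors' exact resampling scheme.

```python
import random
import statistics
from typing import List, Sequence


def pi_from_genotypes(genotypes: Sequence[Sequence[int]], n_sites: float) -> float:
    """genotypes[s][j]: alternative-allele count (0/1/2) of individual j at SNP s.

    Returns the mean pairwise difference summed over SNPs, divided by n_sites.
    """
    total = 0.0
    for row in genotypes:
        n = 2 * len(row)          # chromosomes sampled at this SNP
        k = sum(row)              # copies of the alternative allele
        total += k * (n - k) / (n * (n - 1) / 2)
    return total / n_sites


def subsample_pi(genotypes: Sequence[Sequence[int]], n_sites: float,
                 k: int, replicates: int, seed: int = 0) -> List[float]:
    """Re-estimate pi on `replicates` random sub-samples of k individuals
    (drawn without replacement); k must be at least 1."""
    rng = random.Random(seed)
    n_ind = len(genotypes[0])
    estimates = []
    for _ in range(replicates):
        chosen = rng.sample(range(n_ind), k)
        sub = [[row[j] for j in chosen] for row in genotypes]
        estimates.append(pi_from_genotypes(sub, n_sites))
    return estimates


if __name__ == "__main__":
    # Hypothetical toy data: 3 SNPs x 10 individuals, 50 sites in total.
    geno = [
        [0, 1, 2, 1, 0, 0, 1, 2, 0, 1],
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
    for k in (2, 4, 6, 8, 10):
        est = subsample_pi(geno, 50, k, replicates=200)
        print(k, round(statistics.median(est), 4), round(statistics.pstdev(est), 4))
```

Drawing one box plot per value of k from these lists gives a figure of the same kind; when k equals the full sample, every replicate returns the same value, so the spread shrinks to zero.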
Coding sequence polymorphism and divergence patterns in five non-model animals.
| species | #contigs | #SNPs | πS (%) | πN (%) | πN/πS | dN/dS | α | α0.2 | αEWK | ωA |
|---|---|---|---|---|---|---|---|---|---|---|
| turtle | 1 041 | 2 532 | 0.43±0.03 | 0.05±0.007 | 0.12±0.02 | 0.17±0.03 | 0.01±0.18 | 0.43±0.15 | 0.92 | 0.17 |
| hare | 524 | 2 054 | 0.38±0.04 | 0.05±0.008 | 0.12±0.02 | 0.15±0.03 | −0.11±0.22 | 0.30±0.23 | <0 | <0 |
| ciona | 2 004 | 11 727 | 1.58±0.06 | 0.15±0.011 | 0.10±0.01 | 0.10±0.01 | −0.28±0.10 | 0.10±0.11 | 0.34 | 0.04 |
| termite | 4 761 | 5 478 | 0.12±0.01 | 0.02±0.002 | 0.18±0.02 | 0.26±0.02 | 0.08±0.10 | 0.28±0.11 | 0.74 | 0.20 |
| oyster | 994 | 3 015 | 0.59±0.05 | 0.09±0.011 | 0.15±0.02 | 0.21±0.02 | 0.13±0.12 | 0.22±0.13 | 0.79 | 0.21 |
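The α columns in the table above estimate the proportion of amino-acid substitutions driven by positive selection. This section does not define the estimators, so the sketch below assumes the standard McDonald-Kreitman form α = 1 − (Ds·Pn)/(Dn·Ps), built from counts of non-synonymous (N) and synonymous (S) polymorphisms (P) and fixed differences (D); that α0.2 applies the same formula after discarding SNPs with minor-allele frequency below 0.2; and that ωA = α × dN/dS. αEWK, whose subscript suggests a distribution-of-fitness-effects method in the style of Eyre-Walker and Keightley, is not reproduced. Because the paper may compute α from diversity rather than SNP counts, or combine genes differently, plugging the table's values into this sketch should not be expected to reproduce them. Function names and counts are hypothetical.

```python
from typing import Sequence


def mk_alpha(pn: float, ps: float, dn: float, ds: float) -> float:
    """McDonald-Kreitman-based proportion of adaptive substitutions:
    alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    return 1.0 - (ds * pn) / (dn * ps)


def alpha_with_maf_cutoff(ns_maf: Sequence[float], s_maf: Sequence[float],
                          dn: float, ds: float, min_maf: float = 0.0) -> float:
    """Same estimator, counting only SNPs whose minor-allele frequency is at
    least min_maf (min_maf=0.2 mimics an 'alpha_0.2'-style correction)."""
    pn = sum(1 for f in ns_maf if f >= min_maf)
    ps = sum(1 for f in s_maf if f >= min_maf)
    return mk_alpha(pn, ps, dn, ds)


def omega_a(alpha: float, dn_ds: float) -> float:
    """Rate of adaptive non-synonymous substitution relative to synonymous."""
    return alpha * dn_ds


if __name__ == "__main__":
    # Hypothetical counts: an excess of rare non-synonymous SNPs (as expected
    # from slightly deleterious variants) inflates Pn and pulls alpha down;
    # the frequency cutoff removes most of them.
    ns = [0.05] * 20 + [0.3] * 10
    s = [0.05] * 40 + [0.3] * 60
    a_all = alpha_with_maf_cutoff(ns, s, dn=40, ds=100)
    a_02 = alpha_with_maf_cutoff(ns, s, dn=40, ds=100, min_maf=0.2)
    print(round(a_all, 3), round(a_02, 3), round(omega_a(a_02, 0.2), 3))
```

The toy example shows why α and α0.2 can differ so much in the table: the unfiltered estimate is sensitive to low-frequency non-synonymous polymorphism, which the cutoff discards.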
Figure 4. Published estimates of genome-wide πS, πN and πN/πS in animals.
a. πN as a function of πS; b. πN/πS as a function of πS. Blue: vertebrates; red: invertebrates; filled circles: species analysed in this study, designated by their initials (H: hare; Tu: turtle; O: oyster; Te: termite; C: ciona); dashed blue circles: non-primate mammals (from left to right: mouse, tupaia, rabbit). Estimates were taken from Bustamante et al. 2005 (human), Hvilsom et al. 2012 (chimpanzee), Carneiro et al. 2012 (rabbit), Perry et al. 2012 (other mammals), Begun et al. 2007 (D. simulans) and Tsagkogeorga et al. 2012 (C. intestinalis B; right-most circle).