Abstract
A growing number of genes responsible for reproductive incompatibilities between species (barrier loci) exhibit signals of positive selection. However, the possibility that genes experiencing positive selection diverge early in speciation and commonly cause reproductive incompatibilities has not been systematically investigated on a genome-wide scale. Here, I outline a research program for studying the genetic basis of speciation in broadcast spawning marine invertebrates that uses a priori genome-wide information on a large, unbiased sample of genes tested for positive selection. A targeted sequence capture approach is proposed that scores single-nucleotide polymorphisms (SNPs) in widely separated species populations at an early stage of allopatric divergence. The targeted capture of both coding and non-coding sequences enables SNPs to be characterized at known locations across the genome and at genes with known selective or neutral histories. The neutral coding and non-coding SNPs provide robust background distributions for identifying FST-outliers within genes that can, in principle, identify specific mutations experiencing diversifying selection. If natural hybridization occurs between species, the neutral coding and non-coding SNPs can provide a neutral admixture model for genomic clines analyses aimed at finding genes exhibiting strong blocks to introgression. Strongylocentrotid sea urchins are used as a model system to outline the approach, but it can be used for any group that has a complete reference genome available.
Keywords: Dobzhansky–Muller incompatibilities; FST-outliers; barrier loci; genome scan; introgression; positive selection; sequence capture; speciation.
Year: 2016 PMID: 29491951 PMCID: PMC5804258 DOI: 10.1093/cz/zow093
Source DB: PubMed Journal: Curr Zool ISSN: 1674-5507 Impact factor: 2.624
Complete genomes of marine invertebrate species with available gene models
| Phylum | Species name | Size (Mb) | GC (%) | Number of genes | Reference(s) |
|---|---|---|---|---|---|
| Placozoa | | 105.6 | 32.7 | 11,518 | |
| Porifera | | 166.7 | 37.5 | 13,998 | |
| Mollusca | | 557.7 | 35.3 | 32,261 | |
| Echinodermata | | 814.0 | 36.9 | 21,092 | |
| Cephalochordata | | 521.9 | 41.2 | 28,627 | |
| Tunicata | | 70.5 | 39.8 | 17,212 | |
| Tunicata | | 115.2 | 33.1 | 15,254 | |
| Brachiopoda | | 425.5 | 36.9 | 34,000 | |
| Anthozoa | | 356.6 | 40.6 | 27,173 | |
| Cephalopoda | | 2,282.8 | 36.1 | 32,819 | |
| Ctenophora | | 155.9 | 38.9 | 16,548 | |
| Annelida | | 333.3 | 40.4 | 31,997 | |
| Mollusca | | 359.5 | 33.3 | 23,827 | |
| Mollusca | | 927.3 | 42.0 | 21,312 | |
| Priapulida | | 511.7 | 45.7 | 17,096 | |
| Hemichordata | | 775.8 | 38.1 | 22,073 | |
| Arthropoda | | 1,828.3 | 34.5 | 22,031 | |
a Compiled from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) in December 2015.
Figure 1. Species tree of the family Strongylocentrotidae constructed from 4-fold degenerate sites from 2,815 genes not experiencing positive selection (adapted from Kober and Bernardi 2013a). Trees reconstructed using maximum parsimony (MP), maximum likelihood (ML), and Bayesian approaches produced identical topologies. All nodes had MP bootstrap values of 100 and Bayesian posterior probabilities of 1. Abbreviations denote geographic locations of species distributions (NEP = Northeastern Pacific; NWP = Northwestern Pacific; CIR = circumpolar).
Figure 2. The distinction between historical and recent positive selection. (A) Alignment of a portion of a hypothetical protein-coding sequence with one codon (red) showing historical positive selection and another (green) showing recent positive selection based on PAML (Yang 2007) site tests and branch-site tests, respectively. Historical and recent selection are not mutually exclusive. However, when both operate, it is difficult to obtain a significant branch-site test because the external branch needs to have a significantly elevated dN/dS ratio compared with the remainder of the tree. (B) Gene tree with locations of amino acid changes marked. Historical selection at the red codon cannot generate incompatibilities between S. droebachiensis and S. pallidus, but recent selection at the green codon can (designated by the “X”).
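The significance of a branch-site test is usually judged with a likelihood ratio test (LRT) between a null model and a model that allows positive selection on the branch of interest. The sketch below illustrates only that final comparison step. It assumes a chi-square reference distribution with one degree of freedom, which is a common convention rather than something specified here, and the function name and log-likelihood values are hypothetical, not output from an actual PAML run.

```python
import math
from typing import Tuple


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return (LRT statistic, p-value) for two nested models.

    Assumption: the statistic is compared with a chi-square distribution
    with 1 degree of freedom. For df = 1 the upper-tail probability is
    erfc(sqrt(x / 2)), so no external statistics library is needed.
    """
    # Twice the log-likelihood difference; clamp at zero in case
    # optimization noise makes the alternative fit slightly worse.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene (illustrative values only).
    stat, p = branch_site_lrt(lnl_null=-2345.67, lnl_alt=-2341.02)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
```

When historical selection has already raised dN/dS across the tree, the null model fits the foreground branch nearly as well as the alternative, so the statistic stays small and the test is rarely significant, consistent with the difficulty noted in the caption.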
Figure 3. Identification of FST-outliers by the targeted sequence capture of non-coding and coding SNPs. (A) Two gene models on a scaffold are targeted for solution-based exon capture. Two non-coding regions >10 kb from either gene are also captured (but could be much further away). (B) Hypothetical distributions of pairwise FST-values for genome-wide coding and non-coding SNPs are presented. The arrow designates the 5% false discovery rate cut-off estimated directly from the non-coding SNPs. Any genes or gene regions with FST-values beyond this threshold are candidates for recent diversifying selection. Null distributions for FST-outlier tests could also be generated from the subset of protein-coding genes that do not exhibit histories of positive selection.
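The logic of panel B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposed analysis pipeline: FST is computed per biallelic SNP from allele frequencies in two equally weighted populations using the heterozygosity-based definition, and the cut-off is taken as the empirical upper 5% tail of the non-coding FST values, which is a simple stand-in for a formal false discovery rate procedure. All function names, gene labels, and allele frequencies are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def wright_fst(p1: float, p2: float) -> float:
    """FST for one biallelic SNP from allele frequencies in two populations.

    Assumption: equal population weights, FST = (H_T - H_S) / H_T.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def empirical_cutoff(neutral_fst: Sequence[float], tail: float = 0.05) -> float:
    """Value at the upper `tail` quantile of the neutral FST distribution."""
    ordered = sorted(neutral_fst)
    index = min(len(ordered) - 1, int((1.0 - tail) * len(ordered)))
    return ordered[index]


def flag_outliers(
    coding: Dict[str, Tuple[float, float]],
    noncoding: Sequence[Tuple[float, float]],
    tail: float = 0.05,
) -> Dict[str, float]:
    """Return coding SNPs whose FST exceeds the non-coding cut-off."""
    cutoff = empirical_cutoff([wright_fst(a, b) for a, b in noncoding], tail)
    scores = {name: wright_fst(a, b) for name, (a, b) in coding.items()}
    return {name: fst for name, fst in scores.items() if fst > cutoff}


if __name__ == "__main__":
    # Hypothetical allele frequencies (population 1, population 2).
    noncoding = [(0.50, 0.55), (0.30, 0.35), (0.20, 0.40), (0.60, 0.50),
                 (0.45, 0.45), (0.10, 0.15), (0.70, 0.80), (0.25, 0.30),
                 (0.35, 0.55), (0.50, 0.50)]
    coding = {"geneA_snp1": (0.05, 0.95), "geneB_snp1": (0.50, 0.50)}
    print(flag_outliers(coding, noncoding))
```

The same functions could take their null distribution from SNPs in protein-coding genes without histories of positive selection by passing those frequencies as the `noncoding` argument.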