Julie Jeukens, Louis Bernatchez.
Abstract
While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment identified candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate in more detail the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and to test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European than in North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species.
Keywords: Adaptive divergence; Coregonus; gene expression; natural selection; regulatory evolution; speciation
Year: 2012 PMID: 22408741 PMCID: PMC3297193 DOI: 10.1002/ece3.52
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. Normal and dwarf lake whitefish (Coregonus clupeaformis). Normal whitefish (top) commonly exceed 40 cm in length and 1000 g in weight, while dwarf whitefish (bottom) rarely exceed 20 cm and 100 g.
Table 1. MDH1 regulatory and coding polymorphism within coregonine populations
| Continent | Coregonine population | Regulatory SNP positions | Coding SNP positions | Paralogous SNP positions | Nonsynonymous substitutions |
|---|---|---|---|---|---|
| North America | Cliff Lake, Normal | 373 | 61, 130, 471, 570, 609, 696 | 130, 471, 570, 609 | 130 |
| | Cliff Lake, Dwarf | 373 | 61, 130, 471, 570, 609, 696 | 130, 471, 570, 609 | 130 |
| | Indian Pond, Normal | 373 | 61, 130, 471, 570, 609, 696 | 130, 471, 570, 609 | 130 |
| | Indian Pond, Dwarf | 373 | 61, 130, 471, 570, 609, 696 | 130, 471, 570, 609 | 130 |
| | Cisco | 373, 478, 520 | 130, 471, 489 | 130, 471, 489 | 130 |
| Europe | Pasvik River, Benthic | 188, 236, 373, 374, 408, 478, 572, 573, 702 | 61, 130, 471, 570, 609, 634 | 130, 471, 570, 609 | 130, 634 |
| | Pasvik River, Limnetic | 188, 236, 373, 374, 408, 478, 572, 573, 702 | 61, 130, 471, 570, 609, 634 | 130, 471, 570, 609 | 130, 634 |
| | Lake Zurich, Benthic | 236, 373, 374, 408, 491, 572, 573, 702 | 61, 130, 213, 471, 525, 570, 609 | 130, 471, 570, 609 | 130 |
| | Lake Zurich, Limnetic | 163, 236, 373, 374, 408, 491, 572, 573, 702 | 61, 130, 213, 471, 525, 570, 609 | 130, 471, 570, 609 | 130 |
| | Vendace | 149, 236, 244, 373, 374, 408, 458, 580, 702 | 39, 103, 108, 112, 130, 150, 213, 378, 429, 471, 570, 582, 602, 609 | 112, 130, 471, 570, 609 | 108, 130, 602 |
Cliff Lake and Indian Pond: lake whitefish, Pasvik River catchment and Lake Zurich: European whitefish.
Position in the 781-bp regulatory sequence, which corresponds to positions 17,590–18,370 in Genbank accession HQ287747.
Position in the 807-bp coding sequence, numbered from the start codon; the sequence corresponds to positions 18,317–24,113 (introns excluded) in Genbank accession HQ287747.
Position in the coding sequence of SNPs for which essentially all fish were heterozygous. These SNPs are likely to be sequence differences between paralogous sequence variants.
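The distinction between coding SNPs and paralogous SNPs can be made concrete with a small set difference. The sketch below is only an illustration: the positions are transcribed from the table above, while the dictionary layout, the grouping of the four lake whitefish populations under one key, and the function name `true_snps` are assumptions introduced here, not part of the study.

```python
# Coding and paralogous SNP positions transcribed from the table above.
# The grouping and names below are illustrative assumptions.
CODING_SNPS = {
    "lake whitefish": {61, 130, 471, 570, 609, 696},  # Cliff Lake and Indian Pond
    "cisco": {130, 471, 489},
    "vendace": {39, 103, 108, 112, 130, 150, 213, 378,
                429, 471, 570, 582, 602, 609},
}
PARALOGOUS_SNPS = {
    "lake whitefish": {130, 471, 570, 609},
    "cisco": {130, 471, 489},
    "vendace": {112, 130, 471, 570, 609},
}


def true_snps(population):
    """Return coding SNP positions that are not flagged as paralogous."""
    return sorted(CODING_SNPS[population] - PARALOGOUS_SNPS[population])


for name in CODING_SNPS:
    print(name, true_snps(name))
```

Under this reading, lake whitefish retain two coding positions (61 and 696) once paralogous positions are set aside, and cisco retain none.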
Table 2. Summary of whitefish MDH1 regulatory sequence annotation
| Region | Position | Annotation |
|---|---|---|
| Untranscribed | 1–658 | SNP A/T (373) |
| | | Recombination breakpoint (458) |
| | | CpG island (429–658) |
| | | 11 putative TFBSs overlapping A allele (373), see |
| | | Five putative TFBSs overlapping T allele (373), see |
| | | Majority of conserved regions among species (450–658) |
| Untranslated | 659–727 | Terminal oligopyrimidine tract (TOP) (659–666) |
| | | Musashi binding element (MBE) (666–672) |
The 5′ limit of the untranslated region was determined according to salmon MDH1 complete coding sequence (Genbank accession BT060423) and the 5′ extremity of contig 1009 from RNA sequencing (Jeukens et al. 2010).
Position in the 781-bp regulatory sequence, which corresponds to positions 17,590–18,370 in Genbank accession HQ287747.
Recombination breakpoint: GARD, Datamonkey (Kosakovsky Pond and Frost 2005a), CpG island: The Sequence Manipulation Suite (Stothard 2000), Putative transcription factor binding sites (TFBSs): JASPAR CORE Vertebrata (Bryne et al. 2008), Posterior probability in binding simulation (P): Sunflower (Hoffman and Birney 2010), Conserved regions: ConSite (Sandelin et al. 2004) and the MEME suite (Bailey et al. 2009), untranslated region: UTRsite (Mignone et al. 2005). Positions in the 781-bp regulatory sequence are indicated for each element.
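The coordinate systems used in these notes can be related by simple arithmetic. The following sketch assumes that regulatory position 1 maps to position 17,590 of Genbank accession HQ287747 and that positions 1–658 lie upstream of the transcription start site, both as stated above. The function names and the convention of numbering the first transcribed base +1 with no position zero are assumptions made here; that convention is chosen because it reproduces the "SNP –286" label used for regulatory position 373 in the Figure 5 caption.

```python
REGULATORY_START = 17_590    # HQ287747 coordinate of regulatory position 1
REGULATORY_LENGTH = 781      # length of the regulatory sequence in bp
UNTRANSCRIBED_LENGTH = 658   # positions 1-658 precede the transcription start site


def to_genbank(position):
    """Map a 1-based regulatory position to its HQ287747 coordinate."""
    if not 1 <= position <= REGULATORY_LENGTH:
        raise ValueError("position outside the 781-bp regulatory sequence")
    return REGULATORY_START + position - 1


def relative_to_tss(position):
    """Express a regulatory position relative to the transcription start site.

    Assumed convention: position 659 is +1 and there is no position 0,
    so position 658 is -1.
    """
    offset = position - UNTRANSCRIBED_LENGTH
    return offset - 1 if offset <= 0 else offset


print(to_genbank(373), relative_to_tss(373))
```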
Table 3. Polymorphic positions of 22 unique MDH1 regulatory haplotypes identified among coregonine species
| Continent | Haplotype | 149 | 152 | 163 | 188 | 197 | 236 | 373 | 374 | 381 | 408 | 458 | 460 | 478 | 491 | 520 | 572 | 573 | 580 | 643 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| North America | Cocl 1 | T | A | G | G | C | A | A | A | G | G | A | A | C | C | C | C | C | A | G |
| | Cocl 2 | . | . | . | . | . | . | T | . | . | . | . | . | . | . | . | . | . | . | . |
| | Coar 1 | . | C | . | . | A | . | T | . | T | . | . | C | . | . | T | . | . | . | A |
| | Coar 2 | . | C | . | . | A | . | T | . | T | . | . | C | . | . | . | . | . | . | A |
| | Coar 3 | . | C | . | . | A | . | . | . | T | . | . | C | T | . | . | . | . | . | A |
| Europe | Cola 1 | . | C | . | . | A | C | . | . | . | T | . | C | . | . | . | . | . | . | A |
| | Cola 2 | . | C | . | . | A | C | T | . | . | T | . | C | . | . | . | . | . | . | A |
| | Cola 3 | . | C | . | . | A | C | T | . | . | T | . | C | T | . | . | . | . | . | A |
| | Cola 4 | . | C | . | . | A | . | T | T | . | T | . | C | . | . | . | . | . | . | A |
| | Cola 5 | . | C | . | . | A | . | T | T | . | . | . | C | . | . | . | T | G | . | A |
| | Cola 6 | . | C | . | A | A | . | T | T | . | . | . | C | . | . | . | . | . | . | A |
| | Cola 7 | . | C | . | . | A | C | . | . | . | T | . | C | . | . | . | T | G | . | A |
| | Cola 8 | . | C | . | . | A | C | . | . | . | T | . | C | . | . | . | . | . | . | A |
| | Cola 9 | . | C | . | . | A | . | . | . | . | . | . | C | . | T | . | T | G | . | A |
| | Cola 10 | . | C | . | . | A | . | . | . | . | . | . | C | . | . | . | T | G | . | A |
| | Cola 11 | . | C | . | . | A | . | T | T | . | . | . | C | . | . | . | . | . | . | A |
| | Cola 12 | . | C | T | . | A | . | T | T | . | . | . | C | . | . | . | . | . | . | A |
| | Cola 13 | . | C | . | . | A | . | T | . | . | . | . | C | . | . | . | T | G | . | A |
| | Coal 1 | . | C | . | . | A | C | T | . | . | T | . | C | . | . | . | . | . | G | A |
| | Coal 2 | . | C | . | . | A | C | T | . | . | T | . | C | . | . | . | . | . | . | A |
| | Coal 3 | . | C | . | . | A | . | T | . | . | . | . | C | . | . | . | . | . | G | A |
| | Coal 4 | C | C | . | . | A | . | . | . | . | . | C | C | . | . | . | . | . | G | A |
Cocl = lake whitefish (C. clupeaformis), Coar = lake cisco (C. artedi), Cola = European whitefish (C. lavaretus), Coal = vendace (C. albula).
Position in the 658-bp upstream of the transcription start site (positions 17,590–18,247 in Genbank accession HQ287747).
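The haplotype table is easier to work with once the dots are expanded into explicit alleles. The sketch below assumes the usual convention that a dot means "same allele as the first row" (Cocl 1); the source does not state this explicitly. The function names are hypothetical, and only the rows for Cocl 1, Cocl 2, and Coar 1 are transcribed.

```python
# Polymorphic positions and the Cocl 1 row, transcribed from the table above.
POSITIONS = [149, 152, 163, 188, 197, 236, 373, 374, 381, 408,
             458, 460, 478, 491, 520, 572, 573, 580, 643]
REFERENCE = "TAGGCAAAGGAACCCCCAG"  # Cocl 1

# Dotted rows; a dot is ASSUMED to mean "identical to Cocl 1 at this position".
DOTTED = {
    "Cocl 2": "......T............",
    "Coar 1": ".C..A.T.T..C..T...A",
}


def expand(dotted):
    """Replace each dot with the reference allele at the same column."""
    if len(dotted) != len(REFERENCE):
        raise ValueError("row length does not match the number of positions")
    return "".join(ref if allele == "." else allele
                   for ref, allele in zip(REFERENCE, dotted))


def differences(dotted):
    """Return {position: allele} for every column that differs from Cocl 1."""
    return {pos: allele for pos, allele in zip(POSITIONS, dotted) if allele != "."}


print(expand(DOTTED["Cocl 2"]))
print(differences(DOTTED["Coar 1"]))
```

Read this way, the two lake whitefish haplotypes differ only at position 373 (A in Cocl 1, T in Cocl 2).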
Figure 2. Putative binding profile of the MDH1 regulatory region among coregonine species. Binding probability: highest posterior probability of being bound to a given transcription factor for each position of the 658 bp upstream of the transcription start site, according to binding simulation using Sunflower (Hoffman and Birney 2010); a value of zero was assigned to a position when the posterior probability of the unbound state was highest. (A) Lake whitefish (C. clupeaformis), two haplotypes; (B) lake cisco (C. artedi), three haplotypes; (C) European whitefish (C. lavaretus), 13 haplotypes; (D) vendace (C. albula), four haplotypes. *Putative binding sites that are unbound (probability = 0) for one or more haplotypes depending on alleles at positions 373 and 374 (Table 3), labeled with putative transcription factor name.
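The rule used to build the binding profile in this caption can be sketched in a few lines. This is not Sunflower's interface: the input layout (one dictionary of state posteriors per position), the state label `"unbound"`, the factor names, and the numbers are all hypothetical and serve only to illustrate the stated rule.

```python
def binding_profile(posteriors):
    """Collapse per-position state posteriors into one value per position.

    Each element of `posteriors` maps a state name to its posterior
    probability at that position. If the most probable state is "unbound",
    the profile value is 0; otherwise it is the probability of the most
    probable transcription factor.
    """
    profile = []
    for state_probs in posteriors:
        best_state = max(state_probs, key=state_probs.get)
        profile.append(0.0 if best_state == "unbound" else state_probs[best_state])
    return profile


# Hypothetical posteriors for three adjacent positions (made-up values).
example = [
    {"unbound": 0.7, "TF_A": 0.2, "TF_B": 0.1},
    {"unbound": 0.3, "TF_A": 0.6, "TF_B": 0.1},
    {"unbound": 0.2, "TF_A": 0.3, "TF_B": 0.5},
]
print(binding_profile(example))
```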
Figure 3. Genetic differentiation of MDH1 5′ regulatory and coding regions between North American and European whitefish species pairs. Based on 781 bp of regulatory sequence (positions 17,590–18,370 in Genbank accession HQ287747) and 807 bp of coding sequence (positions 18,317–24,113 without introns in Genbank accession HQ287747). FST values based on overall mean pairwise genetic p-distances computed with HYPHY (Kosakovsky Pond et al. 2005). Negative FST estimates were forced to zero. For the 5′ regulatory region, both alleles for each fish were used. For the coding region, mean FST for true single nucleotide polymorphisms (SNPs) (excluding paralogous SNPs, see Table 1) was computed by using one copy of each observed haplotype per fish. ‡ Bootstrapped estimator significantly different from zero (P < 0.05, 500 replicates) and probability of a random FST greater than the observed value < 0.05 (500 permutations). * Probability of a random FST greater than the observed value < 0.05 (500 permutations).
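To illustrate how an FST estimate can be derived from mean pairwise p-distances, the sketch below uses one common distance-based form, FST = (between − within) / between, with negative values forced to zero as in the caption. The exact estimator implemented in HYPHY is not given here, so this formula, the averaging of the two within-population means, the function names, and the toy sequences are all assumptions.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fst(pop1, pop2):
    """Distance-based FST between two populations, floored at zero."""
    if len(pop1) < 2 or len(pop2) < 2:
        raise ValueError("each population needs at least two sequences")
    # Average of the two within-population mean pairwise distances.
    within = mean(
        mean(p_distance(a, b) for a, b in combinations(pop, 2))
        for pop in (pop1, pop2)
    )
    # Mean distance over all between-population pairs.
    between = mean(p_distance(a, b) for a, b in product(pop1, pop2))
    if between == 0:
        return 0.0
    return max(0.0, (between - within) / between)


# Toy alignments (made up): fixed difference vs. shared polymorphism.
print(fst(["AAAA", "AAAA"], ["AAAT", "AAAT"]))
print(fst(["AAAA", "AAAT"], ["AAAA", "AAAT"]))
```

In the second toy case the raw estimate is negative because the populations share the same polymorphism, so the value is reported as zero.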
Figure 4. Predicted ternary structure of whitefish MDH1. Based on whitefish MDH1 protein sequence (Genbank accession ADV02378) of 78% homology with porcine cytoplasmic malate dehydrogenase in the Protein Data Bank (ID = 5MDH) using the 3D-JIGSAW server (v.2.0, http://bmm.cancerresearchuk.org/~3djigsaw/). The graphical representation was created with Pymol v.1.3. (A) Green = amino acid residues of the malate binding domain, (B) orange = amino acid residues of the NAD binding domain, (C) blue = amino acid residues of the dimer interface, (D) amino acid changes due to nonsynonymous substitutions among coregonine populations (see Table 1).
Figure 5. MDH1 expression level as a function of the genotype at SNP –286 in dwarf and normal lake whitefish. Expression level: normalized R/Lowess signal intensity in log2 from a previous microarray experiment (St-Cyr et al. 2008). Data for 16 fish from Cliff Lake and 15 from Indian Pond, half normal (N), half dwarf (D). Frequency of the T allele: Cliff N = 0.2, D = 0.67; Indian N = 0.1, D = 0.3. One-way ANOVA among the five groups: P-value = 0.007. *Tukey multiple comparisons of means: Dwarf AT/Dwarf AA: P-value = 0.05; Dwarf AT/Normal AA: P-value = 0.007.
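An allele frequency such as those quoted in this caption is the count of that allele divided by twice the number of genotyped diploid fish. The sketch below shows the calculation; the function name and the genotype list are hypothetical and do not reproduce the study's data.

```python
def allele_frequency(genotypes, allele):
    """Frequency of `allele` among diploid genotypes written like "AT"."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    total_alleles = 2 * len(genotypes)
    return sum(genotype.count(allele) for genotype in genotypes) / total_alleles


# Hypothetical genotypes at the A/T SNP for five fish (not the study's data).
sample = ["AA", "AA", "AT", "AA", "AA"]
print(allele_frequency(sample, "T"))
```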