Brendan Epstein, Michael J Sadowsky, Peter Tiffin.
Abstract
Structural variation, including variation in gene copy number and presence or absence of genes, is a widespread and important source of genomic variation. We used whole-genome DNA sequences from 48 strains of Sinorhizobium (recently renamed Ensifer), including 20 strains of Sinorhizobium meliloti and 12 strains of S. medicae that were the focus of the analyses, to study the fitness effects of new structural variants created by duplication and horizontal gene transfer. We find that derived duplicated and horizontally transferred (HT) genes segregate at lower frequency than synonymous and nonsynonymous nucleotide variants in S. meliloti and S. medicae. Furthermore, the relative frequencies of different types of variants are more similar in S. medicae than in S. meliloti, the species with the larger effective population size. These results are consistent with the hypothesis that most duplications and HT genes have deleterious effects. Diversity of duplications, as measured by segregating duplicated genes per gene, is greater than nucleotide diversity, consistent with a high rate of duplication. Our results suggest that the vast majority of structural variants found among closely related bacterial strains are short-lived and unlikely to be involved in species-wide adaptation.
Keywords: fitness effects; mutation; pangenome; purifying selection; structural variation
Year: 2014 PMID: 24803571 PMCID: PMC4040998 DOI: 10.1093/gbe/evu090
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Mean Percent of Genes Duplicated (Among-Strain Range Shown in Parentheses), Number of HT Genes, and Pairwise Diversity of Segregating Duplicates and Nucleotide Variants
| | % Duplicates | HT Genes | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | 0.6 (0.4–0.9) | 10,247 | 0.010 | 0.0011 | 0.0079 | −1.14 | −0.97 | −0.78 |
| chr.-full | 0.1 (0–0.5) | 2,318 | 0.003 | 0.0006 | 0.0044 | −1.80 | −0.99 | −0.87 |
| chr.-1 | 0.1 (0–0.6) | 804 | 0.002 | 0.0009 | 0.0073 | −1.88 | −0.79 | −0.59 |
| chr.-2 | 0.2 (0–0.7) | 1,014 | 0.003 | 0.0003 | 0.0020 | −1.64 | −1.18 | −1.23 |
| pSymA | 1.3 (0.6–2.6) | 1,985 | 0.023 | 0.0017 | 0.0148 | −1.30 | −1.00 | −0.46 |
| pSymB | 0.8 (0–1.4) | 1,271 | 0.011 | 0.0022 | 0.0169 | −0.12 | −0.66 | −0.33 |
| | 0.8 (0.2–1.4) | 4,521 | 0.012 | 0.0008 | 0.0038 | −0.13 | −0.28 | −0.12 |
| chr. | 0.7 (0.1–1.6) | 1,399 | 0.009 | 0.0007 | 0.0025 | 0.66 | −0.06 | 0.15 |
| pSymA | 1.9 (0.4–3.8) | 823 | 0.030 | 0.0019 | 0.0082 | −0.46 | −0.09 | 0.15 |
| pSymB | 0.2 (0.1–0.4) | 515 | 0.003 | 0.0012 | 0.0070 | −1.38 | −0.71 | −0.82 |
a Before position 1735000.
b Does not sum to 2,318 because some HT genes assigned to the chromosome had an ambiguous location.
c After position 1735000.
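The pairwise diversity of segregating duplicates can be read as the average number of duplication differences between two strains, scaled by the number of genes. The following is a minimal sketch under that reading, assuming each gene is coded per strain as 1 (duplicated) or 0 (single copy). The function name, the toy data, and the exact normalization are illustrative assumptions, not the authors' pipeline.

```python
from itertools import combinations


def pairwise_diversity(states, n_genes):
    """Mean number of pairwise differences per gene.

    states:  one list per strain, each holding 0/1 values
             (1 = gene duplicated in that strain); all lists share one length.
    n_genes: total number of genes considered (the denominator).
    """
    pairs = list(combinations(states, 2))
    if not pairs:
        raise ValueError("need at least two strains")
    # Count positions at which the two strains of each pair differ.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / len(pairs) / n_genes


# Hypothetical data: 4 strains, 3 segregating duplications, 100 genes overall.
toy = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(pairwise_diversity(toy, n_genes=100))
```

Only genes that differ between at least two strains contribute to the numerator, so listing just the segregating genes while passing the full gene count as `n_genes` gives the same value as listing every gene.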
Mean Ka, Ks, and Ka/Ks between the Sinorhizobium meliloti and S. medicae Reference Genomes for Duplicated and Unduplicated Genes, Including Only Genes in Both S. meliloti and S. medicae
| | Unduplicated: Count | Unduplicated: Ka | Unduplicated: Ks | Unduplicated: Ka/Ks | Duplicated: Count | Duplicated: Ka | Duplicated: Ks |
|---|---|---|---|---|---|---|---|
| Chr. | 1,963 | 0.027 | 0.40 | 0.068 | 17 | 0.020 | 0.35 |
| pSymA | 233 | 0.026 | 0.29 | 0.15 | 32 | 0.023 | 0.24 |
| pSymB | 666 | 0.032 | 0.45 | 0.073 | 7 | 0.024 | 0.40 |
| Chr. | 1,961 | 0.027 | 0.40 | 0.068 | 9 | 0.019 | 0.24 |
| pSymA | 22 | 0.027 | 0.32 | 0.13 | 34 | 0.018 | 0.17 |
| pSymB | 690 | 0.031 | 0.44 | 0.075 | 2 | 0.018 | 0.26 |
*P < 0.05 (two-sided t-test for difference between duplicated and unduplicated genes).
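A mean Ka/Ks need not equal mean Ka divided by mean Ks. Assuming the value columns run Ka, Ks, Ka/Ks as the caption suggests, the first pSymA row gives 0.026/0.29 ≈ 0.09 against a tabulated ratio of 0.15. One possible reason, assuming the ratio was averaged gene by gene, is that genes with low Ks inflate the mean of ratios. The sketch below uses hypothetical per-gene values and a hypothetical helper name to illustrate the effect.

```python
from statistics import mean


def ratio_summaries(ka_values, ks_values):
    """Return (mean of per-gene Ka/Ks, mean Ka divided by mean Ks)."""
    if len(ka_values) != len(ks_values) or not ka_values:
        raise ValueError("need equal-length, non-empty inputs")
    per_gene = [ka / ks for ka, ks in zip(ka_values, ks_values)]
    return mean(per_gene), mean(ka_values) / mean(ks_values)


# Hypothetical genes: the second has a low Ks, which inflates its ratio.
ka = [0.02, 0.03, 0.04]
ks = [0.40, 0.10, 0.50]
mean_of_ratios, ratio_of_means = ratio_summaries(ka, ks)
print(round(mean_of_ratios, 3), round(ratio_of_means, 3))
```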
Figure. Distribution of GC content for HT genes and core genes. The distribution is shown for Sinorhizobium meliloti genes. Sinorhizobium medicae distributions are nearly identical.
Figure. Derived allele frequency spectrum for four classes of mutations. (A) Sinorhizobium medicae and (B) S. meliloti. The y axis is the proportion of sites (for synonymous and nonsynonymous SNPs) or genes (for duplications and HT genes) within a class of mutations. Duplication and HT gene bars marked with an asterisk (*) are significantly different from synonymous sites. The values in the legends are the number of segregating sites for nucleotide variants or segregating genes for duplications and HT genes used to construct the spectra. Only derived duplications and nucleotide sites for which the ancestral state could be confidently inferred were used to construct the spectra.
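A derived allele frequency spectrum of this kind can be tabulated from per-variant derived counts. The sketch below assumes each site or gene is summarized by the number of strains carrying the derived state, and that variants present in none or all of the strains are not segregating. The function name and the counts are hypothetical.

```python
from collections import Counter


def derived_afs(derived_counts, n_strains):
    """Proportion of segregating variants at each derived count 1..n_strains-1."""
    # Keep only variants that are neither absent nor fixed in the sample.
    segregating = [c for c in derived_counts if 0 < c < n_strains]
    if not segregating:
        raise ValueError("no segregating variants")
    tally = Counter(segregating)
    total = len(segregating)
    return {k: tally.get(k, 0) / total for k in range(1, n_strains)}


# Hypothetical derived counts among 12 strains; 0 and 12 are dropped.
counts = [1, 1, 1, 2, 3, 1, 12, 0, 2, 5]
spectrum = derived_afs(counts, n_strains=12)
print(spectrum[1], spectrum[2])
```

Computing one such spectrum per variant class (synonymous, nonsynonymous, duplication, HT gene) gives proportions that can be compared across classes regardless of how many variants each class contains.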
Figure. Proportion of HT genes in each COG category that are fixed. (A) Sinorhizobium meliloti and (B) S. medicae. Only categories with at least 50 HT genes were included.
DFE of Nonsynonymous Mutations: Percent of Sites in Each Selection Bin (Standard Error)
| Species | 0–1 | 1–10 | 10–100 | >100 |
|---|---|---|---|---|
| | 10 (0.4) | 8.3 (0.4) | 15 (1.2) | 66 (1.4) |
| | 19 (0.7) | 6.7 (0.8) | 9 (1.5) | 65 (2) |
Frequencies of Segregating Synonymous (S) and Nonsynonymous (NS) SNPs, HT Genes (HT), and Duplicated Genes (Dup) Segregating in Sinorhizobium meliloti and S. medicae and Results of Randomization Tests to Determine Whether Duplications and HT Genes Are Segregating at Lower Mean Frequency than S and NS SNPs (Significant Values Are in Bold)
| | S | NS | Dup | HT | NS versus Dup | NS versus HT | S versus Dup | S versus HT |
|---|---|---|---|---|---|---|---|---|
| | 0.34 | 0.26 | 0.13 | 0.14 | | | | |
| Chr. | 0.37 | 0.28 | 0.06 | 0.18 | | | | |
| pSymA | 0.25 | 0.22 | 0.13 | 0.20 | 0.09 | | | |
| pSymB | 0.32 | 0.25 | 0.27 | 0.19 | 0.63 | 0.05 | | |
| | 0.36 | 0.30 | 0.22 | 0.24 | 0.15 | | | |
| Chr. | 0.30 | 0.24 | 0.29 | 0.29 | 0.85 | 1.0 | 0.76 | 0.21 |
| pSymA | 0.36 | 0.32 | 0.21 | 0.32 | 0.77 | 0.75 | 0.05 | |
| pSymB | 0.42 | 0.38 | — | 0.30 | — | — | | |
Note.—Duplicated genes were included only if they were present as a single copy in one species and duplicated in the other.
a Randomization tests were performed by randomly assigning the total number of variants to a specific variant class (e.g., NS or Dup) and then comparing the difference in the number of counts in each class to the actual difference; values shown are the proportion of 1,000 randomizations in which the difference in frequency of NS (or S) SNPs compared with duplications (or HT) genes was greater than the true difference.
b Indicates that the relative difference between the mean frequency of HT genes or duplications and nucleotide variants was significantly (P < 0.05) greater in S. meliloti than in S. medicae. The point estimates of the relative differences were greater in S. meliloti for all comparisons except duplications on pSymA.
c No derived duplications on pSymB.
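The randomization test in the first footnote can be approximated by a generic label-shuffling test on mean frequencies. The sketch below is one reading of that description, not the authors' exact procedure: it pools the frequencies of two variant classes, reassigns class labels at random 1,000 times, and reports the proportion of shuffles in which the difference in means exceeds the observed difference. The function name, the seed, and the frequencies are hypothetical.

```python
import random
from statistics import mean


def randomization_p(freqs_a, freqs_b, n_rand=1000, seed=0):
    """Proportion of label shuffles whose mean difference (a - b)
    is greater than the observed difference."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(freqs_a) - mean(freqs_b)
    pooled = list(freqs_a) + list(freqs_b)
    n_a = len(freqs_a)
    exceed = 0
    for _ in range(n_rand):
        rng.shuffle(pooled)
        # The first n_a shuffled values play the role of class a.
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff > observed:
            exceed += 1
    return exceed / n_rand


# Hypothetical derived-allele frequencies for two variant classes.
ns = [0.5, 0.4, 0.3, 0.6, 0.2, 0.5, 0.4, 0.3]
dup = [0.1, 0.1, 0.2, 0.1]
p = randomization_p(ns, dup)
print(p)
```

A small value means that few random relabelings produce a gap as large as the one observed, which is the pattern expected if the second class segregates at lower frequency than the first.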