Zhiqiang Ye, Sen Xu, Ken Spitze, Jana Asselman, Xiaoqian Jiang, Matthew S Ackerman, Jacqueline Lopez, Brent Harker, R Taylor Raborn, W Kelley Thomas, Jordan Ramsdell, Michael E Pfrender, Michael Lynch.
Abstract
Comparing genomes of closely related genotypes from populations with distinct demographic histories can help reveal the impact of effective population size on genome evolution. For this purpose, we present a high-quality genome assembly of Daphnia pulex (PA42), and compare this with the first sequenced genome of this species (TCO), which was derived from an isolate from a population with >90% reduction in nucleotide diversity. PA42 has numerous similarities to TCO at the gene level, with an average amino acid sequence identity of 98.8% and >60% of orthologous proteins identical. Nonetheless, there is a highly elevated number of genes in the TCO genome annotation, with ∼7000 excess genes appearing to be false positives. This view is supported by the high GC content, lack of introns, and short length of these suspicious gene annotations. Consistent with the view that reduced effective population size can facilitate the accumulation of slightly deleterious genomic features, we observe more proliferation of transposable elements (TEs) and a higher frequency of gained introns in the TCO genome.
Keywords: effective population size; gene number; genome annotation; intron; mobile elements
Year: 2017 PMID: 28235826 PMCID: PMC5427498 DOI: 10.1534/g3.116.038638
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Summary of the genomic metrics for the PA42 and TCO genomes
| Metric | TCO | PA42 |
|---|---|---|
| Genome size, including gaps (bp) | 197,261,574 | 156,418,198 |
| Number of scaffolds | 5191 | 1822 |
| Length of largest scaffold (bp) | 4,163,030 | 1,661,524 |
| Mean scaffold length (bp) | 38,001 | 85,849 |
| Number of N50 scaffold | 75 | 96 |
| Total length of gaps (bp) | 38,612,943 | 13,454,790 |
| Number of annotated genes | 30,097 | 18,440 |
| Mean length of a coding gene (including introns) (bp) | 2289 | 2998 |
| Mean number of exons/gene | 6.6 | 6.9 |
| Mean exon size (bp) | 212 | 237 |
| Mean intron size (bp) | 169 | 223 |
| Mean UTR size (bp) | 371 | 214 |
| Fraction of long introns | 10% | 14% |
TCO data are compiled from Colbourne . Long introns are defined as having a length exceeding the average exon size. UTR, untranslated region.
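The derived rows of this table can be recomputed from the raw counts. The sketch below is only an illustration: it assumes the first data column is TCO and the second is PA42 (inferred from the abstract, which attributes the larger gene count to TCO), and the function names are hypothetical. The long-intron test follows the definition in the note above.

```python
# Values copied from the table; the column assignment (TCO first, PA42 second)
# is an assumption inferred from the abstract, not stated in the table itself.
assemblies = {
    "TCO": {"size_bp": 197_261_574, "scaffolds": 5191, "gap_bp": 38_612_943, "mean_exon_bp": 212},
    "PA42": {"size_bp": 156_418_198, "scaffolds": 1822, "gap_bp": 13_454_790, "mean_exon_bp": 237},
}


def mean_scaffold_length(size_bp: int, scaffolds: int) -> float:
    """Total assembly length (including gaps) divided by the scaffold count."""
    return size_bp / scaffolds


def gap_fraction(gap_bp: int, size_bp: int) -> float:
    """Share of the assembly made up of gap bases."""
    return gap_bp / size_bp


def is_long_intron(intron_bp: int, mean_exon_bp: float) -> bool:
    """An intron is 'long' if it exceeds the average exon size."""
    return intron_bp > mean_exon_bp


if __name__ == "__main__":
    for name, a in assemblies.items():
        mean_len = mean_scaffold_length(a["size_bp"], a["scaffolds"])
        gaps = gap_fraction(a["gap_bp"], a["size_bp"])
        print(f"{name}: mean scaffold {mean_len:,.0f} bp, gaps {gaps:.1%}")
```

The recomputed means agree with the tabulated values to within 1 bp; the residual difference is presumably a rounding choice in the original table.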
Comparison of TCO and PA42 gene annotations
| Category | TCO | PA42 |
|---|---|---|
| Numbers of genes | 30,097 | 18,440 |
| With homologs in Reference Species | 17,062 | 13,260 |
| Shared between TCO and PA42 only (no Reference Species homologs) | 5715 | 3818 |
| Lineage specific genes | 7320 | 1362 |
| Genes without canonical start codon | 2278 | 798 |
| Genes without stop codon | 2219 | 977 |
| Genes without start and stop codons | 1102 | 244 |
| Potential excess genes from gene split events | 190 | 62 |
Reference Species: C. elegans, S. maritima, A. gambiae, D. melanogaster, and H. sapiens.
Genes in PA42 or TCO with no identified Reference homologs, but that share homology with each other as identified by reciprocal BLAST searches.
Genes without identified homologs in either the Reference Species or the alternative D. pulex genome.
Splitting of a single gene in one genome assembly into multiple pieces in the other assembly, with each piece annotated as a separate gene in the second assembly.
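The three homology categories described in the notes above partition each annotation, so their counts should add up to the total gene number. A minimal sketch of that check follows; the dictionary keys are hypothetical labels, and the column assignment (TCO first, PA42 second) is an assumption inferred from the abstract.

```python
# Counts copied from the table; key names are illustrative labels only.
annotations = {
    "TCO": {
        "total": 30_097,
        "reference_homologs": 17_062,
        "shared_between_clones_only": 5715,
        "lineage_specific": 7320,
    },
    "PA42": {
        "total": 18_440,
        "reference_homologs": 13_260,
        "shared_between_clones_only": 3818,
        "lineage_specific": 1362,
    },
}

CATEGORIES = ("reference_homologs", "shared_between_clones_only", "lineage_specific")


def category_sum(counts: dict) -> int:
    """Add up the three mutually exclusive homology categories."""
    return sum(counts[key] for key in CATEGORIES)


def lineage_specific_fraction(counts: dict) -> float:
    """Share of annotated genes with no homolog in any other genome compared."""
    return counts["lineage_specific"] / counts["total"]


if __name__ == "__main__":
    for clone, counts in annotations.items():
        ok = category_sum(counts) == counts["total"]
        frac = lineage_specific_fraction(counts)
        print(f"{clone}: categories sum to total = {ok}, lineage specific = {frac:.1%}")
```

Both annotations pass the check, and the lineage-specific share is several times higher in the first column than in the second.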
Figure 1. Distributions of the features of protein-coding sequences for the 11,694 one-to-one orthologs in TCO and PA42 (>300 bp alignment length). (A) Amino acid sequence identity. (B) Pairwise divergence at silent sites, Ks. (C) Pairwise divergence at replacement sites, Ka. (D) The ratio Ka/Ks.
Figure 2. Comparison of the features of 1:1 TCO–PA42 orthologs and TCO-specific genes. TCO-specific genes have no obvious orthologs in PA42 or the Reference Species (other metazoans). (A) Distributions of intron numbers. (B) Length distributions for all coding sequences (excluding introns). (C) Comparisons of average GC contents. (D) Comparisons of average sequence coverages. Error bars indicate SEs. Asterisk denotes significance at the P < 0.05 level.
Figure 3. The frequency distribution of Ks for paralogs within the PA42 genome vs. those within TCO. Only genes in orthologous clusters containing both TCO and PA42 genes were used. Sliding-window analyses were used to remove the low-quality regions in the alignments, with a cutoff of 0.4 identity for each 15-bp window. The Ks value for each paralog is the average value when comparing the paralog with others in an orthologous cluster of the genome. The vertical dashed line at 0.057 denotes the average silent-site divergence for pairs of TCO-PA42 orthologs, so that paralogous pairs to the left of this benchmark are younger than the average ortholog divergence between these two clones. Ks, pairwise divergence at silent sites.
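The sliding-window filter mentioned in the Figure 3 caption can be sketched as follows. Only the window size (15 bp) and the identity cutoff (0.4) come from the caption. The step size of one column, the treatment of gaps as mismatches, and the choice to delete flagged columns rather than mask them are assumptions made for illustration, and the function names are hypothetical.

```python
def low_quality_columns(seq_a: str, seq_b: str, window: int = 15,
                        min_identity: float = 0.4) -> list:
    """Flag every alignment column covered by a window whose identity is below the cutoff."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    n = len(seq_a)
    # A column counts as a match only if both residues are equal and not a gap
    # (assumption: gaps are treated as mismatches).
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    bad = [False] * n
    # Assumption: the window advances one column at a time; alignments shorter
    # than one window produce no windows and are left unflagged.
    for start in range(max(n - window + 1, 0)):
        identity = sum(matches[start:start + window]) / window
        if identity < min_identity:
            for i in range(start, start + window):
                bad[i] = True
    return bad


def filter_alignment(seq_a: str, seq_b: str, window: int = 15,
                     min_identity: float = 0.4) -> tuple:
    """Return both aligned sequences with the flagged columns removed."""
    bad = low_quality_columns(seq_a, seq_b, window, min_identity)
    kept_a = "".join(a for a, flag in zip(seq_a, bad) if not flag)
    kept_b = "".join(b for b, flag in zip(seq_b, bad) if not flag)
    return kept_a, kept_b


if __name__ == "__main__":
    a = "ACGT" * 5 + "A" * 15
    b = "ACGT" * 5 + "T" * 15
    print(filter_alignment(a, b))
```

In this toy alignment the first 20 columns match and the last 15 do not, so every window that overlaps the mismatched tail by more than nine columns falls below the cutoff and its columns are dropped.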
Figure 4. Distributions of the synonymous (Ks) and nonsynonymous (Ka) differences per site per gene in TCO and PA42, using D. obtusa as outgroup. Ks and Ka were first calculated in TCO vs. D. obtusa and PA42 vs. D. obtusa, with the plotted difference providing a measure of the increase in the subtending branch length for TCO above that for the PA42 lineage.
Summary of transposable elements in PA42 3.0
| Class | Subclass | Clade | No. of Full-Length (Fragmented) Elements | Fraction of Genome (%) |
|---|---|---|---|---|
| DNA transposons | TIR | Harbinger | 13 (518) | 0.23 |
| | | Hat | 1 (3) | 0.01 |
| | | p element | 3 (10) | 0.02 |
| | | Mariner | 19 (168) | 0.15 |
| | | isl2eu | 50 (172) | 0.22 |
| | Helitron | Helitron | 26 (215) | 0.21 |
| | Subtotal | | 112 (1086) | 0.83 |
| Retrotransposons | LTR retrotransposons | Bel | 61 (1134) | 1.06 |
| | | Copia | 133 (2408) | 2.12 |
| | | Gypsy | 141 (2059) | 2.16 |
| | | Dirs | 20 (522) | 0.47 |
| | Non-LTR retrotransposons | Jockey | 18 (246) | 0.21 |
| | | L1 | 3 (46) | 0.04 |
| | | L2 | 27 (323) | 0.31 |
| | | Loa | 15 (137) | 0.14 |
| | Subtotal | | 418 (6875) | 6.53 |
| Total | | | 530 (7961) | 7.36 |
TIR, terminal inverted repeat; LTR, long terminal repeat.
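The subtotal and total rows can be cross-checked against the per-clade rows. The sketch below is illustrative: the tuple layout and function name are hypothetical, and the values are transcribed from the table.

```python
# (class, clade, full-length elements, fragmented elements, percent of genome)
te_rows = [
    ("DNA transposons", "Harbinger", 13, 518, 0.23),
    ("DNA transposons", "Hat", 1, 3, 0.01),
    ("DNA transposons", "p element", 3, 10, 0.02),
    ("DNA transposons", "Mariner", 19, 168, 0.15),
    ("DNA transposons", "isl2eu", 50, 172, 0.22),
    ("DNA transposons", "Helitron", 26, 215, 0.21),
    ("Retrotransposons", "Bel", 61, 1134, 1.06),
    ("Retrotransposons", "Copia", 133, 2408, 2.12),
    ("Retrotransposons", "Gypsy", 141, 2059, 2.16),
    ("Retrotransposons", "Dirs", 20, 522, 0.47),
    ("Retrotransposons", "Jockey", 18, 246, 0.21),
    ("Retrotransposons", "L1", 3, 46, 0.04),
    ("Retrotransposons", "L2", 27, 323, 0.31),
    ("Retrotransposons", "Loa", 15, 137, 0.14),
]


def subtotal(rows, te_class=None):
    """Sum full-length counts, fragment counts, and genome percent for one class (or all)."""
    selected = [r for r in rows if te_class is None or r[0] == te_class]
    full = sum(r[2] for r in selected)
    fragmented = sum(r[3] for r in selected)
    percent = sum(r[4] for r in selected)
    return full, fragmented, percent


if __name__ == "__main__":
    for label in ("DNA transposons", "Retrotransposons", None):
        full, fragmented, percent = subtotal(te_rows, label)
        print(f"{label or 'Total'}: {full} ({fragmented}), {percent:.2f}%")
```

The element counts reproduce the subtotal and total rows exactly. The summed genome fractions differ from the tabulated subtotals by 0.01 to 0.02 percentage points, which presumably reflects rounding of the per-clade values.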
Summary of intron gain and loss events in TCO and PA42
| Feature | TCO | PA42 |
|---|---|---|
| Number of intron gains | 57 | 32 |
| Number of intron losses | 10 | 0 |
| Median length of gained intron (bp) | 72 | 76 |
| Median length of lost intron (bp) | 65 | N/A |
| GC composition of gained intron (%) | 20 | 25 |
| GC composition of lost intron (%) | 26 | N/A |
N/A, not applicable.
Figure 5. Dynamic evolution of gene families. The gene family expansions and contractions were predicted by CAFÉ 3.0. The species tree required by CAFÉ 3.0 was constructed from 1:1:1 single-copy gene families using the Maximum Likelihood method in MEGA6 (Tamura ). The RelTime-ML program implemented in the MEGA6 package was used to estimate divergence times among species; calibration times were obtained from the TimeTree database.