Yao-Cheng Lin, Jing Wang, Nicolas Delhomme, Bastian Schiffthaler, Görel Sundström, Andrea Zuccolo, Björn Nystedt, Torgeir R Hvidsten, Amanda de la Torre, Rosa M Cossu, Marc P Hoeppner, Henrik Lantz, Douglas G Scofield, Neda Zamani, Anna Johansson, Chanaka Mannapperuma, Kathryn M Robinson, Niklas Mähler, Ilia J Leitch, Jaume Pellicer, Eung-Jun Park, Marc Van Montagu, Yves Van de Peer, Manfred Grabherr, Stefan Jansson, Pär K Ingvarsson, Nathaniel R Street.
Abstract
The Populus genus is one of the major plant model systems, but genomic resources have thus far been available mainly for poplar species, particularly Populus trichocarpa (Torr. & Gray), which was the first tree with a whole-genome assembly. To further advance evolutionary and functional genomic analyses in Populus, we produced genome assemblies and population genetics resources for two aspen species, Populus tremula L. and Populus tremuloides Michx. The two aspen species have distributions spanning the Northern Hemisphere, where they are keystone species supporting a wide variety of dependent communities and produce a diverse array of secondary metabolites. Our analyses show that the two aspens share a similar genome structure and a highly conserved gene content with P. trichocarpa but display substantially higher levels of heterozygosity. Based on population resequencing data, we observed widespread positive and negative selection acting on both coding and noncoding regions. Furthermore, patterns of genetic diversity and molecular evolution in aspen are influenced by several gene features, such as expression level, coexpression network connectivity, and regulatory variation. To maximize the community utility of these resources, we have integrated all presented data within the PopGenIE web resource (PopGenIE.org).
Keywords: Populus; coexpression; genome assembly; natural selection; population genetics
Year: 2018 PMID: 30373829 PMCID: PMC6243237 DOI: 10.1073/pnas.1801437115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Genome overview. (A) Simplified phylogram showing estimated divergence times (11, 13) and assembly statistics for the two genome assemblies presented here (P. tremula and P. tremuloides), with P. trichocarpa and A. thaliana included for reference. P. trichocarpa assembly statistics are based on genome release v3.0 from the Joint Genome Institute. Both reference genomes were obtained from the Phytozome resource (https://phytozome.jgi.doe.gov/pz/portal.html). (B) Sampling localities (black stars) and distributions of P. tremula, P. tremuloides, and P. trichocarpa. (C) Nucleotide diversity in various genomic contexts, calculated from resequencing data for 24 P. tremula, 22 P. tremuloides, and 24 P. trichocarpa individuals aligned to the corresponding genome assembly. (D) Percentage of genes from the BUSCO, PLAZA, and P. trichocarpa v3.0 gene sets that are represented in each Populus genome. (E) Venn diagram of gene families shared among, and unique to, P. tremula, P. tremuloides, P. trichocarpa, and A. thaliana.
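Panel C reports nucleotide diversity. As an illustration only, the standard estimator π (the mean number of pairwise differences per site among sequences) can be computed from aligned sequences as sketched below. The helper name and the toy haplotypes are hypothetical, and the sketch assumes gap-free, equal-length sequences; it does not reproduce the authors' pipeline (read alignment, genotype calling, handling of missing data).

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site among aligned sequences.

    Hypothetical helper for illustration; assumes equal-length sequences
    with no gaps or missing data.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    # Average over pairs, then normalize by alignment length (per-site value).
    return total_diffs / len(pairs) / length


# Toy haplotypes (hypothetical): 6 pairs with 6 differences in total over 10 sites.
haplotypes = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTAC", "ACGTACGTAC"]
print(round(nucleotide_diversity(haplotypes), 4))  # prints 0.1
```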
Genome assembly, repeat, and gene space annotation summary statistics for P. tremula and P. tremuloides
| Statistic | P. tremula | P. tremuloides |
|---|---|---|
| Assembly | | |
| No. of scaffolds | 216,318 | 164,504 |
| Total size of scaffolds, bp | 390,124,095 | 377,489,497 |
| No. of scaffolds >500 bp | 57,475 | 59,039 |
| No. of scaffolds >1,000 bp | 31,806 | 39,866 |
| No. of scaffolds >10,000 bp | 5,161 | 10,248 |
| No. of scaffolds >100,000 bp | 687 | 28 |
| N50, bp | 42,844 | 15,222 |
| Repeat annotation (%) | | |
| Total | 21.54 | 22.09 |
| Ty1-copia | 4 | 4.02 |
| Ty3-gypsy | 7.4 | 7.2 |
| Other-LTR | 0.13 | 0.11 |
| LINEs | 0.38 | 0.35 |
| SINEs | 0.36 | 0.38 |
| DNA | 3.28 | 3.43 |
| NHF | 5.99 | 6.6 |
| Gene annotation (counts) | | |
| High/low-confidence gene loci | 29,252/6,057 | 26,842/8,852 |
| High/low-confidence transcripts | 76,557/8,312 | 34,439/13,899 |
| High/low gene loci expressed | 27,825/4,833 | NA/NA |
LINEs, long interspersed nuclear elements; LTR, long terminal repeat; LTR-RT, LTR retrotransposon; N50, the scaffold length such that at least half of the nucleotides in the assembly belong to scaffolds of that length or longer; NA, not applicable; NHF, no hit found, i.e., LTR-RT-related elements that lack significant similarity to the major families; SINEs, short interspersed nuclear elements.
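The N50 definition in the footnote can be made concrete with a short sketch: sort scaffolds from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name and toy lengths are illustrative, not the assemblies' real scaffold lengths.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together contain
    at least half of all assembled nucleotides.
    """
    if not lengths:
        raise ValueError("no scaffolds given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths (hypothetical): total 300 bp; 100 + 80 = 180 >= 150, so N50 = 80.
print(n50([100, 80, 50, 40, 20, 10]))  # prints 80
```

The same function explains why a fragmented assembly with many short scaffolds can still have a high N50 if a few long scaffolds hold most of the sequence.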
Fig. 2. Genome synteny. (A) Self-alignments of nucleotide sequences for the P. tremula, P. tremuloides, and P. trichocarpa genomes, showing that paralogous regions from the Salicaceae whole-genome duplication (WGD) event are largely retained across the three genomes. Syntenic matches arising from the WGD event are indicated by colored blocks. (B) Schematic representation of scaffold Potra00422, with genes shown as light green boxes and their orthologs in P. tremuloides (light blue) and P. trichocarpa (light purple), illustrating an example of retained local synteny among the three genomes. Paralogs are shown in darker shades of each color (on the right side of the panel). Dotted lines between some of the genes in P. tremuloides indicate that a gene is split across different scaffolds. (C) Schematic representation of the sex-determining region in aspen. (Left) The middle column shows TOZ19 and the 12 genes upstream and downstream of it; the orthologs in P. tremula and P. tremuloides (detected using both BLAST and conserved synteny) are shown on either side. (Right) The TOZ19 paralog, TOZ13, with 12 flanking genes on each side. Scaffold IDs are given for the TOZ19 and TOZ13 genes and wherever more than one gene lies on the same scaffold. Dotted lines indicate paralogs in P. trichocarpa. (Inset, Bottom Right) Region on chromosome 13 drawn to the same scale as the region on chromosome 19.
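To illustrate what "retained local synteny" means in panel B, the sketch below greedily chains ortholog anchors (gene rank on one scaffold paired with the ortholog's rank on another) into collinear blocks, allowing either conserved or inverted orientation. This is a simplified, hypothetical scan written for illustration; it is not the procedure used to build the figure, and the function name, `max_gap`, `min_size`, and the toy anchors are all assumptions.

```python
def collinear_blocks(anchors, max_gap=2, min_size=3):
    """Greedily chain ortholog anchors into collinear (syntenic) blocks.

    anchors: (i, j) pairs, where i is a gene's rank along one scaffold and
    j is the rank of its ortholog along the other. Consecutive anchors stay
    in the same block if both ranks advance by 1..max_gap genes and the
    orientation in j (ascending or inverted) does not change.
    """
    blocks = []
    current = []
    orientation = 0  # +1 same order, -1 inverted, 0 not yet known
    for i, j in sorted(anchors):
        if current:
            prev_i, prev_j = current[-1]
            di, dj = i - prev_i, j - prev_j
            sign = (dj > 0) - (dj < 0)
            if (0 < di <= max_gap and 0 < abs(dj) <= max_gap
                    and orientation in (0, sign)):
                current.append((i, j))
                orientation = sign
                continue
            # Chain broken: keep the finished block if it is long enough.
            if len(current) >= min_size:
                blocks.append(current)
        current = [(i, j)]
        orientation = 0
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical anchors: one block in conserved order, one inverted, one stray hit.
anchors = [(0, 10), (1, 11), (2, 12), (3, 13), (5, 40), (6, 39), (7, 38), (9, 2)]
print(collinear_blocks(anchors))
# prints [[(0, 10), (1, 11), (2, 12), (3, 13)], [(5, 40), (6, 39), (7, 38)]]
```

Dedicated synteny tools typically use dynamic-programming chaining rather than this greedy pass; the greedy version is kept only because it is short.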
Fig. 3. Population genetics for P. tremula. (A) Estimates of negative and positive selection on coding and noncoding regions, separated by site type. Error bars represent 95% bootstrap confidence intervals. (B–D) Estimates of negative and positive selection on zerofold nonsynonymous sites in genes with varying expression level (B), varying connectivity in the coexpression network (C), and with or without eQTLs (D). Nes categories represent bins of negative selection strength, where Nes is the product of the effective population size (Ne) and the selection coefficient (s). α, proportion of divergent sites fixed by positive selection; ω, rate of adaptive substitution relative to neutral divergence; eQTL, expression quantitative trait locus. All calculations are based on genomic resequencing of 94 P. tremula individuals with reads aligned to the P. tremula genome assembly. ***P < 0.001.
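The two summary statistics α and ω are linked: ω (the adaptive substitution rate relative to neutral divergence) equals α multiplied by the per-site divergence ratio between selected and neutral sites. The sketch below shows this relationship using the classic McDonald–Kreitman estimator of α. That estimator is a swapped-in, simpler technique: the estimates in this study come from a distribution-of-fitness-effects analysis, which accounts for slightly deleterious polymorphisms that the classic estimator ignores. The function name, the choice of neutral reference class, and all counts are hypothetical.

```python
def mk_alpha_omega(dn, ds, pn, ps, ln, ls):
    """Classic McDonald-Kreitman estimates of alpha and omega_a (illustrative).

    dn, ds: fixed differences (divergence) at selected and putatively neutral sites
    pn, ps: polymorphisms at selected and putatively neutral sites
    ln, ls: number of selected and neutral sites surveyed
    """
    if dn == 0 or ps == 0 or ds == 0:
        raise ValueError("dn, ds, and ps must be non-zero")
    # Proportion of selected-site divergence attributed to positive selection.
    alpha = 1 - (ds * pn) / (dn * ps)
    # Per-site divergence ratio dN/dS; omega_a is its adaptive fraction.
    dn_ds = (dn / ln) / (ds / ls)
    omega_a = alpha * dn_ds
    return alpha, omega_a


# Hypothetical counts, not data from this study.
alpha, omega_a = mk_alpha_omega(dn=60, ds=100, pn=40, ps=100, ln=1000, ls=500)
print(round(alpha, 4), round(omega_a, 4))  # prints 0.3333 0.1
```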
Estimates of the distribution of fitness effects of new mutations at zerofold nonsynonymous, intronic, 5′ UTR, 3′ UTR, and intergenic sites, given as the proportion of mutations falling in each Nes range, together with the proportion of divergence driven to fixation by positive selection (α) and the rate of adaptive substitution relative to neutral divergence (ω), in P. tremula and P. tremuloides
| Species | Category | Nes 0–1 | Nes 1–10 | Nes 10–100 | Nes >100 | α | ω |
|---|---|---|---|---|---|---|---|
| P. tremula | Zerofold | 0.275 (0.263–0.277) | 0.118 (0.115–0.132) | 0.167 (0.163–0.196) | 0.440 (0.408–0.447) | 0.298 (0.289–0.335) | 0.107 (0.103–0.121) |
| P. tremula | 3′ UTR | 0.822 (0.807–0.829) | 0.096 (0.096–0.119) | 0.073 (0.067–0.077) | 0.009 (0.003–0.011) | 0.125 (0.115–0.151) | 0.114 (0.104–0.137) |
| P. tremula | 5′ UTR | 0.646 (0.614–0.653) | 0.163 (0.148–0.232) | 0.157 (0.150–0.161) | 0.034 (0.005–0.050) | 0.311 (0.299–0.365) | 0.275 (0.263–0.324) |
| P. tremula | Intronic | 0.727 (0.722–0.731) | 0.088 (0.088–0.089) | 0.096 (0.096–0.096) | 0.089 (0.085–0.095) | 0.159 (0.151–0.167) | 0.133 (0.125–0.141) |
| P. tremula | Intergenic | 1.000 (1.000–1.000) | 0.000 (0.000–0.000) | 0.000 (0.000–0.000) | 0.000 (0.000–0.000) | 0.215 (0.210–0.220) | 0.273 (0.265–0.282) |
| P. tremuloides | Zerofold | 0.264 (0.251–0.266) | 0.131 (0.128–0.147) | 0.194 (0.190–0.227) | 0.411 (0.376–0.416) | 0.338 (0.330–0.374) | 0.122 (0.118–0.135) |
| P. tremuloides | 3′ UTR | 0.853 (0.834–0.858) | 0.095 (0.094–0.120) | 0.051 (0.046–0.056) | 0.001 (0.000–0.002) | 0.094 (0.084–0.132) | 0.086 (0.076–0.120) |
| P. tremuloides | 5′ UTR | 0.632 (0.601–0.643) | 0.225 (0.207–0.278) | 0.140 (0.123–0.147) | 0.004 (0.000–0.007) | 0.345 (0.329–0.385) | 0.305 (0.289–0.342) |
| P. tremuloides | Intronic | 0.748 (0.713–0.752) | 0.091 (0.090–0.140) | 0.096 (0.095–0.124) | 0.065 (0.023–0.069) | 0.140 (0.133–0.197) | 0.118 (0.111–0.165) |
| P. tremuloides | Intergenic | 1.000 (1.000–1.000) | 0.000 (0.000–0.000) | 0.000 (0.000–0.000) | 0.000 (0.000–0.000) | 0.207 (0.200–0.213) | 0.261 (0.250–0.271) |
Ninety-five percent bootstrap confidence intervals are shown in parentheses.
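The table reports 95% bootstrap confidence intervals. The record does not state the resampling unit (for example genes or sites) or the bootstrap variant, so the sketch below only illustrates the generic percentile bootstrap applied to a simple mean. The function name, parameters, seed, and the toy per-gene values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for a statistic.

    Resamples data with replacement n_boot times, recomputes stat on each
    resample, and returns the empirical (1 - level)/2 and (1 + level)/2
    quantiles of the resampled statistics.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]


# Hypothetical per-gene values, not data from this study.
values = [0.10, 0.40, 0.35, 0.20, 0.50, 0.30, 0.25, 0.45, 0.15, 0.30]
lo, hi = bootstrap_ci(values)
print(f"mean={statistics.mean(values):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Because percentile intervals need not be symmetric around the point estimate, asymmetric intervals such as 0.440 (0.408–0.447) are expected rather than anomalous.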
Fig. 4. Selection and constraint effects in 22,306 expressed genes characterized by various expression features. (A) Constraint effect sizes for expressed genes, sorted in increasing order. The black line follows the average, the vertical lines span the Bayesian credible intervals, and the dashed line indicates neutrality. Genes under selective constraint are marked in blue, and neutrally evolving genes are marked in gray. (B) Gene expression features [from left to right: expression level, expression variance, proportion (Prop.) of genes that are core genes, Prop. of genes that harbor eQTLs] for genes evolving under selective constraint (blue) and evolving neutrally (gray). (C) Selective effect sizes for expressed genes, sorted in increasing order. The black line follows the average, the vertical lines span the Bayesian credible intervals, and the dashed line indicates neutrality. Genes under negative selection are marked in blue, neutrally evolving genes in gray, and genes under positive selection in red. (D) Gene expression features (from left to right: expression level, expression variance, Prop. of genes that are core genes, Prop. of genes that harbor eQTLs) for genes under negative selection (blue), evolving neutrally (gray), and under positive selection (red). *P < 0.05; ***P < 0.001. ns, not significant.
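The color coding in panels A and C amounts to asking whether each gene's credible interval excludes the neutral value. A minimal sketch of that classification follows, assuming a neutral value of 0 and that effects below it indicate negative selection; both the neutral value and the sign convention are assumptions, since the caption shows neutrality only as a dashed line. The type, function, and gene names are hypothetical.

```python
from typing import NamedTuple


class GeneEffect(NamedTuple):
    gene: str
    lower: float  # lower bound of the gene's credible interval
    upper: float  # upper bound of the gene's credible interval


def classify(effect: GeneEffect, neutral: float = 0.0) -> str:
    """Label a gene by where its credible interval lies relative to neutrality.

    Assumes (hypothetically) that effects below `neutral` mean negative
    selection and effects above it mean positive selection.
    """
    if effect.upper < neutral:
        return "negative"
    if effect.lower > neutral:
        return "positive"
    return "neutral"  # interval overlaps the neutral value


# Hypothetical genes and intervals.
genes = [
    GeneEffect("geneA", -1.2, -0.3),
    GeneEffect("geneB", -0.4, 0.5),
    GeneEffect("geneC", 0.2, 0.9),
]
for g in genes:
    print(g.gene, classify(g))
# prints:
# geneA negative
# geneB neutral
# geneC positive
```

For panel A, which distinguishes only constrained from neutral genes, the same test would collapse to a single "interval excludes neutrality" check.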