
The genome of Populus alba x Populus tremula var. glandulosa clone 84K.

Deyou Qiu1, Shenglong Bai2, Jianchao Ma2, Lisha Zhang1, Fenjuan Shao1, Kaikai Zhang1,3, Yanfang Yang1, Ting Sun2, Jinling Huang2, Yun Zhou2, David W Galbraith2,4, Zhaoshan Wang1, Guiling Sun2.   

Abstract

Poplar 84K (Populus alba x P. tremula var. glandulosa) is a fast-growing poplar hybrid. Originating in South Korea, this hybrid has been extensively cultivated in northern China. Because of the economic and ecological importance of this hybrid and its high transformability, we report the de novo sequencing and assembly of a male individual of poplar 84K using PacBio and Hi-C technologies. The final reference nuclear genome (747.5 Mb) has a contig N50 size of 1.99 Mb and a scaffold N50 size of 19.6 Mb. Complete chloroplast and mitochondrial genomes were also assembled from the sequencing data. Based on similarities to the genomes of P. alba var. pyramidalis and P. tremula, we were able to identify two subgenomes, representing 356 Mb from P. alba (subgenome A) and 354 Mb from P. tremula var. glandulosa (subgenome G). The phased assembly allowed us to detect transcriptional bias between the two subgenomes, and we found that the subgenome from P. tremula displayed dominant expression in both 84K and another widely used hybrid, P. tremula x P. alba. This high-quality poplar 84K genome will be a valuable resource for poplar breeding and for molecular biology studies.
© The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Keywords: P. alba; P. tremula; genome sequencing; poplar 84K; subgenome assignment

Year:  2019        PMID: 31580414      PMCID: PMC6796506          DOI: 10.1093/dnares/dsz020

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


1. Introduction

The genus Populus comprises around 30 economically and ecologically important species. Besides wood products, these species provide a range of services including bioenergy production, carbon sequestration, bioremediation, nutrient cycling, biofiltration, and habitat diversification. Owing to its modest genome size, rapid growth rate, simple vegetative propagation, and short rotation cycle, coupled with high levels of genetic diversity and facile genetic manipulation, this genus has become a model for the study of tree molecular biology. To date, several high-quality poplar genomes have become available, including those of P. trichocarpa (Torr. and Gray), P. euphratica Oliv., P. pruinosa, and P. alba var. pyramidalis. Unfortunately, these species are not easily transformed by Agrobacterium tumefaciens. Interestingly, it seems that only hybrid poplars, which have much higher transformation rates, have been widely used for genetic transformation; examples include P. alba x P. tremula var. glandulosa clone 84K, P. tremula x P. alba, P. alba x P. grandidentata, and P. alba x P. tremula.

White poplar (P. alba), commonly called silver poplar, is widely distributed in central Europe and central Asia. It requires abundant light and ample moisture, and stands up well to flooding. Due to its attractive green-and-white leaves, it is often planted as an ornamental tree. Its extensive root system and good tolerance of salt make it an effective tree on windy coasts. This species has also been widely used in short rotation plantation for timber and pulp production. P. tremula, commonly called aspen, is widely distributed in cool temperate regions of Europe and Asia. It is usually found in mountains at high altitudes. P. tremula is a very hardy species and can tolerate long, cold winters. It has been widely planted for timber, firewood, and veneer production.

Natural populations of P. alba and P. tremula or its varieties hybridize frequently and have been selected as the parental species for artificial hybrid breeding. Two main hybrids, P. tremula x P. alba and P. alba x P. tremula var. glandulosa, have been frequently used in molecular biological studies due to their high rates of transformation. In 2016, Mader et al. reported a 900 Mb draft genome of a female interspecific hybrid P. tremula x P. alba clone INRA 717-1B4, with an N50 contig length of 3,850 bp. However, the genome sequence of this hybrid has not been assembled to the level of chromosomes. Evidently, further development of high-quality genome sequences in hybrid poplar species having high transformation efficiencies will be critical to advances in molecular biology and genetic engineering of this woody genus.

Here we focus on the clone 84K, a male P. alba x P. tremula var. glandulosa interspecific hybrid. This hybrid resulted from a breeding program led by Professor SinKyu Hyun (Seoul National University, Korea), and was first introduced into China in 1984 by Professor Qiwen Zhang (The Research Institute of Forestry, Chinese Academy of Forestry, Beijing). Attractive characteristics of this interspecific hybrid include a high growth rate and excellent adaptation to diverse environments. This superior clone has been commonly used in short rotation plantation for timber, firewood, and pulp production. More importantly, it is readily transformed using A. tumefaciens, and this clone is widely used by scientists in transgenic experiments as a model for woody species. Many transgenic poplar lines based on this clone have been created for commercial applications in China. Regrettably, no genome sequence is currently available for this clone.

In this study, we describe a de novo assembly of the genome sequence of the hybrid poplar (P. alba x P. tremula var. glandulosa) clone 84K (poplar 84K, hereafter), and identify two subgenomes via comparison to the genomes of P. alba var. pyramidalis and P. tremula. The genome resources of this hybrid will facilitate further gene functional analyses, optimization of genetic transformation experiments, and poplar breeding practices, as well as comparative genomic analyses across different poplars.

2. Materials and methods

Plant materials

Poplar 84K (P. alba x P. tremula var. glandulosa) was grown in the greenhouse at 20 °C, under a 12-h light/12-h dark illumination cycle. Fresh young leaves were collected from 1-month-old plants and immediately frozen in liquid nitrogen. Genomic DNA was isolated for library construction following the steps of the CTAB method described by Doyle et al. Total RNA of the leaves and shoots from 1-month-old plants was extracted with TRI Reagent (Sigma, St. Louis, MO) for transcriptome sequencing (RNA-seq).

Genome sequencing and RNA-seq

Two DNA libraries with 270 and 500 bp insertions, and one DNA library with 20 kb insertions, were constructed and sequenced on the Illumina NovaSeq 6000 and PacBio Sequel platforms, respectively. Detailed methods for DNA library construction can be found in Supplementary Materials. For transcriptome sequencing (RNA-seq), four RNA libraries for leaves and one library for shoots were constructed according to the TruSeq® RNA Sample Preparation protocol, and were sequenced on an Illumina NovaSeq sequencing system.

Estimation of genome size

The 1C value of poplar 84K was measured using flow cytometry with propidium iodide (PI) as the DNA stain and Arabidopsis thaliana (col-0) as the standard plant as described previously. The genome size of poplar 84K was calculated using the equation provided by the regression analysis and the known A. thaliana 2C DNA content (see Supplementary Materials for details). We also used 28.44 Gb of Illumina NovaSeq short reads to estimate the genome size and other features using the GCE software based on k-mer depth-frequency distribution. The versions and main parameters of GCE and other software packages used in this study are provided in Supplementary Table S1.
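The k-mer approach can be illustrated with a minimal Python sketch. It is not the GCE implementation: it uses the common approximation that genome size is roughly the total number of k-mers divided by the depth of the main peak, and it ignores the second peak that a highly heterozygous genome such as poplar 84K would show. The function names and the `min_depth` cutoff for error k-mers are assumptions made for illustration.

```python
from collections import Counter


def kmer_depth_histogram(reads, k):
    """Count every k-mer in the reads, then tabulate how many
    distinct k-mers occur at each depth (depth -> number of k-mers)."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    return Counter(kmer_counts.values())


def estimate_genome_size(depth_hist, min_depth=2):
    """Approximate genome size as (total k-mers) / (peak depth).

    Depths below min_depth are treated as sequencing errors and
    dropped; this cutoff is an illustrative assumption.
    """
    usable = {d: n for d, n in depth_hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # most frequent depth
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

In practice the histogram would come from a dedicated k-mer counter run on the full 28.44 Gb read set; the pure-Python counter above is only suitable for toy inputs.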

Assembly of the poplar 84K genome sequences

SMRT subreads were corrected, trimmed, and assembled using CANU (see Supplementary Materials for details), and the draft assembly was then polished using Arrow. Lastly, Pilon was used to perform two rounds of error correction using Illumina NovaSeq reads from the 270 and 500 bp insert libraries.

Identification of potential contamination and organelle genomes

For contigs less than 1 Mb, we performed a BLASTN search against the non-redundant nucleotide (NT, downloaded on 4 March 2018) database in GenBank with an E-value of 1E-5. Contigs having the most matches to non-plant species were designated as environmental sequence contamination. For the identification of the chloroplast genome, the complete chloroplast genome sequence of P. tremula x P. alba (accession number: NC_028504.1) was used as a query to search against all the remaining PacBio contigs with an E-value of 1E-5. The mitochondrial genome of P. tremula x P. alba (accession number: NC_028329.1) was chosen to search against all remaining PacBio contigs with an E-value of 1E-5 for mitochondrial fragments. Gene structure annotations and figures of the organelle genomes were produced using GESEQ.
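The contamination rule described above can be sketched as follows. This is one possible reading, not the authors' script: a contig shorter than 1 Mb is flagged when its non-plant BLASTN hits outnumber its plant hits. The input layout (a dictionary of contig lengths and a dictionary of per-hit taxon labels) and the label "plant" are hypothetical.

```python
def flag_contaminants(contig_lengths, hit_taxa, max_length=1_000_000):
    """Return contigs shorter than max_length whose BLASTN hits are
    mostly non-plant.

    contig_lengths: contig name -> length in bp
    hit_taxa: contig name -> list of taxon labels, one per hit
              (the label "plant" is an illustrative convention)
    """
    flagged = []
    for contig, length in contig_lengths.items():
        if length >= max_length:
            continue  # only contigs under 1 Mb were screened
        taxa = hit_taxa.get(contig, [])
        if not taxa:
            continue  # no hits, nothing to decide on
        non_plant = sum(1 for taxon in taxa if taxon != "plant")
        if non_plant > len(taxa) - non_plant:
            flagged.append(contig)
    return flagged
```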

Pseudomolecule construction

Hi-C mapping was employed to facilitate pseudomolecule construction; the details are described in Supplementary Materials. The PacBio contigs were divided into fragments having a length of 300 kb, and were then error corrected, clustered, ordered, and orientated by the LACHESIS software operating on the valid interaction read pair dataset. Finally, contact maps were plotted using the HiCPlotter software.
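A contact map of this kind is built by counting valid read pairs per pair of genomic bins and log-transforming the counts. The sketch below shows that idea only; it is not HiCPlotter code, and the input layout of `((group, position), (group, position))` tuples is an assumption. The 500 kb default bin size follows the Figure 1 legend.

```python
import math
from collections import Counter


def contact_matrix(read_pairs, bin_size=500_000):
    """Count Hi-C links between non-overlapping bins.

    read_pairs: iterable of ((group_a, pos_a), (group_b, pos_b)),
    where group is a linkage-group name and pos a 0-based coordinate.
    Returns a Counter keyed by an order-independent pair of bins.
    """
    counts = Counter()
    for (group_a, pos_a), (group_b, pos_b) in read_pairs:
        bin_a = (group_a, pos_a // bin_size)
        bin_b = (group_b, pos_b // bin_size)
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


def log2_contacts(counts):
    """Log2-transform link counts, as used for heatmap colouring."""
    return {pair: math.log2(n) for pair, n in counts.items()}
```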

Evaluation of genome completeness, continuity, and accuracy

To evaluate the completeness of genome assembly, we checked the mapping rates by aligning RNA-seq reads from five libraries and DNA short reads from four libraries to the final assembly using HISAT2 and BWA-MEM, respectively, and performed BUSCO analysis. We assessed the continuity of the final assembly of poplar 84K along with two other poplar genomes, P. trichocarpa and P. deltoides (assembly version 2.1, produced by the US Department of Energy Joint Genome Institute in collaboration with the user community), using LTR_retriever by calculating the LTR Assembly Index (LAI) score that evaluates the de novo assembly quality of intergenic and repetitive regions. Raw LAI is defined as the ratio of intact LTR retrotransposon length to total LTR sequence length. After standardization, the corrected LAI can be used as a reference-free genome metric regardless of genome size, LTR retrotransposon content, and gene content completeness. The accuracy of the final assembly was estimated by aligning Illumina short reads using BWA-MEM and GATK to call variants.
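The raw LAI definition given above reduces to a single ratio. The sketch below implements only that ratio; the standardization step performed by LTR_retriever is not reproduced. The `scale` parameter is an assumption: the scores reported later in this paper (for example 14.79) appear to be on a percent scale, so the default multiplies the ratio by 100.

```python
def raw_lai(intact_ltr_rt_length, total_ltr_length, scale=100.0):
    """Raw LTR Assembly Index: intact LTR retrotransposon length
    divided by total LTR sequence length.

    scale=100.0 expresses the ratio as a percentage (an assumption
    made to match the magnitude of published LAI scores).
    """
    if total_ltr_length <= 0:
        raise ValueError("total LTR sequence length must be positive")
    if intact_ltr_rt_length > total_ltr_length:
        raise ValueError("intact length cannot exceed total length")
    return scale * intact_ltr_rt_length / total_ltr_length
```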

Gene prediction and annotation

Tandem repeats within the poplar 84K genome were identified with Tandem Repeat Finder. To identify known transposable elements (TEs), we used RepeatMasker, loaded with the Repbase and the Dfam library databases. Homology-based ncRNA annotation was performed by mapping plant miRNA and snRNA genes from the Rfam database (release 14.0) to the poplar 84K genome using Infernal. tRNAscan-SE was used for tRNA annotation. RNAmmer was used to predict rRNAs and their subunits. Protein-coding genes were predicted based on transcriptomic, homologous, and de novo methods; the details, along with functional annotation of the predicted gene models, are described in Supplementary Materials.

Subgenome assignment

The chromosome pairs were firstly identified with MCScanX, based on protein collinearity between all the 38 Hi-C linkage groups in poplar 84K and all the 19 chromosomes of P. trichocarpa. The pairwise relationship of the chromosomes was further confirmed by their DNA-level collinearity identified using nucmer. For each chromosome pair, the chromosomes derived from the parental species were distinguished by similarity to the genomes of related species P. alba var. pyramidalis and P. tremula, and were assigned to subgenome A and G, respectively.
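The assignment step for one chromosome pair can be sketched as a comparison of similarity scores. This is a simplified stand-in for the analysis described above: the per-chromosome identity values, the function name, and the rule of choosing the labelling with the higher summed identity are all assumptions made for illustration.

```python
def assign_subgenomes(pair, identity_to_alba, identity_to_tremula):
    """Label the two homologous chromosomes of a pair as A or G.

    pair: tuple of two linkage-group names
    identity_to_alba / identity_to_tremula: name -> similarity score
    against P. alba var. pyramidalis and P. tremula (hypothetical
    inputs, e.g. mean alignment identity).
    """
    first, second = pair
    # Score of "first is A, second is G" versus the reverse labelling.
    forward = identity_to_alba[first] + identity_to_tremula[second]
    reverse = identity_to_alba[second] + identity_to_tremula[first]
    if forward == reverse:
        raise ValueError("similarity scores do not separate the pair")
    if forward > reverse:
        return {first: "A", second: "G"}
    return {first: "G", second: "A"}
```

With made-up scores such as 98.5 and 95.0 against P. alba var. pyramidalis and 95.2 and 98.1 against P. tremula, the first chromosome would be labelled A and the second G.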

Evolutionary analysis of chloroplast and mitochondrial genome

Phylogenetic tree reconstruction of chloroplast and mitochondrial protein coding regions was described in Supplementary Materials. The organelle-DNA similar fragments in nuclear genome were detected using BLASTN with the chloroplast and mitochondrial contigs as query to search against poplar 84K genome. The parameters of BLASTN and the filtration setting were as described previously.

Analysis of genome collinearity and variation between subgenome A and G

Collinear blocks between subgenome A and subgenome G were determined using MCScanX and plotted with Circos, treating at least eight genes as a collinear block. We further aligned the DNA sequences of the two subgenomes using nucmer, and only retained uniquely anchored sequences larger than 10 kb for variant calling by the Assemblytics software.
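The two thresholds in this step (at least eight genes per collinear block, uniquely anchored alignments larger than 10 kb) amount to simple filters. The sketch below assumes hypothetical in-memory records rather than the actual MCScanX or nucmer output formats.

```python
def filter_collinear_blocks(blocks, min_genes=8):
    """Keep blocks (lists of gene pairs) with at least min_genes pairs."""
    return [block for block in blocks if len(block) >= min_genes]


def filter_alignments(alignments, min_length=10_000):
    """Keep uniquely anchored alignments longer than min_length bp.

    alignments: list of dicts with hypothetical keys
    'length' (bp) and 'unique' (bool).
    """
    return [
        aln for aln in alignments
        if aln["unique"] and aln["length"] > min_length
    ]
```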

Transcriptional profiles of the allelic genes in subgenomes

Because the RNA-seq experiments performed in this study did not contain replicates and covered few tissues, we downloaded the RNA-seq datasets of four tissues in poplar 84K and of four tissues in P. tremula x P. alba from the GenBank SRA database (Supplementary Table S2) and mapped them to the genome of poplar 84K using HISAT2. Gene expression values were calculated as FPKM and TPM values using the StringTie software. Allelic genes were determined by MCScanX, and those showing transcriptional bias between subgenome A and G were identified using edgeR with a 4-fold change and a P-value of <0.05. The qRT-PCR method is described in Supplementary Materials.
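The decision rule (4-fold change and P < 0.05) can be written as a small classifier. The sketch does not reproduce the edgeR statistics: the P-value is taken as an input, and the pseudocount added before taking the ratio is an assumption introduced to avoid division by zero.

```python
import math


def classify_bias(expr_a, expr_g, p_value,
                  min_fold=4.0, max_p=0.05, pseudocount=1.0):
    """Classify one allelic gene pair as biased to 'A', 'G', or 'none'.

    expr_a, expr_g: expression of the subgenome A and G alleles
    p_value: significance from a differential-expression test
             (computed elsewhere, e.g. by edgeR)
    pseudocount: illustrative assumption, not taken from the paper
    """
    log2_fc = math.log2((expr_g + pseudocount) / (expr_a + pseudocount))
    if p_value >= max_p or abs(log2_fc) < math.log2(min_fold):
        return "none"
    return "G" if log2_fc > 0 else "A"
```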

3. Results and discussion

Genome assembly and identification of potential contamination and organelle genomes

Two DNA libraries carrying 270 and 500 bp insertions, and one DNA library carrying 20 kb insertions, were constructed and sequenced on the Illumina NovaSeq 6000 and PacBio Sequel platforms, producing 56.3 Gb (Supplementary Table S3a) and 48.4 Gb (Supplementary Table S3b) of data, respectively. The mean and N50 of PacBio subread lengths were 8.1 kb and 13.7 kb, respectively (Supplementary Table S3b, Fig. S1). The haploid genome size of poplar 84K was estimated to be 470.155 ± 5.94 Mb by flow cytometry (Supplementary Fig. S2, Table S4) and 427.2 Mb based on k-mer depth distribution (Supplementary Fig. S3). Heterozygosity was estimated to be 2.16% using GCE software. A 753.8-Mb genome assembly of poplar 84K was constructed (Table 1), comprising 1,384 contigs with a contig N50 of 2.24 Mb. The difference between assembly size and estimated genome size may be due to the heterozygous character of poplar 84K and was investigated further in the following analysis. To remove fungal and bacterial DNA contamination, for 1,142 contigs less than 1 Mb (total 157,986,688 bp), we performed a BLASTN search against the non-redundant nucleotide (NT) database in GenBank with an E-value of 1E-5. A total of 5,264,546 bp from 114 contigs were identified as bacterial or fungal contamination, and were excluded from further analysis. Three contigs with lengths of 40,693, 115,255, and 64,703 bp were found to have high identities to the chloroplast sequence of P. tremula x P. alba (NC_028504.1). The chloroplast genome (156,462 bp) was then assembled from these contigs based on a collinear relationship with the chloroplast sequence of P. tremula x P. alba (NC_028504.1). One contig of 874,696 bp was found to encode the mitochondrial genome. Gene structure annotations and figures of the organelle genomes were produced using GESEQ (Supplementary Figs S4 and S5). After removal of microbial contamination and organelle sequences, 747.5 Mb of the poplar 84K nuclear genome were obtained.
With the 95.2 Gb clean reads generated by Hi-C sequencing (Supplementary Table S3d), a total of 1,042 contigs were clustered into 38 groups for pseudomolecule construction. Of these, 544 contigs, with total length of 710 Mb, were ordered and oriented in all linkage groups (Fig. 1a, Table 1).
Table 1

Statistics of genome and subgenome assembly of poplar 84K using different sequencing data

| Length (bp) / number | PacBio assembly | Hi-C assembly | Pseudomolecules | Subgenome A | Subgenome G |
|---|---|---|---|---|---|
| N90 | 429,473 | 12,722,015 | 13,367,995 | 13,017,457 | 13,367,995 |
| N80 | 959,980 | 13,867,023 | 14,133,090 | 13,907,236 | 14,133,090 |
| N70 | 1,381,280 | 14,269,014 | 16,373,422 | 16,345,784 | 16,373,422 |
| N60 | 1,818,021 | 17,170,365 | 17,428,093 | 17,170,365 | 17,629,401 |
| N50 | 2,239,559 | 19,641,611 | 19,971,399 | 18,769,030 | 20,657,227 |
| N40 | 2,832,264 | 20,657,227 | 20,657,227 | 20,018,725 | 21,475,114 |
| N30 | 3,174,752 | 21,475,114 | 21,827,254 | 21,194,248 | 22,685,518 |
| N20 | 4,522,243 | 23,093,407 | 24,417,576 | 24,417,576 | 25,197,140 |
| N10 | 6,295,541 | 49,053,667 | 49,053,667 | 49,053,667 | 49,243,677 |
| Maximum length | 12,124,769 | 49,243,677 | 49,243,677 | 49,053,667 | 49,243,677 |
| Minimum length | 1,245 | 1,245 | 6,540,224 | 11,888,367 | 6,540,224 |
| Total length | 753,822,854 (a) | 747,538,837 (b) | 710,053,368 | 356,027,211 | 354,026,157 |
| Total sequences | 1,384 | 841 | 38 | 19 | 19 |
| Gap numbers | | 505 | 505 | 227 | 278 |

a The original assembly obtained using CANU.

b Poplar 84K nuclear genome assembly after removing organellar sequences and microorganism contamination.
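For readers unfamiliar with the Nx statistics in Table 1, the following sketch shows the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches x% of the assembly. The function name is illustrative, and the toy lengths are not taken from the assembly.

```python
def nx(lengths, x=50):
    """Return the Nx value of a set of sequence lengths.

    Sequences are sorted from longest to shortest; Nx is the length
    of the sequence at which the cumulative sum first reaches x% of
    the total length.
    """
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must not be empty")
```

For the toy input `[2, 3, 4, 5, 6]`, the total is 20, so N50 is 5 (6 + 5 reaches 10) and N90 is 3 (6 + 5 + 4 + 3 reaches 18).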

Figure 1

Assessment of the assembled 84K genome. (a) Interaction frequency distributions of Hi-C linkage groups. The log2 of the number of valid Hi-C interaction links between any pair of 500 kb non-overlapping bins was calculated and is displayed as a heatmap by HiCPlotter. The black/white bar of the heatmap indicates the interaction frequency of the Hi-C links. The square indicates the abnormally high frequency of interaction between linkage group 35 (chromosome 11 of subgenome G) and linkage group 10 (chromosome 12 of subgenome A). (b) BUSCO analysis of the genomes from poplar 84K, P. trichocarpa, P. alba var. pyramidalis, and P. tremula. (C) complete; (S) single-copy; (D) duplicated; (F) fragmented; (M) missing.


Evaluation of completeness, continuity, and accuracy of final assembly

We found that 95.64–97.88% of the RNA-seq reads and 95.74–97.74% of the DNA short reads could be aligned to the final assembly (Supplementary Table S5). BUSCO analysis showed that 1,386 (96.3%) of 1,440 plant single-copy orthologues were complete, but 1,291 (93.1%) of these were present as duplicated copies. Similar percentages of the 1,440 plant single-copy orthologues were detected in P. trichocarpa, P. alba var. pyramidalis, and P. tremula, and at least 85.1% of them were single copy in these species (Fig. 1b). This suggested that both copies of the heterozygous regions had been assembled, so that these regions could be separated using parental genome information. We assessed the continuity of the final assembly of poplar 84K in comparison to two poplar reference genomes, P. trichocarpa and P. deltoides (assembly version 2.1, produced by the US Department of Energy Joint Genome Institute in collaboration with the user community). The standardized LAI scores are 9.34 in P. trichocarpa, 7.95 in P. deltoides, and 14.79 in poplar 84K. This implies the poplar 84K assembly has achieved reference genome quality. The accuracy of the final assembly was estimated by aligning Illumina short reads using BWA-MEM and calling variants with GATK. A total of 318 homozygous SNPs and 6,398 homozygous INDELs were considered as errors, and 35,735 heterozygous SNPs and 24,860 heterozygous INDELs were found.

Genome annotation and subgenome assignment

We identified 184.0 Mb of TE sequence (24.4% of the assembly) (Supplementary Table S6). The largest class of TEs comprised retrotransposons, accounting for 18.2% of the assembly, and consisted mostly of Gypsy and Copia retrotransposon families. DNA transposons accounted for 3.8% of the assembly. These analyses also identified 1,983 miRNAs, 1,312 tRNAs, 1,140 rRNAs, and 1,126 snRNAs (Supplementary Table S7). Four RNA libraries for leaves and one library for shoots were constructed and sequenced using the Illumina NovaSeq platform, generating 145.8 million read pairs for genome annotation (Supplementary Table S3c). Finally, 85,755 consensus protein-coding genes were predicted, with the average gene length, average transcript length, average CDS length, and exon number per gene being 2,948 bp, 2,937 bp, 1,075 bp and 4.48, respectively (Supplementary Table S8). Functional annotation of the predicted protein-coding genes revealed that 72,574 of the total of 85,755 genes (84.6%) could be assigned putative functions (Supplementary Table S9).

Chromosome pairs were first identified based on protein collinearity between all the 38 Hi-C linkage groups in poplar 84K and all the 19 chromosomes of P. trichocarpa (Supplementary Fig. S6). The pairwise relationship of the chromosomes was further confirmed by their DNA-level collinearity (Supplementary Fig. S7). For each chromosome pair, the chromosomes derived from the parental species were distinguished by similarity to the genomes of related species P. alba var. pyramidalis and P. tremula (Supplementary Fig. S8). The chromosome numbering and orientation were determined by comparison to P. trichocarpa (Supplementary Fig. S6 and Table S10). Ultimately, 356 Mb from female parental species P. alba and 354 Mb from male parental species P. tremula var. glandulosa were obtained, and were designated as subgenome A and subgenome G. BUSCO analysis indicated each subgenome has characteristics similar to those of P. trichocarpa, P. alba var. pyramidalis and P. tremula (Fig. 1b), which implies high completeness of the two subgenomes. Specifically, the numbers of duplicated genes dropped from 1,291 (89.7%) to 187 in subgenome A and to 186 in subgenome G. This supports our previous speculation that combination of two haploid genomes causes the assembly size to increase by 30%.
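The 1,291 duplicated orthologues are quoted with two different percentages in this paper (93.1% in the completeness evaluation and 89.7% here). A short check, using only counts stated in the text, suggests the difference lies in the denominator: the 1,386 complete orthologues in one case and all 1,440 orthologues in the other. The variable names are illustrative.

```python
TOTAL_BUSCOS = 1440      # plant single-copy orthologues searched
COMPLETE_BUSCOS = 1386   # found complete in the 84K assembly
DUPLICATED_BUSCOS = 1291 # found as duplicated copies


def percent(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of duplicated copies among the complete orthologues.
dup_of_complete = percent(DUPLICATED_BUSCOS, COMPLETE_BUSCOS)

# Share of duplicated copies among all orthologues searched.
dup_of_total = percent(DUPLICATED_BUSCOS, TOTAL_BUSCOS)

print(dup_of_complete, dup_of_total)
```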

Characteristics of the poplar 84K genome

The characteristics of the two subgenomes, along with sequencing depths, are shown in Fig. 2, including the chromosome length, gene density, TE content, GC content, and location of collinear regions. Chromosome 11 of subgenome G (6.54 Mb) was much shorter than that of subgenome A (17.42 Mb) (Fig. 2 and Supplementary Table S10). This may be due to large fragment loss during inter-species hybridization and/or subsequent chromosome stabilization, or defects of the current assembly algorithm in dealing with the regions with high identities. Another anomalous region is a 6.92 Mb fragment at the 5′ end of chromosome 12 of subgenome A, which showed a doubled sequencing depth as compared with other regions (Fig. 2). This 6.92 Mb region also showed a high frequency of interaction with the short chromosome 11 of subgenome G (Fig. 1a). We speculate that one additional 6.92 Mb region derived from recent large DNA fragment duplication may be responsible for the connection to chromosome 11 of subgenome G.
Figure 2

Characteristics of the poplar 84K genome. From the outer edge inward: (a) circles represent the subgenomes A (right) and G (left); (b) gene density on each chromosome; (c) repeat density at 10 kb intervals; (d) GC content at 10 kb intervals; (e) the sequencing depth of Illumina short reads at 10 kb intervals, and collinear blocks linked by grey lines.

Further phylogenetic analyses of chloroplast proteins were conducted. The tree topology indicated that P. trichocarpa is closely related to P. balsamifera and P. fremontii (Supplementary Fig. S9), consistent with previous studies. P. yunnanensis formed a clade with P. alba, P. tremula and P. tremula x P. alba, within which the poplar 84K genome formed a sister branch with the genome of P. alba (Supplementary Fig. S9). The mitochondrial genome of poplar 84K also showed maternal inheritance in that it was most closely related to that of P. alba (Supplementary Fig. S10). The lateral transfer of organelle fragments into the nuclear genome has been widely reported in plants. To gain insight into the scale of such transfer and to exclude the possibility of misidentification of organelle genomes in poplar 84K, we analysed the numbers of insertions, sequence identities, and maximum and total lengths of organelle-DNA similar fragments in the nuclear genome. About 2,234 and 2,093 insertions with total lengths of 533 kb and 530 kb of chloroplast sequences, and 3,916 and 3,844 insertions with total lengths of 597.2 kb and 625.6 kb of mitochondrial sequences, were found in all the chromosomes of subgenome A and G, respectively (Supplementary Table S11). The mean insertion lengths were 236 bp for chloroplasts and 155 bp for mitochondria (Supplementary Fig. S11). The longest chloroplast and mitochondrial insertion lengths were 15.2 kb (98.9% identity) and 19.1 kb (98.1% identity), respectively. The statistics of organelle-DNA similar fragments in the nuclear genome of poplar 84K were similar to those previously reported for P. trichocarpa.

We performed orthologous assignment and gene family comparison in the two poplar 84K subgenomes and four other Populus species using OrthoFinder. A total of 30,342 clusters were obtained, among which 17,690 (58.30%) were shared by the four genomes and two poplar 84K subgenomes (Supplementary Fig. S12).

Genome collinearity and variation between subgenome A and G

About 323 Mb of regions of collinearity, including 28,974 allelic gene pairs, were identified between the two subgenomes (90.7% of subgenome A and 91.2% of subgenome G). We then aligned the DNA sequences of the two subgenomes and only retained uniquely anchored sequences larger than 10 kb for variant calling. We found 5,398,437 SNPs and 1,108,357 indels (1 bp–10 kb) in these collinear regions. Such variations within allelic genes may influence their gene structures, expression patterns, functions, and regulation, which together contribute to the morphological and physiological characteristics of hybrid 84K.

Transcriptional bias of the allelic genes in the two subgenomes

We further explored the expression bias of allelic genes in the two subgenomes by mapping RNA-seq data of four tissues from 84K and four tissues from backcross hybrid P. tremula x P. alba to the poplar 84K genome assembly (Supplementary Table S2). Of all the 28,974 allelic gene pairs, 6,780 gene pairs were found with transcriptional bias in poplar 84K, and 5,349 in P. tremula x P. alba (Fig. 3a and b, Supplementary Table S12). Among these allelic genes with transcriptional bias, 2,121 gene pairs showed bias in all four tissues in poplar 84K, the corresponding number in P. tremula x P. alba being 1,166 (Fig. 3a and b). We further calculated the number of genes that showed transcriptional bias to subgenomes A and G in the two hybrids. Interestingly, we found that more than 54.4% of the allelic genes with transcriptional bias favoured subgenome G, and less than 45.6% favoured subgenome A, in all the tissues that were used (Table 2), which implies subgenome G plays a more important role during the growth process in both hybrids. We selected four allelic gene pairs and performed qRT-PCR analysis in shoots and leaves, which showed expression patterns that were consistent with the RNA-seq data (Supplementary Fig. S13, Table S13). The similarities of promoters were additionally analysed between the allelic pairs showing expression bias. The result indicated that divergence in promoter regions was higher in the allelic pairs having expression bias than in those without detectable expression bias (Fig. 3c).
Figure 3

Allelic genes with transcriptional bias and their promoter region identity in poplar 84K and P. tremula x P. alba. (a) Identified allelic genes with transcriptional bias in poplar 84K. (b) Identified allelic genes with transcriptional bias in P. tremula x P. alba. (c) Boxplot of the identity in allelic gene promoter region. Significant differences of identity between allelic genes with transcriptional bias and no-bias were supported by the Kolmogorov–Smirnov test (P-value < 0.001) in poplar 84K and P. tremula x P. alba.

Table 2

Genes with transcriptional bias within different tissues in two poplar hybrids

| Species | Tissues | DEG numbers | Dominant expression in subgenome A | Dominant expression in subgenome G |
|---|---|---|---|---|
| P. alba x P. tremula var. glandulosa | Shoot | 4,124 | 1,856 (45.00%) | 2,268 (55.00%) |
| P. alba x P. tremula var. glandulosa | Rooting | 4,109 | 1,826 (44.44%) | 2,283 (55.56%) |
| P. alba x P. tremula var. glandulosa | Callus | 3,965 | 1,802 (45.45%) | 2,163 (54.55%) |
| P. alba x P. tremula var. glandulosa | Leaf | 4,253 | 1,936 (45.52%) | 2,317 (54.48%) |
| P. tremula x P. alba | Leaf | 2,792 | 1,065 (38.14%) | 1,727 (61.86%) |
| P. tremula x P. alba | Bark | 3,197 | 1,314 (41.10%) | 1,883 (58.90%) |
| P. tremula x P. alba | Xylem | 2,808 | 1,150 (40.95%) | 1,658 (59.05%) |
| P. tremula x P. alba | Root | 3,121 | 1,288 (41.27%) | 1,833 (58.73%) |

P. alba x P. tremula var. glandulosa = 84K.

Further examination uncovered 285 allelic gene pairs with bias to subgenome A (Supplementary Fig. S14a) and 339 with bias toward subgenome G, for all the tissues and in both hybrids (Supplementary Fig. S14b). These genes might be essential, playing important functions in both hybrids. Gene enrichment analysis of these two datasets revealed that protein metabolic process and nitrogen compound metabolic process were represented in both subgenomes. Subgenome A has more allelic gene pairs in the categories of monovalent inorganic cation transport and chromatin organization, whereas subgenome G has more in the categories of mRNA processing, ncRNA metabolic process, and sulphur compound metabolic process. As for their cellular component classifications, allelic gene pairs showing bias to subgenome A function in mitochondria and the nucleoplasm, and those in subgenome G play roles in plastids, endosomes, vesicles, and ribosomes (Supplementary Table S14). This implies these genes function in different cellular compartments and different biological processes. More transcriptome data will be needed to obtain further insights into the transcriptional bias of each subgenome in different developmental stages under different stresses.

4. Summary and conclusions

We employed PacBio single-molecule real-time sequencing and Hi-C technology to generate a reference sequence of the poplar 84K (Populus alba x Populus tremula var. glandulosa) genome. The genome sequences were assembled to the chromosome level, with a contig N50 size of 1.99 Mb and a scaffold N50 size of 19.6 Mb. About 356 Mb from the female parental species (P. alba) and 354 Mb from the male parental species (P. tremula var. glandulosa) were assigned. The two subgenomes showed high collinearity over 323 Mb. The allelic gene pairs with transcriptional bias toward subgenome A or G function in different cellular compartments and different biological processes. More allelic gene pairs showed transcriptional bias toward subgenome G in poplar 84K and in another widely used hybrid, P. tremula x P. alba. The dominant expression of subgenome G indicates that it plays a more important role than subgenome A during hybrid growth. The high-quality genome of poplar 84K provides an important gene resource for poplar breeding and molecular biology research.

Data availability

The sequencing reads from each sequencing library have been deposited at NCBI under Project ID PRJNA556338 and at the CNGB Nucleotide Sequence Archive (CNSA) under Project ID CNP0000339. Software versions and main parameters are provided in Supplementary Table S1 in the Supplementary Materials.