Literature DB >> 33730157

Genome Assembly of Salicaceae Populus deltoides (Eastern Cottonwood) I-69 Based on Nanopore Sequencing and Hi-C Technologies.

Shengjun Bai¹, Hainan Wu¹, Jinpeng Zhang¹, Zhiliang Pan¹, Wei Zhao¹, Zhiting Li¹, Chunfa Tong¹.

Abstract

Populus deltoides has important ecological and economic values, widely used in poplar breeding programs due to its superior characteristics such as rapid growth and resistance to disease. Although the genome sequence of P. deltoides WV94 is available, the assembly is fragmented. Here, we reported an improved chromosome-level assembly of the P. deltoides cultivar I-69 by combining Nanopore sequencing and chromosome conformation capture (Hi-C) technologies. The assembly was 429.3 Mb in size and contained 657 contigs with a contig N50 length of 2.62 Mb. Hi-C scaffolding of the contigs generated 19 chromosome-level sequences, which covered 97.4% (418 Mb) of the total assembly size. Moreover, repetitive sequences annotation showed that 39.28% of the P. deltoides genome was composed of interspersed elements, including retroelements (23.66%), DNA transposons (6.83%), and unclassified elements (8.79%). We also identified a total of 44 362 protein-coding genes in the current P. deltoides assembly. Compared with the previous genome assembly of P. deltoides WV94, the current assembly had some significantly improved qualities: the contig N50 increased 3.5-fold and the proportion of gaps decreased from 3.2% to 0.08%. This high-quality, well-annotated genome assembly provides a reliable genomic resource for identifying genome variants among individuals, mining candidate genes that control growth and wood quality traits, and facilitating further application of genomics-assisted breeding in populations related to P. deltoides. © The American Genetic Association. 2021.

Entities: Chemical

Keywords: zzm321990 Populus deltoideszzm321990 ; Nanopore sequencing; chromosome conformation capture technology; genome assembly

Mesh：

Year: 2021 PMID： 33730157 PMCID： PMC8141683 DOI： 10.1093/jhered/esab010

Source DB: PubMed Journal: J Hered ISSN： 0022-1503 Impact factor: 2.645

Populus deltoides is one of the approximately 30 species in the genus Populus (Salicaceae), which can be classified into 6 sections, including Abaso, Aigeiros, Leucoides, Populus, Tacamahaca, and Turanga (Eckenwalder 1996). Populus deltoides belongs to the section Aigeiros and is naturally distributed from the southeastern United States to southern Canada (Fahrenkrog et al. 2017). It plays an important role in forest breeding research and its genetic resources have become the main gene donors of poplar cultivars (Chen et al. 2020). One of the most important cultivars generated from this species, namely I-69, has the superior characteristics of rapid growth and resistance to Marssonina leaf spot disease, which usually causes significant economic and ecological losses in poplar plantations in most regions of China (Han et al. 2000; Zhu et al. 2012). To accelerate the domestication of P. deltoides for improving the wood quality and yield to meet the growing demand in industry, a variety of genetic and genomic resources need to be developed, especially the high-quality genome sequence (Fang et al. 2018). Furthermore, an accurate genome sequence would be beneficial to performing comparative genomics and studying the evolution of the species. In forest trees, poplar genomes are relatively small, but they are highly heterozygous, and possess abundant repetitive sequences (Michael and VanBuren 2020). In recent years, the genomes of several poplar species have been sequenced, assembled, and annotated, including Populus trichocarpa (Tuskan et al. 2006), Populus pruinosa (Yang et al. 2017), Populus alba (Liu et al. 2019), Populus euphratica (Zhang et al. 2020), and Populus simonii (Wu et al. 2020). Although the genome sequence of P. deltoides WV94 is available in the Phytozome database (http://phytozome.jgi.doe.gov), it contains many gaps (3.2%) and has a lot of sequences that were unanchored into chromosomes, thereby impeding downstream analysis. Therefore, there is an active demand to improve the current assembly of P. deltoides genome. Recent advances in Nanopore sequencing and chromosome conformation capture (Hi-C) technologies play an important role in improving plant genomes at the chromosome level (Giani et al. 2020). Nanopore sequencing exhibits many advantageous qualities, including high sequencing throughput, low cost, and extremely long-read lengths that can span repetitive regions (Cali et al. 2019). Moreover, Hi-C technology has emerged as a robust tool to reconstruct genomes at the chromosome level, providing long-range information about the grouping and linear organization of sequences along entire chromosomes (Burton et al. 2013). Recently, several high-quality plant genomes had been obtained by using Nanopore sequencing and Hi-C technologies, including Eriobotrya japonica (Jiang et al. 2020), Aquilaria sinensis (Ding et al. 2020), and Asparagus setaceus (Li et al. 2020). In this study, we performed the de novo genome assembly of the P. deltoides cultivar I-69 using Nanopore sequencing and Hi-C technologies. The P. deltoides I-69 was originally collected from a natural population in Illinois, United States, and selected as a cultivar in Italy in the 1950s. Later on, it was introduced to China in 1972 and greatly improved industrial timber production, agroforestry and ecoremediation since then (Zhang et al. 2009). As a female parent, the P. deltoides I-69 was crossed with P. simonii L-3 to generate an F1 hybrid population for constructing genetic linkage map and locating quantitative trait loci (QTL) in Populus (Mousavi et al. 2016; Tong et al. 2016). Although we successfully obtained the genome sequence of the male parent of P. simonii in a previous study (Wu et al. 2020), the lack of high-quality genomic information of the female parent P. deltoides I-69 was a barrier to develop new cultivars in the F1 hybrid population. Therefore, we conducted the whole genome sequencing and performed de novo genome assembly of the female parent, resulting in an improved genome sequence of P. deltoides. The high-quality assembly of the P. deltoides genome provided a valuable resource for poplar breeding and genetic improvement, as well as comparative genomics analysis with related species.

Methods

Biological Materials

A single cutting of the P. deltoides cultivar I-69 was collected from the Siyang Forest Farm in Siyang country, Jiangsu Province, China and planted at the Xiashu Forest Farm of Nanjing Forestry University, Jurong, Jingsu Province, China in 2012 (Mousavi et al. 2016). Fresh leaves were collected from the clonal tree, immediately transferred to liquid nitrogen, and stored until DNA extraction.

DNA Sequencing and Genome Assembly

Genomic DNA was extracted using the Qiagen DNeasy Blood and Tissue kit and evaluated on a 0.75% agarose gel. The concentration and purity of the genomic DNA were determined by Nanodrop (OD260/280) and Qubit (Thermo Fisher Scientific). To generate Nanopore long reads, ~15 µg of genomic DNA was size-selected (>20 Kb) with a Blue Pippin System (Sage Science). According to 1D Genomic DNA by Ligation protocol (Oxford Nanopore Technologies, ONT), ONT SQK-LSK108 library preparation kit was used to construct PCR-free libraries. Nextomics (Nextomic Biosciences Technologies Corporation, Wuhan, China) and Biozeron (Shanghai Biozeron Biotechnology Corporation, Shanghai, China) performed the Nanopore sequencing of prepared libraries on 2 flow-cells using the GrildION X5 sequencer according to the manufacturer’s instructions (Oxford Nanopore, Oxford, UK). The raw signal data were base-called using ONT MinKNOW software (v1.4.2), and the statistics of the generating Nanopore sequencing data were calculated using Nanostat (v0.8.1) (De Coster et al. 2018). In addition, using the same individual tree described above, we sequenced the whole genome of P. deltoides by the Illumina HiSeq 2000 sequencing platform at BMK (Biomarker Technologies Corporation, Beijing, China). The detailed procedure used for whole-genome sequencing was described in Mousavi et al. (2016). The paired-end (PE) reads data are available from the NCBI SRA database under the accession number SRP071167. For the purpose of correcting the contigs assembled with the Nanopore sequencing data, these whole-genome sequencing data were processed to filter the low-quality reads using the NGS QC Toolkit (v2.3.3) (Patel and Jain 2012) with default parameters to obtain high-quality reads. To assist gene prediction, RNA-sequencing (RNA-seq) data were generated from samples of leaf and stem tissue. Total RNA was isolated using TRIzol Reagent (Invitrogen). RNA libraries were prepared using the TruSeq RNA library Preparation Kit (Illumina), and then sequenced on the Illumina HiSeq 2000 platform with PE mode at Beijing Genomics Institute (BGI). The generating RNA-seq data were also filtered using the NGS QC Toolkit (v2.3.3) with default parameters (Patel and Jain 2012). The Hi-C library was constructed and sequenced by Frasergen (Wuhan Frasergen Biotechnology Corporation, Wuhan, China) following the standard procedures. Briefly, nuclear chromatin was cross-linked by 1% formaldehyde at room temperature for 30 min. The cross-linked DNA was extracted and then digested with MboI restriction enzyme at 37 °C for 4 h, end-labeled with biotin, and ligated using T4 DNA ligase. The cross-links were reversed by overnight incubation at 65 °C, and the circular DNA was sheared into 300–500 bp fragments. After end repairing, the DNA fragments were purified and sequenced on Illumina HiSeq X Ten platform with PE mode. The software packages used in the genome assembly and annotation were summarized in Table 1. Oxford Nanopore sequencing data were employed for genome assembly by using the software Canu (v1.9) with default parameters (Koren et al. 2017). To improve the base accuracy, the assembled genome sequence was polished in 3 steps. First, the Nanopore reads were mapped back to the assembled contigs with minimap2 (v2.12) (Li 2018), and then these contigs were polished 3 rounds with Racon (v1.4.3) with default parameters (Vaser et al. 2017). Second, one round of Medaka (v1.4.3) (https://github.com/nanoporetech/medaka) polishing was performed with parameter “-m r941_min_fast_g303” using the Nanopore reads. Third, high-quality Illumina short reads were mapped to the Medaka consensus sequence using BWA-MEM (v0.7.17) (Li 2013), and then subjected to 3 rounds of polishing with NextPolish (v1.1.0) (Hu et al. 2020). Additionally, considering the high heterozygosity of forest tree genome could hinder downstream analysis, we aligned the Nanopore reads to the polished contigs using Minimap2 (v2.12) (Li 2018). Based on this alignment, Purge Haplotigs (v1.1.1) (Roach et al. 2018) was used to identify and remove redundant heterozygous sequences, resulting in the draft genome of the P. deltoides I-69.

Table 1.

Software used for genome assembly and annotation pipeline in the current study

	Software	Version
Genome assembly pipeline
De novo assembly	Canu	v1.9
Contig polishing	Racon	v1.4.3
	Medaka	v1.4.3
	NextPolish	v1.1.0
Remove heterozygous sequences	Minimap2	v2.12
	Purge Haplotigs	v1.1.1
Hi-C contact map generation	Juicer	v1.7.6
Hi-C scaffolding	3D-DNA	v180114
Manual assembly inspection	Juicebox Assembly Tools	v1.11.8
Genome evaluation
Assembly completeness	BUSCO	v4.0.1
Read mapping	BWA-MEM	v0.7.17
SNP calling	Sambamba	v0.7.1
	GATK	v4.1.8
Repeat annotation
Construct a de novo repeat library	RepeatModeler	v2.0.1
Repeat assessment	RepeatMasker	v4.1.0
Gene prediction
De novo transcript assembly	Trinity	v2.1.1
Transcript evidence	PASA	v2.3.3
Homology prediction	tBlastn	v2.2.31
	Exonerate	v2.4.0
Ab initio prediction	SNAP	v2006-07-28
	GeneMark-ES	v4.59
	AUGUSTUS	v3.3.3
Integrate all gene structures	EVM	v1.1.1
Update final gene sets	PASA	V2.3.3

Software used for genome assembly and annotation pipeline in the current study To obtain the chromosome-level genome, we aligned the Hi-C reads to the draft genome and computed the Hi-C contact frequency between genomic loci using the Juicer pipeline (v1.7.6) (Durand et al. 2016). Afterwards, the contigs were corrected for mis-joins, ordered, oriented, and anchored into a candidate chromosome-length assembly using the 3D-DNA (v180114) with default parameters (Dudchenko et al. 2017). The candidate assembly was visualized with the Juicebox Assembly Tools (JBAT v1.11.8) (Durand et al. 2016). Then the misassembled contigs or the mis-joined scaffolds were manually detected and corrected based on the appearance of a bright band of elevated contact frequency along the diagonal of the Hi-C heatmap (Dudchenko et al. 2018). After JBAT review, we used the “run-asm-pipeline-post-review.sh” in 3D-DNA to generate the adjusted assembly, and then the scaffolds shorter than 2 Kb were removed.

Genome Evaluation

To assess the quality of the genome assembly, we mapped the high-quality Illumina short reads back to the final assembled genome using BWA-MEM (v0.7.17) (Li 2013). Then the duplicated reads were marked using the “markdup” function of Sambamba (v0.7.1) (Tarasov et al. 2015), and variant calling was performed to evaluate the accuracy of the genome at the single-base level using GATK HaplotypeCaller (v4.1.8) (DePristo et al. 2011). We further performed Benchmarking Universal Single-Copy Orthologs (BUSCO v4.0.1) analysis to evaluate the completeness of the P. deltoides I-69 assembly with embryophyta_odb10 database (Simao et al. 2015).

Repetitive Sequences Annotation

Plant genomes usually consist of large numbers of repeat elements that have been demonstrated to have structural and functional roles (Biscotti et al. 2015). Repetitive sequences were annotated using RepeatMasker (v4.1.0) (Tarailo-Graovac and Chen 2009) based on a combined library generated by de novo-based and homology-based approaches. We used the software RepeatModeler (v2.0.1) (Tarailo-Graovac and Chen 2009) to construct a de novo repeat library with the assembled genome sequence as input. Then combined with the known repeat library from Repbase (Bao et al. 2015), RepeatMasker was used to perform a homology-based repeat search. To ensure the integrity of genes in the subsequent analyses, low complexity or simple repeats were not masked because some of these sequences could be within genes (Xing et al. 2019).

Gene Prediction

The repeat-masked genome was used for annotating gene models by combining the methods of homology-based, RNA-Seq-assisted, and ab initio predictions. With the homology-based prediction approach, protein sequences from the genome annotations of P. trichocarpa (Tuskan et al. 2006), P. simonii (Wu et al. 2020), Salix suchowensis (Wei et al. 2020), P. euphratica (Zhang et al. 2020), and P. deltiodes WV94 (http://phytozome.jgi.doe.gov) were collected and aligned against the genome using tBlastn (v2.2.31) (Altschul et al. 1997). The blast hits were refined for exact intron and exon positions using Exonerate (v2.4.0) (Slater and Birney 2005). For RNA-seq-assisted prediction, high-quality Illumina RNA-Seq reads were de novo assembled into transcripts using Trinity (v2.1.1) with default parameters (Grabherr et al. 2011), and then these transcripts were aligned to the genome using PASA (v2.3.3) (Haas et al. 2003) to determine the potential gene models. For ab initio prediction, the following 3 ab initio prediction software packages were used: AUGUSTUS (v3.3.3) (Stanke et al. 2006), SNAP (v2006-07-28) (Korf 2004), and GeneMark (v4.59) (Ter-Hovhannisyan et al. 2008). The prediction results obtained from the above 3 methods were integrated by EVidenceModeler (v1.1.1) (Haas et al. 2008). Finally, PASA was used once again, in conjunction with transcripts, to update the untranslated regions and alternate splicing information of the predicted genes. The quality of the gene annotation was assessed using BUSCO (v4.0.1) (Simao et al. 2015) with the parameters: “-m proteins -l embryophyta_odb10.”

Results

Genome Assembly

A total of 44.4 Gb long-read data was obtained from the Oxford Nanopore sequencing platform, covering ~100× of the P. deltoides genome, with an average sequence length of 16 Kb and N50 of 25 Kb (Supplementary Figure 1). In addition to the long-read data, the data of Illumina short reads, Hi-C reads, and RNA-seq reads were also obtained and summarized in Table 2. De novo assembly was performed using the Canu software with the Nanopore sequencing data, and further polished for base accuracy in 3 steps (see Methods). As a result, the primary assembly of the P. deltoides consisted of 2186 contigs, amounting to 545 Mb with a contig N50 length of 1.83 Mb, and the length of longest contigs reached 16.8 Mb. After removing the potential heterozygous sequences in the primary genome (Supplementary Figure 2), the assembly size reduced to 429 Mb, with the number of contigs reduced to 657 and length of contig N50 increased to 2.62 Mb (Table 3).

Table 2.

Summary of sequencing data for the genome assembly and annotation of Populus deltoides I-69

Data type	Number of reads	Total bases(bp)	Coverage
HQ Illumina sequencing	235 343 502	23 769 512 714	55×
Nanopore sequencing	2 752 090	44 352 852 004	100×
Hi-C sequencing	340 036 044	51 005 406 600	118×
HQ RNA-seq of stem	49 765 508	7 464 826 200	17×
HQ RNA-seq of leaf	47 295 142	4 256 562 780	10×

HQ, high-quality.

Table 3.

Comparison between the assemblies of Populus deltoides WV94 and P. deltoides I-69 genomes

Assembly statistics	P. deltoides WV94	P. deltoides I-69
Assembly size (Mb)	446.8	429.3
Total number of contigs	NA	657
Longest contigs (Mb)	NA	16.8
Contig N50 (Mb)	0.59	2.62
Scaffold N50 (Mb)	21.7	21.5
Total number of scaffolds	1375	934
19 chromosomes (%)	90.2	97.4
Gaps (%)	3.2	0.08
GC content (%)	32.32	33.38
Complete BUSCOs (%)	97.7	98.2

NA, not applicable.

Summary of sequencing data for the genome assembly and annotation of Populus deltoides I-69 HQ, high-quality. Comparison between the assemblies of Populus deltoides WV94 and P. deltoides I-69 genomes NA, not applicable. To construct the chromosome-level genome, a total of 51.04 Gb Hi-C reads were generated. Before scaffolding the assembled contigs, we assessed the quality of Hi-C data, of which 99.18% were mapped to the draft genome and 46.9% were uniquely mapped. Based on Hi-C linking information (Figure 1), 421 contigs were identified as containing assembly errors and broken into 1716 contigs by using the 3D-DNA software. Then, 3D-DNA assigned 776 contigs with a total size of 418 Mb (97.4%) into 19 groups, perfectly matching the karyotype of Populus. Each group represented a chromosome-level assembly of which the order number was determined by aligning its sequence to the genome sequences of the P. deltoides W94 using Minimap2 (v2.12) (Li 2018). The length of the largest chromosome was 53.17 Mb, while the smallest one was 13.45 Mb. The scaffold N50 of the chromosome-level genome assembly reached 21.51 Mb (Supplementary Table 1).

Figure 1.

The genome-wide Hi-C interaction heatmap of 19 chromosomes in Populus deltoides. Each chromosome is framed with a bigger block, and each contig is framed with a smaller block. Heatmap shows Hi-C interactions under the resolution of 500 Kb. Deeper color indicates higher contact frequency. Based on the alignment of the high-quality Illumina short reads (Table 2) to the genome, the high mapped rate (98.82%) and coverage rate (96.9%) indicated a high consistency between the reads and the genome assembly. Through variant calling, we identified a total of 2 744 622 SNPs (0.64% of the genome), of which 63 341 SNPs (0.0147% of the genome) belonged to homozygous SNPs, indicating a high accuracy of genome assembly at the single-base level (Liang et al. 2020). The completeness of the genome assembly was evaluated using BUSCO with embryophyta_odb10 database. Of the 1614 plant-specific orthologs, 1585 (98.2%) were identified to be complete BUSCO genes, with 0.3% being partial BUSCOs identified and only 1.5% missed. Overall, all the results suggested that the quality of the assembly was high with respect to the base level accuracy and the completeness of the assembly. In addition, the alignment of our P. deltoides I-69 genome to the P. deltoides W94 genome with the MUMmer software (Kurtz et al. 2004) showed that there was a high degree of synteny between the 2 genome sequences (Krzywinski et al. 2009) (Supplementary Figure 3). Moreover, the differences in chromosome sizes among the genomes available in Populus were summarized in Supplementary Table 2. Repetitive sequences analysis of the assembly showed that the interspersed repeats comprised approximately 169 Mb (39.28%) of the P. deltoides genome, including retroelements (23.66%), DNA transposons (6.83%), and unclassified elements (8.79%) (Supplementary Table 3). This result was similar to the genomes of P. trichocarpa (44.06%) and P. alba (41.08%) (Liu et al. 2019). The Trinity assembly of the RNA-seq data contained 117 189 transcripts with a contig N50 of 2134 bp. Combining the ab initio, homology-based, and RNA-seq-assisted predictions, a total of 32 245 genes encoding 44 362 proteins were predicted in the P. deltoides genome. The average CDS length was 1258bp, with the average number of exons was 5.6 for a single CDS (Supplementary Table 4). With the obvious improvement in the contiguity of the P. deltoides I-69 genome, BUSCO analysis showed that the P. deltoides I-69 had more complete BUSCO genes identified (95.2%) than P. deltoides WV94 (92.7%) in the protein-coding genes (Figure 2). Furthermore, the software OrthoVenn2 was used to identify orthologous genes among the genomes of P. deltoides, P. euphratica, P. trichocarpa, P. simonii, and S. suchowensis (Lee et al. 2020). These species shared a total of 14 446 orthologous groups, and 986 unique orthologous groups for P. deltoides were identified (Supplementary Figure 4).

Figure 2.

Comparison of the BUSCO analysis between genome assemblies and gene annotations of Populus deltoides I-69 and P. deltoides WV94.

Discussion

The genome sequence of the P. deltoides I-69 was assembled by integrating Nanopore sequencing and Hi-C technologies, exhibiting some improvements over the genome assembly of P. deltoides WV94 which was assembled using PacBio sequencing technology and a genetic linkage map (http://phytozome.jgi.doe.gov). First, the contig N50 length of our assembly reached 2.62 Mb, a 3.5-fold improvement over that of P. deltoides WV94 (0.59 Mb), possibly due to the Nanopore reads generally being longer than the SMRT-seq reads (Rice and Green 2019). Second, although P. deltoides WV94 genome had its contigs assigned into chromosomes based on a genetic linkage map, there remained 1356 scaffolds with total size of 43.6 Mb unanchored, accounting for 9.8% of the assembly. By applying Hi-C technology, we accurately constructed 19 chromosomes with total size of 418 Mb accounting for 97.4% of the assembly size, only leaving about 12 Mb sequences unanchored into chromosomes. Third, the genome completeness analysis showed that more completely conserved genes (98.2%) were found in the P. deltoides I-69 than P. deltoides WV94 (97.5%) (Figure 2). These aspects demonstrated that our newly assembled genome achieved a higher level of continuity and quality, suggesting that it can be a good alternative resource for Populus. High heterozygosity is a common feature in forest tree genomes, and the software Canu assembles heterozygous regions into separate contigs when a pair of allelic sequences exceeds a certain threshold of nucleotide diversity, rather than the expected single haplotype-fused contig (Roach et al. 2018). Thus, the overall assembly size resulted from Canu will be larger than the haploid genome size, making it difficult to scaffold contigs and predict genes downstream (Guan et al. 2020). To solve such a problem, we applied the Purge Haplotigs software to remove the heterozygous sequences of the P. deltoides I-69 genome in this study (Roach et al. 2018). Before the analysis with Purge Haplotigs, the assembly size of the P. deltoides I-69 genome was 545 Mb, substantially exceeding the size of the existed P. deltoides WV94 genome (447 Mb). After the removing process, the assembly size of the P. deltoides I-69 reduced to 429 Mb, and the contig N50 increased from 1.8 to 2.62 Mb. Meanwhile, BUSCO analysis showed that the proportion of complete BUSCO genes was also increased from 96.6% to 98.2%. The most obvious effect was that the duplicated BUSCO proportion decreased from 30.9% to 15.7%, and the proportion of the complete and single-copy BUSCOs increased from 65.7% to 82.5%, suggesting that the heterozygosity of the P. deltoides I-69 genome reached a lower level (Ran et al. 2019). With the advances in genomic assembly technologies, Hi-C technology was able to identify more complete sequences at the chromosome level, but it still contained some assembly errors during scaffolding (Giani et al. 2020). For instance, the folding of chromatin in the topologically associated domain could result in higher interaction frequencies of distant regions along chromosome, which will potentially cause misestimation of the distance between 2 adjacent regions along chromosome, leading to some artificial inversions and mis-joined scaffolds during assembly (Oddes et al. 2018; Xu and Dixon 2020). These errors can be better corrected by resorting a dense genetic map, so that several species had applied genetic linkage maps to refine assemblies at present (Maroso et al. 2018; Gaur et al. 2020; Shi et al. 2020). Therefore, efforts could be made to construct a high-quality, high-density linkage map of P. deltoides for further refining the P. deltoides genome assembly. The current genome assembly of the P. deltoides I-69 provides an essential genetic resource for mining genes underlying economically important traits and studying the evolution of the species in Populus. First, the new assembly provides a specific parental reference sequence for extracting tens of thousands of SNP genotypes across an F1 hybrid population in which the P. deltoides I-69 was as a female parent (Mousavi et al. 2016; Tong et al. 2016). With the large number of SNPs, high-density genetic linkage maps of the parents can be constructed and thus the QTL mapping or genome-wide association studies could be conducted for growth and woody traits (Tong et al. 2020; Chen et al. 2021). Second, this genome sequence allowed us to identify genes unique to the cultivar I-69 through orthologous analysis (Supplementary Figure 4). The unique genes may contain some detrimental genes that could cause this cultivar to have a poor rooting ability (Zhang et al. 2009). It is very important to purge these detrimental genes by selective breeding so as to develop new cultivars with improved rooting capacity for increasing the survival rate and expanding the scale of planting. Finally, the improved genome sequence would be beneficial to further elucidating the inconsistent results of phylogenetic relationships among P. deltoides, P. trichocarpa, and P. simonii. According to morphology, P. trichocarpa and P. simonii were assigned into the same section of Tacamahaca, while P. deltoides belonged to the different section of Aigeiros (Eckenwalder 1996), indicating that P. trichocarpa and P. simonii were more closely related than any other pair of the 3 species. However, recent molecular evolution analyses showed that the relationship of P. deltoides with P. trichocarpa was closer than with P. simonii (Zong et al. 2019; Wu et al. 2020). Click here for additional data file.

57 in total

1. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training.

Authors: Vardges Ter-Hovhannisyan; Alexandre Lomsadze; Yury O Chernoff; Mark Borodovsky
Journal: Genome Res Date: 2008-08-29 Impact factor: 9.043

2. Repbase Update, a database of repetitive elements in eukaryotic genomes.

Authors: Weidong Bao; Kenji K Kojima; Oleksiy Kohany
Journal: Mob DNA Date: 2015-06-02

3. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors: Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal: Bioinformatics Date: 2015-06-09 Impact factor: 6.937

4. Sambamba: fast processing of NGS alignment formats.

Authors: Artem Tarasov; Albert J Vilella; Edwin Cuppen; Isaac J Nijman; Pjotr Prins
Journal: Bioinformatics Date: 2015-02-19 Impact factor: 6.937

5. Improved genome assembly provides new insights into genome evolution in a desert poplar (Populus euphratica).

Authors: Zhiyang Zhang; Yang Chen; Junlin Zhang; Xinzhi Ma; Yiling Li; Mengmeng Li; Deyan Wang; Minghui Kang; Haolin Wu; Yongzhi Yang; Matthew S Olson; Stephen P DiFazio; Dongshi Wan; Jianquan Liu; Tao Ma
Journal: Mol Ecol Resour Date: 2020-03-04 Impact factor: 7.090

6. A high-density SNP-based linkage map using genotyping-by-sequencing and its utilization for improved genome assembly of chickpea (Cicer arietinum L.).

Authors: Rashmi Gaur; Subodh Verma; Seema Pradhan; Heena Ambreen; Sabhyata Bhatia
Journal: Funct Integr Genomics Date: 2020-08-27 Impact factor: 3.410

7. De novo SNP discovery and genetic linkage mapping in poplar using restriction site associated DNA and whole-genome sequencing technologies.

Authors: Mohaddeseh Mousavi; Chunfa Tong; Fenxiang Liu; Shentong Tao; Jiyan Wu; Huogen Li; Jisen Shi
Journal: BMC Genomics Date: 2016-08-18 Impact factor: 3.969

8. Fast and accurate de novo genome assembly from long uncorrected reads.

Authors: Robert Vaser; Ivan Sović; Niranjan Nagarajan; Mile Šikić
Journal: Genome Res Date: 2017-01-18 Impact factor: 9.043

9. De Novo Genome Assembly of Populus simonii Further Supports That Populus simonii and Populus trichocarpa Belong to Different Sections.

Authors: Hainan Wu; Dan Yao; Yuhua Chen; Wenguo Yang; Wei Zhao; Hua Gao; Chunfa Tong
Journal: G3 (Bethesda) Date: 2020-02-06 Impact factor: 3.154

10. Gene finding in novel genomes.

Authors: Ian Korf
Journal: BMC Bioinformatics Date: 2004-05-14 Impact factor: 3.169

1 in total

Review 1. Advanced Breeding for Biotic Stress Resistance in Poplar.

Authors: Chiara Biselli; Lorenzo Vietto; Laura Rosso; Luigi Cattivelli; Giuseppe Nervo; Agostino Fricano
Journal: Plants (Basel) Date: 2022-08-04

1 in total