Hainan Wu, Dan Yao, Yuhua Chen, Wenguo Yang, Wei Zhao, Hua Gao, Chunfa Tong.
Abstract
Populus simonii is an important tree in the genus Populus, widely distributed in the Northern Hemisphere and with a long history of cultivation. Although this species is of ecological and economic importance, its genome sequence is not currently available, hindering the development of new varieties with wider adaptive and commercial traits. Here, we report a chromosome-level genome assembly of P. simonii using PacBio long-read sequencing data aided by Illumina paired-end reads and related genetic linkage maps. The assembly is 441.38 Mb in length and contains 686 contigs with a contig N50 of 1.94 Mb. With the linkage maps, 336 contigs were successfully anchored into 19 pseudochromosomes, accounting for 90.2% of the assembled genome size. Genomic integrity assessment showed that 1,347 (97.9%) of the 1,375 genes conserved among all embryophytes can be found in the P. simonii assembly. Genomic repeat analysis revealed that 41.47% of the P. simonii genome is composed of repetitive elements, with interspersed repeats accounting for 40.17% of the genome. A total of 45,459 genes were predicted from the P. simonii genome sequence, and 39,833 (87.6%) of these genes were annotated with one or more related functions. Phylogenetic analysis indicated that P. simonii and Populus trichocarpa should be placed in different sections, contrary to the previous classification based on morphology. The genome assembly not only provides an important genetic resource for the comparative and functional genomics of different Populus species, but also furnishes one of the closest reference sequences for identifying genomic variants in an F1 hybrid population derived by crossing P. simonii with other Populus species.
Keywords: Illumina sequencing; PacBio sequencing; Populus simonii; genetic linkage maps; genome assembly
Year: 2020 PMID: 31806765 PMCID: PMC7003099 DOI: 10.1534/g3.119.400913
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Integrated workflow for sequencing, assembly, and annotation of the Populus simonii genome.
Clean data# generated by the PacBio Sequel platform
| Cell ID# | Reads Num# | Total Bases (bp) | Reads N50 (bp) | Mean Length (bp) | Longest Read (bp) |
|---|---|---|---|---|---|
| 4_D03 | 427,147 | 2,583,955,526 | 11,500 | 6,049 | 70,237 |
| 3_G01 | 651,595 | 4,705,217,582 | 12,948 | 7,221 | 84,525 |
| 4_H01 | 329,754 | 2,855,303,758 | 13,761 | 8,659 | 61,755 |
| 2_B05 | 509,634 | 5,109,633,303 | 14,523 | 10,026 | 84,905 |
| 1_A09 | 827,878 | 6,622,523,144 | 13,683 | 7,999 | 96,519 |
| G01 | 666,295 | 7,840,437,521 | 19,479 | 11,767 | 112,390 |
| Total | 3,412,303 | 29,717,070,834 | 15,473 | 8,709 | 112,390 |
Clean data: The sequences remaining after filtering out low-quality reads and adapters.
Cell ID: Chip ID.
Reads Num: number of reads.
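The Total row can be checked against the per-cell rows: read counts and base counts are plain sums, and the mean length is total bases divided by total reads. The following is a minimal sketch using the values from the table; the names `cells` and `summarize` are illustrative and do not come from the paper.

```python
# Per-cell values copied from the table above: (cell ID, reads, total bases in bp).
cells = [
    ("4_D03", 427_147, 2_583_955_526),
    ("3_G01", 651_595, 4_705_217_582),
    ("4_H01", 329_754, 2_855_303_758),
    ("2_B05", 509_634, 5_109_633_303),
    ("1_A09", 827_878, 6_622_523_144),
    ("G01", 666_295, 7_840_437_521),
]


def summarize(rows):
    """Return (total reads, total bases, mean read length rounded to the nearest bp)."""
    total_reads = sum(reads for _, reads, _ in rows)
    total_bases = sum(bases for _, _, bases in rows)
    return total_reads, total_bases, round(total_bases / total_reads)


total_reads, total_bases, mean_length = summarize(cells)
print(total_reads, total_bases, mean_length)
```

The Reads N50 in the Total row cannot be recomputed this way, because it depends on the full distribution of read lengths rather than on per-cell summaries.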
Statistics of high-quality reads data and library information
| Data Type | Platform | Number of Reads | Bases (bp) | Insert Size (bp) | Average Read Length (bp) |
|---|---|---|---|---|---|
| DNA Sequence (short insert) | Illumina | 256,124,774 | 25,868,602,174 | 300-500 | 101 |
| DNA Sequence (long insert) | PacBio Sequel | 3,147,743 | 29,647,972,079 | 20,000 | 9,418 |
| RNA-Seq (short insert) | Illumina | 94,464,940 | 8,501,844,600 | 300-500 | 90 |
Statistics of the Populus simonii assembly
| Method# | Type | Genome Size (Mb) | Sequence number | Longest sequence (Mb) | N50 length (Mb) |
|---|---|---|---|---|---|
| FALCON (G1) | contig | 447 | 911 | 8.12 | 1.89 |
| FALCON-unzip (G2) | contig | 440 | 722 | 8.13 | 1.93 |
| G3 | contig | 441 | 686 | 8.12 | 1.94 |
| ALLMAPS (G4) | scaffold | 441 | 369 | 52.00 | 19.6 |
G1: assembled by the software FALCON; G2: assembled by the software FALCON-unzip; G3: corrected by the software Pilon and Arrow; G4: assembled by combining the genetic maps using the software ALLMAPS.
Summary of the consensus map
| | Anchored | Oriented | Unplaced |
|---|---|---|---|
| Markers (unique) | 5,971 | 5,755 | 126 |
| Markers per Mb | 15.0 | 15.5 | 2.9 |
| N50 contigs# | 70 | 69 | 0 |
| Total number of contigs | 336 | 255 | 350 |
| Contigs with 1 marker | 44 | 0 | 44 |
| Contigs with 2 markers | 32 | 13 | 24 |
| Contigs with 3 markers | 26 | 21 | 7 |
| Contigs with >=4 markers | 234 | 221 | 3 |
| Total bases (bp) | 398,322,698 | 371,036,734 | 43,052,653 |
| Percentage of genome | 90.2% | 84.1% | 9.8% |
N50 contigs: the number of contigs longer than or equal to the contig N50.
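To make the two quantities concrete: the contig N50 is the length at which contigs of that length or longer cover at least half of the assembly, and the "N50 contigs" row counts the contigs at or above that length. The sketch below uses hypothetical toy lengths, not data from the paper, and the function name `n50_stats` is illustrative.

```python
def n50_stats(lengths):
    """Return (N50 length, number of sequences with length >= N50)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running >= half:  # first length at which half the total is covered
            n50 = length
            break
    count = sum(1 for x in ordered if x >= n50)
    return n50, count


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_lengths = [8, 5, 4, 3, 2, 2, 1]
print(n50_stats(toy_lengths))
```

For the toy input the total is 25, so half is 12.5; the two longest contigs (8 + 5 = 13) reach it, giving an N50 of 5 and a count of 2.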
Summary statistics of annotated repeats
| Type | Number of elements | Length occupied (bp) | Percentage of sequence (%) |
|---|---|---|---|
| DNA | 40,251 | 17,621,301 | 3.99 |
| LINE | 4,300 | 3,138,076 | 0.71 |
| SINE | 8,971 | 1,758,144 | 0.40 |
| LTR | 69,002 | 47,295,184 | 10.72 |
| Unknown | 232,651 | 107,493,878 | 24.35 |
| Simple repeats | 143,553 | 5,723,970 | 1.30 |
| Total | 498,728 | 183,030,553 | 41.47 |
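The percentage column can be reproduced by dividing each occupied length by the assembly size. The sketch below makes one assumption that the table does not state: the assembly size is taken as the sum of anchored and unplaced bases from the consensus map summary (441,375,351 bp, consistent with the reported 441.38 Mb). The names `repeat_bp`, `ASSEMBLY_BP`, and `percent` are illustrative.

```python
# Length occupied (bp) per repeat class, copied from the table above.
repeat_bp = {
    "DNA": 17_621_301,
    "LINE": 3_138_076,
    "SINE": 1_758_144,
    "LTR": 47_295_184,
    "Unknown": 107_493_878,
    "Simple repeats": 5_723_970,
}

# Assumption: assembly size = anchored + unplaced bases (consensus map summary).
ASSEMBLY_BP = 398_322_698 + 43_052_653


def percent(bp, total=ASSEMBLY_BP):
    """Return bp as a percentage of total, rounded to two decimals."""
    return round(100 * bp / total, 2)


percentages = {name: percent(bp) for name, bp in repeat_bp.items()}
total_bp = sum(repeat_bp.values())
total_pct = percent(total_bp)
# Interspersed repeats are taken here as every class except simple repeats.
interspersed_pct = percent(total_bp - repeat_bp["Simple repeats"])
print(percentages, total_pct, interspersed_pct)
```

Under this assumption the total comes to 41.47% and the interspersed fraction to 40.17%, matching the figures quoted in the abstract.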
Figure 2. Gene Ontology (GO) function annotation of Populus simonii using WEGO 2.0 (Ye ). The horizontal axis shows the GO classification types, and the vertical axis represents the number of annotated protein-coding genes.
Figure 3. Shared gene families and their distribution per species. The Venn diagram shows the gene families shared among the selected Populus species: Populus simonii, Populus trichocarpa, Populus deltoides, and Populus euphratica. The histogram represents the total number of gene families for each species. The numbers of gene families shared by four, three, and two of these species, and those found in only one, are presented at the bottom.
Figure 4. Phylogenetic relationships of Populus simonii and related species. A maximum likelihood phylogenetic tree of P. simonii and 8 other plant species was constructed from the concatenated alignment of 966 one-to-one single-copy orthologous genes using RAxML with the JTT+G+F model. The numbers on the nodes represent the bootstrap support values estimated from 1,000 bootstrap tests.