Qiang Liang1, Huayang Li2, Shouke Li3, Fuling Yuan1, Jingfeng Sun1, Qicheng Duan1, Qingyun Li2, Rui Zhang2, Ya Lin Sang1, Nian Wang1, Xiangwen Hou4, Ke Qiang Yang1, Jian Ning Liu4, Long Yang2.
Abstract
BACKGROUND: Yellowhorn (Xanthoceras sorbifolium Bunge), a deciduous shrub or small tree native to northern China, is of great economic value. Yellowhorn seeds are rich in oil containing unsaturated long-chain fatty acids, which have been used to produce edible oil and nervonic acid capsules. However, the lack of a high-quality genome sequence hampers understanding of its evolution and gene functions.
Keywords: 10X Genomics Chromium; BioNano Genomics; Illumina paired-end sequencing; PacBio sequencing; high-throughput chromosome conformation capture; yellowhorn (Xanthoceras sorbifolium Bunge)
Year: 2019 PMID: 31241155 PMCID: PMC6593362 DOI: 10.1093/gigascience/giz071
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Morphological characteristics of yellowhorn superior “WF18”. (A) Raceme and shoot. (B) Hermaphrodite flowers at 1, 3, and 5 DPF (days post flowering). (C) Capsular fruits. (D) Seeds and kernel.
Statistics of Illumina, 10X Genomics, and Hi-C sequencing data
| Platform | Library size | Read length (bp) | Raw reads (millions) | Reads retained after trimming (millions) | Total valid bases (Gb) |
|---|---|---|---|---|---|
| Illumina | 280 bp | 150 | 451.34 | 439.84 | 65.98 |
| Illumina | 450 bp | 150 | 696.88 | 658.72 | 98.81 |
| 10X Genomics | 350 bp | 150 | 457.40 | 457.40 | 63.35 |
| Hi-C | 600 bp | 150 | 932.76 | 891.72 | 133.76 |
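The total-valid-base column follows from simple arithmetic: retained reads × read length. The minimal sketch below assumes the read counts are in millions of reads (the interpretation under which the totals work out) and that every retained read is counted at its full 150 bp; `valid_gb` is a hypothetical helper name, not part of any tool used in the study.

```python
def valid_gb(reads_millions: float, read_length_bp: int) -> float:
    """Total bases in gigabases for a read count given in millions.

    Assumes every retained read keeps its full length after trimming
    (an illustrative simplification, not the authors' pipeline).
    """
    # millions of reads x bp per read = megabases; / 1,000 gives Gb
    return reads_millions * read_length_bp / 1_000


libraries = {
    "Illumina 280 bp": 439.84,
    "Illumina 450 bp": 658.72,
    "Hi-C 600 bp": 891.72,
}

for name, reads in libraries.items():
    print(f"{name}: {valid_gb(reads, 150):.2f} Gb")
```

This reproduces the 65.98, 98.81, and 133.76 Gb totals. Applied to the 10X Genomics row (457.40 million reads), the same arithmetic gives about 68.61 Gb rather than the 63.35 Gb listed, so that row presumably reflects additional processing the table does not describe.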
Figure 2: Flowchart of genome assembly and annotation. GTF: gene transfer format; LR: long read; PE: paired end.
Statistics of PacBio Sequel sequencing data
| Index | PacBio |
|---|---|
| Total No. of reads | 7,062,244 |
| Mean length of raw reads (bp) | 226,712 |
| N50 of raw reads (bp) | 374,500 |
| Mean length of subreads (bp) | 156,717 |
| N50 of subreads (bp) | 237,539 |
| Coverage (X) | 160.51 |
*Coverage (X) = (read count × mean read length) / estimated genome size.
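The coverage footnote is a one-line formula. The sketch below applies it to hypothetical inputs (1,000,000 reads with a 15,000 bp mean length over a 500 Mb genome), not to the PacBio figures in the table; `coverage` is an illustrative name.

```python
def coverage(read_count, mean_read_length_bp, genome_size_bp):
    """Sequencing depth (X): total sequenced bases / estimated genome size."""
    total_bases = read_count * mean_read_length_bp
    return total_bases / genome_size_bp


# Hypothetical inputs for illustration only.
depth = coverage(1_000_000, 15_000, 500_000_000)
print(f"{depth:.1f}X")  # 30.0X
```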
Summary of yellowhorn genome assembly
| Statistic | Contigs | Contigs (polished) | Scaffolds (10X Genomics) | Scaffolds (BioNano) | Scaffolds (Hi-C) | Chromosomes |
|---|---|---|---|---|---|---|
| Total No. | 2,002 | 2,002 | 707 | 29 | 267 | 15 |
| Total length (bp) | 505,787,109 | 508,445,799 | 513,924,146 | 461,662,473 | 439,965,977 | 419,835,445 |
| N50 (bp) | 642,338 | 645,453 | 2,334,658 | 29,979,918 | 29,432,808 | 29,432,808 |
| N90 (bp) | 113,799 | 114,103 | 492,748 | 15,941,042 | 17,893,618 | 17,893,618 |
| Maximum length (bp) | 4,375,484 | 4,395,303 | 21,312,255 | 75,772,594 | 39,123,600 | 39,123,600 |
| GC content (%) | 35.25 | 35.13 | 34.67 | 32.39 | 32.76 | 34.18 |
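N50 and N90 in the assembly summary follow the standard Nx definition: sort sequences from longest to shortest and report the length at which the running total first reaches x% of the assembly. The function below is an independent illustration of that definition (not the code of QUAST or any other listed tool), run on hypothetical toy lengths.

```python
def nx(lengths, x):
    """Return the Nx of an assembly (x=50 for N50, x=90 for N90).

    Nx is the length L such that sequences of length >= L together
    contain at least x percent of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison of running/total >= x/100 avoids float rounding.
        if running * 100 >= x * total:
            return length
    raise AssertionError("unreachable: running reaches total on the last item")


# Hypothetical toy contig lengths (bp), not yellowhorn data.
contigs = [10, 8, 6, 4, 2]
print(nx(contigs, 50), nx(contigs, 90))  # 8 4
```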
Figure 3: Contact maps of Hi-C links among chromosomes. Blue squares represent the draft scaffolds; green squares represent the chromosome-length superscaffolds. The color bar indicates Hi-C contact density.
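A contact map like this one is, in essence, a matrix of read-pair counts between fixed-size genomic bins. The sketch below shows only that binning step under simple assumptions (0-based positions, already-mapped pairs, hypothetical toy chromosomes); it is not how Juicer, the Hi-C tool named in the software list, is implemented, and it omits filtering and normalization.

```python
def contact_matrix(pairs, chrom_sizes, bin_size):
    """Count Hi-C read-pair contacts between fixed-size genomic bins.

    pairs: iterable of ((chrom1, pos1), (chrom2, pos2)), 0-based positions of
        the two mapped ends of each read pair.
    chrom_sizes: dict mapping chromosome name -> length in bp; insertion order
        sets the order of chromosomes along the matrix axes.
    """
    # Give each chromosome a contiguous block of bin indices.
    offsets = {}
    n_bins = 0
    for chrom, size in chrom_sizes.items():
        offsets[chrom] = n_bins
        n_bins += -(-size // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (c1, p1), (c2, p2) in pairs:
        i = offsets[c1] + p1 // bin_size
        j = offsets[c2] + p2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # contacts are unordered, so mirror them
    return matrix


# Hypothetical toy data: two short "chromosomes" and three read pairs.
sizes = {"chr1": 250, "chr2": 100}
read_pairs = [
    (("chr1", 10), ("chr1", 20)),
    (("chr1", 150), ("chr2", 50)),
    (("chr1", 240), ("chr1", 5)),
]
m = contact_matrix(read_pairs, sizes, bin_size=100)
for row in m:
    print(row)
```

Off-diagonal blocks between different chromosomes correspond to inter-chromosomal links; in a correct chromosome-scale assembly most signal is expected along the diagonal blocks.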
Figure 4: Yellowhorn genome features. Chromosome sizes are shown in Mb. Syntenic blocks are represented by curves in the center of the graph. The figure was created with the Circos software package v. 0.69. GC: guanine-cytosine.
BUSCO assessment of yellowhorn genome assembly
| Description | No. (%) |
|---|---|
| Complete BUSCOs (C) | 1,218 (84.58) |
| Complete and single-copy BUSCOs (S) | 1,175 (81.60) |
| Complete and duplicated BUSCOs (D) | 43 (2.98) |
| Fragmented BUSCOs (F) | 23 (1.60) |
| Missing BUSCOs (M) | 199 (13.82) |
| Total BUSCO groups | 1,440 (100) |
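The BUSCO categories are related by simple arithmetic: complete (C) is single-copy (S) plus duplicated (D), and C, F, and M together account for all 1,440 groups. A small check using the counts from the table above; `pct` is an illustrative helper.

```python
# Counts from the BUSCO table (1,440 groups in total).
single, duplicated, fragmented, missing = 1175, 43, 23, 199
total = 1440

# BUSCO's "complete" category is the sum of single-copy and duplicated hits.
complete = single + duplicated

# Sanity check: the four disjoint categories must account for every group.
assert single + duplicated + fragmented + missing == total


def pct(n: int) -> float:
    """Percentage of all BUSCO groups, rounded to two decimals."""
    return round(100 * n / total, 2)


for label, n in [("C", complete), ("S", single), ("D", duplicated),
                 ("F", fragmented), ("M", missing)]:
    print(f"{label}: {n} ({pct(n):.2f}%)")
```

This gives C = 1,218 (84.58%) and reproduces the S, F, and M percentages; for D it prints 2.99%, because 43/1,440 ≈ 2.986%.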
Repeat content of yellowhorn genome assembly
| Category | Term | Length (bp) | Percentage of genome (%) |
|---|---|---|---|
| DNA transposons | DNA | 374,909 | 0.09 |
| | DNA/CMC-EnSpm | 1,699,637 | 0.39 |
| | DNA/MuLE-MuDR | 3,896,024 | 0.89 |
| | DNA/PIF-Harbinger | 1,104,979 | 0.25 |
| | DNA/TcMar-Pogo | 94,067 | 0.02 |
| | DNA/hAT-Ac | 4,103,980 | 0.93 |
| | DNA/hAT-Tag1 | 890,950 | 0.20 |
| | DNA/hAT-Tip100 | 1,213,576 | 0.28 |
| SINEs | SINE | 343 | 0 |
| | SINE/tRNA | 10,674 | 0 |
| LINEs | LINE/L1 | 16,861,661 | 3.83 |
| LTRs | LTR | 2,861 | 0 |
| | LTR/Caulimovirus | 1,360,538 | 0.31 |
| | LTR/Copia | 52,384,264 | 11.91 |
| | LTR/Gypsy | 51,370,228 | 11.68 |
| | LTR/Pao | 88 | 0 |
| Low_complexity | | 1,516,978 | 0.34 |
| RCs | | 4,215 | 0 |
| | RC/Helitron | 5,949 | 0 |
| Ribosomal RNA | | 64,618 | 0.01 |
| SSRs | | 6,971,711 | 1.58 |
| Unknown | | 104,792,508 | 23.76 |
| Total | | 248,724,758 | 56.39 |
| Genome size | | 439,965,977 | 100 |
LINE: long interspersed nuclear element; LTR: long terminal repeat; RC: rolling circle replication; SINE: short interspersed nuclear element; SSR: simple sequence repeat.
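Each "Percentage of genome" value is the masked length divided by the 439,965,977 bp genome size. The sketch below does that division and also rolls families up into their top-level class (the text before the first "/"), using a subset of rows from the table; it reproduces per-family values such as LTR/Copia (11.91%) but makes no attempt to reproduce the Total row, whose derivation the table does not describe.

```python
from collections import defaultdict

GENOME_SIZE_BP = 439_965_977  # "Genome size" row of the repeat table

# Subset of rows from the table: (class/family term, masked length in bp).
rows = [
    ("LTR", 2_861),
    ("LTR/Caulimovirus", 1_360_538),
    ("LTR/Copia", 52_384_264),
    ("LTR/Gypsy", 51_370_228),
    ("LTR/Pao", 88),
    ("LINE/L1", 16_861_661),
]


def percent_of_genome(length_bp: int) -> float:
    """Share of the assembled genome covered by a repeat class, in percent."""
    return 100 * length_bp / GENOME_SIZE_BP


# Aggregate families into their top-level class (text before the first "/").
by_class = defaultdict(int)
for term, length in rows:
    by_class[term.split("/")[0]] += length

for cls, length in sorted(by_class.items()):
    print(f"{cls}: {length:,} bp ({percent_of_genome(length):.2f}%)")
```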
Figure 5: Phylogenomic analysis of the yellowhorn genome. (A) OrthoMCL clusters of yellowhorn and 10 other species. (B) Phylogenetic tree and estimated divergence times of yellowhorn and 10 other species. Numbers above the branches are the predicted divergence times; numbers below the branches are bootstrap support values. Light blue bars at the internodes represent 95% confidence intervals. The scale bar at the bottom shows divergence time, with 1 time unit representing 100 million years.
Figure 6: Tissue-specific gene analysis. (A) Venn diagram showing shared and unique genes among 5 tissues; numbers give the count of genes in each unique or shared set. (B−D) GO enrichment of tissue-specific genes. Node size represents the number of genes enriched in each GO category; the color bar indicates P-values from red (low) to blue (high).
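The unique and shared counts in a Venn diagram like panel A come down to set operations over per-tissue gene lists. A minimal sketch, assuming each tissue has already been reduced to a set of expressed gene IDs (the expression cutoff used in the study is not stated here); the tissue names and gene IDs are hypothetical.

```python
def tissue_specific(expressed):
    """Map each tissue to the genes expressed in it and in no other tissue."""
    specific = {}
    for tissue, genes in expressed.items():
        others = set()
        for other, other_genes in expressed.items():
            if other != tissue:
                others |= other_genes
        specific[tissue] = genes - others
    return specific


def shared_by_all(expressed):
    """Genes expressed in every tissue (the centre of the Venn diagram)."""
    return set.intersection(*expressed.values())


# Hypothetical tissues and gene IDs, for illustration only.
expressed = {
    "root": {"g1", "g2", "g3"},
    "leaf": {"g2", "g3", "g4"},
    "seed": {"g3", "g5"},
}
print(tissue_specific(expressed))
print(shared_by_all(expressed))
```

The tissue-specific sets are what would then be passed to a GO enrichment tool such as topGO, as in panels B−D.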
Software resources list.

| Type | Resource |
|---|---|
| Software | FastQC |
| Software | Trimmomatic |
| Software | FALCON |
| Software | pbalign |
| Software | arrow |
| Software | BWA |
| Software | fragScaff |
| Software | Solve |
| Software | PBJelly |
| Software | GMcloser |
| Software | Juicer |
| Software | BUSCO |
| Software | QUAST |
| Software | RepeatMasker |
| Software | RepeatModeler |
| Software | Trinity |
| Software | PASA |
| Software | Augustus |
| Software | SNAP |
| Software | GeneMark-ES/ET |
| Software | Exonerate |
| Software | EVidenceModeler |
| Software | OrthoMCL |
| Software | topGO |
| Software | MAFFT |
| Software | RAxML |
| Software | PAML |
| Software | TopHat |
| Software | GenomeScope |
| Software | KMC |
| Database | RepBase plant repeat database |
| Database | TimeTree database |
| Database | UniProt plant protein database |
| Database | NR |
| Database | UniProt |
| Database | Pfam database |
| Database | CAZy database |
| Genome assembly | Theobroma cacao v2 |