Xiaoyue Yang, Zefu Wang, Lei Zhang, Guoqian Hao, Jianquan Liu, Yongzhi Yang.
Abstract
Betulaceae, the birch family, comprises six living genera and over 160 species, many of which are economically valuable. To deepen our knowledge of Betulaceae species, we have sequenced the genome of a hornbeam, Carpinus fangiana, which belongs to the most species-rich genus of the Betulaceae subfamily Coryloideae. Based on over 75 Gb (~200x) of high-quality next-generation sequencing data, we assembled a 386.19 Mb C. fangiana genome with contig N50 and scaffold N50 sizes of 35.32 kb and 1.91 Mb, respectively. Furthermore, 357.84 Mb of the genome was anchored to eight chromosomes using over 50 Gb (~130x) of Hi-C sequencing data. Transcriptomes representing six tissues were sequenced to facilitate gene annotation, and over 5.50 Gb of high-quality data were generated for each tissue. The structural annotation identified a total of 27,381 protein-coding genes in the assembled genome, of which 94.36% were functionally annotated. Additionally, 4,440 non-coding genes were predicted.
Year: 2020 PMID: 31964866 PMCID: PMC6972722 DOI: 10.1038/s41597-020-0370-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Photograph and location of the C. fangiana tree sampled for genome sequencing. (a) A photograph of a C. fangiana individual on Emei Mountain, Leshan, Sichuan, China. (b) Location of the C. fangiana sample used for genome sequencing.
DNA sequencing metrics of C. fangiana, before and after quality control.
| Sequencing technique | Library type | Insert size (bp) | Read length (bp) | Raw data (Gb) | Clean data (Gb) | Raw depth (x) | Clean depth (x) |
|---|---|---|---|---|---|---|---|
| Next-generation | paired-end | 230 | 150 | 11.32 | 10.92 | 28.54 | 27.52 |
| Next-generation | paired-end | 500 | 150 | 10.28 | 10.21 | 25.91 | 25.73 |
| Next-generation | paired-end | 800 | 150 | 15.82 | 15.64 | 39.88 | 39.42 |
| Next-generation | mate pair | 2,000 | 150 | 16.49 | 6.55 | 41.56 | 16.51 |
| Next-generation | mate pair | 5,000 | 150 | 13.25 | 9.71 | 33.39 | 24.47 |
| Next-generation | mate pair | 10,000 | 150 | 17.97 | 10.71 | 45.30 | 27.00 |
| Next-generation | mate pair | 20,000 | 150 | 29.99 | 14.12 | 75.59 | 35.59 |
| Total | | | | 115.12 | 77.85 | 290.17 | 196.23 |
| Hi-C | Hi-C | 300-700 | 150 | 52.54 | 52.19 | 132.43 | 131.55 |
Note: The data comprise next-generation and Hi-C sequencing reads. The estimated genome size is 396.74 Mb.
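The depth columns above appear to follow from dividing each data volume by the estimated genome size. The following is a minimal sketch of that arithmetic, assuming depth = bases / genome size; the function name `depth` is illustrative and not taken from the source.

```python
# Estimated genome size from the table note (396.74 Mb).
GENOME_SIZE_MB = 396.74


def depth(gigabases: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage for a data volume given in Gb (assumed formula)."""
    return gigabases * 1000.0 / genome_size_mb


# Totals taken from the table: raw and clean next-generation data, clean Hi-C data.
raw_total = depth(115.12)
clean_total = depth(77.85)
hic_clean = depth(52.19)

print(f"raw: {raw_total:.2f}x, clean: {clean_total:.2f}x, Hi-C clean: {hic_clean:.2f}x")
```

The computed totals can differ from the tabulated ones in the second decimal place, which would be expected if the table totals were summed from per-library depths that had already been rounded.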
Illumina RNA sequencing metrics, before and after quality control.
| Tissue | Raw reads | Clean reads | Raw bases (Gb) | Clean bases (Gb) |
|---|---|---|---|---|
| Bark | 19,815,362 | 19,725,663 | 5.95 | 5.92 |
| Branch | 22,825,277 | 22,766,831 | 6.85 | 6.83 |
| Bract | 22,847,208 | 22,789,778 | 6.85 | 6.84 |
| Flower | 34,835,605 | 34,834,910 | 10.45 | 10.45 |
| Fruit | 18,628,078 | 18,570,700 | 5.59 | 5.57 |
| Leaf | 21,888,088 | 22,789,778 | 6.57 | 6.55 |
Fig. 2 K-mer distribution used to estimate the genome’s size. The distribution was determined based on the Jellyfish analysis using a k-mer size of 17.
Summary of C. fangiana genome assembly.
| Type | Assembly before Hi-C | Hi-C assembly |
|---|---|---|
| Scaffold length (bp) | 386,190,506 | 386,249,499 |
| Gap length (bp) | 30,727,985 | 30,804,875 |
| Scaffold number | 4,789 | 4,602 |
| Longest scaffold (bp) | 8,871,445 | 60,187,804 |
| Scaffold N50 (bp) | 1,908,393 | 37,105,143 |
| Scaffold N90 (bp) | 425,779 | 595,656 |
| Contig length (bp) | 355,461,404 | 355,441,862 |
| Contig number | 21,775 | 22,086 |
| Longest contig (bp) | 1,041,408 | 912,918 |
| Contig N50 (bp) | 35,323 | 34,845 |
| Contig N90 (bp) | 8,542 | 8,427 |
| GC content | 37.59% | 37.55% |
Note: The estimated genome size is 396.74 Mb. GC content was calculated excluding N bases.
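The N50 and N90 values in this table summarise contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least 50% of the assembly. A minimal sketch of that standard definition follows; the helper `n_stat` and the toy lengths are hypothetical and are not the authors' data or pipeline.

```python
def n_stat(lengths, percent=50):
    """Return the Nx statistic: the length at which the running sum of
    sequences, sorted longest first, reaches `percent` of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point edge cases at the threshold.
        if running * 100 >= percent * total:
            return length


# Hypothetical scaffold lengths (total 300) used only to illustrate the definition.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n_stat(toy_lengths, 50)
toy_n90 = n_stat(toy_lengths, 90)
print(toy_n50, toy_n90)
```

Applied to the real scaffold or contig lengths of an assembly FASTA, the same function would yield values comparable to the N50 and N90 rows above.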
Summary of the assembled chromosomes in the C. fangiana genome.
| Type | Sequence Number | Sequence Length (bp) | GenBank accession |
|---|---|---|---|
| Cfa01 | 128 | 62,383,991 | CM017321 |
| Cfa02 | 97 | 51,103,020 | CM017322 |
| Cfa03 | 107 | 42,654,226 | CM017323 |
| Cfa04 | 135 | 44,816,785 | CM017324 |
| Cfa05 | 88 | 39,651,540 | CM017325 |
| Cfa06 | 104 | 40,118,261 | CM017326 |
| Cfa07 | 92 | 39,687,453 | CM017327 |
| Cfa08 | 109 | 37,421,582 | CM017328 |
| Total Sequences Clustered (Ratio %) | 860 (16.32) | 357,836,858 (92.66) | |
| Total Sequences Ordered and Oriented (Ratio %) | 677 (78.72) | 319,127,541 (89.18) | |
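The length ratios in the two summary rows can be cross-checked from the per-chromosome lengths. The sketch below assumes that the clustered ratio uses the 386,190,506 bp scaffold length of the initial assembly as its denominator and that the ordered-and-oriented ratio is relative to the clustered length; both denominators are assumptions chosen because they reproduce the reported percentages.

```python
# Assumed denominator: scaffold length of the initial assembly (bp).
ASSEMBLY_BP = 386_190_506

# Chromosome lengths (bp) copied from the table.
chromosomes = {
    "Cfa01": 62_383_991,
    "Cfa02": 51_103_020,
    "Cfa03": 42_654_226,
    "Cfa04": 44_816_785,
    "Cfa05": 39_651_540,
    "Cfa06": 40_118_261,
    "Cfa07": 39_687_453,
    "Cfa08": 37_421_582,
}
ORDERED_BP = 319_127_541  # sequences ordered and oriented (bp), from the table

anchored_bp = sum(chromosomes.values())
anchored_pct = 100 * anchored_bp / ASSEMBLY_BP
ordered_pct = 100 * ORDERED_BP / anchored_bp

print(f"anchored: {anchored_bp:,} bp ({anchored_pct:.2f}%)")
print(f"ordered and oriented: {ordered_pct:.2f}% of anchored length")
```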
Repeat element metrics for the C. fangiana genome.
| Type | Length (bp) | Percent (%) |
|---|---|---|
| DNA | 14,244,548 | 3.69 |
| LINE | 15,452,667 | 4.00 |
| Low_complexity | 1,653,498 | 0.43 |
| LTR | 56,262,090 | 14.57 |
| Other | 660 | 1.71E-04 |
| RC | 1,272,200 | 0.33 |
| rRNA | 5,881 | 1.52E-03 |
| Satellite | 232,066 | 0.06 |
| Simple_repeat | 7,594,441 | 1.97 |
| SINE | 281,915 | 0.07 |
| Unknown | 61,686,663 | 15.97 |
| All | 158,686,629 | 41.08 |
Summary of predicted protein-coding genes in the C. fangiana genome.
| Gene set | Number | Average gene length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|
| Augustus | 36,499 | 3,740.33 | 1,371.15 | 5.20 | 342.17 | 678.20 |
| Geneid | 43,054 | 4,539.67 | 1,023.87 | 4.14 | 247.27 | 1,755.27 |
| GeneMark | 28,642 | 1,900.29 | 892.05 | 3.15 | 283.15 | 492.58 |
| GlimmerHMM | 45,800 | 1,657.35 | 867.05 | 2.65 | 327.78 | 398.26 |
| SNAP | 63,982 | 1,087.42 | 656.98 | 2.62 | 250.80 | 220.80 |
| Homolog prediction | 21,976 | 3,251.94 | 1,100.22 | 4.45 | 247.27 | 631.93 |
| Homolog prediction | 23,733 | 3,293.62 | 1,047.44 | 4.59 | 228.23 | 633.86 |
| Homolog prediction | 24,493 | 3,204.43 | 1,088.71 | 4.35 | 250.14 | 639.44 |
| Homolog prediction | 25,252 | 3,200.15 | 1,076.69 | 4.24 | 253.84 | 662.00 |
| Homolog prediction | 31,130 | 2,907.56 | 990.15 | 4.00 | 247.72 | 647.70 |
| Homolog prediction | 32,669 | 2,901.71 | 958.97 | 3.94 | 243.51 | 668.90 |
| RNA seq (PASA) | 33,115 | 5,076.06 | 1,100.55 | 5.09 | 414.69 | 800.10 |
| EVM | 36,585 | 3,692.57 | 1,283.06 | 4.67 | 274.71 | 1,197.00 |
| PASA update* | 36,439 | 4,067.94 | 1,384.96 | 5.27 | 320.73 | 1,253.00 |
| Final* | 27,381 | 3,948.29 | 1,415.09 | 5.16 | 345.16 | 1,165.54 |
Note: *UTR regions are included.
Summary of functional annotation in the C. fangiana genome.
| Type | Gene number | % in genome |
|---|---|---|
| Total | 27,381 | |
| GO | 19,679 | 71.87 |
| KEGG | 18,845 | 68.83 |
| InterProScan | 15,582 | 56.91 |
| Pfam | 19,688 | 71.90 |
| Uniprot_sprot | 19,733 | 72.07 |
| Uniprot_trembl | 24,110 | 88.05 |
| All | 25,836 | 94.36 |
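The percentage column follows from dividing each annotated gene count by the 27,381 predicted protein-coding genes. A short sketch of that check, using only the counts in the table (the variable names are illustrative):

```python
TOTAL_GENES = 27_381  # final protein-coding gene set

# Genes with at least one hit per database, copied from the table.
annotated = {
    "GO": 19_679,
    "KEGG": 18_845,
    "InterProScan": 15_582,
    "Pfam": 19_688,
    "Uniprot_sprot": 19_733,
    "Uniprot_trembl": 24_110,
    "All": 25_836,
}

# Percentage of the gene set annotated by each source, rounded to two decimals.
annotated_pct = {name: round(100 * count / TOTAL_GENES, 2) for name, count in annotated.items()}

for name, pct in annotated_pct.items():
    print(f"{name}: {pct:.2f}%")
```

The "All" row counts genes annotated by at least one source, which is why it is smaller than the sum of the individual rows.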
Summary of non-coding genes in the C. fangiana genome.
| Type | Number | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|
| tRNA | 632 | 76.71 | 48,478 | 0.01255 |
| rRNA | 936 | 122.70 | 114,844 | 0.03136 |
| miRNA | 197 | 124.27 | 24,481 | 0.00669 |
| snRNA | 117 | 141.58 | 16,565 | 0.00452 |
| snoRNA | 232 | 97.28 | 22,570 | 0.00616 |
| SRPRNA | 9 | 280.33 | 2,523 | 0.00069 |
| other ncRNA | 2,317 | 109.13 | 252,859 | 0.06905 |
| Total | 4,440 | 108.63 | 482,320 | 0.12490 |
Mapping ratio of Illumina DNA reads for the C. fangiana genome.
| Library insert size (bp) | Read mapping rate (%) | Genome coverage depth | Genome covered (%) |
|---|---|---|---|
| 230 | 93.19 | at least 1x | 99.74 |
| 500 | 91.04 | at least 10x | 99.28 |
| 800 | 90.54 | at least 20x | 98.87 |
| 2,000 | 99.07 | at least 30x | 98.87 |
| 5,000 | 99.42 | at least 50x | 98.51 |
| 10,000 | 98.93 | at least 80x | 97.84 |
| 20,000 | 98.36 | at least 100x | 95.03 |
Assessment of BUSCOs in the C. fangiana genome.
| BUSCO category | Number | Percent |
|---|---|---|
| Complete BUSCOs | 1,372 | 95.30% |
| Complete and single-copy BUSCOs | 1,329 | 92.30% |
| Complete and duplicated BUSCOs | 43 | 3.00% |
| Fragmented BUSCOs | 8 | 0.60% |
| Missing BUSCOs | 60 | 4.10% |
| Total BUSCO groups searched | 1,440 | |
Fig. 3 Heat map of chromosomal interactions in the C. fangiana genome. Cfa01-Cfa08 represent the eight chromosomes in the C. fangiana genome. The horizontal and vertical coordinates represent the order of each ‘bin’ on the corresponding chromosome.
| Measurement(s) | DNA • RNA • sequence_assembly • sequence feature annotation |
| Technology Type(s) | DNA sequencing • RNA sequencing • genome assembly • sequence annotation |
| Sample Characteristic - Organism | Carpinus fangiana |