Rui Guo, Long Zhao, Kaijian Zhang, Dan Gao, Chunwu Yang.
Abstract
BACKGROUND: Puccinellia tenuiflora, a forage grass, is considered a model halophyte given its strong tolerance for multiple stress conditions and its close genetic relationship with cereals. This halophyte has enormous value for improving our understanding of salinity tolerance mechanisms. The genetic information of P. tenuiflora is also a potential resource for improving the salinity tolerance of cereals.
Keywords: Genome; Halophyte; Puccinellia tenuiflora; Salinity
Year: 2020 PMID: 32306894 PMCID: PMC7168874 DOI: 10.1186/s12864-020-6727-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Chromosome number (a) and habitat (b) of P. tenuiflora
Results of K-mer analysis. A K-mer length of 17 bp was used to estimate the P. tenuiflora genome size with the formula: genome size = total K-mer number / K-mer depth. The heterozygous rate was calculated as the number of heterozygous K-mers / total K-mer number
| K-mer | Depth | Total K-mer number | Genome size (Mb) | Revised genome size (Mb)^a | Heterozygous rate (%) |
|---|---|---|---|---|---|
| 17 | 31 | 41,192,925,796 | 1328.80 | 1303.06 | 1.56 |
^a Excluding the effect of uncorrected K-mers
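The size formula in the caption can be checked against the table values. The following is a minimal sketch; the function name `estimate_genome_size_mb` is hypothetical and not part of any tool used in the study.

```python
def estimate_genome_size_mb(total_kmers: int, kmer_depth: int) -> float:
    """Genome size in Mb = total K-mer number / K-mer depth (caption formula)."""
    if kmer_depth <= 0:
        raise ValueError("kmer_depth must be positive")
    return total_kmers / kmer_depth / 1e6


# Values taken from the K-mer analysis table (17-mers, depth 31).
size_mb = estimate_genome_size_mb(41_192_925_796, 31)
print(f"{size_mb:.2f} Mb")  # 1328.80 Mb
```

The result matches the unrevised genome size in the table. The revised figure (1303.06 Mb) additionally excludes uncorrected K-mers, which this sketch does not model.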
Raw data of P. tenuiflora sequencing
| Libraries | Insert size | Total data (Gb) | Sequence coverage (X) |
|---|---|---|---|
| Illumina reads | 250 bp | 122.03 | 93.87 |
| | 450 bp | 87.1 | 67 |
| | 2 kb | 70.29 | 54.07 |
| | 5 kb | 51.3 | 39.46 |
| | 10 kb | 75.79 | 58.3 |
| PacBio reads | 20 kb | 56.12 | 43.17 |
| 10× Genomics | | 161.03 | 123.87 |
| Total | – | 623.66 | 479.74 |
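Sequence coverage is the amount of sequence data divided by the genome size. The genome size used for this table is not stated here; the sketch below assumes 1.3 Gb, a value inferred from the ratios in the table and close to the revised K-mer estimate. The name `ASSUMED_GENOME_SIZE_GB` is illustrative only.

```python
# Assumption: coverage was computed against a ~1.3 Gb genome (inferred, not stated).
ASSUMED_GENOME_SIZE_GB = 1.3


def sequence_coverage(total_data_gb: float,
                      genome_size_gb: float = ASSUMED_GENOME_SIZE_GB) -> float:
    """Coverage (X) = total sequenced bases / genome size."""
    return total_data_gb / genome_size_gb


# Total data (Gb) per library, copied from the table.
libraries = {
    "Illumina 250 bp": 122.03,
    "Illumina 450 bp": 87.1,
    "Illumina 2 kb": 70.29,
    "Illumina 5 kb": 51.3,
    "Illumina 10 kb": 75.79,
    "PacBio 20 kb": 56.12,
    "10x Genomics": 161.03,
}

for name, data_gb in libraries.items():
    print(f"{name}: {sequence_coverage(data_gb):.2f}X")

total_gb = sum(libraries.values())
print(f"Total: {total_gb:.2f} Gb, {sequence_coverage(total_gb):.2f}X")
```

Under this assumption the computed values agree with the reported per-library coverages and with the 479.74X total.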
Assembly results of P. tenuiflora genome
| Sample ID | Contig length (bp) | Scaffold length (bp) | Contig number | Scaffold number |
|---|---|---|---|---|
| Total | 1,095,388,111 | 1,107,157,923 | 14,036 | 2638 |
| Max | 803,180 | 7,202,224 | – | – |
| Number ≥ 2000 | – | – | 13,349 | 2183 |
| N50 | 117,188 | 949,910 | 2936 | 338 |
| N60 | 97,500 | 788,398 | 3958 | 465 |
| N70 | 80,583 | 601,430 | 5194 | 625 |
| N80 | 64,330 | 447,145 | 6714 | 839 |
| N90 | 45,138 | 278,370 | 8711 | 1152 |
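For readers unfamiliar with the N50 to N90 rows, the sketch below shows the standard definition of these statistics on a toy set of sequence lengths. It assumes that the "number" columns on those rows give the count of sequences needed to reach each fraction of the total length; the function `nx` and the toy lengths are illustrative and are not taken from the assembly.

```python
from typing import Iterable, Tuple


def nx(lengths: Iterable[int], x: float = 50) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of total length).

    Sequences are sorted from longest to shortest and accumulated until the
    running sum reaches x percent of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no sequence lengths given")


# Toy example (not real data): total length is 300.
toy = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy, 50))  # (70, 2)
print(nx(toy, 90))  # (30, 5)
```

Applied to the real contig or scaffold lengths, the same calculation would be expected to give the length and number pairs listed in the table.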
Overview of the annotation of the P. tenuiflora genome
| Category | Type | Total length (bp) | % of genome^a |
|---|---|---|---|
| Transposable elements | DNA | 81,228,002 | 7.34 |
| | LINE | 33,892,567 | 3.06 |
| | SINE | 154,638 | 0.01 |
| | LTR | 580,518,664 | 52.43 |
| | Unknown | 4,544,534 | 0.41 |
| | Total | 691,362,441 | 62.44 |

| Category | Type (copies) | Total length (bp) | % of genome^a |
|---|---|---|---|
| Non-coding RNAs | miRNA (1376) | 171,853 | 0.015522 |
| | tRNA (692) | 52,086 | 0.004704 |
| | rRNA (68) | 14,130 | 0.001276 |
| | snRNA (702) | 83,103 | 0.007506 |

| Category | Predicted | Supported by transcriptome | Supported by homologs | Function assigned |
|---|---|---|---|---|
| Protein-coding genes | 39,725 | 26,529 | 33,316 | 39,470 (99.4%) |

^a Assembled genome
General statistics of the predicted protein-coding genes in the P. tenuiflora genome. Protein-coding genes were predicted by combining de novo prediction with evidence from homology and transcriptome data. The gene models were integrated with EVM and corrected by PASA to obtain the final set of protein-coding genes
| Gene set | Method | Number | Average gene length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
|---|---|---|---|---|---|---|---|
| De novo | Augustus | 59,267 | 1866.04 | 873.71 | 3.04 | 287.43 | 486.52 |
| | GlimmerHMM | 195,821 | 4538.43 | 540.7 | 2.16 | 250.76 | 3457.37 |
| | SNAP | 115,465 | 3464.46 | 615.94 | 2.8 | 220.02 | 1582.94 |
| | Geneid | 122,152 | 2958.40 | 684.27 | 3.08 | 222.01 | 1092.23 |
| | Genscan | 92,436 | 5507.46 | 609.14 | 2.96 | 205.68 | 2497.19 |
| Homolog^b | | 40,162 | 1988.08 | 978.69 | 3.32 | 294.72 | 434.93 |
| | | 73,561 | 2000.94 | 1121.24 | 2.57 | 436.89 | 561.61 |
| | | 67,858 | 2097.32 | 1124.66 | 2.8 | 401.87 | 540.8 |
| | | 62,339 | 1568.92 | 826.04 | 2.68 | 308.16 | 442.05 |
| | | 43,096 | 1629.35 | 839.3 | 2.77 | 302.45 | 445.1 |
| | | 76,835 | 1550.38 | 915.3 | 2.35 | 389.23 | 469.88 |
| RNA-seq | Cufflinks^c | 62,560 | 5041.52 | 1845.64 | 5.54 | 333.32 | 704.38 |
| | PASA | 63,952 | 2292.77 | 934.6 | 3.9 | 239.86 | 468.9 |
| EVM | | 66,649 | 2149.27 | 869.1 | 3.23 | 268.94 | 573.67 |
| PASA-update | | 66,482 | 2122.71 | 871.22 | 3.22 | 270.77 | 564.36 |
| Final set^c | | 39,725 | 2818.49 | 1081.99 | 4.15 | 260.54 | 550.76 |

^a Statistics calculated from the gene set predicted by each method.
^b Statistics calculated from the gene set predicted from the homolog proteins of each species.
^c Final results for the P. tenuiflora genome
Functional annotation of protein-coding genes against different databases. Gene functions were obtained from the best BLASTP hit
| Database | Subset | Annotated number | Annotated percent (%) |
|---|---|---|---|
| NR | | 36,064 | 90.8 |
| Swiss-Prot | | 25,684 | 64.7 |
| KEGG | | 24,167 | 60.8 |
| InterPro | All^a | 39,202 | 98.7 |
| | Pfam | 26,709 | 67.2 |
| | GO | 35,648 | 89.7 |
| Total | | 39,470 | 99.4 |

^a Combination of Pfam annotation and GO annotation
Identification of non-coding RNAs of P. tenuiflora genome. The tRNAs were predicted by tRNAscan-SE software. The rRNA, miRNA and snRNA genes were extracted by INFERNAL software against the Rfam database
| Type | Subtype | Copy | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|---|
| miRNA | | 1376 | 124.89 | 171,853 | 0.015522 |
| tRNA | | 692 | 75.27 | 52,086 | 0.004704 |
| rRNA | | 68 | 207.79 | 14,130 | 0.001276 |
| | 18S | 21 | 406.57 | 8538 | 0.000771 |
| | 28S | 11 | 129.91 | 1429 | 0.000129 |
| | 5.8S | 4 | 103.5 | 414 | 0.000037 |
| | 5S | 32 | 117.16 | 3749 | 0.000339 |
| snRNA | | 702 | 118.21 | 83,103 | 0.007506 |
| | CD-box | 449 | 106.31 | 47,734 | 0.004311 |
| | H/ACA-box | 65 | 132.71 | 8626 | 0.000779 |
| | splicing | 188 | 141.41 | 26,585 | 0.002401 |
Genome coverage of the raw data, assessed with BWA. The mapping rate was obtained by mapping the raw reads to the P. tenuiflora genome assembly and indicates the reliability of the genome coverage
| Category | Metric | Value |
|---|---|---|
| Reads | Mapping rate (%) | 87.41 |
| Genome | Average sequencing depth | 79.35 |
| | Coverage (%) | 93.34 |
| | Coverage at least 4X (%) | 90.11 |
| | Coverage at least 10X (%) | 86.97 |
| | Coverage at least 20X (%) | 82.46 |
CEGMA analysis results of P. tenuiflora genome
| Species | Complete: Prots | Complete: % completeness | Complete + partial: Prots | Complete + partial: % completeness |
|---|---|---|---|---|
| P. tenuiflora | 216 | 87.1 | 223 | 89.92 |
BUSCO results of P. tenuiflora genome. C: Complete BUSCOs; S: Complete and single-copy BUSCOs; D: Complete and duplicated BUSCOs; F: Fragmented BUSCOs; M: Missing BUSCOs; n: Total BUSCO groups searched
| Species | BUSCO notation assessment results |
|---|---|
| P. tenuiflora | C:86.8% [S:75.7%, D:11.1%], F:1.7%, M:11.5%, n:1440 |