Zhe Liang, Yuke Geng, Changmian Ji, Hai Du, Chui Eng Wong, Qian Zhang, Ye Zhang, Pingxian Zhang, Adeel Riaz, Sadaruddin Chachar, Yike Ding, Jing Wen, Yunwen Wu, Mingcheng Wang, Hongkun Zheng, Yanmin Wu, Viktor Demko, Lisha Shen, Xiao Han, Pengpeng Zhang, Xiaofeng Gu, Hao Yu.
Abstract
The Streptophyta include unicellular and multicellular charophyte green algae and land plants. Colonization of the terrestrial habitat by land plants is a major evolutionary event that has transformed the planet. So far, lack of genome information on unicellular charophyte algae hinders the understanding of the origin and the evolution from unicellular to multicellular life in Streptophyta. This work reports the high-quality reference genome and transcriptome of Mesostigma viride, a single-celled charophyte alga with a position at the base of Streptophyta. There are abundant segmental duplications and transposable elements in M. viride, which contribute to a relatively large genome with high gene content compared to other algae and early diverging land plants. This work identifies the origin of genetic tools that multicellular Streptophyta have inherited and key genetic innovations required for the evolution of land plants from unicellular aquatic ancestors. The findings shed light on the age-old questions of the evolution of multicellularity and the origin of land plants.
Keywords: Mesostigma viride; Streptophyta; evolution; green algae; multicellularity
Year: 2019 PMID: 31921561 PMCID: PMC6947507 DOI: 10.1002/advs.201901850
Source DB: PubMed Journal: Adv Sci (Weinh) ISSN: 2198-3844 Impact factor: 16.806
Figure 1. M. viride morphology and genome assembly. A) Scanning electron micrograph of M. viride cell surface shows its unified basket‐like scales. Scale bar, 2.5 µm. B) Ultrastructure of an M. viride cell observed under a transmission electron microscope. 1, cytoderm; 2, pyrenoid; 3, eyespots; 4, starch granule. Scale bars, 2.5 µm. C) The assembly of the M. viride genome combines PacBio long reads, Illumina short reads, and an optical map generated with the Saphyr System. D) Circos plot depicting the genome content based on the 20 longest scaffolds in a 200 kb nonoverlapping window. Numbers on the circumference are at the megabase scale. Track (a) represents the 20 longest scaffolds of M. viride, while the distributions of repeats (b), genes (c), pseudogenes (d), and ncRNAs (e), including tRNA, lncRNA, snRNA, miRNA, rRNA, and snoRNA, are indicated in the other tracks. Linked lines in the middle of the Circos plot connect syntenic blocks (minimum five gene pairs) from the most recent segmental duplication events. Different colors were used to distinguish different scaffolds (a) or syntenic blocks (linked lines).
Statistics of M. viride genome assembly and annotation
| Feature | Value |
|---|---|
| Genome size [bp] | 441 847 188 |
| Contig number | 3074 |
| Maximum contig length [bp] | 2 003 508 |
| Contig N50 [bp] | 319 906 |
| Contig N90 [bp] | 56 379 |
| Scaffold N50 [bp] | 2 558 729 |
| Scaffold N90 [bp] | 58 377 |
| Gap ratio [%] | 0.04 |
| Gene number | 24 431 |
| Average gene length [bp] | 5940.83 |
| CDS length [bp] | 1585.60 |
| Exon number per gene | 4.81 |
| Exon length [bp] | 329.36 |
| Intron number per gene | 3.81 |
| Intron length [bp] | 1141.87 |
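The contig and scaffold N50 and N90 values in the table above follow the usual assembly-statistics definition: the length L such that sequences of length at least L together cover 50% (or 90%) of the assembly. A minimal sketch of that calculation is given below. The function name `nx` and the toy contig lengths are hypothetical and are not taken from the M. viride assembly.

```python
def nx(lengths, percent):
    """Return the Nx statistic: the length L such that sequences of
    length >= L cover at least `percent` % of the total length."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Hypothetical toy contig lengths (illustration only).
toy_contigs = [100, 80, 60, 40, 20]
n50 = nx(toy_contigs, 50)
n90 = nx(toy_contigs, 90)
print(n50, n90)
```

With these toy lengths the total is 300 bp, so N50 is reached after the two longest contigs and N90 after the four longest.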
Figure 2. Evolutionary analysis of M. viride with other selected green plant species. A) Gene family clustering statistics. The M. viride genome contains a large portion of species‐specific genes, which represent those belonging to a gene family that only exists in a particular species. Multiple and single copy orthologs include the common orthologs with different copy numbers in the species studied. Other orthologs include unclassified orthologs, whereas unclustered genes include those that are not assigned into any gene families. B) Gene family gains (+) and losses (−) mapped onto the plant phylogenetic tree. The minimum numbers of gene families present in the ancestors of different plant lineages are circled. Branch lengths are arbitrary. The analysis includes all the species in (A); only the representative species for each lineage are shown in the schematic diagram. C,D) Frequency distribution with Chi‐square test (C) and scatter plot with two‐sample Kolmogorov–Smirnov test of protein sequence identity (D) between 11 239 homologous gene pairs of M. viride versus C. reinhardtii and M. viride versus M. polymorpha. Only 1:1:1 common orthologs of M. viride, C. reinhardtii, and M. polymorpha were considered. Sequence identity between homologous gene pairs is significantly higher in M. viride versus M. polymorpha than in M. viride versus C. reinhardtii. Red or blue represents the sequence identity between M. viride and C. reinhardtii or between M. viride and M. polymorpha, respectively.
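The two-sample Kolmogorov–Smirnov test mentioned for panel D is built on the statistic D, the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic, not a p-value, and is not the authors' analysis code. The function name `ks_statistic` and the identity values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    sorted_a, sorted_b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical percent-identity values for two sets of ortholog pairs.
identity_vs_alga = [30, 40, 50, 60]
identity_vs_liverwort = [45, 55, 65, 75]
d_stat = ks_statistic(identity_vs_alga, identity_vs_liverwort)
print(d_stat)
```

A larger D indicates that the two identity distributions are further apart. Judging significance would additionally require the sample sizes and a reference distribution, which this sketch leaves out.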
Figure 3. Duplication and repetitive sequences in M. viride. A) Fluorescence microscopy of chromosome number in M. viride. The sample was stained with DAPI. Scale bar, 1 µm. B) A schematic diagram showing segmental duplications in the 20 longest scaffolds. Colored lines connect syntenic blocks (minimum five gene pairs) from the most recent segmental duplication events. C) Frequency distribution of Ks values (synonymous substitutions per synonymous site) between pairs of paralogs in M. viride, C. reinhardtii, and M. polymorpha. The peak in C. reinhardtii represents 698 tandem duplicated genes at Ks = 0.14, whereas the peak in M. viride indicates a possible early whole genome duplication of M. viride at Ks = 0.7. The latter comparison consists of 56 595 paralogous gene pairs. D) Pie chart illustrating major repeat classes in the M. viride genome. LTR, long terminal repeat; LINE, long interspersed nuclear element; PLE, Penelope‐like element; TIR, terminal inverted repeat. E) Box plots showing the length distribution of LTR families in the M. viride genome. Boxes indicate the first quartile, median, and third quartile, with whiskers extending up to 1.5 times the interquartile distance. F) Relative age (Kimura distance) computed for LTR retroelements suggests a prolonged transposition activity of the retroelements.
Figure 4. Transcription factors in M. viride. A) Heat map comparing the numbers of transcription factor genes in M. viride with those of representative land plants and green algae. The detailed information is shown in Table S2C in the Supporting Information. B) The R2R3‐MYB neighbor‐joining (NJ) phylogenetic tree includes representative sequences from previously identified 73 subfamilies and 95 nonplant orphan genes based on 50 eukaryotes [30], and all R2R3‐MYB proteins from K. nitens, M. polymorpha, and M. viride (Mv2R‐MYB1‐3).
Figure 5. Epigenetic and miRNA regulation in M. viride. A) Ion chromatograms for 5mC nucleoside standard and 5mC nucleosides in genomic DNA purified from M. viride. B) Pie chart showing the composition of 5mC methylation motifs with CG as the major methylation site in M. viride. C) The average methylation levels of genes (including 5000 bp upstream of TSS and 5000 bp downstream of TTS) for each 100 bp interval plotted. D) Methylation levels of genes grouped into deciles based on expression levels (fragments per kilobase of transcript per million mapped reads, FPKM). The levels for four deciles (from the lowest first to the highest tenth) are shown. E) List of the numbers of 5mC‐methylated genes with high (FPKM ≥ 1) and low (FPKM < 1) expression levels. Asterisks indicate statistically significant differences in the numbers of methylated genes between highly and lowly expressed genes (Chi‐square test, p < 10⁻⁵). F) qPCR analysis of ten randomly selected pri‐miRNAs in samples cultured under different pH conditions. Gene expression levels in the control are set as 1. Error bars, mean ± SD; n = 3 biological replicates. G) Heat map showing the expression of miRNA target genes extracted from the RNA‐seq data. Their expression negatively correlates with the expression of their corresponding pre‐miRNAs (F) under different pH conditions.
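FPKM, used in panels D and E to rank genes by expression, normalizes a gene's fragment count by transcript length in kilobases and by sequencing depth in millions of mapped fragments. The sketch below uses that standard definition together with the FPKM ≥ 1 cutoff stated for panel E. The function names and the example numbers are hypothetical, and the authors' actual quantification pipeline is not described here.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def expression_class(fpkm_value, cutoff=1.0):
    """Label a gene 'high' if FPKM >= cutoff, else 'low' (cutoff from panel E)."""
    return "high" if fpkm_value >= cutoff else "low"


# Hypothetical gene: 500 fragments, 2000 bp transcript, 20 million mapped fragments.
value = fpkm(500, 2000, 20_000_000)
label = expression_class(value)
print(value, label)
```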
Figure 6. Transcriptome profiles of M. viride cultured under different environmental conditions. A) Heat map showing differentially expressed genes (p < 0.01, fold change > 2) under different light intensity, temperature, and pH compared to optimal growth conditions as indicated in Methods. Two biological replicates were included for each treatment. Samples cultured under different light intensity and temperature conditions were compared with Control 1, while those cultured under different pH conditions were compared with Control 2. B) Principal component analysis of RNA‐seq data derived from samples cultured under different conditions. Axis percentages indicate variance contribution. C) Scatter plots of significant biological processes as determined by GO enrichment analysis of differentially expressed genes (DEGs) under different light intensity and temperature. The size of the circle is proportional to the number of DEGs.
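The DEG criterion in panel A (p < 0.01, fold change > 2) can be expressed as a simple filter. The sketch below assumes that "fold change > 2" applies symmetrically to up- and down-regulation, which the caption does not state, and it takes the p-value as given rather than computing it. The function name `is_deg` and all input values are hypothetical.

```python
import math


def is_deg(treated_expr, control_expr, p_value, min_fold=2.0, max_p=0.01):
    """Flag a gene as differentially expressed under the caption's thresholds.

    Assumption: a more-than-2-fold change in either direction counts.
    """
    if treated_expr <= 0 or control_expr <= 0:
        raise ValueError("expression values must be positive")
    log2_fold_change = math.log2(treated_expr / control_expr)
    return p_value < max_p and abs(log2_fold_change) > math.log2(min_fold)


# Hypothetical expression values (treatment, control) and p-values.
up = is_deg(8.0, 2.0, 0.001)       # 4-fold up, significant
down = is_deg(2.0, 8.0, 0.001)     # 4-fold down, significant
small = is_deg(3.0, 2.0, 0.001)    # 1.5-fold, below the fold-change cutoff
weak = is_deg(8.0, 2.0, 0.05)      # 4-fold, but p-value too large
print(up, down, small, weak)
```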
Groups for gene family evolution analysis
| Group | Criteria |
|---|---|
| Common ancestor | Present in at least four chlorophyte species, |
| Angiosperm − | Absent in all angiosperms but present in at least three of the following: four chlorophyte species, |
| Angiosperm + | Present only in at least three angiosperm species |
| | Present in |
| | Present in |
| Early diverging land plant − | Present in |
| Early diverging land plant + | Present in |
| | Present in |
| | Present in |
| | Present in |
| | Present in |
| | Present in |
| Chlorophyte + | Present in |