Daniel S Carvalho, Aime V Nishimwe, James C Schnable.
Abstract
The number of plant species with genomic and transcriptomic data has been increasing rapidly. The grasses (Poaceae) have been well represented among species with published reference genomes. However, the genomes of wild grasses are less frequently targeted by sequencing efforts. Sequence data from wild relatives of crop species in the grasses can aid the study of domestication, support gene discovery for breeding and crop improvement, and improve our understanding of the evolution of C4 photosynthesis. Here, we used long-read sequencing technology to characterize the transcriptomes of three C3 panicoid grass species: Dichanthelium oligosanthes, Chasmanthium laxum, and Hymenachne amplexicaulis. Based on alignments to the sorghum genome, we estimate that assembled consensus transcripts from each species capture between 54.2% and 65.7% of the conserved syntenic gene space in grasses. Genes co-opted into C4 were also well represented in this dataset, despite concerns that, because these genes might play roles unrelated to photosynthesis in the target species, they would be expressed at low levels and missed by transcript-based sequencing. A combined analysis using syntenic orthologous genes from grasses with published reference genomes and consensus long-read sequences from these wild species was consistent with previously published phylogenies. It is hoped that these data, which target underrepresented classes of species within the PACMAD grasses (wild species and species utilizing C3 photosynthesis), will aid future studies of domestication and C4 evolution by decreasing the evolutionary distance between C4 and C3 species within this clade, enabling more accurate comparisons associated with evolution of the C4 pathway.
Keywords: C4 photosynthesis; grasses; panicoideae; phylogenetics; transcriptomics
Year: 2020 PMID: 32128472 PMCID: PMC7047018 DOI: 10.1002/pld3.203
Source DB: PubMed Journal: Plant Direct ISSN: 2475-4455
Published reference genomes for grass species within the PACMAD clade
| Species | Relevance | C3/C4 | Genome publication |
|---|---|---|---|
| | Wild Species | C3 | Studer et al. ( |
| | Grain Crop | C4 | Hittalmani et al. ( |
| | Grain Crop | C4 | Cannarozzi et al. ( VanBuren et al. ( |
| | Biomass Crop | C4 | Swaminathan et al. ( |
| | Wild Species | C4 | VanBuren et al. ( |
| | Wild Species | C4 | Lovell et al. ( |
| | Grain Crop | C4 | Zou et al. ( |
| | Biomass Crop | C4 | Casler et al. ( |
| | Grain Crop | C4 | Varshney, Shi, et al. ( |
| | Sugar Crop | C4 | Garsmeur et al. ( |
| | Grain Crop | C4 | Bennetzen et al. ( |
| | Genetic Model | C4 | Brutnell et al. ( |
| | Grain/Biomass/Sugar Crop | C4 | Paterson et al. ( |
| | Grain Crop & Genetic Model | C4 | Schnable et al. ( |
Species sharing a common inferred evolutionary origin of C4 photosynthesis, as reported by GPWG II (2012), are indicated by superscript letters.
Figure 1. (a) Current literature consensus phylogeny of the relationships between the grass species studied here. Lineages in green utilize C4 photosynthesis, while lineages in black utilize C3 photosynthesis. The green stars indicate apparent independent origins of C4 photosynthesis. (b) Inflorescence of Hymenachne amplexicaulis. (c) Inflorescence of Chasmanthium laxum. (d) Inflorescence of Dichanthelium oligosanthes.
Summary statistics for raw and processed long read sequence data generated from each of the three target species
| Species | Total reads | Raw data | CCS reads | FL reads | Average FL length | Consensus transcripts | Average consensus transcript length | Transcripts containing start codon |
|---|---|---|---|---|---|---|---|---|
| | 734,932 reads | 5.8 GB | 732,158 reads | 284,027 reads | 963 bp | 193,422 | 925 bp | 34,016/193,422 (17.5%) |
| | 708,681 reads | 10.1 GB | 701,802 reads | 380,381 reads | 1,460 bp | 190,632 | 1,438 bp | 36,055/190,632 (18.9%) |
| | 729,710 reads | 12.5 GB | 649,149 reads | 306,566 reads | 1,294 bp | 164,640 | 1,236 bp | 26,490/164,640 (16%) |
Alignment rates of consensus transcripts generated from each of the three target species to the sorghum gene space
| Species | Sorghum gene space coverage | Sorghum syntenic gene space coverage | Transcript alignment rate |
|---|---|---|---|
| | 11,485 genes/34,211 genes (33.5%) | 6,402 transcripts/11,800 genes (54.2%) | 115,361 transcripts/193,422 transcripts (59.6%) |
| | 13,446 genes/34,211 genes (39.3%) | 7,418 transcripts/11,800 genes (62.8%) | 125,357 transcripts/164,640 transcripts (76.1%) |
| | 14,159 genes/34,211 genes (41.3%) | 7,760 transcripts/11,800 genes (65.7%) | 171,465 transcripts/190,632 transcripts (89.9%) |
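
The percentages in the table above can be recomputed from the raw counts. The following is a minimal sketch, not the authors' code: the row labels and the helper names `percent_truncated` and `coverage_table` are hypothetical, and rows are identified by position because the species column is missing from this copy of the table. The reported values match the counts when truncated, rather than rounded, to one decimal place (for example, 6,402/11,800 is about 54.25%, reported as 54.2%), so the sketch truncates.

```python
# Counts copied from the table above. Row labels are positional placeholders
# (an assumption), since the species names are not present in this copy.
ROWS = {
    "row 1": {"gene_space": (11_485, 34_211),
              "syntenic_gene_space": (6_402, 11_800),
              "transcripts_aligned": (115_361, 193_422)},
    "row 2": {"gene_space": (13_446, 34_211),
              "syntenic_gene_space": (7_418, 11_800),
              "transcripts_aligned": (125_357, 164_640)},
    "row 3": {"gene_space": (14_159, 34_211),
              "syntenic_gene_space": (7_760, 11_800),
              "transcripts_aligned": (171_465, 190_632)},
}


def percent_truncated(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage truncated to one decimal.

    Integer floor division is used so the truncation is exact and does not
    depend on floating-point rounding.
    """
    return (numerator * 1000 // denominator) / 10


def coverage_table(rows: dict) -> dict:
    """Compute the truncated percentage for every (count, total) pair."""
    return {
        label: {name: percent_truncated(*pair) for name, pair in counts.items()}
        for label, counts in rows.items()
    }


if __name__ == "__main__":
    for label, percentages in coverage_table(ROWS).items():
        print(label, percentages)
```

The syntenic gene space column is the source of the 54.2% to 65.7% range quoted in the abstract.
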
Figure 2. Transcript coverage of the C4 PPDK gene in Sorghum bicolor (Sobic.009G132900) in each of the three species tested. Red-brown boxes represent regions of similar sequence identified by BLASTN between the sorghum genome and consensus transcript sequences retrieved from Hymenachne amplexicaulis, Dichanthelium oligosanthes, and Chasmanthium laxum (from topmost to bottommost). The bottom track indicates the annotated gene structure, with intronic sequence indicated in gray and exonic sequence indicated in either blue (5' or 3' untranslated regions) or green (coding sequence). The top y-axis indicates the scale of the displayed genomic region in kilobases.
Figure 3. Seven hundred distinct phylogenetic trees calculated from separate multiple sequence alignments of 267 putatively orthologous gene groups with large regions of alignment scored as high quality. Blue indicates the most commonly observed topology (291 trees, 42.5% of the total); purple and red indicate the second (43 trees, 6.2%) and third (28 trees, 4%) most commonly observed topologies, respectively. Numerical labels of branches for each topology indicate average bootstrap support from separately calculated bootstrap trees for 100 randomly selected gene groups, considering data from those gene trees consistent with that particular topology.