Chiara Evangelistella1, Alessio Valentini1, Riccardo Ludovisi1, Andrea Firrincieli1, Francesco Fabbrini1,2, Simone Scalabrin3, Federica Cattonaro3, Michele Morgante4,5, Giuseppe Scarascia Mugnozza1, Joost J B Keurentjes6, Antoine Harfouche1.
Abstract
BACKGROUND: Arundo donax has attracted renewed interest as a potential candidate energy crop for use in biomass-to-liquid fuel conversion processes and biorefineries. This is due to its high productivity, adaptability to marginal land conditions, and suitability for biofuel and biomaterial production. Despite its importance, the genomic resources currently available for supporting the improvement of this species are still limited.
Keywords: Arundo donax; Biofuel; Carbon fixation; De novo leaf transcriptome; Genic-SSRs; Phenylpropanoid; Purine and thiamine metabolism; RNA-Seq; SAPs; Stomata
Year: 2017 PMID: 28572841 PMCID: PMC5450047 DOI: 10.1186/s13068-017-0828-7
Source DB: PubMed Journal: Biotechnol Biofuels ISSN: 1754-6834 Impact factor: 6.040
Fig. 1 Flowchart of the pipeline for the A. donax leaf transcriptome sequencing, de novo assembly, annotation, and analysis. The pipeline performs multiple operations from sampling and preparation to sequencing and de novo assembly to functional annotation and analysis. First, mRNA extraction from the first fully expanded leaf (5th from the top) was carried out followed by cDNA preparation and library construction (gray). Sequencing was performed using an Illumina HiSeq platform. The sequenced reads were then subjected to quality control and filtering, and identical sequences were removed (blue). Next, de novo assemblies of transcripts were generated using a two-step approach: first, multi-k-mer (Trans-ABySS and rnaSPAdes) and single-k-mer (Trinity) methods were used to generate the pre-assemblies (pink); second, pre-assemblies were then concatenated and redundant transcripts were removed using CD-HIT and the EvidentialGene tr2aadcs pipeline (purple). The quality of the de novo assembled leaf transcriptome was then assessed (green). Finally, the non-redundant (NR) transcript dataset was functionally annotated by homology and gene ontology (GO), and metabolic pathways were analyzed (mint blue). Simple sequence repeats (SSRs) and polymorphic SSRs (PolySSRs) were also identified (orange)
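The read-filtering step in the caption ("identical sequences were removed") can be illustrated with a minimal sketch. The source does not name the tool used for this step, so the function below is a hypothetical illustration of exact-duplicate removal, not the authors' implementation; the function name and the toy reads are assumptions, and reverse complements and read pairing are deliberately ignored.

```python
def remove_identical_sequences(reads):
    """Keep the first occurrence of each distinct sequence, preserving input order.

    `reads` is an iterable of (name, sequence) tuples. Comparison is
    case-insensitive; reverse complements are NOT treated as duplicates.
    """
    seen = set()
    unique = []
    for name, seq in reads:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


# Toy reads (hypothetical): r2 and r4 duplicate r1.
reads = [("r1", "ACGTAC"), ("r2", "ACGTAC"), ("r3", "TTGACA"), ("r4", "acgtac")]
deduplicated = remove_identical_sequences(reads)
print([name for name, _ in deduplicated])
```

Keeping a set of sequences already seen makes the pass linear in the number of reads, at the cost of holding every distinct sequence in memory.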
Statistics of the leaf transcriptome pre-assemblies and the final de novo assembly of A. donax
| Metric | Trinityᵃ | rnaSPAdesᵇ | Trans-ABySSᶜ | Final transcriptome |
|---|---|---|---|---|
| Number of sequences | 136,294 | 138,896 | 158,095 | 62,596 |
| Total nucleotide count (bp) | 74,660,373 | 66,613,178 | 73,637,461 | 52,719,740 |
| Max. transcript length (bp) | 10,795 | 8173 | 10,360 | 10,360 |
| Mean transcript length (bp) | 547 | 479 | 465 | 842 |
| N50 (bp) | 659 | 538 | 520 | 1134 |
Summary statistics of the A. donax pre-assemblies using Trinity, rnaSPAdes, and Trans-ABySS, and of the final A. donax leaf transcriptome after clustering and redundancy removal
ᵃ Trinity: K-mer 25
ᵇ rnaSPAdes: K-mer 21, 33
ᶜ Trans-ABySS: K-mer 17, 21, 25; the minimum transcript length was set to 200 bp
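As a minimal sketch of how the summary statistics in this table relate to one another, the snippet below computes N50 from a list of transcript lengths and checks the reported mean lengths against the reported totals. The `n50` function and the toy lengths are illustrative assumptions. The floor division is an observation about the table (the reported means match total bp divided by sequence count, truncated), not a statement of how the authors computed them.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (hypothetical lengths): total = 2400 bp, half = 1200 bp.
toy_lengths = [200, 300, 400, 500, 1000]
print(n50(toy_lengths))  # 1000 + 500 = 1500 >= 1200, so N50 = 500

# Consistency check using the table's own numbers:
# (number of sequences, total bp, reported mean bp)
table = {
    "Trinity": (136_294, 74_660_373, 547),
    "rnaSPAdes": (138_896, 66_613_178, 479),
    "Trans-ABySS": (158_095, 73_637_461, 465),
    "Final transcriptome": (62_596, 52_719_740, 842),
}
computed_means = {name: total // count for name, (count, total, _) in table.items()}
print(computed_means)
```

N50 is weighted by sequence length, which is why the final transcriptome's N50 (1134 bp) exceeds its mean (842 bp): a few long transcripts hold a large share of the bases.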
Fig. 2 Sequencing and de novo assembly of A. donax leaf transcriptome. Sequence length distribution of A. donax non-redundant (NR) unique unitranscript sequences. The X-axis represents the length range bins in bp. The Y-axis represents the frequency of transcripts in each bin
Percentage of reads mapped back to A. donax leaf transcriptome
| | Bowtie2 | BWA |
|---|---|---|
| Reads aligned 1 time (%) | 40.09 | 42.53 |
| Reads aligned >1 times (%) | 34.18 | 25.82 |
| Overall alignment rate (%) | 74.27 | 68.35 |
Percentage of reads uniquely mapped, aligned more than one time to the unitranscripts, and overall alignment rate using Bowtie2 and BWA aligners
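The three rows of this table are linked by simple arithmetic: the overall alignment rate is the sum of the uniquely aligned and multiply aligned percentages. The sketch below, with variable names chosen purely for illustration, checks this for Bowtie2 and derives the uniquely aligned percentage for BWA that the other two BWA rows imply. `Decimal` is used to avoid binary floating-point rounding.

```python
from decimal import Decimal

# Bowtie2: unique + multiple should equal the overall rate.
bowtie2_unique = Decimal("40.09")
bowtie2_multi = Decimal("34.18")
bowtie2_overall = bowtie2_unique + bowtie2_multi
print(bowtie2_overall)  # 74.27

# BWA: the unique rate implied by the overall and multiple rates.
bwa_overall = Decimal("68.35")
bwa_multi = Decimal("25.82")
bwa_unique = bwa_overall - bwa_multi
print(bwa_unique)  # 42.53
```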
Full-length transcripts’ analysis
| Template transcript dataset | 100% coverage | >70% coverage | >20% coverage |
|---|---|---|---|
| | 6526 | 11,443 | 17,430 |
| | 6503 | 12,385 | 21,661 |
| | 6281 | 10,851 | 16,521 |
| | 6176 | 11,283 | 16,631 |
| | 6066 | 11,039 | 17,378 |
A. donax leaf transcripts aligned completely (100%), nearly completely (>70%), or partially (>20%) to the transcripts of related species
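The counts in each row increase from left to right, which suggests that the coverage classes are cumulative (a transcript with 100% coverage is also counted under >70% and >20%). A minimal sketch of that kind of tally follows. The function name, the per-transcript coverage values, and the definition of coverage as a percentage of the template length are all assumptions for illustration; the source does not describe the exact procedure.

```python
def count_by_coverage(coverages, partial_thresholds=(70.0, 20.0)):
    """Count transcripts at full coverage and above each partial threshold.

    `coverages` holds one coverage percentage (0-100) per transcript.
    The classes are cumulative: every value counted at 100% is also
    counted under each ">threshold" class.
    """
    counts = {"100%": sum(1 for c in coverages if c >= 100.0)}
    for threshold in partial_thresholds:
        counts[f">{threshold:g}%"] = sum(1 for c in coverages if c > threshold)
    return counts


# Toy coverage values (hypothetical), one per transcript.
toy_coverages = [100.0, 85.5, 71.0, 45.0, 20.0, 10.0]
coverage_counts = count_by_coverage(toy_coverages)
print(coverage_counts)
```

Note that a coverage of exactly 20.0 is not counted in the >20% class, because the comparison is strict.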
Overview of functional annotation by homology
| Category | No. of transcripts |
|---|---|
| Predicted ORFs | 98,781 |
| Predicted proteins | 83,758 |
| Tr_EMBL_Top_BLASTX_hit | 50,850 |
| sprot_Top_BLASTX_hit | 34,177 |
| Tr_EMBL_Top_BLASTP_hit | 54,660 |
| sprot_Top_BLASTP_hit | 33,597 |
| Pfam | 31,513 |
| SignalP | 2729 |
Summary of the functional annotation by homology
ORFs, open reading frames; Tr_EMBL_Top_BLASTX_hit, top blastx hits against UniRef90 database; sprot_Top_BLASTX_hit, top blastx hits against UniProtKB/Swiss-Prot database; Tr_EMBL_Top_BLASTP_hit, top blastp hits against UniRef90 database; sprot_Top_BLASTP_hit, top blastp hits against UniProtKB/Swiss-Prot database; Gene_ontology_blast, gene ontology using Blast; Gene_ontology_pfam, gene ontology using Pfam
Fig. 3 Graphical representations of functional annotations in A. donax leaf transcriptome. a BLAST top-hits species distribution of A. donax unitranscripts against the non-redundant (NR) protein database. b Histogram of leaf transcriptome sequences with InterPro domains and gene ontology (GO) terms. The X-axis represents transcripts without InterProScan (IPS), with IPS, and with GO. The Y-axis shows the frequency of transcripts in each bin. c Representation of mapping database (UniProtKB, GR_protein, PDB, TAIR) sources
Fig. 4 Gene ontology (GO) functional classification using Blast2GO. Histograms of the frequency of transcripts annotated to specific GO categories; biological process, cellular components, and molecular functions are represented by green, blue, and yellow bars, respectively
Fig. 5 Study of purine metabolism (a) and thiamine metabolism (b) pathways by Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showing the different identified enzymes in A. donax leaf transcriptome (one color for each Enzyme Code or EC)
Fig. 6 Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showing genes involved in phenylpropanoid biosynthesis (a) and starch and sucrose biosynthesis (b) representing each colored EC in A. donax leaf transcriptome
Comparison between A. donax transcriptome and S. italica and A. thaliana transcripts for cellulose biosynthesis
| Gene symbol | Gene name | Accession no.ᵃ | Species | No. of |
|---|---|---|---|---|
| | | Seita.5G122700.1 | S. italica | 7 |
| | | AT4G39350.1 | A. thaliana | 6 |
| | | Seita.4G211600.1 | S. italica | 4 |
| | | Seita.2G115400.1 | S. italica | 6 |
| | | Seita.9G227400.1 | S. italica | 1 |
| | | Seita.9G020600.1 | S. italica | 6 |
| | | AT5G64740.1 | A. thaliana | 7 |
| | | Seita.3G332300.1 | S. italica | 4 |
| | | Seita.5G319100.1 | S. italica | 3 |
| | | Seita.1G268900.1 | S. italica | 3 |
| | | AT2G33100.2 | A. thaliana | 2 |
| | | Seita.2G243900.1 | S. italica | 2 |
A total of nine S. italica transcripts and three A. thaliana transcripts involved in cellulose biosynthesis showed homology with A. donax transcripts
ᵃ Phytozome (v11.0) accession numbers
Comparison between A. donax transcriptome and S. italica transcripts for lignin biosynthesis
| Gene symbol | Gene name | Accession no.ᵃ | Species | No. of |
|---|---|---|---|---|
| | | Seita.1G240400.1 | S. italica | 13 |
| | | Seita.6G197700.1 | S. italica | 1 |
| | | Seita.2G256200.1 | S. italica | 1 |
| | | Seita.1G057300.1 | S. italica | 1 |
| | | Seita.6G059400.1 | S. italica | 3 |
| | | Seita.9G193900.1 | S. italica | 5 |
| | | Seita.7G155700.1 | S. italica | 7 |
| | | Seita.3G194300.1 | S. italica | 6 |
| | | Seita.5G361200.1 | S. italica | 8 |
| | | Seita.6G093400.1 | S. italica | 3 |
A total of ten S. italica transcripts involved in lignin biosynthesis showed homology with A. donax transcripts
ᵃ Phytozome (v11.0) accession numbers
Comparison between A. donax transcriptome and S. italica transcripts for stomatal development and distribution
| Gene symbol | Gene name | Accession no.ᵃ | Species | No. of |
|---|---|---|---|---|
| | | Seita.4G086700.1 | S. italica | 5 |
| | | Seita.5G425700.1 | S. italica | 1 |
| | | Seita.7G184400.1 | S. italica | 1 |
| | | Seita.4G019700.1 | S. italica | 2 |
| | | XM_004961002.3 | S. italica | 5 |
A total of six S. italica transcripts involved in stomatal development and distribution showed homology with A. donax transcripts
ᵃ Phytozome (v11.0) and NCBI accession numbers
Comparison between A. donax transcriptome and O. sativa and S. bicolor transcripts for SAPs
| Gene symbol | Gene name | Accession no.ᵃ | Species | No. of |
|---|---|---|---|---|
| | | LOC_Os09g31200.1 | O. sativa | 1 |
| | | LOC_Os02g10200.1 | O. sativa | 1 |
| | | Sobic.004G079100.2 | S. bicolor | 3 |
| | | Sobic.002G245800.2 | S. bicolor | 2 |
| | | LOC_Os06g41010.1 | O. sativa | 3 |
| | | Sobic.002G046100.1 | S. bicolor | 1 |
| | | LOC_Os07g07350.1 | O. sativa | 1 |
| | | Sobic.002G345300.2 | S. bicolor | 3 |
| | | LOC_Os08g39450.1 | O. sativa | 1 |
| | | LOC_Os05g23470.1 | O. sativa | 1 |
| | | LOC_Os07g38240.1 | O. sativa | 3 |
A total of seven O. sativa transcripts and four S. bicolor transcripts encoding SAPs showed homology with A. donax transcripts
ᵃ Phytozome (v11.0) accession numbers