Roberto A Barrero, Felix D Guerrero, Paula Moolhuijzen, John A Goolsby, Jason Tidwell, Stanley E Bellgard, Matthew I Bellgard.
Abstract
The giant reed, Arundo donax, is a perennial grass species that has become invasive in many countries. Expansive stands of A. donax have significant negative impacts on available water resources, and efforts are underway to identify biological control agents against this species. The giant reed grows under adverse environmental conditions, tolerating drought stress, flooding, heavy metals, salinity and herbaceous competition, which hampers control programs. To establish a foundational molecular dataset, we used an Illumina HiSeq protocol to sequence the transcriptome of actively growing shoots from an invasive genotype collected along the Rio Grande River, bordering Texas and Mexico. We report the assembly of 27,491 high-confidence transcripts (≥200 bp) with at least 70% coverage of known genes in other Poaceae species. Of these, 13,080 (47.58%), 6165 (22.43%) and 8246 (30.0%) transcripts have sequence similarity to known, domain-containing and conserved hypothetical proteins, respectively. We also report 75,590 low-confidence transcripts supported by both the Trans-ABySS and Velvet-Oases de novo assembly pipelines. Within the low-confidence subset, we identified partial hits to known (19,021; 25.16%), domain-containing (7093; 9.38%) and conserved hypothetical (16,647; 22.02%) proteins. Additionally, 32,829 (43.43%) low-confidence transcripts encode putative hypothetical proteins unique to A. donax. Functional annotation assigned Gene Ontology terms to 5,550 transcripts and KEGG pathway information to 6,070 transcripts. The most abundant KEGG pathways are spliceosome, ribosome, ubiquitin-mediated proteolysis, plant–pathogen interaction, RNA degradation and oxidative phosphorylation. We also found 12, 9 and 4 transcripts annotated as stress-related, heat stress and water stress proteins, respectively.
We envisage that these resources will facilitate studies of the abiotic stress tolerance of this exotic plant species, which contributes to its invasive capacity.
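Each percentage in the abstract is relative to its own subset: the three high-confidence categories sum to 27,491 transcripts and the four low-confidence categories sum to 75,590. The minimal Python sketch below uses only the counts reported in the abstract to check those totals and reproduce the stated percentages. The function name `category_percentages` and the short category labels are hypothetical.

```python
# Category counts as reported in the abstract for each transcript subset.
HIGH_CONFIDENCE = {
    "known": 13_080,
    "domain-containing": 6_165,
    "conserved hypothetical": 8_246,
}
LOW_CONFIDENCE = {
    "known": 19_021,
    "domain-containing": 7_093,
    "conserved hypothetical": 16_647,
    "unique to A. donax": 32_829,
}


def category_percentages(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the subset total and each category's share of it, in percent
    rounded to two decimals (hypothetical helper for illustration)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    for label, counts in (("high", HIGH_CONFIDENCE), ("low", LOW_CONFIDENCE)):
        total, shares = category_percentages(counts)
        print(label, total, shares)
```

Because every category is divided by its own subset total, the high-confidence and low-confidence percentages each sum to roughly 100% independently.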
Keywords: Arundo donax; Giant reed; RNA de novo assembly; RNA-Seq; Shoot; Transcriptome
Year: 2015 PMID: 26217707 PMCID: PMC4509983 DOI: 10.1016/j.dib.2014.12.007
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Arundo donax transcriptome assembly statistics.
| Statistic | High-confidence transcripts | Low-confidence transcripts |
|---|---|---|
| Number of transcripts | 27,491 | 75,590 |
| Total size of transcripts (nt) | 32,326,850 | 55,020,434 |
| Longest transcript (nt) | 14,995 | 8091 |
| Shortest transcript (nt) | 200 | 200 |
| Number of transcripts > 1,000 nt | 13,877 (50.5%) | 14,879 (19.7%) |
| Number of transcripts > 10,000 nt | 2 (0.0%) | 0 (0.0%) |
| Number of transcripts > 100,000 nt | 0 (0.0%) | 0 (0.0%) |
| Mean transcript size (nt) | 1176 | 728 |
| Median transcript size (nt) | 1008 | 584 |
| N50 transcript length (nt) | 1413 | 870 |
| L50 transcript count | 7811 | 19,821 |
| Transcript %A | 24.16 | 26.16 |
| Transcript %C | 25.11 | 23.28 |
| Transcript %G | 26.36 | 24.02 |
| Transcript %T | 24.37 | 26.53 |
| Transcript %N | 0 | 0 |
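N50 and L50 follow the usual definitions. Sort transcripts from longest to shortest and accumulate their lengths. N50 is the length of the transcript at which the running total first reaches half of all assembled bases, and L50 is the number of transcripts needed to get there. The source does not say which tool produced the table, so the Python sketch below is an independent illustration of these statistics, not the authors' pipeline. The function names are hypothetical, the toy inputs are not A. donax data, and conventions differ slightly at the 50% boundary (this sketch stops at "at least half").

```python
from statistics import mean, median


def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a collection of transcript lengths.

    Lengths are sorted longest first and summed; N50 is the length at which the
    running total first reaches at least half of all bases, and L50 is how many
    transcripts were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one length")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


def base_composition(seqs: list[str]) -> dict[str, float]:
    """Percentage of A, C, G, T and N over all bases of all sequences."""
    joined = "".join(seqs).upper()
    if not joined:
        raise ValueError("no bases to count")
    return {b: round(100 * joined.count(b) / len(joined), 2) for b in "ACGTN"}


def summarize(seqs: list[str]) -> dict[str, float]:
    """Length statistics corresponding to rows of the assembly table."""
    lengths = [len(s) for s in seqs]
    n50, l50 = n50_l50(lengths)
    return {
        "count": len(lengths),
        "total_nt": sum(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "over_1k": sum(1 for n in lengths if n > 1_000),
        "mean": round(mean(lengths)),
        "median": median(lengths),
        "n50": n50,
        "l50": l50,
    }


# Toy example (not A. donax data): nine lengths summing to 54 nt.
print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (8, 3)
```

The table rows are mutually consistent under these definitions. For example, 32,326,850 nt divided by 27,491 transcripts is about 1175.9, which rounds to the reported mean of 1176.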
Fig. 1. Functional annotation of A. donax transcripts: (A) classification of high-confidence and low-confidence transcripts based on comparison against the NCBI NR database. (B) Fold abundance of the top 20 KEGG pathways in high-confidence transcripts relative to the low-confidence subset. P1=Ribosome; P2=Spliceosome; P3=Ubiquitin mediated proteolysis; P4=Metabolic pathways, Oxidative phosphorylation; P5=Plant–pathogen interaction; P6=Proteasome; P7=Protein export; P8=Metabolic pathways, Purine metabolism, Pyrimidine metabolism, RNA polymerase; P9=RNA degradation; P10=Basal transcription factors; P11=Endocytosis; P12=Metabolic pathways, Starch and sucrose metabolism; P13=Peroxisome; P14=Metabolic pathways, N-Glycan biosynthesis; P15=Aminoacyl-tRNA biosynthesis; P16=Natural killer cell mediated cytotoxicity; P17=Base excision repair; P18=Regulation of autophagy; P19=Metabolic pathways, Pyrimidine metabolism; P20=Metabolic pathways, Porphyrin and chlorophyll metabolism. (C) Gene Ontology terms for biological process, molecular function and cellular component were assigned using AutoFACT [9] and summarized using WEGO [10].
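The caption does not say how fold abundance in panel (B) was calculated. One plausible definition, assumed here and not confirmed by the source, is a ratio of proportions: a pathway's share of KEGG-annotated high-confidence transcripts divided by its share of KEGG-annotated low-confidence transcripts. The Python sketch below implements only that assumed definition. The function names, the choice of denominators, and the toy counts are all hypothetical.

```python
def fold_abundance(
    high_counts: dict[str, int],
    low_counts: dict[str, int],
    high_total: int,
    low_total: int,
) -> dict[str, float]:
    """Ratio of each pathway's share of high-confidence transcripts to its
    share of low-confidence transcripts.

    ASSUMPTION: this ratio-of-proportions definition, and the use of
    high_total/low_total as denominators, are illustrative; the source does
    not state how fold abundance was computed. Pathways with no
    low-confidence transcripts are skipped to avoid division by zero.
    """
    folds: dict[str, float] = {}
    for pathway, n_high in high_counts.items():
        n_low = low_counts.get(pathway, 0)
        if n_low == 0:
            continue
        folds[pathway] = (n_high / high_total) / (n_low / low_total)
    return folds


def top_pathways(folds: dict[str, float], k: int = 20) -> list[tuple[str, float]]:
    """The k pathways with the largest fold values, largest first."""
    return sorted(folds.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical counts for illustration only, not values from the paper.
folds = fold_abundance(
    {"Ribosome": 60, "Spliceosome": 30},
    {"Ribosome": 20, "Spliceosome": 30, "Proteasome": 5},
    high_total=1000,
    low_total=1000,
)
print(top_pathways(folds, k=2))
```

Normalizing by subset size matters here because the low-confidence subset is almost three times larger than the high-confidence one. Raw counts alone would therefore understate enrichment in the high-confidence transcripts.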