Claudio Benicio Cardoso-Silva¹, Estela Araujo Costa¹, Melina Cristina Mancini¹, Thiago Willian Almeida Balsalobre¹, Lucas Eduardo Costa Canesin¹, Luciana Rossini Pinto², Monalisa Sampaio Carneiro³, Antonio Augusto Franco Garcia⁴, Anete Pereira de Souza⁵, Renato Vicentini¹.
Abstract
Sugarcane is an important crop and a major source of sugar and alcohol. In this study, we performed de novo assembly and transcriptome annotation for six sugarcane genotypes involved in bi-parental crosses. The transcriptome was assembled de novo from short reads generated on the Illumina RNA-Seq platform. We produced more than 400 million reads, which were assembled into 72,269 unigenes. Based on a similarity search, 28,788 unigenes showed significant similarity to 28,030 sorghum proteins; these included a set of 5,272 unigenes that are not present in the public sugarcane EST databases, many of which may represent previously undescribed sugarcane genes. From this collection of unigenes, a large number of molecular markers were identified, including 5,106 simple sequence repeats (SSRs) and 708,125 single-nucleotide polymorphisms (SNPs). This new dataset will be a useful resource for future genetic and genomic studies in this species.
Year: 2014 PMID: 24523899 PMCID: PMC3921171 DOI: 10.1371/journal.pone.0088462
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of Illumina transcriptome sequencing data for the sugarcane varieties included in this study.
| Sample | Read length (bp) | Raw data | Trimmed data | GC (%) | Q20 (%) |
|  | 72+72 | 84,105,462 | 64,906,391 | 49.04 | 98.09 |
|  | 72+72 | 103,971,718 | 71,002,186 | 47.52 | 97.32 |
|  | 72+72 | 112,124,334 | 77,476,268 | 46.91 | 97.11 |
|  | 72+72 | 101,983,186 | 73,160,814 | 47.59 | 97.56 |
|  | 72+72 | 119,280,444 | 87,873,521 | 46.62 | 97.66 |
|  | 72+72 | 88,767,346 | 70,955,324 | 48.07 | 98.25 |
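The GC (%) and Q20 (%) columns are per-base summaries: the share of G/C bases and the share of bases with a Phred quality score of at least 20. As a minimal sketch, assuming reads are held in memory as plain strings and quality strings use Phred+33 encoding (an assumption; the encoding is not stated in this record), both values could be computed as below. The function names are hypothetical.

```python
# Minimal sketch: per-base GC content and Q20 rate for a set of reads.
# Assumption: quality strings use Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def gc_percent(sequences):
    """Percentage of G/C among called bases (A, C, G, T); N and other codes are ignored."""
    gc = called = 0
    for seq in sequences:
        for base in seq.upper():
            if base in "ACGT":
                called += 1
                if base in "GC":
                    gc += 1
    return 100.0 * gc / called if called else 0.0


def q20_percent(qualities, offset=PHRED_OFFSET):
    """Percentage of bases whose decoded Phred quality score is at least 20."""
    passing = total = 0
    for qual in qualities:
        for symbol in qual:
            total += 1
            if ord(symbol) - offset >= 20:
                passing += 1
    return 100.0 * passing / total if total else 0.0


if __name__ == "__main__":
    reads = ["ACGTGGCC", "ATATNGCG"]   # hypothetical reads
    quals = ["IIIIIIII", "IIII#5II"]   # hypothetical Phred+33 quality strings
    print(f"GC: {gc_percent(reads):.2f}%  Q20: {q20_percent(quals):.2f}%")
    # prints: GC: 60.00%  Q20: 93.75%
```

Applied to every read of a sample, these functions give statistics of the same kind as the table columns; the trimming criteria behind the "Trimmed data" counts are not described in this record.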
Summary of the de novo assembly results for the sugarcane transcriptome.
| Unigene length (bp) | Total unigenes | Percentage |
| 300–500 | 31,971 | 44.24% |
| 500–1000 | 20,634 | 28.55% |
| 1000–2000 | 12,007 | 16.61% |
| 2000–3000 | 4,827 | 6.68% |
| 3000–4000 | 1,790 | 2.47% |
| 4000–5000 | 636 | 0.88% |
| >5000 | 404 | 0.56% |
| Total length (bp) | 66,572,642 | - |
| Unigenes | 72,269 | - |
| N50 length | 1,367 | - |
| GC (%) | 46.39 | - |
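N50 summarises assembly contiguity: it is the length L such that unigenes of length ≥ L together contain at least half of all assembled bases (here, at least 66,572,642 / 2 = 33,286,321 bp). The binned counts above are not enough to recompute it, so the minimal sketch below takes a hypothetical list of individual unigene lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest unigene down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    example = [2000, 1500, 1000, 800, 500, 300]  # hypothetical unigene lengths
    print(n50(example))  # prints 1500
```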
Summary of unigene annotation against each database.
| Database | Number of unigenes | Number of proteins matched | Percentage of unigenes |
| Viridiplantae proteins | 35,456 | 34,969 | 49.06% |
| Grass proteins | 34,814 | 34,304 | 48.17% |
| Sorghum proteins | 28,788 | 28,030 | 39.83% |
| Hits against sorghum proteins and sugarcane ESTs | 22,171 | 20,969 | 30.68% |
| Total of no-hit unigenes | 36,813 | - | 50.94% |
| No-hit unigenes with high similarity to the sorghum genome | 18,910 | - | 26.16% |
Percentage relative to the total number of sugarcane unigenes.
Figure 1. Proportions of sugarcane transcripts showing homology to sugarcane unigenes and sorghum and rice proteins.
For annotation, the best BLASTX/BLASTN hit against the protein or nucleotide sequences of the reference organisms was used, with an E-value cut-off of ≤10⁻⁶. The numbers in parentheses give the number of distinct proteins/unigenes matched in each species (sugarcane, sorghum and rice). The numbers outside the Venn diagram give the no-hit transcripts and the number of those transcripts that mapped to the sorghum genome.
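The best-hit rule with an E-value cut-off of ≤10⁻⁶ can be sketched as follows. This is a minimal sketch assuming BLAST+ tabular output (`-outfmt 6`, whose default columns place the E-value and bit score in the 11th and 12th fields) and treating the hit with the highest bit score as "best"; the exact tie-breaking used in the study is not stated. Unigenes with no hit passing the cut-off fall into the no-hit set. The function names and sequence IDs are hypothetical.

```python
E_VALUE_CUTOFF = 1e-6


def best_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Map each query to its best (highest bit score) hit with E-value <= cutoff."""
    best = {}
    for line in tabular_lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def no_hits(all_queries, best):
    """Queries without any hit passing the cut-off, in sorted order."""
    return sorted(set(all_queries) - set(best))


if __name__ == "__main__":
    # Hypothetical outfmt 6 rows: qseqid sseqid pident length mismatch gapopen
    # qstart qend sstart send evalue bitscore
    rows = [
        "u1\tsorghum_A\t85.0\t300\t40\t2\t1\t300\t10\t309\t1e-50\t250",
        "u1\tsorghum_B\t70.0\t200\t55\t3\t1\t200\t5\t204\t1e-20\t120",
        "u2\tsorghum_C\t40.0\t80\t45\t4\t1\t80\t1\t80\t1e-3\t35",
    ]
    hits = best_hits(rows)
    print(hits)
    print(no_hits(["u1", "u2", "u3"], hits))
```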
Figure 2. Histogram of the Clusters of Orthologous Groups (COG) classifications of the sugarcane transcripts and sorghum proteins.
Figure 3. Enrichment of Gene Ontology terms for each sugarcane variety.
Figure 4. Hierarchical clustering of the 358 putative sugarcane lncRNAs.
The expression patterns distinguished the genotypes by their ability to store sucrose and by the bi-parental crosses involved in the different mapping populations.
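The linkage method and distance behind Figure 4 are not given in this record. As a minimal sketch under stated assumptions, the code below clusters genotypes with average linkage on a correlation distance (1 − Pearson r) between their expression profiles across lncRNAs, written in plain Python for transparency; in practice one would more likely call `scipy.cluster.hierarchy.linkage(..., method="average", metric="correlation")`. Genotype names and expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    profiles: dict mapping a genotype name to its expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Average linkage: mean pairwise distance between members of two clusters.
        d, i, j = min(
            (mean([dist[(a, b)] for a in ci for b in cj]), i, j)
            for (i, ci), (j, cj) in combinations(enumerate(clusters), 2)
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    # Hypothetical expression values of four genotypes across four lncRNAs.
    profiles = {
        "G1": [1.0, 2.0, 3.0, 4.0],
        "G2": [2.0, 4.0, 6.0, 8.5],
        "G3": [4.0, 3.0, 2.0, 1.0],
        "G4": [8.0, 6.0, 4.0, 2.5],
    }
    for left, right, d in average_linkage(profiles):
        print(left, "+", right, f"d={d:.3f}")
```

A correlation distance groups genotypes by the shape of their expression profiles rather than by absolute expression level, which is a common choice for expression heat maps.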
Summary of the simple sequence repeat (SSR) types in the sugarcane transcriptome.
| Repeat motif | Number | Unigenes | Percentage (%) |
| Dinucleotide | | | |
| AC/GT | 551 | | |
| AG/CT | 962 | | |
| AT/TA | 336 | | |
| CG/GC | 78 | | |
| Trinucleotide | | | |
| AAC/GTT | 141 | | |
| AAG/CTT | 152 | | |
| AAT/ATT | 60 | | |
| AGC/GCT | 219 | | |
| ACG/CGT | 197 | | |
| AGT/ACT | 62 | | |
| ACC/GGT | 122 | | |
| AGG/CCT | 252 | | |
| ACA/TGT | 97 | | |
| AGA/TCT | 46 | | |
| ATA/TAT | 24 | | |
| ATC/GAT | 42 | | |
| ATG/CAT | 43 | | |
| CAC/GTG | 69 | | |
| CAG/CTG | 228 | | |
| CCG/CGG | 442 | | |
| CGC/GCG | 241 | | |
| CTC/GAG | 148 | | |
Total number of SSRs (di-, tri- and other motifs).
Number of unigene sequences containing SSRs.
The relative percentage of SSRs with different repeat motifs among the total SSRs.
The total number of SSRs of other sizes.
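The motif classes in the SSR table pair each motif with its reverse complement (e.g. AC/GT, CAG/CTG). A minimal sketch of perfect di- and trinucleotide SSR detection with regular expressions follows. The minimum repeat numbers are hypothetical (six dinucleotide and five trinucleotide units), not the thresholds used in the study, and motif phase (e.g. AC versus CA) is not normalised. Self-complementary motifs (AT, CG) are paired with their rotation to mirror the table's AT/TA and CG/GC labels; the order inside other labels may differ from the table (e.g. ACT/AGT rather than AGT/ACT). All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical minimum numbers of tandem repeats per motif length.
MIN_REPEATS = {2: 6, 3: 5}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_class(motif):
    """Label a motif with its reverse complement, e.g. 'CAG' -> 'CAG/CTG'."""
    partner = reverse_complement(motif)
    if partner == motif:
        # Self-complementary motifs (AT, CG) are paired with their rotation.
        partner = motif[1:] + motif[0]
    return "/".join(sorted((motif, partner)))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Yield (start, motif, repeat_count) for perfect di-/trinucleotide SSRs."""
    seq = seq.upper()
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip mononucleotide runs such as AAAAAA
            yield m.start(), motif, (m.end() - m.start()) // k


if __name__ == "__main__":
    unigene = "GG" + "AC" * 7 + "TTT" + "CAG" * 5 + "GG"  # hypothetical sequence
    ssrs = list(find_ssrs(unigene))
    print(ssrs)
    print(Counter(motif_class(motif) for _, motif, _ in ssrs))
```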
Figure 5. Unique and shared heterozygous putative SNPs in the parental genotypes of the three sugarcane mapping populations.
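Counting unique and shared SNPs amounts to assigning each SNP to the exact set of genotypes in which it was called heterozygous; each such set is one region of a Venn diagram. A minimal sketch, assuming SNPs have already been called and are keyed by a hypothetical (unigene, position) tuple, with hypothetical genotype names:

```python
from collections import Counter


def venn_regions(snps_by_genotype):
    """Count SNPs in each Venn region.

    snps_by_genotype: dict mapping a genotype name to a set of SNP keys.
    Returns a Counter keyed by the frozenset of genotypes sharing each SNP,
    so a single-name key holds the SNPs unique to that genotype.
    """
    owners = {}
    for genotype, snps in snps_by_genotype.items():
        for snp in snps:
            owners.setdefault(snp, set()).add(genotype)
    return Counter(frozenset(names) for names in owners.values())


if __name__ == "__main__":
    snps = {
        "P1": {("u1", 120), ("u1", 305), ("u7", 44)},
        "P2": {("u1", 305), ("u7", 44), ("u9", 10)},
        "P3": {("u7", 44), ("u12", 88)},
    }
    regions = venn_regions(snps)
    for region, count in sorted(regions.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("+".join(sorted(region)), count)
```

Because every SNP lands in exactly one region, the region counts always sum to the number of distinct SNPs across all genotypes.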