Chao Liang, Xuan Liu, Siu-Ming Yiu, Boon Leong Lim.
Abstract
BACKGROUND: Biofuels extracted from the seeds of Camelina sativa have recently been used successfully as environmentally friendly jet fuel to reduce greenhouse gas emissions. Camelina sativa is genetically very close to Arabidopsis thaliana, and both are members of the Brassicaceae. Although public databases are currently available for some members of the Brassicaceae, such as A. thaliana, A. lyrata, Brassica napus, B. juncea and B. rapa, there are no public Expressed Sequence Tag (EST) or genomic data for Camelina sativa. In this study, high-throughput, large-scale RNA sequencing (RNA-seq) of the Camelina sativa transcriptome was carried out to generate a database that will be useful for further functional analyses.
Year: 2013 PMID: 23496985 PMCID: PMC3635884 DOI: 10.1186/1471-2164-14-146
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Comparison of SOAPdenovo and Trinity assembly results
| Metric | SOAPdenovo | Trinity |
| Total length (nt) | 26,651,285 | 71,935,591 |
| Total number (n) | 83,493 | 103,196 |
| N50 | 346 | 976 |
| Mean length (nt) | 319 | 697 |
| 100–500 nt | 72,616 | 53,377 |
| 500–1000 nt | 9,006 | 27,983 |
| 1000–1500 nt | 1,339 | 12,228 |
| 1500–2000 nt | 354 | 5,623 |
| ≥ 2000 nt | 178 | 3,985 |
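The N50 and mean length rows above follow from simple arithmetic on sequence lengths. The sketch below is a minimal illustration, not the authors' pipeline: `n50` and `mean_length` are hypothetical helper names, the toy length list is invented, and only the totals and counts are taken from the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def mean_length(total_nt, count):
    """Mean sequence length from the total length and the sequence count."""
    return total_nt / count


# Toy lengths (hypothetical, not from the study) to exercise n50().
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)

# Totals and counts copied from the table above.
soap_mean = mean_length(26_651_285, 83_493)
trinity_mean = mean_length(71_935_591, 103_196)

print(toy_n50, round(soap_mean), round(trinity_mean))
```

Rounding the two means to the nearest nucleotide should reproduce the "Mean length (nt)" row. The N50 values themselves cannot be recomputed without the individual sequence lengths.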
Summary details of sequences produced by SOAPdenovo assembly after Illumina sequencing
| Sequence type | Total number (n) | Total length (Mb) | Mean length (nt) | N50 (nt) |
| Clean reads | 26,942,130 | 2,424.79 | 90 | - |
| Contigs (≥ 75 bp) | 204,190 | 37.24 | 182 | 194 |
| Scaffolds (≥ 100 bp) | 129,539 | 32.72 | 253 | 284 |
| Total unigenes (≥ 100 bp) | 83,493 | 26.65 | 319 | 346 |
Assembly results with TAIR 10.0 CDS using SOAPdenovo and Trinity software
| Metric | SOAPdenovo | Trinity |
| Number of unigenes/transcripts (n) | 83,493 | 103,196 |
| Number of unigenes/transcripts mapped to TAIR CDS (n) | 52,864 | 70,753 |
| Total number of CDS in TAIR (n) | 35,386 | 35,386 |
| Total number of mapped CDS in TAIR (n) | 22,435 | 22,433 |
| Total length of CDS in TAIR (bp) | 43,546,300 | 43,546,300 |
| Total length of mapped CDS in TAIR (bp) | 31,264,828 | 31,145,430 |
| Total length of unigenes/transcripts (bp) | 26,651,285 | 71,935,591 |
| Total length of mapped unigenes/transcripts (bp) | 18,193,518 | 55,883,786 |
| Total length of overlapping sequences (bp) | 15,234,784 | 44,020,562 |
| Percentage of mapped CDS number | 22,435/35,386 = 63.4% | 22,433/35,386 = 63.4% |
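The percentage in the last row is the number of mapped CDS divided by the total number of CDS in TAIR. The same ratio can be formed for CDS length and for the share of assembled sequences that map. The following is a minimal sketch of that arithmetic using the figures from the table above; the helper `pct` and the dictionary keys are hypothetical names chosen for illustration.

```python
def pct(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Reference totals copied from the table above.
TOTAL_CDS = 35_386
TOTAL_CDS_BP = 43_546_300

# Per-assembly values copied from the table above.
assemblies = {
    "SOAPdenovo": {
        "mapped_cds": 22_435,
        "mapped_cds_bp": 31_264_828,
        "mapped_seqs": 52_864,
        "total_seqs": 83_493,
    },
    "Trinity": {
        "mapped_cds": 22_433,
        "mapped_cds_bp": 31_145_430,
        "mapped_seqs": 70_753,
        "total_seqs": 103_196,
    },
}

summary = {}
for name, a in assemblies.items():
    summary[name] = {
        # Share of TAIR CDS hit by at least one assembled sequence.
        "cds_number_pct": round(pct(a["mapped_cds"], TOTAL_CDS), 1),
        # Share of total TAIR CDS length belonging to mapped CDS.
        "cds_length_pct": round(pct(a["mapped_cds_bp"], TOTAL_CDS_BP), 1),
        # Share of assembled sequences that map to a TAIR CDS.
        "seqs_mapped_pct": round(pct(a["mapped_seqs"], a["total_seqs"]), 1),
    }
    print(name, summary[name])
```

Only the first ratio is reported in the table; the other two are derived here from the table's own rows and are not stated in the source.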
Summary of unigene annotations from SOAPdenovo assembly
| Annotation | Unigenes (n) | Annotations (n) | Functional classification |
| All assembled unigenes | 83,493 | - | - |
| Gene annotations against | 45,307 | 45,307 | - |
| Gene annotations against | 16,969 | 16,969 | - |
| Gene annotation against | 722 | 722 | - |
| Unique gene annotations against plant NR | 67,497 | 67,497 | - |
| Gene annotation against Swiss-Prot | 40,804 | 40,804 | - |
| Gene annotation against COG | 14,190 | 14,190 | 25 categories |
| Gene annotation against KEGG | 27,042 | 27,042 | 119 pathways |
| GO annotations for biological process | 23,524 | 49,164 | 27 sub-categories |
| GO annotations for cellular component | 25,885 | 37,439 | 9 sub-categories |
| GO annotations for molecular function | 26,825 | 30,888 | 10 sub-categories |
| All annotated unigenes | 67,791 | - | - |
| Unigenes matching all four databases | 11,685 | - | - |
Figure 1. Unigene homology searches against NR and Swiss-Prot databases. (a) E-value proportional frequency distribution of BLAST hits against the NR database. (b) E-value proportional frequency distribution of BLAST hits against the Swiss-Prot database. (c) Proportional frequency distribution of unigene similarities against the NR database based on the best BLAST hits (E-value ≤ 1.0E-5). (d) Proportional frequency distribution of unigene similarities against the Swiss-Prot database based on the best BLAST hits (E-value ≤ 1.0E-5). (e) Proportional homology distribution among other plant species based on the best BLAST hits against the NR database (E-value ≤ 1.0E-5).
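Selecting the best BLAST hit per unigene under an E-value cutoff can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes BLAST results in the standard 12-column tabular layout (query, subject, percent identity, ..., E-value, bit score), it assumes "best" means the highest bit score, and the function name `best_hits` and all sequence identifiers in the toy input are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1.0e-5  # threshold quoted in the figure caption


def best_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Return the best hit per query from tab-separated BLAST output.

    Assumes the standard 12-column tabular layout, where column 1 is the
    query ID, column 2 the subject ID, column 3 the percent identity,
    column 11 the E-value and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity = float(row[2])
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue > cutoff:
            continue  # hit is not significant at the chosen cutoff
        current = best.get(query)
        # Assumption: the "best" hit is the one with the highest bit score.
        if current is None or bitscore > current["bitscore"]:
            best[query] = {
                "subject": subject,
                "identity": identity,
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Toy input with hypothetical identifiers (not data from the study).
toy = io.StringIO(
    "unigene1\tprotA\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t380\n"
    "unigene1\tprotB\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-20\t150\n"
    "unigene2\tprotC\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.5\t30\n"
)
hits = best_hits(toy)
for query, hit in hits.items():
    print(query, hit["subject"], hit["evalue"], hit["identity"])
```

The per-query E-values and identities retained this way are the kind of values that would be binned to produce distributions like those in panels (a) to (d).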
Figure 2. COG functional classification of the transcriptome. Of 67,497 hits in the NR database, 14,190 unigenes with significant homologies in the COG database (E-value ≤ 1.0E-5) were classified into 25 COG categories.
Figure 3. Gene Ontology (GO) term assignment of unigenes. Based on high-scoring BLASTX matches in the NR plant protein database, Camelina unigenes were classified into three main GO categories and 46 sub-categories. The left y-axis indicates the percentage of genes in a specific sub-category within each main category. The right y-axis indicates the number of genes in the same sub-category. In total, 33,475 unigenes with BLASTX matches to known proteins were assigned GO terms.
Figure 4. Frequency distribution of unigenes that mapped to sequences in the NR database.