Yasunobu Okamura, Kengo Kinoshita.
Abstract
BACKGROUND: Data generated by RNA sequencing (RNA-Seq) is now accumulating in vast amounts in public repositories, especially for human and mouse genomes. Reanalyzing these data has emerged as a promising approach to identify gene modules or pathways. Although meta-analyses of gene expression data are frequently performed using microarray data, meta-analyses using RNA-Seq data are still rare. This lag is partly due to the limitations in reanalyzing RNA-Seq data, which requires extensive computational resources. Moreover, it is nearly impossible to calculate the gene expression levels of all samples in a public repository using currently available methods. Here, we propose a novel method, Matataki, for rapidly estimating gene expression levels from RNA-Seq data.
Keywords: Gene expression; Mapping; RNA-Seq
Year: 2018 PMID: 30012088 PMCID: PMC6048772 DOI: 10.1186/s12859-018-2279-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Summary of the results using simulation data. (a) Spearman correlation coefficient between the expected and estimated expression values for each method. “Matataki” indicates the results of the proposed method, and “MatatakiSubset” indicates the results of the proposed method without uncovered genes. To compare the gene-level expression profile with the transcript-level expression profile, the sum of TPM for each gene was calculated. (b) Means of the absolute difference from the expected expression levels
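The quantities named in the Fig. 1 caption can be illustrated with a short sketch. The code below is not the authors' evaluation script; it is a minimal standard-library example, and the transcript names, gene names, and TPM values are hypothetical. It sums transcript-level TPM into gene-level TPM, then computes the Spearman correlation coefficient (as the Pearson correlation of average ranks) and the mean absolute difference against expected values.

```python
from collections import defaultdict
from statistics import mean


def gene_level_tpm(transcript_tpm, transcript_to_gene):
    """Sum transcript-level TPM values into one TPM value per gene."""
    totals = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        totals[transcript_to_gene[transcript]] += tpm
    return dict(totals)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mean_absolute_difference(expected, estimated):
    """Mean of |expected - estimated| over paired values."""
    return mean(abs(e - o) for e, o in zip(expected, estimated))


# Hypothetical data, for illustration only.
transcript_tpm = {"tx1": 10.0, "tx2": 5.0, "tx3": 20.0, "tx4": 1.0}
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB", "tx4": "geneC"}
expected_gene_tpm = {"geneA": 14.0, "geneB": 22.0, "geneC": 2.0}

estimated_gene_tpm = gene_level_tpm(transcript_tpm, transcript_to_gene)
genes = sorted(expected_gene_tpm)
expected = [expected_gene_tpm[g] for g in genes]
estimated = [estimated_gene_tpm[g] for g in genes]

rho = spearman(expected, estimated)
mad = mean_absolute_difference(expected, estimated)
```

Summing TPM per gene before comparing is what lets a gene-level estimator be compared with transcript-level tools on the same scale.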
Fig. 2 Comparison of TPM when k was varied. The x-axis shows the TPM values of eXpress, the y-axis shows the TPM values of our method, and the color indicates the indexed k-mer coverage of each gene when changing k from 16 to 56 with a step of 8
Details of the uncovered genes
| Type of Gene | Number of uncovered genes | Total number of genes | Percentage of uncovered genes |
|---|---|---|---|
| Non-coding RNA | 393 | 6250 | 6.3% |
| MicroRNA | 233 | 1880 | 11.9% |
| Ribosomal RNA | 19 | 21 | 90.5% |
| Small nuclear RNA | 35 | 109 | 32.1% |
| Small nucleolar RNA | 45 | 390 | 11.5% |
| Other non-coding RNA | 61 | 3850 | 1.6% |
| Pseudo-gene | 21 | 927 | 2.7% |
| Protein-coding gene | 303 | 18,720 | 1.6% |
| Paralogous gene | 137 | 505 | 27.1% |
Comparison of running times among methods
| | Run accession | ERR188074 | ERR188125 | ERR188171 | ERR188362 |
|---|---|---|---|---|---|
| Run and mapping statistics | Number of reads | 31,540,813 | 28,810,860 | 30,386,179 | 26,255,381 |
| | Length of reads | 75 | 75 | 75 | 75 |
| | Bowtie mapping rate | 84.7% | 80.2% | 84.6% | 80.4% |
| CPU time comparison (s) [a] | eXpress | 14,546.6 | 24,036.1 | 13,429.5 | 23,103.9 |
| | RSEM | 22,700.6 | 20,545.9 | 21,753.1 | 18,842.2 |
| | Bowtie | 1487.8 | 1477.5 | 1472.6 | 1319.5 |
| | Sailfish | 299.0 | 281.0 | 294.2 | 285.5 |
| | Kallisto | 138.7 | 144.2 | 136.7 | 129.5 |
| | Matataki | 57.2 | 46.4 | 43.9 | 42.5 |
| Acceleration rate compared with existing methods | eXpress | 254 | 517 | 305 | 543 |
| | RSEM | 397 | 442 | 495 | 443 |
| | Bowtie | 26.0 | 31.8 | 33.5 | 31.0 |
| | Sailfish | 5.23 | 6.05 | 6.69 | 6.71 |
| | Kallisto | 2.43 | 3.107 | 3.11 | 3.05 |

[a] Values represent the median of 10 measurements
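The acceleration rates in the table are consistent with dividing each method's CPU time by Matataki's CPU time for the same run (for example, 1487.8 / 57.2 ≈ 26.0 for Bowtie on ERR188074). The sketch below recomputes them for that run from the tabulated CPU times. The function name `acceleration_rates` is illustrative and not taken from the paper. Small deviations from the published figures are to be expected, since those may have been derived from unrounded timings.

```python
# CPU times in seconds for run ERR188074, copied from the table above.
cpu_seconds = {
    "eXpress": 14546.6,
    "RSEM": 22700.6,
    "Bowtie": 1487.8,
    "Sailfish": 299.0,
    "Kallisto": 138.7,
    "Matataki": 57.2,
}


def acceleration_rates(cpu_seconds, baseline="Matataki"):
    """Return how many times faster the baseline is than each other method."""
    base = cpu_seconds[baseline]
    return {
        method: seconds / base
        for method, seconds in cpu_seconds.items()
        if method != baseline
    }


rates = acceleration_rates(cpu_seconds)
for method, rate in rates.items():
    print(f"{method}: {rate:.2f}x")
```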
Fig. 3 Comparison of CPU time for different methods