Geng Chen, Kangping Yin, Leming Shi, Yuanzhang Fang, Ya Qi, Peng Li, Jian Luo, Bing He, Mingyao Liu, Tieliu Shi.
Abstract
During expression, genes can generate diverse functional products, including various protein-coding and noncoding RNAs. Here, using two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines, we investigated the protein-coding capacities and expression levels of the isoforms of known human genes, as well as the conservation and disease association of long noncoding RNAs (ncRNAs). Comparative analysis revealed that about two-thirds of the genes expressed in brain and cell lines are the same, but less than one-third of their isoforms are identical. Apart from the genes specifically expressed in brain or in cell lines, about 66% of the genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform, and some genes generated protein-coding (or noncoding) RNAs in one sample but not in the other. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals, or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of these were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were significantly correlated with certain breast cancer or lung cancer related genes, indicating a biologically relevant link between long ncRNAs and human cancers. Our findings reveal that differences in gene expression profiles between samples result mainly from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.
Year: 2011 PMID: 22140575 PMCID: PMC3227660 DOI: 10.1371/journal.pone.0028318
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Exon number of brain and cell line transcripts.
| Items | Brain | Cell lines |
| Number of one-exon transcripts | 1,748 | 1,723 |
| Number of two-exon transcripts | 2,281 | 2,429 |
| Number of three-exon transcripts | 2,888 | 3,294 |
| Number of four-exon transcripts | 3,048 | 3,526 |
| Number of five-exon transcripts | 2,943 | 3,410 |
| Number of transcripts with six or more exons | 15,663 | 18,332 |
Isoforms of the expressed genes in brain and cell lines.
| Items | Brain | Cell lines |
| Total number of expressed genes | 16,818 | 18,431 |
| Total number of isoforms | 28,571 | 32,714 |
| Number of protein-coding transcripts | 27,524 | 31,641 |
| Number of noncoding transcripts | 1,047 | 1,073 |
| Number of long ncRNAs | 760 | 808 |
| Number of genes generating one isoform | 11,048 | 11,697 |
| Number of genes generating two isoforms | 2,933 | 3,334 |
| Number of genes generating three isoforms | 1,352 | 1,548 |
| Number of genes generating four isoforms | 699 | 839 |
| Number of genes generating five isoforms | 352 | 446 |
| Number of genes generating six or more isoforms | 434 | 567 |
In this table, only isoforms with an expression level equal to or greater than 0.1 RPKM were taken into account.
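As a quick sanity check on the two tables above, the per-category counts can be summed and compared with the reported totals, and the share of genes with a single isoform can be computed. The sketch below is not part of the original analysis; the numbers are copied from the tables, and all variable and function names are illustrative.

```python
# Counts copied from the two tables above (brain, cell lines).
# All names here are illustrative, not taken from the paper.
transcripts_by_exon_count = {
    "brain": [1748, 2281, 2888, 3048, 2943, 15663],
    "cell_lines": [1723, 2429, 3294, 3526, 3410, 18332],
}
genes_by_isoform_count = {
    "brain": [11048, 2933, 1352, 699, 352, 434],
    "cell_lines": [11697, 3334, 1548, 839, 446, 567],
}
total_genes = {"brain": 16818, "cell_lines": 18431}
total_isoforms = {"brain": 28571, "cell_lines": 32714}
coding_transcripts = {"brain": 27524, "cell_lines": 31641}
noncoding_transcripts = {"brain": 1047, "cell_lines": 1073}


def single_isoform_share(sample: str) -> float:
    """Fraction of expressed genes that generate exactly one isoform."""
    return genes_by_isoform_count[sample][0] / total_genes[sample]


def tables_are_consistent(sample: str) -> bool:
    """Check that category counts add up to the reported totals."""
    return (
        sum(genes_by_isoform_count[sample]) == total_genes[sample]
        and sum(transcripts_by_exon_count[sample]) == total_isoforms[sample]
        and coding_transcripts[sample] + noncoding_transcripts[sample]
        == total_isoforms[sample]
    )


for sample in total_genes:
    print(
        f"{sample}: consistent={tables_are_consistent(sample)}, "
        f"single-isoform share={single_isoform_share(sample):.1%}"
    )
```

With these figures the category counts add up to the reported totals in both samples, and roughly 66% (brain) and 63% (cell lines) of expressed genes generate a single isoform above the 0.1 RPKM cutoff.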
Figure 1. Protein-coding capacity and expression of brain and cell line protein-coding transcripts and long ncRNAs.
A and B show the coding capacities of brain and cell line protein-coding transcripts and long ncRNAs, as cumulative distributions of Coding Potential Calculator (CPC) scores. C and D show the expression levels of brain and cell line protein-coding transcripts and long ncRNAs, as bar plots of expression levels in reads per kilobase of exonic sequence per million aligned reads (RPKM).
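To make the RPKM unit concrete, here is a minimal sketch of the standard RPKM calculation together with the 0.1 RPKM cutoff mentioned in the table note. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the example read counts are hypothetical.

```python
def rpkm(read_count: int, exonic_length_bp: int, total_aligned_reads: int) -> float:
    """Reads per kilobase of exonic sequence per million aligned reads.

    RPKM = reads * 1e9 / (exonic length in bp * total aligned reads),
    i.e. reads / (length in kb) / (aligned reads in millions).
    """
    if exonic_length_bp <= 0 or total_aligned_reads <= 0:
        raise ValueError("length and total aligned reads must be positive")
    return read_count * 1e9 / (exonic_length_bp * total_aligned_reads)


def is_expressed(rpkm_value: float, threshold: float = 0.1) -> bool:
    """Apply an expression cutoff; 0.1 RPKM is the threshold used in the table note."""
    return rpkm_value >= threshold


# Hypothetical isoform: 500 reads on 2,000 bp of exonic sequence,
# in a library of 25 million aligned reads.
value = rpkm(500, 2000, 25_000_000)
print(value, is_expressed(value))
```

Normalizing by both transcript length and library size is what lets expression levels be compared across isoforms of different lengths and across the brain and cell line datasets.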
Figure 2. Comparison of expression between brain and cell lines at the gene level and the isoform level.
A compares the number of expressed genes between brain and cell lines. B compares the number of expressed isoforms between brain and cell lines. C compares the percentages of expressed genes and isoforms between brain and cell lines.
Figure 3. RT-PCR validation of the expression profiles of long ncRNAs in human diseases.
A shows the expression profiles of long ncRNAs in breast cancer cells (MCF-7, MDA-MB-231) and normal breast cells (MCF-10A). B shows the expression profiles of long ncRNAs in lung cancer cells (A549, H1299: non-small cell lung carcinoma) and normal lung cells (lung fibroblast). GAPDH was used as an expression control. All four long ncRNAs are differentially expressed between breast cancer and normal breast cells, and two of them (“Pred10150” and “Pred32359”) are differentially expressed between lung cancer and normal lung cells.