Jianying Li, Pierre R Bushel.
Abstract
BACKGROUND: RNA sequencing (RNA-Seq) measures genome-wide gene expression. RNA-Seq data are count-based, which makes normal-distribution models inappropriate for their analysis. Normalizing RNA-Seq data to transform them has limitations that can adversely affect the analysis. Furthermore, the few count-based methods available for RNA-Seq analysis are designed essentially for pairwise comparison of treatment groups or for multiclass comparison; they are not pattern-based and so do not identify co-expressed genes.
Keywords: Cancer; Clustering; EPIG-Seq; Gene expression; Pattern analysis; RNA-Seq; Toxicogenomics
Year: 2016 PMID: 27004791 PMCID: PMC4804494 DOI: 10.1186/s12864-016-2584-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
EPIG-Seq configuration
| Feature | Setting |
|---|---|
| Data type | Count level |
| Distribution assumption | Poisson |
| Correlation measurement | CYs |
| Spread of the data | Dispersion |
| Magnitude of change | Wilcoxon test Z-Statistic |
| Dynamic range | Variance-to-mean ratio (VMR) |
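The dynamic-range entry in the table above, the variance-to-mean ratio (VMR), can be illustrated with a minimal Python sketch. Under a Poisson assumption the variance equals the mean, so a VMR near 1 suggests Poisson-like counts and larger values suggest a gene whose counts vary widely across samples. The function name `vmr`, the example counts, and the use of the sample variance (n - 1 denominator) are assumptions made for illustration; this is not the EPIG-Seq implementation.

```python
from statistics import mean, variance


def vmr(counts):
    """Variance-to-mean ratio of read counts (sample variance, n - 1)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("VMR is undefined when the mean count is zero")
    return variance(counts) / m


# Hypothetical read counts for two genes across six samples
flat_gene = [10, 12, 9, 11, 10, 8]          # little change across samples
changing_gene = [10, 12, 9, 95, 110, 102]   # large change in half the samples

flat_vmr = vmr(flat_gene)
changing_vmr = vmr(changing_gene)
```

In this sketch the gene with a large shift in half of the samples yields a much higher VMR than the flat gene, which is the sense in which VMR can act as a dynamic-range filter.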
Fig. 1 EPIG-Seq workflow. The workflow depicts the main steps of EPIG-Seq. The parameters are used in steps 1 and 2 to extract the patterns and cluster the genes, respectively. The output is the set of statistically significant patterns with their co-expressed genes
Fig. 2 EPIG-Seq GUI. The EPIG-Seq GUI contains a main panel that allows users to define parameters for steps 1 and 2 of the analysis process. A dialog box displays the processing status, and a command window displays the dependent processes running in the background
Fig. 3 EPIG-Seq analysis of the toxicogenomics MOA data. a Thumbnail plots of the gene expression profiles that represent each of the patterns extracted from the toxicogenomics MOA data (the representative is the profile with the highest PCS). The title of each thumbnail plot gives the number of the extracted pattern and the gene symbol. MOA groups are color-coded as follows: Control (green), AhR 2 (red), CAR/PXR (yellow), Cytotox (light blue), DNA Damage (blue) and PPARA (pink), with 9 samples (groups of 3 biological replicates per chemical) in each. The y-axis is the log base 2 ratio of each sample's RPM relative to the average of the control. b Heat map of the genes clustered to the four patterns extracted by the EPIG-Seq analysis of the toxicogenomics MOA data. The gene symbols are shown to the left of the heat map, with the 4 colors indicating the assigned pattern number. The columns indicate the chemicals within each of the MOA groups. The color scale represents the log base 2 ratio of each sample's data relative to the average of the control. c PCA of the toxicogenomics MOA data using the CYs correlation measures of the genes clustered to the patterns by EPIG-Seq. The groups are color-coded as denoted in the legend. The x-axis is PC1, the y-axis is PC2 and the z-axis is PC3
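The y-axis quantity in the caption above, the log base 2 ratio of a sample's RPM relative to the average of the control, can be sketched in Python as follows. The function names, the example numbers, and in particular the optional `pseudo` pseudocount (added to avoid taking the log of zero) are assumptions for illustration and are not taken from the source.

```python
import math


def rpm(count, library_size):
    """Reads per million: scale a raw count by the sample's total read count."""
    return count * 1_000_000 / library_size


def log2_ratio_to_control(sample_rpms, control_rpms, pseudo=0.0):
    """log2 of each sample RPM over the mean control RPM.

    `pseudo` is a hypothetical pseudocount added to numerator and
    denominator; with pseudo=0 the RPM values must all be positive.
    """
    control_mean = sum(control_rpms) / len(control_rpms)
    return [
        math.log2((x + pseudo) / (control_mean + pseudo))
        for x in sample_rpms
    ]


# Hypothetical values: one gene, three control and three treated samples
control = [rpm(50, 10_000_000), rpm(40, 8_000_000), rpm(60, 12_000_000)]
treated = [rpm(200, 10_000_000), rpm(100, 10_000_000), rpm(25, 10_000_000)]
ratios = log2_ratio_to_control(treated, control)
```

A ratio of 0 means the sample matches the control average, positive values indicate up-regulation and negative values indicate down-regulation.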
Co-expressed genes from EPIG-Seq analysis of the MOA RNA-Seq data
| Pattern # | GenBank Acc. # | Symbol | Description |
|---|---|---|---|
| 1 | NM_001040019 | Acaa1b | Acetyl-Coenzyme A acyltransferase 1B |
| 1 | NM_001108181 | Acad11 | Acyl-CoA dehydrogenase family, member 11 |
| 1 | NM_001137643 | Gstt3 | Glutathione S-transferase, theta 3 |
| 1 | NM_012600 | Me1 | Malic enzyme 1, NADP(+)-dependent, cytosolic |
| 1 | NM_017321 | Aco1 | Aconitase 1, soluble |
| 1 | NM_031559 | Cpt1a | Carnitine palmitoyltransferase 1a, liver |
| 1 | NM_057197 | Decr1 | 2,4-dienoyl CoA reductase 1, mitochondrial |
| 1 | NM_130826 | Hadha | Hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), alpha subunit |
| 1 | NM_175837 | Cyp4a1 | Cytochrome P450, family 4, subfamily a, polypeptide 1 |
| 2 | NM_001004086 | Pon3 | Paraoxonase 3 |
| 2 | NM_001025720 | Dhtkd1 | Dehydrogenase E1 and transketolase domain containing 1 |
| 2 | NM_001037180 | Fkbp8 | FK506 binding protein 8 |
| 2 | NM_001105965 | Dpt | Dermatopontin |
| 2 | NM_001109604 | Tmem86b | Transmembrane protein 86B |
| 2 | NM_053995 | Bdh1 | 3-hydroxybutyrate dehydrogenase, type 1 |
| 2 | NM_139102 | Dmgdh | Dimethylglycine dehydrogenase |
| 2 | XM_002728268 | NA | NA |
| 2 | XM_002728512 | NA | NA |
| 2 | XM_002728876 | NA | NA |
| 3 | NM_001004271 | Ugt2b15 | UDP glucuronosyltransferase 2 family, polypeptide B15 |
| 3 | NM_001007701 | Tram1 | Translocation associated membrane protein 1 |
| 3 | NM_001013110 | Tf | Transferrin |
| 3 | NM_001033868 | Surf4 | Surfeit 4 |
| 3 | NM_012738 | Apoa1 | Apolipoprotein A-I |
| 3 | NM_012998 | P4hb | Prolyl 4-hydroxylase, beta polypeptide |
| 3 | NM_017013 | Gsta2 | Glutathione S-transferase alpha 2 |
| 3 | NM_021766 | Pgrmc1 | Progesterone receptor membrane component 1 |
| 3 | NM_138547 | Akr1c14 | Aldo-keto reductase family 1, member C14 |
| 3 | NM_175843 | NA | NA |
| 4 | NM_022521 | Oat | Ornithine aminotransferase |
| 4 | NM_031332 | Slc22a8 | Solute carrier family 22 (organic anion transporter), member 8 |
| 4 | NM_134382 | Elovl5 | ELOVL fatty acid elongase 5 |
| 4 | NM_173305 | Hsd17b6 | Hydroxysteroid (17-beta) dehydrogenase 6 |
GO biological processes of MOA clustered genes
| Pattern # | # of Genes | Top GOBP | P-value | FDR |
|---|---|---|---|---|
| 1 | 9 | GO:0006631 - Fatty acid metabolic process | 3.8E-06 | 4.4E-03 |
| 2 | 10 | GO:0055114 - Oxidation reduction process | 2.3E-02 | 2.1E+01 |
| 3 | 10 | GO:0042592 - Homeostatic process | 6.0E-02 | 5.5E+01 |
| 4 | 4 | - | - | - |
GOBP: Gene Ontology biological processes, filtered to remove very broad GO terms
EPIG-Seq clustering cohesiveness of patterns extracted from the TCGA breast cancer sampled data
| Sample # | GS | MS | # of Patterns | # of Genes |
|---|---|---|---|---|
| 1 | 0.31 | 0.54 | 6 | 192 |
| 2 | 0.37 | 0.51 | 4 | 169 |
| 3 | 0.21 | 0.52 | 6 | 344 |
| 4 | 0.41 | 0.59 | 4 | 197 |
GS: general silhouette; MS: maximal silhouette
Agreement of clusters extracted from the TCGA breast cancer sampled data
| Samples compared | Agreement |
|---|---|
| 1 vs 2 | 0.770 |
| 1 vs 3 | 0.524 |
| 1 vs 4 | 0.452 |
| 2 vs 3 | 0.691 |
| 2 vs 4 | 0.751 |
| 3 vs 4 | 0.500 |
All comparisons are based on AMI, except those involving sample 2, where concordance was used
Pathway enrichment of breast cancer co-expressed genes
| Pattern # | # of Genes | Enriched Pathways (example of co-expressed genes) | P-value | FDR |
|---|---|---|---|---|
| 1 | 45 | GO:0006260 - DNA replication (PCNA, TOP2A, S100A14) | 1.10E-05 | 1.70E-02 |
| 2 | 182 | GO:0006886 - Intracellular protein transport (ERBB3, PTMS, SLC25A5, SLC9A3R1, SOX4, STAT1) | 1.1E-06 | 1.8E-03 |
| 3 | 9 | BST2, C17orf37, CEACAM6, IFI27, IFI6, MX1, OAS3, RAB31, TPX2 | - | - |
| 4 | 40 | KEGG:03320 - Peroxisome proliferator-activated receptor signaling pathway (TRIM29, PDK4, FOSB, CD36) | 1.0E-04 | 9.1E-02 |
| 5 | 41 | GO:0010033 - Response to organic substance (ANXA1, CD34, EGR1, FOS, TGFBR2) | 1.3E-04 | 2.0E-01 |
| 6 | 27 | GO:0006414 - Translation elongation (CD59, ITGB1, ribosomal protein genes) | 2.80E-05 | 3.90E-02 |
Fig. 4 PCNA expression. a Gene expression of PCNA in TCGA normal and breast cancer samples. The x-axis denotes the breast cancer tumor subtype. The y-axis is the average of the log base 2 ratio of PCNA in each tumor subtype relative to the average of the normal samples. Standard error bars are shown for each data point. b PCNA protein immunohistochemistry staining of normal breast tissue with benign adenomas from a female age 23 (ID: 2773), using the HPA030522 antibody. c PCNA protein immunohistochemistry staining of breast cancer tissue (ductal carcinoma) from a female age 55 (ID: 2773), using the HPA030522 antibody