Yuhua Su, Dahlia Nielsen, Lei Zhu, Kristy Richards, Steven Suter, Matthew Breen, Alison Motsinger-Reif, Jason Osborne.
Abstract
A bivariate mixture model utilizing information across two species was proposed to solve the fundamental problem of identifying differentially expressed genes in microarray experiments. The utility of the model was illustrated using a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University. A small number of genes were identified as differentially expressed in both species, and the human genes in this cluster serve as a good predictor for classifying diffuse large B-cell lymphoma (DLBCL) patients into two subgroups: germinal center B-cell-like (GCB) DLBCL and activated B-cell-like (ABC) DLBCL. The number of human genes found to be significantly differentially expressed in the two-species analysis (21) was very small compared with the number identified by the one-species analysis of the human data alone (190). These genes may be clinically relevant, as this small set achieved low misclassification rates for the DLBCL subtypes. Additionally, the two subgroups defined by this cluster of human genes had significantly different survival functions, indicating that stratification based on gene-expression profiling with the proposed mixture model provided improved insight into the clinical differences between the two cancer subtypes.
Year: 2013 PMID: 23289441 PMCID: PMC3618031 DOI: 10.1186/1479-7364-7-2
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Possible categories of
| 0 | (NDE,NDE) | (0,0) | 0 |
| 1 | (pDE,pDE) | (+,+) | |
| 2 | (nDE,nDE) | (−,−) | |
| 3 | (pDE,nDE) | (+,−) | |
| 4 | (nDE,pDE) | (−,+) | |
| 5 | (NDE,pDE) | (0,+) | 0 |
| 6 | (NDE,nDE) | (0,−) | 0 |
| 7 | (pDE,NDE) | (+,0) | 0 |
| 8 | (nDE,NDE) | (−,0) | 0 |
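The nine categories in the table above are every pairing of three per-species states (NDE, pDE, nDE) across the two species. The Python sketch below is illustrative only and is not code from the paper. The names `CATEGORIES`, `de_in_both`, and `de_in_second` are hypothetical. Reading the second element of each pair as the human status is an inference from the Figure 1 caption, which lists categories 1 to 6 as those in which the human gene is differentially expressed.

```python
from itertools import product

# Category index -> (status in first species, status in second species),
# copied from the table above. NDE = not differentially expressed;
# pDE / nDE = differentially expressed with a "+" / "-" sign in the table.
CATEGORIES = {
    0: ("NDE", "NDE"),
    1: ("pDE", "pDE"),
    2: ("nDE", "nDE"),
    3: ("pDE", "nDE"),
    4: ("nDE", "pDE"),
    5: ("NDE", "pDE"),
    6: ("NDE", "nDE"),
    7: ("pDE", "NDE"),
    8: ("nDE", "NDE"),
}


def de_in_both(cats=CATEGORIES):
    """Categories in which both species are differentially expressed."""
    return sorted(k for k, (a, b) in cats.items() if a != "NDE" and b != "NDE")


def de_in_second(cats=CATEGORIES):
    """Categories in which the second species (ASSUMED human) is DE."""
    return sorted(k for k, (_, b) in cats.items() if b != "NDE")


# The nine categories cover every pairing of the three states exactly once.
assert set(CATEGORIES.values()) == set(product(("NDE", "pDE", "nDE"), repeat=2))

print(de_in_both())
print(de_in_second())
```

The two helper functions reproduce the category groupings named in the Figure 1 caption, which is a quick consistency check on the table.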
The k-means clustering results
| ABC DLBCL | ||
| GCB DLBCL |
Summary of parameter estimates for the bivariate mixture model averaged over the 156 LOOCV outcomes
| | ||||||
|---|---|---|---|---|---|---|
| 0 | 0.823(0.077) | NE | NE | 0.013(0.001) | NE | 0.012(0.001) |
| 1 | 0.001(0.005) | 0.341(0.081) | 0.022(0.130) | 0.004(0.010) | 0.001(0.017) | 0.022(0.060) |
| 2 | 0.001(0.001) | -0.602(0.243) | -0.831(0.193) | 0.199(0.082) | -0.128(0.048) | 0.089(0.036) |
| 3 | 0.001(0.004) | 0.495(0.195) | -0.564(0.244) | 0.042(0.078) | -0.042(0.041) | 0.119(0.074) |
| 4 | 0.000(0.003) | -1.131(0.422) | 0.758(0.305) | 0.000(0.028) | 0.000(0.015) | 0.000(0.050) |
| 5 | 0.020(0.008) | NE | 0.492(0.121) | 0.020(0.021) | NE | 0.058(0.049) |
| 6 | 0.011(0.004) | NE | -0.517(0.136) | 0.034(0.093) | NE | 0.040(0.044) |
| 7 | 0.130(0.077) | 0.331(0.038) | NE | 0.018(0.009) | NE | 0.025(0.001) |
| 8 | 0.012(0.008) | -0.478(0.339) | NE | 0.011(0.051) | NE | 0.042(0.054) |
Numbers in parentheses are bootstrap standard errors based on B = 5,000 bootstrap replications; NE, not estimated.
Figure 1. Scatter plots. (a) All orthologs, (b) orthologs differentially expressed in both species (categories 1, 2, 3, and 4), (c) orthologs for which the corresponding human genes are differentially expressed (categories 1, 2, 3, 4, 5, and 6), (d) orthologs identified by analyzing the human data alone and controlling FDR at 0.00001, and (e) orthologs identified by analyzing the human data alone and controlling FDR at 0.01.
Misclassification tables using different criteria
| | ||||||||
|---|---|---|---|---|---|---|---|---|
| ABC DLBCL | 72 | 5 | 64 | 13 | 58 | 19 | 0 | 77 |
| GCB DLBCL | 12 | 67 | 3 | 76 | 0 | 79 | 2 | 77 |
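The column headers of the table above did not survive, so the following Python sketch rests on two loudly stated assumptions: each consecutive pair of columns is one criterion's 2×2 table, and within each pair the first column counts samples assigned to ABC DLBCL and the second counts samples assigned to GCB DLBCL. The helper names `blocks` and `off_diagonal_rate` are hypothetical. The sketch only shows how a misclassification rate would be read off such a table; it does not reproduce the authors' computation.

```python
# Rows copied from the table above: true subtype -> eight counts.
TABLE = {
    "ABC DLBCL": [72, 5, 64, 13, 58, 19, 0, 77],
    "GCB DLBCL": [12, 67, 3, 76, 0, 79, 2, 77],
}


def blocks(table, width=2):
    """Split each row into consecutive 2-column blocks (ASSUMED: one per criterion)."""
    n_blocks = len(table["ABC DLBCL"]) // width
    return [
        {label: row[i * width:(i + 1) * width] for label, row in table.items()}
        for i in range(n_blocks)
    ]


def off_diagonal_rate(block):
    """Share of samples off the diagonal, ASSUMING column order (ABC, GCB)."""
    abc, gcb = block["ABC DLBCL"], block["GCB DLBCL"]
    total = sum(abc) + sum(gcb)
    return (abc[1] + gcb[0]) / total


for i, b in enumerate(blocks(TABLE), start=1):
    print(i, sum(b["ABC DLBCL"]), sum(b["GCB DLBCL"]), round(off_diagonal_rate(b), 3))
```

Whatever the true column labels are, every two-column block has row totals of 77 and 79, which sum to the 156 samples used in the LOOCV. Because cluster labels are arbitrary, a block dominated by a single column may need its labels swapped before the off-diagonal share is read as an error rate.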
Figure 2. Kaplan-Meier survival probability estimates for the dog and human lymphoma study. (a) No stratification, (b) stratification based on the results of gene-expression profiling performed in [6] and [7], and (c) stratification based on the gene-expression profiling resulting from the proposed nine-component mixture model. Numbers in parentheses are medians.
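For readers unfamiliar with the estimator named in the caption, the following is a minimal Python sketch of the standard Kaplan-Meier product-limit estimator. It is not the authors' code, the function name `kaplan_meier` is hypothetical, and the toy data are invented for illustration and are not taken from the study. A stratified analysis like the one in panels (b) and (c) would apply the same function separately to each subgroup.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival, curve = 1.0, []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data, NOT taken from the study.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Censored subjects contribute to the at-risk count up to their censoring time but never trigger a drop in the curve, which is why the estimate changes only at observed event times.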
Summary of the gene-specific information (retrieved from Entrez Gene, an NCBI database for gene-specific information)
| | | | | | |
|---|---|---|---|---|---|
| | 1 | 486810 | 156 | CD39; ATPDase; FLJ40921; FLJ40959; NTPDase-1; DKFZp686D194; DKFZp686I093; ENTPD1 | Ectonucleoside triphosphate diphosphohydrolase 1 |
| 1278 | 2 | 403824 | 156 | OI4; COL1A2 | Collagen, type I, alpha 2 |
| 2530 | 1 | 448804 | 156 | MGC26465; FUT8 | Fucosyltransferase 8 (alpha (1,6) fucosyltransferase) |
| | 2 | 609006 | 156 | TTG2; RBTN2; RHOM2; RBTNL1; LMO2 | LIM domain only 2 (rhombotin-like 1) |
| | 3 | 486631 | 156 | JAW1; LRMP | Lymphoid-restricted membrane protein |
| 5919 | 2 | 475532 | 156 | TIG2; HP10433; RARRES2 | Retinoic acid receptor responder (tazarotene induced) 2 |
| | 3 | 612789 | 124 | MGC17173; RGS13 | Regulator of G-protein signaling 13 |
| 6856 | 3 | 475889 | 154 | SYPL; H-SP1; SYPL1 | Synaptophysin-like 1 |
| 6925 | 1 | 403949 | 156 | E2-2; ITF2; PTHS; SEF2; SEF2-1; SEF2-1A; SEF2-1B; bHLHb19; MGC149723; MGC149724; TCF4 | Transcription factor 4 |
| 7037 | 1 | 403703 | 11 | TFR; CD71; TFR1; TRFR; TFRC | Transferrin receptor (p90, CD71) |
| 9435 | 1 | 485701 | 156 | C6ST; GST2; GST-2; Gn6ST-1; CHST2 | Carbohydrate ( |
| 9760 | 2 | 486964 | 156 | TOX1; KIAA0808; TOX | Thymocyte selection-associated high mobility group box |
| 10447 | 3 | 612336 | 154 | ILEI; GS3786; FAM3C | Family with sequence similarity 3, member C |
| 23075 | 3 | 485385 | 4 | HSPC321; SWAP-70; FLJ39540; KIAA0640; SWAP70 | SWAP switching B-cell complex 70kDa subunit |
| 25816 | 1 | 481428 | 156 | GG2-1; SCCS2; SCC-S2; MDC-3.13; TNFAIP8 | Tumor necrosis factor, alpha-induced protein 8 |
| | 1 | 484692 | 156 | QRF1; 12CC4; hFKH1B; HSPC215; FLJ23741; MGC12942; MGC88572; MGC99551; FOXP1 | Forkhead box P1 |
| 56941 | 2 | 484628 | 156 | DC12; MGC111075; C3orf37 | Chromosome 3 open reading frame 37 |
| 81552 | 1 | 608562 | 145 | ECOP; GASP; FLJ20532; DKFZp564K0822; VOPP1 | Vesicular, overexpressed in cancer, prosurvival protein 1 |
| 81641 | 1 | 479002 | 116 | Apm; Apn; KZP; AP-M; AP-N; Lap1; rAPN; Anpep | Alanyl (membrane) aminopeptidase |
| 121355 | 4 | 477590 | 156 | FAM112B; FLJ32942; GTSF1 | Gametocyte specific factor 1 |
| 219972 | 1 | 475960 | 156 | MPG1; MGC132657; MGC138435; MPEG1 | Macrophage expressed 1 |
For the 21 human genes in categories (1, 2, 3, and 4) determined by the bivariate mixture model.
Genes in italics were identified by this search as potentially associated with the development of DLBCL.
a: Number of times the corresponding gene was found in the classification gene set across the 156 LOOCV instances.