Differential expression analysis for individual cancer samples based on robust within-sample relative gene expression orderings across multiple profiling platforms.

Qingzhou Guan1, Rou Chen1, Haidan Yan1, Hao Cai1, You Guo1,2, Mengyao Li1, Xiangyu Li1, Mengsha Tong1, Lu Ao1, Hongdong Li1, Guini Hong1, Zheng Guo1.   

Abstract

The highly stable within-sample relative expression orderings (REOs) of gene pairs in a particular type of normal human tissue are widely reversed in the cancer condition. Based on this finding, we recently proposed an algorithm named RankComp to detect differentially expressed genes (DEGs) in individual disease samples measured by a particular platform. In this paper, using 461 normal lung tissue samples separately measured by four commonly used platforms, we demonstrated that tens of millions of gene pairs with significantly stable REOs in normal lung tissue can be consistently detected in samples measured by different platforms. However, about 20% of the stable REOs commonly detected by two different platforms (e.g., the Affymetrix and Illumina platforms) showed inconsistent REO patterns owing to differences in probe design principles. Based on the significantly stable REOs (FDR<0.01) for normal lung tissue consistently detected by the four platforms, which tended to have large rank differences, RankComp detected on average 1184, 1335 and 1116 DEGs per sample, with average precisions of 96.51%, 95.95% and 94.78%, in three evaluation datasets with 25, 57 and 58 paired lung cancer and normal samples, respectively. Individualized pathway analysis revealed some common and subtype-specific functional mechanisms of lung cancer. Similar results were observed for colorectal cancer. In conclusion, based on the cross-platform significantly stable REOs for a particular normal tissue, differentially expressed genes and pathways in any disease sample measured by any of these platforms can be readily and accurately detected, which could be further exploited for dissecting the heterogeneity of cancer.

Keywords:  differentially expressed genes; gene expression profiling; heterogeneity of cancer; individual level; multiple platforms

Year:  2016        PMID: 27634898      PMCID: PMC5356599          DOI: 10.18632/oncotarget.11996

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Recently, we reported an interesting biological phenomenon: the within-sample relative expression orderings (REOs) of gene pairs in a particular type of normal tissue are highly stable but widely reversed in the corresponding cancer tissue. Based on this finding, we developed an algorithm named RankComp [1] to identify differentially expressed genes (DEGs) and deregulated pathways in each disease tissue, in comparison with its own previously normal state, by exploiting the reversed REO patterns of the disease sample. Unlike traditional population-level case-control comparison methods, such as the t-test [2], SAM [3], Limma [4] and RankProd [5], individual-level analysis can capture the heterogeneity of cancer and help us study cancer subtype-specific mechanisms and develop prognostic biomarkers [6]. In particular, RankComp is an economical and efficient method that can identify DEGs in individual disease samples measured by different laboratories by fully using previously accumulated gene expression data of normal samples [1]. It is well known that gene expression profiling is susceptible to various technical artifacts or ‘batch effects’ introduced by differences in laboratory conditions, reagent lots and personnel [7-13], especially when studies have to be carried out over a long period of time or when clinical specimens originate from different hospitals [9]. Current batch effect adjustment or data normalization algorithms, such as DWD [8], XPN [14], PAMR [15], SVA [16] and EB [17], are usually inadequate and may even distort real biological signals [18, 19]. In contrast, because the within-sample REOs of gene pairs are insensitive to experimental batch effects [20-22], applying RankComp obviates the need for batch effect adjustment and inter-sample normalization.
As validated in our previous studies, within-sample REOs are highly reproducible and comparable between data produced by different laboratories using the same or similar platforms [1, 6]. However, within-sample REOs may be subject to some uncertainty in samples measured by different platforms because of differences in probe design principles. It is therefore necessary to further evaluate the cross-platform properties of within-sample REOs in order to extend the scope of individual-level differential expression analysis. Another problem of the current RankComp algorithm is that it relies on REOs that are stable in a pre-defined percentage (e.g., 99%) of normal samples; this criterion lacks statistical control and may limit the power to detect DEGs in individual samples. It is thus also necessary to evaluate the performance of RankComp when it uses, as the basis for individual-level differential expression analysis, gene pairs whose REOs are significantly stable in a particular type of normal tissue, selected with statistical control rather than by a pre-defined percentage. In this article, we first compared gene expression profiles generated with four commonly used gene expression profiling platforms (the Affymetrix, Illumina and Agilent microarray platforms and an RNA-sequencing platform) for normal lung and colorectal tissues, respectively. For each type of normal tissue, we showed that tens of millions of gene pairs with significantly stable REOs, especially those with large expression differences, can be consistently detected in samples measured by different platforms. We then showed that, compared with RankComp based on gene pairs with REOs stable in at least 99% of normal samples, RankComp based on significantly stable REOs detects many more DEGs in each disease sample with a slight decrease in precision.
Finally, based on the individual-level differential expression analysis, we applied individual-level pathway analysis to reveal some common and subtype-specific functional mechanisms of lung adenocarcinoma and colon adenocarcinoma, respectively.

RESULTS

Significantly stable REOs in normal samples measured by four platforms

For a particular type of normal tissue, we focused on evaluating the consistency of within-sample relative expression orderings (REOs) in samples separately measured by four platforms: three commonly used microarray platforms (Affymetrix, Illumina and Agilent) and an RNA-sequencing platform. All data used in this study are described in Table 1, and the flowchart of the analysis procedure is shown in Figure 1.
Table 1

Description of normal sample data and paired cancer-normal sample data used in this study

Set | GEO accession or data source | Platform | Normal sample size (a) | Tumor sample size
The normal sample data for REO evaluation
Lung
SetA | GSE19804 | Affymetrix GPL570 | 60
SetA | GSE18842 | Affymetrix GPL570 | 45
SetA | GSE27262 | Affymetrix GPL570 | 25
SetA | GSE31210 | Affymetrix GPL570 | 20
SetB | GSE19188 | Affymetrix GPL570 | 65
SetA | GSE32863 | Illumina GPL6884 | 58
SetB | GSE31267 | Illumina GPL6947 | 24
SetA | GSE40588 | Agilent GPL6480 | 60
SetB | GSE15197 | Agilent GPL6480 | 13
- | GSE57148 (b) | Illumina GPL11154 | 91
Colorectal
SetA | GSE21510 | Affymetrix GPL570 | 25
SetA | GSE18105 | Affymetrix GPL570 | 17
SetA | GSE4107 | Affymetrix GPL570 | 10
SetB | GSE8671 | Affymetrix GPL570 | 32
SetA | GSE56789 | Illumina GPL10558 | 40
SetB | GSE31279 | Illumina GPL6104 | 32
SetB | GSE43841 | Illumina GPL14951 | 6
SetA | GSE46271 | Agilent GPL14550 | 22
SetA | GSE50114 | Agilent GPL6480 | 9
SetB | GSE28000 | Agilent GPL4133 | 23
- | GSE50760 (b) | Illumina GPL11154 | 18
The paired cancer-normal sample data for evaluating the performance of RankComp
Lung
- | GSE27262 | Affymetrix GPL570 | 25 | 25
- | GSE32863 | Illumina GPL6884 | 57 | 57
- | TCGA_luad (b) | IlluminaHiSeq_RNASeqV2 | 58 | 58
Colorectal
- | GSE8671 | Affymetrix GPL570 | 32 | 32
- | GSE31279 | Illumina GPL6104 | 32 | 32
- | TCGA_coad (b) | IlluminaHiSeq_RNASeqV2 | 26 | 26

Note:

(a) To determine stable gene pairs for a particular type of normal tissue, only the normal sample sizes are given for these datasets.

(b) mRNA-seq data; TCGA_luad and TCGA_coad denote paired lung adenocarcinoma and colon adenocarcinoma samples from TCGA, respectively.

Figure 1

The flowchart of the analysis procedure

First, for the Affymetrix platform, we collected a set of 150 normal lung tissue samples from four datasets (GSE19804, GSE18842, GSE27262 and GSE31210) and another set of 65 normal lung tissue samples from the GSE19188 dataset, referred to as SetA and SetB, respectively. From SetA and SetB, 197,546,446 and 195,767,556 significantly stable REOs (binomial test, FDR<0.01) were identified, respectively. The two lists of significantly stable REOs had 190,118,028 gene pairs in common, of which 98% showed the same REO patterns in SetA and SetB, indicating that significantly stable REOs were highly reproducible (binomial test, p<1.0×10^-16). As shown in Table 2, 94.34% of the significantly stable REOs for SetA (150 samples) were also found in SetB (65 samples), and 95.19% of the significantly stable REOs for SetB were also found in SetA. To further evaluate the sample size required for detecting significantly stable REOs, we reassigned the 20 samples of the GSE31210 dataset as SetB' and all the other 195 samples as SetA', and found that 88.50% of the significantly stable REOs found in SetA' were also found in SetB'. Similar results were observed for normal lung tissue samples measured by the Illumina and Agilent platforms (Table 2). For the data measured by the Illumina platform, 89.06% of the significantly stable REOs found in SetA (58 samples) were also found in SetB (24 samples). For the data measured by the Agilent platform, 78.33% of the significantly stable REOs detected in SetA (60 samples) were also found in SetB, which had only 13 samples. Similar results were observed for normal colorectal tissue, as described in Table 2.
Table 2

Reproducibility of significantly stable REOs in normal samples measured by each of the platforms

Platform | Label | Normal sample size | Gene# (a) | Number of stable REOs | Number of overlaps | POG12 | POG21 | Consistency | P
Lung
Affymetrix | SetA | 150 | 20283 | 197,546,446 | 190,118,028 | 0.9434 | 0.9519 | 0.9802 | <1.0×10^-16
Affymetrix | SetB | 65 | 20283 | 195,767,556
Illumina | SetA | 58 | 23364 | 251,964,302 | 231,498,834 | 0.8906 | 0.9061 | 0.9694 | <1.0×10^-16
Illumina | SetB | 24 | 23364 | 247,667,868
Agilent | SetA | 60 | 19596 | 181,534,752 | 151,185,241 | 0.7833 | 0.9105 | 0.9406 | <1.0×10^-16
Agilent | SetB | 13 | 19596 | 156,176,364
Colorectal
Affymetrix | SetA | 52 | 20283 | 193,475,574 | 184,134,774 | 0.9136 | 0.9135 | 0.96 | <1.0×10^-16
Affymetrix | SetB | 32 | 20283 | 193,501,698
Illumina | SetA | 40 | 17789 | 148,902,375 | 131,019,285 | 0.8048 | 0.8723 | 0.9147 | <1.0×10^-16
Illumina | SetB | 38 | 17789 | 137,385,589
Agilent | SetA | 31 | 18583 | 145,935,881 | 121,390,845 | 0.8099 | 0.87 | 0.9736 | <1.0×10^-16
Agilent | SetB | 23 | 18583 | 135,855,195

Note:

(a) Number of genes measured in SetA and SetB by a particular platform. POG12 (or POG21) denotes the percentage of significantly stable gene pairs (FDR<0.01) detected in SetA (or SetB) that are consistently detected in SetB (or SetA). Consistency denotes the percentage of overlapping gene pairs that display the same REO patterns in SetA and SetB, and P denotes the significance of the consistency.

Notably, for both normal lung and colorectal tissue, the gene pairs with significantly stable REOs found in each dataset involved all genes measured by the corresponding platform. For each type of tissue, when all the samples measured by a platform were combined, more than 80% of all possible gene pairs were significantly stable (FDR<0.01), as shown in Figure 2. In addition, more than 80% of the significantly stable REOs detected in the combined data measured by a platform could be found with a relatively small sample size (about 20 samples), as described in Supplementary Table S1.
Figure 2

The percentage of the gene pairs with significantly stable REOs (FDR<0.01) in all measured gene pairs

The above results together indicated that the REOs of gene pairs are widely stable in a particular type of human normal tissue and most of them could be found with only about 20 samples.

Cross-platform significantly stable REOs in normal tissue samples

We then evaluated the consistency between the significantly stable REOs detected by different platforms. For each of the three microarray platforms, we defined the REOs consistently detected in SetA and SetB measured by that platform for a type of normal tissue as the stable REOs of the platform for this tissue. Only gene pairs consisting of genes commonly detected by all four platforms were analyzed. Of the 94,145,902 significantly stable REOs detected in normal lung tissue samples measured by the Affymetrix platform, 85.50% were also detected in the data measured by the Illumina platform, and 82.37% of these showed the same REO patterns in samples measured by the two platforms (binomial test, p<1.0×10^-16). Of the 66,305,728 significantly stable REOs consistently detected by these two platforms, 79.91% were included in the 77,825,426 significantly stable REOs found in the data measured by the Agilent platform, and the consistency increased to 92.1%. Furthermore, of the 48,802,858 significantly stable REOs consistently detected by the three microarray platforms, 98.01% were included in the 99,202,212 significantly stable REOs (binomial test, FDR<0.01) detected by the RNA-sequencing platform, and the consistency further increased to 96.79%. In total, 46,295,854 gene pairs with significantly stable REOs (FDR<0.01) were consistently detected by the four platforms in normal lung samples. Similar results were observed for normal colorectal tissue, as described in Table 3.
Table 3

Cross-platform evaluation of the significantly stable REOs for normal tissues

Platform | Number of stable REOs | Number of overlaps | POG12 | POG21 | Consistency | P
Lung
Affymetrix | 94,145,902 | 80,493,915 | 0.7043 | 0.7471 | 0.8237 | <1.0×10^-16
Illumina | 88,746,864
Affy_Illu | 66,305,728 | 52,986,997 | 0.736 | 0.6271 | 0.921 | <1.0×10^-16
Agilent | 77,825,426
Affy_Illu_Agi | 48,802,858 | 47,832,844 | 0.9486 | 0.4667 | 0.9679 | <1.0×10^-16
RNA_seq (GSE57148) | 99,202,212
Colorectal
Affymetrix | 100,855,012 | 78,495,790 | 0.6729 | 0.7569 | 0.8645 | <1.0×10^-16
Illumina | 89,653,488
Affy_Illu | 67,862,351 | 52,201,960 | 0.7347 | 0.6223 | 0.9551 | <1.0×10^-16
Agilent | 80,116,625
Affy_Illu_Agi | 49,856,959 | 48,851,749 | 0.9662 | 0.4453 | 0.9861 | <1.0×10^-16
RNA_seq (GSE50760) | 108,187,244

Note: Affy_Illu denotes stable gene pairs consistently detected from the data measured by Affymetrix and Illumina platforms. Similarly, Affy_Illu_Agi denotes stable gene pairs consistently detected from the data measured by Affymetrix, Illumina and Agilent platforms.

The above results indicated that the cross-platform reproducibility of significantly stable REOs increases as they are consistently detected in an increasing number of platforms. A possible explanation is that REOs preserved across multiple platforms with different probe design principles tend to have large rank differences, which are difficult to reverse by the probe detection biases of the various platforms. To examine this possibility, for the significantly stable REOs commonly detected by the Affymetrix and Illumina platforms for normal lung tissue, we used the GSE19188 dataset to compare the rank differences of gene pairs with consistent REOs and of gene pairs with inconsistent REOs between the two platforms. As expected, the median rank difference of gene pairs with consistent REOs was 6855, significantly larger than that of gene pairs with inconsistent REOs (median=3144; Wilcoxon rank sum test, p<1.0×10^-16). In summary, for a particular type of normal human tissue, significantly stable within-sample REOs, especially those of gene pairs with large expression differences, are largely consistent across samples measured by different platforms.
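The rank-difference comparison described above can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study: it assumes that a gene's within-sample rank is its position when the sample's genes are sorted by expression (ties kept in input order, an assumption), that the lists of consistent and inconsistent pairs are supplied by the caller, and the function names are hypothetical. The Wilcoxon rank sum test itself is not reimplemented; a statistics library would normally be used for that step.

```python
from statistics import median


def within_sample_ranks(sample):
    """Rank genes by expression within one sample (1 = lowest expression).

    sample: dict gene -> expression value. Ties keep dictionary order
    because Python's sort is stable.
    """
    ordered = sorted(sample, key=sample.get)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def median_rank_differences(sample, consistent_pairs, inconsistent_pairs):
    """Median |rank(A) - rank(B)| for pairs with consistent vs inconsistent REOs."""
    ranks = within_sample_ranks(sample)

    def median_diff(pairs):
        return median(abs(ranks[a] - ranks[b]) for a, b in pairs)

    return median_diff(consistent_pairs), median_diff(inconsistent_pairs)


toy_sample = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0, "f": 6.0}
print(median_rank_differences(toy_sample,
                              [("a", "f"), ("b", "e")],
                              [("a", "b"), ("c", "d"), ("e", "f")]))  # (4.0, 1)
```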

Individualized DEGs detection based on cross-platform significantly stable REOs

As described above, in total 46,295,854 gene pairs with significantly stable REOs (FDR<0.01) were consistently detected in normal lung samples separately measured by the four platforms. Based on these significantly stable REOs, we applied RankComp [1] to detect differentially expressed genes (DEGs) in a given cancer sample compared with its own previously (usually unknown) normal state. The RankComp algorithm is described in detail in [1] and briefly in the METHODS section. We evaluated the performance of RankComp using cancer samples with paired adjacent normal samples, assuming that the previously normal state of a cancer tissue can be approximately represented by its adjacent normal tissue. After identifying DEGs in each cancer sample, we evaluated the precision of these DEGs using the observed expression differences (up- or down-regulation) between the cancer sample and its paired adjacent normal sample as the benchmark (see METHODS). Because other individual-specific factors unrelated to the cancer condition may induce transcriptional alterations in the cancer sample, we focused on individualizing predetermined population-level DEGs, to ensure that the DEGs identified in individual cancer samples are associated with cancer. We also evaluated the performance of RankComp for individualized differential expression analysis based on the highly stable REOs (stable in at least 99% of the samples) consistently detected in the normal lung and colorectal tissue samples measured by the four platforms, respectively. For lung cancer, using the GSE19188 and GSE19804 datasets, we first detected 12,359 and 8,681 DEGs (Student's t-test, FDR<0.01) between cancer and normal tissues, respectively. The two lists of DEGs had 6,929 genes in common, of which 98.46% showed the same deregulation directions in the cancer tissues of the two datasets (binomial test, p<1.0×10^-16).
We defined the 6,822 reproducible DEGs as the population-level DEGs for lung cancer. Then, using three datasets with paired lung cancer and adjacent normal samples separately measured by three different platforms, we evaluated the performance of RankComp in individualizing the population-level DEGs. For each cancer sample, we performed RankComp based on the 46,295,854 gene pairs with significantly stable REOs (FDR<0.01) consistently detected in normal lung samples measured by the four platforms. For the 25 lung cancer samples of the GSE27262 dataset measured by the Affymetrix platform, RankComp identified on average 1,184 DEGs per sample, with an average precision of 96.51% according to the observed expression differences between the cancer samples and their paired adjacent normal samples. We also evaluated RankComp using the GSE32863 and TCGA_luad (lung adenocarcinoma samples from TCGA) datasets, which include 57 and 58 pairs of cancer and adjacent normal samples measured by the Illumina microarray platform and the Illumina HiSeq 2000 platform, respectively. On average, 1,335 and 1,116 DEGs were identified per cancer sample, and the average precisions were 95.95% and 94.78% for the two datasets, respectively. In contrast, based on the 21,789,916 highly stable REOs (stable in at least 99% of the samples) consistently detected in normal lung samples measured by the four platforms, the average precision increased to 98.96%, 98.97% and 95.65%, but on average only 392, 542 and 474 DEGs were identified per sample in the three datasets, respectively. Similar results were observed for colorectal cancer, as shown in Figure 3.
Figure 3

RankComp based on significantly stable REOs can detect many more DEGs, with slightly decreased precision, in each disease sample than RankComp based on highly stable REOs (stable in at least 99% of samples)

The above results showed that, compared with RankComp based on highly stable REOs, RankComp based on significantly stable REOs exhibited greatly enhanced detection power at the cost of a slight decrease in precision.

Individualized pathway analysis based on individualized DEGs

After identifying DEGs in a disease sample, we can detect deregulated pathways in this sample. Here, we analyzed all 515 lung adenocarcinoma samples and 285 colon adenocarcinoma samples documented in TCGA to illustrate this application. First, we detected pathways separately enriched with up- or down-regulated genes in each of the 515 lung adenocarcinoma samples (hypergeometric test, FDR<0.1) [23]. As shown in Figure 4A, some well-known cancer pathways could be commonly altered in lung adenocarcinoma samples. For example, the ‘Osteoclast differentiation’ [24] and ‘TNF signaling’ [25] pathways were significantly enriched with down-regulated genes in about 60% of the 515 samples (FDR<0.1), and this coverage increased to more than 90% with a looser significance threshold of p<0.05. In addition, the ‘Cell cycle’ [26] pathway was significantly enriched with up-regulated genes in about 75% (FDR<0.1) or 90% (p<0.05) of the 515 samples. In contrast, some pathways could be subtype-specific. For example, the ‘Fanconi anemia pathway’ was significantly enriched with up-regulated genes in about 20% (FDR<0.1) or 65% (p<0.05) of the 515 samples, suggesting that it might be associated with cancer prognosis [27]. As another example, the ‘Chemokine signaling pathway’ was significantly enriched with down-regulated genes in about 18% (FDR<0.1) or 77% (p<0.05) of the 515 lung cancer samples, suggesting that it might also be associated with cancer prognosis [28].
Figure 4

The KEGG pathways separately enriched with up- and down-regulated genes in at least 10% of (A) the TCGA lung adenocarcinoma samples and (B) the TCGA colon adenocarcinoma samples

Similarly, we performed pathway enrichment analysis for each of the 285 colon adenocarcinoma samples. As shown in Figure 4B, two pathways (‘Oxidative phosphorylation’ and ‘Metabolic pathways’) were significant in all 285 samples with FDR<0.1, and another five pathways (‘Mineral absorption’, ‘Cardiac muscle contraction’, ‘Fatty acid degradation’, ‘Nitrogen metabolism’ and ‘Peroxisome’) were significant in more than 90% of the 285 samples with a looser significance threshold (p<0.05). Thus, these pathways, such as the ‘Metabolic’ [29] and ‘Oxidative phosphorylation’ [30] pathways, could be commonly altered in colon cancer. In contrast, some other pathways, such as the ‘Valine, leucine and isoleucine degradation’ [31] and ‘Tyrosine metabolism’ [32] pathways, could be subtype-specific and thus associated with cancer prognosis. Notably, with p<0.05, the ‘Cell cycle’ pathway was significantly enriched with up-regulated genes in 89% of the 515 lung adenocarcinoma samples and in 82% of the 285 colon adenocarcinoma samples, suggesting that this pathway might be commonly deregulated in cancer [33]. In addition, the ‘Mineral absorption’ pathway was significantly enriched with down-regulated genes in 97% of the 515 lung adenocarcinoma samples and in 100% of the 285 colon adenocarcinoma samples, suggesting that this pathway might also be commonly deregulated in cancer [34, 35]. These results suggest that individualized pathway analysis could provide hints for revealing common and subtype-specific functional mechanisms of cancer. The functional analysis results also provide additional evidence, at the functional level, for the validity of the individualized DEGs.

DISCUSSION

As demonstrated in this article, tens of millions of gene pairs with significantly stable REOs in a particular type of normal tissue, especially those with large expression differences, can be consistently detected by different platforms. This provides the basis for individual-level differential expression analysis of cancer samples measured by different platforms. Compared with RankComp based on highly stable REOs (e.g., stable in at least 99% of samples), RankComp based on significantly stable REOs can detect many more DEGs in each disease sample with a slight decrease in precision, as demonstrated by the results for both lung and colorectal cancer. Individual-level DEG analysis naturally enables pathway analysis at the individual level, which could reveal common as well as subtype-specific functional mechanisms of cancer. This differs fundamentally from traditional population-level pathway analysis, which cannot discriminate whether a significant pathway is altered in a group of patients (i.e., a subtype) or in all patients. Furthermore, our results showed that the large majority of gene pairs (more than 80%, Figure 2) had significantly stable REOs across samples of a given normal tissue. This indicates that the relative ordering of gene expression is, overall, stable in a particular type of normal human tissue, suggesting that genes may need to be expressed within a comprehensive coordination structure to carry out normal functions systematically [1]. Based on the significantly stable REOs consistently detected by multiple platforms for a particular type of tissue, DEGs and deregulated pathways in any disease sample measured by any of these platforms can be readily detected. This could be of particular value when multiple datasets of disease samples measured by different platforms need to be analyzed to identify and validate cancer signatures (such as prognostic signatures) [36].
Moreover, for a particular normal tissue, our results showed that significantly stable REOs consistently detected by more platforms are more likely to remain consistent in a new platform. In particular, almost all (above 97%) of the significantly stable REOs consistently detected (binomial test, FDR<0.01) by the three microarray platforms were reproducibly found by the RNA-sequencing platform. This might be helpful for analyzing disease samples measured by a less commonly used platform when no or insufficient normal samples measured by that platform are available for determining its stable REOs. Notably, a major limitation of selecting cross-platform stable REOs is that many truly stable REOs could be lost. As shown in Table 3, about half of the stable REOs detected by a particular platform are lost when stable REOs are screened across samples measured by all four platforms for lung and colorectal tissue. Although our results indicate that cross-platform stable REOs might be sufficient for detecting DEGs, the effect of using only a subset of stable REOs on DEG detection power needs further study. In summary, a large fraction of the widely stable REOs in a particular type of normal tissue can be consistently detected in samples measured by different platforms. By fully using previously accumulated gene expression data of normal samples, RankComp is an economical and efficient method for identifying DEGs in individual disease samples measured by different platforms. Moreover, individual-level DEG analysis also makes it possible to identify robust diagnostic and prognostic biomarkers for precision medicine [36].

MATERIALS AND METHODS

Data and preprocessing

The gene expression profiles analyzed in this study are described in Table 1. Data generated with the three commonly used microarray platforms (Affymetrix, Illumina and Agilent) were downloaded from the Gene Expression Omnibus [37] (GEO, http://www.ncbi.nlm.nih.gov/geo/), and mRNA-seq data measured by the RNA-sequencing platform were downloaded from GEO and TCGA [38] (http://cancergenome.nih.gov/). For data measured by the Affymetrix platform, we downloaded the raw mRNA expression data (.CEL files) and used the Robust Multi-array Average algorithm for background adjustment [39]. For data measured by the Illumina platform, we directly downloaded the processed data. For data measured by the Agilent platform, we downloaded the raw fluorescent signal intensities of the channel (gMedianSignal or rMedianSignal) for normal samples and subtracted the corresponding background signal intensities to obtain the probe-expression matrix. For TCGA data, we directly downloaded level 3 expression data. For array-based data, each probeset ID was mapped to an Entrez gene ID using the corresponding platform file. Probesets mapped to multiple genes or to no gene were discarded. If multiple probesets were mapped to the same gene, the expression value of the gene was defined as the arithmetic mean of the values of these probesets. For sequencing-based data from GEO, we directly downloaded the processed data, and each gene symbol was mapped to an Entrez gene ID with the biological DataBase network (bioDBnet). For level 3 sequencing-based data from TCGA, we removed genes whose expression measurements were at or below a noise threshold of 0.2 normalized counts in at least 75% of samples [40].
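The probeset-to-gene collapsing rule and the TCGA noise filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the preprocessing code used in the study: the dictionary inputs stand in for the parsed platform annotation files and expression matrices, and the function names `collapse_probes` and `filter_noisy_genes` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_genes):
    """Collapse one sample's probeset-level values into gene-level values.

    probe_expr:     dict probeset ID -> expression value
    probe_to_genes: dict probeset ID -> list of Entrez gene IDs (hypothetical
                    stand-in for the parsed platform annotation file)
    """
    values_per_gene = defaultdict(list)
    for probe, value in probe_expr.items():
        genes = probe_to_genes.get(probe, [])
        # Discard probesets mapped to no gene or to more than one gene.
        if len(genes) != 1:
            continue
        values_per_gene[genes[0]].append(value)
    # A gene measured by several probesets gets their arithmetic mean.
    return {gene: mean(vals) for gene, vals in values_per_gene.items()}


def filter_noisy_genes(expr_by_gene, threshold=0.2, fraction=0.75):
    """Drop genes at or below `threshold` in at least `fraction` of samples.

    expr_by_gene: dict gene ID -> list of normalized values across samples
    """
    kept = {}
    for gene, values in expr_by_gene.items():
        n_low = sum(1 for v in values if v <= threshold)
        if n_low < fraction * len(values):
            kept[gene] = values
    return kept


probe_expr = {"p1": 5.0, "p2": 7.0, "p3": 3.0, "p4": 9.0}
annotation = {"p1": ["7157"], "p2": ["7157"], "p3": ["1956", "2064"], "p4": []}
print(collapse_probes(probe_expr, annotation))  # {'7157': 6.0}
```

Here p3 (two genes) and p4 (no gene) are dropped, and the two probesets for gene 7157 are averaged.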

Identification of significantly stable REOs in normal tissue

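The REO of two genes, A and B, is denoted as A > B (or A < B) if the expression level of A is higher (or lower) than that of B in a sample. For a given type of normal tissue, the significance of a gene pair having a stable REO pattern is evaluated with the binomial distribution model [41] as follows:

$$P = 1 - \sum_{i=0}^{k-1} \binom{n}{i} p^{i} (1-p)^{n-i}$$

where n denotes the total number of normal samples, k denotes the number of samples that have a certain REO pattern (e.g., A > B or A < B), and p (p = 0.5) is the probability of observing that REO pattern by chance. The p-values are adjusted for multiple testing with the Benjamini and Hochberg procedure [42], and gene pairs with FDR<0.01 are regarded as having significantly stable REOs.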
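As a minimal sketch of how significantly stable REOs could be identified with this kind of test, the Python code below computes the upper-tail binomial probability (p = 0.5) for the majority REO pattern of every gene pair and then adjusts the p-values with the Benjamini and Hochberg procedure. The function names, the dictionary input format and the handling of ties are illustrative assumptions rather than details from the original study, and an exhaustive pure-Python loop like this is only practical for toy data, not for the hundreds of millions of gene pairs analyzed here.

```python
from itertools import combinations
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    Equal to 1 - sum_{i=0}^{k-1} C(n, i) p^i (1-p)^(n-i), but summed over the
    upper tail directly to avoid cancellation when the p-value is tiny.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significantly_stable_reos(samples, fdr=0.01):
    """Return significantly stable REOs as (higher_gene, lower_gene) tuples.

    samples: list of dicts gene -> expression, one dict per normal sample.
    For each pair the majority pattern is tested; ties count against A > B
    (an illustrative choice).
    """
    genes = sorted(set.intersection(*(set(s) for s in samples)))
    n = len(samples)
    candidates, pvalues = [], []
    for a, b in combinations(genes, 2):
        k = sum(1 for s in samples if s[a] > s[b])  # samples with A > B
        if k >= n - k:
            candidates.append((a, b))
            pvalues.append(binom_upper_tail(k, n))
        else:
            candidates.append((b, a))
            pvalues.append(binom_upper_tail(n - k, n))
    qvalues = benjamini_hochberg(pvalues)
    return {pair for pair, q in zip(candidates, qvalues) if q < fdr}


# Toy data: g1 > g2 > g3 in all 10 samples; g4 flips between extremes.
normals = [{"g1": 10 + i, "g2": 5 + i, "g3": 1 + i, "g4": 100 if i % 2 else -100}
           for i in range(10)]
print(sorted(significantly_stable_reos(normals)))
# [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
```

Storing each stable REO as an ordered (higher, lower) tuple keeps the pattern direction explicit, which makes later comparisons between lists a matter of set operations.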

Evaluation of the reproducibility of the significantly stable REOs

We used the POG (Percentage of Overlapping Gene pairs) score [43, 44] to evaluate the reproducibility of significantly stable gene pairs identified from two independent datasets. If two lists of stable gene pairs, list 1 with length L1 and list 2 with length L2, have n overlapping gene pairs, of which k have the same REO patterns, then the POG score from list 1 to list 2, denoted POG12, is calculated as k/L1, the POG score from list 2 to list 1, denoted POG21, is calculated as k/L2, and the concordance score is calculated as k/n. The probability of observing the concordance score by chance is calculated with the binomial distribution model described above, where p (p = 0.5) is the probability of a gene pair having the same REO pattern in the two lists by chance.
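The POG and concordance scores defined above reduce to simple set operations. The sketch below is illustrative only: it assumes each list is stored as a set of (higher gene, lower gene) tuples with each gene pair appearing at most once per list, uses a hypothetical function name, and computes the binomial tail exactly with integer arithmetic, which is feasible for small examples but not for overlaps of the size reported in Tables 2 and 3.

```python
from math import comb


def pog_scores(list1, list2):
    """POG12, POG21, concordance score and its binomial p-value.

    list1, list2: sets of stable REOs given as (higher_gene, lower_gene) tuples.
    """
    pairs1 = {frozenset(pair) for pair in list1}
    pairs2 = {frozenset(pair) for pair in list2}
    n = len(pairs1 & pairs2)              # overlapping gene pairs, any direction
    k = len(set(list1) & set(list2))      # overlaps with the same REO pattern
    pog12 = k / len(list1)
    pog21 = k / len(list2)
    concordance = k / n if n else float("nan")
    # P(X >= k) for X ~ Binomial(n, 0.5): chance of at least k concordant overlaps.
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return pog12, pog21, concordance, p_value


list_a = {("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")}
list_b = {("a", "b"), ("d", "c"), ("e", "f"), ("x", "y")}
print(pog_scores(list_a, list_b))  # (0.5, 0.5, 0.6666666666666666, 0.5)
```

In the toy lists, three gene pairs overlap (n = 3) but only two keep the same ordering (k = 2), because c and d are reversed in the second list.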

Performance evaluation of RankComp

The RankComp algorithm is described in detail in [1]. Briefly, for each cancer sample, RankComp first determines the reversal gene pairs, i.e., gene pairs whose ordering in the cancer sample is reversed relative to their stable ordering in normal samples. Then, to determine whether a given gene A is differentially expressed in a given disease sample, Fisher's exact test [45] is used to test the null hypothesis that the proportion of reversal gene pairs supporting the up-regulation of gene A is equal to the proportion of gene pairs supporting its down-regulation. For a given gene A, if its expression is significantly lower (or higher) than that of another gene in normal samples but this REO is reversed in a cancer sample, then this reversal gene pair supports the up-regulation (or down-regulation) of gene A in the cancer sample. If gene A itself does not change in expression level, the upward or downward shifts in the rank of gene A caused by expression changes of other genes are assumed to be random. Finally, a filtering process retains only those DEGs that remain significant with Fisher's exact test after excluding their gene pairs involving any other DEGs. We used paired cancer and adjacent normal samples to evaluate the performance of RankComp, assuming that the unknown previously normal state of a cancer tissue can be approximately represented by its adjacent normal tissue. After DEGs are identified in a cancer sample, those whose deregulation directions (up- or down-regulation) are consistent with the directions observed between the cancer sample and its adjacent normal sample are defined as true positives (TP); the others are defined as false positives (FP). The precision is calculated as TP/(TP+FP) for each cancer sample.
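The core per-gene test of RankComp can be illustrated with the simplified Python sketch below. It is not the published implementation: the 2x2 table used here (numbers of stable pairs in which gene A ranks above or below its partner, in normal tissue versus the disease sample) is an assumed layout consistent with the description above, the function names are hypothetical, and both the correction for testing many genes and the final filtering step that excludes pairs involving other DEGs are omitted. Fisher's exact test is implemented directly from the hypergeometric distribution so that the example needs only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    probs = [pmf(x) for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1)]
    p_obs = pmf(a)
    # Sum all tables no more likely than the observed one (small tolerance for ties).
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def rankcomp_gene_test(gene, stable_pairs, sample):
    """Simplified RankComp-style test for one gene in one disease sample.

    stable_pairs: set of (higher_gene, lower_gene) REOs stable in normal tissue.
    sample:       dict gene -> expression in the disease sample.
    Returns (direction, p_value); direction is "up", "down" or None.
    """
    n_hi = n_lo = d_hi = d_lo = 0
    for hi, lo in stable_pairs:
        if gene not in (hi, lo):
            continue
        partner = lo if gene == hi else hi
        # Pairs in which the gene ranks higher/lower in normal tissue ...
        if gene == hi:
            n_hi += 1
        else:
            n_lo += 1
        # ... and in the disease sample (ties counted as "lower").
        if sample[gene] > sample[partner]:
            d_hi += 1
        else:
            d_lo += 1
    p = fisher_exact_two_sided(n_hi, n_lo, d_hi, d_lo)
    # Both rows share the same total, so comparing d_hi with n_hi gives the direction.
    if d_hi > n_hi:
        direction = "up"
    elif d_hi < n_hi:
        direction = "down"
    else:
        direction = None
    return direction, p


# Gene A is stably below B1..B10 in normal tissue but above all of them in the tumour.
stable = {(f"B{i}", "A") for i in range(1, 11)}
tumour = {"A": 100.0, **{f"B{i}": float(i) for i in range(1, 11)}}
direction, p = rankcomp_gene_test("A", stable, tumour)
print(direction)  # up
print(p)
```

In practice a library routine such as SciPy's Fisher's exact test would normally replace the hand-written version.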
To ensure that the evaluated individualized DEGs are associated with cancer, we restricted the evaluation to reproducible population-level DEGs predetermined from two datasets for each type of cancer. For each cancer, DEGs between cancer samples and normal controls were detected in each dataset with Student's t-test. The statistical significance of the concordance score between the two resulting DEG lists was then calculated with the binomial distribution model described above. Finally, the DEGs reproducibly detected in both datasets (that is, with the same deregulation direction) were defined as the population-level DEGs associated with the cancer.
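The evaluation procedure can be sketched as below: derive reproducible population-level DEGs from two datasets, then compute the per-sample precision restricted to them. This is a minimal sketch under stated assumptions:

- Each dataset's t-test results are already reduced to a dict of significant DEGs (gene -> "up"/"down").
- The binomial test is taken as the one-sided upper tail with p = 0.5.
- A tie between cancer and adjacent normal expression counts as a false positive.
- All function names are hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided upper tail P(X >= k) for X ~ Binomial(n, p); exact, for modest n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def population_level_degs(degs1: dict, degs2: dict):
    """degs1, degs2: gene -> 'up'/'down' from the t-test in each dataset.

    Returns (consistent DEGs, concordance score, chance probability).
    """
    shared = degs1.keys() & degs2.keys()
    consistent = {g: degs1[g] for g in shared if degs1[g] == degs2[g]}
    n, k = len(shared), len(consistent)
    concordance = k / n if n else float("nan")
    p_value = binom_upper_tail(k, n) if n else float("nan")
    return consistent, concordance, p_value


def precision(predicted: dict, cancer: dict, normal: dict, reference: dict) -> float:
    """predicted: gene -> 'up'/'down' for one cancer sample.
    cancer, normal: gene -> expression in the cancer sample and its adjacent
    normal sample. Only predicted DEGs that are population-level DEGs
    (keys of `reference`) are evaluated."""
    evaluated = [g for g in predicted if g in reference]
    tp = 0
    for g in evaluated:
        if cancer[g] > normal[g]:
            observed = "up"
        elif cancer[g] < normal[g]:
            observed = "down"
        else:
            observed = None  # no change: never matches a prediction
        tp += predicted[g] == observed
    return tp / len(evaluated) if evaluated else float("nan")
```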

The KEGG pathways

Data for 223 pathways covering 6290 unique genes were extracted from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [46] on 10 May 2015. The hypergeometric distribution model is used to determine whether each pathway is significantly enriched with up-regulated DEGs and, separately, with down-regulated DEGs [23]. The p-values are adjusted using the Benjamini and Hochberg procedure.
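Pathway enrichment with Benjamini-Hochberg adjustment might look like the sketch below. It would be run once for the up-regulated DEGs and once for the down-regulated DEGs. Assumptions:

- The background gene set (for example, the KEGG pathway genes measured on the platform) is supplied by the caller.
- The test is the one-sided upper tail of the hypergeometric distribution.
- The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(x: int, N: int, K: int, n: int) -> float:
    """P(X >= x) when n genes are drawn from N background genes, K of which
    belong to the pathway."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(K, n) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pathway_enrichment(degs, pathways: dict, background) -> dict:
    """degs: up- (or down-) regulated DEGs; pathways: name -> member genes.
    Returns pathway name -> BH-adjusted enrichment p-value."""
    background = set(background)
    degs = set(degs) & background
    N, n = len(background), len(degs)
    names = list(pathways)
    pvals = []
    for name in names:
        members = set(pathways[name]) & background
        x = len(members & degs)  # DEGs falling in this pathway
        pvals.append(hypergeom_upper_tail(x, N, len(members), n))
    return dict(zip(names, benjamini_hochberg(pvals)))
```

For larger inputs, `scipy.stats.hypergeom.sf(x - 1, N, K, n)` and `statsmodels` `multipletests(..., method="fdr_bh")` provide the same quantities.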
References

M Kanehisa, S Goto. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000-01-01.

Alberto Bardelli, D Williams Parsons, Natalie Silliman, Janine Ptak, Steve Szabo, Saurabh Saha, Sanford Markowitz, James K V Willson, Giovanni Parmigiani, Kenneth W Kinzler, Bert Vogelstein, Victor E Velculescu. Mutational analysis of the tyrosine kinome in colorectal cancers. Science. 2003-05-09.

Jan H J Hoeijmakers. DNA damage, aging, and cancer. N Engl J Med. 2009-10-08.

Dong Wang, Lixin Cheng, Mingyue Wang, Ruihong Wu, Pengfei Li, Bin Li, Yuannv Zhang, Yunyan Gu, Wenyuan Zhao, Chenguang Wang, Zheng Guo. Extensive increase of microarray signals in cancers calls for novel normalization assumptions. Comput Biol Chem. 2011-04-30.

Daniel Schramek, Andreas Leibbrandt, Verena Sigl, Lukas Kenner, John A Pospisilik, Heather J Lee, Reiko Hanada, Purna A Joshi, Antonios Aliprantis, Laurie Glimcher, Manolis Pasparakis, Rama Khokha, Christopher J Ormandy, Martin Widschwendter, Georg Schett, Josef M Penninger. Osteoclast differentiation factor RANKL controls development of progestin-driven mammary cancer. Nature. 2010-09-29.

Aik Choon Tan, Daniel Q Naiman, Lei Xu, Raimond L Winslow, Donald Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005-08-16.

Thomas L Fare, Ernest M Coffey, Hongyue Dai, Yudong D He, Deborah A Kessler, Kristopher A Kilian, John E Koch, Eric LeProust, Matthew J Marton, Michael R Meyer, Roland B Stoughton, George Y Tokiwa, Yanqun Wang. Effects of atmospheric ozone on microarray data quality. Anal Chem. 2003-09-01.

Ye Han, Xiaofeng Xue, Min Jiang, Xiaobo Guo, Pu Li, Fei Liu, Bin Yuan, Yichen Shen, Xingpo Guo, Qiaoming Zhi, Hong Zhao. LGR5, a relevant marker of cancer stem cells, indicates a poor prognosis in colorectal cancer patients: a meta-analysis. Clin Res Hepatol Gastroenterol. 2014-09-02.

Yan Lu, Yijun Yi, Pengyuan Liu, Weidong Wen, Michael James, Daolong Wang, Ming You. Common human cancer genes discovered by integrated gene-expression analysis. PLoS One. 2007-11-07.

Merja Heinäniemi, Matti Nykter, Roger Kramer, Anke Wienecke-Baldacchino, Lasse Sinkkonen, Joseph Xu Zhou, Richard Kreisberg, Stuart A Kauffman, Sui Huang, Ilya Shmulevich. Gene-pair expression signatures reveal lineage control. Nat Methods. 2013-04-21.