A bioinformatics filtering strategy for identifying radiation response biomarker candidates.

Jung Hun Oh, Harry P Wong, Xiaowei Wang, Joseph O Deasy.

Abstract

The number of biomarker candidates is often much larger than the number of clinical patient data points available, which motivates the use of a rational candidate variable filtering methodology. The goal of this paper is to apply such a bioinformatics filtering process to isolate a modest number (<10) of key interacting genes and their associated single nucleotide polymorphisms involved in radiation response, and to ultimately serve as a basis for using clinical datasets to identify new biomarkers. In step 1, we surveyed the literature on genetic and protein correlates to radiation response, in vivo or in vitro, across cellular, animal, and human studies. In step 2, we analyzed two publicly available microarray datasets and identified genes in which mRNA expression changed in response to radiation. Combining results from Step 1 and Step 2, we identified 20 genes that were common to all three sources. As a final step, a curated database of protein interactions was used to generate the most statistically reliable protein interaction network among any subset of the 20 genes resulting from Steps 1 and 2, resulting in identification of a small, tightly interacting network with 7 out of 20 input genes. We further ranked the genes in terms of likely importance, based on their location within the network using a graph-based scoring function. The resulting core interacting network provides an attractive set of genes likely to be important to radiation response.

Year:  2012        PMID: 22768051      PMCID: PMC3387230          DOI: 10.1371/journal.pone.0038870

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

In the ‘omics’ era, the number of biomarker candidates potentially available for statistical testing is often much larger than the number of patient data points. This presents a fundamental problem in biomarker research: the number of candidate genetic or epigenetic markers often overwhelms the inherent statistical power available in a clinical dataset, which usually has tens or hundreds of patient cases available rather than thousands. This statistical mismatch is becoming worse as more of the intracellular complexity of molecular machinery is identified. At one extreme, a genome-wide association study (GWAS) examining the correlations of millions of tag single-nucleotide polymorphisms (SNPs) to cancer treatment outcome may require a very high, and biologically unlikely, odds ratio to reach statistical significance, given the number of multiple comparisons. At the other extreme, it is clear that investigators cannot a priori identify the most important biomarker genes or SNPs for testing. These unsatisfying extreme cases motivated our search for a middle strategy that would objectively identify a modest number of promising SNPs, proteins, etc. as a cohort for testing against a given dataset. Because clinical datasets for a given endpoint are commonly of modest size (tens or hundreds, not thousands, of patients), we searched for key protein interaction networks that result in fewer than approximately a hundred candidate SNPs. Our methodology, of course, could be adapted to cast a wider net if much larger datasets become available. Our endpoint of interest is late toxicity following radiation therapy for cancer. Many cancer patients who receive radiation therapy suffer from acute or late side effects; the risk for experiencing these side effects is expected to have a genetic component [1]. Numerous genes participate in a cascade of events in response to radiation and the resulting DNA damage in a complex signal transduction network [2].
Recently, many studies have focused on finding radio-responsive genes at the whole genome level with gene expression microarrays. Rieger and Chu used oligonucleotide microarrays to develop a genome-wide portrait of transcriptional response to ionizing radiation (IR) and ultraviolet (UV) radiation in cell lines collected from 15 healthy individuals [3]. In another study [1] using samples extracted from cancer patients with acute radiation toxicity, Rieger et al. showed that toxicity after radiation therapy (radiotherapy) could be associated with abnormal transcriptional responses to DNA damage. Jen and Cheung [2] assessed transcriptional levels of genes in lymphoblastoid cells at various time points with 3 Gy and 10 Gy of ex vivo IR exposure. Following 10 Gy of IR exposure, more genes were induced, suggesting that a higher radiation dose causes a more complex response. A high percentage of significant genes were involved in cell cycle, cell death, DNA repair, DNA metabolism, and RNA processing. Eschrich et al. [4] analyzed microarray gene expression data derived from 48 human cancer cell lines and generated an interaction network using MetaCore software (GeneGo, Encinitas, CA) with the top 500 genes identified by linear regression analysis. Subsequently, based on 10 hub genes obtained from the network, they modeled radiosensitivity (survival fraction at 2 Gy) using a linear regression method. Normal tissue toxicity after radiotherapy may partially be attributable to specific genetic mutations. In an effort to identify candidate polymorphisms at the SNP level involved in the cellular response to irradiation in breast and prostate cancers, Popanda et al. [5] surveyed many published studies that show associations of SNPs in candidate genes with acute or late side effects of radiotherapy.
Andreassen and Alsner [6] summarized studies published on genetic variation in normal tissue toxicity and proposed a model of allelic architecture that illustrates relative risk for genetic variants associated with normal tissue radiosensitivity. In this study, we attempted to define an objective method for identifying key radiosensitivity genes likely to have a significant impact on clinical outcome following radiotherapy. We elected to construct a staged filter. The first step was a comprehensive literature review of radiosensitivity-related genes. These genes were then further delimited to genes responding to IR in an analysis of publicly available microarray gene expression datasets. We further focused the search on interacting networks, based on the hypothesis that good biomarkers are likely to be embedded in important pathways or networks involving multiple genes known to be important to the endpoint in question [7]. This last step may potentially add new, previously unreported targets, based on curated pathway libraries.

Materials and Methods

In summary, we used a multi-component filtering process: (1) genes associated with radiation response in the literature and (2) genes associated with radiation response in two microarray mRNA datasets. Overlapping genes from these three sources were fed into a curated protein interaction network system (MetaCore) to identify key interacting networks. The most important network was taken as our target set.

Literature Review of Radiosensitivity-related Genes

We attempted a complete literature review of all genes implicated in radiation response. Published papers were searched by using PubMed and Scopus search engines in 2010 and by following citations within the identified papers. The search strategy was based on a combination of the following search keywords: “SNPs, polymorphisms, or microsatellites” and “irradiation, radiation, or radiotherapy” and “morbidity, radiosensitivity, normal tissue, toxicity, or complications” and “siRNA, knockdown, or knockout”. Papers referred to in the original search returns, or referring to the original papers at a later date were also reviewed. This resulted in an in-depth review of around 200 published papers, and a list of 221 genes implicated in radiation response.

Microarray Gene Expression Datasets

To identify significant radio-responsive genes based on microarray gene expression profiling, we searched for all relevant, publicly available microarray datasets and located two. We analyzed GSE1977 and GSE23393, downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). In GSE1977, lymphoblastoid cell lines obtained from 15 healthy individuals were established by immortalization of peripheral blood B-lymphocytes [3]. The response of numerous genes was measured after mock treatment, UV exposure, and X-ray exposure. Cells were exposed to a 5 Gy radiation dose and harvested for RNA 4 hours later. In our work, the differential between mock and X-ray cases was used. In contrast, in GSE23393 [8], blood was gathered from eight radiotherapy patients (at our institution): eight samples were collected immediately before irradiation and another eight samples were collected 4 hours after total body irradiation with 1.25 Gy X-rays.

Preprocessing for Identification of Significant Genes

Before the microarray datasets were analyzed, gene expression values were log-base-2 transformed, followed by quantile normalization across all samples [9]. Microarray gene expression values from two different conditions (before and after exposure) were compared using a two-tailed t-test to identify differentially expressed genes (radio-responsive genes). To estimate the likelihood of identifying significant genes by chance, we computed permutation-based p-values using 10,000 permutations. Then, using Storey’s method, the false discovery rate (FDR) and q-value for each gene were calculated [10]. Significance Analysis of Microarrays (SAM) and the t-test are widely used for identifying differentially expressed genes in the analysis of microarray data [11]. We chose a permutation t-test on the assumption that the permutation t-test and SAM would yield similar sets of significant genes, as recommended by Chen et al. [11]. In this analysis, we did not use a fold change cutoff, in order to avoid losing some important genes, a problem described by Larsson et al. [12].
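
The preprocessing step described above (log-base-2 transform followed by quantile normalization across samples) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the toy matrix, and the position-based tie handling are assumptions made for the example.

```python
import math

def log2_transform(matrix):
    """Log-base-2 transform every expression value (values must be > 0)."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Within each sample, the k-th smallest value is replaced by the mean of
    the k-th smallest values across all samples. Ties are broken by row
    position, which is a simplification of the usual tie handling.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    rank_means = [sum(sc[k] for sc in sorted_cols) / n_samples
                  for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = rank_means[rank]
    return normalized

# Toy raw intensities (rows = genes, columns = samples); purely illustrative.
raw = [[2.0, 4.0, 8.0],
       [16.0, 8.0, 4.0],
       [4.0, 32.0, 16.0]]
normalized = quantile_normalize(log2_transform(raw))
```

After normalization every sample shares the same set of values, so differences between samples reflect rank changes of genes rather than overall intensity shifts.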

Pathway and Process Analysis

Significant genes were identified both in the literature review and the analysis of two microarray datasets (GSE1977 and GSE23393). These genes were then entered into a manually curated pathway analysis database (MetaCore™, GeneGo, Inc., Carlsbad, California). The commercial pathway analysis system, MetaCore, computes p-values for overrepresented pathways and processes. MetaCore is based on a comprehensive manually curated attempt to capture protein interactions as networks. We used MetaCore to attempt to find the most probable interaction pathways among a set of genes uploaded by the user. Several algorithms are available to do this; we used the “Analyze network” option. If necessary, MetaCore adds appropriate genes to complete a network.
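
To make the idea of a p-value for an overrepresented pathway concrete, the sketch below uses a hypergeometric tail probability, a common choice for this kind of enrichment test. Whether MetaCore's own calculation matches this exactly is an assumption; the function name and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_universe: genes in the background set
    n_pathway:  background genes annotated to the pathway
    n_selected: genes in the uploaded list
    n_overlap:  uploaded genes that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20 uploaded genes, 5 of which hit a 50-gene pathway
# in a 20,000-gene background.
p_example = hypergeom_enrichment_p(20000, 50, 20, 5)
```

A small p-value indicates that the overlap between the uploaded list and the pathway is larger than expected by chance alone.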

Gene Ontology Analysis

A further analysis of the resulting significant genes was performed using the Gene Ontology (GO) database (www.geneontology.org), in which genes are annotated with known molecular functions, biological processes, and cellular component locations.

Gene Ranking

In our previous work to identify blood-based protein biomarkers to predict radiation-induced pneumonitis [7], we proposed a graph-based scoring function to rank proteins in a protein–protein interaction network. The network consisted of candidate proteins we identified in mass spectrometry analysis and four previously identified (‘regularization’) biomarker proteins. Using the proposed method, we attempted to measure a ‘functional distance’ between each candidate protein and the four regularization proteins, based on the hypothesis that some proteins relevant to a specific disease exist in close proximity, in a network sense. In the current study, we modified that algorithm such that within a given protein–protein interaction network for a biological process, we estimate the functional distance between each protein and all the remaining proteins in the network, since all the proteins in the network are more likely to be related to one another and act together in the biological process. To rank biomarkers, we modelled each protein–protein interaction network as a directed graph, G = (V, E), where V consists of a set of nodes (proteins) and E is the set of possible edges (protein–protein interactions) between pairs of nodes. Let A and B be two proteins in a network. We assume that there are two concepts of distance between A and B: a geometrical distance that is defined in terms of the number of nodes in the shortest path between A and B, as well as a virtual distance that is defined in terms of the number of publications that verify the interactions along the shortest path. Intuitively, as the number of intermediate nodes between A and B increases, the geometrical distance increases and the two proteins are less likely to be correlated. In contrast, considering virtual distance, we expect that as the number of references demonstrating a relationship between two proteins increases, they are more likely to be related. 
In other words, the number of references is proportional to relatedness while the number of nodes is inversely proportional. Using a power law, we calculate two scores from A to B, a reference score (rs) and a node score (ns), as functions of r and n, where r and n are the total number of references and nodes in the shortest path from A to B [the two power-law expressions for rs and ns are not preserved in this text]. We suppose that the influence of the number of nodes is greater than that of the number of references. Therefore, as the number of intermediate nodes between any two given nodes increases, the relationship between the two nodes becomes much less likely. The score capturing the path from A to B is defined as the summation of the two scores:

$$S_{A \to B} = rs_{A \to B} + ns_{A \to B}$$

Likewise, we also estimate a score from B to A, $S_{B \to A}$. Then, the final score between A and B is defined as the maximal value among $S_{A \to B}$ and $S_{B \to A}$:

$$S_{A,B} = \max(S_{A \to B}, S_{B \to A})$$

We suppose that the final score of a protein is computed by the summation of all scores between the protein and all the remaining proteins in the network. Hence, the final score of a protein A is defined by:

$$S_A = \sum_{B \in V,\, B \neq A} S_{A,B}$$

To estimate the number of references and nodes, we employed two methods. For the number of references, we used a function in the MetaCore software that provides the number of references between two connected proteins in a network. For the number of nodes, we used the Floyd-Warshall algorithm, which was originally designed to find the shortest paths between all pairs of nodes based on dynamic programming [13]. To apply this algorithm to our problem of estimating the number of nodes, we modified the original Floyd-Warshall algorithm such that an equal weight of 1 was assigned to all connected edges in a network. As a result, the modified algorithm generated a matrix that represents the number of nodes on the all-pairs shortest paths in a given protein–protein interaction network.
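
The ranking procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the exponents `ALPHA` and `BETA` are hypothetical placeholders for the power-law parameters (chosen so that the node score falls off faster than the reference score grows), the hop count on the shortest path stands in for the "number of nodes", and the protein names and reference counts are invented toy values.

```python
import math

ALPHA = 0.5  # hypothetical exponent for the reference score rs = r ** ALPHA
BETA = 2.0   # hypothetical exponent for the node score ns = n ** (-BETA)

def all_pairs_shortest(nodes, edges):
    """Floyd-Warshall on a directed graph with every edge weight set to 1.

    edges maps (source, target) -> number of supporting references.
    Returns (hops, refs): hop counts on shortest paths, and the references
    summed along the shortest path that was kept.
    """
    hops = {a: {b: math.inf for b in nodes} for a in nodes}
    refs = {a: {b: 0 for b in nodes} for a in nodes}
    for a in nodes:
        hops[a][a] = 0
    for (a, b), n_refs in edges.items():
        hops[a][b] = 1
        refs[a][b] = n_refs
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if hops[i][k] + hops[k][j] < hops[i][j]:
                    hops[i][j] = hops[i][k] + hops[k][j]
                    refs[i][j] = refs[i][k] + refs[k][j]
    return hops, refs

def directed_score(hops, refs, a, b):
    """rs + ns for the path a -> b; 0 if b cannot be reached from a."""
    n = hops[a][b]
    if a == b or math.isinf(n):
        return 0.0
    return refs[a][b] ** ALPHA + n ** (-BETA)

def protein_scores(nodes, edges):
    """Final score per protein: sum over partners of max(a->b, b->a)."""
    hops, refs = all_pairs_shortest(nodes, edges)
    return {
        a: sum(max(directed_score(hops, refs, a, b),
                   directed_score(hops, refs, b, a))
               for b in nodes if b != a)
        for a in nodes
    }

# Toy network: P1 -> P2 (4 references), P2 -> P3 (9 references).
toy_nodes = ["P1", "P2", "P3"]
toy_edges = {("P1", "P2"): 4, ("P2", "P3"): 9}
scores = protein_scores(toy_nodes, toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because each protein's score sums over every other protein in the graph, a node with few direct edges can still rank highly when it is close to well-documented interactions.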

Results

Identification of Significant Biomarkers via Literature Review

Based on the literature review, several types of biomarkers, including genes, proteins, kinases, ligands, and protein complexes were identified. To unify the biomarker terms used differently across studies, we converted all the biomarkers into their corresponding gene symbols. As a result, 221 unique genes and 4 protein complexes (DNA-PK, HSP70, MRN(95), RAS) were identified from around 200 papers that studied radiation response-related biomarkers [4], [14]–[185]. Table 1 displays the 221 unique genes and their corresponding GO processes, including DNA repair, cell proliferation/cycle, apoptosis, RNA processing, and response to stress. It is well known that ionizing radiation causes DNA damage that activates the p53 pathway through ATM [186]. Genes that are involved in cell cycle, such as CDKN1A, GADD45A, MDM2, and CCNG1, are known to be dependent on p53 [2]. Also, other cell cycle-related genes including CCNB1 and CDC20 were identified. Among cell cycle or proliferation genes, TOB1, BTG2, and CDKN1A are anti-proliferative/check-point related [3]. Several genes (XPC, DDB2, PCNA, ERCC4, and NBN) are involved in DNA repair. Two major pathways to repair IR-induced DNA double-strand breaks are homologous recombination (HR; genes include XRCC2, XRCC3, MRE11A, RAD50, NBN, BRCA1, and BRCA2) and non-homologous end joining (NHEJ; genes include LIG4, XRCC4, XRCC5, XRCC6, and DNA-PK) [3]. Some genes, including FAS, BBC3, and TNF, are involved in apoptosis [187]. BCL2 and DDR1 are anti-apoptotic.
Table 1

Radio-responsive biomarkers identified by literature review and their biological processes.

Gene Symbol | Entrez Gene ID | Processes marked (one v per process among DNA repair, Cell proliferation, Cell cycle, Apoptosis, Response to stress; column positions of the marks are not preserved) | Reference
ABCA1 | 19 |  | [14]
ABL1 | 25 | vvvv | [4]
ACTA2 | 59 |  | [15]
AEN | 64782 | vv | [16]
AKR1B1 | 231 | v | [17]
AKT1 | 207 | vvv | [18]
ALAD | 210 |  | [19], [20]
ANXA1 | 301 | vvvv | [21]
APEX1 | 328 | vv | [22]–[26]
APOE | 348 | vvv | [27]
AR | 367 | v | [4]
ATF3 | 467 | v | [28]
ATM | 472 | vvvvv | [25], [26], [29]–[43]
BAD | 572 | vv | [44]
BAK1 | 578 | v | [45]
BAX | 581 | v | [37], [44]–[47]
BAZ1B | 9031 | vv | [17]
BBC3 | 27113 | v | [16], [48]–[50]
BCL2 | 596 | vvvv | [37], [44], [46], [51], [52]
BCL2L1 | 598 | vvv | [51]
BIRC5 | 332 | vv | [53]
BRCA1 | 672 | vvvvv | [54]–[59]
BRCA2 | 675 | vvvvv | [54]–[57], [59], [60]
BTG2 | 7832 | vvvv | [61]
CAT | 847 | vv | [62]
CAV1 | 857 | vv | [63]
CCNB1 | 891 | vvv | [48]
CCND1 | 595 | vvv | [46]
CCNE1 | 898 | v | [61]
CCNG1 | 900 | vvv | [49]
CD24 | 100133941 | vvv | [21]
CD40 | 958 | vv | [64]
CD68 | 968 |  | [20]
CD69 | 969 |  | [16]
CD70 | 970 | vv | [65]
CD83 | 9308 | v | [21]
CDC20 | 991 | vv | [66]
CDC6 | 990 | vv | [61]
CDH2 | 1000 |  | [21]
CDK1 | 983 | vvvv | [4], [61]
CDK2 | 1017 | vv | [61]
CDKN1A | 1026 | vvvv | [15], [16], [37], [47]–[50], [61], [67], [68]
CDKN2A | 1029 | vvvv | [69]
CHEK1 | 1111 | vvvv | [70], [71]
CLIC1 | 1192 |  | [72]
CRYAB | 1410 | vv | [21]
CSNK2A2 | 1459 | v | [73]
CXCR4 | 7852 | vv | [15]
CYP2D6 | 1565 |  | [74]
DCN | 1634 | v | [21]
DDB2 | 1643 | vv | [15], [21]
DDR1 | 780 |  | [15]
DDR2 | 4921 | v | [15]
DDX17 | 10521 |  | [17]
DRAP1 | 10589 |  | [17]
DUSP8 | 1850 |  | [16]
EGFR | 1956 | vvvv | [46]
EGR1 | 1958 | v | [16]
EGR4 | 1961 | v | [16]
EI24 | 9538 | v | [47]
EIF2AK3 | 9451 | vv | [75]
EPDR1 | 54749 |  | [20]
ERBB2 | 2064 | vvv | [46], [76]–[78]
ERCC1 | 2067 | vv | [52]
ERCC2 | 2068 | vvvvv | [36], [74]
ERCC4 | 2072 | vvv | [52], [79]
ERCC5 | 2073 | vvv | [52]
FAS | 355 | v | [47], [80]
FASLG | 356 | vvv | [80]
FDXR | 2232 |  | [49], [50], [68]
FGF1 | 2246 | vv | [81]
FGF2 | 2247 | vvvv | [81]
GADD45A | 1647 | vvvv | [15], [16], [48], [49], [61], [67]
GBP1 | 2633 |  | [21]
GDF15 | 9518 |  | [15], [82]
GFER | 2671 | v | [83]
GRAP | 10750 |  | [16]
GSTA1 | 2938 |  | [62]
GSTM1 | 2944 |  | [62], [84]
GSTP1 | 2950 | v | [36], [62], [85], [86]
GSTT1 | 2952 |  | [62], [84]
H2AFX | 3014 | vvv | [41], [87], [88]
HDAC1 | 3065 | vvv | [4], [89]
HERC2 | 8924 | vvv | [90]
HSP90AB1 | 3326 | v | [91]
HSP90B1 | 7184 | vv | [82]
HSPB1 | 3315 | vv | [91], [92]
HUS1 | 3364 | vvv | [93]
ICAM2 | 3384 |  | [94]
ID3 | 3399 | vvv | [20]
IER5 | 51278 |  | [95]
IFNG | 3458 | vvvv | [16]
IGF1R | 3480 | vv | [46], [96]
IGFBP3 | 3486 | vv | [21]
IL12RB2 | 3595 | v | [14]
IL17A | 3605 | vv | [97]
ILK | 3611 | vvv | [98]
IRF1 | 3659 |  | [4]
JUN | 3725 | vvvv | [4], [16]
KRAS | 3845 |  | [99]
LIG1 | 3978 | vvv | [20]
LIG3 | 3980 | vvv | [19], [20]
LIG4 | 3981 | vvvvv | [35], [60], [74], [100]
LOX | 4015 | v | [21]
LSM7 | 51690 |  | [17]
MAD2L2 | 10459 | v | [19]
MAP3K7 | 6885 | vv | [19], [20]
MC1R | 4157 | vvv | [101]
MCL1 | 4170 | v | [102]
MCM2 | 4171 | v | [61]
MDC1 | 9656 | vvv | [35]
MDM2 | 4193 | vvv | [16], [49], [52], [103]
MGMT | 4255 | vvv | [20]
MLH1 | 4292 | vvvv | [46], [74]
MMP2 | 4313 | v | [104]
MMP9 | 4318 | v | [105]
MPO | 4353 | vv | [62], [106]
MR1 | 3140 |  | [15]
MRE11A | 4361 | vvv | [29]
MRPL23 | 6150 |  | [17]
MSH2 | 4436 | vvvv | [46]
MTHFR | 4524 |  | [107]
MTOR | 2475 | vv | [45]
MYC | 4609 | vvv | [108]
NBN | 4683 | vvvv | [29], [109]
NEIL1 | 79661 | vv | [110]
NEK2 | 4751 | v | [111]
NFKB1 | 4790 | vv | [112]
NNMT | 4837 |  | [91]
NONO | 4841 | vv | [113]
NOS3 | 4846 | vvv | [62], [106]
NOX4 | 50507 | vv | [114]
NUDT1 | 4521 | vv | [17]
OGG1 | 4968 | vv | [115]
PAH | 5053 |  | [20]
PAK6 | 56924 |  | [116]
PARP1 | 142 | vv | [70], [117]
PCNA | 5111 | vvv | [16], [118]
PER3 | 8863 |  | [20]
PHLPP2 | 23035 |  | [119]
PHPT1 | 29085 |  | [50]
PIK3CA | 5290 | v | [120]
PIM2 | 11040 | vvv | [52]
PLK2 | 10769 | v | [16]
PLK3 | 1263 |  | [61]
PMS2 | 5395 | vvv | [46]
POLB | 5423 | vvv | [121]
POLQ | 10721 | vv | [60]
PPA1 | 5464 |  | [111], [119]
PPM1D | 8493 | vv | [15], [122]
PRDX1 | 5052 | vvv | [116]
PRDX4 | 10549 |  | [123]
PRKCB | 5579 | v | [4]
PRKCZ | 5590 | vv | [52]
PRKDC | 5591 | vvv | [124]–[126]
PROCR | 10544 | v | [21]
PROM1 | 8842 |  | [127]
PSMB4 | 5692 | vv | [17]
PSMD1 | 5707 | v | [17]
PTCH1 | 5727 | v | [128]
PTEN | 5728 | vvv | [129]
PTGS2 | 5743 | vvvv | [46]
PTTG1 | 9232 | vvv | [19], [130]
RAD21 | 5885 | vvvv | [30], [31], [43], [131]
RAD23B | 5887 | vv | [17]
RAD50 | 10111 | vvv | [29], [132]
RAD51 | 5888 | vvv | [133]
RAD54L | 8438 | vvv | [134]
RAD9A | 5883 | vvvv | [19], [52]
RALBP1 | 10928 |  | [135], [136]
RELA | 5970 | vvv | [4], [112]
RND1 | 27289 |  | [16]
RRM2 | 6241 |  | [137]
RRM2B | 50484 | vvv | [138]
S100A11 | 6282 | v | [15]
SAG | 6295 |  | [139]
SART1 | 9092 | vv | [20]
SEC22B | 9554 |  | [17]
SEPHS1 | 22929 |  | [140]
SERPINA3 | 12 | v | [20]
SERPINE1 | 5054 | vvv | [141]
SESN1 | 27244 | vvv | [49], [50]
SIRT1 | 23411 | vvvv | [142]
SMPD1 | 6609 | v | [81]
SOD1 | 6647 | vvv | [143]
SOD2 | 6648 | vvv | [25], [26], [30], [31], [36], [62], [144], [145]
SRC | 6714 |  | [146]
SRF | 6722 | v | [17]
STAT1 | 6772 | vvv | [4], [147]
STAT3 | 6774 | v | [148], [149]
SUMO1 | 7341 | vv | [4]
TGFB1 | 7040 | vvvv | [25], [26], [30], [36], [43], [145], [150]–[154]
TNF | 7124 | vvvv | [155]
TNFRSF10B | 8795 | v | [47]
TNFRSF1A | 7132 | vv | [112]
TNFSF10 | 8743 | v | [156]
TNFSF9 | 8744 | vv | [16]
TOB1 | 10140 | v | [157]
TOP2A | 7153 | vvv | [158]
TOR1AIP1 | 26092 |  | [16]
TP53 | 7157 | vvvvv | [37], [41], [46], [48]
TP63 | 8626 | vvvv | [159]
TPP2 | 7174 |  | [160]
TRAF2 | 7186 | vv | [161]
TRAF4 | 9618 | vv | [16]
TXN | 7295 | v | [162]
TXNRD1 | 7296 | v | [163]
UBB | 7314 | vv | [17]
UHRF1 | 29128 | vvvv | [164]
UIMC1 | 51720 | vvv | [165]
VEGFA | 7422 | vvv | [46], [166]
WRN | 7486 | vvv | [110]
WT1 | 7490 | vv | [167]
XIAP | 331 | vv | [49], [168], [169]
XPC | 7508 | vvv | [124], [170]
XRCC1 | 7515 | vv | [22], [23], [25], [26], [30], [31], [36], [43], [115], [144], [145], [150], [154], [171]–[175]
XRCC2 | 7516 | vvvv | [109]
XRCC3 | 7517 | vv | [25], [26], [30], [31], [74], [109], [115], [134], [144], [145], [172], [174]–[176]
XRCC4 | 7518 | vvvv | [164]
XRCC5 | 7520 | vvvv | [52], [60], [172], [177]–[179]
XRCC6 | 2547 | vv | [20], [176], [178], [180], [181]
DNA-PK |  |  | [52], [182]
HSP70 |  |  | [183]
MRN(95) |  |  | [184]
RAS |  |  | [185]
For biological process and pathway analysis, the 221 unique genes were uploaded into MetaCore. Figure 1 illustrates a direct interaction network generated with these genes. As shown, numerous genes are strongly connected to one another, suggesting that interacting genes are more likely to play related roles. Table 2 shows the top ten GeneGo pathways, GeneGo processes, and GO processes. As can be seen in the table, the most highly ranked pathways and processes are associated with DNA damage and repair, cell cycle, and apoptosis.
Figure 1

Direct protein-protein interaction network.

A network representation that illustrates the complexity of direct connections among genes identified via literature review.

Table 2

The top ten GeneGo pathways/processes and GO processes resulting from genes identified via literature review.

Ranking | GeneGo Pathways
1 | DNA damage_Role of Brca1 and Brca2 in DNA repair
2 | DNA damage_ATM/ATR regulation of G1/S checkpoint
3 | DNA damage_NHEJ mechanisms of DSBs repair
4 | DNA damage_Brca1 as a transcription regulator
5 | Signal transduction_AKT signaling
6 | Some pathways of EMT in cancer cells
7 | Apoptosis and survival_Ceramides signaling pathway
8 | Signal transduction_PTEN pathway
9 | Transcription_P53 signaling pathway
10 | DNA damage_ATM/ATR regulation of G2/M checkpoint

Ranking | GeneGo Processes
1 | DNA damage_Checkpoint
2 | DNA damage_DBS repair
3 | Cell cycle_G1-S Growth factor regulation
4 | DNA damage_BER-NER repair
5 | Cell cycle_Meiosis
6 | DNA damage_Core
7 | Apoptosis_Apoptotic nucleus
8 | Cell cycle_G1-S Interleukin regulation
9 | Development_EMT_Regulation of epithelial-to-mesenchymal transition
10 | Cell cycle_S phase

Ranking | GO Processes
1 | Cellular response to stimulus
2 | Cellular response to stress
3 | Response to stress
4 | Regulation of programmed cell death
5 | Regulation of cell death
6 | Regulation of apoptosis
7 | Response to DNA damage stimulus
8 | Response to stimulus
9 | DNA repair
10 | Response to organic substance

Identification of Significant Genes via Microarray Dataset Analysis

To identify significant changes in gene expression values between the two groups (before and after irradiation) in the two microarray datasets, a t-test with 10,000 permutations was performed. To estimate p-values, we counted, for each gene, the number of permutations whose t-scores were greater than or equal to the t-score calculated with the observed values. Then, the number of permutations that passed this criterion was divided by the total number of permutations [188]. With an FDR of 20%, 631 probes (corresponding to 550 unique genes) were identified as significant for GSE1977. Figure 2 shows a normal quantile plot of t-scores for GSE1977. Data points of genes that are farther away from the black diagonal line are considered to be differentially expressed. Figure 3 displays a volcano plot that depicts the –log10 of q-values against log2 of fold changes for all genes. The majority of genes with an FDR of 20% changed 1.2-fold or higher. For GSE23393, with an FDR of 20%, 224 probes (corresponding to 184 unique genes) were identified (see the normal quantile and volcano plots for GSE23393 in the Supporting Information).
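
The permutation p-value described above can be sketched for a single gene as follows. Several details are assumptions rather than the authors' stated choices: Welch's form of the t statistic, the use of absolute values to make the comparison two-tailed, the fixed random seed, and the toy expression values. For the multiple-testing step, the sketch swaps in Benjamini-Hochberg adjusted p-values, a simpler stand-in for Storey's q-values (which additionally estimate the proportion of true null hypotheses).

```python
import random
from statistics import mean, variance

def t_score(x, y):
    """Welch's t statistic for two independent samples."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    if se == 0:
        return 0.0  # degenerate split with no spread; treated as no evidence
    return (mean(x) - mean(y)) / se

def permutation_p_value(before, after, n_permutations=10_000, seed=0):
    """Fraction of label permutations whose |t| is at least the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_score(before, after))
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(t_score(pooled[:n_before], pooled[n_before:])) >= observed:
            hits += 1
    return hits / n_permutations

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (monotone step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy log2 expression values for one gene (illustrative only).
before = [5.1, 5.3, 4.9, 5.0, 5.2]
after = [6.8, 7.1, 6.9, 7.3, 7.0]
p_gene = permutation_p_value(before, after, n_permutations=2000)

# Adjusting a small hypothetical list of per-gene p-values.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.5])
```

Genes whose adjusted value falls below the chosen threshold (0.20 in the text) would be retained as radio-responsive.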
Figure 2

A normal quantile plot of t-scores for GSE1977.

Significant genes have red circles.

Figure 3

Significant gene detection.

A volcano plot that depicts the –log10 of q-values against log2 of fold changes for all genes in GSE1977.

Overlapping Genes

To delimit our potential biomarker set, we investigated which genes are commonly or uniquely found among the set of genes identified by our literature review and the two sets of genes identified in the analysis of the two microarray datasets, as summarized in Figure 4. The rationale is that genes found in all three sources are likely to be key to an active response, and their support in the literature makes them unlikely to be false positives. Twenty genes were commonly identified among the three different analyses (literature review and two microarray datasets), as shown in Table 3. We further analyzed pathways and biological processes associated with the 20 genes. Table 4 shows the top ten GeneGo pathways generated by the MetaCore software (the corresponding GeneGo processes and GO processes are given in the Supporting Information). Not surprisingly, even with the 20 genes, DNA damage/repair and apoptosis-related pathways were highly ranked.
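
The overlap step amounts to set intersection across the three sources. The sketch below counts the genes in every region of the Venn diagram and extracts the common core; the function name, source labels, and gene names (`G1` to `G6`) are placeholders, not data from the study.

```python
from collections import Counter

def venn_regions(named_sets):
    """Count genes in each region of a Venn diagram.

    named_sets maps a source name to a set of gene symbols. The result maps
    each sorted tuple of source names to the number of genes found in
    exactly those sources.
    """
    all_genes = set().union(*named_sets.values())
    regions = Counter()
    for gene in all_genes:
        members = tuple(sorted(name for name, genes in named_sets.items()
                               if gene in genes))
        regions[members] += 1
    return regions

# Placeholder gene lists for the three sources.
sources = {
    "literature": {"G1", "G2", "G3", "G4"},
    "array_1": {"G2", "G3", "G5"},
    "array_2": {"G3", "G4", "G6"},
}
regions = venn_regions(sources)
core = set.intersection(*sources.values())  # genes common to all sources
```

Only the genes in `core` would be passed on to the network analysis.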
Figure 4

Comparison of significant genes among three sources.

A Venn diagram depicting the number of shared and unique genes among a set of genes identified by literature review and two sets of genes identified in the analysis of two gene microarray datasets.

Table 3

Twenty genes commonly identified by literature review and analysis of two microarray datasets.

Gene Symbol | Entrez ID | Gene Name
ACTA2 | 59 | actin, alpha 2, smooth muscle, aorta
BAX | 581 | BCL2-associated X protein
BBC3 | 27113 | BCL2 binding component 3
BTG2 | 7832 | BTG family, member 2
CCNG1 | 900 | cyclin G1
CD70 | 970 | CD70 molecule
CDKN1A | 1026 | cyclin-dependent kinase inhibitor 1A (p21, Cip1)
DDB2 | 1643 | damage-specific DNA binding protein 2, 48 kDa
EI24 | 9538 | etoposide induced 2.4 mRNA
FDXR | 2232 | ferredoxin reductase
GADD45A | 1647 | growth arrest and DNA-damage-inducible, alpha
MDM2 | 4193 | Mdm2 p53 binding protein homolog (mouse)
MR1 | 3140 | major histocompatibility complex, class I-related
MYC | 4609 | v-myc myelocytomatosis viral oncogene homolog (avian)
PCNA | 5111 | proliferating cell nuclear antigen
PLK2 | 10769 | polo-like kinase 2
PLK3 | 1263 | polo-like kinase 3
PPM1D | 8493 | protein phosphatase, Mg2+/Mn2+ dependent, 1D
TNFRSF10B | 8795 | tumor necrosis factor receptor superfamily, member 10b
XPC | 7508 | xeroderma pigmentosum, complementation group C
Table 4

The top ten GeneGo pathways generated by MetaCore when the 20 overlapping genes were used.

# | GeneGo Pathways
1 | DNA damage_Brca1 as a transcription regulator
2 | DNA damage_ATM/ATR regulation of G1/S checkpoint
3 | Signal transduction_AKT signaling
4 | Apoptosis and survival_Apoptotic TNF-family pathway
5 | DNA damage_ATM/ATR regulation of G2/M checkpoint
6 | Apoptosis and survival_p53-dependent apoptosis
7 | DNA damage_Role of Brca1 and Brca2 in DNA repair
8 | DNA damage_Nucleotide excision repair
9 | Transcription_P53 signaling pathway
10 | Cytoskeleton remodeling_TGF, WNT and cytoskeletal remodeling

Gene Ranking and Identification of a Core Radio-response Network

Figure 5 shows the most probable/robust single interaction network when the 20 overlapping genes were entered into the MetaCore software. Of the 20 input genes, seven genes appeared in this core radio-response network. We applied our graph-based scoring function to this network and the results are summarized in Table 5. MYC was ranked first with a score of 113.74, although its p-value was high in GSE23393 (0.14898) and, while statistically significant, relatively high compared to other genes in GSE1977 (0.02420). As a hub gene, MYC had the highest number of edges (n = 12), which seems to contribute to the score. Overall, p-values in GSE23393 (in situ IR) are higher than those in GSE1977 (ex vivo IR). Intuitively, as the number of edges increases, the score seems to increase. However, it should be noted that although GADD45A has 9 edges, it obtained a higher score than PPM1D, which has 11 edges. This is attributed to the fact that when we calculate the score for a gene, our scoring function takes into account all network interactions and the number of references on the interactions in the network. Interestingly, CDKN1A obtained a relatively high score of 100.13 despite having only 3 edges, together with substantially low p-values (0.00027 in GSE1977 and 0.00367 in GSE23393).
Figure 5

The most probable interaction network when 20 genes were entered into MetaCore software.

The resulting interacting network uses only 7 genes. Red, green, and gray lines indicate inhibitory, stimulatory, and unspecified interactions, respectively.

Table 5

The results of the proposed scoring function test applied to the network in Figure 5.

Ranking | Protein | Gene symbol | Score | GSE1977 p-value | GSE23393 p-value | No. of edges
1 | c-Myc | MYC | 113.74 | 0.02420 | 0.14898 | 12
2 | GADD45 alpha | GADD45A | 110.34 | 8.63E-06 | 0.00172 | 9
3 | WIP1 | PPM1D | 108.16 | 0.00044 | 0.00310 | 11
4 | PUMA | BBC3 | 102.70 | 0.07171 | 0.01019 | 6
5 | p21 | CDKN1A | 100.13 | 0.00027 | 0.00367 | 3
6 | PLK3 (CNK) | PLK3 | 99.70 | 0.00285 | 0.08072 | 4
7 | XPC | XPC | 85.62 | 0.00068 | 0.03330 | 2

Discussion

We have demonstrated an unbiased bioinformatics filtering methodology to objectively identify a core network of key interacting genes that are important to radiation response. We hypothesized that, by combining several different types of datasets, we are increasingly likely to identify interacting genes that are particularly important to radiation response. We also hypothesize that these genes are therefore attractive candidates for biomarker testing. For example, the 7 key genes contain 89 relevant SNPs in our radiation therapy cancer dataset and we are in the process of testing them against late toxicity with the dataset. We make no claim that the network shown in Figure 5 dominates radiation response and do not expect that to be the case. Nevertheless, this network seems to be highly relevant to radiation response: among the 7 genes, 5 are involved in cell cycle control and 4 in apoptosis. More detailed information is given in the Supporting Information (biological processes for the seven genes). Five of these genes (MYC, BBC3, GADD45A, CDKN1A, and XPC) belong to a list of 34 radio-responsive genes observed by Tusher et al. [187]. Moreover, this network is consistent with (though slightly different from) the programmed cell death network reported by Moussay et al. [189]. Figure 4 shows the number of genes commonly or uniquely identified among three different studies (literature review and analysis of two microarray datasets). Interestingly, relatively few genes overlapped among the three analyses. The literature is expected to be incomplete in its coverage of radiosensitivity genes. Microarray analysis is subject to high false positive and false negative rates [190]. Another possible reason for the small number of overlapping genes is the widely differing irradiation conditions and doses. Despite this, the biological processes and pathways generated from the 20 overlapping genes were similar to those generated from the whole literature review.
We further analyzed the 20 genes by uploading them into the MetaCore software. In the network of the most probable biological process shown in Figure 5, only seven out of 20 genes appeared in the network. Additional genes were automatically added to the network by MetaCore, including AKT1, RELA, BCL2L1, PTEN, CDK1, and XIAP. Note, however, that these genes were also members of the list generated by our radiation response literature review, suggesting some consistency between these sources. This also suggests a potential ability to find novel biomarker candidates through the network mapping/ranking process, though that did not occur in this case. The graph-based scoring function proposed in our previous study [7] was modified and applied to the network shown in Figure 5. In some studies, researchers tend to regard genes with high degrees of connectivity (hub genes) as significant in an interaction network, while neglecting others [4]. While this is rational, finding hub genes based on edge connectivity considers only direct interactions between genes, whereas our proposed approach takes into account all interactions in a network (that is, the entire graph structure) and the number of published references on the interactions. To measure the closeness between two proteins (say A and B), we employed two scores: a node score and a reference score. In a protein interaction network, as the number of internal nodes between A and B increases, these two proteins are less likely to be related to each other. In contrast, the reference score is calculated using the number of papers that studied an interaction between two proteins, which can be important evidence that there is an actual relationship between the two proteins. As can be seen in the table of scores in the Supporting Information, MYC was ranked first using the total score. However, BBC3 and PPM1D were ranked first using the reference score and the node score, respectively.
CDKN1A, PLK3, and XPC obtained relatively high scores given their connectivity, suggesting that they could play important roles in this core network. We believe that using both scores together could be more effective for ranking proteins in a protein interaction network than using either score alone. Future work will test SNPs identified in this network against toxicity resulting from radiation therapy. As the number of patients available for SNP analyses increases, it may be rational to expand the number of candidate SNPs to several hundred or more. The general methodology may be applied in many genetic/protein biomarker studies with limited patient data.

Supporting Information

A normal quantile plot of t-scores for GSE23393 after 10,000 permutations. (TIF)

Significant gene detection. A volcano plot that depicts the -log10 of q-values against log2 of fold changes for all genes in GSE23393. (TIF)

The top ten GeneGo pathways/processes and GO processes generated by the MetaCore software when 20 overlapped genes were used. (DOC)

Biological processes for the seven genes shown in . (DOC)

Scores obtained using the graph-based scoring function. (DOC)
  189 in total

1.  Transcriptional response of lymphoblastoid cells to ionizing radiation.

Authors:  Kuang-Yu Jen; Vivian G Cheung
Journal:  Genome Res       Date:  2003-08-12       Impact factor: 9.043

2.  Large-scale detection of ubiquitination substrates using cell extracts and protein microarrays.

Authors:  Yifat Merbl; Marc W Kirschner
Journal:  Proc Natl Acad Sci U S A       Date:  2009-01-30       Impact factor: 11.205

3.  [Effect of protein kinase CK2 gene silencing on radiosensitization in human nasopharyngeal carcinoma cells].

Authors:  Li Liu; Jin-jin Zou; He-san Luo; De-hua Wu
Journal:  Nan Fang Yi Ke Da Xue Xue Bao       Date:  2009-08

4.  TGFB1 single-nucleotide polymorphisms are associated with adverse quality of life in prostate cancer patients treated with radiotherapy. In regard to Peters et al. (Int J Radiat Oncol Biol Phys 2008;70:752-759).

Authors:  Tanja Langsenlehner; Karin S Kapp; Uwe Langsenlehner
Journal:  Int J Radiat Oncol Biol Phys       Date:  2008-07-01       Impact factor: 7.038

5.  APE1 and XRCC1 protein expression levels predict cancer-specific survival following radical radiotherapy in bladder cancer.

Authors:  Sei C Sak; Patricia Harnden; Colin F Johnston; Alan B Paul; Anne E Kiltie
Journal:  Clin Cancer Res       Date:  2005-09-01       Impact factor: 12.531

6.  Poly(ADP-ribose) polymerase, a major determinant of early cell response to ionizing radiation.

Authors:  M Fernet; V Ponette; E Deniaud-Alexandre; J Ménissier-De Murcia; G De Murcia; N Giocanti; F Megnin-Chanet; V Favaudon
Journal:  Int J Radiat Biol       Date:  2000-12       Impact factor: 2.694

7.  Pancreatic cancer cell radiation survival and prenyltransferase inhibition: the role of K-Ras.

Authors:  Thomas B Brunner; Keith A Cengel; Stephen M Hahn; Junmin Wu; Douglas L Fraker; W Gillies McKenna; Eric J Bernhard
Journal:  Cancer Res       Date:  2005-09-15       Impact factor: 12.701

8.  Radiosensitization by inhibiting STAT1 in renal cell carcinoma.

Authors:  Zhouguang Hui; Maria Tretiakova; Zhongfa Zhang; Yan Li; Xiaozhen Wang; Julie Xiaohong Zhu; Yuanhong Gao; Weiyuan Mai; Kyle Furge; Chao-Nan Qian; Robert Amato; E Brian Butler; Bin Tean Teh; Bin S Teh
Journal:  Int J Radiat Oncol Biol Phys       Date:  2009-01-01       Impact factor: 7.038

9.  Adenovirus-mediated transfer of siRNA against peroxiredoxin I enhances the radiosensitivity of human intestinal cancer.

Authors:  Bo Zhang; Yan Wang; Kaiyuan Liu; Xaoya Yang; Min Song; Yanyan Wang; Yun Bai
Journal:  Biochem Pharmacol       Date:  2007-09-21       Impact factor: 5.858

10.  Induction of PPM1D following DNA-damaging treatments through a conserved p53 response element coincides with a shift in the use of transcription initiation sites.

Authors:  Matteo Rossi; Oleg N Demidov; Carl W Anderson; Ettore Appella; Sharlyn J Mazur
Journal:  Nucleic Acids Res       Date:  2008-11-10       Impact factor: 16.971

