
Analysis of gene expression profile identifies potential biomarkers for atherosclerosis.

Luran Liu1, Yan Liu1, Chang Liu1, Zhuobo Zhang1, Yaojun Du1, Hao Zhao1.   

Abstract

The present study aimed to identify potential biomarkers for atherosclerosis via analysis of gene expression profiles. The microarray dataset no. GSE20129 was downloaded from the Gene Expression Omnibus database. A total of 118 samples from the peripheral blood of female patients were used, including 47 atherosclerotic and 71 non‑atherosclerotic patients. The differentially expressed genes (DEGs) in the atherosclerosis samples were identified using the Limma package. Gene ontology term and Kyoto Encyclopedia of Genes and Genomes pathway analyses for DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery tool. The recursive feature elimination (RFE) algorithm was applied for feature selection via iterative classification, and a support vector machine classifier was used to validate prediction accuracy. A total of 430 DEGs in the atherosclerosis samples were identified, including 149 up‑ and 281 downregulated genes. Subsequently, the RFE algorithm was used to identify 11 biomarkers, whose receiver operating characteristic curves had an area under curve of 0.92, indicating that the identified 11 biomarkers were representative. The present study indicated that APH1B, JAM3, FBLN2, CSAD and PSTPIP2 may have important roles in the progression of atherosclerosis in females and may be potential biomarkers for early diagnosis and prognosis as well as treatment targets for this disease.

Year:  2016        PMID: 27573188      PMCID: PMC5042771          DOI: 10.3892/mmr.2016.5650

Source DB:  PubMed          Journal:  Mol Med Rep        ISSN: 1791-2997            Impact factor:   2.952


Introduction

Atherosclerosis is a chronic inflammatory disease that results mainly from the abnormal accumulation of macrophages, white blood cells and lipids in the artery wall, which eventually form mature plaques (1). It causes various types of cardiovascular disease, which account for ~50% of all mortalities worldwide (2). To date, the underlying molecular pathogenesis of atherosclerosis has remained largely elusive. An understanding of the molecular basis of atherosclerotic processes is crucial for the development of potential therapeutic strategies. Atherosclerosis is characterized by endothelial dysfunction, vascular inflammation, plaque formation and diminished oxygen supply to target organs (3,4). Endothelial dysfunction is an initial step in atherosclerosis. Dysfunctional endothelial cells release a large variety of pro-inflammatory mediators, leading to the amplification of the inflammatory response (5,6), and systemic inflammation may in turn contribute to endothelial dysfunction and accelerated atherosclerosis (7,8). Considering the crucial roles of inflammation in the pathogenesis of atherosclerosis, systemic markers of inflammation may be the most obvious candidates for potential biomarkers for early diagnosis, risk prediction and clinical prognosis of this disease (9). C-reactive protein has been implicated in multiple aspects of atherosclerosis and is currently the best validated inflammatory biomarker to predict the risk of atherosclerotic events (10,11). Fibrinogen, apolipoproteins and interleukins are associated with the progression of atherosclerosis and are highly relevant to risk prediction (12). Inflammatory molecules, including intercellular adhesion molecule-1 and vascular cell adhesion molecule-1, are also considered as biomarkers for risk prediction of atherosclerosis (11,13).
Several studies have identified a large number of biomarkers with potential prognostic value for atherosclerosis; their overall clinical predictive value, however, is modest (14). Therefore, a reliable, specific and cost-effective biomarker is urgently required to facilitate the clinical diagnosis of atherosclerosis. The microarray dataset no. GSE20129 from the Gene Expression Omnibus database (GEO; http://www.ncbi.nlm.nih.gov/geo/) has been utilized to reveal variations in gene expression associated with coronary artery calcification between Caucasian and African Americans based on a multi-ethnic study of atherosclerosis (15); furthermore, the dataset has been used to reveal systemic transcriptional alterations of innate and adaptive immune signaling pathways in atherosclerosis (16). The present study used the microarray dataset GSE20129 to identify differentially expressed genes (DEGs) associated with atherosclerosis using comprehensive bioinformatics methods. In addition, the recursive feature elimination (RFE) algorithm was applied for feature selection via iterative classification, and the support vector machine (SVM) classifier was used to validate prediction accuracy. The overall aim was to identify potential biomarkers for atherosclerosis via analysis of gene expression profiles.

Materials and methods

Microarray data

The array dataset GSE20129, which was provided by Huang et al (15), was downloaded from the GEO database. A total of 118 samples from the peripheral blood of females, including 47 atherosclerotic and 71 non-atherosclerotic patients, were used for expression profiling. The raw CEL data and annotation files were obtained based on the Illumina human Ref-8 v2.0 expression beadchip platform (Illumina Inc., San Diego, CA, USA) for subsequent analyses.

Data processing and screening of DEGs

All probe IDs of the dataset were transformed into gene symbols based on probe annotation files. If multiple probes corresponded to the same gene symbol, the mean value was calculated as the expression value of this gene. Expression values were then Z-score normalized: the expression value (X) of each sample was corrected to X′ = (X − mean)/standard deviation. The DEGs in samples from atherosclerotic patients compared with those in non-atherosclerotic patients were screened using the Limma package (http://www.bioconductor.org/packages/release/bioc/html/limma.html) (17) in R. P<0.01 was defined as the cut-off value for DEG screening.
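The probe-to-gene averaging and Z-score steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe IDs and intensities are hypothetical, the helper names `collapse_probes` and `z_score` are not from the study, and the text does not say whether the population or the sample standard deviation was used (the population form is assumed here), nor over which values the mean was taken (all genes of one sample is assumed).

```python
from collections import defaultdict
from statistics import mean, pstdev


def collapse_probes(probe_values, probe_to_gene):
    """Average the values of all probes that map to the same gene symbol."""
    grouped = defaultdict(list)
    for probe_id, value in probe_values.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:  # skip probes without an annotation
            grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


def z_score(gene_values):
    """Apply X' = (X - mean) / standard deviation to the genes of one sample."""
    values = list(gene_values.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD: an assumption, not stated in the text
    return {gene: (x - mu) / sigma for gene, x in gene_values.items()}


# Hypothetical probe IDs and intensities for a single sample.
probe_to_gene = {"p1": "JAM3", "p2": "JAM3", "p3": "CSAD", "p4": "RALB"}
sample = {"p1": 8.0, "p2": 10.0, "p3": 6.0, "p4": 12.0}

genes = collapse_probes(sample, probe_to_gene)  # the two JAM3 probes are averaged
normalized = z_score(genes)
```

After normalization the values of a sample have mean 0 and unit standard deviation, which puts samples on a comparable scale before differential expression testing.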

Unsupervised hierarchical clustering analysis

Unsupervised hierarchical clustering analysis (18) for DEGs was performed to distinguish atherosclerotic from non-atherosclerotic samples using Cluster 3.0 (19), and the samples were visualized using the heatmap tool in TreeView (20). Filtering and normalization of all the expression data were performed using Cluster 3.0. Genes with expression in at least 80% of all samples were retained, while others were discarded. All of the genes and samples were median centered (21) and normalized. The similarity matrices were correlation centered (22) and normalized.

Functional enrichment analyses

The Gene Ontology (GO) tool (23) provides functional annotations for large-scale genomic or transcription data. The Kyoto Encyclopedia of Genes and Genomes database (KEGG; http://www.genome.ad.jp/kegg/) (24) holds information on pathways and networks of molecules or genes. The Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/) software (25) is an integrated biological database and analytic tool for systematic annotation of biological meaning for large lists of genes or proteins. To understand the biological significance of DEGs, GO term and KEGG pathway analyses for up- and downregulated DEGs were performed using the DAVID online tool. The significance threshold was set at P<0.05.
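To give a sense of what an over-representation P-value measures, the following sketch computes a plain hypergeometric upper-tail probability. It is not DAVID's procedure: DAVID reports a more conservative variant of Fisher's exact test (the EASE score) and uses its own background gene set, so its values would differ. The function name and the background count of 200 annotated genes are hypothetical; only the 149 upregulated genes and the 18,196 gene symbols come from the text.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Illustration: 6 of 149 upregulated genes carry a term that is assumed
# (hypothetically) to annotate 200 of the 18,196 genes on the array.
p_value = hypergeom_upper_tail(k=6, n=149, K=200, N=18196)
```

A small value indicates that the term appears among the DEGs more often than expected if the genes had been drawn at random from the background.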

Feature selection with the RFE algorithm

Feature selection offers classification models with high relevance by eliminating irrelevant features (26). In order to obtain candidates for diagnostic and prognostic biomarkers of atherosclerosis, the RFE algorithm (27,28) in Python was applied for feature selection from the constellation of DEGs via iterative classification. Classification analyses were performed using a linear Support Vector Classification (SVC; linear kernel, step=1; cross-validation=5) provided by the Python module scikit-learn (29). This recursive procedure adopted a backward elimination strategy to iteratively remove irrelevant features and was iterated until all relevant features were obtained.
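A minimal sketch of recursive feature elimination with a linear SVC in scikit-learn follows. The reported settings (step=1, cross-validation=5) resemble the cross-validated variant `RFECV`, so that class is used here; this is an assumption, as are the `roc_auc` scoring choice and the synthetic data, which merely stand in for the 118-sample DEG matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

# Synthetic stand-in for the expression matrix: 118 samples x 30 features.
X, y = make_classification(n_samples=118, n_features=30, n_informative=5,
                           n_redundant=0, random_state=0)

# Backward elimination: drop one feature per iteration (step=1), ranking
# features by the weights of a linear-kernel SVC and scoring each subset
# size with five-fold cross-validation.
selector = RFECV(estimator=SVC(kernel="linear"), step=1, cv=5,
                 scoring="roc_auc")
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)  # column indices of kept features
```

A linear kernel is needed here because the elimination step ranks features by the fitted coefficients, which non-linear kernels do not expose.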

Prediction with SVM classifier

All of the relevant features were processed with a linear SVM classifier (30) for classification prediction, which was repeated five times. A linear kernel and five-fold cross-validation of all samples were used in this prediction procedure to assess classification performance. To further illustrate the prediction accuracy of this method, a receiver operating characteristic curve (ROC) was generated and the area under curve (AUC) was calculated.
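The five-fold evaluation can be sketched as below, again on synthetic data with 11 features standing in for the selected genes. Stratified, shuffled folds and averaging the per-fold AUCs are assumptions; the study reports the AUC of a mean ROC curve, which can differ slightly from the mean of per-fold AUCs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in: 118 samples described by 11 selected features.
X, y = make_classification(n_samples=118, n_features=11, n_informative=5,
                           n_redundant=0, random_state=0)

fold_aucs = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    # Signed distance to the separating hyperplane serves as the ROC score.
    scores = clf.decision_function(X[test_idx])
    fold_aucs.append(roc_auc_score(y[test_idx], scores))

mean_auc = float(np.mean(fold_aucs))
```

Each sample is scored only by a model that did not see it during training, so the AUC reflects held-out performance within the dataset.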

Results

DEG screening and hierarchical clustering analysis

Based on probe annotation files, a total of 18,196 gene symbols were acquired after transformation. A total of 430 DEGs with P<0.01 were screened using the Limma package, including 149 upregulated and 281 downregulated genes. The results of the unsupervised hierarchical clustering analysis were visualized in a heatmap (Fig. 1).
Figure 1

Heat map of hierarchical clustering analysis. The abscissa represents clustering of specimens with atherosclerosis samples in dark blue and non-atherosclerosis samples in light blue, and the mixture of the two sample groups in orange. The ordinate represents differentially expressed genes and the clustering of genes. Red indicates upregulated genes while green signifies downregulated genes.

Functional enrichment analyses for DEGs were performed using DAVID, and the significantly enriched GO terms and KEGG pathways are shown in Tables I and II, respectively. According to the enrichment results, the over-represented GO terms of upregulated genes were mainly associated with ribonucleoprotein complex biogenesis, ribosome biogenesis and positive regulation of T-cell proliferation, while downregulated genes were mainly associated with response to lipopolysaccharides, immune response and responses to molecules of bacterial origin (Table I). In addition, upregulated genes were significantly enriched in viral myocarditis and purine metabolism, while downregulated genes were significantly involved in leukocyte transendothelial migration and the Notch signaling pathway (Table II).
Table I

Over-represented GO terms enriched by differentially expressed genes.

Term | Count | P-value
Upregulated genes
 GO:0022613 - Ribonucleoprotein complex biogenesis | 6 | 0.010748
 GO:0042254 - Ribosome biogenesis | 5 | 0.012705
 GO:0042102 - Positive regulation of T-cell proliferation | 3 | 0.033574
 GO:0007010 - Cytoskeletal organization | 8 | 0.042602
 GO:0006941 - Striated muscle contraction | 3 | 0.045367
Downregulated genes
 GO:0032496 - Response to lipopolysaccharides | 7 | 9.36×10−4
 GO:0006955 - Immune response | 22 | 0.001271
 GO:0002237 - Response to molecules of bacterial origin | 7 | 0.001665
 GO:0009611 - Response to wounding | 18 | 0.002148
 GO:0002684 - Positive regulation of immune system processes | 11 | 0.002768
 GO:0001892 - Embryonic placenta development | 4 | 0.005054
 GO:0045936 - Negative regulation of phosphate metabolic processes | 5 | 0.005334
 GO:0010563 - Negative regulation of phosphorus metabolic processes | 5 | 0.005334
 GO:0050778 - Positive regulation of immune response | 8 | 0.005621
 GO:0042127 - Regulation of cell proliferation | 22 | 0.005981
 GO:0006954 - Inflammatory response | 12 | 0.008674

GO, gene ontology

Table II

Significantly enriched Kyoto Encyclopedia of Genes and Genomes pathways among differentially expressed genes.

Term | Count | P-value
Upregulated
 hsa05416: Viral myocarditis | 3 | 0.02366
 hsa00230: Purine metabolism | 4 | 0.03880
Downregulated
 hsa04670: Leukocyte transendothelial migration | 7 | 0.01940
 hsa04330: Notch signaling pathway | 4 | 0.04248

hsa, Homo sapiens.

Feature selection with the RFE algorithm and prediction with SVM classifier

By applying the RFE algorithm, a total of 11 biomarkers were obtained, including APH1B, JAM3, FBLN2, CSAD and PSTPIP2 (Table III). These 11 biomarkers were then processed with the SVM classifier for classification prediction. The AUC of the ROC obtained with the SVM classifier was calculated as 0.92 (Fig. 2), indicating that the identified 11 biomarkers were representative. To verify that the screened biomarkers were representative of the DEGs in atherosclerotic patients, all 430 DEGs were processed with the SVM classifier, resulting in an AUC of the ROC of 0.95 (Fig. 3); the AUC obtained with the 11 biomarkers was thus comparable to that obtained with the full set of DEGs. In addition, to rule out a chance result, 11 DEGs were randomly selected as features and subjected to the same classification procedure 100,000 times, and the AUC of each run was calculated. The AUC corresponding to the 11 screened biomarkers ranked 33rd among these results (P=0.0033), suggesting that the prediction accuracy of the screened biomarkers was significantly higher than that of randomly selected DEGs.
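The random-selection check amounts to an empirical P-value: the share of random feature sets whose AUC is at least as high as the observed one. The sketch below shows only that final ranking step on a hypothetical list of AUCs; the function name is illustrative, and in a real run each entry would come from training the classifier on a random set of 11 DEGs.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of random draws that score at least as high as the observed value."""
    at_least = sum(1 for score in null_scores if score >= observed)
    return at_least / len(null_scores)


# Hypothetical AUCs from four random 11-gene feature sets.
random_aucs = [0.50, 0.60, 0.70, 0.95]
p = empirical_p_value(0.92, random_aucs)  # one of four draws reaches 0.92
```

With many draws, the rank of the observed AUC divided by the number of draws gives the same quantity.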
Table III

Biomarkers screened using the recursive feature elimination algorithm.

Biomarker | P-value | logFC
APH1B | 5.81×10−4 | −0.089736
HIATL1 | 1.44×10−3 | −0.111004
JAM3 | 2.24×10−3 | 0.164411
HOXB13 | 2.25×10−3 | −0.031921
FBLN2 | 2.35×10−3 | 0.049115
RNF148 | 2.82×10−3 | −0.030481
RALB | 2.84×10−3 | −0.093036
CPEB3 | 3.07×10−3 | −0.059267
ABHD14B | 6.30×10−3 | 0.069626
CSAD | 7.75×10−3 | −0.041617
PSTPIP2 | 9.45×10−3 | −0.096693

FC, fold change.

Figure 2

ROC of the support vector machine classifier constructed from the 11 biomarkers. The five curves represent the result of each round (refinement cycle) of the five-fold cross-validation of all samples. The black dotted line represents the mean ROC, and the light gray dotted line represents the random ROC. ROC, receiver operating characteristic curve; area, area under curve.

Figure 3

ROC of the support vector machine classifier constructed from all differentially expressed genes. The five curves represent the result of each round (refinement cycle) of the five-fold cross-validation of all samples. The black dotted line represents the mean ROC, and the light gray dotted line represents the random ROC. ROC, receiver operating characteristic curve; area, area under curve.

Discussion

Atherosclerosis is the proximate cause of occlusive arterial disease (31). Identification of potential biomarkers that have crucial roles in the progression of atherosclerosis is important for the development of therapeutic approaches. In the present study, a bioinformatics approach was used to predict potential biomarkers for the diagnosis and treatment of atherosclerosis. The screening of gene expression profiles of atherosclerotic patients identified 11 biomarkers, including APH1B, JAM3, FBLN2, CSAD and PSTPIP2, and the AUC of the ROC obtained using the SVM classifier was calculated as 0.92, which indicated that these 11 biomarkers were representative. The crucial roles of these genes in the pathogenesis of atherosclerosis and their potential diagnostic and therapeutic values for this disease warrant further evaluation.
APH1B is a functional component of the gamma-secretase enzyme complex, which catalyzes the cleavage of transmembrane proteins such as Notch receptors and amyloid-β precursor protein (APP) (32). Notch activation induces senescence of endothelial cells and results in pro-inflammatory responses, and Notch signaling has pivotal roles in the pathogenesis of atherosclerosis (33). Blockade of signaling between the Notch ligand delta-like 4 and Notch attenuates the development of atherosclerosis (34). APP has also been found to have a potential pathogenic role in carotid atherogenesis (35). In addition, the single-nucleotide polymorphism Phe217Leu (rs1047552; T>G) in the APH1B gene is associated with premature coronary atherosclerosis (36). It can therefore be speculated that APH1B may be implicated in the progression of atherosclerosis and function as a biomarker for risk prediction.
Junctional adhesion molecule 3 (JAM3) is a vascular adhesion molecule; such molecules mediate adhesion and interactions among cells or between cells and the extracellular matrix (37). Increased expression and activation of adhesion molecules in vascular endothelial cells and circulating leukocytes stimulate leukocyte recruitment into the vascular endothelium, which is an important step in the development of atherosclerosis (38,39). JAM-C, the protein encoded by JAM3, has been shown to be involved in the control of inflammatory leukocyte recruitment in atherosclerosis (40). JAM3 may act as a counter-receptor for Mac-1, which mediates leukocyte-platelet interactions between vascular cells, and may thereby provide a molecular target for antagonizing interactions involved in several vascular pathologies, such as atherothrombosis (37). In the present study, JAM3 was identified as a potential biomarker for atherosclerosis, which was in line with the abovementioned previous findings and suggested that JAM3 may be a potential therapeutic target in atherosclerosis.
The FBLN2 gene encodes the extracellular matrix protein fibulin 2. Upregulation of FBLN2 may promote the arterial response to injury and accelerate atherogenesis among patients with diabetes (41). In addition, FBLN2 may lead to the aberrant regulation of activator protein-1 factors, which have been shown to be associated with the progression of atherosclerosis (42). Thus, FBLN2 may be implicated in the progression of atherosclerosis and serve as a biomarker for this disease.
Cysteine sulfinic acid decarboxylase (CSAD) can catalyze the conversion of cysteine sulfinic acid to hypotaurine, which may limit the generation of taurine (43). Taurine has been shown to have potential protective effects against cardiovascular diseases and can effectively prevent the progression of atherosclerotic diseases (44). Therefore, it may be hypothesized that CSAD has a role in the progression of atherosclerosis via limiting the generation of taurine.
Proline-serine-threonine phosphatase-interacting protein 2 (PSTPIP2) is expressed in macrophages and is tyrosine-phosphorylated in response to colony-stimulating factor-1 (CSF-1) (45). CSF-1 is thought to have a role in the accumulation of cholesterol-laden macrophages (foam cells) in atherosclerotic plaques (46). Chitu and Stanley (47) found that CSF-1 was involved in several inflammatory disorders, such as atherosclerosis. Although the causal roles of PSTPIP2 in the development of atherosclerosis have not been fully elucidated, it may be speculated that PSTPIP2 has a pivotal role in the progression of atherosclerosis via interaction with CSF-1. The results of the present study indicated that PSTPIP2 may be a potential biomarker for atherosclerosis, which requires further investigation.
Only gene expression profiles of atherosclerotic and non-atherosclerotic females were investigated in the present study. However, atherosclerosis may have different pathologies in males and females; therefore, sex differences should be explored. Furthermore, there was no experimental validation in the present study; therefore, further investigations are required in order to verify the present findings and speculation. In particular, additional prospective biomarkers, including HIATL1, HOXB13, RNF148, RALB, CPEB3 and ABHD14B, were identified, which merit further investigation and discussion. In conclusion, the bioinformatics analysis of the present study indicated that APH1B, JAM3, FBLN2, CSAD and PSTPIP2 may have important roles in the pathogenesis of atherosclerosis. They may be considered as potential biomarkers for early diagnosis and prognosis as well as therapeutic targets for this disease, which requires experimental validation.
  40 in total

1.  KEGG: kyoto encyclopedia of genes and genomes.

Authors:  M Kanehisa; S Goto
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

Review 2.  Vascular adhesion molecules in atherosclerosis.

Authors:  Elena Galkina; Klaus Ley
Journal:  Arterioscler Thromb Vasc Biol       Date:  2007-08-02       Impact factor: 8.311

Review 3.  Substrate specificity of gamma-secretase and other intramembrane proteases.

Authors:  A J Beel; C R Sanders
Journal:  Cell Mol Life Sci       Date:  2008-05       Impact factor: 9.261

4.  Predicting the ecotoxicity of ionic liquids towards Vibrio fischeri using genetic function approximation and least squares support vector machine.

Authors:  Shuying Ma; Min Lv; Fangfang Deng; Xiaoyun Zhang; Honglin Zhai; Wenjuan Lv
Journal:  J Hazard Mater       Date:  2014-10-22       Impact factor: 10.588

Review 5.  Colony-stimulating factor-1 in immunity and inflammation.

Authors:  Violeta Chitu; E Richard Stanley
Journal:  Curr Opin Immunol       Date:  2005-12-06       Impact factor: 7.486

6.  Trends and disparities in coronary heart disease, stroke, and other cardiovascular diseases in the United States: findings of the national conference on cardiovascular disease prevention.

Authors:  R Cooper; J Cutler; P Desvigne-Nickens; S P Fortmann; L Friedman; R Havlik; G Hogelin; J Marler; P McGovern; G Morosco; L Mosca; T Pearson; J Stamler; D Stryer; T Thom
Journal:  Circulation       Date:  2000-12-19       Impact factor: 29.690

7.  Notch ligand delta-like 4 blockade attenuates atherosclerosis and metabolic disorders.

Authors:  Daiju Fukuda; Elena Aikawa; Filip K Swirski; Tatiana I Novobrantseva; Victor Kotelianski; Cem Z Gorgun; Aleksey Chudnovskiy; Hiroyuki Yamazaki; Kevin Croce; Ralph Weissleder; Jon C Aster; Gökhan S Hotamisligil; Hideo Yagita; Masanori Aikawa
Journal:  Proc Natl Acad Sci U S A       Date:  2012-06-13       Impact factor: 11.205

Review 8.  Biomarkers of plaque instability.

Authors:  P K Shah
Journal:  Curr Cardiol Rep       Date:  2014-12       Impact factor: 2.931

Review 9.  Inflammation and atherosclerosis.

Authors:  Göran K Hansson; Anna-Karin L Robertson; Cecilia Söderberg-Nauclér
Journal:  Annu Rev Pathol       Date:  2006       Impact factor: 23.472

10.  Astragalus polysaccharides suppress ICAM-1 and VCAM-1 expression in TNF-α-treated human vascular endothelial cells by blocking NF-κB activation.

Authors:  Yu-ping Zhu; Tao Shen; Ya-jun Lin; Bei-dong Chen; Yang Ruan; Yuan Cao; Yue Qiao; Yong Man; Shu Wang; Jian Li
Journal:  Acta Pharmacol Sin       Date:  2013-06-03       Impact factor: 6.150
