Identifying pathway modules of tuberculosis in children by analyzing multiple different networks.

Lu Cheng, Yuling Han, Xiuxia Zhao, Xiaoli Xu, Jing Wang.

Abstract

Tuberculosis (TB), which is caused by Mycobacterium tuberculosis, is a major cause of human death worldwide. The aim of this study was to identify biomarkers involved in TB in children. Gene expression data were obtained from the ArrayExpress Archive of Functional Genomics Data, and gene expression data and protein-protein interaction (PPI) data were used to construct differential gene co-expression networks (DCNs). P-values were corrected using the Benjamini-Hochberg algorithm. In total, 3,820 edges (PPIs) and 1,359 nodes (genes) were obtained from the human PPI data and gene expression data using a threshold of absolute Pearson's correlation coefficient >0.8, and these edges and nodes formed the DCNs. Thirteen seed genes were obtained by ranking z-scores. Eight significant multiple differential modules were identified from the DCNs using a statistical significance test. In conclusion, the seed genes and significant modules constitute potential biomarkers that may reveal the underlying mechanisms of TB in children. The newly identified biomarkers may contribute to an understanding of TB and provide new therapeutic approaches for its treatment.

Keywords:  differential gene co-expression networks; multiple differential modules; protein-protein interactions; tuberculosis

Year:  2017        PMID: 29399082      PMCID: PMC5769296          DOI: 10.3892/etm.2017.5434

Source DB:  PubMed          Journal:  Exp Ther Med        ISSN: 1792-0981            Impact factor:   2.447


Introduction

Tuberculosis (TB), which is caused by Mycobacterium tuberculosis, is a major cause of human mortality worldwide, with two million deaths and ten million new cases occurring annually (1). Children are more susceptible to M. tuberculosis infection than adults because their immune systems are relatively weaker (2,3). The World Health Organization (WHO) reported that almost one million children were infected with M. tuberculosis in 2015 (4). India, Indonesia, China, Nigeria, Pakistan and South Africa account for 60% of newly identified cases (5). In 2015, there were more than 30,000 new cases of multidrug-resistant TB among children worldwide (6). Vaccination with Bacille Calmette-Guérin (BCG) is an effective means of preventing TB; the BCG vaccine provides 60–80% protection against severe forms of TB in children, particularly TB meningitis (7). The Xpert Mycobacterium tuberculosis/rifampicin (MTB/RIF) assay can be used to diagnose TB and yields reliable results. Zar et al reported that Xpert MTB/RIF was a useful assay for the rapid and reliable diagnosis of paediatric TB in African children, using induced sputum and nasopharyngeal specimens (8). Gous et al also used the Xpert MTB/RIF assay to diagnose TB in children (9). Fiebig et al used nucleic acid amplification tests and culture of gastric aspirates to obtain bacteriological confirmation of TB in German children, and found that combining the molecular assay with culture improved test accuracy (10).
Protein-protein interactions (PPIs) play an important role in all biological processes, and interaction networks can be used to explore intricate protein organization and cellular processes (11,12). Safaei et al carried out a PPI network study of liver cirrhosis and found that the regulation of cell survival and lipid metabolism were pivotal biological processes in the disease (13).
In ovarian cancer, 12-gene network modules have been identified using a differential co-expression PPI network. Such gene expression data and PPI networks can be used to develop effective biomarkers for understanding disease mechanisms (14). Ramadan et al combined PPI networks and gene co-expression networks (GCNs) to analyze breast cancer (15). In the present study, PPI networks and GCNs were employed to compare the latent and active phases of TB in children. Thirteen seed genes were identified in the differential gene co-expression networks (DCNs), and eight multiple differential modules (M-DMs) were identified based on the DCNs (16). The identified M-DMs may provide new insights into the development of TB in children.

Materials and methods

Gene expression data

The ArrayExpress Archive of Functional Genomics Data is a functional genomics database at the European Bioinformatics Institute. The microarray dataset E-GEOD-39940 was downloaded from ArrayExpress. It contains gene expression profiles of HIV-negative patients with latent TB (n=54) or active TB (n=70). To eliminate the influence of non-specific hybridization, background correction was performed with the robust multichip average (RMA) method, and the data were then normalized with a quantile-based algorithm. Probes that did not match any gene were discarded, and after mapping probe IDs to gene IDs, 13,997 genes were obtained.
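The quantile normalization step can be illustrated with a minimal, standard-library Python sketch. It assumes the data have already been background-corrected (the RMA background step is not shown) and are arranged as one list of probe values per sample. The function name `quantile_normalize` is hypothetical, and ties are broken by row position, a simplification relative to dedicated packages.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Give every sample (column) the same empirical distribution.

    Each value is replaced by the cross-sample mean of the values that
    share its rank. Ties are broken by row position (a simplification).
    """
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all samples must contain the same probes")

    # Mean of the k-th smallest value across samples, for every rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(sorted_cols) for k in range(n_rows)
    ]

    normalized = []
    for col in columns:
        # order[k] is the row index holding the k-th smallest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Hypothetical toy data: three samples, four probes each.
    toy = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
    for sample in quantile_normalize(toy):
        print([round(v, 3) for v in sample])
```

After normalization every sample holds the same set of values, so the remaining differences between samples reflect the ranks of the probes rather than array-wide intensity shifts.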

PPI data

Human PPI data were obtained from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, comprising 787,896 pairs involving 16,730 genes. Only genes present in both the gene expression data and the PPI data were retained to construct the DCN. After this filtering, 501,736 PPI pairs and 12,310 genes were obtained.
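The filtering that reduces the STRING pairs to genes present in the expression data amounts to a set intersection, sketched below. This is a hedged illustration: the function `filter_ppis` and its input format (pairs of gene symbols plus a set of expressed symbols) are hypothetical, and dropping self-loops and duplicate undirected pairs is an assumption the text does not state.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def filter_ppis(ppi_pairs: Iterable[Pair], expressed_genes: Set[str]) -> Tuple[List[Pair], Set[str]]:
    """Keep PPI pairs whose two genes both appear in the expression data.

    Returns the retained pairs and the set of genes they involve.
    """
    seen: Set[Pair] = set()
    kept: List[Pair] = []
    for a, b in ppi_pairs:
        # ASSUMPTION: self-loops and pairs with an unmeasured gene are dropped.
        if a == b or a not in expressed_genes or b not in expressed_genes:
            continue
        key = (a, b) if a < b else (b, a)  # treat interactions as undirected
        if key not in seen:
            seen.add(key)
            kept.append(key)
    genes = {g for pair in kept for g in pair}
    return kept, genes
```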

Construction of DCNs

For each PPI pair, the absolute value of the Pearson's correlation coefficient between the expression profiles of the two genes across the active TB samples was calculated, and PPIs with an absolute value >0.8 were retained. In total, 3,820 edges (PPIs) and 1,359 nodes (genes) were obtained to construct the DCNs. A one-tailed t-test comparing latent and active TB was used to obtain a differential-expression P-value for each gene. The weight of each interaction was then calculated according to EdgeR (17) as a function of $p_i$ and $p_j$, the P-values of differential expression of gene i and gene j, respectively; V, the node set of the co-expression network; and cor(i,j), the absolute value of the Pearson's correlation between genes i and j.
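A minimal sketch of the co-expression filter: each PPI pair is kept when the absolute Pearson correlation of its two genes across the active TB samples exceeds 0.8. The helper names and the input layout (a dict from gene to its active-TB expression values, in a shared sample order) are hypothetical. Constant expression profiles are treated as uncorrelated by assumption, and the EdgeR-based edge weight is deliberately not implemented because its formula is not given here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # ASSUMPTION: a constant profile counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(
    ppis: List[Tuple[str, str]],
    expr: Dict[str, List[float]],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Keep PPI pairs whose |Pearson r| across active-TB samples exceeds threshold."""
    edges = []
    for a, b in ppis:
        r = abs(pearson(expr[a], expr[b]))
        if r > threshold:
            edges.append((a, b, r))
    return edges
```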

Construction of M-DMs

The construction of M-DMs consisted of three steps: i) seed gene prioritization, ii) module search starting from each seed gene, and iii) refinement of candidate modules.
i) The importance g(i) of each gene i in the network was calculated from N(i), the set of genes adjacent to gene i, and A', the degree-normalized weighted adjacency matrix, computed as $A' = D^{-1/2} A D^{-1/2}$, where A is the weighted adjacency matrix and D is the diagonal matrix of node degrees. The importance values g(i) were converted to z-scores, the genes were ranked by z-score, and the genes with the top 1% of z-scores were selected as seed genes.
ii) Each seed gene v ∈ V initialized a differential module C. A gene u adjacent to v in the network was then added to form a module C', and the entropy change between the two modules was assessed as $\Delta H(C', C) = H(C') - H(C)$. A value of ΔH(C', C) > 0 indicates that adding gene u increases the connectivity of module C. Adjacent genes that increased ΔH were added to the module iteratively until no further addition could increase it.
iii) Candidate modules containing <5 nodes were removed, and any two modules with an overlap degree ≥0.5 were merged into one module.
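Step i) can be sketched as follows. The z-score ranking and the top-1% cut-off follow the text, and the degree normalization of the weighted adjacency matrix is assumed here to be the symmetric form $D^{-1/2} A D^{-1/2}$. The importance score is a hypothetical stand-in (the sum of normalized edge weights from gene i to its neighbours N(i)), not the paper's own g(i), and all function names are illustrative.

```python
import math
import statistics
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, positive edge weight)


def normalized_adjacency(edges: List[Edge]) -> Dict[str, Dict[str, float]]:
    """Return A' = D^{-1/2} A D^{-1/2} as a nested dict (assumed normalization)."""
    adj: Dict[str, Dict[str, float]] = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    # Weighted degree of each node, i.e. the diagonal of D (weights assumed > 0).
    degree = {g: sum(nbrs.values()) for g, nbrs in adj.items()}
    return {
        i: {j: w / math.sqrt(degree[i] * degree[j]) for j, w in nbrs.items()}
        for i, nbrs in adj.items()
    }


def seed_genes(edges: List[Edge], fraction: float = 0.01) -> List[str]:
    """Rank genes by z-score of a stand-in importance and keep the top fraction."""
    a_norm = normalized_adjacency(edges)
    # HYPOTHETICAL importance: g(i) = sum of A'(i, j) over neighbours j in N(i).
    g = {i: sum(nbrs.values()) for i, nbrs in a_norm.items()}
    values = list(g.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    z = {i: (v - mu) / sd if sd > 0 else 0.0 for i, v in g.items()}
    n_seeds = max(1, int(len(z) * fraction))  # e.g. int(1359 * 0.01) == 13
    return sorted(z, key=z.get, reverse=True)[:n_seeds]
```

With 1,359 genes in the DCNs, a 1% cut-off truncates 13.59 to 13 seed genes, consistent with the number reported in the Results.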
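Steps ii) and iii) can be sketched as a greedy search followed by a merge pass. Because the text does not define the entropy H(C) or the overlap degree, this sketch assumes hypothetical versions of both: `density_score` (internal edge weight per node) stands in for H(C), and overlap is taken as |C1 ∩ C2| / min(|C1|, |C2|). At each iteration the neighbour with the largest positive ΔH is added, which is one reasonable reading of the procedure rather than the authors' confirmed implementation.

```python
from typing import Callable, Dict, List, Optional, Set

Adj = Dict[str, Dict[str, float]]  # gene -> {neighbour: edge weight}


def density_score(module: Set[str], adj: Adj) -> float:
    """HYPOTHETICAL stand-in for H(C): internal edge weight per node."""
    # Each internal edge is seen from both ends, hence the division by 2.
    internal = sum(w for i in module for j, w in adj[i].items() if j in module) / 2
    return internal / len(module)


def grow_module(seed: str, adj: Adj,
                score: Callable[[Set[str], Adj], float] = density_score) -> Set[str]:
    """Step ii): add neighbours while Delta H(C', C) = H(C') - H(C) > 0."""
    module = {seed}
    current = score(module, adj)
    while True:
        frontier = {u for v in module for u in adj[v]} - module
        best_u: Optional[str] = None
        best_gain = 0.0
        for u in sorted(frontier):  # sorted for deterministic tie-breaking
            gain = score(module | {u}, adj) - current
            if gain > best_gain:
                best_u, best_gain = u, gain
        if best_u is None:  # no neighbour increases the score: stop
            return module
        module.add(best_u)
        current = score(module, adj)


def refine(modules: List[Set[str]], min_size: int = 5,
           min_overlap: float = 0.5) -> List[Set[str]]:
    """Step iii): drop modules with < min_size nodes, merge heavily overlapping ones."""
    kept = [set(m) for m in modules if len(m) >= min_size]
    merged = True
    while merged:
        merged = False
        for x in range(len(kept)):
            for y in range(x + 1, len(kept)):
                a, b = kept[x], kept[y]
                # ASSUMED overlap degree: shared genes / size of the smaller module.
                if len(a & b) / min(len(a), len(b)) >= min_overlap:
                    kept[x] = a | b
                    del kept[y]
                    merged = True
                    break
            if merged:
                break
    return kept
```

Passing a different `score` function to `grow_module` lets the same search run with any module-level entropy definition.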

Statistical significance test of candidate M-DMs

To assess significance, 3,820 edges were randomly selected from the 501,736 PPI edges to form a random network, and module search was carried out on it following the steps described above. This was repeated 100 times, yielding 2,318 random modules. The empirical P-value of each candidate module was calculated as the probability that a random module has a score equal to or smaller than the observed score by chance. The Benjamini-Hochberg algorithm was used to correct the P-values (16), and modules with a corrected P-value ≤0.05 were selected as differential modules.
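The significance test can be sketched with an empirical P-value and the standard Benjamini-Hochberg step-up adjustment. Following the wording above, a random module counts against a candidate when its score is equal to or smaller than the observed score; `null_scores` would hold the scores of the random modules. Function names are hypothetical, and no pseudo-count is added, so an empirical P-value can be exactly 0 (many pipelines instead use (hits + 1) / (n + 1)).

```python
from typing import List, Sequence


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of random-module scores that are <= the observed score."""
    hits = sum(1 for s in null_scores if s <= observed)
    return hits / len(null_scores)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Modules whose adjusted value is ≤0.05 would then be kept as differential modules.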

Results

The human PPI data and gene expression data were used to construct the DCNs. Using the criterion of an absolute Pearson's correlation coefficient >0.8, 3,820 edges (PPIs) and 1,359 nodes (genes) were obtained (Fig. 1), and these edges and nodes formed the DCNs.
Figure 1.

The DCNs contained 3,820 edges (PPIs) and 1,359 nodes (genes). DCNs, differential gene co-expression networks; PPI, protein-protein interaction.

Identification of candidate M-DMs

Genes with the top 1% of z-scores in the DCNs were selected as seed genes, yielding 13 seed genes in total (Table I), with z-scores ranging from 284.5787 to 473.111. The seed genes were SS18L2, NOL11, ADSL, ILF2, DDX18, DDX1, CLNS1A, ENOPH1, MTERF3, MRPL32, NUP37, RPL35 and EEF1B2. After module search and refinement, 11 candidate modules were obtained.
Table I.

Genes with the top 1% of z-scores in the DCNs were selected as seed genes[a].

Gene name    z-score
SS18L2       473.111
NOL11        457.7947
ADSL         438.8713
ILF2         365.7652
DDX18        345.6201
DDX1         330.0789
CLNS1A       306.1616
ENOPH1       300.3362
MTERF3       300.3337
MRPL32       294.2793
NUP37        287.2869
RPL35        285.7214
EEF1B2       284.5787

[a]In total, 13 seed genes were obtained.

P-values of the 11 candidate M-DMs were calculated and corrected using the Benjamini-Hochberg algorithm, and modules with P≤0.05 were regarded as significant. In total, 8 modules were identified as significant differential modules (Table II and Fig. 2), with module entropy ranging from 0.687 to 0.851.
Table II.

P-values of the 11 candidate M-DMs, corrected using the Benjamini-Hochberg algorithm[a].

Module    P-value    Entropy
1         0          0.847
2         0          0.687
3         0          0.739
5         0          0.721
6         0          0.775
7         0          0.851
11        0          0.798
12        0          0.716

[a]P≤0.05 was considered statistically significant.

Figure 2.

The M-DMs identified from the DCNs. (A-H) The 8 M-DMs. M-DMs, multiple differential modules; DCNs, differential gene co-expression networks.

Discussion

From a systems biology perspective, diseases arise from perturbations of gene expression networks, and these perturbations change significantly as a disease progresses (18). Schwarz et al combined PPI networks and gene expression data to examine the biological processes and genes related to schizophrenia (19). PPI and gene-gene functional interaction networks have also been constructed to identify potential biomarkers of pediatric adrenocortical carcinoma (20). In the present study, we introduced a new method based on M-DMs to search for potential biomarkers of TB and to better understand its molecular mechanisms, and identified 8 modules associated with TB.
Humans possess two SS18 homologous genes, SS18L1 and SS18L2. The SS18L2 gene has three exons and maps to chromosome 3p21 (21). de Bruijn reported that SS18 encodes nuclear proteins that function as transcriptional co-activators, and fusion of SS18 with one of the SSX genes is a hallmark of human synovial sarcoma (22). Nucleolar protein 11 (NOL11) is a metazoan-specific protein involved in ribosome biogenesis; it also plays an important role in 18S rRNA maturation and in the pathogenesis of North American Indian childhood cirrhosis (23). Human adenylosuccinate lyase (ADSL) is a bifunctional enzyme acting in two pathways of purine nucleotide metabolism, de novo purine synthesis and purine nucleotide recycling (24). The human liver ADSL gene has been cloned and mapped to chromosome 22 (25,26). Antisense oligonucleotides (ASOs) bind RNA to form heteroduplexes that can be specifically recognized by the interleukin enhancer-binding factor 2 and 3 complex (ILF2/3), and recruitment of ILF2/3 by ASOs modulates gene expression through alternative splicing (27). ILF2 mRNA accumulates in pachytene spermatocytes, and ILF2 is also expressed in the adult ovary and in various embryonic tissues (28).
DEAD-box helicase 1 (DDX1) has been found in a high-molecular-weight complex containing a series of Drosha-associated polypeptides (29). According to The Cancer Genome Atlas data, low DDX1 levels are associated with poor clinical outcome in serous ovarian cancer, and DDX1 plays an important role in modulating miRNA maturation (30).
Nevertheless, the present study has some limitations. It included only 124 samples, which is not sufficient to support the conclusions firmly, and further studies are needed to confirm the findings. In addition, the results were not verified by clinical experiments. In conclusion, we identified 8 significant differential modules using a new bioinformatic method. We believe that the present study will benefit the understanding of TB in children and may help provide new therapeutic methods to combat the disease.
References

1.  Immunobiology of childhood tuberculosis: a window on the ontogeny of cellular immunity.

Authors:  S Smith; R F Jacobs; C B Wilson
Journal:  J Pediatr       Date:  1997-07       Impact factor: 4.406

2.  Human adenylosuccinate lyase (ADSL), cloning and characterization of full-length cDNA and its isoform, gene structure and molecular basis for ADSL deficiency in six patients.

Authors:  S Kmoch; H Hartmannová; B Stibůrková; J Krijt; M Zikánová; I Sebesta
Journal:  Hum Mol Genet       Date:  2000-06-12       Impact factor: 6.150

3.  Effect of BCG vaccination on childhood tuberculous meningitis and miliary tuberculosis worldwide: a meta-analysis and assessment of cost-effectiveness.

Authors:  B Bourdin Trunz; Pem Fine; C Dye
Journal:  Lancet       Date:  2006-04-08       Impact factor: 79.321

4.  The RNA-binding protein DDX1 promotes primary microRNA maturation and inhibits ovarian tumor progression.

Authors:  Cecil Han; Yunhua Liu; Guohui Wan; Hyun Jin Choi; Luqing Zhao; Cristina Ivan; Xiaoming He; Anil K Sood; Xinna Zhang; Xiongbin Lu
Journal:  Cell Rep       Date:  2014-08-28       Impact factor: 9.423

5.  RePORT International: Advancing Tuberculosis Biomarker Research Through Global Collaboration.

Authors:  Carol D Hamilton; Soumya Swaminathan; Devasahayam J Christopher; Jerrold Ellner; Amita Gupta; Timothy R Sterling; Valeria Rolla; Sudha Srinivasan; Muhammad Karyana; Sophia Siddiqui; Sonia K Stoszek; Peter Kim
Journal:  Clin Infect Dis       Date:  2015-10-15       Impact factor: 9.079

6.  A mutation in adenylosuccinate lyase associated with mental retardation and autistic features.

Authors:  R L Stone; J Aimi; B A Barshop; J Jaeken; G Van den Berghe; H Zalkin; J E Dixon
Journal:  Nat Genet       Date:  1992-04       Impact factor: 38.330

7.  Synthetic oligonucleotides recruit ILF2/3 to RNA transcripts to modulate splicing.

Authors:  Frank Rigo; Yimin Hua; Seung J Chun; Thazha P Prakash; Adrian R Krainer; C Frank Bennett
Journal:  Nat Chem Biol       Date:  2012-04-15       Impact factor: 15.040

8.  Frontotemporal dementia: insights into the biological underpinnings of disease through gene co-expression network analysis.

Authors:  Raffaele Ferrari; Paola Forabosco; Jana Vandrovcova; Juan A Botía; Sebastian Guelfi; Jason D Warren; Parastoo Momeni; Michael E Weale; Mina Ryten; John Hardy
Journal:  Mol Neurodegener       Date:  2016-02-24       Impact factor: 14.195

9.  Protein Interaction Networks Link Schizophrenia Risk Loci to Synaptic Function.

Authors:  Emanuel Schwarz; Rauf Izmailov; Pietro Liò; Andreas Meyer-Lindenberg
Journal:  Schizophr Bull       Date:  2016-04-07       Impact factor: 9.306

10.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937
