SJARACNe: a scalable software tool for gene network reverse engineering from big data.

Alireza Khatamian¹, Evan O Paull², Andrea Califano², Jiyang Yu¹.

Abstract

SUMMARY: Over the last two decades, the number of array- and sequencing-based transcriptomic profiles has increased exponentially. Reverse engineering of biological networks from high-throughput gene expression profiles has been one of the grand challenges in systems biology. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) is one of the most effective and widely used tools for this task. However, existing ARACNe implementations do not efficiently process big input data with thousands of samples. Here we present SJARACNe, an improved implementation of the algorithm built on sophisticated software engineering, which solves this big data problem. The new scalable SJARACNe package dramatically improves computational performance in both run time and memory usage and implements new features, while preserving the network inference accuracy of the original algorithm. Because large-sample transcriptomic data are increasingly available and network reconstruction with ARACNe is extremely computationally demanding, the scalable SJARACNe will allow even researchers with modest computational resources to efficiently construct complex regulatory and signaling networks from thousands of gene expression profiles.
AVAILABILITY AND IMPLEMENTATION: SJARACNe is implemented in C++ (computational core) and Python (pipeline scripting wrapper, ≥3.6.1). It is freely available at https://github.com/jyyulab/SJARACNe.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2019 PMID: 30388204 PMCID: PMC6581437 DOI: 10.1093/bioinformatics/bty907

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Driven by advances in transcriptome profiling technologies, from microarrays to next-generation sequencing, the number of gene expression profiles from normal and malignant samples has increased tremendously over the past two decades. For example, The Cancer Genome Atlas project has profiled over 30 000 adult human cancer patients, and by July 2018 the Gene Expression Omnibus had accumulated array- or sequencing-based transcriptomic profiles of over 2.5 million samples from bacteria to humans. Reverse engineering of gene regulatory networks from transcriptomic profiles has proven powerful for discovering hidden drivers and master regulators of disease phenotypes, with applications in cancer (Rodriguez-Barrueco et al., 2015), immunology (Du et al., 2018), drug resistance (Piovan et al., 2013) and drug response (Woo et al., 2015). Various computational algorithms have been developed to reconstruct gene regulatory networks from large-scale gene expression data. Among these, the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) (Margolin et al., 2006) is one of the most efficient and widely used network reconstruction methods; it is based on mutual information (MI), which captures non-linear relationships between two variables. ARACNe-AP (Lachmann et al., 2016) improved MI estimation using an adaptive partitioning (AP) approach. However, neither ARACNe nor ARACNe-AP can handle big input data with thousands of samples: the original ARACNe fails when the sample size exceeds 1500, and ARACNe-AP requires too much memory to run on a standard computer. Here we present a scalable solution, SJARACNe, which addresses the big data problem by optimizing the depth of AP and redesigning the data structure. SJARACNe dramatically improves computational performance, especially memory usage, allowing even researchers with modest computational power to generate networks from thousands of samples.
For example, SJARACNe can process data with 2000 samples on a laptop with only 8 GB of RAM, whereas ARACNe would fail and ARACNe-AP would require a supercomputer with at least 10 times more memory. We benchmarked SJARACNe against ARACNe and ARACNe-AP on datasets of various sizes.

2 SJARACNe features and functions

2.1 Efficient data structures

Data structures, alongside algorithms, are critical to efficient computing. In SJARACNe, we replaced the inefficient and inextensible data structure used in ARACNe with a flexible, pointer-based data structure, which reduces data access time and, therefore, overall run time.

2.2 Optimization of the depth of adaptive partitioning

AP is an efficient approach to MI estimation. ARACNe enforced a fixed convergence point in its AP implementation, which limited its scalability, whereas ARACNe-AP used a high threshold, which results in excessive memory usage. SJARACNe solves both problems by using a flexible convergence point.
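To make the idea of an AP "convergence point" concrete, here is a minimal Python sketch of a generic adaptive-partitioning MI estimator. It is not the SJARACNe or ARACNe-AP code. It assumes rank-transformed inputs, a chi-square uniformity test at α = 0.05 to decide whether a cell keeps splitting, a forced split of the root cell, and a hypothetical `max_depth` parameter that plays the role of the convergence point; `ranks` and `ap_mutual_information` are illustrative names.

```python
import math
import random

# Chi-square critical value for 3 degrees of freedom at alpha = 0.05.
CHI2_CRIT_3DF = 7.815


def ranks(values):
    """Return 0-based ranks; ties are broken by input order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def ap_mutual_information(x, y, max_depth=8, min_points=4):
    """Estimate MI (in nats) by recursive quadrant splitting of the rank plane.

    max_depth is a hypothetical knob standing in for the convergence point:
    a lower cap bounds the number of cells (and memory) at the cost of resolution.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    points = list(zip(ranks(x), ranks(y)))
    mi = 0.0
    # Each cell: (points inside, x0, x1, y0, y1, depth) with half-open rank bounds.
    stack = [(points, 0, n, 0, n, 0)]
    while stack:
        pts, x0, x1, y0, y1, depth = stack.pop()
        k = len(pts)
        if k == 0:
            continue  # empty cells contribute nothing
        if depth < max_depth and k >= min_points and x1 - x0 >= 2 and y1 - y0 >= 2:
            xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
            bounds = [(x0, xm, y0, ym), (x0, xm, ym, y1),
                      (xm, x1, y0, ym), (xm, x1, ym, y1)]
            quads = [[], [], [], []]
            for px, py in pts:
                quads[2 * (px >= xm) + (py >= ym)].append((px, py))
            # Chi-square test of quadrant counts against area-proportional counts.
            area = (x1 - x0) * (y1 - y0)
            chi2 = 0.0
            for q, (a0, a1, b0, b1) in zip(quads, bounds):
                expected = k * (a1 - a0) * (b1 - b0) / area
                chi2 += (len(q) - expected) ** 2 / expected
            # Always split the root (assumption); otherwise split only if non-uniform.
            if depth == 0 or chi2 > CHI2_CRIT_3DF:
                for q, (a0, a1, b0, b1) in zip(quads, bounds):
                    stack.append((q, a0, a1, b0, b1, depth + 1))
                continue
        # Leaf cell: add p(cell) * log(p(cell) / (p_x * p_y)). With unique ranks,
        # the marginal probabilities are the cell's width and height divided by n.
        mi += (k / n) * math.log(k * n / ((x1 - x0) * (y1 - y0)))
    return mi


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0, 1) for _ in range(2000)]
    noise = [rng.gauss(0, 1) for _ in range(2000)]
    print("y = x^2:", ap_mutual_information(xs, [v * v for v in xs]))
    print("independent:", ap_mutual_information(xs, noise))
```

In this sketch, raising `max_depth` lets strongly dependent pairs be resolved more finely but creates more cells to track, which mirrors the time and memory trade-off described above; with a low cap, the estimate for a deterministic relationship saturates early.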

2.3 New features and functions

SJARACNe provides enhanced annotations of the network output, including node annotations and extra statistics such as Spearman and Pearson correlation coefficients and regression coefficients. In addition, SJARACNe outputs the network in various formats that can be imported into visualization tools.
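The extra edge statistics are standard quantities. As a hedged illustration, not SJARACNe's output code, the sketch below computes a Pearson coefficient, a Spearman coefficient (Pearson on average ranks) and a least-squares slope for one regulator–target pair. `annotate_edge`, the dictionary layout of `expression` (gene name mapped to its values across samples), the output field names and the choice to regress the target on the regulator are all assumptions.

```python
import math


def _centered_sums(x, y):
    """Return (sxy, sxx, syy): centered cross- and auto-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson(x, y):
    sxy, sxx, syy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def regression_slope(x, y):
    """Least-squares slope of y (target) regressed on x (regulator)."""
    sxy, sxx, _ = _centered_sums(x, y)
    return math.nan if sxx == 0 else sxy / sxx


def annotate_edge(regulator, target, expression):
    """Hypothetical edge record; field names are illustrative only."""
    x, y = expression[regulator], expression[target]
    return {
        "source": regulator,
        "target": target,
        "pearson": pearson(x, y),
        "spearman": spearman(x, y),
        "slope": regression_slope(x, y),
    }
```

Reporting both coefficients is informative because Spearman only requires a monotone relationship, so a target that falls off non-linearly with its regulator still gets a coefficient of -1 even when Pearson is weaker.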

3 Datasets

To benchmark the performance of SJARACNe against ARACNe and ARACNe-AP, we chose a large breast cancer dataset with 1981 samples (Curtis et al., 2012) and sampled it into four datasets of different sample sizes, small (N = 100), medium (N = 500), large (N = 1000) and very large (N = 1981), while keeping the gene dimension fixed (28 278 genes).
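The text does not say how the smaller datasets were drawn. Assuming simple random sampling of samples (columns) without replacement under a fixed seed, a reproducible way to derive datasets of fixed sizes from one expression matrix, while keeping every gene, might look like this; `subsample_samples` and its seed are hypothetical.

```python
import random


def subsample_samples(sample_ids, sizes, seed=0):
    """Draw one random subset of samples (columns) per requested size.

    Genes (rows) are untouched, so every subset keeps the full gene dimension.
    Sampling without replacement and the fixed seed are assumptions.
    """
    rng = random.Random(seed)
    subsets = {}
    for size in sizes:
        if not 0 < size <= len(sample_ids):
            raise ValueError(f"cannot draw {size} of {len(sample_ids)} samples")
        picked = sorted(rng.sample(range(len(sample_ids)), size))
        subsets[size] = [sample_ids[i] for i in picked]  # keep original column order
    return subsets


if __name__ == "__main__":
    ids = [f"S{i}" for i in range(1981)]
    for size, cols in subsample_samples(ids, [100, 500, 1000, 1981]).items():
        print(size, cols[:3])
```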

4 Results and discussion

We compared SJARACNe with ARACNe and ARACNe-AP in terms of both run time (Fig. 1A) and memory usage (Fig. 1B) on the four datasets, ranging from a small to a very large number of samples, with 100 bootstraps (the same seeds were used for all three methods). ARACNe-AP consumes the most memory of the three methods, while its run time is close to that of SJARACNe. SJARACNe and ARACNe use comparable amounts of memory, but SJARACNe is 2–2.5 times faster than ARACNe as the number of samples increases. Furthermore, ARACNe cannot handle a dataset with a very large number of samples (it failed at N = 1981), whereas SJARACNe and ARACNe-AP completed the job successfully.
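A fair comparison rests on every method seeing the same bootstrap resamples. The following minimal sketch shows seed-controlled bootstrap resampling, assuming each bootstrap draws N sample indices with replacement from its own seed. The seeding scheme and `bootstrap_indices` are hypothetical and are not how SJARACNe, ARACNe or ARACNe-AP actually derive their resamples.

```python
import random


def bootstrap_indices(n_samples, n_bootstraps, base_seed=1):
    """Return one resample of column indices (with replacement) per bootstrap.

    Bootstrap b depends only on the seed base_seed + b (a hypothetical scheme),
    so any tool fed the same seeds sees identical resampled data.
    """
    runs = []
    for b in range(n_bootstraps):
        rng = random.Random(base_seed + b)
        runs.append([rng.randrange(n_samples) for _ in range(n_samples)])
    return runs


if __name__ == "__main__":
    runs = bootstrap_indices(n_samples=100, n_bootstraps=100)
    print(len(runs), runs[0][:10])
```

Because each resample depends only on its seed, two deterministic implementations of the same algorithm given the same seeds work on identical data, which is the setting in which SJARACNe and ARACNe can be checked for exact agreement.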

Fig. 1.

Performance comparison of SJARACNe (blue), ARACNe-AP (green) and ARACNe (red): (A) run time and (B) memory usage. ARACNe results are missing for the very large dataset (N = 1981) because ARACNe failed to handle this big input.

We performed a network similarity analysis on the gene regulatory networks generated by the three methods for 10 680 transcripts representing 6458 signaling factors in all four benchmark datasets with 100 bootstraps. We then used Fisher's exact test to assess the significance of the overlap between the targets predicted for each isoform by the three algorithms. SJARACNe and ARACNe construct exactly the same networks when given the same initial seeds. SJARACNe and ARACNe-AP produce highly similar networks: for each of the 10 680 signaling factor isoforms, the targets predicted by SJARACNe and ARACNe-AP overlap significantly (P < 10⁻⁹) (Supplementary Fig. S1). In summary, SJARACNe addresses the pressing issue of reconstructing gene networks from big data and will have broad applications.
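The overlap test for a single isoform can be written directly as a hypergeometric upper tail, which is what a one-sided Fisher's exact test computes for a 2×2 table. The sketch assumes a one-sided (enrichment) test and a background `universe_size` such as the number of genes in the expression matrix; the text states neither, and `overlap_pvalue` is a hypothetical helper.

```python
from math import comb


def overlap_pvalue(targets_a, targets_b, universe_size):
    """One-sided Fisher's exact test (hypergeometric upper tail) for set overlap.

    Returns P(X >= k), where k is the observed overlap and X is the overlap
    obtained when len(targets_b) genes are drawn at random from universe_size
    genes, len(targets_a) of which are targets in network A.
    """
    a, b = set(targets_a), set(targets_b)
    if len(a) > universe_size or len(b) > universe_size:
        raise ValueError("target sets cannot exceed the universe")
    k = len(a & b)
    tail = sum(
        comb(len(a), i) * comb(universe_size - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return tail / comb(universe_size, len(b))  # exact integers, one final division


if __name__ == "__main__":
    # Two hypothetical target sets of 200 genes sharing 100, out of 28 278 genes.
    print(overlap_pvalue(range(200), range(100, 300), 28278))
```

Working with exact integers avoids precision loss for very small P values. If SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` on the table `[[k, len(a) - k], [len(b) - k, universe_size - len(a) - len(b) + k]]` should give the same one-sided P value.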

References

1. Elucidating Compound Mechanism of Action by Network Perturbation Analysis.

Authors: Jung Hoon Woo; Yishai Shimoni; Wan Seok Yang; Prem Subramaniam; Archana Iyer; Paola Nicoletti; María Rodríguez Martínez; Gonzalo López; Michela Mattioli; Ronald Realubit; Charles Karan; Brent R Stockwell; Mukesh Bansal; Andrea Califano
Journal: Cell Date: 2015-07-16 Impact factor: 41.582

2. Direct reversal of glucocorticoid resistance by AKT inhibition in acute lymphoblastic leukemia.

Authors: Erich Piovan; Jiyang Yu; Valeria Tosello; Daniel Herranz; Alberto Ambesi-Impiombato; Ana Carolina Da Silva; Marta Sanchez-Martin; Arianne Perez-Garcia; Isaura Rigo; Mireia Castillo; Stefano Indraccolo; Justin R Cross; Elisa de Stanchina; Elisabeth Paietta; Janis Racevskis; Jacob M Rowe; Martin S Tallman; Giuseppe Basso; Jules P Meijerink; Carlos Cordon-Cardo; Andrea Califano; Adolfo A Ferrando
Journal: Cancer Cell Date: 2013-11-27 Impact factor: 31.743

3. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups.

Authors: Christina Curtis; Sohrab P Shah; Suet-Feung Chin; Gulisa Turashvili; Oscar M Rueda; Mark J Dunning; Doug Speed; Andy G Lynch; Shamith Samarajiwa; Yinyin Yuan; Stefan Gräf; Gavin Ha; Gholamreza Haffari; Ali Bashashati; Roslin Russell; Steven McKinney; Anita Langerød; Andrew Green; Elena Provenzano; Gordon Wishart; Sarah Pinder; Peter Watson; Florian Markowetz; Leigh Murphy; Ian Ellis; Arnie Purushotham; Anne-Lise Børresen-Dale; James D Brenton; Simon Tavaré; Carlos Caldas; Samuel Aparicio
Journal: Nature Date: 2012-04-18 Impact factor: 49.962

4. Hippo/Mst signalling couples metabolic state and immune function of CD8α⁺ dendritic cells.

Authors: Xingrong Du; Jing Wen; Yanyan Wang; Peer W F Karmaus; Alireza Khatamian; Haiyan Tan; Yuxin Li; Cliff Guy; Thanh-Long M Nguyen; Yogesh Dhungana; Geoffrey Neale; Junmin Peng; Jiyang Yu; Hongbo Chi
Journal: Nature Date: 2018-05-30 Impact factor: 69.504

5. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context.

Authors: Adam A Margolin; Ilya Nemenman; Katia Basso; Chris Wiggins; Gustavo Stolovitzky; Riccardo Dalla Favera; Andrea Califano
Journal: BMC Bioinformatics Date: 2006-03-20 Impact factor: 3.169

6. ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information.

Authors: Alexander Lachmann; Federico M Giorgi; Gonzalo Lopez; Andrea Califano
Journal: Bioinformatics Date: 2016-04-23 Impact factor: 6.937

7. Inhibition of the autocrine IL-6-JAK2-STAT3-calprotectin axis as targeted therapy for HR-/HER2+ breast cancers.

Authors: Ruth Rodriguez-Barrueco; Jiyang Yu; Laura P Saucedo-Cuevas; Mireia Olivan; David Llobet-Navas; Preeti Putcha; Veronica Castro; Eva M Murga-Penas; Ana Collazo-Lorduy; Mireia Castillo-Martin; Mariano Alvarez; Carlos Cordon-Cardo; Kevin Kalinsky; Matthew Maurer; Andrea Califano; Jose M Silva
Journal: Genes Dev Date: 2015-07-30 Impact factor: 11.361
