Literature DB >> 31846472

A transcriptome-based signature of pathological angiogenesis predicts breast cancer patient survival.

Rodrigo Guarischi-Sousa1,2, Jhonatas S Monteiro1, Lilian C Alecrim1, Jussara S Michaloski1, Laura B Cardeal1, Elisa N Ferreira3, Dirce M Carraro3, Diana N Nunes3, Emmanuel Dias-Neto3,4, Jüri Reimand2,5, Paul C Boutros6, João C Setubal1, Ricardo J Giordano1.   

Abstract

The specific genes and molecules that drive physiological angiogenesis differ from those involved in pathological angiogenesis, suggesting distinct mechanisms for these seemingly related processes. Unveiling genes and pathways preferentially associated with pathologic angiogenesis is key to understanding its mechanisms, thereby facilitating development of novel approaches to managing angiogenesis-dependent diseases. To better understand these different processes, we elucidated the transcriptome of the mouse retina in the well-accepted oxygen-induced retinopathy (OIR) model of pathological angiogenesis. We identified 153 genes changed between normal and OIR retinas, which represent a molecular signature relevant to other angiogenesis-dependent processes such as cancer. These genes robustly predict the survival of breast cancer patients, which was validated in an independent 1,000-patient test cohort (40% difference in 15-year survival; p = 2.56 x 10-21). These results suggest that the OIR model reveals key genes involved in pathological angiogenesis, and these may find important applications in stratifying tumors for treatment intensification or for angiogenesis-targeted therapies.

Entities:  

Year:  2019        PMID: 31846472      PMCID: PMC6917213          DOI: 10.1371/journal.pgen.1008482

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   5.917


Introduction

It has been almost half a century that the field of angiogenesis ascended based on the thoughtful idea that tumor growth is dependent on neovascularization [1-3]. This exciting hypothesis evolved and matured over the decades, and today patients from at least two important groups of diseases benefit from anti-angiogenesis therapy: cancer and ocular diseases [2,4]. Most of the drugs in the clinic are directed at the vascular endothelial growth factors (VEGF) or their receptors and pathways, all essential players in blood vessel formation [4,5,6]. Although successful, anti-VEGF therapy still has important deficiencies. Patients from both groups of diseases, ocular and oncologic, may have incomplete response to anti-VEGF therapy or eventually become refractory to treatment [5-7]. Not all tumors respond to anti-VEGF therapy and response is far from homogenous, even for patients with the same type of tumor. While, in general, patients with renal and metastatic colon cancer respond well to anti-VEGF therapy, the approval of bevacizumab for breast cancer treatment was suspended in the US after two studies showed contradictory results (reviewed by ref. [8]). Therefore, there is a pressing need for reliable methods of assessing patient response to anti-angiogenesis therapy. We propose that mRNA abundance gene signatures may have an important role in separating patients that would benefit from anti-VEGF therapy from those that do not. The challenge for developing these signatures lies in identifying sets of genes that are representative of and tightly associated with specific diseases. This can be a challenge when using data obtained from human tumors for which angiogenesis is a hallmark [9] or, for that matter, any cohort of human samples given the genetic variability amongst patients. Cancer cells are notoriously heterogeneous, and tumors themselves are surrounded and affected by a dynamic microenvironment comprised of a diversity of parenchymal, vascular, immune, and even prokaryotic cells [10-12]. To minimize these confounding factors and to identify a core of genes associated with angiogenesis, we relied on a mouse model that has been extensively used to study pathological angiogenesis [13-18]. The oxygen-induced retinopathy (OIR) mouse model works by varying the levels of oxygen in the developing retina in order to modulate VEGF expression [19,20]. The result is a state of pathological angiogenesis that is driven by hypoxia and VEGF, as in human tumors. The OIR model also reproduces many aspects of another important human disease that affects premature babies: the retinopathy of prematurity (ROP) [20,21]. Thus, by comparing transcripts of mouse retinas developing under physiological and pathological conditions (OIR), we were able to identify a set of genes that may capture the gene expression basis of hypoxia and VEGF-driven pathological angiogenesis. These genes were next validated by building an angiogenesis signature with prognostic value in human breast cancer. Collectively, our data validate these differentially expressed genes as representative of pathological angiogenesis not only in the retina but also in an important human disease, breast cancer, suggesting that other signatures and therapeutic gateways may be developed based on this gene set.

Results

OIR sequencing

The ideal animal model for studying angiogenesis should have the following features. First, it should be isogenic, to reduce genetic variation and increase reproducibility. Second, it should not require transient genetic manipulation or administration of exogenous materials (e.g. drugs, scaffolds or viral vectors). Third, it should mimic physiologic processes like retinal neovascularization, recapitulating aspects of human diseases [19]. The model we have used (OIR) has all these benefits, thus avoiding confounding factors present in other strategies. The OIR model consists of placing seven-day-old mice (postnatal day 7, P7) with their nursing mothers in 75% oxygen for five days (). The hyperoxic environment halts the physiological vascular development in the retina and its effect is particularly visible in the central area of the retina, where vessel regression can be clearly observed [22]. Thus, when mice return to room air (20.8% oxygen) on P12, the now under-vascularized retina experiences a sudden hypoxic condition leading to VEGFA overexpression compared to physiological retinas, resulting in abnormal vascular growth and pathological angiogenesis, which peaks at P17 [20].

The OIR model and the transcriptome of the angiogenic retina.

(a) Cartoon representation of the OIR model. Neonatal mice with their nursing mothers are exposed to 75% oxygen from day P7 to P12, which reduces VEGF production in the retina and induces vaso-regression and central area vaso-obliteration (VO). Upon return to room air, the now hypoxic retina increases VEGF levels and a pathological angiogenesis states ensues. (b) Confirmation of the pathological angiogenesis condition in OIR-mice used for the RNA-seq study, with central area VO, vascular tufts (*) and the presence of intravitreal vessels (arrows). (c) Expression profile of selected genes quantified by RNA-Seq and RT-PCR methods. Bars represent standard error of the mean from independent biological samples (N = 8). Fold-changes were calculated relative to P12 samples. To confirm that in our experimental settings mice had indeed developed OIR, we analyzed sample retinas and observed the characteristic phenotype: vaso-obliteration (VO) in central retina, formation of vascular tufts, and vitreous humour vascular invasion (). Retinas under physiological development did not show any sign of retinopathy. Having confirmed the development of the OIR phenotype, we proceeded to determine the transcriptome of retinas at different time points of development in physiological and pathological conditions. Two OIR experiments were performed and retinas from 4 mice (N = 8 retinas) were analyzed. In physiological conditions, we collected samples at P12, P15 and P17 (physiological angiogenesis); from OIR mice we collected samples at P12 (immediately after leaving the chamber with 75% oxygen) along with retinas 12 hours after exposure to 75% oxygen (P12.5); then at P15 (mid-ROP) and P17 (ROP peak). To distinguish physiological and pathological samples, the OIR retina samples will be referred to as R12, R12.5, R15 and R17. High quality total RNA was obtained from all retinas, treated with DNase to remove genomic DNA contamination, and utilized for RNA-seq library construction, preserving RNA strand information. To enrich for exonic reads, we rebuilt all libraries using poly-A+ RNA and performed very high-depth sequencing (average: 147 million reads per sample). Approximately 90% of the reads confidently mapped to the mouse genome and 86% to exons (). Our reads mapped to 21,390 annotated genes (out of 47,069 genes in the primary assembly), representing 45% of all mouse genes and reflecting the intense transcriptional activity of the retina. To confirm the quality of our RNA-seq data, we performed real time PCR for 42 candidate genes selected based on their role in angiogenesis or large effect-sizes. The RT-PCR and RNA-seq data were in agreement in all samples and stages of retinal development or OIR ( and ), with a strong linear relation between the abundance values obtained with each of the two technologies (r2 = 0.93; pR = 2.31x10-104; ).

Transcriptome landscape of the developing retina

We observed that physiological and pathological retinas share over 3,800 differentially expressed genes (1.5-fold, log2), which were associated with multiple cellular functions and diseases, such as cellular movement, morphology and survival (apoptosis), cancer, cardiovascular development and organismal injury or abnormalities (). This is expected, and most of these pathways represent cellular processes ongoing in normal and pathological retinal development. It is also evidence of how challenging it is to analyze transcriptome data of complex tissues. In particular, the retina is a highly transcriptionally active tissue comprised of as many as 60 different types of neural cells, plus additional parenchymal, immune and vascular cells [23]. Nevertheless, it is reassuring to see that the top canonical pathway we identified (hepatic fibrosis/hepatic stellar cell activation) () is driven by molecular factors that are known to play an important role in angiogenesis: VEGF, fibroblast growth factor (FGF), platelet derived growth factor (PDGF-BB), transforming growth factor (TGF-α and–β), insulin growth factor (IGF-1), and endothelin signaling pathways (). The next ranked canonical pathway, axon guidance signaling, is also noteworthy. The developing retina, an extension of the central nervous system, is actively remodeling neurons and glial cells. However, VEGF is also a neurogenic factor and many of the molecules and cellular pathways associated with neuronal axon migration are also involved in angiogenesis [24,25]. In summary, the RNA-seq data represent an intricate mix of transcriptomes from all types of retinal cells, resulting in a pool of expressed genes from diverse pathways.

The retina transcriptome landscape.

(a) Diseases, cellular functions and (b) canonical pathways associated with in OIR retinas, based on >1,800 differentially expressed genes (fold-change ≥ 1.5). (c) Most of the canonical pathways found in hepatic fibrosis/hepatic stellate cell activation are associated with angiogenesis.

Genes associated with pathological angiogenesis

Given the complexity of our transcriptome, we chose a special approach to identify genes associated with pathological angiogenesis. First, we performed a principal component analysis (PCA) using gene expression (500 most variable genes), which revealed the expected segregation pattern for our model, with principal component (PC) 1 reflecting the developmental stage (in days) and PC2 representing the condition (OIR or normal retinal development) (). Next, to limit the number of genes, we decided to focus on differentially expressed genes between controls and the pathological condition (OIR) that had significant (p<0.05) and substantial change in expression by adopting the threshold of 2-fold (log2) change in at least one comparison. These genes were then subjected to the null hypothesis test and only those with FDR (false discovery rate) < 0.05 were selected. The result was a list of 153 genes, most of them up-regulated in pathological angiogenesis (R15 and R17) (; ). We next asked whether these genes would be representative of angiogenesis in other models and diseases.

Angiogenic associated genes.

(a) Principal component analysis using 500 genes with the highest variance in expression. PC1 is stage dependent, and PC2 is condition dependent (OIR or normal retina development); these components explain 61% and 18% of variance, respectively; filtered by genes with > 500 reads on all samples. (b) Heatmap and dendrogram illustrating the expression profile of the 153 differentially expressed genes across all retina samples (fold-change ≥ 2).

Angiogenic and hypoxic gene signatures

To address this question, we hypothesized that our set of differentially expressed genes might comprise a signature of pathological angiogenesis. Several research groups, including ours, have long attempted to identify gene signatures with prognostic power, with mixed results [26,27]. First, we determined how the 153 genes that we identified in our transcriptome would compare with genes in these previously described signatures. We performed a comprehensive evaluation of existing angiogenic signatures and found nine angiogenesis signatures in the literature () [28-36] (). Considering the underlying relationship between angiogenesis and hypoxia, we extended our analysis to include eight hypoxia signatures [27,37-43] (). Across the 1,457 genes comprising these 17 signatures, very few genes were present in multiple signatures () and while some individual hypoxia and angiogenesis signatures share a small number of genes [27,29,30,34-36,38,40-42], others do not [28,33] (). Overall, the signatures we investigated showed little correlation with each other, indicating a surprising disagreement ().

Meta-analysis of current available angiogenesis and hypoxia signatures.

(a) List of genes present in at least two angiogenesis signatures sorted by their frequency and presence in hypoxia signatures. (b) Pearson correlation between each pair of signatures. Diana clustering highlight the most related signatures. (c) Venn-diagram showing the overlap between hypoxia, angiogenesis signatures and the 149 differentially expressed genes with human homologs identified in the retina transcriptome (fold-change ≥ 2). AEC = Aggressive Endometrial Cancer; AOC = Advanced Ovarian Carcinoma; BC = Breast Cancer; CC = Colorectal Cancer; CCRCC = Clear Cell Renal Cell Carcinoma; GNT = Gastrointestinal Neuroendocrine Tumours; HNSCC = Head and Neck Squamous Cell Carcinoma; NSCLC = Non-small-cell lung cancer; SOV = Serous Ovarian Cancer. Amongst the 153 genes identified as differentially expressed in pathological angiogenesis, 143 have a homologous gene in humans, based on homology information retrieved from BioMart, resulting in 149 human homolog genes (3 mouse genes had multiple human homologs). Of those, 20 were present in hypoxia signatures and 36 in angiogenesis signatures (). Thus, the great majority (111 genes, 82%) of the 149 differentially expressed genes with human homologs that we identified in the retina transcriptome are not present in any of the previously described gene signatures.

Identification of a new angiogenic gene signature with prognostic power

To assess whether we had identified a bona fide signature of pathological angiogenesis, we used a machine-learning approach and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset [44] to evaluate the disease relevance of our angiogenic signature in these patients (). METABRIC is a large collection with almost 2,000 tumor samples, including mRNA abundance profile and long-term clinical outcomes. All patients received similar chemotherapy regimen and none were treated with anti-HER2 drugs, which makes this dataset an excellent resource to use in the development of an angiogenesis gene signature (). The METABRIC dataset was initially assembled in two stages: an initial set of tumors used as the discovery group (N = 996) and a further set of tumors (N = 992) that was added later to the database, and used as validation cohort [44]. There is no patient overlap between the two groups.

Diagram outlining the input and workflow of the machine-learning algorithm.

Two angiogenesis-dependent diseases were selected for this study: breast cancer (pink boxes) and OIR (green boxes). For the OIR model, data was generated in the study; by comparing physiological and pathological retinas (OIR), 153 genes differentially expressed were identified. These genes have 149 human homologs that were then compared with publicly available data (METABRIC) for breast cancer patients. For that, mRNA abundance data were pre-processed and quantile-normalized for comparison with the RNA-seq data. The METABRIC dataset was divided into two cohorts (learning and validation). The machine-learning algorithm (blue boxes) uses the learning cohort to assemble the risk prediction models, which are then tested on the validation cohort. The angiogenesis gene signature is the model with the minimum error. Each of the angiogenesis signature genes also present in the METABRIC dataset was tested for its association with overall survival using Cox proportional hazards modeling. Feature-selection focused on genes with a significant association (p<0.05). This resulted in 56 features (54 genes plus patient age and tumor stage), which were used to build a survival random forest model for patient classification (). The algorithm starts by analyzing individual genes and their associations with the patient’s prognostic data using the discovery cohort (N = 996 patients). For this, features are ranked according to their p-values in the METABRIC dataset and given a discretionary prediction value. The machine learning algorithm then incrementally feeds new attributes to generate all models that could best classify and group patients according to survival time, which are then tested against the validation cohort (N = 992 patients). The final gene signature is the model with the minimum error. Our final model contains 15 features: 11 differentially expressed genes (VEGFA and PIEZO2 are used twice, as continuous and binary values) plus age and stage (). This model was next used to classify patients in the test cohort into groups according to their predicted time of survival: low-risk (more than 15 years), intermediate-risk (between 15 and 7.5 years) and high-risk (less than 7.5 years). A surprising predictive value was observed (p = 2.56 x 10−21; log-rank test) ().

Machine learning and an angiogenesis signature with prognostic power.

(a) Features used in gene signature. Patient’s age, stage of cancer and 13 (11 genes) expression-based features were used with Random Forest to build a signature for prediction of breast cancer relapse with D representing discretized (binary) expression values. (b) RandomForestSRC algorithm was trained on a dataset of 56 features (54 angiogenesis-induced genes plus age and tumor stage) using the METABRIC training cohort. The resulting classifier was applied on the validation cohort and was able to separate patients into three groups with significantly different outcomes. Low-risk (more than 15 years), intermediate-risk (between 15 and 7.5 years) and high-risk (less than 7.5 years).

Model fitness and contribution of OIR genes

To assess the performance of our model, we did two additional analyses. First, we evaluated the variation of the calculated error through the decision process to build our model (number of trees run by the algorithm). We observed that after 500 trees the algorithm had already reached a plateau with error rate around 0.315, which corresponds to a performance of 68.5% (). We also evaluated the performance of our method relative to the null distribution of prognostic signatures [26,45]. A series of 290,000 random sets of n features (where n = 2, 3, 4, …, 29, 30; with 10,000 random sets per value of n) were generated and individually used to build a random forest classifier for each patient outcome using the exact same approach used in our model. The accuracy and p-value of our model were compared with this empirical estimate of the null distribution. We observed that while 156,282 random signatures were prognostic at p < 0.05 (53.89% of total), our model was superior to 99.92% of all signatures tested (). Together, these data provide strong evidence that our angiogenesis signature is already near its optimum prognostic power.

Machine learning performance and OIR gene contribution.

(a) Performance (error rate) per number of tree generated by the random forest algorithm. (b) Resulting classifier was evaluated against a null distribution of classifiers. The y axis of top plot indicates the accuracy while the bottom plot highlights the -log10(P). Larger values indicate a lower P-value and hence a more statistically significant separation of patient groups in the validation dataset. Dashed lines indicate performance of chosen model; Gray object highlight the number of features used on the chosen model; the same strategy of stratifying samples on low, intermediate and high-risk group was applied and evaluated using the log-rank test. (c and d) Contribution of each feature (age, stage or IOR genes) for the prognostic power of the algorithm. Because we used age and tumor stage, which are not strictly angiogenic features, we next analyzed the contribution of each individual feature to the final model. We observed that while age and tumor stage contribute to the model, as expected, the addition of the OIR genes increased substantially the prognostic power of our signature. Together, the OIR genes contributed to more than half (close to 60%) of the prognostic power in the final model (). Finally, to confirm that the 11 genes featured in the final model indeed participate in pathological angiogenesis, we performed real time PCR for the remaining 9 OIR genes selected by the random forest model (VEGF and SERPINA3 had already been validated, ). Again, the RT-PCR and RNA-seq data were in agreement and confirmed that all genes in the signature were up-regulated in pathological angiogenesis () with a strong linear correlation between the abundance values obtained with each of the two technologies (r2 = 0.94; pR = 4.47x10-13; ).

Prognostic value according to tumor subtype

Breast cancer is a heterogeneous disease, with at least four molecular subtypes [46]. Given that patients from each subtype respond differently to therapy, we asked how well our angiogenesis signature would work in each molecular subtype. Patients with breast cancer molecular subtypes luminal A (p = 9.15 x 10−15; log-rank test), luminal B (p = 8.47 x 10−5; log-rank test) and basal (p = 1.51 x 10−4; log-rank test) were among those that could be best classified according to our gene signature. On the other hand, our model could not classify breast cancer patients in the HER2-positive (p = 0.11; log-rank test) () or normal-like (p = 0.58; log-rank test) tumor subtypes (). It is noteworthy that none of the previously published angiogenesis gene signatures could predict patient survival with levels of accuracy as high as ours. We evaluated all nine earlier angiogenesis signatures and none were a robust biomarker in the METABRIC dataset (; ).

Breast cancer sub-types and comparison with other gene signatures.

(a) Overall survival for each of the PAM50 subtypes was observed. Shown the Kaplan-Meier survival curves for Basal, Luminal-A, Luminal-B and Her2 subtypes. p-values were calculated using log-rank test. Six samples did not have a PAM50 class assigned and were excluded. Low-risk (more than 15 years), intermediate-risk (between 15 and 7.5 years) and high-risk (less than 7.5 years) (b) All nine previously published angiogenesis signature were assessed on METABRIC cohort. The heatmap shows the score between -1 and 1 for each sample (heatmap on center). Top plot depicts median score obtained from all nine signatures, gray lines highlight standard deviation. Bottom depicts consensus call of all nine signatures combined. On right, forest plot shows hazard ratio, lines represent 95% confidence interval. (c) Survival curves for 6 representative angiogenesis signatures. Groups were dichotomized based on signature score: high-risk (score greater than the percentile of events) and a low-risk (score less than or equal to the percentile of events).

Discussion

Angiogenesis is a hallmark of cancer and a necessary process to provide the growing tumors with nutrients and oxygen [1,9]. But pathological angiogenesis is not limited to neoplastic disorders and also participates in well-known angiogenesis-dependent diseases, which include retinopathy [2]. Thus, our angiogenesis gene signature is a welcome finding, and it is not entirely surprising that a mouse model for retinopathy could provide such a strong prognostic gene signature for a human disease, breast cancer. The concept that angiogenesis is an “organizing principle” has been proposed more than 10 year ago by Judah Folkman [2]. Indeed, all vertebrate metazoans use angiogenesis through conserved mechanisms (hypoxia, HIF, VEGF and other shared genes). Based on these premises, drugs that have been initially developed for treating oncologic patients have now been successfully translated for the treatment of a novel and completely unrelated set of diseases (ocular diseases). Therefore, it is not unexpected to identify a gene signature shared by these “seemingly” unrelated diseases, as presented in this study. In fact, the identification of an effective angiogenesis signature has so far eluded researchers, perhaps because of the challenges in identifying angiogenesis-associated genes. We reasoned that by performing RNA-seq in a well-suited animal model such as the OIR, we would identify a significant number of genes relevant to human angiogenesis-dependent diseases. Furthermore, by using a reliable isogenic animal model, we also avoided the genetic variability of human samples and other confounding factors present in most angiogenesis models (scaffolds, matrigel, viral vector or growth factor administration). In the OIR model, neovascularization is elegantly induced in mouse retinas by varying oxygen levels administered to the animals, closely recapitulating what happens with premature babies, therefore, already providing a model of a human disease (retinopathy of prematurity)[20,47]. Our results thus agree with angiogenesis being an organizing principle preserved in evolution and in health and disease, as proposed by the late Dr. Judah Folkman [2]. The final model of our gene signature included 13 gene-features (11 unique genes), most of them directly implicated in angiogenesis. UHRF1, also known as ICBP90, is an epigenetic regulator, a tumor marker for breast cancer [48], which also modulates epigenetically VEGF gene expression [49]. VEGF, ANGPT2 and APLN are well known players in angiogenesis [5,6]. PIEZO2 is a mechanosensitive ion-channel important in tumor angiogenesis and vascular permeability [50]. SERPINE1 has also been implicated in angiogenesis and pharmacological blockage of serpine1 protein (PAI-1) inhibits tumor angiogenesis [51]. Finally, LAD1 gene methylation is a potential biomarker for angiogenesis therapy in renal cancer [52]. Taken together, these data indicate that 9 out of 13 gene-features (70%) in our final model are directly implicated in tumor angiogenesis activity. However, none of the previously described angiogenesis and hypoxia signatures used a similar combination or even a subset of these 11 genes. Only two previously described signatures combined at most two genes present in our model: VEGF and SERPINE1 [41] or SERPINE1 and SERPINA3 [29]. Thus, our angiogenesis gene signature is unique. Of course, not all angiogenesis is the same. So, an interesting question that future studies may address is whether this same set of 11 genes or other combinations of genes derived from the OIR transcriptome will be prognostic for other types of tumors or angiogenesis dependent diseases. It is significant that our gene signature showed such strong prognostic value in breast cancer considering the confounding benefits of anti-angiogenesis therapy for these patients [8,53,54]. In agreement with clinical data and the fact that the OIR model relies on hypoxia and VEGF to induce angiogenesis, our gene signature performs well for the subtypes of breast cancer tumors that respond to anti-VEGF therapy (basal, luminal A and B), while it has no prognostic value for patients with the HER2+ subtype for which anti-VEGF therapy has no effect [55]. HER2 is an orphan receptor with a very active intracellular domain, whose activation leads to the increased production of VEGF-A and HIF1A, regardless of hypoxia [56,57]. These different mechanisms of activation of angiogenic pathways may explain why our gene signature, which is derived by a model driven by hypoxia, did not perform well in a HER2-driven angiogenesis context [57]. Nevertheless, our results emphasize the contribution of hypoxia and VEGF-driven angiogenesis in specific subtypes of breast cancer. It may also provide a new blueprint for the development of gene signatures and discovery of biomarkers for human disease.

Materials and methods

Ethics statement

Animal study ethical approval was granted by the Animal Study Ethics Committee from the Institute of Chemistry (approval #10/2010).

Mouse model of ROP (OIR)

Mice (C57BL/6, Takoniks) were housed at the Animal Research Facility of the Chemistry Department, University of Sao Paulo. Newborn mice with their nursing mothers were kept on cages with food and water ad libitum. For the OIR model, the animals were exposed to 75% oxygen between P7 and 12, and then returned to room air (~20.8% O2) until sample collection at the specified day and time [18].

RNA extraction, sequencing and analysis

Retinas from mice at four developmental stages (P12, P15 and P17; and at P12.5 – 12h after leaving the 75% oxygen chamber) were dissected in RNAlater (Qiagen) and total RNA was extracted using Qiagen RNeasy Mini Kit (Qiagen). Two OIR experiments were performed and then RNA from 4 animals (retinas N = 8) from each individual experiment were pooled to form the biological duplicates. Samples were treated with DNAse (TURBO DNA-free, Thermo Fisher Scientific) following manufacturer’s protocol and the absence of genomic DNA contamination was confirmed by RT-PCR and was found to be <3 pg/ml. Library preparation and sequencing was performed at the Next Generation Sequencing Core from Scripps Research Institute (San Diego, CA). Strand-specific libraries were generated enriching for polyadenylated transcripts using the NEB Next Ultra Directional RNA Library Prep Kit for Illumina (New England BioLabs), following manufacturer’s protocol. Purified libraries were quantitated using the Qubit quantitation platform (Invitrogen) and sized using RNA Nano Chip (Agilent). Single-read sequencing was performed on the Illumina HiSeq 2000 platform and demultiplexed based on index sequences. RNA-seq reads were mapped with STAR (version 2.5.1b) using 2-pass mapping with annotation procedure. Mus musculus GRCm38.p4 was used as reference genome and GRCm38.83 as reference annotation. Alignments were summarized by gene on counting tables using HTSeq-count (version 0.6.1p1); alignments with quality lower than 10 were discarded.

Gene expression analysis

Differential gene expression analysis was carried out with DESeq2 (version 1.14.1) and control samples were used as baselines. Genes with fold-change greater than 2 (log2) and adjusted p-values < 0.05 were considered as differentially expressed and used on further analysis. Benjamini–Hochberg procedure was used for multiple testing correction. For active genes evaluation, all genes with 10 or more reads on two or more samples were deemed as expressed. Additional gene annotation and homology information was retrieved using biomarRt package (version 2.30.0).

Real-time PCR validation assays

A custom TaqMan low-density array (TLDA) card (Thermo-Fisher Scientific) containing 384-well microfluidic cards with eight ports, each made up of 48 connected wells (42 targets + 6 endogenous controls), was designed. Experiments were carried out using the Applied Biosystems 7900HT real-time PCR platform. Samples were individually tested (no pooling). Technical duplicated for each sample was used and failed experiments were omitted. GeNorm was used to test the most stable normalizers; Tbp (Mm00446971_m1) and Sdha (Mm01352363_m1) were selected and used on following analysis. Relative quantification analysis was performed on Thermo Fisher Cloud system. P12 samples were used as baselines for relative quantification. Experiments with CT > 35 were discarded due to high variance observed. The remaining genes featured in the final model for the angiogenesis gene signature (Angpt2, Apln, Ch25h, Lad1, Mki67, Piezo2, Serpine1, S100a10 and Uhrf1) were validated separately. In brief, 500 ng of total RNA from mouse retinas (OIR or normal retinas) were converted to cDNA using supercript III (Invitrogen Thermo Fisher Scientific) using random hexaprimers. Quantitative PCR was then performed using the SYBR green PCR master mix (Applied Biosystems Thermo Fisher Scientific) and the specific primers () according to the manufacture recommendations in a QuantStudio 3 Real-Time PCR System (at the Center for Advanced Technologies in Genomic [CATG], Chemistry Institute, University of São Paulo, Brazil).

Pathway analyses

Enriched pathways, functions and disease annotations were computed using the IPA platform (Qiagen).

METABRIC data pre-processing

Raw METABRIC breast cancer dataset files [44] were downloaded from the European genome-phenome archive (Study ID: EGAS00000000083). Data was pre-processed, summarized and quantile-normalized from raw abundance files generated via Illumina BeadStudio (R packages: beadarray (v2.4.2) and illuminaHuman (v3.db_1.12.2). Data files from one subject were not available and accordingly excluded, resulting in a total of 1988 subjects. This dataset was divided into a training cohort (n = 996) and a validation cohort (n = 992) for subsequent analysis. Survival data was truncated at 15 years follow-up time.

Evaluation of existing signatures on METABRIC dataset

Training and validation set from METABRIC were combined and each signature was used to score each sample. Features were weighted respecting the expression information provided: +1 for up-regulated genes and -1 for down-regulated ones. If a signature had no expression information, all weights were set to +1. Scores for each sample were normalized between -1 and +1 and samples dichotomized according the percentile of events in the dataset. This sample classification was used to fit a Cox proportional hazard model where hazard ratios were calculated [58]. Median and standard deviation of scores between angiogenesis signatures for each sample were calculated. Majority vote between all angiogenesis signatures was used to call the consensus signature.

Human homolog genes

Out of the 153 DE mouse genes, 10 genes did not have human homologs (Fgf2os, Muc2, Gm15983, Gm12802, Gm43620, H2-Q2, H2-Q6, Gm35040, H2-T23, H2-K1). For the remaining 143 genes, there were cases in which a single mouse gene mapped to multiple human genes (H2-Ab1, H2-Q7, Ifitm3), resulting in 149 human genes. This is the number that was used for comparison with previously published gene angiogenesis and hypoxia signatures (). For building the gene signature, 3 genes within the 149 human homologs were not arrayed on the platform used in the METABRIC study (ADGRL4, HLA-DQB2, ITGA1), thus yielding a total of 146 human genes used in the algorithm. For each of the 146 differentially expressed genes in angiogenesis, continuous and discrete expression information was used [58].

Survival analysis and signature creation

For discretization, expression values were median dichotomized across all samples in the dataset. We performed univariate analysis (Cox model) for each gene and only genes with a p-value < 0.05 were considered on further analysis. Age of diagnosis and tumor stage were supplied for each sample whenever available. We have used randomForestSRC package (v2.2.0) (Ref. [26,53]) to build the model. In summary, features were ranked by p-value and models were built by sequentially adding features (i.e. forward selection). The model with the minimum error was chosen as the final model. Model was applied on the validation samples (not used during training). Patient’s predicted probability of survival was used to assign a class using the following sequential criteria: low-risk if P(survival at 15 years) > 0.5; intermediate-risk if P(survival at 7.5 years) > 0.5; high-risk otherwise. 7.5 years represents the half of the dataset follow-up time after truncation.

Permutation analysis

To evaluate model’s performance against a null distribution, we have used the same strategy as described previously [26,58]. In summary, we restricted the analysis to the same 302 features used for the signature’s creation (149 genes with continuous and discrete information, plus age and stage). We then generated 10 million permutations of 14 features (same number of predictors used in our signature) using METABRIC training for the model’s training. We calculated statistical significance for each model using a log-rank test on METABRIC validation dataset.

Data visualization

Data visualization was performed with the lattice (v0.20–33), latticeExtra (v0.6–28) and BPG (v 5.6.8) packages [59].

Comparison of gene expression quantified by RNA-seq and RT-PCR.

(a). Expression profile of 42 selected genes quantified by RNA-Seq and RT-PCR methods. Bars represent standard error of the mean from independent biological samples (N = 8). Fold-changes were calculated relative to P12 samples. (b) Correlation between expression values calculated by RNA-seq and RT-PCR. (PDF) Click here for additional data file.

Expression of the remaining OIR genes featured in the final model by qRT-PCR and comparison with the RNA-seq data.

(a) Expression profile of 9 selected genes quantified by RNA-Seq and qRT-PCR methods. Bars represent standard error of the mean from independent biological samples (N = 4). Fold-changes were calculated relative to P12 samples. (b) Correlation between expression values calculated by RNA-seq and RT-PCR. (c) Table listing the sequences of the oligonucleotides utilized for the qRT-PCR reactions. (The results for the remaining genes featured in the final model, Vegf and Serpina3n, are shown in Fig 1C in the main text of the manuscript.).
Fig 1

The OIR model and the transcriptome of the angiogenic retina.

(a) Cartoon representation of the OIR model. Neonatal mice with their nursing mothers are exposed to 75% oxygen from day P7 to P12, which reduces VEGF production in the retina and induces vaso-regression and central area vaso-obliteration (VO). Upon return to room air, the now hypoxic retina increases VEGF levels and a pathological angiogenesis states ensues. (b) Confirmation of the pathological angiogenesis condition in OIR-mice used for the RNA-seq study, with central area VO, vascular tufts (*) and the presence of intravitreal vessels (arrows). (c) Expression profile of selected genes quantified by RNA-Seq and RT-PCR methods. Bars represent standard error of the mean from independent biological samples (N = 8). Fold-changes were calculated relative to P12 samples.

(PDF) Click here for additional data file.

Prognostic power of angiogenesis signature on Normal-like breast cancer tumor type.

Our model could not distinguish breast cancer patients with normal-like (P = 0.58; log-rank test) tumor subtypes. It is important to note that this type of tumor had the smallest sample of all breast cancer tumor types in the METABRIC dataset. (PDF) Click here for additional data file.

Prognostic power of existing angiogenesis signature.

All nine previously published angiogenesis signature were assessed on METABRIC cohort. Survival curves of groups (low and high risk) determined by all 9 angiogenesis gene signatures. (PDF) Click here for additional data file.

RNA-seq sequencing summary.

(PDF) Click here for additional data file.

List of 153 genes differentially regulated in pathological angiogenesis.

(PDF) Click here for additional data file.

Gene used in previously described gene signatures for angiogenesis and hypoxia.

(PDF) Click here for additional data file.

Genes shared by all gene signatures.

(PDF) Click here for additional data file.

Remaining features after p-value (P<0.05) filtering (METABRIC).

(PDF) Click here for additional data file.

Rdata object containing the final model for the angiogenesis gene signature.

(ZIP) Click here for additional data file.
Table 1

Angiogenesis gene signatures.

ContributorsReferenceCancer TypePlatformGenes
Hu et al.36NSCLCMicroarray62
Mendiola et al.35AOCqRT-PCR34
Bentink et al.34SOVIllumina DASL, BeadArray100
Masiero et al.33HNSCC, BC, CCRCCMeta signature43
Khong et al.32CCRT Profiler, PCR array9
Pinato et al.31GNTTissue microarray2
Sanmartin et al.30NSCLCqRT-PCR3
Langlois et al.29Glioma, CCMicroarray110
Stefansson et al.28AECMicroarray32

AEC = Aggressive Endometrial Cancer; AOC = Advanced Ovarian Carcinoma; BC = Breast Cancer; CC = Colorectal Cancer; CCRCC = Clear Cell Renal Cell Carcinoma; GNT = Gastrointestinal Neuroendocrine Tumours; HNSCC = Head and Neck Squamous Cell Carcinoma; NSCLC = Non-small-cell lung cancer; SOV = Serous Ovarian Cancer.

Table 2

Clinical characteristis of METABRIC cohort on Training and Validation sets.

 TrainingValidation
Number of patients996992
Age at diagnosis (years)*61.30 (51.13, 70.31)62.66 (51.91, 70.91)
Overall survival (years)*7.06 (3.94, 11.96)7.32 (4.35, 11.90)
Tumour size*23 (17, 30)23 (17, 30)
Lymph nodes positive  
0514528
1174163
29576
> = 3213219
NA06
Stage  
0112
1315187
2533299
37146
491
NA67447
ER status  
Pos798719
Neg198273
PAM50 subtype  
Basal118212
HER286152
Luminal A466255
Luminal B268223
Normal58144
NA06
  59 in total

Review 1.  The mouse retina as an angiogenesis model.

Authors:  Andreas Stahl; Kip M Connor; Przemyslaw Sapieha; Jing Chen; Roberta J Dennison; Nathan M Krah; Molly R Seaward; Keirnan L Willett; Christopher M Aderman; Karen I Guerin; Jing Hua; Chatarina Löfqvist; Ann Hellström; Lois E H Smith
Journal:  Invest Ophthalmol Vis Sci       Date:  2010-06       Impact factor: 4.799

2.  The tumor microenvironment and Immunoscore are critical determinants of dissemination to distant metastasis.

Authors:  Bernhard Mlecnik; Gabriela Bindea; Amos Kirilovsky; Helen K Angell; Anna C Obenauf; Marie Tosolini; Sarah E Church; Pauline Maby; Angela Vasaturo; Mihaela Angelova; Tessa Fredriksen; Stéphanie Mauger; Maximilian Waldner; Anne Berger; Michael R Speicher; Franck Pagès; Viia Valge-Archer; Jérôme Galon
Journal:  Sci Transl Med       Date:  2016-02-24       Impact factor: 17.956

3.  The peptidomimetic Vasotide targets two retinal VEGF receptors and reduces pathological angiogenesis in murine and nonhuman primate models of retinal disease.

Authors:  Richard L Sidman; Jianxue Li; Matthew Lawrence; Wenzheng Hu; Gary F Musso; Ricardo J Giordano; Marina Cardó-Vila; Renata Pasqualini; Wadih Arap
Journal:  Sci Transl Med       Date:  2015-10-14       Impact factor: 17.956

4.  Concordant regulation of gene expression by hypoxia and 2-oxoglutarate-dependent dioxygenase inhibition: the role of HIF-1alpha, HIF-2alpha, and other pathways.

Authors:  Gareth P Elvidge; Louisa Glenny; Rebecca J Appelhoff; Peter J Ratcliffe; Jiannis Ragoussis; Jonathan M Gleadle
Journal:  J Biol Chem       Date:  2006-03-24       Impact factor: 5.157

Review 5.  The neuronal organization of the retina.

Authors:  Richard H Masland
Journal:  Neuron       Date:  2012-10-17       Impact factor: 17.173

Review 6.  Hallmarks of cancer: the next generation.

Authors:  Douglas Hanahan; Robert A Weinberg
Journal:  Cell       Date:  2011-03-04       Impact factor: 41.582

7.  Increased angiogenesis is associated with a 32-gene expression signature and 6p21 amplification in aggressive endometrial cancer.

Authors:  Ingunn M Stefansson; Maria Raeder; Elisabeth Wik; Monica Mannelqvist; Kanthida Kusonmano; Gøril Knutsvik; Ingfrid Haldorsen; Jone Trovik; Anne M Øyan; Karl-H Kalland; Anne Cathrine Staff; Helga B Salvesen; Lars A Akslen
Journal:  Oncotarget       Date:  2015-04-30

8.  Piezo2 protein: A novel regulator of tumor angiogenesis and hyperpermeability.

Authors:  Hong Yang; Chang Liu; Rong-Mei Zhou; Jin Yao; Xiu-Miao Li; Yi Shen; Hong Cheng; Jun Yuan; Biao Yan; Qin Jiang
Journal:  Oncotarget       Date:  2016-07-12

9.  Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene.

Authors:  F M Buffa; A L Harris; C M West; C J Miller
Journal:  Br J Cancer       Date:  2010-01-19       Impact factor: 7.640

10.  Identification of the angiogenic gene signature induced by EGF and hypoxia in colorectal cancer.

Authors:  Tak L Khong; Ngayu Thairu; Helene Larsen; Peter M Dawson; Serafim Kiriakidis; Ewa M Paleolog
Journal:  BMC Cancer       Date:  2013-11-02       Impact factor: 4.430

View more
  5 in total

1.  Molecular Characterization of Limited Ulcerative Colitis Reveals Novel Biology and Predictors of Disease Extension.

Authors:  Carmen Argmann; Minami Tokuyama; Ryan C Ungaro; Ruiqi Huang; Ruixue Hou; Sakteesh Gurunathan; Roman Kosoy; Antonio Di'Narzo; Wenhui Wang; Bojan Losic; Haritz Irizar; Lauren Peters; Aleksandar Stojmirovic; Gabrielle Wei; Phillip H Comella; Mark Curran; Carrie Brodmerkel; Joshua R Friedman; Ke Hao; Eric E Schadt; Jun Zhu; Judy Cho; Noam Harpaz; Marla C Dubinsky; Bruce E Sands; Andrew Kasarskis; Saurabh Mehandru; Jean-Frederic Colombel; Mayte Suárez-Fariñas
Journal:  Gastroenterology       Date:  2021-09-02       Impact factor: 22.682

2.  DMDRMR promotes angiogenesis via antagonizing DAB2IP in clear cell renal cell carcinoma.

Authors:  Yumeng Zhu; Xiaojun Liu; Yang Wang; Yongbo Pan; Xiaoqi Han; Bo Peng; Xu Zhang; Shaoxi Niu; He Wang; Qinong Ye; Yinmin Gu; Shan Gao
Journal:  Cell Death Dis       Date:  2022-05-13       Impact factor: 9.685

Review 3.  Angiogenesis goes computational - The future way forward to discover new angiogenic targets?

Authors:  Abhishek Subramanian; Pooya Zakeri; Mira Mousa; Halima Alnaqbi; Fatima Yousif Alshamsi; Leo Bettoni; Ernesto Damiani; Habiba Alsafar; Yvan Saeys; Peter Carmeliet
Journal:  Comput Struct Biotechnol J       Date:  2022-09-13       Impact factor: 6.155

4.  Gene expression profile of the murine ischemic retina and its response to Aflibercept (VEGF-Trap).

Authors:  Jesús Eduardo Rojo Arias; József Jászai
Journal:  Sci Rep       Date:  2021-07-28       Impact factor: 4.379

5.  PCDHB17P/miR-145-3p/MELK/NF-κB Feedback Loop Promotes Metastasis and Angiogenesis of Breast Cancer.

Authors:  Li Zhu; Yan-Jun Zhang; Bin Wang; Li Yang; Yi-Qiong Zheng; Lin-De Sun; Lin Tian; Tao Chen; Jian-Dong Wang
Journal:  Front Oncol       Date:  2021-07-19       Impact factor: 6.244

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.