Oriol Iborra-Egea1, Carolina Gálvez-Montón1,2, Cristina Prat-Vidal1,2, Santiago Roura1,2,3, Carolina Soler-Botija1,2, Elena Revuelta-López1,2, Gemma Ferrer-Curriu1, Cristina Segú-Vergés4, Araceli Mellado-Bergillos1, Pol Gomez-Puchades1,2, Paloma Gastelurrutia1,2,5, Antoni Bayes-Genis1,2,6,7.
Abstract
Specific proteins and processes have been identified in post-myocardial infarction (MI) pathological remodeling, but a comprehensive understanding of the complete molecular evolution is lacking. We generated microarray data from swine heart biopsies at baseline and 6, 30, and 45 days after infarction to feed machine-learning algorithms. We cross-validated the results using available clinical and experimental information. MI progression was accompanied by the regulation of adipogenesis, fatty acid metabolism, and epithelial-mesenchymal transition. The infarct core region was enriched in processes related to muscle contraction and membrane depolarization. Angiogenesis was among the first morphogenic responses detected as being sustained over time, but other processes suggesting post-ischemic recapitulation of embryogenic processes were also observed. Finally, protein-triggering analysis established the key genes mediating each process at each time point, as well as the complete adverse remodeling response. We modeled the behaviors of these genes, generating a description of the integrative mechanism of action for MI progression. This mechanistic analysis overlapped at different time points; the common pathways between the source proteins and cardiac remodeling involved IGF1R, RAF1, KPCA, JUN, and PTN11 as modulators. Thus, our data delineate a structured and comprehensive picture of the molecular remodeling process, identify new potential biomarkers or therapeutic targets, and establish therapeutic windows during disease progression.
Keywords: deep learning; gene regulation; myocardial infarction; transcriptomics
Year: 2021 PMID: 34943776 PMCID: PMC8699769 DOI: 10.3390/cells10123268
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Schematic representation of the implemented systems biology workflow. (1) First, we characterized myocardial infarction (MI) at the molecular level via manual curation of the literature and using a compendium of massive public databases describing the molecular interactions of interest. (2) In parallel, we used experimental transcriptomic data to define the molecular behavior of MI at each time point. (3) We used the information to generate a map of proteins regulated by the obtained experimental data. (4) We fed the mathematical models to identify patterns in the data, (5) identify key regulatory proteins, and (6) predict new mechanisms of action. (7) The final result included a scientific justification of every prediction. ANN: artificial neural network.
Number of proteins mechanistically related to cardiac remodeling.
| Protein Localization | C6 | C30 | C45 | R6 | R30 | R45 |
|---|---|---|---|---|---|---|
| Number of proteins | 3302 | 3267 | 2930 | 83 | 77 | 13 |
C: infarct core area; R: remote area; 6: 6 days after infarction; 30: 30 days after infarction; 45: 45 days after infarction.
Figure 2. Principal component analysis. The visual output shows perfect separation of the six groups and the two different regions (upper left plot).
Figure 3. Functional validation of the integration of gene expression data over 6 weeks after myocardial infarction. Using Metascape algorithms, we identified the most affected biological processes according to their GO terminology (A). These processes were then clearly clustered using a network analysis based on their biological significance (B) and p-values (C).
Figure 4. Enrichment and overlap analyses and heatmap clustering. (A) Results of the enrichment and overlap analyses. Upper panels: infarct core region. Lower panels: remote myocardium region. The enrichment score (ES) reflects the degree to which a gene set is over-represented at the top or bottom of a ranked list of genes, with a positive or negative ES indicating gene set enrichment at the top or bottom of the list, respectively. ES is calculated by walking down the ranked list of genes, increasing a running-sum statistic when a gene is in the gene set and decreasing it when it is absent. The magnitude of the increment depends on the correlation of the gene with the phenotype. The ES is the maximum deviation from zero encountered in walking the list. (B,C) Heatmap clustering. (B) The clustered genes in the leading-edge subsets in the upper panels in (A) (core region). (C) The clustered genes in the leading-edge subsets in the lower panels in (A) (remote region). Expression values are represented by colors, with red, pink, light blue, and dark blue indicating high, moderate, low, and lowest expression, respectively.
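The running-sum calculation described in the Figure 4 legend can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `enrichment_score`, the weight exponent `p`, and the six-gene toy ranking are assumptions made purely for illustration.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes: gene names sorted by correlation with the phenotype.
    correlations: the matching correlation values, in the same order.
    gene_set: the set of genes being tested.
    p: weight exponent (assumption: 1.0, so increments scale with |correlation|).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(r) ** p for r, h in zip(correlations, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    running, best = 0.0, 0.0
    for r, h in zip(correlations, hits):
        if h:
            running += abs(r) ** p / hit_norm   # gene is in the set: step up
        else:
            running -= 1.0 / n_miss             # gene is absent: step down
        if abs(running) > abs(best):            # keep the maximum deviation from zero
            best = running
    return best


# Hypothetical toy ranking, not data from the study.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [2.0, 1.5, 0.5, -0.5, -1.5, -2.0]
es_top = enrichment_score(genes, corr, {"g1", "g2"})      # set at the top of the list
es_bottom = enrichment_score(genes, corr, {"g5", "g6"})   # set at the bottom of the list
print(es_top > 0, es_bottom < 0)
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one, matching the sign convention in the legend.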
Figure 5. Protein–protein interaction network analysis. (A) Venn diagram depicting the common enriched pathways between the three different time points. (B) Network analysis using STRING software, including the 105 protein candidates identified by artificial intelligence analysis techniques. Left: the cloud of interactions at 0.9 evidence. Right: the same interactions clustered by k-means for vector quantization at K = 6. Number of nodes: 105; number of edges: 128; average node degree: 2.44; average local clustering coefficient: 0.469; expected number of edges: 62; PPI enrichment p-value: 1.32 × 10⁻⁷.
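As a quick consistency check on the network statistics in the Figure 5 legend, the average node degree of an undirected graph follows directly from the node and edge counts. The sketch below assumes the network is treated as undirected (each edge contributes to the degree of two nodes); the function name is illustrative.

```python
def average_node_degree(n_nodes: int, n_edges: int) -> float:
    """Mean degree of an undirected graph: every edge touches two nodes."""
    return 2 * n_edges / n_nodes


# Counts taken from the Figure 5 legend.
nodes, edges, expected_edges = 105, 128, 62

print(round(average_node_degree(nodes, edges), 2))  # 2.44, as reported
print(round(edges / expected_edges, 2))             # 2.06: about twice the expected number of edges
```

The roughly twofold excess of observed over expected edges is what the reported PPI enrichment p-value quantifies.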
Proteins that overlap between time points regardless of the myocardial region.
| UniProt | Protein Name | Identification Method | Classifier in C6 | Classifier in C30 | Classifier in C45 | Classifier in R6 | Classifier in R30 | Classifier in R45 | Secreted | Related to MI |
|---|---|---|---|---|---|---|---|---|---|---|
| P27487 | DPP4 | Models, models/HT | 1 | 1 | 1 | − | 1 | 1 | 1 | 1 |
| Q6R327 | RICTR | Models | 1 | 1 | 1 | − | − | 1 | 0 | 1 |
| Q15759 | MK11 | Models | 1 | 1 | 1 | − | − | − | 1 | 1 |
| P53778 | MK12 | Models, models/HT, HT/models | 1 | − | 1 | 1 | 1 | − | 1 | 1 |
| P07585 | PGS2 | Models | − | 1 | 1 | 1 | − | 1 | 1 | 1 |
| O75676 | KS6A4 | Models | − | 1 | 1 | 1 | − | − | 1 | 0 |
| Q9UKL0 | RCOR1 | Models | − | − | 1 | 1 | 1 | − | 1 | 0 |
Models: the combinations of proteins that better classify the solution of the model to the corresponding cohort; models/HT: the combinations of proteins that better classify the solution of the model to the corresponding cohort filtered by the proteins acting according to the high-throughput (HT) data; HT/models: the combinations of proteins that better classify the microarray experiments to the corresponding cohort filtered by the proteins relevant in the models; 1: detected/positive; 0: negative; −: not detected.
Best combinations differentiating infarcted and non-infarcted hearts using data from C6, C30, and C45.
| Time Point | Combination | UniProt | Protein Name | Generalization Capability | Accuracy | Type of Results | Secreted a | Related to MI b |
|---|---|---|---|---|---|---|---|---|
| C6 | Combination 1 | P19838 | NFKB1 | 100% | 100% | Models/HT | ✓ | ✓ |
| | | O14641 | DVL2 | | | | ✓ | × |
| | | Q06187 | BTK | | | | ✓ | × |
| | | P01100 | FOS | | | | ✓ | ✓ |
| | | P61981 | 1433G | | | | ✓ | × |
| | Combination 2 | P53778 | MAPK12 | 100% | 100% | Models/HT | ✓ | ✓ |
| | | Q9Y243 | AKT3 | | | | ✓ | ✓ |
| | | P28482 | MAPK01 | | | | ✓ | ✓ |
| | | P19838 | NFKB1 | | | | ✓ | ✓ |
| | | P45984 | MAPK09 | | | | ✓ | ✓ |
| C30 | Combination 1 | P62942 | FKB1A | 100% | 100% | Models/HT | ✓ | ✓ |
| | | Q14790 | CASP8 | | | | ✓ | ✓ |
| | Combination 2 | P62942 | FKB1A | 100% | 100% | Models/HT | ✓ | ✓ |
| | | P30559 | OXYR | | | | ✓ | ✓ |
| C45 | Combination 1 | P27487 | DPP4 | 100% | 100% | Models/HT | ✓ | ✓ |
| | | O75330 | HMMR | | | | ✓ | ✓ |
| | | P03952 | KLKB1 | | | | | |
| | | P07203 | GPX1 | | | | | |
| | | P31645 | SLC6A4 | | | | | |
| | Combination 2 | P62942 | FKB1A | 100% | 100% | Models/HT | ✓ | ✓ |
| | | P03952 | KLKB1 | | | | ✓ | ✓ |
a Indicates whether the protein is secreted and/or has been detected in plasma. b Indicates if there is a previously known relationship between the protein and MI. Models/HT: the combinations of proteins that better classify the solution of the model to the corresponding cohort filtered by the proteins according to the high-throughput (HT) data; MI: myocardial infarction; C6: infarct core area 6 days after infarction; C30: infarct core area 30 days after infarction; C45: infarct core area 45 days after infarction.
Figure 6. Graphic representation of the evolution of affected processes throughout the progression of myocardial infarction. Values on the X-axis show time evolution. Values on the Y-axis indicate upregulation (v > 0), downregulation (v < 0), or a lack of differential expression (v = 0 and matching the X-axis).
Identification of time-dependent source proteins that explain the molecular mechanism of action in myocardial infarction.
| Time-Points of Protein Expression in the Core MI Area | C6 | C30 | C45 |
|---|---|---|---|
| Proteins available for analysis a | 1182 | 1150 | 1033 |
| Maximum % of proteins explained by other differential proteins b | 49% | 48% | 48% |
| Number of source proteins c | 16 | 18 | 11 |
| % of explainable proteins explained by triggering proteins d | 92% | 93% | 89% |
Proteins were identified by triggering-protein analysis. a The number of differential proteins related to cardiac remodeling available to be analyzed for each time point. b The percentage of evaluable proteins that were linked to another of the evaluated proteins (i.e., explainable by other proteins). c The number of source proteins selected. d The percentage of explainable proteins (i.e., proteins in the second row) that are explained by or linked to the selected source proteins. C: infarcted core region.
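The percentages in the table can be turned into approximate protein counts by chaining footnotes b and d: the explainable subset is a fraction of the available proteins, and the subset linked to source proteins is a fraction of the explainable ones. The sketch below is only an estimate, since the published percentages are rounded; the dictionary keys and the function name are illustrative assumptions.

```python
# Values copied from the table above (infarct core region).
table = {
    "C6": {"available": 1182, "pct_explainable": 49, "pct_by_sources": 92, "sources": 16},
    "C30": {"available": 1150, "pct_explainable": 48, "pct_by_sources": 93, "sources": 18},
    "C45": {"available": 1033, "pct_explainable": 48, "pct_by_sources": 89, "sources": 11},
}


def estimate_counts(row):
    """Approximate counts implied by the rounded percentages (estimates only)."""
    explainable = row["available"] * row["pct_explainable"] / 100   # footnote b
    explained = explainable * row["pct_by_sources"] / 100           # footnote d
    return round(explainable), round(explained)


for cohort, row in table.items():
    explainable, explained = estimate_counts(row)
    print(f"{cohort}: ~{explainable} explainable proteins, "
          f"~{explained} linked to {row['sources']} source proteins")
```

Under these assumptions, a small set of source proteins (11 to 18 per time point) accounts for several hundred differential proteins at each time point.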
Figure 7. Mechanisms of action determined by artificial neural network (ANN) analysis. (A) Common mechanistic relationship between cardiac remodeling and the common source proteins (in grey: IGF1R, RAF1, KPCA, JUN, and PTN11) identified at all three analyzed time points in the infarct core region. (B–D) Mechanistic representation of (B) C6–C30 common source proteins (SRC, STAT1, MK03, RAC1, and TF65), (C) C30 source proteins (GRB2 and CDC42), and (D) C45 source proteins (MK01). Continuous colored lines depict links present in only one cohort. Dashed lines depict links present in two cohorts. Continuous grey lines depict links present at all time points. C6: infarct core area 6 days after infarction; C30: infarct core area 30 days after infarction; C45: infarct core area 45 days after infarction.
Proteins involved in the common mechanism of action between time points for the infarct core region and their log ratios at each time point.
| Entry Name | UniProt Code | Log Ratio C6 | Log Ratio C30 | Log Ratio C45 |
|---|---|---|---|---|
| TNFA | P01375 | - | - | - |
| RAF1 | P04049 | −0.69 | −0.90 | −0.71 |
| P53 | P04637 | 0.82 | 0.84 | - |
| JUN | P05412 | −1.49 | −1.46 | −1.36 |
| IGF1R | P08069 | −1.61 | −0.993 | −1.54 |
| THB | P10828 | −1.59 | −2.30 | −1.44 |
| KPCA | P17252 | 2.61 | 2.30 | 1.79 |
| SLC9A1 | P19634 | - | −0.56 | - |
| MAPK03 | P27361 | 0.60 | 0.84 | - |
| MAPK01 | P28482 | - | - | −0.78 |
| MTOR | P42345 | −0.57 | −0.7 | −0.59 |
| TSC2 | P49815 | - | - | 0.75 |
| MMP14 | P50281 | 1.32 | 1.74 | 1.40 |
| RPS6KA3 | P51812 | - | - | −0.49 |
| PTN11 | Q06124 | 0.86 | 0.58 | 0.64 |
| RPS6KA2 | Q15349 | - | - | - |
| RPS6KA1 | Q15418 | - | - | - |
| BAD | Q92934 | 0.90 | 1.07 | 0.81 |
C6: infarct core area 6 days after infarction; C30: infarct core area 30 days after infarction; C45: infarct core area 45 days after infarction.
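To see which proteins in the table above change in the same direction at every time point, the log ratios can be screened with a short script. This is a minimal sketch: entries are keyed by UniProt code, a "-" in the table is assumed to mean that no log ratio was reported and is encoded as `None`, and the function name is illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Log ratios (C6, C30, C45) copied from the table above, keyed by UniProt code.
# None encodes a "-" entry (assumption: no value reported at that time point).
LogRatios = Tuple[Optional[float], Optional[float], Optional[float]]
log_ratios: Dict[str, LogRatios] = {
    "P01375": (None, None, None), "P04049": (-0.69, -0.90, -0.71),
    "P04637": (0.82, 0.84, None), "P05412": (-1.49, -1.46, -1.36),
    "P08069": (-1.61, -0.993, -1.54), "P10828": (-1.59, -2.30, -1.44),
    "P17252": (2.61, 2.30, 1.79), "P19634": (None, -0.56, None),
    "P27361": (0.60, 0.84, None), "P28482": (None, None, -0.78),
    "P42345": (-0.57, -0.7, -0.59), "P49815": (None, None, 0.75),
    "P50281": (1.32, 1.74, 1.40), "P51812": (None, None, -0.49),
    "Q06124": (0.86, 0.58, 0.64), "Q15349": (None, None, None),
    "Q15418": (None, None, None), "Q92934": (0.90, 1.07, 0.81),
}


def consistent_direction(data: Dict[str, LogRatios]) -> Tuple[List[str], List[str]]:
    """Return (always_up, always_down) for entries with a value at all time points."""
    up, down = [], []
    for code, values in data.items():
        if any(v is None for v in values):
            continue                      # skip entries missing at any time point
        if all(v > 0 for v in values):
            up.append(code)
        elif all(v < 0 for v in values):
            down.append(code)
    return up, down


always_up, always_down = consistent_direction(log_ratios)
print("Positive log ratio at C6, C30, and C45:", always_up)
print("Negative log ratio at C6, C30, and C45:", always_down)
```

In this screen, the five common source proteins named in the abstract (P08069, P04049, P17252, P05412, and Q06124) all carry a value at every time point, each with a consistent sign.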