Jolanda H M van Bilsen1, Remon Dulos1, Mariël F van Stee1, Marie Y Meima1, Tanja Rouhani Rankouhi1, Lotte Neergaard Jacobsen2, Anne Staudt Kvistgaard2, Jossie A Garthoff3, Léon M J Knippels4,5, Karen Knipping4,5, Geert F Houben1, Lars Verschuren1, Marjolein Meijerink1, Shaji Krishnan1.
Abstract
A healthy immune status is strongly conditioned during early life stages. Insights into the molecular drivers of early life immune development and function are a prerequisite for identifying strategies to enhance immune health. Even though several starting points for targeted immune modulation have been identified and are being developed into prophylactic or therapeutic approaches, there is no regulatory guidance on how to assess the risk-benefit balance of such interventions. Six early life immune causal networks, each covering a different time period in early life (the 1st, 2nd, and 3rd trimester of gestation, birth, newborn, and infant period), were generated. To this end, information was extracted and structured from early life literature using the automated text mining and machine learning tool Integrated Network and Dynamical Reasoning Assembler (INDRA). The tool identified relevant entities (e.g., genes/proteins/metabolites/processes/diseases), extracted causal relationships among these entities, and assembled them into early life immune causal networks. These causal early life immune networks were denoised using GeneMania and enriched with data from the gene-disease association database DisGeNET and the Gene Ontology resource tools (GO/GO-SLIM); missing relationships were then inferred and expert knowledge was added to generate information-dense early life immune networks. Analysis of the six early life immune networks by PageRank not only confirmed the central role of the "commonly used immune markers" (e.g., chemokines, interleukins, IFN, TNF, TGFB) and other immune activation regulators (e.g., CD55, FOXP3, GATA3, CD79A, C4BPA), but also identified less obvious candidates (e.g., CYP1A2, FOXK2, NELFCD, RENBP).
Comparison of the different early life periods resulted in the prediction of 11 key early life genes overlapping all early life periods (TNF, IL6, IL10, CD4, FOXP3, IL4, NELFCD, CD79A, IL5, RENBP, and IFNG), and also of genes that were only described in certain early life period(s). In conclusion, we describe a network-based approach that provides a science-based and systematic method to explore the functional development of the early life immune system through time. This systems approach aids the generation of a testing strategy for the safety and efficacy of early life immune modulation by predicting the key candidate markers during different phases of early life immune development.
Keywords: biomarkers; early life; immune networks; machine learning; text mining
Year: 2020 PMID: 32362896 PMCID: PMC7182036 DOI: 10.3389/fimmu.2020.00644
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Bioinformatics workflow to generate human early life networks. (A) An expert-based selection of early life immune manuscripts was divided into 6 early life time periods and subjected to the INDRA text mining tool. This resulted in 6 causal INDRA networks. (B) The gene-gene connections of the INDRA networks were denoised and validated for the human situation by GeneMania. (C) DisGeNET and Gene Ontology tools (GO and GOslim) enriched the denoised early life networks by adding gene-disease connections and gene-process/pathway connections. (D) Inference calculations enriched the early life networks further by adding process-disease and disease-immune health endpoint connections. All steps together resulted in 6 human early life immune networks. The results of the different programming steps are depicted in Tables 2–4 as indicated.
Figure 2. Workflow to generate the basis of the early life immune networks from literature. Six causal early life immune networks, each covering a different early life period, were generated by selecting appropriate manuscripts from the literature, after which relationships between biological entities were extracted by the text mining tool INDRA. Next, INDRA assembled, de-duplicated, and standardized all relationships into causal early life immune networks. These INDRA networks formed the basis of the early life immune networks. *Several unique articles cover multiple early life periods.
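The assembly step described in this caption (standardizing and de-duplicating extracted relationships) can be illustrated with a minimal sketch. It does not use the INDRA API; the function `assemble`, the triple layout, and the toy statements are hypothetical and serve only to show the idea of collapsing repeated relationships while counting how often each was found.

```python
from collections import Counter


def assemble(statements):
    """Collapse repeated (subject, relation, object) triples and count their support.

    Hypothetical helper for illustration; INDRA's own assembly is more elaborate.
    """
    counts = Counter()
    for subj, rel, obj in statements:
        # Standardize naming so that "il6 " and "IL6" refer to the same entity.
        key = (subj.strip().upper(), rel.strip().lower(), obj.strip().upper())
        counts[key] += 1
    return dict(counts)


# Toy statements, as if extracted from three different sentences.
raw = [
    ("IL6", "activates", "STAT3"),
    ("il6 ", "Activates", "stat3"),
    ("TNF", "increases", "IL6"),
]
network = assemble(raw)
for triple, support in sorted(network.items()):
    print(triple, "supported by", support, "statement(s)")
```

Keeping the support count per relationship is a design choice of this sketch: it allows weakly supported edges to be filtered or down-weighted later.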
Search strings used to assess the available literature on the functional developmental stages of the immune system in humans and experimental animals. The databases Scopus and Medline were searched.
| Thymus OR spleen OR lymph nodes OR Peyer's patches OR bone marrow OR liver | Human OR mini pig | •Functional AND developmental AND stages |
| Cord blood | Human OR mini pig | •Functional AND developmental AND stages |
| Human OR mini pig | •Functional AND developmental AND stages AND (amniotic fluid) OR placenta OR ( |
No additional organ/tissue-specific term used in this search string which is specifically aimed at the gestational phase.
Developmental early life stages in human, minipig, rat, and mouse [adapted from (8)].
| Species | EG | MG | LG | Birth | Newborn | Infant |
| Human | GD0–GW12 | GW13–28 | GW29–40 | – | 0–28 days | 1–23 months |
| Minipig | GD0–GD37 | GD38–75 | GD76–113 | – | 0–15 days | 2–4 weeks |
| Rat | GD0–6 | GD7–13 | GD14–21 | – | 0–7/10 days | 1/1.5–3 weeks |
| Mouse | GD0–6 | GD7–13 | GD14–21 | – | 0–7/10 days | 1/1.5–3 weeks |
Starts at fertilization/conception.
EG/MG/LG, early/mid/late gestational period.
GD, gestational day; GW, gestational week.
Figure 3. Overview of the steps used to enrich the INDRA networks. The genes described in early life literature (level 1) were entered in (i) DisGeNET to add gene-disease relationships to the network (level 1–4) and (ii) the Gene Ontology tools GO/GO-SLIM to add gene-sub bioprocess (level 1–2), sub bioprocess-parent bioprocess (level 2–3), and gene-parent bioprocess (level 1–3) relationships. Next, the GO terms linking to immune health features described previously in Meijerink et al. (16) were added to the network (level 2–5; blue arrow). The associations between bioprocesses and diseases (level 3–4) and between diseases and immune health features (level 4–5) were inferred (black arrows) based on the previous enrichment steps (orange arrows).
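One way to picture the inference of bioprocess-disease associations (level 3–4) is as a composition of two existing edge sets through shared genes. The sketch below assumes that a process and a disease are linked whenever at least one gene is connected to both; this rule, the function name `infer_process_disease`, and the placeholder identifiers are assumptions for illustration, since the caption only states that the associations were inferred from the earlier enrichment steps.

```python
from collections import defaultdict


def infer_process_disease(gene_process, gene_disease):
    """Infer (process, disease) links supported by at least one shared gene.

    Returns a dict mapping each inferred pair to the set of supporting genes.
    """
    processes_by_gene = defaultdict(set)
    for gene, process in gene_process:
        processes_by_gene[gene].add(process)

    support = defaultdict(set)
    for gene, disease in gene_disease:
        # .get avoids creating empty entries for genes without any process.
        for process in processes_by_gene.get(gene, ()):
            support[(process, disease)].add(gene)
    return dict(support)


# Placeholder identifiers, not real annotations.
gene_process = [("GENE_A", "process_1"), ("GENE_B", "process_1"), ("GENE_B", "process_2")]
gene_disease = [("GENE_A", "disease_X"), ("GENE_B", "disease_X"), ("GENE_C", "disease_Y")]

inferred = infer_process_disease(gene_process, gene_disease)
for (process, disease), genes in sorted(inferred.items()):
    print(process, "->", disease, "via", sorted(genes))
```

The same composition could be applied one level up (disease to immune health feature), again under the assumption that shared intermediate nodes justify the link.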
Figure 4. Early life immune networks based on information from early life immune literature and enriched with information from databases and inference steps, each covering a different phase of early life. (A–C) EG, MG, and LG; (D) birth; (E) newborn (0–28 days); (F) infant (1–24 months); (G) magnification of the infant network.
Number of edges between genes described in early life (literature info) and their presence in the human GeneMania database.
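| Period | Genes/proteins in literature network | Gene-gene edges from literature | Edges present in GeneMania (% of literature edges) |
| EG | 440 | 228 | 72 (32%) |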
| MG | 477 | 278 | 84 (30%) |
| LG | 508 | 319 | 90 (28%) |
| Birth | 225 | 162 | 49 (30%) |
| Newborn | 291 | 249 | 68 (27%) |
| Infant | 232 | 174 | 51 (29%) |
EG/MG/LG, early/mid/late gestation.
Sometimes it was not possible to distinguish protein names from corresponding gene names in literature. Therefore, all those names were annotated as being both a protein and a gene and regarded as 1 node in the network.
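The denoising step can be sketched as keeping only those literature-derived gene-gene edges that also occur in a reference edge set. The code below is an illustration under stated assumptions: edges are treated as undirected, the toy edge lists are made up and do not reflect GeneMania content, and the table counts are read as (gene-gene edges from literature, edges also present in GeneMania), with the percentage taken as their ratio.

```python
def undirected(edges):
    """Represent each edge as an unordered pair, dropping self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def denoise(literature_edges, reference_edges):
    """Keep only literature edges that are also found in the reference set."""
    return undirected(literature_edges) & undirected(reference_edges)


# Toy example with made-up edges.
literature = [("TNF", "IL6"), ("IL6", "TNF"), ("IL10", "FOXP3"), ("CD4", "RENBP")]
reference = [("IL6", "TNF"), ("FOXP3", "IL10")]
kept = denoise(literature, reference)
print(len(kept), "of", len(undirected(literature)), "literature edges retained")

# Counts taken from the table above: (literature edges, edges present in GeneMania).
table = {
    "EG": (228, 72),
    "MG": (278, 84),
    "LG": (319, 90),
    "Birth": (162, 49),
    "Newborn": (249, 68),
    "Infant": (174, 51),
}
retained = {period: round(100 * found / total) for period, (total, found) in table.items()}
print(retained)
```

Under this reading, the recomputed percentages agree with the rounded values reported in the table, which supports interpreting the last column as the share of literature edges confirmed by the reference database.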
Results of enrichment/inference steps of the early life denoised INDRA immune networks.
| EG | 149 | 9,546 | 443 | 3,894 | 1,121 | 1,701 | 1,023 |
| MG | 160 | 10,195 | 517 | 4,089 | 1,132 | 1,908 | 1,029 |
| LG | 180 | 10,968 | 546 | 4,568 | 1,246 | 2,207 | 1,136 |
| Birth | 67 | 3,929 | 168 | 1,719 | 695 | 1,073 | 627 |
| Newborn | 102 | 6,159 | 231 | 2,759 | 832 | 1,215 | 752 |
| Infant | 86 | 4,980 | 296 | 2,233 | 770 | 823 | 706 |
Depicted is the number of connections (edges) between biological entities (genes, bioprocesses, diseases, immune endpoints) added to the INDRA immune networks. EG/MG/LG, early/mid/late gestation.
Enriched early life immune network nodes.
| Node type | EG | MG | LG | Birth | Newborn | Infant |
| Proteins/genes | 440 | 477 | 508 | 225 | 291 | 232 |
| Protein families | 101 | 110 | 114 | 62 | 72 | 55 |
| Chemicals | 175 | 189 | 211 | 93 | 128 | 106 |
| Bioprocesses | 51 | 56 | 58 | 36 | 39 | 34 |
| GO processes | 3,709 | 3,868 | 3,988 | 1,947 | 2,751 | 2,289 |
| GOslim processes | 55 | 55 | 55 | 59 | 60 | 59 |
| Diseases | 351 | 352 | 400 | 245 | 282 | 257 |
| Immune health endpoint | 4 | 4 | 4 | 4 | 4 | 4 |
Depicted are the number of nodes in the networks after all enrichment/inference steps. These networks formed the basis of the gene prioritization (see .
Using text mining, it was not always possible to distinguish genes from proteins (often same name used).
Bioprocesses identified by ontology of INDRA text mining tool.
List of prioritized genes per early life time period.
The PageRank score was calculated for all nodes in order to identify the most “central” genes in each network. The top 50 genes (i.e., those with the highest PageRank scores) per network are depicted, including their PageRank scores. EG/MG/LG, early/mid/late gestation. Descriptions of the genes are described in .
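The prioritization described here can be sketched with a plain power-iteration PageRank followed by a top-k selection restricted to gene nodes. This is a minimal illustration, not the authors' implementation: the damping factor of 0.85, the handling of dangling nodes, the helper names `pagerank` and `top_genes`, and the placeholder node names are all assumptions.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank scores for a directed edge list by power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def top_genes(rank, genes, k=50):
    """Return the k highest-ranked nodes that are genes (ties broken by name)."""
    candidates = [g for g in genes if g in rank]
    return sorted(candidates, key=lambda g: (-rank[g], g))[:k]


# Placeholder network: three nodes point to HUB, which points back to A.
edges = [("A", "HUB"), ("B", "HUB"), ("C", "HUB"), ("HUB", "A")]
scores = pagerank(edges)
print(top_genes(scores, {"A", "B", "C", "HUB"}, k=2))
```

In a network that also contains processes and diseases, scoring all nodes but reporting only genes lets non-gene nodes contribute to a gene's centrality without appearing in the ranked list.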
Figure 5. Venn diagram depicting unique and shared sets of genes from the top 50 gene lists of the different early life phases (Table 6); (A) number of genes and (B) gene names. For the gestational phases, the top 50 gene lists of the early, mid, and late periods were combined, resulting in 67 unique genes. EG/MG/LG, early/mid/late gestation.
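The regions of such a Venn diagram can be computed with set operations: the three gestational lists are first merged by union, and each gene is then assigned to the exact combination of phases in which it appears. The sketch below uses placeholder gene identifiers (G1 to G6) rather than the actual top 50 lists, and the function name `venn_regions` is hypothetical.

```python
def venn_regions(sets_by_phase):
    """Map each combination of phases to the genes found in exactly those phases."""
    regions = {}
    all_genes = set().union(*sets_by_phase.values())
    for gene in all_genes:
        members = frozenset(p for p, genes in sets_by_phase.items() if gene in genes)
        regions.setdefault(members, set()).add(gene)
    return regions


# Placeholder top lists per period (the real lists contain 50 genes each).
top = {
    "EG": {"G1", "G2"},
    "MG": {"G1", "G3"},
    "LG": {"G1", "G4"},
    "Birth": {"G1", "G5"},
    "Newborn": {"G1", "G2"},
    "Infant": {"G1", "G6"},
}

# Combine the gestational periods into one phase, as done for the figure.
phases = {
    "Gestation": top["EG"] | top["MG"] | top["LG"],
    "Birth": top["Birth"],
    "Newborn": top["Newborn"],
    "Infant": top["Infant"],
}

regions = venn_regions(phases)
for members, genes in sorted(regions.items(), key=lambda item: (-len(item[0]), sorted(item[0]))):
    print(sorted(members), sorted(genes))
```

The region whose key contains every phase corresponds to the genes shared across all early life periods; regions with a single phase hold the period-specific genes.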