Contributions from the 2019 Literature on Bioinformatics and Translational Informatics.

Abstract

OBJECTIVES: Summarize recent research and select the best papers published in 2019 in the field of Bioinformatics and Translational Informatics (BTI) for the corresponding section of the International Medical Informatics Association Yearbook.
METHODS: A literature review was performed to retrieve from PubMed papers indexed with keywords and free terms related to BTI. Independent review allowed the section editors to select a list of 15 candidate best papers, which were subsequently peer-reviewed. A final consensus meeting of the whole Yearbook editorial committee was organized to decide on the selection of the best papers.
RESULTS: Among the 931 retrieved papers covering the various subareas of BTI, the review process selected four best papers. The first paper presents a logical modeling of cancer pathways. Using their tools, the authors are able to identify two known behaviours of tumors. The second paper describes a deep-learning approach to predicting resistance to antibiotics in Mycobacterium tuberculosis. The authors of the third paper introduce a Genomic Global Positioning System (GPS) enabling comparison of genomic data with other individuals or genomics databases while preserving privacy. The fourth paper presents a multi-omics and temporal sequence-based approach to provide a better understanding of the sequence of events leading to Alzheimer's Disease.
CONCLUSIONS: Thanks to the normalization of open data and open science practices, research in BTI continues to develop and mature. Noteworthy achievements are sophisticated applications of leading edge machine-learning methods dedicated to personalized medicine. Georg Thieme Verlag KG Stuttgart.

Year: 2020 PMID: 32823315 PMCID: PMC7442509 DOI: 10.1055/s-0040-1702002

Source DB: PubMed Journal: Yearb Med Inform ISSN: 0943-4747

1 Introduction

Within the 2020 International Medical Informatics Association (IMIA) Yearbook of Medical Informatics, the goal of the Bioinformatics and Translational Informatics (BTI) section is to provide an overview of research trends from 2019 publications that demonstrated excellent research on various aspects of bioinformatics methods and techniques to advance clinical care. In 2008, the American Medical Informatics Association (AMIA) defined translational bioinformatics as “… the development of storage, analytic, and interpretive methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventative, and participatory health” [1]. The first priorities addressed storage, retrieval, and focused analytics of high-throughput data, which motivated numerous research and development studies in the last decade. Today, the topic is clearly coming of age with more ambitious objectives (such as pan-cancer approaches, multi-omics analyses, and drug repurposing) that make use of, among others, the most advanced computational methods such as Artificial Intelligence and Deep Learning.

2 Paper Selection Method

Following the method described in [2], a comprehensive review of articles published in 2019 and addressing various subtopics of BTI was conducted. The selection was performed by querying MEDLINE via PubMed (from NCBI, National Center for Biotechnology Information, NLM, NIH) with a set of predefined MeSH descriptors along with free terms: “Translational informatics”, “Translational bioinformatics”, “Bioinformatics”, “Computational molecular biology”, “Computing Methodologies”, “Information storage and retrieval”, “Pattern recognition”, “automated”, “Medical informatics, Algorithms”, “Translational medical Research”, “Genetics, Medical”, “Precision Medicine”, “Personalized medicine”, “Molecular Medicine”, “Genomic medicine”, “Medical genetics”, “Medical genomics”, “Clinical genomics”, “Genetics”, “Genomics”, “Next-generation sequencing”, “High throughput sequencing”, “Transcriptome”, “Transcriptomics”, “Proteome”, “Proteomics”, “Proteogenomics”, “Epigenomics”, “Metabolomics”, “Metagenomics”, “Large-scale datasets”, “Big data”, “Omics”, and “Multi-omics”. Bibliographic databases were searched on January 20th, 2020 for papers published in 2019, considering the electronic publication date. The original set of 931 references was reviewed jointly by the two section editors to select a consensual list. Based on the title and abstract of the papers, MS and BR selected 41 and 34 references, respectively. Six articles were common to both selections, and 11 were accepted in one selection and pending in the other. The section editors agreed on 15 papers out of these 17 pre-selected references, which were subsequently peer-reviewed by the IMIA Yearbook editors and external reviewers (at least four reviews per paper). Four papers were finally selected as best papers (Table 1). A content summary of these best papers can be found in the appendix of this synopsis.

Table 1

Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2020 in the section ‘Bioinformatics and Translational Informatics’. The articles are listed in alphabetical order of the first author’s surname.

Section: Bioinformatics and Translational Informatics
▪ Béal J, Montagud A, Traynard P, Barillot E, Calzone L. Personalization of logical models with multi-omics data allows clinical stratification of patients. Front Physiol 2019 Jan 24;9:1965.
▪ Chen ML, Doddi A, Royer J, Freschi L, Schito M, Ezewudo M, Kohane IS, Beam A, Farhat M. Beyond multidrug resistance: Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction. EBioMedicine 2019 May;43:356–69.
▪ Kim K, Baik H, Jang CS, Roh JK, Eskin E, Han B. Genomic GPS: using genetic distance from individuals to public data for genomic analysis without disclosing personal genomes. Genome Biol 2019 Dec;20(1):175.
▪ Marttinen M, Paananen J, Neme A, Vikram M, Takalo M, Natune T, Paldanius KMA, Mäkinen P, Bremang M, Kurki MI, Rauramaa T, Leinonen V, Soininen H, Haapasalo A, Pike I, Hiltunen M. A multiomic approach to characterize the temporal sequence in Alzheimer’s disease-related pathology. Neurobiol Dis 2019;124:454–68.

3 Description of Candidate Best Papers and Best Papers

A rapid content analysis of the 931 retrieved references revealed a large proportion of studies dealing with the identification and routine clinical use of genetic variants in connection with various diseases. Through their choices, the section editors wanted to shed light on three research trends and one emerging topic in the BTI field, which are presented below [3-17]. The research trends and the emerging topic cover three different dimensions: methods, application domain, and purpose. Many of the selected articles belong to more than one trend or emerging topic.

3.1 Trend 1: Approaches Based on Machine-learning Methods

The 2019 selection confirms the massive impact of machine learning and neural networks in the biomedical informatics field. Machine learning methods in general, and deep learning more specifically, have been largely adopted by the biomedical community. Last year, successes were mostly obtained in the field of image analysis; in the current edition, new fields of application, including drug sensitivity or resistance prediction and genomics, have been explored successfully. While the term machine learning mostly referred to tools coming from the computer science community, it is worth noting that many articles used the overall term artificial intelligence for any predictive system, including classical statistical approaches such as regression. The use of machine learning libraries such as Keras, TensorFlow, or scikit-learn is also evident and could further generalize the adoption of machine learning methods.

Chen et al. [6] use a wide and deep neural network (DNN) architecture to tackle the problem of multidrug resistance in tuberculosis. To that end, the authors fed the network with features derived from genomic variant data (both rare variants and variants known to cause resistance). Liu et al. [14] also approach the problem of drug sensitivity with a neural network. They designed a twin-convolutional network taking as input SMILES sequences and data derived from gene expression to predict the response to a drug. Leveraging already existing components, the authors came up with a novel DNN architecture for a complex problem. The paper also illustrates well the difficulty of tuning the hyperparameters of DNNs. Béal et al. [5] demonstrate the potential of logical modeling for systems biology to support precision medicine thanks to a sophisticated tailoring protocol. Their so-called PROFILE approach supports patient stratification, which is shown to be correlated with patient grouping on the NPI (Nottingham Prognostic Index) and with survival time. Various strategies are proposed to use patient omics data (mutations, CNA, expression, etc.) to personalize a generic logical model of cancer signaling pathways through stochastic simulations. The paper uses the METABRIC breast cancer data as a proof of concept and points to promising directions for building patient models with mechanistic insights.

Esteban-Medina et al. [8] used public gene expression data and a list of genes that are targets of approved drugs to identify potential causal relationships between proteins and cell activities. They rely on a Multi-Output Random Forest regressor available in scikit-learn, with an optimization strategy built on top of it, to predict circuit activity across the disease pathway. Wan et al. [17] used logistic regression and Support Vector Machines (SVM), associated with dimension reduction methods (principal component analysis and truncated singular-value decomposition), to build a signature of early-stage colorectal cancer from whole-genome sequencing data. They used an original confounder-controlled cross-validation procedure for robust estimation of generalization. Ibrahim et al. [11] used a statistical approach leveraging LASSO attribute selection and Monte-Carlo cross-validation to identify variables predictive of acute kidney injury using proteomics data. The work by Adam et al. [3] tackles the question of machine learning indirectly: the authors emphasize the importance of early standardization and data quality control during data collection to enable later exploitation of the data with machine learning methods.
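The Monte-Carlo cross-validation mentioned for Ibrahim et al. amounts to repeating random train/test splits and averaging the test performance. The sketch below illustrates only that resampling scheme with the Python standard library; it is not the authors' pipeline. The nearest-centroid classifier stands in for their LASSO-based model, and the toy data and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def fit_centroids(rows, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        centroids[label] = [mean(column) for column in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the label of the centroid closest to `row` (squared Euclidean)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def monte_carlo_cv(rows, labels, n_splits=50, test_fraction=0.3, seed=0):
    """Repeat random train/test splits and collect the test accuracy of each."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_test = max(1, int(len(rows) * test_fraction))
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_labels = [labels[i] for i in train_idx]
        if set(train_labels) != set(labels):
            continue  # skip a split whose training part lacks a class
        model = fit_centroids([rows[i] for i in train_idx], train_labels)
        hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return accuracies


def make_toy_data(n_per_class=20, seed=1):
    """Hypothetical two-feature data: class 0 near (0, 0), class 1 near (5, 5)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
scores = monte_carlo_cv(rows, labels)
print(f"mean accuracy over {len(scores)} random splits: {mean(scores):.2f}")
```

Unlike k-fold cross-validation, the random splits may overlap, so the number of repetitions can be chosen independently of the test-set size; the spread of the per-split accuracies gives a rough idea of how stable the estimate is.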

3.2 Trend 2: Genomics, Proteomics, and Multi-omics for the Exploration of a Wide Range of Diseases

This year’s selection and overall review process clearly highlighted a trend already observed last year: the increased use of multi-omics approaches. Nearly all the articles of the top-15 selection used some form of omics data, and most of the works relied on multi-omics approaches. The use of publicly available data, already mentioned in last year’s BTI section, is confirmed. While the main domain of application of the field remains cancer research, numerous other domains of application have been explored. A large variety of methods are used, along with many sources of data, ranging from sequencing technologies that have been largely adopted over the last decade to new methods including circulating DNA sequencing and multiplex proteomics. Among the many applications of BTI in biomedical research, the selection highlighted common tasks: early-stage detection of cancer (e.g., using circulating biomarkers) and recurrence detection [16, 17]; and patient stratification, drug sensitivity prediction, and simulation models [7, 9, 10, 14]. The paper authored by Marttinen et al. [15], which belongs to our top-4 best papers, proposes a multi-omics and temporal sequence-based approach to provide a better understanding of the sequence of events leading to Alzheimer’s Disease (AD). The authors coupled transcriptomic and phosphoproteomic data to determine the temporal sequence of changes in microRNA, protein, and phosphopeptide expression levels in human temporal cortical samples at varying stages of AD. This approach highlighted a significant sequence of key functions occurring at the considered stages of the disease. A rather unusual paper is the one authored by Chung et al. [7], in which a tool called OmicsSIMLA is proposed to simulate genomics (SNPs and CNV), epigenomics (i.e., bisulfite sequencing), transcriptomics, and proteomics data at the whole-genome level. Both the relationships between different types of omics data and the relationships between multi-omics data and disease status have been modeled. If the tool is adopted, OmicsSIMLA would be useful to generate benchmarks for evaluating the performance of methods for analysing multi-omics data. It can also be used to estimate sample sizes when planning a new multi-omics disease study.

3.3 Trend 3: Drug Repositioning and Large-scale Prediction of Drug Sensitivity

The availability of high-throughput technologies (in particular in genomics and proteomics) has increased the volume of open data and, together with the use of artificial intelligence methods, has enabled new tools to identify drugs and potential drug targets in different diseases. Fernández-Navarro et al. [9] used PanDrugs, a previously developed pharmacological resource, to identify targeted treatments against cancers. PanDrugs proposes a large set of drug-target associations and offers scores on gene cancer relevance and drug target to guide the selection of the best-suited treatment. In their work, the authors combined PanDrugs with information coming from RNA and DNA sequencing to suggest new therapies in a case of T-cell acute lymphoblastic leukemia. Graim et al. [10] proposed PLATYPUS, a machine learning approach to identify drug sensitivity signatures from cancer cell-line databases. To overcome the problem of missing data and data sparsity at both learning and prediction time, the authors developed an approach relying on multiple view learning, a semi-supervised method, and leveraged multi-omics data (expression, CNV, mutation, and also sample- and patient-specific information). Liu et al. [14], as mentioned before, also relied on cell-line data, but in combination with features extracted from the SMILES representation of drugs. Esteban-Medina et al. [8] used a machine learning approach, combined with data from KEGG, Orphanet, GEO, GTEx, and DrugBank, to identify drugs that could have an effect on the signaling circuits implicated in Fanconi anemia. Selected in the top-4 papers, Chen et al. [6] used a neural network to predict resistance to antibiotics in Mycobacterium tuberculosis. They leveraged whole-genome sequencing of the pathogen, as well as rare variants and known drug-resistance variants, to predict resistance to 10 anti-tuberculosis drugs. They achieved AUCs over 93% for first-line drugs and 89% for second-line drugs.
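For readers less familiar with the AUC figures quoted for Chen et al., the area under the ROC curve can be read as the probability that a randomly chosen resistant isolate receives a higher predicted score than a randomly chosen susceptible one. The sketch below computes it that way with the standard library only; the function name, the labels, and the scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. resistant), 0 for the negative class.
    scores: predicted scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical predictions for six isolates (three resistant, three susceptible).
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"AUC = {roc_auc(example_labels, example_scores):.3f}")
```

In this toy example, eight of the nine resistant/susceptible pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples; for large cohorts a rank-based implementation, such as the one shipped with scikit-learn, would be the usual choice.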

3.4 Emerging Topic: Ethical Issues Raised in BTI Practices

This year’s special topic, ethics, was already an emerging topic in the previous edition. The coverage of ethical issues remains relatively low, with two articles among the 15 pre-selected in the section. While the widespread use of artificial intelligence methods requires, and will continue to require, ethical reflection and research, it may seem surprising to observe such a low representation in the final selection. We had indeed hypothesized that the culture of data use (including management of private personal information), the secondary use of public data, and the associated regulations (GDPR in Europe, HIPAA in the USA) would have enabled an early emergence of ethical concerns in the field. In this year’s top-4 selection, Kim et al. [13] introduced the Genomic GPS, a method analogous to transport GPS, to enable comparison of genomic information between individuals, or between an individual and a group, without disclosing sensitive genomic information. The second paper illustrating this emerging topic comes from Kim et al. [12] and tackles the issue of racial representation disparity in population genomic sequencing programs. This topic is particularly relevant in the age of artificial intelligence and data-driven models, for which biased training datasets can lead to erroneous models. The authors proposed a method to quantify the difference in ethnic composition between four genomic databases and epidemiological data on the US population.
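The intuition behind the GPS analogy can be sketched as follows: instead of sharing a genome, each individual shares only a vector of distances to a set of public reference genomes, and comparisons are made on those distance vectors. The code below is a minimal illustration of that idea under strong assumptions, not the method of Kim et al.: genotypes are toy vectors of allele counts (0, 1, or 2), the distance is Euclidean, and all names and data are hypothetical. It relies only on the reverse triangle inequality, which guarantees that the largest coordinate-wise gap between two distance profiles never exceeds the true distance between the two genotypes.

```python
from math import sqrt


def genetic_distance(a, b):
    """Euclidean distance between two genotype vectors of allele counts."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same variants")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_profile(genotype, references):
    """Distances from one private genotype to each public reference genotype."""
    return [genetic_distance(genotype, ref) for ref in references]


def distance_lower_bound(profile_a, profile_b):
    """Lower bound on the distance between two individuals from profiles only.

    For every reference r, |d(a, r) - d(b, r)| <= d(a, b), so the largest
    such gap is a valid lower bound on the undisclosed distance d(a, b).
    """
    return max(abs(x - y) for x, y in zip(profile_a, profile_b))


# Hypothetical public reference genotypes (allele counts at six variants).
references = [
    [0, 1, 2, 0, 1, 2],
    [2, 2, 0, 0, 1, 1],
    [1, 0, 1, 2, 2, 0],
]

# Hypothetical private genotypes; only their distance profiles are exchanged.
alice = [0, 1, 2, 1, 1, 2]
bob = [2, 1, 0, 0, 1, 1]

alice_profile = distance_profile(alice, references)
bob_profile = distance_profile(bob, references)
print("lower bound from shared profiles:", distance_lower_bound(alice_profile, bob_profile))
print("true distance (never shared):    ", genetic_distance(alice, bob))
```

How much a distance profile itself reveals about the underlying genome depends on the number and choice of reference points; this sketch makes no privacy claim and serves only to make the analogy concrete.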

4 Conclusion and Outlook

A few interesting papers published in 2019 in the BTI scope matched the special topic “Ethics in Health Informatics” of the 2020 IMIA Yearbook of Medical Informatics. These papers illustrate well the complexity and the constraints induced by the deployment of deep learning techniques, especially in the context of multidisciplinary and personalized care, including molecular characterization of tumors. We noticed a good diversity of methodologies, ranging from traditional regression-based approaches to logical modeling of biological systems, to support several tasks related to precision medicine such as early-stage detection of cancer, disease recurrence detection, and drug sensitivity prediction. Further intelligent approaches are expected in the coming years, combining semantic web languages with clinical data, omics data, and biomolecular knowledge to extract self-explanatory and actionable knowledge nuggets in clinical settings. It is worth noting that many contributions continue to rely on public datasets (such as GEO, KEGG, and Orphanet), as well as on open machine learning libraries, tools, and systems.

16 in total

1. Toward a formalization of the process to select IMIA Yearbook best papers.

Authors: J-B Lamy; B Séroussi; N Griffon; G Kerdelhué; M-C Jaulent; J Bouaud
Journal: Methods Inf Med Date: 2014-11-14 Impact factor: 2.176

2. Utilization of tumor genomics in clinical practice: an international survey among ASCO members.

Authors: Romualdo Barroso-Sousa; Hao Guo; Piyush Srivastava; Ted James; Walter Birch; Lillian L Siu; William P Tew; Sara M Tolaney
Journal: Future Oncol Date: 2019-06-26 Impact factor: 3.404

3. Racial Representation Disparity of Population-Level Genomic Sequencing Efforts.

Authors: Isaac E Kim; Indra Neil Sarkar
Journal: Stud Health Technol Inform Date: 2019-08-21

4. A multi-omics data simulator for complex disease studies and its application to evaluate multi-omics data analysis methods for disease classification.

Authors: Ren-Hua Chung; Chen-Yu Kang
Journal: Gigascience Date: 2019-05-01 Impact factor: 6.524

5. A novel algorithm for network-based prediction of cancer recurrence.

Authors: Jianhua Ruan; Md Jamiul Jahid; Fei Gu; Chengwei Lei; Yi-Wen Huang; Ya-Ting Hsu; David G Mutch; Chun-Liang Chen; Nameer B Kirma; Tim H-M Huang
Journal: Genomics Date: 2016-07-21 Impact factor: 5.736

6. Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA.

Authors: Nathan Wan; David Weinberg; Tzu-Yu Liu; Katherine Niehaus; Eric A Ariazi; Daniel Delubac; Ajay Kannan; Brandon White; Mitch Bailey; Marvin Bertin; Nathan Boley; Derek Bowen; James Cregg; Adam M Drake; Riley Ennis; Signe Fransen; Erik Gafni; Loren Hansen; Yaping Liu; Gabriel L Otte; Jennifer Pecson; Brandon Rice; Gabriel E Sanderson; Aarushi Sharma; John St John; Catherina Tang; Abraham Tzou; Leilani Young; Girish Putcha; Imran S Haque
Journal: BMC Cancer Date: 2019-08-23 Impact factor: 4.430

7. PLATYPUS: A Multiple-View Learning Predictive Framework for Cancer Drug Sensitivity Prediction.

Authors: Kiley Graim; Verena Friedl; Kathleen E Houlahan; Joshua M Stuart
Journal: Pac Symp Biocomput Date: 2019

8. Exploring the druggable space around the Fanconi anemia pathway using machine learning and mechanistic models.

Authors: Marina Esteban-Medina; María Peña-Chilet; Carlos Loucera; Joaquín Dopazo
Journal: BMC Bioinformatics Date: 2019-07-02 Impact factor: 3.169

9. A clinical, proteomics, and artificial intelligence-driven model to predict acute kidney injury in patients undergoing coronary angiography.

Authors: Nasrien E Ibrahim; Cian P McCarthy; Shreya Shrestha; Hanna K Gaggin; Renata Mukai; Craig A Magaret; Rhonda F Rhyne; James L Januzzi
Journal: Clin Cardiol Date: 2019-01-08 Impact factor: 2.882

10. The use of PanDrugs to prioritize anticancer drug treatments in a case of T-ALL based on individual genomic data.

Authors: Pablo Fernández-Navarro; Pilar López-Nieva; Elena Piñeiro-Yañez; Gonzalo Carreño-Tarragona; Joaquín Martinez-López; Raúl Sánchez Pérez; Ángel Aroca; Fátima Al-Shahrour; María Ángeles Cobos-Fernández; José Fernández-Piqueras
Journal: BMC Cancer Date: 2019-10-26 Impact factor: 4.430
