Antonio Federico1,2, Angela Serra1,2, My Kieu Ha3,4,5, Pekka Kohonen6,7, Jang-Sik Choi3,4,5, Irene Liampa8, Penny Nymark6,7, Natasha Sanabria9, Luca Cattelani1,2, Michele Fratello1,2, Pia Anneli Sofia Kinaret1,2,10, Karolina Jagiello11,12, Tomasz Puzyn11,12, Georgia Melagraki13, Mary Gulumian9,14, Antreas Afantitis13, Haralambos Sarimveis8, Tae-Hyun Yoon3,4,5, Roland Grafström6,7, Dario Greco1,2,10.
Abstract
Preprocessing of transcriptomics data plays a pivotal role in the development of toxicogenomics-driven tools for chemical toxicity assessment. The generation and exploitation of large volumes of molecular profiles, following an appropriate experimental design, allows the employment of toxicogenomics (TGx) approaches for a thorough characterisation of the mechanism of action (MOA) of different compounds. To date, a plethora of data preprocessing methodologies have been suggested. However, in most cases, building the optimal analytical workflow is not straightforward. The right tools must be selected carefully, since the choice affects the downstream analyses and modelling approaches. Transcriptomics data preprocessing spans multiple steps, such as quality check, filtering, normalization, and batch effect detection and correction. Currently, there is a lack of standard guidelines for data preprocessing in the TGx field. Defining the optimal tools and procedures to be employed in transcriptomics data preprocessing will lead to the generation of homogeneous and unbiased data, allowing the development of more reliable, robust and accurate predictive models. In this review, we outline methods for the preprocessing of three main transcriptomic technologies: microarray, bulk RNA-Sequencing (RNA-Seq), and single cell RNA-Sequencing (scRNA-Seq). Moreover, we discuss the most common methods for identifying differentially expressed genes and for performing functional enrichment analysis. This review is the second part of a three-article series on Transcriptomics in Toxicogenomics.
Keywords: RNA-Seq; batch effect; data preprocessing; differential expression; microarray; normalization; quality check; scRNA-Seq; toxicogenomics; transcriptomics
Year: 2020 PMID: 32397130 PMCID: PMC7279140 DOI: 10.3390/nano10050903
Source DB: PubMed Journal: Nanomaterials (Basel) ISSN: 2079-4991 Impact factor: 5.076
Figure 1. Data preprocessing schema for microarray. The brown box indicates the input of the pipeline. The green box indicates the output of the pipeline. The blue boxes show the intermediate steps of the pipeline; the software/packages employed in each step are listed above or below the corresponding box.
Figure 2. Panel A: Prince plot showing the association between the technical variables and the principal components. The text and the background color in each cell represent the association p-value. The row label underlined with a solid blue line represents the variable of interest. The row labels underlined with a dotted purple line represent other sources of high variation. Panel B: Confounding plot, representing the correlation among the technical variables. The row label underlined with a solid blue line represents the variable of interest. The green squares represent the variables confounded with the variable of interest or with other batch variables. The row labels outlined in red are batch variables suitable for correction.
Figure 3. Data preprocessing schema for RNA-Seq. The brown box indicates the input of the pipeline. The green box indicates the output of the pipeline. The blue boxes show the intermediate steps of the pipeline; the software/packages employed in each step are listed above or below the corresponding box.
Figure 4. Data preprocessing schema for single-cell RNA-Seq. The brown box indicates the input of the pipeline. The green box indicates the output of the pipeline. The blue boxes show the intermediate steps of the pipeline; the software/packages employed in each step are listed above or below the corresponding box.
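The kind of association table summarised in Figure 2, Panel A (technical variables against principal components, one p-value per cell) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the expression matrix and the `batch` and `dye` variables are synthetic, and the p-values come from a permutation test on the variance explained, which is not necessarily the test behind the figure.

```python
import numpy as np

def principal_components(expr, n_components):
    """Return sample scores (samples x n_components) for a genes x samples matrix."""
    centered = expr - expr.mean(axis=1, keepdims=True)  # center each gene
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def variance_explained(scores, labels):
    """Fraction of a component's variance explained by a categorical variable."""
    grand_mean = scores.mean()
    total = np.sum((scores - grand_mean) ** 2)
    between = sum(
        np.sum(labels == g) * (scores[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )
    return between / total

def association_pvalues(expr, variables, n_components=3, n_perm=999, seed=0):
    """Permutation p-value for each (variable, principal component) pair."""
    rng = np.random.default_rng(seed)
    pcs = principal_components(expr, n_components)
    table = {}
    for name, labels in variables.items():
        labels = np.asarray(labels)
        pvals = []
        for k in range(n_components):
            observed = variance_explained(pcs[:, k], labels)
            # Count shuffled labelings that explain at least as much variance.
            hits = sum(
                variance_explained(pcs[:, k], rng.permutation(labels)) >= observed
                for _ in range(n_perm)
            )
            pvals.append(float((hits + 1) / (n_perm + 1)))
        table[name] = pvals
    return table

# Synthetic example (assumed data): 200 genes, 12 samples, two technical variables.
rng = np.random.default_rng(42)
expr = rng.normal(size=(200, 12))
batch = np.array(["A"] * 6 + ["B"] * 6)
dye = np.array(["cy3", "cy5"] * 6)
expr[:, batch == "B"] += 2.0  # inject a strong batch shift on every gene

result = association_pvalues(expr, {"batch": batch, "dye": dye})
for name, pvals in result.items():
    print(name, [round(p, 3) for p in pvals])
```

In this toy setup the injected shift dominates the first principal component, so the `batch` row should show a small p-value there. On real data, a variable that associates strongly with the leading components and is not confounded with the variable of interest would be a candidate for batch correction.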