Francesca Marzorati1, Chu Wang2, Giulio Pavesi2, Luca Mizzi2, Piero Morandini1.
Abstract
Transcriptomics studies have been facilitated by the development of microarray and RNA-Seq technologies, with thousands of expression datasets available for many species. However, the quality of the data can be highly variable, making the combined analysis of different datasets difficult and unreliable. Most of the microarray data for Medicago truncatula, the barrel medic, have been stored and made publicly accessible on the web database Medicago truncatula Gene Expression Atlas (MtGEA). The aim of this work is to improve the quality of the MtGEA database through a general method based on logical and statistical relationships among parameters and conditions. The initial 716 columns available in the dataset were reduced to 607 by evaluating the quality of the data through the sum of the expression levels over all transcriptome probes and the Pearson correlation among hybridizations. The reduced dataset shows great improvements in the consistency of the data, with a reduction in both false positives and false negatives resulting from Pearson correlation and GO enrichment analysis among genes. The approach we used is of general validity and our intent is to extend the analysis to other plant microarray databases.
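The two quality checks named in the abstract, the per-hybridization sum of expression levels and the Pearson correlation among hybridizations, can be illustrated with a minimal Python sketch. This is not the authors' code (the keywords indicate they worked in R). The toy matrix, the function names, and the 25% tolerance around the median are all hypothetical choices made for illustration; the abstract does not state the cut-offs actually used.

```python
from math import sqrt


def column_sums(matrix):
    """Sum expression values over all probes (rows) for each hybridization (column)."""
    return [sum(col) for col in zip(*matrix)]


def flag_by_sum(sums, tolerance=0.25):
    """Flag columns whose total deviates from the median total by more than
    `tolerance` (a fraction). The 0.25 default is a hypothetical value."""
    ordered = sorted(sums)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [abs(s - median) / median > tolerance for s in sums]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data (invented): rows are probes, columns are hybridizations.
# The last column has a much lower total signal than the others.
matrix = [
    [100.0, 110.0, 105.0, 40.0],
    [200.0, 190.0, 195.0, 90.0],
    [50.0, 55.0, 52.0, 60.0],
    [400.0, 410.0, 405.0, 150.0],
]

sums = column_sums(matrix)
flags = flag_by_sum(sums)

# Correlation between the first two hybridizations, e.g. putative replicates.
columns = list(zip(*matrix))
r_replicates = pearson(columns[0], columns[1])

print("column sums:", sums)
print("flagged by sum:", flags)
print("r(col 0, col 1):", round(r_replicates, 4))
```

In this sketch a column is flagged purely on its total signal. In practice the correlation among replicate hybridizations would serve as a second, independent criterion before a column is discarded.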
Keywords: Medicago; MtGEA; R programming; correlation analysis; functional genomics; microarray; transcriptomics
Year: 2021 PMID: 34207216 PMCID: PMC8234645 DOI: 10.3390/plants10061240
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Figure 1. Scatterplot of the mean expression levels of probes Mtr.41545.1.S1_at and Mtr.16327.1.S1_at. Three outliers are indicated by red arrows.
Figure 2. Graphical representation of the sum of the expression values for the RT_LCM samples [31], compared with the samples adjacent to them in the dataset list. All the other samples have a sum of expression values close to 2.0 × 10^7, whereas the RT_LCM samples range between 1.0 × 10^7 and 1.7 × 10^7.
Figure 3. Heatmaps of Pearson correlation coefficients in logarithmic form for genes of the saponin biosynthetic pathway, with expression values from the original (A) and cleaned (B) Log datasets (Table S12A,B).
Figure 4. Heatmaps of Pearson correlation coefficients in logarithmic form for the bHLH TF family, with expression values from the original (A) and cleaned (B) Log datasets (Table S15A,B).
Figure 5. Comparison of AgriGO enrichment analysis using linear co-expression data for the Mtr.12230.1.S1_at gene (Table S17A). Results obtained from correlation values from the original (A) and cleaned (B) dataset.
Figure 6. Comparison of AgriGO enrichment analysis using logarithmic co-expression data for the Mtr.12230.1.S1_at gene (Table S17B). Results obtained from correlation values from the original (A) and cleaned (B) dataset.