Lucile Mégret¹, Cloé Mendoza¹, Maialen Arrieta Lobo¹, Emmanuel Brouillet¹, Thi-Thanh-Yen Nguyen², Olivier Bouaziz², Antoine Chambaz², Christian Néri¹
Abstract
Micro-RNAs (miRNAs) are short (∼21 nt) non-coding RNAs that regulate gene expression through the degradation or translational repression of mRNAs. Accumulating evidence points to a role of miRNA regulation in the pathogenesis of a wide range of neurodegenerative diseases (NDs), for example Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis and Huntington disease (HD). Several systems-level studies have aimed to explore the role of miRNA regulation in NDs, but such studies remain challenging. Part of the problem may be the lack of sufficiently rich or homogeneous data, such as time series or cell-type-specific data obtained in model systems or human biosamples, that would account for context dependency. Part of the problem may also lie in the methodological challenges of accurately modeling miRNA and mRNA data at the systems level. Here, we critically review the main families of machine learning methods used to analyze expression data, highlighting the added value of shape-analysis concepts for precisely modeling high-dimensional miRNA and mRNA data, such as those obtained in the study of the HD process, and elaborating on the potential of these concepts and methods for modeling complex omics data.
Keywords: complex RNA-seq data; machine learning; miRNA regulation; neurodegenerative disease; precision analysis; shape analysis
Year: 2022 PMID: 36157078 PMCID: PMC9500540 DOI: 10.3389/fnmol.2022.914830
Source DB: PubMed Journal: Front Mol Neurosci ISSN: 1662-5099 Impact factor: 6.261
FIGURE 1 Simplified view of miRNA regulation of gene expression.
Machine learning methods used for research on the association between miRNAs and neurodegenerative diseases.
| Methods | Positive aspects | Negative aspects | Examples of use in the context of miRNA analysis |
|---|---|---|---|
| Ridge regression | Reduces the impact of variables that are not important for the prediction | Does not eliminate irrelevant variables | |
| LASSO | Reduces overfitting by penalizing the coefficients the model overemphasizes, shrinking some of them to zero and thereby eliminating the corresponding variables | Does not handle multicollinearity well and may eliminate relevant independent variables | |
| Elastic net | Combines aspects of both LASSO and Ridge: it eliminates some variables while reducing the impact of others | Computationally more expensive than LASSO or Ridge | |
| Decision tree | Very simple to understand and visualize | Prone to overfitting; does not work well with imbalanced data; a small change in the data can produce a very different tree | |
| Random forest (RF) | Can deal with imbalanced datasets and missing data; as an ensemble of decision trees, it is less prone to overfitting | The number of nodes in each decision tree grows exponentially with depth; the predictions of the individual trees need to be uncorrelated | |
| Gradient boosting | More accurate than RF; does not need bootstrap sampling like RF | Sensitive to outliers; overfitting can become a problem when too many trees are added | |
| Support vector machine (SVM) | Works well in 2D, 3D, or higher-dimensional spaces; outliers have less impact on the prediction because the hyperplane is determined by the support vectors (the data points closest to the hyperplane) | Computationally more expensive for larger datasets; works poorly if the classes overlap | |
| Neural networks | Work very well with large amounts of data; can handle unstructured data | Can quickly become computationally expensive and time-consuming; strongly dependent on the training data, so overfitting can be a problem | |
| | Very simple algorithm to implement | Lacks robustness in big-data analysis; choosing K can be difficult; does not work well with imbalanced data or outliers | |
| | Retains the connectivity of nodes | Can lack biological precision | |
| | Can handle missing data and avoid overfitting | Requires a sensitivity analysis of the outcome | |
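The regularization rows of the table contrast Ridge, which shrinks coefficients without removing variables, with LASSO, which can set coefficients exactly to zero, and with their combination. A minimal pure-Python sketch of that difference (not the authors' code) is a cyclic coordinate-descent solver for an elastic-net objective written, as an assumption, in the scikit-learn convention: (1/2n)·||y - Xw||² + alpha·l1_ratio·||w||₁ + ½·alpha·(1 - l1_ratio)·||w||². Setting `l1_ratio` to 0 gives Ridge and 1 gives LASSO. The function names and the toy data, in which `y` depends strongly on one feature and weakly on the other, are hypothetical.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha, l1_ratio, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2
    by cyclic coordinate descent. l1_ratio=0 is Ridge, l1_ratio=1 is LASSO.
    No intercept is fitted: X columns and y are assumed to be centered."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            # The L1 part soft-thresholds (can zero the weight); the L2 part only rescales it
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - Xw up to date
                w[j] = new_wj
    return w


# Hypothetical centered toy data: y = 2*x0 + 0.1*x1
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
for name, ratio in [("ridge", 0.0), ("lasso", 1.0), ("elastic net", 0.5)]:
    weights = elastic_net(X, y, alpha=0.5, l1_ratio=ratio)
    print(name, [round(c, 3) for c in weights])
# ridge [1.333, 0.067]
# lasso [1.5, 0.0]
# elastic net [1.4, 0.0]
```

With the same penalty strength, Ridge keeps a small nonzero weight on the weakly relevant feature, while LASSO and the 50/50 elastic net drop it entirely, which is the "reduces the impact" versus "eliminates" distinction drawn in the table. Intercept fitting and convergence checks are omitted to keep the sketch short.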
Comparison of miRNAs retained in the striatum of HD model knock-in mice using a WGCNA-centric approach (Langfelder et al., 2018) or the MIRAMINT pipeline (Megret et al., 2020).
| miRNA | Retained by |
|---|---|
| Mir1247 | MIRAMINT |
| Mir132 | MIRAMINT, WGCNA |
| Mir133b | MIRAMINT |
| Mir139 | MIRAMINT, WGCNA |
| Mir187 | MIRAMINT |
| Mir1b | MIRAMINT |
| Mir20b | MIRAMINT |
| Mir222 | MIRAMINT, WGCNA |
| Mir299b | MIRAMINT |
| Mir3102 | MIRAMINT |
| Mir363 | MIRAMINT |
| Mir378b | MIRAMINT |
| Mir484 | MIRAMINT |
| Mir673 | MIRAMINT |
| Mir128-1 | WGCNA |
| Mir212 | WGCNA |
| Mir218 | WGCNA |
| Mir181d | WGCNA |
| Mir128-2 | WGCNA |
| Mir221 | WGCNA |
| Mir29a | WGCNA |
| | WGCNA |
| | WGCNA |
| | WGCNA |
| | WGCNA |
| | WGCNA |
| | WGCNA |
| | WGCNA |
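The overlap between the two approaches in the comparison table can be computed directly from its rows. The sketch below uses a hypothetical helper, `retained_by_pipeline`, and transcribes only the rows that carry a recoverable miRNA name, so the WGCNA set it builds is a lower bound: several WGCNA-only rows have no miRNA name in this copy of the table and are left out.

```python
# Rows transcribed from the comparison table: (miRNA, approaches that retained it).
# WGCNA-only rows whose miRNA name is missing from the table are not included.
ROWS = [
    ("Mir1247", "MIRAMINT"), ("Mir132", "MIRAMINT, WGCNA"), ("Mir133b", "MIRAMINT"),
    ("Mir139", "MIRAMINT, WGCNA"), ("Mir187", "MIRAMINT"), ("Mir1b", "MIRAMINT"),
    ("Mir20b", "MIRAMINT"), ("Mir222", "MIRAMINT, WGCNA"), ("Mir299b", "MIRAMINT"),
    ("Mir3102", "MIRAMINT"), ("Mir363", "MIRAMINT"), ("Mir378b", "MIRAMINT"),
    ("Mir484", "MIRAMINT"), ("Mir673", "MIRAMINT"), ("Mir128-1", "WGCNA"),
    ("Mir212", "WGCNA"), ("Mir218", "WGCNA"), ("Mir181d", "WGCNA"),
    ("Mir128-2", "WGCNA"), ("Mir221", "WGCNA"), ("Mir29a", "WGCNA"),
]


def retained_by_pipeline(rows):
    """Group miRNA names by approach label (a row may list several, comma-separated)."""
    groups = {}
    for mirna, approaches in rows:
        for label in approaches.split(","):
            groups.setdefault(label.strip(), set()).add(mirna)
    return groups


groups = retained_by_pipeline(ROWS)
shared = groups["MIRAMINT"] & groups["WGCNA"]
print(sorted(shared))                             # ['Mir132', 'Mir139', 'Mir222']
print(len(groups["MIRAMINT"] - groups["WGCNA"]))  # 11 retained by MIRAMINT only
print(len(groups["WGCNA"] - groups["MIRAMINT"]))  # 7 named miRNAs retained by WGCNA only
```

Only three miRNAs are shared between the two approaches in the named rows, which is the contrast the table is meant to illustrate.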
FIGURE 2 Examples of an mRNA expression surface negatively correlated with a miRNA expression surface in the striatum of HD mice.
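Figure 2 pairs an mRNA expression surface with a negatively correlated miRNA expression surface. As a hedged illustration of what such a negative correlation means, the sketch below assumes both surfaces are sampled on the same grid (hypothetically indexed by CAG-repeat length and age), flattens them, and computes a plain Pearson correlation. The grid axes, the values, and the helper names are invented for illustration; this simple pointwise correlation is a stand-in, not the article's shape-analysis procedure.

```python
from math import sqrt


def flatten(surface):
    """Row-major list of all grid values of a 2D surface."""
    return [value for row in surface for value in row]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def surface_correlation(surface_1, surface_2):
    """Correlate two expression surfaces sampled on the same grid."""
    if len(surface_1) != len(surface_2) or any(
        len(r1) != len(r2) for r1, r2 in zip(surface_1, surface_2)
    ):
        raise ValueError("surfaces must be sampled on the same grid")
    return pearson(flatten(surface_1), flatten(surface_2))


# Hypothetical surfaces (invented values): rows = CAG-repeat lengths, columns = ages.
mirna_surface = [[0.0, 0.1, 0.2],
                 [0.1, 0.4, 0.7],
                 [0.2, 0.8, 1.5]]
mrna_surface = [[1.0, 0.9, 0.8],
                [0.9, 0.7, 0.4],
                [0.8, 0.3, -0.4]]
print(round(surface_correlation(mirna_surface, mrna_surface), 3))  # -0.997
```

A value near -1 means the mRNA surface falls where the miRNA surface rises, which is consistent with, but does not by itself demonstrate, miRNA-mediated repression.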