Xueyan Liu, Nan Li, Sheng Liu, Jun Wang, Ning Zhang, Xubin Zheng, Kwong-Sak Leung, Lixin Cheng.
Abstract
Dozens of normalization methods for correcting experimental variation and bias in high-throughput expression data have been developed over the last two decades. As many as 23 of them account for the skewness of expression data between sample states, outnumbering conventional methods such as loess and quantile normalization. From the perspective of reference selection, we classified the normalization methods for skewed expression data into three categories: data-driven reference, foreign reference, and entire gene set. Based on these reference selection strategies, we separately introduced and summarized the normalization methods designed for gene expression data with a global shift between compared conditions, covering both microarray and RNA-seq. To the best of our knowledge, this is the most comprehensive review of available preprocessing algorithms for unbalanced transcriptome data. The dissection and summary of these methods shed light on the understanding and appropriate application of preprocessing methods.
Keywords: RNA-seq; microarray; normalization; regression; subset reference; transcriptome
Year: 2019 PMID: 32039167 PMCID: PMC6988798 DOI: 10.3389/fbioe.2019.00358
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
A summary of 23 normalization methods developed for unbalanced transcriptome data.
| No. | Method | Full name | Platform | Reference | Normalization | Language |
|---|---|---|---|---|---|---|
| 1 | GRSN (Pelz et al., | Global rank-invariant set normalization | Oligo array | Subset | Lowess | R |
| 2 | Xcorr (Chua et al., | Cross-correlation normalization | cDNA array, Oligo array | Subset | Non-linear regression | Matlab |
| 3 | NVSA (Ni et al., | Non-parametric variable selection and approximation | Oligo array | Subset | Non-linear regression | Matlab |
| 4 | KDWL (Hsieh et al., | Kernel density weighted loess normalization | Oligo array | Subset | Loess | SAS |
| 5 | KDQ (Hsieh et al., | Kernel density quantile normalization | Oligo array | Subset | Quantile | SAS |
| 6 | IRON (Welsh et al., | Iterative rank-order normalization | Oligo array | Subset | Loess | C |
| 7 | LVS (Calza et al., | Least-variant set normalization | Oligo array | Subset | Non-linear regression | R |
| 8 | LVSmiR (Suo et al., | Modified least-variant set normalization | Oligo miRNA array | Subset | Non-linear regression | R |
| 9 | Invariants normalization (Pradervand et al., | Invariants normalization | Oligo miRNA array | Subset | Non-linear regression | R |
| 10 | HMM-normalization (Landfors et al., | HMM assisted normalization | Microarray, RNA-seq | Subset | Further subset normalization | R |
| 11 | BSN (Aanes et al., | Biological scaling normalization | RNA-seq | Subset | | R |
| 12 | SVR (Fujita et al., | Support vector regression | cDNA, Oligo array | Subset | Non-linear regression | |
| 13 | ISN (Li and Hung Wong, | Invariant set normalization (in dChip) | Oligo array | Subset | – | R |
| 14 | Spike-in controls (Choe et al., | Spike-in standards | Microarray, RNA-seq | Negative controls | – | – |
| 15 | wlowess (Oshlack et al., | Weighted lowess normalization | cDNA array, Oligo array | Negative controls | Loess | R |
| 16 | wcloess (Wu et al., | Weighted cyclic loess normalization | Oligo miRNA array | Negative controls | Loess | R |
| 17 | SQN (Wu and Aryee, | Subset quantile normalization | Oligo array | Negative controls | Quantile | R |
| 18 | loessM (Risso et al., | loessM | Two-color miRNA array | Entire set (median) | Loess | R |
| 19 | GPA normalization (Xiong et al., | Generalized procrustes analysis | cDNA array | Entire set (median) | GPA | |
| 20 | Non-normalization (Klinglmueller et al., | Using data that are background adjusted but not normalized | Oligo miRNA array | Entire set | None | R |
| 21 | WPRMA (Kim et al., | RMA using within-pedigree pool | Oligo array | Entire set | Quantile | R |
| 22 | CrossNorm (Cheng et al., | Cross normalization | Oligo array | Entire set | Quantile | R |
| 23 | ICN (Cheng et al., | Informative cross normalization | Oligo array | Entire set | Quantile | R |
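
The Reference column indicates which genes are used to estimate the normalization adjustment. The sketch below illustrates why that choice matters when expression is globally shifted. It assumes log2-scale values and a plain median-offset adjustment, which is a simplification and not any of the 23 methods listed. The function name `median_offset_normalize`, the toy values, and the premise that the first four genes are known to be unchanged are all hypothetical.

```python
import numpy as np

def median_offset_normalize(sample, baseline, reference_idx):
    """Shift `sample` (log2 scale) so that, over the reference genes only,
    its median difference from `baseline` becomes zero."""
    offset = np.median(sample[reference_idx] - baseline[reference_idx])
    return sample - offset

# Toy data (hypothetical): 10 genes on a log2 scale.
baseline = np.arange(10, dtype=float) + 5.0
# Genes 0-3 are unchanged; genes 4-9 are up-regulated by 2 log2 units.
true_change = np.array([0, 0, 0, 0, 2, 2, 2, 2, 2, 2], dtype=float)
# A technical offset of 0.5 affects every gene in the second sample.
sample = baseline + true_change + 0.5

invariant_idx = np.arange(4)  # assumed reference subset (invariant or control genes)
all_idx = np.arange(10)       # naive use of the entire gene set

subset_norm = median_offset_normalize(sample, baseline, invariant_idx)
entire_norm = median_offset_normalize(sample, baseline, all_idx)

print(subset_norm - baseline)  # log2 changes estimated with the subset reference
print(entire_norm - baseline)  # log2 changes estimated with the naive entire-set reference
```

In this toy case the subset reference recovers the simulated change, whereas the naive entire-set median absorbs the global shift into the offset. The naive scaling shown here should not be read as representing the entire-set methods in the table.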
Figure 1. General steps of the preprocessing methods for skewed transcriptome data. The core step of normalization is the selection of a reference, which falls into three categories: data-driven invariant subset, foreign subset, and the entire set. In some methods, summarization is performed after normalization.
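
For the data-driven category, the reference-selection step can be illustrated with a simple rank comparison between two samples. This is a minimal sketch under stated assumptions: it keeps genes whose rank differs by at most a fixed threshold, which is a one-pass simplification and not the actual procedure of GRSN, ISN, or any other listed method. The function name `rank_invariant_subset`, the threshold `max_rank_diff`, and the toy intensities are hypothetical.

```python
import numpy as np

def rank_invariant_subset(x, y, max_rank_diff):
    """Return indices of genes whose rank in `x` and in `y` differs by at
    most `max_rank_diff`. Assumes no ties within either sample."""
    rank_x = np.argsort(np.argsort(x))  # 0-based rank of each gene in x
    rank_y = np.argsort(np.argsort(y))  # 0-based rank of each gene in y
    return np.flatnonzero(np.abs(rank_x - rank_y) <= max_rank_diff)

# Toy intensities (hypothetical) for six genes in two samples.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.2, 6.5, 3.9, 5.2, 2.9])

idx = rank_invariant_subset(x, y, max_rank_diff=0)
print(idx)  # genes kept as the candidate invariant reference
```

The selected indices would then serve as the reference on which a normalization curve or scaling factor is fitted. Choosing the threshold trades off reference size against the risk of including differentially expressed genes.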