Yingdong Zhao¹, Ming-Chung Li¹, Mariam M Konaté¹, Li Chen², Biswajit Das², Chris Karlovich², P Mickey Williams², Yvonne A Evrard², James H Doroshow³, Lisa M McShane⁴.
Abstract
BACKGROUND: To correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression between two or more conditions. Several methods have been proposed and continue to be used. However, no consensus has been reached regarding the best gene expression quantification method for RNA-seq data analysis.
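The quantification measures named above differ mainly in how they rescale raw read counts. As a minimal sketch using the commonly cited definitions (not code from this paper; the function names and toy data are hypothetical, and per-gene effective lengths are assumed to be given, e.g. by a quantifier such as RSEM), FPKM divides counts by gene length in kilobases and by library size in millions, while TPM length-normalizes first and then rescales each sample to sum to one million:

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """FPKM: fragments per kilobase of gene length per million mapped fragments.

    counts     -- genes x samples array of (estimated) fragment counts
    lengths_bp -- per-gene (effective) lengths in base pairs (assumed given)
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    library_millions = counts.sum(axis=0) / 1e6  # mapped fragments per sample, in millions
    return counts / lengths_kb / library_millions


def tpm(counts, lengths_bp):
    """TPM: divide by length first, then rescale each sample to sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / np.asarray(lengths_bp, dtype=float)[:, None]  # reads per base
    return rate / rate.sum(axis=0) * 1e6


# Toy data: 3 genes x 2 samples (hypothetical values).
counts = np.array([[10, 20], [100, 50], [1000, 3000]])
lengths = np.array([1000, 2000, 4000])

print(tpm(counts, lengths).sum(axis=0))   # every column sums to 1e6
print(fpkm(counts, lengths).sum(axis=0))  # column sums differ between samples
```

Within a sample the two measures are proportional (TPM is FPKM rescaled so the column sums to one million), so they differ only in how values compare across samples.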
Keywords: Count; DESeq2; FPKM; Normalization; Patient derived xenograft models; Quantification measures; RNA sequencing; RSEM; TMM; TPM
Year: 2021 PMID: 34158060 PMCID: PMC8220791 DOI: 10.1186/s12967-021-02936-w
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 (A) Hierarchical clustering of 61 patient-derived xenograft (PDX) samples using TPM data. (B) Hierarchical clustering of 61 PDX samples using DESeq2-normalized count data. In each panel, the left dendrogram was generated with Euclidean distance and the right dendrogram with 1-Pearson correlation as the distance metric. Discordant models are highlighted with different color labels.
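The two distance metrics in Fig. 1 can rank sample pairs very differently: 1-Pearson ignores overall shifts in level, while Euclidean distance does not. A minimal sketch of both sample-by-sample distance matrices follows; the helper names and toy matrix are hypothetical, and the data are assumed to be on a log scale with genes as rows and samples as columns.

```python
import numpy as np


def one_minus_pearson(x):
    """Pairwise 1 - Pearson correlation between samples (columns of x).

    Values range from 0 (perfectly correlated) to 2 (perfectly anti-correlated).
    """
    return 1.0 - np.corrcoef(np.asarray(x, dtype=float), rowvar=False)


def euclidean(x):
    """Pairwise Euclidean distance between samples (columns of x).

    Uses |a - b|^2 = |a|^2 + |b|^2 - 2 a.b to avoid a genes x samples x samples array.
    """
    xt = np.asarray(x, dtype=float).T          # samples x genes
    sq = (xt ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * xt @ xt.T
    return np.sqrt(np.clip(d2, 0.0, None))     # clip tiny negative rounding errors


# Hypothetical log-scale expression: rows = genes, columns = samples A, B, C.
# B is A shifted up by 5, so A and B are perfectly correlated but far apart.
x = np.array([[1, 6, 4],
              [2, 7, 1],
              [3, 8, 3],
              [4, 9, 2]])

print(one_minus_pearson(x).round(3))
print(euclidean(x).round(3))
```

Either matrix could then be passed, in condensed form, to a linkage routine such as `scipy.cluster.hierarchy.linkage` to draw a dendrogram; the linkage method is a separate choice (the footnote to the maximum-height table states that Ward linkage was used).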
Number of discordant models in hierarchical cluster analysis for each quantification measure and distance metric
| Distance metric | TPM (Fig. 1A) | Count, DESeq2 (Fig. 1B) | Count, TMM (Additional file …) | FPKM (Additional file …) | TPM-Zscore (Additional file …) | TPM-TMM (Additional file …) |
|---|---|---|---|---|---|---|
| 1-Pearson | 1/20 | 0 | 0 | 0 | 1/20 | 0 |
| Euclidean | 4/20 | 0 | 0 | 0 | 6/20 | 0 |
Maximum dendrogram height in hierarchical cluster analysis for each quantification measure
| Distance metric | TPM (Fig. 1A) | Count, DESeq2 (Fig. 1B) | Count, TMM (Additional file …) | FPKM (Additional file …) | TPM-Zscore (Additional file …) | TPM-TMM (Additional file …) |
|---|---|---|---|---|---|---|
| 1-Pearson | 0.613 | 0.091 | 0.089 | 0.106 | 3.152ᵃ | 0.102 |
ᵃ Because the Ward method is used as the linkage method, the height is not bounded by the original distance scale (1-Pearson correlation ranges from 0 to 2) and can exceed 2.
Fig. 2 Bar plot of median coefficients of variation (CV) for gene expression levels from replicate samples of each PDX model using different quantification measures.
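The median CV in Fig. 2 summarizes how much each gene varies across one model's replicates. The caption does not state the scale or gene filtering used, so the sketch below rests on stated assumptions: CV is computed per gene on untransformed, non-negative values with the sample standard deviation, genes with zero mean are dropped, and the median is taken across the remaining genes. The function name and data are hypothetical.

```python
import numpy as np


def median_cv(replicates):
    """Median, across genes, of each gene's CV across replicate samples.

    replicates -- genes x replicates array for ONE model (non-negative values assumed).
    Genes with zero mean are dropped because their CV is undefined.
    """
    x = np.asarray(replicates, dtype=float)
    mean = x.mean(axis=1)
    sd = x.std(axis=1, ddof=1)   # sample standard deviation (n - 1 denominator)
    keep = mean > 0
    return float(np.median(sd[keep] / mean[keep]))


# Hypothetical values: 4 genes x 3 replicates of a single model.
reps = np.array([[10, 12, 14],
                 [100, 100, 100],
                 [0, 0, 0],
                 [5, 5, 8]])
print(median_cv(reps))  # 0.1666... (CVs 1/6, 0 and about 0.289; the all-zero gene is dropped)
```

Applying such a function to each model and each quantification measure would give one bar per model and measure; a lower median CV means tighter agreement between replicates.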
Fig. 3 (A) Bar plot of gene intraclass correlation coefficients (ICCg) across replicate samples of each PDX model using different quantification measures. (B) Boxplots of model intraclass correlation coefficients (ICCm) for gene expression levels from replicate samples across 20 PDX models using different quantification measures.
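An intraclass correlation compares variation between units with variation among replicates of the same unit. The exact ICC forms behind ICCg and ICCm are not given in this record, so the sketch below swaps in the standard one-way random-effects ICC(1) for a balanced design as an illustration; which axis holds genes and which holds models is left to the caller, and the function name and data are hypothetical.

```python
import numpy as np


def icc1(groups):
    """One-way random-effects ICC(1) for a balanced design.

    groups -- n_units x k array; row i holds the k replicate measurements of unit i.
    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    y = np.asarray(groups, dtype=float)
    n, k = y.shape
    grand = y.mean()
    unit_means = y.mean(axis=1)
    msb = k * ((unit_means - grand) ** 2).sum() / (n - 1)          # between-unit mean square
    msw = ((y - unit_means[:, None]) ** 2).sum() / (n * (k - 1))   # within-unit mean square
    return float((msb - msw) / (msb + (k - 1) * msw))


# Hypothetical: 3 units (rows) measured in 3 replicates each (columns).
y = np.array([[5.0, 5.2, 4.9],
              [8.1, 7.9, 8.0],
              [2.0, 2.3, 2.1]])
print(round(icc1(y), 3))  # close to 1: replicates agree relative to between-unit spread
```

Values near 1 mean replicate noise is small compared with differences between units; values near or below 0 mean replicates are no more alike than unrelated units.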
Fig. 4 (A) Pairwise scatter plots comparing TPM values for all genes between replicate samples of PDX model 475296-252-R. (B) Pairwise scatter plots comparing DESeq2-normalized count values for all genes between replicate samples of PDX model 475296-252-R. The x- and y-axes of all pairwise scatter plots show normalized log2 counts. Plots along the diagonal show the density of each variable.
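DESeq2 normalizes by per-sample size factors estimated with the median-of-ratios method. The sketch below is a Python re-implementation of that idea for illustration, not the DESeq2 (R) code itself; the function names, the pseudocount of 1 before the log2 transform, and the toy counts are assumptions.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors by the median-of-ratios method used by DESeq2.

    counts -- genes x samples array of raw counts. Genes with a zero in any
    sample have an undefined log geometric mean and are left out.
    """
    c = np.asarray(counts, dtype=float)
    usable = (c > 0).all(axis=1)
    log_c = np.log(c[usable])
    log_geo_mean = log_c.mean(axis=1)            # per-gene pseudo-reference
    log_ratios = log_c - log_geo_mean[:, None]   # sample vs. reference, log scale
    return np.exp(np.median(log_ratios, axis=0))


def normalized_log2(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(x + pseudocount)."""
    sf = median_of_ratios_size_factors(counts)
    return np.log2(np.asarray(counts, dtype=float) / sf + pseudocount)


# Hypothetical counts: sample 2 is sample 1 sequenced twice as deeply,
# plus one gene with a zero count that is excluded from the reference.
counts = np.array([[10, 20], [100, 200], [50, 100], [0, 5]])
sf = median_of_ratios_size_factors(counts)
print(sf)                      # ratio sf[1] / sf[0] is 2
print(normalized_log2(counts))
```

Using the median of per-gene ratios makes the size factor robust to a handful of extremely highly expressed genes, unlike a simple total-count scaling.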
Fig. 5 (A) Bar plot of the sum of TPM values for the five most highly expressed genes in the four PDX models with the lowest ICCg. (B) Bar plot of the sum of TPM values for the five most highly expressed genes in the five PDX models with the highest ICCg.
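Because the TPM values in a sample always sum to one million, they are compositional: when a few genes take a larger share, every other gene's TPM falls even if its read count is unchanged. The toy sketch below (hypothetical data and helper names, assuming equal gene lengths) illustrates that general property and the top-five summary shown in Fig. 5; it does not reproduce the paper's analysis.

```python
import numpy as np


def tpm(counts, lengths_bp):
    """TPM for a single sample (1-D arrays of counts and gene lengths)."""
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_bp, dtype=float)
    return rate / rate.sum() * 1e6


def top_n_tpm_sum(tpm_values, n=5):
    """Sum of the n largest TPM values in one sample."""
    return float(np.sort(np.asarray(tpm_values, dtype=float))[::-1][:n].sum())


# Hypothetical samples: 10 genes of equal length with identical counts,
# except that gene 0 has 10x more reads in sample b.
lengths = np.full(10, 1000)
counts_a = np.array([1000] + [100] * 9)
counts_b = np.array([10000] + [100] * 9)

tpm_a, tpm_b = tpm(counts_a, lengths), tpm(counts_b, lengths)
print(top_n_tpm_sum(tpm_a), top_n_tpm_sum(tpm_b))
print(tpm_a[1], tpm_b[1])  # same count for gene 1, but a smaller TPM in sample b
```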