Stefan Graw, Jillian Tang, Maroof K Zafar, Alicia K Byrd, Chris Bolden, Eric C Peterson, Stephanie D Byrum.
Abstract
Technological advances in mass spectrometry allow us to collect more comprehensive data with higher quality and at increasing speed. As the amount of data generated grows rapidly, the need to streamline analyses becomes more apparent. Proteomics data is often affected by systemic bias from unknown sources, and failing to normalize the data adequately can lead to erroneous conclusions. To allow researchers to easily evaluate and compare different normalization methods via a user-friendly interface, we have developed "proteiNorm". The current implementation of proteiNorm accommodates preliminary filters at the peptide and sample levels, followed by an evaluation of several popular normalization methods and a visualization of missing values. The user then selects an adequate normalization method and one of several imputation methods, which are used for the subsequent comparison of different differential expression methods and the estimation of statistical power. The application of proteiNorm and the interpretation of its results are demonstrated on two tandem mass tag multiplex data sets (TMT6plex and TMT10plex) and one label-free spike-in mass spectrometry example data set. The three data sets reveal how normalization methods perform differently on different experimental designs and show the need to evaluate normalization methods for each mass spectrometry experiment. With proteiNorm, we provide a user-friendly tool to identify an adequate normalization method and to select an appropriate method for differential expression analysis.
Year: 2020 PMID: 33073088 PMCID: PMC7557219 DOI: 10.1021/acsomega.0c02564
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Overview of proteiNorm’s workflow, inputs, and outputs.
Meta-Data for the Mouse Data Set
| Protein.Sample.Names | Custom.Sample.Names | group | batch |
|---|---|---|---|
| Reporter.intensity.corrected.1.TMT1 | NA_T_1 | NA_T | 1 |
| Reporter.intensity.corrected.2.TMT1 | NA_T_2 | NA_T | 1 |
| Reporter.intensity.corrected.3.TMT1 | NA_T_3 | NA_T | 1 |
| Reporter.intensity.corrected.4.TMT1 | NA_1 | NA | 1 |
| Reporter.intensity.corrected.5.TMT1 | NA_2 | NA | 1 |
| Reporter.intensity.corrected.6.TMT1 | NA_3 | NA | 1 |
| Reporter.intensity.corrected.1.TMT2 | S_T_1 | S_T | 2 |
| Reporter.intensity.corrected.2.TMT2 | S_T_2 | S_T | 2 |
| Reporter.intensity.corrected.3.TMT2 | S_T_3 | S_T | 2 |
| Reporter.intensity.corrected.4.TMT2 | S_1 | S | 2 |
| Reporter.intensity.corrected.5.TMT2 | S_2 | S | 2 |
| Reporter.intensity.corrected.6.TMT2 | S_3 | S | 2 |
“Protein.Sample.Names” are automatically generated from the proteinGroups file. “Custom.Sample.Names” is optional and replaces protein sample names when provided (needs to be a unique name). “Group” specifies individual treatment groups. “Batch” is optional and indicates the batch for each sample.
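The rules in this footnote can be sketched as a small check on a metadata table. The following is a minimal illustration, not proteiNorm's own code: the function name `resolve_sample_names` is hypothetical, and treating "group" as required is an assumption based on the footnote marking only the custom name and batch as optional.

```python
def resolve_sample_names(rows):
    """Return the display name for each sample, enforcing the footnote's rules.

    rows -- list of dicts keyed by the metadata column names shown in the table.
    """
    names = []
    for row in rows:
        # An empty or missing custom name falls back to the generated name.
        custom = (row.get("Custom.Sample.Names") or "").strip()
        names.append(custom or row["Protein.Sample.Names"])

        # Assumption: "group" is required; "batch" is optional and not checked.
        if not (row.get("group") or "").strip():
            raise ValueError(f"missing group for {row['Protein.Sample.Names']}")

    # Names shown to the user must be unique across all samples.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"sample names must be unique, repeated: {duplicates}")
    return names


rows = [
    {"Protein.Sample.Names": "Reporter.intensity.corrected.1.TMT1",
     "Custom.Sample.Names": "NA_T_1", "group": "NA_T", "batch": "1"},
    {"Protein.Sample.Names": "Reporter.intensity.corrected.2.TMT1",
     "Custom.Sample.Names": "", "group": "NA_T", "batch": ""},
]
print(resolve_sample_names(rows))
```

In this sketch the second row has no custom name, so its generated name is used instead.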
Figure 2. Evaluation of normalization and missing values. (A) Sum of normalized intensities using VSN by sample. (B) Principal component analysis plot based on data normalized by VSN. (C) Sample-clustered heatmap of missing values. (D) Pooled intragroup coefficient of variation comparing different normalization methods. (E) Pooled intragroup estimate of variance comparing different normalization methods. (F) Pooled intragroup median absolute deviation comparing different normalization methods. (G) Pairwise sample correlations within a group for different normalization methods. (H) Correlation heatmap (all pairwise samples). (I) Distribution of log2-ratios (all two-group combinations) for different normalization methods.
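The intragroup metrics in panels D and F can be illustrated with a short sketch. The caption does not give the formulas, so the definitions below are an assumption: each metric is computed per protein within a group and then averaged over proteins, giving one value per group. The function names are hypothetical, the data are assumed to be complete (no missing values) with positive means, and the median absolute deviation is left unscaled.

```python
from statistics import mean, median, stdev


def _columns_by_group(groups):
    """Map each group label to the column indices that belong to it."""
    index = {}
    for i, label in enumerate(groups):
        index.setdefault(label, []).append(i)
    return index


def intragroup_cv(data, groups):
    """Per group: mean over proteins of (standard deviation / mean)."""
    result = {}
    for label, cols in _columns_by_group(groups).items():
        cvs = []
        for row in data:  # one row per protein
            values = [row[i] for i in cols]
            cvs.append(stdev(values) / mean(values))
        result[label] = mean(cvs)
    return result


def intragroup_mad(data, groups):
    """Per group: mean over proteins of the unscaled median absolute deviation."""
    result = {}
    for label, cols in _columns_by_group(groups).items():
        mads = []
        for row in data:
            values = [row[i] for i in cols]
            center = median(values)
            mads.append(median([abs(v - center) for v in values]))
        result[label] = mean(mads)
    return result


# One protein measured in two groups of three replicates (made-up numbers).
data = [[10, 12, 14, 20, 20, 20]]
groups = ["A", "A", "A", "B", "B", "B"]
print(intragroup_cv(data, groups))
print(intragroup_mad(data, groups))
```

Under these definitions, a lower value for a normalization method indicates less variability among replicates of the same group.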
Meta-Data for the Breast Cancer Data Set
| Protein.Sample.Names | Custom.Sample.Names | group | batch |
|---|---|---|---|
| Reporter.intensity.corrected.1.TMT3 | ER/PR+ MCF7_UT_19 | ERPR | 3 |
| Reporter.intensity.corrected.2.TMT3 | ER/PR+ MCF7_UT_20 | ERPR | 3 |
| Reporter.intensity.corrected.3.TMT3 | ER/PR+ MCF7_UT_21 | ERPR | 3 |
| Reporter.intensity.corrected.4.TMT3 | MCF 10A_UT_22 | control | 3 |
| Reporter.intensity.corrected.5.TMT3 | MCF 10A_UT_23 | control | 3 |
| Reporter.intensity.corrected.6.TMT3 | MCF 10A_UT_24 | control | 3 |
| Reporter.intensity.corrected.7.TMT3 | HER2+ HCC 1954_UT_25 | HER2 | 3 |
| Reporter.intensity.corrected.8.TMT3 | HER2+ HCC 1954_UT_26 | HER2 | 3 |
| Reporter.intensity.corrected.9.TMT3 | HER2+ HCC 1954_UT_27 | HER2 | 3 |
| Reporter.intensity.corrected.10.TMT3 | pool_3 | pool | 3 |
| Reporter.intensity.corrected.1.TMT4 | MCF 10A_TR_28 | ERPR_TR | 4 |
| Reporter.intensity.corrected.2.TMT4 | MCF 10A_TR_29 | ERPR_TR | 4 |
| Reporter.intensity.corrected.3.TMT4 | MCF 10A_TR_30 | ERPR_TR | 4 |
| Reporter.intensity.corrected.4.TMT4 | ER/PR+ MCF7_TR_31 | control_TR | 4 |
| Reporter.intensity.corrected.5.TMT4 | ER/PR+ MCF7_TR_32 | control_TR | 4 |
| Reporter.intensity.corrected.6.TMT4 | ER/PR+ MCF7_TR_33 | control_TR | 4 |
| Reporter.intensity.corrected.7.TMT4 | HER2+ HCC 1954_TR_34 | HER2_TR | 4 |
| Reporter.intensity.corrected.8.TMT4 | HER2+ HCC 1954_TR_35 | HER2_TR | 4 |
| Reporter.intensity.corrected.9.TMT4 | HER2+ HCC 1954_TR_36 | HER2_TR | 4 |
| Reporter.intensity.corrected.10.TMT4 | pool_4 | pool | 4 |
“Protein.Sample.Names” are automatically generated from the proteinGroups file. “Custom.Sample.Names” is optional and replaces protein sample names when provided (needs to be a unique name). “Group” specifies individual treatment groups. “Batch” is optional and indicates the batch for each sample.
Meta-Data for the Spiked-in Data Set
| Protein.Sample.Names | Custom.Sample.Names | group | batch |
|---|---|---|---|
| Intensity.12500amol_R1 | 12500amol_R1 | 12500 | 1 |
| Intensity.12500amol_R2 | 12500amol_R2 | 12500 | 1 |
| Intensity.12500amol_R3 | 12500amol_R3 | 12500 | 1 |
| Intensity.125amol_R1 | 125amol_R1 | 125 | 1 |
| Intensity.125amol_R2 | 125amol_R2 | 125 | 1 |
| Intensity.125amol_R3 | 125amol_R3 | 125 | 1 |
| Intensity.25000amol_R1 | 25000amol_R1 | 25000 | 1 |
| Intensity.25000amol_R2 | 25000amol_R2 | 25000 | 1 |
| Intensity.25000amol_R3 | 25000amol_R3 | 25000 | 1 |
| Intensity.2500amol_R1 | 2500amol_R1 | 2500 | 1 |
| Intensity.2500amol_R2 | 2500amol_R2 | 2500 | 1 |
| Intensity.2500amol_R3 | 2500amol_R3 | 2500 | 1 |
| Intensity.250amol_R1 | 250amol_R1 | 250 | 1 |
| Intensity.250amol_R2 | 250amol_R2 | 250 | 1 |
| Intensity.250amol_R3 | 250amol_R3 | 250 | 1 |
| Intensity.50000amol_R1 | 50000amol_R1 | 50000 | 1 |
| Intensity.50000amol_R2 | 50000amol_R2 | 50000 | 1 |
| Intensity.50000amol_R3 | 50000amol_R3 | 50000 | 1 |
| Intensity.5000amol_R1 | 5000amol_R1 | 5000 | 1 |
| Intensity.5000amol_R2 | 5000amol_R2 | 5000 | 1 |
| Intensity.5000amol_R3 | 5000amol_R3 | 5000 | 1 |
| Intensity.500amol_R1 | 500amol_R1 | 500 | 1 |
| Intensity.500amol_R2 | 500amol_R2 | 500 | 1 |
| Intensity.500amol_R3 | 500amol_R3 | 500 | 1 |
| Intensity.50amol_R1 | 50amol_R1 | 50 | 1 |
| Intensity.50amol_R2 | 50amol_R2 | 50 | 1 |
| Intensity.50amol_R3 | 50amol_R3 | 50 | 1 |
“Protein.Sample.Names” are automatically generated from the proteinGroups file. “Custom.Sample.Names” is optional and replaces protein sample names when provided (needs to be a unique name). “Group” specifies individual treatment groups. “Batch” is optional and indicates the batch for each sample.
Figure 3. Distribution of ratios of spiked-in proteins (averaged across replicates) between different spike-in concentrations for raw and normalized protein intensities. Intensities of spiked-in proteins were averaged across the replicates of each concentration, and the ratio of these averages was calculated between different spike-in concentrations. The distributions of these ratios are compared between raw and normalized protein data. Spike-in concentrations of 50–500 amol were excluded because most measurements were absent.
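The ratio calculation described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `concentration_ratios` is hypothetical, intensities are assumed to be on the raw (non-log) scale, and the numbers are made up.

```python
from statistics import mean


def concentration_ratios(intensities, groups, numerator, denominator):
    """Per protein, the ratio of replicate-averaged intensities between two groups.

    intensities -- list of rows (one per spiked-in protein)
    groups      -- concentration label for each column of `intensities`
    """
    num_cols = [i for i, g in enumerate(groups) if g == numerator]
    den_cols = [i for i, g in enumerate(groups) if g == denominator]
    ratios = []
    for row in intensities:
        # Average the replicates of each concentration, then take the ratio.
        num_avg = mean([row[i] for i in num_cols])
        den_avg = mean([row[i] for i in den_cols])
        ratios.append(num_avg / den_avg)
    return ratios


# One protein, three replicates each at 12500 and 25000 amol (made-up values).
intensities = [[100, 110, 90, 200, 210, 190]]
groups = ["12500"] * 3 + ["25000"] * 3
print(concentration_ratios(intensities, groups, "25000", "12500"))
```

Comparing each observed ratio with the ratio of the spiked-in amounts (here 25000/12500 = 2) is one way to judge whether a normalization method preserves known differences.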