
Protocol for computationally evaluating the loss of stoichiometry and coordinated expression of proteins.

Stefan Hinz¹, Michael E Todhunter¹, Mark A LaBarge¹,².

Abstract

Dysregulation of the transcriptional or translational machinery can alter the stoichiometry of multiprotein complexes and occurs in natural processes such as aging. Loss of stoichiometry has been shown to alter protein complex functions. We provide a protocol and associated code that use omics data to quantify these stoichiometric changes via statistical dispersion utilizing the interquartile range of expression values per grouping variable. This descriptive statistical approach enables the quantification of stoichiometry changes without additional data acquisition. For complete details on the use and execution of this protocol, please refer to Hinz et al. (2021).


Keywords: Bioinformatics; Proteomics; RNAseq; Systems biology


Substances:
Proteins

Year: 2022 PMID: 35313706 PMCID: PMC8933523 DOI: 10.1016/j.xpro.2022.101182

Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667

Before you begin

General considerations

This protocol uses numerical expression data to evaluate the changing stoichiometry of gene or protein complexes across a condition variable (e.g., age, time, or treatment). The script assumes that all genes or proteins are present in all tested conditions. The protocol further assumes that the expression of the members of a gene/protein complex must change concordantly to maintain stoichiometry (coordinated change of expression). A progressive reduction in the correlation between protein and mRNA levels causes a progressive loss of stoichiometry in several protein complexes, including ribosomes (Kelmer Sacramento et al., 2020), which is observed as an uncoordinated change of expression. The figure of merit is the interquartile range (IQR) of expression, defined as the difference between the 75th and 25th percentiles (IQR = x75 − x25). If the complexed proteins are unchanged between conditions, the IQR stays the same, whereas a change in coordination changes the IQR (Figure 1). These analyses have been used to identify changes in proteostasis (Hinz et al., 2021; Kelmer Sacramento et al., 2020).
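To make the figure of merit concrete, the following minimal R sketch computes the IQR of one sample's expression values for four complex members, before and after a coordinated and an uncoordinated change. The values are invented for illustration, and the sketch uses base R's IQR() (quantile type 7); whether the provided functions use the same percentile definition is an assumption.

```r
# Hypothetical expression values of four complex members in one sample
condition_a <- c(gene1 = 10, gene2 = 12, gene3 = 14, gene4 = 16)

# Coordinated change: every member shifts by the same amount
coordinated <- condition_a + 5

# Uncoordinated change: members move independently of each other
uncoordinated <- c(gene1 = 10, gene2 = 20, gene3 = 14, gene4 = 30)

# IQR = 75th percentile minus 25th percentile
manual_iqr_a <- unname(quantile(condition_a, 0.75) - quantile(condition_a, 0.25))
iqr_a <- IQR(condition_a)
iqr_coordinated <- IQR(coordinated)
iqr_uncoordinated <- IQR(uncoordinated)

c(A = iqr_a, coordinated = iqr_coordinated, uncoordinated = iqr_uncoordinated)
# A = 3, coordinated = 3, uncoordinated = 9.5
```

Here the coordinated change is an equal additive shift. On log-transformed expression values this corresponds to a proportional change of all members; on a linear scale, a proportional change would rescale the IQR by the same factor.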

Figure 1

Examples of coordinated and uncoordinated change of expression

The provided function requires numerical expression data with n samples for m conditions (exemplar conditions A and B) to calculate the interquartile range.


Key resources table

Step-by-step method details

Source IQR functions

Timing: 5 min

To run this protocol, source the provided convenience R functions from GitHub; these calculate and visualize the IQR analyses. The functions, example data, and a tutorial are available at https://github.com/LaBargeLab/IQR_test.

1. Download the IQR functions in R.

    # Install devtools only if it is not already available
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::source_url("https://raw.githubusercontent.com/LaBargeLab/IQR_test/main/clean_functions.R")

Format input data

Timing: 5–30 min

The functions require input data with gene name, sample name, expression value, and grouping variable columns. Any parametric units are valid for expression values, such as counts per million or reads per kilobase of transcript for RNA-seq, or protein abundance data. An example dataset can be downloaded from the provided GitHub page.

2. Import the expression matrix.

CRITICAL: Data must be in long format, i.e., every combination of gene name and sample name must have its own row.

    urlfile <- "https://raw.githubusercontent.com/LaBargeLab/IQR_test/main/example_gene_data.csv"
    data <- read.csv(urlfile)
    data
    #         Symbol     name value variable
    # 1        gene1  sample1    99        A
    # 2        gene1  sample2   131        A
    # 3        gene1  sample3    74        A
    # 4        gene1  sample4   145        A
    # ...
    # 10000 gene1000 sample10     0        B
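Before running the analysis, it can help to verify the long-format requirement programmatically. The helper below, check_long_format(), is hypothetical and not part of the IQR_test functions; it assumes the column names used in the example data (Symbol, name, value, variable) and also checks the protocol's assumption that every gene is measured in every sample.

```r
# Hypothetical helper (not part of IQR_test): checks the long format of step 2
check_long_format <- function(df) {
  required <- c("Symbol", "name", "value", "variable")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Each gene/sample combination must occupy exactly one row
  if (anyDuplicated(df[, c("Symbol", "name")]) > 0) {
    stop("Duplicated gene/sample combinations found")
  }
  if (!is.numeric(df$value)) stop("Expression values must be numeric")
  if (anyNA(df$value)) stop("Expression values contain NA")
  # The protocol assumes every gene is measured in every sample
  n_expected <- length(unique(df$Symbol)) * length(unique(df$name))
  if (nrow(df) != n_expected) stop("Not every gene is present in every sample")
  invisible(TRUE)
}

# Toy long-format data: 2 genes x 4 samples, two conditions
toy <- data.frame(
  Symbol = rep(c("gene1", "gene2"), each = 4),
  name = rep(paste0("sample", 1:4), times = 2),
  value = c(99, 131, 74, 145, 20, 25, 18, 30),
  variable = rep(c("A", "A", "B", "B"), times = 2),
  stringsAsFactors = FALSE
)
check_long_format(toy)  # returns invisibly when all checks pass
```

The same call can be applied to the imported data frame before it is passed to the stoichiometry function; each failed check stops with a message naming the problem.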

Run analyses

Timing: 1 min

3. Execute the stoichiometry function with the long-format input data from step 2 to calculate the IQR for every sample by condition; the function returns a data frame with the results.

CRITICAL: The stoichiometry function requires the following arguments:
- symbol: character vector of gene symbols
- expression: numeric vector of expression values
- variable: character or factor vector with condition information
- sample: character vector of sample IDs
- geneset: character vector of genes of interest (same nomenclature as symbol); if no geneset is supplied, the IQR analyses are performed on all provided symbols

    stoi <- stoichiometry(expression = data$value,
                          symbol = data$Symbol,
                          variable = data$variable,
                          geneset = c("gene1", "gene2", "gene3", "gene4", "gene5"),
                          sample = data$name)

4. Use the output of the stoichiometry function as input for the plotting function for visualization.

    stoi_plot(stoi)
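The per-sample calculation performed by the stoichiometry function can be approximated in base R as follows. This is a hypothetical re-implementation for illustration only (iqr_per_sample() is not the authors' code); it assumes the IQR is taken across the geneset's expression values within each sample, and names the output columns IQR and variable to match how stoi is used in the statistical analysis.

```r
# Hypothetical sketch; the provided stoichiometry() function may differ
# (e.g., in quantile definition or output columns)
iqr_per_sample <- function(expression, symbol, variable, sample, geneset = NULL) {
  df <- data.frame(expression = expression, symbol = symbol,
                   variable = variable, sample = sample,
                   stringsAsFactors = FALSE)
  # Restrict to the genes of interest; without a geneset, all symbols are used
  if (!is.null(geneset)) df <- df[df$symbol %in% geneset, ]
  if (nrow(df) == 0) stop("geneset does not match any symbols")
  # One IQR per sample, computed across the expression values of the geneset
  out <- aggregate(expression ~ sample + variable, data = df, FUN = IQR)
  names(out)[names(out) == "expression"] <- "IQR"
  out
}

# Toy data: 4 genes x 4 samples; s1 and s2 belong to condition A
toy <- data.frame(
  Symbol = rep(paste0("gene", 1:4), times = 4),
  name = rep(paste0("s", 1:4), each = 4),
  value = c(10, 12, 14, 16,   15, 17, 19, 21,   # condition A
            10, 20, 14, 30,   12, 12, 30, 40),  # condition B
  variable = rep(c("A", "B"), each = 8),
  stringsAsFactors = FALSE
)

stoi_sketch <- iqr_per_sample(expression = toy$value, symbol = toy$Symbol,
                              variable = toy$variable, sample = toy$name,
                              geneset = paste0("gene", 1:4))
stoi_sketch
# Per-sample IQRs: s1 = 3, s2 = 3 (condition A); s3 = 9.5, s4 = 20.5 (condition B)
```

In this toy example, the condition A samples keep a constant IQR because their members shift together, while the condition B samples show larger and more variable IQRs.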

Expected outcomes

The provided functions output a data frame with the IQR values on a sample level, and these can be conveniently plotted using boxplots (Figure 2).

Figure 2

Results of example IQR analyses

Boxes of the box-and-whisker plots represent the 25th–75th percentile range, whiskers (vertical lines) represent 1.5 × interquartile range, and horizontal bars represent medians.


Quantification and statistical analysis

Statistical significance can be assessed with an appropriate statistical test, such as the Welch two-sample t-test for comparing two conditions or ANOVA for more than two conditions. The tests use the per-sample IQR values as input, so the number of observations per condition equals the number of samples in that condition. Therefore, a power estimation is recommended to determine the minimum sample size for meaningful analyses.

    t.test(stoi$IQR ~ stoi$variable)
    #   Welch Two Sample t-test
    #
    # data:  stoi$IQR by stoi$variable
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -216.46803  -65.53197
    # sample estimates:
    # mean in group A mean in group B
    #            36.6           177.6

A second example comparison yields:

    # 95 percent confidence interval:
    #  -15.48451  22.68451
    # sample estimates:
    # mean in group A mean in group B
    #            36.6            33.0
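The tests and the power estimation described above can be sketched with base R's stats package. The per-sample IQR values below are invented for illustration, and the effect size (delta) and standard deviation (sd) passed to power.t.test() are assumptions that would in practice come from pilot data. Note that power.t.test() assumes equal variances, so it only approximates the sample size needed for Welch's test.

```r
# Hypothetical per-sample IQR values for three conditions (not real results)
stoi_toy <- data.frame(
  variable = rep(c("A", "B", "C"), each = 5),
  IQR = c(20, 25, 28, 30, 27,
          90, 110, 100, 120, 105,
          50, 55, 60, 58, 52),
  stringsAsFactors = FALSE
)

# Two conditions: Welch two-sample t-test (t.test default var.equal = FALSE)
ab <- subset(stoi_toy, variable %in% c("A", "B"))
welch <- t.test(IQR ~ variable, data = ab)

# More than two conditions: one-way ANOVA on the per-sample IQR values
fit <- aov(IQR ~ variable, data = stoi_toy)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Power estimation: samples per condition needed to detect a difference in
# mean IQR of `delta`, given a standard deviation `sd` (assumed values)
pw <- power.t.test(delta = 10, sd = 10, sig.level = 0.05, power = 0.8)
ceiling(pw$n)  # round up to whole samples per condition
```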

Limitations

The method described here is a statistical approach to assess the deregulation of protein complexes and does not replace confirmatory experiments.

Troubleshooting

Problem 1

The provided function does not load or work (step 3).

Potential solution

Confirm that all dependencies listed in the key resources table are installed. If issues persist, open an issue on the GitHub page.
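A quick way to check the dependencies is sketched below. The package list is taken from the key resources table plus devtools (used to source the functions); whether newer versions than those listed also work is an assumption, so the sketch reports installed versions for comparison rather than enforcing them.

```r
# Packages from the key resources table, plus devtools for source_url()
deps <- c("devtools", "dplyr", "ggrepel", "ggplot2", "ggsci", "readr")

# requireNamespace() returns FALSE instead of failing for missing packages
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- deps[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}

# Report installed versions to compare against the key resources table
vapply(deps[installed], function(p) as.character(packageVersion(p)), character(1))
```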

Problem 2

The expression data includes NA values (step 1).

Potential solution

Remove data with NA values or impute expression data if appropriate.
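One way to remove missing data while respecting the assumption that all genes are present in all conditions is sketched below (toy data, hypothetical values). Rather than dropping only the NA rows, which would leave a gene missing from some samples, the sketch drops every gene that has a missing value in any sample. Imputation is not shown; note that removing a gene also removes that complex member from the IQR calculation.

```r
# Hypothetical long-format data in which gene2 lacks a value in sample2
long <- data.frame(
  Symbol = rep(c("gene1", "gene2", "gene3"), each = 2),
  name = rep(c("sample1", "sample2"), times = 3),
  value = c(99, 131, 20, NA, 55, 60),
  variable = rep(c("A", "B"), times = 3),
  stringsAsFactors = FALSE
)

# Remove every gene with a missing value in any sample, so that the remaining
# genes are still measured in all samples
genes_with_na <- unique(long$Symbol[is.na(long$value)])
clean <- long[!long$Symbol %in% genes_with_na, ]
clean
```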

Problem 3

Expression data is in wide format instead of the required long format (step 2).

Potential solution

There are multiple tools to reshape data in R. The authors suggest the pivot_longer() function from the tidyr package.
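A minimal reshaping sketch with tidyr::pivot_longer() follows. It assumes a wide matrix with one row per gene and one column per sample, plus a separate sample annotation that assigns each sample to a condition; both tables are hypothetical. The output uses the column names of the example data (Symbol, name, value, variable).

```r
library(tidyr)

# Hypothetical wide matrix: one row per gene, one column per sample
wide <- data.frame(
  Symbol = c("gene1", "gene2"),
  sample1 = c(99, 20), sample2 = c(131, 25),
  sample3 = c(74, 18), sample4 = c(145, 30),
  stringsAsFactors = FALSE
)

# Hypothetical sample annotation assigning each sample to a condition
annotation <- data.frame(
  name = paste0("sample", 1:4),
  variable = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Reshape: every gene/sample combination becomes its own row
long <- pivot_longer(wide, cols = -Symbol, names_to = "name", values_to = "value")

# Attach the grouping variable and use the column order of the example data
long <- merge(as.data.frame(long), annotation, by = "name")
long <- long[order(long$Symbol, long$name), c("Symbol", "name", "value", "variable")]
long
```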

Problem 4

Where can curated genesets be found (step 3)?

Potential solution

There are multiple databases of curated genesets. The authors of this protocol recommend the Molecular Signatures Database (MSigDB), the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Drug Signatures Database (DSigDB), the Gene Ontology Resource (GO), or the HUGO Gene Nomenclature Committee (HGNC) as starting points.

Problem 5

The geneset does not match any symbols in the dataset (step 3).

Potential solution

Confirm that the symbol nomenclature of the geneset matches the symbols in the dataset. If the formats differ, consider using symbol conversion tools (e.g., biomaRt).
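Before converting symbols, a quick overlap check can show whether a mismatch is due to formatting. In the sketch below, the dataset symbols and geneset are hypothetical; the geneset uses mouse-style capitalization (e.g., Rpl3) against human-style upper-case symbols. A case-insensitive match only flags a likely nomenclature problem; it is not an ortholog or identifier conversion, for which a dedicated tool such as biomaRt is needed.

```r
# Hypothetical dataset symbols (upper case) and a geneset in mouse-style
# capitalization, plus one symbol absent from the dataset
dataset_symbols <- c("RPL3", "RPL4", "RPS6", "GAPDH")
geneset <- c("Rpl3", "RPL4", "Rps6", "NOTAGENE1")

matched <- geneset[geneset %in% dataset_symbols]
unmatched <- setdiff(geneset, dataset_symbols)

# Unmatched symbols that match after ignoring case point to a nomenclature
# mismatch rather than genes that are truly absent from the dataset
case_only <- unmatched[toupper(unmatched) %in% toupper(dataset_symbols)]

matched    # "RPL4"
case_only  # "Rpl3" "Rps6"
```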

Resource availability

Lead contact

Mark A. LaBarge, mlabarge@coh.org

Materials availability

This study did not generate new unique reagents.

REAGENT OR RESOURCE	SOURCE	IDENTIFIER
Software and algorithms

R (V3.6.1)	CRAN	r-project.org
RStudio (V1.2.1335)	RStudio	rstudio.com
dplyr (V1.0.5)	CRAN	https://cran.r-project.org/web/packages/dplyr/index.html
ggrepel (V0.8.2)	CRAN	https://cran.r-project.org/web/packages/ggrepel/index.html
ggplot2 (V3.3.3)	CRAN	https://cran.r-project.org/web/packages/ggplot2/index.html
ggsci (V2.9)	CRAN	https://cran.r-project.org/web/packages/ggsci/index.html
readr (V1.4.0)	CRAN	https://cran.r-project.org/web/packages/readr/index.html

Deposited data

Numerical expression data	GitHub	https://raw.githubusercontent.com/LaBargeLab/IQR_test/main/example_gene_data.csv

Other

Computer	NA	NA
R functions	GitHub/Zenodo	https://github.com/LaBargeLab/IQR_test; https://doi.org/10.5281/zenodo.5879559

2 in total

1. Reduced proteasome activity in the aging brain results in ribosome stoichiometry loss and aggregation.

Authors: Erika Kelmer Sacramento; Joanna M Kirkpatrick; Mariateresa Mazzetto; Mario Baumgart; Aleksandar Bartolome; Simone Di Sanzo; Cinzia Caterino; Michele Sanguanini; Nikoletta Papaevgeniou; Maria Lefaki; Dorothee Childs; Sara Bagnoli; Eva Terzibasi Tozzini; Domenico Di Fraia; Natalie Romanov; Peter H Sudmant; Wolfgang Huber; Niki Chondrogianni; Michele Vendruscolo; Alessandro Cellerino; Alessandro Ori
Journal: Mol Syst Biol Date: 2020-06

2. Deep proteome profiling of human mammary epithelia at lineage and age resolution.

Authors: Stefan Hinz; Antigoni Manousopoulou; Masaru Miyano; Rosalyn W Sayaman; Kristina Y Aguilera; Michael E Todhunter; Jennifer C Lopez; Lydia L Sohn; Leo D Wang; Mark A LaBarge
Journal: iScience Date: 2021-08-23
