Anatole Ghazalpour, Brian Bennett, Vladislav A Petyuk, Luz Orozco, Raffi Hagopian, Imran N Mungrue, Charles R Farber, Janet Sinsheimer, Hyun M Kang, Nicholas Furlotte, Christopher C Park, Ping-Zi Wen, Heather Brewer, Karl Weitz, David G Camp, Calvin Pan, Roumyana Yordanova, Isaac Neuhaus, Charles Tilford, Nathan Siemers, Peter Gargalovic, Eleazar Eskin, Todd Kirchgessner, Desmond J Smith, Richard D Smith, Aldons J Lusis.
Abstract
The relationships between the levels of transcripts and the levels of the proteins they encode have not been examined comprehensively in mammals, although previous work in plants and yeast suggests a surprisingly modest correlation. We have examined this issue using a genetic approach in which natural variations were used to perturb both transcript levels and protein levels among inbred strains of mice. We quantified over 5,000 peptides and over 22,000 transcripts in livers of 97 inbred and recombinant inbred strains and focused on the 7,185 most heritable transcripts and 486 most reliable proteins. The transcript levels were quantified by microarray analysis in three replicates, and the proteins were quantified by liquid chromatography-mass spectrometry (LC-MS) using an ¹⁸O-reference-based isotope labeling approach. We show that the levels of transcripts and proteins correlate significantly for only about half of the genes tested, with an average correlation of 0.27, and that the correlations of transcripts and proteins varied depending on the cellular location and biological function of the gene. We examined technical and biological factors that could contribute to the modest correlation. For example, differential splicing clearly affects the analyses for certain genes but, based on deep sequencing, does not substantially contribute to the overall estimate of the correlation. We also employed genome-wide association analyses to map loci controlling both transcript and protein levels. Surprisingly, little overlap was observed between the protein- and transcript-mapped loci. We have typed numerous clinically relevant traits among the strains, including adiposity, lipoprotein levels, and tissue parameters. Using correlation analysis, we found that few clinical trait relationships are preserved between the protein and mRNA gene products and that the majority of such relationships are specific to either the protein levels or the transcript levels. Surprisingly, transcript levels were more strongly correlated with clinical traits than protein levels. In light of the widespread use of high-throughput technologies in both clinical and basic research, the results presented have practical as well as basic implications.
Year: 2011 PMID: 21695224 PMCID: PMC3111477 DOI: 10.1371/journal.pgen.1001393
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. A schematic representation of the experimental design.
Ninety-seven inbred and recombinant inbred strains in the Hybrid Mouse Diversity Panel (HMDP) were used to study the relationships between transcripts, proteins, and clinical traits. The relationships between proteins and transcripts were assessed at the biological level by the overall correlation across datasets, and at the genetic level by comparing the genome-wide association profiles of the two datasets. The biological relationship between the transcripts and proteins was also assessed in the context of the physiological phenotypes by relating these two datasets to the 42 clinical traits measured in the HMDP.
Figure 2. Proteome and transcriptome data quality.
A) Reliability of peptide measurement in LC-MS. The distribution of variance among the technical replicates in the LC-MS data (grey plot) and in the HMDP population (blue plot). B) The frequency of peptides with varying amounts, as defined by the “signal to noise” ratio. C) Distribution of heritability (fraction of total variance attributed to genetics) in the transcript dataset. The dashed line depicts the significant heritability estimates (p-value < 0.05). D) Comparison of Affymetrix data with the next-generation sequencing (NGS) data. E) Number of peptides per gene in the filtered peptide dataset.
Figure 3. Relationships between protein levels and transcript levels.
A) Histogram of correlation coefficients computed between peptides and probesets representing the same gene. The median correlation coefficient is 0.27. B) Classification of correlations between probeset-peptide pairs based on the signal to noise ratio in the peptide data (a larger signal to noise ratio indicates less technical variation in the peptide measurement).
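The per-gene comparison behind panel A can be sketched as follows. This is a minimal illustration, assuming a Pearson coefficient (the caption does not name the coefficient used) and assuming that every peptide is paired with every probeset of the same gene. The function names and the dictionary layout are hypothetical and not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def pair_correlations(peptides, probesets):
    """Correlate each peptide with each probeset of the same gene.

    Both arguments map gene -> {feature_id: [level per strain]}, with the
    strains in the same order everywhere (an assumed layout).
    Returns {(gene, peptide_id, probeset_id): r}.
    """
    result = {}
    for gene in peptides.keys() & probesets.keys():
        for pep_id, pep_levels in peptides[gene].items():
            for probe_id, probe_levels in probesets[gene].items():
                result[(gene, pep_id, probe_id)] = pearson(pep_levels, probe_levels)
    return result


# Toy data for four strains; the values are made up for illustration.
toy_peptides = {"GeneA": {"pep1": [1.0, 2.0, 3.0, 4.0]}}
toy_probesets = {
    "GeneA": {"probe1": [2.0, 4.0, 6.0, 8.0], "probe2": [4.0, 3.0, 2.0, 1.0]},
    "GeneB": {"probe3": [1.0, 1.5, 0.5, 2.0]},  # no peptide measured, so skipped
}
toy_result = pair_correlations(toy_peptides, toy_probesets)
```

A histogram of the values in `toy_result` would correspond to panel A; genes measured in only one of the two datasets contribute no pairs.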
Figure 4. Isoform-specific analysis of peptide data.
A) An example of differential regulation of isoforms detected in the LC-MS data. Top panel, comparison of similarity in expression variation of 20 peptides measured for Acox1. Grey plots illustrate the expression variation among inbred mice for 19 peptides which represent all four Acox1 isoforms. The red plot illustrates the expression profile of the peptide representing the isoforms skipping exon 4. Bottom panel, the Ensembl genome browser's schematic representation of the four Acox1 isoforms. The arrow points to the Acox1-002 isoform, which skips exon 4. B) Concordance between Acox1 peptides. The left boxplot depicts correlations among peptides that include the Acox1-002 isoform. The right boxplot depicts correlations between the peptide mapping to exon 4 and all other peptides. The scatter points overlaid on each boxplot represent the pair-wise correlation values. C) Exon-level analysis of peptide measurements by LC-MS and transcript measurements by NGS in the livers of the B6 and DBA inbred strains. The black dots depict the relationships examined by comparing peptide data to microarray data, and the red dots represent the highly significant relations found by peptide comparison with the microarray data. The lines depict the best fit as predicted by linear regression (black line = regression of all peptides, red line = regression of highly significant peptides).
Relationship of PT-pairs with clinical traits.
| FDR Threshold | # Trait Correlations Unique to Proteins | # Trait Correlations Unique to Transcripts | # Trait Correlations Shared Between Transcripts and Proteins | # Proteins with Significant Correlation (%) | # Transcripts with Significant Correlation (%) |
| 0.1% FDR | 35 | 272 | 17 | 24 (6) | 122 (31) |
| 1% FDR | 93 | 704 | 71 | 64 (16) | 217 (55) |
| 5% FDR | 322 | 1547 | 234 | 162 (41) | 325 (82) |
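The three count columns in this table partition the significant trait correlations by which dataset they were found in. The sketch below shows that bookkeeping with plain set operations. It assumes the significant (gene, trait) pairs at a given FDR threshold have already been called for each dataset; how significance was determined is not described here, and the function name and toy pairs are hypothetical.

```python
def classify_trait_correlations(protein_significant, transcript_significant):
    """Split significant (gene, trait) pairs into three disjoint groups.

    Each argument is an iterable of (gene, trait) pairs that passed the
    chosen FDR threshold in that dataset.
    """
    protein_pairs = set(protein_significant)
    transcript_pairs = set(transcript_significant)
    return {
        "unique_to_proteins": protein_pairs - transcript_pairs,
        "unique_to_transcripts": transcript_pairs - protein_pairs,
        "shared": protein_pairs & transcript_pairs,
    }


# Toy input; gene and trait names are placeholders, not data from the study.
toy_protein_hits = [("GeneA", "trait1"), ("GeneB", "trait2")]
toy_transcript_hits = [("GeneA", "trait1"), ("GeneC", "trait1"), ("GeneC", "trait3")]

toy_groups = classify_trait_correlations(toy_protein_hits, toy_transcript_hits)
toy_counts = {name: len(pairs) for name, pairs in toy_groups.items()}
```

Because the three groups are disjoint, the number of protein-side correlations equals unique-to-proteins plus shared, and likewise for transcripts.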
Figure 5. Relationships between the peptide data and transcript data with clinical traits and biological pathways.
A) Correlations of transcriptome and proteome with clinical traits. A scatter plot of the correlation coefficients of 607 probesets and 1343 peptides with 42 clinical traits (peptide-trait correlations are plotted on the x-axis and probeset-trait correlations are plotted on the y-axis). Red points are correlations that were significant for transcripts only, green points are correlations that were significant for protein data only, and black points are correlations that were not significant in either of the two datasets. B) Concordance of transcripts and proteins in 115 KEGG biological pathways.
Genome-wide association profiles for the proteome and the transcriptome data.
| Global Analysis | Number of Probesets/Peptides | Total Number of Significant Associations | Number of Probesets/Peptides With at least One Significant Association (% Total Phenotypes) | Number of Probesets/Peptides with Local Associations (% Total Phenotypes) | Number of Distant Associations | Number of Probesets/Peptides With More Than One Significant Association (% Total Phenotypes) |
| Transcriptome | 9896 | 14463 | 6299 (63%) | 2066 (21%) | 12397 | 3651 (37%) |
| Proteome | 1543 | 1368 | 672 (43%) | 144 (9%) | 1224 | 339 (21%) |
Figure 6. Global analyses of proteome and transcriptome genetic regulation.
A) Global eQTL profile for the 14463 eQTLs and 1368 pQTLs superimposed on each other. In this plot, larger dots represent protein associations and smaller dots represent transcript associations. The diagonal line with strong association depicts the local eQTLs and pQTLs, and each off-diagonal dot depicts the location of a distant eQTL or pQTL. B) eQTL landscape for protein and transcript data. For each dataset, the genome was divided into 2 Mb windows, and the numbers of eQTLs (grey) and pQTLs (red) were counted separately in each window as the windows were slid every 50 kb. The frequency of eQTLs and pQTLs in each window is plotted as the fraction of total significant associations (14463 for transcripts and 1368 for proteins).
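The window counting described for panel B can be sketched as below for a single chromosome. This is an illustrative reading of the caption, not the authors' code: the half-open window convention, the handling of windows that run past the chromosome end, and all names are assumptions. Only the 2 Mb window, the 50 kb step, and the normalization by the total number of significant associations come from the caption.

```python
from bisect import bisect_left


def sliding_window_fractions(positions, chrom_length, total_associations,
                             window=2_000_000, step=50_000):
    """Fraction of all significant associations falling in each window.

    positions: base-pair positions of associated loci on one chromosome.
    total_associations: genome-wide count used as the denominator
        (for example 14463 for transcripts or 1368 for proteins).
    Windows are half-open [start, start + window), an assumed convention.
    Returns a list of (window_start, fraction) tuples.
    """
    if total_associations <= 0:
        raise ValueError("total_associations must be positive")
    ordered = sorted(positions)
    fractions = []
    start = 0
    while start < chrom_length:
        end = start + window
        # Number of positions p with start <= p < end.
        count = bisect_left(ordered, end) - bisect_left(ordered, start)
        fractions.append((start, count / total_associations))
        start += step
    return fractions


# Toy example: three loci on a hypothetical 3 Mb chromosome.
toy_positions = [100_000, 1_900_000, 2_500_000]
toy_profile = sliding_window_fractions(toy_positions, 3_000_000, total_associations=3)
toy_by_start = dict(toy_profile)
```

Running the function once on eQTL positions and once on pQTL positions, each with its own total, gives two profiles that can be overlaid as in the figure.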