Linfeng Wu, Sophie I Candille, Yoonha Choi, Dan Xie, Lihua Jiang, Jennifer Li-Pook-Than, Hua Tang, Michael Snyder.
Abstract
Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.
Year: 2013 PMID: 23676674 PMCID: PMC3789121 DOI: 10.1038/nature12223
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Fig. 1 Overview of workflow and protein association with ethnicity
a) Flow chart of experimental scheme. In each experiment, peptide digests from a reference cell line (GM12878) and five other cell lines were each labeled with one of the TMT-sixplex tags. Labeled peptides were equally mixed and subjected to identification and quantification by mass spectrometry, and then used for protein quantification. A total of 51 experiments were performed.
b) The P value distribution for the difference in protein levels between CEU and YRI shows enrichment at small P values.
c) P value of protein level differences between CEU and YRI plotted as a function of the genomic coordinate for each protein. The dashed line marks the Bonferroni-corrected significance threshold (P = 0.05). All the proteins that passed the threshold are highlighted with larger dots and labeled with gene names. Proteins that differed between CEU and YRI are distributed throughout the genome.
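The Bonferroni threshold in panel c can be illustrated with a minimal sketch. It assumes the standard definition (family-wise level divided by the number of tests); the gene names and P values below are hypothetical placeholders, not data from the study.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant_proteins(p_values, alpha=0.05):
    """Return (cutoff, sorted gene names whose raw P value is below the cutoff)."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    hits = sorted(gene for gene, p in p_values.items() if p < cutoff)
    return cutoff, hits

# Hypothetical raw P values, for illustration only.
example = {"GENE_A": 1e-6, "GENE_B": 0.004, "GENE_C": 0.2, "GENE_D": 0.03}
cutoff, hits = significant_proteins(example)

# Height of the dashed line on a -log10(P) axis, as used in Manhattan-style plots.
line_height = -math.log10(cutoff)
print(cutoff, hits, line_height)
```

With four tests the per-test cutoff is 0.05 / 4; with thousands of quantified proteins the cutoff becomes correspondingly stricter.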
Fig. 2 Protein covariation network generated by sparse partial correlation estimation
Nodes represent proteins. Edges represent connection by covariation. This sparse network displays the 223 strongest connections among 278 proteins. Protein function was annotated by node color. Edge color was categorized according to correlation value. Known protein-protein interacting pairs were highlighted in larger nodes and labeled with gene names.
Fig. 3 Loci associated with protein expression levels
a) Identification of cis-pQTLs in all three populations combined (n=72). The P value and genomic coordinates for each protein/cis-SNP association test were plotted in the Manhattan plot. pQTLs with max(T) corrected P value < 0.001 were highlighted with a bigger dot size and a black outline. Multiple loci throughout the genome displayed an excess of small P values. The arrow indicates the location of the IMPA1 gene, which contains a significant cis-pQTL.
b) Overview of IMPA1 protein level and SNP genotype association in CEU, YRI, and all populations combined. The bottom plot is the fine mapping of cis-pQTL for IMPA1 based on HapMap I, II and III genotypes release 28. Each dot represents a tested SNP. Dot colors represent testing groups. The arrow is indicative of the chromosome location and transcription direction of the IMPA1 gene. There are several highly significant associations near the IMPA1 region in CEU and all populations combined. The exact locations of these associations in the IMPA1 gene region are illustrated in the top plot. The most significant SNP is rs1058401, located in IMPA1 3′UTR.
c) Validation of IMPA1 protein expression level. IMPA1 protein expression level was validated by immunoblotting in 11 CEU individuals, with their genotype at rs1058401 labeled at the bottom.
d) The bar plots show the mean IMPA1 protein level of these 11 individuals for each rs1058401 genotype, based on data measured by quantitative mass spectrometry and by densitometry of immunoblots. Error bar, standard error of the mean. M.S., mass spectrometry. Im., immunoblotting.
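A protein/cis-SNP association of the kind shown in this figure can be sketched as a simple additive model, in which protein level is regressed on the number of copies of one allele (0, 1 or 2). This is an illustrative assumption only: the captions do not specify the test statistic beyond the max(T) correction, and the dosages and protein levels below are made up.

```python
def additive_fit(dosages, levels):
    """Least-squares fit of level = intercept + slope * dosage.

    Returns (slope, intercept, pearson_r). Dosage is the allele count (0, 1, 2).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Hypothetical genotype dosages and relative protein levels, for illustration only.
dosages = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
slope, intercept, r = additive_fit(dosages, levels)
print(slope, intercept, r)
```

The slope is the estimated change in protein level per additional allele copy. A permutation-based correction such as max(T) would compare the observed statistic against the maximum statistic obtained across all tested SNPs after shuffling the protein levels.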
Number of cis-pQTLs at different FDR
| Group | No. of LCLs | No. of proteins | No. of tests | Genes with a pQTL (10% FDR) | Genes with a pQTL (20% FDR) | Genes with a pQTL (30% FDR) |
|---|---|---|---|---|---|---|
| CEU | 41 | 3,984 | 116,556 | 33 | 54 | 122 |
| YRI | 22 | 4,017 | 121,405 | 13 | 34 | 50 |
| 3 pop. | 72 | 4,021 | 130,505 | 77 | 134 | 239 |
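Counting discoveries at several FDR levels, as in the table above, can be sketched with the Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the source does not state which FDR procedure was used, and the P values below are hypothetical.

```python
def bh_rejections(p_values, q):
    """Number of hypotheses rejected by Benjamini-Hochberg at FDR level q."""
    m = len(p_values)
    ranked = sorted(p_values)
    n_rejected = 0
    for rank, p in enumerate(ranked, start=1):
        # Step-up rule: keep the largest rank whose P value is below q * rank / m.
        if p <= q * rank / m:
            n_rejected = rank
    return n_rejected

# Hypothetical P values, one per gene, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.09, 0.2, 0.5, 0.8, 0.9, 0.95]
counts = {q: bh_rejections(p_values, q) for q in (0.1, 0.2, 0.3)}
print(counts)
```

The count can only stay the same or grow as the FDR level is relaxed, which matches the left-to-right increase within each row of the table.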