Suzanne S Fei, Phillip A Wilmarth, Robert J Hitzemann, Shannon K McWeeney, John K Belknap, Larry L David.
Abstract
Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.
Year: 2011 PMID: 21553863 PMCID: PMC3128464 DOI: 10.1021/pr200133p
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Experimental design and Spearman rank clustering of samples before normalization, after normalization, and after batch adjustment. Striata from six mice were pooled for each sample to reduce within-strain variation and to obtain enough protein. Batch 1 contained samples B6–1, B6–2, D2–1, and D2–2. Batch 2 contained samples B6–3, B6–4, D2–3, and D2–4. The strains formed B6 and D2 clusters only after applying batch corrections.
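The normalization and sample-comparison steps named in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`quantile_normalize`, `spearman`) are hypothetical, ties in the normalization step are broken arbitrarily, and the ComBat batch adjustment is not reproduced here.

```python
import math


def quantile_normalize(columns):
    """Force every sample (one list per sample) onto the same distribution.

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties within a sample are broken by position (assumption).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy spectral counts for two samples over three proteins (made-up numbers).
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
similarity = spearman([1, 2, 3, 4], [10, 20, 30, 40])
```

A clustering like the one in the figure would then use `1 - spearman(a, b)` as a distance between each pair of samples; that choice of distance is also an assumption.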
Comparison of Strategies for Grouping Similar Proteins
| grouping strategy | percent of peptides shared | total number of groups | number of groups with >10 counts | percent of groups containing any shared peptides | percent of groups containing only one protein | number of groups differentially expressed ( |
|---|---|---|---|---|---|---|
| No grouping | 31.16% | 4593 | 2583 | 52.03% | 100.00% | 116/16 |
| Baseline grouping (1/1) | 11.94% | 3264 | 2405 | 33.76% | 77.51% | 120/17 |
| Light grouping (1/5) | 6.84% | 2998 | 2329 | 26.66% | 70.92% | 119/17 |
| Swiss-Prot search with no grouping | 4.78% | 2976 | 2201 | 27.21% | 100.00% | 110/16 |
| Moderate grouping (2/10) | 4.62% | 2885 | 2259 | 22.13% | 69.06% | 123/16 |
| Ensembl family grouping | 0.59% | 2343 | 1808 | 4.54% | 55.65% | 101/19 |
| Aggressive grouping | 0.00% | 2579 | 1958 | 0.00% | 63.31% | 111/14 |
Grouping label (2/10) indicates that two proteins with any shared peptides are merged unless they each have 2 exclusive peptides with a total of 10 exclusive peptide counts to distinguish between them.
The “no grouping” protein set includes redundant proteins.
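The grouping labels in the table (1/1, 1/5, 2/10) describe a merge rule that can be sketched in a few lines. The version below is an illustration under stated assumptions, not the authors' implementation: `group_proteins` is a hypothetical name, "exclusive" is evaluated pairwise between the two proteins being compared, the count threshold is applied to each protein separately, and merges are made transitive with a union-find structure.

```python
from itertools import combinations


def group_proteins(peptide_counts, min_peptides=2, min_counts=10):
    """Merge proteins that share peptides and cannot be told apart.

    peptide_counts maps protein ID -> {peptide sequence: spectral count}.
    Two proteins with any shared peptide are merged unless EACH has at least
    `min_peptides` exclusive peptides totalling at least `min_counts` counts.
    """
    parent = {protein: protein for protein in peptide_counts}

    def find(protein):
        # Union-find lookup with path halving.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    def distinguishable(own, other):
        exclusive = {pep: n for pep, n in own.items() if pep not in other}
        return len(exclusive) >= min_peptides and sum(exclusive.values()) >= min_counts

    for a, b in combinations(peptide_counts, 2):
        pa, pb = peptide_counts[a], peptide_counts[b]
        if not (pa.keys() & pb.keys()):
            continue  # no shared peptides, nothing to resolve
        if distinguishable(pa, pb) and distinguishable(pb, pa):
            continue  # enough exclusive evidence on both sides
        parent[find(a)] = find(b)

    groups = {}
    for protein in peptide_counts:
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(group) for group in groups.values())


# Toy data with made-up protein IDs, peptides, and counts.
toy = {
    "P1": {"a": 5, "b": 6, "s": 3},
    "P2": {"c": 7, "d": 4, "s": 2},
    "P3": {"a": 1, "e": 20},
}
moderate = group_proteins(toy, min_peptides=2, min_counts=10)
baseline = group_proteins(toy, min_peptides=1, min_counts=1)
```

Under the (2/10) setting, P1 and P3 merge because neither has enough exclusive evidence against the other, while P1 and P2 stay separate; under (1/1) all three remain distinct. This mirrors the trend in the table, where stricter thresholds produce fewer groups.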
Figure 2. Histogram of the number of additional unique peptides identified when using Ensembl vs Swiss-Prot. Only the cases where one Swiss-Prot protein mapped to one Ensembl family that represented two or more isoforms (and where additional peptides were found using Ensembl or Swiss-Prot) are shown.
Figure 3. Protein families found to be significantly different between strains. Gray circles represent all of the data. Black open circles represent a p-value of less than 0.05. Black closed circles represent an FDR-adjusted q-value of less than 0.05. Quantile normalized and batch adjusted data is shown, but a plot of the raw data was similar.
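The caption distinguishes raw p-values from FDR-adjusted q-values. The record does not say which adjustment was used; the sketch below assumes the Benjamini-Hochberg procedure, which is a common choice, and `bh_adjust` is a hypothetical helper name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Assumption: the source only says "FDR-adjusted"; BH is used here purely
    for illustration.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four protein families.
qvalues = bh_adjust([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in qvalues]
```

A family plotted as an open circle in the figure would have p < 0.05 but q >= 0.05, while a closed circle would pass both thresholds.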
Number of Significantly Differentially Expressed Protein Families that have at least One Protein that was Identified in the Data Set and that Lie within a Region of the Genome Found to be Associated with the Given Phenotype
| quantitative phenotype | ||
|---|---|---|
| Acute Alcohol Withdrawal | 13 | 3 |
| Alcohol Acceptance | 5 | 2 |
| Alcohol Metabolism | 14 | 3 |
| Alcohol Preference Drinking | 54 | 10 |
| Alcohol Response Conditioning | 21 | 1 |
| Alcohol Stimulated Activity | 65 | 13 |
| Chronic Alcohol Withdrawal | 24 | 3 |
| Hypothermia | 6 | 2 |
| Loss of Righting Reflex | 12 | 3 |
Effect of a Single Amino Acid Substitution on Protein Family ENSFM00250000001899
| protein ID: ENSMUSP00000068260 | reference DB | | | | D2 DB | | | |
|---|---|---|---|---|---|---|---|---|
| peptide sequence | B6–1 | B6–2 | B6–3 | B6–4 | D2–1 | D2–2 | D2–3 | D2–4 |
| ELSGLPSGPSVGSGPPPPPPGPPPPPI | 0 | 0 | 0 | 0 | 10 | 6 | 10 | 6 |
| ELSGLPSGPSVGSGPPPPPPGPPPPPI | 5 | 8 | 8 | 10 | 0 | 0 | 0 | 0 |
Using the Ensembl reference database, this family was considered differentially expressed with a total of 185 counts in the B6 strain and 148 counts in the D2 strain (edgeR, p = 0.0077). Using the D2 database on the D2 samples increased the D2 counts to 180, making the family no longer significant (p = 0.20). This change is due to the single amino acid substitution S242P in protein ENSMUSP00000068260.
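The count shift described in this note can be checked against the peptide rows of the table. The sketch below only redoes that arithmetic; the variable names are illustrative, and the log2 ratios are raw count ratios that ignore the library-size normalization and dispersion modelling edgeR would apply.

```python
import math

# Per-sample counts from the table rows (B6 columns, then D2 columns).
d2_variant_peptide = [10, 6, 10, 6]   # seen in D2 samples with the D2 database
b6_reference_peptide = [5, 8, 8, 10]  # seen in B6 samples with the reference database

b6_family_total = 185            # B6 family counts, reference database
d2_family_total_reference = 148  # D2 family counts, reference database

recovered = sum(d2_variant_peptide)                           # 32
d2_family_total_strain = d2_family_total_reference + recovered  # 180

# Raw log2 ratios of B6 to D2 counts before and after the database change.
log2_ratio_reference = math.log2(b6_family_total / d2_family_total_reference)
log2_ratio_strain = math.log2(b6_family_total / d2_family_total_strain)
```

The 32 counts recovered for the variant peptide account exactly for the reported increase from 148 to 180, which appears to be why the apparent strain difference shrinks once the D2 database is used.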