Vanessa C Evans, Gary Barker, Kate J Heesom, Jun Fan, Conrad Bessant, David A Matthews.
Abstract
Identification of proteins by tandem mass spectrometry requires a reference protein database, but these are only available for model species. Here we demonstrate that, for a non-model species, the sequencing of expressed mRNA can generate a protein database for mass spectrometry-based identification. This combination of high-throughput sequencing and protein identification technologies allows detection of genes and proteins. We use human cells infected with human adenovirus as a complex and dynamic model to demonstrate the robustness of this approach. Our proteomics informed by transcriptomics (PIT) technique identifies >99% of over 3,700 distinct proteins identified using traditional analysis that relies on comprehensive human and adenovirus protein lists. We show that this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
Year: 2012 PMID: 23142869 PMCID: PMC3581816 DOI: 10.1038/nmeth.2227
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
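The central step of the PIT approach described in the abstract is turning assembled transcripts into a protein list that a search engine can use. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it treats transcripts (e.g. Trinity contigs) as plain nucleotide strings, translates all six reading frames with the standard genetic code, keeps the Met-initiated stretch before each stop codon, and writes FASTA records. The `min_len` threshold and the `<id>_orfN` naming are hypothetical choices.

```python
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(nt: str) -> str:
    # Codons containing anything other than A/C/G/T (e.g. N) become "X".
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def six_frame_orfs(seq: str, min_len: int = 100) -> List[str]:
    """Return Met-initiated ORFs of at least min_len residues from all six frames."""
    seq = seq.upper()
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Each stop-free segment yields at most one ORF: from its first Met
            # to the stop (or to the transcript end if no stop follows).
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.append(segment[start:])
    return orfs


def to_fasta(transcripts: Dict[str, str], min_len: int = 100) -> str:
    """Write ORFs as FASTA; the '<id>_orfN' naming is a hypothetical convention."""
    records = []
    for tid, seq in transcripts.items():
        for n, orf in enumerate(six_frame_orfs(seq, min_len), start=1):
            records.append(f">{tid}_orf{n}\n{orf}\n")
    return "".join(records)


# Toy transcript: the only ORF of >= 3 residues is MAWK, in the third forward frame.
print(to_fasta({"TR1": "CCATGGCTTGGAAATAGCC"}, min_len=3), end="")
```

Both strands are scanned because an unstranded assembly does not record which strand a transcript came from; a production pipeline would typically also score or filter candidate ORFs rather than keep every long one.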
Reads generated and mapped to the human, adenovirus and papillomavirus genomes.
Total number of paired-end reads at each time point, together with how many of those reads mapped to a unique site in a female human genome (hg19 without chromosome Y), the adenovirus type 5 genome or the papillomavirus type 18 genome (part of which is integrated into the HeLa cell genome). In all cases we consider only reads where both ends of a pair map to the target genome in the correct orientation and on opposite strands, as expected for a correctly mapped pair of sequence reads.
| | Uninfected HeLa cells | 8 hours post infection | 24 hours post infection |
|---|---|---|---|
| Total reads generated | 29,552,473 | 26,220,901 | 26,251,561 |
| Reads uniquely mapped to the human genome (hg19) | 18,097,929 | 16,325,343 | 3,183,200 |
| Reads uniquely mapped to the adenovirus type 5 genome | 187 | 521,731 | 15,134,568 |
| Reads uniquely mapped to the papillomavirus type 18 genome | 45,088 | 18,755 | 634 |
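The pair-level filter described in the caption can be expressed with standard SAM FLAG bits. This is a minimal sketch, assuming each alignment has already been parsed into `(flag, rname, rnext, mapq)` tuples. "Unique" is approximated by a MAPQ cutoff (`min_mapq`, a hypothetical value), because aligners encode multi-mapping reads differently and the caption does not say how uniqueness was determined.

```python
from collections import Counter
from typing import Iterable, Tuple

# SAM FLAG bits (SAM format specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_pair(flag: int, rname: str, rnext: str, mapq: int, min_mapq: int = 10) -> bool:
    """True if this alignment belongs to a correctly mapped, uniquely placed pair."""
    if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False
    # Both mates must hit the same reference ("=" means same as RNAME).
    if rnext not in ("=", rname):
        return False
    # The aligner judged orientation and insert size acceptable ...
    if not flag & PROPER_PAIR:
        return False
    # ... and the mates lie on opposite strands.
    if bool(flag & REVERSE) == bool(flag & MATE_REVERSE):
        return False
    # Hypothetical uniqueness proxy: multi-mapped reads usually get a low MAPQ.
    return mapq >= min_mapq


def tally_pairs(records: Iterable[Tuple[int, str, str, int]]) -> Counter:
    """Count kept pairs per reference, via the first read so each pair counts once."""
    counts = Counter()
    for flag, rname, rnext, mapq in records:
        if flag & FIRST_IN_PAIR and keep_pair(flag, rname, rnext, mapq):
            counts[rname] += 1
    return counts


example = [
    (99, "chr1", "=", 60),         # read 1 forward, mate reverse: kept
    (147, "chr1", "=", 60),        # read 2 of the same pair: not counted again
    (99, "AC_000008.1", "=", 42),  # adenovirus type 5 pair: kept
    (97, "chr1", "=", 60),         # not flagged as a proper pair: dropped
]
print(tally_pairs(example))  # one chr1 pair and one adenovirus pair
```

Counting through the first read of each pair yields pair counts; read-level totals would be twice these values.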
Identification of peptides and proteins using different protein datasets.
Each of five different protein lists was used as the reference list to search the MS/MS spectra with MaxQuant. In all cases the search list also included a standard list of known contaminants and a list of reversed proteins to act as decoys, which allowed the false discovery rate to be set at 1%. For the canonical protein lists (Ensembl or SwissProt) we also added a list of human adenovirus proteins, so that they could be compared with the Trinity list (which contains adenovirus sequences) on a like-for-like basis. The adenovirus proteins were derived from the GenBank entry for adenovirus type 5 (AC_000008.1). In each case, the percentage quoted is the number of peptides present in both lists as a proportion of the total number of peptides detected with the canonical ENSGs list.
| | Canonical ENSGs | ENSGs detected | ENSTs | Trinity derived ORFs | SwissProt-Uniprot |
|---|---|---|---|---|---|
| Total number of peptides | 29,371 | 28,862 | 28,862 | 28,827 | 29,512 |
| As a percentage of canonical ENSG peptides | 100% | 98.2% | 98.2% | 95.6% | 99.6% |
| Distinct protein | 3,415 | 3,373 | 3,373 | 3,595 | 3,443 |
| Peptides | 0 | 454 | 454 | 754 | 257 |
| Number of | 21,173 | 14,537 | 29,287 | 80,648 | 72,049 |
| Total number of | 11,633,994 | 8,828,371 | 15,690,432 | 11,305,091 | 32,897,704 |
| Total number of | 420,069 | 418,430 | 418,430 | 414,616 | 421,031 |
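Two computations underlie this table: the reversed-protein decoy list used to control the false discovery rate, and the overlap percentage defined in the caption. The sketch below shows simplified versions of both. The `REV_` prefix is a hypothetical label, and the cutoff uses the plain decoy/target ratio of the basic target-decoy idea; it is not MaxQuant's own implementation.

```python
from typing import Dict, List, Optional, Set, Tuple


def make_decoys(proteins: Dict[str, str], prefix: str = "REV_") -> Dict[str, str]:
    """Reversed-sequence decoys; the prefix is a hypothetical label."""
    return {prefix + pid: seq[::-1] for pid, seq in proteins.items()}


def score_cutoff(psms: List[Tuple[float, bool]], fdr: float = 0.01) -> Optional[float]:
    """Lowest score at which the estimated FDR (decoys / targets) is still <= fdr.

    psms holds (score, is_decoy) pairs, higher scores being better.
    Simplified target-decoy estimate; ties are not treated specially.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = score
    return cutoff


def shared_percentage(canonical: Set[str], other: Set[str]) -> float:
    """Peptides found with both lists, as a percentage of canonical-list peptides."""
    return 100.0 * len(canonical & other) / len(canonical)


print(make_decoys({"P1": "MAWK"}))  # {'REV_P1': 'KWAM'}
psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_cutoff(psms, fdr=0.01))  # 9.0
print(shared_percentage({"AAK", "LLR", "GGR", "PEPK"}, {"AAK", "LLR", "GGR", "VVK"}))  # 75.0
```

Because a reversed sequence has the same length and composition as its target, a decoy hit is roughly as likely as a random false target hit; that assumption is what lets the decoy count estimate false positives among the targets.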
Figure 1. Illustration of data integration between the transcriptome and the proteome
Screenshot from the Integrative Genomics Viewer (IGV) showing a SAM alignment file generated by GMAP from Trinity-derived sequences. We also show data from a custom GFF3 file that indicates which peptides were identified by MS/MS and where they lie on the transcript and the genome. For each identified peptide, a yellow box (arrowed) appears when the mouse pointer is over the peptide; it lists the peptide sequence, the confidence score and the ratios at different time points. The RefSeq-annotated isoforms of NPM1 are shown in the middle of the screenshot. Note that the same peptide is flagged multiple times because it occurs in several Trinity-assembled transcripts.
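The custom GFF3 track places each identified peptide on the transcript that encodes it. A minimal sketch of that coordinate conversion, assuming a forward-strand ORF whose first codon sits at a known 0-based offset `orf_start` in the transcript: GFF3 coordinates are 1-based and inclusive, so the amino acid at ORF position p (0-based) spans nucleotides `orf_start + 3p + 1` to `orf_start + 3p + 3`. The feature type `polypeptide_region`, the source label `PIT` and the attribute layout are illustrative choices, and projecting features onto the genome (done through GMAP alignments in the figure) is not shown.

```python
from typing import Optional


def peptide_to_gff3(
    transcript_id: str,
    orf_nt_start: int,
    orf_protein: str,
    peptide: str,
    score: Optional[float] = None,
    source: str = "PIT",
) -> Optional[str]:
    """Return one GFF3 line placing `peptide` on a forward-strand transcript.

    orf_nt_start is the 0-based offset of the ORF's first codon in the
    transcript. Only the first occurrence of the peptide in the ORF is used.
    """
    aa_offset = orf_protein.find(peptide)
    if aa_offset == -1:
        return None  # peptide not encoded by this ORF
    # Convert to GFF3's 1-based, inclusive nucleotide coordinates.
    start = orf_nt_start + 3 * aa_offset + 1
    end = orf_nt_start + 3 * (aa_offset + len(peptide))
    score_field = "." if score is None else f"{score:g}"
    columns = [transcript_id, source, "polypeptide_region", str(start), str(end),
               score_field, "+", ".", f"Name={peptide}"]
    return "\t".join(columns)


# ORF "MAWK" starts at 0-based offset 2 of the toy transcript CCATGGCTTGGAAATAGCC,
# so peptide "AWK" covers nucleotides 6-14.
print(peptide_to_gff3("TR1", 2, "MAWK", "AWK", score=0.99))
```

Calling the function once per transcript that contains a peptide produces one feature per transcript, which is why the same peptide can appear several times in a track like the one in Figure 1. Reverse-strand ORFs would need their coordinates mirrored and the strand column set to `-`.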
Figure 2. Adenovirus induces degradation of POLDIP3 in a manner sensitive to MG132, and redistribution of POLDIP3 in infected cells
a) Samples of adenovirus-infected or uninfected HeLa cells were tested by western blot for expression of POLDIP3. These samples are biological repeats in the presence of either DMSO or MG132 in DMSO. Equal loading is shown by the GAPDH control.
The top row of panels (b) shows the normal distribution of HA-tagged POLDIP3 (green) in uninfected HeLa cells. The middle row (c) shows the distribution of HA-POLDIP3 (green) in cells infected with wild-type adenovirus; the adenovirus DNA-binding protein, DBP (red), is clearly visible in the cell nuclei. The bottom row (d) shows the distribution of HA-POLDIP3 (red) in cells infected with the adenovirus mutant dl306, which lacks the E4 region of the virus but still expresses DBP (green). In all cases the infected cells were fixed at 24 hours post infection, the white bar represents 10 µm, and cell nuclei are stained blue with DAPI.