Using proteomics to mine genome sequences.

Jonathan W Arthur, Marc R Wilkins.

Abstract

We present a method for mining unannotated or annotated genome sequences with proteomic data to identify open reading frames. The region of a genome coding for a protein sequence is identified by using information from the analysis of proteins and peptides with MALDI-TOF mass spectrometry. The raw genome sequence or any unassembled contigs of an organism are theoretically cleaved into a number of equal-sized but overlapping fragments, and these are then translated in all six frames into a series of virtual proteins. Each virtual protein is then subjected to a theoretical enzymatic digestion. Standard proteomic sample preparation methods are used to separate, array, and digest the proteins of interest into peptides. The masses of the resulting peptides are measured using mass spectrometry and compared to the theoretical peptide masses of the virtual proteins. The region of the genome coding for a particular protein can then be identified when there are a large number of hits between peptides from the protein and peptides from the virtual protein. The method makes no assumptions about the location of a protein in a particular gene sequence or the positions or types of start and stop codons. To illustrate this approach, all 773 proteins of Pseudomonas aeruginosa contained in SWISS-PROT were used to theoretically test the method and optimize parameters. Increasing the size of the virtual proteins results in an overall improvement in the ability to detect the coding region, at the cost of decreasing the sensitivity of the method for smaller proteins. Increasing the minimum number of matching peptides, lowering the mass error tolerance, or increasing the signal-to-noise ratio of the simulated mass spectrum improves the ability to detect coding regions. The method is further demonstrated on experimental data from Mycobacterium tuberculosis and is also shown to work with eukaryotic organisms (e.g., Homo sapiens).
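
The pipeline described in the abstract (overlapping fragments, six-frame translation, theoretical digestion, peptide mass matching) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation. Several things are assumptions: trypsin as the enzyme (cleavage after K or R, not before P), the standard genetic code, splitting virtual proteins at stop codons before digestion, singly protonated monoisotopic masses, a ppm-based tolerance, and all function names and default parameter values (`size`, `step`, `min_hits`, `tol_ppm`), which are hypothetical.

```python
import re
from itertools import product

# Standard genetic code in the conventional TCAG codon ordering (assumed).
CODON_TABLE = {"".join(c): a for c, a in zip(
    product("TCAG", repeat=3),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")}

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER, PROTON = 18.01056, 1.00728

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def windows(genome, size, step):
    """Yield (start, fragment) for equal-sized, overlapping fragments."""
    for start in range(0, max(len(genome) - size, 0) + 1, step):
        yield start, genome[start:start + size]

def six_frame_translations(dna):
    """Yield six virtual proteins: frames 0-2 forward, then 0-2 reverse."""
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            yield "".join(CODON_TABLE.get(c, "X") for c in codons)

def tryptic_peptides(protein, min_len=6):
    """Theoretical tryptic digest; stop codons ('*') end a segment (assumed)."""
    peptides = []
    for segment in protein.split("*"):
        for pep in re.split(r"(?<=[KR])(?!P)", segment):
            # Skip short peptides and any with unknown residues ('X').
            if len(pep) >= min_len and all(a in RESIDUE_MASS for a in pep):
                peptides.append(pep)
    return peptides

def peptide_mh(peptide):
    """Monoisotopic [M+H]+ mass of a peptide."""
    return sum(RESIDUE_MASS[a] for a in peptide) + WATER + PROTON

def count_hits(observed, theoretical, tol_ppm):
    """Number of observed masses matching any theoretical mass within tol_ppm."""
    return sum(any(abs(m - t) <= t * tol_ppm * 1e-6 for t in theoretical)
               for m in observed)

def scan_genome(genome, observed, size=300, step=150, min_hits=3, tol_ppm=50.0):
    """Return (fragment start, frame index, hits) for frames with enough hits."""
    results = []
    for start, fragment in windows(genome, size, step):
        for frame, protein in enumerate(six_frame_translations(fragment)):
            masses = [peptide_mh(p) for p in tryptic_peptides(protein)]
            hits = count_hits(observed, masses, tol_ppm)
            if hits >= min_hits:
                results.append((start, frame, hits))
    return results
```

Calling `scan_genome(genome, observed_masses)` with a DNA string and a list of measured peptide masses returns the fragments and frames whose virtual proteins share at least `min_hits` peptide masses with the spectrum. In this sketch, `size`, `min_hits`, and `tol_ppm` stand in for the virtual protein size, the minimum number of matching peptides, and the mass error tolerance that the abstract reports optimizing.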

Year: 2004   PMID: 15253419   DOI: 10.1021/pr034056e

Source DB: PubMed   Journal: J Proteome Res   ISSN: 1535-3893   Impact factor: 4.466


  8 in total

1.  Mass spectrometry of the M. smegmatis proteome: protein expression levels correlate with function, operons, and codon bias.

Authors:  Rong Wang; John T Prince; Edward M Marcotte
Journal:  Genome Res       Date:  2005-08       Impact factor: 9.043

2.  Identification of a copper-binding metallothionein in pathogenic mycobacteria.

Authors:  Ben Gold; Haiteng Deng; Ruslana Bryk; Diana Vargas; David Eliezer; Julia Roberts; Xiuju Jiang; Carl Nathan
Journal:  Nat Chem Biol       Date:  2008-08-24       Impact factor: 15.040

3.  Sequencing and validation of the genome of a Campylobacter concisus reveals intra-species diversity.

Authors:  Nandan P Deshpande; Nadeem O Kaakoush; Hazel Mitchell; Karolina Janitz; Mark J Raftery; Simone S Li; Marc R Wilkins
Journal:  PLoS One       Date:  2011-07-29       Impact factor: 3.240

4.  Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

Authors:  Bradford C Powell; Clyde A Hutchison
Journal:  BMC Bioinformatics       Date:  2006-01-19       Impact factor: 3.169

5.  HybGFS: a hybrid method for genome-fingerprint scanning.

Authors:  Kosaku Shinoda; Nozomu Yachie; Takeshi Masuda; Naoyuki Sugiyama; Masahiro Sugimoto; Tomoyoshi Soga; Masaru Tomita
Journal:  BMC Bioinformatics       Date:  2006-10-29       Impact factor: 3.169

6.  Identification of a Novel Serum Biomarker for Tuberculosis Infection in Chinese HIV Patients by iTRAQ-Based Quantitative Proteomics.

Authors:  Cong Chen; Tao Yan; Liguo Liu; Jianmin Wang; Qi Jin
Journal:  Front Microbiol       Date:  2018-02-26       Impact factor: 5.640

7.  Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions.

Authors:  Jainab Khatun; Yanbao Yu; John A Wrobel; Brian A Risk; Harsha P Gunawardena; Ashley Secrest; Wendy J Spitzer; Ling Xie; Li Wang; Xian Chen; Morgan C Giddings
Journal:  BMC Genomics       Date:  2013-02-28       Impact factor: 3.969

Review 8.  Computational methods for protein identification from mass spectrometry data.

Authors:  Leo McHugh; Jonathan W Arthur
Journal:  PLoS Comput Biol       Date:  2008-02       Impact factor: 4.475

