
False discovery rate estimation and heterobifunctional cross-linkers.

Lutz Fischer¹, Juri Rappsilber¹,².

Abstract

False discovery rate (FDR) estimation is a cornerstone of proteomics that has recently been adapted to cross-linking/mass spectrometry. Here we demonstrate that heterobifunctional cross-linkers, while theoretically different from homobifunctional cross-linkers, need not be considered separately in practice. We develop a correct FDR formula for heterobifunctional cross-linkers, evaluate the impact of applying it, and conclude that its practical advantages are minimal. Hence a single formula can be applied to data generated with the many different non-cleavable cross-linkers.

Year:  2018        PMID: 29746514      PMCID: PMC5944926          DOI: 10.1371/journal.pone.0196672

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Cross-linking mass spectrometry (CLMS) has become an increasingly popular tool for analyzing protein structures, protein networks and protein dynamics[1-4]. Recently, the question of correct error estimation in CLMS has been addressed with a target-decoy database approach[5], building on previous work for cross-linked[6,7] and linear peptides[8-11]. This approach to estimating the false discovery rate (FDR) of cross-links assumes that the cross-linker is homobifunctional, i.e. that it carries the same reactive group at either end. However, heterobifunctional cross-linkers are also used in the field, for example 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC)[12-15] or succinimidyl 4,4'-azipentanoate (SDA)[15-21]. It is unclear to what extent these cross-linkers affect FDR estimation. They link different amino acids at each end, so a different search space has to be considered for each side of the cross-linker. Here, we provide some theoretical insights into extending the target-decoy approach to FDR estimation for heterobifunctional cross-linkers, and we assess whether a different formula is needed. Note that these considerations apply to non-cleavable cross-linkers. MS-cleavable cross-linkers with independent identification of both peptides could be treated the same way, by taking the two identifications as one combined identification, but they are currently handled differently for FDR estimation[22,23].

Results and discussion

Currently, the most commonly used cross-linkers are non-directional. In other words, the mass spectrum of a cross-linked peptide pair gives no means to distinguish a cross-link formed as peptide A linked to peptide B from one formed as peptide B linked to peptide A. However, the most commonly used formula[24-28] is actually the one for directional cross-links[5]:

$$\mathrm{FDR} = \frac{TD - DD}{TT} \qquad (1)$$

Here TT is the number of observed target-target matches (both cross-linked peptides come from the target database). TD is the number of observed target-decoy matches (one linked peptide comes from the target database and one from the decoy database). DD is the number of decoy-decoy matches (both peptides come from the decoy database). A correct formula for the more commonly used non-directional cross-linkers (e.g. BS3 or DSS) has been given previously as formula 2[5]. It additionally requires the number of possible target-decoy pairs in the search database. However, the error incurred by using formula 1 instead approaches zero rapidly with increasing database size. Therefore, in practical terms, the directional formula is also applicable to data from non-directional cross-linkers such as BS3 or DSS.

Directionality (or the lack of it) is not the only property of a cross-linker. Cross-linkers can also be homobifunctional or heterobifunctional. For homobifunctional cross-linkers, any peptide in the database that can react with one side of the cross-linker can also react with the other side. For heterobifunctional cross-linkers this is not the case, which has consequences for constructing the target and decoy search space: it leads to distinct databases (sets of peptides or residues) for each side of the cross-linker. The formulas used previously assume a homobifunctional cross-linker.
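The contrast between the directional and non-directional cases can be made concrete with a minimal Python sketch. `fdr_formula1` implements formula 1, FDR = (TD - DD)/TT. `fdr_nondirectional_sketch` is a hypothetical helper of our own. It assumes n target and n decoy entries, with n(n+1)/2 possible target-target and decoy-decoy pairs (unordered, self-pairs included) and n² target-decoy pairs, and it rescales DD accordingly. It is meant only to illustrate why the correction vanishes as n grows, not to reproduce the exact notation of formula 2 in ref [5].

```python
def fdr_formula1(tt: int, td: int, dd: int) -> float:
    """Formula 1 (directional cross-linkers): FDR = (TD - DD) / TT."""
    if tt <= 0:
        raise ValueError("TT must be positive")
    return (td - dd) / tt


def fdr_nondirectional_sketch(tt: int, td: int, dd: int, n: int) -> float:
    """Illustrative FDR for a non-directional, homobifunctional cross-linker.

    Assumption (ours): n target and n decoy entries, unordered pairs with
    self-pairs included, i.e. n(n+1)/2 target-target (and decoy-decoy)
    pairs but n*n target-decoy pairs.
    """
    if tt <= 0 or n <= 0:
        raise ValueError("TT and n must be positive")
    n_tt = n_dd = n * (n + 1) / 2
    n_td = n * n
    # DD hits estimate matches with two wrong partners; scale them to the
    # target-decoy and target-target pair spaces.
    double_false_in_td = dd * n_td / n_dd
    double_false_in_tt = dd * n_tt / n_dd
    # What remains of TD estimates TT hits with exactly one wrong partner.
    return (td - double_false_in_td + double_false_in_tt) / tt


if __name__ == "__main__":
    for n in (10, 200, 5000):
        print(n, fdr_formula1(1000, 60, 10), fdr_nondirectional_sketch(1000, 60, 10, n))
```

Take TT = 1000, TD = 60 and DD = 10. Formula 1 gives 0.05, while the sketch gives about 0.0518 for n = 10 and about 0.0501 for n = 200. Under these assumptions the difference is 2·DD/((n + 1)·TT).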
A set of considerations (see supporting information, S1 File) leads us to an FDR estimation formula for non-directional, heterobifunctional cross-linkers (formula 3). Besides the observed target-target (TT), target-decoy and decoy-target (TD), and decoy-decoy (DD) matches, it requires a set of parameters describing the search database (Table 1). Formula 2 can be simplified to formula 1 for all practical purposes. We therefore asked how large an error would result from also using the much simpler formula for directional, homobifunctional cross-linkers (formula 1) in place of formula 3.
Table 1

Formula symbols.

Symbol | Meaning
Ta | Target entries in the database linkable only by side A of the cross-linker
Tb | Target entries in the database linkable only by side B of the cross-linker
Tab | Target entries in the database linkable by both sides of the cross-linker
Da | Decoy entries in the database linkable only by side A of the cross-linker
Db | Decoy entries in the database linkable only by side B of the cross-linker
Dab | Decoy entries in the database linkable by both sides of the cross-linker
TT | Observed target-target matches
TD | Observed target-decoy and decoy-target matches
DD | Observed decoy-decoy matches
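The pair spaces sketched in Fig 1 can be counted from the Table 1 parameters. The sketch below is our own illustration and rests on three assumptions:

- Ta and Tb count entries linkable exclusively by one side, as the Table 2 values for HSA imply.
- An entry may pair with itself.
- Decoys mirror targets.

`LinkableCounts` and the counting functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkableCounts:
    """Entries grouped by which side of the cross-linker can react with them.

    a: only side A, b: only side B, ab: both sides (cf. Ta, Tb, Tab).
    """
    a: int
    b: int
    ab: int

    @property
    def side_a(self) -> int:
        return self.a + self.ab

    @property
    def side_b(self) -> int:
        return self.b + self.ab


def directional_pairs(c: LinkableCounts) -> int:
    """Ordered (side A entry, side B entry) pairs, self-pairs included."""
    return c.side_a * c.side_b


def nondirectional_pairs(c: LinkableCounts) -> int:
    """Unordered pairs: two distinct 'ab' entries can be linked either way
    round, so each such pair was counted twice among the ordered pairs."""
    return directional_pairs(c) - c.ab * (c.ab - 1) // 2


def target_decoy_pairs(t: LinkableCounts, d: LinkableCounts) -> int:
    """Unordered pairs made of one target and one decoy entry."""
    both_orders = t.side_a * d.side_b + d.side_a * t.side_b
    # A target and a decoy that are both in their 'ab' groups fit both orders.
    return both_orders - t.ab * d.ab


if __name__ == "__main__":
    homo = LinkableCounts(a=0, b=0, ab=4)      # homobifunctional, 4 entries
    disjoint = LinkableCounts(a=3, b=5, ab=0)  # no overlap between the sides
    for counts in (homo, disjoint):
        print(counts, directional_pairs(counts), nondirectional_pairs(counts),
              target_decoy_pairs(counts, counts))
```

Take four entries that both sides can link. The sketch gives 16 directional pairs but only 10 non-directional pairs, i.e. n(n+1)/2. With no overlap the two counts coincide (15), which is why a heterobifunctional cross-linker without overlap behaves like a directional one.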
This error appears only once matches with two decoy peptides are encountered. Before then, formulas 1 and 3 yield the same FDR value. Up to this point the problem is linear (Fig 1A). The decoys can only model hits with one wrongly identified partner, and matches with two wrongly identified partners are overlooked. Statistically, such matches will be rare. However, they are not modeled until a significant number of decoy-decoy matches has been encountered.
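One way to write a formula-3-style estimate is the following hedged sketch. It assumes that the decoy database mirrors the target database (Da = Ta, Db = Tb, Dab = Tab). DD can then be rescaled to the target-decoy and target-target pair spaces exactly as for homobifunctional cross-linkers. This is our own derivation under those assumptions, not the published formula, which is derived in S1 File. `pair_spaces` and `fdr_hetero_sketch` are hypothetical names. The sketch reproduces the behaviour described above: with DD = 0, both estimates coincide.

```python
from typing import Tuple


def pair_spaces(ta: int, tb: int, tab: int) -> Tuple[int, int]:
    """(target-target, target-decoy) pair-space sizes, non-directional.

    ta/tb: entries linkable only by side A/B; tab: linkable by both.
    Assumes decoys mirror targets, so the decoy-decoy space equals the
    target-target space. Self-pairs are counted.
    """
    side_a, side_b = ta + tab, tb + tab
    n_tt = side_a * side_b - tab * (tab - 1) // 2
    n_td = 2 * side_a * side_b - tab * tab
    return n_tt, n_td


def fdr_formula1(tt: int, td: int, dd: int) -> float:
    """Formula 1: FDR = (TD - DD) / TT."""
    return (td - dd) / tt


def fdr_hetero_sketch(tt: int, td: int, dd: int, ta: int, tb: int, tab: int) -> float:
    """Illustrative stand-in for formula 3 under the random-pair model."""
    n_tt, n_td = pair_spaces(ta, tb, tab)
    n_dd = n_tt
    # Remove the double-false part of TD, then add back the double-false
    # hits expected among the TT matches.
    return (td - dd * n_td / n_dd + dd * n_tt / n_dd) / tt


if __name__ == "__main__":
    # Before any decoy-decoy match (DD = 0) both estimates are identical.
    print(fdr_formula1(1000, 60, 0), fdr_hetero_sketch(1000, 60, 0, 0, 455, 130))
    # Without overlap (Tab = 0, EDC at residue level) they agree for any DD.
    print(fdr_formula1(1000, 60, 10), fdr_hetero_sketch(1000, 60, 10, 99, 130, 0))
    # With overlap (SDA at residue level) formula 1 is very slightly lower.
    print(fdr_formula1(1000, 60, 10), fdr_hetero_sketch(1000, 60, 10, 0, 455, 130))
```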
Fig 1

Random search spaces for false positive matches.

Modelling matches that combine one correct and one incorrect partner requires only a linear random match space (A). In contrast, modelling matches with two incorrect partners requires a quadratic random match space. Its construction depends on whether the cross-linker is homobifunctional and non-directional (B), homobifunctional and directional (C), heterobifunctional and non-directional (D), or heterobifunctional and directional (E).

The situation changes once matches with two decoys are encountered. From then on, we also model how likely hits with two wrongly matched partners are. The random space for a non-directional, heterobifunctional cross-linker lies somewhere between the directional and non-directional spaces of a homobifunctional cross-linker (Fig 1B). In fact, the smaller the overlap between the two sides of the cross-linker (and therefore the smaller Tab and Dab), the more it behaves like a directional, homobifunctional cross-linker, and the simplification of formula 1 applies. The error made when using formula 1 for heterobifunctional cross-linkers is smaller than the error made when using formula 1 for non-directional homobifunctional cross-linkers (Fig 2). The database entries may be peptides, linkable residues or proteins, depending on the level at which the FDR is to be estimated[5]. Already at 200 entries, the error incurred by using formula 1 instead of formula 3 should not exceed 1%. This holds even for a 100% overlap between both sides of the cross-linker, which effectively results in a non-directional, homobifunctional cross-linker. Consider, for example, human serum albumin (HSA, UniProt: P02768) cross-linked with SDA. HSA has 585 residues in its active form, of which 129 are lysine, serine, threonine or tyrosine; the protein amino terminus can also react. Here the maximal error from using formula 1 should be below 0.2% of the estimated FDR, i.e. an FDR of 5% would correspond to less than 5.01% (Table 2). This error is usually smaller than the actual resolution of FDR estimation[5].
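The extreme case behind the maximal error, in which every possible pair is observed once, can be evaluated directly. In the sketch below, a hypothetical `extreme_case` sets TT, TD and DD to the sizes of their pair spaces and compares formula 1 with a model-based correction. It assumes mirrored decoys and unordered pairs with self-pairs allowed. Under these assumptions the corrected estimate is exactly 1. For SDA on HSA residues (Ta = 0, Tb = 455, Tab = 130, the Table 2 values), the relative error of formula 1 comes out at about 0.19%. For 200 fully overlapping entries it is 2/201, just under 1%. Both agree with the figures quoted above.

```python
from typing import Tuple


def extreme_case(ta: int, tb: int, tab: int) -> Tuple[float, float]:
    """Return (formula 1, model-based correction) when every possible
    pair is observed once: TT = N_TT, TD = N_TD, DD = N_DD."""
    side_a, side_b = ta + tab, tb + tab
    n_tt = side_a * side_b - tab * (tab - 1) // 2  # equals N_DD (mirrored decoys)
    n_td = 2 * side_a * side_b - tab * tab
    tt, td, dd = n_tt, n_td, n_tt
    fdr1 = (td - dd) / tt
    # The DD * N_TT / N_DD term reduces to DD because N_DD = N_TT.
    fdr3 = (td - dd * n_td / n_tt + dd) / tt  # evaluates to 1.0 here
    return fdr1, fdr3


def relative_error(ta: int, tb: int, tab: int) -> float:
    """Relative underestimate of formula 1 in the extreme case."""
    fdr1, fdr3 = extreme_case(ta, tb, tab)
    return 1 - fdr1 / fdr3


if __name__ == "__main__":
    print(f"SDA, HSA residues: {relative_error(0, 455, 130):.2%}")
    print(f"200 entries, full overlap: {relative_error(0, 0, 200):.3%}")
    err = relative_error(0, 455, 130)
    # Scale a 5% estimate from formula 1 by the maximal relative error.
    print(f"5% from formula 1 corresponds to {5 / (1 - err):.4f}%")
```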
As a second example, consider EDC, where the two sides of the cross-linker do not overlap at all. Lysine, serine, threonine, tyrosine and the protein amino terminus react on one side; glutamic acid, aspartic acid and the protein carboxy terminus react on the other. At the residue level, an FDR calculation using formula 1 would therefore give the same estimate as formula 3. At the peptide level, the situation looks slightly different. Consider a tryptic digest of HSA cross-linked with EDC, allowing up to four missed cleavages. It yields 23 peptides linkable exclusively by one side (Ta), 31 peptides linkable exclusively by the other side (Tb) and 329 peptides (Tab) that could be linked by either side of the cross-linker. This leads to a maximal error of around 0.45% (i.e. an FDR of 5% would become 5.023%).
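The maximal-error column of Table 2 can be checked with a short, hedged calculation. It uses a simple random-pair model of our own: decoys mirror targets, unordered pairs with self-pairs allowed, and every possible pair observed once. In this model, N_TT = N_DD = (Ta + Tab)(Tb + Tab) - Tab(Tab - 1)/2 and N_TD = 2(Ta + Tab)(Tb + Tab) - Tab². Formula 1 then evaluates to (N_TD - N_DD)/N_TT, while the full correction yields 1. The difference simplifies algebraically to Tab/N_TT. This is why it vanishes for EDC at the residue level (Tab = 0) and shrinks as the database grows. `Example`, `max_relative_error` and `TABLE_2` are hypothetical names.

```python
from typing import NamedTuple


class Example(NamedTuple):
    cross_linker: str
    level: str
    ta: int   # entries linkable only by side A
    tb: int   # entries linkable only by side B
    tab: int  # entries linkable by both sides


def max_relative_error(ta: int, tb: int, tab: int) -> float:
    """Extreme-case relative error of formula 1 under the model: Tab / N_TT."""
    n_tt = (ta + tab) * (tb + tab) - tab * (tab - 1) // 2
    return tab / n_tt


TABLE_2 = [
    Example("SDA", "residue pairs", 0, 455, 130),
    Example("SDA", "peptide pairs", 0, 27, 360),
    Example("EDC", "residue pairs", 99, 130, 0),
    Example("EDC", "peptide pairs", 23, 31, 329),
]

if __name__ == "__main__":
    for ex in TABLE_2:
        err = max_relative_error(ex.ta, ex.tb, ex.tab)
        # An FDR of 5% from formula 1 would correspond to 5% / (1 - err).
        print(f"{ex.cross_linker} {ex.level}: {err:.2%}, 5% -> {5 / (1 - err):.3f}%")
```

Running it reproduces the maximal errors listed in Table 2: 0.19%, 0.48%, 0.00% and 0.45%. It also gives about 5.023% for EDC at the peptide level.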
Fig 2

Maximal error from using formula 1.

Maximal expected error when using formula 1, shown for the extreme case in which every possible combination of links is observed. The x-axis is the size of the database and the y-axis the maximal error. The green and blue lines mark the limiting cases of 0% and 100% overlap between the two sides of the cross-linker, respectively. The gray area represents the possible errors for all cross-linkers with partial overlap. Residue-level values for HSA cross-linked with SDA (dark red dot) and with EDC (light red dot) are given for reference.

Table 2

Examples of maximal expected error when using the simple formula for HSA, cross-linked with either EDC or SDA.

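Cross-linker | Level | Ta | Tb | Tab | Maximal error | Formula 1 | Formula 3
SDA | residue pairs | 0 | 455 | 130 | 0.19% | 5.00% | 5.01%
SDA | peptide pairs | 0 | 27 | 360 | 0.48% | 5.00% | 5.02%
EDC | residue pairs | 99 | 130 | 0 | 0.00% | 5.00% | 5.00%
EDC | peptide pairs | 23 | 31 | 329 | 0.45% | 5.00% | 5.02%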

In conclusion, from a theoretical point of view, formula 3 should be used for FDR estimation with heterobifunctional cross-linkers. However, for all practical purposes, the simpler formula 1 gives an approximation whose error is smaller than the resolution of FDR estimation.

S1 File. Derivation of formula (DOCX).
References

1. Thomas Walzthoeni, Manfred Claassen, Alexander Leitner, Franz Herzog, Stefan Bohn, Friedrich Förster, Martin Beck, Ruedi Aebersold. False discovery rate estimation for cross-linked peptides identified by mass spectrometry. Nat Methods. 2012-07-08.

2. Alexander Leitner, Marco Faini, Florian Stengel, Ruedi Aebersold. Crosslinking and Mass Spectrometry: An Integrated Technology to Understand the Structure and Function of Molecular Machines. Trends Biochem Sci. 2015-12-01.

3. Junmin Peng, Joshua E Elias, Carson C Thoreen, Larry J Licklider, Steven P Gygi. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res. 2003 Jan-Feb.

4. Rouslan G Efremov, Alexander Leitner, Ruedi Aebersold, Stefan Raunser. Architecture and conformational switch mechanism of the ryanodine receptor. Nature. 2014-12-01.

5. Andrew N Holding. XL-MS: Protein cross-linking coupled with mass spectrometry. Methods. 2015-06-12.

6. Sarah Sanowar, Pragya Singh, Richard A Pfuetzner, Ingemar André, Hongjin Zheng, Thomas Spreter, Natalie C J Strynadka, Tamir Gonen, David Baker, David R Goodlett, Samuel I Miller. Interactions of the transmembrane polymeric rings of the Salmonella enterica serovar Typhimurium type III secretion system. MBio. 2010-08-03.

7. Pragya Singh, Eri Nakatani, David R Goodlett, Carlos Enrique Catalano. A pseudo-atomic model for the capsid shell of bacteriophage lambda using chemical cross-linking/mass spectrometry and molecular modeling. J Mol Biol. 2013-06-25.

8. Nicholas I Brodie, Karl A T Makepeace, Evgeniy V Petrotchenko, Christoph H Borchers. Isotopically-coded short-range hetero-bifunctional photo-reactive crosslinkers for studying protein structure. J Proteomics. 2014-09-02.

9. Juri Rappsilber. The beginning of a beautiful friendship: cross-linking/mass spectrometry and modelling of proteins and multi-protein complexes. J Struct Biol. 2010-10-26.

10. Sven H Giese, Adam Belsom, Juri Rappsilber. Optimized Fragmentation Regime for Diazirine Photo-Cross-Linked Peptides. Anal Chem. 2016-08-04.