Literature DB >> 24400813

Autopilot: an online data acquisition control system for the enhanced high-throughput characterization of intact proteins.

Kenneth R Durbin¹, Ryan T Fellers, Ioanna Ntai, Neil L Kelleher, Philip D Compton.

Abstract

The ability to study organisms by direct analysis of their proteomes without digestion via mass spectrometry has benefited greatly from recent advances in separation techniques, instrumentation, and bioinformatics. However, improvements to data acquisition logic have lagged in comparison. Past workflows for Top Down Proteomics (TDPs) have focused on high throughput at the expense of maximal protein coverage and characterization. This mode of data acquisition has led to enormous overlap in the identification of highly abundant proteins in subsequent LC-MS injections. Furthermore, a wealth of data is left underutilized by analyzing each newly targeted species as unique, rather than as part of a collection of fragmentation events on a distinct proteoform. Here, we present a major advance in software for acquisition of TDP data that incorporates a fully automated workflow able to detect intact masses, guide fragmentation to achieve maximal identification and characterization of intact protein species, and perform database search online to yield real-time protein identifications. On Pseudomonas aeruginosa, the software combines fragmentation events of the same precursor with previously obtained fragments to achieve improved characterization of the target form by an average of 42 orders of magnitude in confidence. When HCD fragmentation optimization was applied to intact proteins ions, there was an 18.5 order of magnitude gain in confidence. These improved metrics set the stage for increased proteome coverage and characterization of higher order organisms in the future for sharply improved control over MS instruments in a project- and lab-wide context.

Entities: Chemical Disease Species

Mesh：

Substances：
Bacterial Proteins

Year: 2014 PMID： 24400813 PMCID： PMC3958137 DOI： 10.1021/ac402904h

Source DB: PubMed Journal: Anal Chem ISSN： 0003-2700 Impact factor: 6.986

The high throughput analysis of intact proteins by mass spectrometry has become increasingly relevant due to the recent and rapid acceleration in the development of enabling technologies. Front-end separations[1] have undergone a transformation as methods for solution-based intact protein separation have provided a viable alternative to the bottom up approach of in-gel digestion.[2] These new separation techniques are amenable to lower initial sample amounts and can be coupled to nano-LC prior to mass spectrometric analysis. Additionally, order of magnitude increases in sensitivity and speed enable modern instruments to more adequately handle the rigors of complex mixtures replete with high mass intact proteins.[3] These improvements in separations and instrumentation have transformed top down mass spectrometric analyses from single protein direct infusion experiments to high-throughput proteome-wide analyses up to 100 kDa.[4] Recent analyses have shown confident identification of thousands of proteoforms and nearly 2000 unique accession numbers.[4,5] The focused efforts in separations and instrumentation, along with new and existing bioinformatics tools (e.g., ProSight PC 3.0[6]), have largely bypassed the realm of high-throughput mass spectrometric data acquisition. However, intelligent acquisition of mass spectrometric data holds great promise for increased experimental efficiency. Real-time adjustments to fragmentation parameters driven by decision logic have been successfully incorporated into bottom up proteomics workflows.[7,8] The incorporation of “intelligence” yielded sizable increases in the number of peptides identified in subsequent experiments. For intact protein analysis, intelligent acquisition strategies have been used to increase the number and quality of protein identifications resulting from an LC-MS analysis but relied on additional offline analysis.[9] Extending these efforts to an online data acquisition and analysis platform for TDP offers an exciting way to increase the value of these proteomics data sets. Further development of data driven acquisition will not only increase the efficiency of data collection but also result in more effective use of the data that is generated during an analysis. High-throughput proteomics can generate immense quantities of data; however, each new fragmentation event is typically treated as a single, isolated event instead of part of a data set for a unique mass species. This leads to redundancy in data collection, which hinders deep exploration of the proteome. Additionally, project wide dynamic exclusion instead of an LC-MS run based dynamic exclusion would prevent much unneeded analysis of proteins that are already well characterized. Because of the mismatch of dynamic range between protein expression and mass spectrometers,[10] the very most abundant proteins (e.g., histone and heat shock proteins) can be fragmented thousands of times over the course of a proteomic investigation.[2] Unfortunately, these proteins are frequently fully characterized upon the first fragmentation so additional fragmentation of the protein produces no additional information. Here, we demonstrate an upgraded TDP workflow driven by intelligent data collection through a software termed Autopilot. Online mass detection, supplemented with prior knowledge of past fragmentation events, guides precursor selection and fragmentation. Proteins not sufficiently characterized are subjected to repeat fragmentation with optimized parameters based on a fragmentation logic scheme. Spectral data are searched immediately and the information is added to a proteomic repository for reference in current and future runs. Each distinct mass builds a fragmentation history with new fragmentation data joined with previous data to produce the best possible sequence coverage of proteins. Through accretion of data from different fragmentation types, energies, and charge states, more complete protein characterization can be achieved. Furthermore, proteins which have exceeded a threshold of characterization are excluded from all forthcoming analysis to permit characterization of the maximal number of proteins in the given proteome analysis.

Methods

Cell Culture

Pseudomonas aeruginosa PAO1 (ATCC 15692) was plated on Mueller Hinton II agar plates and incubated at 37 °C. A single colony was inoculated in 50 mL of Mueller Hinton broth (MHB) and incubated overnight at 37 °C. This starter culture was used to inoculate a larger culture at an inoculum to culture ratio of 1:500. Cells were harvested at midlogarithmic growth phase (OD600 ≈ 0.4) by centrifugation at 5000 × g.

Sample Preparation

PAO1 cells were lysed in 4% SDS and 25 mM Tris, pH 7.4 with Halt protease inhibitors (Thermo Pierce, Rockford, IL) and 1 mM DTT. The cell lysate was acetone precipitated with three volumes of cold acetone and centrifuged at max speed for 10 min. The precipitated pellets were dried and 400–500 μg of protein was fractionated on a GELFREE 8100 Fractionation system (Expedeon, San Diego, CA) with either 10% or 12% cartridges. The collected fractions were precipitated with MeOH/CH3Cl/H2O as previously described to remove SDS.[11] Prior to MS analysis, the samples were resuspended in 30 μL of buffer A (95% H2O, 5% acetonitrile, 0.2% formic acid).

LC-MS/MS

Briefly, each GELFrEE fraction was injected onto a 2 cm, 150 μm i.d. PLRP-S trap column and a longer 10 cm, 75 μm i.d. column was used for the online separation as described previously.[12] Both the trap and analytical columns were packed in-house with 5 μm diameter, 1000 Å pore size PLRP-S. An Ultimate 3000 RPLCnano (Thermo Scientific Dionex) with 300 nL/min flow rate setup was used for online chromatography. An LTQ Velos Orbitrap Elite (Thermo Scientific) was the mass spectrometer used for all data acquisition. The resolution settings were 60 000 and 30 000 for precursor and fragmentation scans, respectively, at 400 m/z. An isolation window of 15 m/z was used for mass selection. For the Xcalibur data acquisition, the method implemented precursor and fragmentation scans with four μscans and dynamic exclusion with a repeat count of 1, a repeat duration of 240 s, and an exclusion duration of 5000 s. The maximum inject time for both precursor and fragmentation scans was 1000 and 400 ms for SIM scans. Autopilot utilized 4 μscans for precursor scans and SIM scans, 6 μscans for HCD fragmentation, and 12 μscans for ETD fragmentation. For Autopilot operation, the Thermo LTQ component object model (COM) is used in conjunction with an advanced user’s license for the Orbitrap Elite. Several ion trap control language (ITCL) changes were applied to enable ETD and HCD fragmentation by Autopilot. The source code for the Autopilot logic interface is included in the Supporting Information. Instructions on the implementation of the fragmentation logic, an analysis to infer mass, online search are included. Raw mass spectrometry data has been deposited in the IU Scholar Works Repository. The data is available at http://hdl.handle.net/2022/17234.

Online Data Analysis

Autopilot applies Xtract with a 3 signal/noise cutoff to full precursor scans, SIM scans, and fragmentation scans. For fragmentation data, the system performs online absolute mass searches against the PAO1 database with the precursor mass from Xtract. The search window is a 2000 Da tolerance around the precursor mass while the fragment tolerance is 10 ppm. The search is accomplished with the ProSight search engine and requires a minimum number of 4 matched fragments, with only the top hit (the most confident P-Score) returned from the search.

Offline Data Processing

The LC-MS/MS data files acquired by Xcalibur files were processed with an in-house software, cRAWler. The cRAWler produces ProSight upload format (.puf) files, which contain Xtract detected precursor masses linked with fragment ions from the corresponding data-dependent scan. The completed .puf files are then searched by ProSight PC 3.0 (Thermo Scientific, San Jose, CA) against the PAO1 database. A multitier search was implemented in ProSight PC with an initial small window absolute mass precursor search of 2.2 Da, followed by a large window absolute mass search of 100 000 Da if no hits were found in the first search. For data-dependent Xcalibur fragmentation scans with no associated precursor masses, a 100 000 Da window centered on 55 555.55 Da is searched.

Results and Discussion

Characterization

Increased protein characterization in an automated fashion was the primary goal of this work. However, TDPs currently lacks a “characterization” score. Therefore, the P-score, which is a measure of the confidence of protein identification, was used as an indirect measure of characterization. As described previously, this score generally improves with an increasing number of matched fragment ions, which indicates better characterization. The score also decreases with an increasing number of unmatched fragment ions, breaking the direct link between characterization and this score. While it is recognized that this does not directly equate to characterization, the development of a characterization score is beyond the scope of this work.

Workflow

For mass spectrometric workflows to operate at optimal efficiency, the acquisition platform must utilize real-time and stored information during an analysis to enable the instrument to react to current acquisition needs in a productive manner. To properly guide acquisition, an accurate survey of the current protein elution landscape is of primary importance (Figure 1, top left). From a full MS spectrum, the data points can be directly piped from the instrument into an isotope determination routine, Xtract, which outputs a list of the average and monoisotopic masses present in the spectrum (Figure 1, top right and Figure S1A of the Supporting Information).[13] Along with mass information, Xtract includes the charge state distribution of the species, which enables charge states of a particular species to be grouped as a single “target”. The strategy of isotope analysis for mass detection is suitable for the mass range of proteins the Orbitrap Elite is able to isotopically resolve on an LC-MS time scale. For proteins no longer isotopically resolved (i.e., >30–35 kDa), a short transient FT scan or low resolution ion-trap scan can be combined with charge state deconvolution for intact mass determination.[14] However, this study focuses on lower molecular weight proteins that can be isotopically resolved by the Orbitrap mass spectrometer.

Figure 1

Data acquisition workflow on an example protein. A full precursor scan is taken, followed by HCD fragmentation of the 9+ charge state on the detected mass 7246.26 Da. After an online search, the software determines more analysis should be performed as the P-Score (1.8 × 10–47) is not below the cutoff. An ETD scan of the highest charge state is taken and searched. The fragment ions are combined and the final P-Score of 5.0 × 10–102 is below the cutoff. All charge states of the 7246.26 Da species are permanently excluded from further fragmentation and the system goes in search of the next target mass. The protein species detected by Xtract represent the potential targets for focused fragmentation events. The mass list is parsed to group similar masses and account for simple hardware and software artifacts such as oxidation and off-by-one errors that result from incorrect isotope matches. The mass list is further culled to remove previously fragmented species identified at a high confidence level. Here, a very high degree of confidence cutoff (P-score <1 × 10–50) was selected to showcase the characterization abilities of the software. The remaining targets, sorted by decreasing abundance, are placed in a queue. This first-in, first-out queue supplies a new fragmentation scan definition to the instrument as soon as the data from the previous scan is returned. Upon the initial fragmentation of a species, the lowest charge state is selected for HCD fragmentation with a normalized collision energy (NCE) of 25 (Figure 1, bottom right). The lowest charge is fragmented first as it typically will yield the largest number of unique fragments.[15] The underlying instrument control language translates the NCE into a HCD energy in eV to be applied based on a combination of the NCE value and the m/z of the fragmentation target. We apply Xtract to the fragmentation spectrum for fragment ion detection. After Xtract analysis is complete, a wide-window (2000 Da) absolute mass search of the fragment set is immediately started. The search is against the 5763 gene, 23 176 form pseudomonas PAO1 database with a 10 ppm fragment tolerance. The top hit is added to the fragmentation history of the mass species. When a precursor mass of a previously fragmented species is redetected by Xtract, whether in the same run or a subsequent injection, the history of the species is queried from the proteomic data set. The best previous fragmentation event for the species is compared to an identification confidence cutoff. If the confidence of the identification exceeds the cutoff, the target is placed on an exclusion list. If it is below the set score cutoff, a fragmentation decision tree (Figure S1B of the Supporting Information) charts the course of action to optimize the energy, charge state, and fragmentation technique chosen for the reanalysis of the protein. The resultant spectrum from refragmentation is then processed in a similar fashion to the first fragmentation and the database record for the target is updated accordingly. If no new targets are found in the precursor scan, the platform queues a scan event for the acquisition of a 15 m/z SIM scan. The region selected for the SIM scan is from a part of the spectrum not occupied by an Xtract detected mass. The SIM scan is for discovery purposes, as SIM scans can provide the increased ion statistics required for Xtract to discover new targets. With the noise reduction of the LTQ software, there are frequently areas of the spectrum with no signal and a dedicated SIM scan of these areas can often reveal protein species in this space. In Figure S2 of the Supporting Information, a SIM scan of the highlighted region was taken. While that region had very little signal in the full precursor scan, Xtract detection on the SIM scan produced nine precursor masses, including two methylations. This discovery mode effectively increases the dynamic range of the experiment by preventing the monopolization of the ion current in a given scan by highly abundant protein species.

Acquisition Mode Comparison Study

A comparison of the data acquisition control system described above, named Autopilot, to the standard acquisition software of Thermo mass spectrometers, Xcalibur, was conducted on the low mass GELFrEE fractions from a 10% T GELFrEE fractionation of PAO1. Each fraction was analyzed in technical triplicate with both systems. At a 1% FDR cutoff, Autopilot enabled the confident identification of 234 unique accession numbers while the standard data dependent software identified 230 unique accession numbers, with 180 accession numbers shared between the two data sets (Figure 2A and Table S1 of the Supporting Information).

Figure 2

Comparison between Autopilot and the standard data acquisition software for Thermo mass spectrometers, Xcalibur. (A) All Autopilot identifications were generated online during data acquisition with a 2000 Da absolute mass search mode. Meanwhile, the Xcalibur identifications were generated with offline searches with a search tree capable of up to 100 000 Da absolute mass searches. At a 1% FDR cutoff, Autopilot was able to identify 234 unique accession numbers compared to 230 by Xcalibur. (B) The total analysis time for the online data acquisition and offline data analysis with the Xcalibur acquisition mode was 48 h, with 18 h of instrument time. The time for each run is the LC-MS acquisition time of 90 min plus 30 min for spectral summing and ion detection and another 1–2 h for database searches and reports. Because Autopilot processes the data concurrently with acquisition, the time from first injection to end of analysis is 18 h, which is a 270% increase in overall experimental throughput. (C) The decoy and forward gamma distributions from the calculated instantaneous FDRs for every identification event from both data acquisition methods are shown. Importantly, the identifications from Autopilot were found with searches performed during data acquisition by an absolute mass search with a 2000 Da precursor tolerance. This search strategy was chosen to sufficiently limit the search space to enable searches to be performed on the same desktop computer that controls acquisition and finish within the cycle time of the instrument. The identification mode for Xcalibur utilized an in-house software for precursor and fragment detection followed by a ProSight PC 3.0 search. The ProSight search incorporated a search tree with a first-pass 2.2 Da precursor tolerance absolute mass search and a larger window absolute mass search with a 100 000 Da precursor tolerance if needed. Additionally, for fragmentation scans without an associated precursor, the large window 100 000 Da precursor search was performed. Figure 2A and B demonstrate the advantages of an online search. Even with the much larger search space used in the offline searches, Autopilot produced more protein identifications with the simple online search. Further, Figure 2B compares the total amount of processing time required to obtain a meaningful list of identifications. With Autopilot searches performed simultaneously with acquisition instead of the typical online acquisition, offline analysis approach, the number of identifications/time increases from 4.8 identifications/hour with the conventional workflow to 13 identifications/hour with Autopilot, a 270% increase in throughput. To validate our search results, the instantaneous FDR, a Bayesian posterior p value to measure the FDR for each identification event, was calculated for each Poisson-based P-Score.[4] The instantaneous FDR plots in Figure 2C indicate our directed fragmentation strategy significantly improves the characterization of the identified proteins as the calculated FDRs for each data set are substantially different. The 1% FDR cutoff for Autopilot was 1.1 × 10–5 while the cutoff for Xcalibur was 2.6 × 10–10. This difference can be attributed to the shift of the decoy distribution toward higher P-Scores with a subsequent shift of the forward distribution toward lower, more confident P-Scores. In other words, there are more matched fragments with a corresponding lower number of unmatched fragments, which reinforces the use of P-score as a reasonable substitute in the absence of a true characterization score.

Fragmentation and Reanalysis of Protein Forms

The increase in characterization seen in the comparison experiment is a result of directed reanalysis of protein targets. Because not every protein behaves in a predictable manner, each fragmented species must be examined to determine if additional fragmentation is needed. After the online protein identification discussed above, the top P-Score proteoform from the search is considered the correct hit for the mass. If the top hit is above the confidence cutoff, the mass is excluded from all future analysis; meaning, when detected again, no fragmentation event will be triggered on any charge state of the species. Alternatively, if the fragmentation led to mediocre results that missed the cutoff for a characterized identification, another fragmentation event can be queued. The subsequent fragmentation of a mass target follows a decision tree composed of ETD fragmentation with a variety of different reaction times and HCD fragmentation with energy optimization (Figure S1 of the Supporting Information). The fast speed of HCD coupled with generally robust fragmentation coverage makes HCD an ideal starting point for characterization. The first fragmentation event of any detected mass is an HCD scan on the lowest charge state with an NCE of 25. If more fragmentation is required, a different fragmentation technique is used for the first reanalysis. As ETD and HCD have been shown to be complementary fragmentation techniques,[16] the combination of the two can increase overall sequence coverage. An ETD scan of the highest charge state is therefore applied for the reanalysis. The scan has twice as many μscans as the HCD scans to maximize ion statistics of the many, low abundant fragment ions produced from ETD. The highest charge state is selected because the ETD reaction becomes faster and more efficient as the charge increases on the protein. McLuckey and co-workers have demonstrated that the rate of an ion–ion reaction is dominated by Coulombic interaction.[17] Therefore, the reaction time is calculated by a simple expression: tprecursor = zprecursor2 × (topt/zopt2), where opt indicates a measured optimal reaction time for a species with a known charge state. In this case, 60 ms for a +3 ion was utilized. If the first ETD attempt does not yield the desired coverage, the software will try various reaction times in an effort to better maximize the ETD reaction and fragmentation. When the ETD optimization is complete, the program will shift efforts to HCD fragment energy optimization if the confidence cutoff has still not been reached.

HCD Optimization

A past attempt was made to generalize the HCD fragmentation energy selection for proteins with a method to estimate optimal energy as a function of the protein’s mass and m/z.[18] The equation to govern the choice of energy outlined in that article was developed on a small data set from standard proteins. We applied this equation, along with several other energies (NCE of 20, 25, and 30), across the set of detected proteins from the PAO1 proteome during high-throughput Top Down proteome analysis. We then visualized the P-Scores obtained from the varied HCD energies across the data set with a random set of proteins from the run (Figure S3 of the Supporting Information). The general shape of the energy distributions in relation to the corresponding P-Scores is a defined maximum with steep falloffs on each side. We adapted this new knowledge into our HCD optimization. To accomplish our optimization, different fragmentation energies are surveyed until a local maximum is found. A quadratic is then fit to the maximum and the closest points on each side. The software will continue to fit the points until the best fit is found. In Figure 3A, the software takes HCD scans of 25, 30, and 34 NCE. The scan of 30 NCE produces a P-Score better than the 25 NCE scan. A higher energy scan (34 NCE) is taken in an effort to produce even better coverage. When the 34 NCE scan returns a P-Score worse than the 30 NCE scan, the quadratic is fit to the three points. This fit is repeated two more times until the optimal value is discovered (orange dot, Figure 3A).

Figure 3

(A) Example of the HCD optimization process is shown for the 10+ charge state at 1032 m/z of the 50S ribosomal protein L7 (RL7_PSEAE). Each colored line is a different quadratic fit to find the energy that will produce the best P-Score. The maximum for each fit is the energy applied in the new fragmentation scan. After three quadratic optimizations, the best energy for this protein was determined. Table S2 of Supporting Information lists the energy, matched fragments, and P-Score for each fragmentation event. (B) The gamma distributions for the decoy and forward search spaces. The distribution for those species that were optimized more than three times undergoes a sizable shift to the right, which results in a greater percentage of the masses moving past the 1% cutoff. (C) The number of matched fragments from the most optimized scan are compared to those from the first pass, an HCD fragmentation scan with 25 NCE. (D) A similar graph to (C) but with −Log10(P-Score) shown instead. Figure 3B–D showcases the result of this optimization throughout the entire data set. From 1342 unique mass species, 408 were targeted for refragmentation at least once and 204 were retargeted more than three times. Figure 3B illustrates the shift in the forward P-Score gamma distributions upon optimization. After full HCD optimization, the forward distribution has shifted very significantly to the right of the previous distributions. If applied to a completely optimized, full proteomic investigation, many species would be shifted from the unacceptable FDR range to a confident value. Additionally, the −log10 of the P-Scores were continually increased on average as the proteins were reanalyzed with different HCD energies (Figure 3D). The mean change to the −log10 value of the P-Score was 4.0, 6.0, and 18.5 after one, two, and three or more reanalyses, respectively. As evidenced by panel C, there is an almost universal improvement in the characterization of proteins with repeated, directed fragmentation. Furthermore, panel D indicates that, generally, the increase in number of matched fragment ions does not come at the expense of more unmatched fragment ions. While the end result of Figure 3A is a modestly improved P-Score, the difference between a protein falling outside of the FDR cutoff can often be only a couple of fragment ions. A fragmentation optimization of proteins on the boundary of the confidence cutoff can easily lead to gains that can bring the protein identification into acceptable confidence levels.

Fragmentation History

By analyzing data and intelligently directing acquisition online, metadata generated at runtime may be used to improve the quality of information extracted from a data set. If fragmentation scans were previously performed (during any analysis in a project) on the mass of interest, the former fragment spectra corresponding to that mass can be combined with the new fragmentation spectrum through spectral averaging of the centroid data. The summation of scans increases signal-to-noise of real ions and can lead to detection of new fragment ions. From there, the averaged scan is Xtracted and searched. An improved version of the ProSight search engine, capable of database retrieval on the different sets of ions generated by the different fragmentation techniques in the same search, completes the search. The combination of CID or HCD with ETD ions from the same protein form bolsters identification confidence and improves overall protein characterization. The confidence score for the combined ion types is unchanged if we make a simplification to the score assignment. The score used is similar to the Poisson distribution calculation outlined in Meng et al.:[19]P = ((xf) × e–/n!) where x = (1/111.1) × 2 × (Ma × 2), f is the number of detected fragments, and n is the number of random fragment ion matches. The first factor of 2 in x is the number of different ion types in the search type. With separate ETD and HCD searches, the final results can be pooled together and scored with this P-Score calculation if Y ions generated during ETD are ignored. That is, the ETD search will have two ion types, C and Z, and the HCD search will have two ion types, B and Y. Typically, only a small number of Y ions are produced by ETD so any detrimental effects of extra unmatched fragment ions will be minor to the overall score. On one PAO1 GELFrEE run, significant improvements were produced in the coverage of the masses that were targeted again (Figure 4). The median improvement for the overall combined history in comparison to the individual target was 13 orders of magnitude. The mean improvement was even larger, with a gain of >42 orders of magnitude in confidence. In terms of matched fragments, the median coverage had 29.0 more matched fragments while the mean had 41.1 more matched fragments. A few species had lower P-Scores or less matched fragments because of a poorly fragmented protein spectrum being averaged with the best previous individual fragmentation spectrum. However, this combined result would be ignored in favor of the best individual spectrum upon selection for further fragmentation, but is included for illustration purposes here. Also, an increase in number of matched fragments is not always accompanied by a commensurate increase in P-Score. This can be attributed to a higher number of unmatched fragments detected in the averaged scan, thereby lowering the P-Score.

Figure 4

Comparison of targeted masses with greater than one fragmentation event. (A) The utilization of a fragmentation history results in greater sequence coverage (i.e., more matching fragments) and a more confident P-Score than the top hit from the best fragmentation event. (B) The fragmentation maps of the 50 S ribosomal protein L29 depicts the fragmentation history of the 7195.89 Da species. The top map shows the results from the initial HCD fragmentation. The map in the middle is the HCD fragmentation together with the fragments from the first ETD pass. The bottom map is the culmination of all the fragmentation data from the species, with the total combination yielding a confidence score of 2.2 × 10–118.

Conclusion

This work demonstrates that the implementation of several important data acquisition features can produce large enhancements in the characterization of proteins observed during TDPs investigation. When past data from the same species is coupled with current fragmentation data, a vast improvement to the overall fragmentation coverage of many proteins in our samples was achieved. Further, performing data analysis in parallel with data acquisition enabled increased characterization of proteins in significantly less overall time. While this work focused on increased characterization of proteins, the benefits of this software in terms of increased unique protein accession numbers will be more visible and directly tractable on newer mass spectrometers that feature tremendous improvements in speed paired with quadrupolar selection of mass species. The speed is desirable for different MS1 modes, such as SIM scans. As shown in Figure S2 of the Supporting Information, the SIM scan discovery mode can uncover many species previously undetectable because of dynamic range issues. With the quadrupole on these instruments, such as the Thermo Q Exactive, precise selection of these new SIM mass targets should enable large increases in the number of identified proteins. The tight isolation windows will prevent the current problem on hybrid instruments of nearby, abundant proteins being isolated and fragmented with the target species of interest. The abundant fragment ions drown the signal of the target species, making identification of the lower abundance species difficult. With quadrupole isolation, this limitation is all but eliminated. In addition to cross-platform capabilities, the data acquisition structure of Autopilot is extensible by nature and can eventually be operated in more complex ways. For instance, after initial determination of the protein form with a first pass fragmentation mode, the system could precisely fragment the protein based on primary sequence composition. Also, our flexible database allows for offline searches to be imported into the database which will enable time and computationally intensive searches, such as biomarker searches, to be done offline on a cluster or supercomputer. The results of those searches can then inform future data acquisition decisions in real-time. Lastly, the database is compatible with our previously collected data, which will allow past data sets to be seamlessly incorporated into future proteomic studies. As TDPs continues to advance, a necessary part of its evolution is more efficient and intelligent data acquisition. The characterization improvements presented here, coupled with advanced instrumentation, will serve as the foundation for a more complete elucidation of the complex proteoforms that comprise higher organisms.

18 in total

1. Informatics and multiplexing of intact protein identification in bacteria and the archaea.

Authors: F Meng; B J Cargile; L M Miller; A J Forbes; J R Johnson; N L Kelleher
Journal: Nat Biotechnol Date: 2001-10 Impact factor: 54.908

Review 2. The challenge of the proteome dynamic range and its implications for in-depth proteomics.

Authors: Roman A Zubarev
Journal: Proteomics Date: 2013-03 Impact factor: 3.984

3. Evaluation of the compact high-field orbitrap for top-down proteomics of human cells.

Authors: Dorothy R Ahlf; Philip D Compton; John C Tran; Bryan P Early; Paul M Thomas; Neil L Kelleher
Journal: J Proteome Res Date: 2012-07-13 Impact factor: 4.466

4. Intact mass detection, interpretation, and visualization to automate Top-Down proteomics on a large scale.

Authors: Kenneth R Durbin; John C Tran; Leonid Zamdborg; Steve M M Sweet; Adam D Catherman; Ji Eun Lee; Mingxi Li; John F Kellie; Neil L Kelleher
Journal: Proteomics Date: 2010-10 Impact factor: 3.984

5. Ion/ion proton-transfer kinetics: implications for analysis of ions derived from electrospray of protein mixtures.

Authors: S A McLuckey; J L Stephenson; K G Asano
Journal: Anal Chem Date: 1998-03-15 Impact factor: 6.986

6. Multiplexed size separation of intact proteins in solution phase for mass spectrometry.

Authors: John C Tran; Alan A Doucette
Journal: Anal Chem Date: 2009-08-01 Impact factor: 6.986

7. Versatile online-offline engine for automated acquisition of high-resolution tandem mass spectra.

Authors: Craig D Wenger; Michael T Boyne; Jonathan T Ferguson; Dana E Robinson; Neil L Kelleher
Journal: Anal Chem Date: 2008-10-08 Impact factor: 6.986

8. A robust two-dimensional separation for top-down tandem mass spectrometry of the low-mass proteome.

Authors: Ji Eun Lee; John F Kellie; John C Tran; Jeremiah D Tipton; Adam D Catherman; Haylee M Thomas; Dorothy R Ahlf; Kenneth R Durbin; Adaikkalam Vellaichamy; Ioanna Ntai; Alan G Marshall; Neil L Kelleher
Journal: J Am Soc Mass Spectrom Date: 2009-08-12 Impact factor: 3.109

9. Top down proteomics of human membrane proteins from enriched mitochondrial fractions.

Authors: Adam D Catherman; Mingxi Li; John C Tran; Kenneth R Durbin; Philip D Compton; Bryan P Early; Paul M Thomas; Neil L Kelleher
Journal: Anal Chem Date: 2013-01-23 Impact factor: 6.986

10. A framework for intelligent data acquisition and real-time database searching for shotgun proteomics.

Authors: Johannes Graumann; Richard A Scheltema; Yong Zhang; Jürgen Cox; Matthias Mann
Journal: Mol Cell Proteomics Date: 2011-12-13 Impact factor: 5.911

17 in total

1. A comprehensive pipeline for translational top-down proteomics from a single blood draw.

Authors: Timothy K Toby; Luca Fornelli; Kristina Srzentić; Caroline J DeHart; Josh Levitsky; John Friedewald; Neil L Kelleher
Journal: Nat Protoc Date: 2019-01 Impact factor: 13.491

2. Advancing Top-down Analysis of the Human Proteome Using a Benchtop Quadrupole-Orbitrap Mass Spectrometer.

Authors: Luca Fornelli; Kenneth R Durbin; Ryan T Fellers; Bryan P Early; Joseph B Greer; Richard D LeDuc; Philip D Compton; Neil L Kelleher
Journal: J Proteome Res Date: 2016-12-02 Impact factor: 4.466

3. The Value of Activated Ion Electron Transfer Dissociation for High-Throughput Top-Down Characterization of Intact Proteins.

Authors: Nicholas M Riley; Jacek W Sikora; Henrique S Seckler; Joseph B Greer; Ryan T Fellers; Richard D LeDuc; Michael S Westphall; Paul M Thomas; Neil L Kelleher; Joshua J Coon
Journal: Anal Chem Date: 2018-07-05 Impact factor: 6.986

4. Top Down Proteomics Reveals Mature Proteoforms Expressed in Subcellular Fractions of the Echinococcus granulosus Preadult Stage.

Authors: Karina R Lorenzatto; Kyunggon Kim; Ioanna Ntai; Gabriela P Paludo; Jeferson Camargo de Lima; Paul M Thomas; Neil L Kelleher; Henrique B Ferreira
Journal: J Proteome Res Date: 2015-10-28 Impact factor: 4.466

5. Analyzing internal fragmentation of electrosprayed ubiquitin ions during beam-type collisional dissociation.

Authors: Kenneth R Durbin; Owen S Skinner; Ryan T Fellers; Neil L Kelleher
Journal: J Am Soc Mass Spectrom Date: 2015-02-26 Impact factor: 3.109

Review 6. Identification and Quantification of Proteoforms by Mass Spectrometry.

Authors: Leah V Schaffer; Robert J Millikin; Rachel M Miller; Lissa C Anderson; Ryan T Fellers; Ying Ge; Neil L Kelleher; Richard D LeDuc; Xiaowen Liu; Samuel H Payne; Liangliang Sun; Paul M Thomas; Trisha Tucholski; Zhe Wang; Si Wu; Zhijie Wu; Dahang Yu; Michael R Shortreed; Lloyd M Smith
Journal: Proteomics Date: 2019-05 Impact factor: 3.984