Wenzhou Li, Jette Wypych, Robert J Duff.
Abstract
Sequence variant analysis (SVA) is critical in therapeutic protein development because it ensures the absence of genetic mutations in a production clone or high-level misincorporations during cell culture. While software for searching sequence variants from mass spectrometry data is available, effectively distinguishing true positives from the large number of false positives among the hits or identifications reported in the error-tolerant search mode is a challenge. This verification process must be done manually and can take several days or even weeks to accomplish. We report here the use of a Perl-based script that evaluates every identified hit and removes the false positives from the search results of PepFinder™ (also known as MassAnalyzer) based on orthogonal criteria. Our data show that the false positives from PepFinder™ output were reduced ∼4-fold without loss of accuracy in the detection of true identifications, representing a more than 70% reduction in time compared with the manual data verification process.
Keywords: Algorithm; automation; false positive removal; sequence variant analysis
Year: 2017 PMID: 28590201 PMCID: PMC5540081 DOI: 10.1080/19420862.2017.1336591
Source DB: PubMed Journal: MAbs ISSN: 1942-0862 Impact factor: 5.857
Figure 1. Data processing workflow with (bottom) and without (top) data post-processing. Typically, each experiment identifies more than one hundred mutations/misincorporations. The analyst needs to examine these identifications one by one, which can take a couple of days or even weeks.
Identification of expected variants in the spiking experiment (mAb B into mAb A, all spike levels combined). S40A and A341T are missed because their parent peptides are too short (5 and 2 residues respectively) and elute too early to be detected. Reliable identification for such small peptides can also be challenging for any search engine.
| Expected Variant | Native peak Area | Identified by PepFinder™ | In filtered results |
|---|---|---|---|
| L311V | 3.20E+07 | ✓ | ✓ |
| V399M | 1.70E+07 | ✓ | ✓ |
| K276Q | 1.50E+07 | ✓ | ✓ |
| A329G | 3.40E+06 | ✓ | ✓ |
| L104V | 1.30E+06 | ✓ | ✓ |
| S40A | 4.00E+05 | | |
| A341T | 4.00E+05 | | |
Figure 2. Comparison of the identification list before (left) and after (right) the post-processing filtering. Each row is an identification from the search engine that would require ∼30 minutes to verify manually. The following criteria were used to filter the results: retain only single-base substitutions, strict enzymatic site, RT score 0 to 5, and % of Largest Native peak set to 50%.
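The filtering criteria in the Figure 2 caption can be sketched in a few lines of Python. This is an illustration only, not the authors' Perl script: the `Hit` record, its field names, and the `passes_filters` function are hypothetical, and reading "% of Largest Native peak set to 50%" as a minimum threshold is an assumption. The single-base check relies on the standard genetic code and asks whether any codon of the original residue can become a codon of the new residue by changing one nucleotide.

```python
import re
from dataclasses import dataclass
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {}  # amino acid letter -> list of codons
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(bases))


def is_single_base_substitution(old: str, new: str) -> bool:
    """True if some codon of `old` differs from some codon of `new` by one base."""
    if old == new:
        return False
    return any(
        sum(x != y for x, y in zip(c1, c2)) == 1
        for c1 in CODONS[old]
        for c2 in CODONS[new]
    )


@dataclass
class Hit:
    # Hypothetical record layout; the real PepFinder export columns may differ.
    variant: str                  # e.g. "S410N"
    rt_score: float
    pct_of_largest_native: float  # 0 to 100
    strict_enzymatic: bool


def passes_filters(hit: Hit, rt_range=(0.0, 5.0), min_pct=50.0) -> bool:
    """Apply the four criteria from the caption (threshold direction assumed)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", hit.variant)
    if m is None:
        return False
    old, _, new = m.groups()
    if old not in CODONS or new not in CODONS:
        return False
    return (
        is_single_base_substitution(old, new)
        and hit.strict_enzymatic
        and rt_range[0] <= hit.rt_score <= rt_range[1]
        and hit.pct_of_largest_native >= min_pct
    )


hits = [
    Hit("S410N", 1.39, 100.0, True),  # values taken from the S to N table
    Hit("W50A", 1.00, 100.0, True),   # made-up hit: needs two base changes
]
retained = [h.variant for h in hits if passes_filters(h)]
```

Because the criteria are orthogonal, each one can be tuned or disabled independently, for example by widening `rt_range` when the chromatography gradient changes.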
S to N misincorporations and their retention time properties.
| Variant | No spike | 0.1% spike | 0.3% spike | 0.5% spike | Parent Peak Area (×1e5) | Parent Peak rank | RT shift (From – To) | RT shift (Minutes) | RT score |
|---|---|---|---|---|---|---|---|---|---|
| S410N | 0.10% | 0.10% | 0.10% | 0.10% | 276 | 100% | 125.1–124.3 | 0.8 | 1.39 |
| S174N | 0.10% | 0.10% | 0.07% | 0.06% | 166 | 100% | 101.6–101.2 | 0.4 | 0.66 |
| S182N | 0.05% | 0.05% | 0.05% | 0.05% | 166 | 100% | 101.6–100.7 | 0.9 | 0.61 |
| S306N | 0.04% | 0.04% | 0.04% | 0.04% | 485 | 100% | 137.7–140.6 | −2.9 | 1.29 |
| S202N | 0.02% | 0.02% | 0.02% | 0.03% | 127 | 100% | 84.3–83.6 | 0.7 | 0.78 |
| S269N | 0.02% | 0.02% | 0.02% | 0.02% | 260 | 100% | 107.2–100.7 | 6.5 | 4.31 |
Figure 3. Frequency charts showing the distribution of sequence variants by type in the PepFinder™ output (left) and script-filtered results (right) of the spiking experiment (mAb B into mAb A, all spike levels combined). Each cell represents a mutation/misincorporation from the residue in the left column to the residue in the top row. Yellow cells are the expected spikes and green cells are commonly known misincorporations.
Results for a real sequence variant analysis application.
| | PepFinder™ alone | With filtering script |
|---|---|---|
| Identifications above 0.3% | 34 | 4 |
| Identifications below 0.3% | 120 | 86 |
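As a rough worked example, the counts in this table can be combined with the ∼30 minutes per identification quoted in the Figure 2 caption to estimate the manual review effort saved. The variable names are illustrative, and the calculation assumes every identification takes the same time to verify, which is a simplification.

```python
# Identification counts from the table above.
before = {"above_0.3pct": 34, "below_0.3pct": 120}  # PepFinder alone
after = {"above_0.3pct": 4, "below_0.3pct": 86}     # with filtering script

HOURS_PER_HIT = 0.5  # ~30 minutes of manual verification per identification

total_before = sum(before.values())
total_after = sum(after.values())
removed = total_before - total_after

# Fold reduction for the hits above the 0.3% level.
fold_above = before["above_0.3pct"] / after["above_0.3pct"]

# Estimated manual review time saved, in hours.
hours_saved = removed * HOURS_PER_HIT

print(f"{total_before} -> {total_after} identifications ({removed} removed)")
print(f"fold reduction above 0.3%: {fold_above:.1f}")
print(f"estimated review time saved: {hours_saved:.0f} h")
```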