Alexey V Nefedov, Rovshan G Sadygov.
Abstract
BACKGROUND: Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration were restricted to limited mass ranges (below 2 kDa). In addition, neither the implementation details of these methods nor their computational performance have been reported. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration a timely topic.
Year: 2011 PMID: 22059886 PMCID: PMC3270061 DOI: 10.1186/1471-2105-12-432
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Number of compositions and sequences comprised of 20 letters, of length not greater than L, for L ranging from 3 to 10, and their ratios (rounded)
| Length of Peptides (L) | Number of Compositions | Number of Sequences | Ratio (Sequences / Compositions) |
|---|---|---|---|
| 3 | 1,770 | 8,420 | 5 |
| 4 | 10,625 | 168,420 | 16 |
| 5 | 53,129 | 3,368,420 | 63 |
| 6 | 230,229 | 67,368,420 | 293 |
| 7 | 888,029 | 1,347,368,420 | 1,517 |
| 8 | 3,108,104 | 26,947,368,420 | 8,670 |
| 9 | 10,015,004 | 538,947,368,420 | 53,814 |
| 10 | 30,045,014 | 10,778,947,368,420 | 358,760 |
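The counts in this table follow from standard combinatorics, and a short check may help make the growth rates concrete. The sketch below assumes that a composition is a non-empty multiset of at most L letters drawn from 20, and that a sequence is a non-empty string of at most L letters; the function names are illustrative and do not come from the paper.

```python
from math import comb

ALPHABET_SIZE = 20  # number of letters (standard amino acids)


def count_compositions(max_len, k=ALPHABET_SIZE):
    """Count non-empty multisets of size <= max_len drawn from k letters."""
    # comb(max_len + k, k) counts multisets of size 0..max_len;
    # subtracting 1 removes the empty multiset.
    return comb(max_len + k, k) - 1


def count_sequences(max_len, k=ALPHABET_SIZE):
    """Count non-empty strings of length <= max_len over k letters."""
    return sum(k ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    for L in range(3, 11):
        c = count_compositions(L)
        s = count_sequences(L)
        print(f"{L:>2} {c:>12,} {s:>22,} {round(s / c):>9,}")
```

Under these assumptions the functions reproduce every row of the table, which indicates that the empty composition and the empty sequence are excluded from the counts.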
Figure 1. Pseudocode for recursive procedure GenBasic, which enumerates all compositions of length not greater than L.
Figure 2. Pseudocode for recursive procedure Gen, a faster version of GenBasic. The procedure generates the mass histogram of all compositions of length not greater than L.
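The pseudocode in Figures 1 and 2 is not reproduced in this record, so the following is only a minimal sketch of what such recursive procedures might look like, not the authors' implementation. It assumes a toy three-letter alphabet (the monoisotopic residue masses of glycine, alanine, and serine), 0-based indexing, and a hypothetical `bin_width` parameter for the histogram; it does not attempt to reproduce whatever optimizations make Gen faster than GenBasic.

```python
from collections import Counter

# Assumption: toy alphabet of three residues (G, A, S), monoisotopic masses in Da.
AAM = [57.02146, 71.03711, 87.03203]


def gen_basic(max_len, start=0, prefix=()):
    """Yield each composition as a tuple of per-residue counts with total <= max_len."""
    if start == len(AAM):
        yield prefix
        return
    # Choose how many copies of residue `start` to use, then recurse on the rest.
    for n in range(max_len + 1):
        yield from gen_basic(max_len - n, start + 1, prefix + (n,))


def gen(max_len, start, mass, hist, bin_width=1.0):
    """Add the mass of every composition reachable from (max_len, start, mass) to hist."""
    if start == len(AAM):
        hist[int(mass / bin_width)] += 1
        return
    for n in range(max_len + 1):
        gen(max_len - n, start + 1, mass + n * AAM[start], hist, bin_width)


if __name__ == "__main__":
    compositions = list(gen_basic(3))
    hist = Counter()
    gen(3, 0, 0.0, hist)
    print(len(compositions), sum(hist.values()))
```

Note that this sketch also emits the empty composition (all counts zero), so its totals are one larger than the corresponding non-empty counts.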
Figure 3. Creating multiple jobs from a single job of enumerating all compositions. A single call to procedure Gen with parameters (L, 1, 0) is equivalent to L+1 calls with parameters (L, 2, 0), (L - 1, 2, aam[1]), ..., (0, 2, aam[1]*L), in which n1 is set to 0, 1, ..., L, respectively. Any job with start = 2 can be further expanded into L+1 jobs with start = 3, as shown for Gen(L, 2, 0).
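The job expansion described in this caption can be checked on a small example. The sketch below is an illustration under stated assumptions rather than the paper's code: `gen` is a plain recursive stand-in for procedure Gen, `AAM` is a hypothetical three-residue mass table, indices are 0-based (so the caption's start = 1 corresponds to `start = 0` here), and `expand` and `run` are helper names introduced for this example.

```python
from collections import Counter

# Assumption: toy alphabet of three residues (G, A, S), monoisotopic masses in Da.
AAM = [57.02146, 71.03711, 87.03203]


def gen(max_len, start, mass, hist):
    """Stand-in for Gen: histogram (1 Da bins) of all compositions from this state."""
    if start == len(AAM):
        hist[int(mass)] += 1
        return
    for n in range(max_len + 1):
        gen(max_len - n, start + 1, mass + n * AAM[start], hist)


def expand(job):
    """Split one job (max_len, start, mass) into max_len + 1 jobs at start + 1."""
    max_len, start, mass = job
    if start >= len(AAM):
        return [job]  # nothing left to split on
    return [(max_len - n, start + 1, mass + n * AAM[start])
            for n in range(max_len + 1)]


def run(jobs):
    """Execute the jobs one after another and merge their histograms."""
    hist = Counter()
    for job in jobs:
        gen(*job, hist)
    return hist


if __name__ == "__main__":
    root = (4, 0, 0.0)
    level1 = expand(root)
    level2 = [sub for job in level1 for sub in expand(job)]
    print(len(level1), len(level2))
    print(run([root]) == run(level1) == run(level2))
```

Because each expanded job carries its own remaining length and accumulated mass, the jobs are independent of one another and can be executed in any order or on different processes.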
Figure 4. Pseudocode for procedure CreateMassHistMaster.
Figure 5. Pseudocode for procedure CreateMassHistWorker.
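The master and worker procedures themselves are not shown in this record. As a rough illustration of the general pattern their names suggest, the sketch below assumes that a master builds a list of independent jobs, workers each compute a partial mass histogram, and the master sums the partial results. All names here (`worker`, `master`, `AAM`, `n_workers`) are hypothetical, and a thread pool is used only to keep the example short and portable; a real deployment would use separate processes or cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumption: toy alphabet of three residues (G, A, S), monoisotopic masses in Da.
AAM = [57.02146, 71.03711, 87.03203]


def gen(max_len, start, mass, hist):
    """Stand-in for Gen: histogram (1 Da bins) of all compositions from this state."""
    if start == len(AAM):
        hist[int(mass)] += 1
        return
    for n in range(max_len + 1):
        gen(max_len - n, start + 1, mass + n * AAM[start], hist)


def worker(job):
    """Compute the partial histogram for one job (max_len, start, mass)."""
    hist = Counter()
    gen(*job, hist)
    return hist


def master(max_len, n_workers=4):
    """Split the root job on the first residue, farm out the jobs, merge the results."""
    jobs = [(max_len - n, 1, n * AAM[0]) for n in range(max_len + 1)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(worker, jobs):
            total.update(partial)  # Counter.update adds counts bin by bin
    return total


if __name__ == "__main__":
    print(sum(master(4).values()))
```

Merging is a simple per-bin sum, so the final histogram does not depend on how the jobs are distributed among workers.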
Computation times for enumerating all tryptic compositions up to the length of 30, for different sets of jobs and number of work processes, with and without the maximum mass limit
| Work Processes | Jobs | Job Table: start | Job Table: Lmax,2 | Job Table: Lmax,3 | Job Table: Lmax,4 | Computation Time (with mass limit) | Computation Time (without mass limit) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | - | - | - | 6 h 03 min | 35 h 11 min |
| 5 | 30 | 2 | - | - | - | 2 h 12 min | 14 h 52 min |
| 30 | 30 | 2 | - | - | - | 1 h 39 min | 13 h 32 min |
| 30 | 255 | ≤ 3 | 20 | - | - | 28 min | 5 h 02 min |
| 71 | 255 | ≤ 3 | 20 | - | - | 27 min | 4 h 57 min |
| 71 | 679 | ≤ 5 | 20 | 24 | 28 | 11 min | 1 h 20 min |
Computations were done using 71 work processes executed on a cluster with 12 Intel Xeon X5650 CPUs running Windows HPC Server 2008.
Computation times for enumerating all tryptic compositions with different maximum lengths, with and without maximum mass limit
| Maximum Length | Computation Time (with mass limit) | Computation Time (without mass limit) |
|---|---|---|
| 25 | 19 min | 29 min |
| 30 | 11 min | 1 h 20 min |
| 35 | 8 min | 5 h 38 min |
| 40 | 8 min | 38 h 28 min |
| 45 | 14 min | > 96 h |
| 50 | 29 min | - |
Parameters of the job table were: start ≤ 7, Lmax,2 = 20, Lmax,3 = 24, Lmax,4 = 28, Lmax,5 = 34, Lmax,6 = 40. Computations were done using 71 work processes executed on a cluster with 12 Intel Xeon X5650 CPUs running Windows HPC Server 2008.