Anthony P Malanoski, Baochuan Lin, Zheng Wang, Joel M Schnur, David A Stenger.
Abstract
There is an increasing recognition that detailed nucleic acid sequence information will be useful and even required in the diagnosis, treatment and surveillance of many significant pathogens. Because generating detailed information about pathogens leads to significantly larger amounts of data, it is necessary to develop automated analysis methods to reduce analysis time and to standardize identification criteria. This is especially important for multiple pathogen assays designed to reduce assay time and costs. In this paper, we present a successful algorithm for detecting pathogens and reporting the maximum level of detail possible using multi-pathogen resequencing microarrays. The algorithm filters the sequence of base calls from the microarray and finds entries in genetic databases that most closely match. Taxonomic databases are then used to relate these entries to each other so that the microorganism can be identified. Although developed using a resequencing microarray, the approach is applicable to any assay method that produces base call sequence information. The success and continued development of this approach means that a non-expert can now perform unassisted analysis of the results obtained from partial sequence data.
Year: 2006 PMID: 17012284 PMCID: PMC1636417 DOI: 10.1093/nar/gkl565
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic representation of the algorithm, showing the relationship of the three main tasks and the logic of their subtasks. ProSeq identification, Task(I), carries out filtering and subsequence selection, and then determines which database records the SubSeqs are most similar to. ProSeq grouping, Task(II), determines whether the prototype sequence identifications support a common organism identification. Pathogen determination, Task(III), makes the final examination and decision on the organism detected from the microarray data. ProSeq: prototype sequence; SubSeq: subsequence.
Figure 2. Detailed schematic representation of the filtering subtask of ProSeq identification, Task(I). For each ProSeq, primer regions were masked as N (ambiguous) calls, and then UniRate was calculated from the HybSeq. For ProSeqs that passed the UniRate requirement, a revised sliding window algorithm attempted to grow a SubSeq that could be used as a BLAST query. The identity (start location in the ProSeq and length) of a successfully grown SubSeq was placed in a file for batch querying via BLAST. VARI = [(‘SubSeq length’ − 30) * 0.2857 + 70]. Detailed SubSeq requirements are described in the Supplementary Data.
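The VARI expression in the Figure 2 caption is a simple linear function of SubSeq length, and a short Python sketch makes its range easier to see. The function name `vari_threshold` and the sample lengths are assumptions chosen for illustration; the caption does not say how VARI is applied inside the sliding window, so the sketch only evaluates the formula.

```python
def vari_threshold(subseq_length: int) -> float:
    """Evaluate VARI as written in the Figure 2 caption.

    VARI = (SubSeq length - 30) * 0.2857 + 70
    How this value is used in the filter is not modeled here.
    """
    return (subseq_length - 30) * 0.2857 + 70


# Illustrative lengths only; they are not taken from the paper.
for length in (30, 65, 100):
    print(length, round(vari_threshold(length), 2))
```

Since 0.2857 is approximately 2/7, the value rises from 70 at a length of 30 to roughly 90 at a length of 100.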
Figure 3. Detailed schematic representation of the second subtask of ProSeq identification, Task(I): organism identification for an individual SubSeq. Each SubSeq sent to BLAST returned a list of possible matches contained in a Return array, which was sorted through to find the best bit score/expect value pair (MaxScore). If the MaxScore was greater than MIN (10⁻⁶), all returns that had this best score were sorted into a new array, Rank1. Detailed SubSeq requirements are described in the Supplementary Data.
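The Rank1 selection described in the Figure 3 caption can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `BlastReturn` type, the `select_rank1` function and the record labels and scores are all hypothetical, and the MIN check is left out because the caption does not state exactly how it is applied.

```python
from typing import List, NamedTuple


class BlastReturn(NamedTuple):
    record: str       # database record label (hypothetical values below)
    bit_score: float  # higher is better
    expect: float     # expect value, lower is better


def select_rank1(returns: List[BlastReturn]) -> List[BlastReturn]:
    """Return every entry that ties for the best (bit score, expect) pair."""
    if not returns:
        return []
    # Best pair: highest bit score, ties broken by lowest expect value.
    best = max(returns, key=lambda r: (r.bit_score, -r.expect))
    max_score = (best.bit_score, best.expect)
    return [r for r in returns if (r.bit_score, r.expect) == max_score]


# Made-up returns for illustration; not data from the paper.
returns = [
    BlastReturn("record_A", 210.0, 1e-50),
    BlastReturn("record_B", 210.0, 1e-50),
    BlastReturn("record_C", 180.0, 1e-40),
]
rank1 = select_rank1(returns)
print([r.record for r in rank1])
```

Keeping all tied returns, rather than only the first, matters because several database records can match a SubSeq equally well.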
Figure 4. Schematic representation of the third subtask of ProSeq identification, Task(I), which determines the organism for a ProSeq based on the results found for its SubSeqs. All of the SubSeqs of a particular ProSeq are compared to determine the two best-scoring SubSeqs. When there was a single SubSeq, or one scored much better than the other, the ProSeq inherited the properties of that SubSeq. Detailed SubSeq requirements are described in the Supplementary Data.
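The inheritance rule in the Figure 4 caption can be illustrated with a short Python sketch. The caption does not quantify "much better", so the `margin` parameter and its default are purely hypothetical, as are the function name and the example scores. The case where the two best SubSeqs score similarly is covered only in the Supplementary Data, so the sketch returns `None` there instead of guessing.

```python
from typing import List, Optional, Tuple

SubSeqResult = Tuple[str, float]  # (organism label, score)


def proseq_identification(
    subseqs: List[SubSeqResult], margin: float = 1.5
) -> Optional[str]:
    """Pick the label a ProSeq inherits from its SubSeqs, if any.

    `margin` is a hypothetical stand-in for "scored much better".
    """
    if not subseqs:
        return None
    ranked = sorted(subseqs, key=lambda s: s[1], reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best[0]  # a single SubSeq: the ProSeq inherits it directly
    second = ranked[1]
    if best[1] >= margin * second[1]:
        return best[0]  # one SubSeq clearly dominates
    return None  # close scores: resolution not described in the caption


# Made-up scores for illustration only.
print(proseq_identification([("H3N2", 300.0)]))
print(proseq_identification([("H3N2", 300.0), ("Flu A", 100.0)]))
print(proseq_identification([("H3N2", 300.0), ("Flu A", 290.0)]))
```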
Algorithm decisions for C. pneumoniae at several concentrations for SubSeq, ProSeq identification Task(I), ProSeq grouping Task(II) and pathogen determination Task(III)
| Genome copies | ProSeq | Unique calls (%) | No. of SubSeq | SubSeq organism identification and Uniqueness, Bit score | Task(I) | Task(II) | Task(III) |
|---|---|---|---|---|---|---|---|
| 1000 | VD2 | 89 | 1 | | | | POSITIVE |
| | VD4 | 91 | 1 | | | | |
| | | 80 | 2 | | | | |
| 100 | VD2 | 100 | 1 | | | | POSITIVE |
| | VD4 | 97 | 1 | | | | |
| | | 80 | 2 | | | | |
| 100 | VD2 | 83 | 1 | | | | POSITIVE |
| | VD4 | 91 | 1 | | | | |
| | | 84 | 2 | | | | |
| 10 | VD2 | 100 | 1 | | | | POSITIVE |
| | VD4 | 97 | 1 | | | | |
| | | 90 | 2 | | | | |
| 10 | VD2 | 100 | 1 | | | | POSITIVE |
| | VD4 | 93 | 1 | | | | |
| | | 11 | 0 | | Null | Null | |
(G1) J138 (BA000008), AR39 (AE002167), Tw-183 (AE017159), C. pne (M69230, AF131889, AY555078, M64064, AF131229, AF131230); (G2) C. pne (S83995); (G3) J138 (BA000008), AR39 (AE002167), Tw-183 (AE017159).
SU, SeqUniqu; TA, TaxAmbig; TU, TaxUniqu.
Algorithm decisions for Influenza A clinical samples identified previously using a manual method for SubSeq, ProSeq identification Task(I), ProSeq grouping Task(II) and pathogen determination Task(III)
| Sample name | ProSeq | No. of SubSeq | SubSeq organism identification and Uniqueness, Bit score | Task(I) | Task(II) | Task(III) |
|---|---|---|---|---|---|---|
| A/Colorado/360/05 | HA3 | 1 | H3N2 | H3N2 | (NY) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 1 | A/NewYork/98/04(NY) | (NY) SU | | |
| | M | 4 | 2 Flu A | H3N2 | | |
| | | | 2 H3N2 | | | |
| A/Qatar/2039/05 | HA3 | 1 | A/Qatar/2039/05(QA) | (QA) SU | (QA) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 2 | 2 H3N2 | H3N2 | | |
| | M | 4 | 2 H3N2 | H3N2 | | |
| | | | 2 Flu A | | | |
| A/Guam/362/05 | HA3 | 1 | A/Guam/362/05(GA) | (GA) SU | (GA) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 1 | H3N2 | H3N2 | | |
| | M | 4 | 2 H3N2 | H3N2 | | |
| | | | 2 Flu A | | | |
| A/Italy/384/05 | HA3 | 1 | A/Italy/384/05(IT) | (IT) SU | (IT) SU (NY) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 1 | A/NewYork/371/04(NY) | (NY) SU | | |
| | M | 3 | 2 H3N2 | H3N2 | | |
| | | | Flu A | | | |
| A/Turkey/2108/05 | HA3 | 1 | A/Turkey/2108/05(TU) | (TU) SU | (TU) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 1 | H3N2 | H3N2 | | |
| | M | 3 | 2 H3N2 | H3N2 | | |
| | | | Flu A | | | |
| A/Korea/298/05 | HA3 | 1 | A/Korea/298/05(KO) | (KO) SU | (KO) SU (NY) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 3 | A/NewYork/98/04(NY) | (NY) SU | | |
| | | | 2 Flu A | | | |
| | M | 4 | 2 Flu A | H3N2 | | |
| | | | 2 H3N2 | | | |
| A/Japan/1383/05 | HA3 | 1 | A/Japan/1383/05(JA) | (JA) SU | (JA) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 1 | H3N2 | H3N2 | | |
| | M | 5 | 3 Flu A | H3N2 | | |
| | | | 2 H3N2 | | | |
| A/Ecuador/1968/04 | HA3 | 1 | H3N2 | H3N2 | H3N2 TA | POSITIVE H3N2 |
| | NA2 | 2 | H3N2 | H3N2 | | |
| | M | 4 | 3 Flu A | H3N2 | | |
| | | | H3N2 | | | |
| A/Iraq/34/05 | HA3 | 1 | A/Iraq/34/05(IR) | (IR) SU | (IR) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 3 | 2 H3N2 | H3N2 | | |
| | | | Flu A | | | |
| | M | 5 | 3 H3N2 | H3N2 | | |
| | | | 2 Flu A | | | |
| A/Peru/166/05 | HA3 | 1 | A/Peru/166/05(PU) | (PU) SU | (PU) SU H3N2 | POSITIVE H3N2 |
| | NA2 | 1 | H3N2 | H3N2 | | |
| | M | 3 | H3N2 | H3N2 TA | | |
| | | | Flu A | | | |
| | | | A/NewYork/461/2005 SU, 247 | | | |
Note: Within a row the first listing of a specific strain was followed by a two-letter abbreviation used in the remaining columns of that row.
Organism identification and algorithm decisions from Variola Major virus Nucleic Acid templates for SubSeq, ProSeq identification Task(I), ProSeq grouping Task(II) and pathogen determination Task(III)
| Genome copies | ProSeq | Unique calls (%) | No. of SubSeq | SubSeq organism identification and Uniqueness, Bit score | Task(I) | Task(II) | Task(III) |
|---|---|---|---|---|---|---|---|
| 1000 | CRMB | 83.90 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 77.00 | 1 | Variola | Vari., | Vari., | Variola |
| 1000 | CRMB | 80.90 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 75.50 | 1 | Variola | Vari., | Vari., | Variola |
| 1000 | CRMB | 76.40 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 73.30 | 1 | Variola | Vari., | Vari., | Variola |
| 1000 | CRMB | 80.10 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 74.90 | 1 | Variola | Vari., | Vari., | Variola |
| 1000 | CRMB | 81.60 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 76.10 | 1 | Variola | Vari., | Vari., | Variola |
| 1000 | CRMB | 77.90 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 75.50 | 1 | Variola | Vari., | Vari., | Variola |
| 1000 | CRMB | 81.60 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 74.90 | 4 | Variola | Vari., | Vari., | Variola |
| | | | | Variola | | | |
| 100 | CRMB | 84.20 | 1 | Variola | Vari., | Vari., | Positive |
| | HA | 5.60 | 0 | | Null | Null | Variola |
TA, TaxAmbig; in this case, the Variola, Variola major and Variola minor taxonomic classes.
Organism identification and algorithm decisions from Vaccinia sample on Variola Major virus ProSeqs for SubSeq, ProSeq identification Task(I), ProSeq grouping Task(II) and pathogen determination Task(III)
| CFU | ProSeq | Unique calls (%) | No. of SubSeq | SubSeq organism identification and Uniqueness, Bit score | Task(I) | Task(II) | Task(III) |
|---|---|---|---|---|---|---|---|
| 5 × 10⁷ | CRMB | 77.90 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 29.40 | 1 | Orth. | Orth., | Orth., | Vaccinia |
| 5 × 10⁷ | CRMB | 79.80 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 25.70 | 1 | Orth. | Orth., | Orth., | Vaccinia |
| 1.6 × 10⁷ | CRMB | 79.40 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 14.80 | 0 | | Null | Null | Vaccinia |
| 1.6 × 10⁷ | CRMB | 77.50 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 24.50 | 0 | | Null | Null | Orthopox |
| 1.6 × 10⁷ | CRMB | 76.80 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 21.60 | 0 | | Null | Null | Vaccinia |
| 1.6 × 10⁷ | CRMB | 74.50 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 17.30 | 0 | | Null | Null | Orthopox |
| 5 × 10⁶ | CRMB | 77.90 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 25.70 | 0 | | Null | Null | Vaccinia |
| 5 × 10⁶ | CRMB | 78.30 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 22.00 | 0 | | Null | Null | Orthopox |
| 5 × 10⁶ | CRMB | 73.00 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 13.00 | 0 | | Null | Null | Vaccinia |
| 5 × 10⁶ | CRMB | 73.40 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 7.80 | 0 | | Null | Null | Orthopox |
| 1.6 × 10⁶ | CRMB | 75.30 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 8.60 | 0 | | Null | Null | Orthopox |
| 1.6 × 10⁶ | CRMB | 49.80 | 2 | Orth. | Vacc., | Vacc., | Detected |
| | HA | 6.60 | 0 | | Null | Null | Vaccinia |
| 1.6 × 10⁶ | CRMB | 65.50 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 10.00 | 0 | | Null | Null | Orthopox |
| 1.6 × 10⁶ | CRMB | 62.90 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 8.20 | 0 | | Null | Null | Orthopox |
| 5 × 10⁵ | CRMB | 58.40 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 9.00 | 0 | | Null | Null | Orthopox |
| 5 × 10⁵ | CRMB | 56.20 | 2 | Orth. | Orth., | Orth., | Detected |
| | HA | 8.00 | 0 | | Null | Null | Orthopox |
| 5 × 10⁵ | CRMB | 49.00 | 1 | Orth. | Orth., | Orth., | Detected |
| | HA | 9.30 | 0 | | Null | Null | Orthopox |
| 5 × 10⁵ | CRMB | 44.60 | 1 | Orth. | Orth., | Orth., | Detected |
| | HA | 7.80 | 0 | | Null | Null | Orthopox |
Vacc., Vaccinia; Orth., Orthopox. (H1) Rabbitpox, Buffalopox, Cowpox, Vaccinia, Callithrix jacchus, Taterapox. (H2) Vaccinia. (H3) Vaccinia, Variola (Major and Minor), Cantagalo, Ectromelia, Elephantpox, Aracatuba, Cowpox, Taterapox. (H4) H2 and Cowpox. (H5) H4 and Camelpox. (H6) H1 and Variola, Variola Major, Variola Minor.
Figure 5. Alignment of the influenza A NA1 ProSeq and the A/Weiss/43 and A/PuertoRico/8/34 strains. Raw and filtered hybridization chip results of A/PuertoRico/8/34 are also shown. Asterisks indicate perfectly matched sequences.