Guy Tsafnat, Enrico Coiera, Sally R Partridge, Jaron Schaeffer, Jon R Iredell.
Abstract
BACKGROUND: Gene discovery algorithms typically examine sequence data for low-level patterns. A novel method to computationally discover higher-order DNA structures is presented, using a context-sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies.
Year: 2009 PMID: 19735578 PMCID: PMC3087341 DOI: 10.1186/1471-2105-10-281
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The cassette array grammar. The start of the array is marked by an integron-class specific 5' flanking sequence or attI and its end with the corresponding 3' flanking sequence, tni or ybeA. The middle consists of a number of cassettes and non-cassette insertions (NCI) such as insertion sequences.
Figure 2. Visual representation of a tree resulting from a parse of one array sequence containing three gene cassettes and a non-cassette insertion.
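The structure described in the two captions above can be illustrated with a small sketch. It assumes the sequence has already been reduced to a list of labelled features; the token labels, the `Node` class and `parse_array` are hypothetical names chosen for illustration, not the authors' implementation. The sketch checks only the start/middle/end shape of an array and does not enforce the integron-class pairing between the 5' and 3' flanks, which is the part that would need a context-sensitive grammar.

```python
from dataclasses import dataclass, field

# Hypothetical token labels: the captions name only the feature categories.
START_TOKENS = {"5'-flank", "attI"}
END_TOKENS = {"3'-flank", "tni", "ybeA"}
MIDDLE_TOKENS = {"cassette", "NCI"}  # NCI = non-cassette insertion


@dataclass
class Node:
    """One node of the parse tree: a label plus ordered children."""
    label: str
    children: list = field(default_factory=list)


def parse_array(tokens):
    """Return a parse tree for a cassette array, or None if the shape does not match.

    Simplification: the pairing of class-specific 5' and 3' flanks is NOT checked.
    """
    if len(tokens) < 2:
        return None
    first, *middle, last = tokens
    if first not in START_TOKENS or last not in END_TOKENS:
        return None
    if any(tok not in MIDDLE_TOKENS for tok in middle):
        return None
    return Node("array", [Node(first), *(Node(t) for t in middle), Node(last)])
```

For example, `parse_array(["attI", "cassette", "cassette", "NCI", "cassette", "3'-flank"])` yields an `array` node with six children, mirroring an array with three cassettes and one non-cassette insertion (the position of the insertion here is arbitrary).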
Sample SD based on a pilot study with n = 50 experiments in 5 omission proportions for a desired confidence α ≥ 95% and a desired error E ≤ 1%.
| Omission Proportion | SD | Anderson-Darling A*² | Required Experiments | KS test p-value |
|---|---|---|---|---|
| 5% | 0.140 | 1.09 | 749 | 0.068 |
| 10% | 0.096 | 0.565 | 352 | |
| 15% | 0.081 | 0.733 | 252 | |
| 20% | 0.070 | 0.505 | 187 | |
| 25% | 0.054 | 0.300 | 113 | |
| 30% | 0.052 | 0.419 | 103 | |
| 35% | 0.044 | 0.716 | 75 | |
| 40% | 0.044 | 0.360 | 74 | |
| 45% | 0.041 | 0.438 | 66 | |
| 50% | 0.039 | 0.150 | 59 | |
An Anderson-Darling value adjusted for sample size (A*²) greater than 0.752 rejects the null hypothesis that the sample is normally distributed. For the 5% omission proportion, a truncated normal distribution was tested using the KS test, which gave a p-value > 5%.
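The record does not state how the required number of experiments was derived from the pilot SD. A minimal sketch, assuming the standard normal-approximation formula n = (z · SD / E)² with a two-sided 95% confidence level and E = 0.01, gives values close to those in the table (small differences are expected because the tabulated SDs are rounded). The function names are hypothetical; the 0.752 threshold is the one quoted in the note above.

```python
import math
from statistics import NormalDist

AD_CRITICAL = 0.752  # threshold for A*² quoted in the table note


def required_experiments(sd, error=0.01, confidence=0.95):
    """Assumed formula (not confirmed by the source): n = (z * SD / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / error) ** 2)


def rejects_normality(a_star_sq, critical=AD_CRITICAL):
    """True when the adjusted Anderson-Darling statistic exceeds the threshold."""
    return a_star_sq > critical


# Values taken from the table: (omission proportion, SD, A*²)
for proportion, sd, a_star_sq in [("5%", 0.140, 1.09), ("50%", 0.039, 0.150)]:
    print(proportion, required_experiments(sd), rejects_normality(a_star_sq))
```

Under this assumption only the 5% row exceeds the threshold, which is consistent with that row being the one for which a truncated normal distribution was additionally tested.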
Figure 3. Sensitivity, specificity and F1-measure. Error bars indicate ± 1 SD in F1-measure.
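For readers unfamiliar with the three measures named in the Figure 3 caption, the sketch below uses their standard definitions from confusion-matrix counts. How true and false positives were counted in the study is not described here, so the inputs are placeholders, and the use of the sample standard deviation for the error bars is an assumption.

```python
from statistics import mean, stdev


def classification_metrics(tp, fp, tn, fn):
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def f1_error_bar(f1_values):
    """Mean F1 and one sample SD across repeated experiments (assumed: sample SD)."""
    return mean(f1_values), stdev(f1_values)
```

Calling `classification_metrics` once per experiment and passing the collected `f1` values to `f1_error_bar` gives the centre and half-width of a ± 1 SD error bar for one omission proportion.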