Erin P Price, John Inman-Bamber, Venugopal Thiruvenkataswamy, Flavia Huygens, Philip M Giffard.
Abstract
BACKGROUND: Single nucleotide polymorphisms (SNPs) and genes that exhibit presence/absence variation have provided informative marker sets for bacterial and viral genotyping. Identification of marker sets optimised for these purposes has been based on maximal generalized discriminatory power, as measured by Simpson's Index of Diversity, or on the ability to identify specific variants. Here we describe the Not-N algorithm, which is designed to identify small sets of genetic markers diagnostic for user-specified subsets of known genetic variants. The algorithm does not treat the user-specified subset and the remaining genetic variants equally. Rather, Not-N analysis is designed to underpin assays that provide 0% false negatives, which is very important for applications such as diagnostic procedures for clinically significant subgroups within microbial species.
Year: 2007 PMID: 17672919 PMCID: PMC1973086 DOI: 10.1186/1471-2105-8-278
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Not-N analysis-derived binary gene targets from CGH data of Campylobacter jejuni, Yersinia enterocolitica and Clostridium difficile.
| Livestock | --- | 8 | |||
| Non-livestock | --- | --- | --- | n/a | |
| Non-pathogenic | Ye8081-4002, absent (100) | --- | --- | 0 | |
| Low pathogenicity | Ye8081-0306, absent (100) | --- | --- | 15 | |
| Highly pathogenic | Ye8081-0113, present (100) | --- | --- | 50 | |
| HY | CD2669, absent (98.1) | CD2983, present (100) | --- | 13 | |
| A-B+ | CD2983, absent (100) | --- | --- | 0 | |
| HA1 | CD2570, present (46.4) | CD2669, present (83.9) | CD2983, present (100) | 0 | |
| HA2 | CD0265, absent (100) | --- | --- | 8 |
a Clades defined for Y. enterocolitica, C. jejuni and C. difficile by references [4-6], respectively.
b Corresponds to the number of alternate outputs provided by Not-N analysis that are not shown in the table. n/a, not applicable.
Single-nucleotide polymorphisms identified by Not-N analysis for the major subtypes of hepatitis C virus.
| Subtype (a) | No. of sequences | Position (b) | Discrimination (%) | Position | Discrimination (%) |
| 1a | 117 | 126* (C or T) | 100 | - | - |
| 1b | 382 | 103 (C) | 72.1 | 194* (G)/238* (C) | 100 |
| 2a | 38 | 258 (A or G) | 99.7 | 182* (T) | 100 |
| 2b | 53 | 314 (A) | 99.7 | 127 (A) | 100 |
| 2c | 5 | 50 (T) | 99.7 | 39* (G) | 100 |
| 3a | 49 | 295 (G) | 100 | - | - |
| 3b | 6 | 307 (T) | 99.8 | 126* (G) | 100 |
| 4a | 26 | 182* (A) | 97.8 | 325* (A) | 100 |
| 4d | 17 | 154 (A) | 100 | - | - |
| 4f | 21 | 325* (C) | 99.5 | 194* (T)/238* (A) | 100 |
| 4t | 4 | 338/339 (C) | 100 | - | - |
| 5a | 18 | 100 (C) | 100 | - | - |
| 6a | 34 | 39* (C or T) | 100 | - | - |
a Subtypes containing fewer than four confirmed sequences were not included in the analysis. Sequences were downloaded from the hepatitis C virus (HCV) sequence database [10].
b The single-nucleotide polymorphism (SNP) position refers to a 340 bp fragment of the RNA-dependent RNA polymerase NS5B spanning nucleotides 8276 to 8615 (GenBank accession AF009606 [48]). NS5B is used to construct phylogenetic trees for HCV, which form the basis of the genotype and subtype nomenclature [40].
*SNP discriminates multiple subtypes.
The Not-N algorithm and its implementation by the Minimum SNPs computer program. A. Data for seven hypothetical sequence types (STs) at six single-nucleotide polymorphisms (SNPs). B. Not-N analysis output for the alignment in A. Four sets of two SNPs are identified, all of which reach 100% discrimination. C. Result obtained if positions 3 and 4 are excluded.
| A. | ||||||
| Sequence ID | SNP 1 | SNP 2 | SNP 3 | SNP 4 | SNP 5 | SNP 6 |
| ST 1* | G | G | G | A | T | T |
| ST 2* | A | G | T | T | T | G |
| ST 3* | G | G | G | C | T | G |
| ST 4 | A | G | A | G | A | T |
| ST 5 | A | A | A | G | T | T |
| ST 6 | A | G | T | C | A | A |
| ST 7 | A | A | A | G | T | G |
| Consensus (STs 1, 2 & 3) | Not informative | Not-ACT | Not-AC | Not-G | Not-ACG | Not-AC |
| >ST4 | n/a | + | - | - | - | + |
| >ST5 | n/a | - | - | - | + | + |
| >ST6 | n/a | + | + | + | - | - |
| >ST7 | n/a | - | - | - | + | + |
| Confidence (%) | Position not used | 50 | 75 | 75 | 50 | 25 |
| *"Group of interest" sequences | ||||||
| B. | ||||||
| SNP set | SNP 1 position and consensus | Cumulative discriminatory power (%) | SNP 2 position and consensus | Cumulative discriminatory power (%) | ||
| 1 | 3, NOT AC | 75 | 5, NOT ACG | 100 | ||
| 2 | 3, NOT AC | 75 | 6, NOT AC | 100 | ||
| 3 | 4, NOT G | 75 | 5, NOT ACG | 100 | ||
| 4 | 4, NOT G | 75 | 6, NOT AC | 100 | ||
| C. | ||||||
| SNP set | SNP 1 position and consensus | Cumulative discriminatory power (%) | SNP 2 position and consensus | Cumulative discriminatory power (%) | ||
| 1 | 2, NOT ACT | 50 | 5, NOT ACG | 100 | ||
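
The worked example in panels A to C can be reproduced with a short Python sketch. This is not the Minimum SNPs implementation: the function names (`ruled_out`, `confidence`, `not_n_sets`) and the `skip` parameter are hypothetical, and the greedy search with tie-keeping is an assumption chosen because it matches the panel outputs. At each position, the alleles seen in the group of interest define the "Not-N" consensus, and a non-group sequence is ruled out if it carries an allele the group never shows. Because every group member matches the consensus by construction, no group member can be missed (0% false negatives).

```python
# Panel A alignment: sequence type -> alleles at SNPs 1..6 (copied from the table).
ALIGNMENT = {
    "ST1": "GGGATT", "ST2": "AGTTTG", "ST3": "GGGCTG",  # group of interest
    "ST4": "AGAGAT", "ST5": "AAAGTT", "ST6": "AGTCAA", "ST7": "AAAGTG",
}
GROUP = {"ST1", "ST2", "ST3"}


def ruled_out(alignment, group, pos):
    """Non-group sequences whose allele at 0-based `pos` never occurs in the group."""
    allowed = {alignment[name][pos] for name in group}
    return frozenset(
        name for name in alignment
        if name not in group and alignment[name][pos] not in allowed
    )


def confidence(alignment, group):
    """Percentage of non-group sequences ruled out at each 1-based position."""
    length = len(next(iter(alignment.values())))
    n_others = len(alignment) - len(group)
    return {
        p + 1: 100.0 * len(ruled_out(alignment, group, p)) / n_others
        for p in range(length)
    }


def not_n_sets(alignment, group, skip=()):
    """Greedy search (an assumption, not the published code): grow marker sets
    one position at a time, keep every set tied for the best cumulative
    discrimination, and stop once all non-group sequences are ruled out."""
    length = len(next(iter(alignment.values())))
    others = frozenset(alignment) - frozenset(group)
    cover = {
        p + 1: ruled_out(alignment, group, p)
        for p in range(length) if p + 1 not in skip
    }
    frontier = {frozenset(): frozenset()}  # chosen positions -> sequences ruled out
    while True:
        grown = {}
        for chosen, done in frontier.items():
            for pos, hits in cover.items():
                if pos not in chosen and hits - done:  # position adds new exclusions
                    grown[chosen | {pos}] = done | hits
        if not grown:
            return []  # 100% discrimination is not reachable
        best = max(len(done) for done in grown.values())
        frontier = {c: d for c, d in grown.items() if len(d) == best}
        if best == len(others):
            return sorted(tuple(sorted(c)) for c in frontier)


print(confidence(ALIGNMENT, GROUP))
# {1: 0.0, 2: 50.0, 3: 75.0, 4: 75.0, 5: 50.0, 6: 25.0}
print(not_n_sets(ALIGNMENT, GROUP))               # [(3, 5), (3, 6), (4, 5), (4, 6)]
print(not_n_sets(ALIGNMENT, GROUP, skip=(3, 4)))  # [(2, 5)]
```

Position 1 scores 0% here because the group already contains both alleles found at that position, which corresponds to the "Not informative" entry in panel A. An exhaustive search over all pairs would additionally return positions 2 and 5 for the full alignment, a pair that panel B does not list; this is why the sketch assumes a greedy strategy.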