Neeltje Carpaij, Ad C Fluit, Jodi A Lindsay, Marc JM Bonten, Rob JL Willems.
Abstract
BACKGROUND: Microarray-based Comparative Genomic Hybridisation (CGH) has been used to assess genetic variability between bacterial strains. Crucial for the interpretation of microarray data is the availability of a reference against which signal intensities can be compared, so that the presence or divergence of each DNA fragment can be determined reliably. However, producing a good reference becomes unfeasible when microarrays are based on pan-genomes. When only a single strain is used as a reference for a multistrain array, the accessory gene pool will be partially represented by reference DNA, although these genes represent the genomic repertoire that can explain differences in virulence, pathogenicity or transmissibility between strains. The lack of a reference makes interpretation of the data for these genes difficult and, if the test signal is low, they are often deleted from the analysis. We aimed to develop novel methods to determine the presence or divergence of genes in a Staphylococcus aureus multistrain PCR product microarray-based CGH approach for which reference DNA was not available for some probes.
Year: 2009 PMID: 19912620 PMCID: PMC2779823 DOI: 10.1186/1471-2164-10-522
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Distribution of the normalised ratio. Example of a histogram for one slide, constructed using Matlab 2006b, showing the distribution of the normalised signal intensities. Only the MRSA252 spots were counted, and flagged spots were filtered out.
Figure 2. Box plots for determining cut-offs for the presence or divergence of genes. Example box plots for two microarray slides (Sa07_010 and Sa07_015), constructed using GraphPad Prism 5. (1) MRSA252: spots originating from MRSA252, which should give a signal in the Cy5 channel; (2) Non252: spots originating from the other six strains (N315, MW2, Mu50, NCTC8325, COL and MSSA476), which are absent from MRSA252 and so should not yield a Cy5 signal. The box plots show the log2 of the raw intensity of the reference (Cy5) channel for the MRSA252 and the non-MRSA252 spots separately. The horizontal line in each box denotes the median intensity. The log2 signal for 50% of the spots falls within the boxes, and the dots represent the upper and lower 5% of the spots. These plots show that the level of the raw intensity does not correlate with the presence or divergence of a gene, and that intensities vary widely between spots on the same microarray slide.
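The summary statistics behind such a box plot can be reproduced with a short script. The following is a minimal sketch, not the authors' GraphPad workflow: the function name `log2_box_summary` and the input list are hypothetical, and the choice of the "inclusive" quantile method is an assumption, since the caption does not say how percentiles were interpolated.

```python
import math
import statistics


def log2_box_summary(raw_intensities):
    """Summarise log2 raw intensities as in a 5-95 percentile box plot.

    Hypothetical helper: returns the median, the quartiles that bound the
    box (the middle 50% of spots), and the 5th/95th percentiles beyond
    which spots would be drawn as individual dots.
    """
    # log2 is undefined for zero or negative intensities, so skip them.
    logs = [math.log2(x) for x in raw_intensities if x > 0]
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(logs, n=20, method="inclusive")
    return {
        "p5": cuts[0],
        "q1": cuts[4],
        "median": cuts[9],
        "q3": cuts[14],
        "p95": cuts[18],
    }


# Made-up Cy5 intensities for illustration only.
example = log2_box_summary([2, 4, 8, 16, 32, 64, 128, 256, 512])
print(example["q1"], example["median"], example["q3"])
```

Running the same summary separately on the MRSA252 and non-MRSA252 spots of one slide would give the two boxes compared in the figure.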
Sensitivity, specificity, PPV and NPV of the newly developed analysis methods based on calculated cut-offs per spot.
| Analysis method | Test characteristic | MRSA252 (a) | All spots (b) |
|---|---|---|---|
| 1 Cut-off based on two times the background | Sensitivity (c) | 96.90% | 86.40% |
| | Specificity | 15.21% | 76.68% |
| | PPV | 89.43% | 73.64% |
| | NPV | 39.91% | 82.34% |
| 2 Cut-off based on reference signal intensities | Sensitivity | 50.31% | 90.06% |
| | Specificity | 100% | 19.79% |
| | PPV | 100% | 51.96% |
| | NPV | 21.38% | 71.69% |
| 3 Cut-off based on the minimal ratio positivity | Sensitivity | 99.98% | 98.24% |
| | Specificity | 98.81% | 7.85% |
| | PPV | 99.84% | 50.86% |
| | NPV | 99.89% | 80.90% |
| 4 Cut-off based on two times the background and reference signal intensities | Sensitivity | 50.31% | 79.15% |
| | Specificity | 100% | 81.61% |
| | PPV | 100% | 76.13% |
| | NPV | 21.38% | 76.41% |
| 5 Cut-off based on two times the background and the minimal ratio of positivity | Sensitivity | 96.90% | 86.24% |
| | Specificity | 99.24% | 77.88% |
| | PPV | 99.89% | 74.09% |
| | NPV | 81.24% | 82.34% |
a Spots based on MRSA252 ORFs (75% of all spots on the array). Flagged spots of the MRSA252 data set were filtered from the calculations.
To validate the new analysis methods, the MRSA252 spots were also analysed with each new method and the results were compared with those obtained with the GACK method. From this comparison, the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the new methods were calculated.
b All spots representing genes present in MRSA252, N315, MW2, Mu50, NCTC8325, COL and MSSA476.
c To calculate sensitivity, specificity, PPV and NPV of the new methods, the hybridisation results of six sequenced strains used in the array design were included, with the exception of the results of MSSA476. The sensitivity of an analysis method was calculated only from the spots that were added to the array specifically for each strain; for example, only the MRSA252 spots were taken into account for the sensitivity for strain MRSA252. Specificity for a strain could only be calculated from the spots of strains that were added to the array after that strain. These genes have to be divergent, since they are not present in that strain. For strain NCTC8325, for example, all 170 genes spotted additionally for strains MW2 and MSSA476 should be divergent in the NCTC8325 hybridisations, since MW2 and MSSA476 were added after NCTC8325.
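The calculation described in footnote c can be sketched in a few lines. This is an illustrative reading of the footnote, not the authors' code: the function names, the spot identifiers and the placeholder strains "A", "B" and "C" are all hypothetical, and the order in which strains were added to the array is passed in as a parameter because the full order is not stated here.

```python
def expected_calls(spot_origin, strain, array_order):
    """Expected presence (True) or divergence (False) per spot for one strain.

    Spots designed from the hybridised strain itself are expected present
    (used for sensitivity). Spots from strains added to the array later are
    expected divergent (used for specificity). Spots from strains added
    earlier are left out, since their status is not known a priori.
    """
    rank = {name: i for i, name in enumerate(array_order)}
    expected = {}
    for spot, origin in spot_origin.items():
        if origin == strain:
            expected[spot] = True
        elif rank[origin] > rank[strain]:
            expected[spot] = False
    return expected


def confusion_counts(calls, expected):
    """Count TP, FP, TN, FN, treating 'present' as the positive call."""
    tp = fp = tn = fn = 0
    for spot, truth in expected.items():
        called_present = calls[spot]
        if truth and called_present:
            tp += 1
        elif truth:
            fn += 1
        elif called_present:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions of sensitivity, specificity, PPV and NPV."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: three placeholder strains and four spots.
order = ["A", "B", "C"]
origin = {"s1": "A", "s2": "B", "s3": "B", "s4": "C"}
calls = {"s1": True, "s2": True, "s3": False, "s4": False}
counts = confusion_counts(calls, expected_calls(origin, "B", order))
print(diagnostic_metrics(*counts))
```

In this toy case, strain "B" is scored on its own two spots for sensitivity and on the single later-added "C" spot for specificity; the "A" spot is ignored, mirroring the rule that only later-added strains can supply known-divergent genes.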