Ed Yzerman, Jeroen W den Boer, Martien Caspers, Arpit Almal, Bill Worzel, Walter van der Meer, Roy Montijn, Frank Schuren.
Abstract
BACKGROUND: Discrimination between clinical and environmental strains within many bacterial species is currently underexplored. Genomic analyses have clearly shown the enormous variability in genome composition between different strains of a bacterial species. In this study we used Legionella pneumophila, the causative agent of Legionnaires' disease, to search for genomic markers related to pathogenicity. During a large surveillance study in the Netherlands, well-characterized patient-derived and environmental strains were collected. We used a mixed-genome microarray to perform a comparative genome analysis of 257 strains from this collection.
Year: 2010 PMID: 20630115 PMCID: PMC3091632 DOI: 10.1186/1471-2164-11-433
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. AFLP analysis of L. pneumophila strains. AFLP analysis shows the genomic diversity among a representative subset of 102 L. pneumophila strains used for further analysis in this study. For each strain, the serogroup (Sg.) and the assigned EWGLI type are shown. The variety of patterns and the presence of several different EWGLI types indicate that the strains were selected randomly and represent the natural genomic diversity within this species.
Figure 2. Genomic diversity based on principal component analysis (PCA). All 346 microarray data sets (linearly distributed values, based on the selection of 480 markers) were subjected to PCA to determine whether patient-derived or environmental strains of L. pneumophila form separate subgroups. Each data set is represented as a single dot in an n-dimensional space, of which the three main components (covering most of the overall variation within these data) are shown. This analysis detects no clear differences between environmental and patient-derived strains based on overall genome composition.
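The PCA described for Figure 2 can be sketched as follows. This is a minimal illustration, assuming NumPy and a matrix with one row per data set and one column per marker; the helper name `pca_scores` is hypothetical, and the paper does not state which software or scaling was used.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 3):
    """Project rows of X (data sets x markers) onto the leading principal components.

    Returns the component scores (one row per data set) and the fraction of
    total variance explained by each returned component.
    """
    # Center each marker (column) so the components describe variation, not offsets.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scores are the coordinates of each data set along the leading axes.
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]
```

With data shaped like the study's, `X` would be 346 x 480, and the three columns of `scores` would be the coordinates plotted in a figure of this kind.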
Figure 3. Genomic diversity based on hierarchical clustering analysis. Hierarchical clustering was performed on all 346 genomic data sets generated in this study. Data were clustered both in columns (representing data sets) and in rows (representing the 480 spots on the array) using Pearson correlation distance and average linkage. Binarized data were used for this analysis; yellow represents absence of signal and black represents presence of signal. The color-coded bars at the bottom show the distribution of the strains used for AFLP analysis in Figure 1 (red) and of the additional strains (yellow-green), the distribution of patient-derived (orange) and environmental (blue) strains, and the distribution of training-set (purple) and test-set (green) strains. In all cases the random distribution of the groups is clearly visible.
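The clustering in Figure 3 can be illustrated with a naive agglomerative procedure: Pearson correlation distance, assumed here to be 1 - r, between binarized profiles, and average linkage, where the distance between two clusters is the mean of all pairwise distances between their members. This is a minimal pure-Python sketch with hypothetical helper names; it clusters only the data sets (columns), and for hundreds of profiles a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` would be the practical choice.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson r between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # An all-present or all-absent binarized profile has no variance.
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return 1.0 - sxy / sqrt(sxx * syy)


def average_linkage(profiles: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Naive O(n^3) average-linkage clustering; returns (id_a, id_b, distance) merges.

    Leaves are numbered 0..n-1; each merge creates a new cluster id n, n+1, ...
    """
    n = len(profiles)
    # Leaf-to-leaf distances, computed once.
    leaf_d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            leaf_d[i][j] = leaf_d[j][i] = pearson_distance(profiles[i], profiles[j])

    clusters = {i: [i] for i in range(n)}
    merges: List[Tuple[int, int, float]] = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for k, ci in enumerate(ids):
            for cj in ids[k + 1:]:
                mi, mj = clusters[ci], clusters[cj]
                # Average linkage: mean distance over all member pairs.
                avg = sum(leaf_d[p][q] for p in mi for q in mj) / (len(mi) * len(mj))
                if best is None or avg < best[0]:
                    best = (avg, ci, cj)
        avg, ci, cj = best
        clusters[next_id] = clusters.pop(ci) + clusters.pop(cj)
        merges.append((ci, cj, avg))
        next_id += 1
    return merges
```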
Genetic programming (GP) control parameters
| Parameter | Value |
|---|---|
| Elitism | True |
| Inputs | LePn.011A2-b, LePn.019H4, LePn.010B12, LePn.008D6, LePn.024B1 |
| Operators | =, >, <, >=, <=, and, not, or, ?, nand, xor, nor |
| Fitness function | AUC (area under the ROC curve) |
| Population size | 10000 |
| Crossover rate | 0.7 |
| Mutation rate | 0.3 |
| Generation limit | 100 |
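With AUC as the fitness function, each candidate rule is scored by how well it ranks patient-derived strains above environmental ones. The paper does not say how AUC was computed; the sketch below assumes the standard pairwise (Mann-Whitney) formulation, with a hypothetical `auc` helper, labels 1 for patient-derived and 0 for environmental, and ties between a positive and a negative counted as one half.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = patient-derived (positive), 0 = environmental (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

For a rule whose output is only True/False (scores of 1 or 0), this AUC equals the mean of its sensitivity and specificity.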
The 7 rules of the predictive model describing all relevant relationships between the 5 selected markers
| Rule No. | Rule |
|---|---|
| 1 | If (LePn.011A2-b < LePn.019H4) then [not(LePn.011A2-b = LePn.010B12)] else [(LePn.011A2-b = LePn.024B1) and (LePn.008D6 = LePn.010B12)] |
| 2 | If (LePn.011A2-b < LePn.010B12) then [not(LePn.019H4 < LePn.024B1)] else [(LePn.008D6 > LePn.011A2-b) nor (LePn.008D6 < LePn.024B1)] |
| 3 | If (LePn.011A2-b < LePn.010B12) then [not(LePn.024B1 > LePn.019H4)] else [(LePn.010B12 < LePn.008D6) nor (LePn.024B1 > LePn.008D6)] |
| 4 | If (LePn.019H4 = LePn.008D6) then [(LePn.024B1 <= LePn.019H4)] else [(LePn.019H4 < LePn.024B1) nor (LePn.010B12 = LePn.011A2-b)] |
| 5 | If (LePn.019H4 = LePn.008D6) then [(LePn.008D6 >= LePn.024B1)] else [(LePn.010B12 <= LePn.011A2-b) nor (not(LePn.024B1 <= LePn.019H4))] |
| 6 | If (LePn.019H4 = LePn.008D6) then [not(LePn.024B1 > LePn.019H4)] else [(LePn.024B1 > LePn.019H4) nor (LePn.011A2-b >= LePn.010B12)] |
| 7 | If (LePn.019H4 = LePn.008D6) then [not(LePn.019H4 < LePn.024B1)] else [(LePn.010B12 = LePn.011A2-b) nor (LePn.019H4 < LePn.024B1)] |
Each rule "votes" for a sample being clinical if the rule evaluates to 'True' when the Boolean value for each probe is entered. If 4 or more of the 7 rules agree that a given sample is clinical, the "meta-rule" formed by these 7 rules predicts that the sample is clinical.
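The voting scheme can be made concrete by transcribing the seven rules directly into Python. This is a minimal sketch under stated assumptions: inputs are the binarized present (True) / absent (False) calls, comparisons between Booleans follow Python's ordering False < True, rule 6 is read as if its truncated else-branch closes after `LePn.010B12`, and the names `PROBES`, `nor`, `rule_votes` and `is_clinical` are hypothetical.

```python
from typing import Dict, List

# Probe names as they appear in the rule table (assumed Boolean inputs).
PROBES = ("LePn.011A2-b", "LePn.019H4", "LePn.010B12", "LePn.008D6", "LePn.024B1")


def nor(x: bool, y: bool) -> bool:
    """Logical NOR: True only when both operands are False."""
    return not (x or y)


def rule_votes(sample: Dict[str, bool]) -> List[bool]:
    """Evaluate the seven rules; each True entry is one vote for 'clinical'."""
    # a = 011A2-b, b = 019H4, c = 010B12, d = 008D6, e = 024B1
    a, b, c, d, e = (sample[p] for p in PROBES)
    # Assumption: Boolean comparisons use Python's ordering False < True.
    return [
        (a != c) if a < b else ((a == e) and (d == c)),      # rule 1
        (not (b < e)) if a < c else nor(d > a, d < e),       # rule 2
        (not (e > b)) if a < c else nor(c < d, e > d),       # rule 3
        (e <= b) if b == d else nor(b < e, c == a),          # rule 4
        (d >= e) if b == d else nor(c <= a, not (e <= b)),   # rule 5
        (not (e > b)) if b == d else nor(e > b, a >= c),     # rule 6
        (not (b < e)) if b == d else nor(c == a, b < e),     # rule 7
    ]


def is_clinical(sample: Dict[str, bool], threshold: int = 4) -> bool:
    """Meta-rule: predict 'clinical' when at least `threshold` of the 7 rules vote True."""
    return sum(rule_votes(sample)) >= threshold


all_absent = {p: False for p in PROBES}
print(sum(rule_votes(all_absent)), is_clinical(all_absent))  # → 7 True
```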
Prediction results of the statistical five-marker model, built with 109 unique strains, when tested on data from a separate set of 148 unique L. pneumophila strains
| | Patient-derived | Environmental | Total |
|---|---|---|---|
| Positive test result | 34 | 35 | 69 |
| Negative test result | 0 | 79 | 79 |
| Total | 34 | 114 | 148 |
| Sensitivity | 100% | | |
| Specificity | 69% | | |
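The sensitivity and specificity follow directly from the counts in the table: sensitivity is the fraction of patient-derived strains with a positive test result (34/34), and specificity is the fraction of environmental strains with a negative test result (79/114). A small check in Python; the helper name `sens_spec` is illustrative only.

```python
def sens_spec(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Return (sensitivity, specificity) from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # patient-derived strains called positive
    specificity = tn / (tn + fp)   # environmental strains called negative
    return sensitivity, specificity


# Counts from the test-set table: TP = 34, FN = 0, FP = 35, TN = 79.
sens, spec = sens_spec(tp=34, fn=0, fp=35, tn=79)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# → sensitivity = 100%, specificity = 69%
```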
Sequence homologies of the 5 predictive markers when compared to available sequences for L. pneumophila strains Paris (lpp), Philadelphia (lpg), Lens (lpl), Corby (LPC) and L. pneumophila str. Paris plasmid (plpp)
| Marker name | Homologues in available genome sequences |
|---|---|
| 11A2 | |
| 19H4 | |
| 10B12 | |
| 24B1 | |
| 8D6 | |
Absence of homologues indicates that no high-similarity homologue is present. The sequences are accessible through GenBank accession numbers HM584933-HM584937.