Jumamurat R Bayjanov, Michiel Wels, Marjo Starrenburg, Johan E T van Hylckama Vlieg, Roland J Siezen, Douwe Molenaar.
Abstract
MOTIVATION: Pangenome arrays contain DNA oligomers targeting several sequenced reference genomes from the same species. In microbiology, these arrays can be used to investigate the often high genetic variability within a species by comparative genome hybridization (CGH). The biological interpretation of pangenome CGH data depends on the ability to compare strains at a functional level, particularly by comparing the presence or absence of orthologous genes. Because of this high genetic variability, available genotype-calling algorithms cannot be applied to pangenome CGH data.
Year: 2009 PMID: 19129208 PMCID: PMC2639077 DOI: 10.1093/bioinformatics/btn632
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic representation of the PanCGH algorithm for a CGH experiment. The left panel shows the fluorescence of a query strain to a set of probes (p1 to pn) targeting different reference orthologs (homologous genes from reference strains A, B and C) of an ortholog group g. Some probes target several reference orthologs, as shown by the overlap between the probe sets targeting the reference orthologs from strains A and B. The right panel shows a schematic representation of the calculation of the presence score. For each reference ortholog, the mode (indicated with a star) is calculated from the distribution of (log) signals of the corresponding probes. The presence score is the highest of these mode values. In this case, the presence score is above the threshold and equals the mode of the signals targeting the reference ortholog from strain B.
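The presence-score calculation described in the caption can be sketched as follows. This is a minimal illustration, not the published implementation: the source does not say how the mode is estimated, so a Gaussian kernel density estimate on a grid is assumed here, and the names `kde_mode`, `presence_score`, `call_presence`, the bandwidth, the threshold and the example signals are all hypothetical.

```python
import math


def kde_mode(values, bandwidth=0.3, grid_points=200):
    """Estimate the mode of a sample with a Gaussian KDE evaluated on a grid.

    Assumption: the estimator and the bandwidth are illustrative choices only.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    step = (hi - lo) / (grid_points - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # Unnormalised density: the normalising constant does not move the peak.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def presence_score(log_signals_by_ortholog):
    """Return (reference ortholog with the highest mode, that mode)."""
    modes = {name: kde_mode(sig) for name, sig in log_signals_by_ortholog.items()}
    best = max(modes, key=modes.get)
    return best, modes[best]


def call_presence(log_signals_by_ortholog, threshold):
    """Call an ortholog group present if its presence score exceeds the threshold."""
    best, score = presence_score(log_signals_by_ortholog)
    return score > threshold, best, score


# Hypothetical log signals for probes targeting reference orthologs A, B and C.
example_signals = {
    "A": [6.0, 6.2, 6.1, 5.9],
    "B": [9.0, 9.1, 8.9, 9.2, 9.0],
    "C": [5.0, 5.1, 4.9],
}
present, best, score = call_presence(example_signals, threshold=8.0)
print(present, best, round(score, 2))
```

In this toy input the probes targeting the ortholog from strain B give the highest mode, so the group is called present on the basis of B, mirroring the situation drawn in the figure.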
True-positive rate (sensitivity) and true-negative rate (specificity) of the PanCGH genotype-calling algorithm for three L. lactis strains
| Strain | True-positive rate (%) | True-negative rate (%) |
|---|---|---|
| SK11 | 97.6 | 90.5 |
| IL1403 | 97.9 | 86.2 |
| MG1363 | 95.4 | 96.4 |
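For reference, the two rates in the table follow the standard definitions: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below computes them from presence/absence calls; the function name `calling_rates` and the toy data are hypothetical and are not taken from the study.

```python
def calling_rates(predicted, actual):
    """Return (true-positive rate, true-negative rate) as percentages.

    Both arguments map an ortholog-group identifier to True (present)
    or False (absent); `actual` holds the known genotype.
    """
    tp = fp = tn = fn = 0
    for group, truth in actual.items():
        call = predicted[group]
        if truth and call:
            tp += 1
        elif truth and not call:
            fn += 1
        elif not truth and call:
            fp += 1
        else:
            tn += 1
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example with six hypothetical ortholog groups.
actual_calls = {"g1": True, "g2": True, "g3": True, "g4": True, "g5": False, "g6": False}
predicted_calls = {"g1": True, "g2": True, "g3": True, "g4": False, "g5": False, "g6": True}
print(calling_rates(predicted_calls, actual_calls))
```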
Fig. 2. Hierarchical clustering of L. lactis strains based on presence/absence predictions of representatives of 4571 ortholog groups of L. lactis. The pairwise binary distance was used as a distance metric and clustering was performed using the average linkage agglomeration method (Hastie et al. 2001). The cluster of strains at the top represents the subspecies cremoris genotype, while the large cluster at the bottom, excluding strains P7266 and P7304, contains strains of subspecies lactis genotype and one strain (LMG8520) of subspecies hordniae phenotype. In these two clusters, 1341 of the 4571 ortholog groups are present in all strains. Although strains P7266 and P7304 have the subspecies lactis phenotype, they are far apart from other subspecies lactis strains (see explanation in text). Branches with a solid rectangle are dairy isolates; the other strains were isolated from plants.
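The clustering procedure named in the caption can be illustrated with a small pure-Python sketch. It assumes that "binary distance" means the asymmetric binary (Jaccard) distance, that is, the fraction of positions where exactly one profile is present among positions where at least one is present; the source does not spell this out. The strain names and presence/absence profiles are hypothetical, and the naive merge loop is meant for illustration rather than for thousands of ortholog groups and many strains.

```python
from itertools import combinations


def binary_distance(a, b):
    """Asymmetric binary (Jaccard) distance between two 0/1 profiles."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0
    differ = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return differ / either


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster1, cluster2, distance) tuples,
    where each cluster is a tuple of strain names.
    """
    def avg(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(binary_distance(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical presence (1) / absence (0) calls for six ortholog groups.
example_profiles = {
    "strain1": [1, 1, 1, 0, 0, 0],
    "strain2": [1, 1, 1, 1, 0, 0],
    "strain3": [0, 0, 0, 1, 1, 1],
    "strain4": [0, 0, 1, 1, 1, 1],
}
for left, right, dist in average_linkage(example_profiles):
    print(left, right, round(dist, 3))
```

The merge history is the information a dendrogram draws: strains with similar gene content join at small distances, while the two groups of strains join last at a large distance.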
Functional categories in ortholog groups with frequent false calls in test strain L. lactis MG1363
| Functional category | False-positive (%) | False-negative (%) |
|---|---|---|
| Hypothetical genes | 49.9 | 60 |
| Transposases | 29.2 | 0 |
| Related to transporters | 5.3 | 7.2 |
Values are given as a percentage of the total number of false calls.