Marco Severgnini, Paola Cremonesi, Clarissa Consolandi, Giada Caredda, Gianluca De Bellis, Bianca Castiglioni.
Abstract
The 16S rRNA gene is one of the preferred targets for resolving questions of species phylogeny in microbiology. However, identifying single-nucleotide variations that distinguish one sequence from a set of homologous ones can be problematic. Here we present ORMA (Oligonucleotide Retrieving for Molecular Applications), a set of scripts that searches for discriminating positions and selects high-quality oligonucleotide probes for use in molecular applications. Two assays based on the Ligase Detection Reaction (LDR) are presented. First, a new set of probe pairs designed on cyanobacterial 16S rRNA sequences of 18 different species was compared with that of a previous study. Then, a set of LDR probe pairs for discriminating 13 pathogens that contaminate bovine milk was evaluated. The software determined more than 100 candidate probe pairs per dataset, from more than 300 16S rRNA sequences, in less than 5 min. The results showed that ORMA improved the performance of the LDR assay on cyanobacteria, correctly identifying 12 out of 14 samples, and allowed perfect discrimination among the 13 milk-pathogen-related species. ORMA represents a significant improvement over other contexts in which enzyme-based techniques have been applied to already known single-base mutations or to entire subsequences.
Year: 2009 PMID: 19531738 PMCID: PMC2760787 DOI: 10.1093/nar/gkp499
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Block diagram of the steps through which ORMA works. The four steps described in the main text are highlighted in gray: (I) sequence importing and consensus creation; (II) search for the discriminating positions by the SBS algorithm; (III) retrieval of the candidate sequences from the positions found (the actual design depends on the molecular application chosen); (IV) quality filtering and ranking of the candidate probes. On the right, in boxes, example screenshots (probe pair design on the cyanobacteria dataset) are given for each step. Steps (II) and (III) are indistinguishable in the ORMA output and are represented together. For visualization purposes, only some of the 18 sequences are shown.
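The idea behind step (II) can be illustrated with a minimal sketch. This is not ORMA's SBS algorithm, whose details are given in the main text; it is a simplified stand-in that assumes pre-aligned, equal-length sequences grouped by species, and the function name `discriminating_positions` is hypothetical.

```python
def discriminating_positions(aligned, target):
    """Return (column, base) pairs that discriminate `target` from all other groups.

    `aligned` maps a group name to a list of equal-length aligned sequences.
    A column qualifies when every target sequence carries the same non-gap
    base and no sequence of any other group carries that base there.
    Columns are 0-based. Illustrative only; not the ORMA SBS algorithm.
    """
    target_seqs = aligned[target]
    others = [s for group, seqs in aligned.items() if group != target for s in seqs]
    hits = []
    for col in range(len(target_seqs[0])):
        bases = {s[col] for s in target_seqs}
        if len(bases) != 1:
            continue  # target group is not uniform at this column
        base = next(iter(bases))
        if base == "-":
            continue  # a gap cannot serve as a discriminating base
        if all(s[col] != base for s in others):
            hits.append((col, base))
    return hits


# Toy alignment (made-up sequences, not taken from the datasets)
toy = {"A": ["ACGT", "ACGT"], "B": ["ACTT", "ACTT"], "C": ["AGTT"]}
print(discriminating_positions(toy, "A"))
print(discriminating_positions(toy, "C"))
```

In this toy alignment, group B has no discriminating column because its only distinctive base is shared with group C, which is the kind of cross-group overlap Figure 3 illustrates for the earlier Cylindrospermum probe pair.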
Table 1. Cyanobacterial samples and related LDR results for the Castiglioni et al. probes and the ORMA-designed probes
| Group | Sample ID | Strain/Clone name | Geographic origin | Sequencing classification (score) | LDR results: Castiglioni et al. probes | LDR results: ORMA probes |
|---|---|---|---|---|---|---|
| Calothrix | 1 | Calothrix sp. strain PCC 7714 | Small pool, Aldabra Atoll, India | – | Specific (2/2) | Specific (2/2) |
| Cylindrospermopsis | 2 | Cylindrospermopsis 1LT32S01 | Trasimeno Lake, Italy | – | Specific (2/2) | Specific (2/2) |
| Cylindrospermum | 3 | Cylindrospermum stagnale PCC 7417 | Soil, greenhouse, Stockholm, Sweden | – | Aspecific (2/2) | Specific (2/2) |
| Halotolerans | 4 | Cyanothece sp. strain PCC 7418 | Solar Lake, Israel | – | Aspecific (1/2), Specific (1/2) | Specific (2/2) |
| Leptolyngbya | 5 | Leptolyngbya sp. strain 0BB 30S02 | Bubano Basin, Imola, Italy | – | Specific (2/2) | Specific (2/2) |
| Microcystis | 6 | Microcystis aeruginosa PCC 9354 | Little Rideau Lake, Ontario, Canada | – | Specific (1/2), No signal (1/2) | Specific (2/2) |
| Nodularia | 7 | Nodularia 3SD7S01 | Svalbard Islands, Norway | – | Aspecific (2/2) | Specific (2/2) |
| Nostoc | 8 | Nostoc sp. strain PCC 7107 | Shallow pond, Point Reyes, CA, USA | Nostoc (100%) | Aspecific (2/2) | Non-specific (0/2) |
| Nostoc | 9 | Nostoc sp. strain PCC 8114 | Water bloom, Lake Hepet.on, Morris Co, NJ, USA | Cylindrospermum (58%) | Non-specific (0/2) | Non-specific (0/2) |
| Planktothrix | 10 | Planktothrix sp. strain 2 | Lake Markusbölefjärden, Åland Islands, Finland | – | Specific (2/2) | Specific (2/2) |
| Prochlorococcus + Synechococcus | 11 | Prochlorococcus marinus PCC 9511 | Mediterranean Sea | – | Specific (2/2) | Specific (2/2) |
| Prochlorococcus + Synechococcus | 12 | Synechococcus sp. strain Hegewald 1974-30 | Lake Kuusjärvi, Saukkolahti, Finland | – | Aspecific (1/2), Specific (1/2) | Specific (2/2) |
| Spirulina | 13 | Spirulina major PCC 6313 | Brackish water, Berkeley, CA, USA | – | Aspecific (2/2) | Specific (2/2) |
| Synechocystis | 14 | Synechocystis sp. strain PCC 7008 | Shallow pond, Point Reyes, CA, USA | – | Non-specific (0/2) | Specific (2/2) |
Where sequencing has been performed, the result of the classification is also reported. Sample ID refers to the numbers used in Figure 2.
a. Clonal DNA from environmental sample.
b. Specific indicates that only the probe corresponding to the species was present; non-specific means that no probe was present (except for the universal cyanobacteria probe); aspecific means that the species-specific probe was present, but other probes also showed an IF significantly above background signal. The number of replicates is reported in brackets.
c. According to RDP II database, release 9.60.
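The three LDR result labels defined in the footnote above can be expressed as a small decision rule. The sketch below is illustrative only: it assumes each probe comes with a P-value, uses the 0.01 significance threshold mentioned in the Figure 2 caption, and introduces a hypothetical `classify_ldr` function and an `"unclassified"` label for the one case the footnote does not define (other probes present, expected probe absent).

```python
def classify_ldr(p_values, expected_probe, universal_probe="UniCyano", alpha=0.01):
    """Label one LDR replicate as specific, aspecific or non-specific.

    `p_values` maps probe name -> P-value; a probe counts as present when
    its P-value is below `alpha`. The universal probe is ignored, as in the
    table footnote. The "unclassified" label is an assumption of this sketch.
    """
    present = {
        name for name, p in p_values.items()
        if p < alpha and name != universal_probe
    }
    if not present:
        return "non-specific"   # no probe present besides the universal one
    if present == {expected_probe}:
        return "specific"       # only the species-specific probe is present
    if expected_probe in present:
        return "aspecific"      # expected probe plus at least one other probe
    return "unclassified"       # case not covered by the footnote


# Made-up P-values for illustration
replicate = {"Nostoc": 0.2, "Halotolerans": 0.02, "UniCyano": 0.0001}
print(classify_ldr(replicate, "Nostoc"))
```

With these made-up values the Halotolerans P-value of 0.02 stays above the 0.01 threshold, so no species-specific probe counts as present.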
Figure 2. Heat maps of P-values from the duplicate LDR experiments on the cyanobacteria dataset. (A) Castiglioni et al. probes; (B) ORMA-designed probes. The scale ranges from non-significance (>0.05) to high significance (<0.005). The x-axis reports the IDs of the tested samples (see Table 1 for a full description); the y-axis reports the probe pair names. The line ‘Other’ represents the mean of all the remaining Zip-codes in the universal arrays that were not associated with any actual probe. Experiments on Nostoc samples were repeated twice on different DNAs because the first test failed. The Halotolerans probe pair in one replicate of sample 8 (classified as Nostoc) has a P-value of 0.02, above the threshold of 0.01 chosen for significance.
Table 2. List of probe pairs for the cyanobacteria experiment, associated Zip-codes and major thermodynamic parameters
| Oligo name | Species | Discr base pos (full alignment) | Real pos | Zip code | Discrim oligo (DS) | Common probe (CP) | Len DS | Len CP | Tm DS (°C) | Tm CP (°C) | Deg bases DS | Deg bases CP | Score | Intra-group score | Intra-group score (%) | Inter-group score | Inter-group score (%) | Seq DS score | Seq CP score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Calothrix _z_36 | Calothrix | 1116 | 93 | 36 | GGTGAGTAACGCGTGAGAATCTGT | CTTYAGGTCGGGGACAACAGTT | 24 | 22 | 65.2 | 63.1 | 0 | 1 | 10 | 3 (3) | 100% | 0 (349) | 0% | 100 | 100 |
| Cylindrospermopsis_z_28 | Cylindrospermopsis | 1560 | 543 | 28 | CGTAAAGGGTCTGCAGGTGGA | ACTGAAAGTCTGCTGTTAAAGAGTTTG | 21 | 27 | 63.3 | 63.7 | 0 | 0 | 10 | 3 (3) | 100% | 3 (349) | 1% | 100 | 100 |
| Cylindrospermum_z_29 | Cylindrospermum | 2133 | 1062 | 29 | GTTTTTAGTTGCCAGCACTTCGGG | TGGGCACTCTAGAGAGACTGC | 24 | 21 | 65.2 | 63.3 | 0 | 0 | 10 | 2 (2) | 100% | 0 (350) | 0% | 100 | 100 |
| Halotolerans_z_13B | Halotolerans | 1634 | 584 | 13B | CTGGTGYGCTAGAGGGCGAC | AGGGGTAGAGGGAATTCCCAG | 20 | 21 | 65.6 | 63.3 | 1 | 0 | 10 | 8 (8) | 100% | 0 (344) | 0% | 95.00 | 100 |
| Leptolyngbya_z_37 | Leptolyngbya | 1202 | 185 | 37 | GTGAAATGTTWTWTYGCCTGAGGATGAA | CTCGCGTCTGATTAGCTAGTTGG | 28 | 23 | 65.0 | 64.6 | 3 | 0 | 10 | 5 (5) | 100% | 0 (347) | 0% | 93.57 | 100 |
| Microcystis _z_1B | Microcystis | 1581 | 524 | 1B | GTCAGCCAAGTCTGCYGTCAAAT | CAGGTTGCTTAACGACCTAAAGGC | 23 | 24 | 63.8 | 65.2 | 1 | 0 | 10 | 91 (91) | 100% | 0 (261) | 0% | 99.09 | 99.27 |
| Nodularia_z_23B | Nodularia | 1239 | 211 | 23B | TAGCTAGTAGGTGTGGTAAAAGCG | CACCTAGGCGACGATCAGTAG | 24 | 21 | 63.5 | 63.3 | 0 | 0 | 10 | 28 (30) | 93% | 14 (322) | 4% | 98.19 | 99.52 |
| Nostoc_z_32 | Nostoc | 1886 | 825 | 32 | GGGGAGTACGCCGGCAACG | GTGAAACTCAAAGGAATTGACGGG | 19 | 24 | 66.0 | 63.5 | 0 | 0 | 10 | 5 (6) | 83% | 4 (346) | 1% | 97.37 | 100 |
| Prochl+Synech _z_3B | Prochlorococcus+ Synechococcus | 1475 | 426 | 3B | CTTGAGGAATAAGCCACGGCTAAT | TCCGTGCCAGCAGCCGCG | 24 | 18 | 63.5 | 65.2 | 0 | 0 | 10 | 86 (86) | 100% | 0 (266) | 0% | 98.60 | 99.94 |
| Planktotrix_z_21B | Planktothrix | 1558 | 510 | 21B | GGGCGTAAAGAGTCCGTAGGTA | GTCATCCAAGTCTGCTGTTAAAGAG | 22 | 25 | 64.0 | 64.1 | 0 | 0 | 10 | 11 (11) | 100% | 0 (341) | 0% | 100 | 100 |
| Spirulina_z_11B | Spirulina | 2473 | 1350 | 11B | CACACCATGGAAGCTGGCAACA | TCCGAAGTCGTTACTCCAACYKTT | 22 | 24 | 64.0 | 63.5 | 0 | 2 | 10 | 10 (11) | 91% | 1 (341) | 0% | 63.64 | 64.39 |
| Synechocystis _z_31 | Synechocystis | 1602 | 576 | 31 | GTTAAAGAATGGAGCTTAACTCCATAG | GAGCGGTGGAAACTGCAAGAC | 27 | 21 | 63.7 | 63.3 | 0 | 0 | 10 | 8 (9) | 89% | 2 (343) | 1% | 97.94 | 97.88 |
| UniCyano_z_8 | UniCyano | 1330 | 304 | 8 | CCTACGGGAGGCAGCAGTG | GGGAATTTTCCGCAATGGGCG | 19 | 21 | 63.8 | 63.3 | 0 | 0 | 10 | 18 (18) | 100% | – | – | 100 | 100 |
| Gloeothece_z_35 | Gloeothece | 1857 | 795 | 35 | GCCGAAGCTAACGCGTTAAGTC | TCCCGCCTGGGGAGTACGC | 22 | 19 | 64.0 | 66.0 | 0 | 0 | 10 | 3 (5) | 60% | 0 (347) | 0% | 97.27 | 100 |
| Lyngbya _z_34 | Lyngbya | 1120 | 112 | 34 | AGTAACGCGTGAGAATCTGCCTTA | GGGTCGGGGACAACCACCG | 24 | 19 | 63.5 | 66.0 | 0 | 0 | 10 | 3 (3) | 100% | 1 (349) | 0% | 100 | 100 |
| Phormidium_z_33 | Phormidium | 1440 | 309 | 33 | TGGGAAGAAAGTTGTGAAAGCAGC | CTGACGGTACCAGAGGAATCAG | 24 | 22 | 63.5 | 64.0 | 0 | 0 | 10 | 2 (2) | 100% | 0 (350) | 0% | 100 | 100 |
| Thrichodesmium_z_27 | Trichodesmium | 1139 | 112 | 27 | CCTTCAGGTCTGGGACAACAGAA | GGAAACTTCTGCTAATCCCGGATG | 23 | 24 | 64.6 | 65.2 | 0 | 0 | 10 | 7 (7) | 100% | 0 (345) | 0% | 99.38 | 99.40 |
| Woronichinia_z_5B | Woronichinia | 1299 | 285 | 5B | GCAGCCACACTGGAACTGAGAA | ACRGTCCAGACTCCTACGGG | 22 | 20 | 64.0 | 63.5 | 0 | 1 | 10 | 2 (2) | 100% | 0 (350) | 0% | 100 | 98.75 |
| Ana+Apha_z_38 | Anabaena+ Aphanizomenon | 1989 | 923 | 38 | ACCTTACCAAGGCTTGACATGTCA | CGAATYCYGTWGAAAKATRGRAGTG | 24 | 25 | 63.5 | 63.3 | 0 | 6 | 8.3 | 62 (68) | 91% | 0 (284) | 0% | 99.20 | 95.59 |
‘Len DS’ (or ‘Len CP’) is the probe length; ‘Tm’ is the melting temperature; ‘Deg bases’ is the number of degenerate bases within each probe; ‘Score’ is proportional to the number of quality checks each probe passed (10 means all, 8.3 is five out of six); ‘Inter-group score’ and ‘Intra-group score’ evaluate the probe pair specificity (full description in the text); ‘Seq score’ is the score of the consensus sequence (as reported in the text). The exact probe sequences from ORMA are reported. For synthesis, any degenerate base was substituted with inosine (I). The first 11 specific + 1 universal probes are those that were actually tested on cyanobacteria samples. The last six species were tested only on synthetic templates. The Anabaena + Aphanizomenon probe pair did not show any signal, probably because of the high number of degenerate bases in the sequence of the common probe.
a. The reported position refers to the absolute position in the multiple alignment.
b. The ‘Real Position’ refers to the position in the single consensus per species.
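Several of the quantities described in these notes can be recomputed from the table itself. The sketch below is a hedged illustration, not ORMA code: the helper names are hypothetical, degenerate bases are taken to be the standard IUPAC ambiguity codes, and the percentage next to the intra- and inter-group scores is assumed to be the first number divided by the number in brackets.

```python
# Standard IUPAC ambiguity codes (everything except A, C, G, T/U)
IUPAC_DEGENERATE = set("RYSWKMBDHVN")


def count_degenerate(seq):
    """Number of degenerate (ambiguous) bases in a probe sequence."""
    return sum(1 for base in seq.upper() if base in IUPAC_DEGENERATE)


def substitute_inosine(seq):
    """Replace every degenerate base with inosine (I), as done for synthesis."""
    return "".join("I" if b in IUPAC_DEGENERATE else b for b in seq.upper())


def quality_score(passed, total=6):
    """Score proportional to the quality checks passed, on a 0-10 scale."""
    return round(10 * passed / total, 1)


def group_percent(hits, group_size):
    """Assumed reading of e.g. '28 (30) 93%': hits over group size, in percent."""
    return round(100 * hits / group_size)


common_probe = "CGAATYCYGTWGAAAKATRGRAGTG"   # Ana+Apha_z_38 common probe
print(count_degenerate(common_probe))         # 6, as listed in the table
print(substitute_inosine("CTGGTGYGCTAGAGGGCGAC"))
print(quality_score(5))                       # 8.3, five checks out of six
print(group_percent(28, 30))                  # 93, as for Nodularia_z_23B
```

Under these assumptions the helpers reproduce the values listed for the Anabaena + Aphanizomenon common probe (six degenerate bases, score 8.3) and for the Nodularia intra-group score (93%).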
Figure 3. Graphical comparison between the Castiglioni et al. and the ORMA-designed probe pairs (DS+CP) for the Cylindrospermum species, aligned in ClustalW with the Cylindrospermum strain sequences (Cy*) and the Leptolyngbya strain sequences (Leptolyngbya* and Lpg*). The part of each probe flanking the discriminating position is highlighted in red (Castiglioni et al. probe pair) or green (ORMA). The bases aligned with the discriminating base are marked by a black box. In the Castiglioni et al. probe pair, the discriminating base was also found in some Leptolyngbya strains, whereas in the ORMA probe pair it is unique to the Cylindrospermum sequences and shared by all of them. Absolute positions of the bases in the alignment are reported on the top ruler.
Table 3. List of probe pairs for the milk-pathogens experiment and major thermodynamic parameters
| Oligo name | Discr base pos | Real pos | Zip code | Discrim oligo (DS) | Common probe (CP) | Len DS | Len CP | Tm DS (°C) | Tm CP (°C) | Deg bases DS | Deg bases CP | Score | Intra-group score | Intra-group score (%) | Seq DS score | Seq CP score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bacillus_z_10 | 880 | 862 | 10 | GCTAAGTGTTAGAGGGTTTCCGCCCTTT AGTGCTGAAGT | TAACGCATTAAGCACTCCGCCTGGGGAG TACGG | 39 | 33 | 67.6 | 68.1 | 0 | 0 | 10 | 313 (313) | 100% | 99.87 | 99.90 | |
| S_equi_z_12 | 224 | 178 | 12 | CTAATACCGCATAAAAGTGGTTGACCC ATGTTAACNATTTAAAAGGAGCAACA | GCTCCACTATGAGATGGACCTGCGTTGT ATTAGCTAGTTG | 53 | 40 | 67.5 | 67.6 | 1 | 0 | 10 | 4 (5) | 80% | 88 | 99.50 | |
| S_agal_z_15 | 87 | 78 | 15 | CGTGCCTAATACATGCAAGTAGAACGCT GAGGTTTGGTGTTTA | CACTAGACTGATGAGTTGCGAACGGGT GAGTAACGC | 43 | 36 | 67.4 | 67.9 | 0 | 0 | 10 | 17 (18) | 94% | 92.25 | 99.69 | |
| S_bovis_z_16 | 91 | 81 | 16 | GTGCCTAATACATGCAAGTAGAACGCTG AAGACTTTAGCTTGCTAA | AGTTGGAAGAGTTGCGAACGGGTGAGT AACGCGTAG | 46 | 36 | 67.2 | 67.9 | 0 | 0 | 10 | 19 (22) | 86% | 92.98 | 98.11 | |
| S_uberis_z_19 | 223 | 192 | 19 | CGCATGACAATAGGGTACACATGTACCC TATTTAAAAGGGGCAAA | TGCTTCACTATGAGATGGACCTGCGTTGT ATTAGCTAGTTGG | 45 | 42 | 67.3 | 67.4 | 0 | 0 | 10 | 5 (5) | 100% | 98.22 | 99.52 | |
| Staph_aureus_z_2 | 222 | 219 | 2 | CCGGATAATATTTTGAACCGCATGGTTCA AAAGTGAAAGACGGTC | TTGCTGTCACTTATAGATGGATCCGCGCT GCATTAGCTAG | 45 | 40 | 67.3 | 67.6 | 0 | 0 | 10 | 61 (62) | 98% | 99.39 | 99.80 | |
| Mycoplasma_z_20 | 906 | 848 | 20 | CATCGACGCAGCTAACGCATTAAATGAT CCGCCTGAGT | AGTACGTTCGCAAGAATAAAACTTAAAG GAATTGACGGGGATCCG | 38 | 45 | 67.7 | 67.3 | 0 | 0 | 10 | 51 (51) | 100% | 98.71 | 98.47 | |
| Staphylococcus_z_21 | 208 | 186 | 21 | GAAACCGGAGCTAATACCGGATAATATA TTGAACCGCATGGTTCAAT | AGTGAAAGACGGTTTTGCTGTCACTTATA GATGGATCCGCG | 47 | 41 | 67.2 | 67.5 | 0 | 0 | 10 | 41 (49) | 84% | 96.79 | 97.51 | |
| E_coli _z_28 | 484 | 469 | 28 | GTTGTAAAGTACTTTCAGCGGGGAGGAA GGGAGTAAAGTTAATAC | CTTTGCTCATTGACGTTACCCGCAGAAGA AGCACCG | 45 | 36 | 67.3 | 67.9 | 0 | 0 | 10 | 10 (11) | 91% | 90.91 | 90.91 | |
| S_canis_z_3 | 474 | 469 | 3 | GATCGTAAAGCTCTGTTGTTAGAGAAGA ACGGTAATGGGAGTGGAAAAC | CCATTATGTGACGGTAACTAACCAGAAA GGGACGGCTAACTAC | 49 | 43 | 68.7 | 68.3 | 0 | 0 | 10 | 6 (6) | 100% | 99.66 | 100 | |
| S_dysgal_z_4 | 1061 | 1039 | 4 | GTCTAGAGATAGGCTTTCCCTTCGGGG CAGG | AGTGACAGGTGGTGCATGGTTGTCGTC AGCTCG | 31 | 33 | 67.0 | 68.1 | 0 | 0 | 10 | 55 (55) | 100% | 99.65 | 100 | |
| Salmonella_z_5 | 258 | 251 | 5 | CCATCAGATGTGCCCAGATGGGATTAGC TTGTTGGTGA | GGTAACGGCTCACCAAGGCGACGATCCC | 38 | 28 | 67.7 | 67.2 | 0 | 0 | 10 | 41 (41) | 100% | 99.87 | 100 | |
| Campylob_1_z_8 | 179 | 148 | 8 | CCCTACACAAGAGGACAACAGTTGGAAA CGACTGCTAATACT | CTATACTCCTGCTTAACACAAGTTGAGTA GGGAAAGTTTTTCGGTG | 42 | 46 | 67.4 | 67.2 | 0 | 0 | 10 | 71 (78) | 91% | 90.60 | 90.75 | |
| Campylob_2_z_9 | 233 | 192 | 9 | CTCTATACTCCTGCTTAACACAAGTTGA GTAGGGAAAGTTTTTCGG | TGTAGGATGAGACTATATAGTATCAGCT AGTTGGTAAGGTAATGGCTTAC | 46 | 50 | 67.2 | 67.0 | 0 | 0 | 10 | 71 (78) | 91% | 90.75 | 89.59 | |
The exact probe sequences from ORMA are reported. For synthesis, any degenerate base was substituted with inosine (I). The columns are described as in Table 2.
Figure 4. Heat map of P-values from the duplicate LDR experiments on the milk pathogen dataset. The scale ranges from non-significance (>0.05) to high significance (<0.005). The line ‘Other’ represents the mean of all the remaining Zip-codes in the universal arrays that were not associated with any actual probe. The complete association between sample numbers and names is given in Supplementary Table 2.