Mohamed Awad¹, Osama Ouda², Ali El-Refy¹, Fawzy A El-Feky¹, Kareem A Mosa³, Mohamed Helmy⁴.
Abstract
Sequencing and restriction analysis of genes such as 16S rRNA and HSP60 are widely used for the molecular identification of members of microbial communities. With the rapid progress in bioinformatics, genome sequencing has become the method of choice for bacterial identification. However, genome sequencing technology is still out of reach in developing countries. In this paper, we propose FN-Identify, a sequencing-free method for bacterial identification. FN-Identify exploits the gene sequence data available in GenBank and other databases, together with two algorithms that we developed, CreateScheme and GeneIdentify, to create a restriction enzyme-based identification scheme. FN-Identify was tested in an in silico analysis on three diverse bacterial populations (members of the Lactobacillus, Pseudomonas, and Mycobacterium groups) using restriction enzymes and 16S rRNA gene sequences. Analysis of the restriction maps of the members of the three groups, using fragment-number information alone or together with fragment sizes, identified all members of the three groups with a minimum of four and a maximum of eight restriction enzymes. Our results demonstrate the utility and accuracy of the FN-Identify method and its two algorithms as an alternative that relies on standard microbiology laboratory techniques when genome sequencing is not available.
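The identification schemes are built from restriction maps computed in silico from sequences. A minimal Python sketch of that step follows, assuming a linear sequence and a small illustrative enzyme panel (EcoRI, HaeIII, AluI, MspI, with their standard recognition sites); this panel and the helper names `digest` and `fingerprint` are hypothetical and are not taken from the paper. Each enzyme yields a list of fragment lengths, whose length is the fragment number.

```python
import re

# Recognition site and top-strand cut offset for a few common enzymes.
# This panel is an illustrative assumption, not FN-Identify's enzyme set.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "HaeIII": ("GGCC", 2),    # GG^CC
    "AluI": ("AGCT", 2),      # AG^CT
    "MspI": ("CCGG", 1),      # C^CGG
}


def digest(seq: str, site: str, offset: int) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of site.

    Only the top strand is scanned, which suffices for palindromic sites such as
    those above; the lookahead also finds overlapping occurrences.
    """
    seq = seq.upper()
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def fingerprint(seq: str) -> dict[str, list[int]]:
    """Map each enzyme to its fragment lengths; len() of a list is the fragment number."""
    return {name: digest(seq, site, off) for name, (site, off) in ENZYMES.items()}


if __name__ == "__main__":
    for enzyme, frags in fingerprint("AAGAATTCAAAGGCCTTTAGCTCC").items():
        print(enzyme, len(frags), frags)
```

For the 24 bp toy sequence, EcoRI, HaeIII, and AluI each cut once (two fragments) and MspI does not cut (one fragment). A real analysis would run this over full-length 16S rRNA or HSP60 sequences and a much larger enzyme list.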
Year: 2015 PMID: 26880910 PMCID: PMC4735980 DOI: 10.1155/2015/303605
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Comparison between the sequencing-based identification approach and the proposed FN-Identify approach.
Names and GenBank accession numbers of the Lactobacillus species used in this study.
| Strain ID | Organism | GenBank accession number |
|---|---|---|
| 1 | | CP002559 |
| 2 | | CP000033 |
| 3 | | CP002338 |
| 4 | | CP002609 |
| 5 | | CP000416 |
| 6 | | CP002652 |
| 7 | | CP000423 |
| 8 | | FN692037 |
| 9 | | CP000156 |
| 10 | | CR954253 |
| 11 | | CP000412 |
| 12 | | CP002033 |
| 13 | | AP008937 |
| 14 | | CP000413 |
| 15 | | CP000517 |
| 16 | | CP002429 |
| 17 | | CP002464 |
| 18 | | FN298497 |
| 19 | | AE017198 |
| 20 | | CP001617 |
| 21 | | CP002222 |
| 22 | | CP000705 |
| 23 | | AP007281 |
| 24 | | AP011548 |
| 25 | | FM179322 |
| 26 | | FM179323 |
| 27 | | CR936503 |
| 28 | | CP002764 |
| 29 | | CP002391 |
| 30 | | CP003032 |
| 31 | | CP002034 |
| 32 | | CP000233 |
| 33 | | CP002461 |
This ID will be used to refer to the species/strains in the text.
This table lists the studied Lactobacillus species/strains and their GenBank accession numbers.
16S rRNA and HSP60 copy numbers and genomic positions.
| Strain ID | 16S rRNA copy number | 16S rRNA position | HSP60 position |
|---|---|---|---|
| 1 | 4 | 57091–58665 | 407805–409506 |
| 2 | 4 | 59255–60826 | 379688–381333 |
| 3 | 4 | 66295–67869 | 403452–405083 |
| 4 | 4 | 55901–57475 | 376234–377865 |
| 5 | 5 | 86149–87711 | 645454–647079 |
| 6 | 5 | 706262–707824 | 1429276–1430898 |
| 7 | 5 | 259510–261077 | 2233684–2235318 |
| 8 | 4 | 62524–64075 | 391450–393075 |
| 9 | 9 | 35825–37395 | 1448011–1449624 |
| 10 | 9 | 45160–46720 | 1392354–1393967 |
| 11 | 9 | 43705–45265 | 1405173–1406786 |
| 12 | 5 | 169808–171375 | 394255–395886 |
| 13 | 5 | 169391–170958 | 393747–395378 |
| 14 | 6 | 477570–479148 | 425524–427155 |
| 15 | 4 | 76215–77787 | 408372–409994 |
| 16 | 4 | 85110–86682 | 393232–394854 |
| 17 | 4 | 546957–548607 | 490210–491841 |
| 18 | 4 | 455618–457268 | 412091–413722 |
| 19 | 6 | 558550–560200 | 502509–504140 |
| 20 | 5 | 484838–486408 | 631044–632669 |
| 21 | 5 | 487643–489213 | 591466–593091 |
| 22 | 6 | 177728–179296 | 401807–403435 |
| 23 | 6 | 177347–178880 | 401630–403258 |
| 24 | 5 | 306772–308345 | 2303140–2304732 |
| 25 | 5 | 307756–309313 | 2308734–2310368 |
| 26 | 5 | 289782–291339 | 2265733–2267367 |
| 27 | 7 | 306178–307748 | 358686–360625 |
| 28 | 4¹ | 125303–126858 | 82036–83667 |
| 29 | 5 | 274946–276503 | 2240006–2241640 |
| 30 | 6 | 274311–275837 | 650101–651714 |
| 31 | 7 | 74995–76521 | 1247027–1248649 |
| 32 | 7 | 74540–76056 | 1246385–1248007 |
| 33 | 7 | 40703–42272 | 485966–487585 |

¹Our annotation of the 16S rRNA sequences in L. kefiranofaciens ZW3.
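The positions in the table above are genome coordinates from which the 16S rRNA and HSP60 sequences can be cut out before in silico digestion. Below is a small Python sketch, assuming GenBank-style 1-based inclusive coordinates; the table does not state each gene's strand, so the optional reverse complement is left to the caller. The helper names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_range(text: str) -> tuple[int, int]:
    """Parse a 'start<sep>end' coordinate pair such as '57091⋯58665' or '57091–58665'."""
    start, end = (int(n) for n in re.findall(r"\d+", text))
    return start, end


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def extract_region(genome: str, start: int, end: int, strand: int = 1) -> str:
    """Slice genome[start..end] using 1-based, inclusive (GenBank-style) coordinates.

    strand=-1 returns the reverse complement of the slice.
    """
    if not 1 <= start <= end <= len(genome):
        raise ValueError(f"range {start}..{end} outside genome of length {len(genome)}")
    region = genome[start - 1:end]
    return region.translate(_COMPLEMENT)[::-1] if strand == -1 else region


if __name__ == "__main__":
    start, end = parse_range("57091⋯58665")  # strain 1, 16S rRNA
    print(region_length(start, end))  # 1575
```

The off-by-one in `genome[start - 1:end]` is the main pitfall: Python slices are 0-based and end-exclusive, whereas GenBank ranges are 1-based and end-inclusive.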
Primer sequences used for 16S rRNA and HSP60.
| ID | Gene name | Name | Sequence | Reference |
|---|---|---|---|---|
| 1 | 16S rRNA | 8F | 5′AGAGTTTGATCCTGGCTCAG3′ | [ |
| 2 | 16S rRNA | U1492R | 5′GGTTACCTTGTTACGACTT3′ | [ |
| 3 | 16S rRNA | 928F | 5′TAAAACTYAAAKGAATTGACGGG3′ | [ |
| 4 | 16S rRNA | 336R | 5′ACTGCTGCSYCCCGTAGGAGTCT3′ | [ |
| 5 | 16S rRNA | 1100F | 5′YAACGAGCGCAACCC3′ | [ |
| 6 | 16S rRNA | 1100R | 5′AGGGTTGCGCTCGTTG3′ | [ |
| 7 | 16S rRNA | 907R | 5′CCGTCAATTCCTTTRAGTTT3′ | [ |
| 8 | 16S rRNA | 785F | 5′GGATTAGATACCCTGGTA3′ | [ |
| 9 | 16S rRNA | 805R | 5′GACTACCAGGGTATCTAATC3′ | [ |
| 10 | 16S rRNA | 515F | 5′GTGCCAGCMGCCGCGGTAA3′ | [ |
| 11 | 16S rRNA | 518R | 5′GTATTACCGCGGCTGCTGG3′ | [ |
| 12 | 16S rRNA | 27F | 5′AGAGTTTGATCMTGGCTCAG3′ | [ |
| 13 | 16S rRNA | 1541R | 5′AAGGAGGTGATCCAGCCGCA3′ | [ |
| 14 | HSP60 | HSP60-F | 5′ATGGCWAARGANNTHAARTT3′ | Designed |
| 15 | HSP60 | HSP60-R | 5′TCDGCVACNACNGCTTCNGA3′ | Designed |
Selected 16S rRNA primers.
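Several primers in the table above use IUPAC ambiguity codes (for example M, Y, K, and N), so each one stands for a pool of concrete oligonucleotides. The Python sketch below counts and expands that pool using the standard IUPAC nucleotide codes; the function names are hypothetical and nothing here is taken from the paper's own tooling.

```python
from collections.abc import Iterator
from itertools import product
from math import prod

# IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> Iterator[str]:
    """Yield every concrete sequence a degenerate primer represents."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)


if __name__ == "__main__":
    primers = {
        "27F": "AGAGTTTGATCMTGGCTCAG",
        "HSP60-F": "ATGGCWAARGANNTHAARTT",
        "HSP60-R": "TCDGCVACNACNGCTTCNGA",
    }
    for name, seq in primers.items():
        print(name, degeneracy(seq))
```

By this count, 27F is a pool of 2 sequences, while the designed HSP60-F and HSP60-R primers encode 384 and 576 variants respectively.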
Algorithm 1. CreateScheme(ℱ, ℰ).
Figure 2. An example of a tree T representing an identification scheme. Dotted lines point to an identified strain.
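Only the title of CreateScheme(ℱ, ℰ) is reproduced here, so the following is a hypothetical greedy sketch of how a tree like T could be built. It is not the authors' algorithm. It assumes each strain already has a hashable fingerprint per enzyme (a fragment count, or a tuple of fragment sizes). At each node, it splits on the enzyme that separates the current strains into the most groups and recurses until every leaf holds one strain. The helper `enzymes_per_strain` reports the path length to each leaf, which corresponds to the number of enzymes needed to identify that member.

```python
from typing import Any

# strain -> enzyme -> hashable fingerprint (e.g. a fragment count, or a tuple of sizes)
Fingerprints = dict[str, dict[str, Any]]


def create_scheme(strains: list[str], fps: Fingerprints, enzymes: list[str]) -> dict:
    """Greedy sketch: at each node, split on the enzyme giving the most groups.

    Leaves are {"strain": id}; strains that no remaining enzyme can separate end
    up together in {"unresolved": [...]}.
    """
    if len(strains) == 1:
        return {"strain": strains[0]}
    best, best_groups = None, {}
    for enzyme in enzymes:
        groups: dict[Any, list[str]] = {}
        for strain in strains:
            groups.setdefault(fps[strain][enzyme], []).append(strain)
        if len(groups) > len(best_groups):  # ties keep the earlier enzyme
            best, best_groups = enzyme, groups
    if best is None or len(best_groups) < 2:
        return {"unresolved": sorted(strains)}
    remaining = [e for e in enzymes if e != best]
    return {
        "enzyme": best,
        "children": {key: create_scheme(group, fps, remaining)
                     for key, group in best_groups.items()},
    }


def enzymes_per_strain(tree: dict, depth: int = 0) -> dict[str, int]:
    """Number of digests needed to reach each strain's leaf (its path length)."""
    if "enzyme" not in tree:
        members = [tree["strain"]] if "strain" in tree else tree["unresolved"]
        return {m: depth for m in members}
    result: dict[str, int] = {}
    for child in tree["children"].values():
        result.update(enzymes_per_strain(child, depth + 1))
    return result


if __name__ == "__main__":
    # Hypothetical fragment counts for four strains and three enzymes.
    fps = {
        "S1": {"EcoRI": 2, "HaeIII": 3, "AluI": 1},
        "S2": {"EcoRI": 2, "HaeIII": 4, "AluI": 1},
        "S3": {"EcoRI": 1, "HaeIII": 3, "AluI": 2},
        "S4": {"EcoRI": 1, "HaeIII": 3, "AluI": 3},
    }
    tree = create_scheme(sorted(fps), fps, ["EcoRI", "HaeIII", "AluI"])
    print(tree["enzyme"], enzymes_per_strain(tree))
```

A greedy split is simple but does not guarantee the smallest possible enzyme set. An exhaustive or look-ahead search could do better at a higher computational cost.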
Algorithm 2. GeneIdentify(𝒢, T).
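GeneIdentify(𝒢, T) is also given only by its title. The sketch below shows one plausible way to use a finished scheme tree to identify an unknown isolate: start at the root, perform only the digest named at the current node, and follow the branch that matches the observed result. The nested-dict tree format, the `observe` callback (a stand-in for the wet-lab digestion and gel readout), and the name `gene_identify` are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical scheme in nested-dict form: internal nodes name an enzyme and
# map each observed fingerprint to a subtree; leaves name a strain.
TREE = {
    "enzyme": "AluI",
    "children": {
        1: {"enzyme": "HaeIII",
            "children": {3: {"strain": "S1"}, 4: {"strain": "S2"}}},
        2: {"strain": "S3"},
        3: {"strain": "S4"},
    },
}


def gene_identify(tree: dict, observe: Callable[[str], Any]) -> list[str]:
    """Follow the scheme from the root, running only the digests on the path.

    observe(enzyme) returns the fingerprint (here, a fragment count) seen for
    the unknown isolate with that enzyme. Returns the matching strain(s), or []
    if the observed pattern does not occur in the scheme.
    """
    node = tree
    while "enzyme" in node:
        key = observe(node["enzyme"])
        if key not in node["children"]:
            return []
        node = node["children"][key]
    return [node["strain"]] if "strain" in node else list(node["unresolved"])


if __name__ == "__main__":
    print(gene_identify(TREE, {"AluI": 1, "HaeIII": 4}.get))  # ['S2']
```

Because the walk follows a single root-to-leaf path, an isolate needs only as many digests as the depth of its leaf, not the whole enzyme panel.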
Figure 3. Identification scheme for Lactobacillus proposed by FN-Identify, using only the fragment numbers of the 16S rRNA gene.
Figure 4. Identification scheme for Lactobacillus proposed by FN-Identify, using both the fragment numbers and the fragment lengths of the 16S rRNA gene.
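The schemes in Figures 3 and 4 differ in the fingerprint compared at each node: the fragment number alone, or the fragment number together with fragment lengths. The short Python sketch below shows why the second option can separate strains that the first cannot. The fragment sizes are made up for illustration. The `resolution` parameter is an added assumption, a crude way of mimicking the limited size resolution of a gel, and does not come from the paper.

```python
def one_factor(fragments: list[int]) -> int:
    """Fingerprint from the fragment number only."""
    return len(fragments)


def two_factor(fragments: list[int], resolution: int = 1) -> tuple[int, ...]:
    """Fingerprint from fragment number and sizes (sorted, largest first).

    Sizes are floor-divided by `resolution` (bp) as a stand-in for gel
    resolution; resolution=1 keeps exact sizes.
    """
    return tuple(sorted((size // resolution for size in fragments), reverse=True))


if __name__ == "__main__":
    # Hypothetical fragment sizes for two strains digested with the same enzyme.
    strain_a = [900, 400, 270]
    strain_b = [700, 600, 270]
    print(one_factor(strain_a) == one_factor(strain_b))  # True: both give 3 bands
    print(two_factor(strain_a) == two_factor(strain_b))  # False: sizes differ
```

With coarse binning, size information stops helping (at `resolution=1000` both strains collapse to the same fingerprint). This suggests why the benefit of the second factor depends on how precisely band sizes can be read.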
Summary of the training and testing datasets and of FN-Identify performance.
| Dataset | Bacterial group | Gram¹ | Members | 16S rRNA unique sequences² | 16S rRNA required enzymes (1 factor) | 16S rRNA required enzymes (2 factors) | 16S rRNA max.-min. enzymes/species³ (1 factor) | 16S rRNA max.-min. enzymes/species³ (2 factors) | HSP60 unique sequences² | HSP60 required enzymes (1 factor) | HSP60 required enzymes (2 factors) | HSP60 max.-min. enzymes/species³ (1 factor) | HSP60 max.-min. enzymes/species³ (2 factors) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training set | Lactobacillus | P | 33 | 24 | 6 | 5 | 6-6 | 5-3 | 23 | 6 | 5 | 4-1 | 3-1 |
| Testing set | Pseudomonas | N | 33 | 32 | 8 | 6 | 8-7 | 7-4 | n/a | n/a | n/a | n/a | n/a |
| Testing set | Mycobacterium | P | 22 | 18 | 7 | 4 | 7-5 | 4-3 | n/a | n/a | n/a | n/a | n/a |

¹P: positive; N: negative.
²Members with distinct 16S rRNA sequences. In some cases, two or more members have 100% identical 16S rRNA sequences; such members are treated as a single entry by FN-Identify.
³The maximum and minimum number of enzymes required to identify a given member of the group.
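Footnote 2 describes a preprocessing step: members whose sequences are 100% identical are merged into one entry before a scheme is built, which is why the unique-sequence counts can be lower than the member counts. A minimal Python sketch of that grouping follows. It assumes exact, case-insensitive string identity, so sequences that differ only in length or in trimmed ends would stay separate. The function name is hypothetical.

```python
from collections import defaultdict


def collapse_identical(seqs: dict[str, str]) -> list[list[str]]:
    """Group members whose sequences are 100% identical (case-insensitive).

    Each returned group would be treated as one entry by the scheme builder,
    so len(result) is the number of unique sequences.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for member, seq in seqs.items():
        groups[seq.upper()].append(member)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    seqs = {"A": "ACGTTGCA", "B": "acgttgca", "C": "ACGTTGCC"}
    groups = collapse_identical(seqs)
    print(len(groups), groups)  # 2 [['A', 'B'], ['C']]
```

Members that end up in the same group cannot be told apart by any restriction digest of that gene. In practice, the scheme can identify them only as a group.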