Harry M Bohle, Toni Gabaldón.
Abstract
Molecular markers serve to assign individual samples to specific groups. Such markers should be easy to identify and have high discrimination power: highly conserved within groups, yet sufficiently variable between the groups that are to be distinguished. The availability of a large number of complete genomic sequences now enables the informed selection of genes as molecular markers based on observed patterns of variability. We derived a new scoring system based on observed DNA polymorphic differences that uses Bayes’s theorem as adapted by Wilcox. For validation, we applied this system to the problem of identifying individual species within a prokaryotic genus (Vibrio) and a eukaryotic genus (Diphyllobothrium). The top-scoring candidate genes, chromosome segregation ATPase and ATPase subunit 6, showed better discrimination power in Vibrio and Diphyllobothrium, respectively, than standard molecular markers (recA, dnaJ and atpA for Vibrio; 18S rRNA, ITS and COX1 for Diphyllobothrium).
Keywords: Bayes’s theorem; DNA polymorphism; genome analysis; molecular marker
Year: 2012 PMID: 22474405 PMCID: PMC3315472 DOI: 10.4137/EBO.S8989
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
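The abstract's criterion for a good marker (conserved within groups, variable between them) can be illustrated with a minimal Python sketch. This is not the Bayes-based scoring system of the paper, whose formula is not given here. It only compares the mean pairwise difference within and between groups, and the function names and toy sequences are assumptions made for illustration.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean(values: list[float]) -> float:
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def within_between(groups: dict[str, list[str]]) -> tuple[float, float]:
    """Return (mean within-group distance, mean between-group distance)."""
    within = [
        p_distance(a, b)
        for seqs in groups.values()
        for a, b in combinations(seqs, 2)
    ]
    between = [
        p_distance(a, b)
        for (_, seqs1), (_, seqs2) in combinations(groups.items(), 2)
        for a, b in product(seqs1, seqs2)
    ]
    return mean(within), mean(between)


# Hypothetical aligned fragments of one candidate gene (toy data, not from the paper).
toy = {
    "species_A": ["ACGTACGT", "ACGTACGT"],
    "species_B": ["ACGTTCGA", "ACGTTCGA"],
}
within, between = within_between(toy)
print(f"within={within:.3f} between={between:.3f}")
```

Under this simplified view, a gene with a low within-group value and a high between-group value would be a promising marker candidate.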
10 top-scoring marker genes for Vibrio species discrimination using S = 300 bp.
| Score_i | Locus tag | Size (bp) | Tarima’s | | |
|---|---|---|---|---|---|
| 0.00308 | | 0.98387 | 0.03469 | 0.00000 | −0.09022 |
| 0.00252 | VC1954 | 0.33667 | 0.05809 | 0.00000 | −0.12885 |
| 0.00238 | VC2163 | 0.78667 | 0.03703 | 0.00000 | −0.08185 |
| 0.00237 | VC2354 | 0.47667 | 0.04847 | 0.00000 | −0.10258 |
| 0.00233 | VC2665 | 0.96667 | 0.03374 | 0.00000 | −0.07132 |
| 0.00222 | VC2189 | 0.59667 | 0.04396 | 0.00000 | −0.08477 |
| 0.00212 | VC1986 | 0.60653 | 0.04145 | 0.00000 | −0.08437 |
| 0.00208 | VC2658 | 0.82189 | 0.03318 | 0.00000 | −0.07621 |
| 0.00207 | VC2652 | 0.56667 | 0.03689 | 0.00000 | −0.09897 |
| 0.00207 | VC1534 | 0.59817 | 0.04150 | 0.00000 | −0.08352 |
10 top-scoring marker genes for Diphyllobothrium species discrimination using S = 500 bp.
| Score | Gene | Size (bp) | | |
|---|---|---|---|---|
| 0.01175 | | 509 | 0.01196 | 0.00013 |
| 0.01066 | ND6 | 458 | 0.01156 | 0.00015 |
| 0.00733 | ND3 | 356 | 0.00944 | 0.00019 |
| 0.00563 | ND4L | 260 | 0.00833 | 0.00028 |
| 0.00524 | COX2 | 569 | 0.00596 | 0.00023 |
| 0.00479 | ND2 | 878 | 0.00841 | 0.00015 |
| 0.00433 | ND4 | 1250 | 0.01083 | 0.00017 |
| 0.00404 | ND1 | 890 | 0.00719 | 0.00022 |
| 0.00355 | ND5 | 1568 | 0.01115 | 0.00047 |
| 0.00230 | COX1 | 1565 | 0.00720 | 0.00004 |
Comparison of prokaryotic molecular marker genes using discrimination power scoring.
| Species | Accession number | SC | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CSC | Id | (1-Id) | CSC | Id | (1-Id) | CSC | Id | (1-Id) | CSC | Id | (1-Id) | |||
| JN040521 | 0.999 | 0.001 | 0.801 | 0.199 | 0.899 | 0.101 | 0.292 | |||||||
| NZ_AAPS01000071 | 0.999 | 0.001 | 0.883 | 0.117 | 0.967 | 0.033 | 0.191 | |||||||
| NC_002505 | 0.921 | 0.079 | 0.932 | 0.068 | 0.958 | 0.042 | 0.124 | |||||||
| NZ_ACZN01000015 | 0.971 | 0.029 | 0.904 | 0.096 | 0.957 | 0.043 | 0.221 | |||||||
| JN040526 | 0.924 | 0.076 | 0.904 | 0.096 | 0.973 | 0.027 | 0.156 | |||||||
| NC_006840 | 0.876 | 0.124 | 0.852 | 0.148 | 0.918 | 0.082 | 0.239 | |||||||
| JN040529 | 0.861 | 0.139 | 0.848 | 0.152 | 0.889 | 0.111 | 0.327 | |||||||
| JN040527 | 0.882 | 0.118 | 0.853 | 0.147 | 0.942 | 0.058 | 0.198 | |||||||
| JN040517 | 0.979 | 0.021 | 0.925 | 0.075 | 0.979 | 0.021 | 0.132 | |||||||
| JN040531 | 0.845 | 0.155 | 0.825 | 0.175 | 0.835 | 0.165 | 0.354 | |||||||
| JN040530 | 0.921 | 0.079 | 0.932 | 0.068 | 0.958 | 0.042 | 0.124 | |||||||
| JN040535 | 0.971 | 0.029 | 0.904 | 0.096 | 0.957 | 0.043 | 0.423 | |||||||
| JN040523 | 0.893 | 0.107 | 0.851 | 0.149 | 0.974 | 0.026 | 0.213 | |||||||
| JN040516 | 0.917 | 0.083 | 0.133 | 0.970 | 0.030 | 0.132 | ||||||||
| JN040518 | 0.979 | 0.021 | 0.925 | 0.075 | 0.979 | 0.021 | 0.226 | |||||||
| NC_011312 | 0.876 | 0.124 | 0.852 | 0.148 | 0.918 | 0.082 | 0.239 | |||||||
| NC_ABCH01000040 | 0.893 | 0.107 | 0.826 | 0.174 | 0.912 | 0.088 | 0.461 | |||||||
| JN040524 | 0.924 | 0.076 | 0.904 | 0.096 | 0.973 | 0.027 | 0.156 | |||||||
| JN040522 | 0.886 | 0.114 | 0.853 | 0.147 | 0.935 | 0.065 | 0.213 | |||||||
| JN040533 | 0.859 | 0.141 | 0.842 | 0.158 | 0.904 | 0.096 | 0.3 | |||||||
Notes: The underlined score is the highest. JN040516-JN040535: this work.
Abbreviations: SC, species code; CSC, closest species code; Id, identity (matching nucleotides/total nucleotides).
Comparison of eukaryotic molecular marker genes using discrimination power scoring.
| Species | Accession number | SC | 18S rRNA | COX1 | 18S + ITS + 5.8S rRNA | ATPase6 | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CSC | Id | (1-Id) | CSC | Id | (1-Id) | CSC | Id | (1-Id) | CSC | Id | (1-Id) | |||
| JN040538 | 1 | 0.999 | 0.001 | 0.936 | 0.064 | 0.999 | 0.001 | 0.065 | ||||||
| JN040539 | 2 | 0.999 | 0.001 | 0.902 | 0.098 | 0.999 | 0.001 | 0.108 | ||||||
| JN040536 | 3 | 0.999 | 0.001 | 0.095 | 0.989 | 0.011 | 0.065 | |||||||
| JN040540 | 4 | 0.996 | 0.004 | 0.935 | 0.065 | 0.973 | 0.027 | 0.098 | ||||||
| JN040541 | 5 | 0.964 | 0.036 | 0.849 | 0.151 | 0.853 | 0.147 | 0.177 | ||||||
Notes: The underlined score is the highest. JN040536-JN040541: this work.
Abbreviations: SC, species code; CSC, closest species code; Id, identity (matching nucleotides/total nucleotides).
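The Id and (1-Id) columns follow directly from the definition in the abbreviations: identity is the number of matching nucleotides divided by the total number of nucleotides. The Python sketch below shows how such values could be computed for the closest other species. It assumes one pre-aligned, equal-length sequence per species; the names `identity` and `closest_species` and the toy sequences are hypothetical and are not taken from the paper.

```python
def identity(a: str, b: str) -> float:
    """Id = matching nucleotides / total nucleotides, for aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_species(seqs: dict[str, str]) -> dict[str, tuple[str, float, float]]:
    """For each species, return (closest species, Id, 1 - Id) for one marker gene."""
    result = {}
    for name, seq in seqs.items():
        # Score every other species and keep the one with the highest identity.
        candidates = [
            (other, identity(seq, other_seq))
            for other, other_seq in seqs.items()
            if other != name
        ]
        best, best_id = max(candidates, key=lambda pair: pair[1])
        result[name] = (best, best_id, 1.0 - best_id)
    return result


# Hypothetical aligned marker sequences (toy data, not from the tables above).
marker = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTAT",
    "sp3": "ACGAACGAAT",
}
table = closest_species(marker)
for species, (closest, ident, diff) in table.items():
    print(f"{species}: closest={closest} Id={ident:.3f} (1-Id)={diff:.3f}")
```

A larger (1-Id) against the closest species means the marker separates that species more clearly, which is how the tables compare the candidate gene with the standard markers.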