Marcel Martinez-Porchas1, Enrique Villalpando-Canchola1, Luis Enrique Ortiz Suarez2, Francisco Vargas-Albores1.
Abstract
The 16S rRNA gene has been used as a master key for studying prokaryotic diversity in almost every environment. Although several researchers claim to have the best universal primers, no primer has been demonstrated to be truly universal. This suggests that the conserved regions of the gene may not be as conserved as expected. The aim of this study was to evaluate the degree of conservation of the so-called conserved regions flanking the hypervariable regions of the 16S rRNA gene. Data contained in the SILVA database (release 123) were used for the study. Primers reported as matches for each conserved region were assembled to form contigs; sequences of 12 nucleotides (12-mers) were extracted from these contigs and searched against the entire set of SILVA sequences. Frequency analysis showed that the extreme regions, 1 and 10, registered the lowest frequencies. 12-mer frequencies revealed segments of contigs that were not as conserved as expected (≤90%). Fragments corresponding to primer contigs 3, 4, 5b and 6a were recovered from all sequences in the SILVA database. Nucleotide frequency analysis of each consensus demonstrated that only a small fraction of these so-called conserved regions is truly conserved in non-redundant sequences. It can be concluded that the conserved regions of the 16S rRNA gene exhibit considerable variation that has to be considered when using this gene as a biomarker.
Keywords: Biodiversity; Conserved regions 16S; Kmers; Primer design
Year: 2017 PMID: 28265511 PMCID: PMC5333541 DOI: 10.7717/peerj.3036
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Specificity analysis of k-mers of different sizes (9 to 15 nucleotides).
The figure shows the number of 16S rRNA gene sequences (SILVA 123) in which a k-mer produced duplicate matches. The k-mers tested corresponded to primer contigs 3, 5a and 8a. As expected, the longer the k-mer, the greater the stringency and the lower the probability of finding a duplicate. The optimal size was determined by the inflection point.
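The duplicate count described in this caption can be illustrated with a minimal sketch. It assumes that a "duplicate" means an exact k-mer occurring at more than one position in the same sequence, and it tests only the first k bases of a non-degenerate region; the function names `occurrences` and `duplicates_by_size` and the toy sequences are hypothetical and not taken from the study.

```python
def occurrences(kmer, seq):
    """Count occurrences of kmer in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(kmer)
    while start != -1:
        count += 1
        start = seq.find(kmer, start + 1)
    return count


def duplicates_by_size(region, sequences, sizes=range(9, 16)):
    """For each k, count sequences in which the first k bases of
    `region` occur more than once (assumed definition of a duplicate)."""
    result = {}
    for k in sizes:
        kmer = region[:k]
        result[k] = sum(1 for s in sequences if occurrences(kmer, s) > 1)
    return result


# Toy data (hypothetical): the 9-mer is repeated in the first sequence only.
region = "ACGTACGTAGGCTTA"
sequences = ["ACGTACGTATTACGTACGTACC", region]
dup = duplicates_by_size(region, sequences)
```

In this toy case the 9-mer is duplicated in one sequence, while every longer k-mer is unique, which mirrors the expected trend of fewer duplicates as k grows.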
Primer contigs generated by assembling all of the primers reported for each conserved region of the 16S rRNA gene.
Location is based on E. coli sequence.
| Name | Sequence | Location | References |
|---|---|---|---|
| 1 | AGAGTTTGATYMTGGCTCAG | 8–27 | ( |
| 2 | ASYGGCGNACGGGTGAGTAA | 100–119 | ( |
| 3 | ACTGAGAYACGGYCCARACTCCTACGGRNGGCNGCAGTRRGGAA | 320–363 | ( |
| 4 | GGCTAACTHCGTGNCVGCNGCYGCGGTAANAC | 504–535 | ( |
| 5a | GTGTAGMGGTGAAATKCGTAGAT | 682–704 | ( |
| 5b | CAAACRGGATTAGAWACCCNNGTAGTCCACGC | 778–809 | ( |
| 6a | AAANTYAAANRAATWGRCGGGGRCCCGCACAAG | 906–938 | ( |
| 6b | ATGTGGTTTAATTCGA | 948–963 | ( |
| 6c | CAACGCGARGAACCTTACC | 966–984 | ( |
| 7a | AGGTGNTGCATGGYYGYCGTCAGCTCGTGYCGTGAG | 1045–1080 | ( |
| 7b | TGTTGGGTTAAGTCCCRYAACGAGCGCAACCCT | 1082–1114 | ( |
| 8a | GGAAGGYGGGGAYGACG | 1176–1192 | ( |
| 8b | GGGCKACACACGYGCTAC | 1219–1236 | ( |
| 9 | GCCTTGYACWCWCCGCCCGTC | 1386–1406 | ( |
| 10 | GGGTGAAGTCRTAACAAGGTANCC | 1486–1509 | ( |
Characteristics of primer contigs (Table 1) obtained after assembling all of the primers reported for each conserved region of the 16S rRNA gene.
The number of possible 12-mers depends on the length of the contig, while the number of isomers (Iso 12-mers) depends on the number of degenerate bases.
| Name | Length | Degenerated bases | Number of 12-mers | Number of Iso 12-mers |
|---|---|---|---|---|
| 1 | 20 | 2 | 9 | 36 |
| 2 | 20 | 3 | 9 | 61 |
| 3 | 44 | 8 | 33 | 488 |
| 4 | 32 | 6 | 21 | 970 |
| 5a | 23 | 2 | 12 | 30 |
| 5b | 32 | 4 | 21 | 306 |
| 6a | 33 | 7 | 22 | 602 |
| 6b | 16 | 0 | 5 | 5 |
| 6c | 19 | 1 | 8 | 16 |
| 7a | 36 | 5 | 25 | 167 |
| 7b | 33 | 2 | 22 | 57 |
| 8a | 17 | 2 | 6 | 22 |
| 8b | 18 | 2 | 7 | 22 |
| 9 | 21 | 3 | 10 | 68 |
| 10 | 24 | 2 | 13 | 36 |
| Total | 388 | 49 | 223 | 2,886 |
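The relationship between contig length, degenerate bases and the two 12-mer counts can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that the number of 12-mers is the number of sliding windows (length minus 11) and that the number of Iso 12-mers is the sum, over all windows, of the non-degenerate sequences each window expands to under the standard IUPAC codes. Under these assumptions the sketch reproduces the table values for contigs 1, 2, 5a, 6b and 6c; the other rows were not checked. The function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def windows(contig, k=12):
    """All sliding windows of length k along the contig."""
    return [contig[i:i + k] for i in range(len(contig) - k + 1)]


def expand(kmer):
    """All non-degenerate sequences encoded by a degenerate k-mer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in kmer))]


def count_kmers(contig, k=12):
    """Return (number of k-mers, number of expanded 'iso' k-mers)."""
    wins = windows(contig, k)
    return len(wins), sum(len(expand(w)) for w in wins)


# Primer contig 1 from Table 1: 9 windows, each spanning both Y and M.
n_kmers, n_iso = count_kmers("AGAGTTTGATYMTGGCTCAG")
```

For contig 1, every window contains both degenerate positions (Y and M), so each of the 9 windows expands to 4 sequences, giving 36 isomers.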
12-mers registering the highest frequency within each primer contig.
| Primer contig | 12-mer number | 12-mer sequence | Frequency (number) | Frequency (percent) |
|---|---|---|---|---|
| 01 | 8 | ATYMTGGCTCAG | 195,901 | 38.16% |
| 02 | 1 | SYGGCGNACGGG | 405,570 | 79.01% |
| 03 | 25 | GGRNGGCNGCAG | 500,253 | 97.46% |
| 04 | 14 | CVGCNGCYGCGG | 496,412 | 96.71% |
| 5a | 5 | GMGGTGAAATKC | 382,156 | 74.45% |
| 5b | 10 | TAGAWACCCNNG | 493,348 | 96.11% |
| 6a | 10 | RAATWGRCGGGG | 501,792 | 97.76% |
| 6b | 2 | GTGGTTTAATTC | 389,530 | 75.89% |
| 6c | 6 | GARGAACCTTAC | 393,614 | 76.68% |
| 7a | 12 | GYYGYCGTCAGC | 499,976 | 97.40% |
| 7b | 20 | CGAGCGCAACCC | 489,290 | 95.32% |
| 8a | 3 | AGGYGGGGAYGA | 454,807 | 88.60% |
| 8b | 2 | GCKACACACGYG | 382,857 | 74.59% |
| 09 | 5 | GYACWCWCCGCC | 388,911 | 75.77% |
| 10 | 6 | AGTCRTAACAAG | 172,918 | 33.69% |
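A frequency of this kind can be computed by translating a degenerate 12-mer into a regular expression and counting the sequences that contain at least one match. The sketch below is only an illustration under stated assumptions: sequences are upper-case DNA strings (T rather than U), ambiguity codes inside the database sequences are ignored, and the percentage is taken over all sequences searched. The names `kmer_to_regex` and `kmer_frequency` and the toy sequences are hypothetical.

```python
import re

# IUPAC codes written as regular-expression character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def kmer_to_regex(kmer):
    """Compile a degenerate k-mer into an equivalent regex pattern."""
    return re.compile("".join(IUPAC_CLASS[b] for b in kmer))


def kmer_frequency(kmer, sequences):
    """Return (number, percent) of sequences containing the k-mer."""
    pattern = kmer_to_regex(kmer)
    hits = sum(1 for s in sequences if pattern.search(s))
    return hits, 100.0 * hits / len(sequences)


# Toy sequences (hypothetical); three of the four contain a match.
toy = [
    "AAATCATGGCTCAGTT",
    "GGGATTCTGGCTCAGA",
    "CCCCCCCCCCCCCCCC",
    "ATTATGGCTCAG",
]
number, percent = kmer_frequency("ATYMTGGCTCAG", toy)
```

A regular expression avoids enumerating every isomer explicitly, which matters for 12-mers that contain several N positions.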
Figure 2. Frequency of 12-mers detected in contigs recovered from the different conserved regions of the 16S rRNA gene.
More than one contig was recovered from some conserved regions.
Figure 3. Single-nucleotide frequencies for the consensuses corresponding to primer contigs 3 and 4.
Consensuses recovered from the whole SILVA database are shown in the upper panel, whereas those from non-redundant (NR) sequences are shown in the lower panel. The red dotted line indicates the 95% limit.
Figure 4. Single-nucleotide frequencies for the consensuses corresponding to primer contigs 5b and 6a, recovered from the whole SILVA database (upper) and from NR sequences (lower).
The red dotted line indicates the 95% limit.
Non-redundant (NR) sequences detected in consensuses corresponding to primer contigs.
The last two columns indicate how many NR sequences are required to cover 95% of the corresponding fragments recovered from the SILVA database.
| Consensus | NR sequences (number) | NR sequences (percent) | NR sequences needed to cover 95% of all fragments recovered (number) | NR sequences needed to cover 95% of all fragments recovered (percent) |
|---|---|---|---|---|
| Consensus 2 | 1,844 | 0.45% | 24 | 1.30% |
| Consensus 3 | 11,586 | 2.32% | 498 | 4.30% |
| Consensus 4 | 4,323 | 0.87% | 30 | 0.69% |
| Consensus 5b | 6,694 | 1.36% | 89 | 1.33% |
| Consensus 6a | 5,301 | 1.06% | 42 | 0.79% |
| Consensus 7a | 4,312 | 0.86% | 26 | 0.60% |
| Consensus 8a | 746 | 0.16% | 4 | 0.54% |
| Consensus 9 | 1,078 | 0.28% | 9 | 0.83% |
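One way to obtain counts like those in the last two columns is to rank the distinct (NR) fragments by abundance and accumulate them until 95% of all recovered fragments are covered. The sketch below assumes this greedy, most-abundant-first procedure, which is not confirmed as the authors' method; the function name and the toy data are hypothetical. The percent column appears to be the number needed divided by the number of NR sequences (for Consensus 2, 24/1,844 ≈ 1.30%).

```python
from collections import Counter


def nr_needed_for_coverage(fragments, target=0.95):
    """Return (number of NR sequences, number needed to reach `target`).

    `fragments` is a list with one recovered fragment per database
    sequence; identical fragments collapse into one NR sequence.
    """
    counts = Counter(fragments)
    total = sum(counts.values())
    covered = 0
    # Walk the NR sequences from most to least abundant.
    for n, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered >= target * total:
            return len(counts), n
    return len(counts), len(counts)


# Toy data (hypothetical): 100 fragments, 4 distinct sequences.
fragments = ["A"] * 90 + ["B"] * 6 + ["C"] * 3 + ["D"]
n_nr, n_needed = nr_needed_for_coverage(fragments)
percent_needed = 100.0 * n_needed / n_nr
```

In the toy data the two most abundant sequences already cover 96 of the 100 fragments, so 2 of the 4 NR sequences are needed.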
The primer contig and the corresponding consensuses from both all and non-redundant sequences are aligned.
Conserved nucleotides in the consensus of all sequences are in blue, while those preserved in non-redundant sequences are in red.
(Aligned sequences for primer contigs 2, 3, 4, 5b, 6a, 7a, 8a and 9 are not reproduced here.)