An efficient strategy using k-mers to analyse 16S rRNA sequences.

Marcel Martínez-Porchas¹, Francisco Vargas-Albores¹.

Abstract

The use of k-mers has been a successful strategy for improving metagenomics studies, including taxonomic classifications, or de novo assemblies, and can be used to obtain sequences of interest from the available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and to use them to obtain and analyse in silico 16S rRNA sequence fragments. A total of 513,309 bacterial sequences contained in the SILVA database were considered for the study, and homemade PHP scripts were used to search for specific nucleotide chains, recover fragments of bacterial sequences, make calculations and organize information. Consensus sequences matching conserved regions were constructed by aligning most of the primers used in the literature. Sequences of k nucleotides (9- to 15-mers) were extracted from the generated primer contigs. Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship; high numbers of duplicate reactions were observed with short k-mers, and a lower proportion of sequences were obtained with large ones, with the best results obtained using 12-mers. Using 12-mers with the proposed method to obtain and study sequences was found to be a reliable approach for the analysis of 16S rRNA sequences and this strategy may probably be extended to other biomarkers. Furthermore, additional applications such as evaluating the degree of conservation and designing primers and other calculations are proposed as examples.

Keywords: Bioinformatics; Biological sciences; Microbiology

Year: 2017 PMID: 28795166 PMCID: PMC5537200 DOI: 10.1016/j.heliyon.2017.e00370

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

The study of 16S ribosomal RNA (rRNA) sequences has become popular among microbiologists due to the need to study the diversity and structure of microbiomes thriving in specific ecosystems, including those as small as the phycosphere [1, 2]. The number of genomic descriptions has greatly increased due to new sequencing technologies and tools for the analysis of metagenomics data [3]. This improved sequencing throughput has allowed for statistically robust quantitative comparisons between communities. The 16S rRNA is a core component of the 30S small subunit of prokaryotes that is currently used for phylogeny building because of its slow rate of evolution [4]. The molecule contains ten conserved (C) regions that are separated by variable (V) regions. The C regions have been used to design and anchor primers for polymerase chain reaction (PCR) amplification, whereas V regions have been useful for taxonomic identification [5]. Several sets of primers have been reported and used for amplifying specific V regions of 16S rRNA; specifically, the V3, V4 and V5 regions have been widely used for studies where taxonomic classification or understanding phylogenetic relationships is required [6, 7, 8, 9]. However, no single region can differentiate among all bacteria; the remaining regions have therefore also been used for the same purpose, preferentially for studying particular communities. For example, V1 has been demonstrated to be particularly useful for differentiating among species in the genus Staphylococcus [10]. Moreover, Shakya et al. [11] performed a comparative metagenomic characterization of microbial diversity using synthetic communities and reported that all tested primer sets led to significant taxon-specific biases, not to mention the new and rare sequences deposited daily that do not match conventional primers.
Therefore, it is clear that in silico analyses may not always fully correspond to biological trials, particularly when a specimen whose genome or 16S rRNA sequence has not been uploaded to the databases is considerably abundant in a given niche. Primer sets are usually evaluated by performing PCR on well-characterized samples, often with knowledge of the size and number of the expected products. However, in many circumstances, random environmental samples are used and a positive reaction for most variants cannot be guaranteed for the primers used. In these cases, it is possible to perform virtual PCR on a set of known or reference sequences, with the advantage of avoiding inherent problems of PCR for environmental samples, such as PCR inhibitors, cation requirements or physicochemical properties of primers. This virtual approach would serve to improve coverage when using real biological samples. The use of k-mers has been a successful strategy for improving metagenomics studies [12, 13, 14, 15, 16], including taxonomic classifications [17] and de novo assemblies [18], and can be used to retrieve other sequences of interest from available databases. The k-mer length should be adjusted according to the requirements, seeking an appropriate balance between short k-mers that may exhibit low specificity and long k-mers that may increase stringency and exclude a considerable proportion of sequences. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and use them to obtain and analyse in silico 16S rRNA sequence fragments.

Materials and methods

The 513,309 non-redundant bacterial sequences contained in the high quality ribosomal RNA database SILVA (release 123) were used for the study. Homemade PHP scripts were used for searching specific nucleotide chains, recovering fragments of bacterial sequences, making calculations and organizing information.

Primer contigs and generation of k-mers

Prior to contig generation, 214 primers used for the amplification of different fragments of the bacterial 16S rRNA gene were found in the literature; however, only the 101 primers that contributed an increase in amplicon size or contained a different nucleotide were selected for the study. These primers were assembled to generate a continuous “primer contig” sequence to perform a position-by-position sequence-scan analysis of regions (Table 1). Specifically, we tested whether the continuity of each contig was interrupted by a gap of one or more nucleotides; if so, each segment was considered a sub-contig (a, b or c). Thereafter, sequences of k nucleotides (9 to 15) were extracted from the generated primer contigs.

Table 1

Primer contigs generated by the assembly of all of the primers reported for each conserved region of the 16S rRNA gene. Locations are based on E. coli sequence.

Name	Sequence	Location	References
1	AGAGTTTGATYMTGGCTCAG	8-27	[29, 30, 31, 32, 33, 34, 35, 36, 37]
2	ASYGGCGNACGGGTGAGTAA	100-119	[38, 39]
3	ACTGAGAYACGGYCCARACTCCTACGGRNGGCNGCAGTRRGGAA	320-363	[7, 10, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]
4	GGCTAACTHCGTGNCVGCNGCYGCGGTAANAC	504-535	[27, 45, 46, 47, 49, 50, 52, 54, 55, 56, 57, 58, 59, 60, 61, 62]
5a	GTGTAGMGGTGAAATKCGTAGAT	682-704	[50, 63, 64]
5b	CAAACRGGATTAGAWACCCNNGTAGTCCACGC	778-809	[7, 36, 43, 45, 50, 55, 56, 58, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
6a	AAANTYAAANRAATWGRCGGGGRCCCGCACAAG	906-938	[47, 48, 50, 51, 56, 58, 60, 74, 77, 78, 79, 80, 81, 82]
6b	ATGTGGTTTAATTCGA	948-963	[50, 83]
6c	CAACGCGARGAACCTTACC	966-984	[50, 84, 85, 86]
7a	AGGTGNTGCATGGYYGYCGTCAGCTCGTGYCGTGAG	1045-1080	[50, 55, 84, 85, 87, 88, 89]
7b	TGTTGGGTTAAGTCCCRYAACGAGCGCAACCCT	1082-1114	[43, 45, 47, 50, 52, 59]
8a	GGAAGGYGGGGAYGACG	1176-1192	[50, 90]
8b	GGGCKACACACGYGCTAC	1219-1236	[55, 87]
9	GCCTTGYACWCWCCGCCCGTC	1386-1406	[45, 47, 52, 69, 74, 81, 82, 86, 87, 91, 92]
10	GGGTGAAGTCRTAACAAGGTANCC	1486-1509	[34, 36, 37, 72, 75, 93, 94, 95]

The number of k-mers of each size was calculated as consensus size − k + 1, where k was the k-mer size (9 to 15 in this case). If degeneracies were detected in any k-mer, each isoform was considered for the analysis; for instance, the nucleotide ambiguity code establishes that keto (K) represents T or G, so a sequence containing this degeneracy was multiplied by two possibilities. This was applied to all types of degeneracies detected in all sequences: Y, M, S, R, W and K = 2 possibilities; V, H, B, D = 3; and N = 4. Thus, the primer contig sequences for each region were broken down into 9-, 10-, 11- … 15-mers with their respective isoforms, replacing any degeneracy with the corresponding nucleotides. Finally, the exact sequence of each k-mer isoform generated from all conserved regions was queried against the entire set of sequences contained in the SILVA database (Fig. 1).
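The k-mer extraction and isoform expansion described above can be sketched in Python as follows. This is a minimal illustration, not the authors' PHP implementation: the function names (`kmers`, `isoforms`, `count_iso_kmers`) are hypothetical, and the sketch assumes that iso-k-mers are counted per k-mer position without merging identical sequences that arise at different positions.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def kmers(contig, k):
    """Return the (len(contig) - k + 1) overlapping k-mers of a primer contig."""
    return [contig[i:i + k] for i in range(len(contig) - k + 1)]


def isoforms(kmer):
    """Expand a degenerate k-mer into every unambiguous isoform."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in kmer))]


def count_iso_kmers(contig, k):
    """Total number of isoforms over all k-mers of a contig (assumed counting rule)."""
    return sum(len(isoforms(km)) for km in kmers(contig, k))


if __name__ == "__main__":
    contig_5a = "GTGTAGMGGTGAAATKCGTAGAT"  # primer contig 5a from Table 1
    print(len(kmers(contig_5a, 9)), count_iso_kmers(contig_5a, 9))
```

Under this counting rule, primer contig 5a (23 nucleotides, two degeneracies) yields 15 9-mers and 30 isoforms, and the degeneracy-free contig 6b yields 8 9-mers with one isoform each.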

Fig. 1

Workflow established for obtaining primer contigs and the subsequent generation of k-mers.

Frequency and duplicate analysis

For the frequency analysis, each k-mer was considered as a set of possible isoforms after degeneracies were replaced by the corresponding nucleotides, as stated above. Thus, each isoform of a k-mer (iso-k-mer) was searched for at the position of the corresponding k-mer. If the iso-k-mer sequence was found, a second occurrence of the same k-mer was searched for to detect any duplicate reaction.
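A simplified sketch of the hit and duplicate check might look like the following. It is an assumption-laden illustration: the helper names are hypothetical, matching is exact substring search over the whole sequence (the positional restriction described above is not modelled), and sequences are assumed to be written in the same alphabet as the k-mers.

```python
def find_all(sequence, kmer):
    """Return every start index of kmer in sequence, including overlapping hits."""
    positions = []
    start = sequence.find(kmer)
    while start != -1:
        positions.append(start)
        start = sequence.find(kmer, start + 1)
    return positions


def classify_reaction(sequence, iso_kmers):
    """Label a sequence as 'absent', 'single' or 'duplicate' for a set of iso-k-mers."""
    hits = sorted({pos for km in iso_kmers for pos in find_all(sequence, km)})
    if not hits:
        return "absent", hits
    if len(hits) == 1:
        return "single", hits
    return "duplicate", hits  # the k-mer reacts at more than one site


if __name__ == "__main__":
    # Toy sequence: the 4-mer ACGT occurs twice, so it counts as a duplicate reaction.
    print(classify_reaction("AAACGTACGTTT", ["ACGT"]))
```

Counting the "single" and "duplicate" labels over a sequence collection would give frequency and duplicate figures of the kind reported in the results.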

Search for region C1

In many sequences of the SILVA database, the C1 region located at the 5′‐terminus of the molecule was not detected by region-specific primers. Therefore, because of the high proportion of C3 sequences obtained with k-mers, the most frequent k-mer for the C3 region was used to obtain C3 sequences attached to fragments extending towards the 5' end that contained C1, and the position of C3 in each database sequence was then determined. Briefly, C3-positive sequences were grouped by position; of these, some extended far enough towards the 5' end to contain the C1 region, whereas others were incomplete. For those containing the C1 region, the location of any k-mer within this region was examined. The percentage of positive C1 hits for each group was recorded.
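The grouping step can be illustrated with a short Python sketch. The function name and the toy inputs are hypothetical; the sketch assumes exact matching of a single C3 k-mer, takes the first C3 hit as its position, and looks for any C1 k-mer upstream of that hit (C1 lies at the 5′-terminus, before C3).

```python
from collections import defaultdict


def c1_presence_by_c3_position(sequences, c3_kmer, c1_kmers):
    """Map each C3 hit position to the percentage of sequences that also contain C1."""
    groups = defaultdict(lambda: [0, 0])  # position -> [n_sequences, n_with_c1]
    for seq in sequences:
        c3_pos = seq.find(c3_kmer)
        if c3_pos == -1:
            continue  # C3-negative sequences are not grouped
        groups[c3_pos][0] += 1
        upstream = seq[:c3_pos]  # everything 5' of the C3 hit
        if any(km in upstream for km in c1_kmers):
            groups[c3_pos][1] += 1
    return {pos: 100.0 * with_c1 / n for pos, (n, with_c1) in sorted(groups.items())}


if __name__ == "__main__":
    toy = ["AAAAGGCCCC", "TTTTGGCCCC", "GCCCC", "TTTT"]
    print(c1_presence_by_c3_position(toy, "CCCC", ["AAAA"]))
```

A sequence whose C3 hit sits close to its start is likely truncated at the 5' end, which is why the C1 percentage is tracked per C3 position rather than overall.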

Estimating primers for DGGE

The utility of k-mers for retrieving sequence fragments through simulated PCR was tested by a comparison with primers commonly used to study microbiomes through denaturing gradient gel electrophoresis (DGGE). As recommended in [19], primers E341F (5'‐CCTACGGGNGGCNGCA‐3') and the universal reverse primer U926 (5'‐CCGNCNATTNNTTTNAGTTT‐3') were used. These corresponded to positions 340–355 and 906–925 for forward and reverse primers, respectively, in E. coli 16S rRNA. Thus, a fragment of approximately 586 nucleotides (positions 340–925) was expected. These primers were designed for annealing in the conserved regions 3 and 6. The 12-mers used for comparison were those registering the highest frequency in the corresponding positions.
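A simulated PCR with degenerate primers can be sketched with regular expressions, as below. This is only an illustration under stated assumptions: the function names are hypothetical, matching is exact (no mismatch tolerance, melting temperature or other physicochemical criteria), only one strand is searched, and the reverse primer is reverse-complemented so that it can be located on the same strand as the forward primer.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    parts = (b if len(IUPAC[b]) == 1 else "[" + IUPAC[b] + "]" for b in primer)
    return re.compile("".join(parts))


def virtual_pcr(template, forward, reverse):
    """Return the in silico amplicon (primers included), or None if either end fails."""
    fwd = primer_regex(forward).search(template)
    if fwd is None:
        return None
    rev = primer_regex(reverse_complement(reverse)).search(template, fwd.end())
    if rev is None:
        return None
    return template[fwd.start():rev.end()]


if __name__ == "__main__":
    E341F = "CCTACGGGNGGCNGCA"
    U926 = "CCGNCNATTNNTTTNAGTTT"
    # Toy template (not a real 16S sequence): forward site, spacer, reverse site.
    toy = "TT" + "CCTACGGGAGGCAGCA" + "ACGTACGT" + "AAACTCAAAGGAATTGACGG" + "TT"
    amplicon = virtual_pcr(toy, E341F, U926)
    print(len(amplicon) if amplicon else None)
```

The reverse complement of U926 (AAACTNAAANNAATNGNCGG) aligns with the start of primer contig 6a in Table 1, which is why it is searched in that orientation.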

Results and discussion

Several primers designed to match conserved regions have been proposed for the amplification and study of complete or partial sequences of the 16S rRNA gene [20, 21]. Differences among primers used for the same conserved region can be as simple as a single base substitution, but in other cases the difference may be a pair of extra nucleotides or the use of degenerate bases at particular positions. The main challenge in the study of environmental samples is to obtain sequences from most of the microbes thriving in any niche and, consequently, to obtain comprehensive genomic information about the diversity, structure and functions of the microorganisms forming complex communities. The information contained in databases has increased exponentially in recent years due to the implementation of current high-throughput sequencing technologies [22, 23], and is therefore a useful resource for determining the most conserved fraction of each region, which should be considered not only for designing primers but also for evaluating evolutionary or mutational patterns of this molecule. Contigs were successfully obtained by assembling the specific reported primers for each conserved region of the bacterial 16S rRNA; however, two or more gap-free primer contigs were obtained for regions 5, 6, 7, and 8, for a total of 15 primer contigs (Table 1). Contig sizes ranged from 16 to 44 nucleotides, and all but one exhibited ambiguities (Table 2). The only primer contig that did not require degeneracies was 6b, while primer contig 6a required the highest proportion of degeneracies, with seven out of 33 nucleotides (21%). All primer contigs together covered 388 nucleotide positions, which corresponds to 25% of the average size of the 16S rRNA gene.

Table 2

Primer contig name	Length	Ambiguities	Iso9-mers	Iso10-mers	Iso11-mers	Iso12-mers	Iso13-mers	Iso14-mers	Iso15-mers
1	20	2	38	39	38	36	32	28	24
2	20	3	64	63	62	61	60	56	52
3	44	8	288	333	390	488	612	734	856
4	32	6	455	574	765	970	1,175	1,476	1,792
5a	23	2	30	30	30	30	30	30	30
5b	32	4	213	244	275	306	334	350	372
6a	33	7	346	437	504	602	704	926	1,148
6b	16	0	8	7	6	5	4	3	2
6c	19	1	20	19	18	16	14	12	10
7a	36	5	110	125	140	167	194	222	246
7b	33	2	51	53	55	57	59	61	63
8a	17	2	24	24	24	22	20	16	12
8b	18	2	22	22	22	22	22	20	16
9	21	3	59	64	66	68	64	60	56
10	24	2	34	34	34	36	38	40	38
Total	388	49	1,762	2,068	2,429	2,886	3,362	4,034	4,717

Descriptive information of contigs generated after assembly of the reported primers for each conserved region of the 16S rRNA gene. The size of each contig, number of ambiguities detected and the number of iso-k-mers are shown.

The number of generated k-mers depends directly on the primer contig size and is easily calculated (k-mers = primer contig size − k + 1); it is a positional marker. The number of iso-k-mers, in contrast, is a product of the number of degeneracies in each k-mer and is a sequential marker. In this case, as the k-mer size increased from 9 to 15, the overall number of k-mers decreased from 268 to 178, whereas the number of iso-k-mers increased from 1,762 to 4,717 (Table 2).
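The two k-mer totals quoted above follow from the formula and the Table 2 totals (15 primer contigs with a combined length of 388 nucleotides). As a short check, with L_i denoting the length of contig i and N(k) the overall number of k-mers (notation introduced here only for illustration):

```latex
\begin{aligned}
N(k) &= \sum_{i=1}^{15} (L_i - k + 1) = \sum_{i=1}^{15} L_i - 15\,(k - 1) = 388 - 15\,(k - 1),\\
N(9) &= 388 - 15 \cdot 8 = 268,\\
N(15) &= 388 - 15 \cdot 14 = 178.
\end{aligned}
```

The expression holds for every k from 9 to 15 because the shortest contig (6b, 16 nucleotides) is still longer than the largest k-mer.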

k-mer search and frequency analysis

Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting an increase in stringency. Frequency curves for k-mers of different sizes from three primer contigs are shown in Fig. 2 as an example. In most cases, the curves exhibited similar trends; however, significant deviations were observed, particularly for shorter k-mers, suggesting the occurrence of non-specific reactions that could result from a decrease in stringency (Fig. 2B). In other cases (Fig. 2C), the curves had the same shape, indicating a low influence of size on specificity.

Fig. 2

Frequency of k-mers of 9 to 15 nucleotides detected in different conserved regions of 16S rRNA sequences contained in the SILVA database.

Another important feature for selecting the appropriate k-mer size is specificity, defined as the ability to react at only one site of the 16S rRNA sequence. Therefore, the presence of duplicate reactions was investigated for each set of k-mers of different sizes. As shown in Fig. 3, the number of duplicates decreased as k-mer size increased in all cases, although with different slopes.

Fig. 3

Duplicate reactions detected within sequences obtained from the SILVA database when using 9- to 15-mers constructed from the primer contigs matching all conserved regions of the 16S rRNA.

Duplicate reactions were substantially reduced when 12-mers were used, and the same occurred when 13-, 14- or 15-mers were considered (Fig. 3). The inflection point at a size of 12 nucleotides suggests that 12-mers represent the optimal k-mer size for maintaining high coverage while avoiding non-specific reactions and with a minimal occurrence of duplicate reactions. Therefore, the use of 12-mers represents a reliable alternative to study 16S rRNA sequences and to determine some characteristics of the conserved regions as well as the usefulness of the reported primers. Table 3 shows the most frequent 12-mers detected in each primer consensus. Conserved regions 3, 4, 5b, 6a, 7a, and 7b registered k-mers with a frequency of ≥95%, whereas conserved regions located at the extremes recorded the lowest frequencies (Table 3). By listing the highest frequencies of 12-mers in the different regions, this table allows for visualizing the most conserved areas and identifying the best candidates for the design and use of primers. The uses for this information are diverse and will depend on research requirements, as discussed below. In addition, the use of k-mers represents an ideal strategy for obtaining major proportions of sequences of this biomarker, which, according to recent evidence, may not be as conserved as expected across the breadth of microbial diversity [24].

Table 3

Primer contig number	Primer contig sequence	12-mer position	12-mer frequency
1	AGAGTTTGATYMTGGCTCAG	15	195,901 (38.2%)
2	ASYGGCGNACGGGTGAGTAA	100	405,570 (79.0%)
3	ACTGAGAYACGGYCCARACTCCTACGGRNGGCNGCAGTRRGGAA	344	500,253 (97.5%)
4	GGCTAACTHCGTGNCVGCNGCYGCGGTAANAC	517	496,412 (96.7%)
5a	GTGTAGMGGTGAAATKCGTAGAT	686	382,156 (74.4%)
5b	CAAACRGGATTAGAWACCCNNGTAGTCCACGC	787	493,348 (96.1%)
6a	AAANTYAAANRAATWGRCGGGGRCCCGCACAAG	915	501,792 (97.8%)
6b	ATGTGGTTTAATTCGA	949	389,530 (75.9%)
6c	CAACGCGARGAACCTTACC	971	393,614 (76.7%)
7a	AGGTGNTGCATGGYYGYCGTCAGCTCGTGYCGTGAG	1056	499,976 (97.4%)
7b	TGTTGGGTTAAGTCCCRYAACGAGCGCAACCCT	1101	489,290 (95.3%)
8a	GGAAGGYGGGGAYGACG	1176	457,537 (89.1%)
8b	GGGCKACACACGYGCTAC	1220	382,857 (74.6%)
9	GCCTTGYACWCWCCGCCCGTC	1390	388,911 (75.8%)
10	GGGTGAAGTCRTAACAAGGTANCC	1491	172,918 (33.7%)

Primer contigs constructed for the different conserved regions. 12-mers registering the highest frequency in each primer contig are underlined and in bold. The number and sequence of each primer contig, as well as the position (following the numbering of E. coli rRNA) and frequency of each 12-mer, are indicated. Minor primer contigs are italicized.

Consensus sequences of conserved regions obtained by this technique may serve to design primers for biological samples, but only as one of the many considerations that primer design requires. This is a virtual approach and may not fully correspond to the biological reality of metagenomes; therefore, these results should be taken with caution and considered only as additional information for biological experiments.

Querying the C1 region

The most frequent 12-mer from primer contig 1 was detected in less than 40% of the more than 513,000 bacterial sequences from the SILVA database. This may be associated with a lack of primer specificity, but mainly stems from the absence of complete sequences corresponding to the C1 region in the database. Despite the exponential increase of information uploaded to databases over the past 5–10 years, most of these data are composed of short sequences of central 16S rRNA positions [25]. Therefore, the lower frequency of terminal regions is expected. According to the data shown in Table 3, only 198,419 out of the 513,000+ sequences contain the most common 12-mer of the C1 region, whereas the most common 12-mer of the C3 region (E. coli, position 344) was detected in 500,057 sequences. The C3 region was selected as the reference because of its higher detection frequency and its proximity to the C1 region (∼344 nucleotides). The results revealed that C1 was detected in a low proportion (if any) of those sequences where C3 was located closer to the 5' end (positions 260–300). This low proportion of C1 was maintained until C3 was located at position 335 or beyond (Fig. 4), whereupon the proportion of C1 was greater than 60% and even approached 100% in some cases. For example, more than 90% of the sequences whose C3 was located at position 355 registered the presence of C1. Apparently, the low frequency of C1 is associated with 5' end truncation; something similar may occur in the C10 region, which also registers 12-mers with low frequency. The use of the most frequent 12-mers determined for each region was useful to explain and verify that several of the sequences deposited in the SILVA database are truncated. This may represent a source of bias for experiments considering extreme regions of the 16S rRNA molecule for phylogenetic studies.
For example, variable regions located at the extremes (V1, V9) have been used to detect particular taxonomic groups [26, 27]; however, a brief examination of these sequences using k-mers revealed that a substantial proportion of database sequences lack these regions, which many studies relying on them have not taken into account.

Fig. 4

Proportion of C1 sequences obtained when using 12-mers matching C3 located at different nucleotide positions. For example, C1 was not detected in sequences when C3 is located at position 295 or lower; meanwhile, when C3 is located at position 340 or higher, 80% or more of the sequences contained C1. The cumulative percentage of C3-positive sequences is indicated by the step line.

Comparison with DGGE primers

Performing virtual PCR on a set of sequences is a very useful practice and allows for estimating the response of specific primers in real samples. Although factors inherent to the reaction and the physicochemical properties of the primers are not fully considered, the advantages are substantial, especially in terms of cost and time. Of the 513,309 sequences contained in the SILVA 123 database, both DGGE primers considered for the study were found in 468,079 sequences (91.19%), while the pair of 12-mers was detected in 489,734 (95.41%). This difference (greater than 4%) is explained by the fact that only one primer of the pair was detected in more than 20,000 sequences for each direction, whereas the same pattern occurred in fewer than 12,100 sequences when 12-mers were used (Table 4).
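The four reaction classes used in this comparison (both ends, forward only, reverse only, none) can be tallied with a few lines of Python. The sketch below is hypothetical: the function names are invented for illustration, and sites are matched as exact substrings on a single strand, so degenerate primers would first need to be expanded into isoforms or converted to regular expressions.

```python
from collections import Counter

LABELS = ("Both ends", "Forward", "Reverse", "None")


def reaction_type(has_forward, has_reverse):
    """Classify a sequence by which of the two sites were detected."""
    if has_forward and has_reverse:
        return "Both ends"
    if has_forward:
        return "Forward"
    if has_reverse:
        return "Reverse"
    return "None"


def tally_reactions(sequences, forward_site, reverse_site):
    """Return {label: (count, percentage)} over a collection of sequences."""
    counts = Counter(
        reaction_type(forward_site in seq, reverse_site in seq) for seq in sequences
    )
    total = len(sequences)
    return {
        label: (counts[label], round(100.0 * counts[label] / total, 2) if total else 0.0)
        for label in LABELS
    }


if __name__ == "__main__":
    toy = ["AAGGTTCC", "AAGG", "TTCC", "ACAC"]
    print(tally_reactions(toy, "AAGG", "TTCC"))
```

The percentages are computed against the full collection, in the same way that 468,079 of 513,309 sequences corresponds to 91.19%.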

Table 4

Reaction	DGGE Primers	12-mers	Primers & 12-mers
Both ends	468,079 (91.19%)	489,734 (95.41%)	466,499 (90.88%)
Forward	20,472 (3.99%)	10,517 (2.05%)	8,493 (1.65%)
Reverse	22,376 (4.36%)	12,058 (2.35%)	11,386 (2.22%)
None	2,382 (0.46%)	1,000 (0.19%)	888 (0.17%)
Total	513,309	513,309	487,266 (94.93%)

Reactions of primers used for DGGE and for the most frequent 12-mers of regions C3 and C6.

The vast majority of the SILVA database sequences reacted at both ends, indicating a possible amplification. The reaction occurred at only one end in ∼8% and ∼4% of sequences when primers and 12-mers were used, respectively. Less than 0.5% did not react with any primer or 12-mer. Sequences not reacting with the forward primer (22,376) were recovered using the most frequent 12-mer with the upstream addition of four nucleotides, as the primer sequence should start at this position (Fig. 5). Using this strategy, the recovery of 10,802 additional sequences was possible, and the analysis of these four nucleotides revealed great diversity at these positions, which explained the lack of in silico detection using the primer. For example, the most frequent tetranucleotides were TCTA (28.7%) and CCTG (27.9%), followed by 121 different tetranucleotide combinations representing an aggregate proportion of 43% (data not shown). Moreover, this small tetranucleotide fragment is so variable that degeneracies (YHHV) were required to represent 97.5% of sequences.
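The recovery of the four upstream nucleotides can be sketched as follows. The function name and the toy sequences are hypothetical, and the anchor 12-mer is matched as an exact string (one isoform at a time); sequences that do not extend at least four nucleotides upstream of the hit are skipped.

```python
from collections import Counter


def upstream_tetramers(sequences, anchor_kmer, flank=4):
    """Count the `flank` nucleotides found immediately upstream of the anchor k-mer."""
    counts = Counter()
    for seq in sequences:
        pos = seq.find(anchor_kmer)
        if pos >= flank:  # also excludes pos == -1 (anchor not found)
            counts[seq[pos - flank:pos]] += 1
    return counts


if __name__ == "__main__":
    anchor = "CGGAGGGCAGCA"  # one isoform of the 12-mer CGGRNGGCNGCA
    toy = [
        "TCTA" + anchor + "TT",
        "TCTA" + anchor,
        "CCTG" + anchor,
        "GG" + anchor,  # too short upstream, skipped
    ]
    print(upstream_tetramers(toy, anchor).most_common())
```

Ranking the resulting counts gives a frequency table of tetranucleotides comparable to the one summarized above.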

Fig. 5

Alignment of primers for DGGE and the primer contig. The most frequent 12-mer is underlined, while the difference G8 of the primer, which corresponds to R4 of the 12-mer, is shaded.

Another factor affecting primer detection (CCTACGGGNGGCNGCA) was nucleotide G8, which corresponded to the ambiguity R4 of the 12-mer (CGGRNGGCNGCA). Of the 10,802 retrieved sequences, only 70.72% (7,640) registered G at this position, whereas A was detected in the remaining 29.27% (3,162). Therefore, the ambiguity R seems to be a better choice for this position, although the decision will depend on an overall evaluation of the primer and the microbial population to be studied or the expected taxonomic groups. Also noteworthy was the approximately 30% increase in detection of the previously undetected sequences that was achieved when using the ambiguity R instead of the original nucleotide; this represented approximately 0.6% of the total sequences. This type of detail cannot be analysed with biological samples because the PCR may still be successful when mismatches occur upstream of the zone containing the critical nucleotides for annealing and amplification [28]. Similar results were obtained for the analysis of the 3′ end. A portion of sequences from the SILVA database (20,472 or 3.99%) reacted only with the forward primer. The undetected sequences were recovered using the most frequent 12-mer method, and the fragment corresponding to the primer was then analysed. Several degeneracies were required to cover most of the sequences (RVVHHHVRRKRAATTGACGG). Although the cause of the lack of reaction was identified, whether to increase the number of degeneracies in the primer or to accept the loss of a small percentage (3%) of sequences will depend on the research objective. Finally, using the k-mer strategy, and particularly 12-mers, to obtain and analyse 16S rRNA sequences was found to be reliable.
This strategy may serve not only to evaluate the presence of a molecular motif but also to evaluate and design primers, study mutational or evolutionary patterns, and detect rare sequences, along with many other possible applications. Moreover, this approach can consider all of the novel and rare 16S rRNA sequences obtained through shotgun sequencing and deposited in databases, adapting in real time to database updates.
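One way to derive a degenerate consensus of the kind discussed above (for example, deciding whether a position should be G or R) is to inspect each aligned position separately. The sketch below is a hypothetical illustration, not the authors' procedure: it assumes equal-length, unambiguous fragments, and it applies the coverage threshold per position, which is not the same as the whole-fragment coverage percentages reported in the text.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_BY_SET = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_consensus(fragments, coverage=0.95):
    """Build a per-position IUPAC consensus covering at least `coverage` of fragments."""
    consensus = []
    for i in range(len(fragments[0])):
        column = Counter(fragment[i] for fragment in fragments)
        kept, covered = set(), 0
        # Add bases from most to least frequent until the threshold is reached.
        for base, n in column.most_common():
            kept.add(base)
            covered += n
            if covered / len(fragments) >= coverage:
                break
        consensus.append(CODE_BY_SET[frozenset(kept)])
    return "".join(consensus)


if __name__ == "__main__":
    toy = ["GA"] * 7 + ["AA"] * 3
    print(degenerate_consensus(toy, coverage=0.7), degenerate_consensus(toy, coverage=0.9))
```

Raising the threshold trades specificity for coverage: in the toy input, a 70% threshold keeps G at the first position, whereas a 90% threshold requires R.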

Declarations

Author contribution statement

Francisco Vargas-Albores, Marcel Martínez-Porchas: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This work was supported by the National Council for Science and Technology (CONACyT), Mexico, grant 84398 (to FVA).

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

86 in total

1. Changes in archaeal, bacterial and eukaryal assemblages along a salinity gradient by comparison of genetic fingerprinting methods in a multipond solar saltern.

Authors: Emilio O Casamayor; Ramon Massana; Susana Benlloch; Lise Øvreås; Beatriz Díez; Victoria J Goddard; Josep M Gasol; Ian Joint; Francisco Rodríguez-Valera; Carlos Pedrós-Alió
Journal: Environ Microbiol Date: 2002-06 Impact factor: 5.491

2. Identification of candidate periodontal pathogens and beneficial species by quantitative 16S clonal analysis.

Authors: Purnima S Kumar; Ann L Griffen; Melvin L Moeschberger; Eugene J Leys
Journal: J Clin Microbiol Date: 2005-08 Impact factor: 5.948

3. Identification of mesophilic lactic acid bacteria by using polymerase chain reaction-amplified variable regions of 16S rRNA and specific DNA probes.

Authors: N Klijn; A H Weerkamp; W M de Vos
Journal: Appl Environ Microbiol Date: 1991-11 Impact factor: 4.792

4. Microbial population structures in the deep marine biosphere.

Authors: Julie A Huber; David B Mark Welch; Hilary G Morrison; Susan M Huse; Phillip R Neal; David A Butterfield; Mitchell L Sogin
Journal: Science Date: 2007-10-05 Impact factor: 47.728

5. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria.

Authors: Soumitesh Chakravorty; Danica Helb; Michele Burday; Nancy Connell; David Alland
Journal: J Microbiol Methods Date: 2007-02-22 Impact factor: 2.363

6. Evaluation of nested PCR-DGGE (denaturing gradient gel electrophoresis) with group-specific 16S rRNA primers for the analysis of bacterial communities from different wastewater treatment plants.

Authors: Nico Boon; Wim Windt; Willy Verstraete; Eva M Top
Journal: FEMS Microbiol Ecol Date: 2002-02-01 Impact factor: 4.194

7. Quantitative measure of small-subunit rRNA gene sequences of the kingdom korarchaeota

Authors:
Journal: Appl Environ Microbiol Date: 1998-12 Impact factor: 4.792

8. stringMLST: a fast k-mer based tool for multilocus sequence typing.

Authors: Anuj Gupta; I King Jordan; Lavanya Rishishwar
Journal: Bioinformatics Date: 2016-09-07 Impact factor: 6.937

9. Sequence heterogeneities of genes encoding 16S rRNAs in Paenibacillus polymyxa detected by temperature gradient gel electrophoresis.

Authors: U Nübel; B Engelen; A Felske; J Snaidr; A Wieshuber; R I Amann; W Ludwig; H Backhaus
Journal: J Bacteriol Date: 1996-10 Impact factor: 3.490

10. 16S rRNA gene pyrosequencing of reference and clinical samples and investigation of the temperature stability of microbiome profiles.

Authors: Jun Hang; Valmik Desai; Nela Zavaljevski; Yu Yang; Xiaoxu Lin; Ravi Vijaya Satya; Luis J Martinez; Jason M Blaylock; Richard G Jarman; Stephen J Thomas; Robert A Kuschner
Journal: Microbiome Date: 2014-09-16 Impact factor: 14.650