Novel PCR primers for the archaeal phylum Thaumarchaeota designed based on the comparative analysis of 16S rRNA gene sequences.

Jin-Kyung Hong, Hye-Jin Kim, Jae-Chang Cho.

Abstract

Based on a comparative phylogenetic analysis of 16S rRNA gene sequences deposited in the RDP database, we constructed a local database of thaumarchaeotal 16S rRNA gene sequences and developed a novel PCR primer specific for the archaeal phylum Thaumarchaeota. Among 9,727 quality-filtered (chimera-checked, size >1.2 kb) archaeal sequences downloaded from the RDP database, 1,549 thaumarchaeotal sequences were identified and included in our local database. In our study, Thaumarchaeota included the archaeal groups MG-I, SAGMCG-I, SCG, FSCG, RC, and HWCG-III, which formed a monophyletic group in the phylogenetic tree. Cluster analysis revealed 114 phylotypes for Thaumarchaeota. The majority of the phylotypes (66.7%) belonged to MG-I and SCG, which together contained most (93.9%) of the thaumarchaeotal sequences in our local database. A phylum-directed primer was designed from a consensus sequence of the phylotype sequences, and its specificity was evaluated in terms of coverage and tolerance, both in silico and empirically. The phylum-directed primer, designated THAUM-494, showed >90% coverage for Thaumarchaeota and <1% tolerance to non-target taxa, indicating high specificity. To validate this result experimentally, PCRs were performed with THAUM-494 in combination with a universal archaeal primer (ARC917R or 1017FAR) on DNA from five environmental samples, and the products were used to construct clone libraries. In these empirical tests, THAUM-494 showed satisfactory specificity, as predicted by the in silico results. Phylogenetic analysis of 859 cloned sequences obtained from 10 clone libraries revealed that >95% of the amplified sequences belonged to Thaumarchaeota. The most frequently sampled thaumarchaeotal subgroups in our samples were SCG, MG-I, and SAGMCG-I. To our knowledge, THAUM-494 is the first phylum-level primer for Thaumarchaeota.
Furthermore, its high coverage and low tolerance make THAUM-494 a potentially valuable tool for understanding the phylogenetic diversity and ecological niches of Thaumarchaeota.

Year:  2014        PMID: 24805255      PMCID: PMC4013054          DOI: 10.1371/journal.pone.0096197

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The archaeal phylum Thaumarchaeota was proposed in 2008 to distinguish mesophilic ammonia-oxidizing archaeal (AOA) lineages from hyperthermophilic Crenarchaeota lineages [1]. This proposal was based on archaeal phylogeny inferred from rRNA and ribosomal protein sequences, which suggested that mesophilic Crenarchaeota constitute a distinct phylum that branches off near the root of Archaea. A few years later, this distinction was confirmed by genomic information (e.g., the identification of Thaumarchaeota-specific genes) from Cenarchaeum symbiosum, Nitrosopumilus maritimus, and Nitrososphaera gargensis, representatives of marine and terrestrial AOA lineages [2]. The discovery of Thaumarchaeota sparked interest not only in microbial ecology but also in the evolution, physiology, and molecular biology of the domain Archaea, since the majority of its members described so far are mesophilic ammonia oxidizers [3]. In the past, the Archaea were thought to be confined to extreme environments [4]; consequently, their ecological role in global geochemical cycling was underestimated. However, a number of molecular ecological studies have revealed that the Archaea inhabit a wide variety of moderate environments [5], suggesting that they play a substantial role in global geochemical cycling. Based on 16S rRNA gene surveys, the Thaumarchaeota have been estimated to represent up to 20% and 5% of all prokaryotes in marine and terrestrial environments, respectively [6]–[8]. Another notable feature of the Thaumarchaeota is that all cultured or enriched members of this phylum are ammonia oxidizers [9]–[16]. Before AOA were discovered, ammonia oxidation was thought to be performed exclusively by ammonia-oxidizing bacteria (AOB) in the bacterial phylum Proteobacteria [17]. Initial evidence supporting archaeal ammonia oxidation included the discovery of archaeal homologs of the bacterial ammonia monooxygenase genes (amoA and amoB) in metagenomes [18], [19].
Later, additional studies concluded that amoA-carrying archaea are AOA, and suggested that AOA could contribute significantly to the global nitrogen cycle [9], [10], [12], [20]–[23]. Recent studies have also shown that the copy numbers of archaeal amoA are much higher than the copy numbers of bacterial amoA in many soil samples [24]–[29], indicating the predominance of AOA over AOB. Since AOA are highly fastidious organisms (only a few laboratories have successfully isolated or enriched AOA from the environment), most ecological studies of AOA depend on PCR-based molecular methods. Hence, PCR primer specificity inevitably affects analysis and interpretation. However, PCR primers specific for AOA or Thaumarchaeota have not been well established. Moreover, all 16S rRNA primers previously used to quantify AOA targeted a single thaumarchaeotal subgroup [28], [30]–[37]. Almost all molecular ecological studies of AOA or the Thaumarchaeota assumed that marine group 1.1a AOA (hereafter referred to as MG-I) and soil group 1.1b AOA (hereafter referred to as SCG) predominated in marine and terrestrial samples, respectively. Thus, most of these studies employed primers specific to one of these subgroups for estimating the abundance of AOA or Thaumarchaeota. Although different thaumarchaeotal subgroups have been hypothesized to have different niches [28], [38], [39], samples could potentially harbor an unexpected AOA subgroup (e.g., subgroup MG-I in soil samples, or subgroup SCG in marine samples), as observed by Tourna et al. [37] and Beman and Francis [40]; moreover, samples could also harbor multiple subgroups or even as-yet-undiscovered subgroups. In such samples, the abundance and diversity of AOA or Thaumarchaeota could be drastically underestimated. 
We attributed the lack of phylum-level primers (Thaumarchaeota-directed primers) to the ambiguously defined phylogenetic range of Thaumarchaeota and the limited number of thaumarchaeotal 16S rRNA sequences available at the time of primer design. However, the current availability of a large sequence database has facilitated the timely design of PCR primers covering the entire phylogenetic range of Thaumarchaeota. Such primers will contribute to our understanding of the ecological roles, distribution patterns, and environmental factors shaping the niche of this phylum. In this study, we constructed a local database of thaumarchaeotal 16S rRNA gene sequences by comparatively analyzing all 16S rRNA gene sequences deposited in an RDP database, and developed a phylum-directed primer for Thaumarchaeota. The specificity of the designed primer was assessed by comparing its performance to those of existing subgroup-directed primers. To the best of our knowledge, this is the first study to comprehensively analyze the thaumarchaeotal 16S rRNA gene sequences with the most current database, and to design a Thaumarchaeota-directed primer. Herein, we describe the phylogenetic diversity and breadth of the phylum Thaumarchaeota and the specificity of the newly designed PCR primer.

Materials and Methods

Construction of Local Database

Thaumarchaeotal 16S rRNA gene sequences were collected from the RDP database (release 10.22, 2010, http://rdp.cme.msu.edu) [41]. Because no sequences were found under the database category, “phylum Thaumarchaeota,” in the RDP database, we first identified Thaumarchaeota-related sequences throughout the RDP database using an iterative phylogenetic approach (Fig. 1).
Figure 1

Schematic diagram showing the overall process to construct local database and to design primers.

Bold letters indicate sequence sets (or subsets). Open cross symbols and dashed lines indicate sequence merge points and repeating sub-routines, respectively.

Using the selection criteria offered by the RDP website (chimera check: pass; length: near full-length, >1.2 kb), we downloaded 9,727 of the 62,077 archaeal sequences in the RDP database to construct our local database (Table 1). Prior to the iteration routine for searching and collecting Thaumarchaeota-related sequences from the downloaded sequences, a backbone phylogenetic tree (Fig. S1) was constructed with representative sequences of archaeal taxa (see below for the phylogenetic reconstruction). The sequence set for the backbone tree, $B = \{\text{backbone sequences}\}$, included 106 sequences from the group MG-I as key members of the phylum Thaumarchaeota, selected based on published literature [12], [42]–[49], and 72 genus-level representative sequences (type strains and genome-sequenced strains) from the phyla Euryarchaeota, Crenarchaeota, Nanoarchaeota, and Korarchaeota (Table S1).
Table 1

Summary of thaumarchaeotal 16S rRNA gene sequences used for phylogenetic analysis and primer design.

Taxa[a] | RDP database[b]: no. of total sequences | RDP database[b]: no. of quality-filtered[c] sequences | Local database: no. of sequences | Local database: mean size (bases) ± SD[d]
Crenarchaeota | 13,316 | 2,616 | 872 | 1,373.7±89.0
  Thermoprotei | 13,316 (11,406)[e] | 2,616 (1,837)[e] | 872 (93)[e] | 1,376.3±88.5
Euryarchaeota | 35,985 | 6,072 | 6,072 | 1,366.6±80.7
  Archaeoglobi | 415 | 70 | 70 | 1,408.8±65.7
  Halobacteria | 4,576 | 1,377 | 1,377 | 1,411.6±60.9
  Methanobacteria | 5,222 | 728 | 728 | 1,302.9±71.5
  Methanococci | 474 | 126 | 126 | 1,398.4±65.6
  Methanomicrobia | 14,270 | 2,058 | 2,058 | 1,377.9±59.8
  Methanopyri | 18 | 5 | 5 | 1,400.6±100.2
  Thermococci | 670 | 229 | 229 | 1,426.4±94.5
  Thermoplasmata | 1,615 | 437 | 437 | 1,326.2±74.8
  Unclassified Euryarchaeota | 8,725 | 1,042 | 1,042 | 1,326.1±90.7
Korarchaeota | 213 | 88 | 88 | 1,306.0±75.2
Nanoarchaeota | 54 | 3 | 3 | 1,473.3±23.1
Thaumarchaeota | -[f] | - | 1,549 | 1,376.1±54.3
  FSCG[g] | - | - | 17 | 1,384.1±56.1
  HWCG-III[g] | - | - | 25 | 1,342.4±49.3
  MG-I[g] | - | - | 1,100 | 1,373.1±56.4
  RC[g] | - | - | 7 | 1,372.1±57.9
  SAGMCG-I[g] | - | - | 40 | 1,359.8±43.4
  SCG[g] | - | - | 355 | 1,389.5±45.1
  UT-I[h] | - | - | 2 | 1,405.5±6.4
  UT-II[h] | - | - | 1 | 1,232[k]
  UT-III[h] | - | - | 2 | 1,333.0±5.7
DSAG[i] | - | - | 326 | 1,293.9±77.2
THSCG[i] | - | - | 50 | 1,340.6±44.3
MCG[i] | - | - | 483 | 1,342.8±91.0
UG[i] | - | - | 12 | 1,345.4±84.9
Unclassified Archaea[j] | 12,509 | 948 | 272 | 1,330.3±75.0
Total | 62,077 | 9,727 | 9,727 | 1,363.4±79.8

[a] Phylum- and class-level archaeal groups.
[b] RDP database release 10.22.
[c] Filtered using the quality check option (http://rdp.cme.msu.edu).
[d] Standard deviation.
[e] Sequences belonging to unclassified Thermoprotei.
[f] Not shown in the RDP database.
[g] Subgroups in Thaumarchaeota (sequences closely related to MG-I). FSCG (forest soil crenarchaeotic group); HWCG-III (hot water crenarchaeotic group III); MG-I (marine group I); RC (rice cluster); SAGMCG-I (South Africa gold mine crenarchaeotic group I); SCG (soil crenarchaeotic group).
[h] Unclassified Thaumarchaeota: unclassified thaumarchaeotal subgroups found in this study.
[i] Archaeal groups distantly related to the phylum Thaumarchaeota. DSAG (deep sea archaeotic group); THSCG (terrestrial hot spring crenarchaeotic group); MCG (miscellaneous crenarchaeotic group); UG (unclassified group).
[j] As shown in the RDP database.
[k] Standard deviation not available.

In the first round of the iteration routine ($i = 1$ and $j = 1$), a subset of the downloaded RDP sequences, $d_i$, was added to the sequence set $B$, resulting in a combined sequence set, $C_i = B + d_i$. Then a subset of MG-I-related sequences, $m_j$, in the sequence set $d_i$ ($m_j \subset d_i$) was identified from a phylogenetic tree constructed with the sequence set $C_i$ (see below for the phylogenetic reconstruction). Input sequences that formed a tight (bootstrap score >80%) monophyletic cluster with the MG-I sequences included in the sequence set $B$ were regarded as ‘MG-I-related sequences’. In the following rounds ($j = j + 1$), MG-I-related sequences were searched for repeatedly in a phylogenetic tree constructed with a subtracted sequence set, $S_j = S_{j-1} - m_j$ ($S_j = C_i - m_j$ if $j = 1$), until no more MG-I-related sequences were found in the sequence set $S_j$. After collecting the MG-I-related sequences ($M_i = \sum_j m_j$) from the sequence subset $d_i$, the monophyly of the sequence set $M_i$ was evaluated again with the bootstrap score, and the iteration routine was then performed for the next subset of RDP sequences, $d_{i+1}$.
Finally, the downloaded RDP sequences that formed a monophyletic cluster with MG-I ($\sum_i M_i$) were regarded as the thaumarchaeotal sequences ($T$) in our local database. Our local database sequences are available at http://sdrv.ms/1k8frAc in RDP’s structure-based alignment format. The final phylogenetic tree was constructed with the thaumarchaeotal sequences ($T$) identified during the iteration routine. Backbone sequences ($B$) were included in this tree to show the phylogenetic position of the phylum Thaumarchaeota, and some sequences distantly related to the group MG-I were also included for later reference.
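The iteration routine described above can be expressed as two nested loops: an outer loop over the RDP subsets $d_i$ and an inner loop over rounds $j$. The Python sketch below is illustrative only and is not the code used in this study. The phylogenetic step is replaced by a hypothetical callback, `find_related`, which stands in for "reconstruct a tree from the current sequence set and return the candidates that form a tight (bootstrap score >80%) monophyletic cluster with the MG-I backbone sequences"; the re-evaluation of the monophyly of each $M_i$ is omitted, and the toy callback `toy_find_related` is likewise hypothetical.

```python
from typing import Callable, Iterable, Set

# Hypothetical callback: given the sequence set used to build the tree and the
# RDP candidates still in play, return the candidates that cluster tightly
# (bootstrap > 80%) with the MG-I backbone sequences.
FindRelated = Callable[[Set[str], Set[str]], Set[str]]


def collect_thaumarchaeota(backbone: Set[str],
                           rdp_subsets: Iterable[Set[str]],
                           find_related: FindRelated) -> Set[str]:
    """Return T, the accumulated MG-I-related sequences M_i over all subsets d_i."""
    thaum: Set[str] = set()
    for d_i in rdp_subsets:
        tree_set = backbone | d_i          # C_i = B + d_i
        candidates = set(d_i)              # only RDP sequences can be collected
        m_total: Set[str] = set()          # M_i = sum over j of m_j
        while True:
            m_j = find_related(tree_set, candidates)
            if not m_j:                    # stop when no more MG-I-related sequences
                break
            m_total |= m_j
            tree_set -= m_j                # S_j = S_{j-1} - m_j; backbone is kept
            candidates -= m_j
        thaum |= m_total                   # accumulate M_i into T
    return thaum


def toy_find_related(tree_set: Set[str], candidates: Set[str]) -> Set[str]:
    """Toy stand-in for the tree step: resolves one 'thaum*' sequence per round."""
    hits = sorted(c for c in candidates & tree_set if c.startswith("thaum"))
    return set(hits[:1])


if __name__ == "__main__":
    backbone = {"mg1_ref1", "mg1_ref2", "cren_ref"}
    subsets = [{"thaum_a", "eury_x"}, {"thaum_b", "thaum_c", "cren_y"}]
    print(sorted(collect_thaumarchaeota(backbone, subsets, toy_find_related)))
```

Removing each $m_j$ from the tree set while keeping the backbone mirrors the subtraction $S_j = S_{j-1} - m_j$; the toy callback resolves only one sequence per round so that the inner loop is visible.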

Estimating Genetic Distances and Rarefaction Analysis

Pairwise genetic distances between sequences in the local database were calculated with MEGA [50] using the Jukes-Cantor (JC) model [51] and were subjected to UPGMA (unweighted pair group method with arithmetic mean) cluster analysis implemented in MEGA. Thaumarchaeotal phylotypes were defined at a cophenetic distance of 0.02 (sequence similarity ≥98%) in the cluster analysis, which roughly corresponds to species-level 16S rRNA gene similarity [52]. Intra- and inter-subgroup genetic distances were estimated from the phylotype sequences of the archaeal groups listed in Table 1. For the phyla Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota, only the genus-level representative sequences included in the backbone sequence set (B) were used to estimate the genetic distances, to limit the computational load. Rarefaction analysis was performed to estimate the phylotype richness of the phylum Thaumarchaeota. Phylotypes were treated as operational taxonomic units (OTUs), and the OTU richness estimators (S) for the phylum Thaumarchaeota and each of its subgroups were calculated using EstimateS [53].
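The distance and phylotype-definition steps can be sketched in Python as follows. This is a minimal reimplementation under stated assumptions, not MEGA's code: sequences are given as an aligned dictionary, JC distances are computed as $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$ from the proportion $p$ of differing sites, columns with gaps or ambiguity codes are skipped pairwise (an assumption), and UPGMA merging stops once the two closest clusters are more than 0.02 apart. Because UPGMA merge heights never decrease, stopping there yields the same groups as cutting the dendrogram at a cophenetic distance of 0.02.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned sequences.

    Columns with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    a, b = a.upper().replace("U", "T"), b.upper().replace("U", "T")
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)   # uncorrected p-distance
    if p >= 0.75:
        return math.inf                              # JC correction undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma_phylotypes(seqs: Dict[str, str], cutoff: float = 0.02) -> List[Set[str]]:
    """Merge the two closest clusters (UPGMA: mean of all between-cluster
    pairwise distances) until the closest pair is farther apart than `cutoff`."""
    names = list(seqs)
    dist = {frozenset(pair): jc_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(names, 2)}
    clusters: List[FrozenSet[str]] = [frozenset([n]) for n in names]

    def average(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        if average(clusters[i], clusters[j]) > cutoff:
            break                                    # every remaining merge exceeds the cutoff
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [set(c) for c in clusters]
```

The merging loop recomputes every cluster average at each step, which is fine for a toy example but far too slow for a database of this size; a practical version would update a distance matrix incrementally.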

Phylogenetic Reconstruction

A multiple alignment of the phylotype sequences determined by the cluster analysis was subjected to phylogenetic reconstruction. A bacterial sequence (GenBank accession no. J01695) was used as an outgroup, and sequences belonging to archaeal phyla other than Thaumarchaeota were also included in the phylogenetic analysis. Phylogenetic trees were inferred using the neighbor-joining (NJ) algorithm implemented in MEGA [50]. The tree topology was statistically evaluated by 100 bootstrap resamplings and was further confirmed using the maximum likelihood (ML) algorithm (GTR+CAT approximation) implemented in RAxML [54].
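Each bootstrap resampling builds a tree from a pseudo-alignment whose columns are drawn with replacement from the original alignment, and the support for a node is the percentage of replicate trees that recover it. The sketch below shows those two steps only, assuming the alignment is a dictionary of equal-length strings and that each replicate tree has already been reduced to a set of clades (for example, after rooting on the bacterial outgroup); tree inference itself (NJ in MEGA, ML in RAxML) is not reproduced, and the function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterator, List, Optional, Set


def bootstrap_replicates(alignment: Dict[str, str], n_reps: int = 100,
                         seed: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Yield pseudo-alignments built by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_reps):
        # The same column indices are used for every sequence, so columns stay intact.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(clade: FrozenSet[str],
                      replicate_clades: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees (each reduced to its set of clades) containing `clade`."""
    if not replicate_clades:
        raise ValueError("no replicate trees supplied")
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)
```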

Local Bayesian Classifier

A local naïve Bayesian classifier was built using our local database as the training set. The algorithm previously established by Wang et al. [55], which is currently implemented in RDP’s classifier, was used to estimate the probability that a query sequence, $S$, is a member of phylum $D$: $P(D \mid S) = P(S \mid D) \times P(D) / P(S)$, where $P(D)$ is the prior probability of a sequence being a member of phylum $D$ and $P(S)$ is the overall probability of finding sequence $S$ in any phylum. The joint probability of observing sequence $S$, which contains a set of words (subsequences $v_i$), was estimated as $P(S \mid D) = \prod_i P(v_i \mid D)$. Word-specific priors were calculated from 8-base subsequences, and $P(D)$ and $P(S)$ were assumed to be constant, following the original paper [55]. Each query sequence was classified as a member of the phylum $D$ that gave the highest probability.
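The classification rule above can be sketched as follows. This is an illustrative reimplementation, not RDP's code. It assumes the word-specific prior described by Wang et al. [55], $P_i = (n(w_i) + 0.5)/(N + 1)$, and the per-phylum word probability $P(w_i \mid D) = (m(w_i) + P_i)/(M + 1)$, where $n(w_i)$ is the number of training sequences containing word $w_i$, $N$ the total number of training sequences, and $m(w_i)$ and $M$ the corresponding counts within phylum $D$. Because $P(D)$ and $P(S)$ are treated as constants, the phylum that maximizes $\sum_i \log P(v_i \mid D)$ is returned. RDP's bootstrap confidence estimate is not included, and the class and method names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def kmer_words(seq: str, k: int = 8) -> Set[str]:
    """All overlapping k-base words (subsequences v_i) in a sequence, each counted once."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class LocalNaiveBayes:
    """Naive Bayesian classifier trained on labelled reference sequences."""

    def __init__(self, training: Dict[str, List[str]], k: int = 8):
        self.k = k
        self.sizes = {label: len(seqs) for label, seqs in training.items()}  # M per phylum
        self.n_total = sum(self.sizes.values())                               # N
        self.in_label: Dict[str, Counter] = {}  # m(w_i): phylum sequences containing w_i
        self.overall: Counter = Counter()       # n(w_i): training sequences containing w_i
        for label, seqs in training.items():
            counts: Counter = Counter()
            for s in seqs:
                counts.update(kmer_words(s, k))
            self.in_label[label] = counts
            self.overall.update(counts)

    def word_prior(self, w: str) -> float:
        """Word-specific prior P_i = (n(w_i) + 0.5) / (N + 1)."""
        return (self.overall[w] + 0.5) / (self.n_total + 1)

    def log_likelihood(self, seq: str, label: str) -> float:
        """log P(S|D) = sum over the query's words of log P(v_i|D)."""
        m_size, counts = self.sizes[label], self.in_label[label]
        return sum(math.log((counts[w] + self.word_prior(w)) / (m_size + 1))
                   for w in kmer_words(seq, self.k))

    def classify(self, seq: str) -> str:
        """Return the phylum D with the highest P(D|S), with P(D) and P(S) constant."""
        return max(self.sizes, key=lambda label: self.log_likelihood(seq, label))
```

The same rule applies unchanged to subgroup labels $O$ if the training dictionary is keyed by subgroup instead of phylum.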

Primer Design

A phylum-directed primer for the Thaumarchaeota was designed from a thaumarchaeotal consensus sequence using Primrose [56]. The consensus sequence (majority rule) for the phylum Thaumarchaeota was obtained from the thaumarchaeotal phylotypes in the local database. No degenerate sites were permitted in the primer sequences. The specificity of the designed primer (or of the primer pairs used) was evaluated using Oligocheck (www.cf.ac.uk/biosi/research/biosoft/) against the local database sequences and was expressed both as the proportion of target-group sequences to which the primer binds (coverage) and as the proportion of non-target sequences to which it binds (tolerance). The tolerance of the primer to the domain Bacteria was evaluated using the ProbeMatch program implemented on the RDP website. The thermodynamic properties of self-complementary structures (hairpins and primer-dimers), e.g., predicted free energies (ΔG), were determined using NetPrimer (http://www.premierbiosoft.com/netprimer/).
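In silico, coverage and tolerance reduce to the percentage of target and non-target sequences that contain a binding site for the primer. The sketch below is a simplified model, assuming that a primer "binds" when some window of the ungapped sense-strand sequence differs from the primer at no more than `max_mismatches` positions. Oligocheck and ProbeMatch apply their own matching rules, the THAUM-494 sequence is not reproduced here, and the primer and sequences in the test are hypothetical.

```python
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to express a reverse primer on the sense strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def binds(primer: str, seq: str, max_mismatches: int = 0) -> bool:
    """True if some window of the ungapped sequence matches the primer
    (sense orientation) with at most `max_mismatches` mismatches."""
    primer = primer.upper()
    seq = seq.upper().replace("U", "T").replace("-", "").replace(".", "")
    n = len(primer)
    return any(
        sum(p != s for p, s in zip(primer, seq[i:i + n])) <= max_mismatches
        for i in range(len(seq) - n + 1)
    )


def coverage_and_tolerance(primer: str, targets: Iterable[str],
                           non_targets: Iterable[str],
                           max_mismatches: int = 0) -> Tuple[float, float]:
    """Percent of target sequences bound (coverage) and of non-target sequences bound (tolerance)."""
    def percent(seqs: Iterable[str]) -> float:
        hits = [binds(primer, s, max_mismatches) for s in seqs]
        return 100.0 * sum(hits) / len(hits) if hits else 0.0
    return percent(targets), percent(non_targets)
```

For a reverse primer, `binds` would be called with `reverse_complement(primer)`, because the sketch searches the sense strand.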

PCR Amplification

Soil samples were collected from Kellerberrin in Western Australia, Hillgate in southern California, La Campana in central Chile, and Incheon in Korea, and a marine sediment sample was collected from the continental shelf of the Yellow Sea, west of Jeju Island, Korea. Details of the sampling locations and the physicochemical properties of the samples are described elsewhere [57], [58]. Community DNA was extracted directly from each sample using the MoBio PowerSoil DNA extraction kit (MoBio Laboratories, Solana Beach, Calif., USA) according to the manufacturer’s protocol and was used individually as a PCR template. The phylum-directed primer designed in this study (primer THAUM-494) was used with the universal primer ARC917R or 1017FAR [59], [60]. The reaction mixture contained 25 µl of TaKaRa Ex Taq premix (Takara, Shiga, Japan), 1 µl each of the forward and reverse primers (stock concentration, 20 µM), 200 ng of template DNA extracted from each sample, and sterilized distilled water to a final volume of 50 µl. The PCR thermal profile was as follows: initial denaturation at 95°C for 5 min, followed by 30 cycles of denaturation at 95°C for 30 s, primer annealing at 55°C for 30 s, and extension at 72°C for 30 s. The final extension step was lengthened to 20 min. PCR amplification was performed with a GeneAmp PCR system 9700 (Applied Biosystems, Foster City, Calif., USA). Positive PCR amplicons were confirmed by agarose gel electrophoresis.
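As a quick arithmetic check (not an additional protocol detail), the final concentrations implied by this recipe follow from $C_1V_1 = C_2V_2$:

$$
C_{\text{primer}} = \frac{20\ \mu\text{M} \times 1\ \mu\text{l}}{50\ \mu\text{l}} = 0.4\ \mu\text{M},
\qquad
C_{\text{template}} = \frac{200\ \text{ng}}{50\ \mu\text{l}} = 4\ \text{ng}\,\mu\text{l}^{-1}.
$$

The 25 µl of premix in a 50 µl reaction is a twofold dilution, which gives 1× reaction conditions only if the premix is supplied at 2× strength; that strength is an assumption here, as it is not stated in the text.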

Cloning

The PCR amplicons were purified using a QIAquick PCR purification kit (Qiagen, Valencia, Calif., USA) and cloned into TOPO cloning vectors with a TOPO TA cloning kit (Invitrogen, Carlsbad, Calif., USA) to construct the clone libraries according to the manufacturer’s protocol. Insert sequences of 859 clones, randomly selected from the 10 clone libraries (five samples × two primer pairs; ca. 90 clones per primer pair and 180 clones per sample), were sequenced using an ABI3700 DNA analyzer (Applied Biosystems, Foster City, Calif., USA). The phylogenetic affiliations of the cloned sequences were determined by phylogenetic reconstruction and further confirmed with our local Bayesian classifier. When the local Bayesian classifier was applied, each cloned sequence, used as the query $S$, was classified as a member of the phylum $D$, or of its subgroup $O$, that gave the highest probability, $P(D \mid S)$ or $P(O \mid S)$.

Nucleotide Sequence Accession Numbers

All sequences produced in this study have been deposited in GenBank under the accession numbers KF275675 to KF276604.

Results and Discussion

Overview of Thaumarchaeotal Sequences in Public Databases

To design taxon-directed primers (or probes), the entire taxonomic range of the target taxon should be clearly defined so that designers can easily distinguish sequences belonging to non-target taxa from those belonging to the target taxon. While prokaryote taxonomy should not be deduced solely from 16S rRNA gene sequences, a priori knowledge of the phylogenetic breadth of the target taxon, which is partially reflected in the 16S rRNA-based phylogenetic tree, is a prerequisite for developing primers that specifically bind to complementary regions of target 16S rRNA gene sequences. However, our knowledge of the phylogenetic range of Thaumarchaeota is rather limited, with the exception of certain subgroups (MG-I, SCG, and HWCG-III [hot water crenarchaeotic group III]) used to define this phylum [1]. A large number of archaeal 16S rRNA gene sequences have been deposited in public databases (e.g., 128,378 and 38,641 in the RDP [release 10.32, 2013] and SILVA [release 114, 2013] databases, respectively). However, only two studies, both of which comprehensively analyzed archaeal 16S rRNA sequences either closely or distantly related to Thaumarchaeota, have attempted to define the phylogenetic range of Thaumarchaeota [61], [62]. Both studies classified the archaeal groups MG-I, SCG, SAGMCG-I (South Africa gold mine crenarchaeotic group I), and HWCG-III within the phylum Thaumarchaeota, but they reached contradictory conclusions regarding the classification of the other subgroups. The SILVA database (release 114, 2013, http://arb-silva.de) [63] assigns 15,773 16S rRNA gene sequences to the phylum Thaumarchaeota and divides Thaumarchaeota into 27 subgroups. However, this classification is mainly based on the phylogenetic assignment of the 16S rRNA gene sequences according to the literature, with the phylogenetic positions of those sequences determined by SILVA’s workflow (personal communication).
To date, no supporting materials have been published regarding SILVA’s classification scheme for thaumarchaeotal 16S rRNA gene sequences. For example, the SILVA database classifies 16S rRNA gene sequences of MCG (miscellaneous crenarchaeotic group, previously named TMCG, where “T” stands for “terrestrial”), DSAG (deep sea archaeal group, alternatively named MBG-B, marine benthic group B), and HWCG-I (hot water crenarchaeotic group I) into Thaumarchaeota. However, recent studies have concluded that rRNA-based phylogenies are insufficient for determining the relationship of these three archaeal groups to Thaumarchaeota, and have proposed that more data are required to define their taxonomic positions accurately [62], [64]. The situation is more complicated with other 16S rRNA gene-oriented databases. For example, the RDP database does not retrieve any 16S rRNA gene sequences when the Thaumarchaeota taxonomic category is used. Moreover, the Greengenes database (2013 version, http://greengenes.lbl.gov) [65] does not even index the phylum Thaumarchaeota. Owing to the lack of robust phylogenetic affiliations and the difficulty of accessing Thaumarchaeota-related sequences in current public databases, we constructed our own local database of thaumarchaeotal 16S rRNA gene sequences to facilitate the development of Thaumarchaeota-directed primers.

Collection of Thaumarchaeota-related Sequences

We initially constructed a backbone phylogenetic tree (Fig. S1) for the domain Archaea. This tree was built with 106 MG-I sequences, selected from the published literature, and 72 representative sequences from well-defined taxa in the archaeal phyla Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota (Table S1). Since MG-I is a basal constituent of the phylum Thaumarchaeota [1], these 106 MG-I sequences were the only thaumarchaeotal sequences initially included in the backbone tree. Thaumarchaeotal sequences were limited in this way to avoid making phylogenetic inferences about other thaumarchaeotal subgroups at the initial stages of sequence collection and to define the minimum phylogenetic range of Thaumarchaeota. In this backbone tree, the MG-I sequences formed a tight monophyletic cluster (bootstrap score, 100%), comprising a lineage distinct from Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota. While Crenarchaeota was monophyletic (bootstrap score, 98%), Nanoarchaeota and Korarchaeota were not well resolved in this tree. In addition, Euryarchaeota was split into four independent clusters and appeared to be paraphyletic, as previously reported [66], [67]. Further attempts to clarify the phylogenetic positions of these unresolved phyla were left for future studies, since phylogenetic inference for archaeal phyla other than Thaumarchaeota was beyond the scope of this study. In the subsequent iteration steps for collecting thaumarchaeotal sequences (Fig. 1), 16S rRNA gene sequences belonging to archaeal phyla other than Thaumarchaeota were considered non-target (non-thaumarchaeotal) sequences. All sequences forming a tight monophyletic cluster (bootstrap score >80%) with MG-I sequences were collected and classified as thaumarchaeotal.
During the iteration routine, performed with the 178 sequences included in the backbone tree and the 9,727 quality-filtered archaeal sequences downloaded from the RDP database, 272 RDP sequences could not be assigned to any archaeal group in the backbone tree because of their unstable phylogenetic positions near the root of the domain Archaea. These sequences were assigned to ‘unclassified Archaea’ in our local database. Excluding the RDP sequences that had been properly classified as Euryarchaeota, Nanoarchaeota, or Korarchaeota at the phylum level in the RDP database (n = 6,163), we found that 2,419 RDP sequences (hereafter referred to as U-RDP sequences) formed clusters distinct from Crenarchaeota, Euryarchaeota, Nanoarchaeota, and Korarchaeota. All of these U-RDP sequences were derived from RDP taxa designated as either ‘unclassified Thermoprotei’ in the phylum Crenarchaeota (n = 1,596) or ‘unclassified Archaea’ (n = 823) (Table 1). A neighbor-joining (NJ) phylogenetic tree was constructed with these U-RDP sequences and the backbone sequences (Fig. 2); in this tree, 1,549 U-RDP sequences (hereafter referred to as T-RDP sequences) formed a tight monophyletic cluster with the MG-I sequences, supported by a high bootstrap score (92%). The monophyly of the T-RDP cluster was also confirmed in a maximum likelihood (ML) tree, with a bootstrap score of 95%. We assigned this monophyletic group to the phylum Thaumarchaeota, which comprised nine subgroups in this study (Table 1; Table S2). Six of these subgroups corresponded to previously recognized archaeal groups: MG-I, SAGMCG-I, SCG, FSCG (forest soil crenarchaeotic group), RC (rice cluster), and HWCG-III. The remaining three subgroups, comprising five sequences (AB050231, AB113628, EF444677, HM187528, and HM187541), were designated UT-I, UT-II, and UT-III (UT, unclassified Thaumarchaeota) in our local database, because their previous designations were unclear or unavailable.
Two UT-I sequences, AB050231 and AB113628, had previously been reported as SAGMCG-II sequences [48], [68], whereas the other U-RDP sequences belonging to SAGMCG-II merged into the MG-I cluster in our phylogenetic tree. Of the six subgroups corresponding to known archaeal groups, five (MG-I, SCG, FSCG, RC, and HWCG-III) appeared to be monophyletic (bootstrap scores >50%).
Figure 2

Phylogenetic positions of archaeal sequences in our local database.

Genetic distances were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining (NJ) algorithm. Numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values of 50% or higher. Open circles indicate nodes with bootstrap scores >50% in the tree inferred with the randomized accelerated maximum likelihood (RAxML) algorithm (GTR+CAT approximation). The arrow indicates the internal node corresponding to the phylum Thaumarchaeota. The scale bar represents the expected number of substitutions per nucleotide position.

The remaining U-RDP sequences (hereafter referred to as N-RDP sequences, n = 870) clustered into four independent groups whose affiliations with known archaeal phyla were not supported by high bootstrap scores (<50%) in either the NJ or the ML tree. Three of these N-RDP groups corresponded to previously reported archaeal groups, DSAG, MCG, and THSCG (terrestrial hot spring crenarchaeotic group), whereas the remaining N-RDP group was arbitrarily designated ‘unclassified group’ (UG) in this study. These UG sequences had also been designated as unclassified archaea in previous studies [42], [45], [48], [69], [70]. In addition to the tree topologies, the genetic distances between Thaumarchaeota and the four N-RDP sequence groups (0.283±0.028) were as great as those among the recognized archaeal phyla Crenarchaeota, Euryarchaeota, Nanoarchaeota, and Korarchaeota (inter-phylum genetic distance, 0.297±0.023) (Table 2). In contrast, the genetic distances among MG-I and its close sister groups (FSCG, HWCG-III, RC, SAGMCG-I, SCG, and UTs) (0.165±0.035) were significantly smaller (α = 0.05, t-test) than the average inter-phylum genetic distance, supporting our initial hypothesis that these groups belong to the phylum Thaumarchaeota.
Consequently, we classified MG-I, FSCG, HWCG-III, RC, SAGMCG-I, SCG, and UTs into Thaumarchaeota in our local database and considered DSAG, MCG, THSCG, and UG to be distinct phylum-level groups, not affiliated with any recognized archaeal phylum. Consistent with our results, Pester et al. [62] showed that Thaumarchaeota included the archaeal groups MG-I, SCG, HWCG-III, SAGMCG-I, and FSCG. Pester and colleagues also noted that DSAG and MCG are not clearly affiliated with any established archaeal phylum. In the final version of our local database, we noticed that sequences belonging to another recently proposed archaeal phylum, Aigarchaeota [71], were included in the phylum Crenarchaeota. No close relationships between aigarchaeotal sequences and T-RDP sequences were observed during our analysis.
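The three summary distances quoted above can be recomputed from the cell means in Table 2 as mean ± sample standard deviation: 0.297±0.023 from the six pairs among Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota; 0.283±0.028 from the Thaumarchaeota row against DSAG, MCG, THSCG, and UG; and 0.165±0.035 from all 36 pairwise means among the nine thaumarchaeotal subgroups. The values below are copied by hand from Table 2. The Welch t statistic is only one possible form of the t-test mentioned in the text; the exact variant used is an assumption.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

# Lower-left (mean) cells of Table 2, copied by hand.
# Cren-Eury, Cren-Kor, Cren-Nano, Eury-Kor, Eury-Nano, Kor-Nano.
INTER_PHYLUM = [0.302, 0.260, 0.277, 0.307, 0.318, 0.316]
# Thaumarchaeota row against DSAG, MCG, THSCG, and UG.
THAUM_VS_NRDP = [0.321, 0.275, 0.254, 0.283]
# All 36 pairs among FSCG, HWCG-III, MG-I, RC, SAGMCG-I, SCG, UT-I, UT-II, UT-III;
# each line holds one subgroup's distances to the subgroups listed before it.
THAUM_SUBGROUPS = [
    0.182,                                                   # HWCG-III
    0.234, 0.193,                                            # MG-I
    0.174, 0.176, 0.199,                                     # RC
    0.219, 0.171, 0.123, 0.166,                              # SAGMCG-I
    0.200, 0.136, 0.174, 0.175, 0.155,                       # SCG
    0.225, 0.166, 0.124, 0.191, 0.135, 0.118,                # UT-I
    0.229, 0.172, 0.146, 0.196, 0.157, 0.116, 0.170,         # UT-II
    0.200, 0.152, 0.146, 0.160, 0.095, 0.108, 0.127, 0.132,  # UT-III
]


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, rounded to three decimals."""
    return round(mean(values), 3), round(stdev(values), 3)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for a difference in means with unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


if __name__ == "__main__":
    for label, values in [("inter-phylum", INTER_PHYLUM),
                          ("Thaum vs N-RDP", THAUM_VS_NRDP),
                          ("among Thaum subgroups", THAUM_SUBGROUPS)]:
        print(label, summarize(values))
    print("Welch t:", welch_t(INTER_PHYLUM, THAUM_SUBGROUPS))
```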
Table 2

Genetic distances between archaeal taxa included in the local database (lower left half, mean genetic distance; upper right half, standard deviation)(a).

Taxa | Cren | Eury | Kor | Nano | DSAG | MCG | THSCG | UG | Thaum | FSCG | HWCG-III | MG-I | RC | SAGMCG-I | SCG | UT-I | UT-II | UT-III
Crenarchaeota (Cren) | – | 0.073 | 0.056 | 0.062 | 0.046 | 0.065 | 0.065 | 0.063 | 0.057 | 0.062 | 0.058 | 0.051 | 0.059 | 0.053 | 0.063 | 0.058 | 0.051 | 0.054
Euryarchaeota (Eury) | 0.302 | – | 0.051 | 0.064 | 0.038 | 0.047 | 0.052 | 0.041 | 0.030 | 0.032 | 0.041 | 0.024 | 0.034 | 0.033 | 0.034 | 0.031 | 0.029 | 0.032
Korarchaeota (Kor) | 0.260 | 0.307 | – | 0.018 | 0.016 | 0.023 | 0.024 | 0.027 | 0.024 | 0.030 | 0.023 | 0.074 | 0.059 | 0.064 | 0.076 | 0.077 | 0.099 | 0.075
Nanoarchaeota (Nano) | 0.277 | 0.318 | 0.316 | – | 0.013 | 0.013 | 0.014 | 0.024 | 0.024 | 0.029 | 0.014 | 0.014 | 0.006 | 0.011 | 0.010 | 0.007 | NA(b) | 0.003
DSAG | 0.341 | 0.347 | 0.362 | 0.415 | – | 0.023 | 0.012 | 0.014 | 0.016 | 0.022 | 0.016 | 0.016 | 0.017 | 0.016 | 0.014 | 0.013 | 0.012 | 0.017
MCG | 0.246* | 0.296 | 0.268 | 0.323 | 0.304 | – | 0.020 | 0.020 | 0.027 | 0.019 | 0.020 | 0.014 | 0.033 | 0.015 | 0.012 | 0.017 | 0.012 | 0.017
THSCG | 0.232* | 0.296 | 0.248* | 0.312 | 0.317 | 0.233* | – | 0.017 | 0.027 | 0.024 | 0.019 | 0.017 | 0.034 | 0.020 | 0.017 | 0.019 | 0.018 | 0.019
Unclassified group (UG) | 0.275 | 0.312 | 0.292 | 0.351 | 0.316 | 0.219* | 0.261 | – | 0.025 | 0.020 | 0.018 | 0.022 | 0.033 | 0.014 | 0.017 | 0.015 | 0.023 | 0.014
Thaumarchaeota (Thaum) | 0.308 | 0.342 | 0.304 | 0.391 | 0.321 | 0.275 | 0.254 | 0.283 | – |  |  |  |  |  |  |  |  |
FSCG | 0.288 | 0.332 | 0.309 | 0.367 | 0.338 | 0.255 | 0.234* | 0.287 |  | – | 0.027 | 0.026 | 0.034 | 0.025 | 0.028 | 0.026 | 0.026 | 0.026
HWCG-III | 0.239* | 0.296 | 0.245* | 0.326 | 0.305 | 0.225* | 0.198* | 0.246* |  | 0.182* | – | 0.015 | 0.010 | 0.014 | 0.016 | 0.015 | 0.016 | 0.015
MG-I | 0.321 | 0.348 | 0.316 | 0.402 | 0.323 | 0.288 | 0.268 | 0.292 |  | 0.234* | 0.193* | – | 0.010 | 0.015 | 0.013 | 0.010 | 0.009 | 0.010
RC | 0.279 | 0.319 | 0.287 | 0.375 | 0.338 | 0.238* | 0.228* | 0.264 |  | 0.174* | 0.176* | 0.199* | – | 0.077 | 0.067 | 0.068 | 0.073 | 0.071
SAGMCG-I | 0.294 | 0.334 | 0.285 | 0.376 | 0.323 | 0.262 | 0.242* | 0.274 |  | 0.219* | 0.171* | 0.123* | 0.166* | – | 0.015 | 0.014 | 0.008 | 0.021
SCG | 0.274 | 0.327 | 0.273 | 0.364 | 0.315 | 0.243* | 0.217* | 0.260 |  | 0.200* | 0.136* | 0.174* | 0.175* | 0.155* | – | 0.020 | 0.012 | 0.022
UT-I | 0.291 | 0.335 | 0.283 | 0.377 | 0.317 | 0.264 | 0.240* | 0.270 |  | 0.225* | 0.166* | 0.124* | 0.191* | 0.135* | 0.118* | – | 0.106 | 0.094
UT-II | 0.307 | 0.353 | 0.300 | 0.387 | 0.346 | 0.275 | 0.253 | 0.296 |  | 0.229* | 0.172* | 0.146* | 0.196* | 0.157* | 0.116* | 0.170* | – | 0.008
UT-III | 0.270 | 0.316 | 0.276 | 0.350 | 0.323 | 0.249* | 0.236* | 0.267 |  | 0.200* | 0.152* | 0.146* | 0.160* | 0.095* | 0.108* | 0.127* | 0.132* | –

(a) For the phyla Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota, the backbone sequences were used to estimate the genetic distances.
(b) Not available due to the insufficient number of comparisons.
* Genetic distance less than 0.25.
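The mean ± SD values quoted in the text can be recomputed from the lower half of Table 2. The following is a minimal sketch. It assumes the ± figures are sample standard deviations (n - 1 denominator) over the listed pairwise group means, which the text does not state explicitly. Under that assumption, the six distances among the four recognized phyla give the inter-phylum value of 0.297±0.023. The Thaumarchaeota row against the four N-RDP groups gives 0.283±0.028.

```python
from statistics import mean, stdev

# Mean genetic distances from the lower-left half of Table 2
inter_phylum = [
    0.302,  # Crenarchaeota-Euryarchaeota
    0.260,  # Crenarchaeota-Korarchaeota
    0.277,  # Crenarchaeota-Nanoarchaeota
    0.307,  # Euryarchaeota-Korarchaeota
    0.318,  # Euryarchaeota-Nanoarchaeota
    0.316,  # Korarchaeota-Nanoarchaeota
]
thaum_vs_nrdp = [
    0.321,  # Thaumarchaeota-DSAG
    0.275,  # Thaumarchaeota-MCG
    0.254,  # Thaumarchaeota-THSCG
    0.283,  # Thaumarchaeota-UG
]

def mean_sd(values):
    """Mean and sample standard deviation, rounded to three decimals."""
    return round(mean(values), 3), round(stdev(values), 3)

print(mean_sd(inter_phylum))   # (0.297, 0.023)
print(mean_sd(thaum_vs_nrdp))  # (0.283, 0.028)
```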

Using cluster analysis (cutoff level, 98% cophenetic similarity), we defined thaumarchaeotal phylotypes from the 1,549 thaumarchaeotal 16S rRNA gene sequences in the local database (T-RDP sequences; average length, 1,376.7±54.3 bases). In total, 114 phylotypes were observed (Table 3). Most of them (66.7%) belonged to the MG-I and SCG subgroups, which together contained most (93.9%) of the thaumarchaeotal sequences in our local database. The Thaumarchaeota rarefaction curves did not reach a plateau (Fig. 3), suggesting that, on a global scale, sequence sampling is still inadequate for accurately estimating the extent of diversity within the phylum Thaumarchaeota. However, the numbers of sequences per phylotype within the MG-I and SCG subgroups were much larger than those for the other subgroups (Table 3), indicating that most of the sequences currently assigned to these two subgroups belong to previously sampled phylotypes.
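Rarefaction curves such as those in Fig. 3 plot expected phylotype richness against the number of sequences sampled. A curve that is still rising at the full sample size means further sampling would keep revealing new phylotypes. This section does not state how Fig. 3 was computed, so the sketch below adopts one standard choice as an assumption: Hurlbert's analytical rarefaction, E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)]. Here N_i is the number of sequences in phylotype i and N is the total number of sequences. The phylotype counts in the example are hypothetical.

```python
from math import comb

def rarefy(counts, n):
    """Expected number of phylotypes in a random subsample of n sequences
    drawn without replacement (Hurlbert's rarefaction)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of sequences")
    all_subsets = comb(total, n)
    # Probability that phylotype i is absent from the subsample is
    # C(N - N_i, n) / C(N, n); math.comb returns 0 when n > N - N_i.
    return sum(1 - comb(total - c, n) / all_subsets for c in counts if c > 0)

def rarefaction_curve(counts, step=1):
    """(n, E[S_n]) points from 0 up to, and always including, the full sample size."""
    total = sum(counts)
    points = [(n, rarefy(counts, n)) for n in range(0, total + 1, step)]
    if points[-1][0] != total:
        points.append((total, rarefy(counts, total)))
    return points

# Hypothetical phylotype sizes: two abundant phylotypes and two singletons
counts = [5, 3, 1, 1]
for n, expected in rarefaction_curve(counts, step=2):
    print(n, round(expected, 2))
```

Exact big-integer binomials keep the formula stable even for the sample sizes in Table 3. Python's integer division returns a correctly rounded float when the ratio is at most 1.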
Table 3

Number of phylotypes and intra-group genetic distances of archaeal groups included in the local database.

Taxa | Intra-group genetic distance(a) (mean ± SD(b)) | No. of phylotypes | Average no. of sequences per phylotype
Crenarchaeota | 0.152±0.050 | –(c) | –
Euryarchaeota | 0.253±0.075 | – | –
Korarchaeota | 0.089(d) | – | –
Nanoarchaeota | NA(e) | – | –
DSAG | 0.086±0.059 | – | –
MCG | 0.146±0.043 | – | –
THSCG | 0.091±0.060 | – | –
UG | 0.124±0.036 | – | –
Thaumarchaeota | 0.153±0.054 | 114 | 14.3
FSCG | 0.108±0.028 | 10 | 1.7
HWCG-III | 0.097±0.027 | 11 | 2.3
MG-I | 0.083±0.022 | 49 | 25.0
RC | 0.082±0.033 | 3 | 2.3
SAGMCG-I | 0.074±0.022 | 9 | 4.3
SCG | 0.078±0.021 | 27 | 12.5
UT-I | 0.074(d) | 2 | 1.0
UT-II | NA | 1 | 1.0
UT-III | 0.048(d) | 2 | 1.0

(a) Intra-group genetic distances for Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota were estimated from the backbone sequences.
(b) Standard deviation.
(c) Not determined.
(d) Standard deviation not available.
(e) Not available due to the insufficient number of sequences.

Figure 3

Rarefaction curves for the phylum Thaumarchaeota and its subgroups.

Phylotypes were defined as operational taxonomic units (OTUs). (A) Phylum Thaumarchaeota and subgroups MG-I and SCG. (B) Subgroups FSCG, HWCG-III, RC, and SAGMCG-I.

Design and In silico Evaluation of the Thaumarchaeota-directed Primer

Taking into account both specificity for the intended target sequences and the thermodynamic propensity to form self-complementary structures (such as hairpins and primer-dimers), we developed a Thaumarchaeota-directed primer, henceforth referred to as THAUM-494, from the consensus sequence of all thaumarchaeotal phylotype sequences (Table 3; Table S3). The specificity of THAUM-494, defined in terms of its coverage and tolerance, was evaluated in silico against our local database. We defined coverage as the proportion of target-group sequences to which the primer binds, and tolerance as the proportion of non-target sequences to which it binds. Both parameters for THAUM-494 were compared with those of previously designed primers (or probes) targeting MG-I or mesophilic Crenarchaeota (Tables 4 and 5; Table S4). THAUM-494 showed >90% coverage for Thaumarchaeota and <1% tolerance to non-target taxa, indicating high specificity for the phylum Thaumarchaeota. All non-target sequences (n = 8) with regions complementary to THAUM-494 belonged to the class Thermoprotei of the phylum Crenarchaeota. Further examination of these sequences with our local Bayesian classifier revealed that all were closely related to the thaumarchaeotal subgroups SCG or HWCG-III. THAUM-494 covered each of the major subgroups MG-I, SCG, and SAGMCG-I at >90%. It did not bind sequences of FSCG and RC, two minor subgroups that together comprise only 1.5% of all thaumarchaeotal sequences.
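Coverage and tolerance, as defined above, are simple proportions of matched sequences. The sketch below computes them for a degenerate primer using exact, IUPAC-aware string matching. It is an illustration only, not the matching procedure applied to the local database or RDP's ProbeMatch. It assumes zero mismatches and that the primer and the database sequences are written in the same orientation; a reverse primer would first need to be reverse-complemented. The toy sequences used to exercise it are hypothetical.

```python
import re

# IUPAC nucleotide codes and the bases each one allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_pattern(primer: str) -> re.Pattern:
    """Compile a (possibly degenerate) primer into a regex, one character class per base."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer.upper()))

def percent_matched(primer: str, sequences) -> float:
    """Percentage of sequences containing at least one exact match to the primer."""
    seqs = [s.upper().replace("U", "T") for s in sequences]
    if not seqs:
        return 0.0
    pattern = primer_pattern(primer)
    hits = sum(1 for s in seqs if pattern.search(s))
    return 100.0 * hits / len(seqs)

def coverage_and_tolerance(primer, target_seqs, nontarget_seqs):
    """Coverage: % of target-group sequences matched.
    Tolerance: % of non-target sequences matched."""
    return percent_matched(primer, target_seqs), percent_matched(primer, nontarget_seqs)

THAUM_494 = "GAATAAGGGGTGGGCAAGT"
targets = ["CCGAATAAGGGGTGGGCAAGTCC", "GAATAAGGGGTGGGCAAGT", "GAATAAGGCGTGGGCAAGT"]
others = ["ACGTACGTACGTACGTACGT"]
print(coverage_and_tolerance(THAUM_494, targets, others))
```

With these toy inputs, two of the three target sequences carry an exact match, so coverage is about 66.7% and tolerance is 0%. A real evaluation would also need a mismatch policy, which strongly affects both numbers.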
Table 4

Primers used for PCR and in silico evaluation of the specificity.

Primer | Target taxon | Sequence (5′→3′) | Sequence position (E. coli) | Sequence position (M. jannaschii) | %GC | No. of degenerate sites | Rating(a) | Tm(a) | Hairpin ΔG(a) | Dimer ΔG(a) | Reference
THAUM-494 | Phylum Thaumarchaeota | GAATAAGGGGTGGGCAAGT | 494–511 | 435–453 | 52.6 | 0 | 100 | 56.4 | 0.0 | 0.0 | This study
CREN512 | Phylum Crenarchaeota | CTGGTGTCAGCCGCCG | 512–527 | 454–469 | 75.0 | 0 | 100 | 59.1 | 0.0 | 0.0 | Jürgens et al., 2000
542F | Phylum Crenarchaeota | CGCGGTAATACCAGCYC | 526–542 | 468–484 | 62.5 | 1 | 81 | 52.2 | 0.0 | –10.4 | Hershberger et al., 1996
Cren745a | Phylum Crenarchaeota | GGTGAGGGATGAAAGCTGGG | 755–774 | 696–717 | 60.0 | 0 | 86 | 61.6 | 0.0 | –7.3 | Simon et al., 2000
Cren518R | Phylum Crenarchaeota | TCAGCCGCCGCGGTAAWACCAGC | 518–541 | 460–482 | 68.2 | 1 | 68 | 74.7 | –1.5 | –16.5 | Perevalova et al., 2003
GI-554 | Group MG-I | AGGAKGATTATTGGGCCTAA | 554–573 | 496–515 | 42.1 | 1 | 79 | 51.6 | –1.2 | –10.3 | Massana et al., 1997
GI_751F | Group MG-I | GTCTACCAGAACAYGTTC | 734–751 | 672–692 | 47.1 | 1 | 89 | 36.2 | 0.0 | –5.9 | Gubry-Rangin et al., 2010
771F | Group MG-I | ACGGTGAGGGATGAAAGCT | 753–771 | 694–712 | 52.6 | 0 | 86 | 56.7 | 0.0 | –7.3 | Ochsenreiter et al., 2003
GI_956R | Group MG-I | CAATTGGAGTCAACGCCD | 957–974 | 903–920 | 52.9 | 0 | 83 | 51.6 | 0.0 | –9.3 | Beman et al., 2008
957R | Group MG-I | CAATTGGAGTCAACGCCG | 957–974 | 903–920 | 55.6 | 0 | 83 | 57.7 | 0.0 | 9.3 | Ochsenreiter et al., 2003
MCGI-554r(b) | Group MG-I | CAGCACCTCAAGTGGTCA | 537–554 | 479–496 | 55.6 | 0 | 88 | 51.8 | –1.1 | –5.4 | Auguet et al., 2012
MCGI-391f | Group MG-I | AAGGTTARTCCGAGTGRTTTC | 391–422 | 376–396 | 42.1 | 2 | 100 | 48.8 | 0.0 | 0.0 | Auguet et al., 2012
333Fa(b) | Domain Archaea | TCCAGGCCCTACGGG | 334–348 | 320–333 | 73.3 | 0 | 80 | 54.3 | –1.9 | –9.3 | Baker et al., 2003
Arch338F | Domain Archaea | GGCCCTAYGGGGYGCASCAGGC | 338–359 | 324–344 | 79.0 | 3 | 63 | 68.3 | –4.8 | –16.4 | Kublanov et al., 2009
340RA | Domain Archaea | CCRGGCCCTACGGGG | 335–349 | 321–334 | 85.7 | 1 | 77 | 57.6 | –3.4 | –9.8 | Barns et al., 1994
A340F | Domain Archaea | CCCTACGGGGYGCASCAG | 340–357 | 326–342 | 75.0 | 2 | 97 | 56.9 | –1.7 | 0.0 | Vetriani et al., 1999
ARC349F | Domain Archaea | GYGCASCAGKCGMGAAW | 349–365 | 334–350 | 66.7 | 5 | 100 | 39.4 | 0.0 | 0.0 | Takai and Horikoshi, 2000
EK510R | Domain Archaea | AAGGGCYGGGCAAG | 498–510 | 439–452 | 69.2 | 1 | 100 | 48.9 | 0.0 | 0.0 | Baker et al., 2003
ARC516 | Domain Archaea | TGYCAGCCGCCGCGGTAAHACCVGC | 516–541 | 458–482 | 72.7 | 3 | 55 | 78.9 | –10.8 | –16.5 | Takai and Horikoshi, 2000
PARCH519F | Domain Archaea | CAGCMGCCGCGGTAA | 519–533 | 461–475 | 71.4 | 1 | 68 | 55.2 | 0.0 | –17.5 | Øvreås et al., 1997
A751F | Domain Archaea | CCGACGGTGAGRGRYGAA | 750–767 | 691–708 | 66.7 | 3 | 100 | 51.8 | 0.0 | 0.0 | Baker et al., 2003
744RA | Domain Archaea | GGATTAGATACCCSGG | 785–800 | 726–741 | 53.3 | 1 | 80 | 40.9 | 0.0 | –10.8 | Barns et al., 1994
ARC806R | Domain Archaea | ATTAGATACCCSBGTAGTCC | 787–806 | 728–747 | 44.4 | 2 | 100 | 43.0 | 0.0 | 0.0 | Takai and Horikoshi, 2000
765FA | Domain Archaea | TAGATACCCSSGTAGTCC | 789–806 | 730–747 | 50.0 | 2 | 100 | 37.7 | 0.0 | 0.0 | Barns et al., 1994
Ar9R | Domain Archaea | GAAACTTAAAGGAATTGGCGGG | 906–927 | 851–872 | 45.5 | 0 | 90 | 62.2 | 0.0 | –5.4 | Jürgens et al., 2000
ARC915R(b) | Domain Archaea | AGGAATTGGCGGGGGAGCAC | 915–934 | 860–879 | 65.0 | 0 | 90 | 67.8 | 0.0 | –5.4 | Casamayor et al., 2000
ARC917R | Domain Archaea | GAATTGGCGGGGGAGCAC | 915–934 | 860–879 | 66.7 | 0 | 90 | 63.2 | 0.0 | –5.4 | Loy et al., 2002
Arch958R(b) | Domain Archaea | AATTGGAKTCAACGCCGGR | 958–975 | 904–921 | 52.9 | 2 | 80 | 56.9 | 0.0 | –10.8 | DeLong, 1992
1017FAR(b) | Domain Archaea | GAGAGGWGGTGCATGGCC | 1044–1060 | 982–999 | 70.6 | 1 | 81 | 58.6 | 0.0 | –10.3 | Barns et al., 1994
D34 | Domain Archaea | CAGGCAACGAGCGAGACC | 1096–1113 | 1035–1052 | 66.7 | 0 | 100 | 59.5 | 0.0 | 0.0 | Arahal et al., 1996
A1098F | Domain Archaea | GGCAACGAGCGMGACCC | 1098–1114 | 1037–1053 | 75.0 | 1 | 100 | 59.7 | 0.0 | 0.0 | Baker et al., 2003
A1115R | Domain Archaea | CAACGAGCGAGACCC | 1100–1114 | 1039–1053 | 66.7 | 0 | 100 | 48.5 | 0.0 | 0.0 | Baker et al., 2003

(a) Calculated using NetPrimer (http://www.premierbiosoft.com/netprimer). Tm was estimated using the nearest-neighbor method implemented in NetPrimer.
(b) Primers MCGI-554r, 333Fa, ARC915R, Arch958R, and 1017FAR are identical to CREN537, A333F, A934R, A976R, and A1040F, respectively.
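The %GC and degenerate-site columns can be recomputed from the primer sequences. The tabulated %GC values appear consistent with counting G and C over the non-degenerate positions only. For example, 542F has 10 G/C among 16 definite bases (62.5%), and ARC349F has 8 among 12 (66.7%). The sketch below adopts that convention as an assumption. It also reports how many distinct oligonucleotides a degenerate primer represents. Tm and ΔG came from NetPrimer's nearest-neighbor model and are not reproduced here.

```python
from math import prod

# How many bases each IUPAC nucleotide code stands for
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1,
              "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
              "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}

def primer_summary(primer: str) -> dict:
    """Length, %GC over non-degenerate positions, number of degenerate sites,
    and number of distinct oligonucleotides the primer represents."""
    seq = primer.upper()
    definite = [base for base in seq if DEGENERACY[base] == 1]
    gc = (100.0 * sum(base in "GC" for base in definite) / len(definite)
          if definite else float("nan"))
    return {
        "length": len(seq),
        "gc_percent": round(gc, 1),
        "degenerate_sites": len(seq) - len(definite),
        "variants": prod(DEGENERACY[base] for base in seq),
    }

print(primer_summary("GAATAAGGGGTGGGCAAGT"))  # THAUM-494
print(primer_summary("GYGCASCAGKCGMGAAW"))    # ARC349F
```

The variant count shows why highly degenerate primers such as ARC349F (five twofold sites, 32 variants) dilute the effective concentration of any single matching oligonucleotide. THAUM-494 is a single, non-degenerate sequence.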

Table 5

In silico evaluation (percent matched 16S rRNA gene sequences in the target taxon) of the specificity of Thaumarchaeota- and Crenarchaeota-directed primers.

Taxa | No. of sequences used for evaluation | Thaumarchaeota-directed primer: THAUM-494 | Crenarchaeota-directed primers/probes: 542F, CREN512, Cren745a, Cren518R | MG-I-directed primers/probes: GI554, 771F, 957R, GI751F, GI956R, MCGI391f (MGI391), MCGI554r (Cren537)
Crenarchaeota 8720.9 86.4 90.0 11.0 86.5 10.9 0.6 51.0
Euryarchaeota 6,072≈0≈0≈0≈0
Korarchaeota 88
Nanoarchaeota 3
Thaumarchaeota 1,549 92.9 a 1.4 96.6 94.3 95.9 65.494.622.429.695.435.865.5
FSCG17 82.4 88.2 76.5 82.4 76.5 94.1
HWCG-III2536.0 76.0 100 96.0 100 76.0
MG-I1,100 96.6 97.3 94.8 95.7 92.1 95.2 0.141.8 96.5 50.4 92.3
RC7 100 100 85.7 100 85.7 100
SAGMCG-I40 100 97.5 85.0 97.5 87.5 10.0 97.5
SCG355 90.7 96.3 94.1 96.6 94.1 96.1 93.0
UT-I2 100 100 100 100 100 50.0 100
UT-II1 100 100 100 100 100
UT-III2 100 100 100 100 100 100
DSAG326
THSCG50 92.0 94.0 92.0 2.0
MCG483 88.2 89.6 27.3 88.0 0.2 28.0 0.2 25.5
UG12 100
Unclassified Archaea 272 41.9 44.9 41.2 42.3 41.2 0.4 41.3
Domain Bacteria b 667,899

Coverage values of more than 80% for the target taxa are in bold, and tolerance values of more than 1% to the non-target taxa are underlined.

Estimation using RDP’s ProbeMatch.

In addition to THAUM-494, we used our local database to evaluate the specificity of previously designed primers (or probes) targeting Thaumarchaeota-related taxa (Table 5; Table S4). The first oligonucleotide specifically designed to target one of the thaumarchaeotal subgroups was the probe GI-554 [72], developed in 1997 for MG-I (crenarchaeal group I in the original paper). Although few MG-I sequences were available at that time, GI-554 showed satisfactory coverage (92.1%) for MG-I and extremely low (0.2%) non-target binding in our in silico analysis. As intended, however, GI-554 bound only to MG-I sequences and did not bind to sequences in other thaumarchaeotal subgroups. Several years later, three primer pairs, 771F-957R [7], [33], [35], [73], GI751F-GI956R [31], [74], [75], and MCGI391f-MCGI554r (identical to MGI391 and Cren537) [30], [36], [76], were developed to amplify the 16S rRNA gene sequences of terrestrial crenarchaeota, nitrifying archaea, and MG-I, respectively. Our in silico results showed that 771F had relatively high coverage (94.6%) for Thaumarchaeota but also high tolerance to Crenarchaeota (10.9%) as well as to other non-target groups (>28.0%). Its reverse partner, 957R, covered only SCG sequences (96.1%) and exhibited low tolerance (<1%) to non-thaumarchaeotal sequences. Similarly, GI-751F bound 41.8% of all MG-I sequences but did not cover other thaumarchaeotal subgroups.
GI-956R showed high coverage (95.4%) for Thaumarchaeota but was also highly tolerant to Crenarchaeota (tolerance, 51.0%). Primer MCGI391f (MGI391) showed a specificity similar to that of GI-751F, with slightly higher coverage (50.4%) for MG-I. The coverage and tolerance of MCGI554r (Cren537) were very similar to those of GI-554. Primers 542F, CREN512, Cren745a, and Cren518R were originally designed to target mesophilic Crenarchaeota at a time when the current thaumarchaeotal subgroups (MG-I, SCG, and FSCG) were considered part of Crenarchaeota [77]–[80]. These primers had high coverage not only for Crenarchaeota but also for Thaumarchaeota. In summary, the previously designed primers with high coverage for the phylum Thaumarchaeota showed high tolerance to non-thaumarchaeotal taxa, whereas those with low tolerance to non-thaumarchaeotal taxa showed low coverage for Thaumarchaeota.

Empirical Evaluation of the Thaumarchaeota-directed Primer

To empirically evaluate the specificity of THAUM-494, we constructed clone libraries from PCR products obtained with THAUM-494 (Table 6). One of two universal archaeal primers, ARC917R [60] or 1017FAR [59], was used as the reverse primer, and environmental DNA extracted from soil and marine sediment samples served as the PCR template. The in silico coverage of ARC917R and 1017FAR for Thaumarchaeota was 96.1% and 82.2%, respectively (Table 7 and Table 8). Among the primer combinations (THAUM-494 with a universal primer) that yield PCR amplicons >400 bp, the pair THAUM-494-ARC917R showed the highest coverage (89.3%) for Thaumarchaeota (Table 8). Five clone libraries were constructed for each primer pair. The phylogenetic positions of the cloned sequences were determined from phylogenetic trees built with reference sequences (Fig. S2–S6) and confirmed with our local Bayesian classifier, which was built using the local database sequences.
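Coverage for a primer pair is stricter than for either primer alone. A sequence counts only if both primers match it, so pair coverage can never exceed the lower of the two single-primer coverages. For example, THAUM-494-ARC917R reaches 89.3%, compared with 96.1% for ARC917R alone. The minimal sketch below assumes per-primer match results are already available as sets of sequence identifiers; the identifiers are hypothetical. It ignores amplicon length and mismatch rules.

```python
def single_coverage(hits: set, group: set) -> float:
    """Percent of a group's sequences matched by one primer."""
    return 100.0 * len(hits & group) / len(group) if group else 0.0

def pair_coverage(forward_hits: set, reverse_hits: set, group: set) -> float:
    """Percent of a group's sequences matched by both primers of a pair."""
    if not group:
        return 0.0
    return 100.0 * len(forward_hits & reverse_hits & group) / len(group)

# Hypothetical example: ten target sequences with IDs 0-9
group = set(range(10))
specific_hits = set(range(9))        # matched by the group-specific primer
universal_hits = set(range(1, 10))   # matched by the universal primer
print(single_coverage(specific_hits, group),
      single_coverage(universal_hits, group),
      pair_coverage(specific_hits, universal_hits, group))  # 90.0 90.0 80.0
```

Two primers that each reach 90% can still give only 80% as a pair when they miss different sequences. This is why the pairing choice matters for subgroup recovery.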
Table 6

Empirical evaluation of the specificity (percentage of cloned sequences belonging to the phylum Thaumarchaeota and its subgroups) of primer THAUM-494.

Sample site | Sample type | Reverse primer | Phylum Thaumarchaeota | Thaumarchaeotal subgroups: MG-I, SCG, SAGMCG-I, HWCG-III, FSCG, RC, UTs(a) | No. of clones analyzed
Hillgate, CaliforniaWoodlandARC917R96.31.295.181
1017FAR10071.628.4102
Incheon, KoreaPaddy soilARC917R98.95.568.125.391
1017FAR98.93.31.171.123.390
Jeju, KoreaMarine sedimentARC917R90.987.91.51.566
1017FAR85.480.51.23.782
Kellerberrin, AustraliaWoodlandARC917R92.311.080.21.191
1017FAR10096.43.683
La Campana, ChileWoodlandARC917R98.92.196.894
1017FAR10097.52.579
TotalARC917R95.5±3.7b 423
1017FAR96.9±6.4436

UT-I, -II, and -III.

Standard deviation.
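The totals row of Table 6 can be checked against the per-library values. This sketch assumes the ± figure is the sample standard deviation across the five libraries, since the text does not say which estimator was used. Under that assumption, it reproduces both the percentages and the clone counts: 423 clones for THAUM-494-ARC917R and 436 for THAUM-494-1017FAR.

```python
from statistics import mean, stdev

# (% of clones affiliated with Thaumarchaeota, no. of clones analyzed), from Table 6
libraries = {
    "ARC917R": {"Hillgate": (96.3, 81), "Incheon": (98.9, 91), "Jeju": (90.9, 66),
                "Kellerberrin": (92.3, 91), "La Campana": (98.9, 94)},
    "1017FAR": {"Hillgate": (100.0, 102), "Incheon": (98.9, 90), "Jeju": (85.4, 82),
                "Kellerberrin": (100.0, 83), "La Campana": (100.0, 79)},
}

def summarize(sites):
    """Mean and sample SD of the per-library percentages, plus total clones."""
    percents = [pct for pct, _ in sites.values()]
    clones = sum(n for _, n in sites.values())
    return round(mean(percents), 1), round(stdev(percents), 1), clones

for primer, sites in libraries.items():
    print(primer, summarize(sites))
# ARC917R (95.5, 3.7, 423)
# 1017FAR (96.9, 6.4, 436)
```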

Table 7

In silico evaluation (percent matched 16S rRNA gene sequences) of the specificity of primer pairs (THAUM-494-universal primer).

Taxa | No. of sequences used for evaluation | Universal primers: 333Fa (A333F) | Arch338F | 340RA | A340F | ARC349F | EK510R | ARC516 | PARCH519F | A751F | 744RA
Crenarchaeota 8720 (8.4)a 0.6 (74.7)0 (85.7 b)0 (86.9)0.7 (71.9)0 (0.1)0.9 (94.6)0.9 (96.8)0.3 (90.0)0.9 (49.8)
Euryarchaeota 6,0720 (55.2)0 (78.1)0 (81.9)0 (91.2)0 (86.6)0 (46.3)0 (81.1)0 (96.8)0 (32.9)0 (93.0)
Korarchaeota 880 (22.7)0 (9.1)0 (25.0)0 (19.3)0 (26.1)0 (93.2)0 (92.0)0 (96.6)
Nanoarchaeota 30 (33.0)
Thaumarchaeota 1,5490 (0.1) 85.9 b (90.1)0.1 (0.3)0.1 (0.1) 91.0 (97.2) 89.6 (95.9) 91.1 (97.4)35.6 (36.9) 90.5 (97.0)
FSCG170 (11.8)0 (11.8)0 (82.4)0 (82.4)0 (82.4)0 (82.4)
HWCG-III2524.0 (56.0)36.0 (96.0)36.0 (96.0)36.0 (36.0)32.0 (96.0)
MG-I1,100 88.6 (91.8) 94.7 (98.0) 92.8 (95.9) 94.7 (97.8)49.4 (51.1) 94.5 (97.5)
RC70 (100)0 (100)0 (100)0 (100)
SAGMCG-I40 95.0 (95.0) 97.5 (97.5) 97.5 (97.5) 97.5 (97.5) 95.0 (95.0)
SCG355 88.2 (96.3)0.6 (0.6)0.6 (0.6) 89.3 (98.0) 88.5 (96.3) 89.0 (96.9) 87.9 (96.3)
UT-I2 100 (100) 100 (100) 100 (100) 100 (100) 100 (100)
UT-II1 100 (100) 100 (100) 100 (100) 100 (100) 100 (100)
UT-III2 100 (100) 100 (100) 100 (100) 100 (100) 100 (100)
DSAG3260 (3.7)0 (9.5)0 (9.5)0 (20.6)0 (95.1)0 (96.6)0 (5.5)0 (92.9)
THSCG500 (74.0)0 (88.0)0 (92.0)0 (90.0)0 (94.0)0 (92.0)0 (96.0)0 (94.0)0 (98.0)
MCG4830 (20.9)0 (11.8)0 (86.1)0 (86.1)0 (68.5)0 (90.7)0 (94.0)0 (37.5)0 (88.4)
UG120 (25.0)0 (100)0 (83.3)0 (100)0 (100)0 (83.3)0 (91.7)0 (58.3)
Unclassified Archaea 2720 (35.3)0 (55.9)0 (47.8)0 (77.6)0 (86.0)0 (50.7)0 (86.8)0 (28.7)0 (89.0)
Domain Bacteria c 667,899(92.3 b)

Coverage values of universal primer.

Coverage values of more than 80% for the target taxa are in bold, and tolerance values of more than 1% to the non-target taxa are underlined.

Estimation using RDP’s ProbeMatch.

Table 8

In silico evaluation (percent matched 16S rRNA gene sequences) of the specificity of primer pairs (THAUM-494-universal primer).

Taxa | No. of sequences used for evaluation | Universal primers: ARC806R | 765FA | Ar9R | ARC915R | ARC917R | Arch958R | 1017FAR (A1040F) | D34 | A1098F | A1115R
Crenarchaeota 8720.9 (85.1 b)a 0.9 (53.7)0.9 (92.3)0.9 (75.5)0.9 (76.3)0.8 (51.9)0.9 (74.8)0.1 (26.3)0.1 (61.6)0.1 (61.8)
Euryarchaeota 6,0720 (91.5)0 (91.8)0 (91.5)0 (91.2)0 (91.8)0 (45.3)0 (84.0)0 (59.2)0 (58.8)0 (59.1)
Korarchaeota 880 (93.2)0 (93.2)0 (92.0)0 (92.0)0 (92.0)
Nanoarchaeota 3
Thaumarchaeota 1,549 89.5 (95.9) 89.9 (96.3) 89.2 (95.7) 89.2 (95.9) 89.3 (96.1)20.3 (23.4)76.0 (82.2)0.3 (1.7)0.3 (1.7)0.3 (1.7)
FSCG170 (82.4)0 (82.4)0 (88.2)0 (100)0 (100)0 (100)0 (100)0 (100)0 (100)
HWCG-III2528.0 (88.0)28.0 (88.0)32.0 (96.0)32.0 (92.0)32.0 (92.0)36.0 (100)36.0 (100)12.0 (12.0)12.0 (12.0)12.0 (12.0)
MG-I1,100 93.3 (96.3) 93.5 (96.5) 93.4 (96.5) 92.7 (95.7) 92.9 (96.0)0.4 (0.4)73.5 (75.6)0.2 (0.2)0.2 (0.2)0.2 (0.2)
RC70 (100)0 (100)0 (100)0 (100)0 (100)0 (57.1)0 (100)0 (57.1)0 (57.1)0 (57.1)
SAGMCG-I40 95.0 (95.0) 95.0 (95.0) 97.5 (97.5) 97.5 (97.5) 97.5 (97.5)7.5 (7.5) 100 (100)
SCG355 87.3 (96.1) 88.2 (96.9) 85.6 (93.8) 87.6 (96.3) 87.6 (96.3) 83.4 (91.5) 88.7 (97.7)
UT-I2 100 (100) 100 (100) 100 (100)50.0 (50.0)50.0 (50.0) 100 (100) 100 (100)
UT-II1 100 (100) 100 (100) 100 (100) 100 (100) 100 (100)
UT-III2 100 (100) 100 (100) 100 (100) 100 (100) 100 (100) 100 (100)
DSAG3260 (91.4)0 (91.7)0 (12.3)0 (92.9)0 (93.3)0 (0.9)0 (84.7)0 (3.1)0 (3.1)0 (3.1)
THSCG500 (100)0 (98.0)0 (92.0)0 (68.0)0 (68.0)0 (60.0)0 (94.0)0 (70.0)0 (70.0)0 (70.0)
MCG4830 (88.0)0 (88.2)0 (93.8)0 (92.3)0 (93.2)0 (76.0)0 (88.6)0 (1.2)0 (1.9)0 (1.9)
UG120 (58.3)0 (58.3)0 (91.7)0 (91.7)0 (91.7)0 (91.7)0 (91.7)
Unclassified Archaea 2720 (88.6)0 (89.3)0 (84.6)0 (76.1)0 (78.3)0 (48.9)0 (43.4)0 (36.4)0 (42.3)0 (62.1)
Domain Bacteria c 667,899(4.5 b)(3.7)

Coverage values of universal primer.

Coverage values of more than 80% for the target taxa are in bold, and tolerance values of more than 1% to the non-target taxa are underlined.

Estimation using RDP’s ProbeMatch.

Both primer pairs showed satisfactory specificity under experimental conditions, as predicted by our in silico analysis. Phylogenetic analyses of 423 and 436 clones from the libraries constructed with THAUM-494-ARC917R and THAUM-494-1017FAR, respectively, showed that 95.5±3.7% and 96.9±6.4% of the clones belonged to the phylum Thaumarchaeota (Table 6). The most frequently sampled thaumarchaeotal subgroups were SCG, MG-I, and SAGMCG-I. THAUM-494 showed very low non-target binding under the PCR conditions used in this study, and most of the non-target sequences amplified with either primer pair belonged to the MCG group. Interestingly, a considerable proportion of unclassified thaumarchaeotal sequences (UT sequences, 23–28%) was recovered from the Hillgate woodland soil and the Incheon paddy soil samples. The phylogenetic positions of these sequences within Thaumarchaeota were not resolved by our subgrouping scheme. These results suggest that Thaumarchaeota may harbor additional, as-yet-undiscovered diversity that could be explored further with THAUM-494. The proportions of thaumarchaeotal subgroups in each clone library varied slightly with the universal primer used. For example, MG-I sequences were not detected in the Kellerberrin sample when THAUM-494 was paired with 1017FAR, whereas they comprised 11% of the amplified sequences when THAUM-494 was paired with ARC917R.
We attribute such variations in subgroup detection to differences in each universal primer's coverage of the individual thaumarchaeotal subgroups. Our in silico analysis (Table 8) indicated that 1017FAR had lower coverage for MG-I (75.6%) than ARC917R (96.0%). Hence, pairing THAUM-494 with an appropriate universal primer is important for unbiased sampling of thaumarchaeotal diversity. We tested only two universal primers experimentally to determine whether the choice of universal primer affects post-PCR sequence analysis. However, we also estimated in silico the coverage and tolerance of other well-known archaeal universal primers across the archaeal taxa (Table 7 and Table 8; Table S5–S7). These results can therefore serve as a guideline for researchers selecting primer pairs for their particular applications.

Concluding Remarks

Our knowledge of the phylum Thaumarchaeota, particularly of its ecological niche and diversity, is expanding rapidly but remains limited. Because Thaumarchaeota are globally distributed and abundant [2], [62], these archaea likely play a crucial role in sustaining species diversity and maintaining geochemical cycles. Physiological, molecular, and ecological surveys have been undertaken to better understand this phylum, and as a result a large number of thaumarchaeotal 16S rRNA gene sequences have been deposited in public databases. However, our rarefaction analyses show that thaumarchaeotal phylotype richness has not yet reached a plateau, suggesting that this phylum may have a much wider phylogenetic breadth than currently estimated. To facilitate comprehensive exploration of the diversity and ecological role of Thaumarchaeota, we developed THAUM-494, to the best of our knowledge the first phylum-level primer for Thaumarchaeota. Its high coverage and low tolerance make THAUM-494 especially useful for estimating the phylogenetic diversity and determining the distribution patterns of Thaumarchaeota (e.g., in high-throughput metagenome sequencing and real-time PCR assays). Furthermore, this primer should be a valuable tool for understanding the ecological niche of Thaumarchaeota.

Supporting Information

Backbone phylogenetic tree. The phylogenetic distances between sequences were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining algorithm. The numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values at or above 50%. The scale bar represents the expected number of substitutions per nucleotide position. (PDF)

Phylogenetic positions of cloned sequences recovered from Kellerberrin, Australia. A, primer pair THAUM-494-ARC917R; B, primer pair THAUM-494-1017FAR. The phylogenetic distances between sequences were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining algorithm. The numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values at or above 50%. The scale bar represents the expected number of substitutions per nucleotide position. (PDF)

Phylogenetic positions of cloned sequences recovered from Hillgate, California. A, primer pair THAUM-494-ARC917R; B, primer pair THAUM-494-1017FAR. The phylogenetic distances between sequences were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining algorithm. The numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values at or above 50%. The scale bar represents the expected number of substitutions per nucleotide position. (PDF)

Phylogenetic positions of cloned sequences recovered from La Campana, Chile. A, primer pair THAUM-494-ARC917R; B, primer pair THAUM-494-1017FAR. The phylogenetic distances between sequences were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining algorithm. The numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values at or above 50%. The scale bar represents the expected number of substitutions per nucleotide position. (PDF)

Phylogenetic positions of cloned sequences recovered from Incheon, Korea. A, primer pair THAUM-494-ARC917R; B, primer pair THAUM-494-1017FAR. The phylogenetic distances between sequences were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining algorithm. The numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values at or above 50%. The scale bar represents the expected number of substitutions per nucleotide position. (PDF)

Phylogenetic positions of cloned sequences recovered from Jeju, Korea. A, primer pair THAUM-494-ARC917R; B, primer pair THAUM-494-1017FAR. The phylogenetic distances between sequences were calculated using the Jukes-Cantor model, and the tree was constructed using the neighbor-joining algorithm. The numbers at the nodes indicate bootstrap scores (as percentages) and are shown only for values at or above 50%. The scale bar represents the expected number of substitutions per nucleotide position. (PDF)

Sequences used for the backbone phylogenetic tree. (PDF)

Thaumarchaeotal sequences included in the local database. (PDF)

Primers designed in this study and their thermodynamic properties. (PDF)

Previously designed Crenarchaeota-directed primers and MG-I-directed primers not included in Table 4. (PDF)

Archaeal universal primers not included in Table 4. (PDF)

In silico evaluation of the specificity of the primers not included in Table 7 and Table 8 (local database). (PDF)

In silico evaluation of the specificity of the primers not included in Table 7 and Table 8 (RDP database). (PDF)
References (78 in total)

1. C Vetriani, H W Jannasch, B J MacGregor, D A Stahl, A L Reysenbach. Population structure and phylogenetic characterization of marine benthic Archaea in deep-sea sediments. Appl Environ Microbiol. 1999-10.

2. Man-Young Jung, Soo-Je Park, Deullae Min, Jin-Seog Kim, W Irene C Rijpstra, Jaap S Sinninghe Damsté, Geun-Joong Kim, Eugene L Madsen, Sung-Keun Rhee. Enrichment and characterization of an autotrophic ammonia-oxidizing archaeon of mesophilic crenarchaeal group I.1a from an agricultural soil. Appl Environ Microbiol. 2011-10-14.

3. T Z DeSantis, P Hugenholtz, N Larsen, M Rojas, E L Brodie, K Keller, T Huber, D Dalevi, P Hu, G L Andersen. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006-07.

4. Simonetta Gribaldo, Celine Brochier-Armanet. The origin and evolution of Archaea: a state of the art. Philos Trans R Soc Lond B Biol Sci. 2006-06-29.

5. Alexandros Stamatakis. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006-08-23.

6. Katelyn A Nelson, Nicole S Moin, Anne E Bernhard. Archaeal diversity and the prevalence of Crenarchaeota in salt marsh sediments. Appl Environ Microbiol. 2009-04-24.

7. Graeme W Nicol, Sven Leininger, Christa Schleper, James I Prosser. The influence of soil pH on the diversity, abundance and transcriptional activity of ammonia oxidizing archaea and bacteria. Environ Microbiol. 2008-08-14.

8. Jin-Kyung Hong, Jae-Chang Cho. High level of bacterial diversity and novel taxa in continental shelf sediment. J Microbiol Biotechnol. 2012-06.

9. Hong J Di, Keith C Cameron, Ju-Pei Shen, Chris S Winefield, Maureen O'Callaghan, Saman Bowatte, Ji-Zheng He. Ammonia-oxidizing bacteria and archaea grow under contrasting soil nitrogen conditions. FEMS Microbiol Ecol. 2010-03-08.

10. C Schleper, E F DeLong, C M Preston, R A Feldman, K Y Wu, R V Swanson. Genomic analysis reveals chromosomal variation in natural populations of the uncultured psychrophilic archaeon Cenarchaeum symbiosum. J Bacteriol. 1998-10.
Cited by (2 in total)

1. Jin-Kyung Hong, Jae-Chang Cho. Environmental Variables Shaping the Ecological Niche of Thaumarchaeota in Soil: Direct and Indirect Causal Effects. PLoS One. 2015-08-04.

2. Jin-Kyung Hong, Hye-Jin Kim, Jae-Chang Cho. Correction: Novel PCR Primers for the Archaeal Phylum Thaumarchaeota Designed Based on the Comparative Analysis of 16S rRNA Gene Sequences. PLoS One. 2017-04-11.
