Mark A Ragan, Guillaume Bernard, Cheong Xin Chan.
Abstract
From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on $D_2$ statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today.
Keywords: 16S ribosomal RNAs; k-mers; molecular phylogenetics; oligomers
Year: 2014 PMID: 24572375 PMCID: PMC4008546 DOI: 10.4161/rna.27505
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
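
The abstract says that e-catalogs are generated computationally from full-length rRNA sequences but does not spell out the procedure. The sketch below is one plausible reading, not the authors' implementation. It assumes that an e-catalog emulates an RNase T1 digest (cleavage after every G) and keeps only oligomers of length 6 or more, the smallest length class listed in Table 2. The function names `t1_digest` and `e_catalog` and the `min_len` parameter are hypothetical.

```python
def t1_digest(rna: str, min_len: int = 6) -> list[str]:
    """Split a sequence after every G, emulating RNase T1 cleavage.

    Assumption: fragments shorter than min_len are discarded, and the
    3'-terminal fragment is kept even if it does not end in G.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input
    fragments = []
    start = 0
    for i, base in enumerate(seq):
        if base == "G":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # trailing fragment with no G
    return [f for f in fragments if len(f) >= min_len]


def e_catalog(rna: str, min_len: int = 6) -> set[str]:
    """Unique oligomers of an in-silico digest (the 'e-catalog')."""
    return set(t1_digest(rna, min_len))


if __name__ == "__main__":
    toy = "AACUCGAUUCCAAGCG"  # toy sequence, not real rRNA
    print(sorted(e_catalog(toy)))
```

Returning a set matches the way Table 2 counts unique oligonucleotides; a multiset would be needed wherever repeated occurrences matter, as in Table 3.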

Figure 1. Trees for 16S/18S rRNAs in the three-kingdom data set inferred via multiple sequence alignment of full-length rRNAs using MUSCLE and (A) RAxML or (B) MRBAYES; (C) computed via neighbor-joining from the similarity matrix in reference 74; (D) calculated via $D_2^S$ and neighbor-joining from our e-catalogs; and calculated via $D_2^S$ and neighbor-joining from k-mer spectra at (E) k = 6, (F) k = 8, (G) k = 12, or (H) k = 16. To facilitate comparison, all trees were rooted similarly (arbitrarily on archaea), except for (C), in which trees were rooted independently on archaea (left), bacteria (middle), and eukaryotes (right).

Figure 2. Trees for 16S rRNA in the proteobacterial data set inferred via multiple sequence alignment of full-length rRNAs using MUSCLE and (A) RAxML or (B) MRBAYES; (C) calculated via $D_2^S$ and neighbor-joining from our e-catalogs; and calculated via $D_2^S$ and neighbor-joining from k-mer spectra at (D) k = 6, (E) k = 8, (F) k = 12, or (G) k = 16. To facilitate comparison, all trees were rooted similarly on the 16S rRNA of the cyanobacterium Synechocystis.
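
The figure captions refer to k-mer spectra compared at k = 6, 8, 12, and 16. As a rough illustration of the idea, the sketch below builds a k-mer spectrum and computes the plain $D_2$ statistic, the sum over all words of the product of their counts in the two sequences. It is not the normalized variant named in the captions, and the conversion to a distance (one minus the cosine-style normalized $D_2$) is an assumption made here for illustration only. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: Counter, y: Counter) -> int:
    """Plain D2 statistic: sum over words w of x[w] * y[w]."""
    # Counter returns 0 for missing words, so unshared words add nothing.
    return sum(count * y[word] for word, count in x.items())


def d2_distance(a: str, b: str, k: int) -> float:
    """Illustrative distance: 1 - D2(a, b) / sqrt(D2(a, a) * D2(b, b)).

    Assumption: this normalization is chosen for the sketch; the source
    does not state how its statistic is converted into a distance.
    """
    x, y = kmer_spectrum(a, k), kmer_spectrum(b, k)
    if not x or not y:
        raise ValueError("both sequences must be at least k long")
    return 1.0 - d2(x, y) / sqrt(d2(x, x) * d2(y, y))


if __name__ == "__main__":
    s1, s2 = "ACGACGUUAGC", "ACGACGUAAGC"  # toy sequences
    for k in (2, 3, 4):
        print(k, round(d2_distance(s1, s2, k), 3))
```

A matrix of such pairwise distances is the kind of input a neighbor-joining implementation expects.
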
Table 1. All 16S ribosomal RNA sequences used in this study, their GenBank accession numbers, and their inclusion in our re-analysis of rRNAs from (A) three kingdoms (ref. 74) and (B) proteobacteria (Fig. 4 of ref. 40). For proteobacteria in our analysis B, we identify class (α-, β-, γ-, or δ-proteobacteria).
| Source organism | GenBank accession | Analysis |
|---|---|---|
| | X00686.1 | A |
| | V01335.1 | A |
| | AF207023.1 | A |
| | NR_074117.1 | A |
| | NR_074174.1 | A |
| | NR_074253.1 | A |
| Methanothermobacter thermautotrophicus | NR_074260.1 | A |
| | JQ282815 | A |
| | M62791 | A |
| | NR_103937.1 | A |
| | NC_010109.1* | A |
| | NR_074311.1 | A, B |
| | NR_029215.1 | B |
| | NR_074249.1 | B |
| | D14513.1 | B |
| | AF155147.1 | B |
| | NR_036778.1 | B |
| | NR_102804.1 | A, B |
| | NR_074199.1 | B |
| | NR_074828.1 | B |
*, positions 106162–107648.
Table 2. Numbers of unique oligonucleotides in empirical 16S rRNA catalogs, and of k-mers in e-catalogs. Escherichia coli empirical catalog from Uchida et al. as corrected by Magrum et al., and Methanobacterium ruminantium M-1 (later renamed Methanobrevibacter ruminantium M1) empirical catalog from Fox et al. For the calculation of matching, modifications of bases are ignored and ambiguities are resolved favorably.
| Oligomer length | E. coli empirical | E. coli e-catalog | E. coli match | M. ruminantium empirical | M. ruminantium e-catalog | M. ruminantium match |
|---|---|---|---|---|---|---|
| 6 (a) | 21 (b) | 21 | 21 | 22 | 22 | 20 |
| 7 | 17 | 16 | 16 | 15 | 16 | 13 |
| 8 | 10 | 11 | 10 | 14 | 15 | 13 |
| 9 | 13 | 12 | 12 | 10 | 9 | 8 |
| ≥10 | 11 | 13 | 10 | 11 | 12 | 10 |
| Total | 72 | 73 | 69 | 72 | 74 | 64 |
(a) Includes the 5′ terminus. (b) Uchida et al. report one 6-mer sequence twice, once as unmodified and once as modified; for the purposes of this table we count them once.
Table 3. Nucleotide coverage of full-length 16S rRNA sequence by oligonucleotides in empirical catalogs, and k-mers in e-catalogs, of Escherichia coli and Methanobacterium ruminantium M-1 (Methanobrevibacter ruminantium M1). For catalogs, see Table S1. Multiple (non-unique) instances are counted (note that Fox et al. do not report multiple occurrences, which in any case were rare for oligonucleotides ≥ 6). Full-length sequences are NR_102904.1 and NR_074117.1, respectively.
| 16S rRNA source | Number (empirical) | Coverage (%) | Number (e-catalog) | Coverage (%) |
|---|---|---|---|---|
| Escherichia coli | 584/1542 | 37.9 | 590/1542 | 38.3 |
| Methanobacterium ruminantium M-1 | 572/1436 | 39.8 | 601/1436 | 41.9 |
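
The coverage percentages in Table 3 follow directly from the fractions in the neighboring columns: nucleotides covered by catalog oligomers divided by the full sequence length. A minimal check of that arithmetic is sketched below; the helper names are hypothetical, and `covered_nucleotides` assumes every listed occurrence contributes its full length, in line with the caption's note that multiple instances are counted.

```python
def covered_nucleotides(oligomers: list[str]) -> int:
    """Total nucleotides covered, counting every listed occurrence."""
    return sum(len(oligo) for oligo in oligomers)


def coverage_percent(covered: int, length: int) -> float:
    """Coverage as a percentage of sequence length, to one decimal."""
    return round(100.0 * covered / length, 1)


if __name__ == "__main__":
    # (covered, length) pairs copied from Table 3
    for covered, length in [(584, 1542), (590, 1542), (572, 1436), (601, 1436)]:
        print(f"{covered}/{length} = {coverage_percent(covered, length)}%")
```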