Philippe Lopez, Sébastien Halary, Eric Bapteste.
Abstract
BACKGROUND: Microbial genetic diversity is often investigated by comparing relatively similar 16S molecules through multiple alignments of reference sequences and novel environmental samples, using phylogenetic trees, direct BLAST matches, or phylotype counts. However, are we missing novel lineages in the microbial dark universe by relying on standard phylogenetic and BLAST methods? If so, how can we probe that universe using alternative approaches? We performed a novel type of multi-marker analysis of genetic diversity exploiting the topology of inclusive sequence similarity networks.
Year: 2015 PMID: 26502935 PMCID: PMC4624368 DOI: 10.1186/s13062-015-0092-3
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Main annotated functions of the gene families for which environmental homologs were identified
| Main annotated functions | Samples with environmental homologs (number of homologs) |
|---|---|
| maltose ABC transporter, periplasmic maltose-binding protein | HMQ(127); JM(6) |
| rhomboid family protein | HMQ(907); AA(95); HA(33); JM(29); BB(18); WL(6) |
| phage shock protein A, PspA | HMQ(111); JM(10); BB(3); AA(3) |
| segregation and condensation protein B | HMQ(1083); HA(120); AA(117); JM(39); BB(30); WL(11) |
| V-type ATP synthase subunit I | HMQ(326); AA(16); JM(15) |
| protein of unknown function DUF192 | HMQ(94); AA(33); HA(23); BB(19); WL(10) |
| nucleotide kinase | HMQ(9); HA(3) |
| 30S ribosomal protein S8e | HMQ(7); AA(5) |
| homoserine kinase | HMQ(274); AA(77); HA(43); JM(17); BB(8) |
| DNA-directed RNA polymerase subunit L | HMQ(4); HA(3) |
| 30S ribosomal protein S24e | HMQ(6) |
| ribosomal biogenesis GTPase | HMQ(980); JM(32); HA(23); AA(5); WL(4); BB(3) |
| DNA-directed RNA polymerase I, II, and III, 7.3 kDa polypeptide | HMQ(291); AA(34); WL(5); JM(5) |
| protein of unknown function DUF167 | HMQ(94); AA(62); WL(18); JM(3) |
| CoA-substrate-specific enzyme activase | HMQ(919); JM(64); AA(9); WL(4) |
| RNA-binding protein | HMQ(717); JM(52); WL(12); HA(10); BB(8); AA(8) |
| hypothetical protein | HA(43); AA(29); BB(16); HMQ(9) |
| protein of unknown function DUF420 | AA(18); BB(8); WL(3) |
| twin arginine-targeting protein translocase | HMQ(344); HA(105); AA(105); BB(31); WL(20); JM(17); MB(8); SI(5); G(3) |
| protein of unknown function DUF502 | HA(87); AA(59); HMQ(47); BB(24); WL(15) |
| ribonuclease III | HMQ(1370); AA(549); HA(115); JM(66); BB(51); WL(6) |
| polyprenyltransferase | HMQ(44); AA(44); HA(32); BB(8); WL(7) |
| Rossmann fold nucleotide-binding protein | HMQ(1133); AA(203); HA(79); JM(46); BB(28); WL(18) |
| glutamyl-tRNA reductase | HMQ(299); HA(31); AA(31); BB(14); WL(7); JM(7) |
| 50S ribosomal protein L30P | AA(13); HMQ(12); HA(3) |
| 50S ribosomal protein L34e | HMQ(1); HA(1); AA(2) |
| 50S ribosomal protein L14e | AA(7) |
| Pre-mRNA processing ribonucleoprotein, binding region | HMQ(6) |
| like-Sm ribonucleoprotein, core | AA(23); HMQ(7); HA(4); BB(3) |
| 30S ribosomal protein S26e | HA(2) |
| 30S ribosomal protein S3Ae | HMQ(5) |
| 30S ribosomal protein S27e | AA(8); HMQ(4) |
| cobalamin 5'-phosphate synthase | HMQ(871); JM(30); AA(17); HA(15); WL(8); BB(3) |
| phenylalanyl-tRNA synthetase subunit alpha | HMQ(1698); AA(207); HA(127); JM(66); BB(48); WL(7) |
| cell division protein and ATP-dependent metalloprotease FtsH | HMQ(1707); HA(338); AA(335); JM(109); BB(79); WL(6) |
| 50S ribosomal protein L1P | HMQ(1231); AA(215); HA(146); JM(65); BB(43); WL(16) |
| AAA family ATPase, Cell Division Cycle CDC48 subfamily protein | HMQ(10) |
| methionyl-tRNA synthetase | HMQ(1390); AA(211); HA(103); JM(81); BB(35); WL(7); MB(5) |
| 50S ribosomal protein L2P | HMQ(815); AA(227); HA(147); JM(75); BB(63); WL(3) |
| 50S ribosomal protein L22P | HMQ(1765); AA(256); HA(150); JM(88); BB(53); WL(19); MB(3); SI(3) |
| inositol monophosphatase | HMQ(1349); HA(342); AA(307); BB(113); JM(55); WL(15) |
| chaperonin GroEL | HMQ(678); AA(191); HA(128); BB(67); JM(48); WL(5) |
| 50S ribosomal protein L13 | HMQ(1827); AA(286); HA(142); BB(63); JM(53); WL(8); MB(4); SI(4) |
| 50S ribosomal protein L5P | HMQ(2379); AA(520); HA(373); JM(150); BB(116); WL(25); MB(9) |
| 30S ribosomal protein S12 | HMQ(956); AA(239); HA(155); JM(61); BB(57); SI(20); WL(14); MB(5); G(3) |
| Bcr/CflA subfamily drug resistance transporter | HMQ(399); JM(19); WL(3) |
| DNA repair and recombination protein RadA | HMQ(3327); AA(372); HA(195); JM(109); BB(86); WL(21) |
| 50S ribosomal protein L6 | HMQ(1402); AA(239); HA(142); JM(73); BB(55); WL(16) |
| methionine aminopeptidase | HMQ(6396); AA(538); HA(316); JM(217); BB(184); WL(22); LM(3) |
| ribonuclease HII | HMQ(2476); AA(173); HA(130); JM(60); BB(43); WL(13) |
| 50S ribosomal protein L23 | HMQ(2229); AA(253); HA(158); JM(99); BB(46); WL(22); MB(14) |
| molybdenum cofactor biosynthesis protein A | HMQ(2024); AA(146); JM(62); HA(49); BB(40); WL(17) |
| 50S ribosomal protein L3P | HMQ(1404); AA(205); HA(103); BB(61); JM(61); WL(8); SI(3) |
| 50S ribosomal protein L18e | HMQ(10); AA(5); BB(3); HA(3) |
| tryptophanyl-tRNA synthetase | HMQ(1877); AA(213); HA(151); JM(52); BB(45); WL(9) |
| aspartyl-tRNA synthetase | HMQ(3159); AA(348); HA(225); JM(149); BB(95); WL(13) |
| nicotinate nucleotide adenylyltransferase | HMQ(2308); AA(170); HA(138); JM(73); BB(62); WL(15) |
| 30S ribosomal protein S17P | HMQ(1771); AA(264); HA(137); JM(79); BB(50); WL(23); MB(11); SI(10) |
| putative RNA methylase | HMQ(1410); HA(35); JM(34); AA(20); BB(6); WL(6) |
| translation-associated GTPase | HMQ(2629); AA(316); HA(247); JM(97); BB(65); WL(11) |
| 50S ribosomal protein L29 | HMQ(1969); AA(219); JM(82); HA(48); BB(24); SI(22); MB(16); WL(12) |
| appr-1-p processing domain-containing protein | HMQ(805); JM(55); AA(20); WL(5); BB(3) |
| 30S ribosomal protein S7 | HMQ(1095); AA(313); HA(165); BB(67); JM(67); SI(10); WL(6); G(3) |
| TatD-related deoxyribonuclease | HMQ(3338); AA(182); HA(128); JM(86); BB(46); WL(10); OM(5) |
| mevalonate kinase | HMQ(1131); JM(45); AA(16); HA(6) |
| O-sialoglycoprotein endopeptidase | HMQ(2408); AA(213); HA(163); JM(99); BB(53); WL(5); LM(3) |
| MiaB-like tRNA modifying enzyme | HMQ(5237); AA(252); HA(197); JM(135); BB(88); WL(12) |
| 50S ribosomal protein L15P | HMQ(1798); AA(174); HA(133); JM(103); BB(56); WL(19); MB(4) |
| 6,7-dimethyl-8-ribityllumazine synthase | HMQ(1365); AA(180); HA(149); JM(50); BB(38); WL(25) |
| 30S ribosomal protein S5P | HMQ(3691); AA(416); HA(299); JM(154); BB(105); WL(23); MB(5) |
| carbohydrate kinase, YjeF related protein | HMQ(1547); JM(38); HA(27); AA(13); WL(8); BB(6) |
| ribose-phosphate pyrophosphokinase | HMQ(1807); AA(191); HA(158); JM(70); BB(39); WL(11) |
| replication factor C small subunit | HMQ(6239); AA(712); HA(225); JM(167); BB(86); WL(19); MB(5) |
| hydrolase | HMQ(6012); JM(207); HA(45); AA(22); BB(10) |
| tyrosyl-tRNA synthetase | HMQ(1123); AA(192); HA(128); JM(65); BB(55); WL(6) |
| 50S ribosomal protein L18P | HMQ(14) |
| Fmu (Sun) domain-containing protein | HMQ(2680); JM(76); AA(50); HA(47); BB(20); WL(12) |
| 30S ribosomal protein S2 | HMQ(834); AA(225); HA(130); JM(48); BB(44); WL(8) |
| leucyl-(isoleucyl-) and (valyl-)tRNA synthetase | HMQ(2653); AA(112); JM(78); HA(75); BB(51); WL(7) |
Environments of origin were HMQ: Human microbiome Qin2010; JM: Japanese Microbiome; AA: Antarctica Aquatic; HA: Hot Aloha; BB: Botany Bay; WL: Washington Lake; MB: Monterey Bay; SI: Sapelo Island; G: Glacier; LM: Lean Mice; OM: Obese Mice.
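The per-sample counts in the table use a compact `CODE(count)` notation keyed to the legend above. As an illustration only, the minimal Python sketch below parses one table cell into a dictionary so that homologs can be totalled per gene family or per environment; the names `parse_cell`, `TOKEN`, and `ENVIRONMENTS` are hypothetical and not part of the study.

```python
import re

# Environment codes, as listed in the legend under the table.
ENVIRONMENTS = {
    "HMQ": "Human microbiome Qin2010",
    "JM": "Japanese Microbiome",
    "AA": "Antarctica Aquatic",
    "HA": "Hot Aloha",
    "BB": "Botany Bay",
    "WL": "Washington Lake",
    "MB": "Monterey Bay",
    "SI": "Sapelo Island",
    "G": "Glacier",
    "LM": "Lean Mice",
    "OM": "Obese Mice",
}

# One "CODE(count)" token; stray spaces and irregular separators are tolerated.
TOKEN = re.compile(r"([A-Z]+)\s*\(\s*(\d+)\s*\)")


def parse_cell(cell: str) -> dict[str, int]:
    """Turn a cell such as 'HMQ(127); JM(6)' into {'HMQ': 127, 'JM': 6}."""
    counts: dict[str, int] = {}
    for code, n in TOKEN.findall(cell):
        if code not in ENVIRONMENTS:
            raise ValueError(f"unknown environment code: {code}")
        counts[code] = counts.get(code, 0) + int(n)
    return counts


if __name__ == "__main__":
    row = parse_cell("HA(43); AA(29); BB(16); HMQ(9)")
    print(row)                # {'HA': 43, 'AA': 29, 'BB': 16, 'HMQ': 9}
    print(sum(row.values()))  # 97
    print({ENVIRONMENTS[c]: n for c, n in row.items()})
```

Matching tokens with a regular expression rather than splitting on `;` keeps the parser robust to spacing and separator irregularities in hand-edited tables.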
Fig. 1 Distribution of identity percentage to the closest published relative for environmental sequences. For each of the 131,162 environmental sequences retrieved by our protocol, the closest published relative is identified as the best BLAST hit against the July 2013 release of the NCBI nr database. The identity percentage between the two sequences, shown on the abscissa, is computed as the hit coverage relative to the shorter sequence multiplied by the BLAST identity percentage. The proportion of environmental sequences showing a given identity percentage is given on the ordinate, in blue for sequences from the human gut microbiome and in red for the others. a) Class 1 (direct link to cultivated host sequences) and class 2 (indirect link) are combined. b) Class 1 and class 2 are shown separately, on top and bottom, respectively.
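The identity measure in Fig. 1 penalizes partial hits: a high-identity local hit that spans only part of the shorter sequence still yields a low value. A minimal Python sketch of that computation follows, assuming BLAST tabular-style fields (percent identity, hit start/end coordinates, and full sequence lengths) and assuming that "hit coverage" means the fraction of the shorter sequence spanned by the hit. The class and function names are hypothetical. The 60 % cutoff used in Figs. 2 and 4 to flag highly divergent homologs is included; how a value of exactly 60 % is treated is not stated, so the strict `<` comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Best hit of an environmental sequence (BLAST tabular-style fields)."""
    pident: float  # BLAST percent identity over the aligned region
    qstart: int    # hit coordinates on the query (environmental sequence)
    qend: int
    sstart: int    # hit coordinates on the subject (closest relative)
    send: int
    qlen: int      # full query length
    slen: int      # full subject length


def identity_to_closest_relative(hit: BlastHit) -> float:
    """Coverage of the shorter sequence multiplied by BLAST percent identity."""
    if hit.qlen <= hit.slen:
        spanned, length = abs(hit.qend - hit.qstart) + 1, hit.qlen
    else:
        spanned, length = abs(hit.send - hit.sstart) + 1, hit.slen
    coverage = min(spanned / length, 1.0)
    return coverage * hit.pident


def is_highly_divergent(identity: float, threshold: float = 60.0) -> bool:
    """True when identity falls below the 60 % cutoff used in Figs. 2 and 4."""
    return identity < threshold


if __name__ == "__main__":
    hit = BlastHit(pident=50.0, qstart=21, qend=180, sstart=1, send=160,
                   qlen=200, slen=300)
    identity = identity_to_closest_relative(hit)
    print(round(identity, 1), is_highly_divergent(identity))  # 40.0 True
```

In the example, a 200-residue environmental sequence whose hit to a 300-residue relative spans residues 21 to 180 at 50 % BLAST identity has a coverage of 160/200 = 0.8, so its identity percentage is 0.8 × 50 = 40 % and it counts as highly divergent.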
Fig. 2 Sequence similarity networks for cultivated host sequences and their associated environmental sequences. Each node corresponds to a sequence. Two nodes are connected when they share ≥30 % identity over a hit covering ≥80 % of both of their lengths, with a BLAST E-value < 1e-5. Sequences are yellow for Archaea, green for Bacteria, orange for environmental homologs whose identity to their closest published relative is lower than 60 %, and grey for environmental homologs whose identity to their closest published relative is higher than 60 %. Left: DUF167 protein; right: cobalamin phosphate synthase.
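The edge rule in the Fig. 2 caption can be sketched as a filter over all-against-all BLAST hits. This is a minimal Python illustration, not the authors' pipeline: the `Hit` fields are hypothetical, coverage is assumed to be precomputed as a fraction of each sequence's length, and the 1e-5 cutoff is read as a BLAST E-value threshold. Connected components are then extracted by breadth-first search; sequences without any qualifying hit do not appear as nodes in this sketch.

```python
from collections import defaultdict, deque
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One all-against-all BLAST hit (hypothetical field names)."""
    a: str
    b: str
    pident: float  # percent identity of the hit
    cov_a: float   # fraction of sequence a covered by the hit (0-1)
    cov_b: float   # fraction of sequence b covered by the hit (0-1)
    evalue: float


def build_network(hits: Iterable[Hit], min_id: float = 30.0,
                  min_cov: float = 0.80, max_evalue: float = 1e-5) -> dict[str, set[str]]:
    """Undirected adjacency sets; an edge needs all three thresholds to pass."""
    adj: dict[str, set[str]] = defaultdict(set)
    for h in hits:
        if h.a == h.b:
            continue  # skip self-hits
        if (h.pident >= min_id
                and min(h.cov_a, h.cov_b) >= min_cov
                and h.evalue < max_evalue):
            adj[h.a].add(h.b)
            adj[h.b].add(h.a)
    return dict(adj)


def connected_components(adj: dict[str, set[str]]) -> list[list[str]]:
    """Breadth-first search; each component is returned as a sorted list."""
    seen: set[str] = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(members))
    return components


if __name__ == "__main__":
    hits = [
        Hit("arc1", "env1", 45.0, 0.95, 0.90, 1e-30),  # passes all thresholds
        Hit("env1", "env2", 28.0, 0.99, 0.99, 1e-12),  # fails identity
        Hit("bac1", "env2", 62.0, 0.85, 0.70, 1e-40),  # fails coverage on env2
        Hit("bac1", "bac2", 80.0, 0.98, 0.97, 1e-80),  # passes
    ]
    print(connected_components(build_network(hits)))
    # [['arc1', 'env1'], ['bac1', 'bac2']]
```

Plain adjacency sets keep the example dependency-free; a graph library such as NetworkX offers equivalent operations for networks of realistic size.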
Fig. 3 Other sequence similarity networks for cultivated host sequences and their associated environmental sequences. The color code is the same as in Fig. 2. Left: metalloendoprotease; right: ribosomal protein RPL23/25.
Fig. 4 Distribution of a) average amino acid identity and b) average dN/dS for cliques of highly divergent environmental sequences. A total of 569 cliques (fully connected subnetworks) of highly divergent environmental sequences (whose identity to their closest published relative is lower than 60 %) were identified. The sequences of each clique were aligned, and their average amino acid identity and dN/dS were computed; these values are shown on the abscissa. The proportion of corresponding cliques is given on the ordinate.
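The clique step behind Fig. 4 can be illustrated with a small graph routine. This minimal Python sketch assumes a network stored as adjacency sets, a set of highly divergent environmental nodes (below 60 % identity to their closest published relative), and a precomputed table of pairwise percent identities. It enumerates the maximal cliques of the induced subgraph with the Bron-Kerbosch algorithm with pivoting, then averages pairwise identities. The function names and the `min_size` cutoff are hypothetical: the caption does not state a minimum clique size, and the study aligned each clique's sequences rather than reusing pairwise values. dN/dS is not covered here.

```python
from itertools import combinations


def maximal_cliques(adj: dict[str, set[str]]):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a frozenset."""
    def expand(r: set[str], p: set[str], x: set[str]):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def divergent_cliques(adj, divergent, min_size=3):
    """Maximal cliques of the subgraph induced by the highly divergent nodes."""
    sub = {u: adj[u] & divergent for u in divergent if u in adj}
    return [c for c in maximal_cliques(sub) if len(c) >= min_size]


def average_identity(clique, identity):
    """Mean pairwise % identity; `identity` maps frozenset({a, b}) to a value."""
    pairs = list(combinations(sorted(clique), 2))
    return sum(identity[frozenset(pair)] for pair in pairs) / len(pairs)


if __name__ == "__main__":
    adj = {
        "e1": {"e2", "e3", "b1"},
        "e2": {"e1", "e3"},
        "e3": {"e1", "e2", "e4"},
        "e4": {"e3"},
        "b1": {"e1"},  # a bacterial reference, not divergent
    }
    divergent = {"e1", "e2", "e3", "e4"}
    identity = {
        frozenset({"e1", "e2"}): 40.0,
        frozenset({"e1", "e3"}): 35.0,
        frozenset({"e2", "e3"}): 45.0,
        frozenset({"e3", "e4"}): 55.0,
    }
    for clique in divergent_cliques(adj, divergent):
        print(sorted(clique), average_identity(clique, identity))
    # ['e1', 'e2', 'e3'] 40.0
```

Maximal cliques can overlap, so a node may belong to several of them (here `e3` lies in both {e1, e2, e3} and {e3, e4}); whether the study resolved such overlaps is not stated.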
Fig. 5 A phylogeny-based illustration of the actual divergence of some of the environmental homologs detected in our networks. Maximum likelihood phylogenetic trees were based on archaeal (yellow), bacterial (green), and eukaryotic (red) sequences, and on a subset of the alignable positions of some environmental homologs (purple for gut sequences, pink for other environments), extracted using a maximal clique search. Three trees presenting a remarkable pattern are shown. Left: cobalamin phosphate synthase (27 alignable positions); middle: RPL29 (52 alignable positions); right: metalloendoprotease (10 alignable positions). A scale bar for branch lengths (number of substitutions per site) is given at the bottom left.