Francesco Asnicar, Andrew Maltez Thomas, Francesco Beghini, Claudia Mengoni, Serena Manara, Paolo Manghi, Qiyun Zhu, Mattia Bolzan, Fabio Cumbo, Uyen May, Jon G Sanders, Moreno Zolfo, Evguenia Kopylova, Edoardo Pasolli, Rob Knight, Siavash Mirarab, Curtis Huttenhower, Nicola Segata.
Abstract
Microbial genomes are available at an ever-increasing pace, as cultivation and sequencing become cheaper and obtaining metagenome-assembled genomes (MAGs) becomes more effective. Phylogenetic placement methods to contextualize hundreds of thousands of genomes must thus be efficiently scalable and sensitive from closely related strains to divergent phyla. We present PhyloPhlAn 3.0, an accurate, rapid, and easy-to-use method for large-scale microbial genome characterization and phylogenetic analysis at multiple levels of resolution. PhyloPhlAn 3.0 can assign genomes from isolate sequencing or MAGs to species-level genome bins built from >230,000 publicly available sequences. For individual clades of interest, it reconstructs strain-level phylogenies from among the closest species using clade-specific maximally informative markers. At the other extreme of resolution, it scales to large phylogenies comprising >17,000 microbial species. Examples including Staphylococcus aureus isolates, gut metagenomes, and meta-analyses demonstrate the ability of PhyloPhlAn 3.0 to support genomic and metagenomic analyses.
Year: 2020 PMID: 32427907 PMCID: PMC7237447 DOI: 10.1038/s41467-020-16366-7
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1. PhyloPhlAn 3.0 phylogenetically places microbial isolate or metagenomic assemblies.
PhyloPhlAn 3.0 provides strain-to-phylum level phylogenies built from newly generated microbial genomes (isolate or metagenomic assemblies) in the context of over 80,000 existing isolate genomes and 150,000 metagenomic assemblies. It automatically selects the most informative loci on a clade-specific basis, handles incomplete or fragmented assemblies, and can be configured to provide the resulting multiple-sequence alignment, estimated mutation rates (optionally), and phylogenetic tree.
Fig. 2. Accurate reconstruction of Staphylococcus aureus phylogenies using PhyloPhlAn 3.0.
a Phylogenetic tree of 135 S. aureus strains from a pediatric hospital[36] reconstructed by PhyloPhlAn 3.0 using 2127 automatically identified core genes (rendered by GraPhlAn[39]; see Supplementary Fig. 2 for a full comparison). Green circles represent methicillin-sensitive S. aureus (MSSA), while red circles represent methicillin-resistant S. aureus (MRSA). Blue circles internal to the phylogeny identify subtrees with bootstrap >80%. b Normalized phylogenetic distances in the PhyloPhlAn 3.0-reconstructed tree and in a manually curated phylogeny from ref. [36], highlighting strong consistency between the automated PhyloPhlAn 3.0 results and the curated tree (Pearson's correlation coefficient 0.992). c Multidimensional scaling ordination of pairwise phylogenetic distances from the tree integrating the 135 S. aureus isolates (crosses) with 1000 automatically selected S. aureus reference genomes (circles, Supplementary Fig. 1). The ten most prevalent sequence types (STs)[23] are highlighted in different colors.
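As a rough illustration of the comparison in panel b, the sketch below computes a Pearson correlation between two sets of pairwise distances after scaling each set by its maximum. The taxa names, the distance values, and the max-scaling choice are all assumptions made for illustration; the source does not state how the distances were normalized, and this is not the authors' code.

```python
from itertools import combinations
from math import sqrt

def normalized_pairwise(dist, taxa):
    """Pairwise distances in a fixed pair order, scaled by the largest value.

    Max-scaling is an assumed normalization, chosen only for illustration.
    """
    pairs = combinations(sorted(taxa), 2)
    values = [dist[frozenset(p)] for p in pairs]
    top = max(values)
    return [v / top for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical toy distances for four taxa (not data from the paper).
taxa = ["A", "B", "C", "D"]
curated = {
    frozenset("AB"): 0.10, frozenset("AC"): 0.40, frozenset("AD"): 0.50,
    frozenset("BC"): 0.42, frozenset("BD"): 0.52, frozenset("CD"): 0.20,
}
automated = {
    frozenset("AB"): 0.31, frozenset("AC"): 1.19, frozenset("AD"): 1.52,
    frozenset("BC"): 1.25, frozenset("BD"): 1.58, frozenset("CD"): 0.61,
}

x = normalized_pairwise(curated, taxa)
y = normalized_pairwise(automated, taxa)
r = pearson(x, y)
print(f"Pearson r = {r:.3f}")
```

Pearson's r is unchanged by rescaling either variable, so in this sketch the normalization only puts the two trees on a common scale for plotting; it does not affect the coefficient itself.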
Fig. 3. Phylogenetic analysis of MAGs from 50 rural Ethiopian metagenomes.
a Occurrence of the 20 most prevalent SGBs among 50 previously sequenced Ethiopian gut metagenomes highlights the presence of many previously identified but largely uncharacterized species-level genome bins (uSGBs) and the identification of few additional MAGs (unassigned) that are not recapitulated in any already defined SGB. The presence/absence profiles are clustered using average linkage with Euclidean distances. b Multidimensional scaling ordination using the t-SNE algorithm on phylogenetic distances from PhyloPhlAn 3.0's tree of eight Ethiopian E. coli MAGs (kSGB 10068) integrated with 200 automatically selected E. coli reference genomes using 3246 UniRef90 gene families for phylogenetic reconstruction. c PhyloPhlAn 3.0 phylogeny of Ethiopian MAGs assigned to uSGB ID 19436 including all reference genomes for the closest phyla (589 in total) according to the prokaryotes tree-of-life in Fig. 4. Phylogeny reconstruction used 400 universal markers selected by PhyloPhlAn 3.0 for deep-branching phylogenies. Portions of the tree collapsed are labeled and numbers in parentheses represent the number of genomes in the collapsed subtrees. Uncollapsed phylogeny is available in Supplementary Fig. 4.
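The clustering mentioned for panel a (average linkage with Euclidean distances on presence/absence profiles) can be sketched naively as follows. The sample names and binary profiles are hypothetical, and this quadratic-time loop only illustrates the technique; it is not the software used to produce the figure.

```python
from math import dist  # Euclidean distance, Python 3.8+

def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage.

    profiles maps a name to a tuple of 0/1 presence/absence values.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster member pairs.
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        candidates = (
            (linkage(a, b), a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        )
        d, a, b = min(candidates, key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges

# Hypothetical presence/absence of four genome bins in four samples.
profiles = {
    "S1": (1, 1, 0, 0),
    "S2": (1, 1, 0, 1),
    "S3": (0, 0, 1, 1),
    "S4": (0, 0, 1, 0),
}

merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

On binary profiles, the squared Euclidean distance equals the number of positions at which two profiles differ, so samples sharing most bins merge first.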
Fig. 4. PhyloPhlAn 3.0 microbial tree-of-life with 17,672 species-representative genomes from 51 known and 84 candidate phyla.
With 17,672 species-dereplicated isolate genomes and MAGs as input (see “Methods”), PhyloPhlAn 3.0 used 400 optimized universal marker sequences to produce a pan-microbial phylogeny in approximately 10 days (~24,000 CPU-hours on 100 parallel cores). The underlying multiple-sequence alignment comprised 4522 amino acid positions from among 1,872,710 in the untrimmed concatenated marker alignments.
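The figures quoted above can be checked with a short calculation. The variable names are illustrative, and the sketch assumes all 100 cores were busy for the full 10 days, which the caption only states approximately.

```python
# Runtime: 100 cores running in parallel for about 10 days.
cores = 100
days = 10
cpu_hours = cores * days * 24

# Alignment trimming: positions kept out of the untrimmed concatenation.
retained_positions = 4522
untrimmed_positions = 1_872_710
retained_fraction = retained_positions / untrimmed_positions

print(f"CPU-hours: {cpu_hours}")
print(f"Retained fraction of alignment: {retained_fraction:.4%}")
```

Under these assumptions the runtime works out to the quoted ~24,000 CPU-hours, and the trimmed alignment keeps roughly 0.24% of the untrimmed positions.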