Luc Cornet, Anne-Catherine Ahn, Annick Wilmotte, Denis Baurain.
Abstract
The continuous increase in sequenced genomes in public repositories makes the choice of interesting bacterial strains for future sequencing projects ever more complicated, as it is difficult to estimate the redundancy between these strains and the already available genomes. Therefore, we developed the Nextflow workflow "ORPER", for "ORganism PlacER", containerized in Singularity, which determines the phylogenetic position of a collection of organisms in the genomic landscape. ORPER constrains the phylogenetic placement of SSU (16S) rRNA sequences in a multilocus reference tree based on ribosomal protein genes extracted from public genomes. We demonstrate the utility of ORPER on the phylum Cyanobacteria by placing 152 strains of the BCCM/ULC collection.
Keywords: SSU (16S) rRNA; cyanobacteria; phylogenomics; ribosomal proteins; sequencing; workflow
Year: 2021 PMID: 34828348 PMCID: PMC8623055 DOI: 10.3390/genes12111741
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Overview of the ORPER workflow. Users should specify at least four pieces of information to run ORPER: (i) their SSU (16S) rRNA sequences, (ii) the taxon of interest, (iii) the outgroup of the phylogeny, and (iv) the taxonomic level (green part). Yellow boxes are mandatory steps of ORPER, whereas grey boxes are optional steps. Contamination estimation, SSU rRNA prediction, and filtration are performed twice, once for the reference group and once for the outgroup.
Figure 2. Constrained cyanobacterial phylogenetic tree of the BCCM/ULC collection. The tree is the output of ORPER, a constrained maximum-likelihood inference computed under the GTRGAMMA model. Clades correspond to the groups defined in Moore et al. (2019) [9]. Clades 10 and 11 have each been divided into two sub-clades, adding the “Non-Nostocales” and “Unicellular” sub-clades, respectively, to Moore et al.’s phylogeny. Blue dots indicate BCCM/ULC strains. The clade absent from Moore et al.’s phylogeny is indicated as “Missing Clade”.