Laura G Macías, Eladio Barrio, Christina Toft.
Abstract
One of the most widely used programs for detecting positive selection at the molecular level is codeml, which is implemented in the Phylogenetic Analysis by Maximum Likelihood (PAML) package. However, it is limited for genome-wide studies because it runs on a gene-by-gene basis. Furthermore, the size of such studies depends on the number of orthologous genes the genomes have in common, and these are often restricted to cases where a one-to-one relationship is observed between the genomes. In this work, we present GWideCodeML, a Python package that runs codeml genome-wide with the option of parallelization. To maximize the number of analyzed genes, the package allows for a variable number of taxa in the alignments and automatically prunes the topology to fit each of them before running codeml.
Keywords: Comparative genomics; Genome analysis; Molecular evolution; Positive selection; Protein sequence analysis; Python
Year: 2020 PMID: 33093185 PMCID: PMC7718741 DOI: 10.1534/g3.120.401874
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Schematic of the GWideCodeML workflow, divided into three main parts: the required input; the preprocessing steps needed to create the codeml control files for testing both the null and alternative hypotheses; and the output file obtained after running likelihood ratio tests (LRTs) on the codeml results. This output file lists all candidate genes under positive selection.
Overlapping features between GWideCodeML and other bioinformatics tools. *1 Built-in models: site model (SM), branch model (BM), branch-site model (BSM). *2 PosiGene generates a new tree for each gene, whereas GWideCodeML prunes the provided species tree.
| Feature | LMAP | POTION | PosiGene | GWideCodeML |
|---|---|---|---|---|
| Built-in models *1 | SM, BM, BSM | SM | BSM | SM, BM, BSM |
| Run custom models | Yes | — | — | Yes |
| Easy branch labeling | Yes | — | Yes | Yes |
| Automatic pruning *2 | — | — | Yes | Yes |
| Filter out low quality orthologs | — | Yes | Yes | Yes |
| Multithreading | Yes | Yes | Yes | Yes |
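The automatic pruning listed above can be illustrated with a minimal sketch. This is not GWideCodeML's own code: the function names (`parse_newick`, `prune`, `to_newick`, `prune_newick`) and the taxon labels are hypothetical, and the sketch assumes a topology-only Newick string with no branch lengths, internal labels, or quoted names. It keeps only the taxa present in one gene alignment and collapses internal nodes left with a single child.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested lists of leaf names."""
    # Assumption: no branch lengths, internal node labels, or quoted names.
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()


def prune(node, keep):
    """Return the subtree restricted to the taxa in `keep`, or None if empty."""
    if isinstance(node, str):
        return node if node in keep else None
    kept = [c for c in (prune(child, keep) for child in node) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse an internal node left with one child
    return kept


def to_newick(node):
    """Serialize the nested-list tree back to a Newick topology (no ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def prune_newick(species_tree, alignment_taxa):
    """Prune a species tree so it contains only the taxa of one alignment."""
    pruned = prune(parse_newick(species_tree), set(alignment_taxa))
    if pruned is None:
        raise ValueError("none of the alignment taxa are in the species tree")
    return to_newick(pruned) + ";"


# Hypothetical species tree and the taxa found in one gene alignment.
print(prune_newick("((A,B),(C,(D,E)));", ["A", "C", "E"]))  # (A,(C,E));
```

Pruning a single shared species tree, rather than inferring a new tree per gene, keeps the topology consistent across all genes in the analysis.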
Case-study results. Number of genes detected under positive selection after running GWideCodeML twice, once for each branch, using the three built-in nested models. *1 In site models, the dN/dS ratio does not vary among branches, so the model was run only once.
| Model | Nested models (null | No. genes under positive selection in | No. genes under positive selection in |
|---|---|---|---|
| Branch | M0 | 83 | 31 |
| Branch-site | MAnull | 137 | 96 |
| Site | M1a | 32*1 | 32*1 |
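The likelihood ratio test applied to each pair of nested models can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function name `lrt_pvalue` and the log-likelihood values are hypothetical. The degrees of freedom must be supplied by the user according to the models compared (commonly 1 for the branch and branch-site comparisons and 2 for M1a against its usual alternative, M2a). The test statistic is twice the difference in log-likelihood, compared against a chi-square distribution.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df=1):
    """Likelihood ratio test between nested models.

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value is the chi-square upper tail with `df` degrees of freedom.
    Only df = 1 and df = 2 are handled, using closed-form tail formulas.
    """
    # Numerical noise can make the statistic slightly negative; clamp to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return stat, p_value


# Hypothetical log-likelihoods for one gene (illustrative values only).
stat, p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-997.0, df=1)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

In a genome-wide run, p-values computed this way for thousands of genes would normally be corrected for multiple testing before a gene is reported as a candidate.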