Daniela Sánchez-Soto1, Guillermin Agüero-Chapin2,3, Vinicio Armijos-Jaramillo4, Yunierkis Perez-Castillo5, Eduardo Tejera4, Agostinho Antunes2,3, Aminael Sánchez-Rodríguez1.
Abstract
Horizontal gene transfer (HGT) plays an important role in evolutionary innovation within prokaryotic communities and is crucial for their survival. Several computational approaches have been developed to identify HGT events in recipient genomes. However, this has proven to be a complex task, because existing methods generate a large number of false positives and often disagree in their predictions. Phylogenetic reconstruction methods have turned out to be the most reliable, but they cannot be extended to all genes or species and are computationally demanding on large datasets. In contrast, the so-called surrogate methods, which use heuristics based either on nucleotide composition patterns or on the phyletic distribution of BLAST hits, are easily applied at the genomic scale but fail to identify common HGT events. Here, we present ShadowCaster, a hybrid approach that sequentially combines nucleotide composition-based predictions by support vector machines (SVMs) under the shadow of phylogenetic models independent of tree reconstruction to improve the detection of HGT events in prokaryotes. ShadowCaster successfully predicted close and distant HGT events in both artificial and bacterial genomes. ShadowCaster detected HGT related to heavy metal resistance in the genome of Rhodanobacter denitrificans with higher accuracy than the most popular state-of-the-art computational approaches, encompassing most of the cases predicted by other methods. ShadowCaster is released on GitHub as open-source software under the GPLv3 license.
Keywords: horizontal gene transfer; hybrid approach; implicit phylogenetic model; parametric method
Year: 2020 PMID: 32645885 PMCID: PMC7397055 DOI: 10.3390/genes11070756
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
State-of-the-art computational approaches for horizontal gene transfer detection, with emphasis on prokaryotic genomes.
| Classification | Implementation | Methodological Highlights | Application Domain | Reference |
|---|---|---|---|---|
| | Alien Hunter | Uses Interpolated Variable Order Motifs (IVOMs) coupled to a Hidden Markov Model (HMM) to detect alien (atypical) genes. | bacterial genomes | |
| | No implementation available | Detects atypical genes based on | viral, archaeal and bacterial genomes | |
| | No implementation available | Combines two compositional features using a Kullback–Leibler divergence metric to improve the detection of atypical genes. | artificial genomes | |
| | GOHTAM | Uses a Jensen-Shannon divergence metric from window or gene-based signature data to detect atypical genes. | prokaryotic and eukaryotic genomes | |
| | No implementation available | Detects atypical genes based on the selection of nine compositional features using an SVM. | bacterial genomes | |
| | No implementation available | Implements a multiple-threshold approach to detect atypical genes from compositional features and genomic context information to reduce the chance of misclassification. | artificial genomes | |
| | DarkHorse | Calculates a lineage probability index from BLAST searches to predict atypical genes. | prokaryotic and eukaryotic genomes | |
| | HGTFinder | Calculates a horizontal transfer index from BLAST searches to predict atypical genes. | prokaryotic and eukaryotic genomes | |
| | HGTector | Establishes statistical thresholds to detect genes that do not adhere to a priori defined hierarchical evolutionary categories inferred from BLAST searches. | artificial, prokaryotic and eukaryotic genomes | |
| | ShadowCaster | See further | prokaryotic genomes | This work |
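
Several of the compositional methods in the table compare a gene's sequence signature against that of the whole genome using a divergence metric. The following is a minimal sketch of that idea, not the implementation of any listed tool: it builds a k-mer frequency profile and computes the Jensen-Shannon divergence between two profiles. The function names `kmer_profile` and `jensen_shannon`, the default `k=4`, and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequency of every DNA k-mer in seq (overlapping windows).

    k-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy sequences (hypothetical): identical profiles give 0, disjoint ones give 1.
same = jensen_shannon(kmer_profile("ACGTACGTACGT"), kmer_profile("ACGTACGTACGT"))
disjoint = jensen_shannon(kmer_profile("AAAAAAAA", k=2), kmer_profile("CCCCCCCC", k=2))
```

In practice a gene would be flagged as atypical when its divergence from the genome-wide profile exceeds some threshold; how that threshold is chosen differs between tools and is not specified here.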
Figure 1. Graphical representation of the conceptual model behind ShadowCaster and of its main outputs. (A) Probability of true orthology for three gene pairs shared by a recipient species and three phylogenetically related donor species (species tree on the left-hand side). True orthology probability values (P0) are sampled from probability distributions according to two models: vertical inheritance (the phylogenetic shadowing model, defined by the number of orthologs shared by recipient and donor species at different phylogenetic distances; green panel) and lateral inheritance (the horizontal gene transfer (HGT) model; gradient-colored curves from orange to deep red represent the P0 distribution of true HGT events occurring at different phylogenetic distances along the species tree). P0 decreases with increasing phylogenetic distance under both vertical and lateral inheritance; however, the P0 distributions (curves) differ between the two, especially at medium and far distances. (B) Separation of typical (grey) and atypical (red and green) genes achieved by the parametric component of ShadowCaster (4-mer frequency and codon usage). (C) Log-likelihoods for all atypical genes detected by the parametric component in a given recipient genome. Log-likelihoods and P0 are related by Equation (2).
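
Panel (B) refers to codon usage as one of the two compositional features used by the parametric component. As a rough illustration of what such a feature vector can look like, the sketch below computes the relative frequency of the 64 codons in a coding sequence. It is an assumption-laden example rather than ShadowCaster's own feature extraction: the name `codon_usage`, the handling of ambiguous bases, and the toy sequence are all hypothetical.

```python
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(cds):
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    # Codons containing ambiguous bases (e.g. N) are not in CODONS and are skipped.
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("no unambiguous codons found")
    return {c: counts[c] / total for c in CODONS}


# Toy coding sequence (hypothetical): ATG GCT GCT AAA TAA
usage = codon_usage("ATGGCTGCTAAATAA")
```

A per-gene vector like this, possibly concatenated with 4-mer frequencies, is the kind of input a classifier such as an SVM could use to separate typical from atypical genes.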
Figure 2. True positive (TP) and false positive (FP) rate curves showing the performance of ShadowCaster as a function of its two user-defined parameters: nu and the number of proteomes. TP and FP rate curves for the nu parameter are shown in (A,B), while (C,D) show the same curves as a function of the number of proteomes.
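
For readers who want to reproduce this kind of curve on an artificial genome with known transfers, the TP and FP rates at a single parameter setting can be computed as sketched below. This is a generic definition, not code from ShadowCaster; the function name `tp_fp_rates` and the gene identifiers are hypothetical.

```python
def tp_fp_rates(predicted, true_hgt, all_genes):
    """Return (TP rate, FP rate) for one set of HGT predictions.

    predicted: gene IDs flagged as HGT by the tool
    true_hgt:  gene IDs known to be transferred (e.g. in an artificial genome)
    all_genes: every gene ID in the genome
    """
    predicted, true_hgt, all_genes = set(predicted), set(true_hgt), set(all_genes)
    negatives = all_genes - true_hgt
    tp = len(predicted & true_hgt)
    fp = len(predicted & negatives)
    tpr = tp / len(true_hgt) if true_hgt else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return tpr, fpr


# Toy genome (hypothetical IDs): 10 genes, 4 true transfers, 4 predictions.
all_genes = {f"g{i}" for i in range(1, 11)}
true_hgt = {"g1", "g2", "g3", "g4"}
predicted = {"g1", "g2", "g3", "g7"}
tpr, fpr = tp_fp_rates(predicted, true_hgt, all_genes)
```

Repeating the calculation across a range of values for one parameter while holding the other fixed yields one point per setting on each curve.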
Figure 3. Venn diagram illustrating the HGT events detected in the genome of Rhodanobacter denitrificans 2APBS1 by three state-of-the-art computational tools (AlienHunter, DarkHorse and HGTector) and by the methodology presented here, ShadowCaster. The HGT predictions made by each tool are framed inside a coloured ellipse. The total number of HGT detections for each tool is shown in black inside each ellipse: AlienHunter (571), DarkHorse (1007), HGTector (649) and ShadowCaster (940). HGT events related to heavy metal resistance are labelled in bold: AlienHunter (27), DarkHorse (21), HGTector (2) and ShadowCaster (29). Elapsed time for HGT detection: AlienHunter (31 min 43 s), DarkHorse (93 min 57 s), HGTector (66 min 03 s) and ShadowCaster (103 min 02 s).
Figure 4. Gene Ontology (GO) enrichment analysis. Distribution of GO-InterPro terms showing statistically significant differences (Fisher's exact test, with p-values filtered for multiple testing using the false discovery rate) for all HGT detections made by ShadowCaster (A), AlienHunter (B), DarkHorse (C) and HGTector (D) in the genome of Rhodanobacter denitrificans 2APBS1. InterPro categories highlighted in blue are explicitly related to heavy-metal metabolism. The analysis was conducted using Blast2GO PRO.
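
The enrichment analysis itself was run in Blast2GO PRO, but the underlying statistics can be sketched in a few lines. The example below is an illustration under stated assumptions, not a reproduction of the Blast2GO procedure: it uses a one-sided Fisher's exact test for over-representation (the caption does not state sidedness) and the Benjamini-Hochberg procedure as the false discovery rate correction (the caption does not name the procedure). The function names and all counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes with the term among the HGT set, n: size of the HGT set,
    K: genes with the term in the genome,     N: genes in the genome.
    Returns P(X >= k) under the hypergeometric null.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts (hypothetical): all 3 annotated genes fall in a 3-gene HGT set of a 10-gene genome.
p_term = fisher_enrichment_p(k=3, n=3, K=3, N=10)
# Toy p-values (hypothetical) for four terms.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Terms would then be reported as enriched when their adjusted p-value falls below the chosen FDR level.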