Hyeonsoo Jeong, Samsun Sung, Taehyung Kwon, Minseok Seo, Kelsey Caetano-Anollés, Sang Ho Choi, Seoae Cho, Arshan Nasir, Heebal Kim.
Abstract
The HGTree database provides putative genome-wide horizontal gene transfer (HGT) information for 2472 completely sequenced prokaryotic genomes. This is accomplished by reconstructing approximate maximum likelihood phylogenetic trees for each orthologous gene set and the corresponding 16S rRNA reference species set, and then reconciling the two trees under a parsimony framework. Tree reconciliation is generally considered a reliable way to detect HGT events, but its practical use has remained limited because the method is computationally intensive and conceptually challenging. In this regard, HGTree (http://hgtree.snu.ac.kr) represents a useful addition for the biological community and enables quick and easy retrieval of information on HGT-acquired genes to better understand microbial taxonomy and evolution. The database is freely available and can be easily scaled and updated to keep pace with the rapid rise in genomic information.
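The idea of comparing a gene tree with a species tree can be illustrated with a small sketch. It is not the parsimony reconciliation that HGTree performs; it only flags clades of a rooted gene tree that are absent from the rooted species tree, which is a much simpler topological-conflict check. The nested-tuple tree encoding, the function names and the five-taxon example are all assumptions made for illustration.

```python
def clades(tree):
    """Return (leaves of this subtree, set of all clades at or below it).

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade rooted at this internal node
    return leaves, found


def conflicting_clades(gene_tree, species_tree):
    """List gene-tree clades that do not occur in the species tree."""
    _, gene = clades(gene_tree)
    _, species = clades(species_tree)
    return sorted(sorted(c) for c in gene - species)


# Hypothetical example: the gene tree groups A with D, unlike the species tree.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "D"), "C"), ("B", "E"))
conflicts = conflicting_clades(gene_tree, species_tree)
```

A conflicting clade is only a hint of discordance. A reconciliation method additionally has to decide which evolutionary events (for example transfers) explain the disagreement at minimal cost.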
Year: 2015 PMID: 26578597 PMCID: PMC4702880 DOI: 10.1093/nar/gkv1245
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the HGTree analysis pipeline. (A) HGT detection in prokaryotic genomes. (B) Pipeline to process user gene and genome data. See Materials and Methods and the main text for a detailed description and filtering criteria.
Summary statistics
| Type | Number of records |
|---|---|
| Total non-redundant microbial genomes | 2472 (156 Archaea and 2316 Bacteria) |
| Genomes part of human microbiota | 30 |
| Total protein sequences | 7 748 306 |
| Number of orthologous gene sets | 154 805 |
| Detected putative HGT events | 660 840 |
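A few averages can be derived from the summary statistics above. The sketch below only does arithmetic on the tabulated counts; the variable names are illustrative, and the averages are not figures reported in the source.

```python
# Counts copied from the summary statistics table.
archaea = 156
bacteria = 2316
total_genomes = 2472
total_proteins = 7_748_306
orthologous_sets = 154_805
hgt_events = 660_840

# Consistency check: the domain counts should add up to the genome total.
domains_match = (archaea + bacteria == total_genomes)

# Derived averages (not stated in the source).
proteins_per_genome = total_proteins / total_genomes
events_per_orthologous_set = hgt_events / orthologous_sets

print(f"domain counts consistent: {domains_match}")
print(f"mean proteins per genome: {proteins_per_genome:.1f}")
print(f"mean putative HGT events per orthologous set: {events_per_orthologous_set:.2f}")
```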
Processing time required for genomes of varying sizes.
| Genome | Genome size (Mb) | Protein-coding sequences | NJ processing time (min) | ML processing time (min) |
|---|---|---|---|---|
| | 0.11 | 137 | 1.69 | 1.67 |
| | 1.01 | 753 | 4.43 | 4.71 |
| | 1.18 | 972 | 12.74 | 23.23 |
| | 1.58 | 1206 | 12.30 | 35.20 |
| | 1.93 | 1530 | 16.92 | 30.92 |
| | 2.06 | 1750 | 17.06 | 39.93 |
| | 2.37 | 1953 | 19.16 | 44.75 |
| | 2.49 | 2298 | 26.58 | 83.07 |
| | 2.82 | 2775 | 20.85 | 55.80 |
| | 3.05 | 2559 | 41.78 | 95.82 |
| | 3.4 | 2943 | 35.14 | 68.03 |
| | 3.6 | 3197 | 25.01 | 43.03 |
| | 4.11 | 3770 | 36.64 | 77.86 |
| | 4.44 | 3800 | 43.28 | 109.25 |
| | 4.85 | 4354 | 46.68 | 79.82 |
| | 5.37 | 4660 | 54.64 | 124.01 |
| | 7.97 | 6003 | 43.56 | 89.64 |
| | 9.03 | 7136 | 50.75 | 89.85 |
| | 10.35 | 7949 | 55.71 | 88.94 |
| | 13.03 | 9445 | 56.69 | 87.51 |
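To compare the two tree-building modes, the processing times above can be turned into an ML-to-NJ ratio per genome. This is a minimal sketch over the tabulated values only; the tuple layout and variable names are assumptions, and the ratios are derived here rather than reported in the source.

```python
# (genome size in Mb, protein-coding sequences, NJ minutes, ML minutes)
timings = [
    (0.11, 137, 1.69, 1.67), (1.01, 753, 4.43, 4.71),
    (1.18, 972, 12.74, 23.23), (1.58, 1206, 12.30, 35.20),
    (1.93, 1530, 16.92, 30.92), (2.06, 1750, 17.06, 39.93),
    (2.37, 1953, 19.16, 44.75), (2.49, 2298, 26.58, 83.07),
    (2.82, 2775, 20.85, 55.80), (3.05, 2559, 41.78, 95.82),
    (3.4, 2943, 35.14, 68.03), (3.6, 3197, 25.01, 43.03),
    (4.11, 3770, 36.64, 77.86), (4.44, 3800, 43.28, 109.25),
    (4.85, 4354, 46.68, 79.82), (5.37, 4660, 54.64, 124.01),
    (7.97, 6003, 43.56, 89.64), (9.03, 7136, 50.75, 89.85),
    (10.35, 7949, 55.71, 88.94), (13.03, 9445, 56.69, 87.51),
]

# Ratio > 1 means ML took longer than NJ for that genome.
ratios = [(size, ml / nj) for size, _, nj, ml in timings]

# Genomes for which ML was not slower than NJ.
ml_not_slower = [size for size, ratio in ratios if ratio <= 1.0]

# Genome with the largest relative ML overhead.
worst_size, worst_ratio = max(ratios, key=lambda item: item[1])

print(f"ML not slower than NJ for genome sizes (Mb): {ml_not_slower}")
print(f"largest ML/NJ ratio: {worst_ratio:.2f} at {worst_size} Mb")
```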
Figure 2. Screenshots of HGT Browser functionality in HGTree. (A) Users can either search for their genome of interest or navigate through the ‘Taxonomic Tree’. Upon selection of one or more genomes, a list of HGT-related genes is displayed at the bottom. (B) Tables display basic information about all genes that have participated in HGT events. (C) Plots display donor and recipient genomes in each HGT event, as well as both gene and species trees. (D) Users can query their gene or genome sequences against the HGTree servers to identify HGT-related genes in their data.
Figure 3. Microbial genomes as viewed by HGTree. (A) Each triangle in the scatter plot represents one microbial genome. The fitted regression line (blue) (y = −44.31 + 0.33x; R² = 0.81) describes a linear relationship between the number of HGT-related genes (y) and the total number of genes (x) in each genome. The gray area around the regression line indicates the standard error. The red dotted line excludes organisms that fall in the upper and lower 5% of HGT-index values. (B) Boxplots show the distribution of HGT-index values for organisms in each major microbial phylum in the data set. The horizontal red line represents the global median HGT-index value (0.3). Phyla are sorted in descending order by median HGT-index. Numbers in parentheses indicate the total number of genomes sampled for each phylum/group.
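The regression in panel (A) can be applied directly to estimate how many HGT-related genes a genome of a given size would be expected to carry. The sketch below uses the coefficients from the caption. The definition of the HGT-index as HGT-related genes divided by total genes is an assumption, since the caption does not define it, and the 3000-gene genome is a hypothetical input.

```python
INTERCEPT = -44.31  # from the Figure 3 caption
SLOPE = 0.33        # from the Figure 3 caption


def predicted_hgt_genes(total_genes):
    """Expected number of HGT-related genes under the fitted line."""
    return INTERCEPT + SLOPE * total_genes


def hgt_index(hgt_genes, total_genes):
    """Assumed definition: fraction of genes that are HGT-related."""
    return hgt_genes / total_genes


# Hypothetical genome with 3000 genes.
total = 3000
expected = predicted_hgt_genes(total)
index = hgt_index(expected, total)

print(f"expected HGT-related genes: {expected:.2f}")
print(f"implied HGT-index: {index:.3f}")
```

Under this assumed definition the implied index for a mid-sized genome lands close to the global median of 0.3 given in panel (B). For very small genomes the negative intercept makes the linear prediction unreliable.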