Weizhi Song, Kerrin Steensen, Torsten Thomas.
Abstract
The development and application of metagenomic approaches have provided an opportunity to study and define horizontal gene transfer (HGT) at the level of microbial communities. However, no current metagenomic data simulation tool offers the option to introduce defined HGT events within a microbial community. Here, we present HgtSIM, a pipeline to simulate HGT events among microbial community members with user-defined mutation levels. It was developed for testing and benchmarking pipelines that recover HGTs from complex microbial datasets. HgtSIM is implemented in Python3 and is freely available at: https://github.com/songweizhi/HgtSIM.
Keywords: Bioinformatics; Horizontal gene transfer; Simulator
Year: 2017 PMID: 29134153 PMCID: PMC5681852 DOI: 10.7717/peerj.4015
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
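The idea described in the abstract, transferring a donor gene into a recipient genome at a user-defined mutation level, can be illustrated with a minimal sketch. This is not HgtSIM's implementation: it applies uniform random nucleotide substitutions, whereas the pipeline works with codon mutation categories. The function names `mutate_sequence` and `insert_gene` and the parameter `mutation_level_pct` are hypothetical.

```python
import random


def mutate_sequence(seq, mutation_level_pct, rng):
    """Substitute roughly mutation_level_pct percent of the bases in seq.

    Assumption: substitutions are uniform over positions and bases;
    this is an illustrative scheme, not HgtSIM's mutation model.
    """
    n_changes = round(len(seq) * mutation_level_pct / 100)
    # Sample positions without replacement so each one changes exactly once.
    positions = rng.sample(range(len(seq)), n_changes)
    bases = list(seq)
    for pos in positions:
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def insert_gene(recipient, gene, rng):
    """Insert gene at a random position of recipient; return (genome, position)."""
    pos = rng.randrange(len(recipient) + 1)
    return recipient[:pos] + gene + recipient[pos:], pos


rng = random.Random(0)            # fixed seed for reproducibility
donor_gene = "ATGC" * 25          # toy 100 bp donor gene
mutant = mutate_sequence(donor_gene, 10, rng)
recipient_genome = "A" * 50       # toy recipient genome
genome, insert_pos = insert_gene(recipient_genome, mutant, rng)
```

Recording `insert_pos` together with the donor and recipient identifiers gives the ground truth against which an HGT detection pipeline can be benchmarked.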
Mutation types of codons.
The changed bases are displayed in bold. The corresponding amino acid change is given in parentheses. As the number of silent two- and three-base mutations is low (1%) compared to non-silent mutations, we combined them into the same categories. The start and stop codons were excluded when calculating the number of mutation types.
| Category | Mutation type | Example | Total number |
|---|---|---|---|
| One-base, silent | AT | | 124 |
| One-base, non-silent | | | 356 |
| Two bases, silent | | | 20 |
| Two bases, non-silent | C | | 1,394 |
| Three bases, silent | | | 12 |
| Three bases, non-silent | | | 1,400 |
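The totals in the table above can be reproduced by enumerating every ordered pair of distinct codons and classifying each pair by the number of changed bases and by whether the encoded amino acid changes. The following is a minimal sketch, assuming the standard genetic code and assuming that the excluded start codons are ATG, CTG and TTG; the source does not list which start codons were excluded, so that set is an assumption. The name `count_mutation_types` is illustrative.

```python
from collections import Counter
from itertools import permutations, product

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: these three start codons are the ones excluded from the counts.
START_CODONS = {"ATG", "CTG", "TTG"}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def count_mutation_types():
    """Count ordered codon-to-codon changes by (bases changed, is_silent)."""
    excluded = START_CODONS | STOP_CODONS
    codons = [c for c in CODON_TABLE if c not in excluded]
    counts = Counter()
    for src, dst in permutations(codons, 2):
        changed = sum(a != b for a, b in zip(src, dst))
        silent = CODON_TABLE[src] == CODON_TABLE[dst]
        counts[(changed, silent)] += 1
    return counts


counts = count_mutation_types()
for (changed, silent), total in sorted(counts.items()):
    label = "silent" if silent else "non-silent"
    print(f"{changed} base(s), {label}: {total}")
```

Under these assumptions the six counts are 124, 356, 20, 1,394, 12 and 1,400, which matches the table.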
Figure 1. The workflow of HgtSIM.
The selected 20 genomes used in this study.
| Class | Strain | NCBI BioProject ID | Genome size (Mbp) |
|---|---|---|---|
| | | | 3.58 |
| | | | 2.64 |
| | | | 3.74 |
| | | | 5.91 |
| | | | 4.04 |
| | | | 4.30 |
| | | | 3.98 |
| | | | 3.35 |
| | | | 4.54 |
| | | | 3.74 |
| | | | 4.76 |
| | | | 3.63 |
| | | | 3.02 |
| | | | 5.26 |
| | | | 3.04 |
| | | | 3.88 |
| | | | 2.41 |
| | | | 2.99 |
| | | | 2.86 |
| | | | 4.16 |
Figure 2. The correlation between mutations at the nucleotide level and the resulting amino acid changes under different mutation category ratios.
The four numbers separated by colons refer to the ratio between C1, C2, C3 and C4.
Figure 3. The effect of the assembly k-mer range on the recovery of HGT events.
Figure 4. Total length (A), percentage of recovered sequences (B), and contig number and N50 (C) of the assembler-produced assemblies. (D) Number of recovered transfers.
The lines showing the number of contigs and the N50 of metaSPAdes-produced assemblies with two different k-mer settings overlap in panel (C).
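N50, reported in panel (C), is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name `n50` and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running sum first reaches half the total length.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Toy example: total length is 54, and 10 + 9 + 8 = 27 reaches half of it.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
print(example_n50)  # 8
```

Because N50 depends only on the sorted contig lengths, two assemblies with near-identical contig length distributions produce overlapping curves, as noted for the two metaSPAdes k-mer settings.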
The effect of read length and insert size on the recovery of 100 simulated HGT events.
| Read length (bp) | 100 | 100 | 100 | 250 | 250 | 250 |
|---|---|---|---|---|---|---|
| Insert size (bp) | 250 | 500 | 1,000 | 250 | 500 | 1,000 |
| Recovered gene transfers | 69 | 55 | 63 | 15 | 23 | 51 |
Figure 5. The correlation between the length of gene transfers and their recovery rate.