Muhammad Tariq Pervez, Hayat Ali Shah, Masroor Ellahi Babar, Nasir Naveed, Muhammad Shoaib.
Abstract
Simulated alignments are an alternative to manually constructed multiple sequence alignments for evaluating the performance of multiple sequence alignment tools. Simulated sequences are valued because their true evolutionary history is known, which is very helpful for reconstructing accurate phylogenetic trees and alignments. However, generating simulated alignments requires expertise in bioinformatics tools and takes several hours even for a few hundred simulated sequences. This becomes a tedious job for an end user who needs only a few datasets covering a variety of simulated sequences. Currently, no databank is available from which researchers can download simulated sequences or alignments for their studies. The major focus of our study was to develop a database of simulated protein sequences (SAliBASE) built by varying parameters such as insertion rate, deletion rate, sequence length, number of sequences, and indel size. Each dataset is accompanied by its corresponding alignment. This repository is useful for evaluating multiple sequence alignment methods.
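To illustrate how a known true alignment can be used to evaluate a tool, the sketch below computes a sum-of-pairs style score: the fraction of residue pairs aligned in the true alignment that the tool's alignment also places in the same column. The abstract does not name a scoring measure, so the choice of this measure, the function names `aligned_pairs` and `sum_of_pairs_score`, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    A residue is identified as (sequence index, ungapped residue index).
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # residues consumed so far, per sequence
    pairs = set()
    n_cols = len(alignment[0]) if alignment else 0
    for col in range(n_cols):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test, true):
    """Fraction of true aligned residue pairs recovered by `test`.

    Both alignments must list the same sequences in the same order.
    """
    true_pairs = aligned_pairs(true)
    if not true_pairs:
        return 0.0
    return len(aligned_pairs(test) & true_pairs) / len(true_pairs)


if __name__ == "__main__":
    # Hypothetical toy data: a "true" alignment and a tool's output.
    true_alignment = ["AC-GT", "ACAGT"]
    tool_alignment = ["ACG-T", "ACAGT"]
    print(sum_of_pairs_score(tool_alignment, true_alignment))
```

With the toy data above, the tool misplaces one gap, so it recovers three of the four truly aligned residue pairs.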
Keywords: SAliBASE; simulated alignment; true alignment
Year: 2019 PMID: 30733625 PMCID: PMC6343434 DOI: 10.1177/1176934318821080
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Parameters used in 5 sets of simulated alignments.
| Set | Sequence length | Indel size | Insertion rate | Deletion rate | Number of sequences in each alignment |
|---|---|---|---|---|---|
| Varying deletion rate | 1000 | 20 | 0.000002 | **0.000002-0.1** | 100 |
| Varying insertion rate | 1000 | 20 | **0.000002-0.1** | 0.000002 | 100 |
| Varying indel sizes | 15 000 | **100-5000** | 0.000002 | 0.000002 | 100 |
| Varying sequence lengths | **1000-20 800** | 20 | 0.000002 | 0.000002 | 100 |
| Varying number of sequences | 500 | 20 | 0.000002 | 0.000002 | **100-100 000** |
In each of the 5 sets, 4 parameters were kept constant and 1 was varied (given in boldface).
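The five parameter sets can also be written down as data, which makes the "one parameter varies, four stay constant" design easy to check programmatically. The following is a minimal sketch: the values are transcribed from the table, while the dictionary layout, the key names, and the helper `varying_parameters` are hypothetical and not part of SAliBASE.

```python
# Parameter names used as dictionary keys (illustrative naming).
PARAMS = (
    "sequence_length",
    "indel_size",
    "insertion_rate",
    "deletion_rate",
    "num_sequences",
)

# A (low, high) tuple marks the parameter that varies within a set;
# plain numbers are held constant. Values are taken from the table.
PARAMETER_SETS = {
    "varying_deletion_rate": {
        "sequence_length": 1000, "indel_size": 20,
        "insertion_rate": 2e-6, "deletion_rate": (2e-6, 0.1),
        "num_sequences": 100,
    },
    "varying_insertion_rate": {
        "sequence_length": 1000, "indel_size": 20,
        "insertion_rate": (2e-6, 0.1), "deletion_rate": 2e-6,
        "num_sequences": 100,
    },
    "varying_indel_size": {
        "sequence_length": 15000, "indel_size": (100, 5000),
        "insertion_rate": 2e-6, "deletion_rate": 2e-6,
        "num_sequences": 100,
    },
    "varying_sequence_length": {
        "sequence_length": (1000, 20800), "indel_size": 20,
        "insertion_rate": 2e-6, "deletion_rate": 2e-6,
        "num_sequences": 100,
    },
    "varying_num_sequences": {
        "sequence_length": 500, "indel_size": 20,
        "insertion_rate": 2e-6, "deletion_rate": 2e-6,
        "num_sequences": (100, 100000),
    },
}


def varying_parameters(param_set):
    """Return the names of the parameters given as a (low, high) range."""
    return [name for name in PARAMS if isinstance(param_set[name], tuple)]


if __name__ == "__main__":
    for label, param_set in PARAMETER_SETS.items():
        varying = varying_parameters(param_set)
        if len(varying) != 1:
            raise ValueError(f"{label}: expected exactly one varying parameter")
        low, high = param_set[varying[0]]
        print(f"{label}: {varying[0]} ranges from {low} to {high}")
```

The table gives only the end points of each range, so the sketch stores ranges as (low, high) pairs and does not guess at the intermediate values used for individual datasets.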
Figure 1. The steps to generate simulated datasets.
Figure 2. (A) The command used to generate the tree in R and (B) the command used to generate simulated sequences in indel-seq-gen.
Figure 3. Online interface of SAliBASE, showing links for downloading various datasets.