Surendra Kumar, Anders K Krabberød, Ralf S Neumann, Katerina Michalickova, Sen Zhao, Xiaoli Zhang, Kamran Shalchian-Tabrizi.
Abstract
SUMMARY: We present a pipeline named BIR (Blast, Identify and Realign) developed for phylogenomic analyses. BIR is intended for the identification of gene sequences applicable for phylogenomic inference. The pipeline allows users to apply their own manually curated sequence alignments (seeds) to search for homologous genes in sequence databases and available genomes. BIR automatically adds the identified sequences from these databases to the seed alignments and reconstructs a phylogenetic tree from each. The BIR pipeline is an efficient tool for the identification of orthologous gene copies because it expands user-defined sequence alignments and conducts massively parallel phylogenetic reconstruction. The application is also particularly useful for large-scale sequencing projects that require management of a large number of single-gene alignments for gene comparison, functional annotation, and evolutionary analyses. AVAILABILITY: The BIR user manual is available at http://www.bioportal.no/ and can be accessed through Lifeportal at https://lifeportal.uio.no. Access is free but requires user account registration using the link "Register for BIR access" on the Lifeportal homepage.
Keywords: alignment construction; genomics; ortholog prediction; phylogenetics; phylogenomics; transcriptomics
Year: 2015 PMID: 25987827 PMCID: PMC4412416 DOI: 10.4137/EBO.S10189
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. Overview of steps in the BIR pipeline. 1) The user provides a zipped file with the query sequences and another zipped file with the seed alignments. The sequences and alignments should be in FASTA format. Additionally, protein sequences from completely sequenced genomes (Table 1) can be added. Each sequence from the query files and the selected reference genomes is added to the seed alignment it matches best, as determined by BLAST. 2) The modified seed alignments can be realigned using MAFFT. 3) Gblocks or trimAl can be used for removal of ambiguously aligned regions. 4) Phylogenetic trees can be inferred with FastTree or RAxML. 5) Paralog prediction is done by the COCO-CL program. Putative paralogs are marked in circles with a dashed line. The resulting phylogenetic trees can then be further assessed and interpreted using any tree-viewing software.
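Step 1 of the figure, assigning each query sequence to the seed alignment it matches best, can be illustrated with a minimal Python sketch. This is not BIR's actual implementation. It assumes the BLAST hits are available in the standard 12-column tabular format produced by BLAST+ with `-outfmt 6`, and it assumes a hypothetical naming convention in which every subject ID has the form `<seed_name>|<sequence_id>`. The function names, the E-value cutoff of 1e-5, the seed names (`actin`, `tubulin`), and the example hits are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_seed_per_query(blast_tsv, max_evalue=1e-5):
    """Map each query ID to the seed alignment of its best-scoring hit.

    Assumption (hypothetical convention): subject IDs look like
    "<seed_name>|<sequence_id>", so the seed name is the text before "|".
    """
    best = {}  # query ID -> (bit score, seed name)
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=FIELDS,
                            delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # skip hits weaker than the (assumed) cutoff
        score = float(row["bitscore"])
        seed = row["sseqid"].split("|", 1)[0]
        query = row["qseqid"]
        # Keep only the hit with the highest bit score for each query.
        if query not in best or score > best[query][0]:
            best[query] = (score, seed)
    return {query: seed for query, (_, seed) in best.items()}


def group_by_seed(assignments):
    """Invert the mapping: seed name -> sorted list of query IDs to add."""
    groups = defaultdict(list)
    for query, seed in assignments.items():
        groups[seed].append(query)
    return {seed: sorted(queries) for seed, queries in groups.items()}


# Made-up hits: q1 matches two seeds, q3 has only a weak hit.
EXAMPLE = "\n".join([
    "q1\tactin|Hsap\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-80\t300.0",
    "q1\ttubulin|Hsap\t40.0\t150\t90\t2\t1\t150\t5\t155\t1e-10\t60.0",
    "q2\ttubulin|Atha\t90.0\t400\t40\t0\t1\t400\t1\t400\t1e-150\t700.0",
    "q3\tactin|Atha\t25.0\t50\t37\t1\t1\t50\t1\t50\t0.5\t20.0",
])

if __name__ == "__main__":
    print(group_by_seed(best_seed_per_query(EXAMPLE)))
```

In this sketch, `q1` is assigned to the seed whose hit has the higher bit score, and `q3` is dropped because its only hit exceeds the E-value cutoff. The sequences grouped under each seed would then be appended to that seed's FASTA file before realignment (step 2).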
Table 1. Completely sequenced genomes from the eukaryotic supergroups available in the BIR pipeline.
| ORGANISM | SUPERGROUP | SIZE (MB) | GC% | #AA | BIOPROJECT |
|---|---|---|---|---|---|
| | Plantae | 119.67 | 36.1 | 35375 | PRJNA116, PRJNA10719 |
| | SAR | 0.17 | 29.7 | 136 | PRJNA27939, PRJNA27935 |
| | Amoebozoa | 34.2 | 22.5 | 13315 | PRJNA13925, PRJNA201 |
| | Hacrobia | 0.3 | 29.2 | 309 | PRJNA210, PRJNA20389, PRJNA27847 |
| | Opisthokonta | 3224.46 | 41.7 | 34931 | PRJNA168, PRJNA31257 |
| | Opisthokonta | 38.73 | 54.8 | 9203 | PRJNA28133, PRJNA19045 |
| | Excavata | 36.3 | 33.1 | 15759 | PRJNA43691, PRJNA14010 |
| | SAR | 72.07 | 28.1 | 40043 | PRJNA19409, PRJNA18363 |
| | Opisthokonta | 12.16 | 38.2 | 5909 | PRJNA128, PRJNA13838, PRJNA43747 |
| | SAR | 32.44 | 46.9 | 11849 | PRJNA34119, PRJNA191 |
Note: SAR = Stramenopila, Alveolata, Rhizaria. #AA = number of protein sequences in each genome.