Tomáš Fér, Roswitha E Schmickl.
Abstract
SUMMARY: Hybridization-based target enrichment in combination with genome skimming (Hyb-Seq) is becoming a standard method of phylogenomics. We developed HybPhyloMaker, a bioinformatics pipeline that performs target enrichment data analysis from raw reads to supermatrix-, supertree-, and multispecies coalescent-based species tree reconstruction. HybPhyloMaker is written in BASH and integrates common bioinformatics tools. It can be launched both locally and on a high-performance computer cluster. Compared with existing target enrichment data analysis pipelines, HybPhyloMaker offers the following main advantages: implementation of all steps of data analysis from raw reads to species tree reconstruction, calculation and summary of alignment and gene tree properties that assist the user in the selection of "quality-filtered" genes, implementation of several species tree reconstruction methods, and analysis of the coding regions of organellar genomes. AVAILABILITY: The HybPhyloMaker scripts, the manual, and a test data set are available at https://github.com/tomas-fer/HybPhyloMaker/. HybPhyloMaker is licensed under the open-source GPL v.3 license, which allows further modification.
Keywords: Target enrichment; genome skimming; locus selection; phylogenomics; species tree
Year: 2018 PMID: 29348708 PMCID: PMC5768271 DOI: 10.1177/1176934317742613
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. HybPhyloMaker processing steps. Input data and intermediate results are displayed in white boxes, modification steps are shown in gray boxes. Each modification step is performed by a particular HybPhyloMaker script (small gray boxes).
List of software that must be installed/must be present on the local computer/cluster before running HybPhyloMaker.
| Script | Step |
|---|---|
| 0 | Sample preparation |
| 1 | Raw data processing |
| 2 | Read mapping |
| 3 | Generate pslx |
| 4 | Process pslx |
| 4b | Correct frame, translate |
| 5 | Missing data handling |
| 6 | Build gene trees |
| 7 | Root gene trees |
| 8a | ASTRAL |
| 8b | ASTRID |
| 8c | MRL |
| 8e | Concatenated FastTree |
| 8f | ExaML |
| 9 | Update |
| 10 | Collapse trees and select |

| Software | Install (y/n) | Used command(s) | No. of scripts using it |
|---|---|---|---|
| GNU parallel | y | parallel | 2 |
| Bowtie 2 | y | bowtie2-build, bowtie2 | 2 |
| BWA | y | bwa mem | 1 |
| SAMtools | y | samtools | 2 |
| bam2fastq | y | bam2fastq | 1 |
| Trimmomatic | n | java -jar trimmomatic-0.33.jar | 1 |
| FastUniq | y | fastuniq | 1 |
| JDK/JRE | y | java | 5 |
| OCOCO | y | ococo | 1 |
| Perl | y | perl | 4 |
| BLAT suite | y | blat | 1 |
| MAFFT | y | mafft | 1 |
| Python | y | python | 3 |
| Python3 | y | python3 | 1 |
| AMAS | n | python3 amas.py | 8 |
| trimAl | y | trimal | 1 |
| MstatX | y | mstatx | 1 |
| FastTree | y | fasttree | 2 |
| Newick Utilities | y | nw_reroot, nw_topology | 6 |
| RAxML | y | raxmlHPC | 3 |
| R | y | R | 3 |
| ASTRAL | n | java -jar astral.4.11.1.jar | 1 |
| ASTRID | n | ASTRID | 1 |
| p4 | y | p4 | 2 |
| mrpmatrix | n | java -jar mrp.jar | 1 |
| ExaML | y | examl | 1 |
All of the software can be installed automatically using the script “install_software.sh”. For each program, the source and the specific command for calling it are provided, and it is indicated in which HybPhyloMaker script the program is used (HybPhyloMaker0-10). Software that must be installed or already present on the computer/cluster is marked with “y”; software marked with “n” is provided with HybPhyloMaker and does not need to be installed.
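Before launching the pipeline, it can be useful to check which of the required commands are already available. The following is a minimal sketch of such a check, not part of HybPhyloMaker itself: the command names are taken from the "Used command(s)" entries marked “y” in the table, while the function name `missing_commands` and its `which` parameter are hypothetical and introduced only for illustration.

```python
import shutil

# Commands for software marked "y" (must be installed by the user).
# Copied from the "Used command(s)" column; trim the list to the
# HybPhyloMaker scripts you actually plan to run.
REQUIRED_COMMANDS = [
    "parallel", "bowtie2-build", "bowtie2", "bwa", "samtools", "bam2fastq",
    "fastuniq", "java", "ococo", "perl", "blat", "mafft", "python", "python3",
    "trimal", "mstatx", "fasttree", "nw_reroot", "nw_topology", "raxmlHPC",
    "R", "p4", "examl",
]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on PATH, in input order.

    `which` is injectable so the check can be tested without touching
    the real PATH (an illustrative design choice, not a pipeline API).
    """
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required commands were found.")
```

Software marked “n” (Trimmomatic, AMAS, ASTRAL, ASTRID, mrpmatrix) is deliberately left out of the list because it ships with HybPhyloMaker.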
Comparison of the performance of the three pipelines PHYLUCE, HybPiper, and HybPhyloMaker when processing 6 samples from the plant genus Oxalis[11].
| Name and code | No. of raw reads | PHYLUCE: no. (%) of recovered loci; no filtering against missing data | HybPiper: no. (%) of recovered loci; no filtering against missing data | HybPiper: no. (%) of recovered loci; ≥25% data completeness | HybPiper: no. (%) of putative paralogs; ≥25% data completeness | HybPhyloMaker: no. (%) of recovered loci; ≥25% data completeness |
|---|---|---|---|---|---|---|
| | 1 905 062 | 43 (3.7) | 1102 (94.7) | 1080 (92.8) | 11 (0.9) | 1160 (99.7) |
| | 1 553 282 | 156 (13.4) | 1148 (98.6) | 1141 (98.0) | 14 (1.2) | 1161 (99.7) |
| | 1 306 633 | 125 (10.7) | 1147 (98.5) | 1139 (97.9) | 20 (1.7) | 1161 (99.7) |
| | 1 847 669 | 53 (4.6) | 1134 (97.4) | 1130 (97.1) | 15 (1.3) | 1161 (99.7) |
| | 1 785 030 | 84 (7.2) | 1118 (96.0) | 1108 (95.2) | 14 (1.2) | 1163 (99.9) |
| | 1 818 390 | 47 (4.0) | 994 (85.5) | 968 (83.2) | 5 (0.4) | 1161 (99.7) |
The number and percentage of genes that were recovered (i.e., with ≥25% completeness of each gene for HybPiper and HybPhyloMaker) and, for HybPiper, the number and percentage of putative paralogs are reported. Filtering against missing data was not performed in PHYLUCE; the most conservative number and percentage of recovered genes are therefore provided. Duplicate reads were removed for HybPiper and HybPhyloMaker.
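The ≥25% completeness criterion can be illustrated with a short sketch. It assumes that completeness is the fraction of alignment positions that are not gaps, `?`, or `N`; the source does not spell out the exact definition, and HybPhyloMaker's own calculation may differ. The names `completeness`, `recovered_loci`, and `MISSING_CHARS` are hypothetical.

```python
# Assumption: gaps, '?', and N/n are treated as missing data.
MISSING_CHARS = set("-?Nn")


def completeness(aligned_seq):
    """Fraction of positions that carry data (0.0 for an empty string)."""
    if not aligned_seq:
        return 0.0
    present = sum(1 for ch in aligned_seq if ch not in MISSING_CHARS)
    return present / len(aligned_seq)


def recovered_loci(sample_alignments, threshold=0.25):
    """Return names of loci whose completeness is >= threshold.

    `sample_alignments` maps a locus name to the aligned sequence of
    one sample at that locus.
    """
    return [locus for locus, seq in sample_alignments.items()
            if completeness(seq) >= threshold]


# Toy data, not taken from the Oxalis data set.
sample = {"locus1": "ACGT----", "locus2": "A-------", "locus3": "ACGTNNNN"}
print(recovered_loci(sample))  # ['locus1', 'locus3']
```

Under this definition, a locus counts as recovered for a sample when at least a quarter of its aligned positions are informative, which is how the 25% threshold is applied in the toy example.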
Comparison between the major steps of PHYLUCE, HybPiper, and HybPhyloMaker.
| Step | PHYLUCE | HybPiper | HybPhyloMaker |
|---|---|---|---|
| Download from Illumina BaseSpace | No | No | Yes |
| Input | Paired-end Illumina reads | Paired-end and single-end Illumina reads | Paired-end Illumina reads |
| Adapter trimming and quality filtering of reads | Yes | No | Yes |
| Duplicate read removal | No | Yes (Super deduper[ | Yes (FastUniq) |
| Assembly | De novo (Velvet; ABySS[ | De novo (SPAdes[ | Reference-guided (Bowtie 2/BWA; OCOCO/Kindel) |
| Identification of sequences that match the targeted sequences | After assembly: done by matching contigs to the targeted sequences (as nucleotide sequences with LASTZ) | Before assembly: done by matching reads to the targeted sequences (as peptide sequences with BLASTX or as nucleotide sequences with BWA) | After assembly: done by matching contigs to the targeted sequences (as nucleotide sequences with BLAT) |
| Filtering against paralogs | Yes | Yes | No |
| Particularly suitable for exonic probe sequences | No | Yes | Yes |
| Extraction of flanking intronic regions | No | Yes | No |
| Missing data calculation | Yes | No | Yes |
| Calculation of alignment and gene tree properties | No | No | Yes |
| Flexible handling of excluding accessions and loci | Yes | No | Yes |
| Gene tree reconstruction | No | No | Yes (RAxML, FastTree) |
| Concatenation | Yes (ExaBayes[ | No | Yes (FastTree, ExaML[ |
| Species tree reconstruction | No | No | Yes (ASTRAL, ASTRID, MRL) |
| Organellar phylogeny | No | No | Yes (from coding sequences) |
Input file preparation.