
HybPhyloMaker: Target Enrichment Data Analysis From Raw Reads to Species Trees.

Tomáš Fér, Roswitha E Schmickl.

Abstract

SUMMARY: Hybridization-based target enrichment in combination with genome skimming (Hyb-Seq) is becoming a standard method of phylogenomics. We developed HybPhyloMaker, a bioinformatics pipeline that performs target enrichment data analysis from raw reads to supermatrix-, supertree-, and multispecies coalescent-based species tree reconstruction. HybPhyloMaker is written in BASH and integrates common bioinformatics tools. It can be launched both locally and on a high-performance computer cluster. Compared with existing target enrichment data analysis pipelines, HybPhyloMaker offers the following main advantages: implementation of all steps of data analysis from raw reads to species tree reconstruction, calculation and summary of alignment and gene tree properties that assist the user in the selection of "quality-filtered" genes, implementation of several species tree reconstruction methods, and analysis of the coding regions of organellar genomes. AVAILABILITY: The HybPhyloMaker scripts, the manual, and a test data set are available at https://github.com/tomas-fer/HybPhyloMaker/. HybPhyloMaker is licensed under the open-source GPL v.3 license, which allows further modifications.


Keywords:  Target enrichment; genome skimming; locus selection; phylogenomics; species tree

Year:  2018        PMID: 29348708      PMCID: PMC5768271          DOI: 10.1177/1176934317742613

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Hybridization-based target enrichment in combination with genome skimming (Hyb-Seq) is becoming a standard method of phylogenomics (in plants see, for example, the works by Mandel et al., Weitemier et al., and Nicholls et al.[1-3]; see also the works by Lemmon and Lemmon, Heyduk et al.[4,5] for a general overview of genome subsampling methods). Up to now, two well-documented data analysis pipelines have been published: PHYLUCE[6] and HybPiper[7]. PHYLUCE has been developed and optimized for working with ultraconserved elements (UCEs)[8,9], but it performs poorly in case of targeted sequences in the form of multiple exons per gene, which are common targets in plant phylogenetics due to the paucity of UCEs[10]; for locus selection in plants, see, for example, the works by Weitemier et al. and Nicholls et al.[2,11]. PHYLUCE applies a very stringent filter on potentially paralogous loci. This might result in a severe loss of loci in case one is working with multiple targeted exons per gene. The often multiple contigs per gene after de novo read assembly are interpreted by PHYLUCE as an indication of paralogy, and the respective loci are rejected from phylogenetic reconstruction. This can result in a dramatic decrease in potentially orthologous and phylogenetically informative data[2]. The alternative pipeline, HybPiper, is able to handle not only the exonic probe sequences but also the intronic flanking regions, and it identifies and separates putative paralogs. However, apart from the identification of putative paralogs, there are no further criteria for locus selection, and gene and species tree reconstruction as well as the reconstruction of organellar phylogenies are not part of HybPiper. 
Therefore, phylogeneticists using exonic probe sequences lack a straightforward and well-documented bioinformatics pipeline that performs target enrichment data analysis from raw reads to species trees, including quality filtering of raw reads, read assembly, alignment of loci, evaluation of missing data and phylogenetic utility of loci, phylogenetic reconstruction in form of gene/species trees and concatenation as well as phylogenetic reconstruction from organellar data. Especially plastid reads are often obtained in sufficient quantity as part of the off-target reads (eg, 2%[12]; 5%[13]). Incongruence between the nuclear and plastid trees often gives evidence of hybridization events, and both these data sets are usually used in phylogenetics. Here, we present our pipeline HybPhyloMaker, which carries out all of these tasks.

Implementation

HybPhyloMaker consists of 11 major BASH scripts (HybPhyloMaker0-10) that integrate common bioinformatics tools of high-throughput sequencing and phylogenomics. These scripts perform the various steps of data analysis as separate modules within a particular directory structure that is created by them. HybPhyloMaker has a command line interface and can be run both locally and on a high-performance computer cluster. The modular BASH scripts of HybPhyloMaker enable flexible use. All steps are described in detail in Figure 1.
Figure 1.

HybPhyloMaker processing steps. Input data and intermediate results are displayed in white boxes, modification steps are shown in gray boxes. Each modification step is performed by a particular HybPhyloMaker script (small gray boxes).


Data preparation for phylogenetic tree reconstruction

HybPhyloMaker requires two types of input files: (1) paired-end Illumina reads in form of two gzipped FASTQ files per sample and (2) sequences of the probes that were used for target enrichment (FASTA file). The script HybPhyloMaker0 prepares the raw reads for HybPhyloMaker use. It gives the paired-end raw reads a unique label, sorts them according to the HybPhyloMaker-specific directory structure, and creates the reference sequence (“pseudoreference”) for the subsequent reference-guided assembly of the enriched nuclear loci: the probe sequences are concatenated and separated by a string of several hundreds of Ns each (400 Ns are recommended for 2 × 150 bp [base pairs] reads). PhiX read removal, adapter trimming, quality filtering, and duplicate read removal are done with HybPhyloMaker1, using Bowtie 2[14], SAMtools[15], bam2fastq[16], Trimmomatic[17], and FastUniq[18]. In a subsequent step, reads are mapped to a “pseudoreference” that was created from the probe sequences with HybPhyloMaker0. Read mapping is performed with HybPhyloMaker2 using Bowtie 2 or BWA[19], and the consensus sequence is called either with OCOCO[20] or Kindel[21], which is also implemented in HybPhyloMaker2. The consensus sequence is called according to adjustable majority. It results in the reconstruction of the most abundant sequence, which is considered to be the ortholog, as paralogs are usually not enriched in similar quantities compared to orthologs due to a higher sequence dissimilarity to the probe sequences (see Supplement Figure 1). A similar approach was used in recent publications[3,13]. With HybPhyloMaker3, the consensus sequence is fragmented into the exonic parts, which will be called contigs hereafter, and those are matched to the probe sequences using BLAT (BLAST-like alignment tool)[22]. 
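As a rough illustration of consensus calling by adjustable majority, the sketch below takes the bases that reads show at each reference position and keeps the most frequent base if its share reaches a threshold, writing N otherwise. It does not reproduce the algorithms of OCOCO or Kindel; the function name `majority_consensus`, the `min_fraction` parameter, and the toy pileup format are assumptions made for this example.

```python
from collections import Counter

def majority_consensus(pileup, min_fraction=0.5):
    """Call one base per position from a list of per-position base strings.

    pileup: list of strings, each holding the bases that reads show at one
            reference position (hypothetical input format for this sketch).
    min_fraction: adjustable majority threshold; the most frequent base is
            called only if its share of the column reaches this value.
    """
    consensus = []
    for column in pileup:
        if not column:                      # no read covers this position
            consensus.append("N")
            continue
        base, count = Counter(column.upper()).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)

# Toy pileup: the second column is dominated by one abundant copy,
# the third column has no base reaching the 0.7 threshold.
pileup = ["AAAA", "CCCCT", "GGTT", ""]
print(majority_consensus(pileup, min_fraction=0.7))
```

Raising `min_fraction` makes the call more conservative: positions where a second, less abundant copy contributes many reads are then masked with N instead of being resolved in favour of the most abundant sequence.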
Exonic multiple sequence alignments are constructed with HybPhyloMaker4a, which uses the Python script “assembled_exons_to_fastas.py”[2]; if an exon is missing for a particular accession, Ns are added. Also with HybPhyloMaker4a, exons are aligned using MAFFT[23], and exons from the same gene are concatenated using the Perl script “catfasta2phyml.pl”[24]. Optionally, exon and gene alignments can be adjusted to correct the reading frame with HybPhyloMaker4b. This option later allows per-codon partitioning in addition to per-exon partitioning when gene trees are estimated. With HybPhyloMaker5, the amount of missing data is calculated, and accessions as well as loci that meet user-defined thresholds of missing data are retained for further analysis. In the first step, accessions that equal or exceed the maximum allowed percentage of missing data per locus are omitted from the respective loci. Then, the number of remaining accessions per locus is calculated, and those loci are retained that exceed the minimum allowed percentage of accessions per locus. In addition, HybPhyloMaker5 uses AMAS[25], MstatX[26], and trimAl[27] to calculate summary statistics of properties of each locus alignment, which will in a subsequent step assist in a more stringent locus selection. Tables that summarize the amount of missing data within both the entire data set and the user-selected loci as well as histograms that show the distribution of alignment properties are provided.
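The two-step missing-data filter described above can be sketched as follows. This is a minimal illustration and not the HybPhyloMaker5 code: the function names, the dictionary-based alignment format, the treatment of N, ? and - as missing data, and the threshold values are all assumptions.

```python
def missing_fraction(seq):
    """Fraction of alignment positions that carry no data (N, ? or gap)."""
    if not seq:
        return 1.0
    missing = sum(seq.upper().count(ch) for ch in "N?-")
    return missing / len(seq)

def filter_loci(alignments, n_accessions, max_missing=0.7, min_accessions=0.75):
    """Two-step filter over {locus: {accession: aligned sequence}}.

    Step 1: drop accessions whose missing fraction in a locus equals or
            exceeds max_missing.
    Step 2: keep loci in which the share of remaining accessions (out of
            n_accessions) exceeds min_accessions.
    """
    kept = {}
    for locus, seqs in alignments.items():
        retained = {acc: s for acc, s in seqs.items()
                    if missing_fraction(s) < max_missing}
        if len(retained) / n_accessions > min_accessions:
            kept[locus] = retained
    return kept

# Toy data set with three accessions and two loci (hypothetical names).
alignments = {
    "locus1": {"A": "ACGTACGT", "B": "ACGTNNNN", "C": "ACGAACGT"},
    "locus2": {"A": "ACGTACGT", "B": "NNNNNNNN", "C": "NNNNNNAC"},
}
kept = filter_loci(alignments, n_accessions=3, max_missing=0.7, min_accessions=0.5)
print(sorted(kept))            # loci that pass both steps
print(sorted(kept["locus1"]))  # accessions retained in locus1
```

In the toy data, locus2 loses two of its three accessions in the first step and is therefore rejected in the second, whereas locus1 keeps all accessions.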

Gene tree and species tree reconstruction

Gene trees are reconstructed with HybPhyloMaker6. FastTree[28] and RAxML[29] are the tree-building algorithms to choose from; both can be run with or without bootstrapping. FastTree is computationally less demanding than RAxML, but it tends to provide higher branch support values (based on the Shimodaira-Hasegawa test[28]) compared to bootstrapping in RAxML[30]. RAxML trees can be estimated from unpartitioned or partitioned (by exon or by codon position) data sets. In addition, HybPhyloMaker6 calculates summary statistics of properties of each gene tree, using the R script “tree_props.R” (modified from the work of Borowiec[31]) and the R packages ape[32] and seqinr[33]. Alignment summary statistics, which were inferred with HybPhyloMaker5, and gene tree summary statistics are combined, and correlations among all properties are calculated and visualized with the R script “plotting_correlations.R” (modified from the work of Borowiec[31]). Based on those summaries and correlations, the user can optimize phylogenetic reconstruction using “quality-filtered” genes with HybPhyloMaker9. Saturated genes in particular (those deviating from simple linear regression on uncorrected p-distances against inferred distances[34]) should be omitted from downstream analysis. However, this step is optional, and users should familiarize themselves with any steps that select particular genes before applying HybPhyloMaker9. With HybPhyloMaker7, all gene trees are combined into one file and the trees are optionally rooted using Newick Utilities[35]. Using HybPhyloMaker10, users can also collapse unsupported branches in gene trees by specifying the minimum support value at which a branch is kept and/or select the subset of trees that contain selected samples. Species trees are reconstructed with HybPhyloMaker8. 
There are several options: ASTRAL[36], ASTRID[37] (both coalescent summary methods), MRL[38] (supertree method using matrix representation with likelihood), and maximum likelihood implemented in FastTree and ExaML[39] (concatenation). In preparation of an ExaML run, the selected loci are concatenated, and gene partition information is provided by AMAS. Partition Finder 2[40,41] is used to find the optimal partitioning scheme.
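The saturation criterion mentioned above, a simple linear regression of uncorrected p-distances against inferred distances, can be illustrated with a short sketch. It is not the “tree_props.R” script; the function names, the toy alignment, and the hand-entered inferred (tree-based) distances are assumptions, and no cut-off values are implied.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: share of differing sites among compared sites.

    Sites where either sequence has a gap, N or ? are skipped.
    """
    skip = set("-N?")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else float("nan")

def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy ** 2) / (sxx * syy) if syy else 1.0
    return slope, intercept, r2

# Toy alignment and tree-inferred pairwise distances; both hypothetical.
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGTTT", "D": "TCGAACCTTT"}
inferred = {("A", "B"): 0.11, ("A", "C"): 0.35, ("A", "D"): 0.62,
            ("B", "C"): 0.23, ("B", "D"): 0.48, ("C", "D"): 0.24}

pairs = list(combinations(sorted(alignment), 2))
x = [inferred[p] for p in pairs]                                # inferred distances
y = [p_distance(alignment[a], alignment[b]) for a, b in pairs]  # p-distances
slope, intercept, r2 = linear_fit(x, y)
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.2f}")
```

When multiple substitutions at the same site accumulate, p-distances stop growing in step with the inferred distances, so a gene whose points fall well away from the fitted line, or whose slope and R2 are low, is a candidate for exclusion.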

Organellar reads

HybPhyloMaker also allows working with organellar reads that are often obtained in sufficient quantity as off-target reads (eg, 2%[12]; 5%[13]), ie, it is possible to work with organellar sequences even if one does not specifically target them. Such amount of organellar reads usually provides sufficient sequencing depth, especially for coding regions. For phylogenetic reconstruction based on organellar genomes, the user needs to provide sequences of the coding regions from the target group or from a closely related group. First, the organellar reads are extracted from the total read pool with HybPhyloMaker2 by mapping to an organellar “pseudoreference” (concatenated, coding organellar sequences that are separated by a string of several hundreds of Ns each; prepared using HybPhyloMaker0b). The resulting contigs are matched to the coding sequences with BLAT. The subsequent analysis follows the pipeline of enriched nuclear loci in most instances. Commands for processing organellar data are implemented in HybPhyloMaker2-10.
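The construction of a “pseudoreference”, used for both the probe sequences and the organellar coding sequences, amounts to concatenating the target sequences with runs of Ns between them. The sketch below illustrates the idea only and is not the code of HybPhyloMaker0 or HybPhyloMaker0b. The function name, the tuple-based input, and the returned coordinate table are assumptions; the default of 400 Ns follows the recommendation given above for 2 × 150 bp reads.

```python
def build_pseudoreference(sequences, spacer_length=400):
    """Concatenate target sequences into one reference, separated by runs of N.

    sequences: list of (name, sequence) tuples, e.g. parsed from a FASTA file.
    spacer_length: number of Ns between neighbouring sequences.
    Returns the pseudoreference and the 0-based (start, end) span of each
    input sequence within it, so that the parts can be cut out again later.
    """
    spacer = "N" * spacer_length
    parts, spans, offset = [], {}, 0
    for i, (name, seq) in enumerate(sequences):
        if i > 0:
            parts.append(spacer)
            offset += spacer_length
        parts.append(seq)
        spans[name] = (offset, offset + len(seq))
        offset += len(seq)
    return "".join(parts), spans

# Hypothetical coding sequences; a short spacer keeps the output readable.
targets = [("geneA", "ATGGCC"), ("geneB", "ATGTTTAA")]
reference, spans = build_pseudoreference(targets, spacer_length=5)
print(reference)
print(spans)
```

The N spacers keep read pairs from bridging two unrelated targets during mapping, which is presumably why the spacer length is tied to the read length.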

Computational implementation, performance, and pipeline comparison

HybPhyloMaker runs on major Linux distributions (Debian, Ubuntu, openSUSE, Fedora, CentOS, Scientific Linux) and on MacOS X. Automated installation of the numerous software packages that are required to run HybPhyloMaker (Table 1) is provided by the script “install_software.sh”; smaller scripts and utilities (Perl, Python, Java, and R) are provided with HybPhyloMaker. The cluster version of HybPhyloMaker was optimized on the Smithsonian Institution High Performance Cluster (SI/HPC) and the Czech National Grid Organization MetaCentrum NGI (http://metacentrum.cz/) but could easily be modified for running on any other computer cluster.
Table 1.

List of software that must be installed/must be present on the local computer/cluster before running HybPhyloMaker.

HybPhyloMaker scripts (table columns): 0, Sample preparation; 1, Raw data processing; 2, Read mapping; 3, Generate pslx; 4, Process pslx; 4b, Correct frame, translate; 5, Missing data handling; 6, Build gene trees; 7, Root gene trees; 8a, ASTRAL; 8b, ASTRID; 8c, MRL; 8e, Concatenated FastTree; 8f, ExaML; 9, Update; 10, Collapse trees and select.

| Software | Source | Install (y/n) | Used command(s) | Usage marks across scripts 0-10 |
|---|---|---|---|---|
| GNU parallel | http://www.gnu.org/software/parallel/ | y | parallel | Xx |
| Bowtie 2 | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml | y | bowtie2-build, bowtie2 | xx |
| BWA | http://bio-bwa.sourceforge.net/ | y | bwa mem | x |
| SAMtools | http://samtools.sourceforge.net/ | y | samtools | xx |
| bam2fastq | https://gsl.hudsonalpha.org/information/software/bam2fastq/ | y | bam2fastq | x |
| Trimmomatic | http://www.usadellab.org/cms/?page=trimmomatic | n | java -jar trimmomatic-0.33.jar | x |
| FastUniq | https://sourceforge.net/projects/fastuniq/ | y | fastuniq | x |
| JDK/JRE | http://www.oracle.com/technetwork/java/javase/ | y | java | xxxxx |
| OCOCO | https://github.com/karel-brinda/ococo/ | y | ococo | x |
| Perl | https://www.perl.org/ | y | perl | xxxx |
| BLAT suite | http://genome.ucsc.edu/goldenPath/help/blatSpec.html | y | blat | x |
| MAFFT | http://mafft.cbrc.jp/alignment/software/ | y | mafft | x |
| Python | https://www.python.org/ | y | python | xxx |
| Python3 | https://www.python.org/download/releases/3.0/ | y | python3 | x |
| AMAS | https://github.com/marekborowiec/AMAS/ | n | python3 amas.py | xxxxxxxx |
| trimAl | http://trimal.cgenomics.org/ | y | trimal | x |
| MstatX | https://github.com/gcollet/MstatX/ | y | mstatx | x |
| FastTree | http://www.microbesonline.org/fasttree/ | y | fasttree | xx |
| Newick Utilities | http://cegg.unige.ch/newick_utils/ | y | nw_reroot, nw_topology | xxxxxx |
| RAxML | https://sco.h-its.org/exelixis/web/software/raxml/ | y | raxmlHPC | xxx |
| R | https://www.r-project.org/ | y | R | xxx |
| ASTRAL | https://github.com/smirarab/ASTRAL/ | n | java -jar astral.4.11.1.jar | x |
| ASTRID | https://github.com/pranjalv123/ASTRID/ | n | ASTRID | x |
| p4 | http://p4.nhm.ac.uk/ | y | p4 | xx |
| mrpmatrix | https://github.com/smirarab/mrpmatrix/ | n | java -jar mrp.jar | x |
| ExaML | https://sco.h-its.org/exelixis/web/software/examl/index.html | y | examl | x |

All the software can be installed automatically using the script “install_software.sh”. For each software, the source and the specific command for calling the software are provided, and it is indicated in which HybPhyloMaker script the particular software is used (HybPhyloMaker0-10). Software that needs to be installed/must be present on the computer/cluster is marked with “y”; if marked with “n”, it is provided with HybPhyloMaker and does not need to be installed.

We tested the performance of HybPhyloMaker using Hyb-Seq data sets from 6 samples of the plant genus Oxalis, each containing 1.3 to 1.9 million 2 × 150 bp raw reads. These Hyb-Seq libraries were enriched for 4,926 exons from 1,164 loci[11]. Run time, size of produced data files, and peak of RAM usage were recorded for each HybPhyloMaker script on a computer equipped with Intel Xeon E7-4860 CPU using 4 cores at 2.27 GHz and running CentOS 7.3.1611 (Supplement Table 1). In addition, we compared the number and percentage of mapped reads using Bowtie 2, BWA, and Geneious[42] (Supplement Table 2). Finally, we processed the same samples with HybPiper and PHYLUCE (Table 2). A direct comparison of steps within each of these pipelines (Table 3), regarding, eg, contig number, is not helpful in our opinion, due to different approaches and implementation of different software with noncomparable parameter settings in steps such as assembly (reference-guided versus de novo) and identification of contigs that match to the targeted sequences (as nucleotide sequences with BLAT [HybPhyloMaker], with exonerate[49] [HybPiper], and with LASTZ[50] [PHYLUCE]).
We provide an approximate comparison of the three pipelines by recording the number of genes that were recovered (ie, with ≥25% completeness of each gene in case of HybPhyloMaker and HybPiper) and by indicating the number and percentage of putative paralogs in case of HybPiper and PHYLUCE. Filtering against missing data was not performed in PHYLUCE, thereby providing the most conservative number and percentage of recovered genes. Duplicate read removal was performed in case of HybPhyloMaker and HybPiper. In PHYLUCE, assembly of adapter- and quality-trimmed reads was performed with Velvet[51] using k-mer length k = 35. Matching of contigs to probe sequences was performed with 90% minimum sequence identity.
Table 2.

Comparison of the performance of the three pipelines PHYLUCE, HybPiper, and HybPhyloMaker when processing 6 samples from the plant genus Oxalis[11].

| Name and code | No. of raw reads | PHYLUCE: no. (%) of recovered loci; no filtering against missing data | HybPiper: no. (%) of recovered loci; no filtering against missing data | HybPiper: no. (%) of recovered loci; ≥25% data completeness | HybPiper: no. (%) of putative paralogs; ≥25% data completeness | HybPhyloMaker: no. (%) of recovered loci; ≥25% data completeness |
|---|---|---|---|---|---|---|
| Oxalis blastorrhiza J557 | 1 905 062 | 43 (3.7) | 1102 (94.7) | 1080 (92.8) | 11 (0.9) | 1160 (99.7) |
| Oxalis creaseyii J11-96 | 1 553 282 | 156 (13.4) | 1148 (98.6) | 1141 (98.0) | 14 (1.2) | 1161 (99.7) |
| Oxalis gracilis J558 | 1 306 633 | 125 (10.7) | 1147 (98.5) | 1139 (97.9) | 20 (1.7) | 1161 (99.7) |
| Oxalis helicoides J319 | 1 847 669 | 53 (4.6) | 1134 (97.4) | 1130 (97.1) | 15 (1.3) | 1161 (99.7) |
| Oxalis inconspicua J595 | 1 785 030 | 84 (7.2) | 1118 (96.0) | 1108 (95.2) | 14 (1.2) | 1163 (99.9) |
| Oxalis polyphylla J11-44 | 1 818 390 | 47 (4.0) | 994 (85.5) | 968 (83.2) | 5 (0.4) | 1161 (99.7) |

The number and percentage of genes that were recovered (ie, with ≥25% completeness of each gene in case of HybPiper and HybPhyloMaker) and the number and percentage of putative paralogs in case of HybPiper are reported. Filtering against missing data was not performed in PHYLUCE, thereby the most conservative number and percentage of recovered genes are provided. Duplicate read removal was performed in case of HybPiper and HybPhyloMaker.
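The percentages in Table 2 are the recovered loci expressed as a share of the 1,164 targeted loci. A small worked example, using the counts reported for Oxalis blastorrhiza, makes the arithmetic explicit; the function name is an assumption introduced for illustration.

```python
TOTAL_LOCI = 1164  # number of targeted loci stated in the text

def percent_recovered(n_loci, total=TOTAL_LOCI):
    """Percentage of targeted loci recovered, rounded to one decimal place."""
    return round(100 * n_loci / total, 1)

# Counts for Oxalis blastorrhiza as reported in Table 2.
for pipeline, n in [("PHYLUCE", 43), ("HybPiper", 1080), ("HybPhyloMaker", 1160)]:
    print(pipeline, n, percent_recovered(n))
```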

Table 3.

Comparison between the major steps of PHYLUCE, HybPiper, and HybPhyloMaker.

| Step | PHYLUCE | HybPiper | HybPhyloMaker |
|---|---|---|---|
| Download from Illumina BaseSpace | No | No | Yes |
| Input | Paired-end Illumina reads | Paired-end and single-end Illumina reads | Paired-end Illumina reads |
| Adapter trimming and quality filtering of reads | Yes. Illumiprocessor[43]; pairs with both mates surviving and orphaned reads are used | No. Adapter trimming and quality filtering of reads need to be performed before using HybPiper; pairs with both mates surviving are used as input for HybPiper | Yes. Trimmomatic; pairs with both mates surviving and orphaned reads are used |
| Duplicate read removal | No | Yes (Super deduper[44]) | Yes (FastUniq) |
| Assembly | De novo (Velvet; ABySS[45]; Trinity[46]) | De novo (SPAdes[47]) | Reference-guided (Bowtie 2/BWA; OCOCO/Kindel) |
| Identification of sequences that match to the targeted sequences | Done by matching contigs to the targeted sequences (as nucleotide sequences with LASTZ); after assembly | Before assembly: done by matching reads to the targeted sequences (as peptide sequences with BLASTX; as nucleotide sequences with BWA). After assembly: done by matching contigs to the targeted sequences with exonerate | Done by matching contigs to the targeted sequences (as nucleotide sequences with BLAT); after assembly |
| Filtering against paralogs | Yes. Paralogy is indicated if a targeted locus matches multiple contigs or if a contig matches multiple targeted loci (the respective loci are excluded) | Yes. Paralogy is indicated if a targeted locus matches multiple long-length contigs (the respective loci are flagged); separation of putative paralogs possible | No. Consensus calling after the reference-guided assembly is according to majority; this results in the reconstruction of the most abundant sequence, which is considered to be the ortholog |
| Particularly suitable for exonic probe sequences | No | Yes | Yes |
| Extraction of flanking intronic regions | No | Yes | No |
| Missing data calculation | Yes | No | Yes |
| Calculation of alignment and gene tree properties | No | No | Yes |
| Flexible handling of excluding accessions and loci | Yes | No | Yes |
| Gene tree reconstruction | No | No | Yes (RAxML, FastTree) |
| Concatenation | Yes (ExaBayes[a,48]; RAxML; ExaML[a]) | No | Yes (FastTree, ExaML[a]) |
| Species tree reconstruction | No | No | Yes (ASTRAL, ASTRID, MRL) |
| Organellar phylogeny | No | No | Yes (from coding sequences) |

[a] Input file preparation.


Results and Discussion

Performance and pipeline comparison

Computer performance of HybPhyloMaker is summarized in Supplement Table 1. The most time-consuming steps are read mapping and consensus calling, reconstruction of RAxML gene trees, and ExaML analysis of the concatenated and partitioned data set. The most memory-demanding step is phylogenetic tree reconstruction based on the concatenated data set (both FastTree and ExaML). The largest files are the FASTQ files generated during raw read processing and the BAM files obtained during read mapping. Geneious performed best among the three mapping programs compared: BWA and Bowtie 2 mapped 78% to 93% and 67% to 80%, respectively, of the reads that were mapped by Geneious.

HybPhyloMaker is the first data analysis pipeline for hybridization-based target enrichment data generated with exonic probe sequences that performs all relevant steps from raw reads to species and organellar trees. Two alternative, well-documented data analysis pipelines are available, PHYLUCE and HybPiper, and a detailed comparison of the steps of these two pipelines with HybPhyloMaker is provided in Table 3. Major differences between them are as follows:

(1) The assembly strategy (de novo in PHYLUCE and HybPiper or reference-guided in HybPhyloMaker): both assembly strategies allow for the assembly of both exonic and intronic regions. In HybPhyloMaker, this is due to the use of a reference sequence that is built from the concatenated exonic probe sequences, which are separated by a string of several hundred Ns each.
(2) Paralog identification: both PHYLUCE and HybPiper detect putatively paralogous loci, which are either excluded from subsequent analyses (PHYLUCE) or flagged (HybPiper). In HybPhyloMaker, an adjustable majority consensus sequence is obtained. This results in the reconstruction of the most abundant sequence, which is considered to be the ortholog (Supplement Figure 1).
(3) Suitability for exonic probe sequences: both HybPiper and HybPhyloMaker are tailored for exonic probe sequences, whereas PHYLUCE might exclude a large number of loci when one works with multiple targeted exons per gene (Table 2), because in that case multiple contigs per gene are often formed, which PHYLUCE treats as an indicator of paralogy. HybPiper filters putative paralogs less stringently (Table 2), because in case of multiple contigs per gene these contigs must exceed a certain minimum length threshold (>85% length of the targeted locus).
(4) Extraction of flanking intronic regions: only HybPiper provides a script for this; the other pipelines obtain these intronic regions during assembly but do not process them further.
(5) Missing data calculation: PHYLUCE and HybPhyloMaker offer estimation of missing data.
(6) Calculation of alignment and gene tree properties: this is only implemented in HybPhyloMaker. The alignment properties comprise number of accessions, alignment length, proportion of variable sites, proportion of parsimony informative sites, GC content, alignment entropy, and conservation distribution. Gene tree properties are as follows: average bootstrap support, average branch length, average uncorrected p-distance, clocklikeness, simple linear regression on uncorrected p-distances against inferred distances, and long-branch score.
(7) Flexible handling of excluding accessions and loci: this is possible in both PHYLUCE and HybPhyloMaker.
(8) Gene and species tree reconstruction: software for gene tree reconstruction is implemented in PHYLUCE, and software for both gene and species tree reconstruction is implemented in HybPhyloMaker. HybPhyloMaker offers per-exon and per-codon partitioning.
(9) Reconstruction of organellar phylogenies: only HybPhyloMaker offers their reconstruction, based on coding regions.

We consider PHYLUCE not well suited to exclusively exonic probe sequences due to the drastic loss of potentially orthologous loci (Table 2). HybPiper has the benefits of extraction of the flanking intronic regions, which are especially needed in the reconstruction of shallow phylogenies, and identification of putative paralogs. The identification of paralogs is mainly essential (1) if putatively paralogous loci are not excluded during probe design (in such case, the identified paralogs should be excluded from phylogenetic reconstruction) and (2) if the ancestry of an allopolyploid is of interest (in such case, paralogs can be beneficial for the inference of complex reticulate relationships[52,53]). HybPhyloMaker treats the most abundant sequence of a locus as ortholog and does not identify putatively paralogous loci, which we consider an appropriate approach except in the two cases just mentioned.

Compared with existing target enrichment data analysis pipelines, HybPhyloMaker offers the following main advantages. It implements all steps of target enrichment data analysis, from raw reads to species tree reconstruction. It provides calculation and summary of many alignment and gene tree properties that assist the user in the selection of appropriate “quality-filtered” genes for species tree reconstruction. This step is optional, and users should familiarize themselves with any steps that select particular genes. It implements several species tree reconstruction methods (ASTRAL, ASTRID, MRL) as well as concatenation (FastTree, ExaML). It allows the analysis of the coding part of organellar genomes, ie, the analysis of a large proportion of the off-target reads, especially plastid reads.
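Several of the alignment properties listed under point (6) are simple column-wise counts. The sketch below shows how the number of accessions, alignment length, proportion of variable sites, proportion of parsimony informative sites, and GC content could be computed from an alignment. It is an illustration only and does not reproduce AMAS, MstatX, or trimAl; the function name and the handling of gaps and ambiguous characters are assumptions, and alignment entropy and conservation are not covered.

```python
def alignment_properties(seqs):
    """Compute simple per-alignment statistics from equal-length sequences.

    Returns the number of accessions, alignment length, proportion of
    variable sites, proportion of parsimony-informative sites and GC content.
    Gaps, N and ? are ignored when a column is evaluated.
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    variable = informative = 0
    for i in range(length):
        counts = {}
        for s in seqs:
            if s[i] in "ACGT":
                counts[s[i]] = counts.get(s[i], 0) + 1
        if len(counts) > 1:
            variable += 1
            # informative: at least two states, each present in >= 2 sequences
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    bases = "".join(seqs)
    acgt = sum(bases.count(b) for b in "ACGT")
    gc = (bases.count("G") + bases.count("C")) / acgt if acgt else 0.0
    return {
        "accessions": len(seqs),
        "length": length,
        "prop_variable": variable / length,
        "prop_informative": informative / length,
        "gc_content": gc,
    }

# Toy alignment of four accessions (hypothetical data).
toy = ["ACGTAC", "ACGTAT", "ACCTGT", "ACCT-C"]
print(alignment_properties(toy))
```

In the toy alignment, three of the six columns are variable, and two of those are parsimony informative because each of their two states occurs in at least two accessions.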

Conclusions

HybPhyloMaker is a user-friendly pipeline that conducts the analysis of phylogenetic Hyb-Seq data sets from raw reads to species tree reconstruction. It is written in BASH and requires prior installation of several other software packages. An install script is provided for easy installation of these software packages. HybPhyloMaker runs on major Linux distributions and MacOS X. The software is open source and available at https://github.com/tomas-fer/HybPhyloMaker/.