Hybrid capture of 964 nuclear genes resolves evolutionary relationships in the mimosoid legumes and reveals the polytomous origins of a large pantropical radiation.

Erik J M Koenen, Catherine Kidner, Élvia R de Souza, Marcelo F Simon, João R Iganci, James A Nicholls, Gillian K Brown, Luciano P de Queiroz, Melissa Luckow, Gwilym P Lewis, R Toby Pennington, Colin E Hughes.

Abstract

PREMISE: Targeted enrichment methods facilitate sequencing of hundreds of nuclear loci to enhance phylogenetic resolution and elucidate why some parts of the "tree of life" are difficult (if not impossible) to resolve. The mimosoid legumes are a prominent pantropical clade of ~3300 species of woody angiosperms for which previous phylogenies have shown extensive lack of resolution, especially among the species-rich and taxonomically challenging ingoids.
METHODS: We generated transcriptomes to select low-copy nuclear genes, enrich these via hybrid capture for representative species of most mimosoid genera, and analyze the resulting data using de novo assembly and various phylogenomic tools for species tree inference. We also evaluate gene tree support and conflict for key internodes and use phylogenetic network analysis to investigate phylogenetic signal across the ingoids.
RESULTS: Our selection of 964 nuclear genes greatly improves phylogenetic resolution across the mimosoid phylogeny and shows that the ingoid clade can be resolved into several well-supported clades. However, nearly all loci show lack of phylogenetic signal for some of the deeper internodes within the ingoids.
CONCLUSIONS: Lack of resolution in the ingoid clade is most likely the result of hyperfast diversification, potentially causing a hard polytomy of six or seven lineages. The gene set for targeted sequencing presented here offers great potential to further enhance the phylogeny of mimosoids and the wider Caesalpinioideae with denser taxon sampling, to provide a framework for taxonomic reclassification, and to study the ingoid radiation.
© 2020 The Authors. American Journal of Botany published by Wiley Periodicals LLC on behalf of Botanical Society of America.

Keywords:  Caesalpinioideae; Fabaceae; Leguminosae; hard polytomy; hybrid capture; incomplete lineage sorting; ingoid clade; lack of phylogenetic signal; mimosoid clade; phylogenomics

Year:  2020        PMID: 33253423      PMCID: PMC7839790          DOI: 10.1002/ajb2.1568

Source DB:  PubMed          Journal:  Am J Bot        ISSN: 0002-9122            Impact factor:   3.844


The field of molecular plant phylogenetics has had tremendous impacts on botanical studies and taxonomic classification, macroevolution and biogeography, ever since the pioneering studies of Chase et al. (1993) based on DNA sequence data. While those early studies used just a single locus, the plastid gene rbcL, modern studies often employ hundreds to several thousands of genes to infer phylogenetic relationships (e.g., Lee et al., 2011; Wen et al., 2013; Wickett et al., 2014; Yang et al., 2015; Zeng et al., 2017). Targeted enrichment via hybrid capture is now one of the most widely used methods for phylogenomics (e.g., Mandel et al., 2014; Weitemier et al., 2014; Nicholls et al., 2015; Sass et al., 2016; Johnson et al., 2018; Couvreur et al., 2019; Ojeda et al., 2019). Several methods for selecting genes (e.g., Johnson et al., 2018; Vatanparast et al., 2018) and assembling and analyzing the captured DNA sequence data have recently been developed. A number of pipelines are available to assemble gene matrices from the captured loci (Yang and Smith, 2014, with modifications described here; Johnson et al., 2016; Moore et al., 2017). At the same time, it has become clear that many parts of the “tree of life” that are difficult to resolve are rife with conflicting gene tree histories, resulting in polytomies in species tree inference. Gene tree conflict can be caused by lack of phylogenetic signal (Salichos and Rokas, 2013; Shen et al., 2017), incomplete lineage sorting (ILS), intragenic recombination (Scornavacca and Galtier, 2017; Smith et al., 2020), hybridization and/or horizontal gene transfer, or combinations of these (Rokas et al., 2003; Salichos and Rokas, 2013; Suh et al., 2015; Copetti et al., 2017; Moore et al., 2017; Walker et al., 2018; Koenen et al., 2020a), and can be aggravated by gene tree estimation errors (Richards et al., 2018). 
Furthermore, ancient whole‐genome duplications (WGDs), and gene duplications more generally, can complicate orthology assessment and contribute to the difficulties of resolving phylogenetic relationships (Koenen et al., 2020b). Detailed analyses of phylogenetic signal and conflict across a large number of gene trees can shed light on what factors are causing lack of resolution and determine whether they should be represented, in extreme cases, as candidate hard polytomies (Suh, 2016) (i.e., episodes of nearly instantaneous speciation of three or more lineages). In the present study, we used hybrid capture to enrich a set of 964 putative low‐copy genes, with the goal of inferring a robust generic backbone phylogeny for the mimosoid legumes, which include the large ingoid clade that has been particularly recalcitrant to phylogenetic resolution. The mimosoid clade (LPWG, 2017), formerly subfamily Mimosoideae, comprises ~3300 species in ~87 genera of trees, shrubs, geoxyles, and lianas. Highly typical of the clade, though also found in other members of subfamily Caesalpinioideae, are bipinnate leaves (with few exceptions, most notably the once‐pinnate leaves of the genus Inga and the phyllodes of Acacia s.s.; note that taxonomic authorities of all mimosoid genera are included in Table 2) that show extensive quantitative variation in size and numbers of leaflets and pinnae, and usually bear extrafloral nectaries on the petiole, rachis, and/or pinnae (Marazzi et al., 2019). Furthermore, many mimosoids have some form of armature (i.e., stipular spines, spinescent shoots, or prickles). Also highly characteristic of the clade is the diversity of inflorescence types composed of many small flowers in which the often colorful stamens are the most conspicuous floral whorl, and the whole inflorescence acts as the unit of pollinator attraction. 
Pollen characteristics are diverse and, notably, pollen is aggregated into tetrads or often in larger (up to 48‐celled) polyads in many genera (Guinet, 1981). By contrast, floral morphology is relatively uniform across mimosoids, all species having radially symmetric flowers with valvate petal aestivation, showing mainly quantitative variation in sizes of organs, numbers of floral parts per whorl, and the degree of fusion within whorls.
Table 2. Higher‐ and lower‐level clades informally recognized in this study.

Higher‐level clades    No. of genera (a)    No. of species (b)
Mimosoid clade         ~87                  ~3300
Core mimosoids         ~72                  ~3220
Ingoid clade           ~43                  ~2000

(a) Numbers of genera remain tentative pending resolution of generic delimitation issues caused by generic non‐monophyly across the mimosoid clade.

(b) Numbers of species are approximate but most likely underestimated pending the description and further discovery of species new to science.

Including lineages/species pending transfer to newly described segregate genera.

Not sampled here but placement inferred from previous studies.

Note that genus is non‐monophyletic.

Based on a few conspicuous floral characters, the clade has been divided into three large tribes (Elias, 1981; Lewis et al., 2005): Mimoseae Bronn (≤10 free stamens per flower), Acacieae Benth. (usually >30 free stamens, but sometimes slightly fused at the base), and Ingeae Benth. (usually >30 stamens partly fused into a tube), which have all been shown to be non‐monophyletic (Fig. 1; Luckow et al., 2003, 2005; LPWG, 2013). The smaller tribe Parkieae (Wright & Arn.) Benth. is also non‐monophyletic, and Parkia itself is nested within Mimoseae (Luckow et al., 2003), as is the monospecific tribe Mimozygantheae Burkart (Luckow et al., 2005). With a dysfunctional tribal classification, generic affinities have increasingly been referred to informally named clades (e.g., Hughes et al., 2003) and informal generic groups (Lewis et al., 2005) or alliances (Barneby and Grimes, 1996). Generic delimitation remains frustrated by what appears to be extensive morphological homoplasy and lack of phylogenetic resolution, and many genera remain poorly defined and have been suspected or shown to be non‐monophyletic; examples include Archidendron (Brown et al., 2008), Prosopis (Catalano et al., 2008), Abarema (Iganci et al., 2016), Stryphnodendron (Simon et al., 2016), and Zygia (Ferm et al., 2019). This has been especially the case for tribe Ingeae, for which different authors have proposed starkly discordant generic systems (e.g., Nielsen, 1981; Lewis and Rico Arce, 2005; Barneby and Grimes, 1996; reviewed by Brown, 2008). In particular, the genus Albizia is poorly defined and its delimitation remains one of the most challenging taxonomic problems in the legume family. Indeed, Albizia is now considered the main “dustbin” genus, following the narrower circumscription of Pithecellobium, which was previously the dumping ground for difficult taxa (Nielsen, 1981; Barneby and Grimes, 1996; Brown, 2008).
Figure 1

Mimosoid phylogeny, classification, and diversity. (A) Majority‐rule bootstrap consensus tree from 1000 bootstrap replicates of the matK phylogeny from LPWG (2017), indicating the position of the mimosoid clade (crown node indicated by a blue triangle) within subfamily Caesalpinioideae (shaded orange) and showing that the ingoid clade (crown node indicated by a yellow circle) is the least resolved portion of the legume phylogeny. (B) Majority‐rule Bayesian consensus tree for the mimosoid clade, extracted from the matK phylogeny of LPWG (2017), highlighting the non‐monophyly of mimosoid tribes Parkieae (dark blue), Mimoseae (pink), Acacieae (yellow), and Ingeae (green). The monotypic Mimozygantheae (light blue) is nested in Mimoseae. (C) From left to right, top to bottom: pod valves of Pentaclethra macrophylla Benth., spicate inflorescence of Entada chrysostachys Drake, heteromorphic inflorescences of Dichrostachys akataensis Villiers and likewise for Parkia bahiae H.C.Hopkins, compound inflorescence of Vachellia karroo (Hayne) Banfi & Galasso, capitate inflorescence of Mimosa blanchetii Benth, spicate inflorescence of Senegalia ataxacantha (DC.) Kyal. & Boatwr., capitate inflorescence of Calliandra fuscipila Harms, dehisced fruit with seeds suspended on arillodia in Pithecellobium diversifolium Benth., flowers of Inga subnuda Salzm. Ex Benth., dimorphic inflorescence with enlarged central flowers of Hydrochorea corymbosa (Rich.) Barneby & J.W.Grimes and likewise for Albizia grandibracteata Taub., spicate inflorescences of Acacia longifolia Paxton. All photos by E. J. M. Koenen.

Most species of mimosoids occur in the tropics, with major centers of diversity in Central and South America, Australia, Africa, and Madagascar.
The ability of most mimosoids to fix atmospheric nitrogen through nodulation (Sprent, 2007) means they are important in tropical agroforestry, and their nitrogen‐ and protein‐rich leaves and fruits are often used as animal fodder and green manure, among many other human uses including for timber, ornamentals, food, and hallucinogens (Lewis et al., 2005). Mimosoids occur in virtually every lowland tropical biome or vegetation type. They are abundant and diverse in evergreen rainforests, in particular in Africa and the Americas; form some of the most prominent groups in the woody flora of tropical grasslands in Brazil, Africa, and Australia; and dominate seasonally dry tropical forests and woodlands (SDTFs sensu Pennington et al., 2000, 2009; SDTFWs sensu Queiroz et al., 2017; or the succulent biome sensu Schrire et al., 2005; Gagnon et al., 2019; Ringelberg et al., 2020) in Mexico, Central America, the Caribbean, Northeast Brazil, the Horn of Africa, and Madagascar. Because of this prominence across tropical lowland biomes, the mimosoid clade offers an excellent study system to investigate adaptation along the gradient from ever‐wet to seasonally dry and arid tropical climates, as well as the extent of phylogenetic biome conservatism vs. biome shifting, which are the focus of forthcoming studies. However, a well‐resolved species tree for comparative analyses is lacking. Lack of resolution is particularly stark in the large Ingeae + Acacieae p.p. clade (Luckow et al., 2003; Miller et al., 2003; Brown et al., 2008; Bouchenak‐Khelladi et al., 2010; LPWG, 2017), hereafter referred to as the ingoid clade (Fig. 1A, B). This clade includes some 2000 species in ~38 genera, but the relationships among these genera are uncertain because of lack of phylogenetic resolution, even though all were sampled in the most densely sampled legume phylogeny to date based on the chloroplast gene matK (Fig. 1; LPWG, 2017). 
In fact, this clade appears to represent the least resolved part of the whole legume matK phylogeny (Fig. 1A). Here, we present a complete phylogenomics project using hybrid capture, from generating transcriptome data and selecting targeted genes to assembling and analyzing the captured DNA sequence data. The targeted genes were selected using a custom pipeline, which has recently also been used to select loci for other groups (Couvreur et al., 2019; Ojeda et al., 2019) and which is potentially useful across all taxonomic groups. Using these genome‐scale data, we generated a robust generic backbone phylogeny for the mimosoid clade, which forms the foundation for expanded taxon sampling to address biogeographical and macroecological questions. Here, however, we focus especially on the large, poorly resolved ingoid clade, to try to understand why inferred relationships in this clade have been so contentious. To address this, we also quantified conflicting signals across gene trees and used phylogenetic network approaches to assess whether the evolution of the ingoid clade is treelike or polytomous.

MATERIALS AND METHODS

The workflow for selecting targeted genes from transcriptome data is presented in Figure 2. The workflow for assembly and phylogenetic analysis of the captured DNA sequence data is presented in Figure 3.
Figure 2

Target gene selection workflow, indicating the number of sequences and loci retained at each step for the RBH4 gene set (see text).

Figure 3

Workflow for phylogenetic ortholog selection and gene tree and species tree analyses.

RNAseq to generate genomic resources

With no fully sequenced genome for mimosoids available when we started this study, we generated transcriptome data for four mimosoid genera to select nuclear markers for targeted enrichment. For the species Albizia julibrissin Durazz., Entada abyssinica Steud. ex A.Rich., and Microlobius foetidus (Jacq.) M.Sousa & G.Andrade, seedlings were grown at the Botanic Garden of the University of Zurich, and RNA was extracted from young leaves and shoot tips, as well as roots (A. julibrissin), using the RNeasy Plant Mini kit (Qiagen, Venlo, The Netherlands). Libraries for sequencing were produced using the TruSeq RNA Library Prep kit (Illumina, San Diego, California, USA) and sequenced 3‐plex on an Illumina HiSeq‐2000 sequencer, at the Functional Genomics Center in Zurich. Raw data were cleaned with prinseq‐lite (Schmieder and Edwards, 2011), and transcriptomes assembled using Trinity (Grabherr et al., 2011; Haas et al., 2013) with default settings. In addition, transcriptome data for three species of Inga were generated at the Royal Botanic Garden Edinburgh using similar sequencing and assembly methods (described in Nicholls et al., 2015). Since the comparisons among transcriptomes (see below) are effectively carried out at genus level, we generated a more comprehensive transcriptome for the genus Inga by combining the separate assemblies for the three Inga species into a nonredundant set of transcripts. This was done by running BLAST searches of the largest Inga transcriptome assembly against the second‐largest assembly and adding all transcripts without a significant hit (e‐value cutoff 1e‐10) in the latter. This procedure was then repeated for the third species.
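The merging of the Inga assemblies can be illustrated with a minimal Python sketch. It assumes that BLAST results have already been parsed into (query, subject, e‐value) rows and reads the step as "add to the larger assembly every transcript of the smaller one that has no significant hit"; the function name, IDs, and sequences are hypothetical and are not taken from the authors' scripts.

```python
def merge_nonredundant(base, other, hits, evalue_cutoff=1e-10):
    """Add to `base` every transcript of `other` that lacks a significant hit.

    base, other: dicts mapping transcript ID -> sequence.
    hits: iterable of (other_id, base_id, evalue) rows, e.g. parsed from
          tabular BLAST output of `other` searched against `base`.
    """
    # IDs in `other` that already have a significant match in `base`.
    matched = {query for query, _subject, evalue in hits if evalue <= evalue_cutoff}
    merged = dict(base)
    for transcript_id, sequence in other.items():
        if transcript_id not in matched:
            merged[transcript_id] = sequence
    return merged


# Toy data: IDs and sequences are made up for illustration.
largest = {"sp1_t1": "ATGGCC", "sp1_t2": "ATGTTT"}
second = {"sp2_t1": "ATGGCA", "sp2_t2": "ATGAAA"}
hits = [("sp2_t1", "sp1_t1", 1e-50), ("sp2_t2", "sp1_t2", 1e-3)]
combined = merge_nonredundant(largest, second, hits)
print(sorted(combined))  # ['sp1_t1', 'sp1_t2', 'sp2_t2']
```

Repeating the call with the combined set as `base` and the third assembly as `other` would mirror the final step described above.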

Selecting putative single‐copy genes

From the four transcriptome data sets, putative single‐ or low‐copy nuclear genes were selected, using a procedure inspired by Wu et al. (2006) (Fig. 2). This procedure was recently used by Couvreur et al. (2019) and Ojeda et al. (2019), but because it was first designed for the mimosoid bait set, it is described in more detail here. First, for each of the four transcriptome data sets, TransDecoder (https://github.com/TransDecoder/TransDecoder) was used to predict open reading frames (ORFs) and translate those to protein sequences, using default settings. Highly similar proteins were removed to reduce redundancy (i.e., keeping only one protein sequence per gene and removing multiple alleles and isoforms) with CDHIT (Li and Godzik, 2006). This was repeated with four cutoff values (90%, 95%, 97%, and 99% identity), to avoid either clustering paralogous sequences with relatively low divergence or keeping alleles and isoforms with relatively high divergence. This means that the following steps were each repeated four times, and from each repetition only the putative orthologs that were more divergent among and less divergent within taxa were kept. In other words, for each repetition, sequences with higher identity among taxa than the cutoff values were removed. For each transcriptome, we performed a BLAST search of the CDHIT output against itself (“selfBLAST”) with an e‐value cutoff of 1e‐10, and sequences with multiple hits within the same transcriptome were removed to eliminate gene families. Next, a reciprocal best hit (RBH) algorithm was implemented in a custom Python script (available from https://github.com/erikkoenen/mimobaits/), to compare the four transcriptome data sets after removing redundancy and gene families. This is an extension of the RBH triangulation method of Wu et al. (2006), where a set of four sequences is considered a putative ortholog if all possible pairwise reciprocal BLAST searches among the four transcriptomes yield the same RBH (Fig. 2).
This works as follows: first, we take the first sequence of the transcriptome that we want to design the baits from (in our case Albizia) and run a BLAST search against one of the other transcriptomes; the best hit from the second transcriptome is then used as a query for a BLAST search against the first transcriptome, and when the original sequence that we started with is recovered as the best hit, this is considered an RBH. This is repeated for all combinations of transcriptomes by taking the sequence of the previous RBH and running a BLAST search against another transcriptome. This procedure was repeated for all sequences from the first transcriptome, and each set of four sequences that gave an RBH across all pairwise BLAST searches was then written to a separate FASTA file, one per putative ortholog. Putative orthologs in which sequence length varied by >5% were discarded as an additional quality‐control step. Using the resulting FASTA files, we also performed a phylogenetic congruence test similar to that of Wu et al. (2006). Orthologs were aligned with MAFFT using the G‐INS‐i algorithm (Katoh et al., 2005), alignments were trimmed with BMGE with default settings (Criscuolo and Gribaldo, 2010), and rapid bootstrap analyses were carried out with RAxML under the PROTCATLGF model (Stamatakis, 2014). If the resulting 95% bootstrap consensus topology was incongruent with previously known and well‐established relationships among the four taxa (Fig. 2), the putative ortholog was discarded. After running these procedures for each of the four different CDHIT cutoff values, the resulting ortholog sets were combined as the “RBH4 set.” Additionally, an “RBH3 set” was generated by comparing just the three largest transcriptomes (Albizia, Entada, and Microlobius) but omitting the phylogenetic congruence test (because a minimum of four taxa are needed to infer a phylogeny).
A third set of putative orthologs was generated by running RBH comparisons among the two largest transcriptomes (Albizia and Microlobius), to sets of genes found by De Smet et al. (2013) to be strictly or mostly single‐copy across 20 angiosperm genomes (using sequences of Arabidopsis thaliana (L.) Heynh.). This third set is split into two subsets referred to as “SSC” (strictly single‐copy) and “MSC” (mostly single‐copy), following terminology from De Smet et al. (2013).
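The four‐way reciprocal best hit criterion described above can be sketched as follows. This is an illustrative reimplementation, not the script in the mimobaits repository: it assumes that the best BLAST hit of every sequence has already been tabulated for each ordered pair of transcriptomes, and the `best` table layout, function names, and sequence IDs are all hypothetical.

```python
from itertools import combinations


def is_rbh(best, tx_a, seq_a, tx_b, seq_b):
    """True if seq_a and seq_b are each other's best hit between tx_a and tx_b."""
    return (best[(tx_a, tx_b)].get(seq_a) == seq_b
            and best[(tx_b, tx_a)].get(seq_b) == seq_a)


def rbh_quartets(best, taxa, reference):
    """Return ortholog sets in which every pair of members is a reciprocal best hit.

    best: dict mapping (query_taxon, subject_taxon) -> {query_id: best_hit_id}.
    """
    others = [t for t in taxa if t != reference]
    quartets = []
    for seq in sorted(best[(reference, others[0])]):
        members = {reference: seq}
        for taxon in others:
            hit = best[(reference, taxon)].get(seq)
            if hit is None:
                break  # no hit in this transcriptome: cannot form a full set
            members[taxon] = hit
        else:
            # Require reciprocity for all pairwise comparisons, not only
            # those that involve the reference transcriptome.
            if all(is_rbh(best, a, members[a], b, members[b])
                   for a, b in combinations(taxa, 2)):
                quartets.append(members)
    return quartets


taxa = ["Albizia", "Entada", "Inga", "Microlobius"]
# Toy best-hit tables: genes g1 and g2 start out fully consistent.
best = {(a, b): {f"{a}_g1": f"{b}_g1", f"{a}_g2": f"{b}_g2"}
        for a in taxa for b in taxa if a != b}
best[("Inga", "Entada")]["Inga_g2"] = "Entada_g9"  # breaks reciprocity for g2
print(rbh_quartets(best, taxa, reference="Albizia"))
```

In the toy data only g1 survives, because a single non‐reciprocal comparison (Inga against Entada) is enough to reject g2.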

Bait design

For bait design, the sequences of the Albizia julibrissin transcriptome were used, because the genus Albizia and allies are the focus of an ongoing project in Zurich, and this will increase successful capture for these taxa. Intron‐exon boundaries were predicted for all transcripts in the four ortholog sets (RBH4, RBH3, SSC, and MSC), by running BLAST searches against a custom database combining the Arabidopsis thaliana (Lamesch et al., 2011), Medicago truncatula Gaertn. (Young et al., 2011), and Glycine max (L.) Merr. (Schmutz et al., 2010) genomes. For the genome database, gene models including introns were used, and the coordinates to which our transcripts aligned were used to partition sequences for each predicted exon to avoid designing baits spanning intron‐exon boundaries. This step is not essential but is likely to increase the efficiency of the capture. In addition to coding sequences, we also included 120 bp of the 3′‐UTR and 240 bp of the 5′‐UTR, but sequences obtained for these regions are not analyzed further here. Furthermore, additional target genes were added that included functionally interesting genes and genes targeted for separate studies in Inga (Nicholls et al., 2015), but again, none of these genes are analyzed here, as we focus on the low‐copy loci selected for phylogenetic analysis. Final bait design was carried out by Mycroarray (now Arbor Biosciences, Ann Arbor, Michigan, USA), with 3× tiling, and RNA baits were synthesized as part of the myBaits Custom Target Capture kit.
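As a rough illustration of what 3× tiling means, the sketch below cuts one predicted exon into overlapping baits so that most bases are covered by about three baits. The bait length of 120 nt is an assumption made for the example (the text does not state it), as is the handling of short exons and exon ends; the actual design was carried out by the manufacturer and may differ.

```python
def tile_baits(exon, bait_len=120, tiling=3):
    """Cut one exon sequence into overlapping baits.

    bait_len=120 is an assumed value; the step between bait starts is
    bait_len / tiling, so interior bases are covered `tiling` times.
    """
    step = bait_len // tiling
    if len(exon) <= bait_len:
        return [exon]  # assumption: short exons yield a single bait
    last_start = len(exon) - bait_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # assumption: add a bait flush with the exon end
    return [exon[s:s + bait_len] for s in starts]


exon = "ACGT" * 60  # hypothetical 240-nt exon
baits = tile_baits(exon)
print(len(baits), {len(b) for b in baits})  # 4 {120}
```

Tiling each predicted exon separately, as here, matches the stated aim of avoiding baits that span intron‐exon boundaries.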

DNA extraction, library preparation, hybrid capture, and sequencing

We extracted DNA for 122 accessions, partly from tissue preserved in silica gel and partly from herbarium specimens, representing 75 of the ~87 currently recognized mimosoid genera and six closely related genera of non‐mimosoid Caesalpinioideae (voucher details in Appendix 1), using the DNeasy Plant Mini Kit (Qiagen). Sequencing libraries were prepared with the NEBNext Ultra DNA Library Prep kit for Illumina (New England Biolabs, Ipswich, Massachusetts, USA), in combination with the NEBNext Multiplex Oligos for Illumina (both single and dual index kits). Libraries were quantified using qPCR and pooled prior to hybrid capture. Pools consisted of 8–21 libraries based on approximate evolutionary distances to the species from which the baits were designed (thus, species of Albizia were pooled together, species of closely related genera were pooled together in another pool, and species from more distantly related genera were pooled in yet another pool, etc.). This was done to avoid more distantly related accessions being underrepresented in the postcapture pools because it is expected that DNA molecules with higher sequence similarity hybridize more efficiently. The different pools were then enriched for the targeted regions in separate reactions with the myBaits Custom Target Capture kit. Enriched pools were quantified and pooled into a single library that was sequenced on an Illumina HiSeq 2000 at the Functional Genomics Center in Zurich.

Assembly of sequence data and aligned matrices

After demultiplexing, raw reads were processed with Trimmomatic (Bolger et al., 2014) to remove adapter sequence artifacts and trim or remove low‐quality reads (using the settings MAXINFO:40:0.1 LEADING:20 TRAILING:20), and PEAR (Zhang et al., 2013) to merge overlapping read pairs (after removing adapter artifacts but before trimming). Resulting fastq files of quality‐filtered merged, paired, and unpaired reads were used in a de novo assembly for each accession using the SPAdes assembler (Bankevich et al., 2012). From the resulting scaffolds, we extracted all ORFs of ≥300 bp long between two stop codons with getorf (using the option ‐find 2) from the Emboss software suite. We reduced redundancy in the set of ORFs found for each accession with cdhit, using an identity cutoff of 0.99. For all ORFs from each accession, a BLAST search was carried out against the target sequences and for each target a multifasta file was created. Each ORF for each accession was added to the target multifasta file for which it received the best BLAST hit under an e‐value cutoff of 1e‐10, resulting in multifasta files for each target with potentially multiple sequences per accession included, which we refer to hereafter as “clusters.” Numbers of reads on target were estimated by mapping the untrimmed reads to the bait sequences with BLAT (Kent, 2002), using a minimum sequence identity threshold of 70%. Numbers of recovered loci were estimated with BLASTX, using protein sequences for the 964 targeted genes as the database and the SPAdes contigs as the query sequences, with an e‐value cutoff of 1e‐10. Using the fasta_to_tree.py script of Yang and Smith (2014), each cluster was aligned with MAFFT, sites with excessive missing data were removed (with a minimum column occupancy of 0.3), and a tree was inferred for each cluster with RAxML. 
We used other scripts of Yang and Smith (2014) to trim outlier long tips (with relative and absolute cutoffs of 0.1 and 0.3, respectively), mask monophyletic and paraphyletic clusters belonging to the same taxon, and cut deep paralogs (cutting internal branches >0.3 and keeping subtrees of ≥25 accessions). From the resulting trimmed subtrees, new multifasta files were created for a second round of tree inference, trimming, and masking. However, for the second round we used MACSE (Ranwez et al., 2018), instead of MAFFT, to obtain more accurate alignments and TreeShrink (with quantile of trees to remove set to q = 0.1; Mai and Mirarab, 2018) to trim tips, instead of relative and absolute cutoffs. Finally, after cutting deep paralogs again, we extracted all non‐overlapping subclusters with ≥25 accessions using the maximum inclusion (MI) method of Yang and Smith (2014). Besides analyzing the targeted nuclear genes, we also extracted off‐target reads with a BLAST hit against a reference set of chloroplast genomes of Inga leiocalycina Benth. (Dugas et al., 2015; GenBank accession KT428296), Leucaena trichandra Urb. (Dugas et al., 2015; GenBank accession KT428297), and Erythrophleum fordii Oliv. (Huang et al., 2018; GenBank accession MG644609), assembled chloroplast sequences for all accessions, and extracted the coding sequences gene by gene using a custom Python script with BLAST searches, confirming that sequence data for the chloroplast genome can be efficiently extracted and analyzed from off‐target reads in hybrid capture experiments as shown by Weitemier et al. (2014). The clpP gene was discarded because it shows accelerated evolution (Williams et al., 2015; Dugas et al., 2015) and yields a tree that strongly conflicts with those inferred using the other chloroplast genes. 
The accD gene has been lost from the chloroplast genome in several papilionoids, is highly variable in others (Magee et al., 2010), and is difficult to align across mimosoids, so we also removed this gene for phylogenetic analysis. The remaining 72 plastid genes were aligned with MACSE and concatenated with the pxcat program of the phyx package (Brown et al., 2017).
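The cluster‐building step for the nuclear targets, in which each ORF is added to the target that gives its best BLAST hit, can be sketched in a few lines of Python. The sketch assumes parsed BLAST rows of the form (ORF ID, target ID, e‐value, bit score) and uses the bit score to pick the best hit, which is an assumption; the function name and IDs are hypothetical.

```python
from collections import defaultdict


def assign_to_clusters(hits, evalue_cutoff=1e-10):
    """Group ORFs by the target sequence that gives their best BLAST hit.

    hits: iterable of (orf_id, target_id, evalue, bitscore) rows.
    Returns a dict mapping target_id -> list of ORF IDs (one "cluster").
    """
    best = {}
    for orf, target, evalue, bitscore in hits:
        if evalue > evalue_cutoff:
            continue  # not significant under the cutoff
        if orf not in best or bitscore > best[orf][1]:
            best[orf] = (target, bitscore)
    clusters = defaultdict(list)
    for orf, (target, _score) in best.items():
        clusters[target].append(orf)
    return dict(clusters)


# Toy rows: one accession contributes two ORFs to the same target.
hits = [
    ("acc1_orf1", "target1", 1e-50, 200.0),
    ("acc1_orf1", "target2", 1e-20, 90.0),
    ("acc1_orf2", "target1", 1e-30, 120.0),
    ("acc2_orf1", "target2", 1e-5, 30.0),
]
print(assign_to_clusters(hits))  # {'target1': ['acc1_orf1', 'acc1_orf2']}
```

A cluster may hold several sequences from one accession, which is why the tree‐based masking and paralog‐cutting steps follow.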

Phylogenetics

The MI subclusters were aligned with MACSE (Ranwez et al., 2018) to yield codon alignments, codons with >95% missing data were removed using pxclsq from the phyx package, and initial gene trees were inferred with RAxML. Using TreeShrink with a relatively high quantile cutoff (q = 0.25), we removed outlier long tips, to ensure a low error rate in the alignments. The drawback of this is that outgroup taxa and other taxa outside the “core mimosoids” (as defined in Appendix 2) also get pruned relatively frequently from these loci. Given that the mimosoid phylogeny in those parts is already well characterized from previous work (Luckow et al., 2003; Bouchenak‐Khelladi et al., 2010), this is unlikely to be problematic. For gene tree inference, codons with ambiguous or missing sites for >75% of accessions were removed from the alignments, after which sequences <300 bp and at the same time occupying <50% of the total aligned length were removed. Gene trees were inferred with RAxML under the GTRGAMMA model with 200 rapid bootstrap replicates. Using pxlstr from the phyx package, root‐to‐tip variance was estimated to discover outlier gene trees that might have originated from poor orthology inference or alignment artifacts. After inspecting a subset of gene trees, we decided to discard all those with a root‐to‐tip variance >0.01. Gene trees were used to calculate internode certainty all (ICA) values using RAxML (Kobert et al., 2016), for species tree analysis using ASTRAL‐III (Zhang et al., 2018), and for phylogenetic supernetwork analysis. ASTRAL‐III analyses were done on the best maximum likelihood (ML) gene trees, and subsets of gene trees with >25% or >50% of the accessions present to check if the analyses are sensitive to including gene trees with a lot of missing data. We also ran the polytomy test in ASTRAL‐III (Sayyari and Mirarab, 2018) to see for which nodes a polytomy null model could not be rejected. 
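The root‐to‐tip variance filter can be illustrated with a small sketch that does not depend on phyx. It represents a rooted gene tree as nested (name, branch length, children) tuples, which is a hypothetical format chosen for the example, and uses the population variance; whether pxlstr reports a population or a sample variance is not stated here, so that choice is an assumption.

```python
from statistics import pvariance


def root_to_tip_distances(node, dist=0.0):
    """Return the path length from the root to every tip.

    node: (name, branch_length, children), where children is a list of nodes.
    """
    _name, length, children = node
    dist += length
    if not children:
        return [dist]
    distances = []
    for child in children:
        distances.extend(root_to_tip_distances(child, dist))
    return distances


def keep_gene_tree(tree, max_variance=0.01):
    """Keep a gene tree only if its root-to-tip variance is within the cutoff."""
    return pvariance(root_to_tip_distances(tree)) <= max_variance


# Toy trees: the second has one tip on a very long branch.
clocklike = ("root", 0.0, [("A", 0.10, []), ("B", 0.12, [])])
outlier = ("root", 0.0, [("A", 0.05, []), ("B", 1.00, [])])
print(keep_gene_tree(clocklike), keep_gene_tree(outlier))  # True False
```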
Another way to analyze conflicting signals across gene trees is to infer a filtered Z‐closure supernetwork (Whitfield et al., 2008). For deciding which splits to consider, we used the “mintrees” parameter, which allowed us to infer multiple networks, including rarer splits or only fewer, more commonly observed, and therefore better‐supported splits. For phylogenetic supernetwork analysis, we pruned all gene trees to a selection of taxa from the ingoid clade representing its main lineages that were present in high proportions of the gene trees, yielding a total of 878 gene trees in which more than half of the selected ingoid taxa were represented (≥6 out of 11). All pruned gene trees with less than half of the selected taxa present were discarded. Phylogenetic supernetworks were constructed using Splitstree version 4 (Huson, 1998), using different cutoffs for the MinTrees setting, representing 2.5%, 5%, 7.5%, and 10% of the total number of gene trees. For phylogenetic analyses of the concatenated alignments, codons with missing data for >90% of the accessions were removed. Both nucleotide and translated peptide alignments of loci with more than half of the taxa present were concatenated with pxcat of the phyx package. Loci for which the gene tree had a root‐to‐tip variance >0.01 were discarded prior to concatenation. Concatenated alignments, including the chloroplast alignment, were analyzed with RAxML, using the GTRCAT model for DNA sequences and the PROTGAMMALG4X model for protein sequences (Le et al., 2008), running 200 rapid bootstrap replicates for each. In addition, we carried out a gene jackknifing analysis with Phylobayes (Lartillot et al., 2013) using the CATGTR model, by dividing the loci randomly over four relatively equally sized concatenated protein sequence alignments with 10 replicates, running a total of 40 analyses for 1000 cycles. 
For faster convergence, the ML estimate of the concatenated analysis in RAxML was provided as a starting tree for the chains. The first 500 cycles of each replicate were discarded as burn‐in prior to summarizing a majority‐rule consensus tree over all replicates.
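The gene jackknifing design (loci divided at random over four alignments, repeated 10 times, giving 40 analyses) can be sketched as follows. This is an illustrative assumption about how such a partition could be drawn, not the authors' script; the function name, the seed, and the round-robin dealing are hypothetical. Dealing round-robin makes subset sizes differ by at most one, which is somewhat tighter than the "relatively equally sized" subsets described in the text.

```python
import random

def jackknife_partitions(loci, n_subsets=4, n_replicates=10, seed=0):
    """Randomly split `loci` into `n_subsets` disjoint groups, `n_replicates` times."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        shuffled = list(loci)
        rng.shuffle(shuffled)
        # Deal the shuffled loci round-robin so group sizes differ by at most one.
        replicates.append([shuffled[i::n_subsets] for i in range(n_subsets)])
    return replicates

# Placeholder locus names; 510 matches the number of loci with at least
# half of the accessions present reported in the Results.
loci = [f"locus_{i:03d}" for i in range(510)]
replicates = jackknife_partitions(loci)
n_analyses = sum(len(rep) for rep in replicates)
```

Each of the 40 subsets would then be concatenated and analyzed separately, with the consensus summarized over all replicates.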

Visualizing gene tree discordance

Numbers of supporting and conflicting bipartitions for each node were extracted from gene trees with more than half the accessions present, using Phyparts (Smith et al., 2015). For this, gene trees were first rooted using pxrr from the phyx package, with a list of outgroup taxa outside the “core mimosoids” ranked by their relative divergence from Ingeae/Acacieae. Additionally, we visualized proportions of supporting and rejecting gene trees for selected clades with DiscoVista (Sayyari et al., 2018), from the same set of gene trees for which at least half the accessions are present. Clades for these visualizations were selected based on results from the ASTRAL polytomy test (described above).
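The idea of counting supporting and conflicting gene trees for a node can be sketched in a simplified form. This is not the Phyparts algorithm itself: clades are represented as plain sets of taxon names from rooted trees, support values are ignored, and the function name and toy taxa are assumptions for illustration.

```python
from collections import Counter

def classify_gene_tree(species_clade, gene_clades, gene_taxa):
    """Classify one rooted gene tree as 'concordant', 'conflicting',
    or 'uninformative' with respect to one species-tree clade."""
    gene_taxa = frozenset(gene_taxa)
    clade = frozenset(species_clade) & gene_taxa   # restrict to sampled taxa
    if len(clade) < 2 or not (gene_taxa - clade):
        return "uninformative"                     # too few taxa to test the clade
    gene_clades = [frozenset(c) for c in gene_clades]
    if clade in gene_clades:
        return "concordant"
    for c in gene_clades:
        # Partial overlap: neither clade contains the other.
        if c & clade and not c <= clade and not clade <= c:
            return "conflicting"
    return "uninformative"                         # e.g., unresolved gene tree

species_clade = {"Inga", "Jupunba"}
gene_trees = [
    ({"Inga", "Jupunba", "Albizia", "Samanea"}, [{"Inga", "Jupunba"}, {"Albizia", "Samanea"}]),
    ({"Inga", "Jupunba", "Albizia", "Samanea"}, [{"Inga", "Albizia"}, {"Jupunba", "Samanea"}]),
    ({"Inga", "Albizia", "Samanea"}, [{"Albizia", "Samanea"}]),
    ({"Inga", "Jupunba", "Albizia", "Samanea"}, []),
]
tally = Counter(classify_gene_tree(species_clade, clades, taxa)
                for taxa, clades in gene_trees)
```

The third and fourth toy trees fall in the uninformative category, one because of missing taxa and one because it is unresolved, which mirrors the distinction drawn between conflicting and uninformative gene trees.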

RESULTS

Transcriptome sequencing, gene selection, and bait design

Transcriptome sequencing statistics are in Table 1; data are available on the National Center for Biotechnology Information (NCBI) databases under BioProjects PRJEB8722 and PRJNA574148; FASTQ files with raw read data are available on the Sequence Read Archive (SRA), under accession nos. SRX6901075 (Albizia julibrissin), SRX6901076 (Entada abyssinica), ERX719658 (Inga spectabilis (Vahl) Willd.), ERX719681 (Inga umbellifera (Vahl) Steud. ex DC.), ERX719690 (Inga sapindoides Willd.), and SRX6901077 (Microlobius foetidus); assembled transcripts are available on Dryad (for the Inga spp., https://doi.org/10.5061/dryad.r9c12) and through the Transcriptome Shotgun Assembly (TSA) database, accession nos. GHWM00000000 (Albizia julibrissin), GHWN00000000 (Entada abyssinica), and GHWO00000000 (Microlobius foetidus). Results from the gene selection procedure for the RBH4 gene set are summarized in Figure 2. After running the pipeline with four different similarity cutoffs in CDHIT, we found 433 RBH4 and 334 RBH3 target genes. We recovered 320 MSC and 19 SSC genes, of which 134 and eight genes, respectively, were already included in the RBH sets. Combining all gene sets we obtained a total of 964 low‐copy nuclear genes for enrichment. The complete coding sequences from the Albizia julibrissin transcriptome for these targeted genes are in Appendix S1. The bait design included 24,856 probes at 3× tiling. Target sequences and baits are also available at https://github.com/erikkoenen/mimobaits/.
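The total of 964 target genes follows from the counts given above once the overlap between gene sets is removed. The short check below assumes that the RBH4 and RBH3 sets do not overlap with each other, which is implied by the reported total but not stated explicitly; the variable names are illustrative.

```python
# Counts reported in the text.
rbh4, rbh3 = 433, 334          # reciprocal-best-hit gene sets
msc, ssc = 320, 19             # MSC and SSC gene sets
msc_in_rbh, ssc_in_rbh = 134, 8  # MSC/SSC genes already present in the RBH sets

# Add only the MSC and SSC genes that are new relative to the RBH sets.
total_targets = rbh4 + rbh3 + (msc - msc_in_rbh) + (ssc - ssc_in_rbh)
```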
Table 1

Transcriptome sequencing and assembly.

Taxon                                    Total no. of reads   Quality filtered reads (left / right)   Trinity contigs   Predicted ORFs
Albizia julibrissin                      65,129,217           60,128,377 / 57,345,882                 153,721           104,184
Entada abyssinica                        65,006,875           59,821,838 / 56,882,422                 130,062           91,882
Microlobius foetidus                     97,515,912           89,669,576 / 85,024,338                 188,370           126,976
Inga (three species, nonredundant set)   NA a                 NA a                                    106,589           45,139

a See Nicholls et al. (2015) for sequencing results of the three Inga transcriptomes used here.

Targeted sequencing and data assembly

Sequencing and de novo assembly statistics for targeted sequencing for all accessions are presented in Appendix 1, including full species names with taxonomic authorities, and sequence reads have been submitted to the European Nucleotide Archive (study no. PRJEB38138). Accessions were enriched and sequenced in three separate batches, with different levels of multiplexing, which explains some of the variation observed in numbers of total reads and reads on target. Total reads per accession varied from 1,360,502 to 70,271,424. For the largest batch of samples, the enrichment was less efficient, with number of reads on target between 3.81% and 17.77%, while for the two smaller batches it varied between 69.00% and 85.27%. The percentage of reads on target is particularly low for taxa most distantly related to Albizia julibrissin on which the bait sequences are based. Highly divergent sequences are not expected to be captured, but even so, these percentages of reads on target may be underestimated if the targeted sequences are highly divergent (<70% sequence identity to the baits) given the mapping threshold that we employed. Despite the variable enrichment efficiency, we were able to reconstruct at least partial sequences for the large majority of loci across almost all taxa (Appendix 1), with the number of target loci that were at least partially recovered, as determined by BLASTX searches of the scaffolds, ranging from 644 to 957. After ortholog detection, a total of 1915 gene alignments were recovered (Fig. 3), representing 767 of the targeted genes. Clusters representing the remaining 197 targeted genes were discarded because orthologous subclusters contained too few accessions, which may in turn be caused by poor phylogenetic resolution. For 279 targets, only a single gene alignment was recovered (i.e., they are putatively single‐copy). 
For the remainder of the gene alignments, it is sometimes difficult to establish whether the multiple alignments represent paralogous copies, multiple exon alignments for the same gene that became separated during phylogenetic ortholog detection, or gene alignments that were split into two‐taxon sets because of long internal branches. Using BLAST searches of the longest sequence of each gene alignment against the target sequences, it became clear that many of these do indeed represent different non‐overlapping fragments (most likely exons) of the same gene. Furthermore, some of the multiple alignments for the same gene do not have any overlapping accessions, which suggests they represent orthologous sequences for two distinct groups of taxa. It is thus not straightforward to accurately determine the precise number of paralog copies among the targeted genes. The number of accessions per gene alignment ranged from 13 to 121 (Fig. 4A), and aligned length per gene alignment varied from 282 to 2526 bp (Fig. 4B). Taxon occupancy per locus shows about a fourfold difference, with generally higher occupancy for members of the ingoid clade compared to more divergent taxa (Fig. 4C). However, even the least represented accession (Acaciella villosa) is still present in 274 gene alignments, which is likely sufficient to resolve its placement in the phylogeny, at least in concatenated analyses. Numbers of distinct alignment patterns, an indication of the phylogenetic informativeness of an alignment, show an uneven distribution across gene alignments. This suggests there are relatively few highly informative genes in the data set, but also few that are relatively uninformative (Fig. 4D). However, this does not indicate whether certain genes are particularly informative for deeper nodes or for more recent ones.
Figure 4

Statistics for recovered loci. (A) Number of accessions per locus, (B) aligned length per locus, (C) taxon occupancy per locus (all vs. rttvar = all loci or only those with <0.01 root‐to‐tip variance in the gene trees; all vs. min31 vs. min62 = without or with minimum taxon cutoffs of 25% or 50%, respectively), (D) number of alignment patterns per locus, and (E) root‐to‐tip variance in the inferred gene trees, with the dashed line at 0.01 indicating the cutoff for retaining or discarding loci.


Gene and species tree inference

Gene trees for 148 of the genes had relatively high root‐to‐tip variances (>0.01; Fig. 4E). This marked branch length variation suggests they are not suitable for phylogenetic reconstruction, and inspection of these gene trees made it clear (based on our understanding of mimosoid phylogeny) that many of the inferred relationships were spurious. Apart from genuine variation in substitution rates, it is also likely that missing data (e.g., complete exons missing in some unrelated taxa) could lead to such relationships being inferred. These gene trees were discarded and not analyzed further. After excluding these loci, the remaining 1767 were aligned, giving a total aligned length of 861,525 bp, with 450,375 alignment patterns and 62.12% missing data. A second concatenated alignment for only those loci with at least half of the accessions included (510 genes/exons) has a total aligned length of 254,250 bp, or 84,750 amino acids with 176,713 or 73,179 alignment patterns, respectively, and 34.89% missing data. Jackknife alignments consist of between 127 and 129 genes with total aligned lengths of 19,949 to 22,218 amino acids. The chloroplast alignment is 60,321 bp long, contains 16,589 alignment patterns, and has 17.33% missing data. The concatenated ML and ASTRAL species tree analyses yielded highly supported and similar topologies, except for a relatively small number of internodes (Fig. 5). ML analyses of the concatenated alignment of 510 loci (Appendices S2 and S3) show higher support and almost identical topologies. The Bayesian jackknife consensus tree (Fig. 6) shows a polytomy at the base of the mimosoid clade, involving the position of Chidlowia and several polytomies within the ingoid clade, including a large one along the backbone of that clade. The chloroplast phylogeny (Appendix S4) differs in some places from the species trees inferred from nuclear gene data. 
For example, there is notable cytonuclear discordance in relation to the monophyly of Senegalia (see below). The chloroplast phylogeny is less robustly supported than the nuclear species tree, particularly within the ingoid clade. A tanglegram comparing the chloroplast phylogeny with the ASTRAL species tree (Appendix S5) shows only minor differences outside the ingoid clade, but rather different relationships across the base of the ingoid clade, as expected due to low support in that portion of the tree (see below). Generally, apart from the well‐supported discordance related to Senegalia, the differences between the chloroplast and ASTRAL phylogenies do not exceed what would be expected given the observed gene tree incongruence among nuclear genes, and they coincide mostly with poorly supported nodes. Alignments and trees are included in the supplementary appendices and are available on TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S26316).
Figure 5

Generic backbone phylogeny of mimosoid legumes. Comparison between the concatenated ML and ASTRAL species trees, with gray shading indicating topological differences. (A) RAxML tree inferred from the full concatenated alignment (1767 loci) with bootstrap support indicated for internodes that received <100%, and branch lengths in number of substitutions per site. (B) ASTRAL species tree inferred from 1229 loci with more than a quarter of the accessions present, with branch lengths in coalescent units. Local posterior probability is indicated for internodes that received <1.00 pp; circles on nodes indicate those nodes for which a polytomy could not be rejected. Terminal branch lengths in the ASTRAL tree are set at 1 (instead of 0) for better visualization.

Figure 6

Robustly supported clades in the mimosoid phylogeny. Clades are annotated on the Bayesian jackknife majority‐rule consensus tree, with posterior probability values for internodes with <1.00 pp indicated. Colored taxon names indicate non‐monophyly of all but one of the alliances recognized by Barneby and Grimes (1996), as per the legend. Terminal names in black were not included in any alliance by Barneby and Grimes (1996) because they did not include genera outside tribe Ingeae and did not comprehensively treat the genera of Ingeae that do not occur in the Americas.


Characterization of well‐supported clades

The species trees provide a robust framework for recognizing two higher‐level and 15 lower‐level clades within the mimosoid clade (sensu LPWG, 2017; i.e., former subfamily Mimosoideae plus Chidlowia, which is here confirmed as a member of the mimosoid clade, as suggested by Manzanillo and Bruneau, 2012; see Fig. 6 and Table 2) that receive high support in (almost) all analyses, and that are mostly also well supported across gene trees (Fig. 7A). These clades serve as an informal classification for communicating about, and navigating across, the mimosoid phylogeny. Following the long tradition of using informal group or clade names in legume systematics (Polhill and Raven, 1981; Lewis et al., 2005; LPWG, 2013), lower‐level clades are named after a characteristic genus within each clade and provide monophyletic groupings of genera to replace previously defined informal groups or alliances (Barneby and Grimes, 1996; Lewis et al., 2005), almost all of which are now shown to be non‐monophyletic. However, not all genera are included in a named clade, because of the imbalanced topology, which includes a few paraphyletic grades. These include a grade of Prosopis africana and the genera Plathymenia, Fillaeopsis, Newtonia, Cylicodiscus, and Prosopis laevigata; the senegalioid grade that includes Mariosousa, Senegalia and its recent segregates Pseudosenegalia and Parasenegalia (Miller et al., 2017; Seigler et al., 2017; neither of which is sampled here); as well as several genera in isolated positions with deep‐branching stem lineages (e.g., Cedrelinga, Chidlowia, and Lachesiodendron). The named clades and unplaced genera are listed in Table 2, and clade definitions are included in Appendix 2 with notes on their notable characteristics.
Figure 7

Evaluation of gene tree support for selected nodes. (A) Bar graphs of supporting and rejecting gene trees for the two higher‐level and 15 lower‐level clades identified in this study and for alternative topologies involving (B) placement of Chidlowia, (C) placement of Prosopis, (D) monophyly or paraphyly of the Piptadenia group, (E) placement of Cedrelinga, (F) placement of Pseudosamanea, (G) affinities of the Samanea clade, and (H) all possible sister pairs, clade triplets, and quartets within the polytomous portion of the ingoid clade after pruning Cedrelinga and the Samanea clade from the gene trees. Note that for panels B–H, the bars for each graph are sorted from most to least supported. Abbreviations: mims = mimosoids, Stryphno = Stryphnodendron clade, Ced = Cedrelinga cateniformis, Pseu = Pseudosamanea guachapele, Arc = Archidendron clade, Jup = Jupunba clade, Ing = Inga clade, Sa = Samanea clade, Alb = Albizia clade, PseuSa = Pseudosamanea guachapele + Samanea clade.

Table 2

Higher‐ and lower‐level clades informally recognized in this study.

Table notes: Numbers of genera remain tentative pending resolution of generic delimitation issues caused by generic non‐monophyly across the mimosoid clade. Numbers of species are approximate but most likely underestimated pending the description and further discovery of species new to science. Including lineages/species pending transfer to newly described segregate genera. Not sampled here but placement inferred from previous studies. Note that genus is non‐monophyletic.

Evaluation of support for inferred relationships

The ASTRAL topology differs in only five places from the ML topology (Fig. 5): (1) Prosopis laevigata is sister to the Dichrostachys clade with 0.44 pp, instead of to the rest of the core mimosoids with 80% BS; (2) Stryphnodendron pulcherrimum and Pseudopiptadenia contorta have swapped positions, with 0.02 pp in the ASTRAL tree, while the alternative relationship in the ML tree has full support; (3) Cedrelinga cateniformis is sister to a large clade composed of several subclades of the ingoid clade with 0.38 pp, instead of being sister to the Jupunba clade with 60% BS; (4) Abarema cochliacarpos and Leucochloron limae are not sister taxa, with full support, while they are in the ML tree with 87% BS; and (5) Albizia atakataka is in a different position in the two trees, with 49% BS vs. 0.36 pp. Support along the backbone of the phylogeny, with the exception of the ingoid clade, is generally high in concatenated analyses (Figs. 5 and 6) but is known to be overestimated in large data sets (Salichos and Rokas, 2013). Taking into account conflicting signals across gene trees, levels of support are less robust, with many internodes receiving relatively low ICA support (<0.5; Appendix S12) suggesting significant conflict at those nodes. In a few cases, ICA values below zero indicate that the most common conflicting bipartitions are more prevalent than the supporting ones (Fig. 7). Comparing proportions of gene tree bipartitions supporting an internode, in relation to the most common conflicting bipartitions, all other conflicting bipartitions, and uninformative gene trees (including those with missing data; e.g., most strikingly for the Calliandra clade, due to poor representation of Acaciella villosa across gene alignments), it is clear that the majority of gene trees are either uninformative or contain an infrequent conflicting bipartition (Fig. 7 and Appendix S13). 
This strongly suggests that the majority of the gene trees lack phylogenetic signal, especially across the ingoid backbone. The ASTRAL polytomy test showed that for several nodes, the null model of a polytomy could not be rejected (given a p‐value threshold of 0.05; Fig. 5B and Appendix S14). We quantified gene tree conflict in more detail for three questionable deeper nodes along the backbone of the phylogeny (Fig. 7B–D): (1) placement of Chidlowia, (2) placement of Prosopis laevigata, and (3) mono/paraphyly of the Piptadenia group sensu Lewis et al. (2005) but excluding Parkia, Anadenanthera, and the recently segregated Lachesiodendron (Ribeiro et al., 2018). The same was done for the backbone of the ingoid clade (Fig. 7E–H), where polytomies could not be ruled out for several nodes. This shows that the placement of Chidlowia as sister to all other mimosoids excluding the Xylia clade (as in Fig. 5) is preferred slightly over the two alternative hypotheses (Fig. 7B). For Prosopis laevigata, a sister‐group relationship with the rest of the core mimosoids (i.e., being sister to the Dichrostachys clade + the remaining core mimosoids except Cylicodiscus, as in Figs. 5A and 6) is equally or slightly better supported across gene trees than the two alternatives (Fig. 7C). For the Piptadenia group, paraphyly is slightly more often supported across gene trees than monophyly, with the Mimosa clade as the most likely sister group of the ingoid clade (Figs. 5, 6, and 7D). Within the ingoid clade, there is a notable lack of resolution especially in the clade that includes Cedrelinga and Pseudosamanea plus the Archidendron, Jupunba, Inga, Samanea, and Albizia clades. The phylogenetic placement of the monotypic Cedrelinga appears to be unstable, with hardly any gene tree support for any of its possible placements (Fig. 7E). 
There are some weakly supporting gene trees showing a sister‐group relationship of Cedrelinga with Pseudosamanea, but that taxon is more likely related to Chloroleucon and Samanea (Fig. 7F), and one of the other three possible placements (Fig. 7E) is probably more likely. There are no gene trees strongly supporting Cedrelinga as sister to the rest, and for the other two options there is just one gene tree strongly in support of each. A sister‐group relationship between the Samanea (including Pseudosamanea) and Albizia clades has minimal support across gene trees, even though it is found in the ML and ASTRAL species tree analyses (Fig. 5) and remains the most likely possibility in relation to alternatives (Fig. 7G). These results suggest that Cedrelinga and Pseudosamanea, and perhaps also the (other) two genera of the Samanea clade, are potentially causing lack of resolution in the ingoid clade, acting as “rogue taxa,” for example due to lack of phylogenetic signal for the placement of these taxa or long branch attraction (LBA) artifacts, particularly for Cedrelinga. Another possibility is that ancient hybridization or (allo)polyploidization has occurred, giving rise to (some of) these rogue lineages. ML analyses on the concatenated alignment of 510 genes omitting these taxa do indeed increase support along the ingoid backbone (compare Appendix S3 with the corresponding supplementary appendices for the analyses omitting these taxa). To investigate this further, we evaluated support for all possible groupings of the Archidendron, Jupunba, Inga, Samanea, and Albizia clades as sister clades, triplets, and quartets across gene trees with Cedrelinga and Pseudosamanea removed. This shows that the sister‐group relationship of the Albizia and Samanea clades is more likely than any other conflicting relationship (Fig. 7H) and that the Jupunba and Inga clades are likely to be sister clades. 
No well‐supported triplets are found, while the quartet that unites the Jupunba, Inga, Samanea, and Albizia clades is better supported than all other possible quartets (Fig. 7H). Taken together, this would suggest a branching order of (Archidendron((Jupunba,Inga),(Samanea,Albizia))) for these clades. However, none of the possible relationships among these clades, nor the placements of Cedrelinga and Pseudosamanea, appear in many gene trees with strong support, and it is striking that there are many more strongly conflicting gene trees for most of these (Fig. 7H).
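The behavior of the ICA values discussed in this section, including the negative values, can be illustrated with a simplified sketch of the internode certainty all calculation. This is an assumption‐laden illustration rather than the RAxML implementation: it normalizes frequencies over the bipartitions passed in, applies no minimum‐frequency threshold, does not handle partial gene trees, and uses a hypothetical function name. The sign convention (negative when a conflicting bipartition is more frequent than the supporting one) follows the description in the text.

```python
import math

def internode_certainty_all(support, conflicts):
    """Simplified ICA for one internode.

    support   -- number of gene trees containing the reference bipartition
    conflicts -- counts of gene trees for each conflicting bipartition
    """
    counts = [c for c in [support, *conflicts] if c > 0]
    if not counts:
        raise ValueError("no informative gene trees for this internode")
    n = len(counts)
    total = sum(counts)
    if n == 1:
        ica = 1.0                      # a single bipartition: no conflict
    else:
        # 1 minus the entropy of the bipartition frequencies, in base n.
        ica = 1.0 + sum((c / total) * math.log(c / total, n) for c in counts)
    # Negative sign when the reference bipartition is not the most frequent.
    if conflicts and support < max(conflicts):
        ica = -ica
    return ica

no_conflict = internode_certainty_all(100, [])
equal_conflict = internode_certainty_all(50, [50])
outnumbered = internode_certainty_all(10, [90])
```

Under these assumptions, an internode with no conflict scores 1, two equally frequent alternatives score 0, and a reference bipartition that is outnumbered by a conflicting one scores below zero.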

Phylogenetic supernetwork analysis

At the lowest mintrees setting (n = 22, ~2.5% of the total number of trees; Fig. 8A, B), there appears to be little signal. Increasing to n = 44 or n = 66 (Fig. 8C–F), the network becomes somewhat more treelike and shows more or less the same relationships among clade representatives as the gene tree support summarization (Fig. 7E–H). However, increasing mintrees to n = 88 causes that resolution to collapse (Fig. 8G), showing that just limited phylogenetic signal hints at a resolved topology. In other words, taking into account more of the uncommon splits across gene trees (Fig. 8A, B), discordance is too high to reveal phylogenetic signal, while a stricter approach taking into account only splits that are more commonly found across gene trees (Fig. 8G) results in no phylogenetic signal being observed.
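The effect of the mintrees setting can be sketched as a simple frequency filter on splits. The sketch covers only the filtering step, not the Z‐closure supernetwork construction in SplitsTree. The split encoding (one side of each split as a frozenset of taxon names), the function names, and the assumption that a split is kept when it occurs in at least mintrees trees are all illustrative.

```python
from collections import Counter

def mintrees_cutoffs(n_trees, fractions=(0.025, 0.05, 0.075, 0.10)):
    """Convert fractions of the gene tree set into integer mintrees values."""
    return [round(n_trees * f) for f in fractions]

def filter_splits(trees_as_splits, mintrees):
    """Keep splits that occur in at least `mintrees` gene trees."""
    counts = Counter(split for splits in trees_as_splits for split in set(splits))
    return {split for split, n in counts.items() if n >= mintrees}

# 878 pruned gene trees, as reported in the Methods.
cutoffs = mintrees_cutoffs(878)

# Toy example with three gene trees, each contributing one split.
toy_trees = [
    [frozenset({"Inga", "Jupunba"})],
    [frozenset({"Inga", "Jupunba"})],
    [frozenset({"Inga", "Albizia"})],
]
common_splits = filter_splits(toy_trees, mintrees=2)
```

Raising the cutoff removes rarer splits first, which is why a weakly supported treelike signal can disappear at the strictest setting.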
Figure 8

Phylogenetic Z‐closure filtered supernetworks for a selection of taxa representing the main lineages of the ingoid clade with mintrees parameter set at 22, drawn (A) with and (B) without the Convex Hull algorithm, and the same for (C, D) mintrees setting at 44, (E, F) mintrees setting at 66, and (G) the mintrees setting at 88 (the networks with and without the Convex Hull method are identical, indicating that not many splits are included under this parameter setting).


DISCUSSION

We found that targeted enrichment via hybrid capture is a powerful and efficient way to reconstruct the phylogeny of a challenging taxonomic group, in line with findings across a rapidly growing number of other groups (e.g., Mandel et al., 2014; Weitemier et al., 2014; Nicholls et al., 2015; Sass et al., 2016; Johnson et al., 2018; Couvreur et al., 2019; Ojeda et al., 2019). The phylogenetic resolution and statistical support obtained here offer a significant improvement over previous mimosoid phylogenies (Luckow et al., 2003; Bouchenak‐Khelladi et al., 2010; LPWG, 2017), yielding a robust generic backbone tree for the mimosoid clade (Figs. 5 and 6). Nevertheless, relationships among well‐supported clades within the ingoid clade appear to be impossible to resolve with our data set (Fig. 7E–H), which is surprising given the large number of genes deployed here and the general robustness of the mimosoid phylogeny that was recovered using these genes. Therefore, this lack of resolution is probably not caused by insufficient data, but is instead most likely the result of extremely rapid speciation leading to a lack of phylogenetic signal as implied by lack of resolution across nearly all gene trees (Fig. 7H). While evaluation of supporting gene trees and the filtered supernetworks (Fig. 8) suggests some clade relationships as more likely than others, this may simply be an exercise in extracting the least conflicting signal from a data set where there is virtually no signal to begin with. In any case, there appear to be many conflicting bipartitions among the set of gene trees (Fig. 7E–H), and hardly any that strongly support any of the possible relationships among the ingoid subclades.
Gene tree conflict is often attributed to ILS, as found in the initial radiation of the Neoaves clade of birds (Suh et al., 2015), which provides one of the most convincing examples of a hard polytomy documented so far (Suh, 2016). Suh et al. (2015) used retroposon insertion sites that are virtually free from homoplasy as strong evidence for ILS. While such evidence is lacking here, part of the ingoid backbone appears similarly unresolvable based on 964 nuclear genes. In other cases, such as mammals, ILS has been shown to be only a minor cause of gene tree conflict (Scornavacca and Galtier, 2017), suggesting that such conflict could equally be caused by gene tree estimation errors due to lack of phylogenetic signal, homoplasy, alignment errors, and/or poor model fit (Richards et al., 2018). Across the ingoid clade, the majority of conflicting gene tree bipartitions appear to be rare and most of them are only weakly conflicting (Fig. 7E–H). This suggests that most of the conflicting bipartitions stem from lack of phylogenetic signal, with gene tree estimation errors accounting in part for the strongly conflicting bipartitions (Richards et al., 2018). Other reasons for poor gene tree estimation include alignment errors, homoplasy, poor model fit, and LBA artifacts. We have attempted to minimize alignment errors by using MACSE (Ranwez et al., 2018), which simultaneously aligns coding sequences and the amino acid translations, yielding considerably better alignments than MAFFT and making the additional computational time worthwhile. The interrelated issues of homoplasy, poor model fit, and LBA artifacts are less easily tackled and could be the main sources of gene tree estimation errors in our data set. In that case, this conflict would constitute phylogenetic noise rather than genuine conflicting signal, and such noise is present across much of the tree (Appendix S13). However, even though conflicting bipartitions far outnumber the most prevalent bipartition for many nodes across the tree, the second most prevalent bipartition (green part of pie charts in Appendix S13) is not close to equally prevalent in parts of the species tree where resolution and support are consistently high. 
Within the ingoid radiation there is simply not enough signal to override this noise. Apart from gene tree estimation errors and ILS, ancient hybridization during the radiation of the ingoid clade could offer an alternative explanation for the large number of strongly conflicting gene tree topologies. The strong conflicting gene tree support for the placement of Pseudosamanea, in particular, could be indicative of hybridization, although it is also possible that LBA artifacts could be causing an apparent sister‐group relationship with Cedrelinga in some gene trees. For the Neoaves clade of birds, lack of treelike structure in phylogenetic supernetworks is similar to that found in networks generated from simulated random topologies, suggesting that this clade is indeed best considered a hard polytomy (Suh, 2016). Together with the lack of gene tree support (Fig. 7E–H), our supernetworks (Fig. 8) also suggest that the ingoid radiation perhaps constitutes a hard polytomy. With intermediate mintrees parameter settings (Fig. 8C–F) the networks show some structure. However, given that this resolution collapses at the higher mintrees setting (Fig. 8G), this is likely driven by a very small number of gene trees, while conflicting gene trees largely outnumber the few supporting ones (Fig. 7E–H), in line with the idea that many contentious relationships are supported by just a handful of genes (Shen et al., 2017). Our network at the lowest mintrees setting is similar to that of a simulated hard polytomy (cf. Fig. 8A, B, with Suh, 2016: fig. 4E). We therefore conclude, pending enhanced taxon sampling and eventually completely sequenced genomes, that there is potentially a hard polytomy embedded in the backbone of the ingoid clade, from which derives a large pantropical radiation that includes an estimated 1750 extant species. This putative hard polytomy involves six or seven lineages and is resistant to resolution, even using sequences from hundreds of nuclear genes. 
With complete exomes and positional homology data, it will be possible to investigate the sorting of different unlinked exons, genes, or other genomic elements (e.g., retroelement insertions; Doronina et al., 2015; Suh et al., 2015; Suh, 2016; Springer et al., 2020) across lineages within the ingoid clade in greater detail to shed light on the underlying treelike structure of the phylogeny, or the lack thereof, ultimately confirming or rejecting the hard polytomy suggested here. We note that our analyses employ only protein‐coding sequence data, but noncoding data flanking the targeted exons (UTRs, introns) were also captured to some extent, and may be more variable and phylogenetically informative. We chose to use only the coding data because we consider them superior to noncoding data, being less saturated with multiple substitutions and more reliably alignable, especially with an alignment program that takes protein translations into account (Ranwez et al., 2018). The resulting alignments can also be translated and analyzed with models of protein evolution such as LG4X (Le et al., 2008) and CAT (Lartillot and Philippe, 2004), which are more realistic and less prone to LBA artifacts than DNA nucleotide substitution models (Lartillot et al., 2007; Philippe et al., 2011). It may nonetheless be interesting to explore noncoding regions from our data set in the future. Furthermore, it is possible that fragmentation of exons from the same gene could have contributed to lack of resolution across gene trees. While it is well suited for distinguishing orthologs from paralogs, fragmentation of exons is a limitation of the modified Yang and Smith (2014) ortholog selection pipeline used here. Other available pipelines can potentially deal with this issue and hence improve individual gene trees and thereby allow more accurate evaluation of alternative topologies. However, these pipelines have other limitations. 
For example, the Hybpiper pipeline (Johnson et al., 2016) could potentially reconstruct longer gene sequences than the pipeline used here, but does not automatically sort different paralogs into separate gene alignments. Similarly, Moore et al.’s (2017) method to classify exons by their respective paralog gene copies offers a promising approach, but relies on having initial backbone gene family trees for all loci. Furthermore, recombination may also take place in between different exons of the same gene, suggesting that exons could be better evolutionary units for phylogenetic analysis than full gene sequences (Scornavacca and Galtier, 2017), thereby potentially mitigating this limitation of the Yang and Smith (2014) pipeline. It has been established that multiple WGD events occurred during the early evolution of the legume family (Cannon et al., 2015; Koenen et al., 2020b), including one that affected subfamily Caesalpinioideae, in which the mimosoid clade is embedded. This WGD has most likely contributed to the number of genes for which multiple copies were found in our study, raising the possibility that paralogy could have contributed to conflicting topologies among gene trees. However, given that paleopolyploidy most likely occurred before the crown group divergence of Caesalpinioideae (Koenen et al., 2020b), and the fact that the mimosoid crown group diverged substantially later than that, paralog copies derived from this WGD are expected to have sufficiently divergent sequences across mimosoids for correct separation of paralogous sequences into separate alignments using the robust orthology assessment pipeline of Yang and Smith (2014) for most of these genes. 
Paleopolyploidy is also known to have occurred within the mimosoid clade (e.g., Leucaena, Govindarajulu et al., 2011; Mimosa, Dahmer et al., 2011), but chromosome count data suggest that these events were restricted to a few genera (Santos et al., 2012) and that polyploidy is most likely rare in the ingoid clade. Furthermore, this also suggests that a WGD event shared by a larger clade within mimosoids can probably be ruled out. Because the Yang and Smith (2014) pipeline also efficiently removes paralogous copies found within single accessions to which a WGD is restricted, the relationships inferred here are also unlikely to be affected by WGD events within the mimosoid clade.

Conceptually, a hard polytomy may seem problematic, because it may appear unlikely that multiple populations would become instantaneously and simultaneously isolated from each other. However, several processes could explain a hard polytomy. First, the scenario of a paraphyletic “mother” species (Naciri and Linder, 2015) would most likely mean that population‐level processes would militate against inferring a branching order. Additionally, or alternatively, when the spread of “daughter” populations outpaces the rate at which genetic mutations become fixed within these populations, the result would also be a lack of phylogenetic signal across nearly all loci. These scenarios are especially likely when a species rapidly expands its range, followed by isolation and differentiation of subpopulations. Given that several extant mimosoid species are widespread and among the world’s most notorious invasive plants, this hypothesis could provide a possible explanation for the ingoid polytomy.
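The expectation behind a hard polytomy can be made concrete with quartet frequencies. If an internal branch has zero length, the three possible resolutions of a quartet around that branch are expected to occur in gene trees at equal frequency (one third each). As we understand it, this is the kind of null hypothesis evaluated by the ASTRAL polytomy test reported in Appendix S14. The following is a minimal sketch using a plain chi-square goodness-of-fit test; it is not the ASTRAL implementation, and the function name and counts are hypothetical.

```python
import math


def polytomy_chi2(q1, q2, q3):
    """Chi-square goodness-of-fit test of three quartet counts against 1/3 each.

    Returns (statistic, p_value). With 2 degrees of freedom the chi-square
    survival function has the closed form exp(-x / 2).
    """
    total = q1 + q2 + q3
    if total == 0:
        raise ValueError("no informative gene trees for this branch")
    expected = total / 3.0
    stat = sum((obs - expected) ** 2 / expected for obs in (q1, q2, q3))
    return stat, math.exp(-stat / 2.0)


# Hypothetical counts of gene trees supporting each quartet resolution,
# not values from this study.
near_equal = polytomy_chi2(170, 168, 172)  # close to 1/3 each
skewed = polytomy_chi2(300, 105, 105)      # one resolution dominates

print("near-equal counts: chi2 = %.3f, p = %.3f" % near_equal)
print("skewed counts:     chi2 = %.1f, p = %.2e" % skewed)
```

With near-equal counts the null hypothesis of a polytomy is not rejected, whereas a strongly dominant resolution rejects it. A nonsignificant result does not by itself show that a polytomy is hard, because data with too little signal give the same outcome.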

Implications for the taxonomic classification of mimosoids

In this study, we advance our understanding of the evolutionary relationships among mimosoid legumes, in particular for the ingoid clade, moving forward from a soft polytomy that included almost all of the ~43 ingoid genera, to identify a potentially hard polytomy that involves six or seven highly supported monophyletic lineages. These lineages provide a robust framework for recognizing a set of informally named clades, replacing the previously defined informal groups and alliances, most of which are now shown to be non‐monophyletic (Fig. 6). This framework provides the first step toward a new tribal (Linnean) and clade‐based (Phylocode) classification of mimosoids and the wider Caesalpinioideae. Achieving this will require expanded taxon sampling of all potentially non‐monophyletic and missing genera within mimosoids, as well as wider sampling of genera across subfamily Caesalpinioideae as a whole, sampling that is currently being undertaken using the gene set employed here (J. J. Ringelberg, E. J. M. Koenen, et al., unpublished data). However, it is already clear that establishing a Linnean classification of tribes within mimosoids would require recognition of a large number of monogeneric tribes because of the strong imbalance across the generic backbone phylogeny (Figs. 5 and 6), which does not serve the purpose of hierarchical rank‐based classification. Therefore, recognition of the mimosoid clade as a single tribe, Mimoseae (the oldest tribal name; see Polhill and Raven, 1981), within Caesalpinioideae is more fit‐for‐purpose, complemented with a Phylocode classification to formally name and describe clades within mimosoids along the lines informally outlined here (Appendix 2), once they are better characterized with denser taxon sampling. 
The absence of a fully bifurcating topology for the ingoid clade could have led to incomplete sorting of morphological characters across the clade and the consequent difficulties associated with delimiting genera, the discordant generic systems of different authors (reviewed by Brown, 2008), and the non‐monophyly of previous generic groupings (e.g., Barneby and Grimes, 1996), which were entirely morphologically based. For example, lomentaceous fruits that break up into one‐seeded articles occur in at least six different lineages scattered across the Albizia, Inga, and Jupunba clades plus Cedrelinga cateniformis (Barneby and Grimes, 1996, 1997; E. J. M. Koenen, personal observation). Dimorphic capitate inflorescences with an enlarged central nectar‐producing flower are similarly phylogenetically scattered across genera in the Albizia, Jupunba, and Samanea clades and in Blanchetiodendron blanchetii and Calliandra (Barneby and Grimes, 1996, 1997; Barneby, 1998; E. J. M. Koenen, personal observation). While reconstructing the evolution of pollination and seed dispersal syndromes across the ingoid clade would undoubtedly be illuminating in this regard, it remains unclear to what extent this will be possible in the face of lack of phylogenetic resolution. At the generic level within the mimosoids, it has been clear for some time that despite significant progress, further generic re‐delimitation is needed to account for the non‐monophyly of several genera (Luckow et al., 2003; Brown et al., 2008; Iganci et al., 2016; Ferm et al., 2019; É. R. de Souza et al., unpublished data). Our results add to this tally of non‐monophyletic mimosoid genera. For example, while the non‐monophyly of Albizia has long been suspected, we demonstrate robust support for two separate main evolutionary lineages currently ascribed to the genus: Albizia s.s., which includes species from Africa, Madagascar, and Asia; and the Neotropical Albizia sect. Arthrosamanea. 
Nielsen (1992:143) considered Cathormion to be a monotypic genus restricted to Asia, preferring to assign the African and American species to Albizia. Barneby and Grimes (1996) subsequently referred the American species of Cathormion to Albizia, Chloroleucon, and Hydrochorea. Lewis et al. (2005) followed Nielsen (1992) and Barneby and Grimes (1996), but the inclusion of the African species of Cathormion in Albizia has not been universally accepted, with some of these being referred to Samanea (e.g., Hawthorne and Jongkind, 2006). We show here that Cathormion should be considered a synonym of Albizia, as its type species C. umbellatum is nested within that genus (Figs. 5 and 6), and hence we make the new combination Albizia umbellata (Vahl) E.J.M. Koenen comb. nov. (see Appendix 2). However, the African species previously referred to Cathormion are not included in Albizia: A. altissima (syn. Cathormion altissimum) and A. dinklagei (syn. Samanea dinklagei and Cathormion dinklagei) are here resolved as sister taxa within the Inga clade and will need to be ascribed to a new genus. Furthermore, Albizia obliquifoliolata (syn. Cathormion obliquifoliolatum) appears to be most closely related to the Neotropical genus Hydrochorea in the Jupunba clade. Our results also show that Balizia is non‐monophyletic with respect to A. obliquifoliolata and Hydrochorea, providing further evidence that the genera of the Abarema alliance of Barneby and Grimes need to be re‐delimited (Iganci et al., 2016). Finally, the non‐monophyly of Senegalia (beyond the recent segregation of Parasenegalia and Pseudosenegalia; Miller et al., 2017; Seigler et al., 2017) identified in all nuclear data analyses here (Figs. 5 and 6; with 100% BS or 1.00 pp) is unexpected, given that Boatwright et al. (2015) showed Malagasy species of Senegalia grouping with the rest of the genus based on three chloroplast regions. 
Notably, in our chloroplast phylogeny, the two species of Senegalia form a sister pair (Appendix S4; 100% BS), suggesting that the evolutionary history of chloroplast genomes of Senegalia conflicts with the nuclear‐based species tree due to ILS or introgression (e.g., chloroplast capture or hybridization). This probable non‐monophyly of Senegalia, with the two species sampled potentially representing the two main clades recovered for Senegalia (Kyalangalilwa et al., 2013; Boatwright et al., 2015; Terra et al., 2017), would imply that yet another segregate genus of Acacia s.l. may need to be erected to accommodate a large subset of species currently placed in Senegalia.

Outlook for Caesalpinioideae phylogenomics

In this study, we developed a gene set for targeted enrichment via hybrid capture in the mimosoid clade. The resulting bait design (available at https://github.com/erikkoenen/mimobaits/) can be used for phylogenomic studies across mimosoids and beyond. Further work in our lab has shown the utility of this gene set across the whole of Caesalpinioideae (J. J. Ringelberg and E. J. M. Koenen et al., unpublished data) and at species level within the genus Albizia (E. J. M. Koenen et al., unpublished data). As taxon sampling is increased across the Caesalpinioideae and more studies are carried out using this gene set on individual genera, eventually a large and densely sampled phylogeny for the subfamily can be inferred and used for taxonomic reclassification and to study the evolution of this prominent tropical woody plant clade.

FUNDING INFORMATION

This work was supported by the Swiss National Science Foundation (grants 31003A_135522 and 31003A_182453 to C.E.H.), the U.K. Natural Environment Research Council (Grant NE/I027797/1 to R.T.P.), the Claraz Schenkung Foundation, and the Department of Systematic and Evolutionary Botany, University of Zurich. Field trips in Brazil were partially funded by FAPESB (PTX0004 and APP0096 to L.P.d.Q.), CNPq (480530/2012‐2 and PROTAX 440487 to J.R.I and L.P.d.Q.), and Energia Sustentável do Brasil. The use of DNA from Brazilian plant species is authorized by SISGEN n° R4CAAB3 and n° R0AAA9E.

AUTHOR CONTRIBUTIONS

E.J.M.K., C.K., R.T.P., and C.E.H. designed the study. E.J.M.K. and C.K. carried out the targeted gene selection; E.J.M.K. did the labwork and analyses and wrote the draft manuscript. M.F.S., L.P.d.Q., M.L., and G.P.L. contributed tissue samples for sequencing; C.K., J.A.N., and R.T.P. contributed data. E.J.M.K., E.R.d.S., M.F.S., J.R.I., L.P.d.Q., M.L., and C.E.H. carried out the fieldwork. All coauthors contributed to interpretation of the results and writing of the final version of the manuscript.

APPENDIX S1. Complete ORFs for the 964 target genes used for bait design are included in the file Albizia_target_ORFs.fa, with sequences derived from the transcriptome of Albizia julibrissin.
APPENDIX S2. ML tree of the concatenated amino acid alignment of the 510 gene alignments with more than half of the accessions present, inferred with the LG4X model.
APPENDIX S3. ML tree of the concatenated nucleotide alignment of the 510 gene alignments with more than half of the accessions present, inferred with the GTRCAT model.
APPENDIX S4. ML phylogeny of 72 protein coding genes from the chloroplast genome inferred with the GTRCAT model.
APPENDIX S5. Tanglegram comparing the ASTRAL topology with the chloroplast ML topology.
APPENDIX S6. The concatenated alignment of all gene alignments with <0.01 root‐to‐tip length variance.
APPENDIX S7. The concatenated alignment of all gene alignments with <0.01 root‐to‐tip length variance and more than half the taxa present.
APPENDIX S8. The concatenated amino acid alignment of all gene alignments with <0.01 root‐to‐tip length variance and more than half the taxa present.
APPENDIX S9. The 1915 gene alignments, with sequences both <300 bp long and <50% of the aligned length removed, from which final gene trees were inferred.
APPENDIX S10. The 1915 gene trees inferred from the alignments of Appendix S9, with bootstrap support indicated.
APPENDIX S11. The concatenated alignment of chloroplast genes.
APPENDIX S12. ML topology of concatenated alignment of 1767 gene alignments, with ICA values indicated as branch labels.
APPENDIX S13. ML topology of the concatenated alignment of the 510 gene alignments with more than half of the accessions present, with number of concordant and conflicting gene trees from the same set of 510 alignments written above and below internodes, respectively.
APPENDIX S14. ASTRAL tree with polytomy test results indicated.
APPENDIX S15. ML tree of the concatenated nucleotide alignment of the 510 gene alignments with more than half of the accessions present, but with Cedrelinga cateniformis removed, inferred with the GTRCAT model.
APPENDIX S16. ML tree of the concatenated nucleotide alignment of the 510 gene alignments with more than half of the accessions present, but with Cedrelinga cateniformis and Pseudosamanea guachapele removed, inferred with the GTRCAT model.
APPENDIX S17. ML tree of the concatenated nucleotide alignment of the 510 gene alignments with more than half of the accessions present, but with Cedrelinga cateniformis, Pseudosamanea guachapele and the Samanea clade removed, inferred with the GTRCAT model.

Taxon | Voucher | ENA accession number | Total number of reads | Reads on target (%) | No. of targets recovered (%) | No. of gene alignments (%)
Abarema cochliacarpos (Gomes) Barneby & J.W.Grimes | L.P. de Queiroz 15538 (HUEFS) | ERS4812838 | 23779774 | 19277048 (81.06%) | 940 (97.51%) | 1078 (56.29%)
Acacia longifolia (Andrews) Willd. | E. Koenen 182 (Z) | ERS4812840 | 4142738 | 288693 (6.97%) | 830 (86.10%) | 520 (27.15%)
Acaciella villosa (Sw.) Britton & Rose | C.E. Hughes 2635 (FHO) | ERS4812841 | 3393924 | 129207 (3.81%) | 644 (66.80%) | 274 (14.31%)
Adenanthera pavonina L. | Ambriansyah & Arifin AA295 (K) | ERS4812842 | 4812194 | 251953 (5.24%) | 879 (91.18%) | 606 (31.64%)
Adenopodia patens (Hook. & Arn.) J.R.Dixon ex Brenan | Sandoval MS343 (K) | ERS4812843 | 7863140 | 791520 (10.07%) | 919 (95.33%) | 769 (40.16%)
Adenopodia scelerata (A. Chev.) Brenan | C. Jongkind 10602 (WAG) | ERS4812844 | 10094360 | 1357624 (13.45%) | 950 (98.55%) | 902 (47.10%)
Alantsilodendron pilosum Villiers | E. Koenen 203 (Z) | ERS4812845 | 3468930 | 364779 (10.52%) | 933 (96.78%) | 806 (42.09%)
Albizia adianthifolia (Schumach.) W.Wight | J.J. Wieringa 6278 (WAG) | ERS4812846 | 10128880 | 1151955 (11.37%) | 945 (98.03%) | 976 (50.97%)
Albizia altissima Hook.f. | C. Jongkind 10709 (WAG) | ERS4812847 | 23622566 | 18312807 (77.52%) | 943 (97.82%) | 1097 (57.28%)
Albizia anthelmintica Brongn. | O. Maurin 0363 (JRAU) | ERS4812848 | 5576920 | 754863 (13.54%) | 934 (96.89%) | 894 (46.68%)
Albizia atakataka Capuron | E. Koenen 229 (Z) | ERS4812849 | 47608874 | 33141289 (69.61%) | 940 (97.51%) | 1010 (52.74%)
Albizia aurisparsa (Drake) R.Vig. | E. Koenen 230 (Z) | ERS4812850 | 15816078 | 2127848 (13.45%) | 952 (98.76%) | 1047 (54.67%)
Albizia bernieri E. Fourn. ex Villiers | E. Koenen 354 (Z) | ERS4812851 | 3752342 | 331728 (8.84%) | 898 (93.15%) | 692 (36.14%)
Albizia boivinii E. Fourn. | E. Koenen 270 (Z) | ERS4812852 | 3677634 | 453433 (12.33%) | 925 (95.95%) | 882 (46.06%)
Albizia brevifolia Schinz | O. Maurin 0826 (JRAU) | ERS4812853 | 2819456 | 327264 (11.61%) | 878 (91.08%) | 678 (35.40%)
Albizia burkartiana Barneby & J.W.Grimes | Stival‐Santos 678 (RB) | ERS4812854 | 6094776 | 465465 (7.64%) | 927 (96.16%) | 882 (46.06%)
Albizia dinklagei Harms | C. Jongkind 7359 (WAG) | ERS4812855 | 2074148 | 144269 (6.96%) | 912 (94.61%) | 802 (41.88%)
Albizia edwallii (Hoehne) Barneby & J.W.Grimes | Dalmaso 272 (RB) | ERS4812856 | 3339160 | 359353 (10.76%) | 931 (96.58%) | 947 (49.45%)
Albizia ferruginea (Guill. & Perr.) Benth. | C. Jongkind 10762 (WAG) | ERS4812857 | 6722992 | 1035687 (15.41%) | 936 (97.10%) | 968 (50.55%)
Albizia grandibracteata Taub. | E. Koenen 159 (WAG) | ERS4812858 | 38109552 | 27944549 (73.33%) | 946 (98.13%) | 950 (49.61%)
Albizia inundata (Mart.) Barneby & J.W.Grimes | J.R.I. Wood 26530 (K) | ERS4812859 | 35775492 | 25641806 (71.67%) | 942 (97.72%) | 965 (50.39%)
Albizia mahalao Capuron | E. Koenen 216 (Z) | ERS4812860 | 70271424 | 55605048 (79.13%) | 946 (98.13%) | 906 (47.31%)
Albizia masikororum R.Vig. | E. Koenen 237 (Z) | ERS4812861 | 12599676 | 1562117 (12.40%) | 953 (98.86%) | 1024 (53.47%)
Albizia obbiadensis (Chiov.) Brenan | Thulin 4163 (UPS) | ERS4812862 | 5614760 | 735383 (13.10%) | 940 (97.51%) | 937 (48.93%)
Albizia obliquifoliolata De Wild. | J.J. Wieringa 6519 (WAG) | ERS4812863 | 13303218 | 10816943 (81.31%) | 941 (97.61%) | 1047 (54.67%)
Albizia polyphylla E.Fourn. | E. Koenen 256 (Z) | ERS4812864 | 3215066 | 434008 (13.50%) | 932 (96.68%) | 843 (44.02%)
Albizia retusa Benth. | Hyland 2732 (L) | ERS4812865 | 11996368 | 1476589 (12.31%) | 948 (98.34%) | 1004 (52.43%)
Albizia sahafariensis Capuron | E. Koenen 405 (Z) | ERS4812866 | 12994846 | 1600201 (12.31%) | 945 (98.03%) | 1011 (52.79%)
Albizia saponaria (Lour.) Blume | Jobson 1041 (BH) | ERS4812867 | 39202190 | 27805263 (70.93%) | 944 (97.93%) | 998 (52.11%)
Albizia versicolor Welw. ex Oliv. | O. Maurin 560 (JRAU) | ERS4812868 | 66547258 | 53198730 (79.94%) | 945 (98.03%) | 1045 (54.57%)
Albizia viridis E.Fourn. | Du Puy M251 (K) | ERS4812869 | 7260284 | 870430 (11.99%) | 934 (96.89%) | 988 (51.59%)
Albizia zygia (DC.) J.F.Macbr. | J.J. Wieringa 5915 (WAG) | ERS4812870 | 8003478 | 793032 (9.91%) | 941 (97.61%) | 977 (51.02%)
Amblygonocarpus andongensis (Welw. ex Oliv.) Exell & Torre | Sokpon 1451 (WAG) | ERS4812871 | 5307456 | 263884 (4.97%) | 843 (87.45%) | 569 (29.71%)
Anadenanthera colubrina (Vell.) Brenan | L.P. de Queiroz 15685 (HUEFS) | ERS4812872 | 4286504 | 491557 (11.47%) | 929 (96.37%) | 841 (43.92%)
Archidendron lucidum (Benth.) I.C.Nielsen | Wang and Lin 2534 (L) | ERS4812873 | 6285326 | 658012 (10.47%) | 939 (97.41%) | 972 (50.76%)
Archidendron quocense (Pierre) I.C.Nielsen | Newman 2094 (E) | ERS4812874 | 55446888 | 41045031 (74.03%) | 947 (98.24%) | 972 (50.76%)
Archidendropsis granulosa (Labill.) I.C.Nielsen | McKee 38353 (L) | ERS4812875 | 13150138 | 1492706 (11.35%) | 947 (98.24%) | 1047 (54.67%)
Aubrevillea kerstingii (Harms) Pellegr. | Nimba Botanic Team JR957 (WAG) | ERS4812876 | 6327042 | 343767 (5.43%) | 936 (97.10%) | 770 (40.21%)
Balizia pedicellaris (DC.) Barneby & J.W.Grimes | L.P. de Queiroz 15529 (HUEFS) | ERS4812877 | 28193862 | 22668050 (80.40%) | 941 (97.61%) | 1104 (57.65%)
Balizia sp. nov. | M.P. Morim 577 (RB) | ERS4812878 | 21239644 | 16903890 (79.59%) | 936 (97.10%) | 1071 (55.93%)
Blanchetiodendron blanchetii (Benth.) Barneby & J.W.Grimes | L.P. de Queiroz 15616 (HUEFS) | ERS4812879 | 6639992 | 780827 (11.76%) | 936 (97.10%) | 965 (50.39%)
Calliandra hygrophila Mackinder & G.P.Lewis | L.P. de Queiroz 15542 (HUEFS) | ERS4812880 | 4127232 | 483827 (11.72%) | 910 (94.40%) | 732 (38.22%)
Calpocalyx dinklagei Harms | J.J. Wieringa 6094 (WAG) | ERS4812881 | 11391816 | 614443 (5.39%) | 929 (96.37%) | 671 (35.04%)
Cathormion umbellatum Kosterm. | Jobson 1037 (BH) | ERS4812882 | 26129888 | 20718828 (79.29%) | 944 (97.93%) | 1118 (58.38%)
Cedrelinga cateniformis (Ducke) Ducke | T.D. Pennington 17761 (K) | ERS4812883 | 4070738 | 406653 (9.99%) | 919 (95.33%) | 803 (41.93%)
Chidlowia sanguinea Hoyle | J.J. Wieringa 4338 (WAG) | ERS4812884 | 9263792 | 438049 (4.73%) | 888 (92.12%) | 584 (30.50%)
Chloroleucon tenuiflorum (Benth.) Barneby & J.W.Grimes | L.P. de Queiroz 15514 (HUEFS) | ERS4812885 | 7301118 | 779106 (10.67%) | 945 (98.03%) | 1031 (53.84%)
Cojoba arborea (L.) Britton & Rose | M.F. Simon 1545 (CEN) | ERS4812886 | 9948972 | 1062718 (10.68%) | 954 (98.96%) | 1095 (57.18%)
Cylicodiscus gabunensis Harms | M. Sosef 645A (WAG) | ERS4812887 | 6792968 | 649666 (9.56%) | 951 (98.65%) | 943 (49.24%)
Desmanthus leptophyllus Kunth | C.E. Hughes 2035 (FHO) | ERS4812888 | 4816620 | 392291 (8.14%) | 923 (95.75%) | 816 (42.61%)
Dichrostachys cinerea (L.) Wight & Arn. | O. Maurin 256 (JRAU) | ERS4812889 | 4876856 | 416124 (8.53%) | 935 (96.99%) | 822 (42.92%)
Dimorphandra macrostachya Benth. | J.R. Iganci 877 (RB) | ERS4812890 | 6731034 | 248839 (3.70%) | 935 (96.99%) | 689 (35.98%)
Diptychandra aurantiaca Tul. | J.R.I. Wood 26513 (K) | ERS4812891 | 8520962 | 117138 (1.37%) | 881 (91.39%) | 400 (20.89%)
Ebenopsis confinis (Standl.) Britton & Rose | C.E. Hughes 1539 (FHO) | ERS4812892 | 5779758 | 654578 (11.33%) | 936 (97.10%) | 927 (48.41%)
Elephantorrhiza elephantina (Burch.) Skeels | KMS198 (JRAU) | ERS4812893 | 7379446 | 717080 (9.72%) | 946 (98.13%) | 765 (39.95%)
Entada rheedei Spreng. | E. Koenen 496 (Z) | ERS4812894 | 8695656 | 531548 (6.11%) | 948 (98.34%) | 661 (34.52%)
Enterolobium contortisiliquum (Vell.) Morong | L.P. de Queiroz 15579 (HUEFS) | ERS4812895 | 2729658 | 240130 (8.80%) | 919 (95.33%) | 868 (45.33%)
Erythrophleum ivorense A.Chev. | J.J. Wieringa 5487 (WAG) | ERS4812896 | 11500640 | 485354 (4.22%) | 947 (98.24%) | 719 (37.55%)
Faidherbia albida (Delile) A.Chev. | O. Maurin 3495 (JRAU) | ERS4812897 | 6376338 | 734941 (11.53%) | 945 (98.03%) | 946 (49.40%)
Falcataria moluccana (Miq.) Barneby & J.W.Grimes | Ambri & Arifin W826A (K) | ERS4812898 | 7669018 | 815087 (10.63%) | 946 (98.13%) | 991 (51.75%)
Fillaeopsis discophora Harms | J.J. Wieringa 5498 (WAG) | ERS4812899 | 2259316 | 111269 (4.92%) | 816 (84.65%) | 597 (31.17%)
Havardia pallens (Benth.) Britton & Rose | C.E. Hughes 2138 (FHO) | ERS4812900 | 6521266 | 726457 (11.14%) | 943 (97.82%) | 1056 (55.14%)
Hesperalbizia occidentalis (Brandegee) Barneby & J.W.Grimes | C.E. Hughes 1296 (FHO) | ERS4812901 | 5403622 | 788809 (14.60%) | 947 (98.24%) | 1032 (53.89%)
Hydrochorea corymbosa (Rich.) Barneby & J.W.Grimes [1] | F. Bonadeu 655 (RB) | ERS4812902 | 39645090 | 27356455 (69.00%) | 943 (97.82%) | 1028 (53.68%)
Hydrochorea corymbosa (Rich.) Barneby & J.W.Grimes [2] | J.R. Iganci 862 (RB) | ERS4812903 | 19909090 | 15987983 (80.30%) | 944 (97.93%) | 1071 (55.93%)
Inga alba (Sw.) Willd. | P.D. Coley & T.A. Kursar TAKPDC1677 (UT) | ERR776844 | 1658880 | 1363817 (82.21%) | 942 (97.72%) | 1062 (55.46%)
Inga edulis Mart. | P.D. Coley & T.A. Kursar TAKPDC1719 (UT) | ERR776838 | 1617410 | 1324567 (81.89%) | 934 (96.89%) | 1076 (56.19%)
Inga huberi Ducke | P.D. Coley & T.A. Kursar TAKPDC1755 (UT) | ERR776810 | 1555208 | 1291086 (83.02%) | 937 (97.20%) | 1085 (56.66%)
Inga laurina (Sw.) Willd. | K.G. Dexter 398 (E) | ERR776816 | 1612110 | 1374610 (85.27%) | 944 (97.93%) | 1053 (54.99%)
Inga stipularis DC. | P.D. Coley & T.A. Kursar TAKPDC1856 (UT) | ERR776821 | 1692290 | 1393432 (82.34%) | 940 (97.51%) | 1055 (55.09%)
Inga tenuistipula Ducke | K.G. Dexter 110 (E) | ERR776831 | 1388002 | 1125394 (81.08%) | 938 (97.30%) | 1077 (56.24%)
Jupunba trapezifolia Moldenke | M.F. Simon 1600 (CEN) | ERS4812839 | 16357084 | 13117719 (80.20%) | 945 (98.03%) | 1042 (54.41%)
Kanaloa kahoolawensis Lorence & K.R.Wood | Lorence 7380 (PTBG) | ERS4812904 | 12222002 | 1915460 (15.67%) | 956 (99.17%) | 933 (48.72%)
Lachesiodendron viridiflorum (Kunth) P.G.Ribeiro, L.P.Queiroz & Luckow | L.P. de Queiroz 15614 (HUEFS) | ERS4812905 | 18632852 | 2381616 (12.78%) | 957 (99.27%) | 973 (50.81%)
Lemurodendron capuronii Villiers & P.Guinet | E. Koenen 435 (Z) | ERS4812906 | 7108042 | 881933 (12.41%) | 947 (98.24%) | 1000 (52.22%)
Leucochloron bolivianum C.E. Hughes & Atahuachi | C.E. Hughes 2608 (FHO) | ERS4812907 | 7946434 | 1218355 (15.33%) | 950 (98.55%) | 1046 (54.62%)
Leucochloron limae Barneby & J.W.Grimes | MWC8250 (K) | ERS4812908 | 7767490 | 965594 (12.43%) | 949 (98.44%) | 1078 (56.29%)
Lysiloma candidum Brandegee | B. Marazzi 300 (ASU) | ERS4812909 | 2030974 | 102461 (5.04%) | 753 (78.11%) | 428 (22.35%)
Macrosamanea amplissima (Ducke) Barneby & J.W.Grimes | Bonadeu 663 (RB) | ERS4812910 | 2360238 | 217690 (9.22%) | 920 (95.44%) | 824 (43.03%)
Mariosousa sericea (M.Martens & Galeotti) Seigler & Ebinger | MWC18949 (K) | ERS4812911 | 8160316 | 1450135 (17.77%) | 951 (98.65%) | 1011 (52.79%)
Mimosa grandidieri Baill. | E. Koenen 207 (Z) | ERS4812912 | 7792272 | 717042 (9.20%) | 951 (98.65%) | 795 (41.51%)
Mimosa tenuiflora (Willd.) Poir. | L.P. de Queiroz 15498 (HUEFS) | ERS4812913 | 6210710 | 475738 (7.66%) | 944 (97.93%) | 799 (41.72%)
Mimozyganthus carinatus (Griseb.) Burkart | C.E. Hughes 2476 (FHO) | ERS4812914 | 8148502 | 817441 (10.03%) | 943 (97.82%) | 944 (49.30%)
Neptunia oleracea Lour. | E. Koenen 283 (Z) | ERS4812915 | 10836680 | 1176757 (10.86%) | 945 (98.03%) | 861 (44.96%)
Newtonia hildebrandtii (Vatke) Torre | O. Maurin 2457 (JRAU) | ERS4812916 | 8663120 | 826146 (9.54%) | 948 (98.34%) | 914 (47.73%)
Pachyelasma tessmannii (Harms) Harms | J.J. Wieringa 5229 (WAG) | ERS4812917 | 11845384 | 793886 (6.70%) | 954 (98.96%) | 766 (40.00%)
Parapiptadenia zehntneri (Harms) M.P.Lima & H.C.Lima | L.P. de Queiroz 15692 (HUEFS) | ERS4812918 | 4446508 | 378119 (8.50%) | 932 (96.68%) | 939 (49.03%)
Pararchidendron pruinosum (Benth.) I.C.Nielsen | Jobson 1039 (BH) | ERS4812919 | 7647352 | 738506 (9.66%) | 952 (98.76%) | 1052 (54.93%)
Paraserianthes lophantha (Willd.) I.C.Nielsen | M. van Slageren & R. Newton MSRN648 (K) | ERS4812920 | 6378910 | 751573 (11.78%) | 950 (98.55%) | 1048 (54.73%)
Parkia panurensis Benth. ex H.C.Hopkins | J.R. Iganci 842 (RB) | ERS4812921 | 2640302 | 231835 (8.78%) | 907 (94.09%) | 814 (42.51%)
Peltophorum africanum Sond. | E. Koenen 601 (Z) | ERS4812922 | 2910944 | 145733 (5.01%) | 717 (74.38%) | 367 (19.16%)
Pentaclethra macrophylla Benth. | Galeuchet & Balthazar 10 (Z) | ERS4812923 | 18158278 | 776900 (4.28%) | 949 (98.44%) | 734 (38.33%)
Piptadenia robusta Pittier | M. Luckow 4633 (BH) | ERS4812924 | 3486554 | 371485 (10.65%) | 938 (97.30%) | 898 (46.89%)
Piptadeniastrum africanum (Hook.f.) Brenan | E. Koenen 152 (WAG) | ERS4812925 | 8894316 | 514787 (5.79%) | 948 (98.34%) | 741 (38.69%)
Piptadeniopsis lomentifera Burkart | M. Luckow 4505 (BH) | ERS4812926 | 6399676 | 826642 (12.92%) | 947 (98.24%) | 926 (48.36%)
Pithecellobium dulce (Roxb.) Benth. | B. Marazzi 309 (ASU) | ERS4812927 | 6485068 | 881345 (13.59%) | 954 (98.96%) | 1061 (55.40%)
Pityrocarpa moniliformis (Benth.) Luckow & R.W. Jobson | J.R.I. Wood 26516 (K) | ERS4812928 | 6003692 | 449263 (7.48%) | 938 (97.30%) | 951 (49.66%)
Plathymenia reticulata Benth. | L.P. de Queiroz 15688 (HUEFS) | ERS4812929 | 2477330 | 209417 (8.45%) | 920 (95.44%) | 757 (39.53%)
Prosopidastrum globosum (Gillies ex Hook. & Arn.) Burkart | M. Luckow sn (BH) | ERS4812930 | 6211352 | 691922 (11.14%) | 946 (98.13%) | 924 (48.25%)
Prosopis africana (Guill. & Perr.) Taub. | Essou 2110 (WAG) | ERS4812931 | 11459252 | 1374601 (12.00%) | 953 (98.86%) | 860 (44.91%)
Prosopis laevigata (Humb. & Bonpl. ex Willd.) M.C.Johnst. | C.E. Hughes 2058 (FHO) | ERS4812932 | 3353428 | 246735 (7.36%) | 920 (95.44%) | 843 (44.02%)
Pseudopiptadenia contorta (DC.) G.P.Lewis & M.P.Lima | L.P. de Queiroz 15582 (HUEFS) | ERS4812933 | 7625306 | 719618 (9.44%) | 949 (98.44%) | 1009 (52.69%)
Pseudoprosopis gilletii (De Wild.) Villiers | J.J. Wieringa 6021 (WAG) | ERS4812934 | 5958100 | 264874 (4.45%) | 931 (96.58%) | 700 (36.55%)
Pseudosamanea guachapele (Kunth) Harms | C.E. Hughes 1198 (FHO) | ERS4812935 | 7396824 | 1018670 (13.77%) | 944 (97.93%) | 1015 (53.00%)
Samanea saman (Jacq.) Merr. | C.E. Hughes 421 (FHO) | ERS4812936 | 3344450 | 515562 (15.42%) | 943 (97.82%) | 1027 (53.63%)
Schleinitzia novoguineensis (Warb.) Verdc. | Chaplin 57/84 | ERS4812937 | 16565732 | 2799712 (16.90%) | 956 (99.17%) | 881 (46.01%)
Senegalia ataxacantha (DC.) Kyal. & Boatwr. | C. Jongkind 10603 (WAG) | ERS4812938 | 11987764 | 1423870 (11.88%) | 951 (98.65%) | 941 (49.14%)
Senegalia sakalava (Drake) Boatwr. | E. Koenen 215 (Z) | ERS4812939 | 12102414 | 990240 (8.18%) | 946 (98.13%) | 863 (45.07%)
Serianthes nelsonii Merr. | P. Moore 1241 (L) | ERS4812940 | 7283252 | 597846 (8.21%) | 943 (97.82%) | 1038 (54.20%)
Sphinga acatlensis (Benth.) Barneby & J.W.Grimes | C.E. Hughes 2112 (FHO) | ERS4812941 | 9238996 | 1054313 (11.41%) | 950 (98.55%) | 1035 (54.05%)
Stryphnodendron pulcherrimum (Willd.) Hochr. | L.P. de Queiroz 15482 (HUEFS) | ERS4812942 | 11852118 | 1440760 (12.16%) | 954 (98.96%) | 1007 (52.58%)
Tachigali odoratissima (Spruce ex Benth.) Zarucchi & Herend. | M.P. Morim 562 (RB) | ERS4812943 | 7900532 | 193104 (2.44%) | 925 (95.95%) | 622 (32.48%)
Tetrapleura tetraptera (Schumach. & Thonn.) Taub. | E. Koenen 155 (WAG) | ERS4812944 | 4276206 | 310727 (7.27%) | 933 (96.78%) | 707 (36.92%)
Vachellia tortilis (Forssk.) Galasso & Banfi | E. Koenen 603 (Z) | ERS4812945 | 5519408 | 614954 (11.14%) | 930 (96.47%) | 830 (43.34%)
Vachellia viguieri (Villiers & Du Puy) Boatwr. | E. Koenen 199 (Z) | ERS4812946 | 4782514 | 572917 (11.98%) | 945 (98.03%) | 902 (47.10%)
Viguieranthus glaber Villiers | E. Koenen 325 (Z) | ERS4812947 | 10569806 | 1179380 (11.16%) | 950 (98.55%) | 1004 (52.43%)
Xylia hoffmannii (Vatke) Drake | E. Koenen 402 (Z) | ERS4812948 | 7511352 | 464326 (6.18%) | 944 (97.93%) | 713 (37.23%)
Zapoteca caracasana (Jacq.) H.M.Hern. | C.E. Hughes 3071 (FHO) | ERS4812949 | 5236294 | 475921 (9.09%) | 912 (94.61%) | 608 (31.75%)
Zygia claviflora (Spruce ex Benth.) Barneby & J.W.Grimes | J.R. Iganci 841 (RB) | ERS4812950 | 2422758 | 195154 (8.06%) | 910 (94.40%) | 784 (40.94%)
Zygia inaequalis (Humb. & Bonpl. ex Willd.) Pittier | J.R. Iganci 832 (RB) | ERS4812951 | 5622222 | 494386 (8.79%) | 931 (96.58%) | 899 (46.95%)
Zygia racemosa (Ducke) Barneby & J.W.Grimes | M.F. Simon 1658 (CEN) | ERS4812952 | 10769766 | 850638 (7.90%) | 946 (98.13%) | 1012 (52.85%)
Zygia sp. | P.D. Coley & T.A. Kursar Tip917 (UT) | ERR776824 | 1360502 | 1069298 (78.60%) | 938 (97.30%) | 1091 (56.97%)
  71 in total
