Carolina M Siniscalchi, Oriane Hidalgo, Luis Palazzesi, Jaume Pellicer, Lisa Pokorny, Olivier Maurin, Ilia J Leitch, Felix Forest, William J Baker, Jennifer R Mandel.
Abstract
PREMISE: Phylogenetic studies in the Compositae are challenging because of the sheer size of the family and the difficulties it poses for molecular tools, ranging from the genomic impact of polyploid events to very conserved plastid genomes. The search for better molecular tools for phylogenetic studies led to the development of the family-specific Compositae1061 probe set, as well as the universal Angiosperms353 probe set designed for all flowering plants. In this study, we evaluate the extent to which data generated using the family-specific kit and those obtained with the universal kit can be merged for downstream analyses.
Keywords: Asteraceae; angiosperms; paralogy; phylogenomics; target capture
Year: 2021 PMID: 34336403 PMCID: PMC8312747 DOI: 10.1002/aps3.11422
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Summary of the four treatments used in the study.
| Treatment | Sequenced with | Assembled using the reference | No. of samples included | Recovered loci, average (range) | Percent of paralogous loci |
|---|---|---|---|---|---|
| A | Compositae1061 | Compositae1061 | 8 | 721 (3–1012) | 0–46% |
| B | Angiosperms353 | Angiosperms353 | 8 | 287 (242–323) | 0.6–13% |
| C | Compositae1061 | Angiosperms353 | 6 | 25 (29–38) | ca. 5% |
| D | Angiosperms353 | Compositae1061 | 8 | 35 (21–59) | 2–25% |
Loci shared across both probe sets. For cases where more than one species representative is included for the same gene, all loci that had hits were included.
| Angiosperms353 loci | Compositae1061 loci |
|---|---|
| Ambtr‐6412, NVSO‐6412 | sunf‐At3g23400, saff‐At3g23400 |
| Ambtr‐6447, IHPC‐6447 | saff‐At2g27290, sunf‐At2g27290 |
| Ambtr‐6462, EDXZ‐6462, LYPZ‐6462, NUZN‐6462 | sunf‐At2g27450, saff‐At2g27450, lett‐At2g27450 |
| AQZD‐5614, LVUS‐5614, XRCX‐5614 | sunf‐At3g03790, lett‐At3g03790 |
| AQZD‐5870 | sunf‐At2g15240 |
| Arath‐5477 | lett‐At1g14620 |
| Arath‐5840, EZZT‐5840, KDCH‐5840, NVSO‐5840 | lett‐At4g35250, saff‐At4g35250, sunf‐At4g35250 |
| Arath‐5857, BXBF‐5857, UYED‐5857 | lett‐At2g43030, sunf‐At2g43030, saff‐At2g43030 |
| AZBL‐5841, HYZL‐5841, QIKZ‐5841, SVVG‐5841, UMUL‐5841 | saff‐At1g20540, lett‐At1g20540, sunf‐At1g20540 |
| BEFC‐6449 | sunf‐At5g57860 |
| BIDT‐5562 | sunf‐At2g36930, lett‐At2g36930 |
| BIDT‐5910 | sunf‐At4g25080, lett‐At4g25080 |
| BIDT‐6733 | sunf‐At4g37020 |
| BIDT‐6946, UMUL‐6946, VUSY‐6946 | sunf‐At3g03100, saff‐At3g03100, lett‐At3g03100 |
| BIDT‐6954, IDAU‐6954 | sunf‐At1g50575 |
| DOVJ‐7371 | lett‐At4g13500 |
| DUNJ‐6498, HLJG‐6498, OXYP‐6498, RCUX‐6498 | sunf‐At1g55670, lett‐At1g55670 |
| EMBR‐5918, JYMN‐5918 | saff‐At2g31040, sunf‐At2g31040, lett‐At2g31040 |
| HOKG‐6458 | lett‐At1g04620 |
| JEPE‐4527 | sunf‐At1g09830 |
| JNKW‐6705 | lett‐At3g62810 |
| JYMN‐7141 | sunf‐At3g55250 |
| LSJW‐5933, MDJK‐5933 | sunf‐At3g63140, lett‐At3g63140 |
| NUZN‐6139 | lett‐At1g75330 |
| NVSO‐7194 | sunf‐At1g76450 |
| Orysa‐6038 | saff‐At4g32770 |
| QIKZ‐7367, WAIL‐7367, ZCUA‐7367 | sunf‐At2g03420 |
| QTJY‐6068, UYED‐6068 | sunf‐At3g05070, saff‐At3g05070, lett‐At3g05070 |
| VVPY‐6913, WYIG‐6913 | lett‐At5g23120 |
| XRCX‐5594 | lett‐At1g76080 |
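The shared-loci table can be turned into a lookup between the two probe sets. The following is a minimal sketch, not the authors' procedure. It assumes that the part of each locus name before the hyphen (e.g. `Ambtr`, `sunf`) identifies the source representative and the part after it identifies the gene; the function names `gene_ids` and `build_crosswalk` are hypothetical, and only three rows of the table are transcribed for illustration.

```python
# A few rows transcribed from the shared-loci table (ASCII hyphens used here).
shared_rows = [
    ("Ambtr-6412, NVSO-6412", "sunf-At3g23400, saff-At3g23400"),
    ("AQZD-5870", "sunf-At2g15240"),
    ("BIDT-6954, IDAU-6954", "sunf-At1g50575"),
]


def gene_ids(cell):
    """Return the sorted unique gene IDs in one table cell.

    Assumption: each locus name is '<source prefix>-<gene ID>'.
    The typographic hyphen (U+2010) used in the table is normalized first.
    """
    names = [n.strip().replace("\u2010", "-") for n in cell.split(",")]
    return sorted({n.split("-", 1)[1] for n in names})


def build_crosswalk(rows):
    """Map each Angiosperms353 gene ID to its Compositae1061 gene ID."""
    mapping = {}
    for a353_cell, c1061_cell in rows:
        a_ids = gene_ids(a353_cell)
        c_ids = gene_ids(c1061_cell)
        # Each row is expected to describe exactly one gene per probe set.
        if len(a_ids) != 1 or len(c_ids) != 1:
            raise ValueError(f"ambiguous row: {a353_cell!r} / {c1061_cell!r}")
        mapping[a_ids[0]] = c_ids[0]
    return mapping


crosswalk = build_crosswalk(shared_rows)
print(crosswalk)
```

Collapsing the representatives to a single gene ID per row reflects the table note that all loci with hits were listed when several species representatives exist for the same gene.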
FIGURE 1. Basic assembly statistics. (A) Number of parsimony‐informative (PI) sites in relation to the alignment length. Circles represent data sequenced and assembled using the same probe set as a reference, while triangles represent an assembly using the other probe set as a reference. (B) Percentage of reads mapping to targets (recovered) in each treatment. Error bars represent the 25th and 75th percentiles.
Summary of assembly statistics.
| Treatment | Sample | Genes recovered | Genes flagged as paralogs | Genes not flagged as paralogs | Percentage of genes flagged as paralogs |
|---|---|---|---|---|---|
| A (data generated with Compositae1061 and assembled using Compositae1061 as the reference) | | 1008 | 470 | 538 | 47% |
| | | 893 | 82 | 811 | 9% |
| | | 3 | 0 | 3 | 0 |
| | | 903 | 211 | 692 | 23% |
| | | 1012 | 301 | 711 | 29% |
| | | 951 | 248 | 703 | 26% |
| | | 977 | 181 | 796 | 18% |
| | | 23 | 0 | 23 | 0 |
| B (data generated with Angiosperms353 and assembled using Angiosperms353 as the reference) | | 315 | 41 | 274 | 13% |
| | | 314 | 2 | 312 | 0.6% |
| | | 296 | 3 | 293 | 1% |
| | | 242 | 5 | 237 | 2% |
| | | 272 | 5 | 267 | 2% |
| | | 275 | 6 | 269 | 2% |
| | | 323 | 1 | 322 | 0.3% |
| | | 261 | 11 | 250 | 4% |
| C (data generated with Compositae1061 and assembled using Angiosperms353 as the reference) | | 38 | 2 | 36 | 5% |
| | | 29 | 0 | 29 | 0 |
| | | 0 | 0 | 0 | NA |
| | | 31 | 0 | 31 | 0 |
| | | 37 | 0 | 37 | 0 |
| | | 35 | 2 | 33 | 5% |
| | | 35 | 0 | 35 | 0 |
| | | 0 | 0 | 0 | NA |
| D (data generated with Angiosperms353 and assembled using Compositae1061 as the reference) | | 59 | 15 | 44 | 25% |
| | | 34 | 0 | 34 | 0 |
| | | 31 | 0 | 31 | 0 |
| | | 21 | 0 | 21 | 0 |
| | | 31 | 0 | 31 | 0 |
| | | 30 | 2 | 28 | 6% |
| | | 48 | 1 | 47 | 2% |
| | | 28 | 1 | 27 | 3% |
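The per-treatment averages and ranges reported in the summary of the four treatments can be recomputed from the per-sample counts in the assembly statistics table. The sketch below is one way to do so and is not taken from the paper; the names `summarize` and `percent_paralogs` are hypothetical. For treatment C it suggests that the reported average of 25 corresponds to the mean over all eight rows, whereas the reported range of 29–38 and the sample count of 6 correspond to the non-empty assemblies only.

```python
from statistics import mean

# Genes recovered per sample, transcribed from the assembly statistics table.
genes_recovered = {
    "A": [1008, 893, 3, 903, 1012, 951, 977, 23],
    "B": [315, 314, 296, 242, 272, 275, 323, 261],
    "C": [38, 29, 0, 31, 37, 35, 35, 0],
    "D": [59, 34, 31, 21, 31, 30, 48, 28],
}


def summarize(counts, drop_empty=False):
    """Return (mean, min, max) of per-sample locus counts.

    With drop_empty=True, samples that recovered no loci are excluded.
    """
    kept = [c for c in counts if c > 0] if drop_empty else list(counts)
    return mean(kept), min(kept), max(kept)


def percent_paralogs(flagged, recovered):
    """Percentage of recovered genes flagged as paralogs; None if nothing was recovered."""
    if recovered == 0:
        return None  # reported as NA in the table
    return 100 * flagged / recovered


for treatment, counts in genes_recovered.items():
    avg, low, high = summarize(counts)
    print(f"{treatment}: {avg:.1f} ({low}-{high}) over {len(counts)} samples")

avg, low, high = summarize(genes_recovered["C"], drop_empty=True)
print(f"C, non-empty assemblies only: {avg:.1f} ({low}-{high})")
```

Whether empty assemblies should count toward the average is a choice for the analyst; the `drop_empty` flag only makes the two readings explicit.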
FIGURE 2. Phylogenies obtained using the different data sets and assembly strategies. Values on the nodes are local posterior probabilities obtained using ASTRAL‐III. (A) Data generated with Compositae1061 and assembled using Compositae1061 as the reference. (B) Data generated with Angiosperms353 and assembled using Angiosperms353 as the reference. (C) Data generated with Compositae1061 and assembled using Angiosperms353 as the reference. (D) Data generated with Angiosperms353 and assembled using Compositae1061 as the reference.
FIGURE 3. Phylogenies obtained using the different data sets and assembly strategies after the removal of loci flagged as paralogs. Values on the nodes are local posterior probabilities obtained using ASTRAL‐III. (A) Data generated with Compositae1061 and assembled using Compositae1061 as the reference. (B) Data generated with Angiosperms353 and assembled using Angiosperms353 as the reference. (C) Data generated with Compositae1061 and assembled using Angiosperms353 as the reference. (D) Data generated with Angiosperms353 and assembled using Compositae1061 as the reference.
FIGURE 4. Phylogenies combining all 16 samples. (A) All samples assembled using Compositae1061 as the reference. (B) All samples assembled using Angiosperms353 as the reference. The suffixed numbers refer to the probe set used.
FIGURE 5. Phylogeny combining six samples sequenced with Compositae1061 (regular text) and two with Angiosperms353 (bold text), all assembled using Compositae1061 as the reference. Node values are local posterior probabilities.
FIGURE 6. Gene tree discordance analysis. Pie charts represent the proportion of gene trees that support that specific node. Blue represents gene trees agreeing with the species tree, orange those that agree with the main alternative topology, red those that agree with all other topologies, and gray the proportion of uninformative trees. The numbers on the branches represent the number of concordant gene trees (top) and the number of conflicting trees (bottom). (A) Data generated with Compositae1061 and assembled using Compositae1061 as the reference. (B) Data generated with Angiosperms353 and assembled using Angiosperms353 as the reference. (C) Data generated with Compositae1061 and assembled using Angiosperms353 as the reference. (D) Data generated with Angiosperms353 and assembled using Compositae1061 as the reference.