Yannick Woudstra, Juan Viruel, Martin Fritzsche, Thomas Bleazard, Ryan Mate, Caroline Howard, Nina Rønsted, Olwen M Grace.
Abstract
Plant molecular identification studies have, until recently, been limited to the use of highly conserved markers from plastid and other organellar genomes, compromising resolution in highly diverse plant clades. Due to their higher evolutionary rates and reduced paralogy, low-copy nuclear genes overcome this limitation but are difficult to sequence with conventional methods and require high-quality input DNA. Aloe vera and its relatives in the Alooideae clade (Asphodelaceae, subfamily Asphodeloideae) are of economic interest for food and health products and have horticultural value. However, pressing conservation issues are increasing the need for a molecular identification tool to regulate the trade. With > 600 species and an origin of ± 15 million years ago, this predominantly African succulent plant clade is a diverse and taxonomically complex group for which low-copy nuclear genes would be desirable for accurate species discrimination. Unfortunately, with an average genome size of 16.76 pg, obtaining high coverage sequencing data for these genes would be prohibitively costly and computationally demanding. We used newly generated transcriptome data to design a customised RNA-bait panel targeting 189 low-copy nuclear genes in Alooideae. We demonstrate its efficacy in obtaining high-coverage sequence data for the target loci on Illumina sequencing platforms, including degraded DNA samples from museum specimens, with considerably improved phylogenetic resolution. This customised target capture sequencing protocol has the potential to confidently indicate phylogenetic relationships of Aloe vera and related species, as well as aid molecular identification applications.
Year: 2021 PMID: 34934068 PMCID: PMC8692607 DOI: 10.1038/s41598-021-03300-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Target capture sequencing statistics per sample. Origin of sample is denoted with ‘S’ for silica-dried freshly harvested material, ‘H’ for herbarium specimen, ‘P’ for DNA extracts from samples used in previously published studies and ‘R’ for RNA from freshly harvested material. *: Sample sequenced in larger multiplex run on Illumina HiSeq platform as part of a separate study.
| Sample | Phytogeographic region | Origin of sample | Ultrasonication time (s) | Reads after trimming | Reads mapped | % reads on target | Total assembled target exon length | SLCN loci with sequence | % target length recovered |
|---|---|---|---|---|---|---|---|---|---|
|  | Tropical East Africa | S | 50 | 2,399,050 | 1,268,854 | 54.2 | 321,066 | 189 | 92.5 |
|  | Madagascar | P | 50 | 2,416,438 | 1,285,339 | 53.1 | 329,211 | 188 | 94.8 |
|  | Tropical East Africa | S | 50 | 1,602,898 | 773,633 | 48.3 | 334,326 | 188 | 96.3 |
|  | Tropical East Africa | H | 50 | 2,789,581 | 142,523 | 4.5 | 149,709 | 168 | 43.1 |
|  | Southern Africa | S | 50 | 1,498,657 | 805,082 | 53.7 | 323,970 | 188 | 93.3 |
|  | Southern Africa | P | 50 | 3,640,964 | 2,043,250 | 56.1 | 327,033 | 189 | 94.2 |
|  | Southern Africa (Namibia) | S | 50 | 3,876,389 | 2,318,089 | 59.8 | 323,070 | 188 | 93.1 |
|  | Southern Africa | S | 50 | 2,368,852 | 1,263,235 | 53.3 | 327,576 | 189 | 94.4 |
|  | Tropical East Africa | S | 50 | 2,788,225 | 1,454,756 | 52.2 | 327,432 | 189 | 94.3 |
|  | Southern Africa | S | 50 | 2,542,514 | 1,380,373 | 54.3 | 322,503 | 187 | 92.9 |
|  | South Tropical Africa | S | 50 | 2,358,342 | 1,209,120 | 51.3 | 321,351 | 189 | 92.6 |
|  | Horn of Africa | P | 50 | 2,818,267 | 1,477,635 | 52.4 | 324,225 | 189 | 93.4 |
|  | Tropical East Africa | P | 50 | 1,090,090 | 557,847 | 51.2 | 318,018 | 188 | 91.6 |
|  | Tropical East Africa | S | 50 | 2,542,942 | 1,261,381 | 49.6 | 327,498 | 187 | 94.3 |
|  | Horn of Africa | S | 50 | 1,512,167 | 723,207 | 47.8 | 322,302 | 189 | 92.8 |
|  | South Tropical Africa | P | 50 | 3,514,483 | 1,922,219 | 54.7 | 329,751 | 189 | 95.0 |
|  | Horn of Africa | S | 50 | 2,773,518 | 1,520,473 | 54.8 | 326,418 | 189 | 94.0 |
|  | Horn of Africa | H | – | 3,696,487 | 1,978,408 | 53.5 | 323,346 | 187 | 93.1 |
|  | Southern Africa | S | 50 | 5,184,554 | 3,133,167 | 60.4 | 331,827 | 189 | 95.6 |
|  | South Tropical Africa | S | 50 | 2,324,265 | 1,145,238 | 49.7 | 329,352 | 188 | 94.9 |
|  | Madagascar | P | 50 | 3,163,131 | 1,712,967 | 54.2 | 313,962 | 188 | 90.4 |
|  | Madagascar | P | 50 | 2,560,543 | 1,434,442 | 56.0 | 330,045 | 189 | 95.1 |
|  | Arabian Peninsula | S | 50 | 2,920,940 | 1,561,211 | 53.4 | 326,742 | 188 | 94.1 |
|  | Southern Africa | F |  | 3,279,922 | 1,790,419 | 54.6 | 311,208 | 189 | 89.6 |
|  | – | P | 60 | 3,050,140 | 454,810 | 14.9 | 257,868 | 183 | 74.3 |
|  | – | P | 50 | 10,785,948 | 4,880,245 | 45.2 | 250,695 | 183 | 72.2 |
|  | – | P | 60 | 2,639,965 | 504,220 | 19.1 | 165,225 | 152 | 47.6 |
|  | Southern Africa | R | – | – | – | – | 344,044 | 189 | – |
|  | Western Africa | R | – | – | – | – | 349,657 | 189 | – |
|  | Cultivation | R | – | – | – | – | 350,347 | 189 | – |
|  | Southern Africa | R | – | – | – | – | 340,629 | 187 | – |
| Average genus |  |  |  | 2,712,317 | 1,407,498 | 51.2 | 317,858 | 187 | 91.6 |
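The table can be summarised by sample origin with a few lines of Python. The sketch below is illustrative only: it uses six rows copied from the table (not the full dataset), the function name `summarise_by_origin` is hypothetical, and the resulting subset means are not results reported by the study.

```python
from collections import defaultdict
from statistics import mean

# Origin codes as defined in the table caption.
ORIGIN_LABELS = {
    "S": "silica-dried fresh material",
    "H": "herbarium specimen",
    "P": "DNA extract from a previously published study",
    "R": "RNA from fresh material",
}

# (origin code, % reads on target, % target length recovered)
# Six rows copied from the table above; an illustrative subset only.
ROWS = [
    ("S", 54.2, 92.5),
    ("P", 53.1, 94.8),
    ("S", 48.3, 96.3),
    ("H", 4.5, 43.1),
    ("H", 53.5, 93.1),
    ("P", 14.9, 74.3),
]


def summarise_by_origin(rows):
    """Return {origin code: (mean % on target, mean % length recovered)}."""
    grouped = defaultdict(list)
    for origin, on_target, recovered in rows:
        grouped[origin].append((on_target, recovered))
    return {
        origin: (
            round(mean(v[0] for v in values), 2),
            round(mean(v[1] for v in values), 2),
        )
        for origin, values in grouped.items()
    }


summary = summarise_by_origin(ROWS)
for origin in sorted(summary):
    on_target, recovered = summary[origin]
    print(f"{ORIGIN_LABELS[origin]}: {on_target}% on target, {recovered}% recovered")
```

Grouping by origin code makes the spread within a class visible: the two herbarium rows in this subset differ widely in both the percentage of reads on target and the percentage of target length recovered, so a class mean alone would hide that variation.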
Figure 1. Heatmap indicating gene recovery success per gene in each sample; the colour scale indicates the success rate.
Figure 2. Cophylogeny (tanglegram) showing maximum-likelihood trees estimated with IQTree from 189 low-copy nuclear loci generated in this study (A) and from traditional markers (B). Pie charts indicate node support (black) calculated with bootstrap analysis (1000 replicates). Lines between the two phylogenies link tips belonging to the same taxon to indicate (dis)similarity between the topologies. Commercially used species are labelled in green in both topologies to highlight changes in relationships. For the taxa Xanthorrhoea and Hemerocallis only the genus name is indicated, since different species were used in constructing the respective phylogenies (see the “Phylogenetic estimation and comparison” section for details).
Figure 3. Phylogeny for Aloe, related genera and outgroups estimated with the coalescent-based ASTRAL-III algorithm from 188 maximum-likelihood gene trees. Pie charts indicate node support (green) calculated as Local Posterior Probability by the ASTRAL software. The arrow indicates the node of the clade to which the repetitive element of locus #2 is mostly restricted (see the “Target capture sequencing” section).