Tin Hang Hung, Thea So, Syneath Sreng, Bansa Thammavong, Chaloun Boounithiphonh, David H Boshier, John J MacKay
Abstract
Dalbergia is a pantropical genus with more than 250 species, many of which are highly threatened by overexploitation for their rosewood timber and by general deforestation. Many Dalbergia species have received international conservation attention, but the lack of genomic resources for the genus hinders the evolutionary studies and conservation applications that are important for adaptive management. This study produced the first reference transcriptomes for 6 Dalbergia species with different geographical origins and predicted ~32 to 49 K unique genes. We showed the utility of these transcriptomes through phylogenomic analyses with other Fabaceae species, estimating the divergence time of extant Dalbergia species at ~14.78 MYA. We detected over-representation of 13 Pfam terms in Dalbergia, including the HSP, ALDH and ubiquitin families. We also compared the gene families of the geographically co-occurring D. cochinchinensis and D. oliveri and observed that more genes underwent positive selection and there were more diverged disease resistance proteins in the more widely distributed D. oliveri, consistent with reports that it occupies a wider ecological niche and has higher genetic diversity. We anticipate that the reference transcriptomes will facilitate future population genomics and gene-environment association studies on Dalbergia, and will contribute to genomic databases, in which plants, particularly threatened ones, are currently underrepresented.
Year: 2020 PMID: 33082403 PMCID: PMC7576600 DOI: 10.1038/s41598-020-74814-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Table 1. Basic details and conservation status of the 6 Dalbergia species covered in this study.
| Scientific name | Common name | Native occurrence | Habitat | IUCN status | CITES status |
|---|---|---|---|---|---|
| D. cochinchinensis | Siamese rosewood | Cambodia, Lao PDR, Thailand, Vietnam | Terrestrial; open semi-deciduous forests | Vulnerable A1cd (1998) | II (2017) |
| D. frutescens | Brazilian tulipwood | Colombia, Amazonia, Andes, Caribbean Plain, Magdalena Valley | Variable, usually as a liana | Unclassified | II (2017) |
| D. melanoxylon | African blackwood | Wide geographical distribution in sub-Saharan countries | Range of woodland habitats | Near threatened (1998) | II (2017) |
| D. miscolobium | Jacaranda-do-cerrado | Brazil, Bolivia | Savannah | Unclassified | II (2017) |
| D. oliveri | Burmese rosewood | Cambodia, Lao PDR, Myanmar, Thailand, Vietnam | Mixed deciduous forests and tropical evergreen | Endangered A1cd (1998) | II (2017) |
| D. sissoo | North Indian rosewood; Shisham | Indian Subcontinent | Deciduous forests | Unclassified | II (2017) |
Figure 1. Bioinformatic pipeline of de novo transcriptome analysis and gene annotation for the 6 Dalbergia species. For software details, see “Methods”.
Table 2. Summary of transcriptome assembly statistics of the 6 Dalbergia species.
| Feature | D. cochinchinensis | D. frutescens | D. melanoxylon | D. miscolobium | D. oliveri | D. sissoo |
|---|---|---|---|---|---|---|
| Number of paired-end raw reads | 168,351,690 | 71,187,798 | 74,366,734 | 91,273,654 | 181,456,683 | 73,160,910 |
| Number of paired-end filtered reads | 156,116,637 (92.7%) | 65,092,217 (91.4%) | 67,994,105 (91.4%) | 83,178,635 (91.1%) | 169,551,748 (93.4%) | 67,086,967 (91.7%) |
| Number of transcripts in initial assembly | 277,981 | 274,663 | 363,116 | 208,249 | 376,014 | 195,268 |
| Number of genes in initial assembly | 161,051 | 179,085 | 212,141 | 123,962 | 223,289 | 121,629 |
| Total length of transcripts (bp) | 316,346,363 | 255,266,594 | 309,909,355 | 237,557,440 | 357,336,705 | 216,910,975 |
| Average transcript length (bp) | 1,138.01 | 929.38 | 853.47 | 1,140.74 | 950.33 | 1,110.84 |
| N50¹ (bp) | 2,159 | 1,749 | 1,477 | 2,074 | 1,851 | 2,019 |
| GC (%) | 40.25 | 41.88 | 41.38 | 40.60 | 40.59 | 41.06 |
| Map representation alignment rate (%) | 89.85 | 87.71 | 85.69 | 89.20 | 87.14 | 89.02 |
| Number of non-redundant transcripts | 224,511 | 231,281 | 271,088 | 174,382 | 293,334 | 168,039 |
| Number of transcripts in final assembly | 84,003 | 84,897 | 80,484 | 69,357 | 92,906 | 67,379 |
| Total length of transcripts (bp) | 81,157,122 | 75,431,325 | 70,467,927 | 68,915,367 | 83,501,667 | 67,138,149 |
| Average transcript length (bp) | 966.12 | 888.50 | 875.55 | 993.63 | 898.78 | 996.43 |
| N50 of transcripts (bp) | 1,254 | 1,152 | 1,149 | 1,290 | 1,179 | 1,305 |
| GC of transcripts (%) | 44.66 | 46.01 | 45.24 | 44.68 | 44.97 | 45.00 |
| Number of genes in final assembly | 34,655 | 48,591 | 43,848 | 31,678 | 43,879 | 32,753 |
| Total length of genes (bp) | 33,219,183 | 41,338,207 | 37,309,763 | 31,488,922 | 37,371,154 | 32,374,118 |
| Average gene length (bp) | 958.57 | 850.74 | 850.89 | 994.03 | 851.69 | 988.43 |
| N50 of genes (bp) | 1,341 | 1,145 | 1,173 | 1,383 | 1,182 | 1,374 |
| GC of genes (%) | 45.37 | 47.43 | 46.00 | 45.32 | 45.97 | 45.92 |
| BUSCO score² (N = 2,121) (%) | C: 92.2; F: 4.9; M: 2.9 | C: 92.1; F: 4.8; M: 3.1 | C: 92.3; F: 5.1; M: 2.6 | C: 93.1; F: 4.6; M: 2.3 | C: 90.9; F: 6.5; M: 2.6 | C: 94.4; F: 3.3; M: 2.3 |
¹ Sequence length of the shortest contig at 50% of the total transcriptome length.
² Results of BUSCO analysis; (%) per category: C: complete, F: fragmented, M: missing, N: number of BUSCOs tested in the OrthoDB v10 eudicot dataset.
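The N50 definition in the first footnote can be illustrated with a minimal sketch. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the study: lengths are sorted from longest to shortest and accumulated until half of the total length is covered.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half of the
        # total is the shortest sequence needed to reach 50%.
        if running >= half_total:
            return length


# Hypothetical transcript lengths, for illustration only.
toy_lengths = [2000, 1500, 1200, 800, 500]
print(n50(toy_lengths))  # 1500
```

Because a few long transcripts can dominate the total length, N50 is usually read alongside the average length and a completeness measure such as BUSCO.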
Table 3. Transcriptome annotation statistics of the 6 Dalbergia species. For the versions of the annotation databases, see “Methods”.
| Feature | D. cochinchinensis | D. frutescens | D. melanoxylon | D. miscolobium | D. oliveri | D. sissoo |
|---|---|---|---|---|---|---|
| Number of transcripts in final assembly | 84,003 | 84,897 | 80,484 | 69,357 | **92,906** | 67,379 |
| Araip 1.1 | 74,397 (88.6%) | 67,052 (79.0%) | 67,164 (83.5%) | | 78,245 (84.2%) | 58,512 (86.8%) |
| Araport 11 | 70,780 (84.3%) | 63,438 (74.7%) | 62,185 (77.3%) | | 73,889 (79.5%) | 56,091 (83.2%) |
| SwissProt | 63,175 (75.2%) | 61,062 (71.9%) | 56,193 (69.8%) | | 67,064 (72.2%) | 51,051 (75.8%) |
| GO | 61,993 (73.8%) | 60,005 (70.7%) | 55,022 (68.4%) | | 65,740 (70.8%) | 50,008 (74.2%) |
| KEGG | 55,538 (66.1%) | 52,709 (62.1%) | 48,603 (60.4%) | | 57,896 (62.3%) | 45,190 (67.1%) |
| EggNOG | 52,510 (62.5%) | 44,849 (52.8%) | 44,802 (55.7%) | | 54,059 (58.2%) | 41,184 (61.1%) |
| Pfam | 58,589 (69.7%) | 56,835 (66.9%) | 51,717 (64.3%) | | 62,162 (66.9%) | 47,842 (71.0%) |
| TMHMM | | 15,424 (18.2%) | 14,864 (18.5%) | 14,338 (20.7%) | 18,359 (19.8%) | 13,671 (20.3%) |
| SignalP | 5,603 (6.7%) | 5,214 (6.1%) | 4,880 (6.1%) | | 5,896 (6.3%) | 4,643 (6.9%) |
| Number of genes in final assembly | 34,655 | **48,591** | 43,848 | 31,678 | 43,879 | 32,753 |
| Araip 1.1 | 28,277 (81.6%) | 33,452 (68.8%) | 33,617 (76.7%) | | 32,936 (75.1%) | 26,141 (79.8%) |
| Araport 11 | 26,388 (76.1%) | 31,421 (64.7%) | 30,420 (69.4%) | | 30,497 (69.5%) | 24,837 (75.8%) |
| SwissProt | 24,175 (69.8%) | 32,281 (66.4%) | 28,022 (63.9%) | | 28,658 (65.3%) | 23,396 (71.4%) |
| GO | 23,686 (68.4%) | 31,733 (65.3%) | 27,471 (62.7%) | | 28,116 (64.1%) | 22,926 (70.0%) |
| KEGG | 20,603 (59.5%) | 27,102 (55.8%) | 23,609 (53.8%) | 19,606 (61.9%) | 23,810 (54.3%) | |
| EggNOG | 19,163 (55.3%) | 20,886 (43.0%) | 21,121 (48.2%) | | 21,470 (48.9%) | 17,635 (53.8%) |
| Pfam | 23,134 (66.8%) | 30,561 (62.9%) | 26,161 (59.7%) | | 27,332 (62.3%) | 22,544 (68.8%) |
| TMHMM | | 8,120 (16.7%) | 7,748 (17.7%) | 6,417 (20.3%) | 8,006 (18.3%) | 6,447 (19.7%) |
| SignalP | 2,607 (7.5%) | 3,060 (6.3%) | 2,763 (6.3%) | | 2,874 (6.6%) | 2,401 (7.3%) |
Highest numbers for each row are highlighted in bold.
Figure 2. Dated phylogeny of 16 Fabaceae species based on Bayesian analysis of a supergene from the 256 single-copy orthologs (479,064 bp) from their transcriptomes. Node bars indicate 95% CI for the estimated divergence time. Numbers on branches indicate posterior probability (1 for all branches).
Figure 3. Heatmap of annotated Pfam domains of the 13 Fabaceae species, only showing domains (n = 91) that are significantly contracted (negative) or expanded (positive) in the Dalbergia species (p < 0.05, two-tailed Fisher’s exact test of independence). See Supplementary Table 2 for species abbreviations.
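The two-tailed Fisher’s exact test named in the Figure 3 caption can be sketched for a single Pfam domain as follows. This is a standard-library illustration, not the authors' pipeline: the function name, the 2 × 2 layout (genes with or without the domain, in Dalbergia or in the other Fabaceae) and the counts are all hypothetical assumptions.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Hypothetical counts for one domain (illustration only):
# rows = Dalbergia / other Fabaceae, columns = with / without the domain.
p = fisher_exact_two_tailed(30, 970, 10, 990)
print(f"p = {p:.4g}; significant at 0.05: {p < 0.05}")
```

A domain tested this way would count as expanded or contracted only if the p-value fell below the 0.05 threshold stated in the caption; whether any multiple-testing correction was applied is not stated here.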
Figure 4. Results of GO enrichment analysis on positively selected genes, which are single-copy orthologs, between D. cochinchinensis (N = 371, GO annotated n = 299) and D. oliveri (N = 439, GO annotated n = 361), only showing terms that are significant (p < 0.05, chi-square test of independence).
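The chi-square test of independence named in the Figure 4 caption can be sketched for one GO term as a 2 × 2 comparison. This is a minimal illustration under stated assumptions: the function name and the per-term counts (12 and 30) are hypothetical, only the GO-annotated totals (299 and 361) come from the caption, and no continuity correction is applied.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    total = a + b + c + d
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    return statistic, erfc(sqrt(statistic / 2))


# Hypothetical: a GO term annotated on 12 of 299 positively selected genes
# in D. cochinchinensis and on 30 of 361 in D. oliveri.
stat, p = chi_square_2x2(12, 299 - 12, 30, 361 - 30)
print(f"chi-square = {stat:.3f}, p = {p:.4g}")
```

For small expected counts, an exact test is generally preferred to this chi-square approximation.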