Anthony E Melton, Andrew W Child, Richard S Beard, Carlos Dave C Dumaguit, Jennifer S Forbey, Matthew Germino, Marie-Anne de Graaff, Andrew Kliskey, Ilia J Leitch, Peggy Martinez, Stephen J Novak, Jaume Pellicer, Bryce A Richardson, Desiree Self, Marcelo Serpe, Sven Buerki.
Abstract
Increased ecological disturbances, species invasions, and climate change are creating severe conservation problems for several plant species that are widespread and foundational. Understanding the genetic diversity of these species, and how it relates to adaptation to these stressors, is necessary for guiding conservation and restoration efforts. This need is particularly acute for big sagebrush (Artemisia tridentata; Asteraceae), which was once the dominant shrub over 1,000,000 km² in western North America but has since retracted by half and thus has become the target of one of the largest restoration seeding efforts globally. Here, we present the first reference-quality genome assembly for an ecologically important subspecies of big sagebrush (A. tridentata subsp. tridentata) based on short and long reads, as well as chromatin proximity ligation data analyzed using the HiRise pipeline. The final 4.2-Gb assembly consists of 5,492 scaffolds, with nine pseudo-chromosomal scaffolds (nine scaffolds comprising at least 90% of the assembled genome; n = 9). The assembly contains an estimated 43,377 genes based on ab initio gene discovery and transcriptional data analyzed using the MAKER pipeline, with 91.37% of BUSCOs being completely assembled. The final assembly was highly repetitive, with repeat elements comprising 77.99% of the genome, making the Artemisia tridentata subsp. tridentata genome one of the most highly repetitive plant genomes to be sequenced and assembled. This genome assembly advances studies on plant adaptation to drought and heat stress and provides a valuable tool for future genomic research.
Keywords: Artemisia tridentata; genomic resources; keystone species
Year: 2022 PMID: 35567476 PMCID: PMC9258541 DOI: 10.1093/g3journal/jkac122
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. Map highlighting the sagebrush ecosystems and the site of collection of IDT3 within the Soda Fire site (burned in 2015) in Idaho, USA. Sagebrush ecosystems (also called the “Sagebrush Biome” per Rigge) currently cover an estimated range of 653,316 km². The inset shows a landscape photo of the Soda Fire site.
Fig. 2. Linkage–density histogram for the HiRise assembly generated by Dovetail Genomics. The axes represent the mapping positions along the genome assembly of the first (x-axis) and second (y-axis) read in the read pair, grouped into bins. The color of each square represents the number of reads within a given bin, with darker colors indicating more reads being mapped within the given bin. Vertical and horizontal lines have been added to delimit the scaffolds (smaller scaffolds are not visible in the plot due to scale and are represented by the large gray lines at the upper limits of the x- and y-axes). The x- and y-axes represent the position within the genome assembly in Gb, with pseudo-chromosomal scaffolds ordered largest to smallest.
Fig. 3. Density plot of k-mer analysis in GenomeScope and genome map showing GC content (%), % repeat per 1 million nucleotides, number of genes per 1 million nucleotides, and the size of the scaffold for the nine pseudo-chromosomal scaffolds. Subset (a) shows the genome feature mapping for the nine pseudo-chromosomal scaffolds, subset (b) shows GenomeScope results, and subset (c) shows the Smudgeplot results. GenomeScope summary statistics, including heterozygosity rate (listed as “het”), are listed at the top of plot (b). Two primary k-mer peaks are present, indicating that the genome is diploid. The Smudgeplot shows the frequency of k-mer pairs within the genome, with darker colors indicating the group is less frequent and bright yellow indicating the group is more frequent. When visualized, the plot shows distinct “smudges” representing each k-mer pair, with the greatest density of k-mers relating to the ploidy level of the genome (e.g. the diploid A. tridentata genome has the brightest “smudge” for the diploid AB k-mer pair).
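The k-mer density plot described in this caption is built from a k-mer spectrum: for each multiplicity, the number of distinct k-mers observed that many times. The sketch below illustrates that idea on toy data only. It is not GenomeScope's or Smudgeplot's implementation; the function name `kmer_spectrum` and the toy reads are hypothetical, and unlike real k-mer counters it does not merge a k-mer with its reverse complement.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}.

    Illustrative only: counts k-mers on the given strand and ignores
    reverse complements, sequencing errors, and ambiguous bases.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram of the per-k-mer counts.
    return Counter(counts.values())


# Hypothetical toy reads, not data from the study.
toy_reads = ["ACGTAC", "CGTACG"]
spectrum = kmer_spectrum(toy_reads, k=3)
for multiplicity, n_kmers in sorted(spectrum.items()):
    print(multiplicity, n_kmers)
```

In a real diploid data set, peaks in this histogram at roughly half and full sequencing coverage correspond to heterozygous and homozygous k-mers, which is the pattern the caption refers to as two primary peaks.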
Summary statistics for the 9 pseudo-chromosomal scaffolds within the IDT3 “G1_b2” genome assembly.
| Scaffold | Length in Gb (% of assembly) | Protein coding genes | Total gene length in Gb (% of assembly) | Repeat occurrences | Repeat length total in Gb (% of assembly) |
|---|---|---|---|---|---|
| 1 | 0.528 (12.58) | 5,869 | 0.018 (3.49) | 709,220 | 0.444 (84.00) |
| 2 | 0.514 (12.23) | 5,153 | 0.015 (2.99) | 682,886 | 0.443 (86.21) |
| 3 | 0.472 (11.24) | 4,781 | 0.015 (3.15) | 624,680 | 0.406 (86.04) |
| 4 | 0.446 (10.62) | 4,707 | 0.015 (3.33) | 591,412 | 0.378 (84.73) |
| 5 | 0.445 (10.59) | 4,951 | 0.017 (3.73) | 591,818 | 0.371 (83.43) |
| 6 | 0.439 (10.46) | 4,358 | 0.013 (3.04) | 580,217 | 0.379 (86.38) |
| 7 | 0.385 (9.18) | 4,096 | 0.013 (3.30) | 513,867 | 0.330 (85.52) |
| 8 | 0.361 (8.61) | 3,520 | 0.011 (3.03) | 480,240 | 0.311 (86.11) |
| 9 | 0.338 (8.06) | 3,430 | 0.011 (3.11) | 446,444 | 0.295 (87.12) |
| Total | 3.929 (93.58) | 40,865 | 0.128 (3.25) | 5,220,784 | 3.356 (85.43) |
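As a quick consistency check on the table above, the per-scaffold gene and repeat counts can be summed and compared with the reported Total row. The snippet is a minimal sketch with values transcribed by hand from the table. The final ratio compares the 40,865 genes on the nine scaffolds with the 43,377 genes estimated in the abstract, under the assumption (not stated in the source) that both counts refer to the same gene set.

```python
# (scaffold, protein-coding genes, repeat occurrences), transcribed from the table.
ROWS = [
    (1, 5869, 709220),
    (2, 5153, 682886),
    (3, 4781, 624680),
    (4, 4707, 591412),
    (5, 4951, 591818),
    (6, 4358, 580217),
    (7, 4096, 513867),
    (8, 3520, 480240),
    (9, 3430, 446444),
]
REPORTED_TOTAL_GENES = 40865
REPORTED_TOTAL_REPEATS = 5220784
# Estimated gene count for the whole assembly, taken from the abstract.
# Assumption: it is comparable to the "Protein coding genes" column.
ESTIMATED_GENES = 43377

total_genes = sum(genes for _, genes, _ in ROWS)
total_repeats = sum(repeats for _, _, repeats in ROWS)
share_on_scaffolds = 100 * total_genes / ESTIMATED_GENES

print("gene total matches:", total_genes == REPORTED_TOTAL_GENES)
print("repeat total matches:", total_repeats == REPORTED_TOTAL_REPEATS)
print(f"share of estimated genes on the nine scaffolds: {share_on_scaffolds:.2f}%")
```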
Summary statistics for the de novo and HiRise genome assembly outputs.
| | De novo assembly | HiRise assembly |
|---|---|---|
| Total length (bp) | 4,197,847,053 | 4,198,560,453 |
| N50 | 965,994 | 444,777,032 |
| L50 | 1,188 | 5 |
| N90 | 246,927 | 338,336,202 |
| L90 | 4,521 | 9 |
| Largest scaffold (bp) | 10,654,198 | 528,210,163 |
| Number of scaffolds | 12,613 | 5,500 |
| Number of scaffolds >1 kb | 12,577 | 5,464 |
| Number of gaps | 1,859 | 8,993 |
| Number of | 1 | 18 |
| Complete BUSCOs (C) | 232 (90.98%) | 233 (91.37%) |
| Complete and single-copy BUSCOs (S) | 175 (68.63%) | 188 (73.73%) |
| Complete and duplicated BUSCOs (D) | 57 | 45 |
| Fragmented BUSCOs (F) | 2 | 5 |
| Missing BUSCOs (M) | 21 | 17 |
| Total BUSCO groups searched | 255 | 255 |
The final assembly, with scaffolds <200 bases in length and 1 mitochondrial fragment removed, totaled 4,198,553,833 bases and comprised 5,492 scaffolds.
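The N50/L50 and N90/L90 rows in the table above follow the usual contiguity definitions: with scaffolds sorted from longest to shortest, Nx is the length of the scaffold at which the running total first reaches x% of the assembly, and Lx is the number of scaffolds needed to get there. The sketch below shows that calculation on hypothetical toy lengths. The function name `nx_lx` is an illustration, and this is not the code used to produce the reported statistics.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for scaffold lengths and a fraction such as 0.5 or 0.9."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            # length is Nx, count is Lx
            return length, count
    # Guard against floating-point edge cases at fraction == 1.
    return ordered[-1], len(ordered)


# Hypothetical toy scaffold lengths (total = 30), not data from the study.
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = nx_lx(toy_lengths, 0.5)
n90, l90 = nx_lx(toy_lengths, 0.9)
print(n50, l50, n90, l90)
```

Read this way, the HiRise L90 of 9 means that nine scaffolds are enough to cover 90% of the assembly, which agrees with the nine pseudo-chromosomal scaffolds reported in the abstract.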