Valerie L Soza, Dale Lindsley, Adam Waalkes, Elizabeth Ramage, Rupali P Patwardhan, Joshua N Burton, Andrew Adey, Akash Kumar, Ruolan Qiu, Jay Shendure, Benjamin Hall.
Abstract
The genus Rhododendron (Ericaceae), which includes horticulturally important plants such as azaleas, is a highly diverse and widely distributed genus of >1,000 species. Here, we report the chromosome-scale de novo assembly and genome annotation of Rhododendron williamsianum as a basis for continued study of this large genus. We created multiple short fragment genomic libraries, which were assembled using ALLPATHS-LG. This was followed by contiguity-preserving transposase sequencing (CPT-seq) and fragScaff scaffolding of a large fragment library, which improved the assembly by decreasing the number of scaffolds and increasing scaffold length. Chromosome-scale scaffolding was performed by proximity-guided assembly (LACHESIS) using chromatin conformation capture (Hi-C) data. Chromosome-scale scaffolding was further refined and linkage groups defined by restriction-site associated DNA (RAD) sequencing of the parents and progeny of a genetic cross. The resulting linkage map confirmed the LACHESIS clustering and ordering of scaffolds onto chromosomes and rectified large-scale inversions. Assessments of the R. williamsianum genome assembly and gene annotation estimate them to be 89% and 79% complete, respectively. Predicted coding sequences from genome annotation were used in syntenic analyses and for generating age distributions of synonymous substitutions/site between paralogous gene pairs, which identified whole-genome duplications (WGDs) in R. williamsianum. We then analyzed other publicly available Ericaceae genomes for shared WGDs. Based on our spatial and temporal analyses of paralogous gene pairs, we find evidence for two shared, ancient WGDs in Rhododendron and Vaccinium (cranberry/blueberry) members that predate the Ericaceae family and, in one case, the Ericales order.
Keywords: chromatin conformation capture (Hi-C); chromosome-scale scaffolding; de novo genome assembly; linkage map; restriction-site associated DNA (RAD) sequencing; synteny
Year: 2019 PMID: 31702783 PMCID: PMC6907397 DOI: 10.1093/gbe/evz245
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Combining ALLPATHS-LG with fragScaff Improved the De Novo Assembly of the Rhododendron williamsianum Genome
| Assembly/Stats | ALLPATHS-LG | fragScaff | Final |
|---|---|---|---|
| No. scaffolds | 18,269 | 11,962 | 11,985 |
| Scaffold N10 size (bp) | 508,357 | 931,684 | 815,789 |
| Scaffold N50 size (bp) | 132,014 | 225,489 | 218,828 |
| Scaffold N90 size (bp) | 10,942 | 29,431 | 29,444 |
| Total bp | 491,643,723 | 532,499,445 | 532,123,622 |
Final: combined results from ALLPATHS-LG, fragScaff, and the linkage map. Linkage analysis split misassembled scaffolds, resulting in slightly more scaffolds than in the fragScaff assembly. Scaffolds were also filtered for duplicates and mitochondrial contamination.
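The scaffold N10, N50, and N90 values in the table follow the usual Nx definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the assembly. A minimal Python sketch of that definition is given here for readers who want to recompute such statistics; the function name `scaffold_nx` and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def scaffold_nx(lengths, x):
    """Return the Nx length: walking scaffolds from longest to shortest,
    the length at which the running total first reaches x% of the assembly."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Cross-multiply instead of dividing to avoid rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Hypothetical toy assembly (not data from the paper).
toy = [100, 80, 60, 40, 20]
print(scaffold_nx(toy, 10), scaffold_nx(toy, 50), scaffold_nx(toy, 90))
# prints: 100 80 40
```

Applied to a real assembly, `lengths` would hold one entry per scaffold, for example collected while parsing a FASTA file.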
Fig. 1.—Comparison of chromosome-scale scaffolding for the Rhododendron williamsianum genome by two methods. Comparison of ordering and orienting scaffolds from the R. williamsianum de novo assembly within linkage groups based on two methods, LACHESIS assembly of Hi-C data and linkage map of RAD-seq data.
Chromosome-Scale Scaffolding Statistics for the Final Rhododendron williamsianum Genome
| Statistics | Final Assembly |
|---|---|
| Clustered scaffolds | 3,984 |
| Total bp clustered | 394,456,490 |
| Clustered and ordered scaffolds | 1,708 |
| Total bp clustered and ordered | 368,385,547 |
| Clustered and unordered scaffolds | 2,276 |
| Total bp clustered and unordered | 26,070,943 |
| Unclustered scaffolds | 8,001 |
| Total bp unclustered | 137,667,132 |
| Total scaffolds in final assembly | 11,985 |
Final assembly used LACHESIS results for clustering and ordering scaffolds and linkage map results for fixing large-scale inversions.
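The rows of this table are additive, which can be checked directly. The short Python sketch below uses only the values listed in the table; the derived total and the percentages are computed here rather than quoted from the source, and the variable names are illustrative.

```python
# Values copied from the chromosome-scale scaffolding table above.
ordered_n, ordered_bp = 1_708, 368_385_547
unordered_n, unordered_bp = 2_276, 26_070_943
clustered_n, clustered_bp = 3_984, 394_456_490
unclustered_n, unclustered_bp = 8_001, 137_667_132
total_n = 11_985

# Ordered + unordered scaffolds should equal the clustered totals.
assert ordered_n + unordered_n == clustered_n
assert ordered_bp + unordered_bp == clustered_bp
# Clustered + unclustered scaffolds should equal the whole assembly.
assert clustered_n + unclustered_n == total_n

# Derived here, not listed in this table.
total_bp = clustered_bp + unclustered_bp
pct_clustered = 100 * clustered_bp / total_bp
pct_ordered = 100 * ordered_bp / total_bp
print(f"{total_bp:,} bp total; {pct_clustered:.1f}% clustered; {pct_ordered:.1f}% ordered")
# prints: 532,123,622 bp total; 74.1% clustered; 69.2% ordered
```

The derived total of 532,123,622 bp agrees with the "Total bp" reported for the final assembly in the assembly statistics table.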
Statistics for the Structural Gene Annotation of the Rhododendron williamsianum Genome
| Gene Statistic | Final Assembly |
|---|---|
| Total predicted genes | 23,559 |
| Gene density/kb | 0.044 |
| Average gene length (bp) | 4,628 |
| Average no. exons/gene | 5.68 |
| Average exon length (bp) | 212 |
| Average intron length (bp) | 645 |
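The reported gene density is consistent with dividing the number of predicted genes by the total length of the final assembly. That denominator is an assumption made for this check, since the record does not state which length was used. A brief Python sketch:

```python
total_genes = 23_559          # "Total predicted genes" from the table above
assembly_bp = 532_123_622     # "Total bp" of the final assembly (assumed denominator)

genes_per_kb = total_genes / (assembly_bp / 1_000)
kb_per_gene = 1 / genes_per_kb
print(f"{genes_per_kb:.3f} genes/kb, about one gene per {kb_per_gene:.1f} kb")
# prints: 0.044 genes/kb, about one gene per 22.6 kb
```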
Fig. 2.—Estimates of chromosome size for the Rhododendron williamsianum genome. Chromosome sizes were estimated by summing the lengths of ordered and unordered scaffolds within each linkage group (LG). Two chromosome size estimates are provided for each LG, including and excluding runs of 20 Ns or more.
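The two size estimates described in this caption can be sketched as follows, assuming gaps are encoded as runs of `N` (or `n`) characters in the scaffold sequences. The function names and toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
import re


def length_with_and_without_gaps(seq, min_run=20):
    """Return (full length, length excluding runs of >= min_run Ns)."""
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_run)
    gap_bp = sum(m.end() - m.start() for m in gap_pattern.finditer(seq))
    return len(seq), len(seq) - gap_bp


def linkage_group_size(scaffold_seqs, min_run=20):
    """Sum both size estimates over all scaffolds assigned to one linkage group."""
    including = excluding = 0
    for seq in scaffold_seqs:
        inc, exc = length_with_and_without_gaps(seq, min_run)
        including += inc
        excluding += exc
    return including, excluding


# Hypothetical toy scaffolds: one with a 25-N gap, one with a short 5-N run.
toy_lg = ["ACGT" * 10 + "N" * 25 + "ACGT" * 5, "ACGTNNNNNACGT"]
print(linkage_group_size(toy_lg))
# prints: (98, 73)
```

Only the 25-N run is excluded in the second estimate; the 5-N run is shorter than the 20-N threshold and still counts toward both totals.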
Fig. 3.—Gene Ontology (GO) classification of functionally annotated, predicted genes in Rhododendron williamsianum. The three panels represent the three main GO domains. Each panel represents level 2 classes from the GO directed acyclic graph (left side) and the top 25 classes across all levels (right side). Left side: top pie chart represents all annotated genes in genome, bottom pie chart represents syntenic genes within genome. Right side: blue bars represent syntenic genes, green bars represent all annotated genes for biological process; orange bars represent syntenic genes, yellow bars represent all annotated genes for cellular component; pink bars represent syntenic genes, blue bars represent all annotated genes for molecular function. Only classes with at least 10% of annotated genes within a domain are listed.
Fig. 4.—Syntenic blocks within the Rhododendron williamsianum genome indicate multiple whole-genome duplications. The 13 chromosomes of R. williamsianum (RW) are arranged along the circumference of the Circos (Krzywinski et al. 2009) plot to reduce crossing of bundles. Each bundle represents a block of at least five syntenic gene pairs shared between two chromosomes (interior bundles) or within a chromosome (exterior bundles). Colored bundles highlight two syntenic regions with 1:5 syntenic depths.
Fig. 5.—Syntenic blocks between Ericaceae genomes, Rhododendron williamsianum and Vaccinium macrocarpon (cranberry). The 12 chromosomes of V. macrocarpon (VM) and 13 chromosomes of R. williamsianum (RW) are arranged along the circumference of the Circos (Krzywinski et al. 2009) plot to reduce crossing of bundles. Each colored bundle represents a block of at least five syntenic gene pairs shared between two chromosomes.
Fig. 6.—Distributions of synonymous substitutions/site (Ks) for paralogous gene pairs in Ericaceae genomes. For each genome, top panel shows histogram of Ks data overlaid by normal mixture model from EMMIX (McLachlan and Peel 1999). Two or three components of the normal mixture model are shown in green, red, and blue that correspond with SiZer (Chaudhuri and Marron 1999) results below. Bottom panel shows two or three significant peaks identified by SiZer map, where blue indicates significant increases and red indicates significant decreases in curves; purple is not significant, gray indicates sparse data. (A) Rhododendron delavayi. (B) R. williamsianum. (C) Vaccinium corymbosum. (D) V. macrocarpon.
Normal Mixture Model Parameters Estimated by EMMIX for Synonymous Substitutions/Site (Ks) Distributions of Ericales Genomes
| Genome | | |
|---|---|---|
| | 0.49 | 1.72 |
| | 0.67 | 1.65 |
| | 0.61 | 1.71 |
| | 0.59 | 1.54 |
| | 0.60 | 1.60 |
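For readers unfamiliar with normal mixture models, the sketch below fits a two-component mixture to one-dimensional values by expectation-maximization (EM) in plain Python. It is not EMMIX and does not reproduce the authors' analysis: the data are synthetic, the component centers (0.6 and 1.6) are chosen only to resemble the magnitudes tabulated above, and the function name `fit_normal_mixture` and all parameters are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def fit_normal_mixture(values, init_means, n_iter=200, min_sd=1e-3):
    """Fit a 1-D normal mixture by EM; returns sorted (mean, sd, weight) tuples."""
    k = len(init_means)
    means = list(init_means)
    grand_mean = sum(values) / len(values)
    grand_sd = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / len(values))
    sds = [grand_sd] * k          # start every component at the overall spread
    weights = [1.0 / k] * k       # start with equal mixing proportions
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300   # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: update weight, mean, and sd from the responsibilities.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj == 0:
                continue
            weights[j] = rj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / rj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / rj
            sds[j] = max(math.sqrt(var), min_sd)
    return sorted(zip(means, sds, weights))


# Synthetic Ks-like values (hypothetical, not data from the paper).
random.seed(0)
ks = [random.gauss(0.6, 0.15) for _ in range(600)]
ks += [random.gauss(1.6, 0.30) for _ in range(400)]

components = fit_normal_mixture(ks, init_means=[0.5, 1.5])
for mean, sd, weight in components:
    print(f"mean={mean:.2f} sd={sd:.2f} weight={weight:.2f}")
```

EM only finds a local optimum, so the result depends on the starting means; dedicated software such as EMMIX, together with a peak-significance test like SiZer as used in the source, is the appropriate choice for real Ks distributions.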
Fig. 7.—Whole-genome duplication events detected in Ericaceae genomes. Phylogeny of Ericales genomes sequenced to date and outgroup Vitis vinifera. Whole-genome duplication (WGD) events detected in Ericaceae genomes in this study indicated by blue and red stars. At-γ is named after WGD detected in Arabidopsis thaliana (Bowers et al. 2003); Ad-β is named after WGD detected in Actinidia (Shi et al. 2010).