Guillermo Friis¹, Joel Vizueta², Edward G Smith¹, David R Nelson¹, Basel Khraiwesh¹,³, Enas Qudeimat¹,³, Kourosh Salehi-Ashtiani¹, Alejandra Ortega⁴, Alyssa Marshell⁵, Carlos M Duarte⁴, John A Burt¹.
Abstract
The gray mangrove [Avicennia marina (Forsk.) Vierh.] is the most widely distributed mangrove species, ranging throughout the Indo-West Pacific. It shows remarkable geographic variation in both phenotypic traits and habitat, often occupying extreme environments at the edges of its distribution. However, its subspecific evolutionary relationships and adaptive mechanisms remain understudied, especially across populations of the West Indian Ocean, and high-quality genomic resources that account for such variability are sparse. Here we report the first chromosome-level assembly of the A. marina genome. We scaffolded a previously released draft assembly using Chicago and Dovetail Hi-C proximity ligation libraries, producing a 456,526,188-bp genome. The 32 largest scaffolds (22.4–10.5 Mb) account for 98% of the assembly, with the remaining 2% distributed among 3,759 much shorter scaffolds (62.4–1 kb). We annotated 45,032 protein-coding genes using tissue-specific RNA-seq data in combination with de novo gene prediction, of which 34,442 were associated with GO terms. Compared with the eudicots BUSCO dataset, the genome assembly and the annotated gene set yield completeness scores of 96.7% and 95.1%, respectively. Furthermore, an FST survey based on resequencing data identified a set of candidate genes potentially involved in local adaptation and revealed patterns of adaptive variability correlating with a temperature gradient in Arabian mangrove populations. Our A. marina genome assembly provides a highly valuable resource for analyses of genome evolution, as well as for identifying functional genes involved in adaptive processes and speciation.
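The contiguity figures above, and the N50/L50 and N90/L90 rows of the summary table, follow the usual Nx/Lx definitions. The Python sketch below illustrates those definitions on made-up scaffold lengths; it is not the authors' pipeline, and the helper names `nx_lx` and `top_k_share` are hypothetical.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx, Lx) for a collection of scaffold lengths.

    Nx is the length of the shortest scaffold in the smallest set of largest
    scaffolds that together cover at least x% of the total assembly length;
    Lx is the number of scaffolds in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Equivalent to running / total >= x / 100, without dividing
        if running * 100 >= total * x:
            return length, count
    return ordered[-1], len(ordered)  # not reached for 0 < x <= 100


def top_k_share(lengths: Iterable[int], k: int) -> float:
    """Fraction of the total assembly length held by the k largest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    toy = [10, 8, 6, 4, 2]       # hypothetical scaffold lengths
    print(nx_lx(toy, 50))        # (8, 2)
    print(nx_lx(toy, 90))        # (4, 4)
    print(top_k_share(toy, 2))   # 0.6
```

Given the full list of 3,791 scaffold lengths, `nx_lx(lengths, 50)` would be expected to match the N50/L50 row and `top_k_share(lengths, 32)` the roughly 98% share quoted for the chromosome-scale scaffolds.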
Keywords: Avicennia marina; Arabia; HiC; genome assembly; gray mangrove
Year: 2021 PMID: 33561229 PMCID: PMC8022769 DOI: 10.1093/g3journal/jkaa025
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Geography and adaptive variability in Arabian gray mangroves. (A) Locations of the six stands sampled for whole-genome resequencing (colored circles) and for RNA-seq (green star). (B) FST genome scan based on 22,181 windows of 20 kb. Boxplot outliers (coefficient = 1.5) are marked in red. (C) t-SNE based on 613 SNP outliers linked to functional genes. The background shows the correlation of t-SNE1 and t-SNE2 with the annual temperature range recorded at each sampling location. Temperatures in the legend are in °C.
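Panel B flags FST windows as boxplot outliers with coefficient 1.5. Assuming this is the standard Tukey rule as implemented by R's `boxplot.stats` (hinges from `fivenum`, fences at 1.5 × the hinge spread beyond each hinge), the Python sketch below reproduces that rule; the per-window FST values and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tukey_hinges(values: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper hinges, computed the way R's fivenum() does."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("empty input")
    n4 = math.floor((n + 3) / 2) / 2
    hinges = []
    # 1-based positions of the lower and upper hinges; a fractional position
    # means averaging the two neighbouring order statistics.
    for d in (n4, n + 1 - n4):
        hinges.append(0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]))
    return hinges[0], hinges[1]


def boxplot_outliers(values: Sequence[float], coef: float = 1.5) -> List[int]:
    """Indices of values lying beyond coef * (hinge spread) from the hinges."""
    lower_hinge, upper_hinge = tukey_hinges(values)
    spread = coef * (upper_hinge - lower_hinge)
    low, high = lower_hinge - spread, upper_hinge + spread
    return [i for i, v in enumerate(values) if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical FST values for ten 20-kb windows
    fst = [0.02, 0.03, 0.025, 0.04, 0.035, 0.03, 0.028, 0.032, 0.031, 0.30]
    print(boxplot_outliers(fst))  # [9]
```

For a genome scan, windows flagged on the high side are the candidates for local adaptation; whether the original analysis also kept low-side outliers is not stated here.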
Summary statistics for the genome assembly and annotation of A. marina
| Genome assembly | |
|---|---|
| Total length | 456,526,188 bp |
| Number of scaffolds | 3,791 |
| N50/L50 | 13,979,447 bp/15 scaffolds |
| N90/L90 | 11,144,373 bp/29 scaffolds |
| Chromosome-scale scaffolds (shortest length/count) | 10,583,658 bp/32 scaffolds |
| Longest scaffold | 22,400,447 bp |
| Missingness | 10.6% |
| GC content | 35.2% |
| BUSCO eukaryota database | C: 98.8% [S: 81.2%, D: 17.6%], F: 0.8%, M: 0.4%, N: 255 |
| BUSCO eudicots database | C: 96.7% [S: 89.2%, D: 7.5%], F: 0.8%, M: 2.5%, N: 2,326 |

| Genome annotation | |
|---|---|
| Number of genes | 41,206 |
| Number of annotated genes | 35,604 |
| Number of genes with GOs | 34,442 |
| Average gene length (bp) | 3,152.28 |
| Number of CDS | 45,032 |
| Average CDS length (bp) | 1,097.74 |
| Number of exons | 233,312 |
| Average exon length (bp) | 211.87 |
| Number of introns | 188,280 |
| Average intron length (bp) | 536.98 |
| BUSCO eukaryota database | C: 98.9% [S: 82.4%, D: 16.5%], F: 0.8%, M: 0.3%, N: 255 |
| BUSCO eudicots database | C: 95.1% [S: 87.3%, D: 7.8%], F: 1.4%, M: 3.5%, N: 2,326 |
BUSCO notation: C, complete BUSCOs; S, complete and single-copy BUSCOs; D, complete and duplicated BUSCOs; F, fragmented BUSCOs; M, missing BUSCOs; N, total BUSCO groups searched. CDS, protein-coding sequences.
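In this notation the percentages are linked: complete BUSCOs split into single-copy and duplicated (C = S + D), and complete, fragmented, and missing together cover all groups searched (C + F + M = 100%). The Python sketch below parses a summary string in the format used in the table and checks both identities within a rounding tolerance; `parse_busco`, `check_busco`, and the tolerance value are hypothetical choices, not part of BUSCO itself.

```python
import re
from typing import Dict

# Tolerates optional spaces and a lower-case "n:" count field.
BUSCO_PATTERN = re.compile(
    r"C:\s*([\d.]+)%\s*\[S:\s*([\d.]+)%,\s*D:\s*([\d.]+)%\],\s*"
    r"F:\s*([\d.]+)%,\s*M:\s*([\d.]+)%(?:,\s*N:\s*([\d,]+))?",
    re.IGNORECASE,
)


def parse_busco(summary: str) -> Dict[str, float]:
    """Extract C, S, D, F, M percentages (and N, if present) from a summary."""
    m = BUSCO_PATTERN.search(summary)
    if m is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    c, s, d, f, miss, n = m.groups()
    scores = {"C": float(c), "S": float(s), "D": float(d),
              "F": float(f), "M": float(miss)}
    if n is not None:
        scores["N"] = float(n.replace(",", ""))  # "2,326" -> 2326.0
    return scores


def check_busco(scores: Dict[str, float], tol: float = 0.15) -> bool:
    """True if C == S + D and C + F + M == 100, up to one-decimal rounding."""
    split_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return split_ok and total_ok


if __name__ == "__main__":
    row = "C: 96.7% [S: 89.2%, D: 7.5%], F: 0.8%, M: 2.5%, N: 2,326"
    scores = parse_busco(row)
    print(scores["N"], check_busco(scores))  # 2326.0 True
```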
Figure 2. High-quality assembly and annotation for A. marina. (A) Bar plot of the lengths of the 40 longest scaffolds, arranged by decreasing size. Bars show the per-chromosome proportions of sequence corresponding to identified gene regions and to repetitive elements (SINEs + LINEs, short and long interspersed nuclear elements; LTRs, long terminal repeats). (B) BUSCO completeness percentages using the eukaryota (N = 255) and eudicots (N = 2,326) databases for the genome assembly (left) and for the annotation (right).
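Panel A reports, for each scaffold, the proportion of sequence occupied by gene regions and by repeat classes. One common way to obtain such a proportion is to merge overlapping feature intervals so that no base is counted twice, then divide the covered length by the scaffold length. The Python sketch below shows that approach under the assumption of half-open, 0-based coordinates (as in BED files); it is not necessarily how the figure was produced, and the intervals and function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Half-open, 0-based coordinates [start, end) (an assumption, as in BED).
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals: Iterable[Interval], scaffold_length: int) -> float:
    """Proportion of a scaffold covered by at least one feature interval."""
    if scaffold_length <= 0:
        raise ValueError("scaffold_length must be positive")
    # Clip features to the scaffold bounds and drop any that become empty.
    clipped = [
        (max(0, s), min(e, scaffold_length))
        for s, e in intervals
        if min(e, scaffold_length) > max(0, s)
    ]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / scaffold_length


if __name__ == "__main__":
    genes = [(0, 10), (5, 20), (30, 40)]   # hypothetical gene spans
    repeats = [(15, 35), (90, 120)]        # hypothetical repeat spans
    print(covered_fraction(genes, 100))    # 0.3
    print(covered_fraction(repeats, 100))  # 0.3
```

Merging before summing matters here because gene models and repeat annotations frequently overlap; summing raw interval lengths would overstate the covered proportion.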