Ayako Izuno, Thomas Wicker, Masaomi Hatakeyama, Dario Copetti, Kentaro K Shimizu.
Abstract
Accurate feature annotation and assembly contiguity are important requisites of a modern genome assembly. They allow large-scale comparison of genomes across and within species and the identification of polymorphisms, leading to evolutionary and functional studies. We report an updated genome resource for Metrosideros polymorpha, the most dominant tree species in Hawaiian native forests and a unique example of rapid and remarkable ecological diversification in a woody species. Ninety-one percent of the bases in the sequence assembly (304 Mb) were organized into 11 pseudo-molecules, which would represent the chromosome structure of the species assuming synteny with its close relative Eucalyptus. Our complementary approach using manual annotation and automated pipelines identified 11.30% of the assembly as transposable elements, in contrast to 4.1% in the previous automated annotation. By increasing the amount of transcript and protein sequence data, we predicted 27,620 gene models with high concordance with the supplied evidence. We believe that this assembly, improved for contiguity, and its annotation will be valuable for future evolutionary studies of M. polymorpha and closely related species, facilitating the isolation of specific genes and the investigation of genome-wide polymorphisms associated with ecological divergence.
Keywords: Hawaii; MAKER; Metrosideros polymorpha; reannotation; transposable element
Year: 2019 PMID: 31540972 PMCID: PMC6829130 DOI: 10.1534/g3.119.400643
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Workflow for re-annotation of the Metrosideros polymorpha genome using the MAKER2 pipeline.
Summary of Metrosideros polymorpha pseudo-molecules. Sequences were assigned to 11 chromosomes based on collinearity with Eucalyptus genes, assuming complete synteny between the two species.
| Pseudo-molecule ID | Size, bp | Gaps, % |
|---|---|---|
| Mpol_Chr01 | 26,760,463 | 2.5 |
| Mpol_Chr02 | 25,476,500 | 2.9 |
| Mpol_Chr03 | 27,763,088 | 3.8 |
| Mpol_Chr04 | 21,740,955 | 2.7 |
| Mpol_Chr05 | 19,091,012 | 4.9 |
| Mpol_Chr06 | 32,945,760 | 2.8 |
| Mpol_Chr07 | 14,555,926 | 3.4 |
| Mpol_Chr08 | 40,261,496 | 3.4 |
| Mpol_Chr09 | 21,463,900 | 2.8 |
| Mpol_Chr10 | 27,804,677 | 2.4 |
| Mpol_Chr11 | 25,789,789 | 3.5 |
| Total | 283,653,566 | 3.6 |
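As a quick consistency check on the table above, the per-chromosome sizes can be summed and compared with the reported total. The following is a minimal sketch; the dictionary name `sizes_bp` and the other variable names are illustrative and not taken from the paper.

```python
# Pseudo-molecule sizes in bp, copied from the table above.
sizes_bp = {
    "Mpol_Chr01": 26_760_463,
    "Mpol_Chr02": 25_476_500,
    "Mpol_Chr03": 27_763_088,
    "Mpol_Chr04": 21_740_955,
    "Mpol_Chr05": 19_091_012,
    "Mpol_Chr06": 32_945_760,
    "Mpol_Chr07": 14_555_926,
    "Mpol_Chr08": 40_261_496,
    "Mpol_Chr09": 21_463_900,
    "Mpol_Chr10": 27_804_677,
    "Mpol_Chr11": 25_789_789,
}

# Sum of all 11 pseudo-molecules; should equal the "Total" row.
total_bp = sum(sizes_bp.values())

# Largest and smallest pseudo-molecules by size.
largest = max(sizes_bp, key=sizes_bp.get)
smallest = min(sizes_bp, key=sizes_bp.get)

print(f"total: {total_bp:,} bp")
print(f"largest: {largest}, smallest: {smallest}")
```

The sum reproduces the 283,653,566 bp given in the "Total" row.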
Figure 2. Circos plots comparing the Metrosideros polymorpha pseudo-molecules (ver. 2.0; red labels) with the Eucalyptus grandis genome (Bartholomé ; black labels). To make the plots clearer, only every 5th gene is shown. (a) Genome-wide comparison. (b) Chromosome 1, an example showing almost perfect conservation of gene order. (c), (d) Translocations (or assembly errors) between chromosomes 7 and 8.
Summary of transposable elements in the Metrosideros polymorpha pseudo-molecules identified with manual annotation and RepeatMasker
| | Manual annotation: Nr. families | Manual annotation: Nr. elements | Manual annotation: TE space, bp | Manual annotation: TE space, % | RepeatMasker: Nr. elements | RepeatMasker: TE space, bp | RepeatMasker: TE space, % |
|---|---|---|---|---|---|---|---|
| Class I (retrotransposons) | | | | | | | |
| LTR | 45 | 30516 | 15896123 | 5.60 | 14095 | 6370191 | 2.25 |
| | 19 | 15563 | 6978439 | 2.46 | 3769 | 1439715 | 0.51 |
| | 24 | 8733 | 6258363 | 2.21 | 7799 | 4228467 | 1.49 |
| | | | | | 185 | 21334 | 0.01 |
| SINE | | | | | 196 | 17591 | 0.01 |
| LINE | | | | | 2901 | 701030 | 0.25 |
| | | | | | 2020 | 583118 | 0.21 |
| | | | | | 327 | 66310 | 0.02 |
| | | | | | 153 | 19698 | 0.01 |
| | | | | | 175 | 14276 | 0.01 |
| Class II (DNA transposons) Subclass 1 | | | | | | | |
| TIR | 16 | 40813 | 6643207 | 2.34 | 5686 | 820749 | 0.29 |
| | 4 | 22608 | 3564813 | 1.26 | | | |
| | 1 | 348 | 369394 | 0.13 | | | |
| | 2 | 1396 | 370560 | 0.13 | 1312 | 199320 | 0.07 |
| | | | | | 823 | 178858 | 0.06 |
| | | | | | 113 | 7463 | <0.01 |
| Class II (DNA transposons) Subclass 2 | | | | | | | |
| | 3 | 2344 | 238737 | 0.08 | 971 | 237019 | 0.08 |
| Unclassified | | | | | 109 | 10297 | <0.01 |
| Total TEs | | | 22778067 | 8.03 | | 14763005 | 2.88 |
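The "TE space, %" values appear to be the TE space in bp divided by the total pseudo-molecule length (283,653,566 bp, from the pseudo-molecule summary). That denominator is an assumption inferred from the numbers rather than something stated in the table; the sketch below shows that it reproduces several of the reported manual-annotation percentages. The function name `te_percent` is illustrative.

```python
# Assumed denominator: total pseudo-molecule length in bp
# (the "Total" row of the pseudo-molecule summary table).
GENOME_BP = 283_653_566


def te_percent(te_space_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Return TE space as a percentage of the genome, rounded to 2 decimals."""
    return round(100.0 * te_space_bp / genome_bp, 2)


# Manual-annotation TE space values (bp) copied from the table above.
manual_bp = {
    "LTR": 15_896_123,
    "TIR": 6_643_207,
    "Total TEs": 22_778_067,
}

for name, bp in manual_bp.items():
    print(f"{name}: {te_percent(bp):.2f} %")
```

Under this assumption the sketch yields 5.60, 2.34, and 8.03 percent, matching the manual-annotation column.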
Figure 3. Contributions of the 20 most abundant transposable element (TE) families to the whole genome. Fifteen families could be assigned to four different superfamilies (see inset); the remaining five did not contain coding sequences (e.g., transposase) that would have allowed their classification into known superfamilies.
Summary of the ver. 1.0 and ver. 2.0 Metrosideros polymorpha genome annotations
| | ver. 1.0 | ver. 2.0 |
|---|---|---|
| Nr. protein-coding genes | 39305 | 27620 |
| Total gene space, Mb | 132.5 | 136.5 |
| Gene space, % | 38.2 | 39.3 |
| Mean gene size, bp | 3371.9 | 4942.7 |
| Nr. exons per gene | 5.7 | 6.1 |
| Mean exon size, bp | 280.2 | 287.5 |
| Mean intron size, bp | 420.3 | 683.4 |
| Total Nr. transcript isoforms | 41874 | 40206 |
| Average Nr. transcript isoforms per gene | 1.1 | 1.5 |
| Mean coding sequence length, bp | 3514.2 | 6860.8 |
| Transcript isoforms with Pfam domain, % | 60.8 | 72.0 |
| Transcript isoforms with BLASTP hit, % | 55.5 | 69.8 |
| Transcript isoforms with AED < 0.5, % | 66.5 | 88.4 |
| Transcript isoforms with AED = 1.0, % | 20.9 | 3.4 |
| BUSCO complete, % | 92.9 | 90.3 |
| BUSCO partial, % | 3.1 | 4.0 |
| BUSCO missing, % | 4.0 | 5.7 |
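Several rows of this table can be cross-checked against each other: the total gene space should be roughly the number of genes times the mean gene size, and the average number of isoforms per gene should be the isoform count divided by the gene count. The sketch below performs these checks with values copied from the table; the dictionary layout and key names are illustrative only.

```python
# Values copied from the annotation summary table above.
annotations = {
    "ver. 1.0": {"genes": 39305, "mean_gene_bp": 3371.9, "isoforms": 41874},
    "ver. 2.0": {"genes": 27620, "mean_gene_bp": 4942.7, "isoforms": 40206},
}

derived = {}
for version, a in annotations.items():
    # Total gene space in Mb = number of genes * mean gene size.
    gene_space_mb = round(a["genes"] * a["mean_gene_bp"] / 1e6, 1)
    # Average number of transcript isoforms per gene.
    isoforms_per_gene = round(a["isoforms"] / a["genes"], 1)
    derived[version] = (gene_space_mb, isoforms_per_gene)
    print(f"{version}: gene space {gene_space_mb} Mb, "
          f"{isoforms_per_gene} isoforms per gene")
```

Both derived quantities agree with the reported rows (132.5 and 136.5 Mb; 1.1 and 1.5 isoforms per gene).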
Figure 4. Cumulative fraction of transcript isoforms in the ver. 1.0 and ver. 2.0 Metrosideros polymorpha genome annotations with evidence support, represented by the annotation edit distance (AED) metric. Lower AED scores indicate greater concordance with the available evidence data (Eilbeck ).
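A cumulative AED curve of this kind can be computed from per-isoform AED scores as the fraction of isoforms at or below each threshold. The following is a minimal sketch; the scores in `toy_aed` are made-up illustrative values, not data from the paper, and the function name is hypothetical.

```python
def cumulative_fraction(aed_scores, thresholds):
    """For each threshold, return the fraction of isoforms with AED <= threshold."""
    n = len(aed_scores)
    return [sum(1 for s in aed_scores if s <= t) / n for t in thresholds]


# Hypothetical AED scores for five isoforms (0 = perfect agreement with
# the evidence, 1 = no evidence support).
toy_aed = [0.0, 0.1, 0.2, 0.4, 1.0]
thresholds = [0.0, 0.25, 0.5, 1.0]

curve = cumulative_fraction(toy_aed, thresholds)
print(curve)  # [0.2, 0.6, 0.8, 1.0]
```

A curve that rises faster at low thresholds indicates an annotation in which more isoforms agree closely with the evidence.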