Sebastian Cristian Treitli, Priscila Peña-Diaz, Paweł Hałakuc, Anna Karnkowska, Vladimír Hampl.
Abstract
Monocercomonoides exilis is considered the first known eukaryote to completely lack mitochondria. This conclusion is based primarily on a genomic and transcriptomic study which failed to identify any mitochondrial hallmark proteins. However, the available genome assembly has limited contiguity, and around 1.5% of the genome sequence is represented by unknown bases. To improve the contiguity, we re-sequenced the genome and transcriptome of M. exilis using Oxford Nanopore Technology (ONT). The resulting draft genome is assembled in 101 contigs with an N50 value of 1.38 Mbp, almost 20 times higher than that of the previously published assembly. Using a newly generated ONT transcriptome, we further improve the gene prediction and add high-quality untranslated region (UTR) annotations, in which we identify two putative polyadenylation signals present in the 3′ UTRs and characterise the Kozak sequence in the 5′ UTRs. All these improvements are reflected by higher BUSCO genome completeness values. Despite an overall more complete genome assembly without missing bases and a better gene prediction, we still failed to identify any mitochondrial hallmark genes, which further supports the hypothesis that the mitochondrion is absent.
Keywords: Monocercomonoides; amitochondriate; genome; nanopore
Year: 2021 PMID: 34951395 PMCID: PMC8767320 DOI: 10.1099/mgen.0.000745
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
General statistics of the previously published Monocercomonoides exilis 454 genome assembly and the ONT genome assembly obtained in this study
| | 454 assembly | ONT assembly |
|---|---|---|
| | 74 712 536 | 82 301 135 |
| | 36.8 | 37.2 |
| | 2092/6648 | 101/101 |
| N50 (bp) | 71 440 | 1 379 369 |
| | 16 767 | 18 152 |
| | 486 | 1 |
| | | 16 448/16 323 |
| | | 319 |
| | | 633/300 |
| | | 54/110 |
| | | 2838 |
| | | 1829 |
| | 2704 | 2730 |
| | 1484 | 1855 |
| | 31 693 | 35 345 |
| | 1.90 | 1.95 |
| | 124 | 119 |
| | 25 | 27.6 |
| | 6840 | 8354 |
| | 166 | 312 |
| | 6967 | 5279 |
| | 108 | 62 |
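The abstract reports an N50 of 1.38 Mbp for the ONT assembly, which matches the 1 379 369 entry in the table above; the 71 440 entry is taken here to be the corresponding 454 value (an assumption, since the row labels did not survive). As a minimal sketch of how such a statistic is usually defined, the function below computes N50 from a list of contig lengths and then checks the "almost 20 times higher" statement. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


# Hypothetical toy assembly (not data from the paper).
toy_contigs = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_contigs)

# Values read from the table above (assumed to be the two N50 values, in bp).
n50_454 = 71_440
n50_ont = 1_379_369
fold_change = n50_ont / n50_454
```

With the table values, `fold_change` comes out at roughly 19.3, which is consistent with the wording in the abstract.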
Fig. 1. Circular representation of the ten complete chromosomes from the ONT assembly. The outermost track represents the chromosome-size scaffolds, followed by GC content, coding percentage calculated for 5 kbp windows, location and types of repetitive elements, and locations of protein tyrosine kinases (PTKs) on the chromosomes. PTKs overlapping unclassified repeats are represented by orange bars, and those not overlapping unclassified repeats are represented in blue.
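Window-based tracks such as those in Fig. 1 can be approximated with a simple scan over the sequence. The sketch below computes GC content in non-overlapping windows; it assumes a 5 kbp window for GC content as well (the caption states this window size only for the coding percentage), and the function name `windowed_gc` is hypothetical. It is an illustration of the general technique, not the plotting pipeline used for the figure.

```python
def windowed_gc(sequence, window=5000):
    """Return GC percentages for consecutive non-overlapping windows.

    Only A, C, G and T are counted; a window without any called
    base yields NaN instead of raising a division error.
    """
    seq = sequence.upper()
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        values.append(100 * gc / called if called else float("nan"))
    return values


# Hypothetical toy sequence with a window of 4 bases.
toy_track = windowed_gc("GGCCAATTGCAT", window=4)
```

The last window may be shorter than the others when the sequence length is not a multiple of the window size; this sketch keeps it rather than discarding it.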
Repetitive elements identified in the ONT genome assembly of M. exilis
| Type of repeats | No. masked bases (bp) | Percentage of the assembly |
|---|---|---|
| LTR elements | 1 415 863 | 1.72 |
| DNA transposons | 3 722 012 | 4.52 |
| Simple repeats | 2 749 397 | 3.34 |
| Low complexity | 999 721 | 1.21 |
| Unclassified | 28 945 590 | 35.17 |
| Total | 37 832 583 | 45.97 |
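The final row of the repeat table (37 832 583 bp, 45.97 %) can be checked against the individual repeat classes. The short sketch below sums the masked bases and recomputes each percentage, assuming that the ONT assembly size is 82 301 135 bp, the first ONT value in the general statistics table (its row label is not preserved, so this is an assumption). The variable names are illustrative only.

```python
# Masked bases per repeat class, as listed in the table (bp).
repeats = {
    "LTR elements": 1_415_863,
    "DNA transposons": 3_722_012,
    "Simple repeats": 2_749_397,
    "Low complexity": 999_721,
    "Unclassified": 28_945_590,
}

# Assumed total ONT assembly size in bp (see the lead-in).
assembly_size = 82_301_135

total_masked = sum(repeats.values())
percentages = {
    name: round(100 * bases / assembly_size, 2)
    for name, bases in repeats.items()
}
total_percentage = round(100 * total_masked / assembly_size, 2)
```

Under that assumption the recomputed values agree with the published columns, and unclassified repeats alone account for roughly three quarters of all masked bases.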
Fig. 2. Two examples of gene prediction improvement on scaffold10. The first row represents the original 454 gene model. The second row represents full-length transcripts mapped to the genome using PASA. The last row represents the final gene models after prediction improvement with the ONT-generated transcriptome. Coding sequences are coloured in red, untranslated regions are represented in blue and introns are represented in grey.
Fig. 3. BUSCO genome completeness estimated on the list of predicted genes. The estimation was carried out using the odb9 dataset (n=303). The completeness was estimated after each step. ONT final prediction represents the published prediction after the fourth round of cDNA polishing.
Fig. 4. UTR characteristics in the genome of Monocercomonoides exilis. (a) 3′ UTR length distribution based on all annotated UTRs; (b) Single-nucleotide scan from positions −100 to +10 relative to the cleavage site, covering the region upstream and downstream of the 3′ UTR end. The occurrence probability of the two identified polyadenylation signals is represented on the second axis, and the average content of uridine bases is represented on the first axis. The pink line marks the position of the cleavage site; (c) 5′ UTR length distribution based on all annotated UTRs; (d) A sequence logo showing the conservation of the bases around the start codon based on 632 sequences. Larger letters indicate higher frequency of the bases at that location.
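The sequence logo in panel (d) is based on how often each base occurs at each position around the start codon. A minimal sketch of that per-position frequency calculation is given below; the function name `position_frequencies` and the four toy sequences are hypothetical, and the actual logo was built from 632 sequences that are not reproduced here.

```python
from collections import Counter


def position_frequencies(sequences):
    """Return one {base: frequency} dict per alignment column.

    All sequences must have the same length and be aligned on the
    start codon; bases other than A, C, G, T are ignored.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")
    columns = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in sequences)
        columns.append(
            {base: counts.get(base, 0) / len(sequences) for base in "ACGT"}
        )
    return columns


# Hypothetical aligned windows: 3 bases upstream, ATG, 1 base downstream.
toy_windows = ["AAAATGG", "CAAATGA", "AACATGG", "AAAATGC"]
toy_profile = position_frequencies(toy_windows)
```

In this toy input the three start-codon columns each have a single base at frequency 1.0, while the flanking columns show the kind of partial conservation that a logo renders as letters of different heights.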