Aleksandra Bliznina, Aki Masunaga, Michael J Mansfield, Yongkai Tan, Andrew W Liu, Charlotte West, Tanmay Rustagi, Hsiao-Chiao Chien, Saurabh Kumar, Julien Pichon, Charles Plessy, Nicholas M Luscombe.
Abstract
BACKGROUND: The larvacean Oikopleura dioica is an abundant planktonic tunicate with the smallest (65-70 Mbp) non-parasitic, non-extremophile animal genome identified to date. Two genome assemblies are currently available, for the Bergen (OdB3) and Osaka (OSKA2016) laboratory strains of O. dioica. Both assemblies have full genome coverage and high sequence accuracy. However, a chromosome-scale assembly has not yet been achieved.
Keywords: Chromosome-scale assembly; Hi-C; Oikopleura dioica; Oxford Nanopore sequencing; Single individual; Telomere-to-telomere
Year: 2021 PMID: 33781200 PMCID: PMC8008620 DOI: 10.1186/s12864-021-07512-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Genome assembly and annotation workflow used to generate the OKI2018_I69 genome assembly. a Live images of adult male (top) and female (bottom) O. dioica. b The assembly was generated using Nanopore and Illumina data, followed by scaffolding using Hi-C chromosome conformation capture data
Fig. 2 Quality control checks implemented at different steps of genome sequencing and assembly. a Graph showing the length distribution of raw Nanopore reads used to generate the OKI2018_I69 assembly. b Estimated total and repetitive genome size based on k-mer counting of the Illumina paired-end reads used for polishing the OKI2018_I69 assembly. c Pairwise genome alignment of the contig assemblies of the I69 and I28 O. dioica individuals
Fig. 3 OKI2018_I69 assembly of the Okinawan O. dioica. a Treemap comparison between the contig (left) and scaffold (right) assemblies of the O. dioica genome. Each rectangle represents a contig or a scaffold in the assembly, with its area proportional to its length. b Comparison between the OKI2018_I69 (left) and OdB3 (right) linkage groups. The Sankey plot shows what proportion of each chromosome in the OKI2018_I69 genome is aligned to the OdB3 linkage groups. c Contact matrix generated by aligning the Hi-C data set to the OKI2018_I69 assembly with the Juicer and 3D-DNA pipelines. Pixel intensity in the contact matrix indicates how often a pair of loci co-localize in the nucleus
Comparison of the OKI2018_I69 assembly with the previously published O. dioica genomes
| | OdB3 | OSKA2016 | OKI2018_I69 |
|---|---|---|---|
| Geographical origin | Bergen, Norway (North Atlantic) | Hyogo, Japan (Western Pacific) | Okinawa, Japan (Ryukyu archipelago) |
| Assembly length (Mbp) | 70.4 | 65.6 | 64.3 |
| Number of scaffolds | 1260 | 576 | 19 |
| Longest scaffold (Mbp) | 3.2 | 6.8 | 17.1 |
| Scaffold N50 (Mbp) | 0.4 | 1.5 | 16.2 |
| Number of contigs | 5917 | 746 | 42 |
| Contig N50 (Mbp) | 0.02 | 0.6 | 4.7 |
| GC content (%) | 39.77 | 41.34 | 41.06 |
| Gap rate (%) | 5.589 | 0.585 | 0.034 |
| Complete BUSCOs (%) | 70.8 | 71.7 | 73.01 |
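
The N50 and gap-rate rows in the table follow standard definitions: N50 is the length L such that sequences of length L or longer contain at least half of the assembled bases, and the gap rate is the percentage of positions that are unknown (N). The sketch below is a minimal illustration of those definitions, not the tooling used for this assembly; the function names and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gap_rate(sequence):
    """Return the percentage of positions that are N (gap) characters."""
    if not sequence:
        raise ValueError("empty sequence")
    return 100.0 * sequence.upper().count("N") / len(sequence)


# Hypothetical toy data, not taken from any of the three assemblies.
toy_lengths = [8, 5, 4, 2, 1]
print(n50(toy_lengths))       # 5
print(gap_rate("ACGTNNAC"))   # 25.0
```

With few, long scaffolds, as in OKI2018_I69 (19 scaffolds), the cumulative sum reaches half of the assembly after only a handful of sequences, which is why its scaffold N50 is close to the length of its longest scaffold.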
Fig. 4 Chromosome-level features of the Okinawan O. dioica genome. a Visualization of sequence properties across chromosomes in the OKI2018_I69 assembly. For each chromosome, 50 kbp windows of GC content (orange), Nanopore sequence coverage (blue), the percentage of nucleotides masked by RepeatMasker (purple), and the number of genes (yellow) are indicated. Differences in these sequence properties occur near predicted sites of centromeres and telomeres, as well as between the short and long arms of each non-sex-specific chromosome. Telomeres and gaps in the assembly are indicated with black and grey rectangles, respectively. b Long and short chromosome arms exhibit significant differences in sequence properties, including GC content, repetitive sequence content, and the number of restriction sites recognized by the DpnII enzyme used to generate the Hi-C library
Fig. 5 Quality assessment of the OKI2018_I69 genome assembly. a Proportion of BUSCO genes detected or missed in Oikopleura genomes and transcriptomes. The search on the OKI2018_I69 assembly was repeated with default parameters (“no training”) to display the effect of AUGUSTUS training. b Venn diagram showing the number of BUSCO genes missing in the OKI2018_I69, OdB3 and/or OSKA2016 genomes
Fig. 6 Analysis of repetitive elements. The repeat landscape and proportions of various repeat classes in the genome are indicated and color-coded according to the classes shown on the right side of the figure. The non-repetitive fraction of the genome is shown in black
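
As a rough illustration of the windowed statistics in the Fig. 4 caption, the sketch below computes GC content and DpnII site counts over consecutive, non-overlapping windows. The function names are hypothetical, and excluding N positions from the GC denominator is an assumption; the caption does not state how the authors handled gaps. The DpnII recognition sequence GATC is the one known fact relied on here.

```python
def gc_windows(sequence, window=50_000):
    """Yield (start, gc_percent) for consecutive non-overlapping windows.

    gc_percent is None for windows with no called (A/C/G/T) bases.
    """
    seq = sequence.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        # Assumption: gap characters (N) are excluded from the denominator.
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            yield start, None
            continue
        gc = chunk.count("G") + chunk.count("C")
        yield start, 100.0 * gc / called


def dpnii_sites(sequence):
    """Count occurrences of the DpnII recognition sequence GATC."""
    return sequence.upper().count("GATC")


# Toy sequence with a 4 bp window so the output is easy to check by hand.
toy = "GGCC" + "AATT" + "NNNN"
print(list(gc_windows(toy, window=4)))  # [(0, 100.0), (4, 0.0), (8, None)]
print(dpnii_sites("AAGATCTTGATC"))      # 2
```

Sites that straddle a window boundary would be missed if `dpnii_sites` were applied per window, so a per-arm or per-chromosome count is the simpler use.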
Comparison of the annotations of the three O. dioica genome assemblies
| | OdB3 | OSKA2016 | OKI2018_I69 |
|---|---|---|---|
| Masked sequence (%) | 15.0 | – | 14.4 |
| Number of genes | 18,020 | 18,743 | 17,260 |
| Median gene length (bp) | 1488 | 1483 | 1505 |
| Median exon length (bp) | 159 | 155 | 152 |
| Median intron length (bp) | 48 | 51 | 49 |
Fig. 7 Draft scaffold of the mitochondrial genome in the OKI2018_I69 assembly. a Predicted gene annotation of the draft mitochondrial genome sequence. b Self-similarity plot of the draft mitochondrial genome sequence. A tandem repeat is visible, which complicates complete assembly of the mitochondrial genome from whole-genome sequencing data
Fig. 8 Genomic locations of various oikopleurid gene homologs in the OKI2018_I69 assembly. The genes are searchable by name and PubMed identifier in the ZENBU genome browser. Colours indicate genes from the same family