Evgeny S Gerasimov1,2,3, Ksenia A Zamyatnina1, Nadezda S Matveeva1,2, Yulia A Rudenskaya1, Natalya Kraeva4, Alexander A Kolesnikov1, Vyacheslav Yurchenko2,4.
Abstract
Maxicircles of all kinetoplastid flagellates are functional analogs of the mitochondrial genome of other eukaryotes. They consist of two distinct parts, called the coding region and the divergent region (DR). The DR is composed of highly repetitive sequences and, as such, remains the least explored segment of a trypanosomatid genome. It is extremely difficult to sequence and assemble, which is why very few full-length maxicircle sequences have been available until now. Using PacBio data, we assembled 17 complete maxicircles from different species of trypanosomatids. Here we present their large-scale comparative analysis and describe common patterns of DR organization in trypanosomatids.
Keywords: divergent region; genomic rearrangements; kinetoplast; maxicircle; mitochondrion; repeats; trypanosomatids
Year: 2020 PMID: 32033466 PMCID: PMC7169413 DOI: 10.3390/pathogens9020100
Source DB: PubMed Journal: Pathogens ISSN: 2076-0817
Overview of assembled maxicircles. Basic assembly parameters are shown: total maxicircle length, divergent region (DR) length, coding region (CR) length, average coverage per nucleotide, total number of reads included in the assembly, and GenBank accession number.
| Species (Strain) | Length, bp | DR, bp | CR, bp | Coverage | Reads | GenBank |
|---|---|---|---|---|---|---|
|  | 27,631 | 11,485 | 16,146 | 8.2 | 39 | MN904521 |
|  | 27,138 | 10,860 | 16,278 | 10.6 | 36 | MN904523 |
|  | 29,037 | 12,978 | 16,059 | 5.1 | 20 * | MN904514 |
|  | 29,512 | 13,313 | 16,199 | 21.7 | 94 | MN904522 |
|  | 29,557 | 13,406 | 16,151 | 20.1 | 64 | MN904525 |
|  | 33,779 | 17,521 | 16,258 | 10.0 | 37 | MN904515 |
|  | 34,088 | 17,883 | 16,205 | 8.9 | 31 | MN904519 |
|  | 33,278 | 17,238 | 16,040 | 16.0 | 61 | MN904520 |
|  | 36,676 | 20,521 | 16,155 | 17.2 | 62 | MN904518 |
|  | 26,728 | 10,568 | 16,160 | 13.9 | 89 | MN904516 |
|  | 29,618 | 13,442 | 16,176 | 20.1 | 87 | MN904517 |
|  | 23,201 | 8,391 | 14,810 | 9.1 | 33 * | MN904526 |
|  | 39,883 | 24,498 | 15,385 | 25.7 | 93 | MN904528 |
|  | 47,384 | 32,012 | 15,372 | 27.9 | 98 | MN904527 |
|  | 31,564 | 15,450 | 16,114 | 19.9 | 47 | MN904524 |
|  | 29,680 | 14,463 | 15,217 | 17.7 | 57 | MN904513 |
|  | 25,722 | 8,594 | 17,128 | 26.78 | 57 | MN904512 |
* Assembled maxicircle contig was not marked as ‘circular’ by Canu assembler.
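Because a maxicircle consists of a coding region and a divergent region, the total length reported in the table should equal DR + CR. The following is a minimal sketch of that consistency check, using three rows copied from the table; the helper names `parse_bp` and `is_consistent` are hypothetical and not part of the original study.

```python
def parse_bp(text):
    """Parse a base-pair count such as '13,406' or '8391' into an int."""
    return int(text.replace(",", ""))


def is_consistent(total, dr, cr):
    """Return True if total length equals DR length plus CR length."""
    return parse_bp(total) == parse_bp(dr) + parse_bp(cr)


# (total length, DR, CR) copied from three rows of the table above,
# including two rows where the DR value lacks a thousands separator.
rows = [
    ("27,631", "11,485", "16,146"),
    ("29,557", "13406", "16,151"),
    ("23,201", "8391", "14,810"),
]

results = [is_consistent(*row) for row in rows]
```

The same check can be run over every row to catch transcription errors in the length columns.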
Figure 1. Principal component analysis of triplet spectra of maxicircles. Triplet frequency vectors were calculated for the coding region (green) and divergent region (red) of each maxicircle. The first and second principal components are shown on the scatter plot. Lama = Leishmania amazonensis; Hmeg = Herpetomonas megaseliae; Laet = Leishmania aethiopica; Cexp = Crithidia expoeki; Lbra2 = Leishmania braziliensis (208-905); Lbra = Leishmania braziliensis (208-954); Ldon1 = Leishmania donovani (Pasteur); Ldon2 = Leishmania donovani (193-S616); Ldon3 = Leishmania donovani (FDAARGOS_361); Lguy = Leishmania guyanensis; Linf = Leishmania infantum; Lmex = Leishmania mexicana; Lpyr = Leptomonas pyrrhocoris; Ltro = Leishmania tropica; Tbru = Trypanosoma brucei; Dm28c = Trypanosoma cruzi (Dm28c); TCC = Trypanosoma cruzi (TCC).
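A triplet frequency vector of the kind described in the Figure 1 caption could be computed roughly as follows. This is a sketch under assumptions the caption does not confirm: triplets are counted in overlapping windows on one strand, windows containing non-ACGT characters are skipped, and counts are normalized to frequencies. The names `TRIPLETS` and `triplet_spectrum` are hypothetical, and the PCA step itself is not shown.

```python
from itertools import product

# All 64 DNA triplets in a fixed order, so vectors are comparable across sequences.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_spectrum(seq):
    """Return a 64-element list of overlapping-triplet frequencies for seq."""
    seq = seq.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in counts:  # skip windows with N or other ambiguity codes
            counts[triplet] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]
```

One such vector per region (coding and divergent) of each maxicircle would form the rows of the matrix passed to a PCA routine.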
Figure 2. Number of occurrences of the most frequent k-mer in each maxicircle for k = 16, 24, 36, 48, 64, and 96. Species abbreviations are the same as for Figure 1.
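The quantity plotted in Figure 2 can be illustrated with a short sketch. It assumes a linear, single-strand scan with overlapping windows; whether the original analysis accounted for circularity or reverse complements is not stated in the caption. The names `max_kmer_count`, `kmer_profile`, and `K_VALUES` are hypothetical.

```python
from collections import Counter

# The k values listed in the Figure 2 caption.
K_VALUES = (16, 24, 36, 48, 64, 96)


def max_kmer_count(seq, k):
    """Return the number of occurrences of the most frequent k-mer in seq."""
    if k <= 0 or len(seq) < k:
        return 0
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return max(counts.values())


def kmer_profile(seq, ks=K_VALUES):
    """Map each k to the occurrence count of its most frequent k-mer."""
    return {k: max_kmer_count(seq, k) for k in ks}
```

A highly repetitive sequence keeps a large count as k grows, whereas a mostly unique sequence drops to 1 quickly.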
Figure 3. Dotplots of maxicircles of Leishmania guyanensis (a); L. infantum (b); L. amazonensis (c); Leptomonas pyrrhocoris (d); Trypanosoma cruzi strain TCC (e); T. brucei (f). Green and blue arrows denote 12S rRNA and ND5 genes, respectively. All plots show the full maxicircle starting with the ND5 gene at (0, 0). Red and black arrows in (b) indicate the I-element and a single-copy array repeat unit, respectively. Dotplots of other assembled maxicircles can be found in Figure A1.
Figure A1. Dotplots of maxicircles of Leishmania aethiopica (a); L. tropica (b); L. mexicana (c); Trypanosoma cruzi strain Dm28c (d); Crithidia expoeki (e); Herpetomonas megaseliae (f); L. braziliensis strain 208-954 (g); L. braziliensis strain 208-905 (h). All plots show the full maxicircle starting with the ND5 gene at (0, 0).
Figure A2. Circos plots of the DR of Leishmania aethiopica (a); Leishmania tropica (b); Leishmania mexicana (c); Trypanosoma cruzi strain Dm28c (d); Crithidia expoeki (e); Herpetomonas megaseliae (f); Leishmania braziliensis strain 208-954 (g); Leishmania braziliensis strain 208-905 (h). Green and blue arrows indicate the 12S rRNA and ND5 genes, respectively. The outer track is a histogram of 24-mer frequency; the middle track shows tandem repeats; the inner track is a GC-content profile. Ribbons inside the circle connect homologous regions; the color represents percent sequence identity in the range [80%; 100%]. Violet arcs denote inverted repeats.
Figure 4. Circos plots of the divergent region (DR) of Leishmania guyanensis (a); L. infantum (b); L. amazonensis (c); Leptomonas pyrrhocoris (d); Trypanosoma cruzi strain TCC (e); T. brucei (f). Green and blue arrows denote the 12S rRNA and ND5 genes, respectively. Outer tracks are histograms of 24-mer frequency; middle tracks indicate tandem repeats; inner tracks are GC-content profiles. Ribbons inside the circle connect homologous regions; color represents percent sequence identity in the range [80%; 100%] in the order red, yellow, green, and blue. Violet arcs denote inverted repeats.
Figure 5. Dotplots and corresponding Circos plots of the DR of Leishmania donovani strains: Pasteur (a,d); 193-S616 (b,e); FDAARGOS_361 (c,f). Circos plots are presented in the same way as in Figure 4.
Figure 6. Circos plots comparing the DRs of two species or strains: T. cruzi strains TCC (left contig) and Dm28c (right contig) (a); L. braziliensis strains (b); Leishmania tropica (left contig) and L. aethiopica (right contig) (c); L. donovani strains 193-S616 (left contig) and FDAARGOS_361 (right contig) (d); L. donovani strains 193-S616 (left contig) and Pasteur (right contig) (e).
Sources of the datasets used. Accession numbers for the raw data used to assemble maxicircle sequences in this study. All accession numbers refer to PacBio reads unless specified otherwise.
| Species (Strain) | BioProject | Run Accession Number |
|---|---|---|
|  | PRJNA484340 | SRR7867272, SRR7867273, SRR7867284, SRR7867285 |
|  |  | SRR7867261, SRR7867262 |
|  |  | SRR8377733, SRR7867274 |
|  |  | SRR7867264-SRR7867268 |
|  |  | SRR7867286-SRR7867292 |
|  |  | SRR7867275-SRR7867277, SRR8377732 |
|  |  | SRR7880312, SRR7880319, SRR7880320 |
|  |  | SRR7880309-SRR7880311 |
|  |  | SRR7880313, SRR7880314, SRR7880316 |
|  | PRJNA231221 | SRR5932752-SRR5932754 |
|  | PRJNA396645 | SRR5902665-SRR5902672 |
|  | PRJEB18945 | ERR1794935-ERR1794915 |
|  | PRJNA432753 | SRR6822075 |
|  | PRJNA433042 | SRR6809376 |
|  | PRJNA598933 1 |  |
|  | PRJEB7883 | ERR1036240-ERR1036242, ERR1046607-ERR1046611 |
|  | Assembled sequence 3 |  |
1 Sequence was assembled from PacBio data generated in our lab. 2 GenBank CP022652 (Ramasamy, G.; McDonald, J.; Sur, A.; and Myler, P., BioProject PRJNA396645, unpublished). 3 Sequence was assembled from PacBio data generated in [43].