Hatim Almutairi1,2, Michael D Urbaniak1, Michelle D Bates1, Narissara Jariyapan3, Godwin Kwakye-Nuako4, Vanete Thomaz Soccol5, Waleed S Al-Salem2, Rod J Dillon1, Paul A Bates1, Derek Gatherer6.
Abstract
We provide the raw and processed data produced during the genome sequencing of isolates from six species of parasites from the sub-family Leishmaniinae: Leishmania martiniquensis (Thailand), Leishmania orientalis (Thailand), Leishmania enriettii (Brazil), Leishmania sp. Ghana, Leishmania sp. Namibia and Porcisia hertigi (Panama). De novo assembly was performed using Nanopore long reads to construct chromosome backbone scaffolds. We then corrected erroneous base calling by mapping short Illumina paired-end reads onto the initial assembly. Data has been deposited at NCBI as follows: raw sequencing output in the Sequence Read Archive, finished genomes in GenBank, and ancillary data in BioSample and BioProject. Derived data such as quality scoring, SAM files, genome annotations and repeat sequence lists have been deposited in Lancaster University's electronic data archive with DOIs provided for each item. Our coding workflow has been deposited in GitHub and Zenodo repositories. This data constitutes a resource for the comparative genomics of parasites and for further applications in general and clinical parasitology.
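The two-stage strategy described above (long-read de novo assembly, then short-read polishing) can be sketched as a list of command lines. This is only an illustration under stated assumptions: the tool names come from the tools table in this record, but the specific options, file names, directory layout and the `build_plan` helper are hypothetical and are not taken from the authors' deposited workflow.

```python
import shlex
from pathlib import Path


def build_plan(nanopore_fastq, illumina_r1, illumina_r2, workdir="work", threads=8):
    """Return the command lines for a long-read assembly + short-read polish.

    All paths and parameter choices are illustrative assumptions.
    Nothing is executed here; the caller decides how to run the commands.
    """
    work = Path(workdir)
    draft = work / "flye" / "assembly.fasta"   # Flye writes assembly.fasta into its output directory
    sam = work / "short_reads.sam"
    bam = work / "short_reads.sorted.bam"
    return [
        # 1. De novo assembly from Nanopore reads.
        ["flye", "--nano-raw", str(nanopore_fastq),
         "--out-dir", str(work / "flye"), "--threads", str(threads)],
        # 2. Map Illumina paired-end reads onto the draft assembly (short-read preset).
        ["minimap2", "-ax", "sr", "-t", str(threads), "-o", str(sam),
         str(draft), str(illumina_r1), str(illumina_r2)],
        # 3. Sort and index the alignments.
        ["samtools", "sort", "-@", str(threads), "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # 4. Correct base-calling errors in the draft using the mapped short reads.
        ["pilon", "--genome", str(draft), "--frags", str(bam),
         "--output", "polished", "--outdir", str(work / "pilon")],
    ]


if __name__ == "__main__":
    for cmd in build_plan("nanopore.fastq", "illumina_R1.fastq", "illumina_R2.fastq"):
        print(shlex.join(cmd))
```

Building the plan as data rather than running it directly keeps the sketch testable without the tools installed; each list can later be passed to `subprocess.run(cmd, check=True)` or translated into workflow-manager rules.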
Year: 2021 PMID: 34489462 PMCID: PMC8421402 DOI: 10.1038/s41597-021-01017-3
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Stacked column chart showing number of sequenced reads in GigaReads (blue), number of yielded bases in GigaBases (red), and the file sizes in Gigabytes (yellow) for each genome assembly.
Fig. 2 Flowchart showing the analysis workflow strategy.
Sample descriptions for all assemblies.
| Sample | Strain | Isolate | BioSample | BioProject |
|---|---|---|---|---|
| | LV760 | LSCM1 | SAMN17294109 | PRJNA691531 |
| | LV768 | LSCM4 | SAMN17294111 | PRJNA691532 |
| | LV763 | CUR178 | SAMN17294112 | PRJNA691534 |
| | LV757 | GH5 | SAMN17294115 | PRJNA691536 |
| | LV425 | 253 | SAMN17294129 | PRJNA689706 |
| | LV43 | C119 | SAMN17294121 | PRJNA691541 |
Tools used in analysis workflow with conda or docker link.
| Tool | Website | conda or docker link |
|---|---|---|
| AGAT | | |
| AUGUSTUS | | |
| BCFtools | | |
| bedtools | | |
| blast+ | | |
| FastQC | | |
| Flye | | |
| funannotate | | |
| GAAS | | |
| GeneMark | | |
| Genometools | | |
| interproscan | | |
| MAKER2 | | |
| minimap2 | | |
| MultiQC | | |
| MUMmer | | |
| Pilon | | |
| pycoQC | | |
| RaGOO | | |
| RepeatMasker | | |
| SAMtools | | |
| Snakemake | | |
| TEclass | | |
| wordcloud | | Not available |
Fig. 3 Dotplot representing synteny between each of our genomes and its wordcloud-predicted closest related reference genome, produced using MUMmer.
Fig. 4 Example genome-wide repeat plot for L. martiniquensis, stratified by repeat class: simple (micro-satellites), low complexity, DNA, long terminal repeats (LTRs), long interspersed nuclear elements (LINEs), RNA, rolling circle (RC), satellites, short interspersed nuclear elements (SINEs) and retroposons. The middle pie chart represents the proportion of each repeat class in the genome: none (94.4%), simple (micro-satellites) (4.11%), low complexity (0.655%), DNA (0.419%), unknown (0.161%), LTRs (0.110%), LINEs (0.052%), RNA (0.027%), RC (0.019%), satellites (0.010%), retroposons (0.005%), SINEs (0.004%).
Fig. 5 Annotation Edit Distance (AED) score (x-axis) line plot for all assembly annotation rounds: evidence-based (solid line) and ab initio (dotted line). The y-axis represents the cumulative percentage of the genome.
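A cumulative AED curve of the kind described for Fig. 5 can be computed from a list of per-model AED scores, where 0 means full agreement with the supporting evidence and 1 means no support. The sketch below assumes the y-axis counts annotated gene models; the function name and the example scores are hypothetical and are not data from this study.

```python
from bisect import bisect_right


def cumulative_aed(aed_scores, thresholds):
    """Return {threshold: percentage of models with AED <= threshold}."""
    if not aed_scores:
        raise ValueError("at least one AED score is required")
    ordered = sorted(aed_scores)
    total = len(ordered)
    # bisect_right gives the count of scores <= t in the sorted list.
    return {t: 100.0 * bisect_right(ordered, t) / total for t in thresholds}


# Made-up scores for ten gene models, for illustration only.
example_scores = [0.00, 0.05, 0.10, 0.10, 0.25, 0.40, 0.50, 0.75, 1.00, 1.00]
curve = cumulative_aed(example_scores, [0.0, 0.25, 0.5, 1.0])
for threshold, percent in curve.items():
    print(f"AED <= {threshold:.2f}: {percent:.1f}% of models")
```

A curve that rises steeply at low AED values indicates that most models are well supported by evidence, which is how such plots are usually compared across annotation rounds.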
Details of reads, bases and file sizes.
| Species | Sequencing platform | SRA accession | Number of reads (GigaReads) | Bases (GigaBases) | File size (Gigabytes) |
|---|---|---|---|---|---|
| | Illumina HiSeq 4000 | SRR13558784 | 0.783 | 1.182 | 2.981 |
| | Illumina HiSeq 4000 | SRR13558792 | 1.089 | 1.644 | 4.151 |
| | Illumina MiSeq | SRR13558785 | 0.446 | 1.327 | 3.003 |
| | Nanopore MinION | SRR13558786 | 0.071 | 3.634 | 7.323 |
| | Nanopore MinION | SRR13558788 | 0.006 | 0.321 | 0.647 |
| | Nanopore MinION | SRR13558790 | 0.004 | 0.468 | 0.940 |
| | Nanopore MinION | SRR13558793 | 0.005 | 0.385 | 0.774 |
| | Illumina HiSeq 2500 | SRR13558774 | 1.579 | 1.437 | 4.843 |
| | Illumina HiSeq 2500 | SRR13558775 | 0.618 | 0.563 | 1.894 |
| | Illumina HiSeq 2500 | SRR13558776 | 1.560 | 1.420 | 4.786 |
| | Illumina HiSeq 2500 | SRR13558777 | 0.636 | 0.578 | 1.947 |
| | Illumina HiSeq 2500 | SRR13558778 | 0.735 | 0.668 | 2.250 |
| | Illumina HiSeq 4000 | SRR13558779 | 1.079 | 1.629 | 4.112 |
| | Illumina HiSeq 4000 | SRR13558780 | 1.406 | 2.123 | 5.361 |
| | Illumina MiSeq | SRR13558781 | 0.383 | 1.135 | 2.568 |
| | Nanopore MinION | SRR13558782 | 0.054 | 3.357 | 6.756 |
| | Illumina HiSeq 4000 | SRR13558795 | 0.879 | 1.328 | 3.350 |
| | Illumina HiSeq 4000 | SRR13558796 | 1.214 | 1.834 | 4.630 |
| | Illumina MiSeq | SRR13558797 | 0.506 | 1.494 | 3.385 |
| | Nanopore MinION | SRR13558798 | 0.072 | 4.365 | 8.786 |
| | Illumina HiSeq 2500 | SRR13558800 | 1.228 | 1.117 | 3.765 |
| | Illumina HiSeq 2500 | SRR13558801 | 0.684 | 0.623 | 2.096 |
| | Illumina HiSeq 4000 | SRR13558802 | 1.006 | 1.519 | 3.833 |
| | Illumina HiSeq 4000 | SRR13558803 | 1.407 | 2.124 | 5.365 |
| | Illumina MiSeq | SRR13558804 | 0.520 | 1.549 | 3.505 |
| | Nanopore MinION | SRR13558805 | 0.077 | 5.390 | 10.840 |
| | Illumina HiSeq 4000 | SRR13558764 | 0.527 | 1.567 | 3.546 |
| | Illumina HiSeq 4000 | SRR13558765 | 0.985 | 1.487 | 3.753 |
| | Illumina MiSeq | SRR13558766 | 1.347 | 2.034 | 5.136 |
| | Nanopore MinION | SRR13558767 | 0.068 | 4.377 | 8.807 |
| | Illumina HiSeq 4000 | SRR13558754 | 0.929 | 1.403 | 3.540 |
| | Illumina HiSeq 4000 | SRR13558755 | 1.409 | 2.128 | 5.374 |
| | Illumina MiSeq | SRR13558756 | 0.379 | 1.123 | 2.541 |
| | Nanopore MinION | SRR13558757 | 0.019 | 1.364 | 2.742 |
| Grand Total | | | 23.708 | 58.698 | 139.327 |
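The Grand Total row can be cross-checked by summing the per-run values in the table above. The following sketch copies the reads, bases and file-size figures from the table and recomputes the column sums; the 0.01 tolerance is an assumption chosen to absorb rounding of the three-decimal entries.

```python
# (SRA accession, reads, bases, file size) copied from the table above.
RUNS = [
    ("SRR13558784", 0.783, 1.182, 2.981), ("SRR13558792", 1.089, 1.644, 4.151),
    ("SRR13558785", 0.446, 1.327, 3.003), ("SRR13558786", 0.071, 3.634, 7.323),
    ("SRR13558788", 0.006, 0.321, 0.647), ("SRR13558790", 0.004, 0.468, 0.940),
    ("SRR13558793", 0.005, 0.385, 0.774), ("SRR13558774", 1.579, 1.437, 4.843),
    ("SRR13558775", 0.618, 0.563, 1.894), ("SRR13558776", 1.560, 1.420, 4.786),
    ("SRR13558777", 0.636, 0.578, 1.947), ("SRR13558778", 0.735, 0.668, 2.250),
    ("SRR13558779", 1.079, 1.629, 4.112), ("SRR13558780", 1.406, 2.123, 5.361),
    ("SRR13558781", 0.383, 1.135, 2.568), ("SRR13558782", 0.054, 3.357, 6.756),
    ("SRR13558795", 0.879, 1.328, 3.350), ("SRR13558796", 1.214, 1.834, 4.630),
    ("SRR13558797", 0.506, 1.494, 3.385), ("SRR13558798", 0.072, 4.365, 8.786),
    ("SRR13558800", 1.228, 1.117, 3.765), ("SRR13558801", 0.684, 0.623, 2.096),
    ("SRR13558802", 1.006, 1.519, 3.833), ("SRR13558803", 1.407, 2.124, 5.365),
    ("SRR13558804", 0.520, 1.549, 3.505), ("SRR13558805", 0.077, 5.390, 10.840),
    ("SRR13558764", 0.527, 1.567, 3.546), ("SRR13558765", 0.985, 1.487, 3.753),
    ("SRR13558766", 1.347, 2.034, 5.136), ("SRR13558767", 0.068, 4.377, 8.807),
    ("SRR13558754", 0.929, 1.403, 3.540), ("SRR13558755", 1.409, 2.128, 5.374),
    ("SRR13558756", 0.379, 1.123, 2.541), ("SRR13558757", 0.019, 1.364, 2.742),
]

# Totals as stated in the Grand Total row.
STATED = {"reads": 23.708, "bases": 58.698, "file_size": 139.327}


def column_totals(runs):
    """Sum the reads, bases and file-size columns over all runs."""
    return {
        "reads": sum(r[1] for r in runs),
        "bases": sum(r[2] for r in runs),
        "file_size": sum(r[3] for r in runs),
    }


totals = column_totals(RUNS)
for name, value in totals.items():
    agrees = abs(value - STATED[name]) < 0.01  # assumed rounding tolerance
    print(f"{name}: recomputed {value:.3f}, stated {STATED[name]:.3f}, agrees={agrees}")
```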
| Measurement(s) | DNA • genome • sequence_assembly • sequence feature annotation |
| Technology Type(s) | DNA sequencing • Oxford Nanopore Sequencing • Illumina sequencing • sequence assembly process • sequence annotation |
| Sample Characteristic - Organism | Leishmaniinae |
| Sample Characteristic - Location | Namibia • Thailand • Ghana • Brazil |