Harald Oey1, Martha Zakrzewski2, Kanwar Narain3, K Rekha Devi3, Takeshi Agatsuma4, Sujeevi Nawaratna2,5, Geoffrey N Gobert2,6, Malcolm K Jones7, Mark A Ragan8, Donald P McManus2, Lutz Krause1,2.
Abstract
Background: Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available.
Year: 2019 PMID: 30520948 PMCID: PMC6329441 DOI: 10.1093/gigascience/giy146
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Paragonimus westermani sequencing libraries
| Library | Platform | Library type | Insert size (bp) | Read length (bp) | Read count (raw) |
|---|---|---|---|---|---|
| 200 bp | HiSeq | Paired-end | 200 | 2 × 120 | 140,542,299 |
| 450 bp | HiSeq | Paired-end | 450 | 2 × 100 | 171,954,230 |
| 5 kb | HiSeq | Mate-pair | 5,000 | 2 × 49 | 232,630,904 |
| 10 kb | HiSeq | Mate-pair | 10,000 | 2 × 49 | 266,480,540 |
| | PacBio | Long read | – | – | 1,731,327 |
Figure 1: k-mer frequencies for the 450 bp library. The distribution of 17-mers in the 450 bp short-insert library demonstrated low sequence heterozygosity. We observed a single peak at 26×, and the P. westermani genome size was estimated to be 1.1 Gb.
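The genome-size estimate in Figure 1 rests on a standard relationship: the total number of counted k-mers divided by the depth of the main peak approximates the genome size. The sketch below illustrates that calculation under stated assumptions. The function name, the `min_depth` error cutoff, and the toy histogram are hypothetical and do not come from the paper, which does not say which tool or cutoff was used.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    min_depth is an assumed cutoff that discards low-depth k-mers,
    which are usually dominated by sequencing errors.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers: the main (homozygous) peak.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (made-up numbers, for illustration only).
toy_histogram = {1: 1000, 2: 300, 24: 50, 25: 90, 26: 120, 27: 95, 28: 45}
peak, size = estimate_genome_size(toy_histogram)
print(f"peak depth: {peak}x, estimated size: {size:.1f} k-mer positions")
```

For scale, a 26× peak together with a 1.1 Gb estimate would correspond to roughly 2.9 × 10^10 counted 17-mers under this formula.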
Assembly statistics for P. westermani and comparable trematode genomes of similar size
| | P. westermani | | | |
|---|---|---|---|---|
| Assembly size (Mb) | 922.8 | 1,275.0 | 606.0 | 546.9 |
| Ungapped size (Mb) | 877.7 | 1,183.5 | 558.0 | 547.1 |
| Contig N50 (kb) | 7.0 (>100 bp) | 9.7 | NA | 14.7 |
| Scaffold N50 (kb) | 135 (>1 kb) | 204 | 1,324 | 30.2 |
| Scaffold L50 | 1,943 | 1,799 | 135 | 408 |
| Scaffold count | 30,466 (>1 kb) | 45,354 (>1 kb) | 4,919 (>1 kb) | 31,822 |
| GC content (%) | 43.3 | 44.1 | 43.8 | 44.1 |
| Repeat content (%) | 45.2 | 57.1 | 28.9 | 32.6 |
| Protein coding genes | 12,852 | 15,740 | 16,356 | 13,634 |
| Longest scaffold (kb) | 809 | 1,565 | 9,657 | 2,050 |
| BUSCO: Complete | 65.3% | 65.8% | 71.4% | 70.8% |
| BUSCO: Duplicated | 1.4% | 0.8% | 1.1% | 1.5% |
| BUSCO: Missing | 25.8% | 25.4% | 23.0% | 23.1% |
Combined length of all scaffolds in Mb.
Combined length of all scaffolds without gaps (Ns) in Mb.
Non-overlapping RNA-sequencing-supported gene models [23].
BUSCO: Benchmarking Universal Single-Copy Orthologs.
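The scaffold N50 and L50 rows in the table follow the usual definitions: N50 is the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of scaffolds needed to get there. A minimal sketch of that calculation is given below. The function name and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running total to at least
        # half of the assembly defines N50; its rank is L50.
        if running >= half_total:
            return length, count
    raise ValueError("no lengths given")


# Toy scaffold lengths in kb (made-up numbers, for illustration only).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} kb, L50 = {l50}")
```

Because thresholds such as ">1 kb" change which scaffolds enter the calculation, N50 values are only comparable between assemblies when the same length filter is applied.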
Figure 2: The complete P. westermani mitochondrial genome. A graphical representation of the P. westermani circular mitochondrial genome is shown, including an ∼6.9 kb repetitive region. Three distinct repeat units were identified in this region, as well as an intervening tRNA gene (tRNA-Glu). All genes are transcribed in the clockwise direction.
Repeat content percentage of P. westermani and related trematode genome sequences
| Repeat class | P. westermani | | | |
|---|---|---|---|---|
| LINE | 21.57 | 26.17 | 12.76 | 14.85 |
| LTR | 7.71 | 10.06 | 2.82 | 1.97 |
| DNA elements | 1.76 | 2.14 | 0.94 | 1.04 |
| SINE | 0.96 | 1.06 | 1.26 | 1.22 |
| Simple repeats | 0.18 | 0.63 | 0.43 | 0.36 |
| Unclassified | 12.97 | 17.06 | 10.69 | 13.15 |
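As a consistency check, the per-class percentages in each column of the repeat table can be summed and compared with the "Repeat content (%)" row of the assembly statistics table. The sketch below does this with the values transcribed from the two tables. Columns are referred to by position because not all column labels survive in this copy, and the 0.1 percentage point tolerance is an assumption made here to allow for rounding.

```python
# Per-class repeat percentages, one list per class, columns in table order.
repeat_classes = {
    "LINE": [21.57, 26.17, 12.76, 14.85],
    "LTR": [7.71, 10.06, 2.82, 1.97],
    "DNA elements": [1.76, 2.14, 0.94, 1.04],
    "SINE": [0.96, 1.06, 1.26, 1.22],
    "Simple repeats": [0.18, 0.63, 0.43, 0.36],
    "Unclassified": [12.97, 17.06, 10.69, 13.15],
}

# "Repeat content (%)" row of the assembly statistics table, same column order.
reported_totals = [45.2, 57.1, 28.9, 32.6]

# Sum each column across all repeat classes.
column_sums = [sum(column) for column in zip(*repeat_classes.values())]

# Assumed tolerance (percentage points) to absorb rounding in the tables.
TOLERANCE = 0.1
for index, (summed, reported) in enumerate(zip(column_sums, reported_totals), start=1):
    status = "consistent" if abs(summed - reported) < TOLERANCE else "differs"
    print(f"column {index}: summed {summed:.2f}%, reported {reported}% ({status})")
```

Agreement between the two rows suggests that the columns of both tables are in the same species order.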
Figure 3: Conservation of the P. westermani proteome across four related trematode species. Paragonimus westermani proteins were mapped to the genome sequences of O. viverrini, C. sinensis, F. hepatica, and S. mansoni using Exonerate. (A) Paragonimus westermani-centered Venn diagram of 12,852 predicted proteins. The four included trematode species shared a core set of 7,599 proteins. (B) Sequence identity of P. westermani proteins and orthologues inferred in genomes of related trematodes. Average sequence identity is given in brackets. (C) Distribution of identified functional GO categories across three trematode species. GO annotations were assigned by InterProScan and visualized using WEGO.
Figure 4: Phylogenetic tree and estimated divergence times. A phylogenetic tree of selected trematodes and cestodes, with S. mediterranea as outgroup, was reconstructed from 104 shared single-copy proteins using the maximum likelihood method. Species divergence times were estimated with a Bayesian model using MCMCTREE with a relaxed molecular clock and are given in millions of years, with 95% confidence intervals in round brackets. The split of P. westermani was estimated to have occurred approximately 38.9 million years ago (Mya; 28.0–58.6 Mya). The analysis was repeated using BEAST 2, and the resulting divergence times are shown in square brackets. BEAST 2 estimated the split of P. westermani to have occurred 31.5 Mya.