Andreas J Stroehlein, Pasi K Korhonen, V Vern Lee, Stuart A Ralph, Margaret Mentink-Kane, Hong You, Donald P McManus, Louis-Albert Tchuem Tchuenté, J Russell Stothard, Parwinder Kaur, Olga Dudchenko, Erez Lieberman Aiden, Bicheng Yang, Huanming Yang, Aidan M Emery, Bonnie L Webster, Paul J Brindley, David Rollinson, Bill C H Chang, Robin B Gasser, Neil D Young.
Abstract
Urogenital schistosomiasis is caused by the blood fluke Schistosoma haematobium and is one of the most neglected tropical diseases worldwide, afflicting > 100 million people. It is characterised by granulomata, fibrosis and calcification in urogenital tissues, and can lead to increased susceptibility to HIV/AIDS and squamous cell carcinoma of the bladder. To complement available treatment programs and break the transmission of disease, sound knowledge and understanding of the biology and ecology of S. haematobium are required. Hybridisation/introgression events and molecular variation among members of the S. haematobium-group might affect important biological and/or disease traits as well as the morbidity of disease and the effectiveness of control programs, including mass drug administration. Here we report the first chromosome-contiguous genome for a well-defined laboratory line of this blood fluke. An exploration of this genome using transcriptomic data for all key developmental stages allowed us to refine gene models (including non-coding elements) and annotations, and to discover 'new' genes and transcription profiles for these stages that are likely linked to development and/or pathogenesis. Molecular variation within S. haematobium among some geographical locations in Africa revealed unique genomic 'signatures' that matched species other than S. haematobium, indicating the occurrence of introgression events. The present reference genome (designated Shae.V3) and the findings from this study solidly underpin future functional genomic and molecular investigations of S. haematobium and accelerate systematic, large-scale population genomics investigations, with a focus on improved and sustained control of urogenital schistosomiasis.
Year: 2022 PMID: 35167626 PMCID: PMC8846543 DOI: 10.1371/journal.ppat.1010288
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Fig 1. Synteny and contiguity of the Schistosoma haematobium reference genome.
Comparisons are shown with genomes of A, S. mansoni, B, S. japonicum, C, S. bovis and D, the published draft genome of S. haematobium (Shae.V2). The eight chromosomes are represented as bars in a circular fashion, are distinctly-coloured in a dark shade and named according to the S. mansoni chromosomes. Syntenic blocks containing five or more single-copy orthologs (SCOs) between S. haematobium and the respective other species are shown as ‘links’ and are coloured, in a lighter shade, based on the link that spans the largest portion of the linked reference scaffold/chromosome. The number of SCOs, syntenic blocks, and linked scaffolds, as well as the percentage of the genome assembly that they represent are shown for each panel.
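The caption's rule of keeping only blocks with five or more SCOs can be illustrated with a minimal sketch. It assumes a deliberately simplified definition of a block (a run of consecutive SCOs along one S. haematobium chromosome that all map to the same scaffold in the other genome) and ignores gene order and orientation on the target scaffold; it is not the synteny method used in the study. The function name, tuple layout and scaffold names are hypothetical.

```python
from itertools import groupby


def syntenic_blocks(orthologs, min_scos=5):
    """Group single-copy orthologs (SCOs) into simplified syntenic blocks.

    orthologs: iterable of (query_chrom, query_start, target_scaffold).
    Returns (query_chrom, target_scaffold, first_start, last_start, n_scos)
    for every run containing at least `min_scos` SCOs.
    """
    # Order SCOs by chromosome, then by position along the chromosome.
    ordered = sorted(orthologs, key=lambda o: (o[0], o[1]))
    blocks = []
    # A run ends whenever the chromosome or the target scaffold changes.
    for (chrom, target), run in groupby(ordered, key=lambda o: (o[0], o[2])):
        run = list(run)
        if len(run) >= min_scos:
            blocks.append((chrom, target, run[0][1], run[-1][1], len(run)))
    return blocks


# Toy input (hypothetical): 6 SCOs on scafA, 3 on scafB, then 5 on scafA.
toy = (
    [("1", 1000 * i, "scafA") for i in range(6)]
    + [("1", 1000 * i, "scafB") for i in range(6, 9)]
    + [("1", 1000 * i, "scafA") for i in range(9, 14)]
)
blocks = syntenic_blocks(toy)
```

With this toy input, the three-SCO run on `scafB` falls below the threshold and is dropped, while the two `scafA` runs are reported as separate blocks.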
Key metrics of the Schistosoma haematobium Shae.V3 assembly and comparison with assemblies for other key schistosome species.
| Metric | |||||
|---|---|---|---|---|---|
| N50 | 48,328,128 | 4,779,868 | 50,458,499 | 202,989 | 1,093,989 |
| L50 | 3 | 26 | 3 | 498 | 94 |
| N90 | 22,148,653 | 1,076,958 | 24,989,083 | 30,057 | 238,898 |
| L90 | 7 | 88 | 7 | 2299 | 348 |
| Longest scaffold | 93,306,550 | 14,276,808 | 88,881,357 | 1,115,616 | 6,264,197 |
| Shortest scaffold | 2000 | 518 | 1307 | 2009 | 1019 |
| Number of scaffolds | 163 | 666 | 320 | 4774 | 1789 |
| Genome size | 400,271,889 | 371,394,055 | 409,579,008 | 373,478,075 | 369,900,518 |
| Number of Ns | 23,062 (0.01%) | 951,002 (0.26%) | 9,332,694 (2.28%) | 12,677,721 (3.39%) | 26,673 (0.01%) |
| Number of gaps | 45 | 3128 | 282 | 16,814 | 319 |
| Repeat content (%) | 54.3795 | 53.39 | 49.23 | 50.9114 | 46.87 |
| GC content (%) | 35.2 | 34.4 | 34.7 | 33.2 | 33.8 |
| Complete BUSCOs | 211 (82.7%) | 195 (76.5%) | 216 (84.7%) | 203 (79.6%) | 201 (78.8%) |
| Complete and single-copy BUSCOs | 208 (81.6%) | 193 (75.7%) | 211 (82.7%) | 198 (77.6%) | 200 (78.4%) |
| Complete and duplicated BUSCOs | 3 (1.2%) | 2 (0.8%) | 5 (2.0%) | 5 (2.0%) | 1 (0.4%) |
| Fragmented BUSCOs | 22 (8.6%) | 32 (12.5%) | 13 (5.1%) | 26 (10.2%) | 20 (7.8%) |
| Missing BUSCOs | 22 (8.6%) | 28 (11.0%) | 26 (10.2%) | 26 (10.2%) | 34 (13.3%) |
a Number of Benchmarking Universal Single-Copy Orthologs (BUSCOs) identified (genome mode), and percentage of the 255 genes within the Eukaryota data set.
b NCBI accession numbers: PRJEA36577, PRJNA520774 and PRJNA451066. Data sets were obtained from WormBase Parasite (release WBPS15).
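For readers unfamiliar with the contiguity metrics in this table, the following sketch shows how N50/L50 and N90/L90 are conventionally derived from a list of scaffold lengths: scaffolds are sorted from longest to shortest, and Nx is the length of the scaffold at which the running total first reaches x% of the assembly, with Lx being the number of scaffolds needed to get there. The function name and the toy lengths are hypothetical and are not taken from any assembly in the table.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of scaffold lengths and an integer percent."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count


# Toy scaffold lengths (hypothetical); total length is 300.
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy, 50)  # running total reaches 150 at the 2nd scaffold
n90, l90 = nx_lx(toy, 90)  # running total reaches 270 at the 5th scaffold
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is the pattern the table uses to contrast chromosome-level assemblies with draft assemblies.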
Features of the gene and protein sets for S. haematobium V3, V2 and other key schistosome species
| Feature | |||||
|---|---|---|---|---|---|
| Number of genes/mRNA | 9431/14,700 | 9314/9314 | 10,172/14,499 | 11,576/11,576 | 10,089/16,936 |
| Gene length (bp) | 23,252 ± 25,748 | 18,333 ± 20,681 | 21,682 ± 24,112 | 12,618 ± 16,045 | 18,366 ± 21,336 |
| mRNA length (bp) | 3892 ± 3651 | 2195 ± 1978 | 2794 ± 2266 | 1458 ± 1501 | 2578 ± 2068 |
| Coding domain length (bp) | 1600 ± 1659 | 2004 ± 1881 | 1775 ± 1895 | 1458 ± 1501 | 1537 ± 1498 |
| Exon length (bp) | 487 ± 1118 | 263 ± 343 | 320 ± 468 | 259 ± 314 | 333 ± 540 |
| Protein length (aa) | 532 ± 553 | 666 ± 625 | 591 ± 632 | 485 ± 500 | 512 ± 499 |
| Number of 5’ UTRs | 12,563 | 3097 | 14,157 | n/a | 12,421 |
| Number of 3’ UTRs | 12,888 | 2935 | 14,171 | n/a | 12,503 |
| Complete BUSCOs | 736 (77.1%) | 639 (67.0%) | 752 (78.8%) | 577 (60.5%) | 688 (72.1%) |
| Complete and single-copy BUSCOs | 582 (61.0%) | 628 (65.8%) | 607 (63.6%) | 548 (57.4%) | 386 (40.5%) |
| Complete and duplicated BUSCOs | 154 (16.1%) | 11 (1.2%) | 145 (15.2%) | 29 (3.0%) | 302 (31.7%) |
| Fragmented BUSCOs | 26 (2.7%) | 53 (5.6%) | 24 (2.5%) | 114 (11.9%) | 43 (4.5%) |
| Missing BUSCOs | 192 (20.1%) | 262 (27.5%) | 178 (18.7%) | 263 (27.6%) | 223 (23.4%) |
a Lengths presented as mean ± standard deviation.
b Number of Benchmarking Universal Single-Copy Orthologs (BUSCOs) identified (protein mode), and percentage of the 954 genes for the Metazoa data set.
c NCBI accession numbers: PRJEA36577, PRJNA520774 and PRJNA451066. Data sets were obtained from WormBase Parasite (release WBPS15).
d not available.
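The BUSCO rows in this table are related by simple arithmetic: complete BUSCOs are the sum of single-copy and duplicated ones, and complete, fragmented and missing BUSCOs together make up the 954 genes of the Metazoa data set. The sketch below checks this using the counts from the first data column of the table; the function name and dictionary keys are hypothetical.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Derive BUSCO totals and percentages (one decimal place) from raw counts."""
    complete = single + duplicated
    total = complete + fragmented + missing
    counts = {
        "complete": complete,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}
    return total, counts, percentages


# Counts from the first data column of the table (protein mode).
total, counts, pct = busco_summary(
    single=582, duplicated=154, fragmented=26, missing=192
)
```

The same check applies to the genome-mode table, where the three categories add up to the 255 genes of the Eukaryota data set.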
Fig 2. Analysis of single nucleotide polymorphisms (SNPs) of four individual male Schistosoma haematobium worms from distinct geographic locations.
A Intersections of unique or shared, fixed SNPs within the predicted coding regions for isolates from Zambia, Senegal, Mauritius or Mali. Total numbers of SNPs within individual samples are indicated by distinctly-coloured bars (bottom left). B For all samples, density and localisation of SNPs in the S. haematobium reference genome are shown as histograms in the same colour. Gene densities are shown in a histogram on the innermost track, divided into 1 Mb sections along each chromosome. For each sample, SNP-rich regions of which > 20% resembled a genomic reference other than S. haematobium are labelled (i-xi), and the distribution of matches against the genomes of other schistosome species is displayed as a pie chart.
Summary of the single nucleotide polymorphisms (SNPs) predicted in four representative Schistosoma haematobium males from Zambia, Senegal, Mauritius or Mali.
| Geographic location | Total SNPs | Fixed SNPs (GN = 1/1) | Fixed SNPs in protein-coding regions | Unique, fixed SNPs in |
|---|---|---|---|---|
| Zambia | 1,415,223 | 771,957 | 47,491 | 11,651 |
| Senegal | 1,617,886 | 696,405 | 42,238 | 11,600 |
| Mauritius | 1,539,711 | 603,944 | 37,754 | 4568 |
| Mali | 2,081,064 | 613,253 | 36,854 | 13,516 |
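The distinction between unique and shared fixed SNPs (Fig 2A and the last column of this table) can be sketched with plain set operations. The example assumes each fixed SNP is represented as a `(chromosome, position, alternate allele)` tuple and counts, for every combination of samples, the SNPs found in exactly that combination. The function name, tuple layout and toy variants are hypothetical and do not come from the study's data.

```python
from collections import Counter


def exclusive_intersections(snps_by_sample):
    """Count SNPs by the exact set of samples in which they occur."""
    membership = {}
    for sample, snps in snps_by_sample.items():
        for snp in snps:
            membership.setdefault(snp, set()).add(sample)
    # Each SNP contributes to exactly one combination of samples.
    return Counter(frozenset(samples) for samples in membership.values())


# Toy fixed SNPs (hypothetical), keyed by geographic location.
toy = {
    "Zambia": {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")},
    "Senegal": {("chr1", 100, "A"), ("chr3", 75, "C")},
    "Mauritius": {("chr1", 100, "A"), ("chr1", 250, "T")},
    "Mali": {("chr1", 100, "A"), ("chrZW", 9, "G")},
}
combos = exclusive_intersections(toy)
unique_to_zambia = combos[frozenset({"Zambia"})]
shared_by_all = combos[frozenset(toy)]
```

Counting exclusive combinations, rather than pairwise overlaps, ensures that every SNP is tallied once, so the per-combination counts sum to the number of distinct SNPs across all samples.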
Summary of transcription levels across seven key developmental stages of Schistosoma haematobium.
| Developmental stage | Number of transcribed genes (%) | Number of transcribed isoforms (%) | Average (mean) number of transcribed isoforms per gene | Median TPM | Median TPM of top 1% transcribed isoforms | Key protein/pathway functions for top 1% transcribed isoforms |
|---|---|---|---|---|---|---|
| Egg (from urine) | 8153 (86.4) | 11,106 (75.6) | 1.4 | 15.0 | 1790 | Translation; RNA transport; ribosomal proteins |
| Egg (from hamster) | 7446 (79.0) | 9584 (65.2) | 1.3 | 9.95 | 1592 | Ubiquitin; protein folding, sorting and degradation; RNA transport; ribosomal proteins |
| Sporocyst | 6506 (69.0) | 7990 (54.4) | 1.2 | 5.95 | 1894 | Cellular nucleic acid-binding protein; RNA transport; ribosomal proteins |
| Cercaria | 7202 (76.4) | 9280 (63.1) | 1.3 | 5.74 | 1607 | Calmodulin; cytochrome |
| Schistosomule | 7696 (81.6) | 10,657 (72.5) | 1.4 | 26.3 | 1391 | Peptidyl-prolyl isomerase; 14-3-3 protein beta; RNA transport, ribosomal proteins |
| Adult male | 8182 (86.8) | 11,935 (81.2) | 1.5 | 9.81 | 1702 | Glutathione S-transferase; peptidases/proteases |
| Adult female | 8112 (86.0) | 11,714 (79.7) | 1.4 | 7.79 | 2435 | Peptidases/proteases |
a TPM > 0.5
b transcripts per million
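As a reminder of how the TPM values and the TPM > 0.5 cut-off in this table fit together, the sketch below applies the standard TPM definition: read counts are divided by transcript length, and the resulting rates are rescaled so that they sum to one million within a sample. It assumes plain transcript lengths rather than effective lengths, which quantification tools may use instead; the function names and toy numbers are hypothetical.

```python
from statistics import median


def tpm(counts, lengths_bp):
    """Convert per-isoform read counts to transcripts per million (TPM)."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")
    return [rate / total * 1e6 for rate in rates]


def transcribed(tpm_values, threshold=0.5):
    """Flag isoforms considered transcribed (TPM strictly above the threshold)."""
    return [value > threshold for value in tpm_values]


# Toy sample (hypothetical): four isoforms with read counts and lengths in bp.
values = tpm(counts=[100, 200, 0, 700], lengths_bp=[1000, 2000, 1500, 7000])
flags = transcribed(values)
median_transcribed = median(v for v, keep in zip(values, flags) if keep)
```

Because the three expressed isoforms in this toy sample have the same count-to-length ratio, they receive identical TPM values even though their raw counts differ sevenfold.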
Fig 3. Analysis of transcription for key developmental stages of Schistosoma haematobium.
A Transcription profiles of transcript isoforms across seven developmental stages/sexes, clustered (Ward; k = 7) by similarity of Z-score-normalised TPM (transcripts per million) values. Key, enriched (q < 0.05) pathways and/or protein functions are shown to the left of each cluster. Numbers of molecules are given in parentheses. B Pairwise comparison of differential (DE; fold change (FC) > 2, false discovery rate (FDR) < 0.05) transcription between male (blue) and female (red) samples, displayed as a ‘volcano’ plot. Key pathways and/or protein functions enriched in DE subsets are highlighted. C Percentage of DE transcripts encoded on each chromosome/scaffold for males (blue) and females (red), respectively; chromosomes/scaffolds enriched (q < 0.05) for male or female DE genes are marked with an asterisk.
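Two of the numerical steps named in this caption can be sketched briefly: the per-isoform Z-score normalisation of TPM values across stages (panel A) and the thresholds used to call a transcript differentially transcribed (panel B). The sketch assumes the population standard deviation for the Z-score and expresses the fold-change cut-off on the log2 scale in either direction; both are assumptions, as are the function names. The Ward clustering itself is not shown.

```python
from math import log2
from statistics import mean, pstdev


def zscore_row(tpm_values):
    """Z-score-normalise one isoform's TPM values across stages."""
    mu = mean(tpm_values)
    sigma = pstdev(tpm_values)  # population SD (an assumption)
    if sigma == 0:
        # A flat profile carries no stage-specific signal.
        return [0.0] * len(tpm_values)
    return [(value - mu) / sigma for value in tpm_values]


def is_differential(log2_fc, fdr, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Apply the FC > 2 and FDR < 0.05 thresholds, in either direction."""
    return abs(log2_fc) > log2(fc_cutoff) and fdr < fdr_cutoff


# Toy profile (hypothetical) for one isoform across three stages.
z = zscore_row([1.0, 2.0, 3.0])
calls = [
    is_differential(1.5, 0.01),   # large change, significant
    is_differential(0.5, 0.01),   # change too small
    is_differential(-2.0, 0.20),  # not significant
    is_differential(-1.2, 0.04),  # large change in the other direction
]
```

Normalising each isoform separately puts highly and lowly transcribed isoforms on the same scale, so that clustering groups them by the shape of their profile across stages rather than by absolute abundance.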
Fig 4. Long-read, full-length transcripts supporting differential isoform usage in male and female Schistosoma haematobium.
The gene model MS3_00004678 encodes a small GTPase on chromosome ZW. Exons are depicted as blocks and introns as arrowed lines, indicating the coding strand. Reference transcripts are shown at the bottom in red (female; MS3_00004678.7, transcription cluster 7) and blue (male; MS3_00004678.1; transcription cluster 6) with narrow blocks at the end of the gene models representing untranslated regions (UTRs). Full-length, long-read transcripts that matched the intron-exon structure of the isoforms inferred to be transcribed in the male and female adult stage, respectively, are coloured accordingly. Transcripts that support distinct, alternative exon-intron boundaries are shown in black.