Bernardo J Foth1, Isheng J Tsai1,2, Adam J Reid1, Allison J Bancroft3, Sarah Nichol1, Alan Tracey1, Nancy Holroyd1, James A Cotton1, Eleanor J Stanley1, Magdalena Zarowiecki1, Jimmy Z Liu4, Thomas Huckvale1, Philip J Cooper5,6, Richard K Grencis3, Matthew Berriman1.
Abstract
Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. We report here the whole-genome sequences of the human-infective Trichuris trichiura and the mouse laboratory model Trichuris muris. On the basis of whole-transcriptome analyses, we identify many genes that are expressed in a sex- or life stage-specific manner and characterize the transcriptional landscape of a morphological region with unique biological adaptations, namely, the bacillary band and stichosome, found only in whipworms and related parasites. Using RNA sequencing data from whipworm-infected mice, we describe the regulated T helper 1 (TH1)-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identified numerous new potential drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection.
Year: 2014 PMID: 24929830 PMCID: PMC5012510 DOI: 10.1038/ng.3010
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Genome and gene statistics for nine nematodes
| Nematode clade | I | I | I | I | III | III | III | IV | IV | V |
|---|---|---|---|---|---|---|---|---|---|---|
| Haploid chromosome number | NA | 3 | 3 | 3 | 12 | 6 | 6 | 16 | 6 | 6 |
| Genome assembly size (Mb) | 75.18 | 85.00 | 89.31^e | 61.15 | 266.07 | 94.12 | 89.7 | 52.96 | 73.10 | 100.29 |
| Number of scaffolds | 3,711 | 1,123 | 1,069 | 3,853 | 2,414 | 9,805 | 3,452 | 3,389 | 3,555 | 7 |
| N50 scaffolds (kb) | 71.2 | 1,580 | 4,834 | 7,554 | 419.1 | 191.1 | 177.1 | 37.6 | 988.2 | 17,494 |
| N50 scaffolds (n) | 263 | 15 | 6 | 3 | 171 | 62 | 126 | 372 | 21 | 3 |
| Longest scaffold (kb) | 533.8 | 7,990 | 17,505 | 12,041 | 3,795 | 5,236 | 1,325 | 360.4 | 3,612 | 20,924 |
| Mean scaffold length (kb) | 20.3 | 75.7 | 83.5 | 15.9 | 110.2 | 9.6 | 26.0 | 15.6 | 20.6 | 14,327 |
| Gaps, combined length (kb) | 14.5 | 227.9 | 4,542 | 4,987 | 7,429 | 7,759 | 3,847 | 0 | 1,476 | 0 |
| Number of genes | 9,650 | 11,004 | 16,380 | 15,260 | 14,219 | 14,907 | 14,420 | 18,074 | 20,501 | |
| Mean protein length (aa) | 435 | 416 | 318 | 396 | 320 | 334 | 348 | 344 | 404 | |
| Median protein length (aa) | 329 | 290 | 192 | 288 | 209 | 213 | 250 | 262 | 329 | |
| Number of coding exons | 55,156 | 83,458 | 87,853 | 121,416 | 81,154 | 98,974 | 87,957 | 81,552 | 123,608 | |
| Coding exons, combined length (Mb) | 12.59 | 13.74 | 15.65 | 18.18 | 13.64 | 14.07 | 15.08 | 18.72 | 24.91 | |
| Mean number of coding exons per gene | 5.7 | 7.6 | 5.4 | 8.0 | 5.7 | 6.6 | 6.1 | 4.5 | 6.0 | |
| Mean coding exon length (bp) | 228.3 | 164.6 | 178.1 | 149.7 | 143.1 | 149.2 | 171.4 | 229.5 | 201.5 | |
| Median coding exon length (bp) | 148 | 117 | 129 | 136 | 131 | 136 | 144 | 184 | 146 | |
| Number of introns | 45,506 | 72,454 | 71,473 | 106,156 | 79,886 | 92,154 | 73,537 | 63,478 | 103,107 | |
| Mean intron length (bp) | 283.8 | 434.2 | 197.5 | 1,031 | 626.6 | 669.4 | 153.1 | 153.0 | 201.5 | |
| Median intron length (bp) | 57 | 58 | 83 | 726 | 246 | 269 | 53 | 69 | 146 | |
All genome statistics are based on scaffolds, filtered for a minimum scaffold length of 1,000 bp. NA, not applicable.
^a Based on T. trichiura genome assembly v2 and gene set v2.2; this study.
^b Based on T. muris genome assembly v3.0 and gene set v2.3; this study.
^c Based on T. muris genome assembly v4.0, which includes superscaffolds that were joined on the basis of optical map data; this study.
^d Based on genome assembly and gene set release WS236 (WormBase).
^e The size shown is directly measured from the assembly. After taking into account the contribution of collapsed repeats, the true size of the haploid female genome is estimated to be 106 Mb (Supplementary Note).
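The N50 rows of the genome statistics table follow the usual assembly convention. As a minimal sketch, assuming the common definition (sort scaffolds longest first and stop when their cumulative length first reaches at least half the total), the N50 length and the number of scaffolds needed to reach it could be computed after applying the same 1,000-bp scaffold filter as follows. The function name `n50` is illustrative; assemblers differ slightly in tie and threshold handling.

```python
from typing import Iterable, Tuple


def n50(lengths: Iterable[int], min_length: int = 1_000) -> Tuple[int, int]:
    """Return (N50 length, N50 count) for a collection of scaffold lengths.

    Scaffolds shorter than `min_length` are discarded first, mirroring the
    1,000-bp filter described in the table note.
    """
    kept = sorted((x for x in lengths if x >= min_length), reverse=True)
    if not kept:
        raise ValueError("no scaffolds pass the length filter")
    half = sum(kept) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")
```

For example, `n50([100, 200, 300, 400], min_length=0)` returns `(300, 2)`: the two longest scaffolds (400 + 300) are the first to cover half of the 1,000-bp total.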
Figure 1. Genome structure and synteny in T. spiralis and Trichuris species.
(a) Mapping one-to-one gene orthologs (as determined by InParanoid) between the genome scaffolds of T. spiralis and T. muris (genome assembly v4), followed by clustering of the resulting ortholog pattern, identifies three large linkage groups in each genome (left; Supplementary Note). The table lists the number of one-to-one gene orthologs for individual comparisons between the 11 longest scaffolds of T. spiralis and the 17 longest scaffolds of T. muris, which represent 85.2% and 89.0% of the respective genomes. The distribution of one-to-one gene orthologs reveals high-level synteny across genera between T. muris and T. spiralis (right). The linkage group shown in yellow is putatively identified as the sex-specific X chromosome. (b) Median relative coverage of high-throughput sequencing reads was derived for a pool of 11 female T. muris parasites and for single male parasites (T. muris and T. trichiura), calculated per 10-kb window across all genome scaffolds assigned to one of the three linkage groups. In females, mapped read coverage is even across all three linkage groups, whereas in males it follows a bimodal distribution, and scaffolds separate cleanly into the two coverage peaks according to their linkage group: scaffolds of linkage group X have half the median read coverage of scaffolds in linkage groups 1 and 2. (c) Levels of heterozygosity correlate strongly with linkage-group membership. Both the 0.5-fold relative read coverage and the very low apparent heterozygosity of linkage group X are consistent with these scaffolds representing the sex-specific X chromosome, which is expected to be present in a single copy in the diploid genome of a male Trichuris parasite.
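The coverage test in panel b rests on simple arithmetic: in a male, a single-copy X chromosome yields roughly half the read depth of the two diploid autosomal linkage groups. The sketch below illustrates the idea under stated assumptions: per-base depths are already available, they are averaged into 10-kb windows, each scaffold's median window depth is divided by the genome-wide median window depth, and scaffolds near 0.5 are flagged. The normalization choice, the 0.15 tolerance and all function names are hypothetical illustrations, not the procedure from the Supplementary Note.

```python
import statistics
from typing import Dict, List


def window_means(per_base_depth: List[float], window: int = 10_000) -> List[float]:
    """Average per-base read depth in consecutive windows (the last may be shorter)."""
    return [
        statistics.fmean(per_base_depth[i:i + window])
        for i in range(0, len(per_base_depth), window)
    ]


def relative_coverage(window_depths: Dict[str, List[float]]) -> Dict[str, float]:
    """Median window depth per scaffold divided by the genome-wide median window depth."""
    all_windows = [d for depths in window_depths.values() for d in depths]
    genome_median = statistics.median(all_windows)
    return {
        scaffold: statistics.median(depths) / genome_median
        for scaffold, depths in window_depths.items()
    }


def looks_single_copy(rel_cov: float, tolerance: float = 0.15) -> bool:
    """Hypothetical rule: single-copy (X-linked, in a male) scaffolds sit near 0.5x."""
    return abs(rel_cov - 0.5) <= tolerance
```

Normalizing by the genome-wide median works here because, in a male, the autosomal linkage groups make up most windows, so the overall median falls in the diploid peak.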
In box plots, the center line is the median, the box spans the 25th to 75th percentiles (interquartile range, IQR), the upper whisker extends to the highest data point below the 75th percentile + 1.5 × IQR, and the lower whisker extends to the lowest data point above the 25th percentile − 1.5 × IQR.
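The whisker convention is the standard Tukey rule. A short sketch, assuming quartiles are computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`, which corresponds to R's default quantile type 7); the legend does not state which quantile method was used, and the function name is illustrative. Points exactly on a fence are treated as inside it here.

```python
import statistics
from typing import List, Tuple


def tukey_whiskers(values: List[float]) -> Tuple[float, float]:
    """Return (lower, upper) whisker ends for a Tukey-style box plot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # note the minus sign for the lower fence
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed values inside the fences;
    # anything beyond them would be drawn as an outlier.
    lower = min(v for v in values if v >= lower_fence)
    upper = max(v for v in values if v <= upper_fence)
    return lower, upper
```

With `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]` the quartiles are 3.25 and 7.75, the fences are −3.5 and 14.5, and the whiskers end at 1 and 9, leaving 100 as an outlier.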
Figure 2. Comparative genomics of Trichuris species.
(a) Phylogenetic analysis of genome content. The tree shown is a maximum-likelihood phylogeny based on a concatenated alignment of single-copy orthologs. Values on edges represent the inferred numbers of births (+) and deaths (−) of gene families along that edge. The scale bar shows expected number of amino acid substitutions per residue. Pie charts represent the gene family composition of each genome: the area of the circle is proportional to the predicted proteome size, and wedges represent the numbers of proteins predicted to be either singletons (not members of any gene family; yellow), members of families common to the eight genomes (red), members of gene families present only in a single genome (blue) and members of all other gene families (green). (b) Euler diagrams of shared presence and absence of gene families in clade I nematodes and the model nematode C. elegans. Note that the approach in a cannot distinguish the polarity of changes at the base of the tree; thus, for example, the value of 262 gene family gains on the basal branch will include gene families lost on the branch leading to Homo sapiens.
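The four pie-chart wedges in panel a follow a simple rule on gene-family membership. A minimal sketch, assuming family assignments are already available as dictionaries; the data layout and function names are hypothetical, and the clustering used to define the families is not reproduced here.

```python
from collections import Counter
from typing import Dict, Optional, Set


def wedge(family_genomes: Optional[Set[str]], all_genomes: Set[str]) -> str:
    """Assign one protein to a pie-chart wedge from the genomes its family spans."""
    if family_genomes is None:
        return "singleton"        # not a member of any gene family (yellow)
    if family_genomes == all_genomes:
        return "core"             # family common to every genome (red)
    if len(family_genomes) == 1:
        return "genome-specific"  # family present only in a single genome (blue)
    return "other"                # all other gene families (green)


def pie_counts(
    proteins: Dict[str, Optional[str]],
    family_members: Dict[str, Set[str]],
    all_genomes: Set[str],
) -> Counter:
    """Tally wedge sizes for one proteome.

    `proteins` maps protein ID -> family ID (None for singletons);
    `family_members` maps family ID -> set of genomes containing that family.
    """
    return Counter(
        wedge(family_members[fam] if fam is not None else None, all_genomes)
        for fam in proteins.values()
    )
```

The total of the returned counts equals the proteome size, which is what sets the area of each circle.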
Figure 3. Expression and structural characteristics of WAP domain–containing proteins of T. muris.
(a) Normalized transcript levels of the 44 genes encoding WAP domain–containing proteins in T. muris, comparing the parasite anterior region with the posterior regions of adult female (F) and male (M) parasites. Significant transcriptional upregulation in a given pairwise comparison (Up) is denoted by one asterisk for false discovery rate (FDR) ≤ 0.01 and FDR > 1 × 10⁻⁵ and by two asterisks for FDR ≤ 1 × 10⁻⁵. SP, signal peptide; WAP, whey acidic protein (InterPro, IPR008197); WR1, cysteine-rich repeat (IPR006150); TIL, trypsin inhibitor–like (IPR002919). For a full version of this figure, see Supplementary Figure 4a. (b) Sequence logos show the conserved and distinct sequence characteristics of the WAP domains (InterPro, IPR008197) found in proteins from H. sapiens, T. trichiura and T. muris. The four canonical disulfide bonds formed by eight cysteine residues are highlighted at the top of the sequence logo for human WAP domains. The sequence logos representing the different species are aligned around the central CXXDXXC motif (where X is any amino acid). For a full version of this figure, see Supplementary Figure 5.
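Locating the central CXXDXXC motif used to align the sequence logos is a plain pattern search. A sketch with Python's `re` module, using a lookahead so that overlapping occurrences are all reported; the function name is illustrative, and because `.` accepts any character, the input is assumed to be a clean one-letter amino acid sequence.

```python
import re
from typing import List, Tuple

# "." stands for X (any residue); the lookahead lets overlapping motifs match.
CXXDXXC = re.compile(r"(?=(C..D..C))")


def find_cxxdxxc(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched motif) for every CXXDXXC occurrence."""
    return [(m.start() + 1, m.group(1)) for m in CXXDXXC.finditer(protein.upper())]
```

For instance, `find_cxxdxxc("MKCPQDKLCAA")` reports a single motif, `CPQDKLC`, starting at residue 3.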
Figure 4. Expression and phylogenetic analysis of DNase II–like proteins from Trichuris species.
(a) Normalized transcript levels of the 18 genes encoding DNase II domain–containing proteins in T. muris, comparing the parasite anterior region with the posterior regions of adult female and male parasites. Significant transcriptional upregulation in a given pairwise comparison (Up) is denoted by one asterisk for FDR ≤ 0.01 and FDR > 1 × 10⁻⁵ and by two asterisks for FDR ≤ 1 × 10⁻⁵. For a full version of this figure, see Supplementary Figure 4b. (b) A maximum-likelihood phylogeny of DNase II protein domains (IPR004947) shows the relationships between the DNase II domains of proteins from Trichuris species, T. spiralis, other nematodes, insects and other invertebrates, and vertebrates. See Supplementary Figure 6 for a fully annotated version of this tree. The circled numbers highlight individual sequences of particular interest: (1) TMUE_s0085001500 1–358 (T. muris), TTRE_0000372701 1–317 (T. trichiura) and E5SXW8_TRISP 8–361 (UniProt E5SXW8); (2) TMUE_s0015002900 1–191 and TTRE_0000937801 31–255; and (3) E5S4S7_TRISP 14–306 (UniProt E5S4S7).
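The asterisk notation in the expression panels of Figures 3 and 4 sorts each FDR value into one of three bins. The sketch below assumes the FDR values are Benjamini-Hochberg-adjusted p-values (the default `adjust.method` of edgeR's `topTags`; the legends do not name the correction) and then applies the two thresholds. Function names are illustrative.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_stars(fdr: float) -> str:
    """'**' if FDR <= 1e-5, '*' if 1e-5 < FDR <= 0.01, otherwise ''."""
    if fdr <= 1e-5:
        return "**"
    if fdr <= 0.01:
        return "*"
    return ""
```

Checking the stricter threshold first means every value gets exactly one label, so the one-asterisk bin never overlaps the two-asterisk bin.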
Highly ranked targets with available approved drugs as chemical leads
| Gene ID | Gene annotation | Drugs |
|---|---|---|
| 016011100 | Fatty acid synthase | Cerulenin^a, orlistat^b |
| 022000400 | Na⁺,K⁺-ATPase α subunit 1 | Digitoxin^c, almitrine^d, bepridil^e, bretylium^f, diazoxide^g, ethacrynic acid^h, hydroflumethiazide^h, others* |
| 186000500 | Serine/threonine-protein kinase mTOR | Everolimus^i, pimecrolimus^j, sirolimus^i, temsirolimus^k, topotecan^k |
| 016005900 | DNA topoisomerase 1* | Irinotecan^k, lucanthone^l, sodium stibogluconate^m |
| 217000500 | Receptor-type tyrosine phosphatase | Alendronate^n, etidronic acid^n |
| 010006100 | Calmodulin | Aprindine^f, bepridil^e, dibucaine^o, felodipine^p, fluphenazine^q, loperamide^r, phenoxybenzamine^s, others* |
| 069002800 | Ribonucleoside-diphosphate reductase subunit | Cladribine^k, gallium nitrate^n |
| 117003600 | Adenosine deaminase | Dipyridamole^t, nelarabine^k, theophylline^u, vidarabine^v |
| 023008600 | Dual-specificity mitogen-activated protein | Bosutinib^k, trametinib^k |
| 029000100 | LDL receptor and EGF-domain-containing protein | Gentamicin^w |
| 170000300 | Integrin α pat-2 | Antithymocyte globulin^x |
| 123000500 | Tyrosine-protein kinase Src42A | Dasatinib^k |
| 013002700 | NADPH–cytochrome P450 reductase | Benzphetamine^y, daunorubicin^k, ethylmorphine^z, nitrofurantoin^w, others* |
| 061001100 | DNA polymerase ɛ catalytic subunit A | Cladribine^k |
| 025003000 | Tubulin γ1 chain | Vinblastine^k |
| 092002800 | DNA ligase 1 | Bleomycin^aa |
| 022003300 | Amidophosphoribosyltransferase | Fluorouracil^k |
| 229000900 | V-type proton ATPase subunit A | Alendronate^n, others* |
| 012007400 | V-type proton ATPase subunit B | Gallium nitrate^n |
| 031000300 | ADP,ATP carrier protein | Clodronate^n |
| 304000500 | Phenylalanine-4-hydroxylase | Droxidopa^bb |
| 001008300 | Proteasome subunit β type 2 | Bortezomib^k, carfilzomib^k |
| 076002300 | Proteasome subunit β type 5 | Bortezomib^k, carfilzomib^k |
| 064003100 | DNA polymerase α catalytic subunit | Cladribine^k, others* |
| 225000200 | Short/branched chain–specific acyl-CoA | Valproic acid^cc |
| 106001600 | Histone deacetylase | Aminophylline^dd, lovastatin^ee, vorinostat^k, others* |
| 050001800 | Cytochrome | Minocycline^w |
| 060007700 | Proteasome subunit β type 1 | Bortezomib^k, carfilzomib^k |
| 016000600 | NADH dehydrogenase [ubiquinone] Fe-S protein | Doxorubicin^k |
Listings of approved drugs were obtained from DrugBank[51]. Drug targets have been filtered for predicted essentiality in C. elegans, a ChEMBL Ensembl score of ≥0.5, transcript expression of >100 edgeR-normalized RNA-seq reads per kilobase of gene length in adult parasites and inferred availability of an approved and possibly suitable drug (excluding, for example, hydroxocobalamin (vitamin B12); Supplementary Tables 13 and 14). Drug purposes are indicated by footnotes. Asterisks indicate other drugs (with similar activities to those listed), which are not shown.
^a Antifungal.
^b Anti-obesity.
^c Cardiac glycoside (heart failure and arrhythmia).
^d Respiratory stimulant.
^e Angina.
^f Antiarrhythmia.
^g Vasodilator (hypertension).
^h Diuretic (hypertension and edema).
^i Immunosuppressant.
^j Eczema.
^k Cancer.
^l Schistosomicide.
^m Leishmaniasis.
^n Osteoporosis.
^o Local anesthetic.
^p Calcium channel blockers (hypertension, angina, migraine).
^q Antipsychotic (schizophrenia).
^r Diarrhea.
^s α-adrenergic antagonist.
^t Vasodilator (stroke, heart attack).
^u Asthma.
^v Antiviral.
^w Antibiotic.
^x Organ transplant.
^y Anorectic.
^z Analgesic.
^aa Plantar wart (verruca).
^bb Neurotransmitter prodrug (hypotension).
^cc Anticonvulsant and mood stabilizer.
^dd Bronchodilator.
^ee Statin (hypercholesterolemia).
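The target-prioritization criteria in the note to the drug table combine into a single predicate. The sketch below is hypothetical: the `Candidate` fields are invented stand-ins for the inputs the note names (essentiality predicted from C. elegans, the ChEMBL-derived score, edgeR-normalized read counts in adult parasites, gene length and approved-drug availability), and reads per kilobase is taken as normalized reads divided by gene length in kilobases.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one parasite gene; field names are illustrative."""
    gene_id: str
    essential_in_c_elegans: bool  # predicted from the C. elegans ortholog
    chembl_score: float           # ChEMBL-derived target score
    adult_reads: float            # edgeR-normalized RNA-seq reads, adult parasites
    gene_length_bp: int
    has_approved_drug: bool       # an approved, possibly suitable drug exists


def reads_per_kb(c: Candidate) -> float:
    """Normalized reads divided by gene length in kilobases."""
    return c.adult_reads / (c.gene_length_bp / 1_000)


def passes_filters(c: Candidate) -> bool:
    """Apply the four criteria from the table note (score >= 0.5, expression > 100)."""
    return (
        c.essential_in_c_elegans
        and c.chembl_score >= 0.5
        and reads_per_kb(c) > 100
        and c.has_approved_drug
    )
```

Note the asymmetric bounds taken from the note: the score threshold is inclusive (≥0.5) while the expression threshold is strict (>100).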