
Draft Genome of the Liver Fluke Fasciola gigantica.

Tripti Pandey¹, Arpita Ghosh², Vivek N Todur², Vijayakumar Rajendran¹, Parismita Kalita¹, Jupitara Kalita¹, Rohit Shukla¹, Purna B Chetri¹, Harish Shukla¹, Amit Sonkar¹, Denzelle Lee Lyngdoh¹, Radhika Singh¹, Heena Khan¹, Joplin Nongkhlaw¹, Kanhu Charan Das¹, Timir Tripathi¹.

Abstract

Fascioliasis, a neglected foodborne disease caused by liver flukes (genus Fasciola), affects more than 200 million people worldwide. Despite technological advances, little is known about the molecular biology and biochemistry of these flukes. We present the draft genome of Fasciola gigantica for the first time. The assembled draft genome has a size of ∼1.04 Gb with an N50 and N90 of 129 and 149 kb, respectively. A total of 20 858 genes were predicted. De novo repeats accounted for 46.85% of the draft genome. Pathway analysis identified all of the genes of glycolysis, the Krebs cycle, and fatty acid metabolism, but the key genes of the fatty acid biosynthesis pathway were lacking. This indicates that the fatty acids required for survival of the fluke may be acquired from the host bile. It may be hypothesized that the relatively larger F. gigantica genome did not evolve through genome duplications but rather is interspersed with many repetitive elements. The genomic information will provide a comprehensive resource to facilitate the development of novel interventions for fascioliasis control.


Year: 2020 PMID: 32455229 PMCID: PMC7241025 DOI: 10.1021/acsomega.0c00980

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Fascioliasis, caused by trematodes of the genus Fasciola, is an important foodborne parasitic disease belonging to the group of neglected tropical diseases (NTDs) defined by the WHO.[1] Fasciola hepatica and/or Fasciola gigantica infection is prevalent in over 600 million domestic ruminants worldwide (cattle, sheep, pigs, donkeys, buffaloes, and goats), causing major economic losses of about US$3 billion per annum.[2] Fascioliasis has a remarkable latitudinal, longitudinal, and altitudinal distribution due to its ability to adapt to different environments and habitats, including extreme climatic conditions. F. gigantica is found in the tropical regions of Africa, Asia, and the Middle East, where it affects 25–100% of total cattle populations. It is also prevalent in the livestock populations of India, Pakistan, Indonesia, Indochina, and the Philippines. In addition, fascioliasis has been reported in the human population in 51 different countries from five continents; this indicates the geographical expansion of the problem.[3−6] It has affected 2.4–17 million people and has put approximately 180 million people at risk globally.[7−10] The major human fascioliasis endemic areas include Africa, Europe, the Middle East (including Egypt), Southeast Asia, and Latin America; the highest prevalence, at 72–100%, is observed in the Bolivian Altiplano.[11,12] Interestingly, the parasite is better adapted to human hosts in hyperendemic areas.[3] Most cases of human fascioliasis are attributed to F. hepatica,[3,6,11,12] although a few reports of F. gigantica causing human infection are available.[13−15] The adult F. gigantica is hermaphroditic and is capable of self-fertilization. The life cycle of Fasciola involves an intermediate host snail of the family Lymnaeidae and a mammalian definitive host. Infection starts with the ingestion of food contaminated with the larval stage of F. gigantica, i.e., metacercariae, which are found floating freely in fresh water or attached to water plants.
The metacercariae excyst in the duodenum of the mammalian host and then migrate to the liver through the intestinal wall; the adults mature in the biliary ducts. The eggs are passed into the intestine and then excreted in the feces.[3] When the young flukes migrate through the liver, they cause clinical symptoms, such as abdominal pain, weight loss, fever, nausea, vomiting, hepatomegaly, hepatic tenderness, and eosinophilia. The infection causes extensive damage to the liver and may lead to portal cirrhosis. Long-term infection by Fasciola results in chronic stimulation of the bile duct epithelium due to the excretory-secretory (ES) products released from parasites into the host bile environment.[16] These ES products have key roles in feeding behavior, detoxification of bile components, and immune evasion by liver flukes.[16] Transcriptome data sets for F. gigantica include substantial representation of ES products, suggesting a role in the infection mechanism of this parasite.[17] The WHO has recommended triclabendazole, a benzimidazole compound, as the drug of choice for the treatment of fascioliasis as it is active against key parasite stages, i.e., early juvenile, juvenile, and adult stages. However, recent studies have suggested that F. hepatica has gained resistance to triclabendazole in several countries.[18−21] In principle, foodborne trematodes can be effectively controlled using multiple interventions implemented simultaneously across sectors. Recently, genomes from helminth flukes, including Schistosoma japonicum,[22] Schistosoma mansoni,[23] Schistosoma haematobium,[24,25] Opisthorchis viverrini,[26] Opisthorchis felineus,[27] Clonorchis sinensis,[28,29] and F. hepatica,[30,31] have been sequenced. While the present manuscript was under consideration, a genome of F. gigantica was also published;[32] however, our genome was first submitted and published as a preprint article (https://www.biorxiv.org/content/10.1101/451476v1.full).
These genome sequences shed light on how these organisms survive in the host environment and show their metabolic pathways for adapting to host conditions. The F. hepatica genome is one of the largest pathogen genomes sequenced to date.[33] The noncoding region of the F. hepatica genome was presumed to be involved in gene regulation, while the genome size was correlated to its complex life cycle and various developmental stages. The foodborne trematodes, including F. hepatica, are generally metabolically less constrained than schistosomes and cestodes.[34] The presence of endobacteria, Neorickettsia, that causes chronic illness in a variety of species, including humans, in the reproductive tissues and eggs of F. hepatica suggests a possible mechanism for vertical transmission to the mammalian host. However, its presence in the oral sucker, which helps the flukes to anchor to the biliary tract lining, further suggests a probable mechanism for horizontal transmission.[34] Here, we report the draft sequence, assembly, and analysis of the F. gigantica genome. It is one of the largest parasitic genomes to be sequenced. The genomic information provides a resource to facilitate the development of novel interventions for fascioliasis control.

Results and Discussion

De Novo Genome Assembly and Annotation

To avoid technical difficulties in assembly, genomic DNA was isolated from a single adult fluke, and one shotgun (Paired-end) sequencing library with an insert size of approximately 350 bp and one Mate-pair DNA library were constructed. The Paired-end and Mate-pair libraries were sequenced using HiSeq 2500 to generate 32.7 and 1.7 Gb of data, respectively. The raw reads were then quality-filtered and adapter-trimmed. The filtered high-quality reads were assembled using SOAPdenovo-v1.5.2. Gaps in this primary assembly were then filled with GapCloser using the Paired-end and Mate-pair reads. Further, SSPACE-v2.0 was used for scaffolding. The resultant assembly was used in Chromosomer-v0.1.4a for further improvement of the assembly. The assembled draft genome comprised 40 381 scaffolds with a genome size of 1.04 Gb (Table 1), which was similar to that of the F. hepatica genome and much larger than the genomes of other parasitic flukes (Table 2). The N50 and N90 values were 129 and 149 kb, respectively. A total of 16 465 scaffolds were larger than 10 kb, accounting for 978.97 Mb, or 94.11% of the genome assembly. The completeness of the genome, estimated using BUSCO2.0, was 51.3%, which consisted of 48.1% complete and single copy and 3.2% complete and duplicate copy; fragmented genes were estimated at 12.3%. In comparison, the BUSCO completeness of the published genomes of Platyhelminthes ranges from 20 to 73%, as reported in the WormBase database (http://parasite.wormbase.org/species.html#Platyhelminthes). The chromosome set of F. gigantica comprises 10 pairs of chromosomes, and the karyotype consists of the chromosomes with 2M, 4Sm, 3St, and 1T.[35]
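The N50 and N90 statistics reported above are computed from the list of scaffold lengths. The following is a minimal sketch under the conventional definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least the given fraction of the assembly). The function name `nx` and the toy lengths are illustrative assumptions and are not part of the original pipeline; conventions differ between tools, so values reported by a particular program may follow a different definition.

```python
def nx(lengths, fraction):
    """Return the Nx statistic, e.g. fraction=0.5 for N50, 0.9 for N90.

    Conventional definition: sort scaffolds from longest to shortest and
    return the length at which the running total first reaches
    `fraction` of the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Toy scaffold lengths (hypothetical, total = 300)
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_lengths, 0.5)  # running totals 100, 180 -> reaches 150 at 80
toy_n90 = nx(toy_lengths, 0.9)  # running totals ..., 240, 280 -> reaches 270 at 40
```

In practice the lengths would be read from the assembled scaffold FASTA file rather than typed in by hand.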

Table 1

Assembly Features

description	F. gigantica
genome assembly size	1 040 230 724 bp [1.04 Gb]
number of scaffolds	40 381
longest scaffold length	1 127 280 bp
average size of scaffolds	25 760 bp
number of genes	20 858
mean protein length	264 aa
number of coding exons	54 948
mean number of coding exons per gene	3
coding exons combined length	16 599 815 bp
number of introns	35 695
mean intron length	2612 bp

Table 2

Comparison of the Nuclear Genome Assemblies of F. gigantica and Related Parasitic Flukes

	F. gigantica (present work)	F. gigantica [32]	F. hepatica [31]	F. hepatica [30]	O. viverrini [26]	C. sinensis [28,29]	S. japonicum [22]	S. haematobium [24,25]	S. mansoni [23]
genome size	1.04 Gb	1.13 Gb	1.13 Gb	1.27 Gb	634.5 Mb	320.5 Mb	397 Mb	385 Mb	364 Mb
number of genes	20 858	13 940	14 851	22 676	16 379	28 407	13 469	13 073	13 184
mean number of exons per gene	3	5.9	3.18	5.3	5.8	7.7	5.3	5.4	6
mean exon length (bp)	302.5	1376	257	303	254	312	222	246	222
mean intron length (bp)	2612	3982	NA	3700	3531	359	2059	2442	2407
total GC content (%)	43.76	41.80	44	47.80	34.06	33.50	34.30	34.70

Repeat Annotation

The de novo method predicted F. gigantica-specific repeats totaling 487 374 279 bp, accounting for 46.85% of the entire genome. The repeat sequences identified were distributed across 40 381 scaffolds. The repeat unit length ranged from 12 to 2 253 045 bp. We identified 21.26% LINEs, 6.76% LTR elements, 45.93% total interspersed repeats, and 15.09% unclassified repeats, as summarized in Table 3. The details of the repetitive elements are provided in Table S1.

Table 3

Summary of the De Novo Repeats Identified

description	number of elements	length occupied in bp
SINEs	67 024	11 130 748
MIRs	1689	186 234
LINEs	469 965	221 172 212
LINE2	12 439	4 408 849
L3/CR1	134 130	66 193 633
LTR elements	130 496	70 357 405
ERV_class I	344	32 937
ERV_class II	2413	579 624
DNA elements	63 140	18 072 908
TcMar-Tigger	176	53 598
unclassified	697 673	157 005 240
total interspersed repeats		477 738 513
small RNA	35 107	6 310 113
satellites	45 457	7 519 158
simple repeats	68 849	3 254 749
low complexity	2024	92 148
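The percentages quoted in the text follow directly from the base-pair totals in Table 3 and the assembly size in Table 1. The short sketch below reproduces that arithmetic; the dictionary and function names are illustrative assumptions, while the numbers are copied from the two tables.

```python
GENOME_SIZE_BP = 1_040_230_724  # assembly size from Table 1

# Top-level interspersed repeat classes and their lengths (bp) from Table 3
REPEAT_BP = {
    "SINEs": 11_130_748,
    "LINEs": 221_172_212,
    "LTR elements": 70_357_405,
    "DNA elements": 18_072_908,
    "unclassified": 157_005_240,
}


def percent_of_genome(bp, genome_size=GENOME_SIZE_BP):
    """Express a length in bp as a percentage of the assembly, to 2 decimals."""
    return round(100 * bp / genome_size, 2)


# The five top-level classes add up to the "total interspersed repeats" row
total_interspersed = sum(REPEAT_BP.values())

for name, bp in REPEAT_BP.items():
    print(f"{name}: {percent_of_genome(bp)}%")
print(f"total interspersed repeats: {percent_of_genome(total_interspersed)}%")
```

The sum of the five top-level classes equals the 477 738 513 bp listed for total interspersed repeats, which corresponds to the 45.93% given in the text.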

Gene Prediction and Annotation

The draft genome was then used to predict protein-coding genes, with S. mansoni as the model species. A total of 20 858 genes were predicted, with an average gene length of 795 bp and an average protein length of 264 aa. Of these, 59% (12 285 genes) showed homology with sequences in the NCBI NR database, and 13.9% (2900 genes) were classified with gene ontology (GO) terms (details provided in Table S2). The annotation of genes showed the highest number of hits against F. hepatica (5248), followed by O. viverrini (1389). A total of 2900 genes were annotated with 5641 GO terms distributed in three GO subvocabularies [i.e., cellular component (CC), biological process (BP), and molecular function (MF)]. A total of 2013 genes were classified as BP, 2352 genes as MF, and 1276 genes as CC. Out of the total of 20 858 genes, 807 genes were found to have GO terms in all three categories (Figures 1 and 2). Genes associated with similar functions were assigned to the same GO functional group. Further, the proteins of F. gigantica and F. hepatica were compared using Blast with 90% identity and were found to have 65.3% similarity. Among the genes similar in both genomes, only 3688 genes were found to have GO terms, which included 1403 CC, 2474 BP, and 3143 MF; the details are given in Figure S1.
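Counts of this kind (genes per GO subvocabulary and genes annotated in all three) can be derived from a per-gene annotation table. The sketch below assumes a hypothetical mapping from gene identifier to the set of GO namespaces assigned to it; the gene names and the function `summarize_go` are invented for illustration and do not reflect the Blast2GO export format.

```python
from collections import Counter

NAMESPACES = {"BP", "MF", "CC"}


def summarize_go(gene_to_namespaces):
    """Count genes per GO namespace and genes annotated in all three."""
    per_namespace = Counter()
    all_three = 0
    for namespaces in gene_to_namespaces.values():
        present = set(namespaces) & NAMESPACES
        per_namespace.update(present)  # each gene counted once per namespace
        if present == NAMESPACES:
            all_three += 1
    return dict(per_namespace), all_three


# Hypothetical toy annotation table
toy_annotations = {
    "gene_a": {"BP", "MF", "CC"},
    "gene_b": {"MF"},
    "gene_c": {"BP", "CC"},
}
toy_counts, toy_all_three = summarize_go(toy_annotations)
```

Because a gene can carry terms from more than one subvocabulary, the per-namespace counts sum to more than the number of annotated genes, as with the 2013 + 2352 + 1276 = 5641 assignments spread over 2900 genes reported above.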

Figure 1

Graphical representation of the distribution of genes assigned to GO terms. The proportion of 5371 F. gigantica proteins with functional information in different GO categories is shown as the biological process, molecular function, and cellular component.

Figure 2

GO classification of genes in cellular components, molecular function, and biological process.

The ES proteins found were cathepsin proteases (which include cathepsin L-like proteases, cathepsin B-like proteases, and cathepsin D-like proteases), glutathione transferase, fatty acid-binding protein, and glyceraldehyde-3-phosphate dehydrogenase.[16,36,37] A total of 23 blast hits against ES proteins were identified from the Blast results, among which cathepsin proteins were predominant (Table S2). Cathepsin B and L cysteine proteases are important antigens produced in trematodes, mainly in the genus Fasciola, and play an important role in parasite nutrition, immune evasion, and host invasion.[38,39] A total of 46 GO terms were assigned, and 4 genes had missing GO terms (Table S2). The significantly enriched proteins are classified in the following GO terms: proteolysis, cysteine-type endopeptidase activity, and regulation of catalytic activity.[31] The GO terms of ES proteins were classified into 0 CC, 19 MF, and 13 BP. Earlier studies have suggested that cathepsins help the parasite to survive inside the host gall bladder and bile duct. Trematodes encode various subfamilies of cathepsins, which, in turn, provide insight into host–parasite relationships and developmentally regulated expression with the passage of the parasites through the host in the life cycle.[40] Proteases may help in the activation of cathepsins, which, in turn, facilitate the digestion of host tissues, releasing essential amino acids.[22] Of the 20 858 predicted proteins, about 28% (5783) did not have sufficient similarity to proteins in other organisms to justify the provision of functional assignments or known functions. They were classified as hypothetical proteins.

Annotation of Conserved Domains

The search against the InterPro database yielded 14 487 InterPro hits, 4810 InterPro hits with GO terms, and 6371 nonhits. The GO terms in InterPro were merged, which resulted in 9039 GO before merging, 12 285 GO after merging, 20 351 confirmed IPS GO, and 1608 too general IPS GO. The analysis revealed that 5205 protein sequences were categorized into 1591 domains and 2448 families. InterPro domains/families were sorted according to the assigned gene sequences; the distribution of the top 20 InterPro domains is represented in Figure 3. The most abundant domain was the reverse transcriptase domain (IPR000477), with 1155 annotated gene sequences; the integrase catalytic core (IPR001584) and the protein kinase domain (IPR000719) had 235 and 481 annotated gene sequences, respectively. The distribution of InterPro families is represented in Figure 4, and the top 5 families identified are the endonuclease/exonuclease/phosphatase superfamily (IPR036691), SWR1-complex protein (IPR027124), P-loop containing nucleoside triphosphate hydrolase (IPR027417), ribonuclease H superfamily (IPR036397), and ribonuclease H-like superfamily (IPR012337).

Figure 3

Representation of the 20 most abundant InterPro domains revealed by InterProScan (IPS) annotation.

Figure 4

Representation of the 20 most abundant InterPro families revealed by InterProScan annotation.

The search found 2084 Pfam domains in 6693 genes, in which the reverse transcriptase domain [PF00078] and integrase, catalytic core [PF00665] domains were highly represented by 989 and 175 genes, respectively. The details of the conserved domains/families are provided in Table S3.

Pathway Analysis

KAAS was used to carry out ortholog assignment and mapping of the genes to the biological pathways. The annotated genes were compared against those available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database using BLASTx with a default threshold bit score value and an expected threshold. The total assigned KO IDs were 1343 of 4016 genes that were mapped to respective pathways (details provided in Table S2). The mapped genes represented metabolic pathways of major biomolecules, such as carbohydrates and amino acids, as well as other pathways. F. gigantica can obtain energy from both aerobic and anaerobic metabolism.[41] The adult metabolism is anaerobic, and juvenile metabolism is largely aerobic. All liver flukes inhabit the bile duct, which is anaerobic, but for survival in the intermediate host, the biochemical pathways of aerobic metabolism play crucial roles. The glycolytic pathway shows the presence of all of the key enzymes, such as hexokinase [EC: 2.7.1.1], enolase [EC: 4.2.1.11], pyruvate kinase [EC: 2.7.1.40], and lactate dehydrogenase [EC: 1.1.1.27] (Figure S2). Some of the genes involved in energy metabolism were absent, indicating that the adult worms may utilize exogenous glucose through the glycolytic pathway or may absorb nutrients from the host under anaerobic conditions.[28] All of the genes of the Krebs cycle were present (Figure S3). In the fatty acid metabolism pathway, all of the genes encoding enzymes were present (Figure S4). In contrast, only three enzymes, acetyl-CoA carboxylase/biotin carboxylase 1 [EC: 6.4.1.2 6.3.4.14], 3-oxoacyl-[acyl-carrier-protein] synthase II [EC: 2.3.1.179], and long-chain acyl-CoA synthetase [EC: 6.2.1.3], were present for the fatty acid biosynthesis pathway (Figure S5). It is known that the fatty acid-binding proteins in liver flukes play a crucial role in utilizing fatty acids from the host bile.
Therefore, liver flukes do not need to synthesize their fatty acids endogenously.[28] The genes of fatty acid metabolism were present, but certain genes of the fatty acid biosynthesis pathway were missing. This indicates that the fatty acid required for the survival of the fluke may be acquired from the host bile.
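The presence/absence reasoning used in this section amounts to comparing the set of EC numbers annotated in the genome with the set of enzymes expected for a pathway. The sketch below illustrates that comparison. The function name `pathway_coverage` and the example annotation set are hypothetical, and the reference set contains only the four glycolytic enzymes named in the text, not the full KEGG glycolysis map.

```python
# Key glycolytic enzymes named in the text (not the complete KEGG pathway)
GLYCOLYSIS_KEY_ECS = {
    "2.7.1.1",   # hexokinase
    "4.2.1.11",  # enolase
    "2.7.1.40",  # pyruvate kinase
    "1.1.1.27",  # lactate dehydrogenase
}


def pathway_coverage(required_ecs, annotated_ecs):
    """Return (fraction of required enzymes found, sorted list of missing ECs)."""
    required = set(required_ecs)
    if not required:
        raise ValueError("required_ecs must not be empty")
    present = required & set(annotated_ecs)
    missing = sorted(required - present)
    return len(present) / len(required), missing


# Hypothetical annotation result: all four key enzymes plus one other EC
example_annotation = {"2.7.1.1", "4.2.1.11", "2.7.1.40", "1.1.1.27", "6.2.1.3"}
coverage, missing = pathway_coverage(GLYCOLYSIS_KEY_ECS, example_annotation)
```

A pathway with only a few of its expected enzymes annotated, as reported here for fatty acid biosynthesis, would return a low fraction and a long list of missing EC numbers. Absence from a draft annotation can also reflect assembly gaps, so such a result is suggestive rather than conclusive.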

Analysis of Orthologous Groups

The F. gigantica and F. hepatica genomes were predicted to have 20 858 and 33 454 proteins, respectively, which resulted in 9365 clusters. A total of 6241 core gene clusters (i.e., clusters in which multiple copies of genes may be present) and 5654 single-copy gene clusters were identified between the two genomes using OrthoVenn (Figure 5A). In addition, 905 and 2219 unique ortholog clusters were identified in the F. gigantica and F. hepatica genomes, respectively.

Figure 5

Venn diagram showing the phylogenetic distribution of orthologous protein families. (A) Between F. gigantica and F. hepatica. (B) Between F. hepatica, F. gigantica, S. mansoni, and O. viverrini.

Similarly, we compared six genomes, i.e., F. gigantica, F. hepatica, S. mansoni, S. japonicum, S. haematobium, and C. sinensis. The total predicted proteins for S. mansoni, S. japonicum, S. haematobium, and C. sinensis were 11 774, 12 738, 11 140, and 13 634, respectively. A total of 14 288 clusters were generated, of which 11 138 orthologous clusters were common to at least two species, and 1863 were single-copy gene clusters. The total number of clusters identified in each genome is 7664, 10 289, 8298, 8010, 8455, and 7664, respectively. A total of 3826 core genes were identified across all six species, as shown in Figure 5B. The unique orthologous clusters identified in F. gigantica, F. hepatica, S. mansoni, S. japonicum, S. haematobium, and C. sinensis were 935, 1770, 95, 113, 48, and 189, respectively. Details are provided in Table S4.
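The cluster categories used in this section can be illustrated with a small classifier. This is a sketch under stated assumptions, not the OrthoVenn implementation: a cluster is taken to be a list of (species, protein) pairs, "unique" means all members come from one species, "shared" means at least two species are represented, "core" means every species is represented, and "single copy" means a core cluster with exactly one protein per species. All identifiers in the example are invented.

```python
def classify_clusters(clusters, species):
    """Count unique, shared, core, and single-copy clusters.

    clusters: list of clusters, each a list of (species_name, protein_id)
    species:  names of all species in the comparison
    """
    all_species = set(species)
    counts = {
        "shared": 0,
        "core": 0,
        "single_copy": 0,
        "unique": {name: 0 for name in species},
    }
    for members in clusters:
        present = {name for name, _ in members}
        if len(present) == 1:
            # every protein in the cluster comes from the same species
            counts["unique"][next(iter(present))] += 1
            continue
        counts["shared"] += 1
        if present == all_species:
            counts["core"] += 1
            if len(members) == len(all_species):
                counts["single_copy"] += 1
    return counts


# Hypothetical two-species example
toy_clusters = [
    [("Fg", "g1"), ("Fh", "h1")],                # core, single copy
    [("Fg", "g2"), ("Fg", "g3"), ("Fh", "h2")],  # core, multi-copy
    [("Fg", "g4"), ("Fg", "g5")],                # unique to Fg
    [("Fh", "h3"), ("Fh", "h4")],                # unique to Fh
]
toy_summary = classify_clusters(toy_clusters, ["Fg", "Fh"])
```

Under these definitions the shared and unique clusters partition the total, which matches the reported figures: 6241 + 905 + 2219 = 9365 for the two-genome comparison, and 11 138 + (935 + 1770 + 95 + 113 + 48 + 189) = 14 288 for the six-genome comparison.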

Conclusions

F. gigantica is a major parasite of livestock worldwide, causing huge economic losses to agriculture and 2.4–17 million human infections annually. We studied the draft genome of the organism, which, at 1.04 Gb, is among the largest known parasitic genomes. The relatively large genome size suggests that the F. gigantica genome did not evolve through whole-genome duplications but rather is interspersed with many repetitive elements, such as DNA transposons, SINEs, and LINEs. Detailed comparative genome sequencing will help explain the large genome size of this parasite. The genomic information will provide new insights into its adaptation to the host environment and external selection pressures and will help in the development of novel therapies for fascioliasis control.

Methods

DNA Isolation

F. gigantica flukes were collected from the liver of naturally infected cattle from the Bara Bazar slaughterhouse, Shillong, India (latitude 25.5724472; longitude 91.8745219). The whole worm was washed with 70% ethanol, followed by rinsing several times with 1× phosphate buffer saline. Individual flukes were immediately frozen in liquid nitrogen and stored at −80 °C until processing for genomic DNA extraction. A single individual worm was crushed in liquid nitrogen to isolate its genomic DNA using the standard phenol–chloroform extraction method. The quality and integrity of the isolated DNA were checked on 0.8% agarose gel and a Nanodrop spectrofluorimeter.

DNA Library Construction and Sequencing

One shotgun sequencing library and one Mate-pair DNA library were constructed according to the Illumina Sample Preparation Guide (Illumina, San Diego, CA). The shotgun Paired-end sequencing library with an insert size of approximately 350 bp was prepared using the TruSeq Nano DNA Library Prep Kit for Illumina. Briefly, 200 ng of DNA was fragmented by Covaris M220 to generate a mean fragment distribution of 300–400 bp. Covaris shearing generates dsDNA fragments with 3′ or 5′ overhangs that were then subjected to End Repair Mix to convert the overhangs into blunt ends. The 3′ to 5′ exonuclease activity of this mix removes the 3′ overhangs, and the 5′ to 3′ polymerase activity fills in the 5′ overhangs. A single “A” base was then added to the ends of the polished DNA fragments followed by adapter ligation to ensure a low formation rate of chimera (concatenated template). Indexing adapters were ligated to the ends of the DNA fragments to prepare them for hybridization onto a flow cell. The ligated products were size-selected using Agencourt AMPure XP beads (Beckman Coulter Life Sciences) and polymerase chain reaction (PCR)-enriched with the Illumina adapter index PCR primer for six cycles. The Mate-pair sequencing library was prepared using the Illumina Nextera Mate-pair Sample Preparation Kit. Briefly, 4 μg of the high-quality gDNA was tagmented using Mate-pair transposomes. Using Zymo Genomic DNA Clean & Concentrator kit (Zymo Research), the tagmented DNA was purified and then fragmented for circularization by repairing the ends by strand displacement reactions. Short fragments less than 1500 bp were removed using Ampure XP bead clean up steps. Precise size selection was carried out using Pippin prep system to select 8–11 kb fragments, followed by clean-up using Zymo clean Genomic DNA Clean & Concentrator Kit. The DNA fragments were then self-circularized by an intramolecular ligation, and noncircularized DNA was removed by DNA exonuclease treatment. 
The large circularized DNA fragments were physically sheared to smaller sized fragments (approximately 300–1000 bp) in Covaris using a defined shearing parameter. The sheared DNA fragments (Mate-pair fragments) containing the biotinylated junction adapter were purified by binding to streptavidin magnetic beads, and the unwanted, un-biotinylated molecules were removed through a series of washes. The streptavidin bead bound fragments were then subjected to end repair, A-tailing, Illumina adapter ligation, and final PCR enrichment for the Mate-pair fragments that have TruSeq DNA adapters on both of the ends. The library validation was carried out using Tape Station 4200 (Agilent Technologies) using the D1000 Screen Tape assay kit. The Paired-end sequencing run was performed on HiSeq 2500 (Illumina) using 2 × 125 bp read chemistry.

Genome Assembly

Whole-genome sequencing was carried out for the Paired-end and Mate-pair libraries using HiSeq 2500 with 2 × 125 bp chemistry. The raw Mate-pair reads were extracted using an in-house script based on their orientation and the presence of the junction adapter between read1 and read2. The reads having the junction adapter in between the reads were used as Mate-pair reads.[42] The raw reads were adapter-trimmed and quality-filtered using Trimmomatic (v 0.35)[43] with a minimum read length cut-off of 100 bp. The assembly of Paired-end and Mate-pair reads was carried out using SOAPDenovo (v1.5.2) with an optimized k-mer length of 57. After the primary assembly, GapCloser was used for gap filling and scaffolding with both Paired-end and Mate-pair libraries. Further, scaffolding was carried out using SSPACE-v2.0.[5] The resultant assembly was used with the available genome of F. hepatica (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/763/495/GCA_002763495.1_F_hepatica_1.0.allpaths.pg) using Chromosomer-v0.1.4a.[5] The assembled draft genome was used in downstream analysis. The completeness of the genome was estimated using BUSCO2.0. De novo repeat identification was performed using RepeatModeler v1.0.10. The de novo repeat libraries were constructed from the draft genome with RepeatModeler, which contains two repeat-finding programs (RECON and RepeatScout). This resulted in a repeat library with classified repeat families, which was used as the repeat library in RepeatMasker v4.0.6 to identify the de novo repeats in the draft genome.
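The effect of a minimum read length cut-off on paired reads can be illustrated as follows. This is a simplified sketch and not Trimmomatic's implementation: it assumes reads are already trimmed and represented as plain sequence strings, and the function name `filter_read_pairs` is hypothetical. A pair is kept only when both mates meet the cut-off; a surviving mate whose partner was dropped is reported separately as an orphan.

```python
def filter_read_pairs(pairs, min_len=100):
    """Split read pairs into intact pairs and orphaned single reads.

    pairs:   iterable of (read1_sequence, read2_sequence)
    min_len: minimum length a read must have to be retained
    """
    kept, orphans = [], []
    for read1, read2 in pairs:
        ok1 = len(read1) >= min_len
        ok2 = len(read2) >= min_len
        if ok1 and ok2:
            kept.append((read1, read2))
        elif ok1:
            orphans.append(read1)
        elif ok2:
            orphans.append(read2)
        # if neither mate passes, the whole pair is discarded
    return kept, orphans


# Hypothetical reads: full-length pair, one short mate, two short mates
toy_pairs = [
    ("A" * 125, "C" * 125),
    ("A" * 125, "C" * 80),
    ("A" * 50, "C" * 60),
]
toy_kept, toy_orphans = filter_read_pairs(toy_pairs, min_len=100)
```

Keeping pairs intact matters for assembly and scaffolding because the assembler relies on both mates and their expected separation.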

Gene Annotation

The draft genome of F. gigantica was used for gene prediction using Augustus v3.2.1[44] with the gene model parameters tuned for Schistosoma; the rest of the parameters were kept as default. Functional annotation of the predicted genes was performed using the BLASTx program against the NCBI NR database with an e-value of 1 × 10⁻⁶. BLASTx identifies homologous sequences for the genes in the NR database. Homologs of F. gigantica-predicted protein sequences were identified using BLAST, and the functional domains were identified using InterPro. The results of the BLAST searches were used as input to Blast2GO PRO.[45] On the basis of the BLAST hits obtained, GO annotation was performed to obtain the GO terms and classify them into BP, MF, and CC. The GO terms associated with each of the BLAST results (mapping step) and the GO annotation assigned to the query (annotation step) were obtained. Further, the conserved domains/motifs were identified using InterProScan (IPS), an online plugin of BLAST2GO that combines various protein signature recognition methods with the InterPro database. The resulting GO terms were merged with the GO term results obtained from the above annotation step. The protein coding gene sequences of F. gigantica and F. hepatica (PRJEB6687) (downloaded from WormBase WBPS10: http://parasite.wormbase.org) were aligned using Blastn to identify the similarity in the protein coding genes. The F. hepatica genes were used as a database for the Blast against F. gigantica protein with an e-value of ×10⁻⁵. To identify the potential involvement of the predicted genes of F. gigantica in biological pathways, the predicted genes were aligned to the KEGG pathway database using the Kyoto Encyclopedia of Genes and Genomes (KEGG) automatic annotation server.[46−48] KEGG analysis includes KEGG Orthology (KO) assignments, the corresponding Enzyme Commission (EC) numbers, and the metabolic pathways of predicted genes using the KEGG automated annotation server KAAS (http://www.genome.jp/kaas-bin/kaas_main). The distribution of genes under the respective EC numbers was used to map them to the KEGG biochemical pathways. This process provides an overview of the different metabolic processes active within an organism and enables further understanding of the biological functions of the genes.
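Applying an e-value cut-off to BLAST results can be sketched as below. The example assumes the standard 12-column BLAST tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); whether the original analysis used this output format is not stated, and the function name `best_hits` and the sample rows are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-6):
    """Keep, for each query, the highest-scoring hit that passes the e-value cut-off.

    Returns a dict: query_id -> (subject_id, evalue, bitscore).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical BLAST tabular rows (identifiers are invented)
sample = (
    "geneA\tsubj1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t380\n"
    "geneA\tsubj2\t80.0\t150\t30\t0\t1\t150\t5\t154\t1e-20\t150\n"
    "geneB\tsubj3\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.01\t30\n"
)
hits = best_hits(sample, max_evalue=1e-6)
```

Here `geneA` keeps its stronger hit and `geneB` is left without a hit because its only alignment exceeds the cut-off; genes without a retained hit correspond to those lacking database homology.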

Identification of Orthologous Groups

The protein sequences of F. hepatica, S. mansoni, S. japonicum, S. haematobium, and C. sinensis were obtained from the WormBase Parasite database (http://parasite.wormbase.org). Protein sequences of F. gigantica and F. hepatica were used to perform an all-against-all comparison using BLASTP with OrthoVenn at default parameters.[49] The core genes and unique genes were identified between the F. gigantica and F. hepatica genomes. The ortholog analysis was also performed with F. hepatica, S. mansoni, S. japonicum, S. haematobium, and C. sinensis. This enabled us to elucidate the function and evolution of proteins across the six species.

43 in total

Review 1. The phylogeny, structure and function of trematode cysteine proteases, with particular emphasis on the Fasciola hepatica cathepsin L family.

Authors: Colin Stack; John P Dalton; Mark W Robinson
Journal: Adv Exp Med Biol Date: 2011 Impact factor: 2.622

2. Collagenolytic activities of the major secreted cathepsin L peptidases involved in the virulence of the helminth pathogen, Fasciola hepatica.

Authors: Mark W Robinson; Ileana Corvo; Peter M Jones; Anthony M George; Matthew P Padula; Joyce To; Martin Cancela; Gabriel Rinaldi; Jose F Tort; Leda Roche; John P Dalton
Journal: PLoS Negl Trop Dis Date: 2011-04-05

3. The major cathepsin L secreted by the invasive juvenile Fasciola hepatica prefers proline in the S2 subsite and can cleave collagen.

Authors: Ileana Corvo; Martín Cancela; Mónica Cappetta; Natalia Pi-Denis; José F Tort; Leda Roche
Journal: Mol Biochem Parasitol Date: 2009-04-19 Impact factor: 1.759

Review 4. Fascioliasis: a worldwide parasitic disease of importance in travel medicine.

Authors: Keyhan Ashrafi; M Dolores Bargues; Sandra O'Neill; Santiago Mas-Coma
Journal: Travel Med Infect Dis Date: 2014-09-28 Impact factor: 6.211

Review 5. Current Threat of Triclabendazole Resistance in Fasciola hepatica.

Authors: Jane M Kelley; Timothy P Elliott; Travis Beddoe; Glenn Anderson; Philip Skuce; Terry W Spithill
Journal: Trends Parasitol Date: 2016-04-02

6. Apparent triclabendazole-resistant human Fasciola hepatica infection, the Netherlands.

Authors: Annemarie J S Winkelhagen; Theo Mank; Peter J de Vries; Robin Soetekouw
Journal: Emerg Infect Dis Date: 2012-06 Impact factor: 6.883

7. AUGUSTUS: ab initio prediction of alternative transcripts.

Authors: Mario Stanke; Oliver Keller; Irfan Gunduz; Alec Hayes; Stephan Waack; Burkhard Morgenstern
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

8. The Fasciola hepatica genome: gene duplication and polymorphism reveals adaptation to the host environment and the capacity for rapid evolution.

Authors: Krystyna Cwiklinski; John Pius Dalton; Philippe J Dufresne; James La Course; Diana Jl Williams; Jane Hodgkinson; Steve Paterson
Journal: Genome Biol Date: 2015-04-03 Impact factor: 13.583

9. Genomes of Fasciola hepatica from the Americas Reveal Colonization with Neorickettsia Endobacteria Related to the Agents of Potomac Horse and Human Sennetsu Fevers.

Authors: Samantha N McNulty; Jose F Tort; Gabriel Rinaldi; Kerstin Fischer; Bruce A Rosa; Pablo Smircich; Santiago Fontenla; Young-Jun Choi; Rahul Tyagi; Kymberlie Hallsworth-Pepin; Victoria H Mann; Lakshmi Kammili; Patricia S Latham; Nicolas Dell'Oca; Fernanda Dominguez; Carlos Carmona; Peter U Fischer; Paul J Brindley; Makedonka Mitreva
Journal: PLoS Genet Date: 2017-01-06 Impact factor: 5.917

10. The Opisthorchis viverrini genome provides insights into life in the bile duct.

Authors: Neil D Young; Niranjan Nagarajan; Suling Joyce Lin; Pasi K Korhonen; Aaron R Jex; Ross S Hall; Helena Safavi-Hemami; Worasak Kaewkong; Denis Bertrand; Song Gao; Qihui Seet; Sopit Wongkham; Bin Tean Teh; Chaisiri Wongkham; Pewpan Maleewong Intapan; Wanchai Maleewong; Xinhua Yang; Min Hu; Zuo Wang; Andreas Hofmann; Paul W Sternberg; Patrick Tan; Jun Wang; Robin B Gasser
Journal: Nat Commun Date: 2014-07-09 Impact factor: 14.919
