Jinhwa Kong, Jungim Won, Jeehee Yoon, UnJoo Lee, Jong-Il Kim, Sun Huh.
Abstract
This study aimed to construct a draft genome of the adult female worm Toxocara canis using next-generation sequencing (NGS) and de novo assembly, and to find new genes after annotation using functional genomics tools. Using an NGS machine, we produced DNA read data of T. canis. The de novo assembly of the read data was performed using SOAPdenovo. RNA read data were assembled using Trinity. Structural annotation, homology search, functional annotation, classification of protein domains, and KEGG pathway analysis were carried out. In addition, recently developed tools such as MAKER, PASA, Evidence Modeler, and Blast2GO were used. For the assembled DNA scaffolds, the N50 was 108,950 bp and the overall length was 341,776,187 bp. The N50 of the transcriptome was 940 bp, and its length was 53,046,952 bp. The GC content of the entire genome was 39.3%. The total number of genes was 20,178, and the total number of protein sequences was 22,358. Of the 22,358 protein sequences, 4,992 were newly observed in T. canis. The following previously unknown proteins were found: E3 ubiquitin-protein ligase cbl-b and antigen T-cell receptor, zeta chain for T-cell and B-cell regulation; endoprotease bli-4 for cuticle metabolism; mucin 12Ea and polymorphic mucin variant C6/1/40r2.1 for mucin production; tropomodulin-family protein and ryanodine receptor calcium release channels for muscle movement. We were able to find new hypothetical polypeptide sequences unique to T. canis, and the findings of this study can serve as a basis for extending our biological understanding of T. canis.
Keywords: Toxocara canis; de novo synthesis; genomics; next generation sequencing
Year: 2016 PMID: 28095660 PMCID: PMC5266360 DOI: 10.3347/kjp.2016.54.6.751
Source DB: PubMed Journal: Korean J Parasitol ISSN: 0023-4001 Impact factor: 1.341
Fig. 1 Overview of the genome analysis process. The overall workflow of the genomic analysis of Toxocara canis is shown, together with a summary of all software tools and annotation databases used. To obtain more accurate structural annotation, newer tools such as RepeatRunner, MAKER, EVM, and PASA were used. For functional annotation, Blast2GO was used.
Features of the Toxocara canis draft genome
| Items | Size or number |
|---|---|
| Total number of scaffolds | 10,853 |
| Total size of scaffolds (bp) | 341,776,187 |
| N50 length (bp) | 108,950 |
| GC content of the entire genome (%) | 39.3 |
| Total number of genes | 20,178 |
| Average gene length (bp) | 6,055 |
| Average exon number per gene | 7.09 |
| Average exon length (bp) | 172 |
| Average intron length (bp) | 793 |
| Average coding sequence length (bp) | 1,077 |
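
The N50 length and GC content reported in the table are standard assembly statistics. As a rough illustration of how such values are commonly computed, here is a minimal Python sketch. It is not the authors' pipeline: the function names `n50` and `gc_content` and the toy scaffold sequences are hypothetical, and the GC calculation assumes that ambiguous bases such as N are excluded from the denominator, which the source does not specify.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def gc_content(sequences):
    """Return GC content (%) over all sequences, counting only A, C, G, T.

    Assumption: ambiguous bases (e.g. N) are ignored rather than
    counted in the denominator.
    """
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        g_and_c = seq.count("G") + seq.count("C")
        gc += g_and_c
        acgt += g_and_c + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    # Toy scaffolds for illustration only, not data from the study.
    scaffolds = ["ATGCGC", "ATAT", "GG"]
    print("N50 (bp):", n50([len(s) for s in scaffolds]))
    print("GC content (%):", gc_content(scaffolds))
```

Applied to the 10,853 scaffolds of a real assembly, the same calculation would be run over the scaffold FASTA file; reading that file is left out here to keep the sketch self-contained.
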
Fig. 2 Venn diagram showing the results of the homology comparison of Toxocara canis ortholog genes with other closely related species.
Fig. 3 Distribution of gene ontology functional terms for Toxocara canis protein sequences. The graphs show level-2 annotations for biological processes (BP), molecular functions (MF), and cellular components (CC).
Domain information obtained from the Toxocara canis genome
| Ranking | Domain name | No. of sequences |
|---|---|---|
| 1 | P-loop containing nucleoside triphosphate hydrolase | 699 |
| 2 | Protein kinase domain | 541 |
| 3 | Protein kinase-like domain | 440 |
| 4 | G protein-coupled receptors, rhodopsin-like, 7TM | 308 |
| 5 | Serine/threonine/dual-specificity protein kinase, catalytic domain | 293 |
| 6 | WD40/YVTN repeat-like-containing domain | 288 |
| 7 | Immunoglobulin-like fold | 264 |
| 8 | Major facilitator superfamily domain | 253 |
| 9 | EF-hand domain pair | 244 |
| 10 | RNA recognition motif domain | 217 |
| 11 | Zinc finger, RING/FYVE/PHD-type | 207 |
| 12 | Nucleotide-binding alpha-beta plait domain | 200 |
| 13 | WD40-repeat-containing domain | 180 |
| 14 | Pleckstrin homology-like domain | 179 |
| 15 | NAD(P)-binding domain | 175 |
| 16 | Ankyrin repeat-containing domain | 174 |
| 17 | Armadillo-like helical | 174 |
| 18 | Armadillo-type fold | 174 |
| 19 | Zinc finger, C2H2 | 174 |
| 20 | Serine-threonine/tyrosine-protein kinase catalytic domain | 171 |
| 21 | Alpha/beta-hydrolase fold | 166 |
| 22 | Homeodomain-like | 165 |
| 23 | Immunoglobulin-like domain | 159 |
| 24 | PDZ domain | 148 |
| 25 | Tetratricopeptide-like helical domain | 147 |
| 26 | Zinc finger, RING-type | 146 |
| 27 | Epidermal growth factor-like domain | 145 |
| 28 | Winged helix-turn-helix DNA-binding domain | 145 |
| 29 | Nematode cuticle collagen, N-terminal | 139 |
| 30 | Reverse transcriptase domain | 139 |
Fig. 4 KEGG map for the mucin type O-glycan biosynthesis pathway.