Xiaoyun Wang, Wenjun Chen, Yan Huang, Jiufeng Sun, Jingtao Men, Hailiang Liu, Fang Luo, Lei Guo, Xiaoli Lv, Chuanhuan Deng, Chenhui Zhou, Yongxiu Fan, Xuerong Li, Lisi Huang, Yue Hu, Chi Liang, Xuchu Hu, Jin Xu, Xinbing Yu.
Abstract
BACKGROUND: Clonorchis sinensis is a carcinogenic human liver fluke that is widespread in Asian countries. Increasing infection rates of this neglected tropical disease are leading to negative economic and public health consequences in affected regions. Experimental and epidemiological studies have shown a strong association between the incidence of cholangiocarcinoma and the infection rate of C. sinensis. To aid research into this organism, we have sequenced its genome.
Year: 2011 PMID: 22023798 PMCID: PMC3333777 DOI: 10.1186/gb-2011-12-10-r107
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Summary of the C. sinensis genome assembly
| | Total length (Mb) | Number | N50ᵃ (bp) | N90ᵃ (bp) | Longest (bp) |
|---|---|---|---|---|---|
| Contigᵇ | 515.56 | 60,796 | 14,708 | 4,079 | 137,874 |
| Scaffoldᵇ | 516.46 | 31,822 | 30,195 | 7,299 | 238,094 |
| Super-scaffoldᵇ | 516.47 | 26,446 | 42,632 | 8,441 | 400,764 |
ᵃ The N50 and N90 sizes of contigs or scaffolds were calculated by ordering all sequences and then adding the lengths from the longest to the shortest until the summed length exceeded 50% and 90% of the total length of all sequences, respectively.
ᵇ Contigs and scaffolds were constructed by Celera; contigs are continuous sequence fragments without gaps (Ns). Super-scaffolds were built from the scaffolds by RNAPATH using RNA-seq data.
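The N50 and N90 calculation described in the table footnote can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `nx` and the toy lengths are hypothetical. It assumes the common convention that the threshold is met when the running sum reaches or exceeds the target fraction. The footnote says "exceeded", and the two readings differ only when the running sum lands exactly on the threshold.

```python
def nx(lengths, percent):
    """Return the Nx statistic (N50 for percent=50, N90 for percent=90).

    Sequences are ordered from longest to shortest and their lengths are
    summed until the running total reaches `percent` of the total length.
    The length of the sequence that crosses the threshold is returned.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Toy sequence lengths (hypothetical, not taken from the assembly).
toy_lengths = [10, 8, 6, 4, 2]
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

For the toy input the total length is 30, so the N50 threshold of 15 is crossed by the second-longest sequence and the N90 threshold of 27 by the fourth.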
General pattern of protein-coding genes of C. sinensis compared with S. mansoni and S. japonicum
| Number of gene models | Average gene length (bp) | Average protein length (aa) | Average exon length (bp) | Average number of exons | Average intron length (bp) | CDS proportion (%) | Intron proportion (%) |
|---|---|---|---|---|---|---|---|
| 16,258 | 11,548 | 441 | 223 | 5.9 | 2,077 | 4.14 | 32.2 |
| 12,657 | 9,999 | 392 | 222 | 5.3 | 2,059 | 3.70 | 28.00 |
| 11,747 | 13,395 | 446 | 222 | 6 | 2,407 | 4.10 | 37.20 |
CDS, coding sequence.
Figure 1. Functional categorization of genes and protein domains of C. sinensis. (a) Proportions of the 9,371 C. sinensis proteins in different Gene Ontology categories (biological process terms only). The classification was carried out by CateGOrizer [28] based on the second level of the Gene Ontology category biological process. (b) 8,371 domains were detected in C. sinensis, vertebrates (H. sapiens, G. gallus and D. rerio), D. melanogaster, C. elegans and Schistosoma (S. japonicum and S. mansoni). The major protein domains of C. sinensis are shared with other taxa and C. sinensis has the fewest unique domains.
Figure 2. Maximum likelihood phylogenetic tree. The phylogenetic tree was constructed by maximum likelihood analysis of concatenated amino acid sequences for 44 single-copy genes present in all nine genomes. Numbers at the nodes indicate bootstrap values.
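The concatenation step mentioned in the Figure 2 caption can be illustrated with a short sketch. It assumes each single-copy gene has already been aligned separately across all species. The caption does not state which aligner or tree-building software was used, and neither step is shown here. The function name `concatenate_alignments`, the species labels and the toy sequences are all hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one concatenated sequence per species.

    `gene_alignments` is a list with one dict per gene, mapping a species
    name to that species' aligned amino acid sequence (gaps as '-').
    Every gene must cover the same species, and within a gene all aligned
    sequences must have the same length.
    """
    if not gene_alignments:
        raise ValueError("at least one gene alignment is required")

    species = sorted(gene_alignments[0])
    parts = {name: [] for name in species}

    for index, alignment in enumerate(gene_alignments):
        if set(alignment) != set(species):
            raise ValueError(f"gene {index} does not cover the same species")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"gene {index} has sequences of unequal length")
        for name in species:
            parts[name].append(alignment[name])

    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy input: two genes aligned across three hypothetical species.
toy_genes = [
    {"species_A": "MKT-A", "species_B": "MKTLA", "species_C": "MRT-A"},
    {"species_A": "GGH", "species_B": "G-H", "species_C": "GGH"},
]
for name, sequence in concatenate_alignments(toy_genes).items():
    print(name, sequence)
```

The resulting dictionary holds one long sequence per species and could then be written out in whatever format the chosen phylogenetics program expects.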
Figure 3. Schematic diagram of the FASN gene. All orthologues of fatty acid biosynthesis genes were searched by BLAST against the three fluke genomes. Only a fatty acid synthase (FASN) gene (tca:658978) could be significantly mapped (e-value < 1e-10), but two key domains, Ketoacyl-synt and Acyl_transf_1, were not observed in all three species. (a) C. sinensis (scaffolds: Csin_scf23908 and Csin_scf26026); (b) S. japonicum (scaffold: SJC_000021); (c) S. mansoni (scaffold: Smp_scaff000021). AAA, ATPase family associated with various cellular activities; Acyl_transf_1, acyl transferase domain; Ketoacyl-synt, beta-ketoacyl synthase, N-terminal domain and C-terminal domain; KR, KR domain; Thioesterase, thioesterase domain.
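The significance cutoff in the Figure 3 caption (e-value < 1e-10) can be applied to BLAST results with a filter like the one below. This is a sketch under assumptions: the caption does not say which BLAST output format was used, so the code assumes the standard 12-column tabular layout (BLAST+ `-outfmt 6`), in which the e-value is the 11th column. The function name `significant_hits` and the toy rows are hypothetical and do not reproduce any result from the study.

```python
import csv
import io

EVALUE_COLUMN = 10  # 0-based index of the e-value in the 12-column tabular layout


def significant_hits(tabular_text, max_evalue=1e-10):
    """Return (query, subject, e-value) for hits with e-value below the cutoff."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (present in the commented variant).
        if not row or row[0].startswith("#"):
            continue
        evalue = float(row[EVALUE_COLUMN])
        if evalue < max_evalue:
            hits.append((row[0], row[1], evalue))
    return hits


# Toy tabular output with invented identifiers and scores.
toy_output = "\n".join([
    "query_1\tscaffold_A\t62.5\t200\t75\t0\t1\t200\t1000\t1600\t3e-45\t180",
    "query_1\tscaffold_B\t30.1\t80\t56\t2\t5\t84\t300\t540\t0.002\t35.2",
    "query_2\tscaffold_C\t48.0\t150\t78\t1\t10\t159\t50\t500\t1e-10\t66.0",
])
for query, subject, evalue in significant_hits(toy_output):
    print(query, subject, evalue)
```

Because the comparison is strict, a hit whose e-value equals the cutoff exactly (the third toy row) is excluded, matching the "<" in the caption.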