Feni Iranawati, Hyungtaek Jung, Vincent Chand, David A Hurwood, Peter B Mather.
Abstract
Siamese mud carp (Henicorhynchus siamensis) is a freshwater teleost of high economic importance in the Mekong River Basin. However, genetic data relevant for delineating wild stocks for management purposes are currently limited for this species. Here, we used 454 pyrosequencing to generate a partial genome survey sequence (GSS) dataset to develop simple sequence repeat (SSR) markers from H. siamensis genomic DNA. The data comprised 65,954 sequence reads with an average length of 264 nucleotides, of which 2.79% contained SSR motifs. Based on GSS-BLASTx results, 10.5% of contigs and 8.1% of singletons possessed significant similarity (E value < 10^-5), with the majority matching well to reported fish sequences. KEGG analysis identified several metabolic pathways that provide insights into specific potential roles and functions of sequences involved in molecular processes in H. siamensis. Top protein domains detected included reverse transcriptase, and the top putative functional transcript identified was an ORF2-encoded protein. A total of 1,837 sequences containing SSR motifs were identified, of which 422 qualified for primer design; eight polymorphic loci were tested, with average observed and expected heterozygosity estimated at 0.75 and 0.83, respectively. Regardless of their relative levels of polymorphism and heterozygosity, the microsatellite loci developed here are suitable for further population genetic studies in H. siamensis and may also be applicable to other related taxa.
Keywords: 454 pyrosequencing; Henicorhynchus siamensis; SSR marker
Year: 2012 PMID: 23109823 PMCID: PMC3472715 DOI: 10.3390/ijms130910807
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Summary of 454 pyrosequencing.
| Description | Dataset |
|---|---|
| Total number of bases (Mb) | 17.44 Mb |
| Average read length (nt) | 264 nt |
| Number of reads | |
| Total reads | 65,954 |
| Assembled | 5,297 |
| Singleton | 46,393 |
| Repeat | 280 |
| Number of contigs | |
| Total contigs | 857 |
| Average contig read length (nt) | 352 nt |
| Largest contig (nt) | 2,373 nt |
| Number of large contigs > 500 nt | 143 |
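The headline figures in this table and in the abstract can be cross-checked with simple arithmetic. The sketch below is only a consistency check. It assumes that the 2.79% SSR share in the abstract is computed against the total read count, which the source does not state explicitly, and the variable names are illustrative.

```python
# Figures copied from the pyrosequencing summary table and the abstract.
total_reads = 65_954
mean_read_length_nt = 264
reported_total_mb = 17.44
ssr_sequences = 1_837

# Reads x mean length approximates the total yield. The mean length is
# rounded to a whole nucleotide, so a small gap from the reported total
# is expected.
estimated_mb = total_reads * mean_read_length_nt / 1e6

# Share of reads carrying an SSR motif, as a percentage (assumed
# denominator: total reads).
ssr_percent = 100 * ssr_sequences / total_reads

print(f"estimated yield: {estimated_mb:.2f} Mb (reported {reported_total_mb} Mb)")
print(f"SSR-containing sequences: {ssr_percent:.2f}% of reads")
```

Under that assumption, the estimate lands within a few hundredths of a megabase of the reported total, and the SSR share reproduces the 2.79% quoted in the abstract.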
Figure 1. Summary of H. siamensis 454 pyrosequencing sequences.
Figure 2. Top 25 hit species distribution based on BLASTx for (A) singletons and (B) contigs. The E-value cut-off is 10^-5. Bold text indicates teleosts.
Figure 3. Gene Ontology (GO) terms for contig and singleton sequences in H. siamensis.
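The BLASTx E-value cut-off of 10^-5 used for Figure 2 amounts to a simple filter over the hit list. The sketch below illustrates that filter on BLAST's standard 12-column tabular layout, where the E value is the 11th column. It is an assumption that hits are available in this layout; the source does not say which output format or tool was used, and the rows in `EXAMPLE_HITS` are invented for illustration.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the Figure 2 caption

# Hypothetical hits in BLAST tabular layout (12 standard columns).
# Identifiers and scores are invented for illustration only.
EXAMPLE_HITS = """\
contig_1\tsubject_a\t78.5\t120\t25\t1\t1\t360\t10\t129\t3e-40\t160
singleton_7\tsubject_b\t41.0\t60\t35\t0\t5\t184\t200\t259\t0.002\t38
singleton_9\tsubject_c\t65.2\t90\t31\t0\t2\t271\t50\t139\t4.6e-06\t55
"""


def significant_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) for rows whose E value is below cutoff."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        evalue = float(row[10])  # 11th column holds the E value
        if evalue < cutoff:
            yield row[0], row[1], evalue


hits = list(significant_hits(io.StringIO(EXAMPLE_HITS)))
for query, subject, evalue in hits:
    print(query, subject, evalue)
```

With the invented rows above, only `contig_1` and `singleton_9` pass the threshold; `singleton_7` is rejected because 0.002 is above 10^-5.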
Summary of the top 20 domains combining contigs (Con) and singletons (Sing) in H. siamensis.
| IPR accession | Domain name | Domain description | Total of occurrence (Con/Sing) |
|---|---|---|---|
| IPR006130 | Asp/Orn_carbamoylTrfase | Aspartate/ornithine carbamoyltransferase | 1 (1/0) |
| IPR006132 | Asp/Orn_carbamoyltranf_P-bd | Aspartate/ornithine carbamoyltransferase carbamoyl-P binding | 1 (1/0) |
| IPR002126 | Cadherin | Cadherin | 15 (0/15) |
| IPR005135 | Exo_endo_phos | Endonuclease/exonuclease/phosphatase | 1 (1/0) |
| IPR001845 | HTH_ArsR_DNA-bd_dom | HTH arsR-type DNA-binding domain | 1 (1/0) |
| IPR013098 | Ig_I-set | Immunoglobulin I-set | 14 (1/13) |
| IPR007110 | Ig_like | Immunoglobulin-like | 26 (0/26) |
| IPR013783 | Ig-like_fold | Immunoglobulin-like fold | 84 (2/82) |
| IPR013106 | Ig_V-set | Immunoglobulin V-set | 24 (0/24) |
| IPR001584 | Integrase_cat-core | Integrase, catalytic core | 23 (1/22) |
| IPR011009 | Kinase-like_dom | Protein kinase-like domain | 14 (0/14) |
| IPR000719 | Prot_kinase_cat_dom | Protein kinase, catalytic domain | 15 (0/15) |
| IPR012337 | RNaseH-like | Ribonuclease H-like | 29 (1/28) |
| IPR000477 | RVT | Reverse transcriptase | 37 (2/35) |
| IPR000276 | 7TM_GPCR_Rhodpsn | GPCR, rhodopsin-like, 7TM | 18 (0/18) |
| IPR002492 | Transposase_Tc1-like | Transposase, Tc1-like | 2 (2/0) |
| IPR002035 | VWF_A | Von Willebrand factor, type A | 2 (2/0) |
| IPR006612 | Znf_C2CH | Zinc finger, C2CH-type | 1 (1/0) |
| IPR013087 | Znf_C2H2/integrase_DNA-bd | Zinc finger, C2H2-type/integrase, DNA-binding | 28 (0/28) |
| IPR007087 | zf-C2H2 | Zinc finger, C2H2 | 52 (0/52) |
Frequency of genes identified in contigs (Con) and singletons (Sing) in H. siamensis.
| Candidate genes | E-value range | Length range (nt) | Total of occurrence (Con/Sing) | Matched species |
|---|---|---|---|---|
| Enzymatic poly | 3.12 × 10^-91 to 4.80 × 10^-13 | 269–1037 | 15 (3/12) | |
| Lrr and pyd domains-containing protein 12 | 3.42 × 10^-35 to 1.72 × 10^-8 | 123–439 | 19 (0/19) | |
| Novel protein | 1.18 × 10^-65 to 5.42 × 10^-6 | 120–595 | 43 (7/36) | |
| Orf2-encoded protein | 4.63 × 10^-97 to 3.33 × 10^-8 | 143–1679 | 27 (7/20) | |
| Protein nlrc3-like | 7.76 × 10^-71 to 1.32 × 10^-9 | 220–528 | 13 (2/11) | |
| Retrotransposable element tf2 | 3.95 × 10^-97 to 4.63 × 10^-6 | 145–527 | 120 (2/118) | |
| Reverse transcriptase-like protein | 7.74 × 10^-52 to 1.34 × 10^-11 | 151–544 | 22 (4/18) | |
| Sjchgc01974 protein | 5.82 × 10^-27 to 5.28 × 10^-7 | 139–430 | 18 (0/18) | |
| Transposable element tc1 transposase 155 kda protein type 1-like | 6.41 × 10^-23 to 1.80 × 10^-8 | 128–233 | 5 (5/0) | |
| Transposase | 1.65 × 10^-47 to 5.92 × 10^-12 | 196–507 | 21 (4/17) | |
Figure 4. Distribution of simple sequence repeat (SSR) nucleotide classes among different nucleotide types in H. siamensis.
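Counting SSRs by nucleotide class, as summarized in Figure 4, first requires locating perfect tandem repeats in each sequence. The following is a minimal regex-based sketch of that step, not the tool used in the study. The minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs`, and the example sequence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (not taken from the paper).
MIN_REPEATS = {2: 6, 3: 5, 4: 5}
CLASS_NAMES = {2: "di", 3: "tri", 4: "tetra"}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA string."""
    sequence = sequence.upper()
    found = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_count - 1) identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AAAA"
            # or "ACAC", so each repeat is reported under its true class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            found.append((match.start(), motif, len(match.group(0)) // motif_len))
    return found


# Invented example sequence: a tetranucleotide and a dinucleotide repeat.
example = "GGC" + "ATCT" * 14 + "TTGA" + "AC" * 8 + "G"
ssrs = find_ssrs(example)
class_counts = Counter(CLASS_NAMES[len(motif)] for _, motif, _ in ssrs)
print(ssrs)
print(dict(class_counts))
```

The sketch only detects perfect repeats; compound or interrupted microsatellites would need additional handling.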
Details of 8 tetranucleotide SSR repeats designed for H. siamensis.
| Locus | Primer sequence | Repeat motif | Pop | Na | Ho | He | PIC | PHWE | Percent missing |
|---|---|---|---|---|---|---|---|---|---|
| HS2 | GTGGCGGAAATGGGCTTC | (ATCT)14 | BB | 15 | 0.868 | 0.913 | 0.907 | 0.602 | 7% |
| | CCTGAGGCATTTCATAAACTCCG | | UB | 18 | 0.619 | 0.902 | 0.889 | 0.000 | 10% |
| HS4 | CTCATCACCCGCTGTGTTTC | (ATCT)11 | BB | 35 | 0.775 | 0.962 | 0.961 | 0.006 | 0% |
| | CACACACTGACAGGCAGAC | | UB | 37 | 0.894 | 0.940 | 0.938 | 0.125 | 0% |
| HS5 | TGTCGTTCTCTGGCTGTCC | (ATCT)13 | BB | 23 | 0.976 | 0.932 | 0.928 | 0.081 | 0% |
| | CCCAGATACAGGAGTGGGATG | | UB | 19 | 0.787 | 0.919 | 0.913 | 0.078 | 0% |
| HS12 | TTGCCTGGAGGACAAGACC | (ATCT)9 | BB | 22 | 0.725 | 0.936 | 0.932 | 0.003 | 0% |
| | TGCCACTGCACAGTAAACG | | UB | 27 | 0.711 | 0.954 | 0.952 | 0.001 | 0% |
| HS14 | ACACGAGTGAGGAGTGCTG | (CTGT)9 | BB | 14 | 0.806 | 0.846 | 0.832 | 0.608 | 12% |
| | AGGCCACAAACTTCTGCTTG | | UB | 15 | 0.810 | 0.822 | 0.812 | 0.550 | 10% |
| HS21 | CAACAAGCAGAGCGACAGG | (ACTC)8 | BB | 7 | 0.730 | 0.705 | 0.657 | 0.981 | 0% |
| | TGTTGATAACGCGCCACAG | | UB | 11 | 0.596 | 0.750 | 0.722 | 0.569 | 0% |
| HS23 | TGAATGGAATGAGAGGTTCAGC | (GAGT)8 | BB | 12 | 0.878 | 0.830 | 0.810 | 0.303 | 0% |
| | TGCTGCTTGTGTGTTCAAAG | | UB | 13 | 0.957 | 0.873 | 0.860 | 0.000 | 0% |
| HS24 | AACACCATACACCTGCACC | (AAAC)8 | BB | 6 | 0.341 | 0.504 | 0.477 | 0.007 | 9% |
| | ACTCCTGTGGTGGAAGAAAGG | | UB | 5 | 0.467 | 0.528 | 0.483 | 0.000 | 0% |
Pop, population; BB, Battambang (41 samples); UB, Ubon Ratchathani (48 samples); Na, number of alleles; Ho, observed heterozygosity; He, expected heterozygosity; PIC, polymorphism information content; PHWE, p-value of the Hardy-Weinberg equilibrium test, significant at p < 0.003 after Bonferroni correction.
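For readers unfamiliar with the statistics in this table, the sketch below shows how Ho, He, and PIC are conventionally computed from diploid genotypes. It uses the simple (uncorrected) expected heterozygosity and the standard PIC formula of Botstein et al.; the software and estimators actually used in the study are not stated here, so the numbers it produces need not match the table. The Bonferroni step assumes 16 tests (8 loci × 2 populations) at α = 0.05, which is consistent with the reported threshold of 0.003 but is not confirmed by the source. The genotypes are invented.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Ho: fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs):
    """He (uncorrected): 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygous = sum(x * x for x in p)
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygous - cross


# Invented genotypes at one locus: two alleles at equal frequency.
genotypes = [("A", "B"), ("A", "A"), ("B", "B"), ("A", "B")]
freqs = allele_frequencies(genotypes)
ho = observed_heterozygosity(genotypes)
he = expected_heterozygosity(freqs)
pic_value = pic(freqs)

# Assumed Bonferroni setup: 8 loci x 2 populations = 16 tests at alpha = 0.05.
bonferroni_threshold = 0.05 / (8 * 2)

print(ho, he, pic_value, bonferroni_threshold)
```

Because PIC subtracts an extra cross term from He, it is never larger than He for the same allele frequencies, which matches the pattern in every row of the table.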