Abdulaziz M Al-Swailem, Maher M Shehata, Faisel M Abu-Duhier, Essam J Al-Yamani, Khalid A Al-Busadah, Mohammed S Al-Arawi, Ali Y Al-Khider, Abdullah N Al-Muhaimeed, Fahad H Al-Qahtani, Manee M Manee, Badr M Al-Shomrani, Saad M Al-Qhtani, Amer S Al-Harthi, Kadir C Akdemir, Mehmet S Inan, Hasan H Otu.
Abstract
Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering, and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Sequences with similarities in nucleotide and protein databases were functionally annotated using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF >300 bp and approximately 40% of hits extending to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms, including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility of adding sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism.
Year: 2010 PMID: 20502665 PMCID: PMC2873428 DOI: 10.1371/journal.pone.0010720
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Analysis Workflow:
Outline of the analysis steps performed for camel EST data. External programs used for analysis are shown where appropriate.
Summary of EST analysis results.
| Read Statistics | | Sequence Statistics | |
| Untrimmed # of reads | 70,272 | # of contigs/singletons | 8,319/15,283 |
| Average read length | 1,447±411 bp | average # of reads per contig | 5.2 |
| Average # of high quality bp/read | 614±283 | average contig length | 1,247 bp |
| # of reads after trimming | 58,842 | average singleton length | 696 bp |
| Average read length | 755±171 bp | average ORF length (contig) | 673 bp |
| Average # of high quality bp/read | 670±181 | average ORF length (singleton) | 390 bp |
| # of chimeric sequences | 1,241 | # of contigs with hit | 7,490 |
| # of reads after chimera analysis | 59,534 | # of singletons with hit | 11,480 |
| # of reads with repeat region | 18,340 (30.8%) | # of contigs with no hit | 829 |
| total # of bp masked due to repeats | ∼2.5×10^6 (5.5%) | # of singletons with no hit | 3,803 |
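The counts in the summary table can be cross-checked against the abstract with simple arithmetic. The short sketch below uses only values copied from the table; the variable names are illustrative and not taken from the paper.

```python
# Values copied from the "Summary of EST analysis results" table.
contigs, singletons = 8_319, 15_283
contigs_hit, contigs_no_hit = 7_490, 829
singletons_hit, singletons_no_hit = 11_480, 3_803

# Contigs plus singletons should equal the number of putative gene
# sequences quoted in the abstract.
total_sequences = contigs + singletons

# Sequences without any hit correspond to the "over 4,500" potentially
# novel or fast-evolving sequences mentioned in the abstract.
total_no_hit = contigs_no_hit + singletons_no_hit

# Hit and no-hit counts should partition each group.
assert contigs_hit + contigs_no_hit == contigs
assert singletons_hit + singletons_no_hit == singletons

print(total_sequences)                          # 23602
print(total_no_hit)                             # 4632
print(f"{total_no_hit / total_sequences:.1%}")  # 19.6%
```

The totals agree with the 23,602 putative gene sequences reported in the abstract.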
Figure 2. Read Length and Base Call Quality Distribution:
Distribution of read length (a) and high quality base pairs per read (b) in trimmed and untrimmed EST data. Average ± standard deviation values for the measured parameters are overlaid on the graphs.
Figure 3. Sample Cluster:
A sample cluster showing thirty high quality, repeat-masked reads that are grouped and aligned to form a final consensus sequence, yielding a contig. Individual reads are shown as blue bars and the consensus sequence is shown at the top as a red bar. Labels to the left of the bars show sequence IDs used for internal analysis purposes. The base pair scale is shown above the consensus sequence at 40 bp intervals, giving a consensus sequence slightly longer than 1,160 bp. Reads are joined when their overlap has at least 98% identity, spans at least 40 bp, and lies at most 20 bp from the sequence end.
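The three overlap thresholds in this caption can be read as a simple acceptance test on a pairwise overlap. The sketch below is only an illustration of that reading: the `Overlap` record and the function name are hypothetical, and the actual clustering program used in the study may evaluate these criteria differently.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """Hypothetical summary of a pairwise overlap between two reads."""
    identity: float  # fraction of matching bases in the aligned region (0 to 1)
    length: int      # length of the aligned overlap in bp
    overhang: int    # unaligned bp between the overlap and the read end


def passes_overlap_criteria(ov, min_identity=0.98, min_length=40, max_overhang=20):
    """Return True if the overlap meets all three thresholds from the caption."""
    return (
        ov.identity >= min_identity
        and ov.length >= min_length
        and ov.overhang <= max_overhang
    )


print(passes_overlap_criteria(Overlap(identity=0.99, length=120, overhang=5)))   # True
print(passes_overlap_criteria(Overlap(identity=0.97, length=120, overhang=5)))   # False
```

Keeping the thresholds as keyword arguments makes it easy to see how stricter or looser settings would change which reads are grouped.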
Figure 4. Sequence Length and ORF Length Distribution:
Sequence length distribution for contigs and singletons (a), distribution of longest ORF lengths found in contigs and singletons (b), sequence length distribution for contigs and singletons with hits and no hits (c), and distribution of longest ORF lengths found in contigs and singletons with hits and no hits (d). Average ± standard dev. values of sequence and ORF lengths are overlaid on corresponding graphs. Sequence lengths up to 2,500 and ORF lengths up to 1,800 bp are shown for display purposes. 2.3% of contig and no singleton sequences have length longer than 2,500 bp (a), 2% of contig and no singleton sequences have an ORF longer than 1,800 bp (b), 1.8% of contigs with a hit and no other sequences in the remaining three groups have length longer than 2,500 bp (c), and 1.6% of contigs with a hit and no other sequences in the remaining three groups have an ORF longer than 1,800 bp (d).
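To make the "longest ORF" measurement concrete, here is a minimal sketch that scans all six reading frames of a nucleotide sequence. It assumes an ORF runs from an ATG to the first in-frame stop codon and counts the stop codon in the length; the record does not state which ORF definition or tool the authors used, so this is an illustration rather than their method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in bp of the longest ATG-to-stop ORF over all six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the first ATG of the currently open ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


example = "CCATGAAATTTTAGGG"  # ATG AAA TTT TAG in frame 2 of the forward strand
print(longest_orf_length(example))        # 12
print(longest_orf_length(example) > 300)  # False
```

Under this definition, a sequence would count toward the "ORF >300 bp" group when `longest_orf_length(seq) > 300`.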
BLAST results for contigs, singletons, and their combination shown separately for the nine species analyzed.
| Species | Contigs | Singletons | Combined | ||||||
| Sequences with a hit | Avg. ORF Length | Sequences with ORF>300 bp | Sequences with a hit | Avg. ORF Length | Sequences with ORF>300 bp | Sequences with a hit | Avg. ORF Length | Sequences with ORF>300 bp | |
|
| 7,045 (85%) | 740 | 5,938 (84%) | 10,247 (67%) | 458 | 7,047 (69%) | 17,292 (73%) | 573 | 12,985 (75%) |
|
| 6,323 (76%) | 783 | 5,597 (89%) | 8,373 (55%) | 467 | 6,064 (72%) | 14,696 (62%) | 603 | 11,661 (78%) |
|
| 6,032 (73%) | 803 | 5,455 (90%) | 7,824 (51%) | 479 | 5,873 (75%) | 13,856 (59%) | 620 | 11,328 (82%) |
|
| 6,440 (77%) | 775 | 5,656 (88%) | 8,681 (57%) | 469 | 6,240 (72%) | 15,121 (64%) | 599 | 11,896 (79%) |
|
| 5,737 (69%) | 811 | 5,195 (91%) | 7,590 (50%) | 487 | 5,699 (75%) | 13,327 (57%) | 626 | 10,894 (82%) |
|
| 5,965 (72%) | 802 | 5,367 (90%) | 7,778 (51%) | 469 | 5,707 (73%) | 13,743 (58%) | 614 | 11,074 (81%) |
|
| 6,462 (78%) | 772 | 5,640 (87%) | 8,586 (56%) | 481 | 6,337 (74%) | 15,048 (64%) | 606 | 11,977 (80%) |
|
| 6,516 (78%) | 766 | 5,645 (87%) | 8,530 (56%) | 472 | 6,181 (72%) | 15,046 (64%) | 599 | 11,826 (79%) |
|
| 1,831 (22%) | 846 | 1,684 (92%) | 2,289 (15%) | 515 | 1,764 (77%) | 4,120 (17%) | 662 | 3,448 (84%) |
The percentage of sequences with a hit, relative to the total number of sequences in each group (contig, singleton, or combined), is shown separately for each species. For the sequences with a hit, the average ORF length and the percentage of sequences with ORF >300 bp (relative to the total number of sequences with a hit) are shown for each group and species.
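The two kinds of percentage described in this note use different denominators, which is easy to misread. The sketch below reproduces the percentages in the first data row of the table from the raw counts, using the group totals from the summary table (8,319 contigs and 15,283 singletons); the helper name `pct` is illustrative.

```python
TOTAL_CONTIGS = 8_319
TOTAL_SINGLETONS = 15_283


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts from the first data row of the BLAST results table.
contig_hits, contig_orf300 = 7_045, 5_938
singleton_hits, singleton_orf300 = 10_247, 7_047
combined_hits = contig_hits + singleton_hits
combined_orf300 = contig_orf300 + singleton_orf300

# Hit percentages are relative to all sequences in the group.
print(pct(contig_hits, TOTAL_CONTIGS))                       # 85
print(pct(singleton_hits, TOTAL_SINGLETONS))                 # 67
print(pct(combined_hits, TOTAL_CONTIGS + TOTAL_SINGLETONS))  # 73

# ORF >300 bp percentages are relative to the sequences that got a hit.
print(pct(contig_orf300, contig_hits))        # 84
print(pct(singleton_orf300, singleton_hits))  # 69
print(pct(combined_orf300, combined_hits))    # 75
```

The combined counts (17,292 hits and 12,985 sequences with ORF >300 bp) match the sums of the contig and singleton columns.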
Most frequently matched genes in human.
| Rank | GeneID | Gene_Symbol | Gene Name |
| 1 | 3507 | IGHM | immunoglobulin heavy constant mu |
| 2 | 3500 | IGHG1 | immunoglobulin heavy constant gamma 1 (G1m marker) |
| 3 | 28396 | IGHV4-31 | immunoglobulin heavy variable 4-31 |
| 4 | 3492 | | immunoglobulin heavy locus |
| 5 | 3502 | IGHG3 | immunoglobulin heavy constant gamma 3 (G3m marker) |
| 6 | 3501 | IGHG2 | immunoglobulin heavy constant gamma 2 (G2m marker) |
| 7 | 3503 | IGHG4 | immunoglobulin heavy constant gamma 4 (G4m marker) |
| 8 | 100133739 | LOC100133739 | similar to hCG2038920 |
| 9 | 3493 | IGHA1 | immunoglobulin heavy constant alpha 1 |
| 10 | 652494 | LOC652494 | similar to Ig heavy chain V-III region VH26 precursor |
| 11 | 3495 | IGHD | immunoglobulin heavy constant delta |
| 12 | 100126583 | LOC100126583 | hypothetical LOC100126583 |
| 13 | 80314 | EPC1 | enhancer of polycomb homolog 1 (Drosophila) |
| 14 | 28412 | IGHV3-66 | immunoglobulin heavy variable 3-66 |
| 15 | 3105 | HLA-A | major histocompatibility complex, class I, A |
| 16 | 3107 | HLA-C | major histocompatibility complex, class I, C |
| 17 | 717 | C2 | complement component 2 |
| 18 | 28393 | IGHV4-55 | immunoglobulin heavy variable 4-55 |
| 19 | 28417 | IGHV3-60 | immunoglobulin heavy variable 3-60 |
| 20 | 28464 | IGHV1-58 | immunoglobulin heavy variable 1-58 |
| 21 | 28392 | IGHV4-59 | immunoglobulin heavy variable 4-59 |
| 22 | 28418 | IGHV3-57 | immunoglobulin heavy variable 3-57 |
| 23 | 28380 | IGHV7-56 | immunoglobulin heavy variable 7-56 |
| 24 | 28419 | IGHV3-54 | immunoglobulin heavy variable 3-54 |
| 25 | 643406 | LOC643406 | hypothetical protein LOC643406 |
| 26 | 28424 | IGHV3-48 | immunoglobulin heavy variable 3-48 |
| 27 | 4276 | MICA | MHC class I polypeptide-related sequence A |
| 28 | 8449 | DHX16 | DEAH (Asp-Glu-Ala-His) box polypeptide 16 |
| 29 | 440716 | LOC440716 | hypothetical LOC440716 |
| 30 | 203068 | TUBB | tubulin, beta |
| 31 | 10919 | EHMT2 | euchromatic histone-lysine N-methyltransferase 2 |
| 32 | 1192 | CLIC1 | chloride intracellular channel 1 |
| 33 | 3304 | HSPA1B | heat shock 70kDa protein 1B |
| 34 | 3303 | HSPA1A | heat shock 70kDa protein 1A |
| 35 | 7919 | BAT1 | HLA-B associated transcript 1 |
| 36 | 23 | ABCF1 | ATP-binding cassette, sub-family F (GCN20), member 1 |
| 37 | 3135 | HLA-G | major histocompatibility complex, class I, G |
| 38 | 10107 | TRIM10 | tripartite motif-containing 10 |
| 39 | 2794 | GNL1 | guanine nucleotide binding protein-like 1 |
| 40 | 7726 | TRIM26 | tripartite motif-containing 26 |
| 41 | 10255 | HCG9 | HLA complex group 9 |
| 42 | 80739 | C6orf25 | chromosome 6 open reading frame 25 |
| 43 | 259197 | NCR3 | natural cytotoxicity triggering receptor 3 |
| 44 | 7918 | BAT4 | HLA-B associated transcript 4 |
| 45 | 89870 | TRIM15 | tripartite motif-containing 15 |
| 46 | 80740 | LY6G6C | lymphocyte antigen 6 complex, locus G6C |
| 47 | 55937 | APOM | apolipoprotein M |
| 48 | 58530 | LY6G6D | lymphocyte antigen 6 complex, locus G6D |
| 49 | 1460 | CSNK2B | casein kinase 2, beta polypeptide |
| 50 | 7124 | TNF | tumor necrosis factor (TNF superfamily, member 2) |
Most frequently matched genes in mouse.
| Rank | GeneID | Gene_Symbol | Gene Name |
| 1 | 111507 | Igh | immunoglobulin heavy chain complex |
| 2 | 380794 | Ighg | Immunoglobulin heavy chain (gamma polypeptide) |
| 3 | 100043989 | LOC100043989 | V(H)76 segment leader peptide |
| 4 | 100047678 | LOC100047678 | similar to pORF2 |
| 5 | 380795 | AI324046 | expressed sequence AI324046 |
| 6 | 16019 | Igh-6 | immunoglobulin heavy chain 6 (heavy chain of IgM) |
| 7 | 195176 | Igh-VX24 | immunoglobulin heavy chain (X24 family) |
| 8 | 790956 | LOC790956 | 5.8S ribosomal RNA |
| 9 | 19791 | Rn18s | 18S RNA |
| 10 | 236598 | LOC236598 | 28S ribosomal RNA |
| 11 | 22138 | Ttn | Titin |
| 12 | 100045801 | LOC100045801 | similar to pORF2 |
| 13 | 320473 | Heatr5b | HEAT repeat containing 5B |
| 14 | 234358 | D10627 | cDNA sequence D10627 |
| 15 | 100042306 | 100042306 | predicted gene, 100042306 |
| 16 | 56314 | Zfp113 | zinc finger protein 113 |
| 17 | 233058 | Zfp420 | zinc finger protein 420 |
| 18 | 100041343 | 100041343 | predicted gene, 100041343 |
| 19 | 234542 | Rtbdn | Retbindin |
| 20 | 22678 | Zfp2 | zinc finger protein 2 |
| 21 | 16061 | Igh-VJ558 | immunoglobulin heavy chain (J558 family) |
| 22 | 16059 | Igh-V7183 | immunoglobulin heavy chain (V7183 family) |
| 23 | 100040270 | OTTMUSG00000013146 | predicted gene, OTTMUSG00000013146 |
| 24 | 76958 | 2210418O10Rik | RIKEN cDNA 2210418O10 gene |
| 25 | 100039123 | OTTMUSG00000016219 | predicted gene, OTTMUSG00000016219 |
| 26 | 12937 | Pcdha6 | protocadherin alpha 6 |
| 27 | 244556 | Zfp791 | zinc finger protein 791 |
| 28 | 170833 | Hook2 | hook homolog 2 (Drosophila) |
| 29 | 21672 | Prdx2 | peroxiredoxin 2 |
| 30 | 22704 | Zfp46 | zinc finger protein 46 |
| 31 | 16477 | Junb | Jun-B oncogene |
| 32 | 436049 | EG436049 | predicted gene, EG436049 |
| 33 | 246196 | Zfp277 | zinc finger protein 277 |
| 34 | 624855 | EG624855 | predicted gene, EG624855 |
| 35 | 19283 | Ptprz1 | protein tyrosine phosphatase, receptor type Z, polypeptide 1 |
| 36 | 68628 | Fbxw9 | F-box and WD-40 domain protein 9 |
| 37 | 17159 | Man2b1 | mannosidase 2, alpha B1 |
| 38 | 414077 | BC056474 | cDNA sequence BC056474 |
| 39 | 68544 | 2310036O22Rik | RIKEN cDNA 2310036O22 gene |
| 40 | 56495 | Asna1 | arsA (bacterial) arsenite transporter, ATP-binding, homolog 1 |
| 41 | 67836 | 1500041N16Rik | RIKEN cDNA 1500041N16 gene |
| 42 | 212989 | Best2 | bestrophin 2 |
| 43 | 212999 | Tnpo2 | transportin 2 (importin 3, karyopherin beta 2b) |
| 44 | 380800 | Ighvq52.3.8 | immunoglobulin heavy chain variable region Q52.3.8 |
| 45 | 69724 | Rnaseh2a | ribonuclease H2, large subunit |
| 46 | 330817 | Dhps | deoxyhypusine synthase |
| 47 | 22259 | Nr1h3 | nuclear receptor subfamily 1, group H, member 3 |
| 48 | 19732 | Rgl2 | ral guanine nucleotide dissociation stimulator-like 2 |
| 49 | 100039000 | 100039000 | predicted gene, 100039000 |
| 50 | 16971 | Lrp1 | low density lipoprotein receptor-related protein 1 |
Most frequently matched genes in rat.
| Rank | GeneID | Gene_Symbol | Gene Name |
| 1 | 299354 | Ighg | Immunoglobulin heavy chain (gamma polypeptide) |
| 2 | 367586 | IgG-2a | gamma-2a immunoglobulin heavy chain |
| 3 | 498354 | LOC498354 | hypothetical protein LOC498354 |
| 4 | 361915 | LOC361915 | hypothetical protein LOC361915 |
| 5 | 294421 | Serinc1 | serine incorporator 1 |
| 6 | 498550 | RGD1560705 | similar to LRRGT00152 |
| 7 | 362795 | LOC362795 | immunoglobulin G heavy chain |
| 8 | 25419 | Crp | C-reactive protein, pentraxin-related |
| 9 | 299352 | Igh-1a | immunoglobulin heavy chain 1a (serum IgG2a) |
| 10 | 501173 | LOC501173 | hypothetical protein LOC501173 |
| 11 | 309243 | Vps13a | vacuolar protein sorting 13A (yeast) |
| 12 | 317588 | LOC317588 | hypothetical protein LOC317588 |
| 13 | 361942 | LOC361942 | similar to ORF4 |
| 14 | 681893 | LOC681893 | similar to SET protein |
| 15 | 501553 | LOC501553 | hypothetical protein LOC501553 |
| 16 | 25116 | Hsd11b1 | hydroxysteroid 11-beta dehydrogenase 1 |
| 17 | 299357 | RGD1359202 | similar to immunoglobulin heavy chain 6 (Igh-6) |
| 18 | 366747 | LOC366747 | similar to Ig heavy chain V region MC101 precursor |
| 19 | 314509 | LOC314509 | similar to single chain Fv antibody fragment scFv 7–10A |
| 20 | 499136 | LOC499136 | LRRGT00021 |
| 21 | 299458 | LOC299458 | similar to Ig H-chain V-region precursor |
| 22 | 499120 | LOC499120 | hypothetical protein LOC499120 |
| 23 | 314487 | Igha_mapped | immunoglobulin heavy chain (alpha polypeptide) (mapped) |
| 24 | 24233 | C4a | complement component 4a |
| 25 | 24231 | C2 | complement component 2 |
| 26 | 361798 | Ehmt2 | euchromatic histone lysine N-methyltransferase 2 |
| 27 | 294257 | Cfb | complement factor B |
| 28 | 497897 | Zfp2 | zinc finger protein 2 |
| 29 | 406864 | Clic1 | chloride intracellular channel 1 |
| 30 | 294254 | Hspa1b | heat shock 70kD protein 1B (mapped) |
| 31 | 55939 | Apom | apolipoprotein M |
| 32 | 294260 | Skiv2l | superkiller viralicidic activity 2-like |
| 33 | 24472 | Hspa1a | heat shock 70kD protein 1A |
| 34 | 24591 | Neu1 | neuraminidase 1 |
| 35 | 25009 | Vars2 | valyl-tRNA synthetase 2 |
| 36 | 294255 | Slc44a4 | solute carrier family 44, member 4 |
| 37 | 406171 | G7e | G7e pseudogene |
| 38 | 309613 | Ng35 | Ng35 pseudogene |
| 39 | 309609 | Ly6g6f | lymphocyte antigen 6 complex, locus G6F |
| 40 | 406866 | Ly6g6e | lymphocyte antigen 6 complex, locus G6E |
| 41 | 361799 | Dom3z | DOM-3 homolog Z (C. elegans) |
| 42 | 361796 | Bat5 | HLA-B associated transcript 5 |
| 43 | 81650 | Csnk2b | casein kinase 2, beta subunit |
| 44 | 309611 | G7c | G7c protein |
| 45 | 361800 | Stk19 | serine/threonine kinase 19 |
| 46 | 294241 | Ly6g6c | lymphocyte antigen 6 complex, locus G6C |
| 47 | 415064 | Bat4 | Bat4 gene |
| 48 | 406170 | Ng23 | Ng23 protein |
| 49 | 415062 | Ly6g6d | lymphocyte antigen 6 complex, locus G6D |
| 50 | 94342 | Bat3 | HLA-B-associated transcript 3 |
Most frequently matched genes in bovine.
| Rank | GeneID | Gene_Symbol | Gene Name |
| 1 | 281850 | IGHG1 | immunoglobulin heavy constant gamma 1 |
| 2 | 281852 | IGHG3 | immunoglobulin heavy constant gamma 3 |
| 3 | 404060 | IGG1C | IgG1 heavy chain constant region |
| 4 | 790411 | LOC790411 | endonuclease reverse transcriptase |
| 5 | 503551 | BTIGGHB | C-H-gamma pseudogene, psi-gamma |
| 6 | 508062 | ZNF135 | zinc finger protein 135 |
| 7 | 522642 | LOC522642 | similar to Zinc finger protein 420 |
| 8 | 767896 | ZFP2 | zinc finger protein 2 homolog |
| 9 | 519934 | H2B | histone H2B-like |
| 10 | 504943 | RXRB | retinoid X receptor, beta |
| 11 | 282492 | BOLA-DNA | major histocompatibility complex, class II, DN alpha |
| 12 | 282497 | BOLA-DYA | major histocompatibility complex, class II, DY alpha |
| 13 | 515435 | COL11A2 | collagen, type XI, alpha 2 |
| 14 | 524959 | TAP1 | transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) |
| 15 | 282013 | PSMB8 | proteasome (prosome, macropain) subunit, beta type, 8 |
| 16 | 512468 | GCLC | glutamate-cysteine ligase, catalytic subunit |
| 17 | 510593 | PSMB9 | proteasome (prosome, macropain) subunit, beta type, 9 |
| 18 | 505358 | BRD2 | bromodomain containing 2 |
| 19 | 532422 | HSD17B8 | hydroxysteroid (17-beta) dehydrogenase 8 |
| 20 | 282490 | BOLA-DMA | major histocompatibility complex, class II, DM alpha-chain, expressed |
| 21 | 282493 | BOLA-DOB | major histocompatibility complex, class II, DO beta |
| 22 | 282491 | BOLA-DMB | major histocompatibility complex, class II, DM beta-chain, expressed |
| 23 | 282498 | BOLA-DYB | major histocompatibility complex, class II, DY beta |
| 24 | 540716 | SLC39A7 | solute carrier family 39 (zinc transporter), member 7 |
| 25 | 618722 | LOC618722 | similar to MHC class II antigen |
| 26 | 618733 | TAP2 | transporter 2, ATP-binding cassette, sub-family B (MDR/TAP) |
| 27 | 614564 | ZNF79 | zinc finger protein 79 |
| 28 | 512364 | ZNF84 | zinc finger protein 84 |
| 29 | 100124518 | LOC100124518 | hypothetical protein LOC100124518 |
| 30 | 404057 | IGHM | immunoglobulin heavy constant mu |
| 31 | 514023 | ZNF180 | zinc finger protein 180 |
| 32 | 522837 | LOC522837 | hypothetical LOC522837 |
| 33 | 618141 | ZNF3 | zinc finger protein 3 |
| 34 | 524256 | ZNF300 | zinc finger protein 300 |
| 35 | 783710 | LOC783710 | similar to ENSANGP00000009498 |
| 36 | 515674 | ZNF184 | zinc finger protein 184 |
| 37 | 520008 | ZNF569 | zinc finger protein 569 |
| 38 | 513814 | ZNF16 | zinc finger protein 16 |
| 39 | 506448 | ZNF397 | zinc finger protein 397 |
| 40 | 511931 | LOC511931 | hypothetical LOC511931 |
| 41 | 539552 | ZNF167 | zinc finger protein 167 |
| 42 | 510417 | BOLA-NC1 | non-classical MHC class I antigen |
| 43 | 530050 | MYH11 | myosin, heavy chain 11, smooth muscle |
| 44 | 786931 | LOC786931 | similar to Zinc finger protein 585A |
| 45 | 518207 | ZNF345 | zinc finger protein 345 |
| 46 | 493779 | LOC493779 | 18S ribosomal RNA |
| 47 | 508355 | ITIH3 | inter-alpha (globulin) inhibitor H3 |
| 48 | 505478 | | immunoglobulin light chain, lambda gene cluster |
| 49 | 539265 | ZNF502 | zinc finger protein 502 |
| 50 | 515712 | BOLA | MHC class I heavy chain |
Numbers of unique GeneIDs and GO Terms mapped by the ESTs with hits for the nine species analyzed.
| Species | Contigs | Singletons | Combined | |||||||||
| Gene IDs | GO Terms | Gene IDs with GO Term | Gene IDs Hit | GO Terms Mapped | Hit Gene IDs with GO | Gene IDs Hit | GO Terms Mapped | Hit Gene IDs with GO | Gene IDs Hit | GO Terms Mapped | Hit Gene IDs with GO | |
|
| 39,920 | 6,897 | 18,370 | 19,113 | 6,296 | 13,673 | 19,799 | 6,410 | 14,206 | 22,980 | 6,664 | 15,934 |
|
| 63,648 | 6,452 | 18,047 | 18,109 | 5,724 | 12,210 | 17,584 | 5,762 | 12,105 | 22,212 | 6,098 | 14,390 |
|
| 37,838 | 7,256 | 13,330 | 10,805 | 5,709 | 6,803 | 11,018 | 5,870 | 6,926 | 14,991 | 6,520 | 9,003 |
|
| 29,496 | 3,780 | 11,972 | 10,459 | 2,851 | 5,948 | 11,019 | 2,850 | 6,046 | 14,766 | 3,290 | 7,984 |
|
| 23,876 | N/A | N/A | 7,760 | N/A | N/A | 8,249 | N/A | N/A | 10,868 | N/A | N/A |
|
| 20,187 | N/A | N/A | 9,267 | N/A | N/A | 9,776 | N/A | N/A | 12,941 | N/A | N/A |
|
| 29,187 | N/A | N/A | 11,174 | N/A | N/A | 10,935 | N/A | N/A | 15,512 | N/A | N/A |
|
| 31,555 | N/A | N/A | 11,059 | N/A | N/A | 10,738 | N/A | N/A | 15,254 | N/A | N/A |
|
| 3,506 | N/A | N/A | 1,437 | N/A | N/A | 1,360 | N/A | N/A | 1,800 | N/A | N/A |
Numbers of GeneIDs, GO Terms, and GeneIDs that have a GO annotation are shown for the nine species analyzed, where applicable. For each camel sequence group (contig, singleton, and the combination of the two), the number of unique GeneIDs that are "hit" by BLAST analyses is shown. Where applicable, we also show the number of GO terms mapped by the GeneIDs that got hit and the number of GeneIDs among this list that have a mapped GO term.
Figure 5. GO Categories:
Top thirty Biological Process GO category terms found most abundant among the Homo sapiens genes similar to camel sequences.
Number of Camel sequences that showed high similarity (>96% identity over at least 30 aa) to known full length cDNA sequences.
| Organism | # of full length cDNA | matched (contig) | matched (singleton) | matched (combined) | N-terminus proximal |
|
| 9,188 | 902 | 744 | 1,646 | 42% |
|
| 5,341 | 461 | 423 | 884 | 49% |
|
| 23,120 | 767 | 776 | 1,543 | 38% |
|
| 28,133 | 1,047 | 1,042 | 2,089 | 34% |
Number of full length cDNA sequences in each organism is shown. Last column indicates the percent of matched contig sequences that extend to within 5 aa of the start codon of the matching full length cDNA.
Figure 6. Shared Genes:
Comparison of genes found (and then matched in HomoloGene) in human, mouse, rat, and bovine. Regions not shown in the Venn diagram are genes shared by human and bovine only (403 genes) and mouse and rat only (536 genes).
Top 50 genes in camel that are shared by human, mouse, rat, and bovine.
| GeneID | Gene_Symbol | Gene Name |
| 75314 | HNRPDL | heterogeneous nuclear ribonucleoprotein D-like |
| 22410 | HNRNPD | heterogeneous nuclear ribonucleoprotein D (AU-rich element RNA binding protein 1) |
| 55558 | ANXA6 | annexin A6 |
| 1669 | ITIH3 | inter-alpha (globulin) inhibitor H3 |
| 1667 | ITIH1 | inter-alpha (globulin) inhibitor H1 |
| 68258 | MYH11 | myosin, heavy chain 11, smooth muscle |
| 23165 | HNRNPH2 | heterogeneous nuclear ribonucleoprotein H2 (H′) |
| 113602 | ZNF184 | zinc finger protein 184 |
| 68164 | ANXA4 | annexin A4 |
| 20626 | PTPRS | protein tyrosine phosphatase, receptor type, S |
| 74950 | HNRNPAB | heterogeneous nuclear ribonucleoprotein A/B |
| 21416 | KLHL2 | kelch-like 2, Mayven (Drosophila) |
| 20716 | USP4 | ubiquitin specific peptidase 4 (proto-oncogene) |
| 73874 | COL1A1 | collagen, type I, alpha 1 |
| 23741 | HMCN1 | hemicentin 1 |
| 55857 | ACTN4 | actinin, alpha 4 |
| 51861 | ZNF569 | zinc finger protein 569 |
| 79542 | KLHL3 | kelch-like 3 (Drosophila) |
| 22759 | ANXA11 | annexin A11 |
| 113709 | CYP2C19 | cytochrome P450, family 2, subfamily C, polypeptide 19 |
| 55941 | MYH10 | myosin, heavy chain 10, non-muscle |
| 13066 | USP32 | ubiquitin specific peptidase 32 |
| 65318 | ZNF629 | zinc finger protein 629 |
| 45 | C2 | complement component 2 |
| 55553 | ACTN1 | actinin, alpha 1 |
| 8306 | ZNF180 | zinc finger protein 180 |
| 862 | ACTN3 | actinin, alpha 3 |
| 22695 | ANGPTL2 | angiopoietin-like 2 |
| 20623 | PTPRF | protein tyrosine phosphatase, receptor type, F |
| 74294 | HSPA1B | heat shock 70kDa protein 1B |
| 36149 | ANXA7 | annexin A7 |
| 21268 | ZNF192 | zinc finger protein 192 |
| 74536 | PCBP2 | poly(rC) binding protein 2 |
| 48343 | PRKCE | protein kinase C, epsilon |
| 20482 | GRSF1 | G-rich RNA sequence binding factor 1 |
| 55437 | FGFR3 | fibroblast growth factor receptor 3 (achondroplasia, thanatophoric dwarfism) |
| 11012 | HNRNPH3 | heterogeneous nuclear ribonucleoprotein H3 (2H9) |
| 5169 | VIL1 | villin 1 |
| 1670 | ITIH4 | inter-alpha (globulin) inhibitor H4 (plasma Kallikrein-sensitive glycoprotein) |
| 68144 | UGT2B4 | UDP glucuronosyltransferase 2 family, polypeptide B4 |
| 75207 | CA13 | carbonic anhydrase XIII |
| 55433 | COL3A1 | collagen, type III, alpha 1 (Ehlers-Danlos syndrome type IV, autosomal dominant) |
| 21165 | NR1H3 | nuclear receptor subfamily 1, group H, member 3 |
| 47995 | AP1G1 | adaptor-related protein complex 1, gamma 1 subunit |
| 38170 | FBLN5 | fibulin 5 |
| 22878 | LPHN3 | latrophilin 3 |
| 20916 | CD151 | CD151 molecule (Raph blood group) |
| 56926 | PCBP4 | poly(rC) binding protein 4 |
| 1312 | BTN1A1 | butyrophilin, subfamily 1, member A1 |
| 14805 | KLHL18 | kelch-like 18 (Drosophila) |
Figure 7. Gene Interaction Network:
Most significant network identified by IPA using 8,405 genes found in camel ESTs shared by human, mouse, rat, and bovine. Molecules involved in the two most highly associated functions in the network ("hair and skin development and function" and "renal and urological system development") are shown in light green with related functional annotation.
Figure 8. Canonical Pathway:
Pathway most significantly associated, according to IPA, with the data set of 8,405 genes found in camel ESTs shared by human, mouse, rat, and bovine (NRF-2 mediated oxidative stress response pathway). Molecules that exist in the data set are shown in red.