Nicolas Tchitchek, David Safronetz, Angela L Rasmussen, Craig Martens, Kimmo Virtaneva, Stephen F Porcella, Heinz Feldmann, Hideki Ebihara, Michael G Katze.
Abstract
BACKGROUND: The Syrian hamster (golden hamster, Mesocricetus auratus) is gaining importance as a new experimental animal model for multiple pathogens, including emerging zoonotic diseases such as Ebola. Nevertheless, there are currently no publicly available transcriptome reference sequences or genome sequence for this species.
Year: 2014 PMID: 25398096 PMCID: PMC4232415 DOI: 10.1371/journal.pone.0112617
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Histograms showing the length distribution of the reads and the length distribution of the singletons and contigs.
(A) The length distribution of the reads is shown in a gray histogram with bins of 50 nucleotides. Read lengths range from 40 to 631 nucleotides, with a median of 387 and a mean of 352. The reads represent a total of 426,683,712 nucleotide bases. (B) The length distributions of the 111,796 singletons (red) and of the 62,482 contigs (blue) are shown in histograms with bins of 25 nucleotides. Singleton lengths range from 50 to 614 nucleotides, with a median of 187 and a mean of 265. Contig lengths range from 50 to 4,054 nucleotides, with a median of 473 and a mean of 487. Our Syrian hamster transcriptome represents a total of 60,117,204 nucleotide bases.
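The summary statistics in the Figure 1 caption (range, median, mean, total bases, fixed-width bins) can be reproduced from a list of sequence lengths. The sketch below is illustrative only: `length_summary` is a hypothetical helper, the toy lengths are invented, and bins starting at 0 are an assumption since the caption only gives the bin width.

```python
from collections import Counter
from statistics import mean, median


def length_summary(lengths, bin_width):
    """Summarize sequence lengths (nucleotides) as in the Figure 1 caption.

    Hypothetical helper; assumes bins start at 0, so bin k covers
    lengths in [k * bin_width, (k + 1) * bin_width).
    """
    if not lengths:
        raise ValueError("need at least one length")
    bins = Counter(length // bin_width for length in lengths)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "total_bases": sum(lengths),
        # Key each bin by its lower bound, in ascending order.
        "bins": {k * bin_width: n for k, n in sorted(bins.items())},
    }


# Toy read lengths (invented, not data from the study).
reads = [40, 120, 387, 390, 631]
summary = length_summary(reads, bin_width=50)
print(summary["median"], summary["total_bases"])  # 387 1568
```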
Transcriptome references and alignment statistics.
| Species name | # of genes | # of transcripts | # and % of alignments | # and % of mapped genes | # and % of mapped transcripts |
|---|---|---|---|---|---|
| Mouse (Mus musculus) | 38,293 | 92,484 | 41,651 (23.90%) | 9,562 (22.96%) | 11,648 (12.59%) |
| Rat (Rattus norvegicus) | 26,405 | 29,189 | 26,258 (15.07%) | 7,137 (27.18%) | 7,223 (24.75%) |
| Chinese Hamster Ovary cells (Cricetulus griseus) | NA | 121,636* | 7,845 (4.50%) | NA | 4,390 (3.61%) |
| Chimpanzee (Pan troglodytes) | 28,012 | 29,160 | 2,884 (1.65%) | 1,631 (56.55%) | 1,643 (5.63%) |
| Ferret (Mustela putorius furo) | 23,811 | 23,963 | 16,169 (9.28%) | 4,169 (25.78%) | 4,187 (17.47%) |
| Gorilla (Gorilla gorilla gorilla) | 29,216 | 35,727 | 8,319 (4.77%) | 2,733 (32.85%) | 2,735 (7.66%) |
| Guinea pig (Cavia porcellus) | 25,028 | 26,129 | 15,014 (8.61%) | 4,050 (26.97%) | 4,155 (15.90%) |
| Human (Homo sapiens) | 62,316 | 213,551 | 23,020 (13.21%) | 5,409 (23.50%) | 7,254 (3.40%) |
| Kangaroo rat (Dipodomys ordii) | 26,405 | 29,189 | 2,103 (1.21%) | 1,252 (59.53%) | 1,252 (4.29%) |
| Macaque (Macaca mulatta) | 30,246 | 44,725 | 13,792 (7.91%) | 3,804 (27.58%) | 4,163 (9.31%) |
| Orangutan (Pongo abelii) | 28,443 | 29,447 | 15,331 (8.80%) | 3,929 (25.63%) | 3,952 (13.42%) |
| Pig (Sus scrofa) | 25,322 | 30,586 | 10,910 (6.26%) | 2,978 (27.30%) | 3,051 (9.98%) |
| Pika (Ochotona princeps) | 23,028 | 23,028 | 1,575 (0.90%) | 989 (62.79%) | 989 (4.29%) |
| Rabbit (Oryctolagus cuniculus) | 23,394 | 28,188 | 4,344 (2.49%) | 1,946 (44.80%) | 2,007 (7.12%) |
| Shrew (Sorex araneus) | 19,134 | 19,139 | 1,330 (0.76%) | 759 (57.07%) | 759 (3.97%) |
| Squirrel (Ictidomys tridecemlineatus) | 22,398 | 23,572 | 7,730 (4.44%) | 2,723 (35.23%) | 2,733 (11.59%) |
| Tree Shrew (Tupaia belangeri) | 20,820 | 20,824 | 1,786 (1.02%) | 1,091 (61.09%) | 1,091 (5.24%) |
For each transcriptome reference used in this study, the name of the species, the number of genes available, and the number of transcripts available are indicated.
*For the Chinese hamster ovary cells, the number of transcripts indicated is the number of available transcript fragments, not the number of distinct transcripts. For each transcriptome reference, the table also gives the number of aligned contigs and singletons, the number of mapped genes, and the number of mapped transcripts. The percentage of alignments is given relative to the total number of contigs and singletons in our library (174,278), and the percentage of mapped transcripts relative to the total number of transcripts available on the transcriptome reference. The tabulated percentage of mapped genes corresponds to the number of mapped genes divided by the number of alignments (e.g., 9,562 / 41,651 = 22.96% for mouse).
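Recomputing the mouse row is a quick way to check which denominator each tabulated percentage uses: alignments over the 174,278 library sequences, mapped transcripts over the available transcripts, and, judging from the numbers themselves, mapped genes over the number of alignments. `alignment_percentages` is a hypothetical helper written for this check, not part of the study's pipeline.

```python
# Total contigs + singletons in the library (111,796 + 62,482).
LIBRARY_SIZE = 174_278


def alignment_percentages(n_alignments, n_mapped_genes,
                          n_mapped_transcripts, n_transcripts):
    """Hypothetical helper reproducing the table's percentages (2 decimals)."""
    return {
        # Share of library sequences aligned to this reference.
        "alignments_pct": round(100 * n_alignments / LIBRARY_SIZE, 2),
        # Denominator inferred from the tabulated values: alignments.
        "mapped_genes_pct": round(100 * n_mapped_genes / n_alignments, 2),
        # Share of the reference's transcripts hit by at least one alignment.
        "mapped_transcripts_pct": round(100 * n_mapped_transcripts / n_transcripts, 2),
    }


mouse = alignment_percentages(41_651, 9_562, 11_648, 92_484)
print(mouse)
# {'alignments_pct': 23.9, 'mapped_genes_pct': 22.96, 'mapped_transcripts_pct': 12.59}
```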
Figure 2. Pie diagrams showing the alignment positions of the contigs and singletons on the mouse and rat transcript regions.
(A) Pie diagram showing the distribution of alignment positions of the 41,651 contigs and singletons aligned on mouse transcripts, by transcript region (5′ UTR, coding region, 3′ UTR, or inter-region). (B) Pie diagram showing the same distribution for the 26,258 contigs and singletons aligned on rat transcripts. For each species and transcript region, the number and percentage of aligned sequences are indicated.
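Assigning an alignment to one of the four categories in Figure 2 amounts to comparing its coordinates with the coding-region boundaries of the target transcript. The sketch below assumes 1-based inclusive transcript coordinates and interprets "inter-region" as an alignment that overlaps more than one region; both are assumptions, and `classify_alignment` is a hypothetical helper, not the authors' code.

```python
def classify_alignment(aln_start, aln_end, cds_start, cds_end):
    """Assign an alignment on a transcript to a region (hypothetical helper).

    Coordinates are assumed 1-based and inclusive on the transcript.
    Returns "5' UTR", "coding region", "3' UTR", or "inter-region"
    when the alignment overlaps more than one region (assumed meaning).
    """
    if aln_start > aln_end or cds_start > cds_end:
        raise ValueError("start must not exceed end")
    regions = []
    if aln_start < cds_start:  # some aligned bases lie upstream of the CDS
        regions.append("5' UTR")
    if aln_end >= cds_start and aln_start <= cds_end:  # overlaps the CDS
        regions.append("coding region")
    if aln_end > cds_end:  # some aligned bases lie downstream of the CDS
        regions.append("3' UTR")
    return regions[0] if len(regions) == 1 else "inter-region"


# Toy transcript whose coding region spans positions 101-400.
print(classify_alignment(150, 300, 101, 400))  # coding region
```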
List of the 50 most highly expressed genes in the library.
| Ensembl Gene ID | Associated Gene Name | Description | Count |
|---|---|---|---|
| ENSMUSG00000028647 | Mycbp | c-myc binding protein | 1120 |
| ENSMUSG00000020594 | Pum2 | pumilio 2 (Drosophila) | 1017 |
| ENSMUSG00000008575 | Nfib | nuclear factor I/B | 945 |
| ENSMUSG00000022010 | Tsc22d1 | TSC22 domain family, member 1 | 895 |
| ENSMUSG00000062078 | Qk | quaking | 861 |
| ENSMUSG00000078578 | Ube2d3 | ubiquitin-conjugating enzyme E2D 3 | 795 |
| ENSMUSG00000026621 | Mosc1 | MOCO sulphurase C-terminal domain containing 1 | 710 |
| ENSMUSG00000028161 | Ppp3ca | protein phosphatase 3, catalytic subunit, alpha isoform | 707 |
| ENSMUSG00000028790 | Khdrbs1 | KH domain containing, RNA binding, signal transduction associated 1 | 695 |
| ENSMUSG00000006740 | Kif5b | kinesin family member 5B | 684 |
| ENSMUSG00000031627 | Irf2 | interferon regulatory factor 2 | 682 |
| ENSMUSG00000036781 | Rps27l | ribosomal protein S27-like | 660 |
| ENSMUSG00000026655 | Fam107b | family with sequence similarity 107, member B | 658 |
| ENSMUSG00000006373 | Pgrmc1 | progesterone receptor membrane component 1 | 652 |
| ENSMUSG00000060961 | Slc4a4 | solute carrier family 4 (anion exchanger), member 4 | 641 |
| ENSMUSG00000024750 | Zfand5 | zinc finger, AN1-type domain 5 | 639 |
| ENSMUSG00000028788 | Ptp4a2 | protein tyrosine phosphatase 4a2 | 634 |
| ENSMUSG00000019943 | Atp2b1 | ATPase, Ca++ transporting, plasma membrane 1 | 605 |
| ENSMUSG00000097347 | AC121292.1 | | 603 |
| ENSMUSG00000004980 | Hnrnpa2b1 | heterogeneous nuclear ribonucleoprotein A2/B1 | 600 |
| ENSMUSG00000093904 | Tomm20 | translocase of outer mitochondrial membrane 20 homolog (yeast) | 593 |
| ENSMUSG00000068823 | Csde1 | cold shock domain containing E1, RNA binding | 586 |
| ENSMUSG00000020315 | Spnb2 | spectrin beta 2 | 579 |
| ENSMUSG00000068798 | Rap1a | RAS-related protein-1a | 579 |
| ENSMUSG00000020390 | Ube2b | ubiquitin-conjugating enzyme E2B | 570 |
| ENSMUSG00000026064 | Ptp4a1 | protein tyrosine phosphatase 4a1 | 570 |
| ENSMUSG00000020053 | Igf1 | insulin-like growth factor 1 | 569 |
| ENSMUSG00000027706 | Sec62 | SEC62 homolog (S. cerevisiae) | 553 |
| ENSMUSG00000064373 | Sepp1 | selenoprotein P, plasma, 1 | 549 |
| ENSMUSG00000014956 | Ppp1cb | protein phosphatase 1, catalytic subunit, beta isoform | 538 |
| ENSMUSG00000007850 | Hnrnph1 | heterogeneous nuclear ribonucleoprotein H1 | 536 |
| ENSMUSG00000031207 | Msn | moesin | 518 |
| ENSMUSG00000020152 | Actr2 | ARP2 actin-related protein 2 | 515 |
| ENSMUSG00000022261 | Sdc2 | syndecan 2 | 514 |
| ENSMUSG00000047187 | Rab2a | RAB2A, member RAS oncogene family | 512 |
| ENSMUSG00000004936 | Map2k1 | mitogen-activated protein kinase kinase 1 | 510 |
| ENSMUSG00000026576 | Atp1b1 | ATPase, Na+/K+ transporting, beta 1 polypeptide | 506 |
| ENSMUSG00000022234 | Cct5 | chaperonin containing Tcp1, subunit 5 (epsilon) | 504 |
| ENSMUSG00000001175 | Calm1 | calmodulin 1 | 502 |
| ENSMUSG00000069662 | Marcks | myristoylated alanine rich protein kinase C substrate | 490 |
| ENSMUSG00000017776 | Crk | v-crk sarcoma virus CT10 oncogene homolog (avian) | 484 |
| ENSMUSG00000038014 | Fam120a | family with sequence similarity 120A | 484 |
| ENSMUSG00000036478 | Btg1 | B cell translocation gene 1, anti-proliferative | 483 |
| ENSMUSG00000027177 | Hipk3 | homeodomain interacting protein kinase 3 | 478 |
| ENSMUSG00000043991 | Pura | purine rich element binding protein A | 474 |
| ENSMUSG00000022283 | Pabpc1 | poly(A) binding protein, cytoplasmic 1 | 471 |
| ENSMUSG00000031342 | Gpm6b | glycoprotein m6b | 471 |
| ENSMUSG00000050608 | Minos1 | mitochondrial inner membrane organizing system 1 | 471 |
| ENSMUSG00000018446 | C1qbp | complement component 1, q subcomponent binding protein | 469 |
| ENSMUSG00000026568 | Mpc2 | mitochondrial pyruvate carrier 2 | 461 |
For each of the 50 most highly expressed genes in the library, based on mouse annotations, the Ensembl mouse gene identifier, the associated gene name, the description, and the count (the number of reads mapped to the gene) are indicated.
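The ranking above boils down to counting reads per gene and keeping the genes with the highest counts. A minimal sketch, assuming each read has already been assigned to at most one gene; `top_expressed` is a hypothetical helper and the read-to-gene mapping is invented toy data. With `Counter.most_common`, genes with equal counts keep the order in which they were first seen.

```python
from collections import Counter


def top_expressed(read_to_gene, n):
    """Return the n genes with the most mapped reads (hypothetical helper).

    read_to_gene maps each read ID to a gene ID, or None if unmapped;
    one gene per read is an assumption, not the study's documented pipeline.
    """
    counts = Counter(g for g in read_to_gene.values() if g is not None)
    return counts.most_common(n)


# Invented toy assignment of reads to gene names.
toy = {"r1": "Mycbp", "r2": "Mycbp", "r3": "Pum2", "r4": None,
       "r5": "Mycbp", "r6": "Pum2", "r7": "Nfib"}
print(top_expressed(toy, 2))  # [('Mycbp', 3), ('Pum2', 2)]
```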
Functional enrichment of the mouse genes mapped by our transcriptome assembly.
| Rank | Biological Function [p-value range] | Canonical pathway (p-value) |
|---|---|---|
| 1 | Organismal Survival [1.11E-03 – 4.03E-26] | Protein Ubiquitination Pathway (1.99E-18) |
| 2 | Nervous System Development and Function [1.29E-03 – 1.46E-19] | Molecular Mechanisms of Cancer (5.01E-14) |
| 3 | Organ Morphology [1.32E-03 – 4.20E-19] | Integrin Signaling (3.16E-13) |
| 4 | Tissue Morphology [1.08E-03 – 1.07E-18] | EIF2 Signaling (3.98E-12) |
| 5 | Cardiovascular System Development and Function [1.05E-03 – 4.15E-17] | Epithelial Adherens Junction Signaling (2.51E-11) |
List of the top 5 biological functions and the top 5 canonical pathways found to be statistically over-represented in the list of 9,546 mouse genes mapped by our transcriptome assembly. A p-value range is indicated for each biological function, and a single p-value for each canonical pathway.
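The text does not state which tool or test produced these p-values. A common way to score over-representation of a pathway in a gene list is the right-tailed hypergeometric test (equivalent to a one-sided Fisher's exact test); the sketch below shows that test only as an illustration, and `overrepresentation_p` and the toy counts are hypothetical.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """Right-tailed hypergeometric p-value P(X >= k) (hypothetical helper).

    N: genes in the background, K: background genes in the pathway,
    n: genes in the query list, k: query genes found in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more pathway genes by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Toy example: 10 background genes, 4 in the pathway, 3 queried, 2 hits.
print(overrepresentation_p(2, 3, 4, 10))  # 0.3333333333333333
```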
Figure 3. Schematic representation of the top two over-represented canonical pathways in our transcriptome assembly.
(A) Representation of the “Protein Ubiquitination” canonical pathway. (B) Representation of the “Molecular Mechanisms of Cancer” canonical pathway. Both pathways were generated from mouse annotations. Transcripts involved in these pathways are indicated by different node shapes, and their associations by different edge shapes. Legends for the different nodes and edges are given in . For both pathways, transcripts present in our library are shown in gray. The p-value for the statistical over-representation of each canonical pathway is also indicated.
Figure 4. Distogram showing the commonly mapped transcripts and phylogenetic tree showing the divergences amongst the different species.
(A) Distogram showing, for each pair of species used in this study, the number of transcripts commonly mapped by the Syrian hamster transcriptome. Each cell of the distogram represents, on a color gradient, the number of transcripts commonly mapped for two different species. (B) Phylogenetic tree showing the genomic divergence among a subset of the species used in this study. Each leaf of the tree represents a species, and edge lengths are proportional to the genomic distances between species. Genomic distances were calculated from the 611 Syrian hamster contigs and singletons commonly aligned on the transcriptome references of the 13 species with the highest number of commonly aligned sequences.
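The distogram cells and the 611-sequence core set are both set intersections. The sketch below assumes that each cell counts Syrian hamster sequences aligned to the references of both species in a pair, which is one reading of the caption. `common_mapping_matrix`, `shared_by_all`, and the toy sequence IDs are hypothetical; the tree-building step itself is not shown because the caption does not name the method.

```python
from itertools import combinations


def common_mapping_matrix(mapped):
    """Pairwise overlap counts for a distogram (hypothetical helper).

    mapped: dict species -> set of hamster sequence IDs aligned to that
    species' transcriptome reference. Returns {(a, b): overlap size}
    for every unordered pair of species, in sorted species order.
    """
    return {
        (a, b): len(mapped[a] & mapped[b])
        for a, b in combinations(sorted(mapped), 2)
    }


def shared_by_all(mapped, species):
    """Sequences aligned to every listed species (hypothetical helper)."""
    return set.intersection(*(mapped[s] for s in species))


# Invented toy alignments.
toy = {"mouse": {"c1", "c2", "c3"}, "rat": {"c2", "c3", "c4"},
       "human": {"c3", "c5"}}
print(common_mapping_matrix(toy))
# {('human', 'mouse'): 1, ('human', 'rat'): 1, ('mouse', 'rat'): 2}
```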