Shoji Tatsumoto, Naoki Adati, Yasushi Tohtoki, Yoshiyuki Sakaki, Thorsten Boroviak, Sonoko Habu, Hideyuki Okano, Hiroshi Suemizu, Erika Sasaki, Masanobu Satake.
Abstract
The common marmoset is a New World monkey that has become a valuable experimental animal for biomedical research. This study developed cDNA libraries for the common marmoset from five different tissues. A total of 290 426 high-quality EST sequences were obtained, of which 251 587 sequences (86.5%) had homology (E-value < 1E−100) with the Refseqs of six different primate species, including human and marmoset. In parallel, 270 673 sequences (93.2%) were aligned to the human genome. When 247 090 sequences were assembled into 17 232 contigs, most of the sequences (218 857, or 15 089 contigs) were located in exonic regions, indicating that these genes are expressed in both human and marmoset. The other 5578 sequences (or 808 contigs) mapping to the human genome were not located in exonic regions, suggesting that they are not expressed in human. Furthermore, a different set of 118 potential coding sequences was not similar to any Refseq in any species and thus may represent unknown genes. The cDNA libraries developed in this study are available through the RIKEN Bio Resource Center. A Web server for the marmoset cDNAs is available at http://marmoset.nig.ac.jp/index.html, where each marmoset EST sequence has been annotated by reference to the human genome. These new libraries will be a useful genetic resource to facilitate research in the common marmoset.
Keywords: cDNA; common marmoset; gene resource
Year: 2013 PMID: 23543116 PMCID: PMC3686431 DOI: 10.1093/dnares/dst007
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Clustering of ESTs by CD-HIT and assembly of ESTs by CAP3
| Libraries | Number of ESTs | Number of clusters by CD-HIT | Number of contigs and singlets assembled by CAP3 |
|---|---|---|---|
| MES | 71 009 | 17 467 | 15 837 = 5519 (contig) + 10 318 (singlet) |
| MLI | 56 232 | 10 010 | 8831 = 3319 (contig) + 5512 (singlet) |
| MSC | 29 258 | 12 309 | 10 617 = 3764 (contig) + 6853 (singlet) |
| MSP | 61 831 | 16 600 | 14 268 = 5086 (contig) + 9182 (singlet) |
| MTE | 72 096 | 29 028 | 25 909 = 8044 (contig) + 17 865 (singlet) |
| All | 290 426 | 62 210 | 60 568 = 17 232 (contig) + 43 336 (singlet) |
Default parameters were used for both the CD-HIT and CAP3 programs.
Assignment of common marmoset ESTs to primates’ Refseq
| Species derivation of Refseq | Number of ESTs homologous to Refseq (number of homologous genes) |
|---|---|
| | 239 920 (13 825) |
| | 233 913 (14 372) |
| | 231 084 (13 499) |
| | 231 354 (13 898) |
| | 229 151 (13 677) |
| | 228 749 (13 296) |
| Six primates | 251 587 |
| Non-primates | 931 |
The common marmoset EST sequences (290 426 in total) were compared against the primate Refseq mRNAs registered at NCBI. Homology was searched using BLASTn and judged significant at an E-value < 1E−100.
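The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLASTn results are available in the standard 12-column tabular format (E-value in the 11th column) and that only the best hit per EST is kept; both are assumptions made for illustration, as are the function name `filter_hits` and the example records, none of which come from the study.

```python
def filter_hits(lines, max_evalue=1e-100):
    """Return {query: (subject, evalue)} for the best hit below max_evalue.

    Assumes tab-separated BLAST tabular output with the E-value in the
    11th column. Keeping one best hit per query is an illustrative choice.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue < max_evalue and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best


# Hypothetical records: EST1 has two significant hits, EST2 has none.
example_lines = [
    "EST1\tNM_A\t98.5\t600\t9\t0\t1\t600\t1\t600\t1e-150\t900",
    "EST1\tNM_B\t95.0\t600\t30\t0\t1\t600\t1\t600\t1e-120\t800",
    "EST2\tNM_C\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-20\t150",
]
hits = filter_hits(example_lines)
print(hits)
```

Counting the keys of the returned dictionary gives the number of ESTs with a significant hit, which is the kind of figure reported in the table.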
Identity and coverage between homologous marmoset ESTs and primates’ Refseq
| Species derivation of Refseq | Identity for Refseq (for 9879 HomoloGenes) | Coverage for Refseq (for 9879 HomoloGenes) |
|---|---|---|
| | 94.88% (94.54%) | 91.14% (93.91%) |
| | 94.86% (94.54%) | 88.84% (92.11%) |
| | 99.55% | 84.70% |
| | 94.77% | 87.36% |
| | 94.73% (94.44%) | 87.20% (89.66%) |
| | 94.72% | 87.80% |
Identity (%) is the degree of identity between the two aligned sequences of the high-scoring segment pairs, whereas coverage (%) is the ratio of the aligned sequence length to the entire length of the EST. See the text for details of how identity and coverage were calculated.
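One possible reading of these two definitions is sketched below. The exact calculation is given in the main text of the paper, which is not reproduced here, so the details are assumptions: identity is pooled over all high-scoring segment pairs (HSPs), and overlapping query intervals are merged before computing coverage. The function name, the record layout, and the example values are hypothetical.

```python
def identity_and_coverage(hsps, est_length):
    """Return (identity %, coverage %) for one EST against one reference.

    Each HSP is a dict with 1-based inclusive query coordinates
    (q_start, q_end), the number of identical positions (matches),
    and the alignment length (align_len).
    """
    if not hsps:
        raise ValueError("at least one HSP is required")

    # Identity: identical positions over aligned positions, pooled over HSPs.
    total_matches = sum(h["matches"] for h in hsps)
    total_aligned = sum(h["align_len"] for h in hsps)
    identity = 100.0 * total_matches / total_aligned

    # Coverage: merge overlapping query intervals so no base is counted twice.
    intervals = sorted((h["q_start"], h["q_end"]) for h in hsps)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1
    coverage = 100.0 * covered / est_length
    return identity, coverage


# Hypothetical EST of 619 nt with two overlapping HSPs.
example_hsps = [
    {"q_start": 1, "q_end": 300, "matches": 285, "align_len": 300},
    {"q_start": 251, "q_end": 500, "matches": 240, "align_len": 250},
]
identity, coverage = identity_and_coverage(example_hsps, est_length=619)
print(f"identity = {identity:.2f}%, coverage = {coverage:.2f}%")
```

Merging intervals keeps coverage at or below 100% when HSPs overlap on the EST; summing the raw HSP lengths would not.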
Mapping of ESTs on the human genome
| Libraries | Number of marmoset ESTs | Number of ESTs mapped on the human genome (raw data) | Number of ESTs mapped on the human genome (filtered) |
|---|---|---|---|
| MES | 71 009 | 70 375 | 66 894 (94.2%) |
| MLI | 56 232 | 55 602 | 52 405 (93.2%) |
| MSC | 29 258 | 28 931 | 27 253 (93.1%) |
| MSP | 61 831 | 61 300 | 58 170 (94.1%) |
| MTE | 72 096 | 71 641 | 65 951 (91.5%) |
| All | 290 426 | 287 849 (99.1%) | 270 673 (93.2%) |
Marmoset ESTs (290 426) were mapped to the human genome (hg19) using the BLAT search program (−stepSize = 5, −minScore = 50, −minIdentity = 80, −repMatch = 2253). This initial mapping placed 287 849 ESTs (99.1%). These ESTs were then filtered, following the UCSC Genome Browser procedure, with pslCDnaFilter (−minId = 0.85, −minCover = 0.75, −globalNearBest = 0.0025, −minQSize = 20, −minNonRepSize = 16, −ignoreNs, −bestOverlap). The filtering condition was chosen as follows. A condition of −minId = 0.95 and −minCover = 0.25 retained only 159 309 ESTs (54.9%), indicating that −minId = 0.95 is extremely strict. We therefore lowered −minId to 0.85 (keeping −minCover = 0.25), which retained a reasonable number of ESTs. With −minId fixed at 0.85, we then tried to improve specificity by raising −minCover to 0.75; a value of 0.90 appeared too strict and was not used. A −minCover of 0.75 is roughly equal to the expected coverage of the coding sequence: the average EST length was 619 nt and the average 5′UTR length of human transcripts is 170 nt, so the expected coverage between coding sequences and ESTs is (619 − 170)/619 = 0.73. With −minId = 0.85 and −minCover = 0.75, 270 673 ESTs (93.2%) passed the filter as exact and specific.
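The arithmetic behind the choice of thresholds can be checked with a short sketch. It only reproduces the expected-coverage estimate and the retained fractions quoted above, and applies the two thresholds to a single alignment; it is not the pslCDnaFilter implementation, and the function names are hypothetical.

```python
def expected_cds_coverage(est_length, utr5_length):
    """Fraction of an EST expected to fall within coding sequence."""
    return (est_length - utr5_length) / est_length


def passes_filter(identity, coverage, min_id=0.85, min_cover=0.75):
    """Apply identity and coverage thresholds (both given as fractions)."""
    return identity >= min_id and coverage >= min_cover


# Values quoted in the footnote above.
TOTAL_ESTS = 290_426
expected = expected_cds_coverage(est_length=619, utr5_length=170)
retained_strict = 100.0 * 159_309 / TOTAL_ESTS  # minId 0.95, minCover 0.25
retained_final = 100.0 * 270_673 / TOTAL_ESTS   # minId 0.85, minCover 0.75

print(f"expected coverage: {expected:.2f}")
print(f"retained (strict): {retained_strict:.1f}%")
print(f"retained (final):  {retained_final:.1f}%")

# A hypothetical alignment with 90% identity and 80% coverage passes the
# final condition but fails when the identity threshold is raised to 0.95.
print(passes_filter(0.90, 0.80))
print(passes_filter(0.90, 0.80, min_id=0.95, min_cover=0.25))
```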
Top five genes expressed abundantly in each cDNA library
| Libraries | Gene symbols | Descriptions | Number of ESTs |
|---|---|---|---|
| MES | | Peptide YY | 123 |
| MES | | Lin-28 homologue A | 50 |
| MES | | Chromosome 6 open reading frame 221 | 46 |
| MES | | Nanog homeobox | 45 |
| MES | | Endogenous retrovirus group MER34, member 1 | 28 |
| MLI | | Albumin | 4492 |
| MLI | | Haptoglobin-related protein | 1537 |
| MLI | | Orosomucoid 2 | 1227 |
| MLI | | Orosomucoid 1 | 1221 |
| MLI | | Apolipoprotein A-II | 673 |
| MSC | | Synaptosomal-associated protein, 25 kDa | 109 |
| MSC | | Proteolipid protein 1 | 58 |
| MSC | | Calcitonin-related polypeptide alpha | 51 |
| MSC | | Stathmin-like 2 | 30 |
| MSC | | Thy-1 cell surface antigen | 28 |
| MSP | | Membrane-spanning 4-domains, subfamily A, member 1 | 62 |
| MSP | | Integrin, beta 2 | 29 |
| MSP | | Major histocompatibility complex, class II, DP alpha 1 | 26 |
| MSP | | C-type lectin domain family 4, member F | 20 |
| MSP | | CD53 molecule | 18 |
| MTE | | Protamine 1 | 1198 |
| MTE | | Transition protein 1 | 161 |
| MTE | | High mobility group box 4 | 139 |
| MTE | | PHD finger protein 7 | 138 |
| MTE | | Dickkopf-like 1 | 107 |
Shown are the contigs that were detected in only one of the five cDNA libraries and contained the largest numbers of constituent ESTs.