
Collection of Macaca fascicularis cDNAs derived from bone marrow, kidney, liver, pancreas, spleen, and thymus.

Naoki Osada, Makoto Hirata, Reiko Tanuma, Yutaka Suzuki, Sumio Sugano, Keiji Terao, Jun Kusuda, Yosuke Kameoka, Katsuyuki Hashimoto, Ichiro Takahashi.

Abstract

BACKGROUND: Consolidating transcriptome data of non-human primates is essential to annotate primate genome sequences, and will facilitate research using non-human primates in the genomic era. Macaca fascicularis is a macaque monkey that is commonly used for biomedical and ecological research.
FINDINGS: We constructed cDNA libraries of Macaca fascicularis, derived from tissues obtained from bone marrow, liver, pancreas, spleen, and thymus of a young male, and kidney of a young female. In total, 5'-end sequences of 56,856 clones were determined. Including the previously established cDNA libraries from brain and testis, we have isolated 112,587 cDNAs of Macaca fascicularis, which correspond to 56% of the curated human reference genes.
CONCLUSION: These sequences were deposited in the public sequence database as well as in our in-house macaque genome database (http://genebank.nibio.go.jp/qfbase/). These data will be a valuable resource for identifying functional parts of the macaque genome in future studies.


Year: 2009 PMID: 19785770 PMCID: PMC2762985 DOI: 10.1186/1756-0500-2-199

Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500

Findings

Macaca fascicularis (cynomolgus, crab-eating, or long-tailed macaque) is one of the most widely used primate species in biomedical research and is closely related to Macaca mulatta (rhesus macaque). The draft sequence of the Macaca mulatta genome, which occupies an evolutionarily important position, was published in 2007 [1]. Transcriptome data broaden the application of genome sequences. In contrast to the several million human transcript sequences available, macaque transcriptome data have been analyzed in only a limited number of studies [2-6]. A complete list of macaque genes will benefit future genetic studies using macaques. We aim to identify all macaque transcripts that correspond to human genes in widely accepted reference sets, such as the RefSeq sequences [7]. We have previously published expressed sequence tag (EST) and full-length sequences obtained from cDNA libraries of Macaca fascicularis brain and testis, in studies on a variety of research subjects [5,8-13]. Here, we present 5'-EST sequences from six other tissues of Macaca fascicularis.

Bone marrow, liver, pancreas, spleen, and thymus were harvested from a 4-year-old male Malaysian Macaca fascicularis, and kidney from a 3-year-old female Philippine Macaca fascicularis. These animals were bred and reared at the Tsukuba Primate Research Center (TPRC), National Institute of Biomedical Innovation (Ibaraki, Japan). The tissues were harvested in the P2 facility at TPRC, in accordance with the guidelines of the Laboratory Biosafety Manual, World Health Organization. The libraries for kidney (QreA and QreB) and liver (QlvC) were constructed using the vector-capping method [14], and those for bone marrow (QbmA), pancreas (QpaA), spleen (QspA), and thymus (QthA) were constructed using the oligo-capping method [15]. The 5'-EST sequences were determined by Sanger sequencing on an ABI 3730 sequencer, and all vector sequences were filtered out [5]. Nucleotide calls with a quality value (QV) of less than 15 were masked as ambiguous. After masking, the sequences were trimmed so that they contained no more than four ambiguous nucleotides in any 10-bp window, and sequences shorter than 100 bp after trimming were filtered out. After trimming, the average sequence length was 886.9 bp. In total, we obtained 56,856 EST sequences from the six tissues.

Repeat sequences were masked using Repbase Update before the BLAST search [16]. The BLAST search (BLASTN) was performed against human RefSeq data with an E-value cutoff of 1e-60 [7]. Because RefSeq sequences contain partially overlapping isoforms, we constructed non-redundant RefSeq sequences based on the Entrez Gene database [17]. Hereafter, we refer to these non-redundant RefSeq sequences as RefSeq genes. There were 23,236 RefSeq genes, including non-coding RNAs, in the human genome at the time of investigation (Release 34) [7]. Of the 56,856 newly isolated cDNA clones, 44,603 matched 4,940 human RefSeq genes. Of the 12,253 non-RefSeq clones, 40 consisted of repeat sequences, and another 1,631 showed no homology to human transcript sequences in public databases even with a less stringent cutoff (1e-15). Meanwhile, 23,900 EST sequences were homologous to multiple RefSeq genes at the stringent cutoff (1e-60).

The average nucleotide sequence identity between the best BLAST hit pairs was 95.26%. This value was slightly lower than that estimated from high-quality full-length cDNA sequences [5], which presumably reflects sequencing errors in the EST sequences. In some cases, the nucleotide sequence identities of the best and second-best hit pairs were very close, probably because of gene duplications specific to the human lineage. The difference in nucleotide sequence identity between the best and second-best BLAST hits was less than 0.5% in 8,996 ESTs. In such cases, the best-hit pair cannot be regarded as a unique human-macaque ortholog pair. In Figure 1, we classify the macaque ESTs according to the number of BLAST hits to RefSeq genes. The average nucleotide sequence identities are ordered by the rank of the BLAST hits. For example, the nucleotide sequence identity in the second bin represents the identity between the second-best hit pairs.
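
The quality masking, trimming, and length filtering described earlier in this section can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the text does not spell out the trimming algorithm, so the sketch assumes that the longest contiguous stretch satisfying the window rule is kept, and the function names (mask_low_quality, trim_ambiguous, clean_read) are hypothetical.

```python
def mask_low_quality(bases, quals, min_qv=15):
    """Replace base calls whose quality value is below min_qv with 'N'."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return "".join(b if q >= min_qv else "N" for b, q in zip(bases, quals))


def trim_ambiguous(seq, window=10, max_n=4):
    """Keep the longest stretch in which no window has more than max_n 'N's.

    Assumption: keeping the longest valid stretch and then stripping
    leading/trailing 'N's is one plausible reading of the trimming rule.
    """
    n = len(seq)
    # Prefix sums give the number of 'N's in any window in constant time.
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (ch == "N"))
    bad_starts = [i for i in range(n - window + 1)
                  if prefix[i + window] - prefix[i] > max_n]
    best_start, best_end = 0, 0
    start = 0
    # Each candidate region ends just before it would fully contain a bad window.
    for b in bad_starts + [None]:
        end = n if b is None else b + window - 1
        if end - start > best_end - best_start:
            best_start, best_end = start, end
        if b is not None:
            start = b + 1
    return seq[best_start:best_end].strip("N")


def clean_read(bases, quals, min_len=100):
    """Mask, trim, and return the read, or None if it is shorter than min_len."""
    trimmed = trim_ambiguous(mask_low_quality(bases, quals))
    return trimmed if len(trimmed) >= min_len else None
```

For instance, a read consisting of 20 high-quality bases, 6 low-quality bases, and 30 high-quality bases would be reduced by trim_ambiguous to the 30-base stretch, and clean_read would then discard it because it is shorter than 100 bp.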

Figure 1

Number of BLAST hits (cutoff: 1e-60) against the human RefSeq genes. The grey bars represent the number of macaque ESTs that matched human RefSeq genes. ESTs that matched more than nine RefSeq genes were combined into a single bin. The red circles and lines represent the average nucleotide sequence identity between the macaque ESTs and RefSeq genes, ordered by the rank of BLAST hits. For example, the sequence identity in the second bin represents the sequence identity between the second-best hits.

In conjunction with the previously sequenced cDNA clones, we obtained 112,587 EST sequences corresponding to 8,262 human RefSeq genes, or 36% of all human RefSeq genes. When we restricted the analysis to human RefSeq genes with a manually curated status (Reviewed or Validated) [7], 56% (6,177/11,080) of them were covered by the macaque transcriptome. As shown in Table 1, the number of RefSeq genes represented in the libraries differed among tissues. To obtain an unbiased measure of transcript redundancy in each tissue, we estimated the redundancy of human RefSeq homologs among 1000 macaque transcripts from each tissue. We randomized the transcript data, selected 1000 transcripts, and counted the human RefSeq genes covered by them. Redundancy was defined as the number of transcripts (1000) divided by the number of human RefSeq genes covered by those transcripts. This procedure was repeated 1000 times for each tissue, and the average redundancy was calculated. The results are shown in the last column of Table 1. Pancreas showed the highest redundancy, whereas brain and testis showed low redundancy, indicating that gene expression complexity is higher in brain and testis than in the other tissues, as suggested previously [18]. We also found that the kidney library (QreA) had very low redundancy. It was constructed using the vector-capping method, which does not amplify the template cDNA by PCR and may therefore reduce the redundancy of the library [14]. To test the effect of the cloning method, we compared transcript redundancy in our liver library, constructed using the vector-capping method, with that in a previously reported liver library constructed using the oligo-capping method [6]. The redundancy in the vector-capped liver library was 3.21 (Table 1). In contrast, the redundancy in the oligo-capped liver library was 5.19 [6], which was significantly higher than that in the vector-capped library (P < 0.001, permutation test).
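
The resampling procedure used for the redundancy estimate can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that each transcript is represented by the identifier of its best-hit RefSeq gene, that transcripts without a RefSeq homolog have already been excluded, and that the 1000 transcripts are drawn without replacement. The function name estimate_redundancy and its parameters are hypothetical.

```python
import random


def estimate_redundancy(gene_hits, sample_size=1000, n_rep=1000, seed=0):
    """Average redundancy over repeated random subsamples.

    gene_hits: list with one RefSeq gene identifier per transcript
               (assumed input format, one entry per cDNA clone).
    Redundancy of one subsample = sample_size / number of distinct genes.
    Raises ValueError if fewer than sample_size transcripts are available.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_rep):
        sample = rng.sample(gene_hits, sample_size)  # without replacement
        total += sample_size / len(set(sample))
    return total / n_rep
```

A library in which every sampled transcript hits a different gene gives a redundancy of 1, and the value grows as a few genes dominate the library, which matches the high value reported for pancreas in Table 1.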

Table 1

Summary of Macaca fascicularis cDNA libraries

Tissue	Total clones	Covered RefSeq^d	non-RefSeq^e	Redundancy^f
Brain cortex^{a, c}	28679	4035	10259	2.32
Brain stem^{b, c}	5758	1591	2050	2.40
Cerebellum^c	11003	2340	4179	2.32
Testis^c	8551	1833	3300	2.36
Liver	9188	1360	3853	3.21
Kidney	9558	2495	2630	1.91
Bone marrow	9472	1366	1317	3.26
Spleen	9783	1556	1527	3.15
Thymus	9566	1295	1491	2.96
Pancreas	9289	534	1435	9.83
All	112587	8262	32269	2.14

^a Brain cortex includes parietal lobe (Qnp), temporal lobe (Qtr), occipital lobe (Qor), and frontal lobe (Qfl).

^b Brain stem includes medulla oblongata (Qmo) and the other part of brain stem (Qbs).

^c These sequences were determined in previous studies [8-10,12].

^d Number of human RefSeq genes that have macaque homologs in each library.

^e Number of macaque cDNA clones that do not have human RefSeq homologs.

^f Estimated from 1000 randomly chosen macaque transcripts, averaged over 1000 simulations.

We have developed an in-house database for the genome data of Macaca fascicularis (QFbase) [5]. The Macaca fascicularis cDNA sequences described in this report were annotated and added to this database. They were also mapped onto the rhesus macaque genome sequence using the BLAT program [19]. The results can be viewed in the Macaca fascicularis genome browser, which is implemented using GBrowse software [20]. The DDBJ/EMBL/GenBank accession numbers of these sequences are DC629777-DC639249 (bone marrow), DC639249-DC648806 (kidney), DC620589-DC629776 (liver), FS362802-FS372090 (pancreas), DC848487-DC858269 (spleen), and DK575154-DK584719 (thymus).
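
As a quick consistency check, the size of each accession range listed above can be compared with the clone counts in Table 1. The sketch below is illustrative only; it assumes that accessions within a range are consecutive and share the same letter prefix, and the names ACCESSION_RANGES and range_size are hypothetical.

```python
import re

# Accession ranges exactly as printed in the text above.
ACCESSION_RANGES = {
    "bone marrow": "DC629777-DC639249",
    "kidney": "DC639249-DC648806",
    "liver": "DC620589-DC629776",
    "pancreas": "FS362802-FS372090",
    "spleen": "DC848487-DC858269",
    "thymus": "DK575154-DK584719",
}


def range_size(accession_range):
    """Number of accessions in a 'PREFIXnnnnnn-PREFIXnnnnnn' range (inclusive)."""
    first, last = accession_range.split("-")
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"unexpected accession range: {accession_range}")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


sizes = {tissue: range_size(r) for tissue, r in ACCESSION_RANGES.items()}
```

With the ranges as printed, the sizes for kidney, liver, pancreas, spleen, and thymus equal the clone counts in Table 1. The bone marrow range spans 9473 accessions, one more than the 9472 clones listed, and its last accession (DC639249) is also the first accession of the kidney range.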

Availability and requirements

• Project name: Macaca fascicularis cDNA sequencing project
• Project home page:
• Operating system(s): Platform independent
• Programming language: PERL
• Other requirements: Generic web browser
• License: GNU, GPL
• Any restrictions to use by non-academics: none

Abbreviations

EST: expressed sequence tag; QV: quality value.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

NO, KT, JK, YK, KH, and IT contributed to the design of the research. NO analyzed the data. NO and KH wrote the manuscript. MH performed the computational analysis. RT, YK, and IT were involved in the cDNA sequencing. YS and SS constructed the oligo-capped cDNA libraries. All authors read and approved the final manuscript.

20 in total

1. The generic genome browser: a building block for a model organism system database.

Authors: Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

2. Vector-capping: a simple method for preparing a high-quality full-length cDNA library.

Authors: Seishi Kato; Kuniyo Ohtoko; Hideki Ohtake; Tomoko Kimura
Journal: DNA Res Date: 2005-02-28 Impact factor: 4.458

3. Sequence complexity of nuclear RNAs in adult rat tissues.

Authors: D M Chikaraishi; S S Deeb; N Sueoka
Journal: Cell Date: 1978-01 Impact factor: 41.582

4. Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides.

Authors: K Maruyama; S Sugano
Journal: Gene Date: 1994-01-28 Impact factor: 3.688

5. Assignment of 118 novel cDNAs of cynomolgus monkey brain to human chromosomes.

Authors: N Osada; M Hida; J Kusuda; R Tanuma; K Iseki; M Hirata; Y Suto; M Hirai; K Terao; Y Suzuki; S Sugano; K Hashimoto; J Kususda
Journal: Gene Date: 2001-09-05 Impact factor: 3.688

6. Evolutionary and biomedical insights from the rhesus macaque genome.

Authors: Richard A Gibbs; Jeffrey Rogers; Michael G Katze; Roger Bumgarner; George M Weinstock; Elaine R Mardis; Karin A Remington; Robert L Strausberg; J Craig Venter; Richard K Wilson; Mark A Batzer; Carlos D Bustamante; Evan E Eichler; Matthew W Hahn; Ross C Hardison; Kateryna D Makova; Webb Miller; Aleksandar Milosavljevic; Robert E Palermo; Adam Siepel; James M Sikela; Tony Attaway; Stephanie Bell; Kelly E Bernard; Christian J Buhay; Mimi N Chandrabose; Marvin Dao; Clay Davis; Kimberly D Delehaunty; Yan Ding; Huyen H Dinh; Shannon Dugan-Rocha; Lucinda A Fulton; Ramatu Ayiesha Gabisi; Toni T Garner; Jennifer Godfrey; Alicia C Hawes; Judith Hernandez; Sandra Hines; Michael Holder; Jennifer Hume; Shalini N Jhangiani; Vandita Joshi; Ziad Mohid Khan; Ewen F Kirkness; Andrew Cree; R Gerald Fowler; Sandra Lee; Lora R Lewis; Zhangwan Li; Yih-Shin Liu; Stephanie M Moore; Donna Muzny; Lynne V Nazareth; Dinh Ngoc Ngo; Geoffrey O Okwuonu; Grace Pai; David Parker; Heidie A Paul; Cynthia Pfannkoch; Craig S Pohl; Yu-Hui Rogers; San Juana Ruiz; Aniko Sabo; Jireh Santibanez; Brian W Schneider; Scott M Smith; Erica Sodergren; Amanda F Svatek; Teresa R Utterback; Selina Vattathil; Wesley Warren; Courtney Sherell White; Asif T Chinwalla; Yucheng Feng; Aaron L Halpern; Ladeana W Hillier; Xiaoqiu Huang; Pat Minx; Joanne O Nelson; Kymberlie H Pepin; Xiang Qin; Granger G Sutton; Eli Venter; Brian P Walenz; John W Wallis; Kim C Worley; Shiaw-Pyng Yang; Steven M Jones; Marco A Marra; Mariano Rocchi; Jacqueline E Schein; Robert Baertsch; Laura Clarke; Miklós Csürös; Jarret Glasscock; R Alan Harris; Paul Havlak; Andrew R Jackson; Huaiyang Jiang; Yue Liu; David N Messina; Yufeng Shen; Henry Xing-Zhi Song; Todd Wylie; Lan Zhang; Ewan Birney; Kyudong Han; Miriam K Konkel; Jungnam Lee; Arian F A Smit; Brygg Ullmer; Hui Wang; Jinchuan Xing; Richard Burhans; Ze Cheng; John E Karro; Jian Ma; Brian Raney; Xinwei She; Michael J Cox; Jeffery P Demuth; Laura J Dumas; Sang-Gook Han; Janet Hopkins; 
Anis Karimpour-Fard; Young H Kim; Jonathan R Pollack; Tomas Vinar; Charles Addo-Quaye; Jeremiah Degenhardt; Alexandra Denby; Melissa J Hubisz; Amit Indap; Carolin Kosiol; Bruce T Lahn; Heather A Lawson; Alison Marklein; Rasmus Nielsen; Eric J Vallender; Andrew G Clark; Betsy Ferguson; Ryan D Hernandez; Kashif Hirani; Hildegard Kehrer-Sawatzki; Jessica Kolb; Shobha Patil; Ling-Ling Pu; Yanru Ren; David Glenn Smith; David A Wheeler; Ian Schenck; Edward V Ball; Rui Chen; David N Cooper; Belinda Giardine; Fan Hsu; W James Kent; Arthur Lesk; David L Nelson; William E O'brien; Kay Prüfer; Peter D Stenson; James C Wallace; Hui Ke; Xiao-Ming Liu; Peng Wang; Andy Peng Xiang; Fan Yang; Galt P Barber; David Haussler; Donna Karolchik; Andy D Kern; Robert M Kuhn; Kayla E Smith; Ann S Zwieg
Journal: Science Date: 2007-04-13 Impact factor: 47.728

7. Substitution rate and structural divergence of 5'UTR evolution: comparative analysis between human and cynomolgus monkey cDNAs.

Authors: Naoki Osada; Makoto Hirata; Reiko Tanuma; Jun Kusuda; Munetomo Hida; Yutaka Suzuki; Sumio Sugano; Takashi Gojobori; C-K James Shen; Chung-I Wu; Katsuyuki Hashimoto
Journal: Mol Biol Evol Date: 2005-06-08 Impact factor: 16.240

8. Entrez Gene: gene-centered information at NCBI.

Authors: Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal: Nucleic Acids Res Date: 2006-12-05 Impact factor: 16.971

9. NCBI Reference Sequences: current status, policy and new initiatives.

Authors: Kim D Pruitt; Tatiana Tatusova; William Klimke; Donna R Maglott
Journal: Nucleic Acids Res Date: 2008-10-16 Impact factor: 16.971

10. Large-scale analysis of Macaca fascicularis transcripts and inference of genetic divergence between M. fascicularis and M. mulatta.

Authors: Naoki Osada; Katsuyuki Hashimoto; Yosuke Kameoka; Makoto Hirata; Reiko Tanuma; Yasuhiro Uno; Itsuro Inoue; Munetomo Hida; Yutaka Suzuki; Sumio Sugano; Keiji Terao; Jun Kusuda; Ichiro Takahashi
Journal: BMC Genomics Date: 2008-02-24 Impact factor: 3.969
