Naoki Osada, Munetomo Hida, Jun Kusuda, Reiko Tanuma, Makoto Hirata, Momoki Hirai, Keiji Terao, Yutaka Suzuki, Sumio Sugano, Katsuyuki Hashimoto.
Abstract
BACKGROUND: The complete assignment of the protein-coding regions of the human genome is a major challenge for genome biology today. We have already isolated many hitherto unknown full-length cDNAs as orthologs of unidentified human genes from cDNA libraries of the cynomolgus monkey (Macaca fascicularis) brain (parietal lobe and cerebellum). In this study, we used cDNA libraries of three other parts of the brain (frontal lobe, temporal lobe and medulla oblongata) to isolate novel full-length cDNAs.
Year: 2001 PMID: 11806829 PMCID: PMC150453 DOI: 10.1186/gb-2001-3-1-research0006
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Construction of hypothetical human cDNA. The regions of the human genome corresponding to the cynomolgus monkey cDNA were digested and concatenated in silico as putative exons, and the hypothetical novel human gene was predicted.
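The in silico construction described in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence, and the exon coordinates are all hypothetical, and the coordinates are assumed to come from some prior alignment of the monkey cDNA to the human genome (0-based, half-open intervals on the forward strand).

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case preserved)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def build_hypothetical_cdna(
    genome: str, exons: List[Tuple[int, int]], strand: str = "+"
) -> str:
    """Cut the putative exons out of the genomic sequence and join them.

    exons  : hypothetical (start, end) intervals, 0-based and half-open,
             given on the forward strand in any order.
    strand : "+" or "-"; for "-" the joined sequence is reverse-complemented
             so that it reads in transcript orientation.
    """
    pieces = [genome[start:end] for start, end in sorted(exons)]
    cdna = "".join(pieces)
    return reverse_complement(cdna) if strand == "-" else cdna


# Toy genomic region: exons in upper case, flanks and intron in lower case.
toy_genome = "ttATGGCCgtaagtttcagAAATGAcc"
toy_exons = [(19, 25), (2, 8)]  # deliberately unsorted

cdna = build_hypothetical_cdna(toy_genome, toy_exons)
print(cdna)
```

Sorting the intervals before joining keeps the sketch independent of the order in which alignment hits are reported; a real workflow would also need to refine splice-site boundaries, which is not attempted here.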
Summary of the 29 new genes
| Monkey accession number | Cynomolgus monkey clone | Location* | RT-PCR† | Length (amino acids) in Macaca‡ | Length (amino acids) in human§ | Number of exons¶ | Exon predicted# | Functional annotation (putative)** |
|---|---|---|---|---|---|---|---|---|
| AB055251 | QflA-10072 | 10p12.1 | + | 743 | 743 | 2 (1) | 1 (0) | Adenylate kinase |
| AB055256 | QflA-10350 | 9q34.3 | - | 308 | 276 | 5 (4) | 1 (1) | Unknown |
| AB055264 | QflA-10778 | 2q24.2 | - | 603 | 577 | 13 (13) | 6 (2) | Nuclear protein |
| AB056798 | QflA-11110 | 6q23.2 | + | 621 | 617 | 5 (5) | 3 (0) | Sugar transporter |
| AB055271 | QflA-11149 | 1p31.1 | - | 510 | 514 | 5 (4) | 2 (0) | Leucine-rich repeat protein |
| AB055273 | QflA-11186 | 10q26.2 | + | 624 | 625 | 8 (8) | 4 (1) | Unknown |
| AB055276 | QflA-11332 | 17q23.3 | + | 590 | 590 | 17 (14) | 10 (1) | Unknown |
| AB055278 | QflA-11381 | 11q22.3 | + | 197 | 197 | 1 (1) | 0 (1) | Unknown |
| AB055280 | QflA-11470 | 7q11.22 | - | 114 | 114 | 1 (1) | 0 (0) | Unknown |
| AB056389 | QflA-12365 | 10q26.3 | ++ | 368 | 369 | 12 (10) | 6 (2) | Serine/threonine protein kinase |
| AB055295 | QflA-12453 | 5q35.2 | - | 103 | 103 | 2 (1) | 0 (0) | Unknown |
| AB056800 | QflA-12512 | 15q15.2 | + | 653 | 653 | 20 (16) | 10 (2) | Glycoside hydrolase family 31 |
| AB056802 | QflA-12743 | 7q11.21 | ++ | 214 | 214 | 6 (4) | 2 (2) | Unknown |
| AB060227 | QflA-15038 | 9q34.13 | + | 327 | 327 | 2 (2) | 1 (1) | Unknown |
| AB056812 | QflA-15102 | 4p15.1 | + | 102 | 102 | 1 (1) | 0 (0) | Unknown |
| AB060245 | QflA-15249†† | 11p15.1 | + | 413 | 369 | 11 (9) | 8 (1) | Tyrosine phosphatase |
| AB056426 | QflA-15366 | 10q22.1 | + | 581 | 581 | 3 (2) | 2 (0) | Cysteine-rich protein with leucine-rich repeat |
| AB066513 | QmoA-10247 | 1q25.3 | + | 196 | 196 | 11 (6) | 1 (1) | G-protein signaling protein |
| AB063014 | QmoA-11221 | 6q14.1 | - | 430 | 430 | 7 (6) | 4 (2) | Nuclear protein |
| AB063019 | QmoA-11380 | Xq13.3 | + | 309 | 309 | 7 (1) | 0 (0) | Unknown |
| AB063029 | QmoA-11613 | 10p12.33 | ++ | 654 | 654 | 12 (11) | 5 (2) | Unknown |
| AB066529 | QmoA-11640 | 16p12.3 | - | 122 | 122 | 2 (1) | 0 (0) | Unknown |
| AB066540 | QmoA-12446 | 19q13.2 | - | 654 | 655 | 3 (3) | 0 (1) | Zinc-finger protein with KRAB domain |
| AB066542 | QmoA-12482 | 10p11.23 | + | 391 | 392 | 5 (2) | 1 (1) | Zinc-finger protein |
| AB063089 | QtrA-10552 | 11q22.3 | + | 664 | 664 | 16 (12) | 9 (2) | RNA-binding methyl transferase |
| AB060878 | QtrA-12612 | 9q32 | + | 238 | 238 | 5 (3) | 2 (1) | Nuclear protein |
| AB063095 | QtrA-13256 | 6q27 | + | 300 | 301 | 18 (12) | 9 (0) | Unknown |
| AB060262 | QtrA-14732 | 3p21.31 | ++ | 396 | 396 | 3 (3) | 3 (3) | Unknown |
| AB060922 | QtrA-14779†† | 1p31.1 | + | 432 | 536 | 12 (11) | 6 (1) | Adenylate kinase |
*Human chromosomal location was determined by computer homology search of the human genome working draft sequence at UCSC. †Results of RT-PCR analysis: +, single product; ++, multiple products; -, no product. QtrA-10552 and QtrA-13256 showed a distinct splicing pattern between human and Macaca. ‡Length of putative coding region deduced from cDNA sequence of Macaca. §Length of putative coding region deduced from human genome sequence corresponding to cDNA of Macaca. ¶Number of exons in the human hypothetical cDNA. The number of coding exons is in parentheses. #Number of exons correctly predicted by GenScan in the human genome. Partially predicted exons are in parentheses. **Function of putative protein was deduced by InterPro category search and BLAST homology search. ††ORF sequence was revised by sequencing the RT-PCR product.
Figure 2. Expression of each gene by RT-PCR. The primer sets covering a putative protein-coding region were designed for amplification of transcripts from total brain RNA of human (H) and cynomolgus monkey (M). Of 29 primer pairs, 21 could amplify the transcript in both human and Macaca.
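The count of 21 amplified transcripts can be cross-checked against the RT-PCR column of the summary table. The short Python tally below transcribes that column as given and assumes that both "+" (single product) and "++" (multiple products) count as successful amplification; the variable and function names are illustrative only.

```python
from collections import Counter

# RT-PCR column of the summary table, keyed by clone name.
RT_PCR = {
    "QflA-10072": "+", "QflA-10350": "-", "QflA-10778": "-",
    "QflA-11110": "+", "QflA-11149": "-", "QflA-11186": "+",
    "QflA-11332": "+", "QflA-11381": "+", "QflA-11470": "-",
    "QflA-12365": "++", "QflA-12453": "-", "QflA-12512": "+",
    "QflA-12743": "++", "QflA-15038": "+", "QflA-15102": "+",
    "QflA-15249": "+", "QflA-15366": "+", "QmoA-10247": "+",
    "QmoA-11221": "-", "QmoA-11380": "+", "QmoA-11613": "++",
    "QmoA-11640": "-", "QmoA-12446": "-", "QmoA-12482": "+",
    "QtrA-10552": "+", "QtrA-12612": "+", "QtrA-13256": "+",
    "QtrA-14732": "++", "QtrA-14779": "+",
}


def summarize(results):
    """Count clones per RT-PCR outcome; '+' and '++' are treated as amplified."""
    counts = Counter(results.values())
    return {
        "single": counts["+"],
        "multiple": counts["++"],
        "none": counts["-"],
        "amplified": counts["+"] + counts["++"],
        "total": len(results),
    }


summary = summarize(RT_PCR)
print(summary)
```

Under that assumption the tally gives 21 amplified clones out of 29, in agreement with the figure legend.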
Figure 3. Genomic structure of QflA-11332 and ab-initio-predicted exons. QflA-11332 is only 2.1 kb of cDNA but spans more than 600 kb in the human genome. Vertical lines indicate the exons of clones or exons predicted by each computer program, and are connected by a horizontal line (intron). The exons under open triangles represent the exons not predicted by either computer program. Other exons were correctly predicted but segmented into several genes by ab initio prediction.
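The distinction between correctly predicted, partially predicted, and missed exons can be made concrete with a small Python sketch. It is only one plausible reading of those categories, not the procedure used in the study: an exon is taken as "exact" when a prediction has identical boundaries, "partial" when a prediction merely overlaps it, and "missed" otherwise. All coordinates below are made up for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based and half-open


def classify_exons(
    reference: List[Interval], predicted: List[Interval]
) -> Dict[str, int]:
    """Compare cDNA-derived exons with ab initio predictions."""
    predicted_set = set(predicted)
    result = {"exact": 0, "partial": 0, "missed": 0}
    for start, end in reference:
        if (start, end) in predicted_set:
            result["exact"] += 1
        elif any(p_start < end and start < p_end for p_start, p_end in predicted):
            # Some prediction overlaps the exon, but the boundaries differ.
            result["partial"] += 1
        else:
            result["missed"] += 1
    return result


# Hypothetical coordinates, not taken from any clone in the table.
reference_exons = [(100, 200), (500, 650), (900, 1000)]
predicted_exons = [(100, 200), (520, 650), (2000, 2100)]

comparison = classify_exons(reference_exons, predicted_exons)
print(comparison)
```

This sketch does not capture the other failure mode noted in the legend, where correctly predicted exons are split across several predicted genes; detecting that would require gene-level grouping of the predictions.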