Ai Wakamatsu, Kouichi Kimura, Jun-Ichi Yamamoto, Tetsuo Nishikawa, Nobuo Nomura, Sumio Sugano, Takao Isogai.
Abstract
We analyzed the diversity of mRNAs produced by alternative splicing in order to evaluate gene function. First, we predicted the number of human genes transcribed into protein-coding mRNAs using the sequence information of full-length cDNAs and 5'-ESTs, and obtained 23 241 such genes. Next, using these genes, we analyzed mRNA diversity and consequently sequenced and identified 11 769 human full-length cDNAs whose predicted open reading frames differed from those of other known full-length cDNAs. Notably, 30% of the cDNAs we identified contained variation in the transcription start site (TSS). Our analysis, which focused particularly on multiple variable first exons (FEVs) formed through the alternative utilization of TSSs, led to the identification of 261 FEVs expressed in a tissue-specific manner. Quantification of the expression profiles of 13 genes by real-time PCR further confirmed the tissue-specific expression of FEVs; for example, OXR1 had specific TSSs in brain and tumor tissues. Finally, based on the results of our mRNA diversity analysis, we created the FLJ Human cDNA Database. Our results shed light on the mechanisms by which one gene produces protein-coding transcripts suited to the situation and the environment.
Year: 2009 PMID: 19880432 PMCID: PMC2780955 DOI: 10.1093/dnares/dsp022
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Clustering of human cDNA sequences. (A) Estimation of the number of human genes from full-length cDNAs and ESTs. The gene prediction method, based on human full-length cDNAs and ESTs mapped to the human genome, is shown schematically. For each predicted gene, classification reliability was evaluated manually. (B) Cover rate of FLJ EST sequences and (C) cover rate of FLJ full-length sequenced cDNAs. Results of the reliability analysis are shown by category, based on the cover rates of 1.45 million ESTs (B) and 55 000 full-length cDNAs (C).
Functional classification of the 11 769 full-length cDNAs based on the molecular function hierarchy of GO
| Functional categorization (GO: molecular function) | Number of matched cDNAs |
|---|---|
| Binding | |
| Nucleotide binding | 681 |
| Nucleic acid binding | 341 |
| Protein binding | 202 |
| Ion binding | 149 |
| Lipid binding | 28 |
| Tetrapyrrole binding | 27 |
| Neurotransmitter binding | 24 |
| Carbohydrate binding | 22 |
| Other bindings | 57 |
| Catalytic activity | |
| Hydrolase activity | 506 |
| Transferase activity | 479 |
| Oxidoreductase activity | 207 |
| Ligase activity | 85 |
| Lyase activity | 47 |
| Helicase activity | 38 |
| Isomerase activity | 26 |
| Other catalytic activities | 106 |
| Enzyme regulator activity | |
| GTPase regulator activity | 45 |
| Enzyme inhibitor activity | 44 |
| Other enzyme regulator activities | 21 |
| Motor activity | |
| Microtubule motor activity | 24 |
| Other motor activities | 20 |
| Signal transducer activity | |
| Receptor activity | 124 |
| Receptor binding | 25 |
| Other signal transducer activities | 40 |
| Structural molecule activity | |
| Structural constituent of ribosome | 25 |
| Other structural molecule activities | 56 |
| Transcription regulator activity | |
| Transcription factor activity | 138 |
| Other transcription regulator activities | 39 |
| Translation regulator activity | |
| Translation factor activity, nucleic acid binding | 25 |
| Transporter activity | |
| Ion transporter activity | 169 |
| Carrier activity | 90 |
| Channel or pore class transporter activity | 79 |
| ATPase activity, coupled to movement of substances | 39 |
| Other transporter activities | 131 |
| Others | 2 |
| Molecular function unknown | 45 |
If a protein was predicted to belong to two or more categories, all categories were included for counting.
Figure 2. Classification of the 11 769 full-length cDNAs based on splicing patterns. The 11 769 human full-length cDNAs were classified according to their TSS utilization. Type A: cDNAs derived from transcripts generated using a TSS different from the previously analyzed TSS of the gene. Type A1: cDNAs containing the sequence variation known as FEV. Type A2: cDNAs without the FEV feature. Type B: cDNAs derived from transcripts generated using the same TSS as the previously analyzed TSS, but found to be alternatively spliced. We could not classify 89 cDNAs because they coded for newly identified proteins.
Functional classification of two types of splicing patterns of 11 769 full-length cDNAs based on GO category analysis
| Functional categorization (GO: molecular function) | Type A: matched cDNAs (%) | Type B: matched cDNAs (%) | Type A + B: matched cDNAs |
|---|---|---|---|
| Binding | |||
| Lipid binding | 4 (14.3) | 24 (85.7) | 28 |
| Tetrapyrrole binding | 5 (18.5) | 22 (81.5) | 27 |
| Neurotransmitter binding | 12 (50.0)* | 12 (50.0) | 24 |
| Carbohydrate binding | 4 (18.2) | 18 (81.8) | 22 |
| Cofactor binding | 3 (16.7) | 15 (83.3) | 18 |
| Steroid binding | 1 (10.0) | 9 (90.0) | 10 |
| Catalytic activity | |||
| Helicase activity | 4 (10.5) | 34 (89.5) | 38 |
| Small protein activating enzyme activity | 2 (18.2) | 9 (81.8) | 11 |
| Cyclase activity | 6 (54.5)* | 5 (45.5) | 11 |
| Enzyme regulator activity | |||
| GTPase regulator activity | 31 (68.9)* | 14 (31.1) | 45 |
| Enzyme activator activity | 6 (50.0)* | 6 (50.0) | 12 |
| Structural molecule activity | |||
| Structural constituent of ribosome | 1 (4.0) | 24 (96.0) | 25 |
| Transporter activity | |||
| ATPase activity, coupled to movement of substances | 23 (59.0)* | 16 (41.0) | 39 |
| Electron transporter activity | 2 (13.3) | 13 (86.7) | 15 |
| Total | 1344 (32.0) | 2862 (68.0) | 4206 |
The overall ratio of Type A to Type B is approximately 3:7, as shown in the Total row. The Total row covers all classification results in the molecular function category. If a protein was predicted to belong to two or more categories, all categories were included for counting.
*Functional categories biased to Type A.
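The percentages in this table follow directly from the counts. The minimal Python sketch below uses a few rows copied from the table to recompute the Type A share per category and compare it with the overall share from the Total row. The source does not state the criterion used to mark a category as biased to Type A; the 50% cutoff here is an illustrative assumption that happens to reproduce the asterisks for these rows, and the function and variable names are hypothetical.

```python
# Counts (Type A, Type B) copied from a subset of rows in the table above.
counts = {
    "Lipid binding": (4, 24),
    "Neurotransmitter binding": (12, 12),
    "Cyclase activity": (6, 5),
    "GTPase regulator activity": (31, 14),
    "Structural constituent of ribosome": (1, 24),
    "ATPase activity, coupled to movement of substances": (23, 16),
}
TOTAL = (1344, 2862)  # the Total row


def type_a_percent(type_a, type_b):
    """Share of Type A cDNAs in a category, in percent, to one decimal place."""
    return round(100.0 * type_a / (type_a + type_b), 1)


# Assumed cutoff: the source does not say how "biased to Type A" was decided.
ASSUMED_CUTOFF = 50.0

overall = type_a_percent(*TOTAL)
biased = [
    name
    for name, (a, b) in counts.items()
    if type_a_percent(a, b) >= ASSUMED_CUTOFF
]

for name, (a, b) in counts.items():
    flag = "*" if name in biased else ""
    print(f"{name}: {type_a_percent(a, b)}% Type A{flag} (overall {overall}%)")
```

Any cutoff between the highest unmarked share and the lowest marked share in the table would select the same rows, so the sketch should not be read as the authors' actual test.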
Expressions of a selected list of 261 FEV-containing cDNAs (155 genes)
| FLJ ID | Specific expression | Gene symbol | FLJ ID | Specific expression | Gene symbol | FLJ ID | Specific expression | Gene symbol | FLJ ID | Specific expression | Gene symbol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FLJ50079 | Brain | NRK | FLJ52319 | Trachea | GNE | FLJ55043 | FB, NT | PDZRN3 | FLJ57051 | Brain | Pld5 |
| FLJ50162 | Brain | LARGE1 | FLJ52354 | Brain, NT | CHRNB1_pre | FLJ55050 | Brain | EPS15 | FLJ57068 | FB | FGF13 |
| FLJ50199 | Brain | ARHGEF6 | FLJ52356 | Testis | ARMC4 | FLJ55194 | Brain | Unknown | FLJ57107 | Brain, NT | CHRNB1_pre |
| FLJ50365 | Trachea | CRISPLD1 | FLJ52358 | Testis | TP73 | FLJ55226 | FB | CHST10 | FLJ57108 | Brain | SNAP91 |
| FLJ50390 | Brain | GRIA1_pre | FLJ52367 | Testis | IQGAP2 | FLJ55256 | Synovial | TFEC | FLJ57207 | Im | Unknown |
| FLJ50398 | Testis | IQGAP2 | FLJ52368 | Testis, Trachea | ARMC4 | FLJ55265 | Im | Unknown | FLJ57232 | Testis | PRCP_pre |
| FLJ50459 | Brain | ETV1 | FLJ52384 | Im | PTPN3 | FLJ55281 | Heart, Fetal heart | SLC5A1 | FLJ57269 | Brain | BTBD10 |
| FLJ50460 | Brain | DLG4 | FLJ52407 | Testis | CRB1_pre | FLJ55284 | FB, NT | MAGI2 | FLJ57290 | Trachea | CRISPLD1 |
| FLJ50484 | Brain | SLC26A4 | FLJ52427 | Brain | AMPD3 | FLJ55338 | FB | CLASP1 | FLJ57298 | Brain | RAPGEF4 |
| FLJ50494 | Brain | ETV1 | FLJ52435 | Testis | MARCH7 | FLJ55344 | Brain | DYSF | FLJ57302 | Brain | RAPGEF4 |
| FLJ50523 | Brain | PEX5L | FLJ52438 | Brain | RIMS1 | FLJ55381 | FB | SLC44A5 | FLJ57330 | Brain | APBB1 |
| FLJ50526 | Brain | PEX5L | FLJ52453 | Testis | AMPD3 | FLJ55423 | Placenta | NRK | FLJ57521 | Tu | PPFIBP2 |
| FLJ50533 | Brain | SLC6A9 | FLJ52496 | Brain | TSPAN5 | FLJ55434 | Testis | POMGNT1 | FLJ57884 | FB | FGF13 |
| FLJ50539 | Brain, NT | DCAMKL1 | FLJ52520 | FB | EOMES | FLJ55460 | Brain | SEMA5B_pre | FLJ57888 | Brain | SGCB |
| FLJ50557 | Brain | MAP7 | FLJ52731 | Brain | SPRED2 | FLJ55461 | NT | KLHL13 | FLJ57953 | Brain | STAU |
| FLJ50577 | FB | DLG4 | FLJ52750 | Brain | ARHGEF7 | FLJ55481 | NT | RGMA_pre | FLJ58008 | Brain | PPP2R2B |
| FLJ50619 | NT | ELAVL4 | FLJ52810 | Testis | GABRB3_pre | FLJ55495 | Testis | PCYT2 | FLJ58099 | Brain | CLTCL1 |
| FLJ50623 | Brain, NT | DCAMKL1 | FLJ53109 | Testis | PPP2R5E | FLJ55504 | Testis | KLHL13 | FLJ58366 | Brain | RIMS1 |
| FLJ50641 | Brain | ETV1 | FLJ53114 | Testis | NCAM2_pre | FLJ55514 | Brain, Tu | EGFR_pre | FLJ58368 | Brain | RAPGEF4 |
| FLJ50646 | FB | DLG4 | FLJ53167 | NT | CUL4B | FLJ55516 | Tu | LIMS1 | FLJ58494 | Brain | Unknown |
| FLJ50725 | Testis | ATPAF1 | FLJ53184 | Brain | PPFIA2 | FLJ55607 | Brain, Trachea | HDAC9 | FLJ58753 | Brain | ARHGEF3 |
| FLJ50745 | Testis | CCNA1 | FLJ53222 | FB | MLLT3 | FLJ55622 | Testis | MMRN1_pre | FLJ58755 | Brain | CHN2 |
| FLJ50761 | Brain | LRIG1_pre | FLJ53242 | Testis | CLASP1 | FLJ55627 | Testis | MOV10L1 | FLJ58966 | Im | RAB37 |
| FLJ50773 | Brain | CALB1 | FLJ53247 | Testis | IDE | FLJ55628 | Testis | LOXHD1 | FLJ59303 | Brain | DOCK4 |
| FLJ50776 | Brain | ARHGEF6 | FLJ53252 | Testis | CDH2_pre | FLJ55641 | Brain, NT | JARID2 | FLJ59333 | Tu | RARG |
| FLJ50810 | FB, NT | MAGI2 | FLJ53320 | Brain | DLGAP1 | FLJ55662 | Im | FGR | FLJ59338 | Tu | RARG |
| FLJ50844 | Brain | WARS2_pre | FLJ53324 | Brain | TJP2 | FLJ55664 | Testis | NTRK3_pre | FLJ59345 | Brain | PPFIA2 |
| FLJ50917 | Testis | PCCB_pre | FLJ53330 | Brain, NT | EXOC4 | FLJ55778 | Brain | CLASP1 | FLJ59425 | Placenta | SH3KBP1 |
| FLJ50956 | Brain | RAPGEF4 | FLJ53518 | Testis | POMGNT1 | FLJ55834 | Brain, NT | FGF11 | FLJ59496 | Brain | CHN2 |
| FLJ50959 | Brain | RAPGEF4 | FLJ53578 | Brain | Rims1 | FLJ55856 | Testis | ARHGEF3 | FLJ59502 | Brain | PPFIA2 |
| FLJ50961 | Brain | TMEM16C | FLJ53606 | NT | AKT1 | FLJ55859 | Testis | ST7L | FLJ59511 | Brain | GRIA1_pre |
| FLJ50989 | FB | EOMES | FLJ53680 | Testis | KIF2C | FLJ55865 | Im | SLC43A2 | FLJ59545 | Brain | EML2 |
| FLJ51025 | Kidney | NOX4 | FLJ53829 | Brain | APBB1 | FLJ55903 | FB | GPR161 | FLJ59625 | Brain | ARHGEF7 |
| FLJ51027 | Kidney | NOX4 | FLJ53875 | Brain | APBB1 | FLJ55905 | Im | FGD4 | FLJ59641 | Testis | PPFIA2 |
| FLJ51073 | FB | EOMES | FLJ53929 | Im | PTPN4 | FLJ55906 | Testis | KIFC3 | FLJ59648 | Im | DYSF |
| FLJ51155 | Testis | Unknown | FLJ53980 | Brain | PPM1F | FLJ55918 | Brain | EML2 | FLJ59678 | Brain | PEX5L |
| FLJ51157 | Testis | HDAC4 | FLJ53990 | Brain | GABRB3_pre | FLJ55961 | Brain | GRM4_pre | FLJ59684 | Brain | PLEKHG5 |
| FLJ51174 | Im | HDAC4 | FLJ53997 | Brain | CTNNA2 | FLJ55997 | Brain | CPNE6 | FLJ59710 | Brain | MCF2 |
| FLJ51177 | Im | HDAC4 | FLJ53999 | Brain | GAB1 | FLJ56033 | Testis | Unknown | FLJ59717 | FB | TBR1 |
| FLJ51210 | Brain | KIFC3 | FLJ54008 | Brain | TPCN1 | FLJ56036 | Tu | KIFC3 | FLJ59769 | Im | PLEKHG5 |
| FLJ51383 | Testis | PPP2R5A | FLJ54011 | Brain | PPFIA2 | FLJ56037 | Testis, Prostate | CUL2 | FLJ59799 | Testis | CTNNA2 |
| FLJ51528 | Im | BTNL8_pre | FLJ54016 | Testis | DIP13B | FLJ56038 | Small intestine | Unknown | FLJ59802 | Testis | ADCY5 |
| FLJ51566 | Brain | PDK1 | FLJ54093 | Brain | GPHN | FLJ56044 | Brain | OXR1 | FLJ59806 | Im | HDAC4 |
| FLJ51606 | Trachea | HABP2_pre | FLJ54100 | Brain | CHN2 | FLJ56093 | Brain | PTPRR_pre | FLJ60503 | Brain | LARGE1 |
| FLJ51663 | Testis | CPS1_pre | FLJ54331 | Brain, Osteoclast | Unknown | FLJ56095 | Brain | KLHL13 | FLJ60665 | Tu | SLC44A5 |
| FLJ51675 | Brain | ETV1 | FLJ54394 | Testis | CRB1_pre | FLJ56110 | FB | GOLSYN | FLJ60667 | Tu | SLC44A5 |
| FLJ51685 | Testis | MCF2 | FLJ54513 | Testis | WDR59 | FLJ56116 | FB | APLP1 | FLJ60693 | FB | PHF21B |
| FLJ51695 | Im | TP74 | FLJ54541 | FB | EXOC4 | FLJ56136 | NT | SLC2A14 | FLJ60998 | Testis | INPP4B |
| FLJ51706 | Testis | RAPGEF4 | FLJ54577 | NT | HDAC9 | FLJ56137 | Im | Unknown | FLJ61124 | Brain | RAB37 |
| FLJ51734 | Uterus | TMEM16C | FLJ54580 | NT | HDAC9 | FLJ56142 | NT | AMOTL2 | FLJ61133 | FB | EXOC4 |
| FLJ51737 | Brain | ARHGEF6 | FLJ54612 | Brain | SH3KBP1 | FLJ56148 | Brain | PLEKHG5 | FLJ61370 | FB | SNCAIP |
| FLJ51769 | Testis | IQGAP2 | FLJ54642 | Brain | APBB1 | FLJ56167 | Testis | KLHL12 | FLJ61443 | Testis | LARGE1 |
| FLJ51805 | Brain | RIMS2 | FLJ54658 | Brain | LSAMP_pre | FLJ56226 | NT | SNCAIP | FLJ61560 | Trachea | TJP2 |
| FLJ51859 | Brain | APBB1 | FLJ54672 | Brain | DOCK4 | FLJ56370 | Testis, Prostate | FKBP8 | FLJ61674 | Brain | PEX5L |
| FLJ51873 | Brain, NT | AGPS_pre | FLJ54673 | Brain | Unknown | FLJ56376 | Brain | MTMR1 | FLJ61679 | Brain | APBB1 |
| FLJ51910 | FB | GTPBP3 | FLJ54674 | Brain | TPCN1 | FLJ56411 | Brain | GRIA2_pre | FLJ53199 | Brain ↓ | NEDD4L |
| FLJ51934 | Im | AOAH_pre | FLJ54690 | Brain | BACE1_pre | FLJ56420 | Testis | DNPEP | FLJ59993 | Brain ↓ | RIMS1 |
| FLJ51957 | NT | ELAVL4 | FLJ54693 | Brain | BACE1_pre | FLJ56452 | Brain | EML2 | FLJ55591 | Brain ↓ | ARHGEF3 |
| FLJ51977 | Brain | Unknown | FLJ54702 | Brain | DLGAP1 | FLJ56634 | Brain | GRM4_pre | FLJ56152 | Brain ↓ | ARHGEF7 |
| FLJ52027 | Testis | ATPAF1 | FLJ54724 | FB | DLG2 | FLJ56895 | Testis | EML2 | FLJ58411 | FB ↓ | CACNB3 |
| FLJ52034 | Im | Unknown | FLJ54738 | Brain | PDZRN3 | FLJ56912 | Uterus | FBLN2_pre | FLJ58949 | FB ↓ | CACNB3 |
| FLJ52037 | Im | GRAP2 | FLJ54742 | Testis | Slmap | FLJ56913 | Placenta, Uterus | FBLN2 | FLJ57810 | Tu ↓ | A2ML1 |
| FLJ52039 | Im | GRAP2 | FLJ54746 | NT | PDZRN3 | FLJ56957 | Brain | TMEM16C | FLJ53545 | Tu ↓ | RARG |
| FLJ52041 | Im | Unknown | FLJ54751 | NT | SUV420H1 | FLJ56961 | Brain | CLTCL1 | |||
| FLJ52042 | Im | GRAP2 | FLJ54906 | Trachea | TMC5 | FLJ56973 | Brain | TMEM16C | |||
| FLJ52288 | Testis | ARMC4 | FLJ54987 | FB | PHF21B | FLJ56979 | Brain | MYRIP |
We analyzed expression profiles of the first exons of ∼1.5 million 5'-ESTs constructed by the oligo-capping method. From this analysis, we selected 261 full-length cDNAs based on the expression levels of their FEVs in specific tissues. cDNAs shown without a label were highly expressed in the listed tissues, whereas those marked ‘↓’ were expressed at low levels in the listed tissues.
NT, NT2 cells induced by retinoic acid; FB, fetal brain; Im, immune tissues; Tu, tumor tissues; pre, precursor; Unknown, function unknown.
Figure 3. Quantitative evaluation of selected genes by real-time PCR. Expression levels of the first exon regions of the selected genes were analyzed by real-time PCR. The data were normalized to human GAPDH as described in the Materials and methods section. Expression levels are shown on a log10 scale. cDNAs labeled ‘$$’ had very low or undetectable expression. (A) FGF13, (B) OXR1, (C) C6orf142, (D) PLD5, (E) FGD4, (F) C6orf32. BW, brain, whole; BC, brain, cerebellum; BF, fetal brain; SP, spleen; BM, bone marrow; TH, thymus; OV, ovary; PR, prostate; UT, uterus; MT, mixture of human tumor tissues; MN, control, mixture of normal human tissues; KT, kidney tumor; LT, lung tumor.
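As a rough illustration of what normalizing to GAPDH and plotting on a log10 scale can look like, the sketch below uses the common comparative Ct approach. This is an assumption: the actual procedure is described in the Materials and methods section, which is not reproduced here. The function name is hypothetical, the Ct values are invented for illustration, and the calculation assumes perfect amplification efficiency (a doubling per cycle).

```python
import math


def log10_relative_expression(ct_target, ct_gapdh):
    """log10 of target expression relative to GAPDH, assuming 2 ** (-delta Ct)."""
    delta_ct = ct_target - ct_gapdh
    return math.log10(2.0 ** (-delta_ct))


# Hypothetical Ct values, not measurements from the paper.
example = log10_relative_expression(ct_target=28.0, ct_gapdh=18.0)
print(round(example, 2))
```

Under these assumptions, a target that crosses the threshold 10 cycles later than GAPDH sits about three log10 units below it, which is why a log scale is convenient for comparing first exons whose expression differs by orders of magnitude between tissues.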