Mio Oshikawa, Yoshiko Sugai, Ron Usami, Kuniyo Ohtoko, Shigeru Toyama, Seishi Kato.
Abstract
We recently developed a vector-capping method for constructing full-length cDNA libraries. In the present study, we performed an in-depth analysis of a vector-capped cDNA library prepared from a single cell type. Single-pass sequencing of 24,000 clones randomly isolated from the unamplified library identified 19,951 full-length cDNA clones, whose intactness was confirmed by the presence of an additional G at their 5' ends. The full-length cDNA content was >95%. By mapping these sequences to the human genome, we identified 4,513 transcriptional units, including 36 antisense transcripts of known genes. Comparison of the frequencies of abundant clones showed that the expression profiles of different libraries, including the distribution of transcriptional start sites (TSSs), were reproducible. Analysis of long cDNAs showed that the library contained many clones with long inserts, up to 11,199 bp (golgin B), including multiple splicing variants of filamin A and filamin B. These results suggest that the size-unbiased full-length cDNA library constructed using the vector-capping method will be an ideal resource for fine expression profiling of transcriptional variants with alternative TSSs and alternative splicing.
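The full-length criterion in the abstract (an extra G at the cDNA 5' end) can be illustrated with a minimal check. This is a sketch under simplifying assumptions, not the authors' pipeline: the function name `has_extra_g`, the 20-base `min_match` cutoff, and the exact-match comparison are all hypothetical, and the caller is assumed to supply the genomic sense-strand sequence starting at the base where the read's second nucleotide maps.

```python
def has_extra_g(read_5p: str, genome_from_tss: str, min_match: int = 20) -> bool:
    """Return True if the read starts with one extra G followed by genomic sequence.

    read_5p         -- 5'-end sequence of the single-pass read (sense strand)
    genome_from_tss -- genomic sense-strand sequence starting at the mapped 5' end
    min_match       -- bases after the extra G that must match (hypothetical cutoff)
    """
    read_5p = read_5p.upper()
    genome_from_tss = genome_from_tss.upper()
    if len(read_5p) < min_match + 1 or len(genome_from_tss) < min_match:
        return False  # too short to judge
    # The non-templated G must be the first base of the read ...
    if read_5p[0] != "G":
        return False
    # ... and the read must continue with the genomic sequence right after it.
    return read_5p[1:min_match + 1] == genome_from_tss[:min_match]
```

A real implementation would tolerate sequencing errors and would have to handle the ambiguous case in which the genomic base just upstream of the mapped start is itself a G, since an aligner may then absorb the extra G into the alignment.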
Year: 2008 PMID: 18487259 PMCID: PMC2650634 DOI: 10.1093/dnares/dsn010
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Summary of single-pass sequencing analysis of libraries
| Library | Lib-1 | Lib-2 | Lib-2 |
|---|---|---|---|
| Clone name | ARe/ARf | ARi | ARiS |
| Vector | pKA1U5 | pGCAP10 | pGCAP10 |
|  | No | Yes | Yes |
| Total | 10 176 | 6528 | 7296 |
| Unreadable | 940 | 340 | 588 |
| Readable | 9236 | 6188 | 6708 |
| Insert-free vector | 305 | 244 | 221 |
| dT tail | 177 | 10 | 9 |
| Mitochondria | 86 | 65 | 69 |
| rRNA | 3 | 2 | 4 |
| cDNA insert | 8665 | 5867 | 6405 |
| Full-length | 8275 | 5586 | 6090 |
| Truncated | 310 | 243 | 271 |
| Poly(A) | 80 | 38 | 44 |
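The counts in this table are internally consistent, and the >95% full-length content quoted in the abstract follows directly from them. The short script below (variable and function names are illustrative) recomputes the per-library cDNA-insert totals, checks that each column adds up to its total, and derives the full-length rate as full-length clones divided by all clones with a cDNA insert. That choice of denominator is an assumption, although it reproduces the abstract's figure.

```python
# Counts transcribed from the table above (Lib-1: ARe/ARf; Lib-2: ARi and ARiS).
COUNTS = {
    "ARe/ARf": dict(total=10176, unreadable=940, insert_free=305, dt_tail=177,
                    mito=86, rrna=3, full_length=8275, truncated=310, polya=80),
    "ARi": dict(total=6528, unreadable=340, insert_free=244, dt_tail=10,
                mito=65, rrna=2, full_length=5586, truncated=243, polya=38),
    "ARiS": dict(total=7296, unreadable=588, insert_free=221, dt_tail=9,
                 mito=69, rrna=4, full_length=6090, truncated=271, polya=44),
}


def summarize(c: dict) -> dict:
    """Rebuild the derived rows and check that the column adds up."""
    cdna = c["full_length"] + c["truncated"] + c["polya"]
    readable = c["insert_free"] + c["dt_tail"] + c["mito"] + c["rrna"] + cdna
    return {"cdna_insert": cdna, "readable": readable,
            "consistent": readable + c["unreadable"] == c["total"],
            "full_length_rate": c["full_length"] / cdna}


for name, c in COUNTS.items():
    s = summarize(c)
    print(f"{name}: cDNA inserts={s['cdna_insert']}, "
          f"full-length={s['full_length_rate']:.1%}, adds up={s['consistent']}")

full = sum(c["full_length"] for c in COUNTS.values())
inserts = sum(summarize(c)["cdna_insert"] for c in COUNTS.values())
print(f"Pooled: {full}/{inserts} = {full / inserts:.1%}")
```

With these counts the pooled rate is 19,951/20,937 ≈ 95.3%, and the three clone sets together account for the 24,000 clones mentioned in the abstract.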
Figure 1. Frequencies of the 310 abundant genes with ≥0.05% content (10 clones) obtained from the three libraries. The inset shows the number of low-redundant genes with ≤0.05% content at each redundancy.
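The split in Figure 1 between abundant genes (≥0.05% content) and low-redundant genes amounts to a simple threshold on each gene's share of the full-length clones. A sketch, assuming pooled per-gene clone counts are available as a `Counter` (the input and the function name are hypothetical):

```python
from collections import Counter


def classify_by_content(gene_counts: Counter, threshold: float = 0.0005):
    """Split genes into abundant (content >= threshold) and low-redundant ones.

    gene_counts -- full-length clones per gene, pooled over libraries (hypothetical input)
    threshold   -- fractional content; 0.0005 = 0.05%, as in Figure 1
    Returns (abundant genes with their counts, Counter of redundancy -> number of genes).
    """
    total = sum(gene_counts.values())
    abundant = {g: n for g, n in gene_counts.items() if n / total >= threshold}
    # Low-redundant genes tallied by redundancy (clones per gene), as in the inset.
    low = Counter(n for g, n in gene_counts.items() if g not in abundant)
    return abundant, low


# 0.05% of the 19,951 full-length clones is about 10 clones.
print(f"{0.0005 * 19951:.1f}")
```

The last line shows why the caption equates 0.05% with 10 clones: 0.05% of 19,951 is about 9.98.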
Figure 2. Comparison of the frequencies of abundant genes (A) between different pools from the same library (Lib-2) and (B) between different libraries (Lib-1 and Lib-2). Genes with ≥0.1% content are plotted, and the top 10 genes are labeled with their gene symbols. The line represents the case of no bias.
Figure 3. Estimation of the total number of genes represented in each library. (A) The cumulative number of gene occurrences, D, plotted against t, the accumulated number of sequenced clones. In this study, novel gene occurrences were counted per 96 clones. The best-fitting curves were obtained using the hyperbolic equation described; the asymptotic value, S, represents the estimated total number of genes. (B) Abundance-based coverage estimator model ACE-1. Genes represented by 10 or fewer clones were used to calculate gene richness.
Figure 4. Comparison of the distributions of transcriptional start sites. Black bars, Lib-1; gray bars, Lib-2; white bars, DBTSS. Position 1 is defined as the major TSS.
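Figure 4 places clone 5' ends on a coordinate system anchored at the major TSS. A minimal sketch, assuming the 5' ends of one gene's clones are given as genomic coordinates with a known strand: the major TSS is taken as the most frequent 5' end (ties go to the lowest coordinate), and positions are numbered without a zero, so the base immediately upstream of position 1 is -1. The function name, the tie rule, and the no-zero numbering are assumptions; the caption only states that position 1 is the major TSS.

```python
from collections import Counter


def relative_tss_positions(starts, strand="+"):
    """Count clone 5' ends by position relative to the major TSS (position 1).

    starts -- genomic coordinates of clone 5' ends for one gene (hypothetical input)
    strand -- '+' or '-'; on '-' the offset is flipped so downstream stays positive
    """
    freq = Counter(starts)
    # Major TSS: most frequent 5' end; ties broken by the lowest coordinate (assumption).
    major = max(freq, key=lambda pos: (freq[pos], -pos))
    rel = Counter()
    for pos, n in freq.items():
        offset = pos - major if strand == "+" else major - pos
        # No position 0: downstream 1, 2, 3, ...; upstream -1, -2, ... (assumption).
        rel[offset + 1 if offset >= 0 else offset] += n
    return rel
```

Histograms built this way for each library, and for DBTSS tags, could then be overlaid as in the figure.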
Long-sized full-length cDNA clones with >7 kb insert
| No. | HP ID | Symbol | Name | RefSeq | RefSeq mRNA (bp) | RefSeq protein (aa) | Clone ID | Ac. No. | cDNA (bp) | cDNA protein (aa) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | HP08164 | GOLGB1 | Golgi autoantigen, golgin subfamily b, macrogolgin (with transmembrane signal) 1 | NM_004487.3 | 11 185 | 3259 | ARiS161G17 | AB371588 | 11 198 | 3269 |
| 2 | HP07459 | N4BP2 | Nedd4 binding protein 2 | NM_018177.2 | 6616 | 1770 | ARiS023B20 | AB371584 | 9736 | 1690 |
| 3 | HP08032 | ACACA | Acetyl-Coenzyme A carboxylase alpha | NM_198836.1 | 9585 | 2346 | ARiS088K16 | AB371587 | 9534 | 2346 |
| 4 | HP04958 | FLNB | Filamin B, beta (actin binding protein 278) | NM_001457.2 | 9463 | 2602 | ARe23D04 | AB191258 | 9405 | 2591 |
| 5 | HP04958 | FLNB |  |  |  |  | ARe77D06 | AB371580 | 8059 | 2633 |
| 6 | HP04958 | FLNB |  |  |  |  | ARe89D09 | AB371581 | 9366 | 2578 |
| 7 | HP04958 | FLNB |  |  |  |  | ARi12F08 | AB371582 | 7973 | 2602 |
| 8 | HP07616 | FLNC | Filamin C, gamma (actin binding protein 280) | NM_001458.3 | 9146 | 2725 | ARi57A02 | AB371585 | 9156 | 2725 |
| 9 | HP07744 | SPTBN1 | Spectrin, beta, non-erythrocytic 1 | NM_003128.2 | 10 238 | 2364 | ARiS088A21 | AB371586 | 8443 | 2364 |
| 10 | HP00079 | FLNA | Filamin A, alpha (actin binding protein 280) | NM_001456.2 | 8278 | 2639 | ARe06F05 | AB191259 | 8212 | 2612 |
| 11 | HP00079 | FLNA |  |  |  |  | ARe27E03 | AB191260 | 8241 | 2620 |
| 12 | HP00079 | FLNA |  |  |  |  | ARi13C12 | AB371574 | 8242 | 2620 |
| 13 | HP00079 | FLNA |  |  |  |  | ARi37B09 | AB371575 | 8212 | 2612 |
| 14 | HP00079 | FLNA |  |  |  |  | ARi47G07 | AB371576 | 8243 | 2620 |
| 15 | HP00079 | FLNA |  |  |  |  | ARi50A09 | AB371577 | 7321 | 2315 |
| 16 | HP00079 | FLNA |  |  |  |  | ARi66B08 | AB371578 | 8212 | 2612 |
| 17 | HP00079 | FLNA |  |  |  |  | ARiS088J13 | AB371579 | 8214 | 2612 |
| 18 | HP06504 | COL5A1 | Collagen, type V, alpha 1 | NM_000093.3 | 8439 | 1838 | ARe79B07 | AB371586 | 8139 | 1838 |
| 19 | HP07532 | GCN1L1 | GCN1 general control of amino-acid synthesis 1-like 1 (yeast) | NM_006836.1 | 8699 | 2671 | ARi43H04 | — | 8.0 k | |
| 20 | HP06485 | PTPRF | Protein tyrosine phosphatase, receptor type, F | NM_002840.2 | 7718 | 1897 | ARe76H09 | — | 8.0 k | |
| 21 | HP05449 | SPTAN1 | Spectrin, alpha, non-erythrocytic 1 (alpha-fodrin) | NM_003127.1 | 7787 | 2472 | ARe45C02 | AB191262 | 7791 | 2452 |
| 22 | HP00124 | FN1 | Fibronectin 1 | NM_212476.1 | 8272 | 2296 | ARe05G09 | AB191261 | 7753 | 2265 |
| 23 | HP00124 | FN1 |  |  |  |  | ARi53A03 | — | 7.7 k |  |
| 24 | HP00124 | FN1 |  |  |  |  | ARiS087I09 | — | 7.9 k |  |
| 25 | HP06896 | PCM1 | Pericentriolar material 1 | NM_006197.3 | 8788 | 2024 | ARi07G10 | — | 7.5 k | |
| 26 | HP04727 | GLIS3 | GLIS family zinc finger 3 | NM_001042413.1 | 7656 | 930 | ARe06E05 | — | 7.5 k | |
| 27 | HP04890 | BAT2 | HLA-B associated transcript 2 | NM_080686.1 | 6750 | 2157 | ARe17F01 | — | 7.5 k | |
| 28 | HP04890 | BAT2 |  |  |  |  | ARiS122P21 | — | 7.0 k |  |
| 29 | HP04715 | ABCA7 | ATP-binding cassette, sub-family A (ABC1), member 7 | NM_019112.2 | 6704 | 2146 | ARe05B12 | — | 7.5 k | |
| 30 | HP07625 | WWC2 | WW, C2 and coiled-coil domain containing 2 | NM_024949.4 | 6492 | 987 | ARi58B08 | — | 7.5 k | |
| 31 | HP04667 | MYH9 | myosin, heavy polypeptide 9, non-muscle | NM_002473.3 | 7274 | 1960 | ARe01B05 | AB191263 | 7436 | 1960 |
| 32 | HP04667 | MYH9 |  |  |  |  | ARe12G09 | AB191263 | 7450 | 1960 |
| 33 | HP04667 | MYH9 |  |  |  |  | ARe58E01 | — | 7.5 k |  |
| 34 | HP04667 | MYH9 |  |  |  |  | ARi26G12 | — | 7.5 k |  |
| 35 | HP04667 | MYH9 |  |  |  |  | ARi63B09 | — | 7.5 k |  |
| 36 | HP04667 | MYH9 |  |  |  |  | ARi63H03 | — | 7.0 k |  |
| 37 | HP04667 | MYH9 |  |  |  |  | ARiS045L21 | — | 7.5 k |  |
| 38 | HP04680 | AGRN | Agrin | NM_198576.2 | 7319 | 2045 | ARe02B04 | AB191264 | 7319 | 2045 |
| 39 | HP04680 | AGRN |  |  |  |  | ARiS085N10 | — | 7.4 k |  |
| 40 | HP07482 | CDC42BPB | CDC42 binding protein kinase beta (DMPK-like) | NM_006035.2 | 6782 | 1711 | ARiS023M02 | — | 7.2 k | |
| 41 | HP07424 | MAP9 | microtubule-associated protein 9 | NM_001039580.1 | 7333 | 647 | ARiS109F14 | — | 7.1 k | |
| 42 | HP07506 | ROD1 | ROD1 regulator of differentiation 1 (S. pombe) | NM_005156.4 | 7230 | 552 | ARi40D10 | — | 7.0 k | |
| 43 | HP07574 | TRAM2 | Translocation associated membrane protein 2 | NM_012288.3 | 7065 | 370 | ARi50G08 | — | 7.0 k | |
| 44 | HP06927 | COL4A1 | Collagen, type IV, alpha 1 | NM_001845.4 | 6549 | 1669 | ARe94G08 | — | 7.0 k | |
| 45 | HP06927 | COL4A1 |  |  |  |  | ARi60G12 | — | 8.2 k |  |
| 46 | HP07685 | ARHGAP23 | Rho GTPase activating protein 23 | XM_290799.7 | 6475 | 1684 | ARi70H06 | — | 7.0 k | |
| 47 | HP02917 | PLXNB2 | Plexin B2 | XM_371474.4 | 6434 | 1901 | ARe49A01 | — | 7.0 k | |
| 48 | HP06882 | GPAM | Glycerol-3-phosphate acyltransferase, mitochondrial | NM_020918.3 | 6386 | 828 | ARi06G08 | — | 7.0 k |  |
(a) The frame shift caused by the insertion of one nucleotide (A) in the GOLGB1 clone was corrected.
(b) The RefSeq sequence terminates with a short A-stretch in the 3'-untranslated region.
(c) In addition, 8 and 5 clones with >6 kb inserts were obtained for COL4A1 and PLXNB2, respectively.
Figure 5. Exon–intron structures of the splicing variants of FLNA (A) and FLNB (B). Arrowheads indicate exon inclusion or deletion.
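The exon inclusion and deletion events marked in Figure 5 correspond to exons present in one variant's structure but absent from another's. A sketch, assuming each variant is given as a list of (start, end) exon coordinates (hypothetical input and function name); exons are matched only when both coordinates agree, so an alternative 5' or 3' splice site would show up as one exon unique to each variant. Comparisons of this kind could help account for length differences such as the 2,612- versus 2,620-residue FLNA products listed in the long-clone table.

```python
def exon_differences(variant_a, variant_b):
    """Return (exons only in variant_a, exons only in variant_b), each sorted.

    Each variant is a list of (start, end) genomic exon coordinates.
    Matching is by identical coordinates only.
    """
    a, b = set(variant_a), set(variant_b)
    return sorted(a - b), sorted(b - a)
```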
Figure 6. Correlation between the number of isolated full-length clones and mRNA content. The mRNA size is given in parentheses after each gene symbol.