Nicolas Bertin1,2, Mickaël Mendez1,2, Akira Hasegawa1,2, Marina Lizio1,2, Imad Abugessaisa1,2, Jessica Severin1,2, Mizuho Sakai-Ohno1,2, Timo Lassmann1,2, Takeya Kasukawa1, Hideya Kawaji1,2,3, Yoshihide Hayashizaki2,3, Alistair R R Forrest1,2,3, Piero Carninci1,2, Charles Plessy1,2.
Abstract
The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5' ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated with adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5'-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled into transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.
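The assembly of pairs into clusters can be illustrated with a minimal sketch. It assumes that each read pair has been reduced to a chromosome, a strand, a 5' position and a 3' position, and that "starting in the same region" means the 5' position falls inside a given promoter interval. The names `Region`, `Pair` and `assemble_clusters` are hypothetical and are not taken from the published pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


class Pair(NamedTuple):
    chrom: str
    strand: str
    five_prime: int   # position of the 5' end of the first read
    three_prime: int  # position of the far end of the second read


def assemble_clusters(pairs, regions):
    """Group pairs by the region containing their 5' end; report each group's span."""
    members = defaultdict(list)
    for pair in pairs:
        for region in regions:
            if (pair.chrom == region.chrom and pair.strand == region.strand
                    and region.start <= pair.five_prime < region.end):
                members[region.name].append(pair)
                break  # each pair is assigned to at most one region
    clusters = {}
    for name, group in members.items():
        positions = [p.five_prime for p in group] + [p.three_prime for p in group]
        clusters[name] = {"n_pairs": len(group),
                          "span": (min(positions), max(positions))}
    return clusters


# Toy data, invented for illustration only.
regions = [Region("p1", "chr6", "+", 100, 120), Region("p2", "chr6", "+", 500, 520)]
pairs = [
    Pair("chr6", "+", 105, 400),
    Pair("chr6", "+", 110, 900),
    Pair("chr6", "+", 505, 700),
    Pair("chr6", "-", 105, 50),  # opposite strand, so it is not assigned
]
clusters = assemble_clusters(pairs, regions)
```

A real implementation would also have to handle spliced alignments and overlapping regions, which this sketch ignores.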
Year: 2017 PMID: 28972578 PMCID: PMC5625555 DOI: 10.1038/sdata.2017.147
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Summary of the libraries prepared. The RNA identifier (Source.Name) can be searched in the FANTOM5 SSTAR database.
| Library | Source.Name | Sample | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NCig10013 | 10002-101A5 | SABiosciences XpressRef Human Universal Total RNA, pool1 | 12,980,474 | 865,232 | 53,578 | 8,620,490 | 303,381 | 336,125 | 914,464 | 352,035 | 557,012 | 978,157 |
| NCig10014 | 10012-101C3 | brain, adult, pool1 | 10,041,908 | 789,460 | 56,053 | 3,579,617 | 156,712 | 499,657 | 1,967,826 | 479,841 | 1,007,678 | 1,505,064 |
| NCig10015 | 10016-101C7 | heart, adult, pool1 | 17,071,911 | 657,065 | 67,493 | 12,680,315 | 189,587 | 360,818 | 1,791,722 | 341,589 | 679,820 | 303,502 |
| NCig10016 | 10026-101D8 | testis, adult, pool1 | 12,778,881 | 735,402 | 56,663 | 8,828,467 | 229,250 | 357,108 | 761,921 | 321,059 | 575,393 | 913,618 |
| NCig10017 | 10030-101E3 | retina, adult, pool1 | 7,438,898 | 983,120 | 49,209 | 2,016,040 | 91,931 | 396,226 | 1,395,594 | 341,442 | 574,800 | 1,590,536 |
| NCig10018 | 11210-116A4 | Smooth Muscle Cells—Aortic, donor0 | 16,069,580 | 636,805 | 76,008 | 14,079,255 | 126,255 | 163,413 | 505,347 | 152,188 | 219,216 | 111,093 |
| NCig10019 | 12176-128I7 | Whole blood (ribopure), donor090325, donation1 | 11,721,271 | 745,633 | 59,630 | 5,626,776 | 168,819 | 557,490 | 1,749,239 | 502,772 | 701,914 | 1,608,998 |
| NCig10020 | 10019-101D1 | lung, adult, pool1 | 13,194,146 | 853,743 | 90,406 | 8,943,567 | 141,998 | 347,985 | 819,745 | 309,787 | 599,126 | 1,087,789 |
| NCig10021 | 10022-101D4 | prostate, adult, pool1 | 11,134,114 | 746,116 | 64,413 | 6,095,769 | 167,516 | 486,536 | 1,152,553 | 366,192 | 716,651 | 1,338,368 |
| NCig10022 | 10025-101D7 | spleen, adult, pool1 | 8,339,981 | 852,309 | 62,976 | 3,130,575 | 143,638 | 431,102 | 1,038,248 | 402,237 | 785,577 | 1,493,319 |
| NCig10023 | 10150-102I6 | medial frontal gyrus, adult, donor10252 | 4,512,027 | 960,285 | 25,800 | 467,916 | 81,388 | 389,764 | 534,385 | 224,651 | 506,845 | 1,320,993 |
| NCig10024 | 10151-102I7 | amygdala, adult, donor10252 | 6,314,079 | 858,652 | 33,503 | 1,112,179 | 112,044 | 450,492 | 801,134 | 321,988 | 702,445 | 1,921,642 |
| NCig10025 | 10153-102I9 | hippocampus, adult, donor10252 | 5,068,313 | 892,716 | 32,233 | 861,081 | 80,184 | 360,888 | 643,485 | 241,503 | 530,452 | 1,425,771 |
| NCig10026 | 10154-103A1 | thalamus, adult, donor10252 | 8,151,958 | 817,751 | 35,872 | 2,262,386 | 95,945 | 505,911 | 1,296,220 | 377,511 | 716,479 | 2,043,883 |
| NCig10027 | 10155-103A2 | medulla oblongata, adult, donor10252 | 9,999,787 | 1,397,247 | 78,367 | 2,366,286 | 132,873 | 610,522 | 1,541,331 | 446,644 | 906,737 | 2,519,780 |
| NCig10028 | 10157-103A4 | parietal lobe, adult, donor10252 | 9,456,143 | 936,084 | 71,291 | 1,471,269 | 153,886 | 694,447 | 1,387,954 | 463,355 | 1,096,767 | 3,181,090 |
| NCig10029 | 10158-103A5 | substantia nigra, adult, donor10252 | 7,656,663 | 1,078,146 | 59,251 | 2,360,698 | 85,562 | 425,712 | 1,188,939 | 344,322 | 602,764 | 1,511,269 |
| NCig10030 | 10159-103A6 | spinal cord, adult, donor10252 | 9,651,183 | 888,981 | 79,200 | 2,801,018 | 116,324 | 594,991 | 1,353,570 | 442,320 | 903,764 | 2,471,015 |
| NCig10031 | 10160-103A7 | pineal gland, adult, donor10252 | 7,577,434 | 1,011,960 | 65,055 | 1,343,792 | 103,948 | 545,004 | 944,389 | 348,323 | 744,959 | 2,470,004 |
| NCig10032 | 10161-103A8 | globus pallidus, adult, donor10252 | 11,489,499 | 821,077 | 80,387 | 4,015,936 | 130,251 | 673,588 | 1,632,249 | 469,685 | 916,587 | 2,749,739 |
| NCig10033 | 10162-103A9 | pituitary gland, adult, donor10252 | 8,630,256 | 970,964 | 49,755 | 1,932,932 | 124,341 | 563,707 | 1,591,606 | 455,105 | 847,858 | 2,093,988 |
| NCig10034 | 10163-103B1 | occipital cortex, adult, donor10252 | 9,407,509 | 905,254 | 44,694 | 1,193,623 | 136,650 | 708,731 | 1,223,230 | 432,869 | 1,030,595 | 3,731,863 |
| NCig10035 | 10164-103B2 | caudate nucleus, adult, donor10252 | 6,816,957 | 1,102,408 | 38,186 | 869,711 | 109,813 | 476,042 | 1,310,808 | 346,046 | 754,780 | 1,809,163 |
| NCig10036 | 10165-103B3 | locus coeruleus, adult, donor10252 | 6,753,026 | 1,045,453 | 49,711 | 1,251,961 | 97,962 | 454,490 | 1,173,365 | 330,395 | 729,525 | 1,620,164 |
| NCig10037 | 10166-103B4 | cerebellum, adult, donor10252 | 6,025,035 | 1,095,992 | 54,415 | 368,370 | 62,650 | 519,173 | 650,125 | 254,418 | 492,016 | 2,527,876 |
| NCig10038 | 11207-116A1 | Endothelial Cells—Aortic, donor0 | 15,261,564 | 718,897 | 98,322 | 12,128,937 | 142,908 | 291,064 | 820,844 | 298,247 | 455,971 | 306,374 |
| NCig10039 | 11222-116B7 | Fibroblast—Gingival, donor4 (GFH2) | 5,865,574 | 885,111 | 167,833 | 2,081,881 | 108,043 | 284,519 | 1,080,129 | 346,774 | 547,431 | 363,853 |
| NCig10040 | 11224-116B9 | CD14+ Monocytes, donor1 | 11,232,175 | 651,017 | 101,461 | 4,297,440 | 152,438 | 540,035 | 1,268,085 | 510,000 | 645,877 | 3,065,822 |
| NCig10041 | 11229-116C5 | CD14+ monocyte derived endothelial progenitor cells, donor1 | 10,775,321 | 1,032,309 | 242,145 | 3,950,479 | 190,791 | 539,087 | 1,666,613 | 561,795 | 835,546 | 1,756,556 |
| NCig10042 | 11245-116E3 | Fibroblast—Aortic Adventitial, donor1 | 9,543,436 | 735,498 | 828,670 | 3,376,827 | 198,604 | 517,858 | 2,135,913 | 655,705 | 710,317 | 384,044 |
| NCig10043 | 11246-116E4 | Intestinal epithelial cells (polarized), donor1 | 6,681,741 | 919,980 | 392,820 | 1,056,095 | 120,525 | 433,407 | 2,003,503 | 513,395 | 536,761 | 705,255 |
| NCig10044 | 11247-116E5 | Mesothelial Cells, donor1 | 7,150,721 | 870,202 | 443,481 | 1,547,197 | 127,726 | 418,103 | 2,291,855 | 516,801 | 516,012 | 419,344 |
| NCig10045 | 11248-116E6 | Anulus Pulposus Cell, donor1 | 9,329,478 | 673,123 | 467,283 | 4,733,836 | 191,255 | 418,230 | 1,674,689 | 474,236 | 571,373 | 125,453 |
| NCig10046 | 11249-116E7 | Pancreatic stromal cells, donor1 | 6,917,860 | 895,678 | 266,841 | 1,598,919 | 129,096 | 447,633 | 1,839,323 | 563,482 | 606,168 | 570,720 |
| NCig10047 | 11256-116F5 | Small Airway Epithelial Cells, donor1 | 8,934,394 | 762,215 | 197,286 | 3,113,793 | 175,723 | 506,591 | 2,330,196 | 629,638 | 801,987 | 416,965 |
| NCig10048 | 11273-116H4 | Mammary Epithelial Cell, donor1 | 10,019,381 | 890,533 | 198,834 | 4,561,811 | 208,742 | 497,865 | 2,025,351 | 497,139 | 721,321 | 417,785 |
| NCig10049 | 11278-116H9 | Placental Epithelial Cells, donor1 | 11,212,007 | 523,668 | 434,079 | 7,019,440 | 196,493 | 358,154 | 1,908,353 | 304,347 | 296,384 | 171,089 |
| NCig10050 | 11282-116I4 | Skeletal muscle cells differentiated into Myotubes—multinucleated, donor1 | 8,911,706 | 825,574 | 278,816 | 3,852,864 | 174,167 | 447,392 | 2,002,086 | 521,214 | 534,883 | 274,710 |
| NCig10051 | 11468-119C1 | Preadipocyte—omental, donor1 | 5,109,588 | 863,018 | 244,070 | 1,743,473 | 94,850 | 257,299 | 790,493 | 304,494 | 524,536 | 287,355 |
| NCig10052 | 11487-119E2 | Mast cell—stimulated, donor1 | 4,388,468 | 1,047,459 | 53,428 | 390,687 | 86,272 | 312,008 | 1,219,897 | 244,001 | 294,922 | 739,794 |
| NCig10053 | 10411-106B6 | renal cell carcinoma cell line:OS-RC-2 | 6,905,711 | 774,316 | 209,703 | 2,297,666 | 117,997 | 421,779 | 1,058,325 | 387,862 | 580,214 | 1,057,849 |
| NCig10054 | 10412-106B7 | malignant trichilemmal cyst cell line:DJM-1 | 9,858,285 | 728,347 | 139,130 | 3,031,554 | 164,712 | 630,434 | 2,045,672 | 648,552 | 895,267 | 1,574,617 |
| NCig10055 | 10414-106B9 | maxillary sinus tumor cell line:HSQ-89 | 9,125,063 | 857,517 | 135,611 | 2,250,778 | 123,989 | 579,817 | 1,951,209 | 498,578 | 697,019 | 2,030,545 |
| NCig10056 | 10431-106D8 | epidermoid carcinoma cell line:Ca Ski | 5,074,986 | 1,071,422 | 146,593 | 982,637 | 87,543 | 343,508 | 859,004 | 378,966 | 452,570 | 752,743 |
| NCig10057 | 10436-106E4 | signet ring carcinoma cell line:Kato III | 8,693,941 | 840,579 | 145,687 | 3,244,444 | 137,763 | 512,628 | 1,657,263 | 503,623 | 614,540 | 1,037,414 |
| NCig10058 | 10442-106F1 | schwannoma cell line:HS-PSS | 7,714,618 | 941,029 | 176,159 | 1,799,562 | 134,668 | 519,866 | 1,659,980 | 589,180 | 733,263 | 1,160,911 |
| NCig10059 | 10444-106F3 | glioblastoma cell line:A172 | 8,266,061 | 861,701 | 175,094 | 2,804,921 | 186,736 | 495,954 | 1,209,931 | 520,670 | 712,755 | 1,298,299 |
| NCig10060 | 10454-106G4 | chronic myelogenous leukemia cell line:K562 | 4,756,581 | 1,045,797 | 109,272 | 645,627 | 70,593 | 363,272 | 675,740 | 342,295 | 400,380 | 1,103,605 |
| NCig10061 | 10464-106H5 | acute lymphoblastic leukemia (T-ALL) cell line:Jurkat | 9,344,079 | 869,111 | 131,089 | 2,562,674 | 178,216 | 687,478 | 1,748,129 | 774,916 | 819,425 | 1,573,041 |
| NCig10062 | 10508-107D4 | neuroblastoma cell line:CHP-134, tech_rep1 | 4,622,691 | 962,974 | 148,947 | 278,618 | 57,421 | 405,741 | 662,738 | 258,098 | 391,938 | 1,456,216 |
| NCig10063 | 10552-107I3 | cervical cancer cell line:D98-AH2, tech_rep1 | 4,307,425 | 1,005,845 | 156,179 | 421,514 | 70,319 | 310,350 | 1,186,016 | 271,445 | 368,888 | 516,869 |
| NCig10064 | 10558-107I9 | osteosarcoma cell line:HS-Os-1, tech_rep1 | 4,374,077 | 983,856 | 182,894 | 548,737 | 80,879 | 357,116 | 711,651 | 286,493 | 395,130 | 827,321 |
| NCig10065 | 10410-106B5 | extraskeletal myxoid chondrosarcoma cell line:H-EMC-SS, tech_rep1 | 3,965,350 | 928,677 | 138,036 | 393,526 | 64,950 | 343,707 | 582,912 | 220,600 | 400,189 | 892,753 |
| NCig10066 | 10441-106E9 | synovial sarcoma cell line:HS-SY-II, tech_rep1 | 4,039,831 | 844,408 | 197,018 | 574,814 | 57,821 | 348,974 | 523,331 | 235,006 | 375,904 | 882,555 |
| NCig10067 | 10474-106I6 | myeloma cell line:PCM6, tech-rep1 | 4,582,185 | 810,459 | 186,856 | 755,594 | 71,301 | 358,371 | 807,453 | 278,280 | 416,507 | 897,364 |
| NCig10068 | 10424-106D1 | splenic lymphoma with villous lymphocytes cell line:SLVL | 4,458,999 | 852,283 | 163,002 | 455,663 | 80,461 | 376,376 | 969,585 | 280,831 | 377,707 | 903,091 |
| NCig10126 | 10508-107D4 | neuroblastoma cell line:CHP-134, tech_rep2 | 5,259,146 | 995,795 | 48,625 | 550,450 | 63,938 | 396,327 | 701,319 | 298,112 | 438,554 | 1,766,026 |
| NCig10127 | 10552-107I3 | cervical cancer cell line:D98-AH2, tech_rep2 | 4,097,389 | 1,015,542 | 60,740 | 646,950 | 64,609 | 304,041 | 930,823 | 243,837 | 338,193 | 492,654 |
| NCig10128 | 10558-107I9 | osteosarcoma cell line:HS-Os-1, tech_rep2 | 4,681,628 | 968,282 | 75,235 | 865,891 | 76,828 | 336,463 | 737,523 | 296,891 | 409,283 | 915,232 |
| NCig10129 | 10410-106B5 | extraskeletal myxoid chondrosarcoma cell line:H-EMC-SS, tech_rep2 | 3,118,570 | 822,752 | 40,614 | 436,600 | 112,425 | 377,956 | 276,992 | 152,854 | 276,872 | 621,505 |
| NCig10130 | 10441-106E9 | synovial sarcoma cell line:HS-SY-II, tech_rep2 | 3,232,761 | 726,773 | 64,905 | 633,633 | 81,424 | 473,478 | 240,861 | 160,678 | 250,046 | 600,963 |
| NCig10131 | 10474-106I6 | myeloma cell line:PCM6, tech-rep2 | 3,985,344 | 720,124 | 60,988 | 898,043 | 78,483 | 322,369 | 566,406 | 231,781 | 346,042 | 761,108 |
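
The counts in this table can be checked for internal consistency. In the rows verified by hand (NCig10013, NCig10014 and NCig10018), the first count equals the sum of the nine counts that follow it. The sketch below applies that check to one row; `parse_row` and `is_consistent` are hypothetical helper names, and reading the first count as a total is an inference from the numbers, not something the table header states.

```python
def parse_row(line):
    """Split one pipe-delimited table row into identifiers and integer counts."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    library, source, description = cells[:3]
    counts = [int(c.replace(",", "")) for c in cells[3:]]
    return library, source, description, counts


def is_consistent(counts):
    """True when the first count equals the sum of all remaining counts."""
    return counts[0] == sum(counts[1:])


row = ("| NCig10013 | 10002-101A5 | SABiosciences XpressRef Human Universal "
       "Total RNA, pool1 | 12,980,474 | 865,232 | 53,578 | 8,620,490 | 303,381 "
       "| 336,125 | 914,464 | 352,035 | 557,012 | 978,157 |")
library, source, description, counts = parse_row(row)
```

Running the same check over every row is a quick way to detect digits lost or altered when the table is copied.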
Figure 1. ZENBU view of CAGEscan data.
CAGEscan clusters revealing new promoters for the SH3BGRL2 gene. Features on the plus and minus strands are displayed in green and purple, respectively. Promoter regions of interest are highlighted with ellipses in track (d). (a) Genomic coordinates. (b) FANTOM5 CAGE signal as a quantitative histogram. (c) CAGEscan CAGE signal. (d) CAGEscan meta-clusters, combining pairs for all libraries. The name of the seed CAGE peak is indicated on the left of each cluster. (e) NCBI Gene bodies. (f) GENCODE 19 annotations. (g) GenBank mRNA sequences. (h) EST sequences supporting the CAGEscan clusters.
Figure 2. FANTOM5 CAGEscan processing workflow.
Processing pipeline. The diagram of boxes connected by black arrows shows the MOIRAI workflow as completed for one (NCig10013) of the 62 CAGEscan libraries. The coloured text and arrows overlaid on the diagram mark the points where the main alignment statistics are calculated. These statistics summarise the number of read pairs passing all the filters (CAGEscan pairs) or discarded at each step of the processing pipeline (Unextracted, rDNA, Artefacts, Non-aligned, Non-proper, Duplicates).
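The accounting described in this caption can be sketched as a cascade in which each read pair is counted under the first filter it fails, or as a CAGEscan pair if it fails none. The category names come from the caption. The order of the filters is assumed to follow the order in which the caption lists them, and the predicates and dictionary keys are hypothetical stand-ins for the real MOIRAI steps.

```python
from collections import Counter

# Each entry is (category name, predicate returning True when the pair is discarded).
# The predicates read hypothetical boolean flags from a dict describing one pair.
FILTERS = [
    ("Unextracted", lambda p: not p.get("extracted", True)),
    ("rDNA", lambda p: p.get("rdna", False)),
    ("Artefacts", lambda p: p.get("artefact", False)),
    ("Non-aligned", lambda p: not p.get("aligned", True)),
    ("Non-proper", lambda p: not p.get("proper", True)),
    ("Duplicates", lambda p: p.get("duplicate", False)),
]


def tally(pairs):
    """Count each pair once, under the first filter that discards it."""
    counts = Counter()
    for pair in pairs:
        for name, is_discarded in FILTERS:
            if is_discarded(pair):
                counts[name] += 1
                break
        else:  # no filter discarded the pair
            counts["CAGEscan pairs"] += 1
    return counts


def fractions(counts):
    """Convert counts to fractions of all pairs."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Toy input, invented for illustration only.
example = [{}, {"rdna": True}, {"aligned": False}, {"rdna": True, "duplicate": True}]
counts = tally(example)
shares = fractions(counts)
```

Because every pair is counted exactly once, the categories sum to the number of input pairs.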
Figure 3. Alignment and annotation statistics. Quality control statistics.
(a) Fraction of pairs passing all filters (CAGEscan pairs) or discarded at key steps of the processing pipeline (see Fig. 2). The central block of stacked bars represents each library individually. The left block aggregates the libraries by sequencing batch, named by the sequencing run identifier. The right block aggregates the libraries by sample type. Each sample type is represented by one colour, which is also used to colour the library identifiers and the sequence identifiers in the other blocks. Batches comprising multiple types are indicated by multiple colours. (b) Fraction of pairs starting in a Promoter, Exon, or Other (non-promoter, non-exon) region.
Figure 4. Similarity between libraries.
Heatmap of the Jaccard similarity indexes computed between each pair of libraries. Sample types and batches are indicated by a colour code near the library names, and pairs of replicates are indicated by an asterisk superimposed on the square displaying their similarity index.
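For two sets A and B, the Jaccard index is |A ∩ B| / |A ∪ B|. The sketch below computes it for every pair of libraries. The caption does not say which features were compared, so representing each library as a set of detected cluster identifiers is an assumption, and `jaccard` and `similarity_matrix` are hypothetical names.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def similarity_matrix(features_by_library):
    """Return a symmetric {(library_x, library_y): index} mapping."""
    names = sorted(features_by_library)
    matrix = {(n, n): 1.0 for n in names}
    for x, y in combinations(names, 2):
        score = jaccard(features_by_library[x], features_by_library[y])
        matrix[(x, y)] = matrix[(y, x)] = score
    return matrix


# Toy feature sets, invented for illustration only.
libs = {
    "libA": {"c1", "c2", "c3"},
    "libB": {"c2", "c3", "c4"},
    "libC": {"c9"},
}
m = similarity_matrix(libs)
```

The index ranges from 0 (no shared features) to 1 (identical sets), so technical replicates would be expected to score higher than unrelated samples.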