A Prevalence of Imprinted Genes within the Total Transcriptomes of Human Tissues and Cells.

Sergey V Anisimov

Abstract

Genomic imprinting is an epigenetic phenomenon that causes differential expression of the paternally and maternally inherited alleles of a subset of genes (the so-called imprinted genes). Imprinted genes are distributed throughout the genome, and it is predicted that about 1% of human genes may be imprinted. It is recognized that the allelic expression of imprinted genes varies between tissues and developmental stages. The current study represents the first attempt to estimate the prevalence of imprinted genes within the total human transcriptome. In silico analysis of the normalized expression profiles of a comprehensive panel of 173 established and candidate human imprinted genes was performed in 492 publicly available SAGE libraries. These libraries represent human cell and tissue samples in a variety of physiological and pathological conditions. Variations in the prevalence of imprinted genes within the total transcriptomes (ranging from 0.08% to 4.36%) and in the expression profiles of the individual imprinted genes are assessed. This paper thus provides a useful reference on the size of the imprinted transcriptome and the expression of the individual imprinted genes.

Year:  2012        PMID: 22997578      PMCID: PMC3446743          DOI: 10.1155/2012/793506

Source DB:  PubMed          Journal:  Mol Biol Int        ISSN: 2090-2182


1. Introduction

Genomic imprinting is an epigenetic phenomenon that causes differential expression of the paternally and maternally inherited alleles of a minor subset of genes (the so-called imprinted genes). Genomic imprinting was first discovered in 1984 [1, 2], and in 1991 the first imprinted genes (IGF2, paternally expressed; IGF2R and H19, maternally expressed) were identified in the mouse [3-5]. Since then, imprinting status has been confirmed for numerous genes in the Homo sapiens and Mus musculus genomes and for fewer genes in Bos taurus, Rattus norvegicus, Sus scrofa, Canis lupus familiaris, and Ovis aries; many more genes are considered candidates [6]. The functional significance of genomic imprinting is not yet fully understood [7-9], although alterations in the expression of imprinted genes are linked to certain pathologies, including Angelman syndrome, Prader-Willi syndrome, and particular cancer subtypes. Genomic imprinting varies between species and tissues. Furthermore, it is a dynamic process and may vary depending on the developmental stage [10]. The goal of this study was to estimate the prevalence of imprinted genes within the total human transcriptome in cell and tissue samples under a variety of physiological and pathological conditions.

Serial analysis of gene expression (SAGE) is a sequence-based technique for the quantitative study of mRNA transcripts in cell populations [11]. Two major principles underlie SAGE: first, short (10 bp) expressed sequence tags (ESTs) are sufficient to identify individual gene products, and second, multiple tags can be concatenated and identified by sequence analysis. SAGE results are reported in either absolute or relative numbers of tags, which permits direct comparisons between tag catalogues and datasets [12-15]. Numerous technical adaptations have led to the development of similar techniques [16], yet SAGE remains an important tool of modern molecular biology. It is widely used in a number of applications, the foremost of which is the molecular dissection of the cancer genome [17].

In the current study, the expression of established and candidate imprinted genes was evaluated in a wide array of cell and tissue samples using a comprehensive set of currently available SAGE data for Homo sapiens. Five hundred eighty-one SAGE catalogues based on libraries generated with the most commonly used anchoring enzyme, NlaIII, were screened using a conservative set of criteria, and in 492 of these (accounting for nearly 36 million SAGE tags) the expression profiles of the imprinted genes were analyzed using a proven algorithm [18]. It was therefore possible to estimate the prevalence of imprinted genes within the total human transcriptome.

2. Methods

2.1. Imprinted Gene Subsets

The subset of established and candidate imprinted genes was assembled from the Geneimprint resource (http://www.geneimprint.com/; credits to R.L. Jirtle) and the study by Luedi et al. [6]. From the latter, the high-confidence human gene candidates predicted to be imprinted by both the linear and RBF kernel classifiers learned by Equbits Foresight and by SMLR ([6], supplementary data) were used. Redundant entries were excluded.

2.2. SAGE

SAGE technology is based on the isolation of short tags from a defined position within the mRNA molecule, followed by concatemerization of the tags, sequencing, tag extraction, and gene annotation [11]. The complete set of publicly available SAGE libraries (GPL4 dataset, NlaIII anchoring enzyme) was downloaded from the Gene Expression Omnibus (GEO) database (National Center for Biotechnology Information (NCBI); http://www.ncbi.nlm.nih.gov/geo/). Following exclusion of duplicate entries, SAGE libraries were annotated and sorted by the number of tags sequenced. Noninformative (A)10 sequences were removed from SAGE libraries when detected, and tags per million (tpm) values were recalculated accordingly for all libraries as the transcript's raw tag count divided by the number of reliable tags in the library and multiplied by 1,000,000. SAGE libraries constructed by Potapova et al. [19] were subjected to a “clean-up” procedure in which all clones containing ≤4 tags were excluded [20], with the remaining tags constituting the pool of “reliable tags.”
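The tpm recalculation described above can be illustrated with a minimal Python sketch. The function names and the toy library are assumptions made for illustration only; they are not part of the original workflow, and the counts are not real data.

```python
def reliable_tag_total(library, poly_a_tag="AAAAAAAAAA"):
    """Total tag count of a library after dropping the noninformative (A)10 tag."""
    return sum(count for tag, count in library.items() if tag != poly_a_tag)


def tags_per_million(raw_count, reliable_tags):
    """Raw tag count divided by the number of reliable tags, times 1,000,000."""
    if reliable_tags <= 0:
        raise ValueError("library must contain at least one reliable tag")
    return raw_count * 1_000_000 / reliable_tags


# Hypothetical toy library: tag sequence -> raw count (illustrative values only).
library = {"TTGGAGATCT": 3, "AAAAAAAAAA": 50, "GCCACCCCCT": 47}

total = reliable_tag_total(library)                    # (A)10 tags are excluded
tpm = tags_per_million(library["TTGGAGATCT"], total)   # normalized abundance
print(total, tpm)
```

Excluding the (A)10 tag before normalization changes the denominator, which is why the tpm values have to be recalculated for every library in which such tags were detected.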

2.3. SAGE Tag Annotation

The established and candidate imprinted gene subset was matched against the CGAP (Cancer Genome Anatomy Project, NCI, NIH) SAGE Anatomic Viewer (SAV) applet [17]. For genes with no matching SAV applet entry, and for genes where the SAV applet suggested unreliable/internal tags (viz., TIGD1, HOXA3, NTRI, etc.), reliable 3′ end tags were extracted from full-length sequences available via GenBank (NCBI, NIH).
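Manual extraction of a 3′ end tag from a full-length sequence can be sketched as follows. This is a minimal illustration, assuming the standard short-SAGE convention that the tag is the 10 bp immediately downstream of the 3′-most NlaIII site (CATG); the function name and the example sequence are hypothetical, and the sketch ignores complications such as alternative polyadenylation.

```python
def nlaiii_tag(mrna, tag_length=10, site="CATG"):
    """Return the tag following the 3'-most NlaIII site, or None if unavailable."""
    seq = mrna.upper().replace("U", "T")
    pos = seq.rfind(site)              # 3'-most occurrence of the recognition site
    if pos == -1:
        return None                    # no NlaIII site: the transcript cannot be tagged
    start = pos + len(site)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Hypothetical transcript fragment with two CATG sites; only the last one counts.
example = "GGG" + "CATG" + "AAACCCGGGTTTA" + "CATG" + "TTGGAGATCT" + "AAAA"
print(nlaiii_tag(example))
print(nlaiii_tag("GGGAAATTTCCC"))      # sequence without any CATG site
```

A sequence that returns None here corresponds to the situation in which a gene cannot be annotated with an NlaIII SAGE tag at all.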

2.4. Expression Profiling

SAGE tags were matched against the individual SAGE catalogues using the Query function of the MS Access software package. Individual queries (both absolute tag abundance per library and normalized tags per million (tpm) values) were merged using MS Excel software. Calculations of the maximal and average expression of transcripts matching established and candidate imprinted genes were performed using normalized tpm values. Any tpm value can be converted to a fraction of the total gene expression by dividing it by 1,000,000.
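The matching and summary steps were performed with MS Access and MS Excel; the same logic can be sketched in Python. In this minimal sketch the function name, the dictionary layout, and the tpm values are assumptions for illustration (the three tags are taken from Table 1); genes whose tag is absent from a catalogue are assumed to contribute 0 tpm.

```python
def profile(catalogue_tpm, gene_tags):
    """Summarize imprinted-gene expression in one normalized SAGE catalogue."""
    # Look up each gene's tag; a tag missing from the catalogue counts as 0 tpm.
    values = [catalogue_tpm.get(tag, 0.0) for tag in gene_tags.values()]
    total = sum(values)
    return {
        "sum_tpm": total,
        "average_tpm": total / len(values),
        "max_tpm": max(values),
        "fraction": total / 1_000_000,   # share of total gene expression
    }


# Tags from Table 1; the tpm values below are hypothetical.
gene_tags = {"H19": "GCCACCCCCT", "IGF2/INS": "CTTGGGTTTT", "MEG3": "TGGGAAGTGG"}
catalogue_tpm = {"GCCACCCCCT": 1200.0, "CTTGGGTTTT": 300.0, "ACGTACGTAC": 5000.0}

summary = profile(catalogue_tpm, gene_tags)
print(summary)
```

Averaging over all genes in the subset, including those with no matching tag, is consistent with Table 3, where each Average value equals the corresponding Sum divided by 173.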

2.5. Clustering Analysis

Clustering analysis was performed using the EPCLUST (Expression Profile data CLUSTering and analysis) software (http://www.bioinf.ebc.ee/EP/EP/EPCLUST/). K-means clustering was performed after transposing the data matrix, with initial clusters chosen by the most distant (average) transcripts. For each dataset, the number of clusters was set to the lowest value yielding one cluster containing a solitary database entry. Hierarchical clustering was performed using a correlation measure-based distance and the average linkage (average distance) clustering method; hierarchical trees were built for individual datasets.
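The two ingredients of the hierarchical clustering step can be sketched in plain Python. This is an illustration only: it assumes that the correlation-based distance is 1 minus the Pearson (centered) correlation, which the text does not state explicitly, and it does not reproduce EPCLUST itself. All function names and profiles are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Assumed distance: 0 for perfectly correlated profiles, 2 for anticorrelated."""
    return 1.0 - pearson(x, y)


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(correlation_distance(x, y) for x, y in pairs) / len(pairs)


# Hypothetical profiles across three libraries.
rising, rising_scaled, falling = [1, 2, 3], [2, 4, 6], [3, 2, 1]
print(correlation_distance(rising, rising_scaled))
print(average_linkage([rising], [rising_scaled, falling]))
```

Because the distance depends only on the shape of a profile, two transcripts with very different abundance but proportional expression patterns end up close together.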

3. Results

The subset of established and candidate human imprinted genes (203 entries in total) was assembled from the Geneimprint resource and the data of Luedi et al. [6]. Of the candidate imprinted genes identified in the latter, high-confidence candidates (predicted by both Equbits Foresight and SMLR [6]) were selected. Following exclusion of redundant entries, appropriate short (10 bp) SAGE tags matching the NlaIII anchoring enzyme were annotated to gene targets using the CGAP (Cancer Genome Anatomy Project, NCI, NIH) SAGE Anatomic Viewer (SAV) applet or manually, as described earlier [18]. For a number of the candidate imprinted genes, a complete sequence was unavailable via GenBank or alternative databases (e.g., GenBank ID: NM_016158, NM_024547, NM_181648, etc.); for that reason, the human imprinted gene subset subjected to tag annotation was reduced to 174 genes. Of these, the candidate imprinted gene Q9NYI9 (PPARL; GenBank ID: AF242527) could not be annotated with a SAGE tag because its sequence lacks NlaIII recognition sites entirely. Therefore, a total of 173 genes (53 established and 120 candidate imprinted genes) were annotated with the appropriate SAGE tags (Table 1) and subjected to further analysis.
Table 1

SAGE tag annotation for established and candidate imprinted gene subset.

N | Gene symbol | Gene name | Aliases | Location(a) | Status | Expressed allele | NlaIII tag | Notes | GenBank accession number
1 | NDUFA4 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4, 9 kDa |  | 1p13.3 | Candidate | Paternal | TTGGAGATCT |  | BC105295
2 | GFI1 | Growth-factor-independent 1 transcription repressor | ZNF163 | 1p22.1 | Candidate | Paternal | TGTACCATAG |  | NM_001127215
3 | NM019610 | RNA-binding motif protein, X-linked-like 1 (RBMXL1), transcript variant 2 |  | 1p22.2 | Candidate | Maternal | GCAGATTTAT |  | NM_019610
4 | DIRAS3 | DIRAS family, GTP-binding RAS-like 3 | ARHI, NOEY2 | 1p31 | Imprinted | Paternal | CAGAAAAAAA | *(b) | BC005362
5 | BMP8B | Bone morphogenetic protein 8b | OP2, BMP8, MGC131757 | 1p35–p32 | Candidate | Paternal | AGCAAAACTG | * | NM_001720
6 | FUCA1 | Fucosidase, alpha-L- 1, tissue | FUCA | 1p36.11 | Candidate | Paternal | CTATTTAGTT |  | NM_000147
7 | TP73 | TP73 | P73 | 1p36.3 | Imprinted | Maternal | TGGTACCGCC |  | NM_001126240
8 | PRDM16 | PR domain containing 16 | MEL1 | 1p36.32 | Candidate | Paternal | AGATTGATAT |  | NM_022114
9 | PEX10 | Peroxisomal biogenesis factor 10 |  | 1p36.32 | Candidate | Maternal | GGAGGCGGCG |  | NM_002617
10 | WDR8 | WD repeat domain 8 |  | 1p36.32 | Candidate | Maternal | TCGGTGCAGG |  | NM_017818
11 | DVL1 | Dishevelled, dsh homolog 1 (Drosophila) | DVL | 1p36.33 | Candidate | Maternal | GCCCGCAGGG |  | NM_004421
12 | Q5EBL5 | Family with sequence similarity 132, member A | FAM132A | 1p36.33 | Candidate | Maternal | GTTTCCAGGC |  | NM_001014980
13 | TMEM52 | Transmembrane protein 52 |  | 1p36.33 | Candidate | Paternal | TTACACCGGC |  | NM_178545
14 | HSPA6 | Heat shock 70 kDa protein 6 (HSP70B′) |  | 1q23.3 | Candidate | Maternal | TATGAATTTT |  | NM_002155
15 | PTPN14 | Protein tyrosine phosphatase, nonreceptor type 14 | PEZ | 1q32.3 | Candidate | Maternal | ACTTTTTCAA | * | BC017300
16 | HIST3H2BB | Histone cluster 3, H2bb |  | 1q42.13 | Candidate | Maternal | AACTCCTTCG | *#(c) | NM_175055
17 | OBSCN | Obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF | KIAA1556, KIAA1639 | 1q42.13 | Candidate | Paternal | CTGAGCGCCG | * | NM_001098623
18 | Q8NGX0 | Olfactory receptor, family 11, subfamily L, member 1 | OR11L1 | 1q44 | Candidate | Paternal | AGAAGGAAAT | * | NM_001001959
19 | VAX2 | Ventral anterior homeobox 2 | DRES93 | 2p13.3 | Candidate | Maternal | GGCGATGGGG |  | NM_012476
20 | OTX1 | Orthodenticle homeobox 1 |  | 2p15 | Candidate | Maternal | GCGGTTCCAG |  | BC007621
21 | Q96PX6 | Coiled-coil domain containing 85A | CCDC85A, KIAA1912 | 2p16.1 | Candidate | Paternal | GCAGATATTC | R(d) | NM_001080433
22 | ABCG8 | ATP-binding cassette, subfamily G (WHITE), member 8 |  | 2p21 | Candidate | Maternal | GGCTCCAAAA |  | NM_022437
23 | ZFP36L2 | Zinc finger protein 36, C3H type-like 2 | ERF2, TIS11D | 2p21 | Candidate | Maternal | TAGAAAGGCA |  | NM_006887
24 | CYP1B1 | Cytochrome P450, family 1, subfamily B, polypeptide 1 | P4501B1 | 2p22.2 | Candidate | Paternal | AATGCTTTTA | * | NM_000104
25 | RPL22 | Ribosomal protein L22 | EAP | 2q13 | Candidate | Paternal | GATGCTGCCA | * | CR456873
26 | TIGD1 | Tigger transposable element derived 1 | EEYORE | 2q37.1 | Candidate | Paternal | CGAAAAGCTT | R | BC063500
27 | MYEOV2 | Myeloma overexpressed 2 |  | 2q37.3 | Candidate | Paternal | CAGACTTTTT | * | AF487338
28 | FTHFD | 10-Formyltetrahydrofolate dehydrogenase | ALDH1L1; DKFZp781N0997 | 3q21.3 | Candidate | Maternal | TCTGCATCTT |  | BC027241
29 | ZIC1 | Zic family member 1 (odd-paired homolog, Drosophila) | ZIC, ZNF201 | 3q24 | Candidate | Maternal | ATAATAGTGG |  | NM_003412
30 | HES1 | Hairy and enhancer of split 1, (Drosophila) | HHL, HRY, HES-1, bHLHb39, FLJ20408 | 3q29 | Candidate | Paternal | CACTATATTT |  | NM_005524
31 | FGFRL1 | Fibroblast growth factor receptor-like 1 | FHFR, FGFR5 | 4p16.3 | Candidate | Maternal | AAAGTGCATC |  | NM_001004358
32 | SPON2 | Spondin 2, extracellular matrix protein | DIL1 | 4p16.3 | Candidate | Paternal | TTATGGATCT |  | NM_001128325
33 | Q9NYJ6 | Immunoglobulin superfamily, member 9 | IGSF9, 644ETD8, Dasm1, Kiaa1355-hp, NRT1, Ncaml, mKIAA1355 | 4q13.2 | Candidate | Paternal | TTACTGGCCC | R | BC030141
34 | NAP1L5 | Nucleosome assembly protein 1-like 5 | DRLM | 4q22.1 | Imprinted | Paternal | TAGCTTTTAG |  | NM_153757
35 | DUX2 | Double homeobox 2 |  | 4q35.2 | Candidate | Paternal | AAGGGGTGGA |  | NM_012147
36 | CDH18 | Cadherin 18, type 2 | CDH14, CDH24, CDH14L, EY-CADHERIN | 5p14.3 | Candidate | Paternal | ATCGAAACTG |  | NM_004934
37 | ADAMTS16 | ADAM metallopeptidase with thrombospondin type 1 motif, 16 | FLJ16731, ADAMTS16s | 5p15.32 | Candidate | Maternal | TACCCCTGAA | * | AK122980
38 | Q8TBP5 | Family with sequence similarity 174, member A | FAM174A | 5q21.1 | Candidate | Paternal | ACCCAGCGGG | * | NM_198507
39 | CSF2 | Colony-stimulating factor 2 (granulocyte-macrophage) | GMCSF, MGC131935, MGC138897 | 5q23.3 | Candidate | Maternal | GTGGGAGTGG |  | BC108724
40 | BTNL2 | Butyrophilin-like 2 (MHC class II associated) | SS2, BTLII, HSBLMHC1 | 6p21.32 | Candidate | Maternal | GAAGGAAAGA |  | NM_019602
41 | FAM50B | Family with sequence similarity 50, member B | X5L, D6S2654E | 6p25.2 | Imprinted | Paternal | CCTCAGTTTG |  | BC001261
42 | C6orf117 | Chromosome 6 open-reading frame 117 | MRAP2 | 6q14.2 | Candidate | Paternal | GCAAGCTGTT |  | NM_138409
43 | HYMAI | Hydatidiform mole associated and imprinted (nonprotein coding) | NCRNA00020 | 6q24.2 | Imprinted | Paternal | TATATATTGA |  | BC059359
44 | PLAGL1 | Pleiomorphic adenoma gene-like 1 | ZAC, LOT1, ZAC1, MGC126275, MGC126276, DKFZp781P1017 | 6q24–q25 | Imprinted | Paternal | ATCATAATGT | * | NM_001080951
45 | SLC22A2 | Solute carrier family 22 (organic cation transporter), member 2 | OCT2, MGC32628 | 6q26 | Imprinted | Maternal | AAAATTATAA |  | BC030978
46 | SLC22A3 | Solute carrier family 22 (extraneuronal monoamine transporter), member 3 | EMT, EMTH, OCT3 | 6q26–q27 | Imprinted | Maternal | TGCGCTAATC |  | AF078749
47 | BRP44L | Brain protein 44-like | CGI-129, dJ68L15.3 | 6q27 | Candidate | Paternal | CAGTGTATAT |  | BC000810
48 | DDC | Dopa decarboxylase (aromatic L-amino acid decarboxylase) | AADC | 7p12.2 | Imprinted | Isoform Dependent | TGGCTAAATG |  | NM_000790
49 | GRB10 | Growth factor receptor-bound protein 10 | RSS, IRBP, MEG1, GRB-IR, Grb-10, KIAA0207 | 7p12–p11.2 | Imprinted | Isoform Dependent | TGCTTTGCTT |  | NM_001001549
50 | GLI3 | GLI family zinc finger 3 | PHS, ACLS, GCPS, PAPA, PAPB, PAP-A, PAPA1, PPDIV | 7p14.1 | Candidate | Maternal | TAAATACATT | * | NM_000168
51 | EVX1 | Even-skipped homeobox 1 |  | 7p15.2 | Candidate | Paternal | ACGCCCGTGG |  | NM_001989
52 | HOXA5 | Homeobox A5 | HOX1C, HOX1.3, MGC9376 | 7p15.2 | Candidate | Maternal | AGCCTGTTTA |  | BC013682
53 | HOXA2 | Homeobox A2 | HOX1K | 7p15.2 | Candidate | Maternal | CATATTTTTT | * | NM_006735
54 | HOXA3 | Homeobox A3 | HOX1E, MGC10155 | 7p15.2 | Candidate | Maternal | CTCTTCCTCG | R | BC015180
55 | HOXA11 | Homeobox A11 | HOX1I | 7p15.2 | Candidate | Maternal | GAGATAGCCC |  | BC040948
56 | HOXA4 | Homeobox A4 | HOX1D | 7p15.2 | Candidate | Maternal | TGCTAAGAAT |  | NM_002141
57 | TMEM60 | Transmembrane protein 60 | DC32, MGC74482, C7orf35 | 7q11.23 | Candidate | Paternal | AATCTATCCT |  | NM_032936
58 | PEG10 | Paternally expressed 10 | Edr, HB-1, Mar2, MEF3L, Mart2, RGAG3, KIAA1051 | 7q21 | Imprinted | Paternal | GAAGTTATAA |  | NM_001040152
59 | MAGI2 | Membrane-associated guanylate kinase, WW and PDZ domain containing 2 | AIP1, SSCAM, KIAA0705 | 7q21.11 | Candidate | Maternal | TATTAATAGT |  | BC150277
60 | PPP1R9A | Protein phosphatase 1, regulatory (inhibitor) subunit 9A | NRB1, NRBI, FLJ20068, KIAA1222, neurabin-I | 7q21.3 | Imprinted | Maternal | GAAGAGACAA |  | NM_017650
61 | SGCE | Sarcoglycan, epsilon | ESG, DYT11 | 7q21–q22 | Imprinted | Paternal | TTGGCAGTAT | * | NM_001099400
62 | TFPI2 | Tissue factor pathway inhibitor 2 | PP5, REF1, TFPI-2, FLJ21164 | 7q22 | Imprinted | Maternal | TGCTTTTAAC |  | NM_006528
63 | MEST | Mesoderm-specific transcript homolog (mouse) | PEG1, MGC8703, MGC111102, DKFZp686L18234 | 7q32 | Imprinted | Paternal | CTGAATGTAC |  | NM_002402
64 | COPG2IT1 | COPG2 imprinted transcript 1 (nonprotein coding) | CIT1, COPG2AS, FLJ41646, NCRNA00170, DKFZP761N09121 | 7q32 | Imprinted | Paternal | GAGGGATGGC | * | AF038190
65 | CPA4 | Carboxypeptidase A4 | CPA3 | 7q32 | Imprinted | Maternal | TCTGTAAATC | * | BC052289
66 | MESTIT1 | MEST intronic transcript 1 (nonprotein coding) | PEG1-AS, NCRNA00040 | 7q32 | Imprinted | Paternal | TGTAGTGGTG |  | NR_004382
67 | KLF14 | Kruppel-like factor 14 | BTEB5 | 7q32.3 | Imprinted | Maternal | TGGACTCTGG |  | NM_138693
68 | SLC4A2 | Solute carrier family 4, anion exchanger, member 2 (erythrocyte membrane protein band 3-like 1) | AE2, HKB3, BND3L, NBND3, EPB3L1 | 7q36.1 | Candidate | Maternal | CCCCTCCCTC | * | NM_003040
69 | FASTK | Fas-activated serine/threonine kinase | FAST | 7q36.1 | Candidate | Maternal | GGGGGTGGAT |  | NM_006712
70 | PURG | Purine-rich element binding protein G | PURG-A, PURG-B, MGC119274 | 8p12 | Candidate | Paternal | CTGAACAAAG |  | NM_001015508
71 | DLGAP2 | Discs, large (Drosophila) homolog-associated protein 2 | DAP2, SAPAP2 | 8p23 | Imprinted | Paternal | CCCCAGCCCC | * | NM_004745
72 | Q8N9I4 | FLJ37098 fis, clone BRACE2019004 |  | 8p23.1 | Candidate | Paternal | CTAAGCGCAG |  | AK094417
73 | FAM77D | Family with sequence similarity 77, member D | NKAIN3, FLJ39630 | 8q12.3 | Candidate | Paternal | GTGCCCTACC |  | NM_173688
74 | GPT | Glutamic-pyruvate transaminase (alanine aminotransferase) | GPT1, AAT1, ALT1 | 8q24.3 | Candidate | Maternal | CCAAGTTCAC |  | NM_005309
75 | KCNK9 | Potassium channel, subfamily K, member 9 | KT3.2, TASK3, K2p9.1, TASK-3, MGC138268, MGC138270 | 8q24.3 | Imprinted | Maternal | CCAGGCACTC | * | AK090707
76 | LY6D | Lymphocyte antigen 6 complex, locus D | E48 | 8q24.3 | Candidate | Paternal | GAGATAAATG |  | BC031330
77 | APBA1 | Amyloid beta (A4) precursor protein-binding, family A, member 1 | X11, D9S411E, MINT1, LIN10 | 9q21.11 | Candidate | Paternal | TGTCTCCTTC |  | NM_001163
78 | NM182505 | Chromosome 9 open-reading frame 85 | C9orf85, MGC61599, RP11-346E17.2 | 9q21.12 | Candidate | Paternal | TAAAAATAAA |  | NM_182505
79 | FAM75D1 | Family with sequence similarity 75, member D1 | FLJ46321 | 9q21.32 | Candidate | Maternal | CCCCACAGGA |  | NM_001001670
80 | ABCA1 | ATP-binding cassette, subfamily A (ABC1), member 1 | TGD, ABC1, CERP, ABC-1, HDLDT1, FLJ14958, MGC164864, MGC165011 | 9q31.1 | Imprinted | Unknown | ATGGGGAGAG | * | AK024328
81 | LMX1B | LIM homeobox transcription factor 1, beta | NPS1, LMX1.2, MGC138325, MGC142051 | 9q33.3 | Candidate | Maternal | GGAGCCCAGC | * | NM_002316
82 | EGFL7 | EGF-like-domain, multiple 7 | ZNEU1, MGC111117, VE-STATIN, RP11-251M1.2 | 9q34.3 | Candidate | Paternal | GCACAGGCCA |  | NM_016215
83 | PHPT1 | Phosphohistidine phosphatase 1 | PHP14, CGI-202, HSPC141, bA216L13.10, DKFZp564M173, RP11-216L13.10 | 9q34.3 | Candidate | Maternal | GCCTATGGTC |  | NM_014172
84 | NM144654 | Chromosome 9 open-reading frame 116, transcript variant 2 | C9orf116, FLJ13945, MGC29761, RP11-426A6.4 | 9q34.3 | Candidate | Paternal | GGAAAGATGC |  | NM_144654
85 | GATA3 | GATA binding protein 3 | HDR | 10p14 | Candidate | Paternal | AAGGATGCCA | * | BC003070
86 | Q9H6Z8 | FLJ21625 fis, clone COL08015 |  | 10q23.31 | Candidate | Paternal | GCAGCAGCCT |  | AK025278
87 | LDB1 | LIM domain binding 1 | CLIM2, NLI | 10q24.32 | Candidate | Maternal | TCCTGACCAC |  | NM_001113407
88 | INPP5F V2 | Inositol polyphosphate-5-phosphatase F | SAC2, hSAC2, MSTP007, MSTPO47, FLJ13081, KIAA0966, MGC59773, MGC131851 | 10q26.11 | Imprinted | Paternal | AGATTGAGGC |  | NR_003252
89 | C10orf93 | Chromosome 10 open-reading frame 93 | bB137A17.3, RP13-137A17.3 | 10q26.3 | Candidate | Maternal | AACAAAATTA |  | BC044661
90 | NKX6-2 | NK6 homeobox 2 | NK, NKX6B | 10q26.3 | Candidate | Maternal | ACCGAGAGCC | * | NM_177400
91 | PAOX | Polyamine oxidase (exo-N4-amino) | PAO, DKFZp434J245 | 10q26.3 | Candidate | Maternal | GAGACTCTGT |  | NM_152911
92 | C10orf91 | Chromosome 10 open-reading frame 91 | bA432J24.4, RP11-432J24.4 | 10q26.3 | Candidate | Maternal | GGTTCTCAGC |  | BC030794
93 | VENTX2 | VENT-like homeobox-2 | NA88A, HPX42B, VENTX2 | 10q26.3 | Candidate | Maternal | TGCTTTTAAA |  | AF068006
94 | WT1-Alt trans | Wilms tumor 1 | WT1, GUD, WAGR, WT33, WIT-2 | 11p13 | Imprinted | Paternal | CTGGTATATG |  | BC032861
95 | KCNQ1OT1 | KCNQ1 overlapping transcript 1 (nonprotein coding) | LIT1, KvDMR1, KCNQ10T1, KvLQT1-AS, long QT intronic transcript 1 | 11p15 | Imprinted | Paternal | AAATATTTAC |  | AF086011
96 | KCNQ1DN | KCNQ1 downstream neighbor | BWRT, HSA404617 | 11p15.4 | Imprinted | Maternal | GGACCCCAAA |  | AB039920
97 | OSBPL5 | Oxysterol binding protein-like 5 | ORP5, OBPH1, FLJ42929 | 11p15.4 | Imprinted | Maternal | GGGGATGGAT |  | NM_001144063
98 | PKP3 | Plakophilin 3 |  | 11p15.5 | Candidate | Maternal | AACAGTCAAA |  | NM_007183
99 | Q8N9U2 | FLJ36520 fis, clone TRACH2002100 |  | 11p15.5 | Candidate | Maternal | ACAAGTATTC |  | AK093839
100 | IFITM1 | Interferon-induced transmembrane protein 1 (9–27) | IFI17, LEU13, CD225 | 11p15.5 | Candidate | Maternal | ACCATTGGAT |  | NM_003641
101 | PHLDA2 | Pleckstrin homology-like domain, family A, member 2 | IPL, BRW1C, BWR1C, HLDA2, TSSC3 | 11p15.5 | Imprinted | Maternal | AGCCCGCCGC |  | NM_003311
102 | CDKN1C | Cyclin-dependent kinase inhibitor 1C (p57, Kip2) | BWS, WBS, p57, BWCR, KIP2 | 11p15.5 | Imprinted | Maternal | CCCATCTAGC |  | NM_000076
103 | SLC22A18 | Solute carrier family 22, member 18 | HET, ITM, BWR1A, IMPT1, TSSC5, ORCTL2, BWSCR1A, SLC22A1L, p45-BWR1A, DKFZp667A184 | 11p15.5 | Imprinted | Maternal | CTGGGCCTCT | * | NM_002555
104 | IGF2/INS | Insulin/insulin-like growth factor 2 (somatomedin A) | INSIGF, pp9974, C11orf43, FLJ22066, FLJ44734/ILPR, IRDN | 11p15.5 | Imprinted | Paternal | CTTGGGTTTT |  | BC011786
105 | IGF2AS | Insulin-like growth factor 2 antisense | PEG8, MGC168198 | 11p15.5 | Imprinted | Paternal | GAGGGCCGTT |  | AB030733
106 | H19 | H19, imprinted maternally expressed transcript (nonprotein coding) | ASM, BWS, ASM1, MGC4485, PRO2605, D11S813E | 11p15.5 | Imprinted | Maternal | GCCACCCCCT | * | BC007513
107 | KCNQ1 | Potassium voltage-gated channel, KQT-like subfamily, member 1 | LQT, RWS, WRS, LQT1, SQT2, ATFB1, ATFB3, JLNS1, KCNA8, KCNA9, Kv1.9, Kv7.1, KVLQT1, FLJ26167 | 11p15.5 | Imprinted | Maternal | GGCAGGAGAC |  | BC017074
108 | B4GALNT4 | Beta-1,4-N-acetyl-galactosaminyl transferase 4 | FLJ25045 | 11p15.5 | Candidate | Maternal | TGGAGCGTCC |  | NM_178537
109 | RAB1B | RAB1B, member RAS oncogene family |  | 11q13.2 | Candidate | Maternal | TCAGGCATTT |  | BC071169
110 | KBTBD3 | Kelch repeat and BTB (POZ) domain containing 3 | BKLHD3, FLJ30685 | 11q22.3 | Candidate | Paternal | AAACTACAAA |  | AK092993
111 | NTRI | Neurotrimin | NTM, HNT, IGLON2, MGC60329 | 11q25 | Candidate | Paternal | TCCCTCTTCA | R | NM_016522
112 | ABCC9 | ATP-binding cassette, subfamily C (CFTR/MRP), member 9 | SUR2, ABC37, CMD1O, FLJ36852 | 12p12.1 | Candidate | Maternal | TGTCTTTAAA | * | BX537513
113 | RBP5 | Retinol binding protein 5, cellular | CRBP3, CRBPIII, CRBP-III | 12p13.31 | Imprinted | Maternal | CTTCCTGTTA | * | AK096947
114 | HOXC4 | Homeobox C4 | HOX3E, CP19 | 12q13.13 | Candidate | Maternal | GTACCTGCTG |  | NM_153633
115 | HOXC9 | Homeobox C9 | HOX3B | 12q13.13 | Candidate | Maternal | TACGGCTCGC |  | BC032769
116 | SLC26A10 | Solute carrier family 26, member 10 |  | 12q13.3 | Candidate | Maternal | ACCCTTGAAC |  | NM_133489
117 | CDK4 | Cyclin-dependent kinase 4 | PSK-J3, CMM3 | 12q14.1 | Candidate | Maternal | GAAGGAAGAA | * | BC005864
118 | Q96AV8 | E2F transcription factor 7 | E2F7, FLJ12981 | 12q21.2 | Candidate | Maternal | TAAACTGATT |  | BC016658
119 | Q9HCM7 | Fibrosin-1-like protein | FBRSL1, AUTS2L, KIAA1545, XTP9 | 12q24.33 | Candidate | Maternal | TCAATCAGTG |  | NM_001142641
120 | Q8N7V5 | Proline-rich 20A | PRR20A, FLJ40296 | 13q21.1 | Candidate | Maternal | ACTCACTGGA | * | NM_198441
121 | FAM70B | Family with sequence similarity 70, member B |  | 13q34 | Candidate | Maternal | GTGCCTCTGT |  | NM_182614
122 | FOXG1C | Forkhead box G1 | HFK3 | 14q12 | Candidate | Paternal | GAACTATATG |  | BC050072
123 | PLEKHC1 | Fermitin family (Drosophila) homolog 2 | FERMT2, MIG2, UNC112, KIND2 | 14q22.1 | Candidate | Paternal | GTTCAAAGAC |  | NM_001134999
124 | DLK1 | Delta-like 1 homolog (Drosophila) | DLK, FA1, ZOG, pG2, PREF1, Pref-1 | 14q32 | Imprinted | Paternal | ATACAGAATA | * | BC013197
125 | MEG3 | Maternally expressed 3 (nonprotein coding) | GTL2, FP504, prebp1, PRO0518, PRO2160, FLJ31163, FLJ42589 | 14q32 | Imprinted | Maternal | TGGGAAGTGG |  | AB032607
126 | RTL1 | Retrotransposon-like 1 | PEG11 | 14q32.31 | Candidate | Maternal | ACGGCCTGCA |  | NM_001134888
127 | ATP10A | ATPase, class V, type 10A | ATPVA, ATPVC, ATP10C, KIAA0566 | 15q11.2 | Imprinted | Maternal | GCCCCCAGAG |  | BC052251
128 | PWCR1 | Prader-Willi syndrome chromosome region 1 | PET1, noncoding RNA in the Prader-Willi critical region | 15q11.2 | Imprinted | Paternal | TTGGTGAGGG |  | AF241255
129 | NDN | Necdin homolog (mouse) | HsT16328 | 15q11.2–q12 | Imprinted | Paternal | ACCTTGCTGG |  | BC008750
130 | SNURF/SNRPN | SNRPN upstream reading frame/small nuclear ribonucleoprotein polypeptide N | SMN, PWCR, SM-D, RT-LI, HCERN3, SNRNP-N, FLJ33569, FLJ36996, FLJ39265, MGC29886, SNURF-SNRPN, DKFZp762N022, DKFZp686C0927, DKFZp761I1912, DKFZp686M12165 | 15q11.2–q12 | Imprinted | Paternal | CCGCCTCCGG |  | BC000611
131 | MAGEL2 | MAGE-like 2 | nM15, NDNL1 | 15q11–q12 | Imprinted | Paternal | TAGCATTGTA |  | BC035839
132 | MKRN3 | Makorin ring finger protein 3 | D15S9, RNF63, ZFP127, ZNF127, MGC88288 | 15q11–q13 | Imprinted | Paternal | AAATAATTTA |  | NM_005664
133 | UBE3A | Ubiquitin protein ligase E3A | AS, ANCR, E6-AP, HPVE6A, EPVE6AP, FLJ26981 | 15q11–q13 | Imprinted | Maternal | CTGTAAAACA |  | BC002582
134 | Q9P168 | PRO2369 |  | 15q13.1 | Candidate | Paternal | AGAACTCCAC |  | AF119879
135 | SOX8 | SRY (sex-determining region Y)-box 8 |  | 16p13.3 | Candidate | Paternal | CAGCGTCTCC |  | BC031797
136 | SALL1 | Sal-like 1 (Drosophila) | HSAL1 | 16q12.1 | Candidate | Maternal | ACATTTCTAG | R | BC113881
137 | C16orf57 | Chromosome 16 open-reading frame 57 |  | 16q13 | Candidate | Maternal | GGATTTTAAT |  | BC004415
138 | ACD | Adrenocortical dysplasia homolog (mouse) | PTOP, PIP1, TINT1, TPP1 | 16q22.1 | Candidate | Maternal | CGGCAAAAAA |  | BC016904
139 | FOXF1 | Forkhead box F1 | FKHL5, FREAC1, ACDMPV | 16q24.1 | Candidate | Maternal | TTCCTCCTCT | * | BC089442
140 | ANKRD11 | Ankyrin repeat domain 11 | T13, LZ16, ANCO-1 | 16q24.3 | Imprinted | Maternal | AAAGCTGACA |  | BC058001
141 | Q8N206 | FLJ36443 fis, clone THYMU2012891 | FLJ36443 fis | 16q24.3 | Candidate | Maternal | ACATTCAGAA |  | AK093762
142 | TMEM88 | Transmembrane protein 88 | FLJ20025 | 17p13.1 | Candidate | Maternal | CTGGGCTTCG |  | NM_203411
143 | PYY2 | Peptide YY, 2 (seminal plasmin) |  | 17q11.2 | Candidate | Paternal | TTCACTCCCG |  | AF222904
144 | HOXB3 | Homeobox B3 | HOX2G | 17q21.32 | Candidate | Maternal | AACTCAGCTC |  | NM_002146
145 | HOXB2 | Homeobox B2 | HOX2H | 17q21.32 | Candidate | Maternal | AAGCACAAGC |  | NM_002145
146 | Q8N8L1 | FLJ39287 fis, clone OCBBF2011897 | LOC100131170 | 17q25.3 | Candidate | Paternal | GGGTCTGAGG |  | AK096606
147 | FAM59A | Family with sequence similarity 59, member A | GAREM, Gm944, C18orf11 | 18q12.1 | Candidate | Paternal | TGCAGAGAAA |  | NM_022751
148 | BRUNOL4 | Bruno-like 4 | CELF4 | 18q12.2 | Candidate | Maternal | GCTGTTCTTG |  | NM_001025087
149 | TCEB3C | Transcription elongation factor B polypeptide 3C (elongin A3) | HsT829, TCEB3L2, elongin A3 | 18q21.1 | Imprinted | Maternal | ACCTCCCAGG | * | NM_145653
150 | Q8NE65 | Zinc finger protein 738 | ZNF738 | 19p13.11 | Candidate | Paternal | TTGGTCAGGC | R | BC034499
151 | Q8NB05 | FLJ34424 fis, clone HHDPC2008279 |  | 19p13.2 | Candidate | Paternal | TGCTCGGGAA |  | AK091743
152 | PPAP2C | Phosphatidic acid phosphatase type 2C | PAP2C, LPP2 | 19p13.3 | Candidate | Maternal | GTGTTCTTGG |  | NM_003712
153 | TSH3 | Teashirt zinc finger homeobox 3 | TSHZ3, ZNF537, FLJ54422, KIAA1474 | 19q12 | Candidate | Paternal | TTCTTATTTT | * | AK291466
154 | CHST8 | Carbohydrate (N-acetylgalactosamine 4-0) sulfotransferase 8 | GalNAc4ST1, GalNAc4ST | 19q13.11 | Candidate | Maternal | GTTTCCAGAG | * | NM_001127895
155 | ZNF225 | Zinc finger protein 225 | MGC119735 | 19q13.31 | Candidate | Paternal | TGGTATGTAT |  | NM_013362
156 | ZNF229 | Zinc finger protein 229 | FLJ34222 | 19q13.31 | Candidate | Maternal | TTGTAACCTC |  | NM_014518
157 | ZNF264 | Zinc finger protein 264 | ZFP264 | 19q13.4 | Imprinted | Maternal | GCTTCAGTGG |  | NM_003417
158 | ZIM2/PEG3 | ZIM2 zinc finger, imprinted 2/Paternally expressed 3 | ZNF656/PW1, ZSCAN24, KIAA0287, DKFZp781A095 | 19q13.4 | Imprinted | Paternal | TTTTCACCAT |  | BC037330
159 | LILRB4 | Leukocyte immunoglobulin-like receptor, subfamily B (with TM and ITIM domains), member 4 | LIR5, ILT3, HM18, CD85K | 19q13.42 | Candidate | Maternal | GGAAAATGGG | * | NM_001081438
160 | ZNF550 | Zinc finger protein 550 |  | 19q13.43 | Candidate | Maternal | AGAAATGTAC | * | AK122867
161 | CHMP2A | Chromatin-modifying protein 2A | VPS2A, VPS2, BC2 | 19q13.43 | Candidate | Maternal | GGTGATGAGG | * | NM_014453
162 | ZNF42 | Zinc finger protein 42 | MZF1, MZF1B, ZFP98, ZSCAN6 | 19q13.43 | Candidate | Maternal | GTCAGAACAC | * | NM_003422
163 | ISM1 | Isthmin 1 homolog (zebrafish) | C20orf82 | 20p12.1 | Candidate | Paternal | AATATTATCA |  | NM_080826
164 | NNAT | Neuronatin | PEG5, MGC1439 | 20q11.2–q12 | Imprinted | Paternal | CAGTTGTGGT |  | NM_005386
165 | BLCAP | Bladder cancer-associated protein | BC10 | 20q11.2–q12 | Imprinted | Isoform Dependent | CCTGTCCTTT |  | NM_006698
166 | L3MBTL | L(3)mbt-like (Drosophila) | L3MBTL1, FLJ41181, KIAA0681, H-L(3)MBT, dJ138B7.3, DKFZp586P1522 | 20q13.12 | Imprinted | Paternal | TGTGTATGTG | * | AB014581
167 | GNAS | GNAS complex locus | AHO, GSA, GSP, POH, GPSA, NESP, GNAS1, PHP1A, PHP1B, C20orf45, MGC33735, dJ309F20.1.1, dJ806M20.3.3 | 20q13.3 | Imprinted | Isoform Dependent | ATTAACAAAG |  | NM_000516
168 | GNASAS | GNAS antisense RNA 1 (nonprotein coding) | SANG, NESPAS, GNAS1AS, NCRNA00075 | 20q13.32 | Imprinted | Paternal | TCCATTAGAA |  | AJ251759
169 | COL9A3 | Collagen, type IX, alpha 3 | IDD, MED, EDM3, FLJ90759, DJ885L7.4.1 | 20q13.33 | Candidate | Maternal | AAGGAGCGGG | * | BC011705
170 | C20orf20 | Chromosome 20 open-reading frame 20 | Eaf7, MRGBP, URCC4, MRG15BP, FLJ10914 | 20q13.33 | Candidate | Maternal | ACCTCACTCT |  | BC009889
171 | SIM2 | Single-minded homolog 2 (Drosophila) | SIM, bHLHe15, MGC119447 | 21q22.13 | Candidate | Paternal | AAGGAAGATT | * | NM_005069
172 | DGCR6 | DiGeorge syndrome critical region gene 6 |  | 22q11.21 | Candidate | Paternal | CAGAAGAGGC | * | NM_005675
173 | FLJ20464 | Hypothetical protein FLJ20464 |  | 22q12.2 | Candidate | Paternal | CGTGAAATTC |  | CR456348

SAGE tags annotated for NlaIII anchoring enzyme.

(a) Entries are sorted according to the established gene location.

(b) *: tag maps to other gene(s) according to the CGAP (Cancer Genome Anatomy Project, NCI, NIH) SAGE Anatomic Viewer.

(c) #: highly repetitive tag according to the CGAP SAGE Anatomic Viewer.

(d) R: unreliable/internal tag suggested by the CGAP SAGE Anatomic Viewer was replaced with a reliable 3′ end tag.

The complete set of publicly available human SAGE catalogues was downloaded from the Gene Expression Omnibus (GEO, NCBI) database. The acquired SAGE catalogues represent 581 SAGE libraries generated from a wide spectrum of cell and tissue samples in a variety of physiological and pathological conditions. Following exclusion of the numerous duplicate GEO database entries (e.g., GSM785 = GSM383907; GSM1515 = GSM383958; GSM85612 = GSM125353, etc.), the criteria listed below were applied when selecting libraries for the analysis of gene expression. SAGE libraries were selected only if they represented (i) genetically unaltered/unmodified samples, (ii) SAGE catalogues with a total number of tags ≥20,000, and (iii) a complete dataset. For example, samples GSM383929 and GSM180669 were excluded because they did not satisfy criterion (i), representing ovary surface epithelium immortalized with SV40 and lymphocytes from children with Down syndrome, respectively; samples GSM384024 (white blood cells, CD45+, isolated from a mammary gland carcinoma; 18,741 tags) and GSM1128 (breast cancer cell line; tags detected once are not available) were excluded for not satisfying criteria (ii) and (iii), respectively (Supplementary Table 1). Owing to the conservative nature of these criteria, the number of SAGE catalogues selected for further analysis (i.e., for the extraction of tags matching imprinted genes) was reduced to 492. Together, these 492 SAGE catalogues representing human samples account for 35.97 million SAGE tags generated using the NlaIII anchoring enzyme. Each catalogue was assigned to one of the following clusters: C (cancer tissue; 185 SAGE catalogues), N (normal tissue and cells; 166 SAGE catalogues), IV (cells cultured in vitro; 112 SAGE catalogues), or D (nontumorous disease tissue and cells; 29 SAGE catalogues) (Table 2 and Supplementary Table 1).
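The three selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the `Library` record and its field names are hypothetical, the tag counts marked as placeholders are not taken from GEO, and `EXAMPLE_OK` is an invented identifier for a library that passes all criteria. Only the 18,741-tag count and the reasons for exclusion come from the text.

```python
from dataclasses import dataclass

MIN_TAGS = 20_000  # criterion (ii): total number of tags >= 20,000


@dataclass
class Library:
    gsm_id: str
    total_tags: int
    genetically_unaltered: bool  # criterion (i)
    complete_dataset: bool       # criterion (iii)


def passes_criteria(lib: Library) -> bool:
    """True only if the library satisfies criteria (i), (ii), and (iii)."""
    return (lib.genetically_unaltered
            and lib.total_tags >= MIN_TAGS
            and lib.complete_dataset)


candidates = [
    Library("GSM383929", 50_000, False, True),  # SV40-immortalized; tag count is a placeholder
    Library("GSM384024", 18_741, True, True),   # fewer than 20,000 tags (count from the text)
    Library("GSM1128", 50_000, True, False),    # incomplete dataset; tag count is a placeholder
    Library("EXAMPLE_OK", 50_000, True, True),  # hypothetical library passing all criteria
]

selected = [lib.gsm_id for lib in candidates if passes_criteria(lib)]
print(selected)
```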
Table 2

Summary of SAGE catalogs analyzed.

Clusters | Number of SAGE catalogs(a) | Number of SAGE tags(b)
C (cancer tissue) | 185 | 13,165,432
N (normal tissue and cells) | 166 | 12,953,131
IV (cells cultured in vitro) | 112 | 8,009,673
D (nontumorous disease tissue and cells) | 29 | 1,840,291
Total | 492 | 35,968,527

All SAGE catalogs screened belong to GPL4 Gene Expression Omnibus database (GEO, NCBI) platform (Homo sapiens; NlaIII-anchoring enzyme).

(a) SAGE catalogs selected for analysis (see Supplementary Table 1 available online at doi:10.1155/2012/793506).

(b) Number of tags subjected to analysis (with (A)10 tags excluded).

Figure 1 shows the distribution of the analyzed established and candidate imprinted genes across the human genome. Primary analysis of the normalized expression profiles of the imprinted genes demonstrated great variability in the cumulative expression of the 173 genes (Figure 2, Table 3, and Supplementary Table 2). The average cumulative expression of these genes in human tissues and cells was 0.90% of the total gene expression: specifically, 0.95% for both cancer and normal tissue and cells (clusters C and N, resp.), 0.77% for cells cultured in vitro (cluster IV), and 0.83% for nontumorous disease tissue and cells (cluster D). Across the assessed SAGE catalogues, it ranged from 0.08% (total blood, GSM389907 [21]) to 4.36% of the total gene expression (bronchial epithelium, GSM125353 [22]). Of the 492 human SAGE catalogues tested, the cumulative expression of the imprinted genes constituted >2% of the total gene expression in 21 catalogues and <0.2% in 7. The SAGE libraries with the top 10% and bottom 10% cumulative and average expression of the established and candidate imprinted gene subsets are listed in Table 3.
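The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that the overall 0.90% is the mean over all 492 catalogues, so that it equals the per-cluster means weighted by the number of catalogues in each cluster; this interpretation is an assumption, although it agrees with the reported value. The function names are illustrative.

```python
# Cluster -> (number of catalogues, mean cumulative expression in %), from the text.
clusters = {"C": (185, 0.95), "N": (166, 0.95), "IV": (112, 0.77), "D": (29, 0.83)}


def pooled_mean(cluster_stats):
    """Mean over all catalogues, reconstructed from per-cluster means."""
    total_catalogues = sum(n for n, _ in cluster_stats.values())
    weighted_sum = sum(n * mean for n, mean in cluster_stats.values())
    return weighted_sum / total_catalogues


def percent_from_tpm(tpm):
    """Convert a cumulative tpm value to a percentage of total gene expression."""
    return tpm / 1_000_000 * 100


print(round(pooled_mean(clusters), 2))          # compare with the reported 0.90%
print(round(percent_from_tpm(43_563.92), 2))    # Sum for GSM125353 in Table 3
```

The second value reproduces the reported maximum of 4.36% from the cumulative tpm listed for GSM125353 in Table 3.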
Figure 1

Schematic representation of the distribution of the analyzed established (53, filled arrowheads) and candidate (120, empty arrowheads) imprinted genes across the human genome. Chromosome layout is via NCBI (Build 37.2). Numbers next to some of the arrowheads indicate the number of entries per locus.

Figure 2

Histogram of average ((a), (b), and (c)) and maximum ((d), (e), and (f)) tag per million (tpm) values of the pool of imprinted genes and gene candidates for the normalized SAGE catalogues: cancer tissue ((a), (d); 185 catalogues); normal tissues and cells ((b), (e); 166 catalogues); cells cultured in vitro ((c), (f); 112 catalogues). Corresponding histogram pairs are built following a sorting by the maximum value in the pool.

Table 3

The SAGE libraries with the highest 10% and lowest 10% cumulative and average expression of the established and candidate imprinted gene subsets.

ID (a) | Primary ID (b) | SAGE library | Sample | Cluster (c) | Sum (d) | Average (e) | Max (f)

Top 10% libraries

76 | 301 | GSM125353 | Bronchial brushings, former smoker | N | 43,563.92 | 251.81 | 40,054.67
4 | 6 | GSM574 | Central retina (macula) | N | 33,159.24 | 191.67 | 26,692.01
142 | 427 | GSM383793 | Mammary gland, ductal carcinoma in situ | C | 29,781.50 | 172.15 | 25,275.20
29 | 104 | GSM1730 | Breast, ductal carcinoma in situ | C | 29,575.98 | 170.96 | 25,125.63
75 | 300 | GSM125352 | Bronchial brushings, former smoker | N | 29,184.64 | 168.70 | 24,955.09
145 | 430 | GSM383797 | Mammary gland, ductal carcinoma | C | 27,222.30 | 157.35 | 22,422.27
99 | 346 | GSM194651 | Oral biopsy | N | 27,066.16 | 156.45 | 21,312.29
90 | 273 | GSM112808 | Neuroblastoma, primary tumor, stage 4S | C | 25,944.14 | 149.97 | 17,365.83
55 | 155 | GSM14753 | Breast carcinoma metastasis to lung | C | 23,941.22 | 138.39 | 20,247.08
143 | 428 | GSM383794 | Mammary gland, ductal carcinoma in situ | C | 23,570.83 | 136.25 | 18,977.90
30 | 105 | GSM1731 | Breast, ductal carcinoma in situ | C | 23,346.84 | 134.95 | 18,746.14
167 | 507 | GSM383893 | Gallbladder tubular adenocarcinoma | C | 23,262.07 | 134.46 | 20,154.48
27 | 561 | GSM384016 | Vascular endothelium, hemangioma, benign hyperplasia | D | 22,688.19 | 131.15 | 12,514.88
91 | 274 | GSM112809 | Neuroblastoma, primary tumor, stage 4S | C | 22,633.99 | 130.83 | 11,595.94
28 | 101 | GSM1516 | Hemangioma tumor | C | 22,622.41 | 130.77 | 12,500.82
100 | 347 | GSM194652 | Oral biopsy | N | 21,279.52 | 123.00 | 11,932.33
146 | 440 | GSM383807 | Mammary gland, ductal carcinoma in situ | C | 20,570.15 | 118.90 | 16,071.37
62 | 433 | GSM383800 | Breast carcinoma cell line | IV | 20,467.40 | 118.31 | 12,710.80
140 | 425 | GSM383790 | Mammary gland, ductal carcinoma | C | 20,417.00 | 118.02 | 15,536.62
144 | 537 | GSM383946 | Whole body, fetal | N | 20,274.33 | 117.19 | 9,698.99
149 | 443 | GSM383812 | Mammary gland, ductal carcinoma | C | 20,036.49 | 115.82 | 13,377.93
5 | 21 | GSM688 | Breast, ductal carcinoma in situ | C | 19,968.16 | 115.42 | 15,538.48
118 | 416 | GSM383775 | Cortex, pooled sample | N | 19,926.52 | 115.18 | 13,701.49
54 | 263 | GSM85616 | Bronchial epithelium | N | 19,665.60 | 113.67 | 15,918.74
53 | 262 | GSM85611 | Bronchial epithelium | N | 19,653.98 | 113.61 | 15,655.95
17 | 340 | GSM194386 | Metaplastic bronchial epithelium | D | 19,376.31 | 112.00 | 14,577.13
81 | 306 | GSM125358 | Bronchial brushings, never smoker | N | 19,369.05 | 111.96 | 14,172.30
137 | 509 | GSM383895 | Gallbladder | N | 19,320.00 | 111.68 | 15,346.14
66 | 437 | GSM383804 | Breast carcinoma cell line | IV | 19,247.28 | 111.26 | 9,389.55
31 | 106 | GSM1733 | Mammary gland, ductal invasive in situ carcinoma | C | 19,126.03 | 110.56 | 14,594.68
92 | 276 | GSM112812 | Neuroblastoma, primary tumor, stage 4 | C | 18,914.78 | 109.33 | 12,349.04
3 | 5 | GSM573 | Peripheral retina | N | 18,817.72 | 108.77 | 12,293.99
2 | 4 | GSM572 | Peripheral retina | N | 18,781.47 | 108.56 | 8,727.76
96 | 331 | GSM194377 | Nonsmall cell lung cancer: squamous cell carcinoma in situ | C | 18,122.08 | 104.75 | 14,601.68
71 | 181 | GSM14781 | Brain desmoplastic medulloblastoma | C | 18,106.48 | 104.66 | 11,244.70
68 | 439 | GSM383806 | Breast carcinoma cell line | IV | 17,991.89 | 104.00 | 8,886.81
66 | 291 | GSM125343 | Bronchial brushings, former smoker | N | 17,662.13 | 102.09 | 12,756.35
60 | 285 | GSM125337 | Bronchial brushings, current smoker | N | 17,598.56 | 101.73 | 12,699.28
121 | 387 | GSM383710 | Ependymoma | C | 17,524.27 | 101.30 | 10,222.49
51 | 260 | GSM82458 | Hippocampus | N | 17,289.52 | 99.94 | 8,268.90
72 | 297 | GSM125349 | Bronchial brushings, former smoker | N | 17,225.16 | 99.57 | 13,193.26
109 | 361 | GSM296391 | Lung biopsy | N | 17,223.67 | 99.56 | 13,839.55
62 | 287 | GSM125339 | Bronchial brushings, current smoker | N | 16,871.13 | 97.52 | 10,737.09
98 | 333 | GSM194379 | Nonsmall cell lung cancer: squamous cell carcinoma in situ | C | 16,572.81 | 95.80 | 12,713.33
74 | 299 | GSM125351 | Bronchial brushings, former smoker | N | 16,456.57 | 95.12 | 11,137.34
70 | 295 | GSM125347 | Bronchial brushings, former smoker | N | 16,306.85 | 94.26 | 11,835.62
7 | 216 | GSM37212 | Adrenal cortex affected by primary pigmented nodular adrenocortical disease | D | 16,221.46 | 93.77 | 4,205.56
59 | 284 | GSM125336 | Bronchial brushings, current smoker | N | 16,090.33 | 93.01 | 11,242.21
94 | 329 | GSM194375 | Nonsmall cell lung cancer: squamous cell carcinoma in situ | C | 15,715.04 | 90.84 | 11,679.51

Bottom 10% libraries

83 | 488 | GSM383868 | Colon carcinoma, cell line | IV | 4,179.31 | 24.16 | 422.91
38 | 180 | GSM14780 | Gastric epithelial tissue from the antrum | N | 4,169.95 | 24.10 | 1,615.39
84 | 204 | GSM14807 | Lung, poorly differentiated adenocarcinoma with lymphoplasmacytic infiltration | C | 4,146.18 | 23.97 | 460.69
181 | 553 | GSM383998 | Gastroesophageal junction adenocarcinoma | C | 4,141.08 | 23.94 | 1,351.60
1 | 7 | GSM668 | Kidney, embryonic cell line 293, uninduced cells | IV | 4,119.01 | 23.81 | 920.45
4 | 120 | GSM3244 | AIDS-KS lesion | D | 4,107.85 | 23.74 | 1,245.47
25 | 522 | GSM383914 | Lung, tumor associated (focal fibrosis and chronic inflammation) | D | 4,103.62 | 23.72 | 606.45
30 | 121 | GSM3245 | CD4+ T cells | N | 4,088.74 | 23.63 | 978.17
44 | 137 | GSM14734 | Medulloblastoma, cerebellum | C | 4,081.98 | 23.60 | 804.90
56 | 162 | GSM14760 | Stomach, poorly differentiated carcinoma | C | 4,081.16 | 23.59 | 1,340.95
54 | 153 | GSM14751 | Skin, melanoma | C | 4,072.22 | 23.54 | 2,497.12
80 | 485 | GSM383865 | Colon carcinoma, cell line | IV | 4,048.02 | 23.40 | 404.80
130 | 404 | GSM383753 | Medulloblastoma | C | 3,987.16 | 23.05 | 1,069.73
82 | 487 | GSM383867 | Colon carcinoma, cell line | IV | 3,961.32 | 22.90 | 506.24
12 | 71 | GSM747 | Colon, cancer cell line | IV | 3,945.42 | 22.81 | 673.61
166 | 506 | GSM383892 | Gallbladder adenocarcinoma | C | 3,909.95 | 22.60 | 651.66
115 | 372 | GSM311354 | CD15+ myeloid progenitor cells | N | 3,859.27 | 22.31 | 1,108.11
52 | 261 | GSM82459 | Spermatozoa | N | 3,814.90 | 22.05 | 1,288.15
50 | 259 | GSM82243 | Spermatozoa, pooled sample | N | 3,811.16 | 22.03 | 1,429.18
94 | 328 | GSM180670 | Lymphocytes from children 1–4 years old (pooled samples) | N | 3,794.27 | 21.93 | 843.17
81 | 486 | GSM383866 | Colon carcinoma, cell line | IV | 3,746.12 | 21.65 | 378.40
39 | 183 | GSM14784 | Bone marrow | N | 3,740.58 | 21.62 | 962.65
88 | 318 | GSM136195 | Cord blood-derived activated Th1 cells | N | 3,668.81 | 21.21 | 777.29
20 | 423 | GSM383787 | Breast stroma, ductal carcinoma in situ associated | D | 3,663.78 | 21.18 | 334.67
38 | 234 | GSM66698 | HL-60 cells | IV | 3,661.81 | 21.17 | 920.25
168 | 508 | GSM383894 | Gallbladder tubular adenocarcinoma | C | 3,643.40 | 21.06 | 624.22
28 | 217 | GSM37337 | Primary bronchial epithelial cells | IV | 3,604.30 | 20.83 | 804.84
177 | 546 | GSM383970 | Retinoblastoma | C | 3,561.82 | 20.59 | 384.45
71 | 475 | GSM383852 | Cartilage chondrosarcoma cell line | IV | 3,502.94 | 20.25 | 515.14
38 | 127 | GSM7800 | Primary gastric cancer, poorly differentiated (scirrhous type) | C | 3,413.27 | 19.73 | 636.37
130 | 466 | GSM383840 | Mammary myoepithelium, CD10+ cells | N | 3,364.85 | 19.45 | 291.33
39 | 235 | GSM66712 | HL-60 cells exposed to 2.45 GHz radiofrequency for 2 h | IV | 3,293.77 | 19.04 | 982.36
97 | 344 | GSM194649 | Oral brushing | N | 3,290.72 | 19.02 | 996.85
173 | 523 | GSM383915 | Lymph node, B-cell lymphoma | C | 3,166.97 | 18.31 | 555.61
140 | 514 | GSM383902 | Leukocytes | N | 2,991.69 | 17.29 | 439.34
65 | 436 | GSM383803 | Breast carcinoma cell line | IV | 2,988.12 | 17.27 | 597.62
63 | 434 | GSM383801 | Breast carcinoma cell line | IV | 2,977.71 | 17.21 | 297.77
21 | 84 | GSM784 | Gastric epithelial tissues | N | 2,970.06 | 17.17 | 673.21
151 | 554 | GSM384002 | Stomach | N | 2,813.83 | 16.26 | 654.38
170 | 515 | GSM383903 | Liver cholangiocarcinoma metastasis | C | 2,736.32 | 15.82 | 1,575.46
29 | 578 | GSM389908 | Total blood after EPO treatment, pooled sample | D | 2,684.90 | 15.52 | 747.69
68 | 175 | GSM14775 | Skin, primary malignant melanoma | C | 2,612.99 | 15.10 | 653.25
40 | 236 | GSM66714 | HL-60 cells exposed to 2.45 GHz radiofrequency for 6 h | IV | 1,879.82 | 10.87 | 639.52
143 | 533 | GSM383937 | Pancreas | N | 1,813.11 | 10.48 | 278.94
83 | 308 | GSM135389 | Skeletal muscle, 5 days training young men | N | 1,673.61 | 9.67 | 435.14
86 | 311 | GSM135392 | Skeletal muscle, detraining young men | N | 1,670.43 | 9.66 | 510.41
165 | 576 | GSM389906 | Total blood, pooled sample | N | 1,511.12 | 8.73 | 436.55
82 | 307 | GSM135388 | Skeletal muscle, pretraining young men | N | 1,091.72 | 6.31 | 327.52
28 | 577 | GSM389907 | Total blood during EPO treatment, pooled sample | D | 811.75 | 4.69 | 162.35

Indexes (GSM numbers) represent GEO database accession numbers for SAGE libraries (one accession number selected for redundant entries).

(a) ID: listing within each cluster (see Supplementary Table 2).

(b) Primary ID: listing within the full dataset (see Supplementary Table 1).

(c) Clusters: C: cancer tissue; N: normal tissue and cells; IV: cells cultured in vitro; D: nontumorous disease tissue and cells.

(d) Sum: cumulative (total) tag per million (tpm) value for SAGE tags matching established and candidate imprinted genes within the SAGE library.

(e) Average: average tpm value for SAGE tags matching established and candidate imprinted genes within the SAGE library.

(f) Max: maximum tpm value for SAGE tags matching established and candidate imprinted genes within the SAGE library.

Sum, average, and maximum values can be converted to a fraction of the total gene expression by dividing the tpm value by 1,000,000.

Entries are sorted according to the cumulative (total) tpm value.
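The conversion described in the table notes can be sketched as follows. The example uses the Sum values of the first and last Table 3 entries (GSM125353 and GSM389907). The observation that the Average column matches Sum divided by the 173 annotated genes is an inference from the tabulated numbers, not a statement from the notes; the function names are illustrative.

```python
TPM_SCALE = 1_000_000   # tags per million
N_GENES = 173           # genes annotated with SAGE tags (assumed divisor for "Average")


def tpm_to_percent(tpm):
    """Convert a tpm value to percent of the total gene expression."""
    return tpm / TPM_SCALE * 100.0


def mean_tpm_per_gene(sum_tpm, n_genes=N_GENES):
    """Mean tpm per annotated gene, given the cumulative tpm value."""
    return sum_tpm / n_genes


# Sum values taken from Table 3.
top = 43563.92      # GSM125353, bronchial brushings, former smoker
bottom = 811.75     # GSM389907, total blood during EPO treatment

print(round(tpm_to_percent(top), 2), round(tpm_to_percent(bottom), 2))
print(round(mean_tpm_per_gene(top), 2), round(mean_tpm_per_gene(bottom), 2))
```

These two libraries give the upper and lower bounds of the range reported in the text for the cumulative expression of the imprinted genes.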

In some samples, a major fraction of the cumulative expression of the imprinted genes was contributed by only a few highly abundant transcripts. For example, in the GSM125353 SAGE catalogue mentioned above, 91.9% of the cumulative (total) expression of the assayed imprinted genes is accounted for by a single gene, PTPN14 (ACTTTTTCAA tag). Similarly, in the GSM383893 SAGE catalogue (gallbladder tubular adenocarcinoma [17, 23]), the same gene constitutes 86.6% of the cumulative (total) expression of the assayed imprinted genes. In many other SAGE catalogues, the expression profile of the assayed imprinted genes was more balanced. For example, in the GSM383840 SAGE catalogue (mammary myoepithelium, CD10+ cells [24]), PTPN14 constitutes just 8.7% of the cumulative (total) expression of the assayed imprinted genes, equal to the share of the GNAS gene (ATTAACAAAG tag). Some imprinted genes were expressed almost ubiquitously across the samples: for example, NDUFA4, RPL22, Q8NE65, PTPN14, GNAS, and RAB1B (Supplementary Table 3). Notably, the expression of other imprinted genes was either not detected in any of the 492 SAGE catalogues screened (EVX1, ACGCCCGTGG tag) or detected only occasionally (Supplementary Table 3). For example, expression of the DUX2 gene (AAGGGGTGGA tag) was detected only 3 times (at a minimal level) in the 492 SAGE catalogues representing cell and tissue samples in a variety of physiological and pathological conditions: namely, in the GSM383692 SAGE catalogue (astrocytoma grade II [25]), the GSM383867 SAGE catalogue (colon carcinoma cell line [17, 23]), and the GSM383928 SAGE catalogue (ovary preneoplasia cell line [26]). Similarly rare was the expression of FAM75D1 (detected only 3 times altogether), FAM77D, ISM1, FLJ20464, and Q8NB05 (detected only 5 times, in all cases at a minimal level).
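The single-gene shares quoted above can be reproduced from the Sum and Max columns of Table 3, on the assumption that the Max value of each of these libraries belongs to the gene named in the text (PTPN14 for GSM125353 and GSM383893; PTPN14 and GNAS tied for GSM383840). The sketch below is an illustrative check under that assumption, not the author's procedure.

```python
def dominant_share_percent(max_tpm, sum_tpm):
    """Share of the cumulative imprinted-gene expression held by the top gene."""
    return max_tpm / sum_tpm * 100.0


# (Max, Sum) tpm pairs copied from Table 3.
LIBRARIES = {
    "GSM125353": (40054.67, 43563.92),  # bronchial brushings, former smoker
    "GSM383893": (20154.48, 23262.07),  # gallbladder tubular adenocarcinoma
    "GSM383840": (291.33, 3364.85),     # mammary myoepithelium, CD10+ cells
}

shares = {
    gsm: round(dominant_share_percent(mx, total), 1)
    for gsm, (mx, total) in LIBRARIES.items()
}
print(shares)
```

A high share indicates a profile dominated by one transcript, while a low share indicates a more balanced profile across the assayed genes.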
To assess variation in the expression of individual imprinted genes across the samples, clustering analysis of the normalized expression profiles was performed using EPCLUST (Expression Profile data CLUSTering and analysis) software. For each dataset, the number of clusters was set to the lowest value yielding one cluster containing a solitary database entry: 5 for cancer tissue, 6 for normal tissues and cells, 5 for cells cultured in vitro, and 2 for nontumorous disease tissue and cells (Figures 3–7). Notable diversity was observed in the transcription profiles represented by the individual clusters, with relatively high expression levels characteristic of just 1-2 or of a larger number of individual imprinted genes (Figures 4–7, panels (a) and (b)). As expected, in a few cases samples generated from the same tissues/cell types did fall into the same compact cluster with a distinct pattern (Figure 3; Figures 4–7, panels (a) and (b)). However, in many other cases imprinted gene expression profiles of the same/similar tissue or cell types fell into different clusters. Similarly, although in many cases imprinted gene expression profiles of the same/similar tissue or cell types fell into closely matching areas of the hierarchical trees built for the individual datasets (clusters C, N, IV, and D) (Figures 4(c), 5(c), 5(d), 6(c), and 7(c)), in other cases notable variability was observed in the distribution of imprinted gene expression profiles of the same/similar tissue or cell types. For example, in K-mean clustering, the small cluster 3 in the cancer tissue dataset (3 entries) is composed entirely of neuroblastoma samples (Figures 4(a) and 4(b)); however, other entries representing tumors with the same histological properties [27] fell into cluster 1 (composed of 141 entries in total).
Cluster 4 in the same dataset (12 entries) is composed entirely of carcinoma samples, while cluster 2 (28 entries) is composed predominantly of carcinoma samples (19 entries), with other samples representing astrocytoma (3 entries), glioblastoma multiforme (2 entries), cystadenoma (1 entry), rhabdosarcoma (1 entry), and unclassified breast cancer (2 entries). Similarly, only one cluster in the normal tissue and cell dataset has a homogeneous composition (cluster 5, 2 entries), matching both available SAGE libraries constructed from placenta (GSM14749, also designated GSM383945; GSM14750, also designated GSM383947 [17]) (Figure 3), with all other clusters composed of samples of diverse origins. Illustratively, this particular cluster breaks down (i.e., the cluster content gets redistributed to clusters of smaller size) only if the number of K-mean clusters for the dataset is increased from the set value of 6 to 26, while some other clusters break down more readily. In the hierarchical trees, the most densely packed areas (representing the most similar transcription profiles) are generally composed of samples of the same/similar tissue or cell types. For example, one of the densest areas in the four hierarchical trees built is composed of 19 samples matching bronchial brushings (Figures 5(c) and 5(d)) [22], with all 5 other samples of the same origin falling into the nearest vicinity within the hierarchical tree (Figure 5(c)). At the same time, some SAGE libraries representing samples of identical origin fell into separate K-mean clusters and into well-separated areas of the hierarchical tree. This was observed, for example, for the 3 available peripheral retina samples, of which GSM572 and GSM573 [28] fell into cluster 3 and GSM383968 [29] fell into cluster 1 (Figure 5).
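The rule used above for choosing the number of clusters (the lowest value that yields a cluster containing a solitary entry) can be sketched in a few lines. This is a minimal illustration only: EPCLUST's own distance measure and initialization are not described here, so the sketch assumes Euclidean distance, a plain Lloyd iteration, and deterministic farthest-point seeding, and it reads "one cluster" as "at least one cluster". All function names are hypothetical.

```python
import math
from collections import Counter


def _dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_labels(points, k, n_iter=50):
    """Plain Lloyd iteration with farthest-point seeding (an assumption)."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all current centers.
        far = max(points, key=lambda p: min(_dist(p, c) for c in centers))
        centers.append(tuple(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: _dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def smallest_k_with_singleton(points, k_max=None):
    """Lowest k for which at least one cluster holds exactly one entry."""
    k_max = k_max or len(points)
    for k in range(2, k_max + 1):
        sizes = Counter(kmeans_labels(points, k))
        if 1 in sizes.values():
            return k
    return None


# Toy two-gene profiles (synthetic, for illustration only).
with_outlier = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (9.0, 9.0)]
two_groups = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (4.0, 0.5)]
print(smallest_k_with_singleton(with_outlier), smallest_k_with_singleton(two_groups))
```

With a rule of this kind, a dataset containing one strongly atypical profile stops at a small number of clusters, which is in line with the small values reported for the four datasets.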
Figure 3

An example of gene expression pattern recognized by K-mean clustering analysis (normal tissue and cells, cluster 5). Graph line (a) and cluster contents (b). Vertical bars denote individual genes. Exponential shades of grey code (5 colors) are based on the normalized tpm values. GSM14749: first trimester placenta; GSM14750: full-term placenta. Imprinted genes with peak expression values in the cluster are indicated. PTPN14: protein tyrosine phosphatase, nonreceptor type 14; TFPI2: tissue factor pathway inhibitor 2; DLK1: delta-like 1 homolog (Drosophila).

Figure 4

Gene expression patterns recognized by K-mean and hierarchical clustering analysis (cancer tissue, 185 SAGE catalogues). K-mean clustering analysis, (a) graph lines and (b) cluster contents; vertical bars denote individual genes. (c) Hierarchical cluster tree. Exponential shades of red code (15 colors) are based on the normalized tpm values.

Figure 5

Gene expression patterns recognized by K-mean and hierarchical clustering analysis (normal tissue and cells, 166 SAGE catalogues). K-mean clustering analysis, (a) graph lines and (b) cluster contents; vertical bars denote individual genes. Arrowheads point out 3 SAGE libraries generated from peripheral retinal samples. (c) Hierarchical cluster tree, fragment (d) enlarged is highlighted. Exponential shades of red code (15 colors) are based on the normalized tpm values.

Figure 6

Gene expression patterns recognized by K-mean and hierarchical clustering analysis (cells cultured in vitro, 112 SAGE catalogues). K-mean clustering analysis, (a) graph lines and (b) cluster contents; vertical bars denote individual genes. (c) Hierarchical cluster tree. Arrowheads point out 13 SAGE libraries generated from undifferentiated embryonic stem cells (ESC). Exponential shades of red code (15 colors) are based on the normalized tpm values.

Figure 7

Gene expression patterns recognized by K-mean and hierarchical clustering analysis (nontumorous disease tissue and cells, 29 SAGE catalogues). K-mean clustering analysis, (a) graph lines and (b) cluster contents; vertical bars denote individual genes. (c) Hierarchical cluster tree. Exponential shades of red code (15 colors) are based on the normalized tpm values.

4. Discussion

The mechanism of genomic imprinting plays an important, yet not fully understood, role in many physiological processes, in particular in the control of growth and development. Since the identification of the first imprinted genes (IGF2, IGF2R, and H19) in the mouse in 1991, a large volume of information has accumulated on the identity and biological function of imprinted genes in both Homo sapiens and animal species (Mus musculus in particular). Over the course of the past decade, the list of established imprinted genes has expanded [6, 30]. It is most probable that novel candidate imprinted genes will be identified in the future, and that imprinting will be confirmed for some of the current candidates. In the current study, a comprehensive list of human imprinted genes and high-confidence gene candidates (203 entries in total) was subjected to large-scale in silico gene expression profiling. The available nucleotide sequences (174 genes and gene candidates) were used to extract the short SAGE tags corresponding to the NlaIII anchoring enzyme, the one most commonly used to generate SAGE libraries. Notably, the candidate imprinted gene Q9NYI9 (PPARL) does not bear NlaIII recognition sites. This limitation of the conventional SAGE protocol can generally be overcome by using an alternative anchoring enzyme [16]. However, Q9NYI9 also lacks recognition sites for the anchoring enzymes Sau3AI and RsaI (the second and third most common in generating SAGE libraries), though it bears one for MmeI, utilized in the LongSAGE protocol. As a result, 173 rather than 174 genes (all except Q9NYI9 (PPARL)), comprising 53 established and 120 candidate imprinted genes, were annotated with the appropriate SAGE tags. These tags were matched against the pool of 492 normalized SAGE catalogues representing libraries derived from human samples, constructed using the NlaIII anchoring enzyme and together accounting for 35.97 million SAGE tags.
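The tag extraction step can be illustrated with the conventional SAGE rule: the tag is the 10 bases immediately 3' of the 3'-most NlaIII recognition site (CATG) in the transcript. The sketch below is a simplified illustration of that rule, not the exact procedure used in the study; the input sequence is synthetic (it is not a real transcript) and was built so that the extracted tag equals the ACTTTTTCAA tag quoted earlier for PTPN14. The function name is hypothetical.

```python
NLAIII_SITE = "CATG"   # NlaIII recognition sequence
TAG_LENGTH = 10        # conventional (short) SAGE tag length


def extract_sage_tag(transcript, site=NLAIII_SITE, tag_len=TAG_LENGTH):
    """Return the tag adjacent to the 3'-most anchoring site, or None.

    None is returned when the transcript has no recognition site (as reported
    for Q9NYI9 with NlaIII) or when fewer than tag_len bases follow the site.
    """
    seq = transcript.upper().replace("U", "T")
    pos = seq.rfind(site)               # 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


# Synthetic sequence with two CATG sites; only the 3'-most one defines the tag.
toy_transcript = "GGGCATGAAACATGACTTTTTCAAGGG"
print(extract_sage_tag(toy_transcript))
print(extract_sage_tag("AAAAGGGGCCCC"))  # no CATG site
```

A transcript without any recognition site yields no tag, which is why such a gene cannot be assessed in libraries built with that anchoring enzyme.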
Collectively, these catalogues represent a comprehensive assay of tissues and cell types in physiological and a variety of pathological conditions. Expression of imprinted genes was assessed in the normalized SAGE catalogues representing the transcriptomes of these samples, following a straightforward algorithm of in silico analysis. As with nearly any other gene, the expression of imprinted genes is not constant but a dynamic function of cell type and state. In the current study, great variability was observed both in the cumulative (total) expression of the studied imprinted genes and in that of the individual genes. The cumulative expression of the 173 studied imprinted genes ranges from 0.08% (total blood) to 4.36% (bronchial epithelium) of the total gene expression (Table 3). In some samples (Table 3 and Supplementary Table 2), the proportion of the transcriptome associated with imprinted genes is well above what would be expected from such a limited group of genes, likely reflecting the importance of the biological roles played by these genes. At the same time, the overall expression of the imprinted genes was equal in the clusters of cancer tissue and normal tissue and cells (clusters C and N, 0.95% for both) and lower for cells cultured in vitro (cluster IV, 0.77%). The current study apparently represents the first attempt to estimate the contribution of imprinted genes to the total volume of the transcriptome. Several biases affect the accuracy of the algorithm applied, leading to both underestimation (probable existence of yet unidentified imprinted genes, unavailable information on gene structure for some imprinted genes, absence of anchoring enzyme recognition sites for at least one gene) and overestimation (unconfirmed imprinting status of some of the candidate imprinted genes, SAGE tags matching more than one gene; see Table 1) of the relative size of the imprinted transcriptome.
Despite this, the data provided on the estimated cumulative (total) expression of the known imprinted genes (their number corresponding well to the predicted number of imprinted genes in the human genome [31, 32]) in a variety of tissues and cells are of considerable interest. Until now, little information was available on the overall expression of imprinted genes in cells of different types. It is generally believed that many imprinted genes are highly expressed in developing and adult brain tissue [33], placenta [34], and undifferentiated stem cells [35]. Individual studies have identified certain highly expressed imprinted genes as potential biomarkers of cancer subtypes [36, 37]. In contrast, imprinted genes are known to be expressed at a relatively low level in adult blood cells [38]. This information is supported by the observed values of the cumulative expression of the imprinted genes across the screened samples (Table 3 and Supplementary Table 2): cumulative expression of the imprinted genes is generally high in many of the assessed brain-derived samples and low in blood samples. It was also observed earlier that major upregulation of the expression of numerous imprinted genes is associated with early differentiation and development, rather than with the undifferentiated status of stem cells [39, 40]. Concordantly, in the current study, all 13 SAGE libraries generated from undifferentiated embryonic stem cells (ESCs), namely, lines HES3, HES4 [17, 23, 41], BG01, H1, H7, H9, H13, H14, and HSF6 [17, 23], uniformly demonstrate intermediate cumulative expression of the imprinted genes (Supplementary Table 2) and fit closely in the hierarchical tree built for the corresponding cluster (cluster IV; Figure 6(c)). However, many samples with high cumulative expression of the imprinted genes do not fit into any of the groups listed above.
The important role of genomic imprinting in particular normal cell and cancer subtypes, suggested by the high expression of these genes, should thus be a subject of follow-up studies. Expression of individual imprinted genes varies to an even greater extent across the samples screened. Expression of the candidate imprinted gene even-skipped homeobox 1 (EVX1) was not detected in any sample submitted to the analysis, while the expression of several others (DUX2, FAM75D1, Q8NB05, FLJ20464, ISM1, FAM77D, and others) was detected in only a few samples, always at a minimal level. In contrast, other imprinted genes (NDUFA4, RPL22, Q8NE65, GNAS, PTPN14, RAB1B, and others) were expressed in the majority of the samples screened, often at a high level (Supplementary Table 3). Illustratively, notable variation in the cumulative expression of the imprinted genes and in the expression of individual imprinted genes is observed in cells cultured in vitro, including cells of the same type (e.g., numerous medulloblastoma, glioblastoma multiforme, and breast carcinoma cell lines) (Supplementary Table 2 and Figure 6). This observation further supports the earlier suggestion that cell culture conditions contribute to the maintenance or alteration of imprinted gene expression [42, 43]. Taken together, the current study screened the normalized expression profiles of a comprehensive panel of established and candidate imprinted genes within the publicly available human SAGE datasets; it is the first to estimate the prevalence of imprinted genes within the total human transcriptome on a large scale. This paper thus provides a useful reference on the relative size of the imprinted transcriptome and on the expression of the individual imprinted genes. Supplementary Material provides key properties of the established and candidate imprinted gene subsets within the SAGE datasets.
  42 in total

1.  SAGEmap: a public gene expression resource.

Authors:  A E Lash; C M Tolstoshev; L Wagner; G D Schuler; R L Strausberg; G J Riggins; S F Altschul
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

2.  An anatomy of normal and malignant gene expression.

Authors:  Kathy Boon; Elisson C Osorio; Susan F Greenhut; Carl F Schaefer; Jennifer Shoemaker; Kornelia Polyak; Patrice J Morin; Kenneth H Buetow; Robert L Strausberg; Sandro J De Souza; Gregory J Riggins
Journal:  Proc Natl Acad Sci U S A       Date:  2002-07-15       Impact factor: 11.205

3.  Large-scale serial analysis of gene expression reveals genes differentially expressed in ovarian cancer.

Authors:  C D Hough; C A Sherman-Baust; E S Pizer; F J Montz; D D Im; N B Rosenshein; K R Cho; G J Riggins; P J Morin
Journal:  Cancer Res       Date:  2000-11-15       Impact factor: 12.701

4.  Targets of c-Jun NH(2)-terminal kinase 2-mediated tumor growth regulation revealed by serial analysis of gene expression.

Authors:  Olga Potapova; Sergey V Anisimov; Myriam Gorospe; Ryan H Dougherty; William A Gaarde; Kenneth R Boheler; Nikki J Holbrook
Journal:  Cancer Res       Date:  2002-06-01       Impact factor: 12.701

5.  Serial analysis of gene expression.

Authors:  V E Velculescu; L Zhang; B Vogelstein; K W Kinzler
Journal:  Science       Date:  1995-10-20       Impact factor: 47.728

Review 6.  Genome and genetic resources from the Cancer Genome Anatomy Project.

Authors:  G J Riggins; R L Strausberg
Journal:  Hum Mol Genet       Date:  2001-04       Impact factor: 6.150

7.  Parental imprinting of the mouse H19 gene.

Authors:  M S Bartolomei; S Zemel; S M Tilghman
Journal:  Nature       Date:  1991-05-09       Impact factor: 49.962

8.  Development of reconstituted mouse eggs suggests imprinting of the genome during gametogenesis.

Authors:  M A Surani; S C Barton; M L Norris
Journal:  Nature       Date:  1984 Apr 5-11       Impact factor: 49.962

9.  Evaluation of allelic expression of imprinted genes in adult human blood.

Authors:  Jennifer M Frost; Dave Monk; Taita Stojilkovic-Mikic; Kathryn Woodfine; Lyn S Chitty; Adele Murrell; Philip Stanier; Gudrun E Moore
Journal:  PLoS One       Date:  2010-10-21       Impact factor: 3.240

10.  Parental imprinting of the mouse insulin-like growth factor II gene.

Authors:  T M DeChiara; E J Robertson; A Efstratiadis
Journal:  Cell       Date:  1991-02-22       Impact factor: 41.582
