Elias Daura-Oller, Maria Cabré, Miguel A Montero, José L Paternáin, Antoni Romeu.
Abstract
In the present study, a positive training set of 30 known human imprinted gene coding regions is compared with a negative training set of 72 randomly sampled human nonimprinted gene coding regions to identify genomic features common to human imprinted genes. The most important feature of the present work is its use of multivariate analysis to examine variation, at the coding-region DNA level, between imprinted and nonimprinted genes. A force affecting genomic parameters becomes apparent when appropriate multivariate methods, namely principal component analysis (PCA) and quadratic discriminant analysis (QDA), are used to analyse quantitative genomic data. We show that variables such as CG content, [bp]% CpG islands, [bp]% large tandem repeats, and [bp]% simple repeats are able to distinguish coding regions of human imprinted genes.
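
One of the variables named in the abstract, GC content (labelled "CG content" in this record), can be computed directly from a coding sequence. The sketch below is illustrative only: the function name `gc_content` and the choice to ignore ambiguous bases are assumptions, not the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters
    # do not dilute the ratio (an assumption of this sketch).
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / valid


# Toy input: a six-base repeat consensus, 5 of 6 bases are G or C.
print(round(gc_content("cgctgg"), 1))  # 83.3
```

Applied to a full coding region rather than a short motif, the same function would yield percentages on the scale reported in the tables of this record.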
Year: 2009 PMID: 19360135 PMCID: PMC2666875 DOI: 10.1155/2009/549387
Source DB: PubMed Journal: Comp Funct Genomics ISSN: 1531-6912
List of imprinted genes classified by expression (M: maternally expressed allele; P: paternally expressed allele).
| Name | Band | Expression |
|---|---|---|
| TP73 | 1p36 | M |
| LRRTM1 | 2p12 | P |
| NAP1L5 | 4q22 | P |
| PRIM2 | 6p12 | M |
| PLAGL1 | 6q24 | P |
| HYMAI | 6q24 | P |
| PEG10 | 7q21 | P |
| PON1 | 7q21 | P |
| CALCR | 7q21 | M |
| PPP1R9A | 7q21 | M |
| MEST | 7q32 | P |
| COPG2 | 7q32 | P |
| CPA4 | 7q32 | M |
| KLF14 | 7q32 | M |
| KCNK9 | 8q24 | M |
| INPP5F_V2 | 10q26 | P |
| KCNQ1 | 11p15 | M |
| IGF2AS | 11p15 | P |
| SMPD1 | 11p15 | M |
| IGF2 | 11p15 | P |
| ZNF215 | 11p15 | M |
| H19 | 11p15 | M |
| SLC22A18 | 11p15 | M |
| PHLDA2 | 11p15 | M |
| NDN | 15q11 | P |
| MKRN3 | 15q11 | P |
| MAGEL2 | 15q11 | P |
| UBE3A | 15q12 | M |
| TCEB3C | 18q21 | M |
| NNAT | 20q11 | P |
Figure 1. The separation of the training set into four groups: I1, I2, NO_I1 and NO_I2. Notice that both PCs are responsible for the separation.
Figure 2. Plot of the loading values of the selected variables used in the training set.
The number of large tandem repeats (LTR), CpG islands, and GC content in coding sequences of imprinted genes.
| I1 group | Length | CG content | Number CpG islands | Number LTR | Size count | Consensus |
|---|---|---|---|---|---|---|
| TP73 | 2234 | 64.6 | 3 | 0 | — | — |
| LRRTM1 | 2217 | 58.4 | 2 | 1 | 24_7 | ctgccgaaccacaccttccaggac |
| KLF14 | 1383 | 66.8 | 2 | 1 | 18_9 | cggcgcgcccgccgcctc |
| KCNK9 | 1303 | 60.1 | 2 | 0 | — | — |
| KCNQ1 | 3262 | 63.4 | 1 | 1 | 30_4 | cgcggccgccgccccgggccccgcgccccc |
| IGF2AS | 2056 | 64 | 1 | 0 | — | — |
| SMPD1 | 2473 | 59.8 | 1 | 1 | 6_9 | cgctgg |
| IGF2 | 1356 | 63.7 | 3 | 1 | 14_18 | tccccccctctctc |
| SLC22A18 | 1549 | 65 | 1 | 0 | — | — |
| PHLDA2 | 937 | 61.7 | 1 | 1 | 9_14 | ccgcgccct |
| NDN | 1897 | 52.3 | 2 | 1 | 57_4 | cccaggcccacaacgccccgggcgccccgaaggcggttccgccggccgcggccccgg |
| TCEB3C | 1877 | 64.7 | 2 | 0 | — | — |

| I2 group | Length | CG content | Number CpG islands | Number LTR | Size count | Consensus |
|---|---|---|---|---|---|---|
| NAP1L5 | 1912 | 42.9 | 0 | 1 | 12_7 | ggaggaggagga |
| PRIM2 | 2353 | 40.7 | 0 | 0 | — | — |
| PLAGL1 | 4354 | 46.9 | 1 | 1 | 25_3 | atcttacaaaaaaaaaaaaaaaaaa |
| HYMAI | 5005 | 42.1 | 1 | 1 | 13_7 | tatatatatataa |
| PEG10 | 6628 | 44.7 | 2 | 2 | 42_3 12_4 | agaagctctcagaggagaacaacaaccttcgagagcaggtgg/ccgccgcctcca |
| PON1 | 2395 | 41.3 | 0 | 0 | — | — |
| CALCR | 3470 | 40.4 | 0 | 0 | — | — |
| PPP1R9A | 9705 | 39.9 | 0 | 1 | 5_8 | ttttc |
| MEST | 2507 | 45.1 | 1 | 2 | 42_4 23_3 | ggcggctgcggctgccgcgcccggtgctgcccagcgctgcgg/caaaaaaaaaaaaaaaaaaaaaa |
| COPG2 | 3365 | 43.1 | 0 | 0 | — | — |
| CPA4 | 2807 | 48.9 | 1 | 0 | — | — |
| INPP5F_V2 | 4955 | 43.5 | 1 | 0 | — | — |
| ZNF215 | 3658 | 40.4 | 1 | 2 | 84_3 84_3 | tattcgacatcaaaaaattcatactgaagcgaaggcctataaatgcaataaatgtgggaaagccttcagccgaagtgcagacct/aaaactgcatactggagataagtcctgaaaatgtaaaaaatgtaggaaaaccttcaaccggagttcagaacttatttaacatca |
| H19 | 2615 | 55.9 | 0 | 2 | 8_10 20_4 | ggggggga/ctttttcttcttcctccttt |
| MKRN3 | 3107 | 48 | 0 | 1 | 29_5 | ttaaaaattatatatataagaatataaaa |
| MAGEL2 | 2294 | 53.7 | 0 | 2 | 36_7 21_3 | cgggccctgagtgtctgggagggcccaagcacctcc/ggcctcctcaaaagagcgcag |
| UBE3A | 4491 | 36.7 | 0 | 1 | 10_7 | aaaacaaaaa |
| NNAT | 1338 | 56.5 | 0 | 0 | — | — |
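
The "Size count" column can be related to a per-sequence repeat fraction. The sketch below assumes each token reads as repeat-unit size followed by copy number (for example `24_7` as a 24 bp unit repeated 7 times, which matches the 24-base consensus listed for LRRTM1), and that a [bp]% value is covered bases divided by sequence length. Both readings are assumptions, and the helper `repeat_bp_percent` is hypothetical rather than taken from the paper.

```python
def repeat_bp_percent(size_count: str, length_bp: int) -> float:
    """Percentage of a sequence covered by tandem repeats.

    size_count holds zero or more "unitsize_copies" tokens separated by
    spaces, e.g. "42_3 12_4". Tokens without an underscore (such as a
    dash placeholder) are treated as "no repeat reported".
    """
    covered = 0
    for token in size_count.split():
        if "_" not in token:
            continue  # placeholder cell, nothing to add
        unit, copies = token.split("_")
        covered += int(unit) * int(copies)
    return 100.0 * covered / length_bp


# Values copied from the table rows for LRRTM1 and PEG10.
print(round(repeat_bp_percent("24_7", 2217), 2))       # 7.58
print(round(repeat_bp_percent("42_3 12_4", 6628), 2))  # 2.63
```

Overlapping repeats would be double-counted by this simple sum; a stricter version would merge genomic intervals first.
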
Classification obtained with the QDA analysis.
| Group | I2 | I1 | NO_I1 | NO_I2 |
|---|---|---|---|---|
| Count | 18 | 12 | 21 | 51 |

Summary of classification (columns: true group; rows: group assigned by QDA).

| Put into group | I2 | I1 | NO_I1 | NO_I2 |
|---|---|---|---|---|
| I2 | 18 | 0 | 2 | 3 |
| I1 | 0 | 11 | 0 | 1 |
| NO_I1 | 0 | 0 | 19 | 0 |
| NO_I2 | 0 | 1 | 0 | 47 |
| Total N | 18 | 12 | 21 | 51 |
| N correct | 18 | 11 | 19 | 47 |
| Proportion | 1.00 | 0.92 | 0.90 | 0.92 |

N = 102; N correct = 95; proportion correct = 0.93; proportion correct with cross-validation = 0.833.
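
The per-group and overall proportions can be rechecked from the classification counts. The sketch below is a plain-Python check, not the statistical software used in the study; the function name `summarize` is an assumption. The cross-validated figure cannot be reproduced this way, since it requires refitting the model on the underlying data.

```python
groups = ["I2", "I1", "NO_I1", "NO_I2"]

# Rows: group assigned by QDA; columns: true group (counts from the table).
confusion = [
    [18, 0, 2, 3],
    [0, 11, 0, 1],
    [0, 0, 19, 0],
    [0, 1, 0, 47],
]


def summarize(matrix):
    """Return column totals, diagonal counts, per-group and overall accuracy."""
    n = len(matrix)
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    correct = [matrix[i][i] for i in range(n)]
    per_group = [correct[i] / col_totals[i] for i in range(n)]
    overall = sum(correct) / sum(col_totals)
    return col_totals, correct, per_group, overall


totals, correct, per_group, overall = summarize(confusion)
for name, t, c, p in zip(groups, totals, correct, per_group):
    print(f"{name}: {c}/{t} = {p:.2f}")
print(f"overall: {sum(correct)}/{sum(totals)} = {overall:.3f}")
```

The column sums recover the group sizes (18, 12, 21, 51), and the diagonal sums to the 95 correctly classified genes out of 102.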
List of 31 genes from the test group.
| Gene | Expression | Length | Chromosome |
|---|---|---|---|
| GFI1 | P | 2784 | 1 |
| EFNA4 | M | 1276 | 1 |
| HSPA6 | M | 2664 | 1 |
| SHC1 | M | 1752 | 1 |
| CYP1B1 | P | 5128 | 2 |
| SIX3 | P | 1926 | 2 |
| OTX1 | M | 2176 | 2 |
| BCL2L11 | P | 3422 | 2 |
| HOXD9 | M | 2089 | 2 |
| PER2 | M | 6219 | 2 |
| PPARG | P | 1883 | 3 |
| POLR2H | M | 821 | 3 |
| PITX2 | P | 2122 | 4 |
| TLL1 | P | 6654 | 4 |
| NDUFS4 | P | 668 | 5 |
| ITGB8 | M | 8787 | 7 |
| CDK6 | M | 11611 | 7 |
| PTPRN2 | M | 4767 | 7 |
| GADD45G | P | 1078 | 9 |
| AKR1C2 | P | 1663 | 10 |
| GATA3 | P | 3070 | 10 |
| NRGN | P | 1295 | 11 |
| KLRF1 | P | 1242 | 12 |
| KLRC3 | P | 1042 | 12 |
| POU4F1 | M | 5015 | 13 |
| F10 | M | 1560 | 13 |
| JAG2 | M | 5077 | 14 |
| SFRS2 | M | 2923 | 17 |
| GATA6 | M | 3494 | 18 |
| ELA2 | M | 938 | 19 |
| ZNF42 | M | 2620 | 19 |
Figure 3. Scores for the predicted imprinted genes.