Khadija El Amrani, Harald Stachelscheid, Fritz Lekschas, Andreas Kurtz, Miguel A Andrade-Navarro.
Abstract
BACKGROUND: Identification of marker genes associated with a specific tissue or cell type is a fundamental challenge in genetic and cell research. Marker genes are of great importance for determining cell identity and for understanding tissue-specific gene function and the molecular mechanisms underlying complex diseases.
Year: 2015 PMID: 26314578 PMCID: PMC4552366 DOI: 10.1186/s12864-015-1785-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 An example showing how marker genes are identified by our method. The expression values correspond to the probe set 202357_s_at, which represents the gene CFB (complement factor B).
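Fig. 1 illustrates marker identification on the expression profile of a single probe set across tissue samples. The Python below is a minimal sketch of that idea and rests on two assumptions. First, a probe set counts as a candidate marker for a tissue when that tissue's replicates occupy the top expression ranks. Second, the score is a simple ratio: the highest non-target value divided by the lowest target value. The rule, the score, the function name `marker_call` and the sample labels are all hypothetical illustrations, not MGFM's published definitions.

```python
from typing import Dict, Optional, Tuple


def marker_call(
    expr: Dict[str, float],
    tissue_of: Dict[str, str],
    n_replicates: int,
) -> Optional[Tuple[str, float]]:
    """Hypothetical rank-based marker check for a single probe set.

    expr: sample ID -> expression value (assumed positive)
    tissue_of: sample ID -> tissue label
    n_replicates: number of replicate samples per tissue
    """
    # Sort sample IDs from highest to lowest expression.
    ranked = sorted(expr, key=expr.get, reverse=True)
    top, rest = ranked[:n_replicates], ranked[n_replicates:]

    # Candidate marker only if every top-ranked sample comes from one tissue.
    tissues = {tissue_of[s] for s in top}
    if len(tissues) != 1 or not rest:
        return None

    # Illustrative score (assumption): highest non-target value over the
    # lowest target value. Because of the sort this is always <= 1; a smaller
    # value means a sharper break between the target tissue and the rest.
    score = expr[rest[0]] / expr[top[-1]]
    return tissues.pop(), score


if __name__ == "__main__":
    # Hypothetical values, not the CFB probe set data shown in Fig. 1.
    expr = {"liver_1": 900.0, "liver_2": 1000.0, "liver_3": 800.0,
            "lung_1": 200.0, "lung_2": 150.0, "heart_1": 120.0}
    tissue_of = {s: s.split("_")[0] for s in expr}
    print(marker_call(expr, tissue_of, n_replicates=3))
```

With these toy values the three liver samples rank first, and the call returns `('liver', 0.25)` (200/800). If one lung sample is raised above a liver sample, the top ranks mix tissues and the function returns `None`.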
Samples corresponding to the tissues in the three data sets
| Data set | Tissue | Samples |
|---|---|---|
| #1 | midbrain | GSM80699, GSM80700, GSM80701 |
| #1 | heart atrium | GSM80654, GSM80655, GSM80656 |
| #1 | kidney cortex | GSM80686, GSM80687, GSM80688 |
| #1 | liver | GSM80728, GSM80729, GSM80730 |
| #1 | lung | GSM80707, GSM80710, GSM80712 |
| #2 | liver | GSM44702, GSM18953, GSM18954 |
| #2 | lung | GSM44704, GSM18949, GSM18950 |
| #2 | brain | GSM44690, GSM18921, GSM18922 |
| #2 | kidney | GSM44675, GSM18955, GSM18956 |
| #2 | heart | GSM44671, GSM18951, GSM18952 |
| #3 | liver | GSM18953, GSM18954 |
| #3 | lung | GSM18949, GSM18950 |
| #3 | heart | GSM18951, GSM18952 |
| #3 | kidney | GSM18955, GSM18956 |
Fig. 2 Number of marker probe sets found at each score cutoff for data sets #1 and #2
Fig. 3 Precision/recall curves for the complete set of genes selected by MGFM as potential markers for the examined tissues, using (a) data set #1 and (b) data set #2. The gray curves show precision/recall for random selection.
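Precision/recall curves such as those in Fig. 3 are typically traced by walking down a ranked candidate list and comparing it with a reference set of known markers. The sketch below assumes that candidates are ordered best score first and that `known` holds the reference markers present on the array. The gene names and the function name are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall(ranked: List[str], known: Set[str]) -> List[Tuple[float, float]]:
    """(precision, recall) after each position of a best-first ranked gene list."""
    points = []
    hits = 0
    for depth, gene in enumerate(ranked, start=1):
        if gene in known:
            hits += 1
        # precision: fraction of selected genes that are known markers
        # recall: fraction of known markers recovered so far
        points.append((hits / depth, hits / len(known)))
    return points


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4"]  # hypothetical genes, best score first
    known = {"g1", "g3", "g9"}          # hypothetical reference markers
    for precision, recall in precision_recall(ranked, known):
        print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under a random ordering, the expected precision at every depth equals the fraction of known markers among all genes considered. This is why a random-selection baseline stays roughly flat while recall rises.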
Percentage of probe sets on the microarray predicted as marker probe sets, and percentage of known marker genes correctly identified, at different score cutoffs for data set #1
| Score cutoff | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 |
|---|---|---|---|---|---|---|---|---|
| Selected marker probe sets (in %) | 22.8 | 16 | 7.3 | 3.5 | 1.7 | 0.7 | 0.3 | 0.04 |
| Identified marker genes (in %) | 72.2 | 68.4 | 51.9 | 42.2 | 30.5 | 22.5 | 11.8 | 2.1 |
Percentage of probe sets on the microarray predicted as marker probe sets, and percentage of known marker genes correctly identified, at different score cutoffs for data set #2
| Score cutoff | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 |
|---|---|---|---|---|---|---|---|
| Selected marker probe sets (in %) | 17.2 | 11.9 | 4.9 | 1.8 | 0.7 | 0.2 | 0.02 |
| Identified marker genes (in %) | 51.7 | 46 | 29.3 | 20.7 | 13.8 | 5.2 | 0 |
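Both tables trace the same trade-off: lowering the score cutoff shrinks the fraction of probe sets selected and, with it, the fraction of known marker genes recovered. The bookkeeping behind such a table can be sketched as follows. The sketch assumes a probe set is selected when its score is at or below the cutoff, which is consistent with fewer selections at lower cutoffs. It also assumes a known marker counts as identified when any of its probe sets is selected. The probe IDs, gene names, scores and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple


def summarize_cutoff(
    scores: Dict[str, float],
    probe_to_gene: Dict[str, str],
    known_markers: Set[str],
    n_probe_sets: int,
    cutoff: float,
) -> Tuple[float, float]:
    """Return (% of probe sets selected, % of known marker genes recovered)."""
    # Assumption: lower scores are more specific, so keep scores <= cutoff.
    selected = [p for p, s in scores.items() if s <= cutoff]
    pct_selected = 100.0 * len(selected) / n_probe_sets
    # Several probe sets can map to one gene; the set collapses duplicates.
    found = {probe_to_gene[p] for p in selected} & known_markers
    pct_identified = 100.0 * len(found) / len(known_markers)
    return pct_selected, pct_identified


if __name__ == "__main__":
    # Hypothetical toy array with 10 probe sets, 3 of which received a score.
    scores = {"p1": 0.2, "p2": 0.55, "p3": 0.95}
    probe_to_gene = {"p1": "GENE_A", "p2": "GENE_B", "p3": "GENE_C"}
    known = {"GENE_A", "GENE_C", "GENE_D", "GENE_E"}
    for cutoff in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3):
        print(cutoff, summarize_cutoff(scores, probe_to_gene, known, 10, cutoff))
```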
Fig. 4 Precision/recall curves for genes selected by MGFM as potential markers for each of the examined tissues, using (a) data set #1 and (b) data set #2
Number of correctly identified marker genes relative to the number of known marker genes on the microarrays of data sets #1 and #2, for each examined tissue
| Data set | Tissue | Correctly identified/known marker genes on the microarray |
|---|---|---|
| #1 | midbrain | 16/33 |
| #1 | heart atrium | 24/32 |
| #1 | kidney cortex | 25/33 |
| #1 | liver | 39/65 |
| #1 | lung | 9/24 |
| #2 | liver | 26/60 |
| #2 | lung | 7/24 |
| #2 | brain | 9/28 |
| #2 | kidney | 17/31 |
| #2 | heart | 19/31 |
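Each entry in this table can be read as a recall value: correctly identified markers divided by known markers on the array. The short script below uses only the counts from the table. It converts them to percentages and pools them per data set by summing counts. The pooling is an illustrative choice, not a statistic reported in the table.

```python
from typing import Dict, Tuple

# (correctly identified, known on the array) per tissue, from the table above
COUNTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "data set #1": {"midbrain": (16, 33), "heart atrium": (24, 32),
                    "kidney cortex": (25, 33), "liver": (39, 65), "lung": (9, 24)},
    "data set #2": {"liver": (26, 60), "lung": (7, 24), "brain": (9, 28),
                    "kidney": (17, 31), "heart": (19, 31)},
}


def pooled_recall(tissues: Dict[str, Tuple[int, int]]) -> float:
    """Total correctly identified markers divided by total known markers."""
    found = sum(f for f, _ in tissues.values())
    known = sum(k for _, k in tissues.values())
    return found / known


if __name__ == "__main__":
    for dataset, tissues in COUNTS.items():
        for tissue, (found, known) in tissues.items():
            print(f"{dataset} {tissue}: {found / known:.1%}")
        print(f"{dataset} pooled: {pooled_recall(tissues):.1%}")
```

Per-tissue recall ranges from 37.5% (lung) to 75.8% (kidney cortex) in data set #1, and from 29.2% (lung) to 61.3% (heart) in data set #2. Pooled recall is 113/187 ≈ 60.4% and 78/174 ≈ 44.8%, respectively.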
Fig. 5 Venn diagrams comparing the predicted marker gene lists for the examined tissues. Labels in the Venn diagrams indicate tissue and data set (1 or 2, in brackets).
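The region sizes in a Venn diagram like Fig. 5 follow from grouping each gene by the set of lists that contain it. Below is a minimal sketch that works for any number of lists. The gene names are hypothetical placeholders, not the paper's predictions, and the function name is illustrative.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def venn_regions(lists: Dict[str, Set[str]]) -> Counter:
    """Count genes in each Venn region, keyed by the labels containing them."""
    labels = sorted(lists)
    regions: Counter = Counter()
    for gene in set().union(*lists.values()):
        # The region is the tuple of list labels this gene belongs to.
        key: Tuple[str, ...] = tuple(lbl for lbl in labels if gene in lists[lbl])
        regions[key] += 1
    return regions


if __name__ == "__main__":
    # Hypothetical gene lists, labelled as in Fig. 5: tissue (data set).
    lists = {"liver (1)": {"GENE_A", "GENE_B", "GENE_C"},
             "liver (2)": {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}}
    for region, n in sorted(venn_regions(lists).items()):
        print(region, n)
```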
Gene Ontology enrichment (Molecular Function) of predicted marker genes for the examined tissues
| GO ID | GO | p-value | Expected count | Gene count | Size |
|---|---|---|---|---|---|
| Midbrain | | | | | |
| GO:0008017 | microtubule binding | 1.02 × 10⁻¹¹ | 14.02 | 43 | 148 |
| GO:0030695 | GTPase regulator activity | 2.51 × 10⁻⁷ | 39.97 | 73 | 422 |
| GO:0005525 | GTP binding | 4.18 × 10⁻⁶ | 31.73 | 58 | 335 |
| GO:0030276 | clathrin binding | 4.27 × 10⁻⁶ | 1.89 | 10 | 20 |
| GO:0017075 | syntaxin-1 binding | 5.28 × 10⁻⁶ | 1.23 | 8 | 13 |
| Heart atrium | | | | | |
| GO:0008307 | structural constituent of muscle | 3.12 × 10⁻²⁴ | 1.73 | 25 | 44 |
| GO:0003779 | actin binding | 2.81 × 10⁻²¹ | 13.63 | 58 | 346 |
| GO:0005523 | tropomyosin binding | 1.34 × 10⁻⁸ | 0.55 | 8 | 14 |
| GO:0051371 | muscle alpha-actinin binding | 2.46 × 10⁻⁸ | 0.28 | 6 | 7 |
| GO:0031432 | titin binding | 4.07 × 10⁻⁸ | 0.43 | 7 | 11 |
| Kidney cortex | | | | | |
| GO:0008509 | anion transmembrane transporter activity | 4.9 × 10⁻¹⁶ | 8.77 | 40 | 226 |
| GO:0015294 | solute:cation symporter activity | 8.98 × 10⁻¹¹ | 3.03 | 19 | 78 |
| GO:0015081 | sodium ion transmembrane transporter activity | 4.5 × 10⁻⁹ | 4.58 | 21 | 118 |
| GO:0015297 | antiporter activity | 1.92 × 10⁻⁸ | 2.17 | 14 | 56 |
| GO:0019534 | toxin transporter activity | 1.39 × 10⁻⁴ | 0.31 | 4 | 8 |
| Liver | | | | | |
| GO:0004497 | monooxygenase activity | 2.23 × 10⁻²⁰ | 4.95 | 34 | 87 |
| GO:0009055 | electron carrier activity | 4 × 10⁻²⁰ | 8.20 | 43 | 144 |
| GO:0048037 | cofactor binding | 1.17 × 10⁻¹⁶ | 14.11 | 52 | 248 |
| GO:0020037 | heme binding | 2.06 × 10⁻¹⁵ | 6.83 | 34 | 120 |
| GO:0005506 | iron ion binding | 1.53 × 10⁻¹⁴ | 8.54 | 37 | 150 |
| Lung | | | | | |
| GO:0005102 | receptor binding | 3.30 × 10⁻¹⁰ | 71.09 | 124 | 1129 |
| GO:0004896 | cytokine receptor activity | 5.12 × 10⁻⁹ | 5.23 | 22 | 83 |
| GO:0003823 | antigen binding | 1.78 × 10⁻⁸ | 3.02 | 16 | 48 |
| GO:0019899 | enzyme binding | 6.89 × 10⁻⁷ | 69.58 | 110 | 1105 |
| GO:0032395 | MHC class II receptor activity | 1.54 × 10⁻⁶ | 0.5 | 6 | 8 |
Column labels are as follows: GO ID is the GO identifier; GO is the description of the GO term; p-value is the hypergeometric p-value for over-representation of each GO term; Expected/Gene Count are the expected and actual gene counts; and Size is the number of genes within each GO term
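The footnote names a hypergeometric test. The sketch below computes the one-sided over-representation p-value, P(X ≥ k), with exact integer arithmetic. The symbols are labels introduced here: N is the number of annotated genes in the universe, K is the Size, n is the number of predicted marker genes in the universe, and k is the Gene count. The sketch does not reproduce the paper's gene universe or software, and the function name is illustrative.

```python
from math import comb
from typing import Tuple


def go_overrepresentation(N: int, K: int, n: int, k: int) -> Tuple[float, float]:
    """Expected count and one-sided hypergeometric p-value P(X >= k).

    N: annotated genes in the universe
    K: genes in the universe annotated with the GO term ("Size")
    n: predicted marker genes in the universe
    k: predicted marker genes annotated with the term ("Gene count")
    """
    expected = n * K / N
    # Sum the hypergeometric probabilities of k or more hits, exactly.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return expected, tail / comb(N, n)


if __name__ == "__main__":
    # Toy numbers: 10 genes, 4 carry the term, 3 markers, 2 of them annotated.
    print(go_overrepresentation(N=10, K=4, n=3, k=2))
```

Under the usual definition, Expected count = n × Size / N. The midbrain rows are consistent with a single ratio n/N: 14.02/148 ≈ 39.97/422 ≈ 0.0947. This would imply that roughly 9.5% of the annotated universe was predicted as midbrain markers. For extremely small p-values, the final float division can underflow to 0.0, so library routines that work in log space are preferable in practice.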
Gene Ontology enrichment (Biological Process) of predicted marker genes for the examined tissues
| GO ID | GO | p-value | Expected count | Gene count | Size |
|---|---|---|---|---|---|
| Midbrain | | | | | |
| GO:0007409 | axonogenesis | 4.49 × 10⁻²² | 46.71 | 118 | 489 |
| GO:0010975 | regulation of neuron projection development | 5.58 × 10⁻²⁰ | 21.4 | 70 | 224 |
| GO:0006836 | neurotransmitter transport | 3.93 × 10⁻¹⁸ | 12.42 | 49 | 130 |
| GO:0051969 | regulation of transmission of nerve impulse | 9.65 × 10⁻¹⁶ | 19.11 | 59 | 200 |
| GO:0016358 | dendrite development | 6.24 × 10⁻¹³ | 12.51 | 42 | 131 |
| Heart atrium | | | | | |
| GO:0006941 | striated muscle contraction | 3.12 × 10⁻²⁷ | 3.82 | 37 | 97 |
| GO:0060047 | heart contraction | 3.34 × 10⁻²⁷ | 5.76 | 44 | 146 |
| GO:0048738 | cardiac muscle tissue development | 9.63 × 10⁻²⁵ | 5.56 | 41 | 141 |
| GO:0090257 | regulation of muscle system process | 4.29 × 10⁻²⁴ | 5.76 | 41 | 146 |
| GO:0030239 | myofibril assembly | 1.6 × 10⁻²⁶ | 1.66 | 26 | 42 |
| Kidney cortex | | | | | |
| GO:0055085 | transmembrane transport | 2.03 × 10⁻¹⁸ | 29.19 | 83 | 757 |
| GO:0007588 | excretion | 1.83 × 10⁻¹⁰ | 2.47 | 17 | 64 |
| GO:0072006 | nephron development | 6.35 × 10⁻⁹ | 3.43 | 18 | 89 |
| GO:0006814 | sodium ion transport | 8.87 × 10⁻⁸ | 4.47 | 19 | 116 |
| GO:0072348 | sulfur compound transport | 1.74 × 10⁻⁵ | 0.89 | 7 | 23 |
| Liver | | | | | |
| GO:0008202 | steroid metabolic process | 8.68 × 10⁻⁴⁶ | 15.33 | 90 | 267 |
| GO:0032787 | monocarboxylic acid metabolic process | 3.34 × 10⁻³⁵ | 24.57 | 100 | 428 |
| GO:0006805 | xenobiotic metabolic process | 4.1 × 10⁻³⁰ | 8.04 | 53 | 140 |
| GO:0044282 | small molecule catabolic process | 2 × 10⁻²⁸ | 14.7 | 69 | 256 |
| GO:1901605 | alpha-amino acid metabolic process | 7.38 × 10⁻²⁵ | 11.54 | 57 | 201 |
| Lung | | | | | |
| GO:0002684 | positive regulation of immune system process | 2.53 × 10⁻³⁷ | 40.21 | 134 | 606 |
| GO:0006954 | inflammatory response | 2.23 × 10⁻²⁵ | 32.64 | 101 | 492 |
| GO:0001816 | cytokine production | 4.95 × 10⁻²⁵ | 31.32 | 98 | 472 |
| GO:0046649 | lymphocyte activation | 1.2 × 10⁻²⁴ | 31.12 | 97 | 469 |
| GO:0009607 | response to biotic stimulus | 2.18 × 10⁻²⁴ | 39.28 | 111 | 592 |
Column labels are as follows: GO ID is the GO identifier; GO is the description of the GO term; p-value is the hypergeometric p-value for over-representation of each GO term; Expected/Gene Count are the expected and actual gene counts; and Size is the number of genes within each GO term
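The p-values rank terms by significance, and significance also grows with term size. A common complement is the fold enrichment, Gene count / Expected count. The sketch below computes it for four rows copied from this table. The choice of rows and the helper names are illustrative.

```python
from typing import List, Tuple

# (tissue, GO term, expected count, gene count), copied from the table above
ROWS: List[Tuple[str, str, float, int]] = [
    ("heart atrium", "striated muscle contraction", 3.82, 37),
    ("heart atrium", "myofibril assembly", 1.66, 26),
    ("liver", "steroid metabolic process", 15.33, 90),
    ("lung", "positive regulation of immune system process", 40.21, 134),
]


def fold_enrichment(expected: float, observed: int) -> float:
    """Observed annotated marker genes relative to the count expected by chance."""
    return observed / expected


def rank_by_fold(rows: List[Tuple[str, str, float, int]]) -> List[Tuple[str, str, float]]:
    """Return (tissue, term, fold enrichment), highest enrichment first."""
    scored = [(t, term, fold_enrichment(exp, obs)) for t, term, exp, obs in rows]
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for tissue, term, fold in rank_by_fold(ROWS):
        print(f"{tissue:13s} {term:45s} {fold:5.2f}")
```

Myofibril assembly comes out on top (26/1.66 ≈ 15.7), even though its p-value (1.6 × 10⁻²⁶) is less extreme than that of striated muscle contraction (3.12 × 10⁻²⁷, fold ≈ 9.7). The difference arises because myofibril assembly is a much smaller term (42 vs 97 genes).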
PCR results
| Gene | Liver | Lung | Heart | Brain | Kidney | Gene | Liver | Lung | Heart | Brain | Kidney |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Predicted marker genes for liver | | | | | | | | | | | |
| | + | + | - | - | - | | + | - | - | - | - |
| | + | - | + | - | - | | + | - | - | - | - |
| | + | + | - | - | - | | + | - | - | - | - |
| | + | - | - | - | - | | + | - | - | - | - |
| | + | - | - | - | - | | + | - | - | - | - |
| | + | - | - | - | - | | | | | | |
| Predicted marker genes for lung | | | | | | | | | | | |
| | - | + | - | - | - | | - | + | + | - | - |
| | - | + | + | - | - | | - | + | - | - | - |
| | - | + | + | - | - | | + | + | + | - | - |
| | - | + | - | - | - | | - | + | - | - | - |
| | - | + | + | - | - | | - | + | - | - | - |
| | - | + | + | - | - | | - | + | - | - | + |
| Predicted marker genes for heart | | | | | | | | | | | |
| | - | - | + | - | - | | - | + | + | - | + |
| | - | + | + | - | - | | - | - | + | - | - |
| | - | + | + | - | - | | - | + | + | - | + |
| | - | - | + | - | - | | - | - | + | - | - |
| | - | - | + | - | - | | - | - | + | - | - |
| | - | - | + | - | - | | + | + | + | - | + |
| Predicted marker genes for brain | | | | | | | | | | | |
| | - | - | - | - | - | | - | + | + | + | + |
| | - | - | - | + | - | | - | - | - | + | - |
| | - | - | - | + | - | | - | - | - | + | - |
| | - | - | - | + | + | | - | - | - | + | - |
| | - | - | - | + | - | | - | - | - | + | - |
| | - | + | + | + | - | | - | - | - | + | - |
| Predicted marker genes for kidney | | | | | | | | | | | |
| | - | - | - | - | - | | - | - | - | - | - |
| | - | - | - | - | - | | - | - | - | - | + |
| | - | - | - | - | + | | - | - | - | - | - |
| | - | - | - | - | + | | - | - | - | - | + |
| | - | + | - | - | + | | - | - | - | - | + |
| | - | + | - | - | + | | - | - | - | - | + |
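Reading + as a positive PCR signal in a tissue, the table can be summarized by two counts: how many predicted markers are detected in their target tissue, and how many are detected there exclusively. The sketch below encodes the liver and kidney calls from the table as five-character strings in the column order liver, lung, heart, brain, kidney. Gene names are not available here, so the strings are listed in table order, left column before right. The encoding and the function name are illustrative.

```python
from typing import List, Tuple

TISSUES = ("liver", "lung", "heart", "brain", "kidney")

# PCR calls copied from the table; column order follows TISSUES.
LIVER_CALLS = ["++---", "+----", "+-+--", "+----", "++---", "+----",
               "+----", "+----", "+----", "+----", "+----"]
KIDNEY_CALLS = ["-----", "-----", "-----", "----+", "----+", "-----",
                "----+", "----+", "-+--+", "----+", "-+--+", "----+"]


def tally(calls: List[str], target: str) -> Tuple[int, int]:
    """Return (detected in target tissue, detected only in target tissue)."""
    idx = TISSUES.index(target)
    detected = [c for c in calls if c[idx] == "+"]
    exclusive = [c for c in detected if c.count("+") == 1]
    return len(detected), len(exclusive)


if __name__ == "__main__":
    print("liver:", tally(LIVER_CALLS, "liver"))
    print("kidney:", tally(KIDNEY_CALLS, "kidney"))
```

Of the 11 liver predictions, all 11 are detected in liver, 8 of them exclusively. Of the 12 kidney predictions, 8 are detected in kidney, 6 of them exclusively.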