Nishanth Ulhas Nair, Laura Hunter, Mingfu Shao, Paulina Grnarova, Yu Lin, Philipp Bucher, Bernard M E Moret.
Abstract
BACKGROUND: In cell differentiation, a less specialized cell differentiates into a more specialized one, even though all cells in one organism have (almost) the same genome. Epigenetic factors such as histone modifications are known to play a significant role in cell differentiation. We previously introduced cell-type trees to represent the differentiation of cells into more specialized types, a representation that partakes of both ontogeny and phylogeny.
Year: 2016 PMID: 26819094 PMCID: PMC4895258 DOI: 10.1186/s12864-015-2297-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Flowchart for building cell-type trees using a maximum-likelihood framework. Data preprocessing: the mapped reads in the ChIP-Seq data are used to find peaks. Data representation: the peak data are converted to a binary matrix using the windowing or overlap representation. Phylogenetic analysis: a distance-based, parsimony-based or maximum-likelihood-based phylogenetic approach is applied. Inferring ancestral or internal nodes: we establish a parent-child relationship between the cell types (leaf data) using a process called lifting.
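The data-representation step turns peak calls into binary characters. As an illustration only, here is a minimal Python sketch of a windowing representation, assuming the genome is tiled into fixed, non-overlapping windows and that a window is scored 1 for a cell type when any of its peaks overlaps that window. The function name `windowed_binary_matrix`, the half-open peak coordinates and the 1000 bp default window are hypothetical choices, not the paper's settings, and the overlap representation is not reconstructed here.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def windowed_binary_matrix(
    peaks_by_cell: Dict[str, List[Peak]],
    chrom_sizes: Dict[str, int],
    window: int = 1000,  # hypothetical window size in bp
) -> Tuple[List[str], List[List[int]]]:
    """Return cell names and a 0/1 matrix: one row per cell type, one column per window."""
    # Column offset of the first window of each chromosome (sorted for a stable order).
    offsets: Dict[str, int] = {}
    n_cols = 0
    for chrom in sorted(chrom_sizes):
        offsets[chrom] = n_cols
        n_cols += (chrom_sizes[chrom] + window - 1) // window  # ceiling division
    cells = sorted(peaks_by_cell)
    matrix: List[List[int]] = []
    for cell in cells:
        row = [0] * n_cols
        for chrom, start, end in peaks_by_cell[cell]:
            # Mark every window that overlaps the half-open interval [start, end).
            first = start // window
            last = (end - 1) // window
            for w in range(first, last + 1):
                row[offsets[chrom] + w] = 1
        matrix.append(row)
    return cells, matrix
```

Each row of the result is one cell type's binary profile, which is the kind of leaf data that distance, parsimony or likelihood methods can then consume.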
Fig. 2. Example of lifting leaf node A (L in the algorithm) in tree R. Tree R is divided into two smaller trees, as described in the algorithm.
Cell types, short description, and general group for H3K4me3 data. For details see the ENCODE website [23]
| Cell name | Short description | Group |
|---|---|---|
| AG04449 | fetal buttock/thigh fibroblast | Fibroblast |
| AG04450 | fetal lung fibroblast | Fibroblast |
| AG09319 | gum tissue fibroblasts | Fibroblast |
| AoAF | aortic adventitial fibroblast cells | Fibroblast |
| BJ | skin fibroblast | Fibroblast |
| CD14 | Monocytes-CD14+ from human leukapheresis production | Blood |
| CD20 | B cells replicate | Blood |
| hESC | undifferentiated embryonic stem cells | hESC |
| HAc | astrocytes-cerebellar | Astrocytes |
| HAsp | astrocytes spinal cord | Astrocytes |
| HBMEC | brain microvascular endothelial cells | Endothelial |
| HCFaa | cardiac fibroblasts, adult atrial | Fibroblast |
| HCF | cardiac fibroblasts | Fibroblast |
| HCM | cardiac myocytes | Myocytes |
| HCPEpiC | choroid plexus epithelial cells | Epithelial |
| HEEpiC | esophageal epithelial cells | Epithelial |
| HFF | foreskin fibroblast | Fibroblast |
| HFF MyC | foreskin fibroblast cells expressing canine cMyc | Fibroblast |
| HMEC | mammary epithelial cells | Epithelial |
| HPAF | pulmonary artery fibroblasts | Fibroblast |
| HPF | pulmonary fibroblasts isolated from lung tissue | Fibroblast |
| HRE | renal epithelial cells | Epithelial |
| HRPEpiC | retinal pigment epithelial cells | Epithelial |
| HUVEC | umbilical vein endothelial cells | Endothelial |
| HVMF | villous mesenchymal fibroblast cells | Fibroblast |
| NHDF Neo | neonatal dermal fibroblasts | Fibroblast |
| NHEK | epidermal keratinocytes | Epithelial |
| NHLF | lung fibroblasts | Fibroblast |
| RPTEC | renal proximal tubule epithelial cells | Epithelial |
| SAEC | small airway epithelial cells | Epithelial |
| SKMC | skeletal muscle cells | Skeletal muscle |
| WI 38 | embryonic lung fibroblast cells | Fibroblast |
Fig. 3. Comparison of cell-type trees obtained from H3K4me3 data (only one replicate) using three different methods on the overlap representation: (a) a maximum-likelihood-based approach, (b) a parsimony-based approach, (c) a distance-based approach.
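To make the parsimony criterion concrete: for a fixed tree, the parsimony score of a binary matrix is the minimum number of 0/1 state changes needed to explain every column. The following is a textbook Fitch-algorithm sketch in Python for rooted binary trees written as nested tuples. It only scores a given tree; it is not the tree-search software used in the study, and the names `fitch_score` and `parsimony_score` are illustrative.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees (rooted, binary)


def fitch_score(tree: Tree, states: Dict[str, int]) -> int:
    """Minimum number of 0/1 changes needed to explain one binary character on `tree`."""
    def visit(node: Tree) -> Tuple[Set[int], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint child state sets: take the union and pay one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Sum of Fitch scores over all columns (characters) of a binary matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_score(tree, {cell: row[j] for cell, row in matrix.items()})
        for j in range(n_chars)
    )
```

A parsimony-based method prefers the tree with the lowest total score; for the matrix `{"A": [1, 0], "B": [1, 0], "C": [0, 0], "D": [0, 1]}`, the tree `(("A", "B"), ("C", "D"))` scores 2 while `(("A", "C"), ("B", "D"))` scores 3.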
Groupings for cell-type trees on H3K4me3 data
| Method | hESC (5) | Epithelial (8) | Fibroblast (16) | Blood (2) | Astrocytes (2) | Myocytes (1) | Endothelial (2) | Skeletal muscle (1) |
|---|---|---|---|---|---|---|---|---|
| D | 5,0 | 4,1 | 6,3 | 2,0 | 2,0 | 1,0 | 1,1 | 1,0 |
| P | 5,0 | 4,2 | 6,4 | 2,0 | 1,1 | 1,0 | 1,1 | 1,0 |
| ML | 5,0 | 6,1 | 15,1 | 2,0 | 2,0 | 1,0 | 1,1 | 1,0 |
The second to ninth columns show, for each group, the number of its cell types in the largest and second-largest clades; the total number of cell types in that group is given in the top row. Rows correspond to the different methods. The overlap representation is used. ML: maximum-likelihood-based approach; P: parsimony-based approach; D: distance-based approach.
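The caption does not define a clade precisely, so the following rests on an assumption: in a rooted tree, a group's clades are the maximal subtrees whose leaves all belong to that group, and each table entry gives the sizes of the two largest such clades (0 if fewer exist). Under that reading, an Endothelial entry of 1,1 means the two endothelial cell types fall into separate clades. The function `top_two_clades` and the nested-tuple tree encoding are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of child subtrees (rooted)


def top_two_clades(tree: Tree, group_of: Dict[str, str], group: str) -> Tuple[int, int]:
    """Sizes of the two largest maximal clades whose leaves all belong to `group`."""
    sizes: List[int] = []

    def visit(node: Tree) -> Tuple[bool, int]:
        # Returns (are all leaves under node in `group`?, number of leaves under node).
        if isinstance(node, str):
            return group_of[node] == group, 1
        results = [visit(child) for child in node]
        pure = all(p for p, _ in results)
        n = sum(k for _, k in results)
        if not pure:
            # Pure children of an impure node are maximal clades.
            sizes.extend(k for p, k in results if p)
        return pure, n

    pure_root, n_root = visit(tree)
    if pure_root:
        sizes.append(n_root)
    sizes.sort(reverse=True)
    sizes += [0, 0]  # pad so groups with fewer than two clades report zeros
    return sizes[0], sizes[1]
```

For the toy tree `((("F1", "F2"), "F3"), (("E1", "F4"), "E2"))`, three fibroblasts form one clade and the fourth sits alone, giving (3, 1) for Fibroblast.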
Fig. 4. Cell-type trees obtained from H3K4me3 data on a set of 19 cell types: (a) before lifting, (b) after lifting.
Statistics for trees with a fixed number of leaf nodes
| Statistic | 12-leaf | 50-leaf | 100-leaf |
|---|---|---|---|
| TPR | 0.750 | 0.736 | 0.789 |
| FPR | 0.070 | 0.064 | 0.036 |
| F | 0.677 | 0.629 | 0.748 |
| ACC | 0.906 | 0.917 | 0.946 |
| RF | 1.3 | 5.900 | 12.20 |
We simulated 10 random trees (data representation length 1000) for each kind of tree (12-, 50- or 100-leaf) and ran the lifting algorithm (α = 0.1). We then calculated the statistics shown in the table: true positive rate (TPR), false positive rate (FPR), F-score or F1-score (F), accuracy (ACC) and Robinson-Foulds distance (RF).
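TPR, FPR, F and ACC are the standard confusion-matrix quantities: TPR = TP/(TP+FN), FPR = FP/(FP+TN), F1 = 2TP/(2TP+FP+FN) and ACC = (TP+TN)/(TP+FP+FN+TN). A minimal Python sketch follows. What counts as a positive item in the lifting evaluation (for example, a leaf correctly identified as an ancestral cell type) is an assumption here, since the caption does not define it; the function name `classification_stats` is illustrative.

```python
from typing import Dict, Set


def classification_stats(predicted: Set[str], actual: Set[str], universe: Set[str]) -> Dict[str, float]:
    """TPR, FPR, F1 and accuracy for predicted vs. true positives (both subsets of `universe`)."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(universe - predicted - actual)  # items neither predicted nor truly positive
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    acc = (tp + tn) / len(universe) if universe else 0.0
    return {"TPR": tpr, "FPR": fpr, "F": f1, "ACC": acc}
```

With 10 items, 4 true positives and a prediction that hits 3 of them plus 1 false alarm, this gives TPR = 0.75, FPR = 1/6, F = 0.75 and ACC = 0.8.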
Statistics for trees with different data representation lengths
| Statistic | Length 500 | Length 1000 | Length 5000 |
|---|---|---|---|
| TPR | 0.783 | 0.750 | 0.880 |
| FPR | 0.088 | 0.070 | 0.067 |
| F | 0.621 | 0.677 | 0.733 |
| ACC | 0.899 | 0.906 | 0.927 |
| RF | 0.700 | 1.300 | 0.400 |
We simulated 10 random 12-leaf trees for each of three data representation lengths (500, 1000 and 5000) and ran the lifting algorithm (α = 0.1). We then calculated the statistics shown in the table: true positive rate (TPR), false positive rate (FPR), F-score or F1-score (F), accuracy (ACC) and Robinson-Foulds distance (RF).
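The RF column compares the reconstructed tree with the true one. Below is a minimal Python sketch of the Robinson-Foulds distance for rooted trees written as nested tuples: it counts clusters (leaf sets of non-root internal nodes) found in exactly one of the two trees. The unrooted variant compares bipartitions instead, and some tools report half this count or a normalized value; which variant produced the values above is not stated, so this choice is an assumption. The names `clusters` and `robinson_foulds` are illustrative.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of child subtrees (rooted)


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes except the root (the non-trivial clusters)."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    root_leaves = visit(tree)
    found.discard(root_leaves)  # the root cluster is shared by every tree on these leaves
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Rooted RF distance: number of clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))
```

For example, `((("A", "B"), "C"), ("D", "E"))` and `((("A", "C"), "B"), ("D", "E"))` differ only in {A, B} versus {A, C}, so their distance is 2.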
Statistics for trees with different values of α
| Statistic | | | |
|---|---|---|---|
| TPR | 0.750 | 0.750 | 0.871 |
| FPR | 0.070 | 0.063 | 0.141 |
| F | 0.677 | 0.667 | 0.643 |
| ACC | 0.906 | 0.916 | 0.861 |
We simulated 10 random 12-leaf trees with a data representation length of 1000 and ran the lifting algorithm for different values of α. We then calculated the statistics shown in the table: true positive rate (TPR), false positive rate (FPR), F-score or F1-score (F) and accuracy (ACC).