Nonsyndromic cleft lip and palate (NSCL/P) is a complex disease resulting from failure of fusion of facial primordia, a complex developmental process that includes the epithelial-mesenchymal transition (EMT). Detection of differential gene transcription between NSCL/P patients and control individuals offers an interesting alternative for investigating pathways involved in disease manifestation. Here we compared the transcriptome of 6 dental pulp stem cell (DPSC) cultures from NSCL/P patients and 6 controls. Eighty-seven differentially expressed genes (DEGs) were identified. The most significant putative gene network comprised 13 out of 87 DEGs of which 8 encode extracellular proteins: ACAN, COL4A1, COL4A2, GDF15, IGF2, MMP1, MMP3 and PDGFa. Through clustering analyses we also observed that MMP3, ACAN, COL4A1 and COL4A2 exhibit co-regulated expression. Interestingly, it is known that MMP3 cleavages a wide range of extracellular proteins, including the collagens IV, V, IX, X, proteoglycans, fibronectin and laminin. It is also capable of activating other MMPs. Moreover, MMP3 had previously been associated with NSCL/P. The same general pattern was observed in a further sample, confirming involvement of synchronized gene expression patterns which differed between NSCL/P patients and controls. These results show the robustness of our methodology for the detection of differentially expressed genes using the RankProd method. In conclusion, DPSCs from NSCL/P patients exhibit gene expression signatures involving genes associated with mechanisms of extracellular matrix modeling and palate EMT processes which differ from those observed in controls. This comparative approach should lead to a more rapid identification of gene networks predisposing to this complex malformation syndrome than conventional gene mapping technologies.
Nonsyndromic cleft lip and palate (NSCL/P) is a complex disease resulting from failure of fusion of facial primordia, a complex developmental process that includes the epithelial-mesenchymal transition (EMT). Detection of differential gene transcription between NSCL/Ppatients and control individuals offers an interesting alternative for investigating pathways involved in disease manifestation. Here we compared the transcriptome of 6 dental pulp stem cell (DPSC) cultures from NSCL/Ppatients and 6 controls. Eighty-seven differentially expressed genes (DEGs) were identified. The most significant putative gene network comprised 13 out of 87 DEGs of which 8 encode extracellular proteins: ACAN, COL4A1, COL4A2, GDF15, IGF2, MMP1, MMP3 and PDGFa. Through clustering analyses we also observed that MMP3, ACAN, COL4A1 and COL4A2 exhibit co-regulated expression. Interestingly, it is known that MMP3 cleavages a wide range of extracellular proteins, including the collagens IV, V, IX, X, proteoglycans, fibronectin and laminin. It is also capable of activating other MMPs. Moreover, MMP3 had previously been associated with NSCL/P. The same general pattern was observed in a further sample, confirming involvement of synchronized gene expression patterns which differed between NSCL/Ppatients and controls. These results show the robustness of our methodology for the detection of differentially expressed genes using the RankProd method. In conclusion, DPSCs from NSCL/Ppatients exhibit gene expression signatures involving genes associated with mechanisms of extracellular matrix modeling and palate EMT processes which differ from those observed in controls. This comparative approach should lead to a more rapid identification of gene networks predisposing to this complex malformation syndrome than conventional gene mapping technologies.
Nonsyndromic cleft lip and palate (NSCL/P [MIM 119530]), a complex multifactorial disorder, is one of the most common congenital malformations, with a prevalence of 0.69 to 2.35 per 1,000 births in the Caucasian population [1]. Taking account of the complexities of this orofacial malformation and the long rehabilitation period following surgery, cleft lip and palate is considered to be a major psychosocial and economic burden for families and society. Gaining insight into the genetic causes of NSCL/P should lead to future improvement of genetic counseling, preventive and curative measures.The development of the human face begins with migration of neural crest cells that combine with the core mesoderm and pharyngeal ectoderm, establishing the facial primordia, which in turn give rise to structures associated with upper lip and palate formation [2, 3]. The growth, differentiation and fusion of these structures are genetically determined, and it is likely that disturbances in genetic pathways orchestrating these processes result in facial abnormalities, such as cleft lip and palate [2-5]. In this context, the epithelial-mesenchymal transition (EMT) plays a central role in generating the cranial neural crest cells as well as ensuring palate and lip fusion [6-8].The large number of genes known or suspected to be involved in clefting probably reflects the diversity of embryological events that contribute to the formation of these facial structures [4, 9–16]. Although gene mapping approaches, such as Genome-Wide Association Studies (GWAS), appeared to offer an option to identify at-risk alleles associated with NSCL/P, with better reproducibility among different studies [10, 11, 17, 18] than candidate genes, lack of progress over the last decade suggests that GWAS are still unlikely to provide sufficient information on the genetic etiology underlying the disease. However, genome-wide expression analyses based on differential gene expression associated with NSCL/P, as proposed here; present a viable and challenging alternative, since patterns of co-expression can be used to identify biological pathways or gene networks associated with disease predisposition. The current data supporting this supposition can be summarized as follows: transcriptome analysis in tissues of cleft palate (CP) patients showed a distinct gene expression signature when compared to CL/P [19]. It has been reported that a few genes coding for extracellular matrix proteins, such as TGFB3 and MMP3, are differentially expressed in fibroblasts of NSCL/Ppatients as compared to controls [20, 21], suggesting that the transcriptome of NSCL/P cells might exhibit a specific expression signature irrespective of origin of the cells concerned.The use of mesenchymal stem cells (MSCs) or induced pluripotent stem cells (iPSs) has been shown to be a promising new approach to study gene function and signaling pathways in genetic disorders [22-24]. MSCs constitute a long-lived population of cells possessing self-renewal and differentiation properties [25-29]. Accordingly, these cells are a good model to study the in vitro characteristics of NSCL/P, since in addition to gene expression, they can be tested for proliferation, migration and differentiation properties, including EMT, functions that are presumed to be altered in cells of NSCL/Ppatients during embryonic development. MSCs were originally isolated from bone marrow, and subsequently, similar cell populations were isolated from other tissues, such as adipose tissue [27], dental pulp [28, 30], orbicularis oris muscle [26], umbilical cord blood, and umbilical cord tissue [31]. Moreover, MSCs can be easily obtained from non-invasive sources, such as exfoliated teeth, both from NSCL/Ppatients and control individuals [28].The main aim of this study was to verify if there are consistent gene expression profile differences between mesenchymal stem cells from NSCL/Ppatients and controls. We chose to study stem cells from dental pulp as they can be obtained relatively easy from both NSCL/Ppatients and controls. In addition, these stem cells, as for any other cells obtained from craniofacial tissues, are derived from cranial neural crest cells which play an important role in craniofacial development, including lip and palate [32, 33]. Our results will provide a base line for further investigation and insights into genetic pathway irregularities associated with craniofacial clefts.
Materials and Methods
Sample: NSCL/P Patients and Controls
Ethical approval to obtain stem cells from dental pulp of deciduous teeth was obtained from the Biosciences Institute Research Ethics Committee (Protocol 037/2005). Samples were included only after signed informed consent by the parents or legal guardians.Deciduous teeth from controls were obtained from odontopediatric clinics in Sao Paulo, while teeth from NSCL/Ppatients were excised during the exfoliation period by Dr. Bueno D.F. at Sobrapar, Campinas, Sao Paulo. An individual was classified as NSCL/P if no other malformations than clefting of both lip and palate were present.We analyzed mRNA of dental pulp stem cell cultures (DPSC) obtained from 6 controls and 6 NSCL/Ppatients (Table 1, supplementary material) for microarray expression analysis and for validation of 4 genes by quantitative Real-Time PCR (qRT-PCR). Validation of the microarray expression analysis by qRT-PCR for a larger number of genes was also done in mRNA obtained from DPSC cultures of 16 additional controls and 13 NSCL/Ppatients.
Stem Cell Culture
Stem cell cultures obtained from DPSC of deciduous teeth were established according to previously published protocols [28]. Cells were cultured at 37°C with 5% CO2 in DMEM-F12 (Invitrogen, UK) supplemented with 15% Fetal Bovine Serum (FBS, HyClone, USA) and frozen in 40% FBS for storage. Frozen cells were thawed and grown until 80% confluency in a 75 cm2 flask and submitted to serum starvation (12 h) prior to RNA extraction. All experiments were conducted with cells between the 4th and 8th subculture.
Characterization of Mesenchymal Stem Cell Populations
Cell populations used in the microarray assays were analyzed in a flow cytometer for specific cell surface markers to evaluate homogeneity. Cells were harvested with TrypLe (Invitrogen, UK), washed with PBS, and incubated at 4°C for 30 min with the following antibodies: CD29-PE CY5, CD90 (Thy-1), CD45-FITC, CD31-PE (Becton Dickinson, USA), SH2, SH3, and SH4 (Case Western Reserve University, USA). After a second wash, samples incubated with unconjugated primary antibodies were then incubated with anti-mouse-PE secondary antibody (Guava Technologies, Hayward, CA) for an additional 15 min at 4°C. Finally, the cell suspension was washed with PBS, and signals from 105 cells were acquired with an EasyCyte Flow cytometer (Guava Technologies). Control samples for determining background noise were incubated with PBS instead of primary antibody followed by incubation with anti-mouse-PE secondary antibody. All plots generated were analyzed with Guava ExpressPlus software (Guava Technologies).The in vitro differentiation into bone, cartilage, muscle and fat had previously been demonstrated in 2 of the NSCL/Ppatients (F4280, F4285) and 3 of the control (F3363, F4271, F4272) samples included in this study [28, 29]. Because of the high reproducibility of our protocols, stem cell cultures are currently characterized only with respect to their immunophenotype.
RNA Processing
Total RNA was isolated using TRIzol (Invitrogen, UK) according to the manufacturer’s protocol and purified with RNeasy Mini-kit (QIAGEN). RNA quality and concentration were assessed using Nanodrop 1000 and gel electrophoresis. Only samples with a preserved rRNA ratio (28S/18S) and no evidence of RNA degradation were used in the microarray hybridization and qRT-PCR.
Microarray Processing
Expression measurements were performed using the Affymetrix Human Gene 1.0 ST array, which interrogates 28,869 genes, following RNA labeling and hybridization protocols as recommended by the manufacturer. After array scanning, quality control was performed with GCOS software (Affymetrix, USA) according to the manufacturer’s recommendations.
Transcriptome Analysis
Gene expression values were obtained using the three-step Robust Multi-array Average (RMA) pre-processing method, implemented in the Affy package from R/Bioconductor [34].We employed the RankProd method for the selection of differentially expressed genes (DEGs), considering a p-value cut-off of 0.05 adjusted for FDR (False Discovery Rate) [35]. RankProd is a rank-based nonparametric procedure [36]. The method has the advantage of being able to deal with few samples for identifying biologically relevant expression changes [37]. Genes selected by RankProd do not necessarily need to present homogenous expression levels across all the samples of the test and control groups. Accordingly, RankProd seems to be a good choice for identifying differential gene expression in complex diseases, in which altered expression of a given candidate gene is expected in just a subgroup of patients due to both multi-locus genetic heterogeneity and the stochastic nature of gene expression in complex systems [38].
Functional Annotation
We performed functional annotation analysis of the differentially expressed genes with the DAVID (Database for Annotation, Visualization and Integrated Discovery, http://david.abcc.ncifcrf.gov) and IPA (Ingenuity Pathway Analysis, http://www.ingenuity.com) tools. In IPA, we considered the default parameter Molecules per Network = 35, Networks per Analysis = 25, only direct relationships between genes and the “Ingenuity Expert Information” Data Source, including the new option of “Ingenuity ExpertAssist Findings”, in which the information has been manually reviewed and curated from full-text scientific publications.
Validation of Microarray Expression Using Quantitative Real-Time PCR (qRT-PCR)
Initially we performed qRT-PCR for 4 genes (COL15A1, ERAP2, PPT2 and EGFL8) on the same samples used in the microarray assay. These genes were randomly selected, but they were amongst those with the highest fold change. PPT2 and EGFL8 are represented by a common probe set on the Affymetrix Human Gene 1.0 ST array and therefore both genes were tested for qRT-PCR with gene-specific primers. Next, we performed qRT-PCR for further 12 genes (ACAN, CDH6, CLDN1, CLDN11, COL4A1, COL4A2, ENPP2, HAS2, ITGA8, JAG1, MCAM and MMP3) plus the genes COL15A1, ERAP2 and PPT2 in 16 controls and 13 NSCL/Ppatients. Only 4 of the control and 4 of the NSCL/Ppatients samples were the same as those used in the microarray assay due to unavailability of RNA from the remaining 4 individuals (Table 1 in supplementary material).One microgram of total RNA from each cell culture was reverse transcribed with Superscript II (Invitrogen, UK), according to manufacturer’s recommendations. Quantitative Real-Time PCR reactions were performed in duplicates with a final volume of 20 μl, using 20 ng of cDNA, 1X SYBR Green PCR Master Mix (Applied Biosystems) and 100–400 nM of each primer. We used ABI Prism 7500 Sequence Detection System (Applied Biosystems) with standard temperature protocol. Primers were designed with Primer Express software V.2.0 (Applied Biosystems; primers sequence in supplementary Table 2) and the amplification efficiency (E) of each primer was calculated according to the equation E = 10 (−1/slope). The expression data of the target transcripts were determined by relative quantification in comparison to a pool of RNAs (4 controls and 4 patients). GeNorm v3.4 was used to determine the most stable endogenous controls (SDHA, HPRT1 and GAPDH) and to calculate the normalization factors for each sample [39]. Expression values were calculated according to reference [40].To compare the expression of COL15A1, ERAP2, PPT2 and EGFL8, obtained by qRT-PCR and microarray assay in the same samples, we used an unpaired t-test with Welch’s correction.To compare the results obtained by microarray with those obtained by qRT-PCR in the novel samples, for which we do not have microarray data, we performed the following strategy: for each of the 15 genes, we calculated the average (avg) expression for controls and for NSCL/Ppatients obtained by both methods (Table 3 in supplementary material). The correlation between the ratios “avg_patients/avg_controls” from microarray and qRT-PCR assays was calculated using Spearman’s correlation test.
Clustering Analysis
The cluster analysis of the differentially expressed genes was performed using GEDI (Gene Expression Dynamics Inspector) software. This tool creates a 2 dimensional gene expression image for each sample in which each gene retains exactly the same position in the image of each sample and in which the gene positions are computed to give the most parsimonious gene arrangement for depicting expression level differences between the patient and control groups for all differentially expressed genes [41]. In the analysis, a 10 × 11 grid configuration of SOM (Self-Organizing Map) was used. Inspection of GEDI images allows a straightforward classification of the samples into subgroups without the aid of a clustering algorithm, but simply based on the visual differences in the patterns [42].We also used two other conventional clustering methods: K-means and Hierarchical, both available in the MeV (MultiExperiment Viewer) software [43], with Spearman’s correlation as the distance metric. The clustering analysis of qRT-PCR data followed two criteria: a) only DEGs from network 1 (Fig. 4) and b) only DEGs that showed the same tendency of expression in the qRT-PCR and microarray assays (Fig. 2 in supplementary material).
Fig. 4
The most significant network built by IPA tool with the highest number of differentially expressed genes. The upregulated genes in NSCL/P patients are indicated in red and the downregulated genes in green. The blank symbols pertain to genes that were either not present in our array or not differentially expressed. Solid lines indicate a direct linkage among two genes. Lines with arrows indicate that one gene acts on the other, and lines without arrows indicate that the corresponding proteins interact with each other. The 6 genes circled in orange were used in clustering analysis of qRT-PCR and microarray
Results
The cell cultures presented positive labeling for cell adhesion (CD29, CD90) and mesenchymal stem cell markers (SH2, SH3, SH4) in most of the cells (>90%) and were negative for endothelial and hematopoietic cell markers (Fig. 2 in supplementary material). Moreover, 2 NSCL/Ppatients and 3 control cell cultures used in the microarray analyses had been previously shown to differentiate into bone, muscle, cartilage and fat upon in vitro induction [28, 29]. Therefore the cell populations used in this study had the main properties of stem cells.
Controls versus NSCL/P Patients
We identified a total of 87 differentially expressed genes (DEGs; 58 upregulated and 29 downregulated) in the comparison between controls and NSCL/Ppatients (adjusted p ≤ 0.05; Table 1).
Table 1
List of 87 differentially expressed genes sorted out by comparing controls and NSCL/P patients with the RankProd analysis
AFFY ID
Gene symbol
FCa
Pfpb
Cytoband
8176375
RPS4Y1
5.1099
0.0007
Yp11.3
8176624
DDX3Y
4.9358
0.0000
Yq11
8041206
LBH
4.6707
0.0000
2p23.1
8118509
PPT2 c
4.2212
0.0008
6p21.3
7972750
COL4A1 d c
3.6284
0.0006
13q34
7932254
ITGA8 d c
3.2852
0.0037
10p13
8125537
HLA-DMA c
3.2541
0.0005
6p21.3
7998927
–
3.0562
0.0053
–
8108370
EGR1
3.0266
0.0124
5q31.1
8176719
EIF1AY
2.9797
0.0055
Yq11.222
7952205
MCAM d c
2.9129
0.0123
11q23.3
7953200
CCND2
2.7894
0.0089
12p13
8056491
SCN9A c
2.7724
0.0223
2q24
7964388
NDUFA4L2
2.7670
0.0112
12q13.3
8113504
C5orf13
2.7473
0.0077
5q22.1
8176578
USP9Y
2.7360
0.0219
Yq11.2
8104663
CDH6 d c
2.7285
0.0263
5p15.1-p14
7985786
ACAN d c
2.6889
0.0042
15q26.1
8121838
TPD52L1
2.6688
0.0106
6q22-q23
8177137
UTY
2.6455
0.0280
Yq11
7970033
COL4A2 d c
2.6123
0.0210
13q34
7965573
NTN4 c
2.5853
0.0129
12q22-q23
8003298
SLC7A5 c
2.5767
0.0174
16q24.3
8156783
COL15A1 c
2.5452
0.0250
9q21-q22
8090565
SNORA7B
2.5214
0.0225
3q21.3
7954293
PDE3A
2.5063
0.0121
12p12
8137670
PDGFA d c
2.4802
0.0179
7p22
8049888
ATG4B
2.4474
0.0115
2q37.3
7974895
FLJ43390
2.4260
0.0427
14q23.2
7958262
TCP11L2
2.4172
0.0253
12q23.3
7989985
ITGA11 c
2.4038
0.0447
15q23
8104035
SORBS2
2.3912
0.0291
4q35.1
8102800
SLC7A11 c
2.3697
0.0390
4q28-q32
7912157
ERRFI1
2.3596
0.0308
1p36
7981728
–
2.3590
0.0407
–
7985493
TM6SF1
2.3535
0.0442
15q24-q26
7981962
SNORD116-5
2.3518
0.0419
15q11.2
7931977
ITIH5
2.3354
0.0486
10p14
7928308
DDIT4
2.3277
0.0467
10pter-q26.12
8034940
NOTCH3 c
2.3234
0.0396
19p13.2-p13.1
7966089
CMKLR1 c
2.2946
0.0455
12q24.1
8068024
JAM2 d c
2.2538
0.0397
21q21.2
8023152
TCEB3CL
2.2139
0.0407
18q21.1
8109159
LOC728264
2.2134
0.0459
5q33.1
8117018
JARID2
2.2051
0.0408
6p24-p23
8086752
RNU13P3
2.2046
0.0452
3p21.31
7982868
CHAC1
2.2041
0.0273
15q15.1
8139207
INHBA c
2.1839
0.0405
7p15-p13
8027352
–
2.1608
0.0399
–
8027002
GDF15 c
2.1594
0.0285
19p13.11
8048749
KCNE4 c
2.1436
0.0184
2q36.3
8007850
LRRC37A
2.0206
0.0244
17q21.31
8160138
NFIB
1.9790
0.0182
9p24.1
8131867
–
1.8907
0.0477
–
8064978
JAG1 d c
1.8529
0.0178
20p12.1-p11.23
8042788
ACTG2
1.7905
0.0414
2p13.1
8045804
–
1.7730
0.0492
–
8092726
CLDN1 d
1.4872
0.0445
3q28-q29
7937772
IGF2
−1.8634
0.0165
11p15.5
7915592
–
−1.8745
0.0195
–
8124391
HIST1H2AB
−2.0165
0.0426
6p21.3
8163202
SVEP1
−2.0970
0.0214
9q32
8037240
PSG1 c
−2.1467
0.0273
19q13.2
8044605
LOC654433
−2.2491
0.0468
–
8117594
HIST1H2BM
−2.2734
0.0128
6p22-p21.3
7951271
MMP1 d c
−2.3465
0.0026
11q22.3
8138888
PDE1C
−2.4348
0.0263
7p15.1-p14.3
8140668
SEMA3A c
−2.5040
0.0042
7p12.1
7951284
MMP3 d c
−2.5261
0.0433
11q22.3
8107044
ERAP2
−2.5407
0.0291
5q15
8003667
SERPINF1 c
−2.5688
0.0109
17p13.1
8110916
LOC442132
−2.5879
0.0258
5p15.31
7904293
PTGFRN c
−2.6619
0.0164
1p13.1
8113800
FBN2 c
−2.7034
0.0000
5q23-q31
8116418
GFPT2
−2.8180
0.0212
5q34-q35
8152617
HAS2 d
−2.8501
0.0038
8q24.12
8129573
MOXD1 c
−2.8809
0.0181
6q23.1-q23.3
7925929
AKR1C3
−3.1055
0.0165
10p15-p14
7917850
ARHGAP29
−3.1793
0.0031
1p22.1
8037251
PSG7 c
−3.2618
0.0006
19q13.2
8180291
–
−3.2801
0.0006
8165808
XG c
−3.6591
0.0065
Xp22.33
8037272
PSG5 // PSG5 c
−3.7775
0.0000
19q13.2 // 19q13.2
8083887
CLDN11 d
−4.2221
0.0000
3q26.2-q26.3
7909730
KCNK2 c
−4.7183
0.0000
1q41
8037283
PSG4 c
−4.9049
0.0000
19q13.2
8152522
ENPP2 d c
−5.3472
0.0000
8q24.1
aFC = Fold change
bPfp = estimated percentage of false positive predictions.
(c) Genes that were functionally categorized as glycoproteins by DAVID (p = 8,0E-6).
(d) Genes involved in EMT.
List of 87 differentially expressed genes sorted out by comparing controls and NSCL/Ppatients with the RankProd analysisaFC = Fold changebPfp = estimated percentage of false positive predictions.(c) Genes that were functionally categorized as glycoproteins by DAVID (p = 8,0E-6).(d) Genes involved in EMT.Next, in order to visualize the expression behavior of these DEGs, we performed clustering analysis with the GEDI software. An image was created reflecting the 87 DEGs’ transcriptional behavior for each individual and where the gene position was fixed to give the most parsimonious arrangement to show differential gene expression between controls and NSCL/Ppatients. Upon visual inspection of the GEDI images, we observed that 4 of the NSCL/Ppatients showed a similar expression pattern (F4280, F4281, F4282, F4283; Fig. 1).
Fig. 1
Clustering of 87 DEGs resulted from the comparison between 6 controls and 6 NSCL/P patients. Each GEDI map (or mosaic) represents a gene expression profile of a single individual. The blue color represents the lowest expression level and red color represents the highest expression level on a scale of −4.70 to 7.98, respectively. The black frame highlights four patients with similar gene expression profile
Clustering of 87 DEGs resulted from the comparison between 6 controls and 6 NSCL/Ppatients. Each GEDI map (or mosaic) represents a gene expression profile of a single individual. The blue color represents the lowest expression level and red color represents the highest expression level on a scale of −4.70 to 7.98, respectively. The black frame highlights four patients with similar gene expression profileThe clustering analysis with the k-means method resulted in 9 gene clusters. Four of them exhibited a similar expression profile among NSCL/Ppatients, most particularly in the afore-mentioned group (F4280, F4281, F4282, F4283; Fig. 2). The similarities and dissimilarities in expression levels observed between NSCL/Ppatients were similar for both clustering methods. The expression pattern found in 4 out of 6 NSCL/Ppatients illustrates the characteristic of RankProd of being capable of selecting genes with differential expression in just a subgroup of samples.
Fig. 2
Gene clusters 1, 4, 6 and 9 resulted from k-means method (k = 9). In both clusters it is possible to observe a similar gene expression profile among 4 out of 6 patients (F4280, F4281, F4282 and F4283), indicating that many of the 87 selected DEGs are co-regulated in these 4 NSCL/P patients
Gene clusters 1, 4, 6 and 9 resulted from k-means method (k = 9). In both clusters it is possible to observe a similar gene expression profile among 4 out of 6 patients (F4280, F4281, F4282 and F4283), indicating that many of the 87 selected DEGs are co-regulated in these 4 NSCL/PpatientsDifferentially expressed genes were functionally annotated and analyzed with two different tools. First, the DAVID tool led to identification of three main canonical pathways from the KEGG Database: Focal adhesion, Cell adhesion molecules and ECM-receptor interaction (Fig. 3, 4 and 5 in supplementary material). Moreover, the most relevant functional category identified through DAVID was that of Glycoproteins (p = 8.0E-6), which included 36 of the 87 DEGs (Table 1). Subsequently, the IPA tool was used to characterize the 87 DEGs regarding possible biological functions. We observed 3 relevant functions enriched with a significant number of our genes: Cellular movement (20 genes, p = 6.4E-06–2.76E-02), Cellular growth and proliferation (27 genes, p = 3.11E-05–2.68E-02) and Cellular development (27 genes, p = 3.3E-05–2.47E-02) (Table 4 in supplementary material). A putative network with the largest number of DEGs built by IPA (13 DEGs; Fig. 4) suggests functional relationship among several extracellular proteins: ACAN, COL4A1, COL4A2, GDF15, IGF2, MMP1, MMP3 and PDGFa. All of these 8 genes are DEGs.Quantitative Real-Time PCR (qRT-PCR) initial analysis of NSCL/Ppatients and control samples for ERAP2, PPT2, COL15A1. E = primer amplification efficiency; CT = cycle threshold; delta_CT = sample’s average_CT normalized by pool’s average_CT; NF = normalization factorThe most significant network built by IPA tool with the highest number of differentially expressed genes. The upregulated genes in NSCL/Ppatients are indicated in red and the downregulated genes in green. The blank symbols pertain to genes that were either not present in our array or not differentially expressed. Solid lines indicate a direct linkage among two genes. Lines with arrows indicate that one gene acts on the other, and lines without arrows indicate that the corresponding proteins interact with each other. The 6 genes circled in orange were used in clustering analysis of qRT-PCR and microarray
Validation of the Microarray Analysis
The reproducibility of gene expression assayed by Affymetrix microarrays is high [44] and 4 genes (COL15A1, ERAP2, PPT2 and EGFL8) were initially selected for validation through qRT-PCR. Except for ERAP2 (p = 0.0397), that showed lower expression levels, the other genes showed higher transcript levels in NSCL/Ppatients than in controls: PPT2 (p = 0.0087) and COL15A1 (p = 0.0355) (Fig. 3), which confirms the expression patterns observed in the microarray assays. EGFL8 is represented by the same probe set of PPT2. Considering that we did not find significant differential expression levels through qRT-PCR with specific primers for EGFL8 (p = 0.1781), we kept only PPT2 among our candidate genes.
Fig. 3
Quantitative Real-Time PCR (qRT-PCR) initial analysis of NSCL/P patients and control samples for ERAP2, PPT2, COL15A1. E = primer amplification efficiency; CT = cycle threshold; delta_CT = sample’s average_CT normalized by pool’s average_CT; NF = normalization factor
For further validation of our results, in addition to COL15A1, PPT2 and ERAP2, we analyzed a further 12 other genes known to be involved with matrix remodeling in a novel sample of individuals (16 controls and 13 NSCL/Ppatients). By comparing the ratios of the average expression from patients/controls, we observed that ENPP2 exhibited the highest discordance between the microarray and qRT-PCR ratios (0.645 and 4.324, respectively), therefore we considered this gene expression as not validated. However, we observed a significant positive correlation between microarray and qRT-PCR expression data (p = 0.0114, r = 0.653) for the 14 genes chosen for validation (Table 3 in supplementary material). Accordingly, the analyses thus show that the data obtained from microarrays and qRT-PCR are consistent with each other, even in an enlarged novel sample, attesting to the reliability of the microarray analysis.Using NSCL/Ppatients expression data from qRT-PCR (13 NSCL/Ppatients) and microarray assays (6 NSCL/Ppatients), we also performed a clustering analysis for the following DEGs: ACAN, COL4A1, COL4A2, ERAP2, HAS2, and MMP3. These genes belong to network 1 (Fig. 4) and exhibited the same expression tendency in both experiments (Fig. 2 in supplementary material). The hierarchical clustering performed with qRT-PCR data revealed a homogeneous cluster with 7 out of 13 NSCL/Ppatients (F4312, F4311, F4281, F4243, F4388, F5398 and F4280), in which MMP3 is downregulated and ACAN, COL4A1 and COL4A2 upregulated. ERAP2 and HAS2 did not exhibit a consistent expression pattern in these patients (Fig. 5a). Interestingly, the hierarchical clustering using microarray data showed the same transcriptional behavior among these genes in 4 out of 6 NSCL/Ppatients (F4280, F4281, F4282 and F4283), although in this case, ERAP2 and HAS2 seem to be co-regulated with MMP3 and collagens (Fig. 5b). Spearman’s correlation test (Fig. 5c) calculated for each relationship among the genes from Fig. 5a confirmed that MMP3 and ACAN are negatively correlated (r = −0.889; p < 0.05) while ACAN exhibited positive correlation with COL4A1 and COL4A2 (r = 0.921 and r = 0.872, respectively; p < 0.05), corroborating our findings thus far. On the other hand, we did not achieve significance for ERAP2 and HAS2, confirming the lack of a well-defined expression pattern for these genes, as observed in Fig. 5a.
Fig. 5
Hierarchical clustering of all the patients analyzed by qRT-PCR and microarray, considering only the genes from IPA network 1 (Fig. 4). a Clustering of expression values obtained by qRT-PCR. b Clustering of expression values obtained by microarray. c Correlations (Spearman’s correlation test, r and p-values) between each co-regulated gene from qRT-PCR clustering. It is possible to observe that the gene MMP3 is significantly and inversely correlated to ACAN, COL4A1, COL4A2 genes in a subgroup of patients from qRT-PCR. This same pattern of co-regulation is also present in another group of patients analyzed by microarray, which includes two individuals (F4280 and F4281) from the mentioned qRT-PCR subgroup. In controls expression data of both assays, MMP3 is upregulated and ACAN, COL4A1 and COL4A2 downregulated
Hierarchical clustering of all the patients analyzed by qRT-PCR and microarray, considering only the genes from IPA network 1 (Fig. 4). a Clustering of expression values obtained by qRT-PCR. b Clustering of expression values obtained by microarray. c Correlations (Spearman’s correlation test, r and p-values) between each co-regulated gene from qRT-PCR clustering. It is possible to observe that the gene MMP3 is significantly and inversely correlated to ACAN, COL4A1, COL4A2 genes in a subgroup of patients from qRT-PCR. This same pattern of co-regulation is also present in another group of patients analyzed by microarray, which includes two individuals (F4280 and F4281) from the mentioned qRT-PCR subgroup. In controls expression data of both assays, MMP3 is upregulated and ACAN, COL4A1 and COL4A2 downregulated
Discussion
In this study, we conducted the first comparative transcriptome analysis between dental pulp stem cells from NSCL/Ppatients and controls to obtain more information on the genetic etiology of this malformation and explore new possibilities to identify pathways associated with disease pathology.Genome expression microarray analysis is a powerful tool for assessing changes in the transcription patterns of related genes and identification of signaling pathways associated with specific cell types, culture conditions or disease states [45, 46]. Considering that the cell populations from NSCL/Ppatients and controls were established and maintained under similar identical conditions, it is very likely that a large proportion of the DEGs identified is related to the genetic constitutional differences between cells from NSCL/Ppatients and controls. The observations that NSCL/P disease-specific expression profiles have also been previously reported in tissue biopsies and fibroblast cultures [19, 21] suggests that such profiles may be ubiquitous; in this respect our findings suggest that the disease-specific transcription profile is also present in mesenchymal stem cells. Accordingly, transcriptome analysis of stem cells represents an additional option to the study of the transcriptome and genetics of NSCL/P.Of the 87 DEGs obtained in our microarray analysis, we noted that MMP3 had previously been proposed as a candidate gene for NSCL/P [47]. A further 2 genes, PDGFa and ERRFI1, had previously been indirectly associated with NSCL/P, since their receptors, respectively PDGFR and EGRF, were identified as predisposing loci for this form of clefting [48-50]. These observations provide further confirmation that the transcriptome of DPSCs from NSCL/Ppatients can also be used to identify genetic variations associated with the disease.The functional annotation through DAVID showed that nearly half of the 87 DEGs correspond to glycoprotein molecules, including collagens, metalloproteinases, integrins, and adhesion proteins, which are important to orchestrate the signaling between the extracellular and intracellular compartment. Indeed, the three canonical pathways we identified through DAVID are mainly related to extracellular matrix components and signaling molecules located on the cell membrane (Fig. 3, 4 and 5 in supplementary material). Functional relationships among several extracellular proteins that are deregulated in our studies are also suggested by the putative network built by IPA (Fig. 4). These analyses suggest that a large proportion of the DEGs in the DPSC from NSCL/Ppatients are extracellular matrix components (ECM) involved in several cellular processes in facial development, such as extracellular remodeling during the epithelial-mesenchymal transition (EMT).EMT is a fundamental mechanism behind palatal fusion. This process occurs through a regulated sequence of events determined both by the extracellular environment and the gene expression program of the cell, leading to loss of cell-cell adhesion, breakdown of basal laminae, and increased cell invasion and mobility [51, 52].Fifteen genes enrolled in EMT are within the 87 DEGs (Table 1). Further, we observed the enrichment of biological functions involving cell proliferation, movement and invasion (Table 4 in supplementary material), all of which are expected phenotypic outcomes for genes involved with EMT.In the most relevant IPA network we observed that 8 out of 13 DEGs (ACAN, COL4A1, COL4A2, GDF15, IGF2, MMP1, MMP3 and PDGFa, Fig. 4) are extracellular matrix components, suggesting that an abnormal expression behavior of these genes may affect EMT during palate development.Clustering analyses showed that MMP3, ACAN, COL4A1 and COL4A2 transcripts are co-regulated in 4 out of 6 NSCL/Ppatients analyzed by microarray as well as in 7 out of 13 NSCL/Ppatients from qRT-PCR analysis. Therefore, it seems that in NSCL/P mesenchymal cells, the down-regulation of MMP3 is associated with up-regulation of ACAN, COL4A1 and COL4A2. Such a deregulated pathway probably has serious consequences in embryonic development, since it is known that the MMP3 protein cleaves a wide range of ECM proteins, including the collagens IV, V, IX, X, proteoglycans, fibronectin and laminin. It is also capable of activating other MMPs, as shown in network 1 (Fig. 4), and playing a key role in ECM degradation and remodeling [47]. In this context it has already been experimentally shown that lower levels of MMPs can block palatal fusion [7]. Therefore, it is possible that failure of lip and palate fusion in these groups of patients is at least partly associated with down-regulation of MMPs and up-regulation of collagens. It will be important in the near feature to identify the leading causative mechanism of altered expression of MMPs in these individuals. These results also show the robustness of RankProd to detect DEGs on a limited and heterogeneous group of samples, in contradistinction to a popular method SAM [53] which appears to require larger sample sizes. Moreover, RankProd is able to identify expression patterns in just a subgroup of affected samples, which is the ideal, considering that this is the expected to occur in a complex disease such as NSCL/P. Notwithstanding our favorable impression of RankProd, we are acutely aware of the small sample sizes employed in detecting DEGs in the initial microarray assay. Future studies must be directed towards confirming and/or expanding the pattern of DEGs using novel and much larger sample sizes.In summary our results suggest that NSCL/Ppatients exhibit deregulation of genes participating in either EMT or embryonic processes that depend on extracellular modeling. Abnormal expression of some genes encoding for extracellular matrix proteins in NSCL/P fibroblasts has also been reported by others, reinforcing the concept that disease expression signatures for NSCL/P are present in various tissues and not necessarily confined to a specific cell type. Moreover, the penetrance of the phenotype can depend on exposure to environmental factors and it is of interest that a recent report claimed a positive association between nicotine exposure and the CL/P phenotype in conjunction with deregulation of gene expression involved in ECM synthesis and degradation [21].The observation that the DEGs in NSCL/Ppatients are associated with ECM component signaling suggests that further analysis of these cells presents unique opportunities to study the complexity of molecular pathways and alleles involved in NSCL/P etiology. However, arguably the major advantage of DPSCs above other cell types, such as fibroblasts, will be their potential to study in vitro differentiation into muscle, bone and cartilage that are affected tissues in NSCL/P. In this context, it will be possible to integrate genomic and transcriptome analysis under different experimentally induced environmental insults on identical cell cultures. Our results open new avenues for the study of novel candidate genes for NSCL/P, since most of the 87 DEGs have not been previously associated with NSCL/P. In particular, the potential offered by our approach can be best visualized in the gene network 1 (Fig. 4), in which several of our DEGs are ECM components, suggesting that these genes might be enrolled in EMT during the development of NSCL/P phenotype.If wide-spread differences in co-regulated gene transcription, as observed in our experiments, are indeed a primary cause of NSCL/P, then the research emphasis must move away from naively cataloging which genes are being differentially expressed to defining the central transcription factors and regulatory elements that are driving the disease specific transcription patterns. In this respect, identifying the putative gene networks involved, as in our observations, will be an initial crucial step towards defining which regulatory elements are the most important.Below is the link to the electronic supplementary material.Flow cytometry analysis of dental pulp stem cell (DPSC) cultures. The graphs display the immunophenotype of adherent DPSC isolated from NSCL/Ppatients. The control assay (not labeled) is represented by a black line and the experimental population (labeled with specific antibodies) is in gray. Values represent the mean percentage of all assessed cells positively stained for the indicated antigens: SH2 (97.43%), SH3 (97.02%), SH4 (93.92%), CD29 (99.9%), CD31 (0.08%), CD45 (3.84%) and CD90 (97.96%). Abbreviations: CD, Cluster of Differentiation. (JPEG 551 kb)Comparison between expression ratios (log10 avg_patients/avg_controls) obtained by qRT-PCR (black line) and microarray (grey line) of the 15 genes chosen for validation. (JPEG 223 kb)Focal Adhesion pathway from the KEGG Database sorted out by DAVID tool. The red stars correspond to genes deregulated in the comparison between NSCL/Ppatients and controls (ECM, ITGA and CycD). (JPEG 782 kb)Cell Adhesion Molecules pathway from the KEGG Database sorted out by DAVID tool. The red stars correspond to genes deregulated in the comparison between NSCL/Ppatients and controls (MHC-II, CLDN, JAM2, ITGA8 and CLDN11). (JPEG 1387 kb)ECM-receptor interaction pathway from the KEGG Database sorted out by DAVID tool. The red stars correspond to genes deregulated in the comparison between NSCL/Ppatients and controls (Collagen, alpha8 and alpha12). (JPEG 783 kb)Age, gender, and clinical status of the samples analyzed by qRT-PCR and/or microarray. (DOC 64 kb)Primer sequences used for qRT-PCR. (DOC 41 kb)Ratios of the average expression from patients/controls for 15 candidate genes chosen for validation. (DOC 55 kb)Top 3 (out of 70) functional categories identified by Ingenuity Pathway Analysis as being significantly overrepresented by differentially expressed genes. (DOC 50 kb)
Authors: A I Saeed; V Sharov; J White; J Li; W Liang; N Bhagabati; J Braisted; M Klapa; T Currier; M Thiagarajan; A Sturn; M Snuffin; A Rezantsev; D Popov; A Ryltsov; E Kostukovich; I Borisovsky; Z Liu; A Vinsavich; V Trush; J Quackenbush Journal: Biotechniques Date: 2003-02 Impact factor: 1.993
Authors: Terri H Beaty; Jeffrey C Murray; Mary L Marazita; Ronald G Munger; Ingo Ruczinski; Jacqueline B Hetmanski; Kung Yee Liang; Tao Wu; Tanda Murray; M Daniele Fallin; Richard A Redett; Gerald Raymond; Holger Schwender; Sheng-Chih Jin; Margaret E Cooper; Martine Dunnwald; Maria A Mansilla; Elizabeth Leslie; Stephen Bullard; Andrew C Lidral; Lina M Moreno; Renato Menezes; Alexandre R Vieira; Aline Petrin; Allen J Wilcox; Rolv T Lie; Ethylin W Jabs; Yah Huei Wu-Chou; Philip K Chen; Hong Wang; Xiaoqian Ye; Shangzhi Huang; Vincent Yeow; Samuel S Chong; Sun Ha Jee; Bing Shi; Kaare Christensen; Mads Melbye; Kimberly F Doheny; Elizabeth W Pugh; Hua Ling; Eduardo E Castilla; Andrew E Czeizel; Lian Ma; L Leigh Field; Lawrence Brody; Faith Pangilinan; James L Mills; Anne M Molloy; Peadar N Kirke; John M Scott; James M Scott; Mauricio Arcos-Burgos; Alan F Scott Journal: Nat Genet Date: 2010-05-02 Impact factor: 38.330
Authors: Ingvar Ferby; Markus Reschke; Oliver Kudlacek; Pjotr Knyazev; Guido Pantè; Kerstin Amann; Wolfgang Sommergruber; Norbert Kraut; Axel Ullrich; Reinhard Fässler; Rüdiger Klein Journal: Nat Med Date: 2006-04-30 Impact factor: 53.440
Authors: Eder Zucconi; Natassia M Vieira; Daniela F Bueno; Mariane Secco; Tatiana Jazedje; Carlos E Ambrosio; Maria Rita Passos-Bueno; Maria Angelica Miglino; Mayana Zatz Journal: Stem Cells Dev Date: 2010-03 Impact factor: 3.272
Authors: Mary L Marazita; Jeffrey C Murray; Andrew C Lidral; Mauricio Arcos-Burgos; Margaret E Cooper; Toby Goldstein; Brion S Maher; Sandra Daack-Hirsch; Rebecca Schultz; M Adela Mansilla; L Leigh Field; You-e Liu; Natalie Prescott; Sue Malcolm; Robin Winter; Ajit Ray; Lina Moreno; Consuelo Valencia; Katherine Neiswanger; Diego F Wyszynski; Joan E Bailey-Wilson; Hasan Albacha-Hejazi; Terri H Beaty; Iain McIntosh; Jacqueline B Hetmanski; Gökhan Tunçbilek; Matthew Edwards; Louise Harkin; Rodney Scott; Laurence G Roddick Journal: Am J Hum Genet Date: 2004-06-04 Impact factor: 11.025
Authors: Ariadne Letra; Rodrigo A Silva; Renato Menezes; Claudia M Astolfi; André Shinohara; Ana Paula de Souza; José Mauro Granjeiro Journal: Arch Oral Biol Date: 2007-05-29 Impact factor: 2.633
Authors: Ariadne Letra; Renato M Silva; Luise G Motta; Susan H Blanton; Jacqueline T Hecht; Jose M Granjeirol; Alexandre R Vieira Journal: Birth Defects Res A Clin Mol Teratol Date: 2012-06-22
Authors: Waheed Awotoye; Peter A Mossey; Jacqueline B Hetmanski; Lord J J Gowans; Mekonen A Eshete; Wasiu L Adeyemo; Azeez Alade; Erliang Zeng; Olawale Adamson; Thirona Naicker; Deepti Anand; Chinyere Adeleke; Tamara Busch; Mary Li; Aline Petrin; Babatunde S Aregbesola; Ramat O Braimah; Fadekemi O Oginni; Ayodeji O Oladele; Abimbola Oladayo; Sami Kayali; Joy Olotu; Mohaned Hassan; John Pape; Peter Donkor; Fareed K N Arthur; Solomon Obiri-Yeboah; Daniel K Sabbah; Pius Agbenorku; Gyikua Plange-Rhule; Alexander Acheampong Oti; Rose A Gogal; Terri H Beaty; Margaret Taub; Mary L Marazita; Michael J Schnieders; Salil A Lachke; Adebowale A Adeyemo; Jeffrey C Murray; Azeez Butali Journal: Sci Rep Date: 2022-07-11 Impact factor: 4.996
Authors: Elizabeth J Leslie; M Adela Mansilla; Leah C Biggs; Kristi Schuette; Steve Bullard; Margaret Cooper; Martine Dunnwald; Andrew C Lidral; Mary L Marazita; Terri H Beaty; Jeffrey C Murray Journal: Birth Defects Res A Clin Mol Teratol Date: 2012-09-24
Authors: Giuliana Giannuzzi; Priscillia Siswara; Maika Malig; Tomas Marques-Bonet; James C Mullikin; Mario Ventura; Evan E Eichler Journal: Genome Res Date: 2012-10-11 Impact factor: 9.043
Authors: Priscila Sala; Giliane Belarmino; Raquel S Torrinhas; Natasha M Machado; Danielle C Fonseca; Graziela R Ravacci; Robson K Ishida; Ismael F M S Guarda; Eduardo G de Moura; Paulo Sakai; Marco A Santo; Ismael D C G da Silva; Claudia C A Pereira; Angela F Logullo; Steven Heymsfield; Daniel Giannella-Neto; Dan L Waitzberg Journal: Clin Transl Gastroenterol Date: 2017-01-05 Impact factor: 4.488