Functional Conservation of the Glide/Gcm Regulatory Network Controlling Glia, Hemocyte, and Tendon Cell Differentiation in Drosophila.

Pierre B Cattenoz¹, Anna Popkova¹, Tony D Southall², Giuseppe Aiello¹, Andrea H Brand², Angela Giangrande³.

Abstract

High-throughput screens allow us to understand how transcription factors trigger developmental processes, including cell specification. A major challenge is the identification of their binding sites, because feedback loops and homeostatic interactions may mask the direct impact of those factors in transcriptome analyses. Identifying direct binding sites also dissects the downstream signaling cascades and facilitates the identification of conserved transcriptional programs. Here we show the results and the validation of a DNA adenine methyltransferase identification (DamID) genome-wide screen that identifies the direct targets of Glide/Gcm, a potent transcription factor that controls glia, hemocyte, and tendon cell differentiation in Drosophila. The screen identifies many genes that had not been previously associated with Glide/Gcm and highlights three major signaling pathways interacting with Glide/Gcm: Notch, Hedgehog, and JAK/STAT, all of which involve feedback loops. Furthermore, the screen identifies effector molecules that are necessary for cell-cell interactions during late developmental processes and/or in ontogeny. For example, immunoglobulin (Ig) domain-containing proteins control cell adhesion and axonal navigation. This shows that early and transiently expressed fate determinants not only control other transcription factors that, in turn, implement a specific developmental program, but also directly affect late developmental events and cell function. Finally, while the mammalian genome contains two orthologous Gcm genes, their function has been demonstrated in vertebrate-specific tissues, the placenta and the parathyroid glands, raising questions about the evolutionary conservation of the Gcm cascade in higher organisms. Here we provide the first evidence for the conservation of Gcm direct targets in humans. In sum, this work uncovers novel aspects of cell specification and sets the basis for further understanding of the role of conserved Gcm gene regulatory cascades.
Copyright © 2016 Cattenoz et al.

Keywords:  DamID; Drosophila; glide/gcm; mGcm; screen

Year:  2015        PMID: 26567182      PMCID: PMC4701085          DOI: 10.1534/genetics.115.182154

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


UNDERSTANDING the molecular signature of a developmental pathway is a major challenge in modern biology. Transcription factors specify cell fates by inducing the expression of specific genes. For instance, the zinc finger transcription factor glial cell deficient/glial cell missing (Glide/Gcm, or Gcm for the sake of simplicity) is expressed transiently at early stages (Bernardoni ; Laneve ; Flici ) and controls Drosophila glial and blood development (Hosoya ; Jones ; Vincent ; Bernardoni ; Egger ; Freeman ; Soustelle ; Altenhein ). Gcm is also expressed in tendon and peritracheal cells (Soustelle ; Laneve ), showing that fate determinants have a much broader role than expected and likely trigger the expression of target genes depending on the transcriptional and epigenetic environment of the different cell types. Expression profiling data and computational predictions were used previously to gain a better understanding of the Gcm regulatory network (Egger ; Freeman ; Altenhein ), but these approaches did not allow genome-wide identification of the direct targets. Genes directly targeted by a transcription factor are commonly identified by chromatin immunoprecipitation (ChIP) using antibodies specific to that factor. Because no efficient antibody is available for Gcm (Popkova ; Laneve ), we used DNA adenine methyltransferase identification (DamID) to identify the Gcm direct targets in Drosophila. DamID chromatin profiling is a methylation-based tagging method used to identify the genomic loci directly bound by transcription factors (van Steensel and Henikoff 2000; van Steensel ). The approach is based on the fusion of a bacterial Dam methylase to a protein of interest, which marks the factor's genomic binding sites by adenine methylation. The DamID screen allowed us to identify 1031 targets, only some of which had already been associated with a Gcm-dependent cascade.
Several targets belong to the Notch (N), JAK/STAT, and Hedgehog (Hh) pathways, suggesting the presence of feedback loops. Because these pathways were previously shown to affect the cell populations that depend on Gcm, the DamID data provide a molecular framework to clarify the observed mutant phenotypes (Hosoya ; Jones ; Bernardoni ). The DamID screen also brought to light two key features of the Gcm pathway. First, we address the late role of fate determinants beyond their ability to trigger novel transcriptional programs that are subsequently maintained by other factors [reviewed in Cattenoz and Giangrande (2015)]. The transiently expressed Gcm transcription factor is known to induce the expression of the Reverse polarity (Repo), Tramtrack (Ttk), and Pointed (Pnt) transcription factors, which ensure and maintain the glial-specific differentiation program (Flici ) [reviewed in Cattenoz and Giangrande (2015)], and many Gcm targets identified by the DamID screen encode transcription factors. In addition, however, we found a significantly high number of effector genes, including numerous members of the Ig domain–containing protein family. These molecules affect cell function or late developmental events, including cell migration, a key feature of glia and hemocytes (Schmucker ; Watson ; ) [reviewed in Schwabe ]. This suggests that early genes such as gcm may have a much broader impact on cell specification and physiology than expected. Second, the Gcm pathway is conserved in evolution. The Gcm protein is structurally conserved, as are most key developmental factors present in the fly genome. Like the fly ortholog, murine Gcm1 (mGcm1) and Gcm2 (mGcm2) are essential transcription factors, as their deletion is lethal (Anson-Cartwright ; Gunther ).
However, the main roles of the mammalian Gcm1 and Gcm2 genes, including their human orthologs, are in the placenta and the parathyroid glands, respectively, two tissues that do not exist in invertebrates (Kim ; Basyuk , 2009; Gordon ; Correa ; Chen ; Mannstadt , 2011; Doyle ; Yi ; Park ; Mitsui ). The DamID data allow us to identify direct targets that are shared by flies and vertebrates. To the best of our knowledge, this is the first evidence of functional conservation of the Gcm cascade, and it sets the basis for further understanding of the Gcm network in mammals.

Materials and Methods

DamID technique

The pUASTattB-NDam construct was made by cloning the Dam-Myc cassette from pNDam-Myc (van Steensel and Henikoff 2000; van Steensel ) into pUASTattB using EcoRI and BglII. To produce the Dam-Gcm fusion construct, the full-length gcm coding sequence was cloned into pUASTattB-NDam (Choksi ) using KpnI and NotI sites. The two constructs were used to produce UAS Dam and UAS Dam-Gcm flies, respectively, using the attP-22A docking site (Bischof ). Stage 10–11 embryos [4–7 hr after egg laying (AEL)] were collected from the two strains. DNA isolation, processing, and amplification were performed as described previously (Choksi ). The Dam-only and Dam-Gcm samples were labeled and hybridized together on a whole-genome 2.1 million–feature tiling array with 50- to 75-mer oligonucleotides spaced at ∼55-bp intervals (Nimblegen Systems). Arrays were scanned and intensities extracted (Nimblegen Systems). Three biological replicates (one of them dye-swapped) were performed. The log2 ratio of each spot was median normalized.
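The ratio and normalization step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and it assumes that "median normalized" means subtracting the array-wide median from the log2 ratios, and that the dye-swapped replicate is handled by always orienting the ratio as Dam-Gcm over Dam-only.

```python
import math
import statistics


def log2_ratios(channel_1, channel_2, dam_gcm_in_channel_1=True):
    """Per-probe log2(Dam-Gcm / Dam-only), whichever dye carried Dam-Gcm.

    For a dye-swapped array, pass dam_gcm_in_channel_1=False so that the
    ratio is still oriented with Dam-Gcm in the numerator.
    """
    if dam_gcm_in_channel_1:
        return [math.log2(a / b) for a, b in zip(channel_1, channel_2)]
    return [math.log2(b / a) for a, b in zip(channel_1, channel_2)]


def median_normalize(values):
    """Shift log2 ratios so the array median is 0 (assumed meaning of 'median normalized')."""
    m = statistics.median(values)
    return [v - m for v in values]
```

Orienting every replicate the same way before normalizing lets the three replicate data sets be analyzed with identical peak-calling settings.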

DamID analysis

A peak-finding algorithm with false discovery rate (FDR) analysis was developed to identify significant binding sites (PERL script available on request). All peaks spanning eight or more consecutive probes (∼900 bp) with a ratio above twofold were assigned an FDR value. To assign an FDR value, the frequency of a range of small peak heights (0.1–1.25 log2 increase) was calculated within a randomized data set (for each chromosome arm) using 20 iterations for each peak size. This was repeated for a range of peak widths (6–15 consecutive probes). These data were used to model the exponential decay of the FDR with increasing peak height and peak width, enabling extrapolation of FDR values for higher and broader peaks. This analysis was performed independently for each replicate data set, and each peak was assigned the highest FDR value among the three replicates. Genes were defined as targets when a binding event (FDR < 0.1%) occurred within 5 kb of the transcriptional unit (depending on the proximity of adjacent genes).
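The peak-calling and target-assignment logic can be sketched in Python. This is an illustrative approximation, not the authors' PERL script: it calls runs of consecutive probes above a single log2 threshold, estimates an empirical FDR by shuffling the probe values of one chromosome arm, omits the exponential-decay extrapolation, and uses a fixed 5-kb window for target assignment rather than one adjusted for neighboring genes. All function names are hypothetical.

```python
import random


def call_peaks(ratios, min_height=1.0, min_probes=8):
    """Runs of >= min_probes consecutive probes with log2 ratio >= min_height.

    min_height=1.0 corresponds to a twofold Dam-Gcm/Dam-only ratio.
    Returns (first_probe, last_probe, mean_log2_ratio) tuples.
    """
    peaks, start = [], None
    for i, value in enumerate(list(ratios) + [float("-inf")]):  # sentinel closes a final run
        if value >= min_height:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_probes:
                run = ratios[start:i]
                peaks.append((start, i - 1, sum(run) / len(run)))
            start = None
    return peaks


def empirical_fdr(ratios, min_height, min_probes, iterations=20, seed=0):
    """Mean number of peaks in shuffled data divided by the number in real data."""
    observed = len(call_peaks(ratios, min_height, min_probes))
    if observed == 0:
        return 1.0
    rng = random.Random(seed)
    shuffled = list(ratios)
    expected = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # destroys spatial structure, keeps the value distribution
        expected += len(call_peaks(shuffled, min_height, min_probes))
    return min(1.0, expected / iterations / observed)


def combine_replicates(fdrs):
    """Conservative combination: each peak keeps its highest FDR across replicates."""
    return max(fdrs)


def assign_targets(peaks, genes, fdr_cutoff=0.001, window=5000):
    """peaks: (chrom, start_bp, end_bp, fdr); genes: (name, chrom, start_bp, end_bp).

    A gene is a target if a peak with fdr < fdr_cutoff lies within `window` bp
    of its transcriptional unit (fixed window, a simplification).
    """
    targets = set()
    for chrom, p_start, p_end, fdr in peaks:
        if fdr >= fdr_cutoff:
            continue
        for name, g_chrom, g_start, g_end in genes:
            if g_chrom == chrom and p_start <= g_end + window and p_end >= g_start - window:
                targets.add(name)
    return targets
```

Evaluating `empirical_fdr` over a grid of heights (0.1–1.25) and widths (6–15 probes) would give the points to which an exponential decay could then be fitted for extrapolation to larger peaks.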

Conservation of the Gcm binding sites located in DamID peaks

The Drosophila genome (version BDGP R5/Dm3) was scanned for the canonical Gcm binding sites (GBSs) listed in Figure 1B. For each GBS, the conservation score, calculated from 12 Drosophila species, mosquito, honeybee, and red flour beetle (Blanchette ; Siepel ), was taken from the Conservation track (multiz15way) of the University of California Santa Cruz (UCSC) Genome Browser. The GBSs located within 1 kb of DamID peaks were compared with the whole population of GBSs (Figure 1E). An F-test was used to compare the variances of the two populations, and a t-test for unequal variances (Welch's t-test) was used to calculate the P-value.
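The comparison of GBS populations can be sketched as follows: scan a sequence for GBS motifs on both strands, split the hits by whether they lie within 1 kb of a DamID peak, and compare the conservation scores of the two groups with an F statistic and Welch's t statistic. This is a hedged illustration: the single example motif (ATGCGGGC, the WT GBS used for CG30002 in Table 3) stands in for the full canonical list of Figure 1B, conservation scores are assumed to have been fetched separately from the multiz15way track, and the function names are hypothetical. P-values are not computed here; in practice they could be obtained from, e.g., `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import statistics

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_gbs(seq, motifs):
    """0-based start positions of any motif on either strand of an uppercase sequence."""
    patterns = set(motifs) | {revcomp(m) for m in motifs}
    hits = set()
    for pat in patterns:
        i = seq.find(pat)
        while i != -1:
            hits.add(i)
            i = seq.find(pat, i + 1)
    return sorted(hits)


def near_any_peak(pos, peaks, flank=1000):
    """True if a GBS position lies within `flank` bp of any (start, end) peak."""
    return any(start - flank <= pos <= end + flank for start, end in peaks)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def variance_ratio(a, b):
    """F statistic for comparing the variances of two samples."""
    return statistics.variance(a) / statistics.variance(b)
```

The F statistic is what justifies using the unequal-variance form of the t-test: if the two populations differ in spread, the pooled-variance t-test would be inappropriate.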
Figure 1

The DamID peaks are enriched for GBSs. (A) Schematic of the MICRA algorithm used to identify enriched motifs in the DamID peaks. For each peak, 1000 nt of sequence was extracted and filtered for conserved sequence, and then the frequency of every 6- to 10-mer was compared to its background frequency in nonexonic DNA and ranked accordingly [for details, see Southall and Brand (2009)]. The most highly represented motif corresponds to the canonical GBS. (B) Canonical GBSs, with the strength of Gcm binding and references. (C) Distance between the DamID peaks and the closest canonical GBS. (D) Distribution of the number of canonical GBSs per kilobase in the whole genome and under the DamID peaks. The box delimits the second and third quartiles; the thick black bar indicates the median for the two populations; P-values are indicated as follows: ns = nonsignificant (P > 0.05); *P = 0.05–0.01; **P = 0.01–0.001; ***P < 0.001. (E) Distribution of conservation scores of canonical GBSs in the whole genome and under the DamID peaks. Box, thick black bar, and asterisks as in D. (F) Coding status of the genomic regions covered by the DamID peaks.
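The k-mer ranking step summarized in panel A can be sketched as follows. This is not the MICRA implementation of Southall and Brand (2009): it is a simplified, hypothetical stand-in that merges each k-mer with its reverse complement, skips the conservation filter, and scores each k-mer by its pseudocount-smoothed frequency in peak sequences relative to background (e.g., nonexonic) sequence.

```python
from collections import Counter

_ACGT = set("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Merge a k-mer with its reverse complement so both strands count together."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_counts(seqs, k):
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= _ACGT:  # skip windows containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def rank_enriched(foreground, background, k, pseudo=1.0):
    """Rank foreground k-mers by frequency relative to background (pseudocount-smoothed)."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total = max(sum(fg.values()), 1)
    bg_total = max(sum(bg.values()), 1)
    scores = {
        kmer: ((n + pseudo) / fg_total) / ((bg[kmer] + pseudo) / bg_total)
        for kmer, n in fg.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Looping `k` over `range(6, 11)` would cover the 6- to 10-mers mentioned in the legend; the pseudocount keeps k-mers absent from the background from receiving an infinite score.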

Comparison with expression profiling data

The data set from Freeman was retrieved directly from the publication. For the Egger data set, the raw data were retrieved and analyzed as described in the original paper (Egger ) (intensity >50 and fold change >1.5), but with a more restrictive P-value cutoff (<0.001). From the Altenhein data set, the filtered and tested genes for gain of function (GOF) and loss of function (LOF) were retrieved, and all genes giving nonspecific or negative in situ hybridization (ISH) signals were removed before drawing the Venn diagrams in Figure 2. The R package VennDiagram was used to draw the diagrams in Figure 2, B and C (Chen and Boutros 2011). The gene names in all data sets were then converted to FlyBase gene identifiers (FBgn) using the FlyBase conversion tool (dos Santos ) for comparison with the DamID genes.
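The filtering and overlap steps can be sketched as follows. The record layout, the dictionary standing in for the FlyBase conversion tool, and the function names are hypothetical; fold change is treated here as a single positive ratio, which may not match how each original study encoded up- versus downregulation. The exclusive regions returned by `venn_regions` are the counts that a Venn or Euler diagram displays.

```python
def filter_egger(rows, min_intensity=50, min_fold=1.5, max_p=0.001):
    """Keep genes passing intensity, fold-change, and P-value cutoffs.

    rows: iterable of dicts with hypothetical keys 'gene', 'intensity',
    'fold_change', and 'p'.
    """
    return {
        r["gene"]
        for r in rows
        if r["intensity"] > min_intensity and r["fold_change"] > min_fold and r["p"] < max_p
    }


def to_fbgn(symbols, symbol_to_fbgn):
    """Map gene symbols to FBgn identifiers; unmapped symbols are dropped."""
    return {symbol_to_fbgn[s] for s in symbols if s in symbol_to_fbgn}


def venn_regions(named_sets):
    """Exclusive regions: each combination of set names -> genes found in exactly those sets."""
    regions = {}
    for gene in set().union(*named_sets.values()):
        key = tuple(sorted(name for name, genes in named_sets.items() if gene in genes))
        regions.setdefault(key, set()).add(gene)
    return regions
```

Converting every data set to FBgn identifiers before intersecting avoids false mismatches caused by synonyms or renamed gene symbols.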
Figure 2

Known targets of Gcm are found in the DamID screen. (A) Schematic representation of the gcm locus. The gene is indicated by the blue rectangles, thin ones indicating the untranslated regions (UTR) and thick ones indicating the coding exons (CDSs); pale blue arrowheads indicate the direction of transcription. In this and the following figures, GBSs are indicated in red, and the histograms above the locus show a region of 1 kb on each side of a DamID peak scoring a FDR < 0.001. The histograms in gray indicate the nonsignificant DamID peaks with a FDR > 0.001. The genomic coordinates of the loci (genome version BDGP R5/Dm3) are indicated above the histograms. (B) Euler diagram representing the overlap between the downstream targets of Gcm identified by Altenhein et al. (2002), Freeman , and Egger . The size of each area is proportional to the number of genes included in the category. (C) Subset of genes identified in the three transcriptome assays mentioned in B that are also identified as direct targets by the DamID screen. The size of each area is proportional to the number of genes included in the category. (D) Names of the genes in common between the screens of Altenhein et al. (2002), Freeman , and Egger et al. (2003) and the DamID. (E) Distribution of the genes whose expression decreases in gcm LOF according to the earliest developmental stages at which they were identified [data set Altenhein et al. (2002)]. (F) Same distribution as in E but for genes present in both the DamID screen and the Altenhein et al. (2002) LOF data set.

The expression profiles of the DamID genes were compared to the expression profile of gcm in embryos using the in situ data produced by the Berkeley Drosophila Genome Project (Tomancak , 2007; Hammonds ) (Table 1, Table 2, and Supporting Information, Table S1, column O).
Table 1

Targets of Gcm involved in nervous system development

Gene symbol | Annotation | References
beat-Iaᵃ, beat-Ibᵃ, beat-IIIbᵇ, beat-IIIcᵇ, beat-VI, beat-VII, btl, Dscamᵇ, Dscam3ᵃ, Dscam4, fas, Lar, Ptp99A, robo, robo2, robo3ᵇ, Trim9ᵇ, tutlᵇ | Axon guidance | Seeger et al. (1993); Desai et al. (1996); Krueger et al. (1996); Lekven et al. (1998); Rajagopalan et al. (2000); Pipes et al. (2001); Sun et al. (2001); Jhaveri et al. (2004); Hiramoto and Hiromi (2006); Al-Anzi and Wyman (2009); Hofmeyer and Treisman (2009); Prakash et al. (2009); Zeev-Ben-Mordehai et al. (2009); Song et al. (2011); Armitage et al. (2012); Kim et al. (2013)
btl, fraᵇ, sty, ths, unc-5 | Axon ensheatment and glial cell migration | Klambt et al. (1992); Shishido et al. (1997); Franzdottir et al. (2009); von Hilchen et al. (2010); Mukherjee et al. (2012)
Atp-α, bibᵇ, CG13248ᵇ, CG13384, CG30344ᵇ, Fas2, Fas3ᵇ, Lacᵇ, Sln, VGAT, wunᵃ | Blood-brain barrier (amino acid transport, septate junction) | Karlstrom et al. (1993); Strigini et al. (2006); DeSalvo et al. (2014); Limmer et al. (2014); Deligiannaki et al. (2015)
Htlᵇ | Blood-brain barrier, axon ensheatment, and glial cell migration | Shishido et al. (1993); Shishido et al. (1997); DeSalvo et al. (2014)
Babosᵇ | Dendritic plasticity | Yuan et al. (2011)
Jbugᵇ | Epileptic seizure | Goldstein and Gunawardena (2000); Song and Tanouye (2006)
CG13506ᵃ, ImpL2ᵇ, InR | Insulin regulation | Garbe et al. (1993); Claeys et al. (2002); Song et al. (2003); Yuan et al. (2011); Sarraf-Zadeh et al. (2013)
gcmᵇ, gcm2ᵇ, hkbᵇ, locoᵇ, pntᵇ, retnᵇ | Glial cell development | Granderath et al. (1999); Kammerer and Giangrande (2001); Shandala et al. (2003); Yuasa et al. (2003); De Iaco et al. (2006); Popkova et al. (2012)
Drᵇ, vndᵇ | Late embryonic brain development | Sprecher and Hirth (2006)
CycEᵇ, Dlᵇ, drkᵇ, Egfrᵇ, Nᵃ, Ras85Dᵃ | Longitudinal glia precursor division | Hidalgo et al. (2001); Griffiths and Hidalgo (2004); Berger et al. (2005a,b); Thomas and van Meyel (2007)
bratᵇ, E(spl)m5ᵇ, E(spl)m7ᵇ, E(spl)m8ᵇ, E(spl)mβᵇ, E(spl)mδᵇ, E(spl)mγᵃ, lolaᵇ, miraᵇ, prosᵇ | Neural stem cell regulation | Neumuller et al. (2011); Carney et al. (2012); Zacharioudaki et al. (2012)
aseᵇ, atoᵇ, ciᵇ, dᵇ, dally, ds, ftᵇ, hbs, hh, l(1)scᵇ, Pka-C1ᵃ, ptc, rdxᵇ, rhoᵇ, Ser, thᵇ, tllᵇ | Optic lobe development | Gonzalez et al. (1989); Huang and Kunes (1996); Rudolph et al. (1997); Huang and Kunes (1998); Daniel et al. (1999); Chotard et al. (2005); Sugie et al. (2010); Yasugi et al. (2010); Kawamori et al. (2011); Perez-Gomez et al. (2013); Onel et al. (2014)
DIP-βᵇ, DIP-θᵃ, dpr11ᵃ, dpr15ᵃ, dpr2, dpr3, dpr4, dpr5ᵇ, dpr8 | Enriched in glia | DeSalvo et al. (2014)
CG34371, CG42313, CG42389, E(spl)m4, E(spl)m6, kek6, wakeᵇ | Expressed in the CNS | Bailey and Posakony (1995); Graveley et al. (2011)

ᵃ Genes not coexpressed in embryos according to the Berkeley Drosophila Genome Project in situ database. No mark for genes that were not assayed.

ᵇ Genes coexpressed in embryos with gcm. No mark for genes that were not assayed.

Table 2

Targets of Gcm involved in immune system development

Gene symbol | Annotation | References
att-ORFA, CanA1ᵃ, coroᵇ, Cragᵃ, dnr1ᵇ, dosᵃ, E2f1ᵇ, jumuᵇ, locoᵇ, lolaᵇ, mtdᵇ, nubᵇ, os, par-1ᵃ, Pli, scnyᵃ, shgᵇ, Stat92Eᵇ, tefu, wunᵃ, zfh1ᵇ | Antimicrobial humoral response (response to fungi, response to Gram-negative bacteria) | Kleino et al. (2005); Dijkers and O’Farrell (2007); Kim et al. (2007); Jin et al. (2008); Avadhanula et al. (2009); Cronin et al. (2009); Guntermann et al. (2009); Petersen et al. (2012); Wang et al. (2012); Dantoft et al. (2013); Engel et al. (2014); Ji et al. (2014)
Atg18ᵇ, Atg5ᵃ, Atg8bᵃ, Atg9ᵃ, hidᵇ, Rab7ᵃ | Autophagy | Gorski et al. (2003); Hou et al. (2008); Yano et al. (2008); Ren et al. (2009)
CG11313ᵃ | Coagulation | Karlsson et al. (2004)
if, sna | Hemocyte migration | Zanet et al. (2009); Siekhaus et al. (2010)
alpha-Man-IIb, Glᵃ, Rac2ᵇ, RhoGEF3ᵇ, Zirᵃ | Melanotic encapsulation of foreign targets | Howell et al. (2012); Mortimer et al. (2012)
crqᵇ, dally, DMAP1ᵃ, Dscamᵇ, pntᵇ, Traf4ᵇ | Phagocytosis | Franc et al. (1999); Watson et al. (2005); Stroschein-Stevenson et al. (2006); Avet-Rochex et al. (2007); Zhu and Zhang (2013)
fz, fz2ᵇ, wgᵇ, wntDᵇ | Wnt-mediated inflammatory cascade | Zhang and Carthew (1998); Gordon et al. (2005); Sen and Ghosht (2008)
Alk, aopᵃ, Egfrᵇ, Nᵃ, Ras85Dᵃ | Hemocyte proliferation | Lebestky et al. (2003); Zettervall et al. (2004)
Antpᵇ, Galeᵇ, lolalᵇ, mRpL53, pnrᵇ, Ser, tupᵇ | Lymph gland development | Lebestky et al. (2003); Mandal et al. (2004); Han and Olson (2005); Mandal et al. (2007); Tao et al. (2007); Mondal et al. (2014)
dpp, gcmᵇ, gcm2ᵇ, htlᵇ, l(3)mbnᵃ, Pvf3ᵇ, ushᵇ | Plasmatocyte differentiation | Konrad et al. (1994); Bernardoni et al. (1997); Lebestky et al. (2000); Fossett et al. (2001); Kammerer and Giangrande (2001); Alfonso and Jones (2002); Cho et al. (2002); Muratoglu et al. (2006); Frandsen et al. (2008); Parsons and Foley (2013)
aptᵇ, Pen, Ptp61Fᵇ, Socs36Eᵇ, Socs44Aᵃ | Repression of lamellocyte differentiation | Kussel and Frasch (1995); Callus and Mathey-Prevot (2002); Rawlings et al. (2004); Baeg et al. (2005); Muller et al. (2005); Sorrentino et al. (2007); Starz-Gaiano et al. (2008); Stec et al. (2013)

ᵃ Genes not coexpressed in embryos according to the Berkeley Drosophila Genome Project in situ database. No mark for genes that were not assayed.

ᵇ Genes coexpressed in embryos with gcm. No mark for genes that were not assayed.

Fly strains and immunolabeling

Flies were raised on standard medium at 25°. The following strains were used: gcmGal4, UASmCD8GFP/CyO, Tb1 (gcmGal4, UASGFP in the text) (Soustelle and Giangrande 2007), y1 v1; P(TRiP.JF01075)attP2 (UASgcmRNAi in the text) (Bloomington B#31519), repoGal80 (gift of B. Altenhein), and enGal4 (Bloomington B#30564). enGal4 flies were crossed with Oregon R flies to generate enGal4/+ or with UASgcm F18A flies (Bernardoni ) to generate enGal4/+; UASgcm/+ flies (Figure 4). gcmGal4, UASmCD8GFP was recombined with repoGal80 to produce the line gcmGal4, UASmCD8GFP, repoGal80/CyO. Overnight lays of Drosophila embryos were used for Figure 4. For Figure 5, Drosophila central nervous systems (CNSs) were dissected and labeled as described previously (Ceron ). The primary antibodies used were rat anti-Ci [1:100; supernatant from the Developmental Studies Hybridoma Bank (DSHB)], mouse anti-Ptc (1:100; supernatant from the DSHB), mouse anti-Smo (1:100; supernatant from the DSHB), rat anti-Elav (1:200; supernatant from the DSHB), chicken anti-GFP (1:1000; Abcam #13970), and rabbit anti-Dh31 [1:500; kindly provided by J. Veenstra (Veenstra ; Veenstra 2009)]. Secondary antibodies conjugated with FITC, Cy3, or Cy5 (Jackson) were used at 1:500. DAPI was used at 100 ng/ml for nuclear counterstaining. Embryos and brains were mounted in VECTASHIELD (Vector) mounting medium and analyzed by confocal microscopy (Leica SP5) using identical settings for controls and mutants (gcm GOF and hypomorph).
Figure 4

Gcm regulates the Hedgehog signaling pathway in the embryonic epidermis. (A–H) Immunolabeling of Ci (in gray), Ptc (in red), and Smo (in green) in stage 15 embryos of the following genotypes: enGal4/+ (control) and enGal4/+; UASgcm/+ (gcm GOF). The areas delimited in white in A, C, E, and G are magnified in B–B″, D–D″, F and F′, and H and H′, respectively. Full projections of the embryos are shown in A, C, E, and G, and projections of four optical sections taken at 2-µm intervals are shown in the magnified regions of B, D, F, and H. DAPI-labeled nuclei are in blue. Note that the three proteins involved in the Hedgehog signaling pathway are expressed at higher levels in the gcm GOF than in the control embryo. Bar, 50 µm. (I and J) Histograms showing the expression of gcm (I), ci, Pka-C1, and rdx (J) in enGal4/+ (control, red) and enGal4/+; UASgcm/+ (blue) embryos at stage 13–14. The y-axis represents expression levels relative to those observed in the control embryos (red columns). Error bars and P-values are calculated as in Figure 3D.

Figure 5

Gcm overlaps with and regulates Dh31 expression in the larval CNS. (A) Histogram representing the endogenous expression levels of Dh31 in S2 cells with and without transfected Gcm. Error bars and P-values are calculated as in Figure 3, A–D. (B–F″) Immunolabeling of Dh31 and GFP in larval CNS (Dh31 in red; GFP in green). White arrowheads indicate cells coexpressing Dh31 and Gcm (GFP+ cells); asterisks indicate cells expressing only Dh31. The gcm-expressing cells correspond to the dorsolateral neuronal cluster. Areas delimited in white in B and D are magnified in C–C″ and E–E″, respectively. (B) Full projection of the CNS of a third instar larva heterozygous for gcmGal4, UASmCD8GFP. (C–C″) Projection of three optical sections taken at 2-µm intervals: (C) overlay of Dh31 and GFP, (C′) Dh31 alone, and (C″) GFP alone. (D–E″) Same as B–C″ in a gcmGal4, UASmCD8GFP homozygous larva. (F–F″) Same as C–C″ in a gcm knockdown (KD) larva of the following genotype: gcmGal4, repoGal80, UASmCD8GFP/+; UASgcmRNAi/+. In all genotypes, the three sections contain the whole dorsolateral cluster (white oval). Note that in gcmGal4 homozygous and gcm KD larvae, the intensity of Dh31 labeling in the GFP+ cells (arrowheads) is reduced compared with that in control gcmGal4 heterozygous animals; compare also with the surrounding Dh31+ cells (asterisks). (B and D) DAPI-labeled nuclei are in blue. Bar, 50 µm.

Gene Ontology (GO) term and protein domain enrichment analysis

The GO term and protein domain enrichment analyses were performed using the DAVID functional classification tool (Huang ,b).
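DAVID reports a modified Fisher exact P-value (the EASE score) for each GO term or protein domain. The sketch below shows only the plain one-sided hypergeometric (Fisher exact) test and the fold enrichment on which such term-enrichment analyses are based; it does not reproduce DAVID's EASE modification or its multiple-testing corrections, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, listed):
    """P(X >= k): chance of seeing at least k term-annotated genes in a list of
    `listed` genes drawn from `population` genes, `annotated` of which carry the term."""
    total = comb(population, listed)
    upper = min(annotated, listed)
    return sum(
        comb(annotated, i) * comb(population - annotated, listed - i)
        for i in range(k, upper + 1)
    ) / total


def fold_enrichment(k, population, annotated, listed):
    """Observed fraction of list genes with the term over the background fraction."""
    return (k / listed) / (annotated / population)
```

For a gene list such as the DamID targets, `population` would be the annotated background genome, `annotated` the genes carrying a term (e.g., an Ig domain), and `k` the targets carrying it.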

Luciferase assay in S2 cells

For CG30002, CycA, E(spl)m8, and ptc (Figure 3, A–D), sense and antisense oligonucleotides covering the GBSs in each gene were synthesized with a KpnI restriction site at the 5′ end and an NheI restriction site at the 3′ end. For each GBS, oligonucleotide pairs were designed with the wild-type (WT) GBS and with a mutated GBS that is not bound by Gcm (mutated at nucleotides 2, 3, 6, and/or 7). In Table 3, the GBSs and restriction sites are indicated by capital letters. For each GBS, the WT and mutant double-stranded probes were prepared as follows: 1 nmol of forward probe and 1 nmol of reverse probe were combined in 10 mM Tris, pH 7.5, 1 mM EDTA, and 50 mM NaCl in a total volume of 100 μl. The mix was incubated for 1 min at 95° in a heating block, and then the heating block was turned off and allowed to cool to 25°. Then 2 μg of annealed oligonucleotides was digested with 20 units of KpnI [New England Biolabs (NEB) #R3142S] and 20 units of NheI (NEB #R3131S) in CutSmart buffer (NEB #B7204S) for 90 min at 37°. The digested double-stranded probes were then cleaned using a PCR Clean-Up Kit [Macherey-Nagel (MN) #740609] according to the manufacturer's instructions.
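Because each probe pair in Table 3 must anneal into a duplex carrying both restriction sites, the design can be checked programmatically. The sketch below (hypothetical helper names) verifies that a reverse oligonucleotide is the exact reverse complement of its forward partner and that KpnI (GGTACC) precedes NheI (GCTAGC), and computes two quantities implied by the protocol: the duplex concentration after combining 1 nmol of each strand in 100 μl, and the approximate amount in 2 μg of a 70-bp duplex, using an assumed average of ~650 g/mol per base pair.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
KPNI_SITE = "GGTACC"
NHEI_SITE = "GCTAGC"


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def check_probe_pair(forward, reverse):
    """True if `reverse` is the exact reverse complement of `forward`
    and the forward strand carries a KpnI site upstream of an NheI site."""
    fwd = forward.upper()
    if revcomp(fwd) != reverse.upper():
        return False
    kpn, nhe = fwd.find(KPNI_SITE), fwd.find(NHEI_SITE)
    return kpn != -1 and nhe != -1 and kpn < nhe


def duplex_concentration_um(nmol_each, volume_ul):
    """Duplex concentration (uM), assuming equimolar strands anneal completely."""
    return nmol_each / volume_ul * 1000


def duplex_pmol(mass_ug, length_bp, g_per_mol_per_bp=650.0):
    """Approximate pmol of dsDNA from mass, using an assumed average mass per bp."""
    return mass_ug * 1e6 / (length_bp * g_per_mol_per_bp)


# CG30002 GBS1 probes copied from Table 3
wt_f = "gagaGGTACCtttgccgaaaaatgttcgggtATGCGGGCcatacaatatccgctgaaacgGCTAGCgaga"
wt_r = "tctcGCTAGCcgtttcagcggatattgtatgGCCCGCATacccgaacatttttcggcaaaGGTACCtctc"
mut_r = "tctcGCTAGCcgtttcagcggatattgtatgCAACGACTacccgaacatttttcggcaaaGGTACCtctc"
print(check_probe_pair(wt_f, wt_r))   # True: matched WT pair
print(check_probe_pair(wt_f, mut_r))  # False: WT forward with mutant reverse
```

With these assumptions, the annealing mix is about 10 μM duplex, and 2 μg of a 70-bp duplex corresponds to roughly 44 pmol.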
Figure 3

Validation of new Gcm targets identified by the DamID screen. (A–D) The left panels represent the loci containing the DamID peaks and the GBSs for CG30002 (A), CycA (B), E(spl)m8 (C), and ptc (D) (as described in Figure 2A), and the histograms on the right (in red) represent the results of the luciferase assays carried out on each GBS. The red bars indicate the enrichment of luciferase activity in the presence of Gcm compared to no transfected Gcm (“no Gcm”); error bars indicate SEM; and n represents the number of independent transfection assays. P-values are indicated as in Figure 1D. (E) Histogram representing the endogenous levels of expression of CycA, E(spl)m8, CG30002, ptc, and repo in S2 cells with and without transfected Gcm. The y-axis is in log10 scale; error bars and P-values are calculated as in A–D. (F) Same as E after FACS sorting of the Gcm+ cells. (G–I) FACS analyses of S2 cells (G), of S2 cells transfected with the Gal4 expression vector ppacGal4 and the GFP reporter UAS-GFP (H), and of S2 cells transfected with ppacGcm and repoGFP (I). The dot plots show the forward scatter area (FSC-A) on the y-axis and the GFP intensity on the x-axis. The area in blue indicates the GFP+ cells that were sorted for further analysis, and the number under the area indicates the percentage of GFP+ cells. (J) Diagram representing the distribution of the DamID targets according to their levels of expression in stage 10–11 embryos (4- to 8-hr embryos in the modENCODE developmental RNA-seq data).

Table 3

Oligonucleotides used to generate the pGL4.23 vectors used in S2 cells

Probe name | Probe sequence
CG30002_GBS1_mutF | gagaGGTACCtttgccgaaaaatgttcgggtAGTCGTTGcatacaatatccgctgaaacgGCTAGCgaga
CG30002_GBS1_mutR | tctcGCTAGCcgtttcagcggatattgtatgCAACGACTacccgaacatttttcggcaaaGGTACCtctc
CG30002_GBS1_wtF | gagaGGTACCtttgccgaaaaatgttcgggtATGCGGGCcatacaatatccgctgaaacgGCTAGCgaga
CG30002_GBS1_wtR | tctcGCTAGCcgtttcagcggatattgtatgGCCCGCATacccgaacatttttcggcaaaGGTACCtctc
CG30002_GBS2_mutF | gagaGGTACCtgtttgtgggcttgttctgaaaAGTCGTTGcagtgggttatgagaacaaaGCTAGCgaga
CG30002_GBS2_mutR | tctcGCTAGCtttgttctcataacccactgCAACGACTtttcagaacaagcccacaaacaGGTACCtctc
CG30002_GBS2_wtF | gagaGGTACCtgtttgtgggcttgttctgaaaATGCGGGAcagtgggttatgagaacaaaGCTAGCgaga
CG30002_GBS2_wtR | tctcGCTAGCtttgttctcataacccactgTCCCGCATtttcagaacaagcccacaaacaGGTACCtctc
cycA_GBS_mutF | gagaGGTACCctccacggccaacttggaattcCAACGACTcagctcatcgaattccgcctGCTAGCgaga
cycA_GBS_mutR | tctcGCTAGCaggcggaattcgatgagctgAGTCGTTGgaattccaagttggccgtggagGGTACCtctc
cycA_GBS_wtF | gagaGGTACCctccacggccaacttggaattcGCCCGCATcagctcatcgaattccgcctGCTAGCgaga
cycA_GBS_wtR | tctcGCTAGCaggcggaattcgatgagctgATGCGGGCgaattccaagttggccgtggagGGTACCtctc
E(spl)m8_GBS_mutF | gagaGGTACCatgctggagcgccagcgacgtCAACGACTgaacaagtgcctggacaacctGCTAGCgaga
E(spl)m8_GBS_mutR | tctcGCTAGCaggttgtccaggcacttgttcAGTCGTTGacgtcgctggcgctccagcatGGTACCtctc
E(spl)m8_GBS_wtF | gagaGGTACCatgctggagcgccagcgacgtGCCCGCATgaacaagtgcctggacaacctGCTAGCgaga
E(spl)m8_GBS_wtR | tctcGCTAGCaggttgtccaggcacttgttcATGCGGGCacgtcgctggcgctccagcatGGTACCtctc
ptc_GBS1_mutF | gagaGGTACCcatacacacacacacacacacCAACGACTcaacacacacacacacacgaaGCTAGCgaga
ptc_GBS1_mutR | tctcGCTAGCttcgtgtgtgtgtgtgtgttgAGTCGTTGgtgtgtgtgtgtgtgtgtatgGGTACCtctc
ptc_GBS1_wtF | gagaGGTACCcatacacacacacacacacacACACGCATcaacacacacacacacacgaaGCTAGCgaga
ptc_GBS1_wtR | tctcGCTAGCttcgtgtgtgtgtgtgtgttgATGCGTGTgtgtgtgtgtgtgtgtgtatgGGTACCtctc
ptc_GBS2_mutF | gagaGGTACCagcaggaagtgcaggatgctaCAACGACTaagtatgagtatcttccccatGCTAGCgaga
ptc_GBS2_mutR | tctcGCTAGCatggggaagatactcatacttAGTCGTTGtagcatcctgcacttcctgctGGTACCtctc
ptc_GBS2_wtF | gagaGGTACCagcaggaagtgcaggatgctaTCCCGCATaagtatgagtatcttccccatGCTAGCgaga
ptc_GBS2_wtR | tctcGCTAGCatggggaagatactcatacttATGCGGGAtagcatcctgcacttcctgctGGTACCtctc
Then 1 μg of luciferase reporter plasmid pGL4.23[luc2/minP] (pGL4.23) (Promega #E841A) was digested with KpnI and NheI as described above; after 90 min at 37°, 20 units of calf intestinal alkaline phosphatase (CIP; Promega #M0290S) was added and the plasmid was incubated for 1 hr at 37°. The plasmid was then cleaned using a PCR Clean-Up Kit (MN #740609).
Then 50 ng of digested luciferase plasmid was combined with the digested annealed probes (plasmid/probe ratio = 1:6), 400 units of ligase (NEB #M0202S), and ligation buffer (NEB #B0202S) and incubated overnight at 18°. The ligated plasmids then were dialyzed for 30 min on membrane filters (Millipore #VSWP02500) and amplified using the Plasmid DNA Purification Kit (MN #740410) according to the manufacturer’s instructions. Transfections of Drosophila S2 cells were carried out in 12-well plates using Effectene transfection reagent (Qiagen #301427) according to the manufacturer’s instructions. Cells were transfected with 0.5 μg pPac-lacZ, 0.5 μg pGL4.23 carrying the indicated GBS, and either 0.5 μg pPac-gcm (Miller ) or 0.5 μg pPac (Krasnow ). Then, 48 hr after transfection, cells were collected, washed once in cold PBS, and resuspended in 100 μl of lysis buffer (25 mM Tris-phosphate, pH 7.8, 2 mM EDTA, 1 mM DTT, 10% glycerol, and 1% Triton X-100). The suspensions were subjected to four freeze-thaw cycles in liquid nitrogen and centrifuged for 30 min at 4° at 13,000 × g. The luciferase and LacZ activities were measured in triplicate for each sample. For LacZ measurements, 20 μl of lysate was mixed with 50 μl of β-galactosidase assay buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgCl2, and 50 mM β-mercaptoethanol) and 20 μl of ortho-nitrophenyl-β-galactoside (ONPG, 4 mg/ml) and incubated at 37° for 20 min. The reaction was stopped by adding 50 μl of 1 M Na2CO3, and the optical density at 415 nm was measured. For luciferase activity, 10 μl of protein lysate was analyzed in an opaque 96-well plate (Packard Instruments #6005290) with a Berthold Microluminat LB96P Luminometer by injecting 50 μl of luciferase buffer (20 mM Tris-phosphate, pH 7.8, 1 mM MgCl2, 2.5 mM MgSO4, 0.1 mM EDTA, 0.5 mM ATP, 0.5 mM luciferin, 0.3 mM coenzyme A, and 30 mM DTT). For both LacZ and luciferase assays, background levels were estimated using lysate from nontransfected S2 cells. 
The relative luciferase levels were calculated as follows: first, the background was subtracted from each value, and the mean of the technical triplicate was calculated. The luciferase activity of each sample was then normalized to its LacZ activity (luciferase activity/LacZ activity) to correct for variability in transfection efficiency, and the ratio of normalized luciferase activity with Gcm to that without Gcm was calculated. For each WT and mutant GBS, biological triplicates were carried out.
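The normalization steps above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the argument layout (a list of technical-triplicate readings per sample and a single averaged background value per assay), and the example readings are all hypothetical.

```python
from statistics import mean


def relative_luciferase(luc_gcm, lacz_gcm, luc_ctrl, lacz_ctrl, luc_bg, lacz_bg):
    """Gcm-dependent fold induction of one GBS reporter (one biological replicate).

    luc_* / lacz_* are lists of technical-triplicate readings for the sample
    cotransfected with pPac-gcm ("gcm") or with empty pPac ("ctrl").
    luc_bg / lacz_bg are background values from non-transfected S2 lysate
    (hypothetical layout: one averaged value per assay).
    """

    def corrected_mean(readings, background):
        # Subtract the background from each reading, then average the triplicate.
        return mean(r - background for r in readings)

    # Normalize luciferase to LacZ to correct for transfection efficiency.
    norm_gcm = corrected_mean(luc_gcm, luc_bg) / corrected_mean(lacz_gcm, lacz_bg)
    norm_ctrl = corrected_mean(luc_ctrl, luc_bg) / corrected_mean(lacz_ctrl, lacz_bg)
    # Ratio of normalized activity with Gcm to that without Gcm.
    return norm_gcm / norm_ctrl


if __name__ == "__main__":
    # Hypothetical readings: luminometer counts and OD415 values.
    fold = relative_luciferase(
        luc_gcm=[1100, 1200, 1300], lacz_gcm=[0.6, 0.6, 0.6],
        luc_ctrl=[300, 300, 300], lacz_ctrl=[0.5, 0.5, 0.5],
        luc_bg=100, lacz_bg=0.1,
    )
    print(f"fold induction: {fold:.2f}")
```

With these made-up readings the script prints `fold induction: 4.40`; one such ratio would be obtained per biological replicate.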

S2 cell FACS and quantitative PCR (qPCR)

S2 cells were plated in six-well plates, 6 million cells per well, in 1.5 ml of Schneider medium supplemented with 10% fetal calf serum (FCS), 0.5% penicillin, and 0.5% streptomycin (PS). Cells were transfected 12 hr after plating using Effectene Transfection Reagent (Qiagen). Briefly, 2 µg of pPac-gal4 vector and 1 µg of pUAS-GFP (negative control), or 2 µg of pPac-gcm (Miller ) and 1 µg of 4.3kb repo-GFP (repoGFP) (Laneve ) (GOF), were mixed with 90 µl of EC buffer and 24 µl of enhancer and incubated for 5 min at room temperature; then 25 µl of Effectene was added, and the mix was incubated at room temperature for 20 min. Then 600 µl of Schneider medium + 10% FCS + 0.5% PS was added to the mix before spreading it on the cells. At 48 hr after transfection, cells were sorted on a BD FACSAria according to GFP expression so that transfected cells made up more than 80% of each sample. The RNA then was extracted using TRI Reagent (Sigma), and 1 µg of RNA per sample was treated with RNase-free DNase I (Thermo Fisher) and reverse transcribed with Superscript II (Invitrogen). qPCR was performed on a LightCycler 480 (Roche) with SYBR master (Roche) on the equivalent of 5 ng of reverse-transcribed RNA with the primer pairs listed in Table S3. Each PCR was carried out in triplicate on at least three biological replicates. The quantity of each transcript was normalized to the quantity of the housekeeping genes () and Actin 5c (Act5c). P-values were calculated by comparing the control with the transfected cells using Student’s t-test (bars = SEM).
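The text does not spell out how transcript quantities were normalized to the two housekeeping genes. One common approach, shown here purely as an assumption, is the comparative Ct (2^−ΔΔCt) method with about 100% amplification efficiency, using the mean Ct of the reference genes (equivalent to the geometric mean of their quantities). The function names and Ct values are hypothetical.

```python
from statistics import mean


def relative_quantity(ct_target, ct_refs):
    """Target quantity relative to the reference genes, assuming 100% PCR efficiency.

    Averaging the reference Cts is equivalent to using the geometric mean
    of the reference-gene quantities.
    """
    delta_ct = ct_target - mean(ct_refs)
    return 2.0 ** (-delta_ct)


def fold_change(ct_target_test, ct_refs_test, ct_target_ctrl, ct_refs_ctrl):
    """Expression in transfected (e.g., GFP-sorted) cells relative to control cells."""
    return relative_quantity(ct_target_test, ct_refs_test) / relative_quantity(
        ct_target_ctrl, ct_refs_ctrl
    )


if __name__ == "__main__":
    # Hypothetical mean Cts: target gene, then the two housekeeping genes.
    print(fold_change(27.0, [20.0, 22.0], 30.0, [20.0, 22.0]))
```

Here the target crosses threshold three cycles earlier in the transfected sample with unchanged references, so the script prints `8.0`. Per-replicate values could then be compared between conditions with Student's t-test, for example with `scipy.stats.ttest_ind`, whose default `equal_var=True` corresponds to the classic Student's test.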

qPCR on gcm-overexpressing embryos

RNA was extracted using TRI Reagent (Sigma) from 50 to 100 enGal4/+ or enGal4/+; UASgcm/+ stage 13–14 embryos (collected at 25° after a 2-hr egg lay followed by 9 hr and 20 min of incubation). Reverse transcription and qPCR were performed in triplicate as described for the S2 cells.

Conversion to human orthologs

The Drosophila RNAi Screening Center (DRSC) integrative ortholog prediction tool (Hu ) was used to retrieve the human orthologs of all Gcm targets identified in Drosophila by the DamID screen. All human genes with a weighted score > 1 were selected.
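Selecting orthologs above the weighted-score cutoff amounts to a simple filter over the prediction results. The sketch below assumes the results have already been parsed into records holding a fly gene, a human gene, and a weighted score; the record layout and the gene identifiers in the example are hypothetical and do not reflect the DRSC output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthologHit:
    fly_gene: str
    human_gene: str
    weighted_score: float


def human_orthologs(hits, min_score=1.0):
    """Map each fly gene to the human genes whose weighted score is strictly above min_score."""
    selected = {}
    for hit in hits:
        if hit.weighted_score > min_score:
            selected.setdefault(hit.fly_gene, set()).add(hit.human_gene)
    return selected


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    hits = [
        OrthologHit("flyGeneA", "HUMAN1", 4.0),
        OrthologHit("flyGeneA", "HUMAN2", 1.0),  # not > 1, so excluded
        OrthologHit("flyGeneB", "HUMAN3", 2.5),
    ]
    print(human_orthologs(hits))
```

Using a strict inequality mirrors the "> 1" criterion stated above; a fly gene with no qualifying hit simply does not appear in the result.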

qPCR in HeLa cells

HeLa cells were plated in six-well plates, 400,000 cells per well, in 1.6 ml of Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 5% FCS and gentamycin. Cells were transfected 12 hr after plating using Effectene Transfection Reagent (Qiagen). Briefly, 1 µg of empty pCIG vector, of pCIG vector expressing mouse Gcm1 (pCIG-mGcm1) (Soustelle ), or of pCIG vector expressing mouse Gcm2 (pCIG-mGcm2) was mixed with 100 µl of EC buffer and 8 µl of enhancer and incubated for 5 min at room temperature; then 10 µl of Effectene was added, and the mix was incubated at room temperature for 20 min. Then 200 µl of DMEM + 5% FCS + gentamycin was added to the mix before spreading it on the cells, and 48 hr after transfection, the RNA was extracted using TRI Reagent (Sigma). Then 1 µg of RNA per sample was treated with RNase-free DNase I (Thermo Fisher) and reverse transcribed with Superscript II (Invitrogen). qPCR was performed on a LightCycler 480 (Roche) with SYBR master (Roche) on the equivalent of 5 ng of reverse-transcribed RNA with the primer pairs listed in Table S3. Each PCR was carried out in triplicate on at least three biological replicates. The quantity of each transcript was normalized to the quantity of the housekeeping genes glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and actin beta (ACTB). P-values were calculated by comparing the control with the transfected cells using Student’s t-test (bars = SEM).

Luciferase assay in HeLa cells

For GCM1 and GCM2 (Figure 9), oligonucleotides flanking the GBSs were designed with restriction sites for KpnI at the 5′ end and for NheI at the 3′ end (in Table 4, the GBSs and restriction sites are indicated by lowercase letters). Each pair of oligonucleotides was used to amplify the genomic region encompassing the GBSs from HeLa genomic DNA using Expand High Fidelity System DNA polymerase (Roche). The amplicons were digested with 20 units of KpnI (NEB #R3142S) and 20 units of NheI (NEB #R3131S) in CutSmart buffer (NEB #B7204S) for 90 min at 37°. The digested amplicons then were cleaned using a PCR Clean-Up Kit (MN #740609) according to the manufacturer’s instructions. Then 1 μg of luciferase reporter plasmid pGL4.23[luc2/minP] (pGL4.23) (Promega #E841A) was digested with KpnI and NheI as described previously. After 90 min at 37°, 20 units of CIP (Promega #M0290S) were added to the plasmid, and the mixture was incubated for 1 hr at 37°. The plasmid then was cleaned using a PCR Clean-Up Kit (MN #740609) according to the manufacturer’s instructions, and 50 ng of digested luciferase plasmid was combined with the digested amplicons (plasmid/probe ratio = 1:6), 400 units of ligase (NEB #M0202S), and ligation buffer (NEB #B0202S) and incubated overnight at 18°. The ligated plasmids then were dialyzed for 30 min on membrane filters (Millipore #VSWP02500) and amplified using the Plasmid DNA Purification Kit (MN #740410) according to the manufacturer’s instructions. These plasmids were used for the luciferase assay (Figure 9) and as templates for mutagenesis. To mutagenize the reporters, primers overlapping the GBSs were designed with mutations at positions 2, 3, 6, and/or 7 of the GBSs (lowercase letters in Table 4).
Figure 9

Conservation of Gcm targets in mammals. (A and B) Schematic representation of the GCM1 (A) and GCM2 (B) loci in humans. The genes are represented as in Figure 2A. GBSs are indicated as bars. The color coding reflects their similarity to the canonical GBSs used in mammals in Yu , from 62.5% (blue) to 100% (red). The genomic coordinates of the loci (genome version GRCh37/hg19) are indicated above the GBSs. Red rectangles indicate the regions used for the luciferase reporter assays, and the mutated sites are indicated by red asterisks. (C and D) Characterization of mGcm1 and mGcm2 effects on GCM1 (C) and GCM2 (D) expression. Histograms represent the endogenous expression levels of transcripts of each gene in HeLa cells upon transfection with mGcm1 or mGcm2; values are relative to those of the housekeeping genes GAPDH and ACTB. (E) Histogram representing the luciferase activity in HeLa cells transfected with mGcm1 or mGcm2 and luciferase reporters containing the region of GCM1 with WT or mutant GBSs. Levels of luciferase activity are relative to those observed in the absence of mGcm transfection. (F) Histogram representing the luciferase activity in HeLa cells transfected with mGcm2 and luciferase reporters containing the region of GCM2 with WT or mutant GBSs, as in E. (G–M) Characterization of mGcm1 and mGcm2 effects on TBX1 (G), GATA3 (H), GATA4 (I), GATA6 (J), FGFR1 (K), FGFR2 (L), and DLL1 (M) expression, as in C and D. (N) Histogram representing the S2 cell endogenous levels of Doc1, pnr, and btl upon transfection with a Gcm expression vector, as indicated in Figure 3D.

Table 4

Oligonucleotides used to generate the pGL4.23 vectors used in HeLa cells

Probe name	Probe sequence
GCM1 F	GAGAggtaccCAGAGCCTGCTGGGACTTGA
GCM1 R	TCTCgctagcTAGCTGGGATTACAGGCACG
GCM1 GBS1mut forward	GAGAggtaccCAGAGCCTGCTGGGACTTGAaaaagacgTAAGATTTTCACGACACAGTGCTGT
GCM1 GBS2mut reverse	GAACAGTAACGATATTGTCTcgtattttGCAAATTTTGTTATAACTAATTGGA
GCM1 GBS2mut forward	TTAGTTATAACAAAATTTGCaaaatacgAGACAATATCGTTACTGTTCAGGGT
GCM1 GBS3mut reverse	TCTCgctagcTAGCTGGGATTACAGGCACGccacaaaaCACCCAGCTAATTTTTGTATTTTCA
GCM2 F	GAGAggtaccCAGGTAAGTGAACCGGGTGT
GCM2 R	TCTCgctagcGGTAGAGACGGGGTTTCTCC
GCM2 GBS1mut forward	GAGAggtaccCAGGTAAGTGAACCGGGTGTggtcgttcACGCGGGGCGCTGTCCATCCGAAGG
GCM2 GBS2mut reverse	GCCTCAGAAACCCAGAAATTtgtcgtttATGTGTGTGTGTGTGTGTGTGTGTG
GCM2 GBS2mut forward	ACACACACACACACACACATaaacgacaAATTTCTGGGTTTCTGAGGCCCTCT
GCM2 GBS3mut reverse	TCTCgctagcGGTAGAGACGGGGTTTCTCCagttttttCAGGCTGGTCTTGAACTCCCGACCT
For each gene, PCR was performed using 5 ng of pGL4.23 containing the WT locus as template and Expand High Fidelity System DNA polymerase (Roche). A first round of PCR generated two amplicons, one containing the first and second mutated GBSs and one containing the second and third mutated GBSs, using the primer pairs GBS1mut forward/GBS2mut reverse and GBS2mut forward/GBS3mut reverse, respectively. The two amplicons then were combined by PCR using the primers GBS1mut forward and GBS3mut reverse, and the product was inserted into pGL4.23. HeLa cells were plated in 24-well plates, 60,000 cells per well, in 350 µl of DMEM supplemented with 5% FCS and gentamycin. Cells were transfected 12 hr after plating using Effectene Transfection Reagent (Qiagen). Briefly, 2.5 ng of pGL4.75 vector, 250 ng of pCIG vector (either empty or expressing mouse Gcm1 (pCIG-mGcm1) (Soustelle ) or mouse Gcm2 (pCIG-mGcm2)), and 250 ng of pGL4.23 vector containing the WT or mutant GBSs were mixed with 60 µl of EC buffer and 4 µl of enhancer and incubated for 5 min at room temperature; then 5 µl of Effectene was added, and the mix was incubated at room temperature for 20 min. Then 100 µl of DMEM + 5% FCS + gentamycin was added to the mix before spreading it on the cells, and 48 hr after transfection, the luciferase assay was performed using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer’s instructions with a Berthold Microluminat LB96P Luminometer.
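The two-round mutagenesis above follows the logic of overlap-extension PCR: the first-round amplicons share the sequence covered by the complementary GBS2mut primers (in Table 4, GCM1 GBS2mut forward and GCM1 GBS2mut reverse are reverse complements over 48 nt centered on the mutated GBS2), and the second round fuses them into one fragment. Below is a string-level sketch of that fusion step, assuming both amplicons are written as top-strand sequences that overlap exactly; the function name and the default minimum overlap are illustrative choices, not parameters from the protocol.

```python
def overlap_join(left: str, right: str, min_overlap: int = 15) -> str:
    """Fuse two top-strand amplicons whose ends overlap, as in overlap-extension PCR.

    Finds the longest suffix of `left` that equals a prefix of `right`
    (case-insensitive, so lowercase mutated bases still match) and returns
    the fused sequence. Raises ValueError if no overlap of at least
    `min_overlap` nt exists (an illustrative threshold).
    """
    left_u, right_u = left.upper(), right.upper()
    for size in range(min(len(left_u), len(right_u)), min_overlap - 1, -1):
        if left_u.endswith(right_u[:size]):
            return left + right[size:]
    raise ValueError("amplicons do not overlap sufficiently")


if __name__ == "__main__":
    # Toy amplicons sharing an 8-nt overlap ("CCCCgggg" / "CCCCGGGG").
    print(overlap_join("AAAACCCCgggg", "CCCCGGGGTTTT", min_overlap=4))
```

The toy example prints `AAAACCCCggggTTTT`. Taking the longest overlap first avoids fusing at a short, spurious repeat, which matters for regions such as the CA-repeat next to GCM2 GBS2.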

Data availability

All data are provided with the publication as Supporting Information.

Results

The DamID screen identifies loci containing GBSs

To identify the genes directly regulated by Gcm, we mapped its binding sites using a genome-wide DamID screen (van Steensel and Henikoff 2000; van Steensel ). Briefly, the Escherichia coli DNA adenine methyltransferase (Dam) was fused to the N terminus of the full-length Gcm coding sequence. Thus, wherever Gcm binds, Dam methylates the surrounding DNA, and the methylated DNA can then be identified by microarray. Here, the Dam-Gcm screen was performed on Drosophila embryos at stage 11, when Gcm expression peaks. Because Gcm is expressed in several cell types, namely glia, hemocytes, and tendon cells (Soustelle ), as well as neuronal (Chotard ; Soustelle and Giangrande 2007) and peritracheal cell subsets (Laneve ), we decided to search for all its direct targets and did not restrict expression of the Dam-Gcm fusion to a specific cell type. Overall, 4863 DamID peaks were identified. Motif enrichment analysis using the MICRA tool (Southall and Brand 2009) revealed enrichment for the motif ATGCGGG at the loci bound by the Dam-Gcm fusion (Figure 1A). This motif is closely related to most of the GBSs previously described and functionally validated (Figure 1B) (Akiyama ; Schreiber ; Miller ; Ragone ). Up to 83% of the loci identified in the screen contain at least one canonical GBS within 1 kb of the peak (Figure 1C), and the average GBS density at the DamID peaks (0.693 GBS/kb) is significantly higher than the average GBS density over the whole genome (0.138 GBS/kb) (P = 1.496 × 10⁻¹⁴⁸; Wilcoxon test = 0) (Figure 1D). Finally, because numerous GBSs are present throughout the genome but may not all be relevant to the Gcm cascade, we asked whether those under a DamID peak are more likely to be directly associated with Gcm. 
Indeed, the GBSs present under DamID peaks are significantly more conserved than the GBSs in the whole genome (12 Drosophila species, mosquito, honeybee, and red flour beetle were used for the comparative analysis), thus adding strength to the DamID data (P = 0.00273) (Figure 1E).
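The density comparison above reduces to counting motif occurrences per kilobase. The sketch below counts exact matches to the enriched motif ATGCGGG on both strands; the exact-match rule, the inclusion of the reverse complement, and allowing overlapping matches are assumptions, since the text does not state how sites were counted, and the Wilcoxon test is not reproduced here.

```python
import re

GBS_MOTIF = "ATGCGGG"  # motif enriched under Dam-Gcm peaks (Figure 1A)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_sites(seq: str, motif: str = GBS_MOTIF) -> int:
    """Count exact motif matches on both strands; the zero-width lookahead allows overlaps."""
    rc = reverse_complement(motif)
    pattern = re.compile(f"(?=({motif}|{rc}))")
    return sum(1 for _ in pattern.finditer(seq.upper()))


def density_per_kb(seq: str, motif: str = GBS_MOTIF) -> float:
    """Motif occurrences per kilobase of sequence."""
    return count_sites(seq, motif) / (len(seq) / 1000)


if __name__ == "__main__":
    # Hypothetical 1-kb region with one forward-strand and one reverse-strand site.
    region = "ATGCGGG" + "A" * 493 + "CCCGCAT" + "A" * 493
    print(count_sites(region), density_per_kb(region))
```

On the toy region this prints `2 2.0`. Applied to the sequences under all DamID peaks and to the whole genome, a count of this kind would give the two densities being compared in the text (0.693 vs. 0.138 GBS/kb), although the exact values depend on the counting rules.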

The DamID screen identifies genes previously characterized as Gcm interactors

The 4863 DamID peaks are located in the vicinity of (<5 kb) or within 1031 genes (Figure 1F shows the overall peak locations). To assess the specificity of the DamID screen, known targets of Gcm were examined more closely. For instance, itself contains several GBSs upstream of its transcription start site (TSS) and is known to autoregulate; the strongest GBS was previously determined to be GBS 3/C, located 3 kb upstream of the TSS (Miller ; Ragone ) (Figure 2A). A DamID peak was detected on top of this GBS (Figure 2A). Other examples include , , , and , which were all identified as interactors in a genetic screen (Popkova ) (Figure S1, A–D). All of them contain at least one significant DamID peak in their promoter regions. Moreover, the gene involved in late glial cell differentiation is controlled directly by Gcm (Granderath ) and has five canonical GBSs, three of which are located within a DamID peak (Figure S1E). Importantly, the three GBSs located under the peak are critical for the expression pattern of in glial cells (Granderath ). The gene , involved in glial development (Chen ; Klambt 1993) and the immune response (Zettervall ), was described as acting downstream of (Giesen ) and contains one canonical GBS within one DamID peak (Figure S1F). Two other genes were extensively described as targets of Gcm during glial cell development: and . Ttk is a transcriptional repressor that inhibits the neuronal fate in neural stem cells (Giesen ). While containing two canonical GBSs within a DamID peak, was not identified as a direct target of Gcm by our screen because the peak is located far (9.2 kb) from the TSS (Figure S1G). The Repo homeodomain transcription factor is required for the late differentiation of lateral glial cells (Campbell ; Xiong ; Halter ) and is directly activated by Gcm through the 11 GBSs present in the 4.3-kb region upstream of the TSS (Lee and Jones 2005). 
However, was not selected in our screen because the observed DamID peak did not pass the enrichment threshold required by the algorithm to be considered significant (Figure S1H). This indicates that the criteria for identifying Gcm direct targets are extremely stringent. Three teams had previously performed genome-wide screens for Gcm downstream targets (Egger ; Freeman ; Altenhein ). Egger compared WT stage 11 embryos with embryos expressing Gcm ectopically in the neuroectoderm (GOF embryos) and identified 356 genes significantly enriched in GOF compared with WT animals (P < 0.001). Freeman combined computational prediction, expression profiling analyses in WT and GOF, and ISH in WT, GOF, and LOF animals to identify and validate 48 genes as downstream targets of Gcm. Finally, Altenhein tracked Gcm downstream targets in stage 9–16 WT, LOF, and GOF embryos and validated 119 genes by ISH. Together these studies identified 471 downstream targets of Gcm, but the overlap between the three data sets is small, with only 42 genes identified by two of the studies and 5 genes identified by all three (Figure 2B). Cross-referencing the three data sets with the DamID peaks allowed us to considerably narrow down the number of targets identified by the expression profiling analyses and revealed that 47 genes identified as downstream targets of Gcm in at least one of these studies are direct targets of Gcm according to DamID (Figure 2, C and D). Of note, in the first part of their study, Freeman developed an algorithm to predict 384 direct targets of Gcm based on the presence of a cluster of eight GBSs in the surrounding regions. Among the predicted targets, only 8.3% (17 of 204 tested) were confirmed in Gcm GOF or Gcm LOF embryos (Freeman ). Cross-referencing the DamID data set with these bioinformatic predictions returned 47 of the 384 genes (12%) predicted to be direct targets by the Freeman algorithm and confirmed as direct targets by our screen. 
Among these 47 genes, only 7 were previously validated in vivo (Table S1). Together with the observed evolutionary conservation of the GBSs under DamID peaks, this underlines the importance of scoring for occupied binding sites. Finally, expression profiling and DamID analyses provide complementary information: the first approach reveals the direct and indirect transcriptional consequences of a mutation, whereas the second reveals where the transcription factor binds in the genome. We therefore asked whether the direct targets identified in our screen behave differently in the expression profiling data. Because direct targets should be induced shortly after Gcm starts being expressed (stage 10), we expected them to be affected earlier than the bulk of the genes in the expression profiling data. We thus analyzed the data of Altenhein and found that for most of the downregulated genes identified in the LOF expression profiling (119 in these data sets), expression starts being affected over a large time window, between stages 11 and 13 (Figure 2E). In contrast, when we performed the same analysis on the subset of genes also detected in the DamID screen (13 genes), we found that most of these targets start to be downregulated at earlier stages (Figure 2F). Together these findings indicate that the DamID screen is an efficient method to identify the direct targets of Gcm.
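The cross-referencing of the three expression-profiling studies and of the DamID list is set arithmetic. As a check on the numbers reported above, reading "42 genes identified by two of the studies" as exactly two: the study sizes sum to 356 + 48 + 119 = 523 entries, and removing the double counts (each of the 42 genes counted once too often, each of the 5 genes found in all three counted twice too often) gives 523 − 42 − 2 × 5 = 471 distinct genes. The sketch below computes the same summaries; the gene lists in the example are hypothetical.

```python
from collections import Counter


def overlap_summary(*studies):
    """Return (distinct genes, genes in exactly two studies, genes in all studies)."""
    counts = Counter(gene for study in studies for gene in set(study))
    exactly_two = sum(1 for c in counts.values() if c == 2)
    in_all = sum(1 for c in counts.values() if c == len(studies))
    return len(counts), exactly_two, in_all


def damid_supported(studies, damid_genes):
    """Genes reported by at least one profiling study that also carry a DamID peak."""
    return set().union(*studies) & set(damid_genes)


if __name__ == "__main__":
    # Hypothetical gene lists standing in for the three published screens.
    study_a = {"g1", "g2", "g3"}
    study_b = {"g3", "g4"}
    study_c = {"g3", "g4", "g5"}
    print(overlap_summary(study_a, study_b, study_c))
    print(sorted(damid_supported([study_a, study_b, study_c], {"g2", "g4", "g9"})))
```

The toy example prints `(5, 1, 1)` and `['g2', 'g4']`: five distinct genes, one shared by exactly two studies, one shared by all three, and two profiled genes with DamID support.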

The DamID screen identifies new direct targets of Gcm

Among the 1031 genes identified by the DamID screen, more than 900 are new. The interaction between Gcm and four new target genes was validated by luciferase assays in Drosophila S2 cells. These genes were selected to represent the different positions of the DamID peaks relative to the genes. They include genes with a DamID peak in the promoter carrying canonical GBSs, such as ; genes for which the closest GBSs are near the DamID peak but do not overlap with it, such as ; and genes for which the DamID peak and the GBSs are located within the transcribed region of the gene, such as , Enhancer of split m8 (), and . For each gene, the regions containing the GBS under the DamID peak, or the GBS closest to the DamID peak, were cloned into a luciferase reporter plasmid. For DamID peaks covering two GBSs ( and ), one reporter was built per GBS. The constructs then were transfected into S2 cells with or without the Gcm expression vector ppacGcm. In parallel, the same regions were mutated for their GBSs and analyzed similarly (Figure 3, A–D). The gene contains a significant DamID peak in its promoter region and two GBSs at the position of the peak (Figure 3A). The luciferase assays indicate that both GBSs induce transcription of the reporter upon cotransfection with Gcm, and no induction is observed when the GBSs are mutated. Similar observations were made for , , and , even though the DamID peaks are located within the coding sequences of these genes (Figure 3, B–D). To confirm the data obtained with the reporter plasmids, we analyzed the effects of Gcm on the endogenous genes by measuring their transcript levels in S2 cells by qPCR. S2 cells were transfected with ppacGcm because Gcm is expressed at extremely low levels in those cells (Cherbas ). We were able to show significant induction of and expression, but no induction was observed for or (Figure 3E). 
Such a negative result might be due to the absence of appropriate cofactors in S2 cells, a repressed chromatin state of the genes, and/or the low transfection efficiency of S2 cells. Indeed, FACS analysis of S2 cells transfected with the Gal4 expression vector ppacGal4 and UAS-GFP vectors revealed that only 4.92% of the cells express GFP, meaning that only a minority of the cell population contains both plasmids (Figure 3, G and H). To improve the readout of the assay, S2 cells were transfected with ppacGcm and the reporter plasmid repoGFP, which was used previously to trace Gcm activity (Lee and Jones 2005; Laneve ; Flici ). This allowed us to sort the cells that express GFP under the control of the promoter. Based on the FACS analysis, our transfection protocol allows the detection of GFP in 2.11% of S2 cells (Figure 3I). These GFP+ cells were enriched to at least 80% purity, and the target genes described above were analyzed by qPCR. First, we monitored the levels of endogenous transcripts to assess the efficiency of our protocol. Without FACS sorting, no change in levels was observed upon Gcm expression, whereas a 30-fold increase was seen after adding the FACS sorting step (Figure 3F). This step also allowed us to detect the induction of and expression (Figure 3F) and greatly improved the detection of and transcript induction. Finally, we tested DamID target genes in vivo. Because is strongly required in the epidermis at the level of muscle attachment sites, where Gcm is also expressed and required, we drove epithelial Gcm expression using the engrailedGal4 (enGal4) driver (Figure 4I), which induces expression of tendon cell markers (Soustelle ). Because several other members of the Hh pathway also were identified in the DamID screen, including , , , and , we analyzed these genes as well. First, we performed qPCR assays and found increased expression for some of them ( and ) (Figure 4J). 
To complement this approach, we performed qualitative analyses by immunolabeling Gcm-overexpressing embryos and found markedly increased expression of Ptc, Ci, and Smo (Figure 4, A–H′). In agreement with previous data (Soustelle ), Repo expression did not increase, likely owing to the lack of cell-specific factors, which are known to strongly affect Gcm activity (De Iaco ). Conversely, Repo expression increases upon overexpression of Gcm in the neurogenic region, whereas that of Ci, Ptc, and Smo does not (data not shown). In sum, the in vivo findings validate those of the S2 cell transfection assays, which thus provide a simple and sensitive approach.

Figure 4

Gcm regulates the Hedgehog signaling pathway in the embryonic epidermis. (A–H) Immunolabeling of Ci (in gray), Ptc (in red), and Smo (in green) in stage 15 embryos of the following genotypes: enGal4/+ (control) and enGal4/+; UASgcm/+ (gcm GOF). The areas delimited in white in A, C, E, and G are magnified in B–B″, D–D″, F and F′, and H and H′, respectively. Full projections of the embryos are shown in A, C, E, and G, and projections of four optical sections taken at 2-µm intervals are shown in the magnified regions B, D, F, and H. DAPI-labeled nuclei are in blue. Note that the three proteins involved in the Hedgehog signaling pathway are expressed at higher levels in the gcm GOF than in the control embryo. Bar, 50 µm. (I and J) Histograms showing the expression of gcm (I), ci, Pka-C1, and rdx (J) in enGal4/+ (control, red) and enGal4/+; UASgcm/+ (blue) embryos at stage 13–14. The y-axis represents the relative expression levels compared to those observed in the control embryos (red columns). Error bars and P-values are calculated as in Figure 3D.

Moreover, the targets identified by the DamID screen are not necessarily expressed in embryos at stage 11, the stage at which the screen was performed. 
Indeed, comparison of DamID and modENCODE transcriptome data on stage 10–11 embryos reveals that 8.7% of the genes identified in our screen are not expressed at this stage (Figure 3J). For example, the () gene is not detected in embryos by ISH (Tomancak , 2007; Hammonds ) and starts to be expressed at stage 17 according to modENCODE data (Graveley ). Nevertheless, the locus contains two DamID peaks, and Gcm induces Dh31 expression in S2 cells (Figure 5A). We therefore analyzed later developmental stages and found colocalization of Dh31 and Gcm in a single cell of the larval brain hemisphere (Figure 5, B–C″). At that stage, Gcm is expressed in the de novo–produced glial cells of the optic lobe, in the lamina, and in medulla neurons (Chotard ), as well as in two groups of neurons of the central brain, the so-called dorsolateral and medial clusters (Soustelle ). The Gcm+/Dh31+ cell belongs to the dorsolateral cluster, and colocalization is affected in mutant animals. The gcmGal4, UASGFP line allowed us to trace expression in WT and hypomorphic conditions, the latter obtained using gcmGal4 homozygous animals or heterozygous animals carrying a UASgcmRNAi construct (Figure 5, D–F). These data strongly suggest that the Dam construct is present and can bind sites in cells in which Gcm and/or its targets are not yet expressed. In sum, we have shown that S2 cells can be used to validate direct targets of Gcm after cell sorting, that the identified targets are actually induced by Gcm in vivo, and that the Dam-Gcm fusion likely identifies most Gcm direct targets in the fly genome.

Figure 5

Gcm overlaps with and regulates Dh31 expression in the larval CNS. (A) Histogram representing the endogenous expression levels of Dh31 in S2 cells with and without transfected Gcm. Error bars and P-values are calculated as in Figure 3, A–D. (B–F″) Immunolabeling of Dh31 and GFP in the larval CNS (Dh31 in red; GFP in green). 
White arrowheads indicate cells coexpressing Dh31 and Gcm (GFP+ cells); asterisks indicate cells expressing only Dh31. The gcm-expressing cells correspond to the dorsolateral neuronal cluster. Areas delimited in white in B and D are magnified in C–C″ and E–E″, respectively. (B) Full projection of the CNS of a third-instar larva heterozygous for gcmGal4, UASmCD8GFP. (C–C″) Projection of three optical sections taken at 2-µm intervals: (C) overlay of Dh31 and GFP, (C′) Dh31 alone, and (C″) GFP alone. (D–E″) Same as B–C″ in a gcmGal4, UASmCD8GFP homozygous larva. (F–F″) Same as C–C″ in a gcm knockdown (KD) larva of the following genotype: gcmGal4, repoGal80, UASmCD8GFP/+; UASgcmRNAi/+. In all genotypes, the three sections contain the whole dorsolateral cluster (white oval). Note that in the gcmGal4 homozygous and gcm KD larvae (arrowheads), the intensity of Dh31 labeling in the GFP+ cell is reduced compared to that observed in the control gcmGal4 heterozygous animals; compare also with the surrounding Dh31+ cells (asterisks). (B and D) DAPI-labeled nuclei are in blue. Bar, 50 µm.

Protein domain enrichment analysis

To annotate the genes identified in the DamID screen, we first performed a protein domain enrichment analysis using DAVID bioinformatics resources (Huang ,b). The analysis showed enrichment for genes coding for proteins containing basic helix-loop-helix (bHLH) domains (17 genes) and homeobox domains (26 genes), which are characteristic of transcription factors, but the most enriched family is that of Ig domain–containing proteins (DAVID protein domain enrichment: 7.79-fold; P = 1.3 × 10⁻¹⁴; FDR = 2.0 × 10⁻¹¹) (Figure 6A). Most of the Ig genes are involved in, or at least expressed during, nervous system development (Figure 6B) and code for guidance molecules [reviewed in Patel and Van Vactor (2002)]. These include two fibroblast growth factor receptors (Htl and Btl), two netrin receptors (Unc-5 and Fra), six members of the Beaten path family (Beat), three members of the Down syndrome cell adhesion molecule family (Dscam), two fasciclins (Fas), three roundabout proteins (Robo), and several others, including seven members of the defective proboscis extension response family (Dpr) and two Dpr interactors (Ozkan ) (Figure 6B and Table 1). The dpr family of chemoreceptor genes has been poorly characterized so far, but these genes are also expressed in glial cells (DeSalvo ), suggesting that Gcm may regulate them during gliogenesis.
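The fold enrichment reported by DAVID compares the fraction of list genes carrying an annotation with the fraction of background genes carrying it. The sketch below computes that ratio and a one-sided Fisher exact (hypergeometric upper-tail) P-value from raw counts. It approximates rather than reimplements DAVID: DAVID reports a more conservative modified Fisher test (the EASE score) and multiple-testing corrections such as the FDR quoted above, and the counts in the example are hypothetical.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): k of n list genes carry the annotation, K of N background genes do."""
    return (k / n) / (K / N)


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher exact test: P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 1000 list genes vs. 200 of 14000 background genes.
    print(round(fold_enrichment(20, 1000, 200, 14000), 2))
    print(enrichment_p_value(20, 1000, 200, 14000))
```

Exact integer arithmetic with `math.comb` keeps the tail sum precise even for genome-sized counts; the statistics quoted in the text (for example, 7.79-fold and P = 1.3 × 10⁻¹⁴ for Ig domains) come from DAVID itself, not from this sketch.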
Figure 6

Gcm targets encode Ig domain proteins. (A) Histogram summarizing the protein domain enrichment analysis. The x-axis indicates the enrichment score; the shade of gray represents the P-value: lightest gray, P = 10⁻⁵; black, P = 10⁻¹⁴. The y-axis indicates the name of the protein domain; n indicates the number of genes in the DamID screen containing that domain. (B) List of genes from the DamID screen containing immunoglobulin (Ig) domains. The genes indicated in green are known to be involved in nervous system development; the genes in blue are expressed in the nervous system. (C) Histogram showing the endogenous expression of Ig domain–containing genes upon S2 cell transfection with a Gcm expression vector and FACS sorting. The y-axis represents the relative expression levels in cells transfected with Gcm compared to cells without Gcm. The y-axis is in log10 scale; error bars and P-values are calculated as in Figure 3D.

To confirm induction of this class of genes by Gcm, the levels of expression of 12 of them were assayed in S2 cells in basal conditions or on transfection with a Gcm expression vector (Figure 6C). The expression of 8 genes of 12 is significantly induced by Gcm in S2 cells, the strongest induction being observed for (>100-fold increase) (Figure 6C).

GO term enrichment analysis

Gcm direct targets are involved in nervous system development:

Following protein domain enrichment analysis, we carried out a GO term enrichment analysis for biological function using DAVID. The analysis retrieved 230 genes involved in nervous system development (8.8-fold enrichment; P = 2.4 × 10−12; FDR = 3.9 × 10−9). This subset of genes was then further analyzed using an enrichment analysis for molecular function. As expected for a cell fate determinant, the major class of genes regulated by Gcm and involved in nervous system development consists of transcription factors (67 genes; 6.7-fold enrichment; P = 1.0 × 10−32; FDR = 1.3 × 10−29) (list in Table S1, column J). More specifically, we found genes involved in (1) neural stem cell (also called neuroblast) regulation, (2) embryonic glial cell development, and (3) larval optic lobe development (Figure 7A and Table 1).
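The fold-enrichment values above compare how often a GO term occurs among the DamID targets with how often it occurs in the background genome. A minimal sketch of that ratio, assuming DAVID's usual definition (list hits / list total) / (population hits / population total). The function name and the counts in the example are hypothetical and do not reproduce the numbers reported here; the P-values, which DAVID derives from a modified Fisher exact test, are not computed.

```python
from fractions import Fraction


def fold_enrichment(list_hits: int, list_total: int,
                    pop_hits: int, pop_total: int) -> float:
    """Fold enrichment, assuming DAVID's definition:
    (list_hits / list_total) / (pop_hits / pop_total).
    """
    if min(list_total, pop_hits, pop_total) <= 0:
        raise ValueError("totals and background hits must be positive")
    # Exact rational arithmetic avoids rounding noise before the final conversion.
    ratio = Fraction(list_hits, list_total) / Fraction(pop_hits, pop_total)
    return float(ratio)


# Hypothetical example: 50 of 1,000 submitted genes carry a term that
# annotates 200 of 20,000 background genes.
print(fold_enrichment(50, 1000, 200, 20000))  # 5.0
```

A term is enriched when the ratio exceeds 1, i.e., when it is more frequent in the target list than expected from the background.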
Figure 7

Gcm targets involved in nervous system development and function. (A) List of DamID genes involved in neural development and function. The genes in red were previously characterized as downstream targets of Gcm, and those underlined were confirmed by qPCR in FACS-sorted S2 cells in this study. (B) Schematic representation of E(spl)-C. The top panel shows the DamID peak histogram over the 0.8-Mb region surrounding E(spl)-C; note that all peaks are located within E(spl)-C. The bottom panel shows the 50-kb E(spl)-C window. (C) Histogram representing the S2 cell endogenous levels of E(spl)-C transcripts upon transfection with a Gcm expression vector, as indicated in Figure 3, A–D.

As many as 34 target genes regulate neural stem cells, including genes that likely allow the transition from stem cell to glial identity. Interestingly, Pros was also identified as a positive regulator of Gcm (Freeman and Doe 2001; Ragone ; Choksi ), and both Brat and Lola interact genetically with Gcm (Popkova ). These data suggest the presence of feedback loops in the establishment of glial fate. Numerous targets are directly linked to glial cell development, as expected given the gliogenic role of Gcm. For example, Hkb controls glial subtype specification by reinforcing Gcm autoregulation in a specific glial lineage and interacts with Gcm both genetically and biochemically (De Iaco ; Popkova ). In addition, several target genes are required for longitudinal glia precursor division (Figure 7A and Table 1). The targets involved in optic lobe development are in line with the finding that Gcm is necessary for both neuronal and glial cell development within the larval optic lobe (Chotard ; Yoshida ; Soustelle ). Of note, one of these targets, Tll, was recently shown to be necessary for specification of lamina neuronal precursors, and its mutant phenotype resembles that caused by the lack of Gcm (Guillermin ).
Among the targets, we identified several members of signaling pathways that control neural development at different steps: Hh, Egfr/Ras, and Fat/Hippo pathways (, , , , , , , , and ) and, finally, the N pathway (, , , , , and eight genes of the E(spl) complex, or E(spl)-C). Because these genes are clustered, we further validated the members of E(spl)-C, a complex that spans 50 kb on the right arm of chromosome 3 (Figure 7, B and C). All of its members are induced in FACS-sorted S2 cells transfected with Gcm, whereas the gene directly adjacent to the complex, , is devoid of a DamID peak and is not induced, indicating that Gcm activity is specific to the complex (Figure 7C). and also were validated (Figure 8B).
Figure 8

Gcm targets involved in immune system development and function. (A) List of the DamID genes. Green area covers those involved in immune system function; red area covers those involved in immune system development. The genes in red were previously characterized as downstream targets of Gcm, and those underlined were confirmed by qPCR in FACS-sorted S2 cells in this study. (B) Histogram representing the S2 cell endogenous levels of genes involved in immune system development on transfection with a Gcm expression vector, as indicated in Figure 3, A–D.

Interestingly, several genes targeted by Gcm are involved in the function of fully differentiated glia. Our screen identified several genes acting in the blood-brain barrier (BBB) (Figure 7A and Table 1). Comparing our screen with transcriptome data from the BBB shows that Gcm targets 61 genes enriched in the BBB relative to all glial cells (Table S1, column L; DeSalvo ; Limmer ). Gcm also targets several genes controlling axon ensheathment and glial cell migration (Figure 7A and Table 1). In sum, the screen sheds light on the molecular role and mode of action of Gcm in neural development and function.

Gcm direct targets are involved in immune system development:

In addition to its role in the nervous system, Gcm is also required for differentiation and proliferation of embryonic plasmatocytes. Embryos mutant for and its paralog present a decreased number of plasmatocytes, and the plasmatocytes do not complete the differentiation process (Bernardoni ; Alfonso and Jones 2002) [reviewed in Kammerer and Giangrande (2001), Evans and Banerjee (2003), and Waltzer ]. Plasmatocytes are macrophages that can differentiate into another type of hemocyte called a lamellocyte on immune challenge (Rizki 1957; Stofanko ), a process that Gcm helps to repress (Jacques ). The DamID screen identified 68 genes known to regulate the immune system (Figure 8A, Table 2, and Table S1, column M), and similar to the nervous system, transcription factors are quite prominent (14 transcription factors) (Table S1, column J). In addition to itself and (Bernardoni ; Kammerer and Giangrande 2001; Alfonso and Jones 2002), Gcm targets genes that promote the differentiation of prohemocytes into plasmatocytes (Figure 8A and Table 2). Several targets inhibit the JAK/STAT pathway, whose activation leads to lamellocyte differentiation (Figure 8A and Table 2). and inhibit the formation of lamellocytes and so-called melanotic tumors, which are masses of aggregated hemocytes enriched in lamellocytes (Kussel and Frasch 1995; Sorrentino ). In addition, Gcm targets genes characterized by their role in hemocyte proliferation or migration (Figure 8A and Table 2). Similar to the nervous system, Gcm also targets genes that are involved in function of the mature immune system. These include genes involved in (1) coagulation, (2) phagocytosis, (3) autophagy, (4) the inflammatory cascade mediated by Wnt, and (5) melanotic encapsulation of foreign targets (Figure 8A and Table 2). Finally, DamID targets also include genes that tune the immune response based on the nature of the pathogen. 
Thus, according to the screen, Gcm targets genes involved in the response to fungi, in the response to gram-negative bacteria, and, more broadly, in the antimicrobial humoral response (Figure 8A and Table 2). In addition, we assayed 11 genes involved in hemocyte biology in S2 cells transfected with a Gcm expression vector and confirmed the induction of 10 of them (Figure 8B).

Gcm direct targets are involved in tendon cell and peritracheal cell development:

Tendon cells link muscles to the exoskeleton of the organism, and Gcm expression in these cells is required for proper muscle attachment (Soustelle ). Several Gcm targets are involved in the maturation of tendon cells, such as Hh signaling pathway components, Wg, and Egfr. The Hh signaling pathway and Wg are necessary in the early development of tendon cells (Hatini and DiNardo 2001), whereas the Egfr pathway promotes terminal differentiation after the junction between the migrating muscle and the tendon has been established (Yarnitzky ). Other targets expressed in tendon cells are directly involved in muscle migration toward the tendon. Sli and Sdc are guidance cues that attract the muscle (Kramer ; Steigemann ). Leucine-rich tendon-specific protein (Lrt) interacts with Robo expressed in the muscle to arrest muscle migration (Wayburn and Volk 2009). Dnt and Hbs control muscle attachment site selection (Dworak ; Lahaye ). A third class of targets controls the later step of building the junction between muscles and tendons. Mew, If, Dys, and Wech are core components of the junction (Prokop ; Loer ) [reviewed in Charvet ]; Short stop (Shot) is an actin-tubulin cross-linker involved in junction stabilization (Bottenberg ), and Sema-5C is a transmembrane protein so far poorly characterized in tendon cells (Bahri ). Finally, Gcm is expressed in peritracheal cells (Laneve ), endocrine cells located along the trachea that secrete ecdysis-triggering hormone (O’Brien and Taghert 1998). Only two markers of peritracheal cells have been characterized so far: the bHLH transcription factor Dimm (Hewes ) and the neuropeptide biosynthetic enzyme peptidylglycine-α-hydroxylating monooxygenase (Phm) (O’Brien and Taghert 1998). Gcm acts upstream of (Laneve ), and is known to control the expression of (Park ), but neither nor is present in the DamID screen, suggesting that Gcm does not directly induce these genes in peritracheal cells.
While little is known about peritracheal cells, and were further characterized in neuroendocrine cells: 212 downstream targets of Dimm were recently identified by combining ChIP and transcriptome analyses (Hadzic ). Comparison of this data set with the DamID screen revealed 27 targets common to Dimm and Gcm (Table S1), suggesting a potential contribution of Gcm to the Dimm regulatory pathway. In addition, Gcm targets a gene involved in the endocrine function of Dimm+ cells: syt-β, which is controlled by Dimm and is involved in calcium-dependent exocytosis (Adolfsen ; Park ). To conclude, Gcm is necessary for the development of peritracheal cells expressing Dimm, and our screen identified genes involved in the Dimm pathway. This sets the stage for future analysis of these genes in peritracheal cells. Overall, our screen reveals the breadth of Gcm function in the diverse cell types in which this transcription factor is expressed.

Conservation of the Gcm molecular cascade in mammals

The mammalian genome contains two orthologs of Drosophila gcm genes, named GCM1 and GCM2 in humans. GCM1 is required for differentiation of trophoblasts in the developing placenta (Basyuk , 2009); GCM2 is expressed and required mainly in the parathyroid glands (Kim ; Gordon ). Because few targets have been identified for GCM1 and GCM2, we searched for a conserved Gcm regulatory cascade by retrieving the mammalian orthologs of the Drosophila targets identified in the DamID screen (Table S1). To start a comparative analysis, we chose genes with functional significance in mammals. A GO term enrichment analysis identified orthologs associated with parathyroid gland or placenta development, which allowed us to restrict the list to 29 genes potentially targeted by GCM genes in mammals (Table S2). We further analyzed the impact of murine Gcm genes on the expression of GCM1 and GCM2; the T-box transcription factor TBX1; GATA3, GATA4, and GATA6; FGFR1 and FGFR2; and Delta-like 1 (DLL1).

GCM genes regulate their own expression

GCM1 and GCM2 contain multiple GBSs in their promoters (Figure 9, A and B). This suggests the existence of a positive-feedback loop that has not been documented in mammals but is well characterized for Drosophila gcm genes (Miller ; Ragone ). To validate the autoregulation of GCM1 and GCM2 in mammals, we monitored their expression levels in HeLa cells transfected with expression vectors for mGcm1 and mGcm2. Using the mouse orthologs allowed us to design qPCR primers specific for the human transcripts and thus to quantify endogenous expression selectively. This set of experiments shows that in HeLa cells, GCM1 expression is induced by both mGcm1 and mGcm2 (Figure 9C), whereas GCM2 is induced only by mGcm2 (Figure 9D).

Figure 9

Conservation of Gcm targets in mammals. (A and B) Schematic representation of the GCM1 (A) and GCM2 (B) loci in humans. The genes are represented as in Figure 2A. GBSs are indicated as bars. The color code indicates their similarity to the canonical GBSs used in mammals in Yu , from 62.5% (blue) to 100% (red). The genomic coordinates of the loci (genome version GRCh37/hg19) are indicated above the GBSs. Red rectangles indicate the regions used for the luciferase reporter assays, and the mutated sites are indicated by red asterisks. (C and D) Characterization of mGcm1 and mGcm2 effects on GCM1 (C) and GCM2 (D) expression. Histograms represent the endogenous expression levels of transcripts of each gene in HeLa cells upon transfection with mGcm1 or mGcm2; values are relative to those of the housekeeping genes GAPDH and ACTB. (E) Histogram representing the luciferase activity in HeLa cells transfected with mGcm1 or mGcm2 and luciferase reporters containing the region of GCM1 with WT or mutant GBSs. Levels of luciferase activity are relative to those observed in the absence of mGcm transfection. (F) Histogram representing the luciferase activity in HeLa cells transfected with mGcm2 and luciferase reporters containing the region of GCM2 with WT or mutant GBSs, as in E. (G–M) Characterization of mGcm1 and mGcm2 effects on TBX1 (G), GATA3 (H), GATA4 (I), GATA6 (J), FGFR1 (K), FGFR2 (L), and DLL1 (M) expression, as in C and D. (N) Histogram representing the S2 cell endogenous levels of Doc1, pnr, and btl upon transfection with a Gcm expression vector, as indicated in Figure 3D.

To demonstrate that the induction of GCM1 and GCM2 expression is mediated by the GBSs, we arbitrarily selected promoter fragments containing three GBSs with at least 75% similarity to the canonical GBS ATG(A/C)GGG(T/C) (Yu ). These regions were cloned into luciferase reporters (Figure 9, A and B, red). For each region, we also built a reporter carrying the mutated GBSs. The reporters were then transfected into HeLa cells with or without mGcm1 or mGcm2. For GCM1, we observed induction by mGcm1 but not by mGcm2, suggesting that the region inserted in the reporter allows for Gcm1-mediated induction. However, these three GBSs are not required for mGcm1-mediated induction, because mutating them leaves it unaffected (Figure 9E). Further scrutiny of the region revealed two other GBSs with <75% similarity to the canonical GBS but still containing nucleotides 2, 3, 6, and 7, which were determined to be indispensable for Gcm binding (Schreiber ). For GCM2, we showed that mGcm2 induces expression of the luciferase reporter carrying the GCM2 promoter region and that this region is inactive when the canonical GBSs are mutated (Figure 9F). This demonstrates that GCM2 regulates its own expression via the GBSs inserted in the luciferase reporter. To conclude, mGcm1 and mGcm2 induce GCM1 expression via a region that remains to be defined, and mGcm2 activates GCM2 transcription via a region covering the first exon-intron junction.
These experiments demonstrate that positive autoregulation is conserved between Drosophila and mammalian genes.
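The 75% similarity threshold can be made concrete. The canonical GBS ATG(A/C)GGG(T/C) has 8 positions, so at least 75% similarity means at least 6 matching positions, and the 62.5% lower bound of the Figure 9 color scale corresponds to 5. The sketch below scores 8-mers this way and flags whether positions 2, 3, 6, and 7 are intact. Scoring similarity as the fraction of matching positions and scanning both strands are our assumptions, not a documented description of the pipeline used in this study; all function and constant names are hypothetical.

```python
from typing import List, Tuple

# Canonical GBS ATG(A/C)GGG(T/C): accepted bases at each of the 8 positions.
CANONICAL_GBS = ["A", "T", "G", "AC", "G", "G", "G", "TC"]
# 1-based positions reported as indispensable for Gcm binding.
CORE_POSITIONS = (2, 3, 6, 7)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def similarity(kmer: str) -> float:
    """Fraction of the 8 positions whose base is accepted by the canonical GBS."""
    matches = sum(base in allowed for base, allowed in zip(kmer, CANONICAL_GBS))
    return matches / len(CANONICAL_GBS)


def has_core(kmer: str) -> bool:
    """True if positions 2, 3, 6 and 7 all match the canonical GBS."""
    return all(kmer[p - 1] in CANONICAL_GBS[p - 1] for p in CORE_POSITIONS)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def scan_gbs(seq: str, min_similarity: float = 0.75) -> List[Tuple[int, str, float, bool]]:
    """Return (0-based start, strand, similarity, core_intact) for each candidate site."""
    seq = seq.upper()
    k = len(CANONICAL_GBS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: a site may lie on either strand, so both are scored.
        for strand, kmer in (("+", window), ("-", reverse_complement(window))):
            score = similarity(kmer)
            if score >= min_similarity:
                hits.append((i, strand, score, has_core(kmer)))
    return hits


print(scan_gbs("ccATGAGGGTcc"))  # [(2, '+', 1.0, True)]
print(similarity("ATGAGAAT"), has_core("ATGAGAAT"))  # 0.75 False
```

The second call shows why similarity alone is not enough: a site can pass the 75% cutoff while missing core positions 6 and 7, and conversely a site below 75% can retain all four core positions, as for the two additional GBSs found in the GCM1 region.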

Other DamID targets are conserved in mammals

TBX1 was shown previously to be coexpressed with GCM2 during formation of the parathyroid glands (Manley ; Reeh ) and contains 46 GBSs in its promoter. To assess whether the GCM transcription factors are able to induce TBX1 expression in mammals, we analyzed the levels of TBX1 transcripts in HeLa cells transfected with expression vectors coding for mouse orthologs mGcm1 or mGcm2. This assay indicated that expression of TBX1 is specifically induced by mGcm2 (Figure 9G). The three GATA transcription factors GATA-3, GATA-4, and GATA-6 control numerous developmental processes in mammals [reviewed in Molkentin (2000), Cantor and Orkin (2005), Zaytouni , and Chlon and Crispino (2012)]. The three genes contain several GBSs in their promoters, and expression of both GATA3 and GATA6 is induced by mGcm2 in HeLa cells (Figure 9, H–J). FGFR1 and FGFR2 are the mammalian orthologs of the DamID targets and . FGFRs are widely described for their roles in angiogenesis, cancer development, and organogenesis [reviewed in Bates (2011), Kelleher , and Katoh and Nakagama (2014)]. They are also required for building the placental vascular system and are expressed in trophoblasts (Anteby ; Pfarrer ). Both genes contain GBSs in their promoters. In HeLa cells, mGcm1 induces the expression of FGFR1, and mGcm2 induces the expression of both FGFR1 and FGFR2 (Figure 9, K and L). Finally, DLL1 is one of the ligands of the N receptor (Shimizu ). The N signaling pathway plays a critical role in cell fate determination [reviewed in Artavanis-Tsakonas and Schwanbeck ], including trophoblast development (Zhao and Lin 2012; Rayon ; Massimiani ), and DLL1 is expressed in trophoblasts (Herr ; Gasperowicz ). In humans, the DLL1 promoter contains GBSs, and DLL1 expression is induced specifically by mGcm1 in HeLa cells (Figure 9M).
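These HeLa measurements, like the S2 assays, report transcript levels normalized to housekeeping genes (GAPDH and ACTB for HeLa cells) and compared with cells not transfected with Gcm. The text does not state the quantification model, so the sketch below assumes the common 2^(-ddCt) method with roughly 100% amplification efficiency; the function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Target Ct minus the mean Ct of the reference genes (e.g. GAPDH, ACTB).

    Averaging Ct values is equivalent to taking the geometric mean of the
    reference-gene quantities, assuming roughly 100% amplification efficiency.
    """
    if not reference_cts:
        raise ValueError("at least one reference gene is required")
    return target_ct - mean(reference_cts)


def relative_expression(treated_target: float, treated_refs: Sequence[float],
                        control_target: float, control_refs: Sequence[float]) -> float:
    """2**(-ddCt): target level in transfected cells relative to control cells."""
    ddct = delta_ct(treated_target, treated_refs) - delta_ct(control_target, control_refs)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: after transfection the target crosses the threshold
# 4 cycles earlier while the reference genes are unchanged (16-fold induction).
print(relative_expression(26.0, [18.0, 20.0], 30.0, [18.0, 20.0]))  # 16.0
```

Taking log10 of the returned ratio gives the scale used on the y-axis of the S2 histograms; a value of 1.0 means no induction.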

The Drosophila orthologs of mammalian target genes work in pathways known to depend on Gcm

The preceding data indicate that TBX1, GATA factors, the FGFRs, and the N ligand DLL1 are regulated by Gcm factors in human cells. As mentioned earlier, no tissue has been found so far in which Gcm function is required in both mammals and Drosophila. This suggests that, rather than conservation of Gcm function in similar tissues, we should look for conserved Gcm cascades. To further test this hypothesis, we validated the impact of the Drosophila Gcm transcription factor on the Drosophila orthologs of the four families of genes identified earlier. The Drosophila orthologs of TBX1 are the T-box transcription factors Bi, H15 (Porsch ), and Doc1, which are required for ganglion mother cell (GMC) differentiation during development of the embryonic nervous system (Choksi ; Leal ). We validated the role of Gcm on in S2 cells (Figure 9N). The Drosophila GATA factor Pnr regulates postembryonic tendon cell differentiation (Ghazi ). There is a significant induction of expression by Gcm in S2 cells (Figure 9N), but further experiments are required to demonstrate the impact of Pnr on embryonic tendon cells or the impact of Gcm on postembryonic tendon cells. The ortholog of the FGFRs, , is involved in ensheathing glia morphogenesis (Stork ), and Gcm is required for differentiation of these cells (Awasaki ). Gcm strongly induces expression of in S2 cells (Figure 6C). The second ortholog, , presents the same expression pattern as in the developing embryonic CNS at stage 11 (Lin ), suggesting that Gcm may regulate expression in this tissue. Finally, the DLL1 Drosophila ortholog Dl is required as part of the N pathway for the development of embryonic glia (Udolph ; Van de Bor and Giangrande 2001; Umesono ; Edenfeld ), the larval optic lobe (Egger ; Yasugi ; Wang ), and tendon cells (Nabel-Rosen ; Ghazi ). expression is induced by Gcm in S2 cells (Figure 8B).

Discussion

The DamID approach allowed us to identify the direct targets of Gcm in Drosophila in a genome-wide fashion and to extend these findings to mammals. It also allowed us to recognize key molecular pathways and developmental processes that depend on Gcm. Combining transfection assays with FACS sorting provides a rapid and sensitive tool to characterize molecular pathways and genetic interactions. This versatile approach is particularly useful for studying genes that are expressed at low levels or in very few cells, or whose target tissues are still unknown.

Feedback loops between Gcm and signaling pathways

Gcm is widely described for its role in nervous system development and hemocyte differentiation, and indeed, many of the genes identified in the screen are involved in these two processes (Table 1 and Table 2). The screen allowed us to establish a direct link between Gcm and three major signaling pathways, the vast majority of whose members are targeted by Gcm. The Hh pathway and Gcm are necessary for tendon cell differentiation as well as for lamina neuron proliferation and differentiation, but the relationship between Gcm and the Hh pathway remained unclear (Chotard ; Umetsu ). Validation of the DamID screen in cells and in vivo reveals that, in tendon cells, Gcm can control five members of the Hh pathway; essentially only Cos2 was not identified in the screen (Figure 10A). These include Hh itself; its receptor, Ptc; the signal transducers Smo, , and Ci; and the pathway inhibitor Rdx. Interestingly, Chotard proposed that Gcm interacts with the Hh pathway. Together with the finding that the Hh pathway is also required upstream of Gcm in tendon cells for definition of their precursor (Soustelle ), these data suggest the presence of a feedback loop.
Figure 10

Molecular pathways affected by Gcm. (A) Hedgehog signaling pathway. (B) JAK/STAT signaling pathway. (C) Notch signaling pathway. The proteins and genes in black and red are targeted by Gcm, according to the DamID screen. The genes in red were previously characterized as downstream targets of Gcm, and underlined genes were validated by qPCR in FACS-sorted S2 cells in this study. The proteins in gray are part of the pathway but are not targeted by Gcm.

The JAK/STAT signaling pathway triggers lamellocyte differentiation [reviewed in Agaisse and Perrimon (2004) and Myllymaki and Ramet (2014)] and was shown previously to be repressed by Gcm (Jacques ). The DamID screen shows that Gcm directly targets two major inhibitors of the JAK/STAT pathway, Ptp61F and Socs36E, providing a molecular explanation for the observed genetic interactions. Interestingly, Gcm also targets Stat92E and Os (essentially all the major players of the JAK/STAT pathway were identified in the screen) (Figure 10B), and the JAK/STAT pathway was shown to induce expression in the optic lobe (Wang ). This suggests the existence of a feedback loop in this cascade as well. Finally, the N pathway was described as an activator or inhibitor of Gcm in glial cell differentiation, depending on the context (Udolph ; Van de Bor and Giangrande 2001; Umesono ). The DamID screen shows that Gcm targets 30 genes of the N pathway, including seven inhibitors and seven activators of that pathway (Table S1); all the genes in the N pathway were found in the screen (Figure 10C). Thus, the DamID screen helps to clarify the impact of Gcm on regulatory pathways at the molecular level and paves the way for future studies assessing the biological relevance of these interactions in vivo. These pathways had not been characterized previously as regulated by Gcm, nor had they been identified by the three transcriptome assays (Egger ; Freeman ; Altenhein ).
This is most likely due to the fact that these pathways involve feedback loops and/or are required in most, if not all, tissues during embryogenesis. Future studies will assess whether Gcm targets different members of a signaling pathway in specific tissues.

Gcm regulates genes organized in clusters

Analysis of the DamID screen reveals the regulation of genes organized in a cluster, as illustrated by E(spl)-C. This observation is in line with the hypothesis that chromatin conformation plays an important role in transcription factor–mediated induction of gene expression. Over a region of 0.8 Mb, the DamID peaks are exclusively concentrated in a 50-kb region that contains the 12 members of E(spl)-C (Figure 7B), whose expression is upregulated by Gcm. In contrast, the unrelated gene is located at the boundary of the complex, is not controlled by Gcm, and does not contain a DamID peak (Figure 7C). The precise targeting of Gcm to genes within the folded region of E(spl)-C correlates well with chromosome conformation capture (3C) analyses, which revealed long-range interactions within the 50-kb complex but very few interactions with surrounding regions (Schaaf ). Further development of 3C technology should help clarify the impact of three-dimensional chromatin structure on gene expression.
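The E(spl)-C argument rests on two interval checks: whether the DamID peaks of the surrounding region all fall within the 50-kb cluster, and whether each gene, including the flanking one, carries a peak. A minimal sketch using half-open intervals; the coordinates and gene names are hypothetical placeholders, not the real E(spl)-C annotation.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on a single chromosome arm."""
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap if each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def fraction_in_window(peaks: List[Interval], window: Interval) -> float:
    """Fraction of peaks that overlap the window (0.0 if there are no peaks)."""
    if not peaks:
        return 0.0
    return sum(overlaps(p, window) for p in peaks) / len(peaks)


def genes_with_peaks(genes: Dict[str, Interval], peaks: List[Interval]) -> Dict[str, bool]:
    """For each gene, whether at least one peak overlaps its body."""
    return {name: any(overlaps(g, p) for p in peaks) for name, g in genes.items()}


# Hypothetical coordinates: a 50-kb cluster window, three peaks,
# two cluster genes and one flanking gene.
cluster = Interval(0, 50_000)
peaks = [Interval(1_000, 2_000), Interval(20_000, 21_000), Interval(45_000, 46_000)]
genes = {
    "cluster_gene_1": Interval(500, 3_000),
    "cluster_gene_2": Interval(19_000, 22_000),
    "flanking_gene": Interval(52_000, 60_000),
}
print(fraction_in_window(peaks, cluster))  # 1.0
print(genes_with_peaks(genes, peaks))
# {'cluster_gene_1': True, 'cluster_gene_2': True, 'flanking_gene': False}
```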

Gcm regulates genes required for the final function of differentiating tissue

Gcm is generally described as a cell fate determinant. Accordingly, many Gcm direct targets are involved in the early differentiation step, e.g., implementation of glial fate at the expense of neuronal fate in the nervous system (Figure 7A and Figure 8A, pink-shaded area). Surprisingly, a number of genes identified by the screen are necessary at late developmental stages or for the function of fully differentiated cells (Figure 7A and Figure 8A, green-shaded area). Typical examples are the BBB-specific genes involved in amino acid, sugar, and water exchange and the genes coding for septate junction proteins, which are necessary for the filtering function of the BBB (Deligiannaki ) [reviewed in Hindle and Bainton (2014)]. Similarly, a number of genes acting in the immune system are involved in the antigen-specific immune response and the encapsulation of foreign targets. Thus, an early and transiently expressed gene directly targets genes involved in physiological responses. Such a broad role for early genes may not be uncommon, as it was also observed for the transcription factor Pros in the nervous system. Pros promotes the switch from self-renewal to differentiation. DamID and transcriptome analyses of Pros downstream targets revealed that, in addition to genes involved in repressing stem cell self-renewal, Pros also promotes the expression of genes that act later, in terminally differentiated neurons (Choksi ).

Gcm downstream targets are conserved in mammals

The use of simple organisms such as Drosophila to understand the GCM regulatory network in vertebrates has been limited to a few studies (Iwasaki ; Soustelle and Giangrande 2007). This was mostly due to the known requirement of mammalian Gcm genes in the placenta and parathyroid glands, two tissues that have no fly equivalent. Despite the differences between the tissues in which Gcm genes are expressed, we were able to find conservation in the target genes. Because in humans GCM2 downregulation and mutations are associated with parathyroid adenomas and hypoparathyroidism, respectively (Mannstadt ; Doyle ; Yi ; Park ; Mitsui ), while GCM1 downregulation is associated with preeclampsia (Chen ), the downstream genes identified in this work, including TBX1, GATA factors, and FGFRs, represent interesting candidates to dissect the molecular mechanisms underlying these pathologies. For instance, TBX1 mutations in humans result in DiGeorge syndrome, which includes parathyroid aplasia (Jerome and Papaioannou 2001; Merscher ). Also, GATA3 regulates GCM2 in the parathyroid gland (Grigorieva ), and GATA3, GATA4, and GATA6 are required during trophoblast development [reviewed by Bai ]. Our data suggest a feedback loop between GCM2 and GATA3 in the parathyroid and point to a hitherto unknown role of GATA6 in this tissue, in line with recent immunolabeling data (Uhlen ). Moreover, FGFR1 is misregulated in hyperparathyroidism (Komaba ). Finally, the impact of Gcm on the N pathway seems to be conserved to the largest extent. Of the 30 genes identified by the screen, we confirmed 9 by qPCR, including N, E(spl)-C, and . In human cells, we show that DLL1 is induced by mGcm1, and in the mouse, the E(spl)-C ortholog Hes5 was reported to be regulated by mGcm1 and mGcm2 (Hitoshi ). This supports the hypothesis that Gcm regulates the N pathway in both Drosophila and mammals. However, the effect of Gcm most likely depends on the tissue.
Indeed, opposite outcomes have already been observed for the effect of Notch on Gcm in Drosophila (Udolph ; Van de Bor and Giangrande 2001; Umesono ), and both activators and inhibitors of the N pathway were identified in the DamID screen (Table S1). Thus, the interaction between Gcm and the N pathway will need to be studied case by case. Overall, our data indicate that even though the main sites of expression differ between mammals and Drosophila, the Gcm cascade is at least partially conserved. In this study, we identified 980 new potential direct targets of Gcm and demonstrated the direct interaction for 36 of them; in addition, the use of Drosophila allowed us to discover eight new targets of the GCM factors in humans, including GCM1 and GCM2 themselves, which reveals a positive-feedback loop.
References

1. Jennifer Zanet, Brian Stramer, Thomas Millard, Paul Martin, François Payre, Serge Plaza. Fascin is required for blood cell migration during Drosophila embryogenesis. Development. 2009-08.

2. Eyal Y Anteby, Shira Natanson-Yaron, Yaron Hamani, Yael Sciaki, Debra Goldman-Wohl, Caryn Greenfield, Ilana Ariel, Simcha Yagel. Fibroblast growth factor-10 and fibroblast growth factor receptors 1-4: expression and peptide localization in human decidua and placenta. Eur J Obstet Gynecol Reprod Biol. 2005-03-01.

3. Nam K Cho, Linda Keyes, Eric Johnson, Jonathan Heller, Lisa Ryner, Felix Karim, Mark A Krasnow. Developmental control of blood cell migration by the Drosophila VEGF pathway. Cell. 2002-03-22.

4. Cécile Jacques, Laurent Soustelle, István Nagy, Céline Diebold, Angela Giangrande. A novel role of the glial fate determinant glial cells missing in hematopoiesis. Int J Dev Biol. 2009.

5. Martin Stofanko, So Yeon Kwon, Paul Badenhorst. Lineage tracing of lamellocytes demonstrates Drosophila macrophage plasticity. PLoS One. 2010-11-19.

6. Tony D Southall, Andrea H Brand. Neural stem cell transcriptional networks highlight genes essential for nervous system development. EMBO J. 2009-12-16.

7. S Vincent, J L Vonesch, A Giangrande. Glide directs glial fate commitment and cell fate switch between neurones and glia. Development. 1996-01.

8. Li Hua Jin, Jaewon Shim, Joon Sun Yoon, Byungil Kim, Jihyun Kim, Jeongsil Kim-Ha, Young-Joon Kim. Identification and functional analysis of antifungal immune response genes in Drosophila. PLoS Pathog. 2008-10-03.

9. Jason S Rawlings, Gabriela Rennebeck, Susan M W Harrison, Rongwen Xi, Douglas A Harrison. Two Drosophila suppressors of cytokine signaling (SOCS) differentially regulate JAK and EGFR pathway activities. BMC Cell Biol. 2004-10-15.

10. Pavel Tomancak, Benjamin P Berman, Amy Beaton, Richard Weiszmann, Elaine Kwan, Volker Hartenstein, Susan E Celniker, Gerald M Rubin. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 2007.
View more