
New insights into SRY regulation through identification of 5' conserved sequences.

Diana G F Ross, Josephine Bowles, Peter Koopman, Sigrid Lehnert.

Abstract

BACKGROUND: SRY is the pivotal gene initiating male sex determination in most mammals, but how its expression is regulated is still not understood. In this study we derived novel SRY 5' flanking genomic sequence data from bovine and caprine genomic BAC clones.
RESULTS: We identified four intervals of high homology upstream of SRY by comparison of human, bovine, pig, goat and mouse genomic sequences. These conserved regions contain putative binding sites for a large number of known transcription factor families, including several that have been implicated previously in sex determination and early gonadal development.
CONCLUSION: Our results reveal potentially important SRY regulatory elements, mutations in which might underlie cases of idiopathic human XY sex reversal.

Year:  2008        PMID: 18851760      PMCID: PMC2572636          DOI: 10.1186/1471-2199-9-85

Source DB:  PubMed          Journal:  BMC Mol Biol        ISSN: 1471-2199            Impact factor:   2.946


Background

Sex in mammals normally correlates with the presence or absence of the Y chromosome. Male sex determination in almost all mammals is directly caused by the correct expression and function of a single Y-linked gene, SRY [1-4]. SRY activity in males causes the bipotential gonad, the genital ridge, to set off on the path to becoming a testis. If the fetal genital ridge does not express SRY, ovary development is initiated instead. A majority of gonadal dysgenesis cases cannot be attributed to mutations within or immediately 5' of SRY, or to any other gene known to have a role in sex determination. We hypothesise that this is because the regulatory regions of SRY are uncharted, leaving no means to check specific areas for mutation.

SRY carries out a similar function in all mammals in which it is present, but its sequence displays a high degree of variability between species. This situation is thought to result from the location of SRY on the Y chromosome, which exposes it to a higher rate of mutation than autosomal genes and thereby leads to DNA degradation and even loss [5]. The region of SRY best conserved between species is the high mobility group (HMG) box, which confers on the encoded protein its transcription factor role by allowing it to bind and bend DNA [6,7]. Outside the HMG box, SRY is very poorly conserved between species. This lack of conservation has made it difficult to define functional motifs required for the role of SRY protein in directing male sex determination.

The regulation of SRY is under tight control to ensure its expression at the right time, place and level necessary to initiate male sex determination. In mice, delayed onset of Sry expression, or reduced levels of Sry expression, is known to cause full or partial XY sex reversal [8-10].
Therefore, an understanding of how SRY expression is regulated is an important part of the overall picture of its functions in male sex determination and of how disturbances in function can lead to disorders of sex development. As with the SRY coding region, sequences beyond the transcription unit of SRY are very poorly conserved between species, a situation that has contributed to an almost total lack of understanding of how the expression of this gene is regulated. Comparative genomics is normally a powerful tool for identifying biologically important gene regulatory regions, based on the conservation of functional regulatory modules being under selective pressure during evolution [11-13], but this method has shown only limited success in studies of SRY to date. Although mice are most useful for a range of developmental and functional genetic studies, their utility in comparative genomics is limited by their unusually high rate of sequence drift, thought to be linked to their short generation time [14]. Progress in identifying potential gene regulatory motifs through comparative genomics relies on the availability of genome sequences from a range of non-murine mammals. A study analysing non-coding sequences in 39 bovine, human and mouse gene orthologues revealed 73 putative regulatory intervals conserved between bovine and human genes, only 13 of which were also conserved in mice [15]. Further comparative genomic analysis of these regions showed that the homology to human is highest in bovine, and weakest in the mouse. Other studies also point to an excellent conservation of bovine and human sequences in the promoter region of genes such as Oct4, but relatively poor conservation of the corresponding mouse sequences [16]. In the present study we generated novel bovine and caprine SRY 5' sequence data in order to conduct comparative genomic analysis of 5' sequences from human, bull, pig, goat and mouse Sry. 
In this way we identified four novel sequence intervals that may be important for the correct regulation of SRY expression and therefore for correct function of SRY in mammalian sex determination. The identification of these candidate regulatory regions provides a focus for efforts to discover new mutations associated with human idiopathic XY sex reversal.

Results

Generation of novel Sry genomic flanking sequence from bovine and caprine BACs

In order to provide new tools for comparative genomic analysis of potential SRY 5' regulatory sequences, we first generated novel flanking sequence from the bovine and caprine SRY genes. The BAC clone RP42-95D10 containing bovine SRY [17] was found by Southern blotting and polymerase chain reaction (PCR) to contain a 15 kb EcoRI fragment harbouring SRY (data not shown). This fragment was subcloned and sequenced to five-fold coverage [GenBank EU581861]. Alignment of the bovine sequence with published human [EMBL: NT_011896.9 nucleotides 5177–21272] and mouse [EMBL: NT_078925.6 nucleotides 1917040–1934040] SRY 5' sequence allowed the preliminary identification of several potentially conserved sequence blocks. We generated corresponding fragments of the goat SRY 5' region by PCR, using as template a goat BAC clone that contains SRY and is known to cause female to male sex reversal in mice [18]. These fragments were sequenced, aligned, appended to existing goat SRY sequence where possible, and used for further analysis [GenBank EU581862, EU581863, and EU581864].
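As a quick sanity check on the reference intervals quoted above, the following sketch computes the length of each interval from its coordinates. It assumes the coordinates are inclusive, 1-based positions on the named accessions; the dictionary and function names are illustrative only and are not part of the original study.

```python
# Coordinates copied from the text above.
# Assumption: both ends of each range are inclusive, 1-based positions.
INTERVALS = {
    "human NT_011896.9": (5177, 21272),
    "mouse NT_078925.6": (1917040, 1934040),
}


def span_length(start, end):
    """Length in bp of an inclusive coordinate range."""
    return end - start + 1


LENGTHS = {name: span_length(*coords) for name, coords in INTERVALS.items()}

for name, length in LENGTHS.items():
    print(f"{name}: {length} bp ({length / 1000:.1f} kb)")
```

Under that assumption the human interval spans 16,096 bp (about 16.1 kb) and the mouse interval 17,001 bp (about 17.0 kb).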

Comparative genomic sequence analysis

We next used the novel 15 kb of bovine SRY 5' sequence as a reference point for comparative genomic studies. VISTA alignment of the bovine sequence with human, porcine (4.6 kb) [19], caprine (individual regions described above), and mouse (17 kb), revealed four sequence blocks of significant homology (Figure 1). These blocks (A, B, C and D) from human, caprine and porcine SRY displayed at least 50% nucleotide identity to bovine sequence by VISTA analysis using 100 bp windows. The four conserved blocks were separated by non-conserved sequence, the length of which varied between species (Figure 2). In the goat no intervening sequences were detected between region C and D. The main features of each conserved block are as follows:
Figure 1

Homology of human, caprine, porcine and mouse SRY 5' sequences to bovine sequence. Pink shading indicates 70% or higher homologies calculated over 100 bp. Peaks of homology are labelled Region A to D above the graph. Repetitive elements (LINEs and SINEs) are indicated in green, and the SRY coding region in blue. Grey line below each graph shows the extent of sequence used.

Figure 2

Spacing and co-ordinates of conserved regions. Sequence information for porcine region A was not available for this study. Numbering represents the number of nucleotides 5' to the transcription start site in each species; ? denotes unknown positions.

Region A (480 bp) lies about 8.3 kb upstream of the start of transcription in bovine SRY (5.6 kb in human; Figure 2). It showed more than 70% conservation in 100 bp windows between bovine, human and caprine sequence over a large proportion of its length using VISTA (Figure 1, pink shading). ClustalW showed overall homology between the three species as 63 – 87% (Figure 3).
Figure 3

DNA sequence homologies calculated across the whole of regions A, B, C and D. Species are human (hum), bovine (bov), porcine (por), caprine (cap), and murine (mur). na, porcine sequence not available.

Region B (1.5 kb) begins 6.7 kb 5' of the bovine SRY start of transcription (5 kb in human; Figure 2). Bovine/human homology above 70% in 100 bp windows was limited to two short sequence intervals in this region (Figure 1). The high homology between bovine and caprine sequences, and the moderate homology between bovine and human sequences, were reflected in the overall ClustalW homology analysis of this region (Figure 3). As in region A, homology of mouse sequence was minimal in this region. The available 4.6 kb of porcine genomic sequence stopped partway through this region, but aligned well with bovine sequence (Figure 1).

Region C (1 kb) was found 3.9 kb upstream in the bovine sequence (3.6 kb in human; Figure 2). This was the least conserved area between bovine and human, not reaching 70% in any 100 bp window using the VISTA browser (Figure 1), and only 19% overall by ClustalW (Figure 3). Caprine sequence showed high homology to bovine in this region, porcine intermediate, and mouse negligible (Figures 1, 3).

Region D was found immediately upstream of bovine, human and caprine SRY, and so represents the proximal promoter region in these species (1.9, 1.5 and 1.9 kb respectively). This region showed strong to moderate conservation across all species except mouse (Figures 1, 3). Conservation between bovine and human sequences was stronger in this region than in the other regions (Figure 3).

No additional regions of homology were detected distal to region A within the 15 kb of bovine sequence used as anchor, when compared with 17 kb of human and 16 kb of mouse sequence.

Conserved transcription factor binding sites

We next searched for potential transcription factor binding sites in conserved regions A-D in order to evaluate the possible significance of these regions for SRY regulation. In silico DiAlignTF analysis revealed 210 conserved, canonical transcription factor binding sites across the four regions, representing 38 transcription factor families (Tables 1 and 2, Figure 4 and Additional file 1). None of the transcription factor binding sites were shown as conserved in the mouse using DiAlignTF, although some nucleotide conservation was detectable when viewed by eye (Additional file 1). To add levels of significance to the putative sites, we grouped them according to their occurrence patterns in the sequences (Table 1): most frequent (total number of times represented in the four regions), most common (number of regions containing each type of site) and level of conservation (number of species containing the site) among the four species examined other than mice. In addition, the matrix similarity score for each site (that is, the similarity of each putative site to the canonical binding site for the relevant transcription factor) is shown in Table 3, as a further indication of the likely relevance of each putative binding site.
Table 1

Transcription factor binding sites found in SRY 5' regions

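| Family | Reg. A | Reg. B | Reg. C | Reg. D | TOTAL |
|---|---|---|---|---|---|
| BRNF | | | b/g/p (×2) | b/g/h (×2), b/g/p (×2) | 18 |
| OCT1 | | | b/g/p (×2) | b/g/p (×2), b/g/h (×2) | 18 |
| HOXF | b/g/h | b/g/h | b/g/p/h | b/g/h | 13 |
| PARF | | | b/g/p | b/g/p (×3) | 12 |
| FKHD | | | b/g/p | b/g/p, b/g/h/p | 10 |
| GATA | b/g/h (×2) | | b/g/p | | 9 |
| CREB | | | b/g/p | b/g/p (×2) | 9 |
| CDXF | | | | b/g/h/p (×2) | 8 |
| SRFF | | b/g/h | | b/g/h/p | 7 |
| MEF2 | | b/g/h | | g/h/p | 6 |
| ETSF | | | b/g/h, b/g/p | | 6 |
| HNF1 | | | b/g/p | b/g/h | 6 |
| SORY | | | b/g/p | g/h/p | 6 |
| NKXH | | | | b/g/h, b/g/p | 6 |
| LHXF | | | b/g/p/h | | 4 |
| MYT1 | | | | b/g/h/p | 4 |
| PLZF | | | | b/g/h/p | 4 |
| NFKB | | | | b/g/h/p | 4 |
| EVI1 | b/g/h | | | | 3 |
| TBPF | b/g/h | | | | 3 |
| HOXC | b/g/h | | | | 3 |
| GFI1 | b/g/h | | | | 3 |
| PITI | b/g/h | | | | 3 |
| OCTP | b/g/h | | | | 3 |
| RORA | | b/g/h | | | 3 |
| HAML | | b/g/h | | | 3 |
| RBPF | | b/g/h | | | 3 |
| IRFF | | | b/g/p | | 3 |
| PAX6 | | | b/g/p | | 3 |
| MZF1 | | | b/g/p | | 3 |
| GZF1 | | | b/g/h | | 3 |
| ZFHX | | | b/g/p | | 3 |
| PLAG | | | b/g/p | | 3 |
| MOKF | | | | b/g/h | 3 |
| HOMF | | | | b/g/h | 3 |
| RBIT | | | | b/g/h | 3 |
| SATB | | | | b/g/p | 3 |
| CLOX | | | | b/g/h | 3 |
| TOTAL | 27 | 18 | 62 | 103 | 210 |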

Only sites conserved between 3 or more species are shown. Sites conserved between 4 species are marked in bold. Numbers in parentheses indicate the number of times each binding site was found in the same region. Data are sorted in order of most common to least common transcription factor binding sites. Bovine (b), goat (g), human (h), pig (p), mouse (m).
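The three groupings described in the text (most frequent, most common, and level of conservation) can be recomputed from Table 1. The sketch below is one reading of the table, not the authors' procedure: each site is written as a string of species letters, the assignment of sites to regions follows the column totals, and the TOTAL column is taken to count one occurrence per species per site. All names in the code are illustrative.

```python
from collections import Counter

# Table 1 transcribed as: FAMILY REGION=site,site ... (species letters b, g, h, p).
# Assumption: region assignment follows the column totals given in Table 1.
TABLE1 = """
BRNF C=bgp,bgp D=bgh,bgh,bgp,bgp
OCT1 C=bgp,bgp D=bgp,bgp,bgh,bgh
HOXF A=bgh B=bgh C=bgph D=bgh
PARF C=bgp D=bgp,bgp,bgp
FKHD C=bgp D=bgp,bghp
GATA A=bgh,bgh C=bgp
CREB C=bgp D=bgp,bgp
CDXF D=bghp,bghp
SRFF B=bgh D=bghp
MEF2 B=bgh D=ghp
ETSF C=bgh,bgp
HNF1 C=bgp D=bgh
SORY C=bgp D=ghp
NKXH D=bgh,bgp
LHXF C=bgph
MYT1 D=bghp
PLZF D=bghp
NFKB D=bghp
EVI1 A=bgh
TBPF A=bgh
HOXC A=bgh
GFI1 A=bgh
PITI A=bgh
OCTP A=bgh
RORA B=bgh
HAML B=bgh
RBPF B=bgh
IRFF C=bgp
PAX6 C=bgp
MZF1 C=bgp
GZF1 C=bgh
ZFHX C=bgp
PLAG C=bgp
MOKF D=bgh
HOMF D=bgh
RBIT D=bgh
SATB D=bgp
CLOX D=bgh
"""


def parse_sites(table):
    """Return a list of (family, region, species_string), one entry per site."""
    sites = []
    for line in table.split("\n"):
        if not line.strip():
            continue
        family, *cells = line.split()
        for cell in cells:
            region, species_list = cell.split("=")
            for species in species_list.split(","):
                sites.append((family, region, species))
    return sites


SITES = parse_sites(TABLE1)

# "Most frequent": species-level occurrences per family and per region.
region_totals = Counter()
family_totals = Counter()
for family, region, species in SITES:
    region_totals[region] += len(species)
    family_totals[family] += len(species)

# "Most common": number of regions in which each family appears.
regions_per_family = {
    family: len({r for f, r, _ in SITES if f == family}) for family in family_totals
}

# "Level of conservation": sites present in all four non-mouse species.
four_species_sites = [(f, r) for f, r, s in SITES if len(s) == 4]

print(dict(region_totals), sum(region_totals.values()))
print(family_totals.most_common(2))
print(four_species_sites)
```

Read this way, the 210 total corresponds to 67 aligned sites counted once per species in which they occur, and the four-species sites comprise nine sites from eight families.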

Table 2

Transcription factor family members

| Family | Description | Transcription factors |
|---|---|---|
| BRNF | Brn POU domain factors | BRN2/3/4/5 |
| OCT1 | Octamer binding protein | OCT1/2/3 |
| HOXF | Factors with moderate activity to homeodomain consensus sequence | Barx2, CRX, GSC, Gsh-1/2, HOX1, HOXA9, HOXB9, HOXC13, NANOG, OTX2, PCE1, PHOX2a/2b, PTX1 pituitary homeobox |
| PARF | PAR/bZIP family | DBP albumin D-box binding protein, HLF hepatic leukemia factor, TEF thyrotrophic embryonic factor, VBP PAR-type chicken vitellogenin promoter binding protein |
| FKHD | Fork head domain factors | FHXA/B, FKHRL1 (FOXO), FREAC2/3/4/7 fork head related activators (FOXF2, FOXC1, FOXD1, FOXL1), HFH1/2/3/8 (FOXQ1, FOXD3, FOXI1, Freac-6, FXF1), HNF3B (FOXA2), ILF1 (FOXK2), XFD1/2/3 |
| GATA | GATA binding factors | GATA, GATA1/2/3 |
| CREB | cAMP-responsive element binding proteins | ATF, ATF2/6 activation transcription factors, c-Jun/ATF2 heterodimers, CREB, CREB1/2, CREB2/cJun, E4BP4, TAX/CREB complex, XBP1 X-box-binding protein |
| CDXF | Vertebrate caudal related homeodomain protein | CDX1/2 intestine specific homeodomain factor and mammalian caudal related intestinal TF |
| SRFF | Serum response element binding factor | SRF.01/02/03 |
| MEF2 | Myocyte-specific enhancer binding factor | MEF2, RSRFC4 related to serum response factor, SL1 member of RSRF |
| ETSF | Human and murine ETS1 factors | c-Ets-1/2 (p54), ELF-2 (NERF1a), ELK1, FLI, GABP GA binding protein, GABPB1 GA repeat binding protein beta 1, NRF2 nuclear respiratory factor 2, PDEF prostate-derived Ets factor, PEA3 polyomavirus enhancer A binding protein 3, ETV4, PU1, SPI1, SpiB |
| HNF1 | Hepatic nuclear factor 1 | HNF1 |
| SORY | SOX/SRY-sex/testis determining and related HMG box factors | HBP1, HMGA1/2, HMGIY, SOX5/9, SRY |
| NKXH | NKX homeodomain factors | Hmx2/Nkx5-2 homeodomain transcription factor, NKX31 prostate-specific homeodomain protein, TTF1 thyroid transcription factor |
| LHXF | Lim homeodomain factors | LHX3 and LMXB1 |
| MYT1 | MYT1 C2HC zinc finger protein | MyT1 myelin transcription factor, and MyT1L |
| PLZF | C2H2 zinc finger protein | PLZF promyelocytic leukemia zinc finger (TF with 9 Kruppel-like zinc fingers) |
| NFKB | Nuclear factor kappa B/c-rel | NF-kappaB (p50 and p65), HIVEP1; ZAS domain TF human immunodeficiency virus type 1 enhancer-binding protein-1 (HIVEP1), major histocompatibility complex-binding protein-1 (MBP-1), positive regulatory domain II-binding factor (PRDII-BF1) |
| EVI1 | Myeloid transforming protein | EVI1 ecotropic viral integration site 1 encoded factor, amino-terminal zinc finger domain; MEL1 (MDS1/EVI1-like gene 1) DNA-binding domain 1 |
| TBPF | TATA-binding protein factors | ATATA avian C-type LTR TATA box, LTATA lentivirus LTR TATA box, MTATA muscle TATA box, TATA cellular and viral TATA box elements, and mammalian C-type LTR TATA box |
| HOXC | HOX – PBX complexes | HOX/PBX binding sites, PBX1, PBX-HOXA9 binding site |
| GFI1 | Growth factor independence transcriptional repressor | GFI1.01/02, GFI1B.01 |
| PITI | GHF-1 pituitary specific POU domain TF | Pit1, GHF1 |
| OCTP | OCT1 binding factor (POU-specific domain) | OCT1P (octamer-binding factor 1, POU-specific domain) |
| RORA | v-ERB and RAR-related orphan receptor alpha | REV-ERBA orphan nuclear receptor rev-erb alpha (NR1D1), RORA/RORA1/2 RAR-related orphan receptor alpha/1/2, RORGAMMA RAR-related orphan receptor gamma, VERBA viral homolog of thyroid hormone receptor alpha1 |
| HAML | Human acute myelogenous leukemia factors | AML1/CBFA2 Runt domain binding site, AML3 runt-related transcription factor 2/CBFA1 |
| RBPF | RBPJ kappa | Mammalian transcriptional repressor RBP-Jkappa/CBF1 |
| IRFF | Interferon regulatory factors | IRF1/2/3/4 (NF-EM5, PIP, LSIRF, ICSAT)/7, ISRE interferon stimulated response element |
| PAX6 | PAX-4/PAX-6 paired domain binding sites | PAX4 and PAX6 paired domain binding site |
| MZF1 | Myeloid zinc finger 1 factors | MZF1 |
| GZF1 | GDNF-inducible zinc finger gene 1 | GZF1 (ZNF336) |
| ZFHX | Two-handed zinc finger homeodomain transcription factors | AREB6 (Atp1a1 regulatory element binding factor 6), deltaEF1 (delta-crystallin enhancer binding factor, transcription factor 8, zinc finger homeobox 1a), SIP1 (Smad-interacting protein) |
| PLAG | Pleomorphic adenoma gene | PLAG1, a developmentally regulated C2H2 zinc finger protein |
| MOKF | Mouse Kruppel like factors | MOK2.01/02 ribonucleoprotein associated zinc finger protein MOK-2 |
| HOMF | Homeodomain transcription factors | DLX1/2/5, Distal-less 3, EN1 homeobox protein engrailed, HHEX, MSX1/2, NOBOX, S8 |
| RBIT | Regulator of B-cell IgH transcription | Bright, B cell regulator of IgH transcription |
| SATB | Special AT-rich sequence binding protein | SATB1 |
| CLOX | CLOX and CLOX homology (CDP) factors | CDP cut-like homeodomain protein, transcriptional repressor CDP, CDPCR3, CDPCR3HD, CLOX, CUT2 |
List of transcription factor families found in Regions A-D and the specific transcription factors that comprise them.

Figure 4

Conserved transcription factor binding sites in each region of homology. Black text indicates conservation between 3 species of which one is human, grey text indicates 3-species conservation without human, and red text indicates conservation between 4 species (human, bovine, porcine and caprine). An example of the highly conserved area of region D is shown as a sequence alignment with conserved transcription factor binding sites boxed or shaded.

Table 3

Matrix similarity scores for putative binding sites

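| Region | Site | Bovine | Human | Goat | Porcine | Mean |
|---|---|---|---|---|---|---|
| A | HOXF | 0.96 | 0.988 | 0.947 | - | 0.965 |
| A | GATA (a) | 0.924 | 0.963 | 0.956 | - | 0.948 |
| A | GATA (b) | 0.944 | 0.972 | 0.916 | - | 0.944 |
| A | PITI | 0.942 | 0.93 | 0.945 | - | 0.939 |
| A | GFI1 | 0.96 | 0.911 | 0.918 | - | 0.930 |
| A | HOXC | 0.911 | 0.922 | 0.951 | - | 0.928 |
| A | OCTP | 0.922 | 0.875 | 0.968 | - | 0.922 |
| A | EVI1 | 0.958 | 0.86 | 0.904 | - | 0.907 |
| A | TBPF | 0.923 | 0.813 | 0.933 | - | 0.890 |
| B | RBPF | 0.944 | 0.943 | 0.961 | - | 0.949 |
| B | RORA | 0.958 | 0.983 | 0.897 | - | 0.946 |
| B | HAML | 0.943 | 0.935 | 0.943 | - | 0.940 |
| B | HOXF | 0.884 | 0.889 | 0.884 | - | 0.886 |
| B | MEF2 | 0.905 | 0.885 | 0.775 | - | 0.855 |
| B | SRFF | 0.697 | 0.717 | 0.681 | - | 0.698 |
| C | MZF1 | 1.000 | - | 1.000 | 0.995 | 0.998 |
| C | ZFHX | 0.984 | - | 0.984 | 0.984 | 0.984 |
| C | ETSF | 0.983 | 0.982 | 0.983 | - | 0.983 |
| C | FKHD | 0.962 | - | 0.962 | 0.962 | 0.962 |
| C | GATA | 0.973 | 0.936 | 0.973 | 0.954 | 0.959 |
| C | IRFF | 0.964 | - | 0.887 | 0.945 | 0.932 |
| C | CREB | 0.938 | - | 0.938 | 0.914 | 0.930 |
| C | HOXF | 0.975 | 0.870 | 0.975 | 0.857 | 0.919 |
| C | BRNF | 0.946 | - | 0.906 | 0.899 | 0.917 |
| C | PARF | 0.940 | - | 0.864 | 0.921 | 0.908 |
| C | ETSF | 0.880 | - | 0.890 | 0.925 | 0.898 |
| C | OCT1 | 0.905 | - | 0.894 | 0.894 | 0.898 |
| C | SORY | 0.879 | - | 0.879 | 0.927 | 0.895 |
| C | PLAG | 0.900 | - | 0.882 | 0.887 | 0.890 |
| C | BRNF | 0.810 | - | 0.810 | 0.916 | 0.845 |
| C | LHXF | 0.839 | 0.846 | 0.839 | 0.849 | 0.843 |
| C | OCT1 | 0.846 | - | 0.841 | 0.820 | 0.836 |
| C | GZF1 | 0.761 | 0.858 | 0.858 | - | 0.826 |
| C | HNF1 | 0.801 | - | 0.803 | 0.819 | 0.808 |
| C | PAX6 | 0.778 | - | 0.769 | 0.781 | 0.776 |
| D | SORY | - | 0.991 | 0.987 | 0.986 | 0.988 |
| D | MOKF | 0.983 | 0.983 | 0.983 | - | 0.983 |
| D | HOMF | 0.989 | 0.950 | 0.989 | - | 0.976 |
| D | SATB | 0.958 | - | 0.958 | 0.967 | 0.961 |
| D | CLOX | 0.948 | 0.967 | 0.948 | - | 0.954 |
| D | CDXF | 0.980 | 0.855 | 0.980 | 0.980 | 0.949 |
| D | PARF | 0.921 | - | 0.921 | 0.995 | 0.946 |
| D | RBIT | 0.924 | 0.965 | 0.924 | - | 0.938 |
| D | NKXH | 0.933 | 0.928 | 0.933 | - | 0.931 |
| D | HOXF | 0.923 | - | 0.923 | 0.942 | 0.929 |
| D | NKXH | 0.946 | - | 0.835 | 1.000 | 0.927 |
| D | FKHD | 0.922 | - | 0.922 | 0.909 | 0.918 |
| D | NFKB | 0.864 | 0.992 | 0.841 | 0.947 | 0.911 |
| D | CREB | 0.918 | - | 0.918 | 0.893 | 0.910 |
| D | MEF2 | - | 0.890 | 0.791 | 0.991 | 0.891 |
| D | OCT1 | 0.849 | 0.954 | 0.849 | - | 0.884 |
| D | OCT1 | 0.873 | 0.893 | 0.873 | - | 0.880 |
| D | SRFF | 0.844 | 0.855 | 0.918 | 0.884 | 0.875 |
| D | HNF1 | 0.943 | 0.854 | 0.828 | - | 0.875 |
| D | PLZF | - | 0.883 | 0.874 | 0.866 | 0.874 |
| D | PARF | 0.860 | - | 0.865 | 0.897 | 0.874 |
| D | OCT1 | 0.862 | - | 0.856 | 0.899 | 0.872 |
| D | FKHD | 0.867 | 0.861 | 0.836 | 0.919 | 0.871 |
| D | PARF | 0.867 | - | 0.867 | 0.867 | 0.867 |
| D | BRNF | 0.902 | 0.785 | 0.902 | - | 0.863 |
| D | CDXF | 0.872 | 0.850 | 0.870 | 0.860 | 0.863 |
| D | MYT1 | 0.875 | - | 0.775 | 0.875 | 0.842 |
| D | BRNF | 0.805 | 0.892 | 0.819 | - | 0.839 |
| D | CREB | 0.844 | - | 0.833 | 0.833 | 0.837 |
| D | BRNF | 0.796 | - | 0.807 | 0.889 | 0.831 |
| D | BRNF | 0.790 | - | 0.790 | 0.898 | 0.826 |
| D | OCT1 | 0.790 | - | 0.783 | - | 0.787 |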

List of matrix similarity scores (the similarity of each putative site to the canonical binding site for the relevant transcription factor) generated by MatInspector software for each putative transcription factor binding site in each species, for each region of homology. Matrix scores are ranked from the highest to lowest mean score.
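The Mean column of Table 3 appears to be the arithmetic mean of the scores from the species in which a site was found, with "-" entries left out. The sketch below checks that reading on a handful of rows copied from the table; the row selection, the tolerance, and all names are illustrative choices rather than part of the original analysis.

```python
from statistics import mean

# (region, site, [bovine, human, goat, porcine], reported mean); None stands for "-".
ROWS = [
    ("A", "HOXF", [0.96, 0.988, 0.947, None], 0.965),
    ("B", "SRFF", [0.697, 0.717, 0.681, None], 0.698),
    ("C", "MZF1", [1.000, None, 1.000, 0.995], 0.998),
    ("C", "GATA", [0.973, 0.936, 0.973, 0.954], 0.959),
    ("D", "NFKB", [0.864, 0.992, 0.841, 0.947], 0.911),
    ("D", "OCT1", [0.790, None, 0.783, None], 0.787),
]


def mean_score(scores):
    """Mean matrix similarity over the species in which the site was found."""
    present = [s for s in scores if s is not None]
    return mean(present)


# Assumption: reported means are rounded to three decimals, so allow 0.001 slack.
CHECKS = {
    (region, site): abs(mean_score(scores) - reported) <= 0.001
    for region, site, scores, reported in ROWS
}

for (region, site), ok in CHECKS.items():
    print(region, site, "consistent" if ok else "differs")
```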

The most frequently occurring transcription factor binding sites were those of BRNF and OCT1, which were represented in regions C and D a total of six times. PARF and FKHD binding sites were the next most frequent, represented four times between regions C and D. The HOXF family member binding sites were the most common, found in all of the regions and, in the case of region C, the site was conserved across four species. 
Sites for eight transcription factor families (HOXF, FKHD, SRFF, LHXF, CDXF repeated twice in the same region, MYT1, PLZF, and NFκB) were conserved across four species, and therefore displayed the highest level of conservation. With the exception of HOXF and LHXF (found in region C), all of the sites at this four-way conservation level localised to region D (Table 1 and Figure 4).

Region A showed nine conserved transcription factor binding sites, the most common being GATA, occurring twice. All of the sites were conserved between bovine, goat and human. Transcription factor family members unique to Region A were EVI1, TBPF, HOXC, GFI1, PITI, and OCTP (Table 1, Figure 4).

Region B contained the fewest transcription factor binding sites of all the regions. Sites unique to this region were RORA, HAML and RBPF (Table 1, Figure 4).

Region C contained binding sites for 17 transcription factor families, three of which occurred twice (BRNF, OCT1 and ETSF). Although there appeared to be many conserved transcription factor binding sites, not all were present in the human sequence. The sites conserved in human were HOXF, ETSF, LHXF and GZF1, all unique to this region with the exception of HOXF (Table 1 and Figure 4).

Region D contained by far the largest number of transcription factor binding sites, with almost 50% of the total found. The majority showed conservation in the human sequence, and sites for six families were conserved across four species. CDXF sites are unique to region D and appeared twice, close to each other, conserved across four species. MYT1, PLZF, and NFκB were also unique to region D and showed conservation in four species. Other sites unique to region D and present in human were NKXH, MOKF, HOMF, RBIT and CLOX (Table 1 and Figure 4).
Many of the transcription factor binding sites identified in the sequences were found in clusters of two or more, adjacent to or overlapping one another. Region A transcription factor binding sites were localised to three clusters, with the largest harbouring five transcription factor binding sites. Region B had two clusters, Region C had five, two of which contained four sites each, and Region D contained nine clusters, although on average each cluster contained only two transcription factor binding sites (Figure 4).

Discussion

The identification of gene regulatory regions through comparative genomics is a powerful entrée to directed studies of gene regulation. Using this method we have identified, for the first time, four regions upstream of SRY that show high conservation between human, bovine, pig and goat. Furthermore, these regions of homology share transcription factor binding sites that appear to be subject to strong evolutionary pressure for conservation and may therefore be important for correct regulation of SRY.

Mouse Sry 5' sequences were found to be markedly dissimilar to those of other species across all regions of homology identified. This is perhaps not surprising given that mouse Sry coding sequences show particularly low homology to other species at the nucleotide and amino acid levels [7,20]. Moreover, mouse Sry is expressed for a short, specific time, with detectable levels of Sry transcripts first appearing at 10.5 dpc and waning by 13.25 dpc [21-23]. In other mammals, including humans, sheep, and pig, the gene remains actively transcribed into adulthood, albeit at a lower expression level than in fetal stages [24-27]. Mouse Sry is therefore evidently regulated differently from SRY in other species and is unlikely to have well conserved 5' regulatory regions.

Previous data bearing on the likely position of SRY regulatory elements have come from limited homology searches, transgenesis studies, and mutation analyses. Because Y chromosome sequences have so far been unavailable from mammals other than mouse and human, minimal sequence has been available for homology studies. One study looked for conserved sequences upstream of SRY across ten species of mammal: human, chimpanzee, gorilla, sheep, pig, bull, gazelle, mouse, rat, and guinea pig [28]. However, only 427 to 610 bp of 5' sequence was analysed, and no meaningful conservation was identified.

Boyer et al. (2006) used 3.3 kb and 5 kb of human SRY upstream sequence linked to human SRY coding sequence to produce transgenic mice, but only the larger fragment resulted in genital ridge expression of SRY. The same study showed that the pig 1.6 kb SRY promoter was sufficient for genital ridge expression [14]. We can therefore postulate that the region necessary for genital ridge-specific regulation of SRY lies within the 5 kb upstream of the start of transcription in humans (corresponding to regions B, C and D from this study), and that this same site should be conserved in the pig 1.6 kb promoter (Region D). However, transgenic mouse models are subject to positional effects of the location of transgene insertion, which can cloud efforts to pinpoint gene regulatory sequences.

Two cases of mutations 5' of the coding region of SRY leading to pure gonadal dysgenesis have been reported in humans. The first, a point mutation 75 bp 5' to the gene, was associated with male to female sex reversal. A nucleotide change from G to A, located in a motif conserved in primates, was found to be responsible [29], but this motif is not conserved in other species [30]. This mutation maps to region D of the present study. The second, a 25 kb deletion 1.7 kb upstream of human SRY, was identified in a sex reversed patient [31]. The deletion would remove regions A-C and part of D, identified in the present study, supporting the hypothesis that regions A-D harbour important functional SRY regulatory elements, although the possibility that the deletion affects regulatory elements lying further 5' cannot be excluded as a cause of human sex reversal.

What transcription factor(s) may regulate expression of SRY? SRY is a master genetic switch that triggers testis development by initiating a cascade of gene expression. Its up-regulation marks the first male-specific gene expression event in the developing gonad. Therefore, any gene hypothesised to regulate SRY must be expressed equally in both sexes, before sex differentiation begins. Sf1, Sp1 and Wt1 are all expressed in genital ridges of both sexes and have been shown to influence expression of Sry in cell culture experiments [32-34]. Moreover, Sf1- and Wt1-knockout mice show gonadal sex development phenotypes [35,36]. Other genes known to have a role in gonadal formation and development, based on experiments in genital ridges and the absence of gonads in knockout mice, are Lim1 [37], Lhx9 [38], and Gata4 [39].

The present study identified binding sites for a number of transcription factors 5' of SRY. The transcription factor families whose binding sites displayed the highest levels of conservation were LHXF, CDXF, HOXF, PLZF and NFκB. These families all have members that are plausible candidates for a role in SRY regulation. The highly conserved LHXF binding site found in region C could potentially bind either LIM1 or LHX9 transcription factors. Lhx9 is expressed in the genital ridges of male and female mice between 9.5 and 11.5 dpc. Gonads fail to form in mice null for either of these genes [37,38]. However, complete gonadal agenesis would implicate these genes in functions other than, or possibly additional to, regulation of Sry. Nanog and PLZF may bind to the HOXF and PLZF sites in the SRY 5' region, respectively. However, both are early germ cell transcription factors, and are therefore not present in the nuclei of the supporting cell precursors in which SRY is expressed. NFκB is implicated in various stages of gonad development including spermatogenesis [40]. It is known to interact with AMH, and is likely to have a role during the later stages of testis function, but expression in early gonadal development has not been described.

Perhaps most intriguingly, the two conserved CDXF binding sites in region D point to a role for CDX1 in SRY regulation (Figure 4). Cdx1 has been shown to be a direct target of retinoic acid [41], which is present in the gonads and mesonephroi of both sexes from an early stage [42,43]. Cdx1 is expressed in the mesonephros in the developing mouse embryo and remains detectable until 12 dpc. Cdx1 knockout mice are viable and show homeotic vertebral transformations [44]. In view of the present data, it will be useful to examine the gonadal phenotype of these knockout mice.

Conclusion

In summary, we identified a large number of potential transcription factor binding sites localised to short regions of particularly high conservation in the SRY gene in human, bovine, porcine and caprine 5' flanking sequences. However, areas of high homology also exist that appear to lack binding sites for known transcription factors. These areas may also be important for the proper regulation of the gene by harbouring binding sites for unidentified proteins or transcription factors whose binding sites have not been characterized. The identification in the present study of regions of conservation upstream of SRY may facilitate the discovery of new mutations associated with human idiopathic XY sex reversal.

Methods

Bovine and goat SRY BAC sequence

The BAC clone RP42-95D10 from the CHORI BAC/PAC Resource Centre was previously identified as containing the bovine SRY coding region [17]. A 15 kb SRY fragment isolated from the BAC was cloned into pBluescript II KS+ using EcoRI, and shotgun sequenced by the Australian Genome Research Facility (AGRF) Brisbane, to five-fold coverage. The BAC clone (library number 568E7) containing goat SRY [18] was obtained from Dr. Eric Pailhoux. Primers designed from bovine sequences (MotAf, 5'-TCCTTCCTTTTCTCCTTTGTTG-3'; MotAr, 5'-TGGCCAAAAACTACTTGATGA-3'; MotBf, 5'-GGAACAGGAGAGATCATGAAACA-3'; MotBr, 5'-CTTCACCATTCCCACTCACC-3'; MotCf, 5'-AACTTACATGCACTTCATTCCA-3'; and MotCr, 5'-GAGGACTTCAAATATTAATGTCATCAT-3') were used to amplify and sequence regions from the goat BAC. Assembly of goat sequences was performed using Sequencher version 4.6 (Gene Codes Corporation).
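For readers who want to reuse the primers, the sketch below tabulates length and GC content for each sequence listed above and provides a reverse-complement helper. The sequences are copied from the text with spaces removed; the helper functions and variable names are illustrative and were not part of the original methods.

```python
# Primer sequences (5' to 3') copied from the Methods text, spaces removed.
PRIMERS = {
    "MotAf": "TCCTTCCTTTTCTCCTTTGTTG",
    "MotAr": "TGGCCAAAAACTACTTGATGA",
    "MotBf": "GGAACAGGAGAGATCATGAAACA",
    "MotBr": "CTTCACCATTCCCACTCACC",
    "MotCf": "AACTTACATGCACTTCATTCCA",
    "MotCr": "GAGGACTTCAAATATTAATGTCATCAT",
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Length and GC fraction per primer.
SUMMARY = {
    name: (len(seq), gc_fraction(seq)) for name, seq in PRIMERS.items()
}

for name, (length, gc) in SUMMARY.items():
    print(f"{name}: {length} nt, GC {gc:.0%}")
```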

Sequence alignment and binding site analysis

mVISTA and SLAGAN (Shuffle-LAGAN) [45] were used for global alignment of the sequences after masking of repetitive elements. Conserved sequence blocks were analysed for conserved transcription factor binding sites using DiAlignTF software from Genomatix [46]. This analysis was carried out on the full-length conserved sequence blocks, as well as on core areas of high conservation found with ClustalW within each block. Each block was first checked for sites conserved across four species, then three species. Only transcription factor binding sites that showed homology across more than two species were included in this report. Matrix similarity scores for the conserved binding sites were calculated by the MatInspector software from Genomatix [46].
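The inclusion rule described above (report only sites shared by more than two species) can be illustrated with a minimal sketch. This is not the DiAlignTF or MatInspector implementation: it assumes the per-species predictions for one conserved block have already been reduced to sets of transcription factor family names, it ignores whether the sites occupy aligned positions, and the family labels and function name are hypothetical placeholders.

```python
from collections import defaultdict


def conserved_families(hits_by_species, min_species=3):
    """Return {family: sorted species list} for families predicted in
    at least min_species species within one conserved block."""
    species_by_family = defaultdict(set)
    for species, families in hits_by_species.items():
        for family in families:
            species_by_family[family].add(species)
    return {
        family: sorted(species)
        for family, species in species_by_family.items()
        if len(species) >= min_species
    }


# Hypothetical toy input: placeholder family labels, not data from the paper.
block_hits = {
    "human": {"FAM1", "FAM2", "FAM3"},
    "bovine": {"FAM1", "FAM2"},
    "porcine": {"FAM1", "FAM2", "FAM3"},
    "caprine": {"FAM1"},
}

# First pass: families present in all four species; second pass: three or more.
in_four = conserved_families(block_hits, min_species=4)
in_three_or_more = conserved_families(block_hits, min_species=3)

if __name__ == "__main__":
    print("four species:", sorted(in_four))
    print("three or more:", sorted(in_three_or_more))
```

In this toy input, FAM3 occurs in only two species and is therefore excluded, mirroring the threshold stated in the text. A positional check against the multiple alignment would be needed before treating any shared family as a genuinely conserved site.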

Abbreviations

BAC: bacterial artificial chromosome; bp: base pair; DNA: deoxyribonucleic acid; dpc: days post coitum; HMG: high mobility group; kb: kilobase pair; PCR: polymerase chain reaction; SRY: sex determining region on the Y chromosome.

Authors' contributions

DR, JB, PK and SL designed the study. DR executed all of the experiments. DR, JB, PK and SL wrote and proof-read the manuscript. All authors read and approved the final manuscript.

Additional file 1

Multiple sequence alignment of Regions A, B, C and D with transcription factor binding sites. ClustalW alignment of the four regions across human, bovine, caprine, porcine and mouse sequences, with conserved transcription factor binding sites indicated by grey shading or boxing of the relevant nucleotides. More detailed information on the particular transcription factor families found on each page is shown to the right of each alignment.
References (10 of 46 shown)

1.  Glocal alignment: finding rearrangements during alignment.

Authors:  Michael Brudno; Sanket Malde; Alexander Poliakov; Chuong B Do; Olivier Couronne; Inna Dubchak; Serafim Batzoglou
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

2.  XY sex reversal associated with a deletion 5' to the SRY "HMG box" in the testis-determining region.

Authors:  K McElreavy; E Vilain; N Abbas; J M Costa; N Souleyreau; K Kucheria; C Boucekkine; E Thibaud; R Brauner; F Flamant
Journal:  Proc Natl Acad Sci U S A       Date:  1992-11-15       Impact factor: 11.205

3.  WT-1 is required for early kidney development.

Authors:  J A Kreidberg; H Sariola; J M Loring; M Maeda; J Pelletier; D Housman; R Jaenisch
Journal:  Cell       Date:  1993-08-27       Impact factor: 41.582

4.  Tracking evolution's footprints in the genome.

Authors:  Jonathan B Weitzman
Journal:  J Biol       Date:  2003-06-23

5.  The nuclear receptor steroidogenic factor 1 acts at multiple levels of the reproductive axis.

Authors:  H A Ingraham; D S Lala; Y Ikeda; X Luo; W H Shen; M W Nachtigal; R Abbud; J H Nilson; K L Parker
Journal:  Genes Dev       Date:  1994-10-01       Impact factor: 11.361

6.  Expression of a candidate sex-determining gene during mouse testis differentiation.

Authors:  P Koopman; A Münsterberg; B Capel; N Vivian; R Lovell-Badge
Journal:  Nature       Date:  1990-11-29       Impact factor: 49.962

7.  Male development of chromosomally female mice transgenic for Sry.

Authors:  P Koopman; J Gubbay; N Vivian; P Goodfellow; R Lovell-Badge
Journal:  Nature       Date:  1991-05-09       Impact factor: 49.962

8.  DNA binding activity of recombinant SRY from normal males and XY females.

Authors:  V R Harley; D I Jackson; P J Hextall; J R Hawkins; G D Berkovitz; S Sockanathan; R Lovell-Badge; P N Goodfellow
Journal:  Science       Date:  1992-01-24       Impact factor: 47.728

9.  Normal structure and expression of Zfy genes in XY female mice mutant in Tdy.

Authors:  J Gubbay; P Koopman; J Collignon; P Burgoyne; R Lovell-Badge
Journal:  Development       Date:  1990-07       Impact factor: 6.868

10.  Rapid sequence evolution of the mammalian sex-determining gene SRY.

Authors:  L S Whitfield; R Lovell-Badge; P N Goodfellow
Journal:  Nature       Date:  1993-08-19       Impact factor: 49.962
