Jordan T Shin, James R Priest, Ivan Ovcharenko, Amy Ronco, Rachel K Moore, C Geoffrey Burns, Calum A MacRae.
Abstract
Whole genome comparisons of distantly related species effectively predict biologically important sequences--core genes and cis-acting regulatory elements (REs)--but require experimentation to verify biological activity. To examine the efficacy of comparative genomics in identification of active REs from anonymous, non-coding (NC) sequences, we generated a novel alignment of the human and draft zebrafish genomes, and contrasted this set to existing human and fugu datasets. We tested the transcriptional regulatory potential of candidate sequences using two in vivo assays. Strict selection of non-genic elements which are deeply conserved in vertebrate evolution identifies 1744 core vertebrate REs in human and two fish genomes. We tested 16 elements in vivo for cis-acting gene regulatory properties using zebrafish transient transgenesis and found that 10 (63%) strongly modulate tissue-specific expression of a green fluorescent protein reporter vector. We also report a novel quantitative enhancer assay with potential for increased throughput based on normalized luciferase activity in vivo. This complementary system identified 11 (69%; including 9 of 10 GFP-confirmed elements) with cis-acting function. Together, these data support the utility of comparative genomics of distantly related vertebrate species to identify REs and provide a scalable, in vivo quantitative assay to define functional activity of candidate REs.
Year: 2005 PMID: 16179648 PMCID: PMC1236720 DOI: 10.1093/nar/gki853
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Experimental and positive control locations, DNA of origin, overlap with other dataset locations, right and left flanking primers
| Name | Species | Location | Overlap to UC | L primer (5′→3′) | R primer (5′→3′) |
|---|---|---|---|---|---|
| HZ.H2 | Human | chr15: 94,589,011-94,591,049 | None | TCCAGGCAATAATGAGAAAGG | GAATGCTGGGAAAAGGAGAG |
| HZ.H3 | Human | chr13: 76,963,964-76,965,840 | None | TTGTGCCATCAGAGTCTTGC | AAATCCAGGCAGCTGACATT |
| HZ.H4 | Human | chr16: 78,290,409-78,292,298 | None | TACGATGGGATTGTGTCTGC | CAGCTTATTCAGAAAGGGCTTG |
| HZ.H5 | Human | chr4: 80,764,529-80,766,417 | None | TCCTCCTTGTGTTCATTTCTTG | TTTTCACTTTTTCCCCCTTAC |
| HZ.H6 | Human | chr10: 8,085,826-8,087,770 | None | TTTTGTTCCTTCGGCGTTAG | GTGTCCAGATCTCCACGATG |
| HZ.H7 | Human | chr4: 112,127,925-112,129,834 | None | GTCACTCTTTGGGCTGGATG | CACGATGCTTTCAGAAATGTG |
| HZ.H8 | Human | chr18: 51,238,041-51,240,065 | uc.435+, uc.388+ | GGTACCAGGTTGGCATCAAG | ACAGGGGGATTATGAAGACG |
| HZ.H9 | Human | chr8: 106,567,420-106,569,441 | None | GCTACCTCACTTCACGCTTTC | TTCCCTAATGCTTTTTAACTTGC |
| HZ.H10 | Human | chr15: 65,739,790-65,742,119 | None | CAGGCTGTGCATTCTACCTG | AAGCAAATGCCACCTACAGC |
| HZ.H11 | Human | chr17: 35,465,051-35,467,078 | None | TTGGCTAGGGGTAGCAGTTG | AATCCCAAGGGTCCCATAAC |
| HZ.H12 | Human | chr1: 87,244,111-87,246,143 | uc.29+ | TCTGGCGTGTGACTATCTGG | TTATGGGCCCAGATTCAATG |
| HZ.Z1 | Zebrafish | chr13: 7,685,256-7,687,093 | None | CACCCAAGCACTGAGCGTATTCCA | GCCACACAATTGAAGCCTTT |
| HZ.Z2 | Zebrafish | chr18: 27,546,874-27,549,311 | None | CACCTCAATTGTTGCCGTAGTCCA | CCGTCATGCATTAGGTGTTG |
| HZ.Z3 | Zebrafish | chr14: 29,552,127-29,553,616 | None | CACCCGAGCTCGGTACCCTAATTG | ATTAATCGCGTTTGCTGAT |
| HZ.Z14 | Zebrafish | chr25: 17,004,520-17,002,621 | None | CACCAAACGAAGAACGGGGACTTT | GCGAGAGAAACGAAGGATTG |
| HZ.Z15 | Zebrafish | chr19: 29,873,848-29,870,609 | None | CACCCGCTCTGACCAAAGAGGTTC | TGTACACGCCAGGTTTAAGG |
| PC1 | Human | chr13: 70,465,895-70,468,171 | uc.351+ | CGATTGCTTTCTCTTTTCCAG | GTCGGAAAGAGGCATCTCAG |
| PC2 | Human | chr13: 70,233,780-70,226,388 | None | GCACATGCCAAGTCCTGTC | GAACAAACTCTGGATTTTTGAGC |
| PC3 | Human | chr13: 70,098,436-70,100,836 | None | GCGGCCGCATTAGCAAAAGAATACTTCCATGTCTGAG | ATGAAGAACCATCCCACTTG |
The two letters to the left of the period in the name denote that the sequences were selected from the HZ comparative dataset; the first letter to the right of the period denotes the genome of origin (H for human and Z for zebrafish). The absolute chromosomal position (where that information was available) is listed in the Location column. Any overlap with the UC dataset (10) is listed in the Overlap to UC column. Finally, the specific primers used to amplify genomic DNA are listed in the last two columns. Zebrafish L-primer sequences all contain an initial CACC, which was used for cloning.
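The naming rule in the note above can be applied mechanically. The following is a minimal sketch, not part of the original record: the function names `parse_element_name` and `strip_cloning_prefix` are hypothetical, and treating the leading CACC as a removable 5′ cloning tag (rather than part of the genomic match) is an assumption based only on the note's statement that it "was used for cloning".

```python
import re

# Hypothetical helpers; only the naming rule itself comes from the table note.
# Pattern: two-letter dataset code, a period, genome letter (H or Z), element number.
NAME_RE = re.compile(r"^(?P<dataset>[A-Z]{2})\.(?P<genome>[HZ])(?P<index>\d+)$")
GENOMES = {"H": "human", "Z": "zebrafish"}


def parse_element_name(name):
    """Split a name such as 'HZ.H2' into dataset, genome of origin and index.

    Returns None for names that do not follow the convention
    (for example the positive controls PC1 to PC3).
    """
    match = NAME_RE.match(name)
    if match is None:
        return None
    return {
        "dataset": match.group("dataset"),
        "genome": GENOMES[match.group("genome")],
        "index": int(match.group("index")),
    }


def strip_cloning_prefix(primer, genome):
    """Drop the initial CACC from a zebrafish L primer.

    Assumption: the CACC is an added cloning tag, so the remainder is the
    genome-matching part of the primer. Other primers are returned unchanged.
    """
    if genome == "zebrafish" and primer.startswith("CACC"):
        return primer[4:]
    return primer
```

For example, `parse_element_name("HZ.Z1")` reports a zebrafish element from the HZ dataset, and `strip_cloning_prefix` applied to its L primer returns the sequence without the four-base tag.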
Figure 1. Expression vector constructs used for zebrafish transgenesis. The GFP construct comprises the zCMLC2 promoter (blue) driving expression of green fluorescent protein (green). Cloned sequences were inserted into the Gateway recombination sequence (red) 5′ to the expression cassette and assessed for their ability to modulate transcriptional activity from this promoter. Similarly, the GL2 firefly luciferase construct consisted of an expression module containing the zCMLC2 promoter and the luciferase coding sequence (gray). The control pGM:RL plasmid, which contained the Renilla luciferase coding sequence (purple), was used for normalization.
Figure 2. Bioinformatic characterization of HZ NC evolutionarily conserved regions and overlap with other genomic comparisons. (A) Breakdown and characterization of the 6.5 × 10⁴ conserved regions shared between the human and zebrafish genomes. While most of the conserved sequences correspond to known genes and transcripts, 7% represent conserved, NC sequences. (B) A Venn diagram comparison of HZ, human:pufferfish and human:UC NC datasets. Where one species is indicated, the comparison is between human genomic sequence and that species. The UC set is that reported by Bejerano et al. (10). For each dataset, the total number of elements is indicated in parentheses, while the percentage of total elements that are unique to that set is indicated in italics. The numbers within each segment of the Venn diagram represent the absolute number of elements.
Figure 3. Quantitative analysis of reporter expression demonstrates that HZ NCECRs regulate transcription in transgenic zebrafish. Sixteen conserved NC elements (11 from human [black], 1 from zebrafish [blue] and 2 paralogous sequences from both species), as well as negative control (Neg Ctrl, n = 8) and positive control (PC, n = 3) sequences, were cloned for analysis. The percentage of embryos exhibiting extra-cardiac GFP expression is plotted on the horizontal axis (with standard error bars); the red vertical line indicates an approximate threshold delineating positive results, which are also marked by an asterisk (*). For GFP-positive sequences, the non-cardiac tissues/organs that reproducibly demonstrated ectopic fluorescence are denoted with colored boxes. For sequences where ectopic fluorescence did not exceed the statistical threshold for significance, no tissues are highlighted (gray boxes). Quantitative assessment of luciferase activity is presented in the LR column. LR values shown in this figure are the mean values of at least three samples. Experimental LRs that exceeded the mean of the pooled negative control LR values by 2 SD are highlighted in red.
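The luciferase criterion described in this caption (mean LR of at least three samples, compared against the pooled negative-control mean plus 2 SD) can be written out as a short calculation. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function names are hypothetical, the sample standard deviation (`statistics.stdev`) is assumed because the caption does not say which SD estimator was used, and any input values are the caller's own.

```python
from statistics import mean, stdev


def lr_threshold(negative_control_lrs, n_sd=2.0):
    """Pooled negative-control mean plus n_sd standard deviations.

    Assumption: sample SD (n - 1 denominator); the caption does not specify.
    """
    return mean(negative_control_lrs) + n_sd * stdev(negative_control_lrs)


def is_luciferase_positive(sample_lrs, negative_control_lrs, n_sd=2.0):
    """True if the mean LR of an element exceeds the negative-control threshold.

    Mirrors the caption's requirement of at least three samples per element.
    """
    if len(sample_lrs) < 3:
        raise ValueError("at least three LR samples are required")
    return mean(sample_lrs) > lr_threshold(negative_control_lrs, n_sd)
```

Passing the pooled LR values of all negative-control constructs as `negative_control_lrs` gives a single cut-off against which every candidate element is compared.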
Figure 4. Spatial localization of GFP expression in transgenic zebrafish. Panels A1–D1 show GFP images only; panels A2–D2 are merged GFP and bright-field images to assist in the spatial localization of the fluorescent signal. (A) GFP expression construct containing conserved sequence HZ.H2 demonstrates ectopic fluorescence in the dermal cells (yellow arrows). (B) Skeletal myocytes (elongated cells) and skin fluorescence (round cells) are identified in a 3 dpf transgenic embryo injected with the expression construct containing the HZ.H7 conserved sequence. (C) Brain, retina and neural tube fluorescence following injection with the HZ.H11:cMLC2:GFP construct. In each case the bright fluorescence between the yolk sac and head represents intrinsic, positive control GFP expression in cardiac myocytes (indicated by the red asterisk in panels A1–C1) directed by the cMLC2 promoter. (D) A higher-power photomicrograph identifies axonal projections (magnified view shown in the inset, with yellow arrows) and central neuronal expression of GFP in the forebrain and hindbrain of day 3 developing embryos transgenic for the HZ.H6 conserved NC element. Panels E and F show early (<30 hpf) expression of reporter genes in the skeletal myocytes (panel E) and dermal cells (panel F) prior to hatching.