M Hild, B Beckmann, S A Haas, B Koch, V Solovyev, C Busold, K Fellenberg, M Boutros, M Vingron, F Sauer, J D Hoheisel, R Paro.
Abstract
BACKGROUND: While the genome sequences of a variety of organisms are now available, the precise number of genes they encode is still a matter of debate. For the human genome, several stringent annotation approaches have resulted in the same number of potential genes, but a careful comparison revealed only limited overlap between them. This indicates that only a combination of different computational prediction methods with experimental evaluation of such in silico data will provide more complete genome annotations. To capture a more complete gene content of the Drosophila melanogaster genome, we based our new D. melanogaster whole-transcriptome microarray, the Heidelberg FlyArray, on the combination of the Berkeley Drosophila Genome Project (BDGP) annotation and a novel, lower-stringency ab initio gene prediction made with the Fgenesh software.
Year: 2003 PMID: 14709175 PMCID: PMC395735 DOI: 10.1186/gb-2003-5-1-r3
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. The Heidelberg Collection R1. Combining the BDGP cDNA Collection (BDGC) R1 with the BDGP genome annotation Release 2 yielded 13,861 genes. The Heidelberg Prediction, based on the Fgenesh ab initio gene prediction software, contains 20,622 predictions. We merged these two annotation sets, assuming that genes whose exon sequences overlap by more than 30% represent the same gene. In addition, we included 71 genes from other databases that were not present in either annotation. The resulting Heidelberg Collection consists of 21,396 potential genes and is the basis for the Heidelberg FlyArray.
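The 30% merging rule can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each gene model is a list of half-open (start, end) exon intervals on the same chromosome and strand, and it assumes the 30% is measured against the shorter model's exonic length, which the caption does not specify. The function names and example coordinates are hypothetical.

```python
def exonic_length(exons):
    """Total exonic bases; exons are half-open (start, end) intervals."""
    return sum(end - start for start, end in exons)


def exonic_overlap(exons_a, exons_b):
    """Bases shared by the exons of two gene models on the same strand.

    Assumes the exons within each model do not overlap one another, so
    summing pairwise intersections does not double-count any base.
    """
    shared = 0
    for a_start, a_end in exons_a:
        for b_start, b_end in exons_b:
            shared += max(0, min(a_end, b_end) - max(a_start, b_start))
    return shared


def same_gene(exons_a, exons_b, threshold=0.30):
    """Treat two models as one gene if their shared exonic sequence exceeds
    `threshold` of the shorter model's exonic length (assumed denominator)."""
    shorter = min(exonic_length(exons_a), exonic_length(exons_b))
    if shorter == 0:
        return False
    return exonic_overlap(exons_a, exons_b) / shorter > threshold


# Hypothetical gene models, not taken from the annotations.
bdgp_model = [(100, 200), (300, 400)]      # 200 exonic bases
fgenesh_model = [(150, 200), (300, 330)]   # 80 exonic bases, all shared
distant_model = [(390, 500)]               # 110 exonic bases, 10 shared

print(same_gene(bdgp_model, fgenesh_model))  # True  (80/80 = 1.0)
print(same_gene(bdgp_model, distant_model))  # False (10/110, about 0.09)
```

Using the shorter model as the denominator lets a short prediction nested inside a longer gene count as the same gene; measuring against the longer model would make the merge stricter.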
Conservation between D. melanogaster and D. pseudoobscura
| Overlap | Common predictions | Heidelberg Predictions | Expressed Heidelberg Predictions |
|---|---|---|---|
| 10% | 59.8% | 54.2% | 54.9% |
| 30% | 53.5% | 31.3% | 32.5% |
| 50% | 44.9% | 13.4% | 13.7% |
We tested the commonly predicted genes, the Heidelberg unique predictions, and the Heidelberg unique predictions that are expressed according to our expression profiling for conservation in D. pseudoobscura. Each row gives the percentage of predictions that overlap the respective D. pseudoobscura sequences by at least 10%, 30% or 50%.
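The thresholded percentages can be sketched in Python as follows. This is a hypothetical illustration, not the authors' analysis: it assumes "overlap" means the fraction of a prediction's exonic bases covered by conserved D. pseudoobscura blocks, that thresholds are inclusive, and that intervals are half-open; the function names and example values are invented. Because the same predictions are counted at every threshold, each percentage can only stay the same or fall as the threshold rises, as in every column of the table.

```python
def conserved_fraction(exons, conserved_blocks):
    """Fraction of a prediction's exonic bases covered by conserved blocks.

    Intervals are half-open (start, end); the exons and the blocks are each
    assumed to be non-overlapping among themselves.
    """
    exonic = sum(end - start for start, end in exons)
    if exonic == 0:
        return 0.0
    covered = sum(
        max(0, min(e_end, b_end) - max(e_start, b_start))
        for e_start, e_end in exons
        for b_start, b_end in conserved_blocks
    )
    return covered / exonic


def percent_conserved(fractions, thresholds=(0.10, 0.30, 0.50)):
    """Percentage of predictions whose conserved fraction reaches each threshold."""
    n = len(fractions)
    return {t: 100.0 * sum(f >= t for f in fractions) / n for t in thresholds}


# Hypothetical conserved fractions for five predictions.
fractions = [0.0, 0.05, 0.12, 0.35, 0.60]
print(percent_conserved(fractions))  # {0.1: 60.0, 0.3: 40.0, 0.5: 20.0}
```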
Figure 2. Developmental profiling. (a) Two-color hybridization (green: adult stage; red: 4-8 h embryo) on the Heidelberg FlyArray, directly showing the expression of genes unique to the Heidelberg Prediction (lower part, spots within the green rectangle). (b) Correspondence cluster analysis of the developmental profiling. Samples from nine different stages of the Drosophila life cycle were hybridized to the Heidelberg FlyArray. Each experiment was performed at least in triplicate, including a dye reversal to avoid dye bias. In the resulting plot, each replicate hybridization of a developmental stage is shown as a colored square. The stages form distinct clusters (except for the larval stage), indicating reproducibility between replicates and specificity between stages. As a consequence of the normalization process, only the median of all control hybridizations (0-4 h) is shown, as a single red square. Genes that exhibited significantly differential transcription levels are shown as grey dots. Dots lie close together when their expression profiles have a similar shape, independent of their absolute values. Colored guide lines correspond to the transcription profiles of virtual genes that would show a signal in one condition only.
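The statement that dots are close when profiles share a shape regardless of absolute level follows from the chi-square metric on which correspondence analysis is built. The Python sketch below is a minimal illustration under stated assumptions: a non-negative genes × conditions signal matrix with no all-zero rows or columns. It computes only the chi-square distance between gene profiles, not the full SVD-based projection used to draw such a plot, and the function names and signals are hypothetical.

```python
import math


def row_profiles(table):
    """Divide each gene's signals by that gene's total (its 'profile')."""
    return [[x / sum(row) for x in row] for row in table]


def column_masses(table):
    """Share of the grand total contributed by each condition (column)."""
    total = sum(sum(row) for row in table)
    return [sum(row[j] for row in table) / total for j in range(len(table[0]))]


def chi2_distance(profile_a, profile_b, masses):
    """Chi-square distance between two row profiles, weighting each
    condition by the inverse of its column mass."""
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profile_a, profile_b, masses))
    )


# Hypothetical non-negative signals for three genes across three stages.
signals = [
    [10, 20, 70],     # gene A
    [100, 200, 700],  # gene B: ten times gene A, same shape
    [70, 20, 10],     # gene C: opposite shape
]
profiles = row_profiles(signals)
masses = column_masses(signals)
print(chi2_distance(profiles[0], profiles[1], masses))      # 0.0
print(chi2_distance(profiles[0], profiles[2], masses) > 0)  # True
```

Genes A and B differ tenfold in absolute signal but have identical profiles, so their distance is zero; gene C has the opposite shape and lies far from both.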
Summary for the Heidelberg Collection R1
| | Total | Heidelberg Predictions | BDGP Release 2/BDGC R1 | Other | Common predictions |
|---|---|---|---|---|---|
| Heidelberg Collection R1 | 21,396 | 7,464 | 483 | 71 | 13,378 |
| Heidelberg PrimerSet | 21,306 | 7,463 | 442 | 65 | 13,336 |
| Heidelberg FlyArray | 20,948 | 7,319 | 425 | 62 | 13,142 |
| Expressed during development | 13,927 (66.5%) | 3,497 (47.8%) | 232 (54.6%) | 52 (83.9%) | 10,146 (77.2%) |
| Validation by RT-PCR | 386/478 (80.8%) | 334/424 (78.8%) | ND | ND | 52/54 (96.3%) |
The Heidelberg Collection R1 resulted from combining our novel annotation with the published BDGP Release 2 annotation and the sequences of the BDGC R1 clones. The PrimerSet includes only those annotations for which we could successfully design primer pairs; likewise, the Heidelberg FlyArray row counts the annotations represented on our novel microarray. The next row presents the results of the developmental profiling; numbers in parentheses give the percentage of annotations represented on the array that scored positive. The last row shows the validation rate of the microarray results by RT-PCR (amplicon length ≤750 bp).
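The percentages in the R1 table are the expressed (or RT-PCR-confirmed) count divided by the number of annotations tested, rounded to one decimal place. The short Python check below recomputes them from the table's own counts and confirms that the four subsets add up to the total; the variable names are illustrative only.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100.0 * part / whole, 1)


columns = ["Total", "Heidelberg Predictions", "BDGP Release 2/BDGC R1",
           "Other", "Common predictions"]
# Counts copied from the Heidelberg Collection R1 table.
on_array = [20948, 7319, 425, 62, 13142]
expressed = [13927, 3497, 232, 52, 10146]

# The four subsets add up to the total in each row.
assert sum(on_array[1:]) == on_array[0]
assert sum(expressed[1:]) == expressed[0]

for name, positive, represented in zip(columns, expressed, on_array):
    print(f"{name}: {positive:,} ({percent(positive, represented)}%)")

# RT-PCR validation rates: confirmed / tested.
print(percent(386, 478), percent(334, 424), percent(52, 54))  # 80.8 78.8 96.3
```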
Summary for the Heidelberg Collection R2
| | Total | Heidelberg Predictions | FlyBase Release 3.1 | Other | Common predictions |
|---|---|---|---|---|---|
| Heidelberg Collection R2 | 19,879 | 6,224 | 605 | ND | 13,050 |
| Heidelberg PrimerSet | 19,095 (19,389) | 6,224 (294) | 296 | 40 | 12,535 |
| Heidelberg FlyArray | 18,837 (19,123) | 6,143 (286) | 288 | 39 | 12,367 |
| Expressed during development | 12,574 (66.8%) (12,734) (66.6%) | 2,636 (42.9%) (160) (55.9%) | 167 (57.9%) | 30 (76.9%) | 9,741 (78.8%) |
| Validation by RT-PCR | 354/438 (80.8%) | 218/293 (74.4%) | ND | ND | 136/145 (93.8%) |
The Heidelberg Collection R2 resulted from combining our novel annotation with the recently published FlyBase Release 3.1 annotation (excluding non-CG annotations such as TE and CR). Only Heidelberg Predictions, primers and amplicons that matched the FlyBase Release 3.1 genomic sequence with high stringency were included and re-assigned to the new Heidelberg Collection R2, so all numbers represent lower limits. The numbers are also corrected for cases in which several amplicons match a single gene. As before, the PrimerSet includes all annotations for which we successfully designed primer pairs, and the Heidelberg FlyArray row counts the annotations represented on the microarray. The next row presents the results of the developmental profiling; percentages in parentheses give the fraction of annotations represented on the array that scored positive. The last row shows the validation rate of the microarray results by RT-PCR (amplicon length ≤750 bp). In the Heidelberg Predictions column, the additional count in parentheses gives the number of annotations that were unique to BDGP Release 2 and are not part of the FlyBase Release 3.1 CG annotations; the parenthesized values in the Total column include these annotations.
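In the R2 table, each parenthesized total equals the main total plus the BDGP Release 2-only annotations listed in parentheses in the Heidelberg Predictions column. The Python sketch below checks that relation and recomputes the associated expression and RT-PCR percentages from the table's counts; the dictionary and variable names are illustrative only.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100.0 * part / whole, 1)


# Counts copied from the Heidelberg Collection R2 table, per row:
# (main total, BDGP Release 2-only annotations, parenthesized total)
totals = {
    "Heidelberg PrimerSet": (19095, 294, 19389),
    "Heidelberg FlyArray": (18837, 286, 19123),
    "Expressed during development": (12574, 160, 12734),
}

for row, (base, bdgp_only, combined) in totals.items():
    # The parenthesized total is the main total plus the BDGP-only annotations.
    assert base + bdgp_only == combined, row

on_array, extra_on_array, all_on_array = totals["Heidelberg FlyArray"]
expressed, extra_expressed, all_expressed = totals["Expressed during development"]
print(percent(expressed, on_array))              # 66.8
print(percent(all_expressed, all_on_array))      # 66.6
print(percent(extra_expressed, extra_on_array))  # 55.9

# RT-PCR validation: the two tested subsets add up to the total.
assert 218 + 136 == 354 and 293 + 145 == 438
print(percent(354, 438), percent(218, 293), percent(136, 145))  # 80.8 74.4 93.8
```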
Figure 3. Genomic location and expression patterns of Heidelberg unique predictions. The left part of the figure shows the genomic region (10 kb of sequence) for some examples of the novel Heidelberg Predictions, together with the corresponding amplicon on the microarray, conserved regions (D. pseudoobscura in gray, A. gambiae in pink) and ESTs (orange). (a) HDC09253 and (b) HDC04256 lie within regions lacking any BDGP/FlyBase prediction. (c) HDC02494 is predicted within known FlyBase predictions but lies on the opposite strand. On the right, in situ hybridizations show the expression patterns at three developmental time points: 0-4 h (top), 4-8 h (middle) and 8-12 h (bottom). Embryos in (a, b) are shown in lateral view; in (c), the top embryo is shown in ventral view and the middle and bottom embryos in lateral view. Anterior is always to the left.
Expression patterns obtained by in situ hybridization
| Name | GenBank accession number | Pattern | Evidence | Chromosome | Comments |
|---|---|---|---|---|---|
| HDC00027 | BK003260 | Ectoderm | ag | 2L | - |
| HDC00627 | BK003299 | Ectoderm | - | 2L | - |
| HDC00658 | BK003302 | Cellular blastoderm subset, salivary glands | - | 2L | - |
| HDC00966 | BK003326 | Cellular blastoderm subset | - | 2L | Intron CG11030 |
| HDC00979 | BK003327 | Yolk nuclei | - | 2L | - |
| HDC02005 | BK003369 | Maternal, subset of cells, embryonic large intestine | dp | 2L | - |
| HDC02009 | BK003370 | Protocerebrum primordium, trunk mesoderm primordium | dp | 2L | - |
| HDC02141 | BK003388 | Embryonic gut, cells in the head (stage 10/11) | - | 2L | - |
| HDC02262 | BK003403 | Weak signal | dp | 2L | - |
| HDC02272 | BK003405 | Weak signal | dp | 2L | - |
| HDC02494 | BK003424 | Mesoderm anlage | dp | 2L | Intron CG15288 |
| HDC02527 | BK003429 | Salivary glands | dp | 2L | - |
| HDC02528 | BK003430 | Protocerebrum primordium, anterior midgut primordium | dp | 2L | - |
| HDC02634 | BK003455 | Cellular blastoderm subset | dp | 2L | - |
| HDC02764 | BK003493 | Cellular blastoderm, ubiquitous, salivary glands | dp, EST | 2L | Intron CG4838 |
| HDC03057 | BK003539 | Maternal, blastoderm, ubiquitous, gut | dp, EST | 2L | Intron CG5803 (overlap) |
| HDC03960 | BK003614 | Trunk mesoderm anlage, head mesoderm primordium | dp | 2R | Opposite strand to CG17921 (overlap) |
| HDC04256 | BK003630 | Subset of mesoderm, gonads | dp | 2R | - |
| HDC05090 | BK003664 | Subset of cells (procephalic ectoderm primordium?), midgut | - | 2R | - |
| HDC05183 | BK003670 | Ubiquitous | dp | 2R | - |
| HDC05573 | BK003699 | Midgut, central nervous system | dp, EST | 2R | - |
| HDC06000 | BK003754 | Cellular blastoderm excluding ventral structures | dp | 2R | Intron CG12369, same staining as HDC05999 |
| HDC06241 | BK003785 | Ventral ectoderm anlage, trunk mesoderm anlage | dp | 2R | - |
| HDC06636 | BK003845 | Maternal | dp, EST | 2R | - |
| HDC07387 | BK003934 | Maternal, subset of cells until stage 12 | - | 2R | - |
| HDC07791 | BK001850 | Weak, ubiquitous at 4-8 h | - | 3L | - |
| HDC08265 | BK001908 | Subset of cells | - | 3L | - |
| HDC08749 | BK001956 | Weak, ubiquitous at 4-8 h | dp | 3L | - |
| HDC09080 | BK002002 | Salivary glands | dp | 3L | - |
| HDC09253 | BK002020 | Posterior spiracles, ectoderm | dp | 3L | - |
| HDC09513 | BK002067 | Weak, ubiquitous at 4-8 h | dp | 3R | - |
| HDC10019 | BK002122 | Salivary gland primordium, salivary glands | - | 3L | Intron CG10741 |
| HDC10028 | BK002123 | Ventral nerve cord | dp | 3L | Intron CG12478 |
| HDC10120 | BK002139 | Trunk mesoderm anlage, cuprophilic cells | - | 3L | Intron CG17697 |
| HDC10292 | BK002156 | Lateral stripes blastoderm, third wave of neuroblasts | ag | 3L | Predicted in 2.0 as CG17014 |
| HDC10646 | BK002195 | Pole plasm, trunk mesoderm, salivary glands, embryonic midgut | dp | 3L | - |
| HDC10913 | BK002212 | Anterior midgut primordium, posterior midgut primordium | dp | 3L | Intron CG11614 (opposite strand) |
| HDC11249 | BK002252 | Malpighian tubules, gonads | dp | 3L | Intron CG32432 (opposite strand) |
| HDC11512 | BK002283 | Weak, ubiquitous at 4-8 h | dp | 3L | - |
| HDC11876 | BK002318 | Weak, ubiquitous at 4-8 h | dp, EST | 3R | Intron CG12163 (opposite strand) |
| HDC11908 | BK002321 | Ventral nerve cord, embryonic central nervous system | dp | 3R | - |
| HDC12497 | BK002400 | Weak, ubiquitous at 4-8 h | - | 3R | - |
| HDC12511 | BK002404 | Ectoderm | dp | 3R | - |
| HDC12925 | BK002446 | Weak, ubiquitous at 4-8 h | EST | 3R | - |
| HDC13248 | BK002490 | Weak, ubiquitous at 4-8 h | - | 3R | - |
| HDC13350 | BK002511 | Ectoderm | - | 3R | Intron CG7855 (opposite strand) |
| HDC13470 | BK002532 | Cellular blastoderm subset segmentally repeated, ectoderm, embryonic foregut, embryonic hindgut | dp, EST | 3R | - |
| HDC13644 | BK002551 | Embryonic midgut, anal pads | - | 3R | - |
| HDC13905 | BK002563 | Trunk mesoderm anlage, embryonic midgut | dp | 3R | - |
| HDC14221 | BK002623 | Ventral ectoderm anlage, posterior endoderm anlage | dp | 3R | Intron CG31243 (opposite strand) |
| HDC14231 | BK002626 | Maternal, salivary glands | dp, EST | 3R | Short overlap with TE19396 |
| HDC14493 | BK002672 | Dorsal vessel | ag, dp | 3R | Intron CG31175 |
| HDC15090 | BK002773 | Maternal | dp | 3R | - |
| HDC15681 | BK002831 | Weak, ubiquitous at 4-8 h | dp, EST | 3R | - |
| HDC15728 | BK002837 | Maternal | dp | 3R | - |
| HDC16092 | BK002888 | Weak, ubiquitous at 4-8 h | dp | 3R | - |
| HDC16243 | BK002914 | Anterior endoderm anlage, anterior midgut primordium, posterior midgut primordium | dp | 3R | - |
| HDC16874 | BK002959 | Yolk nuclei, anterior endoderm anlage, embryonic midgut, subset of cells | ag | X | - |
| HDC16879 | BK002961 | Invaginating cells (hemocytes?/oenocytes?) | dp | X | - |
| HDC17351 | BK003012 | Embryonic gut | - | X | - |
| HDC18148 | BK003079 | Weak, ubiquitous at 4-8 h | - | X | - |
| HDC18326 | BK003102 | Weak, ubiquitous at 4-8 h | dp | X | Intron CG1691 |
| HDC18410 | BK003108 | Weak, ubiquitous at 4-8 h | dp | X | - |
| HDC19378 | BK003172 | Weak, ubiquitous at 4-8 h | dp | 3R | - |
| HDC19530 | BK003190 | Weak, ubiquitous at 4-8 h | - | X | - |
| HDC19643 | BK003204 | Midgut primordium, embryonic midgut | ag | X | Intron CG32541 (opposite strand) |
| HDC19645 | BK003205 | Cuprophilic cells | - | X | - |
For 40% of the novel genes tested we detected an expression pattern during embryonic development. Any overlap with regions conserved in D. pseudoobscura (dp), in A. gambiae (ag) or with D. melanogaster ESTs (EST) is listed. Note that the novel genes showing distinct in situ hybridization patterns are not enriched for conservation. With minimal overlap requirements applied, the numbers are consistent with those obtained for all Heidelberg Predictions (expressed and unexpressed) as described in the computational analysis of the combined annotation.
Figure 4. In situ hybridization for HDC13470. (a-f) In situ hybridization of various stages of embryonic development using HDC13470 as probe. (g) The microarray-based expression profile (all stages compared to 0-4 h) is closely reproduced by (h) northern blot analysis. Tub, tubulin. Embryos in (a, b, d-f) are shown in lateral view and in (c) in ventral view; anterior is always to the left.
Figure 5. The Heidelberg FlyArray website. Screenshot of the Heidelberg FlyArray website, which is based on the GBrowse platform. After selecting a genomic region of interest, for example by gene name, amplicon name or position, the user is offered a comparative view of the gene models from the BDGP genome annotation Release 2, FlyBase Release 3.1 and the Heidelberg Prediction, as well as the placement of the amplicons chosen for the Heidelberg FlyArray. The site also provides a comparison with D. pseudoobscura and A. gambiae, a novel EST clustering, and information on known P-element insertions.