The sequence of human chromosome 21 and implications for research into Down syndrome.

Abstract

The recent completion of the DNA sequence of human chromosome 21 has provided the first look at the 225 genes that are candidates for involvement in Down syndrome (trisomy 21). A broad functional classification of these genes, their expression data and evolutionary conservation, and comparison with the gene content of the major mouse models of Down syndrome, suggest how the chromosome sequence may help in understanding the complex Down syndrome phenotype.

Year: 2000. PMID: 11178230. PMCID: PMC138845. DOI: 10.1186/gb-2000-1-2-reviews0002

Source DB: PubMed. Journal: Genome Biol. ISSN: 1474-7596. Impact factor: 13.583

Down syndrome (DS), affecting one in 700 live births, is the most common genetic cause of mental retardation [1]. The DS phenotype is complex and varies in severity among individuals; it includes mental retardation and cognitive deficits, heart defects, hypotonia, motor dysfunction, immune system deficiencies, an increased risk of leukemia, and development of the pathology of Alzheimer's disease [2]. Most commonly, DS results from the presence of an extra copy of a complete chromosome 21, and the DS phenotypic features are assumed to be a direct consequence of the overexpression of some, as yet undefined, number of genes on 21q (21p consists largely of ribosomal RNA genes and other repeat sequences). Recently, the essentially complete sequence of 21q (33.5 Mb) was finished, and 225 genes were identified using a variety of experimental and computational approaches [3]. These data are of immediate importance to DS research. This review discusses: the reliability of gene identification; what is known or can be inferred about the biological function of the 225 identified genes; expression patterns of the novel genes; evolutionary conservation, particularly of genes lacking functional associations; the inferred gene content of the major mouse models of DS and, therefore, the likely causes of the phenotypic differences among them; and reasonable next steps towards understanding gene-phenotype relationships in DS. Throughout, numbers and kinds of genes, and additional analyses of 21q gene content, are based on the data presented in [3].

Gene number

Two hundred and twenty-five is a surprisingly small number of genes for approximately 1% of the human genome. It is well below 1% of the 50,000-100,000 genes previously estimated for the whole human genome (see also [4]), and well below the 545 genes identified on chromosome 22 in approximately the same amount of DNA [5]. Previous data from the mapping of expressed sequence tags (ESTs) and genes, and from cDNA selection, consistently suggested that chromosome 21 was relatively gene-poor overall, and extremely so in some regions [6,7]. Chromosome 21 could also be predicted to have fewer genes than chromosome 22: approximately half of chromosome 21 is a large dark band when stained with Giemsa, and such bands are known to be gene-poor, whereas chromosome 22 is composed almost entirely of gene-rich R bands [8,9]. In addition, trisomy 21 is compatible with life, while trisomy 22 is not [1]. Chromosome 21 was therefore expected to be relatively gene-poor, but its extreme paucity of genes justifies further consideration. In particular, are there consistent errors or weaknesses in gene-finding techniques that could have missed a significant proportion of genes? To see where errors may have accumulated, it is worth reviewing the gene identification methods. Genes were identified from three types of data: identities or similarities to known proteins; identities to spliced ESTs; and patterns of consistent coding-exon prediction. First, protein matches identified genes that were identical or similar to known genes, and also found pseudogenes. With some minor corrections, all 107 genes associated with complete cDNAs that had previously been mapped to chromosome 21 (listed by Swiss-PROT [10] in March 2000) were found.
In addition, within 21q, 52 protein matches were classed as pseudogenes on the basis of a lack of introns and, most importantly, the presence of multiple in-frame stop codons. Because transcripts from these sequences cannot produce a complete protein, it is unlikely that any pseudogenes were incorrectly classified. Second, only EST matches that showed evidence of splicing were used: that is, ESTs that were non-contiguous with the genomic sequence, showed consensus splice sites, and were essentially perfect matches (>95% identity) to the genomic sequence. This eliminates many of the artifacts common to cDNA libraries. A survey of the EST database (dbEST) [11] for 50 of the known chromosome 21 genes found that 43 were present as spliced ESTs, six were present only as unspliced ESTs (five of these were intronless genes), and one was not present in dbEST. Finally, the criterion of consistent exon prediction required that two of the three coding-exon prediction programs (Grail, Genscan and MZEF) agreed on the location of an exon, and that a minimum of three consistent exons were found within <60 kb, with introns <30 kb. Notably, the coding regions of intronless genes were well predicted, but only as single exons. Such exons tend to be very large (more than 1 kb long), in contrast to typical coding exons, which average 100-150 base pairs (bp). When large single-coding-exon genes were accepted as an exception, all but one of the 107 known genes could be identified by exon prediction. These included very large genes such as DSCAM, which spans >800 kb, and GRIK1, which spans >400 kb, both of which were well predicted through at least some of their coding regions. The important conclusion is that each of the 107 genes previously known to map to chromosome 21 would have been identified by the criteria of EST matches plus exon prediction, even in the absence of protein similarities.
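The gene-finding criteria described above amount to a small decision procedure. The following is a minimal Python sketch of three of them, under stated assumptions: counting in-frame stop codons (used to flag pseudogenes); keeping a predicted exon only if at least two of the three programs predict an overlapping exon; and accepting a locus with at least three consistent exons within <60 kb and introns <30 kb, or with one large single coding exon. Treating "agreed on the location" as any overlap and using >1 kb as the single-exon cut-off are assumptions, and the original analysis may have used different matching rules. Exon coordinates are plain (start, end) pairs, not the output of any real Grail, Genscan or MZEF parser, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, half-open

STOP_CODONS = {"TAA", "TAG", "TGA"}


def inframe_stop_count(cds: str) -> int:
    """Count stop codons in frame 0 of `cds`, ignoring the final codon
    (a terminal stop is expected in a functional gene)."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def consensus_exons(predictions: Dict[str, List[Interval]],
                    min_votes: int = 2) -> List[Interval]:
    """Keep exons predicted at overlapping positions by >= min_votes programs
    (a program's own prediction counts as one vote), then merge overlaps."""
    agreed: List[Interval] = []
    for exons in predictions.values():
        for exon in exons:
            votes = sum(any(overlaps(exon, other) for other in others)
                        for others in predictions.values())
            if votes >= min_votes:
                agreed.append(exon)
    merged: List[Interval] = []
    for start, end in sorted(agreed):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def passes_exon_criterion(exons: List[Interval], min_exons: int = 3,
                          max_span: int = 60_000, max_intron: int = 30_000,
                          single_exon_min: int = 1_000) -> bool:
    """True if sorted consensus exons suggest a gene: one large single coding
    exon, or a run of >= min_exons exons with every intron < max_intron and a
    total span < max_span."""
    if any(end - start > single_exon_min for start, end in exons):
        return True
    for i in range(len(exons)):
        j = i
        # extend the run while the next intron and the overall span stay small
        while (j + 1 < len(exons)
               and exons[j + 1][0] - exons[j][1] < max_intron
               and exons[j + 1][1] - exons[i][0] < max_span):
            j += 1
        if j - i + 1 >= min_exons:
            return True
    return False


if __name__ == "__main__":
    predictions = {
        "Grail": [(1_000, 1_100), (20_000, 20_150), (40_000, 40_120)],
        "Genscan": [(1_010, 1_090), (20_010, 20_140), (40_005, 40_100), (90_000, 90_100)],
        "MZEF": [(500_000, 500_100)],
    }
    exons = consensus_exons(predictions)
    print(exons, passes_exon_criterion(exons))
```

In this toy input, the Genscan-only exon at 90 kb and the MZEF-only exon are dropped for lack of a second vote, while the three agreed exons form a run that passes the span and intron limits.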
These criteria do not, in most cases, define a complete gene structure, but they do reliably indicate the presence of a gene. Thus, unless novel genes have very different characteristics, a similarly high success rate can be expected in identifying them. Using these criteria, a further 118 genes were identified. What is likely to have been missed? First, there are gaps in the sequence of 21q. They are few (three) and small (<50 kb each), however, and therefore cannot harbor large numbers of genes. Second, a gene would be missed only if it had all of the following features: no similarity to any known protein; consistently very large introns (>30-60 kb), so that patterns of predicted exons would not be scored; and a long intronless 3' untranslated region (UTR) or restricted and/or low expression, so that no spliced EST is present in dbEST. Some genes with these characteristics may well exist, but they are unlikely to represent a significant proportion of chromosome 21 genes. The distal third of 21q is the most gene-rich (and GC-rich) part of the chromosome, but its intergenic distances are not large enough to accommodate additional genes with uniformly large introns. So, unless coding exons in such genes are for some reason not recognized, they would be scored on the basis of patterns of predicted exons. The proximal two thirds, in contrast, is uniformly AT-rich and does have large segments lacking gene features; indeed, one segment of approximately 7 Mb harbors only seven genes. Here there is room for numerous genes with large introns and restricted expression. One argument against this is biological: an individual who is monosomic for this region has only mild phenotypic abnormalities [12]. A second argument is the general scarcity of any consistent exon prediction in the region, regardless of 'intron' size.
If there are many coding exons within this region, they must also be largely unrecognized by prediction programs. Together, these data suggest that the total of 225 genes is likely to be reliable: false negatives should be few. But what about false positives? Genes with complete protein or cDNA sequences identical or highly similar to known genes (class 1 and class 2 genes in [3]) are unambiguous. Gene models (classes 3 and 4), however, remain open to further investigation and interpretation. For example, some investigators will choose to disregard a specific match to a protein domain if the similarity is weak, and how many exons to include in a model, or whether to include a particular EST, will sometimes be debatable. Details in the 21q gene catalog should therefore be considered provisional, and investigators should review the basis for specific gene predictions of interest (available at [13]).

The nature of chromosome 21 genes

DS can be considered a contiguous gene syndrome, with almost all of 21q as the relevant region. The segment of 21q22.2 referred to as the Down syndrome chromosomal region (DSCR) was defined as containing genes relevant to aspects of the DS phenotype on the basis of several cases of partial trisomy 21 [14,15]. Data from a larger number of partial trisomy cases, however, showed that only the most centromeric region of 21q could be excluded from containing relevant genes, in particular for mental retardation [16]. It is assumed that overexpression of chromosome 21 genes, as a result of their presence in an extra copy, causes the DS phenotype. Are all chromosome 21 genes overexpressed? Can overexpression of some genes be tolerated with no phenotypic effect? How many genes are overexpressed and relevant? Currently, there are no answers to these questions. It is, however, worth considering what is known about the function of chromosome 21 genes. Table 1 lists the 122 genes for which some functional association can be inferred. Functional inferences are based on partial or complete similarities of the chromosome 21 genes or gene models to proteins or protein domains whose specific function has been demonstrated experimentally. For example, ZNF295 is a gene model with an open reading frame containing zinc finger domains; because some zinc finger proteins have been shown to be transcription factors, ZNF295 is classed as such. In general, genes are classified as broadly as possible. For example, ITGB2 is classed only as a cell adhesion molecule, although it is usually regarded as an immune system gene because it has been studied almost exclusively in lymphocytes [17]. Future studies may well reveal functions other than those observed so far, so it is wise to consider the possible functions of each gene as broadly as possible.

Table 1

Chromosome 21 functional gene categories

Functional category	Number of genes	Functional assignments
Transcription factors, regulators, and modulators	17	GABPA, BACH1, RUNX1, SIM2, ERG, ETS2 (transcription factors); ZNF294, ZNF295, Pred65, *ZNF298, APECED (zinc fingers); KIAA0136 (leucine zipper); GCFC (GC-rich binding protein); SON (DNA binding domain); PKNOX1 (homeobox); HSF2BP (heat shock transcription factor binding protein); NRIP1 (modulator of transcriptional activation by estrogen)
Chromatin structure	4	H2BFS (histone 2B); HMG14 (high mobility group); CHAF1B (chromatin assembly factor); PCNT (pericentrin, an integral component of the pericentriolar matrix of the centrosome)
Proteases and protease inhibitors	6	BACE (beta-site APP cleaving enzyme); TMPRSS2, TMPRSS3 (transmembrane serine proteases); ADAMTS1, ADAMTS5 (metalloproteinases); CSTB (protease inhibitor)
Ubiquitin pathway	4	USP25, USP16 (ubiquitin proteases); UBE2G2 (ubiquitin conjugating enzyme); SMT3A (ubiquitin-like)
Interferons and immune response	9	IFNAR1, IFNAR2, IL10RB, IFNGR2 (receptors/auxiliary factors); MX1, MX2 (interferon-induced); CCT8 (T-complex subunit); TIAM1 (T-lymphoma invasion and metastasis inducing protein); TCP10L (T-complex protein 10 like)
Kinases	8	ENK (enterokinase); MAKV, MNB, KID2 (serine/threonine); PHK (pyridoxal kinase); PFKL (phosphofructokinase); *ANKRD3 (ankyrin-like with kinase domains); PRKCBP2 (protein kinase C binding protein)
Phosphatases	2	SYNJ1 (polyphosphoinositide phosphatase); PDE9A (cyclic phosphodiesterase)
RNA processing	5	rA4 (SR protein); U2AF35 (splicing factor); RED1 (editase); PCBP3 (poly(C)-binding protein); *RBM11 (RNA-binding motif)
Adhesion molecules	4	NCAM2 (neural cell); DSCAM; ITGB2 (lymphocyte); c21orf43 (similar to endothelial tight junction molecule)
Channels	7	GRIK1 (glutamate receptor, calcium channel); KCNE1, KCNE2, KCNJ6, KCNJ15 (potassium); *CLIC1L (chloride); TRPC7 (calcium)
Receptors	5	CXADR (Coxsackie and adenovirus); claudins 8, 14, 17 (Clostridia); Pred12 (mannose)
Transporters	2	SLC5A3 (Na-myoinositol); ABCG1 (ATP-binding cassette)
Energy metabolism	4	ATP5O (ATP synthase oligomycin-sensitivity conferral protein); ATP5A (ATPase-coupling factor 6); NDUFV3 (NADH-ubiquinone oxidoreductase subunit precursor); CRYZL1 (quinone oxidoreductase)
Structural	4	CRYA (lens protein); COL18, COL6A1, COL6A2 (collagens)
Methyltransferases	3	DNMT3L (cytosine methyltransferase); HRMTIII (protein arginine methyltransferase); Pred28 (AF139682) (N6-DNA methyltransferase)
SH3 domain	3	ITSN, SH3BGR, UBASH3A
One carbon metabolism	4	GART (purine biosynthesis); CBS (cystathionine β-synthetase); FTCD (formiminotransferase cyclodeaminase); SLC19A1 (reduced folate carrier)
Oxygen metabolism	3	SOD1 (superoxide dismutase); CBR1, CBR3 (carbonyl reductases)
Miscellaneous	28	HLCS (holocarboxylase synthase); LSS (lanosterol synthetase); B3GALT5 (galactosyl transferase); *AGPAT3 (acyltransferase); STCH (microsomal stress protein); ANA/BTG3 (cell cycle control); MCM3 (DNA replication associated factor); APP (Alzheimer's amyloid precursor); WDR4, WDR9 (WD repeat containing proteins); TFF1, 2, 3 (trefoil proteins); UMODL1 (uromodulin); *Pred5 (lipase); ^Pred3 (keratinocyte growth factor); KIAA0653, ^IgSF5 (Ig domain); TMEM1, *Pred44 (transmembrane domains); TRPD (tetratricopeptide repeat containing); S100b (Ca binding); PWP2 (periodic tryptophan protein); DSCR1 (proline rich); DSCR2 (leucine rich); WRB (tryptophan rich protein); Pred22 (tRNA synthetase); SLC37A1 (glycerol phosphate permease)

In the table, 122 genes are assigned. The majority have complete or presumed complete cDNA sequences. Functional assignments are based either on literature reports of direct experiment or on inferences from similarities to other proteins. Asterisks (*) mark genes whose models are incomplete but contain domains that suggest a function. Functional categories were chosen to be broadly descriptive; each gene appears in only one category.

Every biologist will bring their own expertise to bear in deciding which of the genes in Table 1 are of greatest potential relevance to the DS phenotype. Transcription factors are attractive candidates because an imbalance in one component of a transcription factor complex may alter how effectively transcription of target genes is activated or repressed. Genes within the ubiquitin pathway may alter rates of target protein degradation. Cell adhesion was postulated long ago, with intriguing preliminary data [18], to play a role in altering the rate and extent of cell migration during development. Overexpression of one potassium channel gene has been shown to dysregulate expression of other channel genes, affecting neuronal network excitability [19]. If mental retardation and cognitive deficits are the primary focus of study, almost any of the categories in Table 1 could be relevant, given how limited our current understanding is of the complex developmental processes leading to these conditions.

Expression data

Only about half of the 225 chromosome 21 genes have any functional association, and some of these associations are weak; for example, the presence of a transmembrane domain says little about function. In some cases, the lack of protein or functional domain data may be due to incomplete coding sequence information. While complete cDNA sequences (which may be laborious to obtain) are awaited, and for further analysis of complete cDNAs that lack functional associations, expression patterns may help in prioritizing genes for further study. Of the novel genes with incomplete cDNAs, 38 are represented by ESTs from Soares or CGAP cDNA libraries [20]. Of these, only seven would be classed as ubiquitously expressed, that is, present in dbEST with more than 30 entries from numerous tissues. Twenty-six are each associated with fewer than five dbEST entries. Five of these are seen only in testis/prostate and three only in fetal sources. Although features of dbEST construction can produce artifactual pictures of expression patterns, these data suggest that the novel genes within 21q may be largely of limited expression. In at least some cases, this is consistent with their not having been identified previously. For relevance to mental retardation and cognitive deficits, genes with brain-specific expression, such as PCP4 [21], are of interest. Equally interesting are examples of brain-specific alternative processing, as seen with intersectin and DSCAM [22,23]. In an analysis of a number of novel gene models, alternative processing, some of it brain-specific, was observed in the majority of cases [24]. Even most known genes have probably not been examined thoroughly for multiple transcripts. Because alternative transcripts may alter protein sequence, and therefore function, they may be relevant to DS.
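The expression categories used above can be written as a simple rule over dbEST entries. A minimal sketch, assuming each gene is represented by a list of source-tissue labels (one per dbEST entry): the >30-entry and <5-entry thresholds come from the text, while the number of tissues that counts as "numerous" (here 5) is a hypothetical choice, as are the function and label names.

```python
from typing import Iterable

# Thresholds from the review: >30 dbEST entries from numerous tissues is
# "ubiquitous"; fewer than five entries is low representation.
UBIQUITOUS_MIN_ENTRIES = 30
LOW_MAX_ENTRIES = 5
# Hypothetical: the review does not say how many tissues count as "numerous".
NUMEROUS_TISSUES = 5


def classify_expression(est_tissues: Iterable[str]) -> str:
    """Classify a gene from the source tissues of its dbEST entries."""
    tissues = list(est_tissues)
    distinct = set(tissues)
    if len(tissues) > UBIQUITOUS_MIN_ENTRIES and len(distinct) >= NUMEROUS_TISSUES:
        return "ubiquitous"
    if len(tissues) < LOW_MAX_ENTRIES:
        if len(distinct) == 1:
            # e.g. seen only in testis/prostate, or only in fetal libraries
            return f"restricted: {tissues[0]}"
        return "low"
    return "unclassified"


if __name__ == "__main__":
    print(classify_expression(["testis", "testis"]))
```

Because dbEST counts reflect which libraries happened to be sequenced, a label such as "restricted: testis" is best read as a prioritization hint, in line with the caveat about artifactual expression patterns.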

Evolutionary conservation

Model organisms will provide the basis for functional studies of the known and novel chromosome 21 genes. The genomes of Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila [25,26,27] have been completely sequenced, so the complete protein set of each of these organisms is known. Annotation of the Drosophila genome identified approximately 13,500 genes. Comparison of the translations of all annotated chromosome 21 genes with the Drosophila protein set identified 23 chromosome 21 gene products with similarity to a Drosophila protein over their complete length. Many of these similarities involve basic biochemical or biological functions and include such proteins as SOD1 (superoxide dismutase), GART (a purine biosynthesis enzyme), CBS (cystathionine beta-synthetase), and proteins involved in RNA splicing and the ubiquitin pathway. A further 31 genes showed excellent, informative matches, but only over a domain or subregion of the human protein; previously known homologs in this set include MNB (minibrain) and SIM2 (single-minded). Perhaps most interesting in both sets are the genes for which there are little or no functional data. Table 2 lists selected known and novel chromosome 21 genes with partial and complete similarities in Drosophila. Among the novel genes, amino-acid identities range as high as 64% (c21orf19), and similarity extends over as many as 1,600 residues (c21orf5). Some details remain to be resolved; for example, in several cases the lengths of the human and Drosophila proteins differ significantly. Correcting these differences, where necessary, may strengthen the similarity data. In addition, defining complete cDNAs may reveal new homologies not discernible with partial gene models. Determining the phenotypes of mutants in the Drosophila genes is likely to shed light on the function of the homologous human genes.

Table 2

Similarities between selected human and Drosophila gene products

Gene	Human size (aa)	Drosophila size (aa)	%ID	%Sim	E value	Length of similarity (aa)*
SOD1	154	153	61	73	10^-47	152
GART	1,010	(1,747)	46	63	10^-180	995
CRYAA	173	187	38	56	10^-25	154
UBE2G2	165	167	78	88	10^-70	165

DSCR3	297	295	49	69	10^-72	281
KIAA0958	428	490	33	49	10^-50	375
c21orf4	158	113	37	62	10^-15	94
c21orf19	439	295	64	77	10^-94	291

SIM2	667	634	70	79	10^-125	353
MNB	763	722	44	61	10^-69	314
APP	770	816	24	37	10^-38	497
CHAF1B	559	747	47	64	10^-98	381
KIAA0179	740	687	26	42	10^-17	272
KIAA0539	2,300	2,029	26	42	10^-28	409
DSCR1	197	292	43	67	10^-33	153
c21orf2	256	454	53	67	10^-33	146
c21orf5	2,298	2,599	29	47	10^-115	1,082
c21orf5 (second region)			40	58	10^-96	533

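* Number of amino acids over which the percent identity (%ID) and percent similarity (%Sim) were calculated. For c21orf5, two separate regions of similarity are listed. The E value (expectation value) is the number of matches with at least this level of similarity expected by chance; the smaller the E value, the less likely it is that the similarity arose by chance.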
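The footnote's reading of the E value can be made more precise. Assuming the similarities were scored with a BLAST-family search (the review does not name the program), the Karlin-Altschul statistics give the expected number of chance hits with raw score at least S, where m and n are the (effective) lengths of the query and the database, and K and λ depend on the scoring matrix and gap penalties. S' is the normalized bit score, and P is the probability of at least one such chance hit:

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}, \qquad
P = 1 - e^{-E} \approx E \quad \text{for small } E
```

For E values as small as those in Table 2, P and E are practically identical, which is why the E value is often read as a probability.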

Mouse models

Regions of human chromosome 21 are conserved within segments of three mouse chromosomes (Figure 1). The centromere-proximal region of chromosome 21, through the MX genes, is homologous with the telomeric region of mouse chromosome 16. The next segment of chromosome 21, approximately 2 Mb long, is homologous with the centromere-proximal region of mouse chromosome 17, and the telomeric 2 Mb of chromosome 21 is homologous with an internal segment of mouse chromosome 10. On the basis of current data, the order of chromosome 21 homologs within the mouse chromosome 16 and 10 segments appears to be completely conserved, although the boundaries of these regions are still approximate [28,29]. For example, the most centromere-proximal chromosome 21 gene verified to map to mouse chromosome 16 is STCH; the seven genes proximal to it remain to be mapped in mouse. Similarly, although Mx is known to map to mouse chromosome 16, and Tff3, Cbs and Crya to mouse chromosome 17, the mouse map location of 11 genes between and among these is unknown. Lastly, PDXK is the most proximal chromosome 21 gene mapped to mouse chromosome 10 [28]. Genes in this region are relatively small, however, and additional chromosome 21 genes may be located on mouse chromosome 10 between Pdxk and the adjacent region homologous with human chromosome 19. Defining the endpoints of these homologous regions is critical for evaluating gene-phenotype correlations in existing mouse models and for designing new ones.
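The boundary uncertainty described above comes from unmapped genes lying between mapped anchors. As a sketch, assuming the chromosome 21 genes are given in centromere-to-telomere order, each with its verified mouse chromosome or None if unmapped, the following reports the runs of unmapped genes whose mouse location cannot be inferred: those at either end of the list, and those flanked by anchors on different mouse chromosomes. A run between two anchors on the same mouse chromosome is assumed to share that location, consistent with the conserved gene order noted above. The function name is hypothetical, and the example list is illustrative only (g1-g7 are placeholder names), not the real comparative map.

```python
from typing import List, Optional, Tuple

Gene = Tuple[str, Optional[str]]  # (human gene, mouse chromosome or None if unmapped)
Window = Tuple[Optional[str], Optional[str], List[str]]  # (left anchor, right anchor, unmapped genes)


def boundary_windows(genes: List[Gene]) -> List[Window]:
    """Return runs of unmapped genes whose mouse location cannot be inferred:
    runs at either end of the list, or runs whose flanking mapped genes lie
    on different mouse chromosomes."""
    windows: List[Window] = []
    left: Optional[Gene] = None
    run: List[str] = []
    for name, chrom in genes:
        if chrom is None:
            run.append(name)
            continue
        if run and (left is None or left[1] != chrom):
            windows.append((left[0] if left else None, name, run))
        run = []  # a fresh list, so the stored window is not modified later
        left = (name, chrom)
    if run:
        windows.append((left[0] if left else None, None, run))
    return windows


if __name__ == "__main__":
    example = [("g1", None), ("g2", None), ("STCH", "16"), ("g3", None),
               ("APP", "16"), ("MX1", "16"), ("g4", None), ("g5", None),
               ("TFF3", "17"), ("CBS", "17"), ("g6", None), ("PDXK", "10"),
               ("g7", None)]
    for window in boundary_windows(example):
        print(window)
```

Here g3 is silently assigned to mouse chromosome 16 because both of its anchors are there, while g4, g5 and g6 fall in windows that only additional mapping can resolve.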

Figure 1

The regions of human chromosome 21 that are syntenic with mouse chromosomes are indicated on the left; those that are trisomic in the major mouse models are indicated on the right.

Currently, the best mouse models of DS are the mouse chromosome 16 segmental trisomies Ts65Dn and Ts1Cje. Ts65Dn is trisomic for the region extending from a point an undefined distance proximal to App, through Mx, presumably to the telomere of chromosome 16. The phenotype of Ts65Dn includes working memory impairment and long-term memory deficits; delayed development and lower body weight; motor dysfunction; decreased responsiveness to pain; hyperactivity; and decreased ability to inhibit behavior (reviewed in [30,31]; see also [32,33]). Particularly interesting are observations of age-related loss of cholinergic neurons, decreased numbers of asymmetric synapses in the temporal cortex, abnormal neuron numbers in hippocampal regions, and deficiencies of beta-noradrenergic transmission within the hippocampus and cerebral cortex [34,35,36,37,38]. Some of these deficits have been observed in DS; others suggest new avenues of investigation. Knowing which genes cannot be responsible for the phenotype is also helpful. Table 3a lists the 32 genes located centromeric to the Alzheimer's-associated gene APP on chromosome 21. On the basis of current comparative mapping data, the mouse homologs of most of these may be present in only two copies in Ts65Dn and would therefore not contribute to its phenotypic features. The Ts1Cje mouse is a more recent model and is trisomic for the region of mouse chromosome 16 from Sod1 through Mx (and again presumably to the telomere). Although it has not yet been studied as thoroughly as Ts65Dn, there are phenotypic differences between the two strains. In contrast to Ts65Dn, Ts1Cje shows hypoactivity, no loss of cholinergic neurons, and no deficits in the visible-platform part of the water maze test (which tests only memory and not the ability to make spatial correlations) [39]. Table 3b lists 27 genes that, on the basis of the genetic map [29], are expected to be trisomic in Ts65Dn but only disomic in Ts1Cje.
It is tempting to conclude that these genes must account for the phenotypic differences, but it must be kept in mind that the two mouse strains have been produced on different genetic backgrounds, which may have phenotypic consequences.
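Table 3b is in effect a set difference between two trisomic segments. As a minimal sketch, assuming an ordered centromere-to-telomere gene list and that each model's trisomic segment runs from a named start gene to the telomeric end of the list (the text says "presumably" to the telomere), the genes trisomic in one model but not the other can be computed as below. Each model's own start gene is treated as inclusive, so the other model's start gene acts as an exclusive end; a boundary gene such as SOD1, which Table 3b lists alongside APP, is therefore not returned here, and how such genes are counted depends on the exact breakpoints. Because the Ts65Dn breakpoint lies an undefined distance proximal to App, the sketch can simply be rerun with alternative start genes. The function names are hypothetical, the gene list is an illustrative subset in the order of Table 3, and MX1 stands in for the Mx genes.

```python
from typing import List, Sequence


def trisomic_segment(order: Sequence[str], start_gene: str) -> List[str]:
    """Genes from start_gene (inclusive) to the telomeric end of `order`."""
    return list(order[order.index(start_gene):])


def trisomic_only_in_first(order: Sequence[str], start_first: str,
                           start_second: str) -> List[str]:
    """Genes trisomic in the model whose segment starts at start_first
    but not in the model whose segment starts at start_second."""
    second = set(trisomic_segment(order, start_second))
    return [g for g in trisomic_segment(order, start_first) if g not in second]


if __name__ == "__main__":
    order = ["GABPA", "APP", "ADAMTS1", "GRIK1", "TIAM1", "SOD1", "MX1"]
    print(trisomic_only_in_first(order, "APP", "SOD1"))  # Ts65Dn-like vs Ts1Cje-like
```

Shifting the first start gene proximally (for example to GABPA) shows directly how an uncertain Ts65Dn breakpoint would enlarge the list of candidate genes for the phenotypic differences.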

Table 3

Human chromosome 21 centromere-proximal genes

(a) Genes proximal to APP
Gene	Classification
Pred65	Zinc finger
Pred3	Keratinocyte growth factor
Pred4	Similar to KIAA1074
orf15	EST
Pred5	Lipase
RBM11	RNA binding
Pred6	Exon
STCH	Stress protein
SAMSN-1	Similar to KIAA0790
NRIP1	Nuclear factor
USP25	Ubiquitin protease
orf34	EST
orf35	EST
orf36	EST
orf37	EST
CXADR	Viral receptor
BTG3	Cell cycle control
YG81	cDNA
orf39	EST
Pred12	Mannose receptor
PRSS7	Enterokinase
orf40	EST
NCAM2	Neural adhesion
Pred15	Exon
Pred16	EST
orf53	EST
orf42	EST
Pred21	EST
Pred22	tRNA synthetase
orf43	Junction adhesion
ATP5A	ATPase factor
GABPA	Transcription factor

(b) Genes from APP to SOD1
Gene	Classification
APP
Pred24
ADAMTS1	Metalloproteinase
ADAMTS5	Metalloproteinase
Pred25	Exon
Pred26	Exon
orf23	EST
Pred27	Exon
Pred28	Methyltransferase
ZNF294	Zinc finger
orf6	EST
USP16	Ubiquitin protease
CCT8	T-complex subunit
orf7	Exon
BACH1	Transcription factor
orf12	EST
orf8	EST
GRIK1	Glutamate receptor
orf41	EST
orf9	EST
CLDN17	Claudin receptor
CLDN8	Claudin receptor
Pred29	Exon
Pred30	Exon
TIAM1	Lymphoma metastasis
Pred31	Exon
SOD1

(a) Mouse homologs possibly disomic in both Ts65Dn and Ts1Cje. (b) Mouse homologs expected to be trisomic in Ts65Dn but disomic in Ts1Cje.

Segmental trisomies for the regions of chromosome 21 homologous with mouse chromosomes 17 and 10 do not exist. If Mx is the most telomeric gene on mouse chromosome 16 and Pdxk is the most centromeric on mouse chromosome 10, there are 33 genes within the approximately 2.2 Mb mouse chromosome 17 region (Table 4) and 50 genes within the approximately 2.9 Mb mouse chromosome 10 region. Including the (at most) 32 genes proximal to APP that may not be trisomic in Ts65Dn, up to half of the chromosome 21 genes have mouse homologs that are not trisomic in this model. The phenotypic consequences of these genes must be assessed in some way, because Ts65Dn lacks some features of DS. Constructing single-gene transgenic mice for each of these genes and then combining each with Ts65Dn by breeding would be laborious and probably of limited success. An alternative is to generate additional segmental trisomies using the Cre-lox system [40].
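The "half" estimate follows from the gene counts given above, taking the 32 genes proximal to APP as an upper bound:

```latex
\underbrace{33}_{\text{mouse chr.\ 17 region}}
+ \underbrace{50}_{\text{mouse chr.\ 10 region}}
+ \underbrace{32}_{\text{proximal to APP (maximum)}}
= 115, \qquad \frac{115}{225} \approx 0.51
```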

Table 4

Human genes with homologues mapping within the 2.2 Mb maximum mouse chromosome 17 homologous region

Gene	Functional class	Chromosome 21 ORFs	Exon model
		orf20
		orf21
		orf22
ANKRD3	Ankyrin kinase
ZNF298	Zinc finger	orf25
ZNF295	Zinc finger
UMODL1	Uromodulin		Pred46
ABCG1*	ATP-binding cassette transporter
TFF3*	Intestinal trefoil
TFF2	Spasmolytic peptide
TFF1	Estrogen-induced
TMPRSS3	Membrane serine protease
UBASH3A	SH3 domain
TSGA2*	Testis-specific
SLC37A1	Glycerol 3-phosphate permease
PDE9A	Cyclic phosphodiesterase
WDR4	WD repeats
NDUFV3	NADH-ubiquinone oxidoreductase subunit
PKNOX1*	Homeobox
CBS*	Cystathionine β synthetase
U2AF1	Splicing factor
CRYA*	Alpha-crystallin
HSF2BP	Heat shock transcription factor binding
			Pred47
			Pred48
SNF1LK*	KID2 kinase
			Pred49
			Pred50
			Pred51
H2BFS	Histone
KIAA0179

Genes are listed in order from centromere to telomere on chromosome 21. * Genes verified as mapping to mouse chromosome 17.

From genes to functions

Analysis of the complete sequence of chromosome 21 has provided the first look at all candidate DS genes. The next steps require verifying and refining the predicted, incomplete gene models, defining new models as necessary, and isolating complete cDNAs for each gene. With complete coding sequences, protein sequences can be examined for motifs, domains, and biochemical characteristics that may suggest function. The most challenging problem will then be determining the functions of these genes and of the other 'known' genes. While it is tempting to focus on genes whose protein characteristics suggest a hypothesis for relevance to some aspect of DS, the more than 100 genes distributed throughout the chromosome that have no functional association are too large a set to ignore. For these and other genes on 21q, detailed expression analysis may be informative. Demonstrating by northern blot or RT-PCR analysis that a gene shows increased expression in the trisomic state, followed by RNA in situ hybridization on tissue sections to define the specific cell types, brain regions and developmental stages of expression, may help in selecting genes of greater or lesser interest. The most direct assessment of function will require mutation or overexpression of individual genes or sets of genes. For these experiments, the 'complete' protein databases for S. cerevisiae, C. elegans and Drosophila will provide homologous genes that can be analyzed in more tractable systems. The growing zebrafish EST database will make the zebrafish an increasingly useful model organism as well. Issues remain with all model organisms, however: verifying correct gene structures, distinguishing orthologous from merely homologous genes, and interpreting mutation and knockout data in one system versus overexpression data in another. The ultimate model organism, of course, will remain the mouse.
Multiple genes can be 'added' to Ts65Dn using transgenic mice carrying bacterial artificial chromosomes (BACs), to look for enhanced DS-relevant phenotypes. The human sequence will be useful here in ensuring that clones are extensive enough to contain the appropriate regulatory regions. Genes can also be 'subtracted' from the Ts65Dn model by crossing in single-gene knockouts, to search for amelioration of the phenotype. With good biological intuition and luck, it may not be necessary to understand all of the genes on chromosome 21 before promising candidates are identified and the design of potential therapeutics can begin.
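The expression test proposed in this section, showing increased expression in the trisomic state, has a simple quantitative baseline: three gene copies versus two predicts a 1.5-fold increase. As a minimal sketch, assuming quantitative real-time RT-PCR analyzed with the comparative 2^-ΔΔCt (Livak) method, which the review does not specify, and assuming roughly 100% amplification efficiency for both the target and the reference assay, the following computes the trisomic/control ratio and the cycle shift that a dosage effect implies. The function names are hypothetical.

```python
import math


def ddct_fold_change(ct_target_tri: float, ct_ref_tri: float,
                     ct_target_ctl: float, ct_ref_ctl: float) -> float:
    """Trisomic/control expression ratio by the comparative 2^-ddCt method.

    Assumes ~100% amplification efficiency for both the target and the
    reference (housekeeping) assay."""
    dct_tri = ct_target_tri - ct_ref_tri  # normalize target to reference, trisomic sample
    dct_ctl = ct_target_ctl - ct_ref_ctl  # same normalization, euploid control sample
    return 2.0 ** -(dct_tri - dct_ctl)


def ddct_for_fold(fold: float) -> float:
    """ddCt value (in cycles) corresponding to a given fold change."""
    return -math.log2(fold)


EXPECTED_DOSAGE_RATIO = 3 / 2  # three gene copies versus two

if __name__ == "__main__":
    shift = ddct_for_fold(EXPECTED_DOSAGE_RATIO)
    print(f"A 1.5-fold dosage effect corresponds to ddCt = {shift:.3f} cycles")
```

A 1.5-fold increase corresponds to a shift of only about 0.58 cycles, so replicate variability and efficiency differences between assays need to be well below that before a simple dosage effect can be called, and a larger ratio would point to regulation beyond gene dosage.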

References (33 in total)

1. Alterations of central noradrenergic transmission in Ts65Dn mouse, a model for Down syndrome.

Authors: M Dierssen; I F Vallina; C Baamonde; S García-Calatayud; M A Lumbreras; J Flórez
Journal: Brain Res Date: 1997-02-28 Impact factor: 3.252

2. Clonability and gene distribution on human chromosome 21: reflections of junk DNA content?

Authors: K Gardiner
Journal: Gene Date: 1997-12-31 Impact factor: 3.688

3. Ts1Cje, a partial trisomy 16 mouse model for Down syndrome, exhibits learning and behavioral abnormalities.

Authors: H Sago; E J Carlson; D J Smith; J Kilbridge; E M Rubin; W C Mobley; C J Epstein; T T Huang
Journal: Proc Natl Acad Sci U S A Date: 1998-05-26 Impact factor: 11.205

4. Normalization and subtraction: two approaches to facilitate gene discovery.

Authors: M F Bonaldo; G Lennon; M B Soares
Journal: Genome Res Date: 1996-09 Impact factor: 9.043

5. Overexpression of a Shaker-type potassium channel in mammalian central nervous system dysregulates native potassium channel gene expression.

Authors: M L Sutherland; S H Williams; R Abedi; P A Overbeek; P J Pfaffinger; J L Noebels
Journal: Proc Natl Acad Sci U S A Date: 1999-03-02 Impact factor: 11.205

6. Developmental abnormalities and age-related neurodegeneration in a mouse model of Down syndrome.

Authors: D M Holtzman; D Santucci; J Kilbridge; J Chua-Couzens; D J Fontana; S E Daniels; R M Johnson; K Chen; Y Sun; E Carlson; E Alleva; C J Epstein; W C Mobley
Journal: Proc Natl Acad Sci U S A Date: 1996-11-12 Impact factor: 11.205

7. DSCAM: a novel member of the immunoglobulin superfamily maps in a Down syndrome region and is involved in the development of the nervous system.

Authors: K Yamakawa; Y K Huot; M A Haendelt; R Hubert; X N Chen; G E Lyons; J R Korenberg
Journal: Hum Mol Genet Date: 1998-02 Impact factor: 6.150

8. Hippocampal volume and neuronal number in Ts65Dn mice: a murine model of Down syndrome.

Authors: A M Insausti; M Megías; D Crespo; L M Cruz-Orive; M Dierssen; I F Vallina; R Insausti; J Flórez; T F Vallina
Journal: Neurosci Lett Date: 1998-09-11 Impact factor: 3.046

9. Two isoforms of a human intersectin (ITSN) protein are produced by brain-specific alternative splicing in a stop codon.

Authors: M Guipponi; H S Scott; H Chen; A Schebesta; C Rossier; S E Antonarakis
Journal: Genomics Date: 1998-11-01 Impact factor: 5.736

Review 10. Genome sequence of the nematode C. elegans: a platform for investigating biology.

Authors:
Journal: Science Date: 1998-12-11 Impact factor: 47.728
