Brian Jackson, Chad Brocker, David C Thompson, William Black, Konstandinos Vasiliou, Daniel W Nebert, Vasilis Vasiliou.
Abstract
Members of the aldehyde dehydrogenase gene (ALDH) superfamily play an important role in the enzymic detoxification of endogenous and exogenous aldehydes and in the formation of molecules that are important in cellular processes, such as retinoic acid, betaine and gamma-aminobutyric acid. ALDHs exhibit additional, non-enzymic functions, including the capacity to bind to some hormones and other small molecules and to diminish the effects of ultraviolet irradiation in the cornea. Mutations in ALDH genes leading to defective aldehyde metabolism are the molecular basis of several diseases, including gamma-hydroxybutyric aciduria, pyridoxine-dependent seizures, Sjögren-Larsson syndrome and type II hyperprolinaemia. Interestingly, several ALDH enzymes appear to be markers for normal and cancer stem cells. The superfamily is evolutionarily ancient and is represented within Archaea, Eubacteria and Eukarya taxa. Recent improvements in DNA and protein sequencing have led to the identification of many new ALDH family members. To date, the human genome contains 19 known ALDH genes, as well as many pseudogenes. Whole-genome sequencing allows for comparison of the entire complement of ALDH family members among organisms. This paper provides an update of ALDH genes in several recently sequenced vertebrates and aims to clarify the associated records found in the National Center for Biotechnology Information (NCBI) gene database. It also highlights where and when likely gene-duplication and gene-loss events have occurred. This information should be useful to future studies that might wish to compare the role of ALDH members among species and how the gene superfamily as a whole has changed throughout evolution.
Year: 2011 PMID: 21712190 PMCID: PMC3392178 DOI: 10.1186/1479-7364-5-4-283
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
List of all species examined in the current study, including the Latin name and common name and the number of unique ALDH genes found in each species
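| Latin name | Common name | Number of unique ALDH genes |
|---|---|---|
| Homo sapiens | Human | 19 |
| Pan troglodytes | Common chimpanzee | 18 |
| Callithrix jacchus | Common marmoset | 16 |
| Pongo abelii | Sumatran orangutan | 18 |
| Macaca mulatta | Rhesus macaque | 20 |
| Bos taurus | Cow | 20 |
| Rattus norvegicus | Norway rat | 21 |
| Mus musculus | House mouse | 21 |
| Taeniopygia guttata | Zebra finch | 15 |
| Gallus gallus | Chicken | 14 |
| Danio rerio | Zebrafish | 25 |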
The data reflect the number of gene records found in the NCBI Gene Entrez database for each species, as of 13th March 2011
ALDH genes and duplicated genes across species with respective chromosome (Chr) locations
| Gene (by homology) | Primates: Human | Primates: Orangutan | Cow | Rodents: Rat | Rodents: Mouse | Birds: Zebra finch | Birds: Chicken | Fish: Zebrafish* |
|---|---|---|---|---|---|---|---|---|
| ALDH1A1 | 9q21.13 (216) | 9 (100174688) | 8 (281615) | 1q51 (24188) | 19 12.0 cM (11668) | Z (100223406) | Z (395264) | |
| ALDH1A2 | 15q21.3 (8854) | 15 (100171834) | 10 (535075) | 8q24 (116676) | 9 42.0 cM (19378) | 10 (751771) | 10 (395884) | 7 (116713) |
| ALDH1A3 | 15q26.3 (220) | 15 (100452276) | 21 (507093) | 1q22 (266603) | 7 (56847) | 10 (100231202) | 10 (395389) | 7 (751785) |
| ALDH1A7 | | | | 1q51 (29651) | 19 20.0 cM (26358) | | | |
| ALDH1B1 | 9q11.1 (219) | 9 (100174654) | 8 (281618) | 5q22 (298079) | 4 B2 (72535) | | | |
| ALDH1L1 | 3q21.3 (10840) | 3 (100172380) | 3 (505677) | 4 (64392) | 6 (107747) | | | 6 (798292) |
| ALDH1L2 | 12q23.3 (160428) | 12 (100459691) | 5 (516864) | 7q13 (299699) | 10 (216188) | 1A (100230131) | 1 (418078) | 4 (100333269) |
| ALDH2 | 12q24.2 (217) | 12 (100171596) | 17 (508629) | 12q16 (29651) | 5 F-G1 (11669) | 15 (100217978) | 15 (416880) | 5 (393462) |
| ALDH3A1 | 17p11.2 (218) | 17 (100446485) | 19 (281617) | 10q22 (25375) | 11 34.25 cM (11670) | | | |
| ALDH3A2 | 17p11.2 (224) | 17 (100171557) | 19 (513967) | 10q22 (65183) | 11 34.3 cM (11671) | 19 (100230924) | 19 (417615) | 15 (323653) |
| ALDH3B1 | 11q13 (221) | 11 (100450634) | 29 (511469) | 1q42 (309147) | 19 (67689) | 5 (100232483) | 5 (428813) | 5 (557008) |
| ALDH3B2 | 11q13 (222) | | | 1q42 (688800) | 19 (621603) | | | |
| | | | 3 (282559) | | | | | |
| ALDH4A1 | 1p36 (8659) | 1 (10072770) | 2 (100126042) | 5q36 (641316) | 4 66.1 cM (212647) | 21 (100228902) | 21 (419467) | 11 (394133) |
| ALDH5A1 | 6p22 (7915) | 6 (100458767) | 23 (532724) | 17p11 (291133) | 13 A3.1 (214579) | 2 (100222151) | 2 (420818) | 16 (565235) |
| ALDH6A1 | 14q24.3 (4329) | 14 (100171652) | 10 (327692) | 6q31 (81708) | 12 39.0 cM (104776) | 5 (100226750) | 5 (423345) | 17 (436647) |
| ALDH7A1 | 5q31 (501) | 5 (100461726) | 7 (507477) | 18q12.1 (291450) | 18 29.0 cM (110695) | Z (100223716) | Z (426812) | 10 (334197) |
| ALDH8A1 | 6q23.2 (64577) | 6 (100450228) | 9 (513537) | 1p12 (685750) | 10 (237320) | 3 (100222753) | 3 (421695) | 23 (447801) |
| ALDH9A1 | 1q23.1 (223) | 1 (100173126) | 3 (537539) | 13q24 (64040) | 1 H2 (56752) | 8 (100225645) | 8 (424405) | 8 (100005587) |
| ALDH16A1 | 19q13.33 (126133) | 19 (100434496) | 18 (506329) | 1q22 (361571) | 7 (69748) | | | 3 (492710) |
| ALDH18A1 | 10q24.3 (5832) | 10 (100173488) | 26 (514759) | 1q54 (361755) | 19 (56454) | | 6 (423976) | 12 (557186) |
Numbers in parentheses indicate NCBI Entrez gene ID (GI). Records in bold text denote duplications compared with the human genome. Z, the sex Chr in birds (ZW system); cM, centiMorgans. Letter designations in mouse gene locations indicate chromosomal regions
*Zebrafish genes are named in accordance with nomenclature guidelines described at (http://www.zfin.org) and established by Mullins et al. [15].
List of the Entrez Gene genes ID (GI), chromosome location, presence of introns, gene type and recommended gene name of all ALDH genes in this study that show evidence of gene duplication, compared with that in the human genome
| Gene (by homology) | Species | NCBI Gene ID | NCBI Gene name | Chromosome | RefSeq ID | Range start | Range end | Introns | Gene type | Recommended gene name |
|---|---|---|---|---|---|---|---|---|---|---|
| ALDH1A3 | Cow | 507093 | | 21 | NC_007319.4 | 4,261,104 | 4,301,275 | yes | Parent gene | |
| | | 534200 | | 28 | NC_007329.4 | 11,750,749 | 11,762,637 | yes | Pseudogene (detritus) | |
| ALDH2 | Zebrafish* | 393462 | | 5 | NC_007116.4 | 71,734,127 | 71,754,941 | yes | Parent gene | |
| | | 368239 | | 5 | NC_007116.4 | 71,708,861 | 71,732,452 | yes | New gene | |
| | | 100332355 | | 5 | NC_007116.4 | 71,632,543 | 71,658,511 | yes | New gene | |
| ALDH3A2 | Zebrafish* | 323653 | | 15 | NC_007126.4 | 21,001,391 | 21,009,951 | yes | Parent gene | |
| | | 100000026 | | 15 | NC_007126.4 | 20,970,670 | 20,976,922 | yes | New gene | |
| | | 100329417 | | 21 | NC_007132.4 | 40,585,351 | 40,617,892 | yes | New gene | |
| | | 447920 | | 21 | NC_007132.4 | 40,905,693 | 40,917,184 | yes | Pseudogene (detritus) | |
| ALDH3A2 | Zebra finch | 100230924 | | 19 | NC_011483.1 | 8,354,898 | 8,361,968 | yes | Parent gene | |
| | | 100226132 | | 19 | NC_011483.1 | 8,364,080 | 8,368,708 | yes | New gene | |
| ALDH3B1 | Cow | 511469 | | 29 | NC_007330.4 | 47,708,146 | 47,722,523 | yes | Parent gene | |
| | | 508879 | | 29 | NC_007330.4 | 47,568,715 | 47,575,449 | yes | New gene | |
| ALDH3B1 | Zebra finch | 100232483 | | 5 | NC_011469.1 | 7,960,933 | 7,967,624 | yes | Parent gene | |
| | | 100229547 | | 5 | NC_011469.1 | 7,968,165 | 7,973,465 | yes | New gene | |
| ALDH3B2 | Rat | 688800 | | 1 | NC_005100.2 | 206,549,529 | 206,553,424 | yes | Parent gene | |
| | | 688778 | | 1 | NC_005100.2 | 206,500,430 | 206,510,746 | yes | New gene | |
| ALDH3B2 | Mouse | 621603 | | 19 | NC_000085.5 | 3,972,328 | 3,981,665 | yes | Parent gene | |
| | | 73458 | | 19 | NC_000085.5 | 3,958,808 | 3,969,947 | yes | New gene | |
| ALDH5A1 | Zebrafish* | 565235 | | 16 | NC_007127.4 | 35,584,243 | 35,592,745 | yes | Parent gene | |
| | | 100330723 | | 16 | NC_007127.4 | 35,723,717 | 35,735,263 | yes | New gene | |
| ALDH7A1 | Macaque | 702749 | | 6 | NC_007863.1 | 122,937,640 | 122,989,782 | yes | Parent gene | |
| | | 716090 | | 14 | NC_007871.1 | 68,342,919 | 68,344,780 | no | Pseudogene (RT event) | |
| ALDH9A1 | Zebrafish* | 100005587 | | 8 | NC_007119.4 | 21,476,877 | 21,484,987 | yes | Parent gene | |
| | | 399481 | | 2 | NC_007113.4 | 4,838,438 | 4,863,128 | yes | New gene | |
| | | 100006238 | | 8 | NC_007119.4 | 21,464,110 | 21,473,710 | yes | New gene | |
| ALDH18A1 | Zebrafish* | 557186 | | 12 | NC_007123.4 | 29,670,615 | 29,686,508 | yes | Parent gene | |
| | | 100332705 | | 12 | NC_007123.4 | 29,643,982 | 29,661,436 | yes | New gene | |
*Zebrafish genes are named in accordance with nomenclature guidelines described at (http://www.zfin.org) and established by Mullins et al. [15].
RT, reverse transcription
Tabulation of all ALDH genes in this study that show evidence of gene duplication, compared with that in the human genome
| Species | Recommended gene name | RefSeq Protein ID | Protein length | Ka/Ks | Aligned sequences | % AA unaligned | % AA identity (unaligned included) | % AA identity (unaligned excluded) | Functional protein | Recommended protein name |
|---|---|---|---|---|---|---|---|---|---|---|
| Cow | (a) | XP_583647.3 | 537 | 0.234 | - | - | - | - | Yes | ALDH1A3 |
| | (b) | XP_001789867.1 | 127 | 0.260 | (a)/(b) | 76.4 | 23.6 | 100 | No | Pseudogene |
| Zebrafish | (a) | NP_956784.1 | 516 | 0.278 | - | - | - | - | Yes | Aldh2.1 |
| | (b) | NP_998466.2 | 516 | 0.112 | (a)/(b) | 0 | 95.2 | 95.2 | Yes | Aldh2.2 |
| | (c) | XP_002662252.1 | 516 | 0.041 | (a)/(c) | 0 | 95.2 | 95.2 | Yes | Aldh2.3 |
| | | | | | (b)/(c) | 0 | 99.6 | 99.6 | - | - |
| Zebrafish | (a) | NP_997814.1 | 488 | 0.175 | - | - | - | - | Yes | Aldh3a2.1 |
| | (b) | XP_001335979.2 | 489 | 0.402 | (a)/(b) | 1.8 | 63.1 | 64.9 | Yes | Aldh3a2.2 |
| | (c) | XP_002666107.1 | 514 | 0.175 | (a)/(c) | 5.1 | 65.8 | 70.9 | Yes | Aldh3a2.3 |
| | (d) | NP_001004658.1 | 169 | 0.190 | (a)/(d) | 65.6 | 23.5 | 89.1 | No | Pseudogene |
| | | | | | (b)/(c) | 7.5 | 57.4 | 64.9 | - | - |
| | | | | | (b)/(d) | 66 | 18.9 | 84.9 | - | - |
| | | | | | (c)/(d) | 67.1 | 31.3 | 98.4 | - | - |
| Zebra finch | (a) | XP_002198810.1 | 510 | 0.396 | - | - | - | - | Yes | ALDH3A2 |
| | (b) | XP_002196134.1 | 526 | 0.625 | (a)/(b) | 5.6 | 84.1 | 89.7 | Yes | ALDH3A3 |
| Cow | (a) | NP_001068986.1 | 486 | 0.335 | - | - | - | - | Yes | ALDH3B1 |
| | (b) | XP_585724.2 | 486 | 0.550 | (a)/(b) | 4.5 | 80.9 | 85.4 | Yes | ALDH3B4 |
| Zebra finch | (a) | XP_002196917.1 | 450 | 0.308 | - | - | - | - | Yes | ALDH3B1 |
| | (b) | XP_002196928.1 | 341 | 0.434 | (a)/(b) | 39.9 | 53.2 | 93.1 | Yes | ALDH3B5 |
| Rat | (a) | XP_001068348.2 | 483 | 0.436 | - | - | - | - | Yes | ALDH3B2 |
| | (b) | XP_001068253.1 | 530 | 0.239 | (a)/(b) | 11 | 76.9 | 87.9 | Yes | ALDH3B3 |
| Mouse | (a) | NP_001170909.1 | 479 | 0.270 | - | - | - | - | Yes | ALDH3B2 |
| | (b) | XP_900106.1 | 479 | 0.229 | (a)/(b) | 0 | 86.4 | 86.4 | Yes | ALDH3B3 |
| Zebrafish | (a) | NP_001103938.1 | 404 | < 0.001 | - | - | - | - | Yes | Aldh5a1.1 |
| | (b) | XP_002664997.1 | 514 | 0.008 | (a)/(b) | 21.4 | 78.6 | 100 | Yes | Aldh5a1.2 |
| Macaque | (a) | XP_002804539.1 | 502 | 0.180 | - | - | - | - | Yes | ALDH7A1 |
| | (b) | XP_001111963.1 | 538 | 1.289 | (a)/(b) | 16.3 | 82.3 | 98.6 | No | Pseudogene |
| Zebrafish | (a) | NP_958879.1 | 508 | 0.126 | - | - | - | - | Yes | Aldh9a1.1 |
| | (b) | NP_958916.1 | 518 | 0.154 | (a)/(b) | 1.9 | 71.2 | 73.1 | Yes | Aldh9a1.2 |
| | (c) | NP_001119952.1 | 508 | 0.190 | (a)/(c) | 0 | 94.9 | 94.9 | Yes | Aldh9a1.3 |
| | | | | | (b)/(c) | 1.9 | 70.3 | 72.2 | - | - |
| Zebrafish | (a) | NP_001077015.1 | 782 | 0.103 | - | - | - | - | Yes | Aldh18a1.1 |
| | (b) | XP_002664020.1 | 782 | 0.826 | (a)/(b) | 0 | 100 | 100 | Yes | Aldh18a1.2 |
Included are protein lengths (in number of amino acids [AAs]), Ka/Ks values, RefSeq protein IDs and recommended protein names. '% AA identity' denotes the absolute number of identical AAs relative to the absolute number of AA locations. '% AA unaligned' indicates the percentage of AAs that are represented by either a gap in the alignment of either sequence or an overhang if one sequence is longer than the other. '% AA identity (unaligned excluded)' indicates the percentage of AA locations that are identical when unaligned AAs are excluded from the total number of AA locations. For example, a 127-AA fragment of a 537-AA protein, which is identical except for the truncation, would have 127/537 = 23.6 per cent identity, of which 410/537 = 76.4 per cent is represented by unaligned residues (AAs in the longer sequence that have no correlation with the shorter sequence) but, excluding those residues, 127/127 = 100 per cent paired AAs are identical. The final column indicates which sequences are being compared for percentage identity, percentage gaps and percentage identity (excluding gaps)
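The three percentages defined in this legend can be illustrated with a short Python sketch. It is not the authors' method: the function name `identity_stats`, the gap character `-` and the placeholder sequence are assumptions made for illustration, and the input is assumed to be a pairwise alignment already padded to equal length.

```python
def identity_stats(aligned_a: str, aligned_b: str, gap: str = "-") -> dict:
    """Return the three percentages described in the table legend.

    Both inputs are rows of one pairwise alignment: equal length, with `gap`
    marking positions where a sequence has no counterpart (an internal gap
    or an overhang). The function name and gap symbol are hypothetical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = len(aligned_a)
    if total == 0:
        raise ValueError("alignment is empty")

    pairs = list(zip(aligned_a, aligned_b))
    # Positions where either sequence shows a gap count as unaligned.
    unaligned = sum(1 for a, b in pairs if a == gap or b == gap)
    # Identical positions are paired residues that match.
    identical = sum(1 for a, b in pairs if a == b and a != gap)
    paired = total - unaligned

    return {
        "pct_unaligned": 100 * unaligned / total,
        "pct_identity": 100 * identical / total,
        "pct_identity_unaligned_excluded": (
            100 * identical / paired if paired else 0.0
        ),
    }


# Legend example: a 127-AA fragment of a 537-AA protein, identical apart
# from the truncation. The sequence is an arbitrary placeholder.
full = ("MKTAYIAKQR" * 54)[:537]
fragment = full[:127] + "-" * (537 - 127)
stats = identity_stats(full, fragment)
print({name: round(value, 1) for name, value in stats.items()})
```

For this input the sketch gives 410/537 unaligned, 127/537 identical overall and 127/127 identical among paired residues, which round to the 76.4, 23.6 and 100 per cent quoted in the legend.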
Figure 1. Neighbour-joining dendrogram (with branch lengths representing relative protein sequence similarity) of ALDH3B sequences in human, rat and mouse, indicating the likely homology and identity of the genes assigned '.
Known copy number variations in humans
| Variation ID | Type | Gain/loss | Site | Sample size (variant/controls) | Chr | |
|---|---|---|---|---|---|---|
| 26310 | InDel | Gain | Intron | 1/1 | 19q13.33 | |
| 26311 | InDel | Gain | Intron | 1/1 | 19q13.33 | |
| 26312 | InDel | Gain | Intron | 1/1 | 19q13.33 | |
| 26313 | InDel | Loss | Intron | 1/1 | 19q13.33 | |
| 109892 | InDel | Gain | Intron | 1/1 | 9q21.13 | |
| 102109 | CNV | Loss | Intron | 1/1 | 15q22.1 | |
| 25534 | InDel | Loss | Intron | 1/1 | 15q22.1 | |
| 40101 | InDel | Loss | Intron | 1/1 | 15q22.1 | |
| 41386 | InDel | Loss | Intron | 1/1 | 15q22.1 | |
| 45349 | InDel | Loss | Intron | 1/1 | 15q22.1 | |
| 45350 | InDel | Loss | Intron | 1/1 | 15q22.1 | |
| 102186 | CNV | Loss | Intron | 1/1 | 15q26.3 | |
| 11819 | InDel | Loss | Intron | 1/36 | 15q26.3 | |
| 25599 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 25600 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 25601 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 40124 | InDel | Loss | Intron | 1/2 | 15q26.3 | |
| 42429 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 42898 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 45395 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 61482 | InDel | Loss | Intron | 1/1 | 15q26.3 | |
| 68446 | InDel | Loss | Intron | 1/39 | 3q21.2 | |
| 106822 | CNV | Gain | Intron | 1/1 | 12q23.3 | |
| 42760 | InDel | Loss | Intron | 1/1 | 17p11.2 | |
| 24787 | InDel | Loss | Intron | 1/1 | 11q13.2 | |
| 44926 | InDel | Loss | Intron | 1/1 | 11q13.2 | |
| 81276 | InDel | Gain | Intron | 1/90 | 6p22.2 | |
| 93550 | CNV | Loss | Intron | 2/90 | 6p22.2 | |
| 99466 | CNV | Loss | Intron | 1/1 | 6p22.2 | |
| 33982 | InDel | Gain | Intron | 1/1 | 5q23.2 | |
| 97538 | InDel | Gain | Intron | 1/1 | 1q24.1 | |
| 23991 | InDel | Gain | Intron | 1/1 | 1q24.1 | |
| 11004 | InDel | Loss | Intron | 15/50 | 1q24.1 | |
| 35661 | CNV | Gain | Part | 1/1 | 19q13.33 | |
| 114045 | CNV | Gain | Part | 1/30 | 15q26.3 | |
| 72379 | CNV | Loss | Part | 1/39 | 15q26.3 | |
| 4352 | CNV | 2G 1L | Part | 3/95 | 3q21.2 | |
| 59786 | Inv | Inversion | Part | 1/1 | 3q21.2 | |
| 68445 | CNV | Loss | Part | 1/39 | 3q21.2 | |
| 107014 | CNV | Loss | Part | 1/1 | 12q23.3 | |
| 88379 | CNV | Loss | Part | 1/90 | 17p11.2 | |
| 88381 | CNV | Loss | Part | 1/90 | 17p11.2 | |
| 3140 | CNV | Loss | Part | 4/270 | 17p11.2 | |
| 65982 | CNV | Gain | Part | 2/450 | 11q13.2 | |
| 85827 | CNV | Loss | Part | 2/90 | 11q13.2 | |
| 53128 | CNV | Loss | Part | 2/1064 | 11q13.2 | |
| 3055 | CNV | Gain | Part | 1/270 | 14q24.3 | |
| 66668 | CNV | Loss | Part | 2/450 | 14q24.3 | |
| 6793 | CNV | Loss | Part | 2/50 | 1q24.1 | |
| 3856 | CNV | Gain/loss | Whole | 3/270 | 11q13.2 | |
| 113072 | CNV | Gain | Whole | 1/30 | 11q13.2 | |
| 30558 | CNV | Gain | Whole | 1/1 | 11q13.2 | |
| 5275 | CNV | Gain | Whole | 1/272 | 11q13.1-11q13.2 | |
| 5111 | CNV | Loss | Whole | 25/95 | 19q13.33 | |
| 32261 | CNV | Loss | Whole | 18/30 | 19q13.32-19q13.33 | |
| 5110 | CNV | Loss | Whole | 4/95 | 19q13.33 | |
| 2201 | CNV | Loss | Whole | 3/269 | 15q26.3 | |
| 47939 | CNV | Loss | Whole | 6/2906 | 9p13.1 | |
| 30022 | CNV | Loss | Whole | 2/485 | 17p11.2 | |
| 53160 | CNV | Loss | Whole | 2/1064 | 11q13.2 | |
| 2931 | CNV | Loss | Whole | 8/270 | 11q13.2 | |
| 29913 | CNV | Loss | Whole | 1/485 | 11q13.2 | |
| 29914 | CNV | Loss | Whole | 1/485 | 11q13.2 | |
| 47969 | CNV | Loss | Whole | 9/2906 | 6p22.2 |
Included are the variation ID from the Database of Genomic Variants, ALDH family member, type (CNV: copy number variation with changes > 1 kb; InDel: insertions and deletions with changes of 100-999 bp; Inv: inversions, changes that invert the nucleotide sequence), whether the change was a loss or gain, site (intron: change only affects an intronic region; part: change affects one or more exons; whole: change affects the entire gene), sample size and chromosomal location.
Figure 2. Comparison of ALDH4A1 from human and rat. Rat Aldh4a1 is part of the larger fusion gene LRRP Ba1-651 [26]. The exons representing the Aldh4a1 portion of this gene with homology to mouse and human are highlighted.