Timo Tiirikka, Jukka S Moilanen.
Abstract
BACKGROUND: As high-throughput sequencing efforts generate more biological information, scientists from different disciplines are interpreting the polymorphisms that make us unique. In addition, there is an increasing trend among the general public to research their own genealogy, find distant relatives, and learn more about their biological background. Commercial vendors provide analyses of mitochondrial and Y-chromosomal markers for such purposes. Clearly, an easy-to-use, free interface to the existing data on the identified variants would be in the interest of the general public and of professionals less familiar with the field. Here we introduce a novel metadatabase, YDHS, that aims to provide such an interface for Y-chromosomal DNA (Y-DNA) haplogroups and sequence variants.
Year: 2015 PMID: 26061870 PMCID: PMC4477006 DOI: 10.1186/s40169-015-0060-7
Source DB: PubMed Journal: Clin Transl Med ISSN: 2001-1326
Fig. 1 The image contains multiple tracks from the YDHS database. Starting from the outer rim, these are Gene Ontology (GO), Gene, SNP, and, innermost, OrphaNet. In the Gene Ontology track, biological process is shown in red, molecular function in black, and cellular component in green. The height of the GO bar corresponds to the number of ontologies linked to the gene. In the Gene track, the positive strand is colored black and the negative strand red. The third track from the outer rim is the SNP track, where each substitution type has its own color. From top to bottom: T->G is light blue, T->C is blue, T->A is dark blue; G->T is light grey, G->C is grey, G->A is black; C->T is light green, C->G is green, C->A is dark green; A->G is pink, A->C is red, A->T is dark red. In the innermost track, OrphaNet and OMIM, the higher the bar, the more diseases have been reported for that gene or region. The grey plot on top of the SNP track indicates the frequency of SNPs in that region. There are some clearly visible 'hotspots' where many mutations cluster.
Fig. 2 The frequency of mutations that hit gene regions, separated by haplogroup. Data derived from the ISOGG consortium (July 2013)
Fig. 3 The overall distribution of mutations per haplogroup. Data derived from ISOGG in July 2013
Table 1 Names of the genes located on chromosome Y, with the number of mutations in each and the number of associated GO terms
| Ensembl ID | Gene name (HGNC) | No. of ISOGG mutations | Positive selection? | No. of Gene Ontology terms | Gene length (bp) | MIM gene accession(s) | Orphanet IDs |
|---|---|---|---|---|---|---|---|
| ENSG00000012817 | KDM5D | 861 | No | 55 | 41,074 | 426000 | 1646 |
| ENSG00000067048 | DDX3Y | 320 | No | 21 | 16,371 | 400010 | - |
| ENSG00000114374 | USP9Y | 250 | No | 12 | 159,604 | 400005 | 1646 |
| ENSG00000129873 | CDY2B | - | No | 8 | 2,810 | 400018 | - |
| ENSG00000131002 | Cyorf15A / Cyorf15B | - | No | 0 | 38,961 | 400031 | - |
| ENSG00000157828 | RPS4Y2 | 12 | No | 6 | 24,868 | 400030 | - |
| ENSG00000169789 | PRY | - | No | 0 | 24,240 | 400019, 400041 | - |
| ENSG00000169807 | PRY2 | - | No | 0 | 24,251 | 400019, 400041 | - |
| ENSG00000169953 | HSFY2 | - | No | 18 | 42,295 | 400029 | - |
| ENSG00000172288 | CDY1 | - | No | 10 | 2,785 | 400016 | - |
| ENSG00000172468 | HSFY1 | - | No | 24 | 42,292 | 400029 | - |
| ENSG00000182415 | CDY2A | - | No | 8 | 2,810 | 400016, 400018 | - |
| ENSG00000183753 | BPY2 | - | No | 10 | 31,646 | 400013 | - |
| ENSG00000183795 | BPY2B | - | No | 5 | 31,647 | 400013 | - |
| ENSG00000185894 | BPY2C | - | No | 5 | 31,647 | 400013 | - |
| ENSG00000187191 | DAZ3 | - | No | 16 | 50,410 | 400027 | 1646 |
| ENSG00000188120 | DAZ1 | - | No | 23 | 69,739 | 400003 | 1646 |
| ENSG00000198692 | EIF1AY | 108 | No | 7 | 17,429 | 400014 | - |
| ENSG00000205916 | DAZ4 | - | No | 27 | 73,175 | 400003, 400026 | 1646 |
| ENSG00000205944 | DAZ2 | - | No | 48 | 71,909 | 400026 | 1646 |
| ENSG00000234414 | RBMY1A1 | 1 | No | 19 | 37,954 | 400006 | 1646 |
| ENSG00000244646 | XKRY2 | - | No | 0 | 8,417 | - | - |
| ENSG00000250868 | XKRY | - | No | 0 | 8,418 | - | - |
A very high number of associated GO terms suggests that a gene is functionally active and probably part of a signaling network. Genes with many mutations also tend to have many ontologies, with the exception of EIF1AY. Table 1 also lists the length of each gene so that the density of mutations per gene can be assessed. The MIM and OrphaNet codes are included to show how the genes are linked to various medical conditions.
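As a rough illustration of how the gene lengths in Table 1 could be used to compare mutation density, the sketch below divides each reported ISOGG mutation count by the gene length. The normalization to mutations per kilobase and the names `TABLE1`, `mutations_per_kb`, and `density_ranking` are assumptions made for this example and are not taken from the YDHS database. Only genes with a reported mutation count are included.

```python
# Values copied from Table 1; genes without a reported ISOGG count are omitted.
# Layout (an assumption of this sketch): gene -> (ISOGG mutations, GO terms, length in bp)
TABLE1 = {
    "KDM5D": (861, 55, 41074),
    "DDX3Y": (320, 21, 16371),
    "USP9Y": (250, 12, 159604),
    "RPS4Y2": (12, 6, 24868),
    "EIF1AY": (108, 7, 17429),
    "RBMY1A1": (1, 19, 37954),
}


def mutations_per_kb(mutations, length_bp):
    """Return the number of mutations per 1,000 bp of gene length."""
    return mutations / (length_bp / 1000)


def density_ranking(table):
    """Return (gene, mutations per kb) pairs, densest gene first."""
    rows = [
        (gene, mutations_per_kb(mutations, length_bp))
        for gene, (mutations, _go_terms, length_bp) in table.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for gene, density in density_ranking(TABLE1):
        print(f"{gene:8s} {density:6.2f} mutations/kb")
```

Normalizing by length changes the picture given by the raw counts: USP9Y has the third-highest count but, being by far the longest gene in the table, falls below EIF1AY once length is taken into account.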
Fig. 4 A more thorough look at Y-chromosome variant composition for the region between 22 Mb and 24 Mb