Francesco Rubino, Roberta Piredda, Francesco Maria Calabrese, Domenico Simone, Martin Lang, Claudia Calabrese, Vittoria Petruzzella, Mila Tommaseo-Ponzetta, Giuseppe Gasparre, Marcella Attimonelli.
Abstract
HmtDB (http://www.hmtdb.uniba.it:8080/hmdb) is an open resource created to support population genetics and mitochondrial disease studies. The database hosts human mitochondrial genome sequences annotated with population and variability data, the latter being estimated with the SiteVar software, which is based on site-specific nucleotide and amino acid variability calculations. The annotations are manually curated, which adds value to the quality of the information provided to the end-user. Classifier tools implemented in HmtDB allow the prediction of the haplogroup for any human mitochondrial genome currently stored in HmtDB or submitted de novo by an end-user. Haplogroup definition is based on the Phylotree system. End-users accessing HmtDB can therefore (i) browse the database through a multi-criterion 'query' system; (ii) analyze their own human mitochondrial sequences via the 'classify' tool (for complete genomes) or by downloading the 'fragment-classifier' tool (for partial sequences); (iii) download multi-alignments with reference genomes as well as variability data.
Year: 2011 PMID: 22139932 PMCID: PMC3245114 DOI: 10.1093/nar/gkr1086
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The HmtDB database generation workflow. The workflow describes the procedure implemented when updating HmtDB.
Description of the rules implemented for updating the genome multi-alignment (MA)
Figure 2. The HmtDB ‘query’ form. A detailed description of the implemented retrieval criteria is reported in Table 2.
Table 2. List and descriptions of the criteria that may be combined when using the HmtDB retrieval system (Query macro function)
| Query criteria | Description |
|---|---|
| HmtDB Genome Identifier | A pop-up menu selects a genome by its known HmtDB genome identifier. |
| Reference DB ID | A pop-up menu selects a genome by its known INSDC accession number. |
| Subject's geographical origin (continent/country) | A pop-up menu selects genomes whose associated subject belongs to a specific continent/country. |
| Author/Fragment_Classifier Haplogroup user code | A pop-up menu selects genomes matching a specific haplogroup, as assigned in the associated paper or, for partial genomes, as assigned by the fragment-classifier tool. |
| Complete genomes / Only coding region | HmtDB annotates complete genomes or genomes that do not include the D-loop region. |
| SNP position | Position in the rCRS sequence at which the genomes to be selected carry a mutation with respect to rCRS. |
| Variation type | The position search can be restricted to genomes showing only transitions or only transversions at the requested position, or to a specific transition or transversion. It is also possible to search for genomes carrying insertions and/or deletions at given positions. |
| Subject age | Genomes whose subjects had a specific age at sampling time. |
| Subject sex | Genomes from subjects of a specific sex. |
| DNA source | Genomes sequenced from samples extracted from a specific tissue. |
| Individual type | Genomes from healthy or pathologic data sets, or from subjects with a phenotype related to a specific disease. |
| References | Genomes related to a specific paper or to papers published by a specific author, or a genome identified in the paper by a specific haplotype code. |
The Haplogroup user code may not match the haplogroup predicted by the classifier tool, because the latter is assigned according to the latest Phylotree update whereas the former was assigned when the genome was published.
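The 'SNP position' and 'Variation type' criteria can be illustrated with a minimal sketch. It relies only on the standard definition of transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse). The function names, the `(position, reference allele, observed allele)` tuple layout and the genome identifiers `G1` to `G3` are hypothetical and do not reflect HmtDB's actual implementation or data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variation_type(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref == alt or ref not in bases or alt not in bases:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def genomes_with(variants_by_genome, position, kind=None):
    """Return ids of genomes with a substitution at `position`, optionally of one kind."""
    hits = []
    for genome_id, variants in variants_by_genome.items():
        for pos, ref, alt in variants:
            if pos == position and (kind is None or variation_type(ref, alt) == kind):
                hits.append(genome_id)
                break
    return hits


# Toy data (hypothetical): genome id -> list of (position, reference allele, observed allele)
toy = {
    "G1": [(73, "A", "G")],
    "G2": [(73, "A", "C")],
    "G3": [(150, "C", "T")],
}

print(genomes_with(toy, 73))                  # all genomes mutated at position 73
print(genomes_with(toy, 73, "transition"))    # only those with a transition there
```

Combining further criteria (sex, tissue, continent) would amount to adding more predicates to the same filter.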
Figure 3. The genome card associated with the genome AF_BF_0004. The genome card is organized in three parts hosting the following annotations. (Part 1) Data allowing the genome identification (HmtDB identifier, genome accession number and database of origin, the Author/Fragment_classifier Predicted Haplogroup code, the haplotype identifier, the tissue origin of the DNA, the length of the genome, the sequencing technique and the PubMed link to the reference in which the genome was published), followed by the haplogroup prediction as described in the 'classify your genome' section and based on the Phylotree classification (the percentage threshold is fixed at 75); clicking on the haplogroup code opens a new window where the haplogroup-defining sites are displayed together with a Yes or No flag indicating whether the genome harbours the allele that contributes to the haplogroup definition. (Part 2) Data about the individual from whom the DNA was extracted. (Part 3) Variability table: each line in the table refers to a mutation; the colour of the line is associated with the locus type (see the colour legend when connected to HmtDB). The line reports mutation data (nucleotide and, when appropriate, amino acid positions, alleles and locus name), the mtREV index (19) indicating the amino acid change probability estimated on mammalian mitochondria-coded proteins, and the variability index estimated on the two nucleotide data sets and on the corresponding amino acid data sets, as well as on proteins derived from 60 mammalian genomes. Finally, data derived from MITOMAP concerning disease association are reported. Data concerning deletion and insertion events are shown at the bottom of the genome card. Clicking on the variability index opens a new window reporting more detailed data on the site variability in different continents and in pathologic samples.
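The Yes/No flags and the 75% threshold mentioned in the caption can be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how the percentage is computed, so the sketch assumes it is the share of haplogroup-defining sites whose allele is found in the genome, and that the threshold is inclusive. The haplogroup names `hgA` and `hgB`, their positions and all function names are hypothetical and are not taken from Phylotree or HmtDB.

```python
THRESHOLD = 75.0  # percentage threshold quoted in the caption


def haplogroup_support(defining_sites, genome_alleles):
    """Return (percentage of matched sites, per-site 'Yes'/'No' flags)."""
    flags = {}
    for position, allele in defining_sites:
        found = genome_alleles.get(position) == allele
        flags[(position, allele)] = "Yes" if found else "No"
    matched = sum(1 for flag in flags.values() if flag == "Yes")
    pct = 100.0 * matched / len(flags) if flags else 0.0
    return pct, flags


def predict(haplogroups, genome_alleles, threshold=THRESHOLD):
    """List haplogroups whose support reaches the threshold (assumed inclusive), best first."""
    results = []
    for name, sites in haplogroups.items():
        pct, _ = haplogroup_support(sites, genome_alleles)
        if pct >= threshold:
            results.append((name, pct))
    return sorted(results, key=lambda item: -item[1])


# Toy data (hypothetical): haplogroup -> defining (position, allele) pairs
haplogroups = {
    "hgA": [(10, "G"), (20, "T"), (30, "C"), (40, "A")],
    "hgB": [(10, "G"), (50, "T")],
}
genome = {10: "G", 20: "T", 30: "C"}  # position -> observed allele

print(predict(haplogroups, genome))          # hgA matches 3 of 4 sites, hgB 1 of 2
print(haplogroup_support(haplogroups["hgA"], genome)[1])
```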
Figure 4. The Statistics window. For each data set and sub-data set, the number of genomes stored in HmtDB and of their variant sites is summarized.
HmtDB haplotypes
| Continent | Data set | Genomes (n) | Unique haplotypes (n) | Unique haplotypes (%) |
|---|---|---|---|---|
| Africa | Healthy | 826 | 772 | 93.46 |
| Africa | Pathologic | 0 | 0 | 0.00 |
| America | Healthy | 693 | 656 | 94.66 |
| America | Pathologic | 4 | 4 | 100.00 |
| Asia | Healthy | 3185 | 3171 | 99.56 |
| Asia | Pathologic | 501 | 465 | 92.81 |
| Europe | Healthy | 1597 | 1527 | 95.62 |
| Europe | Pathologic | 237 | 230 | 97.05 |
| Oceania | Healthy | 193 | 184 | 95.34 |
| Oceania | Pathologic | 0 | 0 | 0.00 |
| All continents | Healthy | 6494 | 6310 | 97.17 |
| All continents | Pathologic | 742 | 699 | 94.20 |
Number and percentage of unique haplotypes in the healthy and pathologic continent data sets. The ‘All continents’ number of genomes does not match the size of the whole ‘healthy’ data set because of the unclassified_continent genomes.
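The percentage column can be reproduced from the two count columns. The sketch below assumes that "unique haplotypes" means the number of distinct haplotypes in a data set and that a haplotype can be represented as the set of its variants; both the representation and the function names are hypothetical. An empty data set is reported as 0.00, as in the table.

```python
def unique_pct(unique: int, genomes: int) -> float:
    """Percentage of unique haplotypes, rounded to two decimals; 0.0 for an empty set."""
    return round(100.0 * unique / genomes, 2) if genomes else 0.0


def haplotype_stats(haplotypes):
    """Return (genomes, distinct haplotypes, percentage) for a list of haplotypes."""
    genomes = len(haplotypes)
    unique = len(set(haplotypes))
    return genomes, unique, unique_pct(unique, genomes)


# Counts taken from the table above
print(unique_pct(772, 826))    # Africa, healthy
print(unique_pct(6310, 6494))  # All continents, healthy
print(unique_pct(0, 0))        # empty data set

# Toy genomes (hypothetical variants): two of the three share the same haplotype
toy_haplotypes = [
    frozenset({"73G", "263G"}),
    frozenset({"73G", "263G"}),
    frozenset({"150T"}),
]
print(haplotype_stats(toy_haplotypes))
```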