Jochen Kohl, Ingo Paulsen, Thomas Laubach, Achim Radtke, Arndt von Haeseler.
Abstract
HvrBase++ is the improved and extended version of HvrBase. Extensions are made by adding more population-based sequence samples from all primates including humans. The current collection comprises 13,873 hypervariable region I (HVRI) sequences and 4940 hypervariable region II (HVRII) sequences. In addition, we included 1376 complete mitochondrial genomes, 205 sequences from X-chromosomal loci and 202 sequences from autosomal chromosomes 1, 8, 11 and 16. In order to reduce the introduction of erroneous data into HvrBase++, we have developed a procedure that monitors GenBank for new versions of the current data in HvrBase++ and automatically updates the collection if necessary. For the stored sequences, supplementary information such as geographic origin, population affiliation and language of the sequence donor can be retrieved. HvrBase++ is Oracle based and easily accessible by a web interface (http://www.hvrbase.org). As a new key feature, HvrBase++ provides an interactive graphical tool to easily access data from dynamically created geographical maps.
Year: 2006 PMID: 16381963 PMCID: PMC1347393 DOI: 10.1093/nar/gkj030
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
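The abstract mentions a procedure that monitors GenBank for new versions of stored records. The paper does not describe how this is implemented, so the following is only a minimal sketch of the comparison step, assuming identifiers follow GenBank's `accession.version` format. The function names and the accession strings (`XX000001` and so on) are hypothetical placeholders, and fetching the current identifiers from GenBank is deliberately left out.

```python
import re
from typing import Iterable, List, Mapping, NamedTuple


class VersionedAccession(NamedTuple):
    accession: str
    version: int


def parse_identifier(identifier: str) -> VersionedAccession:
    """Split an 'accession.version' string into its two parts."""
    match = re.fullmatch(r"([A-Za-z0-9_]+)\.(\d+)", identifier.strip())
    if match is None:
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return VersionedAccession(match.group(1), int(match.group(2)))


def needs_update(stored: str, remote: str) -> bool:
    """Return True if the remote record has a higher version than the stored one."""
    s, r = parse_identifier(stored), parse_identifier(remote)
    if s.accession != r.accession:
        raise ValueError("identifiers refer to different accessions")
    return r.version > s.version


def stale_records(stored: Iterable[str], remote: Mapping[str, str]) -> List[str]:
    """List stored identifiers for which a newer remote version exists.

    `remote` maps a bare accession to its current 'accession.version' string
    (hypothetical input; how it is obtained is outside this sketch).
    Accessions missing from `remote` are skipped.
    """
    stale = []
    for identifier in stored:
        accession = parse_identifier(identifier).accession
        current = remote.get(accession)
        if current is not None and needs_update(identifier, current):
            stale.append(identifier)
    return stale


# Made-up example data, not real GenBank records.
stored_ids = ["XX000001.1", "XX000002.3", "XX000003.1"]
remote_ids = {"XX000001": "XX000001.2", "XX000002": "XX000002.3"}
outdated = stale_records(stored_ids, remote_ids)
```

Comparing the integer version rather than the raw string avoids ordering mistakes such as treating `.10` as older than `.9`.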
Assignment of language names to the SIL and ISO/DIS 639-2 codes in HvrBase++ for the mitochondrial dataset
| SIL | ISO | No. of individuals | Language family or population |
|---|---|---|---|
| Yes | Yes | 7248 | English |
| Yes | No | 41 | Mandenka (population from Senegal, ‘Mandinka’ in SIL) |
| No | Yes | 1951 | Bantu (Africa's largest language family) |
| No | No | 454 | Mbenzele (population from Central African Republic) |
| | | 4611 | Language information missing or not assignable |
| Total | | 14 305 | |
This year, the SIL and ISO/DIS 639-2 codes have converged. We will account for them in the next major release.
Number of sequence categories in humans, great apes and Neanderthalers across all sequence types in HvrBase++
| Sequence type | No. of humans | No. of great apes | No. of Neanderthalers |
|---|---|---|---|
| HVRI | 13 350 | 520 | 3 |
| HVRII | 4925 | 13 | 2 |
| Mitochondrial genomes | 1376 | 0 | 0 |
| Nuclear sequences | 386 | 21 | 0 |
| Total | 20 037 | 554 | 5 |
Human HVRI datasets over six continents
| Continent | Lineages | Human samples | No. of countries | No. of references | No. of populations | No. of languages | No. of SIL codes | No. of ISO codes |
|---|---|---|---|---|---|---|---|---|
| Europe | 2033 | 4358 | 17 | 39 | 25 | 31 | 20 | 16 |
| Africa | 1046 | 1680 | 25 | 22 | 47 | 47 | 22 | 25 |
| North America | 824 | 1581 | 7 | 19 | 34 | 9 | 9 | 8 |
| South America | 267 | 473 | 7 | 10 | 11 | 19 | 7 | 7 |
| Asia | 2867 | 4778 | 23 | 49 | 102 | 67 | 31 | 47 |
| Australia/Oceania | 224 | 473 | 10 | 10 | 12 | 28 | 9 | 16 |
| World | 7036 | 13 343 | 89 | 103 | 220 | 194 | 81 | 118 |
Note that the last row does not depict the arithmetic sum in columns 2, 5–9 as some relevant subsets overlap across continents.
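The note above can be checked against the table: the six continental language counts add up to 201, whereas the World row lists 194, because a language spoken on several continents is counted once per continent but only once worldwide. The sketch below illustrates this with set unions. The continent-to-language assignments are toy data invented for the example, not taken from HvrBase++.

```python
from typing import Dict, Set


def naive_sum(per_continent: Dict[str, Set[str]]) -> int:
    """Add up the per-continent counts, double-counting shared entries."""
    return sum(len(items) for items in per_continent.values())


def world_count(per_continent: Dict[str, Set[str]]) -> int:
    """Count distinct entries across all continents (size of the union)."""
    return len(set().union(*per_continent.values()))


# Toy data: language labels and their assignment are made up.
toy_languages = {
    "Europe": {"eng", "spa", "rus"},
    "North America": {"eng", "spa", "nav"},
    "Asia": {"rus", "cmn"},
}

summed = naive_sum(toy_languages)      # 3 + 3 + 2 = 8
distinct = world_count(toy_languages)  # eng, spa, rus, nav, cmn = 5
```

Columns 3 and 4 (human samples and countries) do sum to the World row, since each sample and each country belongs to exactly one continent.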
Location, length and number of the 407 nuclear sequences in HvrBase++
| No. of sequences | Gene | Gene function | Chromosome | Length in bp |
|---|---|---|---|---|
| 8 | pdh1 | Pyruvate dehydrogenase E1-α subunit gene, partial seq. | X | 1769 |
| 41 | Factor ix | Factor IX gene, intron 4 | X | 3740 |
| 42 | rrm2p4 | Ribonucleotide reductase M2 pseudogene 4, partial seq. | X | 2392 |
| 42 | tnfsf5 | Tumor necrosis factor ligand superfamily 5 gene, partial seq. | X | 5239 |
| 1 | amelx | Amelogenin X chromosome gene, complete seq. | X | 5323 |
| 71 | xq13.3 | Xq13.3 non-coding region | X | 10 178 |
| 56 | mc1r promoter | Melanocortin 1 receptor gene, promoter | 16 | 6599 |
| 1 | mc1r | Melanocortin 1 receptor gene | 16 | 953 |
| 8 | lpl | Lipoprotein lipase gene, partial seq. | 8 | 542–1636 |
| 61 | ch1 | Membrane protein CH1 gene, partial seq. | 1 | 9626 |
| 59 | β-globin | β-globin gene, complete seq. | 11 | 3008 |
| 17 | β-globin repl. init. reg. | β-globin gene, repl. ori. init. reg. and partial seq. | 11 | 1312 |
Figure 1. Accumulation of HVRI, HVRII, mitochondrial genomes and nuclear sequences over the last 25 years.
Figure 2. Geographical map interface in HvrBase++. The upper frame contains elements for searching sequences; the search results are displayed in the map and at the bottom. A country's colour represents the number of sequences for a given gene. The main table shows the results of all available genes for a selected country. Additional information for each gene is displayed in separate tables (data not shown). Sequences are accessible by selecting them from the table.