Giuseppe Maccari, James Robinson, Keith Ballingall, Lisbeth A Guethlein, Unni Grimholt, Jim Kaufman, Chak-Sum Ho, Natasja G de Groot, Paul Flicek, Ronald E Bontrop, John A Hammond, Steven G E Marsh.
Abstract
The IPD-MHC Database project (http://www.ebi.ac.uk/ipd/mhc/) collects and expertly curates sequences of the major histocompatibility complex from non-human species and provides the infrastructure and tools to enable accurate analysis. Since the first release of the database in 2003, IPD-MHC has grown and currently hosts a number of specific sections, with more than 7000 alleles from 70 species, including non-human primates, canines, felines, equids, ovids, suids, bovines, salmonids and murids. These sequences are expertly curated and made publicly available through an open access website. The IPD-MHC Database is a key resource in its field, attracting an average of 1500 unique visitors and more than 5000 page views per month. As the database has grown in size and complexity, it has created a number of challenges in maintaining and organizing information, particularly the need to standardize nomenclature and taxonomic classification while incorporating new allele submissions. Here, we describe the latest database release, IPD-MHC 2.0, and discuss planned developments. This release incorporates sequence updates and new tools that enhance database queries and improve the submission procedure by utilizing common tools that are able to handle the varied requirements of each MHC group.
Year: 2016 PMID: 27899604 PMCID: PMC5210539 DOI: 10.1093/nar/gkw1050
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An overview of the current state of the IPD-MHC Database. (A) Number of submitted sequences over the years (blue, original IPD-MHC Database; yellow, IPD-MHC Database v2.0); (B) Species distribution in the IPD-MHC Database v2.0; (C) Distribution of alleles per species in the IPD-MHC Database (blue, class I; yellow, class II); only the species that together account for 95% of all alleles are shown.
Figure 2. Single- and multi-locus alignment. (A) For each computed alignment, a CIGAR (Compact Idiosyncratic Gapped Alignment Report) string defining the sequence of matches/mismatches (M) and deletions or gaps (D) compared to the reference sequence is stored in the database. (B) For each locus in the database, the nucleotide and protein allele alignment is pre-computed and the CIGAR string is stored in the database to correctly represent the sequence alignment. (C) In multi-locus alignment, the consensus sequence of each locus is aligned in real time and the previously calculated single-locus alignments are assembled and rendered as one.
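The CIGAR encoding described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the IPD-MHC implementation: the function name `apply_cigar`, the gap character, and the convention that each M consumes one residue of the stored allele sequence while each D emits one gap column are assumptions made for illustration, since the caption does not specify the exact encoding.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
# Only the two codes named in the figure caption (M and D) are handled.
CIGAR_OP = re.compile(r"(\d+)([MD])")


def apply_cigar(sequence: str, cigar: str, gap: str = "-") -> str:
    """Expand an ungapped sequence into its aligned (gapped) form.

    Assumed convention (hypothetical, for illustration only):
      M - copy the next n residues from the sequence (match or mismatch)
      D - emit n gap characters; no residues are consumed
    """
    if not re.fullmatch(r"(?:\d+[MD])+", cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")

    aligned = []
    pos = 0  # index of the next unconsumed residue
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "M":
            if pos + n > len(sequence):
                raise ValueError("CIGAR consumes more residues than available")
            aligned.append(sequence[pos:pos + n])
            pos += n
        else:  # op == "D"
            aligned.append(gap * n)

    if pos != len(sequence):
        raise ValueError("CIGAR does not consume the whole sequence")
    return "".join(aligned)


if __name__ == "__main__":
    # Three residues, a two-column gap, then three more residues.
    print(apply_cigar("ATGGCC", "3M2D3M"))
```

Storing only the ungapped sequence plus a short CIGAR string, as the caption describes, means the gapped alignment can be regenerated on demand rather than stored in full for every allele.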