Laurens G Wilming, Elizabeth A Hart, Penny C Coggill, Roger Horton, James G R Gilbert, Chris Clee, Matt Jones, Christine Lloyd, Sophie Palmer, Sarah Sims, Siobhan Whitehead, David Wiley, Stephan Beck, Jennifer L Harrow.
Abstract
Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.
Year: 2013 PMID: 23589541 PMCID: PMC3626023 DOI: 10.1093/database/bat011
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
The clone names, versioned accession numbers and variation between the BACs that make up the MHC of gorilla ‘Frank’
| Clone name | ENA accession | Number of SNPs | Number of deletions | Number of insertions | Overlap length (bp) |
|---|---|---|---|---|---|
| CH255-522B23 | CU104671.1 | 0 | 0 | 0 | 2000 |
| CH255-127C23 | CU104653.1 | 0 | 0 | 0 | 2000 |
| CH255-37P17 | CU104661.1 | 0 | 0 | 0 | 2000 |
| CH255-179G12 | CT025620.2 | 0 | 0 | 0 | 2000 |
| CH255-451A14 | CU104667.1 | 0 | 0 | 0 | 2000 |
| CH255-405H12 | CU104665.1 | 0 | 0 | 0 | 2000 |
| CH255-259J17 | CU104658.1 | 0 | 0 | 0 | 2000 |
| CH255-39I5 | CU104664.1 | 0 | 0 | 0 | 2000 |
| CH255-83O18 | CU104675.1 | 0 | 0 | 0 | 2000 |
| CH255-289P22 | CU104659.1 | 0 | 0 | 0 | 2000 |
| CH255-201E9 | CU104656.1 | 0 | 0 | 0 | 2000 |
| CH255-478L19 | CU104669.1 | 0 | 0 | 0 | 2000 |
| CH255-48E14 | CU104670.1 | 0 | 0 | 0 | 65339 |
| CH255-386C2 | CU104662.1 | 0 | 0 | 0 | 2000 |
| CH255-375N4 | CU104660.1 | 0 | 0 | 0 | 37060 |
| CH255-13G2 | CU104654.2 | 8 | 4 | 3 | 24914 |
| CH255-415I16 | CU104666.2 | 98 | 20 | 22 | 103353 |
| CH255-559J12 | CU104673.1 | 0 | 0 | 0 | 2000 |
| CH255-397I3 | CU104663.1 | 0 | 0 | 0 | 2000 |
| CH255-469C9 | CU104668.1 | 0 | 0 | 0 | 2635 |
| CH255-56N15 | CU104674.1 | 4 | 2 | 1 | 32416 |
| CH255-114D6 | CU104652.1 | 0 | 0 | 0 | 2000 |
| CH255-58L21 | CU104676.1 | 0 | 0 | 0 | 2000 |
| CH255-351B13 | CT025711.1 | 0 | 0 | 0 | 29819 |
| CH255-354J20 | CT025621.2 | 0 | 0 | 0 | 2000 |
| CH255-336G22 | CT025558.1 | 0 | 0 | 0 | 2000 |
| CH255-191J6 | CU104655.1 | 0 | 0 | 0 | 2000 |
| CH255-206J13 | CU104657.1 | 101 | 14 | 16 | 59844 |
| CH255-529K7 | CU104672.1 | | | | |
Variation shown is between the clone listed on that row and the next, in the length of the overlap between the two. Clones are listed in the order of contiguous overlap.
SNPs = single nucleotide polymorphisms.
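As an illustration of how the table can be read, the following minimal Python sketch takes the four rows with non-zero variation and expresses their SNP counts as a density per kilobase of overlap. The class name `Overlap`, the list name `VARIABLE_OVERLAPS` and the per-kb normalisation are assumptions made for this example and are not part of the original analysis; the numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    """Variation between a clone and the next clone in the tiling path."""
    clone: str
    snps: int
    deletions: int
    insertions: int
    overlap_bp: int

    @property
    def snps_per_kb(self) -> float:
        # Normalise the SNP count by the overlap length (illustrative metric).
        return 1000 * self.snps / self.overlap_bp


# The four overlaps in the table that show any variation.
VARIABLE_OVERLAPS = [
    Overlap("CH255-13G2", 8, 4, 3, 24914),
    Overlap("CH255-415I16", 98, 20, 22, 103353),
    Overlap("CH255-56N15", 4, 2, 1, 32416),
    Overlap("CH255-206J13", 101, 14, 16, 59844),
]

total_snps = sum(o.snps for o in VARIABLE_OVERLAPS)
total_indels = sum(o.deletions + o.insertions for o in VARIABLE_OVERLAPS)

for o in VARIABLE_OVERLAPS:
    print(f"{o.clone}: {o.snps_per_kb:.2f} SNPs/kb over {o.overlap_bp} bp")
print(f"Total: {total_snps} SNPs, {total_indels} indels")
```

All other overlaps in the table show no differences, so they would contribute zero to both totals.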
Figure 1. Feature map of the gorilla MHC, modified from the VEGA browser (release 50, December 2012). Each locus is labelled with a name, coloured according to type (see legend at bottom) and with indication of orientation (angle bracket before or after the name) and position within the region. The tiling path of the sequenced BACs is shown at the top of each panel (labelled contigs), with clones in alternating dark and light blue and, space permitting, with accession numbers. At the top and bottom of each panel, a size scale is shown. The regions highlighted in Figure 2 are marked with green bars at the bottom of a panel and labelled with the figure section identifier.
Figure 2. Detailed view of regions of the MHC where there is a difference in gene content or type between gorilla, human reference and chimpanzee. Figure is not to scale. Rectangle = gene; oval = pseudogene; grey fill = type difference (pseudogene versus gene); black fill = gene absent/present in at least one species and not another; black and white striped = not direct orthologue; above line = locus on forward strand (in reference to human chromosomes); below line = locus on reverse strand; stacked = genes overlap or are nested. Gene names are given where available and are only shown for gorilla and chimpanzee when different from human; locus names that appear as numbers with leading zeros are loci without approved nomenclature, with the numbers representing the numerical part of VEGA stable gene IDs (to obtain the full ID, the 11-digit number should be prepended with OTTGORG, OTTPANG or OTTHUMG for gorilla, chimpanzee and human, respectively). An italicised locus name between brackets for a pseudogene indicates the parent gene or gene family of that pseudogene. The loci on the chimpanzee contig in panel B3 are annotated by ENSEMBL (release 70, January 2013), with dotted outlined loci indicating manually determined genes not annotated by ENSEMBL. Section labels A, B and C have been added to allow for easier reference to this figure in the text.
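The naming rule in the Figure 2 legend (species prefix plus 11-digit number) can be sketched in a few lines of Python. The function name `full_vega_id`, the dictionary `VEGA_PREFIXES` and the numeric value used in the usage line are hypothetical and chosen only for illustration; only the three prefixes come from the legend.

```python
# Species prefixes as given in the Figure 2 legend.
VEGA_PREFIXES = {
    "gorilla": "OTTGORG",
    "chimpanzee": "OTTPANG",
    "human": "OTTHUMG",
}


def full_vega_id(numeric_part: str, species: str) -> str:
    """Build a full VEGA stable gene ID from the 11-digit locus label."""
    if species not in VEGA_PREFIXES:
        raise ValueError(f"unknown species: {species!r}")
    # The legend specifies an 11-digit number, including leading zeros.
    if not (numeric_part.isascii() and numeric_part.isdigit()
            and len(numeric_part) == 11):
        raise ValueError("expected an 11-digit number with leading zeros")
    return VEGA_PREFIXES[species] + numeric_part


# Hypothetical locus number, not taken from the figure.
print(full_vega_id("00000001234", "gorilla"))
```

Keeping the number as a string rather than an integer preserves the leading zeros that the figure labels carry.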
Figure 3. ENSEMBL browser view (release 70, January 2013) of the RCCX cluster and flanking regions of the genome of ‘Kamilah’ (whole-genome shotgun gorilla sequence) showing assembly gaps (white between the blue contigs) and gene models straddling assembly gaps and merging separate fragmented loci (green arrows). See Figure 1 legend for description of features.