Yuanyuan Cheng, Catherine Grueber, Carolyn J Hogg, Katherine Belov.
Abstract
The major histocompatibility complex (MHC) plays a critical role in the vertebrate immune system. Accurate MHC typing is essential to understanding not only host fitness and disease susceptibility, but also the mechanisms underlying host-pathogen co-evolution. However, due to the high degree of gene duplication and diversification of MHC genes, it is often technically challenging to accurately characterise MHC genetic diversity in non-model species. Here we conducted a systematic review to identify common issues associated with currently widely used MHC typing approaches. To overcome these challenges, we then developed a long-read based MHC typing method along with a new analysis pipeline. Our approach enables the sequencing of fully phased MHC alleles spanning all key functional domains, the separation of highly similar alleles, and the removal of technical artefacts such as PCR heteroduplexes and chimeras. Using this approach, we performed population-scale MHC typing in the Tasmanian devil (Sarcophilus harrisii), revealing previously undiscovered MHC functional diversity in this endangered species. Our new method provides a better solution for addressing research questions that require high MHC typing accuracy. Since the method is not limited by species or the number of genes analysed, it will be applicable for studying not only the MHC but also other complex gene families.
Keywords: MHC genetic diversity; MHC genotyping; PacBio sequencing; amplicon analysis; major histocompatibility complex
Year: 2021 PMID: 34551192 PMCID: PMC9293008 DOI: 10.1111/1755-0998.13511
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 8.678
FIGURE 1 Analysis pipeline of PacBio‐based MHC typing. (a) Experimental workflow. (b) The complete data analysis workflow, with the numbers of sequences (sum or mean ± standard deviation) obtained in this study shown in the left panel and optional data indicated by dashed lines; the initial two steps (grey) of the analysis are performed on the entire data set, while the remaining steps are carried out on an individual sample basis. (c) Common PCR artefacts: heteroduplex and chimera. (d) Circular consensus sequence (CCS) calling by strand eliminates mosaic CCS reads resulting from heteroduplexes. (e) Schematic diagrams explaining concepts behind the allele calling method.
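The chimera artefact named in panel (c) can be illustrated with a small sketch. A PCR chimera is a sequence whose first part comes from one template and whose remainder comes from another. The Python below is only a toy illustration under strong assumptions: all sequences are pre-aligned and of equal length, exactly two parents are involved, and matches must be exact. It is not the Bellerophon algorithm used in the pipeline, and the function name `find_chimera_parents` is hypothetical.

```python
from itertools import permutations


def find_chimera_parents(candidate, parents):
    """Return (left_parent, right_parent, breakpoint) if `candidate` equals a
    prefix of one parent joined to the suffix of another parent, else None.

    Assumption: all sequences are pre-aligned and have the same length.
    The breakpoint reported is the first one found, not necessarily unique.
    """
    if candidate in parents:
        return None  # identical to a known sequence, so not a chimera
    n = len(candidate)
    for left, right in permutations(parents, 2):
        for k in range(1, n):
            # Prefix must match the left parent, suffix the right parent.
            if candidate[:k] == left[:k] and candidate[k:] == right[k:]:
                return left, right, k
    return None


# Made-up sequences for illustration only.
parents = ["AAAAAAAA", "CCCCCCCC"]
print(find_chimera_parents("AAAACCCC", parents))
print(find_chimera_parents("AAAAGAAA", parents))
```

In this toy setting the first candidate is flagged because it can be assembled from the two parents, while the second carries a substitution found in neither parent and is left alone. Real chimera detection has to tolerate sequencing errors, indels, and unequal read abundances, which this sketch ignores.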
FIGURE 2 Summary of the literature reviewed in terms of (a) species studied, (b) main context of the study, and (c) methodology used for MHC typing. Since some studies used multiple methods, percentages in (c) do not add up to 100%. Abbreviations: SSCP, single‐strand conformation polymorphism; RFLP, restriction fragment length polymorphism; RSCA, reference strand‐mediated conformational analysis; SSP, sequence‐specific primer method; DGGE, denaturing gradient gel electrophoresis; ONT, Oxford Nanopore Technologies; HRM, high‐resolution melting curves.
Comparison of commonly used MHC typing methods
| Method (% of studies) | Simplicity of laboratory work | Throughput | Sequence completeness | Genotyping accuracy | Affected by heteroduplexes | Affected by chimeras | Examples |
|---|---|---|---|---|---|---|---|
| Cloning and Sanger sequencing (43%) | Labour intensive, long process | Low | Amplicon size limited by cloning vectors (the larger the amplicon, the harder the ligation) | High specificity, low sensitivity (allelic dropout is a common issue) | Yes | Yes; sequences usually manually removed by researcher | Otting et al. ( |
| NGS‐based amplicon sequencing (37%) | Easy | High | Low: Usually target a single exon, or multiple exons separately | High sensitivity, though requires additional phase prediction when typing multiple exons | Yes | Yes; sequences removed using bioinformatic software | Biedrzycka et al. ( |
| Banding pattern or fragment size based methods (14%) | Certain methods are laborious | Mid | Compatible with short amplicons | Limited resolution | Yes | Yes | Castro‐Prieto et al. ( |
| Microsatellite (9%) | Easy | High | NA | Limited resolution | NA | NA | Aguilar et al. ( |
| Long read sequencing (PacBio, Nanopore; 3%) | Easy | High | High: Provide full‐length sequences spanning multiple functional domains | High: Provide fully phased sequences; high accuracy (with PacBio CCS) | Yes; addressed in this work (with CCS calling by strand) | Yes; addressed in this work (bellerophon) | Fuselli et al. ( |
Throughput is loosely defined by the number of samples that can be pooled or analysed in each run/gel/batch. High, hundreds; Mid, tens; Low, samples are processed individually in small batches.
NGS‐based sequencing methods include the Illumina, Roche 454, and Ion Torrent platforms.
Banding pattern or fragment size based methods mainly include single‐strand conformation polymorphism (SSCP), restriction fragment length polymorphism (RFLP), reference strand‐mediated conformational analysis (RSCA), sequence‐specific primer (SSP), and denaturing gradient gel electrophoresis (DGGE).
FIGURE 3 Sequence analysis of Tasmanian devil MHC‐I alleles. (a) Supertype identification. (b) Sequence alignment at variable amino acid sites (invariable sites are not shown). (c) Predicted 3D structure of extracellular domains of Tasmanian devil MHC‐I protein, with variable residues shown; residues with evidence of selection are colour‐coded: red, positive selection; blue, negative selection.
FIGURE 4 Comparison of nine Tasmanian devil subpopulations based on MHC‐I types. (a) From left to right, MHC‐I haplotype, allele, and supertype frequencies (only the 20 most abundant haplotypes and alleles are shown), with darker heatmap colours indicating higher frequencies. (b) Map of Tasmania showing locations of sampling sites; the dashed lines indicate the rough boundary between the eastern, north‐western and south‐western populations. (c) From left to right, principal component analyses (PCA) of the nine surveyed subpopulations based on haplotype, allele, and supertype frequencies.
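The per-subpopulation frequencies summarised in Figure 4 can be sketched as a simple tabulation. The Python below is a minimal illustration, not the authors' code. It assumes that an allele's frequency in a subpopulation is the fraction of sampled individuals carrying that allele, which is one common convention for multi-locus MHC typing; the source does not state which definition was used. The function name `allele_frequencies`, the site labels, and the allele labels are all hypothetical.

```python
from collections import Counter, defaultdict


def allele_frequencies(genotypes):
    """Tabulate carrier frequencies per subpopulation.

    genotypes: iterable of (subpopulation, alleles) pairs, one per individual.
    Returns {subpopulation: {allele: fraction of individuals carrying it}}.
    Assumption: frequency means "fraction of individuals carrying the allele".
    """
    carriers = defaultdict(Counter)  # subpopulation -> allele -> carrier count
    sizes = Counter()                # subpopulation -> number of individuals
    for subpop, alleles in genotypes:
        sizes[subpop] += 1
        carriers[subpop].update(set(alleles))  # count each allele once per individual
    return {
        subpop: {allele: n / sizes[subpop] for allele, n in counts.items()}
        for subpop, counts in carriers.items()
    }


# Made-up genotypes for illustration only.
samples = [
    ("site_1", {"allele_a", "allele_b"}),
    ("site_1", {"allele_a"}),
    ("site_2", {"allele_c"}),
]
freqs = allele_frequencies(samples)
print(freqs["site_1"])
```

A table of this shape, with one row per subpopulation and one column per allele, haplotype, or supertype, is the kind of input that a heatmap or a PCA would take.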