Federica Pierini, Marcel Nutsua, Lisa Böhme, Onur Özer, Joanna Bonczarowska, Julian Susat, Andre Franke, Almut Nebel, Ben Krause-Kyora, Tobias L Lenz.
Abstract
The highly polymorphic human leukocyte antigen (HLA) plays a crucial role in adaptive immunity and is associated with various complex diseases. Accurate analysis of HLA genes using ancient DNA (aDNA) data is crucial for understanding their role in human adaptation to pathogens. Here, we describe the TARGT pipeline for targeted analysis of polymorphic loci from low-coverage shotgun sequence data. The pipeline was successfully applied to medieval aDNA samples and validated using both simulated aDNA and modern empirical sequence data from the 1000 Genomes Project. Thus, the TARGT pipeline enables accurate analysis of HLA polymorphisms in historical (and modern) human populations.
Year: 2020 PMID: 32355290 PMCID: PMC7193575 DOI: 10.1038/s41598-020-64312-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Different steps performed by the TARGT pipeline for HLA genotyping of ancient and modern samples. Preprocessing (optional): after quality control, genomic sequences are pre-processed (adapter clipping, merging and trimming) using ClipAndMerge (version 1.7.3) from the EAGER pipeline[55]. Mapping: performed using Bowtie2[49] against a comprehensive reference file containing known 3rd field HLA alleles (following G-group nomenclature). Sorting: mapped reads are grouped by gene specificity and saved into gene-specific FASTA files. HLA genotyping: sample-specific FASTA files are manually analyzed using BioEdit[50] to genotype HLA genes in ancient and modern samples.
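The sorting step can be illustrated with a minimal Python sketch. It assumes (this is not stated in the source) that the reference sequence names start with the gene name followed by `*`, e.g. `HLA-A*01:01:01G`, and that the Bowtie2 alignments are available as SAM text. The function names `gene_of`, `sort_reads_by_gene` and `to_fasta` are hypothetical and are not part of TARGT.

```python
from collections import defaultdict


def gene_of(reference_name):
    """Return the gene part of an allele-style name: 'HLA-A*01:01:01G' -> 'HLA-A'.

    Assumption: reference names use the '<gene>*<fields>' convention.
    """
    return reference_name.split("*", 1)[0]


def sort_reads_by_gene(sam_lines):
    """Group mapped reads from SAM text lines into {gene: [(read_name, sequence), ...]}."""
    reads_by_gene = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname, seq = fields[0], int(fields[1]), fields[2], fields[9]
        # Flag bit 0x4 marks an unmapped read; '*' means no reference name.
        if flag & 0x4 or rname == "*":
            continue
        reads_by_gene[gene_of(rname)].append((name, seq))
    return dict(reads_by_gene)


def to_fasta(reads):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reads)


# Illustrative SAM records (made up for this example, not real data).
example_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tHLA-A*01:01:01G\t10\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t16\tHLA-B*07:02:01G\t5\t42\t8M\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]

grouped = sort_reads_by_gene(example_sam)
for gene, reads in grouped.items():
    print(gene)
    print(to_fasta(reads), end="")
```

Each value of `grouped` could then be written to its own gene-specific FASTA file for inspection in an alignment editor.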
Figure 2. Performance of HLA target-enrichment experiments. Comparison between the median values of (A) endogenous DNA content, (B) percentage of reads aligning to HLA genes, (C) average HLA coverage and (D) average HLA read depth, calculated across a subset of 62 aDNA samples before and after performing the HLA enrichment experiments. Significant differences between median values, as derived from the Mann-Whitney test, are indicated by a horizontal line and asterisks (***p < 0.001). Box plots show median, interquartile range, min-max whiskers and outliers.
Performance of HLA target-enrichment experiments.
| Metric | Sequence data | Min | Max | Median (95% CI) |
|---|---|---|---|---|
| endogenous [%] | original | 1 | 76 | 14 (8–25) |
| | HLA-enriched | 2 | 88 | 36 (24–42) |
| reads aligning to HLA genes [%] | original | 0.00 | 0.02 | 0.00 (0.00–0.00) |
| | HLA-enriched | 0.00 | 0.39 | 0.12 (0.05–0.17) |
| average HLA coverage [%] | original | 0 | 93 | 7 (5–13) |
| | HLA-enriched | 2 | 100 | 97 (94–99) |
| average HLA read depth (x-fold) | original | 0.00 | 3.98 | 0.14 (0.11–0.22) |
| | HLA-enriched | 0.12 | 472.62 | 18.00 (9.68–68.95) |
Endogenous DNA content (%), percentage of reads aligning to HLA genes (%), average HLA coverage (%) and average HLA read depth (x-fold) compared between pre-capture shotgun sequence data (original) and sequence data after HLA enrichment experiments (HLA-enriched) obtained from a subset of 62 historical samples.
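The source does not spell out how coverage and read depth were computed. As a rough sketch under common definitions (an assumption), coverage is taken here as the percentage of target positions covered by at least one read, and read depth as the mean number of reads per target position. The function name and the interval representation are hypothetical.

```python
def coverage_and_depth(target_length, alignments):
    """Return (coverage in %, mean depth) for one target region.

    alignments: list of (start, end) intervals on the target,
    0-based and half-open (an assumed, illustrative representation).
    """
    depth = [0] * target_length
    for start, end in alignments:
        # Clip each alignment to the target boundaries.
        for pos in range(max(start, 0), min(end, target_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    coverage_pct = 100.0 * covered / target_length
    mean_depth = sum(depth) / target_length
    return coverage_pct, mean_depth


# Toy example: a 10 bp target with two overlapping reads.
cov, dep = coverage_and_depth(10, [(0, 5), (3, 8)])
print(cov, dep)  # 80.0 1.0
```

High coverage with low depth, as in this toy case, means most positions are seen but each only once or twice, which limits how confidently alleles can be distinguished.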
Locus-specific allele call success for the 68 historical samples.
| | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DQB1 | HLA-DPB1 |
|---|---|---|---|---|---|---|
| 1st field | 83 | 75 | 74 | 83 | 96 | 46 |
| - of these 2nd field | 45 | 49 | 11 | 58 | 79 | 46 |
| NA (no call possible) | 53 | 61 | 62 | 53 | 40 | 90 |
Number of alleles called at the 1st field level and at the 2nd field level of resolution reported for each locus (2n = 136) for the 68 historical samples.
Figure 3. HLA allele call success rate for 68 historical aDNA samples. Percentage of alleles called at each individual HLA locus, calculated for the whole set of historical samples (N = 68). Allele calls are reported at two different levels of resolution: 2nd field (4-digit) and 1st field (2-digit). ‘No call possible’ represents the fraction of cases where the allele call was not possible or allele calls were ambiguous.
Success rate for the 68 historical samples.
| | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DQB1 | HLA-DPB1 | Overall |
|---|---|---|---|---|---|---|---|
| Success 1st field (%) | 61 | 55 | 54 | 61 | 71 | 34 | 56 |
| Success 2nd field (%) | 33 | 36 | 8 | 43 | 58 | 34 | 35 |
Success rate at the 1st field level and at the 2nd field level of resolution, across the 6 investigated genes and overall, for the whole dataset of historical samples.
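The success rates follow from the locus-specific allele counts reported above, with 2n = 136 alleles per locus. The short Python sketch below recomputes them. The counts are copied from the allele call table; rounding to whole percentages and pooling all loci for the overall value are assumptions that happen to reproduce the reported figures.

```python
TOTAL_ALLELES_PER_LOCUS = 136  # 2n for 68 historical samples

# Allele counts taken from the locus-specific allele call table.
first_field_calls = {
    "HLA-A": 83, "HLA-B": 75, "HLA-C": 74,
    "HLA-DRB1": 83, "HLA-DQB1": 96, "HLA-DPB1": 46,
}
second_field_calls = {
    "HLA-A": 45, "HLA-B": 49, "HLA-C": 11,
    "HLA-DRB1": 58, "HLA-DQB1": 79, "HLA-DPB1": 46,
}


def success_rates(calls, total_per_locus):
    """Return ({locus: success %}, overall success %) rounded to integers."""
    per_locus = {
        locus: round(100 * n / total_per_locus) for locus, n in calls.items()
    }
    # Overall rate pools the calls of all loci (assumed definition).
    overall = round(100 * sum(calls.values()) / (total_per_locus * len(calls)))
    return per_locus, overall


first_rates, first_overall = success_rates(first_field_calls, TOTAL_ALLELES_PER_LOCUS)
second_rates, second_overall = success_rates(second_field_calls, TOTAL_ALLELES_PER_LOCUS)
print(first_rates, first_overall)
print(second_rates, second_overall)
```

For example, 83 of 136 HLA-A alleles called at the 1st field gives 61%, and 457 of 816 alleles across all six loci gives the overall 56%.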
Success rate and accuracy rate for simulated ancient DNA samples.
| | HLA-B | HLA-DRB1 | Overall |
|---|---|---|---|
| Success 1st field (%) | 62 | 80 | 71 |
| Success 2nd field (%) | 50 | 64 | 57 |
| Accuracy 1st field (%) | 100 | 100 | 100 |
| Accuracy 2nd field (%) | 100 | 100 | 100 |
Success and accuracy rate of HLA allele calls, at the 1st field level and at the 2nd field level of resolution, across the 2 investigated genes and overall, for the simulated ancient samples.
Figure 4. HLA allele call success rate for the simulated ancient samples. Percentage of allele calls calculated across the two investigated loci (HLA-B and HLA-DRB1) and across the set of samples (N = 6) investigated at different read depths (from 1× up to 60×), for a total of 30 simulated ancient samples. Allele calls are reported at two different levels of resolution, 2nd field (4-digit) and 1st field (2-digit). ‘No call possible’ represents the fraction of cases where the allele call was not possible or allele calls were ambiguous.
Success rate and accuracy rate for the 1000 Genomes Project samples.
| | HLA-B | HLA-DRB1 | Overall |
|---|---|---|---|
| Success 1st field (%) | 100 | 100 | 100 |
| Success 2nd field (%) | 84 | 95 | 90 |
| Accuracy 1st field (%) | 95 | 100 | 99 |
| Accuracy 2nd field (%) | 95 | 100 | 97 |
Success and accuracy rate of HLA allele calls, at the 1st field level and at the 2nd field level of resolution, across the 2 investigated genes and overall, for a diverse subset (N = 31) of the 1000 Genomes Project samples.
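To make the distinction between success and accuracy concrete, the following Python sketch compares allele calls against known genotypes at a chosen field resolution. It assumes, since the definitions are not given here, that success is the share of alleles that received a call at the requested resolution and that accuracy is the share of those calls that match the known allele. The function names and the example alleles are illustrative only and are not data from the study.

```python
def field_count(allele):
    """Number of colon-separated fields: 'HLA-B*07:02' -> 2."""
    return len(allele.partition("*")[2].split(":"))


def truncate_allele(allele, n_fields):
    """Cut an allele name to n fields: ('HLA-B*07:02:01', 1) -> 'HLA-B*07'."""
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:n_fields])


def success_and_accuracy(calls, truth, n_fields):
    """Return (success %, accuracy %) at the given field resolution.

    calls: allele names, or None where no call was possible.
    truth: known allele names in the same order.
    """
    # A call counts only if it reaches the requested resolution.
    called = [
        (c, t) for c, t in zip(calls, truth)
        if c is not None and field_count(c) >= n_fields
    ]
    success = 100.0 * len(called) / len(calls)
    correct = sum(
        truncate_allele(c, n_fields) == truncate_allele(t, n_fields)
        for c, t in called
    )
    accuracy = 100.0 * correct / len(called) if called else float("nan")
    return success, accuracy


# Illustrative calls and known genotypes (hypothetical example).
calls = ["HLA-B*07:02", "HLA-B*08", None, "HLA-B*15:01"]
truth = ["HLA-B*07:02:01", "HLA-B*08:01:01", "HLA-B*44:02:01", "HLA-B*15:03:01"]

print(success_and_accuracy(calls, truth, 1))  # (75.0, 100.0)
print(success_and_accuracy(calls, truth, 2))  # (50.0, 50.0)
```

Under these definitions, accuracy can stay at 100% while success drops, as in the simulated ancient samples, because uncertain alleles are left uncalled rather than called incorrectly.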