Indu Khatri1,2, Magdalena A Berkowska1, Erik B van den Akker2,3,4, Cristina Teodosio1, Marcel J T Reinders2,4, Jacques J M van Dongen5.
Year: 2021 PMID: 34876659 PMCID: PMC8674126 DOI: 10.1038/s41435-021-00155-3
Source DB: PubMed Journal: Genes Immun ISSN: 1466-4879 Impact factor: 2.676
Fig. 1 Correlogram representing the similarity of the IMGT artificial loci with the pmIG artificial loci belonging to all populations.
To develop the artificial loci, we randomly selected alleles for the IGHV1 genes and concatenated them in a fixed gene order (IGHV1-18, 1-2, 1-24, 1-3, 1-45, 1-46, 1-58, 1-69, and 1-8). For the pmIG alleles, we separated the alleles specific to the African populations (represented by “pmIG AFR”) from the alleles present in all populations (represented by “pmIG All population”) and generated artificial loci from the respective alleles. These artificial loci were aligned, and aligned regions containing gaps or unknown nucleotides (“N” bases in the IMGT alleles) were discarded. The remaining aligned regions were converted into a binary matrix, and Pearson correlations were computed. The upper correlogram comprises all three categories of artificial loci, i.e., “IMGT,” “pmIG All populations,” and “pmIG AFR.” IMGT loci in the G1 category showed ~70% correlation with other IMGT alleles and all pmIG alleles. Interestingly, the G2 category of IMGT loci shows a higher correlation with the pmIG loci from all populations (>90%) than with those derived from African alleles (~80%), as outlined by the blue box. In contrast, the G3 category of IMGT artificial loci did not show similarity to any of the IMGT or pmIG alleles. This category comprises the heavily mutated IGHV1-69 alleles from the IMGT database. The lower correlogram is a subset of the upper correlogram that excludes the African-specific pmIG artificial loci. It clearly shows the high similarity of the G2 category IMGT loci to the pmIG loci derived from alleles present in all populations (consisting mainly of the GRCh37 genes, which also make up most of IMGT). Overall, these data again support our claim that the IMGT database (and other existing databases) lacks population-based diversity.
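The caption does not state how the aligned regions were binarized, so the following is only a minimal sketch of one plausible reading: alignment columns containing a gap or an “N” are dropped, each remaining position is one-hot encoded over A/C/G/T, and Pearson correlations are computed between loci with NumPy. The function names `binary_matrix` and `locus_correlations` and the one-hot encoding itself are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

BASES = "ACGT"  # assumed alphabet; anything else (gap, N) marks a column to discard


def binary_matrix(aligned):
    """One-hot encode equal-length aligned sequences (hypothetical helper).

    Columns in which any sequence has a gap or an unknown base are removed
    first, mirroring the filtering step described in the caption.
    """
    arr = np.array([list(seq.upper()) for seq in aligned])
    keep = np.all(np.isin(arr, list(BASES)), axis=0)
    arr = arr[:, keep]
    # One indicator per (position, base) pair, flattened per sequence.
    onehot = np.stack([arr == base for base in BASES], axis=-1)
    return onehot.reshape(len(aligned), -1).astype(float)


def locus_correlations(aligned):
    """Pearson correlation between every pair of loci (rows are loci)."""
    return np.corrcoef(binary_matrix(aligned))


# Toy alignment: column 7 holds a gap and column 8 an N, so both are dropped.
toy_loci = ["ACGTAC-T", "ACGTACGT", "ACGAACGN"]
toy_corr = locus_correlations(toy_loci)
```

With real data, each row would be one artificial locus and the resulting square matrix is what a correlogram visualizes.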
Fig. 2 High diversity in African and East-Asian populations in the IG loci from 1000 Genomes resource.
The PCA plots were generated using the SNPs in the IGH loci of the 2504 individuals available in the 1000 Genomes resource. Each individual is colored by the (super)population to which they belong. A PCA plot using all individuals in the 1000 Genomes, which clearly suggests that African (AFR) individuals have higher diversity and are genetically distinct from the other super-populations. B PCA plot generated by excluding the African individuals to examine the diversity among the other populations. After excluding the African super-population, a clear separation of the East-Asian (EAS) super-population from the American (AMR), European (EUR), and South-Asian (SAS) super-populations is observed. C PCA plot generated using only the African populations, in which we observe a homogeneous mixing of the African populations, suggesting a common ancestry of the populations sampled in the 1000 Genomes. Please note that the majority of these populations were sampled from the western coast of Africa. ACB: African Caribbean in Barbados; ASW: African Ancestry in Southwest US; ESN: Esan in Nigeria; GWD: Gambian in Western Division, The Gambia – Mandinka; LWK: Luhya in Webuye, Kenya; MSL: Mende in Sierra Leone; and YRI: Yoruba in Ibadan, Nigeria.
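The caption does not describe the PCA tooling, so the sketch below only illustrates the general technique: principal components of a column-centered genotype matrix obtained through a singular value decomposition. It assumes genotypes are coded as alternate-allele counts (0, 1, or 2) with individuals in rows and SNPs in columns; the function name `pca_scores` and the synthetic two-group data are hypothetical.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project individuals onto the leading principal components.

    genotypes: array of shape (individuals, SNPs), assumed to hold
    alternate-allele counts. Returns the scores and the fraction of
    variance explained by each returned component.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Synthetic example: two groups with very different allele frequencies.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(20, 50))
group_b = rng.binomial(2, 0.9, size=(20, 50))
demo_scores, demo_explained = pca_scores(np.vstack([group_a, group_b]))
```

Panels B and C correspond to running the same projection on a row subset, for example after masking out (or keeping only) the AFR individuals before calling the function.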