P Prakrithi, Khushboo Singhal, Disha Sharma, Abhinav Jain, Rahul C Bhoyar, Mohamed Imran, Vigneshwar Senthilvel, Mohit Kumar Divakar, Anushree Mishra, Vinod Scaria, Sridhar Sivasubbu, Mitali Mukerji.
Abstract
Actively retrotransposing primate-specific Alu repeats display insertion-deletion (InDel) polymorphism through their insertion at new loci. Indian populations, and hence their Alu InDels, remain under-represented in global datasets. Here, we report the genomic landscape of Alu InDels in the recently released IndiGen dataset of 1021 Indian genomes (available at https://clingen.igib.res.in/indigen). We identified 9239 polymorphic Alu insertions, comprising private (3831), rare (3974) and common (1434) insertions, with an average of 770 insertions per individual. PCR genotyping of 94 samples confirmed 89% of the predicted genotypes. About 60% of the identified InDels are unique to IndiGen when compared with other global datasets; 23% of sites are shared with both SGDP and HGSVC, and 58% of these shared sites (1289) are common polymorphisms in IndiGen. The insertions are biased towards genic regions, particularly introns, and the genes containing them are enriched for processes such as cell morphogenesis and neurogenesis (P < 0.05). Approximately 60% of the InDels map to genes listed in the OMIM database. Finally, we show that 558 InDels can serve as ancestry-informative markers to segregate global populations. This study provides a baseline resource of Alu InDels for population genomics. © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
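Statements such as "unique to IndiGen" and "shared with both SGDP and HGSVC" come from comparing call sets site by site. The abstract does not say how sites were matched across datasets, so the sketch below assumes a site matches if it lies on the same chromosome within a hypothetical window of 100 bp; the function names and toy inputs are illustrative only.

```python
from bisect import bisect_left


def build_index(sites):
    """Group (chromosome, position) sites by chromosome, sorted for binary search."""
    index = {}
    for chrom, pos in sites:
        index.setdefault(chrom, []).append(pos)
    for positions in index.values():
        positions.sort()
    return index


def has_match(index, chrom, pos, window):
    """True if some indexed site on `chrom` lies within `window` bp of `pos`."""
    positions = index.get(chrom, [])
    i = bisect_left(positions, pos - window)  # first site at or after pos - window
    return i < len(positions) and positions[i] <= pos + window


def overlap_summary(query, other_a, other_b, window=100):
    """Count query sites absent from both other call sets, or present in both.

    window=100 bp is a hypothetical matching tolerance, not a value from the study.
    """
    idx_a, idx_b = build_index(other_a), build_index(other_b)
    unique = shared_both = 0
    for chrom, pos in query:
        in_a = has_match(idx_a, chrom, pos, window)
        in_b = has_match(idx_b, chrom, pos, window)
        if not (in_a or in_b):
            unique += 1
        elif in_a and in_b:
            shared_both += 1
    n = len(query)
    return {"unique": unique, "shared_both": shared_both,
            "frac_unique": unique / n, "frac_shared_both": shared_both / n}
```

Sites found in only one of the two other call sets fall into neither count, mirroring the separate regions of a three-way Venn diagram.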
Year: 2022 PMID: 35178516 PMCID: PMC8846365 DOI: 10.1093/nargab/lqac009
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 2. Correlation of Alus with GC content, gene density, gene length, intron length and intron density. (A) Density of polymorphic genic Alu InDels identified in the 1021 IndiGenomes. (B) Fixed Alus in the reference human genome (GRCh38/hg38), retrieved from the UCSC Genome Browser. An asterisk (*) marks parameters for which the r values of all chromosomes are significant. For GC content and gene density, only a few chromosomes did not pass the significance cut-off (Supplementary Table S3). Dotted lines connect the points to show the trend across chromosomes.
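Figure 2 reports one r value per chromosome. A minimal sketch of such a calculation, assuming Pearson's r and assuming each chromosome is split into bins whose InDel counts are correlated with a per-bin genomic feature such as GC content; the bin layout and input format are hypothetical, and significance testing is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def per_chromosome_r(bins):
    """Map each chromosome to r(InDel count, feature) computed over its bins.

    bins: {chrom: [(indel_count, feature_value), ...]}, one tuple per bin
    (a hypothetical input layout).
    """
    result = {}
    for chrom, rows in bins.items():
        counts = [count for count, _ in rows]
        feature = [value for _, value in rows]
        result[chrom] = pearson_r(counts, feature)
    return result
```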
Figure 1. Distribution of identified polymorphic Alu InDels in the 1021 IndiGenomes. (A) Number of polymorphic Alu InDels on each chromosome. (B) Number of polymorphic InDels per 10 Mb region, with each chromosome split into contiguous bins. Insertions with a frequency ≥5% are termed common, those with a frequency <5% rare, and those present in only one individual in the IndiGen data private. (C) Distribution of insertions within the AluY subfamily; the inset shows the distribution across the major subfamilies AluY, AluS and AluJ. (D) Distribution of Alu insertions within genes; the inset compares genic and intergenic regions.
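The frequency classes and the 10 Mb binning in Figure 1 can be sketched as below. The caption does not state whether "frequency" is an allele or a carrier frequency; this sketch assumes diploid samples and uses the minor allele frequency, matching the "MAF ≥ 5%" definition of common polymorphisms in the summary table. Function names are hypothetical.

```python
from collections import Counter

BIN_SIZE = 10_000_000  # 10 Mb contiguous bins, as in Figure 1B


def frequency_group(allele_count, carriers, n_individuals, threshold=0.05):
    """Assign an insertion to the private, rare or common class.

    Assumes diploid samples: insertion allele frequency = allele_count / (2 * n).
    A site carried by exactly one individual is private regardless of frequency;
    otherwise the minor allele frequency (MAF) is compared with the threshold.
    """
    if carriers == 1:
        return "private"
    freq = allele_count / (2 * n_individuals)
    maf = min(freq, 1 - freq)
    return "common" if maf >= threshold else "rare"


def count_per_bin(sites, bin_size=BIN_SIZE):
    """Count insertions per (chromosome, bin index) using fixed-size bins from position 0."""
    return Counter((chrom, pos // bin_size) for chrom, pos in sites)
```

With 1021 individuals there are 2042 alleles, so a non-private site needs at least 103 insertion alleles (and at least 103 non-insertion alleles) to reach MAF ≥ 5%.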
Figure 3. Validation of polymorphic Alu InDels identified in the 1021 IndiGen samples. (A) Schematic of the validation approach for selected polymorphic loci: PCR primers (red arrows) are designed to flank the Alu insertion site, so that alleles with and without the insertion yield amplicons of different sizes; the three possible genotypes are also shown. (B) Representative gel electrophoresis image of the three genotypes, Ins/Ins (single amplicon at ∼600 bp), Ins/Del (two amplicons: insertion at ∼600 bp and deletion at ∼300 bp) and Del/Del (single band at ∼300 bp), for the loci listed in Table 1. The band at ∼850 bp for InDel_15446 is non-specific.
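The genotype logic shown on the gel can be written as a simple rule: a band near the expected with-insertion size indicates an insertion allele, a band near the no-insertion size indicates a deletion allele, and any other band (such as the non-specific ∼850 bp band) is ignored. A sketch, assuming a hypothetical ±30 bp sizing tolerance; `concordance` shows one way a validation rate like the 89% in the abstract could be tallied, not necessarily how the authors computed it.

```python
def call_genotype(bands, size_del, size_ins, tol=30):
    """Call a genotype from observed band sizes (bp) at one locus.

    size_del / size_ins: expected amplicon sizes without / with the Alu insertion.
    tol: hypothetical sizing tolerance in bp; bands matching neither size are ignored.
    """
    has_del = any(abs(b - size_del) <= tol for b in bands)
    has_ins = any(abs(b - size_ins) <= tol for b in bands)
    if has_ins and has_del:
        return "Ins/Del"
    if has_ins:
        return "Ins/Ins"
    if has_del:
        return "Del/Del"
    return None  # no informative band: genotype cannot be called


def concordance(predicted, observed):
    """Fraction of callable PCR genotypes that agree with the predicted genotypes."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    if not pairs:
        raise ValueError("no callable PCR genotypes")
    return sum(p == o for p, o in pairs) / len(pairs)
```

For example, expected sizes of 254 bp and 534 bp (InDel_15446 in Table 1) and observed bands at 850 bp and 260 bp give a Del/Del call, since the 850 bp band matches neither expected size.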
Table 1. Details of the polymorphic Alu insertions shown in Figure 3B.
| S.No. | Gene | ID | Expected amplicon, no insertion (bp) | Expected amplicon, with insertion (bp) | Difference (bp) | Frequency group |
|---|---|---|---|---|---|---|
| 1 | | InDel_5797 | 298 | 579 | 281 | Common |
| 2 | | InDel_15374 | 281 | 558 | 277 | Common |
| 3 | | InDel_5542 | 254 | 535 | 281 | Common |
| 4 | | InDel_17932 | 253 | 533 | 280 | Common |
| 5 | | InDel_16968 | 245 | 526 | 281 | Common |
| 6 | | InDel_1848 | 300 | 581 | 281 | Common |
| 7 | | InDel_10992 | 254 | 533 | 279 | Common |
| 8 | | InDel_1977 | 220 | 501 | 281 | Common |
| 9 | | InDel_11019 | 204 | 484 | 280 | Rare |
| 10 | | InDel_3672 | 280 | 561 | 281 | Rare |
| 11 | | InDel_4112 | 298 | 579 | 281 | Rare |
| 12 | | InDel_15446 | 254 | 534 | 280 | Private |
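In every row, the reported difference between amplicon sizes equals the with-insertion amplicon minus the no-insertion amplicon, that is, the length of the inserted sequence implied by the primer design (about 280 bp, consistent with the ∼300 bp gap between the insertion and deletion bands on the gel). A quick consistency check over the values in the table above; the dictionary layout and function names are illustrative.

```python
# (no-insertion amplicon, with-insertion amplicon, reported difference) in bp,
# transcribed from the table above
AMPLICONS = {
    "InDel_5797": (298, 579, 281),
    "InDel_15374": (281, 558, 277),
    "InDel_5542": (254, 535, 281),
    "InDel_17932": (253, 533, 280),
    "InDel_16968": (245, 526, 281),
    "InDel_1848": (300, 581, 281),
    "InDel_10992": (254, 533, 279),
    "InDel_1977": (220, 501, 281),
    "InDel_11019": (204, 484, 280),
    "InDel_3672": (280, 561, 281),
    "InDel_4112": (298, 579, 281),
    "InDel_15446": (254, 534, 280),
}


def inconsistent_rows(table=AMPLICONS):
    """Return IDs whose reported difference != with-insertion minus no-insertion size."""
    return [indel_id for indel_id, (without, with_ins, diff) in table.items()
            if with_ins - without != diff]


def insertion_sizes(table=AMPLICONS):
    """Implied inserted-sequence length per locus (with-insertion minus no-insertion)."""
    return {indel_id: with_ins - without
            for indel_id, (without, with_ins, _) in table.items()}
```

All twelve rows pass the check, and the implied insertion lengths range from 277 to 281 bp.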
Figure 4. Comparison of IndiGen with HGSVC and SGDP. (A) Venn diagram of the overlap between polymorphic Alu InDels in IndiGen, HGSVC and SGDP. (B and C) Principal component analysis (PCA) plots of major world populations, showing the clustering of each population based on (B) the 2232 polymorphic Alu InDels shared between IndiGen, HGSVC and SGDP and (C) the 554 polymorphic Alu InDels in the top 25% by FST value. The population clusters segregate as well with this subset as with all shared Alu InDels. The proportion of variance explained by PC1 and PC2 is shown in brackets.
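Panel (C) keeps the loci in the top quartile by FST. The caption does not name the FST estimator, so this sketch assumes a simple Nei-style formulation, FST = (HT − HS) / HT per biallelic locus, with equal population weights; sample-size-corrected estimators (e.g. Weir and Cockerham's or Hudson's) would be preferable for real data. Per-population allele frequencies and function names are hypothetical.

```python
import math


def fst_nei(freqs):
    """FST for one biallelic locus from per-population insertion allele frequencies.

    H_T: expected heterozygosity of the pooled (equally weighted) frequency.
    H_S: mean expected heterozygosity within populations.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic across all populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def top_quartile(loci, fraction=0.25):
    """Return locus IDs in the top `fraction` by FST, highest first.

    loci: {locus_id: [allele frequency in population 1, population 2, ...]}
    """
    ranked = sorted(loci, key=lambda k: fst_nei(loci[k]), reverse=True)
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]
```

The selected loci would then form the genotype matrix passed to PCA; the PCA step itself is not shown here.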
Summary statistics for the IndiGen, HGSVC and SGDP datasets
| Parameter | IndiGen | HGSVC | SGDP |
|---|---|---|---|
| Sample size | 1021 | 3202 (687 of South Asian ancestry) | 296 (49 of South Asian ancestry) |
| Coverage | 25–30× | 30× | 30× |
| Total Alu InDels after QC filtering | 9239 | 9331 | 11661 |
| Common polymorphisms (MAF ≥ 5%) | 1434 | 3546 | 1941 |
| Average insertion sites per individual | 614 | 1705 | 835 |
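Because the three datasets differ in sample size and ancestry composition, raw counts are hard to compare directly. One simple derived quantity is the share of each QC-filtered call set that is common; computed from the summary table above, this is roughly 16% for IndiGen, 38% for HGSVC and 17% for SGDP. The dictionary layout below is illustrative.

```python
# (total QC-filtered Alu InDels, common polymorphisms with MAF >= 5%),
# transcribed from the summary table above
SUMMARY = {
    "IndiGen": (9239, 1434),
    "HGSVC": (9331, 3546),
    "SGDP": (11661, 1941),
}


def common_fraction(summary=SUMMARY):
    """Fraction of each dataset's QC-filtered InDels that are common polymorphisms."""
    return {name: common / total for name, (total, common) in summary.items()}
```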