Fadilla Wahyudi, Farhang Aghakhanian, Sadequr Rahman, Yik-Ying Teo, Michał Szpak, Jasbir Dhaliwal, Qasim Ayub.
Abstract
BACKGROUND: In population genomics, polymorphisms that are highly differentiated between geographically separated populations are often suggestive of Darwinian positive selection. Genomic scans have highlighted several such regions in African and non-African populations, but only a handful of these have functional data that clearly identify the candidate variants driving the selection process. Fine-Mapping of Adaptive Variation (FineMAV) was developed to address this in a high-throughput manner using population-based whole-genome sequences generated by the 1000 Genomes Project. It pinpoints positively selected genetic variants in sequencing data by prioritizing high-frequency, population-specific and functional derived alleles.
Keywords: Adaption; Evolutionary genomics; Human evolution; Population differentiation; Selective sweep
Year: 2021 PMID: 34922440 PMCID: PMC8684245 DOI: 10.1186/s12859-021-04506-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Information needed for the tab-delimited input files
| Information needed from the VCF file | Description | Mandatory VCF column |
|---|---|---|
| CHROM:POS | Chromosome number:position | Yes |
| ID | Identifier | Yes |
| REF | Reference base | Yes |
| ALT | Alternative base | Yes |
| AA | Ancestral allele | No |
| CADD_PHRED | Phred-scaled Combined Annotation Dependent Depletion (CADD) score | No |
| AF | Allele frequency (AF) of the alternative base, reported for each population | No |
This information can be extracted from the VCF file and provided in tab-delimited format so that the software can calculate the FineMAV scores.
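A minimal sketch of that extraction step for biallelic sites is shown below. It assumes the VCF has already been annotated so that `AA`, `CADD_PHRED` and one `<POP>_AF` key per population appear as top-level INFO fields; with VEP output, CADD scores may instead sit inside the `CSQ` field. The output column order is an assumption too, so check the software's documentation for the exact layout it expects. `parse_info` and `vcf_to_tsv` are hypothetical helpers, not part of FineMAV.

```python
from typing import Dict, Iterable, Iterator, List, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string such as 'AA=A|||;CADD_PHRED=12.3;DB' into a dict."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type INFO entries carry no value
    return fields


def vcf_to_tsv(lines: Iterable[str], populations: List[str]) -> Iterator[str]:
    """Yield one tab-delimited row per biallelic site:
    CHROM:POS, ID, REF, ALT, AA, CADD_PHRED, then one AF column per population.
    Missing INFO keys are written as 'NA'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information lines and the column header
        cols = line.rstrip("\n").split("\t")
        chrom, pos, var_id, ref, alt, info = cols[0], cols[1], cols[2], cols[3], cols[4], cols[7]
        if "," in alt:
            continue  # multi-allelic site: a single AF per population would be ambiguous
        fields = parse_info(info)
        # Some releases encode AA with extra pipe-separated fields (e.g. 'A|||'); keep the base only.
        aa = str(fields.get("AA", "NA")).split("|")[0].upper()
        afs = [str(fields.get(f"{pop}_AF", "NA")) for pop in populations]
        row = [f"{chrom}:{pos}", var_id, ref, alt, aa, str(fields.get("CADD_PHRED", "NA")), *afs]
        yield "\t".join(row)
```

Because `vcf_to_tsv` is a generator, it can stream a large, decompressed VCF line by line without loading it into memory.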
Fig. 1 Pipeline for calculating the genome-wide FineMAV scores. The grey boxed region highlights the parts of the workflow that are automated by the software. The intermediate output files are deleted when the pipeline completes. AA: ancestral allele; CADD_PHRED: Phred-scaled Combined Annotation Dependent Depletion score; VEP: Variant Effect Predictor; AF: allele frequency of the alternative allele
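FineMAV prioritizes derived alleles, whereas AF is reported for the alternative allele, so each frequency has to be polarised with the ancestral allele. The sketch below shows one common rule for biallelic SNPs. We assume, rather than know, that the software applies the same rule, and `derived_allele_frequency` is a hypothetical name.

```python
from typing import Optional


def derived_allele_frequency(ref: str, alt: str, aa: str, alt_af: float) -> Optional[float]:
    """Convert an ALT allele frequency into a derived allele frequency (DAF).

    - ancestral == REF: the derived allele is ALT, so DAF = AF
    - ancestral == ALT: the derived allele is REF, so DAF = 1 - AF
    - otherwise the ancestral state is unknown and the site is skipped (None)
    """
    aa = aa.upper()
    if aa == ref.upper():
        return alt_af
    if aa == alt.upper():
        return 1.0 - alt_af
    return None
```

Returning `None` rather than guessing keeps sites with an unresolved ancestral state out of the downstream scores.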
Recommended minimum value of the penalty parameter, rounded to two decimal places, for a given number of populations, as determined by Szpak et al. [3]
| Number of populations | Penalty parameter |
|---|---|
| 2 | 4.96 |
| 3 | 3.50 |
| 4 | 2.98 |
| 5 | 2.71 |
| 6 | 2.53 |
| 7 | 2.41 |
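When scripting a run, the tabulated values can be kept in a small lookup so that the recommended minimum is picked from the number of populations being compared. This sketch only encodes the numbers above. It does not reproduce how Szpak et al. derived them or how the parameter enters the score. `RECOMMENDED_MIN_PENALTY` and `min_penalty` are hypothetical names.

```python
# Recommended minimum penalty parameter, keyed by number of populations (table above).
RECOMMENDED_MIN_PENALTY = {2: 4.96, 3: 3.50, 4: 2.98, 5: 2.71, 6: 2.53, 7: 2.41}


def min_penalty(n_populations: int) -> float:
    """Return the tabulated minimum, refusing population counts outside 2-7."""
    try:
        return RECOMMENDED_MIN_PENALTY[n_populations]
    except KeyError:
        raise ValueError(
            f"no recommended value tabulated for {n_populations} populations (table covers 2-7)"
        ) from None
```

For the three-population analysis of 90HC, SSIP and SSMP, `min_penalty(3)` returns 3.5.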
Fig. 2 Utilising the chunk size option. (A) Diagram illustrating how the software splits the input files into chunks, iterates through them when performing the FineMAV calculations and then merges the results into one output file. (B) Graph comparing the time taken and the maximum random access memory (RAM) used with different chunk sizes for a dataset of 66,236,516 biallelic SNPs
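The chunking scheme in panel A can be sketched as follows. This assumes line-oriented tab-delimited input and a caller-supplied per-line scoring function (`score_line`, hypothetical). Only one chunk is held in memory at a time, per-chunk results are written to temporary files, and those files are concatenated in input order and then deleted. FineMAV's own implementation may differ.

```python
import itertools
import os
import shutil
import tempfile
from typing import Callable, Iterable, Iterator, List


def iter_chunks(lines: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    it = iter(lines)
    while chunk := list(itertools.islice(it, chunk_size)):
        yield chunk


def score_in_chunks(in_path: str, out_path: str, chunk_size: int,
                    score_line: Callable[[str], str]) -> int:
    """Score in_path chunk by chunk and merge the results into out_path.

    Returns the number of chunks processed. Per-chunk files live in a
    temporary directory that is removed once the merge has finished.
    """
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        with open(in_path) as src:
            for i, chunk in enumerate(iter_chunks(src, chunk_size)):
                part = os.path.join(tmp, f"chunk_{i:05d}.tsv")
                with open(part, "w") as dst:
                    for line in chunk:
                        dst.write(score_line(line.rstrip("\n")) + "\n")
                parts.append(part)
        with open(out_path, "w") as out:
            for part in parts:  # merge in the original input order
                with open(part) as src:
                    shutil.copyfileobj(src, out)
    return len(parts)
```

A smaller `chunk_size` lowers peak memory at the cost of more file operations. That time-versus-RAM trade-off is the kind panel B measures.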
Fig. 3 Annotated screenshot of the bigWig files of the genome-wide FineMAV scores. (A) FineMAV scores for the Han Chinese (90HC, orange), Singaporean Indian (SSIP, blue) and Singaporean Malay (SSMP, grey) populations displayed in the Integrative Genomics Viewer (IGV). The genomic regions displayed are the autosomes and the X chromosome, and the horizontal line marks the 99th percentile. (B) Multi-locus view of two regions. The left panel shows a locus containing a well-known positively selected missense variant in EDAR (rs3827760) in East Asians, which also stands out in the SSMP population. The right panel shows a novel locus with two high-scoring variants in SSIP: rs151233, a synonymous variant in APOBR, and rs151234, an intronic variant in CLN3.
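The 99th-percentile line in panel A can also be computed directly from the genome-wide scores, for example to list the variants above it. This sketch uses the standard library's `statistics.quantiles` with linear interpolation (`method="inclusive"`). The software or IGV may compute the cut-off differently. `p99_cutoff` and `outliers_above_p99` are hypothetical helpers.

```python
import statistics
from typing import Dict, List, Tuple


def p99_cutoff(scores: List[float]) -> float:
    """99th percentile with linear interpolation between order statistics."""
    # quantiles(n=100) returns the 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(scores, n=100, method="inclusive")[98]


def outliers_above_p99(scores_by_id: Dict[str, float]) -> List[Tuple[str, float]]:
    """Variants scoring strictly above the 99th percentile, highest first."""
    cutoff = p99_cutoff(list(scores_by_id.values()))
    hits = [(vid, s) for vid, s in scores_by_id.items() if s > cutoff]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)
```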
Top 10 FineMAV candidates from the Han Chinese (90HC), Singaporean Indian (SSIP) and Singaporean Malay (SSMP) populations
| Chr | Position (a) | SNP ID | Consequence (b) | DAF 90HC | DAF SSIP | DAF SSMP | FineMAV score | Known or novel |
|---|---|---|---|---|---|---|---|---|
| **90HC** | | | | | | | | |
| 2 | 109513601 | rs3827760:A > G | Missense (p.Val370Ala) | 0.922 | 0.029 | 0.490 | 4.661 | Known |
| 5 | 176099727 | rs13186794:A > G | Intergenic | 0.494 | 0.057 | 0.047 | 4.114 | Novel |
| 5 | 176099728 | rs13186795:A > G | Intergenic | 0.494 | 0.057 | 0.057 | 4.096 | Novel |
| 4 | 31442427 | rs56345433:G > A | Intergenic | 0.528 | 0.086 | 0.021 | 3.211 | Novel |
| 3 | 98031307 | rs2316271:T > A | Stop gained (p.Leu184Ter) | 0.767 | 0.314 | 0.599 | 3.102 | Novel |
| 16 | 31088347 | rs749671:G > A | Synonymous (p.Glu234=) | 0.906 | 0.043 | 0.776 | 3.053 | Known |
| 5 | 76129053 | rs631465:T > C | Synonymous (p.Ile207=) | 0.522 | 0.014 | 0.208 | 3.008 | Novel |
| 2 | 109451118 | rs72627476:A > G | Intronic | 0.917 | 0.029 | 0.484 | 2.961 | Known |
| 12 | 132106717 | rs10794470:T > C | Intronic | 0.272 | 0.000 | 0.005 | 2.940 | Novel |
| 7 | 14587199 | rs10236893:G > A | Intronic | 0.417 | 0.029 | 0.120 | 2.895 | Novel |
| **SSIP** | | | | | | | | |
| 16 | 28506428 | rs151233:C > T | Synonymous (p.Leu22=) | 0.006 | 0.571 | 0.026 | 7.677 | Novel |
| 16 | 30936081 | rs35675346:G > A | Missense (p.Glu10Lys) | 0.061 | 0.800 | 0.188 | 7.213 | Known |
| 16 | 28505660 | rs151234:G > C | Intronic | 0.006 | 0.571 | 0.031 | 6.839 | Novel |
| 16 | 31044683 | rs58726213:A > G | Upstream gene | 0.089 | 0.871 | 0.214 | 6.686 | Known |
| 15 | 64592833 | rs114713921:T > C | 5 prime UTR | 0.006 | 0.486 | 0.036 | 6.341 | Novel |
| 16 | 30666367 | rs3747481:C > T | Missense (p.Pro359Leu) | 0.100 | 0.857 | 0.245 | 6.090 | Known |
| 19 | 49206674 | rs601338:G > A | Stop gained (p.Trp154Ter) | 0.011 | 0.186 | 0.016 | 6.033 | Known |
| 15 | 91452595 | rs2106673:A > G | Missense (p.Gln412Arg) | 0.017 | 0.514 | 0.063 | 5.746 | Novel |
| 10 | 17407147 | rs729170:G > T | Intronic | 0.006 | 0.343 | 0.005 | 5.736 | Novel |
| 15 | 64653984 | rs8026043:G > T | Downstream gene | 0.006 | 0.486 | 0.036 | 5.726 | Novel |
| **SSMP** | | | | | | | | |
| 2 | 98272491 | rs2290123:A > G | 3 prime UTR | 0.033 | 0.029 | 0.380 | 3.378 | Known |
| 2 | 97613974 | rs114979404:C > G | Intronic | 0.022 | 0.029 | 0.375 | 2.806 | Known |
| 17 | 2238152 | rs79597880:T > C | Missense (p.Lys199Glu) | 0.089 | 0.014 | 0.297 | 2.747 | Novel |
| 16 | 31088347 | rs749671:G > A | Synonymous (p.Glu234=) | 0.906 | 0.043 | 0.776 | 2.616 | Known |
| 7 | 100371358 | rs2293766:G > A | Stop gained (p.Trp1883Ter) | 0.528 | 0.257 | 0.557 | 2.531 | Known |
| 2 | 109513601 | rs3827760:A > G | Missense (p.Val370Ala) | 0.922 | 0.029 | 0.490 | 2.474 | Known |
| 3 | 98031307 | rs2316271:T > A | Stop gained (p.Leu184Ter) | 0.767 | 0.314 | 0.599 | 2.424 | Novel |
| 11 | 62848487 | rs11231341:A > C | Stop gained (p.Tyr501Ter) | 0.867 | 0.757 | 0.792 | 2.421 | Novel |
| 12 | 57865558 | rs2229300:G > T | Missense (p.Gly1012Val) | 0.050 | 0.014 | 0.224 | 2.402 | Novel |
| 16 | 31075175 | rs2303223:G > A | Synonymous (p.Gly225=) | 0.911 | 0.043 | 0.781 | 2.290 | Known |
(a) Genomic position according to the GRCh37/hg19 reference genome
(b) Most severe variant consequence according to Ensembl
Chr: chromosome, DAF: derived allele frequency, UTR: untranslated region
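A table like this can be assembled by ranking each population's scores independently and keeping the ten highest. The sketch below assumes the scores are already in memory as a mapping from population to `{variant_id: score}`; `top_candidates` and `shared_candidates` are hypothetical helpers. The second function flags variants that appear in more than one population's list, as rs3827760, rs749671 and rs2316271 do for 90HC and SSMP above.

```python
import heapq
from collections import Counter
from typing import Dict, List, Set, Tuple

Scores = Dict[str, Dict[str, float]]  # population -> {variant_id: score}
TopLists = Dict[str, List[Tuple[str, float]]]


def top_candidates(scores: Scores, n: int = 10) -> TopLists:
    """Return the n highest-scoring variants per population, best first."""
    return {
        pop: heapq.nlargest(n, per_variant.items(), key=lambda kv: kv[1])
        for pop, per_variant in scores.items()
    }


def shared_candidates(tops: TopLists) -> Set[str]:
    """Variants that appear in the top list of more than one population."""
    counts = Counter(var_id for hits in tops.values() for var_id, _ in hits)
    return {var_id for var_id, c in counts.items() if c > 1}
```

`heapq.nlargest` avoids sorting every genome-wide score when only the top few per population are needed.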
Fig. 4 Screenshot of the FineMAV software's graphical user interface