Soheil Yousefi, Tooba Abbassi-Daloii, Thirsa Kraaijenbrink, Martijn Vermaat, Hailiang Mei, Peter van 't Hof, Maarten van Iterson, Daria V Zhernakova, Annique Claringbould, Lude Franke, Leen M 't Hart, Roderick C Slieker, Amber van der Heijden, Peter de Knijff, Peter A C 't Hoen.
Abstract
BACKGROUND: SNP panels that uniquely identify an individual are useful for genetic and forensic research. Previously recommended SNP panels are based on DNA profiles and mostly contain intragenic SNPs. With the increasing interest in RNA expression profiles, we aimed to establish a SNP panel for both DNA- and RNA-based genotyping.
Keywords: Biobanking; Forensics; Genetic variation; Mix up samples; Sample tracking
Year: 2018 PMID: 29370748 PMCID: PMC5785835 DOI: 10.1186/s12864-018-4482-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Flow chart of the steps used to select the panel of 50 SNPs and of the downstream analysis
Filtering steps with number of remaining SNPs
| Filter step | Number of SNPs (RNA) | Number of SNPs (DNA) |
|---|---|---|
| Total SNPs | 507,975 | 19,562,004 |
| Genotype calling rate > 90% | 4,876 | – |
| Biallelic loci with all three genotypes | 4,672 | – |
| MAF > 0.2 | 1,263 | 3,077,712 |
| SNPs common to DNA and RNA | 1,023 | |
| HW p-value > 0.01 | 100 | |
| LD < 0.01, excluding SNPs located in HLA loci | 50 | |

From the "SNPs common to DNA and RNA" row onward, a single count is given for the combined DNA/RNA set.
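The MAF and Hardy-Weinberg steps in the table can be illustrated with a minimal Python sketch. It assumes genotype counts per SNP are already available and uses a one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium; the record does not say which test the authors used, so that choice and all function names are illustrative assumptions. Only the thresholds (MAF > 0.2, HW p-value > 0.01, all three genotypes present) come from the table.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium.

    Assumption: a simple chi-square test stands in for whatever
    test was used in the study.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic locus: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_filters(n_aa, n_ab, n_bb, min_maf=0.2, min_hwe_p=0.01):
    """Apply the three-genotype, MAF and HW thresholds from the table."""
    if min(n_aa, n_ab, n_bb) == 0:
        return False  # not all three genotypes observed
    return (maf(n_aa, n_ab, n_bb) > min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p)
```

For example, `passes_filters(25, 50, 25)` keeps a SNP whose genotype counts match Hardy-Weinberg proportions exactly, while `passes_filters(81, 18, 1)` rejects one whose MAF is 0.1.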
Fig. 2 Pairwise LD comparisons of (a) the set of 100 SNPs before and (b) the 50 SNPs after filtering for LD (r² < 0.01). The color bar represents the p-values from the LD test
Fig. 3 (a) Comparison of AAF between RNA (BIOS, x-axis) and DNA (GoNL, y-axis) data. (b) AAF comparison between the Dutch population (common DNA/RNA SNPs, x-axis) and the 1000 Genomes phase 3 populations (y-axis). Black points depict the common DNA/RNA SNPs before filtering; red points depict the 50 selected SNPs. r is the Pearson correlation between data sets
Fig. 4 Probability of identity (PI) for the 50 SNPs in 2115 samples. The blue line shows PI between unrelated individuals. The red line shows PI when related individuals are included in the samples (PISibs). The x-axis shows the number of SNPs needed for PI to reach zero
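To make the two curves concrete, the following Python sketch computes PI and PISibs for biallelic SNPs using the standard population-genetics formulas (sum of squared genotype frequencies for unrelated individuals, and the usual sibling formula based on the sums of squared and fourth-power allele frequencies). It assumes the loci are independent, which the LD filter is intended to approximate; the function names are hypothetical and the record does not show the authors' own implementation.

```python
import math


def pi_unrelated(p):
    """Single-locus probability of identity for two unrelated individuals.

    For a biallelic SNP with allele frequencies p and q this is the sum
    of the squared Hardy-Weinberg genotype frequencies.
    """
    q = 1.0 - p
    return p ** 4 + (2 * p * q) ** 2 + q ** 4


def pi_sibs(p):
    """Single-locus probability of identity for two full siblings."""
    q = 1.0 - p
    s2 = p ** 2 + q ** 2  # sum of squared allele frequencies
    s4 = p ** 4 + q ** 4  # sum of fourth-power allele frequencies
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def panel_pi(allele_freqs, per_locus):
    """Multiply per-locus values, assuming independent loci."""
    return math.prod(per_locus(p) for p in allele_freqs)


# Illustrative input only: 50 SNPs, each with allele frequency 0.5.
freqs = [0.5] * 50
combined_unrelated = panel_pi(freqs, pi_unrelated)
combined_sibs = panel_pi(freqs, pi_sibs)
```

At an allele frequency of 0.5 the per-locus values are 0.375 for unrelated individuals and 0.59375 for siblings, so the sibling curve falls more slowly as SNPs are added.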
Fig. 5 (a) AAF comparison of the 50 selected SNPs in different populations. Pearson correlations with the Dutch SNPs: ExAC, r = 0.94; ESP, r = 0.87; 1000 Genomes, r = 0.85. (b) Distribution of the 50 selected SNPs in different populations. Pearson correlations with the Dutch SNPs: Europe, r = 0.99; South Asia, r = 0.87; America, r = 0.86; East Asia, r = 0.72; Africa, r = 0.58
Fig. 6 Distribution of the number of identical genotype calls in 1357 matching (red) and non-matching (random selection, blue) DNA and RNA samples
Number of sample matches in the DCS study using the 50 and 2622 SNP panels
| Matching category (*) | 50 SNP panel | 2622 SNP panel |
|---|---|---|
| Passed_Matching | 530 | 514 |
| Failed_Matching | 5 | 8 |
| UnsureRNAseq | 3 | 16 |
| Total | 538 | 538 |
“Passed_Matching”: contains RNAseq samples where the identified best GWAS hits are identical to the study’s mapping list
“Failed_Matching”: contains RNAseq samples where the identified best GWAS hits are different from the study’s mapping list
“UnsureRNAseq”: contains RNAseq samples for which no best GWAS hits were found based on our threshold (minimal allelic concordance score of 0.8)
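The three categories can be sketched in Python as follows. This is a minimal illustration, assuming the allelic concordance score is the fraction of identical genotype calls among SNPs called in both samples and that genotypes are coded as alternative-allele counts (0, 1, 2, or `None` when missing). The record gives only the 0.8 threshold; the function names, data layout and sample identifiers are hypothetical.

```python
def concordance(rna_calls, dna_calls):
    """Fraction of identical genotype calls over SNPs called in both samples."""
    shared = [snp for snp, call in rna_calls.items()
              if call is not None and dna_calls.get(snp) is not None]
    if not shared:
        return 0.0
    identical = sum(rna_calls[snp] == dna_calls[snp] for snp in shared)
    return identical / len(shared)


def best_match(rna_calls, dna_samples, threshold=0.8):
    """Return (best DNA sample id, score), or (None, score) below threshold."""
    best_id, best_score = None, 0.0
    for dna_id, dna_calls in dna_samples.items():
        score = concordance(rna_calls, dna_calls)
        if score > best_score:
            best_id, best_score = dna_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


def classify(rna_id, rna_calls, dna_samples, mapping, threshold=0.8):
    """Assign one of the three matching categories from the table."""
    best_id, _ = best_match(rna_calls, dna_samples, threshold)
    if best_id is None:
        return "UnsureRNAseq"
    if mapping.get(rna_id) == best_id:
        return "Passed_Matching"
    return "Failed_Matching"


# Hypothetical toy data: SNP and sample identifiers are made up.
rna = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1, "snp5": None}
dna = {
    "D1": {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1, "snp5": 0},
    "D2": {"snp1": 2, "snp2": 1, "snp3": 0, "snp4": 0, "snp5": 1},
}
category = classify("R1", rna, dna, mapping={"R1": "D1"})
```

Here the RNA sample agrees with `D1` at every SNP called in both, so it is classified as `Passed_Matching` when the mapping list pairs it with `D1`.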