Yonghong Du1, Joshua S Martin2, John McGee3, Yuchen Yang2, Eric Yi Liu4, Yingrui Sun5, Matthias Geihs6, Xuejun Kong7, Eric Lingfeng Zhou8, Yun Li2,4,8, Jie Huang9,10.
Abstract
In the current precision medicine era, more and more samples are being genotyped and sequenced. Both researchers and commercial companies expend significant time and resources to reduce the error rate. However, a sample mix-up rate of between 0.1% and 1% has been reported, not to mention the possibly higher mix-up rate during the downstream genetic reporting processes. Even at the low end of this estimate, this translates to a significant number of mislabeled samples, especially given the projected one billion people who will be sequenced within the next decade. Here, we first describe a method to identify a small set of single nucleotide polymorphisms (SNPs) that can uniquely identify a personal genome, which utilizes allele frequencies of five major continental populations reported in the 1000 Genomes Project and the ExAC Consortium. To make this panel more informative, we added four SNPs that are commonly used to predict ABO blood type, and another two SNPs that are capable of predicting sex. We then implement a web interface (http://qrcme.tech), nicknamed QRC (for QR code based Concordance check), which is capable of extracting the relevant ID SNPs from raw genetic data, encoding their genotypes as a quick response (QR) code, and comparing QR codes to report the concordance of the underlying genetic datasets. The resulting 80 fingerprinting SNPs represent a significant decrease in complexity and in the number of markers used for genetic data labelling and tracking. Our method and web tool are easily accessible to both researchers and the general public who consider the accuracy of complex genetic data a prerequisite for precision medicine.
Year: 2017 PMID: 28926565 PMCID: PMC5604942 DOI: 10.1371/journal.pone.0182438
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Cross tabulation of bi-allelic autosomal SNPs across eight arrays.
| 865,720 | 272,701 | 207,468 | 128,503 | 82,373 | 70,227 | 61,240 | 21,941 |
| 272,701 | 800,194 | 359,529 | 289,548 | 103,360 | 91,747 | 103,139 | 65,910 |
| 172,088 | 172,088 | 629,487 | 105,807 | 77,132 | 65,734 | 232,406 | 185,863 |
| 39,292 | 39,292 | 39,292 | 733,348 | 185,489 | 113,481 | 192,333 | 54,913 |
| 15,905 | 15,905 | 15,905 | 15,905 | 693,518 | 303,948 | 253,917 | 18,683 |
| 10,478 | 10,478 | 10,478 | 10,478 | 10,478 | 510,550 | 128,062 | 15,684 |
| 8,385 | 8,385 | 8,385 | 8,385 | 8,385 | 8,385 | 540,551 | 233,277 |
| 3,239 | 3,239 | 3,239 | 3,239 | 3,239 | 3,239 | 3,239 | 238,468 |
The numbers along the diagonal (highlighted in grey) are the SNP counts of each individual array. The numbers above the diagonal are the numbers of SNPs shared by each corresponding pair of arrays. The numbers below the diagonal (italicized and underlined) are the cumulative numbers of overlapping SNPs. For example, reading down from the first diagonal entry: the Axiom PMRA array contains 865,720 SNPs, of which 272,701 are also present on the Axiom UK Biobank array; of those 272,701, 172,088 are also on the Axiom Biobank array; of those 172,088, 39,292 are also on the Illumina GSA array; and so on, until 3,239 SNPs remain that are shared across all eight arrays.
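The layout of this cross tabulation can be reproduced with plain set operations. The following is a minimal sketch, assuming each array is available as a set of SNP identifiers; the function name `overlap_table` and the toy panels are hypothetical and are not taken from the paper.

```python
def overlap_table(panels):
    """Build the cross tabulation for an ordered list of SNP-ID sets.

    Diagonal: size of each panel.
    Above the diagonal: pairwise overlap of panel i and panel j.
    Below the diagonal: cumulative overlap of panels 0..i, repeated
    across the row (as in the table above).
    """
    n = len(panels)
    table = [[0] * n for _ in range(n)]

    for i in range(n):
        table[i][i] = len(panels[i])
        for j in range(i + 1, n):
            table[i][j] = len(panels[i] & panels[j])

    # Running intersection of the first i+1 panels.
    running = set(panels[0])
    for i in range(1, n):
        running &= panels[i]
        for j in range(i):
            table[i][j] = len(running)

    return table


# Toy example with three hypothetical panels (not real array content).
toy = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
for row in overlap_table(toy):
    print(row)
```

The last row of the lower triangle is the count of SNPs common to every panel, which corresponds to the 3,239 SNPs shared across all eight arrays.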
List of fingerprint SNPs.
| Chr | Pos (b37) | rsID | Ref | Alt | RAF | Chr | Pos (b37) | rsID | Ref | Alt | RAF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7,202,190 | rs970973 | T | C | 0.539 | 8 | 1,514,009 | rs2301963 | C | A | 0.477 |
| 1 | 34,071,525 | rs1874045 | C | T | 0.571 | 8 | 30,973,957 | rs1800392 | G | T | 0.446 |
| 1 | 110,998,854 | rs7514102 | G | A | 0.435 | 8 | 121,228,679 | rs4870723 | A | C | 0.512 |
| 1 | 161,479,745 | rs1801274 | A | G | 0.479 | 8 | 143,761,931 | rs2294008 | C | T | 0.306 |
| 1 | 183,542,387 | rs2274064 | T | C | 0.489 | 9 | 4,576,680 | rs301430 | T | C | 0.364 |
| 1 | 203,194,186 | rs2297950 | C | T | 0.303 | 9 | 15,784,631 | rs1539172 | A | G | 0.478 |
| 1 | 225,534,219 | rs7527925 | T | C | 0.476 | 9 | 116,136,198 | rs1043836 | C | T | 0.615 |
| 1 | 248,039,713 | rs3811445 | A | G | 0.608 | 9 | 133,927,878 | rs10901333 | A | G | 0.459 |
| 2 | 26,804,247 | rs935172 | T | C | 0.547 | 10 | 6,001,696 | rs3136618 | C | T | 0.507 |
| 2 | 101,638,888 | rs3739014 | A | G | 0.607 | 10 | 30,316,208 | rs2185724 | T | C | 0.373 |
| 2 | 113,309,473 | rs1545133 | C | T | 0.523 | 10 | 99,498,234 | rs3818876 | G | A | 0.53 |
| 2 | 138,420,996 | rs10206850 | A | G | 0.543 | 10 | 124,610,027 | rs1891110 | G | A | 0.528 |
| 2 | 191,301,368 | rs9646748 | A | G | 0.485 | 10 | 134,748,331 | rs12781609 | C | T | 0.402 |
| 2 | 207,041,053 | rs3732083 | T | C | 0.458 | 11 | 14,246,296 | rs1025412 | G | A | 0.515 |
| 2 | 237,149,941 | rs6756597 | C | T | 0.479 | 11 | 33,065,394 | rs1064005 | C | T | 0.38 |
| 3 | 14,755,572 | rs6765537 | A | G | 0.391 | 11 | 73,785,326 | rs4453265 | T | C | 0.476 |
| 3 | 52,727,257 | rs2289247 | G | A | 0.429 | 12 | 16,397,734 | rs1852450 | C | A | 0.489 |
| 3 | 100,963,154 | rs571391 | G | A | 0.652 | 12 | 58,162,739 | rs703842 | A | G | 0.385 |
| 3 | 122,259,606 | rs9851180 | T | C | 0.538 | 12 | 125,467,158 | rs11558556 | C | T | 0.361 |
| 3 | 193,209,178 | rs6788448 | T | C | 0.427 | 13 | 33,703,656 | rs495680 | T | C | 0.585 |
| 4 | 42,639,186 | rs898500 | A | G | 0.481 | 13 | 50,141,345 | rs4942848 | G | A | 0.616 |
| 4 | 79,443,850 | rs931606 | G | A | 0.519 | 14 | 23,299,135 | rs1135641 | G | T | 0.464 |
| 4 | 187,120,211 | rs13146272 | C | A | 0.585 | 14 | 73,138,189 | rs1060570 | C | A | 0.449 |
| 5 | 1,065,399 | rs737154 | C | T | 0.525 | 14 | 101,350,298 | rs3825569 | T | C | 0.506 |
| 5 | 52,193,287 | rs1531545 | C | T | 0.554 | 16 | 4,751,045 | rs863980 | C | T | 0.533 |
| 5 | 73,339,114 | rs285599 | C | T | 0.394 | 16 | 29,998,200 | rs4077410 | A | G | 0.491 |
| 5 | 96,503,523 | rs160632 | C | T | 0.586 | 16 | 56,995,236 | rs1800775 | C | A | 0.459 |
| 5 | 150,943,085 | rs2304054 | G | A | 0.465 | 17 | 14,005,439 | rs2159132 | G | A | 0.522 |
| 5 | 169,685,163 | rs315717 | C | T | 0.508 | 17 | 33,749,546 | rs2586514 | A | G | 0.602 |
| 6 | 31,610,686 | rs1052486 | A | G | 0.499 | 17 | 57,963,537 | rs1292053 | A | G | 0.446 |
| 6 | 129,807,629 | rs2229848 | C | T | 0.667 | 17 | 71,196,809 | rs1026128 | A | G | 0.523 |
| 6 | 147,680,359 | rs9390459 | A | G | 0.532 | 18 | 60,027,241 | rs1805034 | C | T | 0.537 |
| 6 | 167,360,389 | rs2236313 | T | C | 0.375 | 19 | 4,288,332 | rs888930 | A | G | 0.412 |
| 7 | 33,282,577 | rs7793096 | G | A | 0.502 | 19 | 17,394,124 | rs2363956 | T | G | 0.486 |
| 7 | 99,757,612 | rs3823646 | G | A | 0.537 | 19 | 49,658,367 | rs3745298 | C | T | 0.459 |
| 7 | 141,672,604 | rs10246939 | T | C | 0.476 | 20 | 52,786,219 | rs2296241 | G | A | 0.492 |
| 7 | 156,762,248 | rs12919 | G | A | 0.515 | 22 | 19,951,271 | rs4680 | G | A | 0.462 |
The resulting 74 SNPs, sorted by chromosome and position in the build 37 (b37) reference genome. The reference allele frequency (RAF) is based on 1000GP.
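One way to see why SNPs with RAF close to 0.5 are informative is to estimate the chance that two unrelated people carry the same genotype at every marker. The sketch below is an illustration under stated assumptions (Hardy-Weinberg equilibrium, independent SNPs, a single homogeneous population); it is not the selection procedure described by the authors, and the function names are hypothetical.

```python
def match_probability(raf):
    """Probability that two unrelated individuals share a genotype at one SNP.

    Under Hardy-Weinberg equilibrium the genotype frequencies are
    p^2, 2pq and q^2; two random individuals match with probability
    equal to the sum of the squared genotype frequencies.
    """
    p, q = raf, 1.0 - raf
    return p**4 + (2 * p * q) ** 2 + q**4


def panel_match_probability(rafs):
    """Multiply per-SNP match probabilities, assuming independent SNPs."""
    prob = 1.0
    for raf in rafs:
        prob *= match_probability(raf)
    return prob


# First three RAF values from the table above, for illustration only.
print(panel_match_probability([0.539, 0.571, 0.435]))
```

Under these assumptions the per-SNP match probability is lowest (0.375) at a RAF of 0.5, so each additional marker near that frequency shrinks the chance of a coincidental full match by almost a factor of three.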
Fig 1. Reference allele frequency of the selected 80 SNPs.
Reference allele frequency across the five major population groups (African: AFR, European: EUR, Native American: AMR, Eastern Asian: EAS and Southern Asian: SAS) and overall as reported by 1000GP and ExAC. Y-axis is the RAF in ExAC.
Fig 2. The QRC website interface.
A. The interface allows a user to first upload genetic data, generate a QR code and save it to a local computer, and then compare any two QR codes to check their concordance. Researchers can also generate their own ID SNPs. B. A sample report, based on genotype datasets for two different individuals. The report includes the number of missing SNPs, the overlap of non-missing SNPs, and the type of matches.
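A concordance report of the kind shown in panel B can be sketched as follows. This is a minimal illustration, not the QRC implementation: the input format (a dictionary from rsID to a two-letter genotype), the function names, and the three match categories are all assumptions made for the example, and the QR encoding step is omitted.

```python
def normalise(genotype):
    """Return the genotype as a sorted allele pair, or None if it is missing."""
    if genotype is None or len(genotype) != 2:
        return None
    if any(allele not in "ACGT" for allele in genotype):
        return None
    return "".join(sorted(genotype))


def concordance_report(sample_a, sample_b, panel):
    """Compare two samples over the ID SNPs listed in `panel`.

    The match categories are hypothetical: identical genotype,
    exactly one shared allele type, or no shared allele.
    """
    report = {
        "missing_a": 0,
        "missing_b": 0,
        "overlap": 0,
        "identical": 0,
        "one_allele_shared": 0,
        "no_allele_shared": 0,
    }
    for rsid in panel:
        a = normalise(sample_a.get(rsid))
        b = normalise(sample_b.get(rsid))
        if a is None:
            report["missing_a"] += 1
        if b is None:
            report["missing_b"] += 1
        if a is None or b is None:
            continue
        report["overlap"] += 1
        if a == b:
            report["identical"] += 1
        elif set(a) & set(b):
            report["one_allele_shared"] += 1
        else:
            report["no_allele_shared"] += 1
    return report


# Made-up genotypes at four SNPs from the fingerprint list.
panel = ["rs970973", "rs1874045", "rs7514102", "rs4680"]
first = {"rs970973": "TC", "rs1874045": "CC", "rs7514102": "GA", "rs4680": "GG"}
second = {"rs970973": "CT", "rs1874045": "CT", "rs7514102": "--"}
print(concordance_report(first, second, panel))
```

Sorting the two alleles before comparison means that "TC" and "CT" are treated as the same genotype, which matters when the two files come from different platforms.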