Lars G Fritsche1,2, Lauren J Beesley1, Peter VandeHaar1,2, Robert B Peng1, Maxwell Salvatore1, Matthew Zawistowski1,2, Sarah A Gagliano Taliun1,2, Sayantan Das1,2, Jonathon LeFaive1,2, Erin O Kaleba3, Thomas T Klumpner3,4, Stephanie E Moser3, Victoria M Blanc5, Chad M Brummett3,4, Sachin Kheterpal3,4, Gonçalo R Abecasis1,2, Stephen B Gruber6, Bhramar Mukherjee1,2,7,8,9.
Abstract
Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
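As a rough illustration of what "condensing information from a large number of genetic variants" can look like, the sketch below computes a PRS as a weighted sum of risk-allele dosages and selects variants by a p-value threshold. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the function names, variant IDs, effect sizes, and p-values are all hypothetical, and no linkage-disequilibrium clumping is performed.

```python
from typing import Dict, Tuple


def threshold_weights(
    summary_stats: Dict[str, Tuple[float, float]], p_threshold: float
) -> Dict[str, float]:
    """Keep variants whose p-value is at or below p_threshold.

    summary_stats maps a variant ID to (effect size, p-value).
    Assumption of this sketch: no LD clumping or pruning is applied.
    """
    return {v: beta for v, (beta, p) in summary_stats.items() if p <= p_threshold}


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0 to 2) over the weighted variants.

    Variants absent from `dosages` contribute nothing here; a real
    pipeline would usually impute or drop them explicitly.
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


# Hypothetical summary statistics and genotype dosages, for illustration only.
summary_stats = {
    "rs_a": (0.20, 1e-10),
    "rs_b": (-0.10, 3e-8),
    "rs_c": (0.05, 2e-4),
}
person = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 1.0}

strict = threshold_weights(summary_stats, 5e-9)   # keeps rs_a only
lenient = threshold_weights(summary_stats, 5e-8)  # keeps rs_a and rs_b
score_strict = polygenic_risk_score(person, strict)
score_lenient = polygenic_risk_score(person, lenient)
```

Relaxing the threshold adds variants to the score, which is the sense in which a PRS can be evaluated at different depths.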
Year: 2019 PMID: 31194742 PMCID: PMC6592565 DOI: 10.1371/journal.pgen.1008202
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Demographics and clinical characteristics of the analytic datasets.
| Characteristic | MGI | UK Biobank |
|---|---|---|
| n | 30,702 | 408,961 |
| Females, n (%) | 16,297 (53.1%) | 221,052 (54.1%) |
| Mean age, years (S.D.) | 54.2 (15.9) | 57.7 (8.1) |
| Median number of visits per participant | 27 | not available |
| Median days between first and last visit | 1,469 | not available |
| Total number of ICD9 code days | 3,459,331 | 49,085 |
| Number of unique ICD9 codes | 10,323 | 3,126 |
| Median ICD9 code days per participant | 58 | 2 |
| Total number of ICD10 code days | 1,311,264 | 2,764,868 |
| Number of unique ICD10 codes | 14,997 | 11,059 |
| Median ICD10 code days per participant | 27 | 6 |
| Total number of PheWAS code days | 6,367,117 | 3,679,624 |
| Number of unique PheWAS codes | 1,856 | 1,680 |
| Median PheWAS code days per participant | 94 | 8 |
| n samples without skin cancer diagnosis | 26,199 | 395,179 |
| n cases with skin cancer | 4,503 | 13,782 (13,624 unrelated) |
| n cases with melanomas of skin | 1,772 | 2,724 (2,718 unrelated) |
| n cases with epithelial skin cancer and others | 3,220 | 11,152 (11,030 unrelated) |
| n cases with basal cell carcinoma | 1,303 | not available |
| n cases with squamous cell carcinoma | 836 | not available |
a The provided characteristics are based on a subset of White British subjects of the UK Biobank Study for which phenotype data and imputed data were available. To retain as many unrelated cases as possible for each trait, a maximal set of unrelated cases was identified before choosing controls from the pool of subjects unrelated to these cases or to each other.
b Original PheWAS code “172.2” description "Other non-epithelial cancer of skin".
c Unrelated cases
ICD9 and ICD10: International Statistical Classification of Diseases codes (9th and 10th revision); MGI data are based on the code systems with clinical modifications, ICD9-CM and ICD10-CM. S.D.: standard deviation.
Associations of constructed PRS with skin cancer traits in MGI.
| PRS | | Skin cancer | Melanoma | Basal cell carcinoma | Squamous cell carcinoma |
|---|---|---|---|---|---|
| Melanoma | PRS OR | 1.3 (1.26,1.34) | 1.3 (1.23,1.38) | 1.23 (1.14,1.32) | |
| Basal cell carcinoma | PRS OR | 1.32 (1.27,1.36) | 1.31 (1.25,1.37) | 1.32 (1.23,1.42) | |
| Squamous cell carcinoma | PRS OR | 1.25 (1.21,1.3) | 1.32 (1.26,1.39) | 1.35 (1.28,1.43) | 1.26 (1.17,1.35) |
| Melanoma | PRS OR | 1.31 (1.27,1.36) | 1.49 (1.42,1.57) | 1.39 (1.3,1.47) | 1.25 (1.16,1.34) |
| Basal cell carcinoma | PRS OR | 1.32 (1.28,1.37) | 1.33 (1.27,1.4) | 1.62 (1.53,1.71) | 1.34 (1.25,1.44) |
| Squamous cell carcinoma | PRS OR | 1.34 (1.3,1.38) | 1.42 (1.36,1.49) | 1.47 (1.4,1.56) | |
a Association of each cancer with continuous PRS that were transformed to a standard normal distribution. Point estimates, 95% confidence intervals, and P-values are obtained by fitting Firth’s bias-corrected logistic regression adjusted for age, sex, batch, and PC1-4 to the full data.
b Area under the curve of the receiver operating characteristic (ROC) curve with 95% confidence intervals calculated using the test data after fitting a model with the training data.
c Hosmer-Lemeshow Goodness-of-Fit test for the test data after fitting a model with the training data
d Number of cases in training / test set
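The area under the ROC curve mentioned in footnote b can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it that way from test-set scores. It is an illustrative, assumption-laden example: the function name is hypothetical, the scores are made up, and it does not reproduce the paper's model fitting or confidence intervals.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly.

    scores: predicted risk for each test-set individual.
    labels: 1 for cases, 0 for controls.
    Tied pairs count as half a correctly ranked pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up test-set scores from a model fitted on separate training data.
test_scores = [0.1, 0.4, 0.35, 0.8]
test_labels = [0, 0, 1, 1]
auc = roc_auc(test_scores, test_labels)
```

The pairwise loop is quadratic in sample size, which is fine for a demonstration; a rank-based formulation would be preferable for cohorts of the size described here.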
Fig 1. PRS-PheWAS in MGI and UKB phenomes.
The horizontal line indicates phenome-wide significance. Phenome-wide significant traits are indicated by PheCodes with their description listed below. Directional triangles indicate whether a phenome-wide significant trait was positively (pointing up) or negatively (pointing down) associated with the PRS.
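The caption does not say how the phenome-wide significance line is defined. One common convention, assumed here purely for illustration, is a Bonferroni correction over the number of phenotypes tested. The sketch below applies that assumed threshold and assigns the up or down marker from the sign of the effect estimate; the function name, phenotype labels, effect sizes, and p-values are hypothetical.

```python
from typing import List, Tuple


def phewas_markers(
    results: List[Tuple[str, float, float]], n_tests: int, alpha: float = 0.05
) -> List[Tuple[str, str]]:
    """Return (phenotype, direction) for associations passing the threshold.

    results: (phenotype label, effect estimate, p-value) per tested trait.
    Assumption: phenome-wide significance means p <= alpha / n_tests.
    """
    threshold = alpha / n_tests
    markers = []
    for phenotype, beta, p_value in results:
        if p_value <= threshold:
            markers.append((phenotype, "up" if beta > 0 else "down"))
    return markers


# Hypothetical PheWAS output; 1,856 is the number of unique PheWAS codes
# listed for MGI, used only as an example count of tests.
example_results = [
    ("trait_A", 0.40, 1e-30),
    ("trait_B", -0.10, 1e-6),
    ("trait_C", 0.05, 0.01),
]
markers = phewas_markers(example_results, n_tests=1856)
```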
Fig 2. Overlap between the three skin cancer trait loci.
Reported risk SNPs within 1 Mb were merged into the same locus. Loci that were also reported to be associated with skin tanning ability are highlighted in bold. Loci were named according to the closest RefSeq genes (except MC1R, a 385 kb locus with 16 RefSeq genes, and HV745896, named after a nearby, uncurated mRNA sequence).
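The merging rule in this caption can be sketched as a single pass over position-sorted SNPs. The version below is one plausible reading, not necessarily the authors' procedure: it assumes "within 1 Mb" is measured from the previous SNP in the locus (so loci can chain beyond 1 Mb in total) and that the boundary is inclusive. The function name, SNP IDs, and positions are hypothetical.

```python
from typing import List, Optional, Tuple


def merge_into_loci(
    snps: List[Tuple[str, str, int]], window: int = 1_000_000
) -> List[List[str]]:
    """Group SNPs into loci by genomic distance.

    snps: (snp_id, chromosome, base-pair position).
    A SNP joins the current locus when it is on the same chromosome and
    at most `window` bp from the previous SNP; otherwise it starts a new locus.
    """
    loci: List[List[str]] = []
    last_chrom: Optional[str] = None
    last_pos = 0
    for snp_id, chrom, pos in sorted(snps, key=lambda s: (s[1], s[2])):
        if chrom == last_chrom and pos - last_pos <= window:
            loci[-1].append(snp_id)
        else:
            loci.append([snp_id])
        last_chrom, last_pos = chrom, pos
    return loci


# Hypothetical SNPs, for illustration only.
example_snps = [
    ("s4", "1", 5_000_000),
    ("s1", "1", 100_000),
    ("s3", "1", 1_800_000),
    ("s2", "1", 900_000),
    ("s5", "2", 150_000),
]
loci = merge_into_loci(example_snps)
```

Here s1, s2, and s3 chain into one locus even though s1 and s3 are 1.7 Mb apart, which shows why the choice between chaining and a fixed window around a lead SNP matters.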
Phenome-wide significant phenotypes in MGI identified using various PRS construction strategies.
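| | | Melanoma | Melanoma | Melanoma | Melanoma | Melanoma | Melanoma | Melanoma | Melanoma | Melanoma | Basal cell carcinoma | Basal cell carcinoma | Squamous cell carcinoma | Squamous cell carcinoma |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | | UK Biobank summary statistics | UK Biobank summary statistics | UK Biobank summary statistics | UK Biobank summary statistics | UK Biobank summary statistics | UK Biobank summary statistics | LDpred | mPRS | mPRS | bPRS | bPRS | sPRS | sPRS |
| Data Source | | MGI | MGI | MGI | MGI | MGI | MGI | MGI | MGI | UKB | MGI | UKB | MGI | UKB |
| Phenotype | PheCode | 5×10⁻⁹ | 5×10⁻⁸ | 5×10⁻⁷ | 5×10⁻⁶ | 5×10⁻⁵ | 5×10⁻⁴ | 0.001% | | | | | | |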
| Melanomas of skin | 172.11 | * | * | * | * | * | * | * | * | * | * | * | * | * |
| Melanomas of skin, dx or hx | 172.1 | * | * | * | * | * | * | * | * | * | * | * | * | |
| Skin cancer | 172 | * | * | * | * | * | * | * | * | * | * | * | * | |
| Other non-epithelial cancer of skin | 172.2 | * | * | * | * | * | * | * | * | * | * | * | * | |
| Carcinoma in situ of skin | 172.3 | * | * | * | * | * | * | * | * | * | * | * | * | |
| Basal cell carcinoma | 172.21 | * | * | * | * | * | * | * | n/a | * | n/a | * | n/a | |
| Actinic keratosis | 702.1 | * | * | * | * | * | * | * | * | * | * | * | ||
| Benign neoplasm of skin | 216 | * | * | * | * | * | * | * | * | |||||
| Squamous cell carcinoma | 172.22 | * | * | * | * | * | * | n/a | * | n/a | * | n/a | ||
| Secondary malignant neoplasm of skin | 198.7 | * | * | * | * | * | * | * | ||||||
| Disorder of skin and subcutaneous tissue NOS | 689 | * | * | * | * | * | * | * | * | * | * | |||
| Benign neoplasm of lymph nodes | 229.1 | * | * | * | * | * | ||||||||
| Neoplasm of uncertain behavior of skin | 173 | * | * | * | * | * | * | * | ||||||
| Degenerative skin conditions & other dermatoses | 702 | * | * | * | * | * | * | * | * | * | ||||
| Secondary malignancy of lymph nodes | 198.1 | * | * | * | * | |||||||||
| Dermatitis due to solar radiation | 938 | * | * | * | * | * | * | * | ||||||
| Chronic dermatitis due to solar radiation | 938.2 | * | * | * | * | * | * | * | ||||||
| Benign neoplasm of unspecified sites | 229 | * | * | * | * | |||||||||
| Secondary malignant neoplasm | 198 | * | * | * | * | |||||||||
| Scar conditions and fibrosis of skin | 701.2 | * | * | * | * | * | ||||||||
| Seborrheic keratosis | 702.2 | * | * | * | * | * | ||||||||
| Sebaceous cyst | 706.2 | * | * | * | ||||||||||
| Malignant neoplasm, other | 195.1 | * | * | * | ||||||||||
| Cancer, suspected or other | 195 | * | * | * | ||||||||||
| Other hypertrophic and atrophic conditions of skin | 701 | * | * | * | ||||||||||
| Diseases of sebaceous glands | 706 | * | * | |||||||||||
| Diseases of hair and hair follicles | 704 | * | * | |||||||||||
| Diaphragmatic hernia | 550.2 | * | * | |||||||||||
| # SNPs | | 6 | 9 | 13 | 27 | 156 | 1,193 | 6.4×10⁶ | 29 | 29 | 32 | 32 | 18 | 18 |
* Indicates phenotype that reached phenome-wide significance in the corresponding PRS-PheWAS
a Including only diseases identified in at least two PRS-PheWAS
b Evaluated at different depths (p-value thresholds indicated below)
c Modelled proportion of causal variants
Notes: mPRS, bPRS, and sPRS: chosen PRS for melanoma, basal cell carcinoma, and squamous cell carcinoma. mPRS and bPRS are based on GWAS catalog entries while sPRS is based on the single, latest GWAS results [30]; n/a: not available in ICD-based phenome of the UK Biobank.
Fig 3. Example view from PRSweb (see web resources).
A selection menu at the top allows selection of PRS constructs and phenome, while interactive plots with “PheWAS results” and “Exclusion PheWAS results” are generated after selection. “Associations between PRS and Selected Phenotype” plots are generated after clicking on a triangle in the PheWAS plots. Detailed summary statistics for each trait association are provided in mouseover elements (shown in grey). Underlying weights of a selected PRS can be downloaded via buttons below the plots (blue).