Matthew T Patrick, Redina Bardhi, Wei Zhou, James T Elder, Johann E Gudjonsson, Lam C Tsoi.
Abstract
BACKGROUND: Rare diseases collectively affect up to 10% of the population, but often lack effective treatment, and typically little is known about their pathophysiology. Major challenges include suboptimal phenotype mapping and limited statistical power. Population biobanks, such as the UK Biobank, recruit many individuals who can be affected by rare diseases; however, investigation into their utility for rare disease research remains limited. We hypothesized that the UK Biobank can be used as a unique population assay for rare diseases in the general population.
Keywords: Demographics; Genetic associations; Phenotyping; Rare disease; UK Biobank
Year: 2022 PMID: 35945607 PMCID: PMC9364550 DOI: 10.1186/s13073-022-01094-y
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 15.266
Sample sizes obtained for some of the rare disorders. For each of the rare diseases we identified in the UK Biobank, we provide the set of ICD-10 codes that map directly to the codes from Orphanet, such that the ORPHA code is no more specific than the ICD-10 codes. We then provide the number of individuals with that rare disease in the UK Biobank (UKB count), along with the group the disease belongs to, based on its ICD-10 chapter.
| Disease name | ORPHA code | ICD-10 code | UKB count | Group |
|---|---|---|---|---|
| Addison’s disease | 85138 | E27.1 | 242 | Endocrine/metabolic |
| Waldenström macroglobulinemia | 33226 | C88.0 | 109 | Neoplasms |
| Marfan syndrome | 558 | Q87.4 | 93 | Congenital |
| Beta-thalassemia | 848 | D56.1 | 71 | Blood |
| Autosomal dominant tubulointerstitial kidney | 34149 | Q61.5 | 41 | Congenital |
| Congenital ptosis | 91411 | Q10.0 | 31 | Congenital |
| Tetralogy of Fallot | 3303 | Q21.3 | 19 | Congenital |
| Congenital renal artery stenosis | 97598 | Q27.1 | 7 | Congenital |
| Autosomal dominant epidermolytic ichthyosis | 312 | Q80.3 | <5 | Congenital |
| Fragile X syndrome | 908 | Q99.2 | <5 | Congenital |
| Reye syndrome | 3096 | G93.7 | <5 | Neurological |
Fig. 1 Rare diseases in the UK Biobank. (a) Density plot of the 420 rare diseases we identified in the UK Biobank by mapping ICD-10 codes to ORPHA codes. The x-axis shows the log10 number of individuals recorded as having each disease, while the y-axis shows the density of diseases with that number of individuals. The red dashed line indicates the prevalence criterion for rare diseases in Europe (fewer than 1 in 2000). (b) Scatter plot comparing the prevalence of rare diseases in the UK Biobank and the Optum dataset. Each point represents a rare disease, and the dotted red line represents the linear regression between the prevalence in Optum and the UK Biobank. (c) The groups of rare diseases identified in the UK Biobank are shown as a bar plot, with the y-axis indicating the number of diseases in each group. Overlaid is a second bar plot (in red), with the y-axis indicating the number of individuals who have at least one disease in each group. (d) Hexagon/scatter plot showing the mean age at recruitment and proportion of males for each disease. The x-axis shows the percentage of males recorded as having each disease and the y-axis shows the mean age at recruitment. The hexagons show the density of diseases at each mean age and sex proportion, while the asterisks indicate the actual values for a particular disease. The red dashed line shows the overall mean age and sex proportion in the UK Biobank. (e) Box/scatter plot of comorbidities for rare diseases. The y-axis shows, for each rare disease, the percentage of individuals who have at least one comorbidity in each group (excluding the rare disease itself).
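The 1-in-2000 criterion marked in panel a becomes a case-count cutoff once a cohort size is fixed. The following is a minimal sketch, assuming a cohort of roughly 500,000 participants (an approximation, not a figure stated in this record); under that assumption the cutoff would be 250 cases. The function names are illustrative, and the two counts are taken from the sample-size table above.

```python
import math

RARE_DENOMINATOR = 2000  # European criterion: fewer than 1 in 2000


def is_rare(case_count, cohort_size):
    """Return True if the observed prevalence is below 1 in 2000."""
    # Integer arithmetic avoids floating-point edge cases at the boundary
    return case_count * RARE_DENOMINATOR < cohort_size


def log10_case_count(case_count):
    """Value plotted on a log10 case-count axis."""
    return math.log10(case_count)


# Assumption: approximate cohort size, used here only for illustration
ASSUMED_COHORT_SIZE = 500_000

for name, count in [("Addison's disease", 242), ("Marfan syndrome", 93)]:
    print(name, is_rare(count, ASSUMED_COHORT_SIZE),
          round(log10_case_count(count), 2))
```

Passing the cohort size as a parameter keeps the check reusable for other datasets, such as a claims database with a different denominator.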
Fig. 2 Sex of individuals with different groups of rare disease. Each bar plot presents the number of male and female individuals in the UK Biobank who have at least one rare disease from a particular group. Non-overlapping groups of rare diseases were identified from their corresponding ICD-10 chapters. For each group, we conducted a Fisher enrichment test, comparing the number of males and females in the group with the number of males and females in the UK Biobank overall; p-values and odds ratios are provided under each bar plot, and the subtitles of groups with significant sex bias are indicated in bold font.
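The Fisher enrichment test described in this caption can be illustrated with a small standard-library sketch. It is not the authors' code: the function names are hypothetical, the two-sided p-value uses the common "sum all tables no more likely than the observed one" convention, and the odds ratio is the simple sample odds ratio. The caption does not say whether the comparison counts include the group members themselves, so the sketch takes the four cell counts as given.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    For a sex-bias test: a, b = males, females in the disease group;
    c, d = males, females in the comparison set.
    Returns (sample_odds_ratio, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return exp(_log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Toy counts for illustration only, not taken from the study
odds_ratio, p_value = fisher_exact_2x2(8, 2, 1, 5)
print(odds_ratio, p_value)
```

Working in log space with `lgamma` keeps the computation feasible for biobank-scale margins, where the binomial coefficients themselves would be astronomically large.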
Significant variant-level associations. We applied SAIGE’s GLMM test with Bonferroni adjustment to identify significant variant associations for each rare disease. Genomic positions are provided in the hg38 build, with HGVS nomenclature in parentheses. The ACMG/AMP classification for each association was determined using Varsome and InterVar. Associations are specified as reported if they have previously been indicated as pathogenic or likely pathogenic in ClinVar for that disease. Abbreviations are as follows: MAC, minor allele count; MAF, minor allele frequency. Although additional significant variant-level associations were identified for systemic lupus erythematosus and von Willebrand disease, these diseases were indicated as not being rare in the USA by NIH’s GARD. Furthermore, significant variant-level associations with interatrial communication, benign epithelial tumor salivary glands, and endophthalmitis were excluded because these diseases were not listed on NIH’s GARD, so it is difficult to confirm their rareness in the USA.
| Disease | ORPHA | Marker | ACMG/AMP classification | Reported? | P-value | Case MAC | Percent affected | Control MAF |
|---|---|---|---|---|---|---|---|---|
| Polycythemia vera | 729 | 9:5073770_G/T (NC_000009.12:g.5073770G>T) | Pathogenic (PS3/PS4) | Yes | 1.32 × 10^−114 | 51/370 | 47% | 1.71 × 10^−04 |
| Chronic myeloproliferative disease | 86830 | 9:5073770_G/T (NC_000009.12:g.5073770G>T) | Pathogenic (PS3/PS4) | Yes | 2.40 × 10^−67 | 30/154 | 28% | 2.33 × 10^−04 |
| Essential thrombocythemia | 3318 | 9:5073770_G/T (NC_000009.12:g.5073770G>T) | Pathogenic (PS3/PS4) | Yes | 2.91 × 10^−42 | 21/218 | 19% | 2.60 × 10^−04 |
| Primary myelofibrosis | 824 | 9:5073770_G/T (NC_000009.12:g.5073770G>T) | Pathogenic (PS3/PS4) | Yes | 5.30 × 10^−40 | 16/52 | 15% | 2.75 × 10^−04 |
| Immune thrombocytopenic purpura | 3002 | 9:5073770_G/T (NC_000009.12:g.5073770G>T) | Pathogenic (PS3/PS4) | No | 2.63 × 10^−18 | 11/368 | 10% | 2.90 × 10^−04 |
| Chronic myelomonocytic leukemia | 98823 | 17:76736877_G/A (NC_000017.11:g.76736877G>A) | Likely pathogenic (PS4/PM1) | No | 1.19 × 10^−13 | 4/32 | 27% | 3.29 × 10^−05 |
| Essential thrombocythemia | 3318 | 19:12943813_A/ATTGTC (NC_000019.10:g.12943813_12943814insTTGTC) | Pathogenic (PVS1/PS4) | No | 2.82 × 10^−13 | 5/218 | 50% | 1.50 × 10^−05 |
| Beta-thalassemia | 848 | 11:5226774_G/A (NC_000011.10:g.5226774G>A) | Pathogenic (PVS1/PS4) | Yes | 3.46 × 10^−12 | 3/12 | 33% | 1.79 × 10^−05 |
| Congenital factor XI deficiency | 329 | 4:186288589_T/G (NC_000004.12:g.186288589T>G) | Pathogenic (PS4/PM1/PM2/PP2/PP3) | No | 3.41 × 10^−11 | 3/18 | 12% | 6.58 × 10^−05 |
| B-cell chronic lymphocytic leukemia | 67038 | 3:38141150_T/C (NC_000003.12:g.38141150T>C) | Pathogenic (PS4/PM2/PM4/PP3/PP5) | Yes | 2.42 × 10^−10 | 4/490 | 57% | 8.98 × 10^−06 |
| Acute panmyelosis with myelofibrosis | 86843 | 9:5073770_G/T (NC_000009.12:g.5073770G>T) | Pathogenic (PS3/PS4) | No | 7.81 × 10^−10 | 3/8 | 3% | 3.14 × 10^−04 |
| Immune thrombocytopenic purpura | 3002 | 16:83907050_G/A (NC_000016.10:g.83907050G>A) | Likely pathogenic (PS4/PM2) | No | 7.60 × 10^−08 | 4/368 | 8% | 1.35 × 10^−04 |
| Osteochondritis dissecans | 2764 | 17:10505866_C/T (NC_000017.11:g.10505866C>T) | Likely pathogenic (PS4/PM1) | No | 1.01 × 10^−07 | 3/56 | 3% | 2.84 × 10^−04 |
| AA amyloidosis | 85445 | 2:151727817_T/TGCTGGCTGTGCCAGA (NC_000002.12:g.151727823_151727837dup) | Likely pathogenic (PS4/PM4) | No | 1.97 × 10^−07 | 3/24 | 1% | 7.56 × 10^−04 |
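The Marker column pairs a `chrom:pos_ref/alt` identifier with its HGVS genomic description. As an illustration of how the two relate, here is a minimal sketch that converts substitutions and simple insertions. The function name and regular expression are hypothetical, and the accession map contains only the hg38 RefSeq accessions that appear in the table. It has no access to the reference sequence, so it cannot recognize insertions that HGVS describes as duplications (such as the AA amyloidosis marker) and would emit an `ins` form for them instead.

```python
import re

# hg38 RefSeq accessions, limited to the chromosomes present in the table
HG38_ACCESSIONS = {
    "2": "NC_000002.12", "3": "NC_000003.12", "4": "NC_000004.12",
    "9": "NC_000009.12", "11": "NC_000011.10", "16": "NC_000016.10",
    "17": "NC_000017.11", "19": "NC_000019.10",
}

MARKER_RE = re.compile(r"^(\w+):(\d+)_([ACGT]+)/([ACGT]+)$")


def marker_to_hgvs(marker):
    """Convert 'chrom:pos_ref/alt' to an HGVS genomic description.

    Handles single-base substitutions and insertions anchored on one
    reference base; anything else raises NotImplementedError.
    """
    match = MARKER_RE.match(marker)
    if match is None:
        raise ValueError(f"unrecognized marker: {marker!r}")
    chrom, pos, ref, alt = match.groups()
    pos = int(pos)
    accession = HG38_ACCESSIONS[chrom]  # KeyError for unlisted chromosomes

    if len(ref) == 1 and len(alt) == 1:
        # Substitution, e.g. g.5073770G>T
        return f"{accession}:g.{pos}{ref}>{alt}"
    if len(ref) == 1 and alt.startswith(ref):
        # Insertion between the anchor base and the next base.
        # Caveat: duplications are not detected without the reference.
        return f"{accession}:g.{pos}_{pos + 1}ins{alt[1:]}"
    raise NotImplementedError("deletions and complex variants not handled")


print(marker_to_hgvs("9:5073770_G/T"))
print(marker_to_hgvs("19:12943813_A/ATTGTC"))
```

For production use, a dedicated variant-normalization tool with access to the reference genome would be the safer choice, since correct HGVS output for indels depends on sequence context.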