Minja Pehrsson1, Hanna Heikkinen1, Ulla Wartiovaara-Kautto2, Sampo Mäntylahti1, Pia Bäckström1, Mariann I Lassenius3, Kristiina Uusi-Rauva3, Olli Carpén1,4, Kaisa Elomaa5.
Abstract
Background: Autosomal recessive Gaucher disease (GD) is likely underdiagnosed in many countries. Because the number of diagnosed GD patients in Finland is relatively low and the true prevalence is not known, it was hypothesized that undiagnosed GD patients may exist in Finland. Our previous study demonstrated the applicability of the Gaucher Earlier Diagnosis Consensus point-scoring system (GED-C PSS; Mehta et al., 2019) and of Finnish biobank data and specimens in the automated point scoring of large populations. An indicative point-score range for Finnish GD patients was determined, but undiagnosed patients were not identified, partly because of the high number of high-score subjects combined with a lack of suitable samples for diagnostics in the assessed biobank population. The current study extended the screening to another biobank and evaluated the feasibility of using the automated GED-C PSS together with single nucleotide polymorphism (SNP) chip genotype data from the FinnGen study of biobank sample donors to identify undiagnosed GD patients in Finland. Furthermore, the applicability of formalin-fixed, paraffin-embedded (FFPE) tissues and DNA restoration in next-generation sequencing (NGS) of the GBA gene was tested.
Keywords: BB, Biobank Borealis of Northern Finland; Biobank study; DF4/DF5, Data freeze 4/5; EHR, Electronic health record; Electronic health record data; FFPE, Formalin-fixed, paraffin embedded; GBA; GBA1/GBA, β-glucocerebrosidase gene; GD, Gaucher disease; GED-C, Gaucher Earlier Diagnosis Consensus; Gaucher disease; Gaucher earlier diagnosis consensus point-scoring system; GlcCer, β-glucosylceramide; GlcCerase, β-glucosylceramidase; GlcSph/Lyso-Gb1, β-glucosylsphingosine; HBB, Helsinki Biobank; HUH, Helsinki University Hospital; HUS, Hospital District of Helsinki and Uusimaa; ICD-10, International Statistical Classification of Diseases and Related Health Problems 10th Revision; NGS, Next-generation sequencing; OUH, Oulu University Hospital; PSS, Point-scoring system; SNP, single nucleotide polymorphism; Single nucleotide polymorphism chip genotype data; VUS, variant of uncertain significance
Year: 2022 PMID: 36092251 PMCID: PMC9449642 DOI: 10.1016/j.ymgmr.2022.100911
Source DB: PubMed Journal: Mol Genet Metab Rep ISSN: 2214-4269
Fig. 1. A summary of the formation of the final cohorts 1–4. Subjects of cohorts 1 and 2 were patients previously diagnosed with or examined for features of Gaucher disease (GD) in Helsinki University Hospital (HUH) who had samples or sample-derived data available in Helsinki Biobank (HBB). In addition, patients with known GBA variant status [15] who were diagnosed with GD in Oulu University Hospital (OUH) and had formalin-fixed, paraffin-embedded (FFPE) tissue samples available in Biobank Borealis (BB) were included to increase the number of samples in cohort 3. Samples of the cohort 1 subjects have been analysed in the FinnGen study and are thus included in the final cohort 4 with genotype data as well as electronic health record (EHR) data available in the Hospital District of Helsinki and Uusimaa (HUS) (indicated by an asterisk).
Electronic health record data sources and local adjustments to or exclusions of the original (Mehta et al., 2019; [16]) and additional (Mehta et al., 2020; [17]) signs and covariables utilised in the GED-C point scoring in the current study.
| Original or modified GED-C sign/covariable (adjustments to Mehta et al., 2019 [16]) | Weighted scores | Data source (structural data/text mining/both) | Laboratory and diagnosis code, and Finnish terms used in text mining | GD patients | Automated point scoring |
|---|---|---|---|---|---|
| Splenomegaly, any extent (note the difference to the original scoring protocol) | 3 | Text | [‘suurentunut perna’, ‘pernan suurentuma’, ‘hypersplenismi’, ‘suuri perna’, ‘splenomegalia’, ‘kookas perna’, ‘laajentunut perna’, ‘suurentuneen pernan’] | X | X |
| Disturbed oculomotor function (slow horizontal saccades with unimpaired vision) | 3 | Text | [‘silmän liikkeiden häiriö’, ‘silmävärve’, ‘nystagmus’, ‘silmän liikehäiriö’, ‘nykäisyliike’] | X | X |
| Thrombocytopenia, mild or moderate (platelet count, 50–150 × 10⁹/L) | 2 | Mainly structural (additional data text mined) | B -Trom 50–150 × 10⁹/L ([‘trombosytopenia’, ‘trombopenia’]) | X | X |
| Bone issues, including pain, crises, avascular necrosis, and fractures | 2 | Text | [‘luukipu’, ‘luukriisi’, ‘luusto%kuolio’, ‘luukipu’] | X | X |
| Family history of Gaucher disease | 2 | Text | Manual chart review | X | – |
| Anaemia, mild or moderate (haemoglobin, 95–140 g/L) | 2 | Structural | B -Hb 95–140 g/L | X | X |
| Hyperferritinaemia, mild or moderate (serum ferritin, 300–1,000 μg/L) | 2 | Mainly structural (plasma ferritin; additional data text mined) | P -Ferrit 300–1,000 μg/L ([‘rautalasti’, ‘rautakuorma’, ‘hyperferrit%nemia’]) | X | X |
| Jewish ancestry | 2 | NA | – | – | – |
| Disturbed motor function (impairment of primary motor development) | 2 | Text | [‘parkinsonismi’, ‘vapina’, ‘jäykkyys’, ‘spasmit’] | X | X |
| Hepatomegaly, any extent (note the difference to the original scoring protocol) | 2 | Text | [‘suurentunut maksa’, ‘maksan laajentuma’, ‘hepatosplenomegalia’, ‘suuri maksa’, ‘maksa “melko kookas”’] | X | X |
| Myoclonus epilepsy | 2 | Text | [‘myoklonaalinen%epilepsia’, ‘myokloninen%epilepsia’, ‘epilepsia%myoklonaalinen’, ‘epilepsia%myokloninen’] | X | X |
| Kyphosis | 2 | Text | [‘kyfoosi’, ‘selkäkyttyrä’, ‘kyfoottinen ryhti’, ‘th-rangan nikam%spontaani murtuma’, ‘th-rangan nikam%spontaani luhistuminen’, ‘äkkijyrkkä mutka rangassa’] | X | X |
| Adult gammopathy – monoclonal or polyclonal | 2 | Both | ICD-10 D47.2, D89.0 [‘Immunoglobuliinien pitoisuuden häiriö’, ‘hypergammaglobulinemia’, ‘gammapatia’] | X | X |
| Anaemia, severe (haemoglobin, < 95 g/L) | 1 | Structural | B -Hb <95 g/L | X | X |
| Hyperferritinaemia, severe (serum ferritin, > 1,000 μg/L) | 1 | Mainly structural (additional data text mined) | P -Ferrit >1,000 μg/L ([‘rautalasti’, ‘rautakuorma’, ‘hyperferrit%nemia’]) | X | X |
| Thrombocytopenia, severe (platelet count, < 50 × 10⁹/L) | 1 | Structural | B -Trom <50 × 10⁹/L | X | X |
| Gallstones | 0.5 | Text | [‘sappikivet’, ‘sappikiviä’] | X | X |
| Bleeding, bruising or coagulopathy | 0.5 | Both | ICD-10 D68 [‘vuototaipumus’] | X | X |
| Leukopenia | 0.5 | Mainly structural (additional data text mined) | B -Leuk <5 × 10⁹/L ([‘leukopenia’]) | X | X |
| Cognitive deficit | 0.5 | Both | ICD-10 R41.8 [‘kognitiohäiriö’, ‘oppimishäiriö’, ‘kognitiivinen häiriö’] | X | X |
| Low bone mineral density | 0.5 | Both | ICD-10 M80-M82 [‘osteopenia’, ‘osteoporoosi’, ‘osteolyysi’, ‘luubiopsiassa nekroosia’, ‘hankalat luustomuutokset’, ‘luustonekroosi’] | X | X |
| Growth retardation including low body weight | 0.5 | Text | [‘kasvuhäiriö’, ‘pienipainoisuus’, ‘alipainoisuus’, ‘lyhytkasvuisuus’] | X | X |
| Asthenia | 0.5 | Text | [‘astenia’] | X | X |
| Cardiovascular calcification | 0.5 | Both | ICD-10 I70 [‘ateroskleroosi’, ‘sydämen ja verisuonten kalkkeutuminen’, ‘kalkkeutuminen hiippaläpän takapurjeessa’, ‘Hypertensio arterialis’] | X | X |
| Dyslipidaemia | 0.5 | Both | ICD-10 code E78 [‘dyslipidemia’, ‘hyperkolesterolemia’, ‘hypertriglyseridemia’, ‘rasva-aineenvaihdunnan häiriö’] | X | X |
| Elevated angiotensin-converting enzyme levels | 0.5 | Structural | fS-ACE >65 U/L | X | X |
| Fatigue | 0.5 | Both | ICD-10 code R53 [‘väsymys’, ‘lihasheikkous’, ‘uupumus’, ‘fatiikki’] | X | X |
| Pulmonary infiltrates | 0.5 | Text | [‘keuhkojen varjostumat’, ‘keuhkoinfiltraatti’] | X | X |
| Age ≤ 18 years [at diagnosis] | 0.5 | Structural | ≤18 vuotta | X | NA |
| Family history of Parkinson's disease | 0.5 | NA | – | – | – |
| Blood relative who died of foetal hydrops and/or with diagnosis of neonatal sepsis of uncertain aetiology | 0.5 | NA | – | – | – |
| Additional signs/covariables introduced in Mehta et al., 2020 [17] | Weighted scores | Data source (structural data/text mining/both) | Laboratory and diagnosis code, and Finnish terms used in text mining | GD patients | Automated point scoring |
| Lymph node enlargement | NA | Both | ICD-10 code R59 [‘suurentuneet imusolmukkeet’, ‘imusolmukkeet suurentuneet’, ‘lymfadenopatia’, ‘lymphadenopatia’] | X (separately) | X (separately) |
| Splenectomy | NA | Text | [‘splenektomia’, ‘poistettu perna’, ‘pernan poisto’, ‘perna poistettu’] | X (separately) | X (separately) |
Abbreviations: ACE, angiotensin converting enzyme; GED-C, Gaucher Earlier Diagnosis Consensus; Hb, haemoglobin; ICD-10, International Statistical Classification of Diseases and Related Health Problems, 10th Revision; NA, not applicable.
“%” refers to any letters between the two parts of the phrase.
“X” indicates that the sign/covariable was included in the scoring of the cohort(s) in question.
Point-scored data were restricted to the pretreatment period of GD patients.
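The automated scoring summarised in the table above can be illustrated with a minimal Python sketch. It assumes that "%" behaves as a wildcard for any run of letters, that each sign contributes its weight at most once per subject, and that the mild/moderate and severe grades of a laboratory sign are mutually exclusive. The function names, the small subset of signs, and the example note are hypothetical and are not taken from the study's pipeline.

```python
import re
from typing import Optional

# Small subset of text-mined signs from the table: (weight, Finnish search terms).
TEXT_SIGNS = {
    "splenomegaly": (3.0, ["suurentunut perna", "splenomegalia"]),
    "bone issues": (2.0, ["luukipu", "luukriisi", "luusto%kuolio"]),
    "gallstones": (0.5, ["sappikivet", "sappikiviä"]),
}


def term_to_regex(term: str) -> "re.Pattern[str]":
    """Compile a search term, treating '%' as 'any letters' (assumed semantics)."""
    parts = [re.escape(part) for part in term.split("%")]
    # [^\W\d_]* matches zero or more Unicode letters, including ä and ö.
    return re.compile(r"[^\W\d_]*".join(parts), re.IGNORECASE)


def haemoglobin_points(hb_g_per_l: float) -> float:
    """Score anaemia from a B-Hb value using the thresholds in the table."""
    if hb_g_per_l < 95:
        return 1.0  # severe anaemia
    if hb_g_per_l <= 140:
        return 2.0  # mild or moderate anaemia (95-140 g/L)
    return 0.0


def score_subject(notes: str, hb_g_per_l: Optional[float] = None) -> float:
    """Sum the weights of all signs found in free text plus the Hb-based sign."""
    total = 0.0
    for weight, terms in TEXT_SIGNS.values():
        if any(term_to_regex(term).search(notes) for term in terms):
            total += weight  # assumption: a sign is counted once per subject
    if hb_g_per_l is not None:
        total += haemoglobin_points(hb_g_per_l)
    return total


# Hypothetical note mentioning splenomegaly and bone necrosis, with Hb 120 g/L.
example_score = score_subject("Todettu splenomegalia ja luustokuolio.", 120.0)
```

In this toy example the note matches the splenomegaly and bone-issue terms, and the haemoglobin value falls in the mild or moderate range, so the three weights are summed.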
List of the GBA variants analysed in the current study.
| Chr | Position | Reference SNP cluster ID | Variant | Clinical significance in ClinVar | Allele frequency (gnomAD) | Finnish allele frequency (based on 20,000–25,000 samples; gnomAD) | Further information | Availability in each chip version |
|---|---|---|---|---|---|---|---|---|
| 1 | 155235002 | rs75822236 | c.1604G>A | Pathogenic | 0.0001694 | 0.000 | | v1 |
| 1 | 155235195 | rs80356772 | c.1505G>A | Pathogenic/Likely pathogenic | 0.000007963 | 0.000 | | v1 |
| 1 | 155235196 | rs80356771 | c.1504C>T | Pathogenic | 0.00006724 | 0.000 | | v1/v2 |
| 1 | 155235205 | rs369068553 | c.1495G>C | Likely pathogenic | 0.00005176 | 0.000 | | v1/v2 |
| 1 | 155235252 | rs421016 | c.1448T>C | Pathogenic | 0.001226 | 0.001837 | 1) Genotype analysed from cluster plots validated by NGS. 2) Hot spot variant; most common among Caucasians. | v1/v2 |
| 1 | 155235726 | rs77369218 | c.1343A>T | Likely pathogenic | – | – | | v1 |
| 1 | 155235772 | rs80356769 | c.1297G>T | Pathogenic/Likely pathogenic | 0.00003182 | 0.000 | | v1 |
| 1 | 155235798 | rs772548282 | c.1271T>C | Likely pathogenic | – | – | | v1/v2 |
| 1 | 155235843 | rs76763715 | c.1226A>G | Pathogenic | 0.002235 | 0.001314 | 1) Genotype analysed from cluster plots validated by NGS. 2) Hot spot variant; most common among Ashkenazi. | v1/v2 |
| 1 | 155236367 | rs374306700 | c.1102C>T | Likely pathogenic/VUS | 0.00001193 | 0.000 | | v1/v2 |
| 1 | 155237427 | rs770796008 | c.913C>G | Likely pathogenic | 0.000003981 | 0.000 | | v1/v2 |
| 1 | 155237438 | rs140955685 | c.902G>A | VUS | 0.0001097 | 0.000 | | v1/v2 |
| 1 | 155238228 | rs61748906 | c.667T>C | Pathogenic/Likely pathogenic | 0.000007965 | 0.000 | | v1 |
| 1 | 155238597 | rs398123530 | c.508C>T | Pathogenic | 0.000003981 | 0.00004619 | | v1 |
| 1 | 155240660 | rs387906315 | c.84dupG | Pathogenic | 0.00004958 | 0.000 | | v1/v2 |
Abbreviations: Chr, chromosome; NGS, next-generation sequencing; SNP, single nucleotide polymorphism; VUS, variant of uncertain significance.
Genomic positions refer to GRCh38.
Allele frequencies of the variants according to gnomAD v2.1.1 [20].
Chip v1 and v2 have been utilised in the analysis of the samples in the Data freeze 4 and 5, respectively.
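Because chip v1 was used for Data freeze 4 and chip v2 for Data freeze 5, the set of variants that can be analysed depends on the data freeze. The lookup below is a minimal sketch of that relationship using a few rows of the variant table. The dictionary layout and the function name are illustrative assumptions, not part of the study's tooling.

```python
from typing import Dict, List, Set

# Chip version used for each FinnGen data freeze (from the table footnote).
CHIP_BY_FREEZE: Dict[str, str] = {"DF4": "v1", "DF5": "v2"}

# A few rows of the variant table: variant -> chip versions that carry it.
VARIANT_CHIPS: Dict[str, Set[str]] = {
    "c.1604G>A": {"v1"},
    "c.1504C>T": {"v1", "v2"},
    "c.1448T>C": {"v1", "v2"},
    "c.1226A>G": {"v1", "v2"},
    "c.508C>T": {"v1"},
}


def variants_for_freeze(freeze: str) -> List[str]:
    """Return the listed variants available on the chip used for a data freeze."""
    chip = CHIP_BY_FREEZE[freeze]
    return sorted(v for v, chips in VARIANT_CHIPS.items() if chip in chips)


df5_variants = variants_for_freeze("DF5")
```

With these rows, the DF5 selection keeps only the variants marked v1/v2, while the DF4 selection keeps all five.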
The most prevalent GED-C signs/covariables observed among the previously diagnosed GD patients in the current study (N = 3).
| GED-C sign/covariable | Points | N (max) = 3 |
|---|---|---|
| Splenomegaly | 3 | 3 |
| Thrombocytopenia mild or moderate | 2 | 3 |
| Hyperferritinaemia mild or moderate | 2 | 3 |
| Low bone mineral density | 0.5 | 3 |
| Elevated angiotensin-converting enzyme levels | 0.5 | 3 |
| Anaemia mild or moderate | 2 | 2 |
| Family history of GD | 2 | 2 |
| Hepatomegaly | 2 | 2 |
| Bleeding, bruising or coagulopathy | 1 | 2 |
| Dyslipidaemia | 0.5 | 2 |
The performance of DNA extracted from formalin-fixed paraffin-embedded tissues with and without DNA restoration in the next-generation sequencing of GBA gene (final N = 6).
| ID | Gaucher status | Previously determined | | FFPE tissue availability | FFPE age (years) | Unrestored FFPE DNA | Unrestored FFPE DNA, per cent ≥ 20× coverage | Restored FFPE DNA | Restored FFPE DNA, per cent ≥ 20× coverage |
|---|---|---|---|---|---|---|---|---|---|
| HBB1 | Patient (the cohort 1) | c.1448T>C ho | – | Skin | 10–15 | c.1448T>C ho | 63.12% | c.1448T>C ho | 100.00% |
| HBB3 | Patient (the cohort 1) | – | c.1448T>C he | Muscle | 5–10 | c.1448T>C he | 98.96% | NA (Contamination in sampling) | |
| HBB8 | Suspected (the cohort 2) | – | NA (DNA not available) | Lung | 15–20 | Unsuccessful | 73.71% | c.1448T>C he | 100.00% |
| BB1 | Patient | c.1226A>G ho | – | Thyroid | 20–25 | c.1226A>G ho | 94.04% | c.1226A>G ho | 100.00% |
| BB2 | Patient | c.1448T>C he | – | Endometrium | 20–25 | c.1448T>C he | 100.00% | c.1448T>C he | 100.00% |
| BB4 | Patient | c.1226A>G ho | – | Appendix | 10–15 | c.1226A>G ho | 100.00% | NA (Contamination in sampling) | |
Abbreviations: BB, Biobank Borealis; FFPE, formalin-fixed paraffin-embedded; GD, Gaucher disease; HBB, Helsinki Biobank; he, heterozygote; ho, homozygote; NA, not applicable; NGS, next-generation sequencing.
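The "per cent ≥ 20× coverage" values in the table can be read as the share of target positions sequenced to a depth of at least 20 reads. The sketch below shows that calculation under the assumption that per-base depths for the sequenced target are already available as a list. The function name and the toy depths are hypothetical.

```python
from typing import Sequence


def percent_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Percentage of target positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


# Toy example: four positions, one of which falls below 20x.
toy_percent = percent_covered([25, 30, 19, 20])
```

A position with a depth of exactly 20 counts as covered here, matching the "≥" in the column heading.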
Fig. 2. A workflow of the screening for potential undiagnosed Gaucher disease (GD) patients in Helsinki Biobank (HBB). Both the automated GED-C point scoring and single nucleotide polymorphism (SNP) chip genotype data from the FinnGen study were utilised. Two genotype data sets were employed due to release schedules of the raw data (Data freeze 4 and 5).
Fig. 3. Cluster plots generated from single nucleotide polymorphism (SNP) genotype data of Helsinki Biobank samples genotyped in the FinnGen study [18]. The plots shown here include the samples of the three previously diagnosed Gaucher disease patients identified in Helsinki Biobank and eligible for the study (HBB1–HBB3 in A–C, respectively; marked with pink circles). Each plot corresponds to one variant, and each spot in a plot corresponds to one sample. The colours of the spots and the ellipses indicate the computed genotype result and the cluster boundaries, respectively, as determined by the SNP chip calling algorithm. The algorithm is not reliable for genotyping rare variants.
Fig. 4. The GED-C point-score distribution in Helsinki Biobank (HBB) subpopulations: all HBB sample donors who have been genotyped in the FinnGen study and who have electronic health record data in the records of the Hospital District of Helsinki and Uusimaa (N ≈ 45,100; black columns), and the respective subgroups of individuals suspected (N = 55; yellow columns) or negative (N = 4,670; grey columns) for the GBA hot spot variants c.1448T>C or c.1226A>G. The point scores of the previously diagnosed GD patients who were identified in HBB, were eligible for the study, and have also been genotyped in the FinnGen study are indicated separately (N = 3; red data points). The point scoring of all samples was carried out in an automated manner. The main image shows the distribution of values (rounded to the closest integer) among all assessed subjects, while the inset represents subjects with a score of ≥10 (n = 346).
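The binning behind such a score distribution can be sketched as follows. Because GED-C weights are multiples of 0.5, ties occur when rounding, and the caption does not state how they were resolved. This sketch assumes that halves round up and that the ≥10 cut-off is applied to the unrounded score. The function names and the toy scores are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def round_half_up(score: float) -> int:
    """Round to the closest integer, with .5 rounded up (assumed tie rule)."""
    return math.floor(score + 0.5)


def score_histogram(scores: Iterable[float]) -> Dict[int, int]:
    """Count subjects per rounded point score."""
    return dict(Counter(round_half_up(score) for score in scores))


def count_high_scores(scores: Iterable[float], threshold: float = 10.0) -> int:
    """Count subjects whose unrounded score is at or above the threshold."""
    return sum(1 for score in scores if score >= threshold)


toy_scores = [0.5, 1.0, 1.5, 9.5, 10.0]
toy_histogram = score_histogram(toy_scores)
toy_high = count_high_scores(toy_scores)
```

Python's built-in `round` resolves ties to the nearest even integer, which is why an explicit helper is used for the assumed half-up rule.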
| Term | Author | Definition |
|---|---|---|
| Conceptualization | KE, OC | Ideas; formulation or evolution of overarching research goals and aims |
| Methodology | All | Development or design of methodology; creation of models |
| Validation | HH, UWK, and SM performed the GED-C point scoring. Aggregate data results (pseudonymised) were validated by all authors. MP carried out the genetic analysis of FinnGen genotype data that was partly confirmed by NGS. | Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs |
| Formal analysis | HH, MP, SM | Application of statistical, mathematical, computational, or other formal techniques to analyse or synthesize study data |
| Investigation | HH, MP, UWK, SM, PB, OC, the personnel of the participating biobanks | Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection |
| Resources | Participating institutions/parties. Funding from Takeda. | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools |
| Data Curation | Participating biobanks and Medaffcon (published aggregate data). | Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later reuse |
| Writing - Original Draft | KU | Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation) |
| Writing - Review & Editing | All | Preparation, creation and/or presentation of the published work by those from the original research group, specifically critical review, commentary or revision – including pre-or postpublication stages |
| Visualization | KU, HH, MP | Preparation, creation and/or presentation of the published work, specifically visualization/data presentation |
| Supervision | KE | Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team |
| Project administration | KE, KU, MIL | Management and coordination responsibility for the research activity planning and execution |
| Funding acquisition | Funding from Takeda | Acquisition of the financial support for the project leading to this publication |