Iñigo Urteaga, Mollie McKillop, Noémie Elhadad.
Abstract
Endometriosis is a systemic and chronic condition in women of childbearing age, yet a highly enigmatic disease with unresolved questions: there are no known biomarkers, nor established clinical stages. Here, we investigate the use of patient-generated health data and data-driven phenotyping to characterize endometriosis patient subtypes, based on their reported signs and symptoms. We aim at unsupervised learning of endometriosis phenotypes using self-tracking data from personal smartphones. We leverage data from an observational research study of over 4000 women with endometriosis who track their condition over more than 2 years. We extend a classical mixed-membership model to accommodate the idiosyncrasies of the data at hand, i.e., the multimodality and uncertainty of the self-tracked variables. The proposed method, by jointly modeling a wide range of observations (i.e., participant symptoms, quality of life, treatments), identifies clinically relevant endometriosis subtypes. Experiments show that our method is robust to different hyperparameter choices and to the biases of self-tracking data (e.g., the wide variations in tracking frequency among participants). With this work, we show the promise of unsupervised learning of endometriosis subtypes from self-tracked data, as learned phenotypes align well with what is already known about the disease, but also suggest new clinically actionable findings. More generally, we argue that a continued research effort on unsupervised phenotyping methods with patient-generated health data via new mobile and digital technologies will have significant impact on the study of enigmatic diseases in particular, and health in general.
Keywords: Chronic pain; Computational science; Experimental models of disease; Reproductive signs and symptoms; Statistics
Year: 2020 PMID: 32596513 PMCID: PMC7314826 DOI: 10.1038/s41746-020-0292-9
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Example screenshots of Phendo, the endometriosis research app.
Participants can answer multiple questions (e.g., related to gastrointestinal and genitourinary issues above) by selecting from a set of answers (e.g., “painful urination” or “frequent urination”).
Phendo cohort (N = 4368) demographics.
| Demographic variable | N (%) or Mean (s.d.) |
|---|---|
| Age | |
| Mean (s.d.) | 30.29 (7.0) |
| Gender | |
| Male | 3 (0.1%) |
| Other | 40 (0.9%) |
| Female | 4308 (99.0%) |
| BMI | |
| Underweight | 201 (4.8%) |
| Obese | 1979 (47.2%) |
| Normal | 2015 (48.0%) |
| Race/ethnicity | |
| Native american | 29 (0.7%) |
| Black, non-hispanic | 101 (2.3%) |
| Asian | 111 (2.6%) |
| Hispanic | 215 (4.9%) |
| Other | 290 (6.7%) |
| White, non-hispanic | 3604 (82.9%) |
| Education | |
| High-school or under | 639 (14.7%) |
| Some college | 1320 (30.4%) |
| More than college | 2389 (54.9%) |
| Living environment | |
| Rural | 718 (16.5%) |
| Urban | 1755 (40.4%) |
| Suburban | 1871 (43.1%) |
Summary statistics per-tracked question.
| Question | Number of observations (mean/max) | Number of tracked days (mean/max) |
|---|---|---|
| Where is the pain? | 31/2382 | 6/245 |
| Describe the pain. | 29/1745 | 6/245 |
| How severe is the pain? | 10/803 | 6/245 |
| What are you experiencing? | 9/907 | 4/188 |
| How severe is the symptom? | 5/552 | 4/188 |
| Describe your period flow. | 3/243 | 3/243 |
| What kind of bleeding. | 2/173 | 2/77 |
| Describe GI/GU system. | 7/342 | 4/205 |
| How severe is it? | 6/321 | 4/205 |
| Describe sex. | 1/203 | 1/195 |
| Activities difficult to perform. | 42/2148 | 7/404 |
| How was your day? | 13/710 | 13/710 |
| Medications/hormones taken. | 15/1623 | 9/368 |
| Total | 177/8253 | 76/2523 |
The sex-related question has the fewest responses because it is not a default question in the Phendo app: users must open the app settings and explicitly add it to their daily tracked variables. The app is designed this way because individuals aged 13 and older are eligible to participate.
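The mean/max columns in the table above can be read as per-participant counts aggregated per question. The following is a minimal sketch of that aggregation, assuming a hypothetical record layout of `(participant_id, question, day, answer)` tuples and assuming the mean is taken over participants who tracked the question at least once; the function name, the toy records, and the answer labels are illustrative and not taken from the Phendo data.

```python
from collections import defaultdict
from statistics import mean


def per_question_summary(records):
    """Aggregate (participant_id, question, day, answer) tuples per question."""
    obs = defaultdict(lambda: defaultdict(int))   # question -> participant -> number of answers
    days = defaultdict(lambda: defaultdict(set))  # question -> participant -> distinct tracked days
    for pid, question, day, _answer in records:
        obs[question][pid] += 1
        days[question][pid].add(day)

    summary = {}
    for question in obs:
        n_obs = list(obs[question].values())
        n_days = [len(d) for d in days[question].values()]
        summary[question] = {
            "obs_mean": mean(n_obs),
            "obs_max": max(n_obs),
            "days_mean": mean(n_days),
            "days_max": max(n_days),
        }
    return summary


# Hypothetical toy records (not real study data).
toy_records = [
    ("p1", "Where is the pain?", 1, "pelvis"),
    ("p1", "Where is the pain?", 1, "lower_back"),  # second answer on the same day
    ("p1", "Where is the pain?", 2, "pelvis"),
    ("p2", "Where is the pain?", 1, "pelvis"),
    ("p2", "How was your day?", 1, "good"),
]
summary = per_question_summary(toy_records)
```

Under this reading, a question that allows several answers per day accumulates more observations than tracked days, which is consistent with rows such as "Where is the pain?", while a one-answer-per-day question such as "How was your day?" shows identical counts in both columns.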
WERF survey statistics for participants’ medical history (N = 533).
| WERF survey question | N (%) or Mean (s.d.) |
|---|---|
| Menstrual and endometriosis history | |
| Age at menarche: Mean (s.d.) | 12.13 (1.6) |
| Were your periods in the last 3 months hormone-induced? | 165 (31.0%) |
| Were your periods in the last 3 months regular? | 239 (44.8%) |
| How many days of bleeding did you usually have for each period in the last 3 months (Not counting discharge/spotting for which you needed a panty liner only)? Mean (s.d.) | 6.24 (3.6) |
| At what age did you start having pain with your period? Mean (s.d.) | 14.63 (4.7) |
| How old were you when you first had symptoms? Mean (s.d.) | 17.79 (6.9) |
| How many doctors did you see before receiving a diagnosis of endometriosis? Mean (s.d.) | 5.03 (4.6) |
| How many surgical procedures have you had for endometriosis or pelvic pain? Mean (s.d.) | 1.70 (1.6) |
| Have you ever had surgery to look for endometriosis and none was found? | 45 (8.6%) |
| Family history of endometriosis | 202 (37.9%) |
| Mother | 92 (17.3%) |
| Daughter | 1 (0.2%) |
| Sister | 32 (6.0%) |
| Maternal Grandmother, aunt, and/or cousin | 86 (16.1%) |
| Paternal Grandmother, aunt, and/or cousin | 63 (11.8%) |
| Family history of chronic pelvic pain | 240 (45.0%) |
| Mother | 136 (25.5%) |
| Daughter | 4 (0.8%) |
| Sister | 61 (11.4%) |
| Maternal Grandmother, aunt, and/or cousin | 107 (20.1%) |
| Paternal Grandmother, aunt, and/or cousin | 73 (13.7%) |
| Surgical history | |
| Appendectomy | 89 (16.7%) |
| Hysterectomy | 22 (4.1%) |
| Oophorectomy | 22 (4.1%) |
| Cervical surgery (LEEP or conization) | 21 (3.9%) |
| Hysteroscopy | 71 (13.3%) |
| Gallbladder surgery | 35 (6.6%) |
| Hernia operation | 21 (3.9%) |
| Sigmoidoscopy/colonoscopy | 118 (22.1%) |
| Laparoscopy count | 1.41 (1.3) |
| Some abdominal surgery | 417 (78.2%) |
Note that not all Phendo participants completed the WERF survey.
WERF survey statistics for participants’ comorbidities (N = 533).
| Diagnosed comorbidities | N (%) |
|---|---|
| Irritable-bowel syndrome | 136 (25.5%) |
| Hashimoto’s disease | 16 (3.0%) |
| Graves’ disease | 3 (0.6%) |
| Glandular Fever | 38 (7.1%) |
| Fibromyalgia | 27 (5.1%) |
| Anxiety disorder requiring medication or therapy | 261 (49.0%) |
| Asthma | 151 (28.3%) |
| Cardiovascular disease | 14 (2.6%) |
| Some cancer | 14 (2.6%) |
| Crohn’s disease | 4 (0.8%) |
| Chronic fatigue syndrome | 31 (5.8%) |
| Deaf/difficulty hearing | 6 (1.1%) |
| Depression/mood disorder requiring medication or therapy | 274 (51.4%) |
| Diabetes | 5 (0.9%) |
| Uterine fibroids | 70 (13.1%) |
| High blood pressure | 33 (6.2%) |
| Migraine | 203 (38.1%) |
| Mitral valve prolapse | 7 (1.3%) |
| Multiple sclerosis | 1 (0.2%) |
| Painful bladder/interstitial cystitis (NOT bacterial bladder infection) | 36 (6.8%) |
| Pelvic inflammatory disease (PID) | 25 (4.7%) |
| Polycystic ovary syndrome (PCOS) | 47 (8.8%) |
| Rheumatoid arthritis | 9 (1.7%) |
| Scoliosis (curvature of the spine) | 54 (10.1%) |
| Other spine problems | 39 (7.3%) |
| Sjogren’s syndrome | 2 (0.4%) |
| Lupus erythematosus | 2 (0.4%) |
| Thyroid disease | 32 (6.0%) |
| Ulcerative colitis | 3 (0.6%) |
| Other chronic condition | 228 (42.8%) |
| Have you been told that you were born with a structural problem/birth defect of your uterus, cervix, or vagina? | 44 (8.9%) |
| General health and activities of daily living | N (%) |
| In general, would you say your health is good? | 296 (55.5%) |
| Has there been a time in your life when you typically had pelvic pain during your periods? | 522 (97.9%) |
| During your last period, did your pelvic pain prevent you from going to work or school or carrying out your daily activities (even if taking pain-killers)? | 280 (70.9%) |
| During your last period, did you have to lie down for any part of the day or longer because of your pelvic pain? | 356 (90.6%) |
| Does your health now limit you in bathing or dressing yourself? | 134 (25.1%) |
| Does your health now limit you in lifting or carrying groceries? | 253 (47.5%) |
| Does your health now limit you in moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf? | 351 (65.9%) |
| Does your health now limit you in vigorous activities, such as running, lifting heavy objects, participating in strenuous sports? | 477 (89.5%) |
| Does your health now limit you in walking one block? | 138 (25.9%) |
| Does your health now limit you in walking several blocks? | 272 (51.0%) |
| Does your health now limit you in walking more than a mile? | 327 (61.4%) |
| Does your health now limit you in bending, kneeling or stooping? | 295 (55.3%) |
| Does your health now limit you in climbing one flight of stairs? | 169 (31.7%) |
| Does your health now limit you in climbing several flights of stairs? | 355 (66.6%) |
Note that not all Phendo participants completed the WERF survey.
10-fold cross-validated test data log-likelihood of the proposed method vs. vanilla LDA.
| Model hyperparameters | Test log-likelihood for the model as in ref. [33] (mean ± std) | Test log-likelihood for the proposed model (mean ± std) |
|---|---|---|
| | −687951.86 (±47455.33) | −367855.70 (±27324.98) |
| | −688027.21 (±47424.77) | −367728.17 (±27179.73) |
| | −688049.86 (±47477.81) | −367781.96 (±27190.10) |
| | −689056.69 (±47443.14) | −368086.97 (±27197.90) |
| | −689270.81 (±47284.88) | −368368.67 (±27174.22) |
| | −689730.70 (±47553.67) | −368588.16 (±27689.93) |
| | −693364.63 (±47505.99) | −370060.29 (±27350.08) |
| | −693446.41 (±47418.04) | −369991.03 (±27165.18) |
| | −693645.14 (±47480.30) | −370052.10 (±27195.04) |
| | −681064.83 (±47169.90) | −364978.13 (±27062.30) |
| | −681003.93 (±47332.34) | −365462.25 (±27289.95) |
| | −681534.16 (±47021.27) | −365380.51 (±27122.67) |
| | −682631.13 (±47218.50) | −365766.74 (±26798.13) |
| | −682392.99 (±47130.02) | −365806.39 (±27076.40) |
| | −682620.18 (±47179.25) | −365807.88 (±27171.46) |
| | −686273.30 (±47539.28) | −367994.90 (±27034.03) |
| | −686666.58 (±47408.23) | −367859.95 (±26874.11) |
| | −686417.56 (±47156.44) | −367923.44 (±26942.59) |
| | −677435.77 (±47321.00) | −362748.07 (±26930.41) |
| | −677681.05 (±46751.86) | −363277.82 (±27292.59) |
| | −678124.44 (±46816.71) | −363310.62 (±26850.11) |
| | −678858.29 (±47393.52) | −364019.43 (±27006.19) |
| | −679569.77 (±47161.13) | −364008.15 (±27215.34) |
| | −679277.59 (±47150.13) | −364036.90 (±26933.38) |
| | −683839.21 (±46870.69) | −366149.54 (±26829.01) |
| | −683417.91 (±46932.56) | −366384.64 (±26828.00) |
| | −684045.97 (±47494.00) | −366304.03 (±27138.42) |
| | −674507.71 (±47127.03) | −361290.00 (±26836.91) |
| | −674681.24 (±47024.50) | −361318.58 (±26818.05) |
| | −675159.95 (±46797.63) | −361855.70 (±26851.60) |
| | −676658.40 (±47147.61) | −362468.01 (±27138.36) |
| | −676662.81 (±47356.85) | −362369.08 (±26737.76) |
| | −676309.70 (±46958.32) | −362585.89 (±27140.12) |
| | −681362.66 (±46825.38) | −364723.91 (±27100.00) |
| | −681469.84 (±47357.16) | −364799.82 (±27106.59) |
| | −681478.89 (±47564.73) | −364866.80 (±26866.35) |
Notice the improvement in log-likelihood achieved by the proposed method when compared to the vanilla LDA model as in ref. [33].
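The comparison above reports the mean and standard deviation of held-out log-likelihood across 10 folds. A minimal sketch of such an evaluation loop follows. It uses a toy add-one-smoothed categorical model as a stand-in: neither vanilla LDA nor the proposed model is implemented here. The function names, the toy vocabulary, and the choice of the sample standard deviation are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def fit_categorical(train_docs, vocab, smoothing=1.0):
    """Stand-in model: one smoothed categorical distribution over all answers."""
    counts = Counter(tok for doc in train_docs for tok in doc)
    total = sum(counts.values()) + smoothing * len(vocab)
    return {v: (counts[v] + smoothing) / total for v in vocab}


def log_likelihood(docs, probs):
    """Sum of log-probabilities of every observed answer in the given documents."""
    return sum(math.log(probs[tok]) for doc in docs for tok in doc)


def cross_validated_ll(docs, vocab, k=10, seed=0):
    """Return (mean, sample std) of held-out log-likelihood over k folds."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint folds covering all documents
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [docs[i] for i in order if i not in held]
        test = [docs[i] for i in held_out]
        scores.append(log_likelihood(test, fit_categorical(train, vocab)))
    return mean(scores), stdev(scores)


# Hypothetical toy data: each "document" is one participant's list of tracked answers.
vocab = ["cramping", "nausea", "fatigue"]
rng = random.Random(1)
docs = [[rng.choice(vocab) for _ in range(rng.randint(3, 8))] for _ in range(50)]
mean_ll, std_ll = cross_validated_ll(docs, vocab)
```

Because log-likelihood is a sum over held-out observations, its magnitude scales with the size of each test fold, so values are comparable only between models evaluated on the same folds.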
Fig. 2 Visualization of learned posteriors for endometriosis phenotypes.
Each phenotype is defined as a set of per-question probability distributions across the answers to each of the thirteen questions. Each heatmap represents the likelihood of the answers within a question for a given phenotype—for visual clarity, only the top 10 (most likely) vocabulary items of the posterior are displayed. a Where is the pain? b Describe the pain. c How severe is the pain? d What are you experiencing? e How severe is the symptom? f Describe your period flow. g What kind of bleeding. h Describe GI/GU system. i How severe is it? j How was your day? k Activities difficult to perform. l Describe sex. m Medications/hormones taken. For instance, the “no_sex” answer is highly likely to be tracked under phenotype D, and not likely to be tracked under phenotype A—yellow versus purple respectively, in heatmap l.
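The caption describes a phenotype as one probability distribution per question, of which only the ten most likely answers are plotted. A minimal sketch of that data structure and of the top-answer selection is given below. The nested-dictionary layout and the function name are assumptions, the probabilities are made up, and `other_answer` is a placeholder label; only `no_sex` and `avoided_sex` appear in the source.

```python
def top_answers(phenotype, question, n=10):
    """Return the n most likely (answer, probability) pairs for one question."""
    dist = phenotype[question]
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical phenotype: question -> {answer: probability}; values are illustrative only.
phenotype_d = {
    "Describe sex.": {"no_sex": 0.60, "avoided_sex": 0.25, "other_answer": 0.15},
}

top_two = top_answers(phenotype_d, "Describe sex.", n=2)
```

Each per-question distribution sums to one on its own, so likelihoods are comparable within a heatmap panel but not across panels.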
Fig. 3 Answer-cloud visualization of learned endometriosis phenotypes.
a Answer-cloud for phenotype A. b Answer-cloud for phenotype B. c Answer-cloud for phenotype C. d Answer-cloud for phenotype D. The font size of each answer reflects its likelihood to be tracked within the phenotype. Answers to the same question are depicted with the same color (see legend): e.g., “no_sex” and “avoided_sex”, shown in red, are two of the six potential answers to the sexual activity questions.
Fig. 4 Posterior assignment probability of each participant across the phenotypes learned by the model.
While the model provides membership probabilities for each participant across phenotypes, most participants are clearly assigned to a single phenotype (assignment probability above 0.9, in yellow in the heatmap).
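The "clearly assigned" criterion in this caption can be expressed as a threshold on each participant's largest membership probability. The sketch below assumes memberships are stored as one list of four probabilities per participant, ordered A to D; the function name and the toy values are hypothetical.

```python
def hard_assignments(memberships, threshold=0.9, labels="ABCD"):
    """Map each membership vector to (most likely phenotype, is it above threshold?)."""
    result = []
    for probs in memberships:
        best = max(range(len(probs)), key=probs.__getitem__)
        result.append((labels[best], probs[best] > threshold))
    return result


# Hypothetical membership probabilities for three participants (each row sums to 1).
toy_memberships = [
    [0.97, 0.01, 0.01, 0.01],
    [0.05, 0.92, 0.02, 0.01],
    [0.40, 0.35, 0.15, 0.10],
]
assigned = hard_assignments(toy_memberships)
share_confident = sum(flag for _, flag in assigned) / len(assigned)
```

In this toy input the third participant has a genuinely mixed membership and is not counted as clearly assigned, which is the case a mixed-membership model can represent and a hard clustering cannot.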
Fig. 5 Learned phenotype assignments are not correlated with the number of days tracked, the number of observations tracked, or the ratio of observations per day tracked by participants.
Posterior assignment probability of each participant across the phenotypes learned by the model, ordered by a number of days, b number of observations, and c ratio of observations per day tracked by each participant. We observe no correlation between the phenotype assignments and the number of days tracked, the number of observations tracked, or their ratio.
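The caption reports the absence of correlation from ordered heatmaps. One possible way to quantify the same check, not necessarily the one used by the authors, is a rank correlation between a tracking-volume measure and the assignment probability for a given phenotype. The sketch below implements Spearman's coefficient with the standard library only; the toy inputs are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical inputs: days tracked and probability of phenotype A per participant.
tracked_days = [3, 50, 120, 7, 300]
prob_phenotype_a = [0.95, 0.02, 0.90, 0.01, 0.03]
rho = spearman(tracked_days, prob_phenotype_a)
```

A coefficient near zero for every phenotype and every tracking measure would support the claim in the caption; a rank-based measure is chosen here because tracking counts are heavily skewed.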
Phenotype confusion matrix for Expert 1.
| | Expert 1: Phenotype A | Expert 1: Phenotype B | Expert 1: Phenotype C | Expert 1: Phenotype D |
|---|---|---|---|---|
| Model Phenotype A | 7 | 0 | 0 | 2 |
| Model Phenotype B | 1 | 6 | 3 | 1 |
| Model Phenotype C | 0 | 3 | 5 | 2 |
| Model Phenotype D | 1 | 1 | 2 | 6 |
Phenotype confusion matrix for Expert 2.
| | Expert 2: Phenotype A | Expert 2: Phenotype B | Expert 2: Phenotype C | Expert 2: Phenotype D |
|---|---|---|---|---|
| Model Phenotype A | 5 | 2 | 1 | 1 |
| Model Phenotype B | 2 | 9 | 0 | 0 |
| Model Phenotype C | 1 | 5 | 1 | 3 |
| Model Phenotype D | 1 | 3 | 3 | 3 |
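Agreement between the model and each expert can be summarized from the two confusion matrices above. The sketch below computes the observed agreement (the diagonal share) and Cohen's kappa, which corrects for chance agreement. The matrices are copied from the tables, with rows as model phenotypes and columns as expert phenotypes; the choice of kappa as the summary statistic and the function name are assumptions, not something the source states.

```python
def agreement(matrix):
    """Return (observed agreement, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement expected from the row and column marginals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Rows: model phenotype A-D; columns: expert phenotype A-D (values from the tables).
expert_1 = [[7, 0, 0, 2], [1, 6, 3, 1], [0, 3, 5, 2], [1, 1, 2, 6]]
expert_2 = [[5, 2, 1, 1], [2, 9, 0, 0], [1, 5, 1, 3], [1, 3, 3, 3]]

observed_1, kappa_1 = agreement(expert_1)  # 24 of 40 cases on the diagonal
observed_2, kappa_2 = agreement(expert_2)  # 18 of 40 cases on the diagonal
```

With four phenotypes, uniform random guessing would agree on roughly a quarter of the cases, so the raw diagonal share is best read against that baseline.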
Confusion matrices for severe cases.
| | Expert 1: Severe case | Expert 1: Non-severe case | Expert 2: Severe case | Expert 2: Non-severe case |
|---|---|---|---|---|
| Model Severe case | 7 | 2 | 5 | 4 |
| Model Non-severe case | 2 | 29 | 4 | 27 |
Confusion matrices for mild cases.
| | Expert 1: Mild case | Expert 1: Non-mild case | Expert 2: Mild case | Expert 2: Non-mild case |
|---|---|---|---|---|
| Model Mild case | 6 | 5 | 9 | 2 |
| Model Non-mild case | 4 | 25 | 10 | 19 |
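The severe-case and mild-case tables above are 2×2 confusion matrices, so standard binary metrics can be derived from them. The sketch below treats each expert's label as the reference and the model's label as the prediction, which is an assumption about how the tables should be read; the cell counts are copied from the tables and the function name is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics for a 2x2 table with the expert label as reference."""
    return {
        "sensitivity": tp / (tp + fn),  # expert-positive cases the model also flags
        "specificity": tn / (tn + fp),  # expert-negative cases the model also clears
        "precision": tp / (tp + fp),    # model-positive cases the expert confirms
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# (tp, fp, fn, tn) per table and expert: first table row gives tp and fp,
# second table row gives fn and tn.
tables = {
    ("severe", "Expert 1"): (7, 2, 2, 29),
    ("severe", "Expert 2"): (5, 4, 4, 27),
    ("mild", "Expert 1"): (6, 5, 4, 25),
    ("mild", "Expert 2"): (9, 2, 10, 19),
}
metrics = {key: binary_metrics(*cells) for key, cells in tables.items()}
```

Each table covers 40 cases, and non-severe and non-mild cases form the majority, so accuracy alone can look favorable; sensitivity and precision show how well the minority class is recovered.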