James Margolin Havrilla, Mengge Zhao, Cong Liu, Chunhua Weng, Ingo Helbig, Elizabeth Bhoj, Kai Wang.
Abstract
Human genetic disorders, such as Down syndrome, have a wide variety of clinical phenotypic presentations, and characterizing each nuanced phenotype and subtype can be difficult. In this study, we examined the electronic health records of 4095 individuals with Down syndrome at the Children's Hospital of Philadelphia to develop a method for characterizing the phenotypic spectrum digitally. We extracted Human Phenotype Ontology (HPO) terms from quality-filtered patient notes using MetaMap, a natural language processing (NLP) tool. We catalogued the most common HPO terms among patients with Down syndrome and compared them with those from a baseline population. We characterized the top 100 HPO terms by their frequencies at different ages of clinical visits and highlighted selected terms with time-dependent distributions. We also identified phenotypic terms that have not been significantly associated with Down syndrome, such as "Proptosis", "Downslanted palpebral fissures", and "Microtia". In summary, our study demonstrated that the clinical phenotypic spectrum of individuals with Mendelian diseases can be characterized through NLP-based digital phenotyping on population-scale electronic health records (EHRs).
Keywords: Down syndrome; electronic health records; large-scale; longitudinal study; natural language processing; phenotype; phenotypic spectrum; text mining
Year: 2021 PMID: 34440331 PMCID: PMC8393657 DOI: 10.3390/genes12081159
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
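Population numbers for the Down syndrome (DS) and baseline patient cohorts. The table gives the number of patients and the number of notes containing at least one HPO term, for the full DS cohort, for the DS cohort after filtering out short and low-quality notes, and for the baseline cohort.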
| Dataset | # Patients | # Notes (with ≥1 HPO Terms) |
|---|---|---|
| DS cases | 4095 | 784,695 |
| DS (filtered) | 3553 | 87,276 |
| Baseline | 7845 | 19,494 |
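The effect of quality filtering can be read directly off the table by computing retention rates. The snippet below is only arithmetic on the reported counts; the `cohorts` dictionary and the `retention` helper are illustrative names, not part of the study's pipeline.

```python
# Counts copied from the cohort table: (patients, notes with >= 1 HPO term).
cohorts = {
    "DS cases": (4095, 784_695),
    "DS (filtered)": (3553, 87_276),
}


def retention(before: int, after: int) -> float:
    """Fraction of items kept after filtering."""
    return after / before


patients_before, notes_before = cohorts["DS cases"]
patients_after, notes_after = cohorts["DS (filtered)"]

patient_kept = retention(patients_before, patients_after)
note_kept = retention(notes_before, notes_after)

print(f"patients kept: {patient_kept:.1%}")  # patients kept: 86.8%
print(f"notes kept: {note_kept:.1%}")        # notes kept: 11.1%
```

Filtering removes most notes (about 89%) but only about 13% of patients, so the large majority of the DS cohort still contributes at least one usable note.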
Demographics for the Down syndrome patient cohort. Gender and ethnicity values are Epic-coded. Ethnicity values are not mutually exclusive, so the ethnicity rows do not sum to the total number of patients.
| Category | Group | # Patients |
|---|---|---|
| Sex | Male | 2145 |
| | Female | 1950 |
| Ethnicity | White | 2685 |
| | Asian | 130 |
| | American Indian | 9 |
| | Native Hawaiian/Pacific Islander | 15 |
| | Black | 635 |
| | Hispanic | 170 |
| | Other | 684 |
| | Unknown | 56 |
Figure 1. Phenotypic spectrum of HPO terms for the Down syndrome and baseline patient cohorts. (a) Top 20 HPO terms in baseline patients, ranked by patient frequency (the proportion of the cohort with at least one instance of the term in their notes). (b) Top 20 HPO terms in DS patients ranked by patient frequency, after propagation up to “Phenotypic abnormality” and before filtering on note length and quality. (c) Top 20 HPO terms in DS patients ranked by patient frequency, after both propagation and filtering.
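Panels (b) and (c) rank terms by patient frequency after propagating each extracted term up to “Phenotypic abnormality”. A minimal Python sketch of that counting step might look like the following. It assumes each note has already been reduced to a set of HPO term names (for example by MetaMap) and uses a hypothetical, heavily simplified hierarchy in place of the real HPO graph; `PARENTS`, `ancestors_inclusive`, and `patient_frequency` are illustrative names, not the authors' code.

```python
from collections import Counter

ROOT = "Phenotypic abnormality"

# Hypothetical, heavily simplified hierarchy (child -> parents). The real HPO
# is a DAG with far more terms and intermediate levels.
PARENTS: dict[str, list[str]] = {
    "Microtia": ["Abnormal ear morphology"],
    "Abnormal ear morphology": ["Abnormality of the ear"],
    "Abnormality of the ear": [ROOT],
    "Hypotonia": ["Abnormality of the musculature"],
    "Abnormality of the musculature": [ROOT],
    ROOT: [],
}


def ancestors_inclusive(term: str) -> set[str]:
    """Return the term plus all of its ancestors up to the root."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(PARENTS.get(current, []))
    return seen


def patient_frequency(notes_by_patient: dict[str, list[set[str]]]) -> dict[str, float]:
    """Fraction of patients with at least one (propagated) mention of each term."""
    counts: Counter = Counter()
    for notes in notes_by_patient.values():
        patient_terms: set[str] = set()
        for note_terms in notes:
            for term in note_terms:
                patient_terms |= ancestors_inclusive(term)
        counts.update(patient_terms)  # each patient counts at most once per term
    n_patients = len(notes_by_patient)
    return {term: c / n_patients for term, c in counts.items()}


# Toy cohort: patient id -> list of notes, each note a set of extracted terms.
cohort = {
    "p1": [{"Microtia"}, {"Hypotonia"}],
    "p2": [{"Hypotonia"}],
    "p3": [set()],
    "p4": [{"Microtia"}, {"Microtia"}],
}
freq = patient_frequency(cohort)
top = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:20]
```

Counting each term at most once per patient is what turns the result into a proportion of the cohort rather than a count of mentions; the root term reaches 0.75 here because three of the four toy patients have at least one phenotype.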
Figure 2. Age distributions of the top 100 HPO terms in the Down syndrome cohort. The top 100 HPO terms are listed on the left y-axis and age in years (0–12.5) on the x-axis; the color bar on the right gives the number of visits falling in each 3-month age bin.
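The heatmap in Figure 2 amounts to counting term occurrences per 3-month age bin. The sketch below assumes each visit is given as an age in years plus the set of HPO terms found in that visit's notes, counts a visit once per term, and drops visits outside the 0–12.5 year window; these choices and the names `age_bin` and `bin_visits` are assumptions for illustration, not the study's implementation.

```python
from collections import defaultdict
from typing import Optional

BIN_WIDTH_YEARS = 0.25  # 3-month bins
MAX_AGE_YEARS = 12.5    # x-axis range shown in the figure
N_BINS = int(MAX_AGE_YEARS / BIN_WIDTH_YEARS)  # 50 bins

# (age in years at the visit, HPO terms found in that visit's notes)
Visit = tuple[float, set[str]]


def age_bin(age_years: float) -> Optional[int]:
    """Index of the 3-month bin for an age, or None if outside the window."""
    if age_years < 0 or age_years >= MAX_AGE_YEARS:
        return None
    return int(age_years // BIN_WIDTH_YEARS)


def bin_visits(visits: list[Visit]) -> dict[str, list[int]]:
    """Per-term counts of visits in each age bin (one heatmap row per term)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0] * N_BINS)
    for age, terms in visits:
        b = age_bin(age)
        if b is None:
            continue  # outside the 0-12.5 year window
        for term in terms:
            counts[term][b] += 1
    return dict(counts)


visits = [
    (0.1, {"Hypotonia"}),
    (0.2, {"Hypotonia", "Microtia"}),
    (1.0, {"Hypotonia"}),
    (13.0, {"Hypotonia"}),  # dropped: beyond 12.5 years
]
grid = bin_visits(visits)
```

Each value of `grid` is one row of the heatmap; stacking the rows for the top 100 terms gives a 100 × 50 matrix that can be drawn with any plotting library.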
Figure 3. Snapshots of selected HPO term age distributions for the Down syndrome cohort. The occurrence of each HPO term across patient visits shows the longitudinal distribution (age in years) of that feature in Down syndrome, for (a) Global developmental delay, (b) Delayed speech and language development, (c) Abnormal anterior eye segment morphology, (d) Hypotonia, (e) Microtia, and (f) Abnormal ear physiology.