Jean T Pantel, Max Zhao, Martin A Mensah, Nurulhuda Hajjir, Tzung-Chien Hsieh, Yair Hanani, Nicole Fleischer, Tom Kamphans, Stefan Mundlos, Yaron Gurovich, Peter M Krawitz.
Abstract
Significant improvements in automated image analysis have been achieved in recent years, and such tools are increasingly being used in computer-assisted syndromology. However, the ability to recognize a syndromic facial gestalt may depend on the syndrome and may also be confounded by phenotype severity, the size of the available training sets, ethnicity, age, and sex. Benchmarking and comparing the performance of deep-learned classification processes is therefore inherently difficult. For a systematic analysis of these influencing factors, we chose the lysosomal storage diseases mucolipidosis and mucopolysaccharidosis types I and II, which are known for their wide and overlapping phenotypic spectra. For a dysmorphological comparison, we used Smith-Lemli-Opitz syndrome as another inborn error of metabolism and Nicolaides-Baraitser syndrome as another disorder characterized by coarse facies. A classifier trained on these five cohorts, comprising 289 patients in total, achieved a mean accuracy of 62%. We also developed a simulation framework to analyze the effect of potential confounders, such as cohort size, age, sex, or ethnic background, on the distinguishability of phenotypes. We found that the true positive rate increases with cohort size (n = 10 to 40) for all analyzed disorders, while ethnicity and sex have no significant influence. The dynamics of the accuracies strongly suggest that maximum distinguishability is a phenotype-specific value that has not yet been reached for any of the studied disorders. This should motivate further data-sharing efforts, since computer-assisted syndrome classification can still be improved by enlarging the available training sets.
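The abstract does not spell out how the simulation framework draws its cohorts. A minimal sketch, assuming a hypothetical `Patient` record and a hypothetical `draw_cohorts` helper (neither comes from the study), of drawing equally sized cohorts per disorder, optionally restricted by sex or ancestry, before a classifier is trained on them:

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record; field names are illustrative, not the study's schema.
    patient_id: str
    disorder: str
    sex: str        # e.g. "male", "female", "unknown"
    ancestry: str   # e.g. "CEU", "AFR", "other"


def draw_cohorts(patients, n, seed=0, keep=lambda p: True):
    """Draw n patients per disorder at random, after applying an optional filter.

    Disorders with fewer than n eligible patients are skipped, so every
    returned cohort has exactly the same size.
    """
    rng = random.Random(seed)  # seeded for reproducible simulation runs
    by_disorder = defaultdict(list)
    for p in patients:
        if keep(p):
            by_disorder[p.disorder].append(p)
    return {
        disorder: rng.sample(members, n)  # sampling without replacement
        for disorder, members in sorted(by_disorder.items())
        if len(members) >= n
    }
```

Sweeping `n` over 10 to 40, or passing `keep=lambda p: p.sex == "male"`, yields equally sized or restricted cohorts of the kind compared in the figures; training and evaluating the classifier itself is outside this sketch.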
Year: 2018 PMID: 29623569 PMCID: PMC5959962 DOI: 10.1007/s10545-018-0174-3
Source DB: PubMed Journal: J Inherit Metab Dis ISSN: 0141-8955 Impact factor: 4.982
Fig. 1 Overview of the original sample set with sex ratios (male/female/sex not reported) and ethnic backgrounds: European (left) vs. non-European (right)
Fig. 2 Frequency with which each of the five disorders appears as a differential diagnosis (DDx) among the top 30 ranks in Face2Gene CLINIC, per test group. The proportion of cases in which the correct diagnosis is ranked first is hatched. For instance, in the MPS I and II cohort the correct diagnosis “MPS” appears at the top position in 34% of cases and within the top 30 in 70% of cases. With about 300 DDx to choose from in gestalt matching, a frequency of occurrence above 10% in the top 30 ranks (dotted line) indicates phenotypic similarity
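The 10% threshold can be read as a chance level: if a fixed syndrome were ranked uniformly at random among about 300 candidates, it would land in the top 30 with probability 30/300 = 0.1. A small sketch of that arithmetic and of the top-k frequencies plotted in the figure, assuming each case's DDx output is available as a ranked list of syndrome names (a hypothetical format, not the Face2Gene export); the helper names are illustrative:

```python
def top_k_rate(ranked_lists, target, k):
    """Fraction of cases whose ranked DDx list contains `target` in the first k positions."""
    hits = sum(1 for ranks in ranked_lists if target in ranks[:k])
    return hits / len(ranked_lists)


def chance_top_k(k, n_candidates):
    """Probability that one fixed syndrome lands in the top k under uniformly random ranking."""
    return k / n_candidates


# About 300 candidate syndromes, top 30 ranks: roughly 10% by chance alone.
baseline = chance_top_k(30, 300)
```

Setting `k=1` gives the hatched top-rank proportion, and `k=30` gives the full bar height for a given disorder and test group.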
Fig. 3 Performance of the gestalt model in the multi-class problem in Face2Gene RESEARCH, shown as a color-coded confusion matrix in which deep red corresponds to a high value. True positive rates (TPRs) lie on the diagonal; false-negative and false-positive rates lie off the diagonal. The whole classification process achieves an accuracy of 62%, which is significantly better than the accuracy expected by chance (28%). Syndrome masks on top show the average appearance of each disorder, while photos on the left show individuals with the respective disorder. The dendrogram results from a clustering analysis and visualizes the similarity of the disorders
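The caption does not say how the 28% chance level was derived. As an illustration only, the sketch below computes overall accuracy and per-class TPRs from a confusion matrix (assuming rows are true classes and columns are predictions) and one common chance baseline, the proportional-chance accuracy Σ pᵢ², where pᵢ is the share of cases in class i. The function names and the matrix in the usage are made up and are not the study's data.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified cases (diagonal) over all cases."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def true_positive_rates(cm):
    """Per-class TPR: diagonal entry over its row sum (rows = true class)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def proportional_chance(class_sizes):
    """Expected accuracy of a guesser that predicts each class at its prevalence."""
    total = sum(class_sizes)
    return sum((n / total) ** 2 for n in class_sizes)


# Hypothetical 3-class confusion matrix, not taken from the figure.
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 3, 7]]
acc = accuracy(cm)                                      # 21 / 30 = 0.7
tprs = true_positive_rates(cm)                          # [0.8, 0.6, 0.7]
chance = proportional_chance([sum(row) for row in cm])  # about 1/3 for equal classes
```

With unequal cohort sizes, Σ pᵢ² rises above 1/K for K classes, which is one way a chance level can exceed 20% in a five-class problem; whether this is how the 28% figure was obtained is not stated in the source.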
Fig. 4 (a) Confusion matrix with TPRs and FPRs for a cohort size of n = 40. (b) TPRs as a function of cohort size, with linear regression. The performance of the classification process was evaluated for equally sized cohorts from n = 10 to n = 40. The TPRs for predicting each disorder improve with increasing cohort size and seem to approach different limits, indicating differences in relative distinguishability. The prediction of SLOS and NCBRS in particular benefits when the classifier is trained on more cases, whereas the inference of the correct lysosomal storage disorder improves less with larger cohorts
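A sketch of the linear regression in panel (b): an ordinary least-squares fit of TPR against cohort size for one disorder. The helper name `fit_line` and the TPR values are hypothetical and only illustrate the computation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical TPRs for one disorder at each cohort size; NOT values from the study.
sizes = [10, 20, 30, 40]
tprs = [0.40, 0.50, 0.60, 0.70]
slope, intercept = fit_line(sizes, tprs)  # slope ~0.01 per added case, intercept ~0.30
```

A straight line only summarizes the trend inside the sampled range (n = 10 to 40); it cannot by itself estimate the phenotype-specific limit the caption refers to, for which a saturating model would be more suitable.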
Classification of DS is more accurate when restricted to only European or only African patients. No such marked differences are observed for ML, MPS I, MPS II, SLOS, and NCBRS. Restricting the cohorts to male patients also has only a minor effect on performance. For the binary classification of each disorder, the difference in Matthews correlation coefficients (MCCs) was normalized by the standard deviation of the MCCs computed in the mixed controls (ΔMCC-STD)
| Potential confounder (ΔMCC-STD) | DS | ML | MPS I | MPS II | SLOS | NCBRS |
|---|---|---|---|---|---|---|
| CEU vs mixed | 2.7 | −0.7 | 1.53 | −0.29 | 0.31 | 0.77 |
| AFR vs mixed | 3.75 | – | – | – | – | – |
| male vs mixed | – | 0.14 | 0.13 | 1.17 | 1.84 | 1.12 |
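The ΔMCC-STD score is described only in words. A minimal sketch, assuming the mixed controls consist of several repeated runs and that the difference is taken against their mean with the sample standard deviation (all assumptions, not stated in the source), computes the Matthews correlation coefficient of a binary confusion matrix and the normalized difference. The helper names are hypothetical.

```python
import math
import statistics


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient of a binary confusion matrix (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def delta_mcc_std(mcc_restricted, mcc_mixed_runs):
    """(restricted MCC - mean mixed-control MCC) / sample std of the mixed-control MCCs.

    Assumption: the mixed controls are repeated runs; the source only says the
    difference was normalized by their standard deviation.
    """
    mean_mixed = statistics.mean(mcc_mixed_runs)
    return (mcc_restricted - mean_mixed) / statistics.stdev(mcc_mixed_runs)
```

Under this reading, a ΔMCC-STD of 3.75 would mean the restricted cohort's MCC lies 3.75 standard deviations above the mixed-control average, while values near zero indicate no detectable effect of the restriction.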