Francisco M De La Vega, Shimul Chowdhury, Barry Moore, Erwin Frise, Jeanette McCarthy, Edgar Javier Hernandez, Terence Wong, Kiely James, Lucia Guidugli, Pankaj B Agrawal, Casie A Genetti, Catherine A Brownstein, Alan H Beggs, Britt-Sabina Löscher, Andre Franke, Braden Boone, Shawn E Levy, Katrin Õunap, Sander Pajusalu, Matt Huentelman, Keri Ramsey, Marcus Naymik, Vinodh Narayanan, Narayanan Veeraraghavan, Paul Billings, Martin G Reese, Mark Yandell, Stephen F Kingsmore.
Abstract
BACKGROUND: Clinical interpretation of genetic variants in the context of the patient's phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation.
Year: 2021 PMID: 34645491 PMCID: PMC8515723 DOI: 10.1186/s13073-021-00965-0
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Characteristics of case cohorts. Benchmark cohort, 119 cases total. Validation cohort, 60 cases total. Grand total, 179 cases
| Cohort | Mode of inheritance | Assay: WGS | Assay: WES | Variant: SNV/Indel | Variant: SV | Sex: Male | Sex: Female | Pedigree: Single | Pedigree: Duo | Pedigree: Trio |
|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark | Autosomal dominant | 70 | 11 | 66 | 15 | 36 | 45 | 35 | 6 | 40 |
| Benchmark | Autosomal recessive | 27 | – | 23 | 4 | 14 | 13 | 9 | 1 | 17 |
| Benchmark | X-linked dominant | 6 | – | 5 | 1 | 1 | 5 | 2 | – | 4 |
| Benchmark | X-linked recessive | 5 | – | 5 | – | 5 | – | 2 | 1 | 2 |
| Benchmark | Sub-totals | 108 | 11 | 99 | 20 | 56 | 63 | 48 | 8 | 63 |
| Validation | Autosomal dominant | 3 | 34 | 37 | – | 10 | 27 | 15 | 2 | 20 |
| Validation | Autosomal recessive | 1 | 14 | 15 | – | 5 | 10 | 9 | – | 6 |
| Validation | X-linked dominant | 1 | 5 | 6 | – | 3 | 3 | 1 | 3 | 2 |
| Validation | X-linked recessive | – | 2 | 2 | – | 2 | – | 1 | – | 1 |
| Validation | Sub-totals | 5 | 55 | 60 | 0 | 20 | 40 | 26 | 5 | 29 |
Diagnostic structural variants identified by GEM in the benchmarking cohort (20 out of 119 cases). GEM ranks each structural variant by the genes it harbors and presents it, under its top-scored gene, alongside other ranked genes carrying coding SNVs or small indels. An asterisk marks genes that the literature reports as candidates for the phenotype of the diagnosed disease or syndrome (as described in OMIM). The results show that GEM can analyze deletions (Del) and duplications (Dup) ranging from as small as 4 kb up to entire chromosome arms, across diverse modes of inheritance and pedigree structures, from either WGS or WES assay data. GEM also automatically identified compound heterozygotes between SVs and SNVs/indels (rows 1, 2, and 8). Input SV calls can be breakpoint-based calls (here “SV”) or imprecise CNV calls based on read-depth analysis. Notably, when external SV calls are not provided, GEM can also infer SVs directly from the small-variant data (rows 2, 10, 15, and 17; “None” in the SV calls column) and score them appropriately, identifying diagnostic variants that in the original cases were found by microarrays and not by sequencing.
| Case no. | Top scored gene(s) | Gene rank | GEM score | Variant(s) position | SV type | Length (kb) | Mode of Inheritance | Pedigree type | Assay type | SV calls in input | Diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 252268 | FANCA* | 1 | 2.28 | chr16:89847864-89863349; FANCA: c.3788_3790delTCT | Del | 15 | Recessive | Trio | WGS | SV | Fanconi anemia |
| 223449 | TANGO2* | 1 | 2.13 | chr22:20028937-20057143; TANGO2: c.605+1G>A | Del | 28 | Recessive | Trio | WGS | None | MECRCN |
| 266523 | BTRC* | 1 | 2.05 | chr10:102941001-103430600 | Dup | 490 | Dominant | Duo | WGS | SV | Split hand/foot malformation type 3 |
| 267392 | HIRA, TBX1* | 1 | 2.05 | chr22:18893883-21562619 | Del | 2669 | Dominant | Single | WES | CNV | DiGeorge syndrome; velocardiofacial syndrome |
| 267148 | KMT2A | 1 | 1.87 | chr11:116691508-126432828; chr22:17038511-20307516 | Dup | 9741; 3269 | Dominant | Trio | WES | CNV | Emanuel syndrome |
| 253691 | HIRA, TBX1* | 1 | 1.73 | chr22:18893883-20307516 | Del | 1414 | Dominant | Single | WES | CNV | DiGeorge syndrome; velocardiofacial syndrome |
| 256943 | MAGEL2* | 1 | 1.64 | chr15:22833478-28566610 | Del | 5733 | Dominant | Single | WES | CNV | Prader-Willi syndrome |
| 254012 | NDUFS3* | 1 | 1.56 | chr11:47605229-47609177; NDUFS3: c.374G>A | Del | 4 | Recessive | Trio | WGS | SV | Leigh syndrome |
| 254728 | EPHA4 | 2 | 1.46 | chr2:220309089-224580863 | Del | 4272 | Dominant | Single | WGS | SV | Pathogenic deletion in 2q35q36.1 |
| 44671 | NPAP1 | 1 | 1.42 | chr15 tetrasomy (broken in multiple dups) | Dup | 4542; 991; 358; 158 | Dominant | Trio | WGS | None | Isodicentric chromosome 15 syndrome |
| 360547 | FREM1 | 1 | 1.33 | chr9:1-18477200 | Del | 18,437 | Dominant | Trio | WGS | SV | Chromosome 9p deletion syndrome |
| 259685 | TYROBP | 1 | 1.31 | chr19:23158251-33502767 | Dup | 10,345 | Dominant | Trio | WES | SV | Partial trisomy 19p12.q13.11 |
| 266700 | TAB2 | 1 | 1.31 | chr6:144951601-150260400 | Del | 5309 | Dominant | Trio | WGS | SV | Chromosome 6q24-q25 syndrome |
| 244102 | MAGEL2* | 1 | 1.28 | chr15:23684685-26108259 | Del | 2424 | Dominant | Single | WES | CNV | Prader-Willi syndrome |
| 204560 | JAG1* | 2 | 1.21 | chr20:10471400-13459333 | Del | 2988 | Dominant | Trio | WGS | None | Alagille syndrome |
| 246146 | HCN1 | 1 | 1.20 | chr5:213101-46270700 | Dup | 46,058 | Dominant | Single | WGS | SV | Trisomy 5p |
| 45020 | PCDH19* | 1 | 1.15 | chrX:92925011-99669272 | Del | 6744 | X-linked dominant | Trio | WGS | None | Developmental and epileptic encephalopathy 9 |
| 248678 | FANCC* | 1 | 1.14 | chr9:97998556-98009092 | Del | 11 | Recessive | Single | WGS | SV | Fanconi anemia |
| 352726 | THRA | 1 | 1.00 | chr17:32147833-79020944 | Dup | 46,873 | Dominant | Single | WGS | SV | Distal trisomy 17q |
| 251355 | TRIP11 | 4 | 0.58 | chr14:84783523-96907490 | Del | 12,124 | Dominant | Duo | WGS | SV | Chromosome 14q31.2q32.2 syndrome |
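The Length (kb) column can be cross-checked against the coordinates in the Variant(s) position column. The sketch below assumes that a length is simply end minus start, rounded to the nearest kilobase. That rule is an assumption made for illustration, and `sv_length_kb` is a hypothetical helper, not part of GEM.

```python
import re

def sv_length_kb(region: str) -> int:
    """Return the span of a 'chrN:start-end' region, rounded to whole kb.

    Hypothetical helper: assumes length = end - start (an assumption,
    not a rule stated in the source).
    """
    match = re.fullmatch(r"chr[0-9XY]+:([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"not a simple chr:start-end region: {region!r}")
    # Tolerate thousands separators inside the coordinates
    start, end = (int(group.replace(",", "")) for group in match.groups())
    return round((end - start) / 1000)

# Three single-interval rows taken from the table above
for region in ("chr16:89847864-89863349",
               "chr22:18893883-21562619",
               "chr11:47605229-47609177"):
    print(region, sv_length_kb(region), "kb")
```

Rows that list several intervals (separated by semicolons) or a free-text description would need to be split or skipped before calling the helper.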
Fig. 1 The diagnostic sensitivity of GEM was greater than that of the variant prioritization methods Phevor, Exomiser, and VAAST. A Proportion of the benchmark cohort of 119 cases in which the true causal genes (or variants, in the case of causal SVs) were identified among the top 1, 2, 5, or 10 gene candidates. Patient phenotypes were extracted manually from medical records by clinicians and provided as HPO term inputs to GEM, Exomiser, and Phevor. VAAST only considers variant information. Note that GEM and Phevor ranks correspond to genes, which may include one or two variants (the latter in the case of a compound heterozygote), whereas Exomiser and VAAST ranks are for single variants. For compound heterozygotes, the rank of the top-ranking variant is shown for Exomiser and VAAST. B Comparison of GEM performance in the benchmark cohort (excluding SV cases) versus the validation cohort (comprising 60 rare genetic disease cases from multiple sources).
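The top-k proportions described for panel A amount to counting the cases whose causal gene rank is at most k. The following is a minimal sketch that uses only the 20 gene ranks listed for the structural-variant cases in the table above, so its numbers do not reproduce the figure for the full 119-case cohort. `top_k_rate` is a hypothetical helper name.

```python
# Gene ranks of the 20 diagnostic SV cases, in table order
sv_ranks = [1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
            1, 1, 1, 1, 2, 1, 1, 1, 1, 4]

def top_k_rate(ranks, k):
    """Fraction of cases whose causal gene is ranked within the top k."""
    return sum(rank <= k for rank in ranks) / len(ranks)

for k in (1, 2, 5, 10):
    print(f"top {k}: {top_k_rate(sv_ranks, k):.2f}")
```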
Fig. 2 Comparison of GEM performance with manually curated and CNLP-derived HPO terms in the benchmark cohort. Distribution of ranks for causal genes (A); GEM Bayes factors for causal genes (B); and number of candidates (hits) at the BF ≥ 0.69 threshold (moderate support) (C). The black line in the graphs denotes the median. The asterisks denote a statistically significant difference between the groups (p < 0.0001, two-tailed Wilcoxon matched-pairs signed rank test); ranks showed no statistically significant difference.
Fig. 3 Impact of missing data and mis-phenotyping on GEM performance in the benchmark cohort. Causal gene rank (A); Bayes factors for causal genes (B); and number of candidates (hits) at the gene BF ≥ 0.69 threshold (moderate support) (C) under standard conditions, after withdrawing ClinVar information, and after permuting the HPO terms extracted by CNLP. The black line in the graphs denotes the median.
Fig. 4 Comparative performance of parent-offspring trios or duos vs. singleton probands in the benchmark cohort. Causal gene rank (A); Bayes factors (B); and number of candidates (hits) at the gene BF ≥ 0.69 threshold (moderate support) (C) for 63 cases analyzed as parent-offspring trios (n = 59) or duos (n = 4), as compared with analysis as single probands, using either manually curated or CNLP-derived HPO terms. The black line in the graphs denotes the median. No statistically significant difference was found between trios/duos and single probands in any of the manual or CNLP groups (two-tailed Wilcoxon matched-pairs signed rank test).
Fig. 5 Trade-off between GEM gene scores, maximal true positive rates, and number of candidates for review in the benchmark cohort. GEM gene scores are Bayes factors (BF) that can be used to speed case review. A Maximal gene true positive rate achieved at the different BF thresholds (Y-axis). B Median number of candidate genes for review at each BF threshold. As the BF threshold is decreased, the true positive rate increases, while the number of candidates to review remains manageable. Input HPO terms for this analysis were extracted by CNLP.
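The threshold trade-off can be illustrated with the GEM scores reported for the 20 structural-variant cases in the table above. The sketch below computes, for each BF threshold, the fraction of causal genes whose score meets it, which bounds the true positive rate achievable at that threshold. It covers only those 20 cases, so it does not reproduce the figure, and `fraction_at_or_above` is a hypothetical helper name.

```python
# GEM scores of the causal genes in the 20 SV cases, in table order
sv_scores = [2.28, 2.13, 2.05, 2.05, 1.87, 1.73, 1.64, 1.56, 1.46, 1.42,
             1.33, 1.31, 1.31, 1.28, 1.21, 1.20, 1.15, 1.14, 1.00, 0.58]

def fraction_at_or_above(scores, threshold):
    """Fraction of causal genes retained when filtering at score >= threshold."""
    return sum(score >= threshold for score in scores) / len(scores)

for threshold in (0.0, 0.69, 1.5, 2.0):
    kept = fraction_at_or_above(sv_scores, threshold)
    print(f"BF >= {threshold:.2f}: {kept:.2f} of causal genes retained")
```

Lowering the threshold retains more causal genes at the cost of more candidates to review; the candidate counts themselves are not available in the table, so they are not modeled here.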
Fig. 6 Performance of GEM condition match scores for diagnostic nomination in the benchmark cohort. A Ranks for reported diagnostic conditions for the benchmark dataset, using a GEM gene BF score ≥ 0.69 and sorted by CM score, for HPO terms derived from CNLP or manual curation. B Receiver operating characteristic curves for the condition match (CM) score for all hits with BF ≥ 0. CNLP All: HPO terms extracted from clinical notes by CNLP; AUC = 0.91. Manual: manually curated HPO terms; AUC = 0.88. CNLP Multiple Dx: CNLP-derived CM score for the true positive disorder versus the other possible disorders associated with that gene; AUC = 0.68. Manual Multiple Dx: as for CNLP Multiple Dx but using manually curated HPO terms; AUC = 0.69.
Previously undiagnosed cases with potential leads. Cases with hits with a GEM gene score BF > 1.5. Zygo zygosity, Hom homozygous, Het heterozygous, Dup large duplication
| Case | Pedigree | Sex | Assay | Rank | Chr | Gene | Variant Type | Variant ACMG | De novo | Zygo | GEM score | Mode of inheritance | MIM ID(s) | CM score(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 244799 | Single | Male | WGS | 1 | 14 | SRP54 | SNV | Uncertain significance | Likely | Het | 1.76 | Dominant | 618752, 260400 | 0.672, 0.893 |
| 245237 | Trio | Male | WGS | 2 | X | GK | SNV | VUS | Yes | Het | 1.60 | X-linked recessive | 307030 | 1.119 |
| 245237 | Trio | Male | WGS | 3 | 16 | FANCA | SNV | VUS | No | Hom | 1.55 | Recessive | 227650 | 1.315 |
| 245768 | Single | Male | WGS | 1 | 16 | TSC2 | Dup | VUS | Likely | Het | 1.64 | Dominant | N/A | N/A |
| 247458 | Single | Male | WGS | 1 | 1 | SLC25A24 | SNV | VUS | Likely | Het | 1.86 | Dominant | 612289 | 1.995 |
| 247963 | Trio | Female | WGS | 1 | X | STAG2 | SNV | Likely pathogenic | Yes | Het | 1.53 | X-linked dominant | 301022 | 1.25 |