Yaoting Sun, Sathiyamoorthy Selvarajan, Zelin Zang, Wei Liu, Yi Zhu, Hao Zhang, Wanyuan Chen, Hao Chen, Lu Li, Xue Cai, Huanhuan Gao, Zhicheng Wu, Yongfu Zhao, Lirong Chen, Xiaodong Teng, Sangeeta Mantoo, Tony Kiat-Hon Lim, Bhuvaneswari Hariraman, Serene Yeow, Syed Muhammad Fahmy Alkaff, Sze Sing Lee, Guan Ruan, Qiushi Zhang, Tiansheng Zhu, Yifan Hu, Zhen Dong, Weigang Ge, Qi Xiao, Weibin Wang, Guangzhi Wang, Junhong Xiao, Yi He, Zhihong Wang, Wei Sun, Yuan Qin, Jiang Zhu, Xu Zheng, Linyan Wang, Xi Zheng, Kailun Xu, Yingkuan Shao, Shu Zheng, Kexin Liu, Ruedi Aebersold, Haixia Guan, Xiaohong Wu, Dingcun Luo, Wen Tian, Stan Ziqing Li, Oi Lian Kon, Narayanan Gopalakrishna Iyer, Tiannan Guo.
Abstract
Determination of malignancy in thyroid nodules remains a major diagnostic challenge. Here we report the feasibility and clinical utility of developing an AI-defined protein-based biomarker panel for diagnostic classification of thyroid nodules, based initially on formalin-fixed paraffin-embedded (FFPE) tissue and further refined for fine-needle aspiration (FNA) specimens of minute amounts, which pose technical challenges for other methods. We first developed a neural network model of 19 protein biomarkers based on the proteomes of 1724 FFPE thyroid tissue samples from a retrospective cohort. This classifier achieved over 91% accuracy in the discovery set for classifying malignant thyroid nodules. The classifier was externally validated by blinded analyses in a retrospective cohort of 288 nodules (89% accuracy; FFPE) and a prospective cohort of 294 FNA biopsies (85% accuracy) from twelve independent clinical centers. This study shows that integrating high-throughput proteomics and AI technology in multi-center retrospective and prospective clinical cohorts facilitates precise disease diagnosis, which is difficult to achieve by other methods.
Year: 2022 PMID: 36068205 PMCID: PMC9448820 DOI: 10.1038/s41421-022-00442-x
Source DB: PubMed Journal: Cell Discov ISSN: 2056-5968 Impact factor: 38.079
Fig. 1 Schematic view of the study and clinic-pathologic characteristics.
a The project design and workflow of the FFPE-PCT-DIA pipeline. b Clinic-pathologic characteristics of the study cohorts.
Fig. 2 Global thyroid proteome profile.
a Heatmap showing protein expression profiles of 579 thyroid tissue specimens from 578 patients. 5312 proteins (rows) are clustered without supervision. Samples (columns) are ordered based on the tissue types. The color indicates the log2-scaled intensity of each protein in each sample. b–f UMAP plots showing global snapshots comparing the indicated types of thyroid tissues using 5312 proteins for all subtypes (b); benign vs malignant (c); only benign (d); FA vs FTC (e); and only malignant (f) tissue types.
Fig. 3 Classifier development, performance testing, and validation in independent blinded datasets.
a Schematic workflow of the classifier development. Protein features were prioritized based on the discovery dataset. The model was trained using 19 proteins selected from the discovery dataset and further validated in test datasets. More details are described in Materials and Methods. b The importance rank of the selected 19 protein features, as interpreted by the SHapley Additive exPlanations (SHAP) algorithm. c Protein abundance distribution of the 19 features. d Network of the 19 proteins. Blue nodes and orange nodes indicate the protein features and connected molecules or pathways, respectively. Direct interactions are shown as solid lines and indirect interactions as dashed lines. e ROC plots of seven different machine learning models using the 19 selected features. f ROC plots of the discovery set, retrospective test sets, prospective test sets, and Bethesda III and IV samples in the prospective test sets. g UMAP plots showing the separation between benign and malignant groups in the retrospective and prospective test sets using 19 protein features with latent space. h Overall performance metrics of the neural network model's predictions for five specific histopathological types per set. Graduated colors in the shaded bar indicate accuracy levels. Numbers in the boxes indicate the number of correctly identified samples/total sample number. HCA and HCC were assigned as FA and FTC, respectively. i Sankey diagram showing the distribution ratio and correspondence between histopathology and cytopathology in the prospective sets. Histopathological type L denotes lymphocytic thyroiditis. Cytopathology scores were assigned by specialized pathologists using the Bethesda System. TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative, respectively, of the results predicted by our classifier model.
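The TP, TN, FP, and FN labels in panel i map onto the usual summary metrics for a binary classifier. The following is a minimal sketch of that arithmetic, assuming "malignant" is the positive class; the `ConfusionCounts` name and the counts in the example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier ('malignant' assumed positive)."""
    tp: int  # malignant nodules predicted malignant
    tn: int  # benign nodules predicted benign
    fp: int  # benign nodules predicted malignant
    fn: int  # malignant nodules predicted benign

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def accuracy(self) -> float:
        # Fraction of all samples that were classified correctly.
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        # Fraction of malignant samples that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of benign samples that were correctly ruled out.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={example.accuracy():.2f}")
print(f"sensitivity={example.sensitivity():.2f}")
print(f"specificity={example.specificity():.2f}")
```

Accuracy alone can hide an imbalance between the two error types, which is why sensitivity and specificity are worth reporting alongside it.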
Nineteen proteins selected by a genetic algorithm and their previously known associations with thyroid physiology or pathology.
| UniProt ID | Protein name | Thyroid cancer related | Thyroid function related |
|---|---|---|---|
| P04083 | Annexin A1 | Yes | Yes |
| P17931 | Galectin-3 | Yes | Yes |
| P02751 | Fibronectin (FN) | Yes | Yes |
| P10909 | Clusterin | Yes | Yes |
| P00568 | Adenylate kinase isoenzyme 1 (AK1) | Yes | Yes |
| P42224 | Signal transducer and activator of transcription 1-alpha/beta | Yes | Yes |
| P30086 | Phosphatidylethanolamine-binding protein 1 | Yes | Yes |
| P27797 | Calreticulin | Yes | Yes |
| P78527 | DNA-dependent protein kinase catalytic subunit | Yes | Yes |
| O00339 | Matrilin-2 | Yes | – |
| P02765 | Alpha-2-HS-glycoprotein | Yes | – |
| P04792 | Heat shock protein beta-1 | Yes | – |
| O75347 | Tubulin-specific chaperone A | – | Yes |
| P04216 | Thy-1 membrane glycoprotein | – | Yes |
| Q9HAT2 | Sialate | – | – |
| O14964 | Hepatocyte growth factor-regulated tyrosine kinase substrate | – | – |
| P58546 | Myotrophin | – | – |
| P83731 | 60S ribosomal protein L24 | – | – |
| P57737 | Coronin-7 | – | – |
Fig. 4 Protein expression plots for 19 selected protein features in the five histotypes of thyroid tissues in the discovery cohort.
a Plots showing the abundance distribution of 5312 proteins and the 19 selected features. b The y-axis shows log2 values of protein expression intensity, and the x-axis indicates tissue types. P-values were calculated by one-way ANOVA.
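For readers unfamiliar with the test named in panel b, the sketch below computes the one-way ANOVA F statistic from first principles using only the standard library. It is an illustration of the statistic, not the authors' pipeline; the function name and the toy log2 intensities are hypothetical. Converting F to a P-value requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` would normally be used.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log2 intensities for one protein in three tissue types.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F indicates that the between-group differences in mean abundance are large relative to the within-group spread.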
Fig. 5 Biological insights of thyroid tumor subtypes based on proteotypic data.
a Rose chart plotting the DEP counts of each pairwise comparison for follicular-pattern tumors and control samples (cPTC). The thresholds used were fold change > 4 and adjusted P-value < 0.01. The pink and blue colors represent counts of upregulated and downregulated proteins, respectively. b Box plots showing dysregulation of CRABP1 and NAMPT across six histological tumor subtypes, especially between FTC and FA. P-values were calculated by one-way ANOVA for the six-group comparison. c UMAP plot for 186 proteins distinguishing Hürthle cell tumors from other follicular neoplasms. d Network map showing expression of key mitochondrial proteins implicated in Hürthle cell neoplasms. e UMAP plot for 401 proteins distinguishing FTC from cPTC, with fvPTC as an intermediate phenotype. f, g Heatmap showing DEPs (f) in FTC compared with fvPTC and cPTC, with pathways (g) indicated in the chord plot.
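The DEP thresholds in panel a (fold change > 4, adjusted P-value < 0.01) can be applied as a simple filter. The sketch below rests on two assumptions that the caption does not confirm: that P-values are adjusted with the Benjamini-Hochberg procedure, and that the fold-change cutoff is applied symmetrically (above 4 for upregulated, below 1/4 for downregulated proteins). The function names, protein labels, and values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_deps(proteins, fold_changes, p_values, fc_cutoff=4.0, p_cutoff=0.01):
    """Return (protein, direction) pairs that pass both thresholds."""
    adjusted = benjamini_hochberg(p_values)
    log_cutoff = math.log2(fc_cutoff)
    deps = []
    for name, fc, adj_p in zip(proteins, fold_changes, adjusted):
        # Assumed symmetric cutoff: |log2(fold change)| > log2(4) = 2.
        if abs(math.log2(fc)) > log_cutoff and adj_p < p_cutoff:
            deps.append((name, "up" if fc > 1 else "down"))
    return deps


# Hypothetical inputs for illustration only; NOT data from the study.
proteins = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]
fold_changes = [8.0, 0.1, 5.0, 2.0]
p_values = [0.0001, 0.002, 0.04, 0.0005]
deps = select_deps(proteins, fold_changes, p_values)
print(deps)
```

Here `PROT_C` passes the fold-change cutoff but fails the adjusted P-value cutoff, and `PROT_D` is significant but changes too little, so neither is counted as a DEP.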