Nicole M Thomasian1, Ihab R Kamel2, Harrison X Bai3.
Abstract
Artificial intelligence (AI) has illuminated a clear path towards an evolving health-care system replete with enhanced precision and computing capabilities. Medical imaging analysis can be strengthened by machine learning as the multidimensional data generated by imaging naturally lends itself to hierarchical classification. In this Review, we describe the role of machine intelligence in image-based endocrine cancer diagnostics. We first provide a brief overview of AI and consider its intuitive incorporation into the clinical workflow. We then discuss how AI can be applied for the characterization of adrenal, pancreatic, pituitary and thyroid masses in order to support clinicians in their diagnostic interpretations. This Review also puts forth a number of key evaluation criteria for machine learning in medicine that physicians can use in their appraisals of these algorithms. We identify mitigation strategies to address ongoing challenges around data availability and model interpretability in the context of endocrine cancer diagnosis. Finally, we delve into frontiers in systems integration for AI, discussing automated pipelines and evolving computing platforms that leverage distributed, decentralized and quantum techniques.Entities:
Mesh:
Year: 2021 PMID: 34754064 PMCID: PMC8576465 DOI: 10.1038/s41574-021-00543-9
Source DB: PubMed Journal: Nat Rev Endocrinol ISSN: 1759-5029 Impact factor: 43.330
Fig. 1Integrative diagnostics.
The convergence of different omics data with clinical intuition. Endocrinologists communicate with patients and radiologists to gain a clinical overview of their patient. Four arms give a holistic overview of disease: radiomics (for example, CT or MRI), pathomics (for example, histology of tissue samples), genomics and phenomics (for example, digital health mobile phone applications and wearable trackers). An artificial intelligence algorithm (such as a deep neural network, seen in the centre) synthesizes all the information and provides a diagnostic classification.
Fig. 2Computer vision workflow.
The four main steps in the conventional machine learning workflow are image acquisition, segmentation, feature extraction, and feature selection and analysis. Segmentation involves delineating the region of interest in the image, and feature extraction identifies pixel features that are then analysed graphically. A radiomics signature is the final output. Either machine learning or deep learning can be used for feature extraction and engineering, including the identification of pixel intensity, lesion shape, texture feature matrices and wavelets. Conventional machine learning algorithms must follow this pathway of acquisition, segmentation, feature extraction and feature selection. By contrast, deep learning can circumvent this process altogether with end-to-end processing from inputs to outputs.
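To make the conventional pipeline concrete, the following is a minimal sketch of the segmentation and feature extraction steps on a synthetic image; the particular features (mean and spread of intensity, pixel-count area, gradient-based texture) are illustrative stand-ins for the hand-crafted radiomic features described above, not the features used in any cited study.

```python
import numpy as np

def extract_features(image, mask):
    """Extract simple hand-crafted radiomic features from a segmented ROI."""
    roi = image[mask]                         # pixel intensities inside the ROI
    intensity_mean = roi.mean()               # first-order intensity feature
    intensity_std = roi.std()
    area = mask.sum()                         # crude shape feature (pixel count)
    # crude texture feature: mean absolute gradient inside the ROI
    gy, gx = np.gradient(image.astype(float))
    texture = np.abs(gy[mask]).mean() + np.abs(gx[mask]).mean()
    return np.array([intensity_mean, intensity_std, area, texture])

# synthetic 'acquired' image with a bright lesion in the centre
rng = np.random.default_rng(0)
image = rng.normal(100, 5, size=(64, 64))
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 8 ** 2   # the segmentation step
image[mask] += 40                                  # lesion is hyperintense

signature = extract_features(image, mask)          # the radiomics signature
print(signature.shape)  # → (4,)
```

In a real workflow, the feature-selection step would then prune such a signature (typically hundreds of features) before it is passed to a classifier; deep learning replaces the hand-crafted stages with learned ones.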
Examples of artificial intelligence techniques in endocrine cancer imaging
| Model | Description | Highlighted applications (not exhaustive) |
|---|---|---|
| SVM | A machine learning model that finds a ‘hyperplane’ or decision boundary to separate data of one class from another | SVMs are widely used for classification as a stand-alone approach |
| Random forest | A machine learning model made up of decision trees that classify using the combined predictions of trees in the ‘forest’ | Random forest classifiers, similar to SVMs, are also commonly used for feature extraction |
| k-means | A machine learning technique where the number of clusters is specified and the model partitions the data into non-overlapping groups | k-means can be used for unstructured image data processing, such as automated image detection and annotation; for example, it has been used for thyroid nodule segmentation on mobile devices |
| ANN | A model class designed to mimic the structure and behaviour of neurons in the brain with layers of nodes that activate based on inputs | ANNs are well suited to pattern recognition, making them good candidates for feature selection; for example, they have been used for MRI-based classification of malignant and benign adrenal masses |
| CNN | A deep neural network composed of layers that perform operations to sequentially abstract image features, followed by fully connected layers containing probability distributions for classification; some common subtypes include AlexNet, VGG, GoogLeNet, ResNet and U-Net, among others | CNNs excel at image feature learning and have been utilized in thyroid and pancreatic neuroendocrine tumour diagnostics, for example |
| SAE | Two-layered deep neural networks that learn by reducing and reconstructing input data | SAE is an unsupervised technique that has the potential to improve efficiency in data pre-processing; for example, SAEs have been utilized for multi-organ detection and segmentation on 3D and 4D dynamic contrast-enhanced MRI |
| GAN | A type of CNN with two neural networks pitted against each other using a generative network, which produces synthetic samples based on input data to fool a discriminator that tries to differentiate between the real and synthetic data | GANs can be envisioned as a workaround to low-volume data in rare adrenal cancers via synthetic data generation; their use has been demonstrated in thyroid nodule analysis and in consistency determinations of the pituitary and endocrine pancreas |
ANN, artificial neural network; CNN, convolutional neural network; GAN, generative adversarial network; SAE, stacked auto-encoder; SVM, support vector machine.
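As a concrete illustration of one entry in the table, the following is a minimal k-means sketch that partitions 1-D pixel intensities into two clusters, mimicking a crude intensity-based segmentation of a bright nodule against darker background. It is illustrative only; the thyroid-nodule work cited above operates on full 2D ultrasound images with richer features.

```python
import numpy as np

def kmeans(pixels, k=2, iters=20):
    """Minimal k-means on 1-D pixel intensities, initialized by quantiles."""
    centroids = np.quantile(pixels, np.linspace(0, 1, k))
    for _ in range(iters):
        # assignment step: each pixel joins its nearest centroid
        labels = np.abs(pixels[:, None] - centroids[None, :]).argmin(axis=1)
        # update step: recompute each centroid as its cluster mean
        new = []
        for j in range(k):
            members = pixels[labels == j]
            new.append(members.mean() if members.size else centroids[j])
        centroids = np.array(new)
    return labels, centroids

# bimodal intensities: dark background pixels vs bright 'nodule' pixels
rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(40, 5, 300), rng.normal(180, 5, 100)])
labels, centroids = kmeans(pixels, k=2)
print(sorted(centroids.round()))  # centroids recover the two intensity modes
```

Because the number of clusters k must be specified in advance, the technique is unsupervised but not fully automatic, which is why the table pairs it with annotation and segmentation tasks rather than diagnosis.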
Fig. 3A convolutional neural network.
The input is a medical image to which an overlaying grid and a kernel matrix (for example, 3 × 3) are applied. The matrix feature maps to a smaller area on a stacked convolution layer. Another smaller kernel matrix (for example, 2 × 2) is pulled from a different area on that convolutional layer to a pooling layer. This pipeline then coalesces into a classification region with the ‘fully connected’ layers, which will yield an output.
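The convolution-then-pooling pipeline in the figure can be sketched in a few lines. This is a toy single-channel example with one hand-chosen kernel (a vertical-edge detector, an assumption for illustration); in a trained CNN the kernel weights are learned and many such filters are stacked.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (really cross-correlation, as in most CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # stand-in for a medical image
kernel = np.array([[-1., 0., 1.]] * 3)              # 3x3 vertical-edge detector
fmap = np.maximum(conv2d(image, kernel), 0)         # convolution layer + ReLU
pooled = max_pool(fmap)                             # 2x2 pooling layer
print(fmap.shape, pooled.shape)  # → (4, 4) (2, 2)
```

The fully connected layers at the end of the figure would then flatten `pooled` and map it to class probabilities, typically via a softmax.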
Key evaluation metrics for computer vision applications in medicine
| Development phase | Principle | Description | |
|---|---|---|---|
| Data management | Pre-processing | Acquisition | Studies should disclose protocols used to obtain medical images as these can be different across institutions; variations in imaging machines, positioning, image capture and slicing, and data formats can limit generalizability; augmentation of acquisition protocols through automation can improve standardization |
| Segmentation | Refers to the process of making images machine-readable through annotation of ROIs, which can be performed manually or automatically; protocols can be subject to inter-observer and inter-study variability (such as whole tumour versus axial ROIs) | ||
| Heterogeneity | Refers to the sample data mix; ideally, data would include a multi-institutional and representative set of experimental and control images with both typical and atypical cases; publishing the data distributions for pathologies or demographics included in model training can help to mitigate these concerns | ||
| Size | With increasing dimensionality, models need more data for generalization; researchers can use model-specific or post hoc thresholds in performance cut-offs on validation to ‘power’ their studies but these processes are variable in practice; sample size determination practices should be reported in research studies; future work should assess for possible best practices in post hoc techniques for sample size determination | ||
| Training | Reference standard | A degree of uncertainty exists in the ground truth condition in clinical diagnosis; sample biopsy tends to be the gold standard in cancer diagnostics, but diagnosis by a specific biomarker, imaging finding or clinical criteria might be more appropriate to the clinical question and/or institutional resources; uniform reference standards are needed for endocrine neoplasms in cases where biopsy is not routinely obtained, such as small adrenal or pituitary masses | |
| Data separation | Training and validation sets must be kept separate; failure to do so limits the generalizability of findings | ||
| Testing and/or validation | Performance: efficacy (diagnostic performance); safety (potential untoward effects on overall patient health or well being); fairness (equitable algorithm performance across populations) | An expert radiologist comparison can be used to infer the clinical relevance of algorithm performance; retrospective and prospective experimental designs are typically used, with prospective studies less prone to memory bias (internal test sets) and selection bias; in algorithms intended for autonomous use in diagnosis or other high-risk applications, randomized clinical trials might be warranted to assess for efficacy, safety and fairness | |
| Implementation and quality control | Generalizability | Institutions should assess how algorithms perform in their respective clinical populations; ideally, all studies would be tested on a distinct, external dataset prior to implementation to infer generalizability; baseline variation in radiologist skill level across institutions can muddy comparisons; drawing from experts across different institutions as well as including a consensus agreement on ‘highly experienced’ expert level in existing reporting guidelines could help in assessments of model generalizability | |
| Longevity | Model performance can degrade over time owing to changing health infrastructure, cyber sabotage or shifts in population characteristics; continued performance auditing across the algorithm’s life cycle is indicated | ||
| Utility | The number of algorithms being developed to assist clinical diagnostics is exploding to the point where it can constrain bandwidth, clutter interfaces and overwhelm providers; moving forward, there will be a need for inventories of models that can guide clinical stewardship efforts to curtail their excessive use | ||
ROIs, regions of interest.
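The performance criteria above reduce, at test time, to a handful of confusion-matrix metrics. The following sketch computes the ones most often reported for diagnostic classifiers (the toy labels and predictions are invented for illustration).

```python
import numpy as np

def diagnostic_metrics(y_true, y_pred):
    """Basic test-set metrics for a binary (benign=0 / malignant=1) classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    return {
        "sensitivity": tp / (tp + fn),   # recall on malignant cases
        "specificity": tn / (tn + fp),   # recall on benign cases
        "accuracy": (tp + tn) / len(y_true),
    }

# held-out test labels vs model predictions (toy example)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
m = diagnostic_metrics(y_true, y_pred)
print(m)
```

Note that sensitivity and specificity alone do not address the fairness criterion in the table; that requires computing the same metrics stratified by population subgroup.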
Major studies in AI imaging for endocrine cancer diagnostics
| Study and yeara | Task | Modality | Model type or package | Study design | Training data size; test data size | External testing | Compared with expert? | Reference standard |
|---|---|---|---|---|---|---|---|---|
| Romeo et al. | Classify LRA vs LPA vs NAL | MRI | J48 (Weka) | Retrospective | 80 | No | Yes | LRA and LPA by imaging; NAL by pathology |
| Yi et al. | Classify LPA vs sPHEO | CT | LASSO | Retrospective | 212; 53 | No | No | Pathology |
| Yi et al. | Classify LPA vs sPHEO | CT | MaZda-B11 | Retrospective | 110 | No | No | Pathology |
| Barstugan et al. | Classify lesion benign vs malignant and by type (AA vs cyst vs LP vs MET) | MRI | SVM, ANN | Retrospective | 112 | No | No | Imaging |
| Elmohr et al. | Classify benign vs malignant large adrenal tumours | CT | RF | Retrospective | 54 | No | Yes | Pathology |
| Koyuncu et al. | Classify benign vs malignant (AA, haematoma, LP, PHEO, MET) | CT | ANN | Retrospective | 57; 57 | No | No | Imaging |
| Choi et al. | Predict grade G1 vs G2 or G3 PNET | CT | MISSTA package | Retrospective | 66 | No | Yes | Pathology |
| Gao and Wang | Predict grade G1 vs G2 vs G3 PNET | MRI | GAN, CNN | Retrospective | NR | Yes | No | Pathology |
| Gu et al. | Predict grade G1 vs G2 or G3 PNET | CT | RF | Retrospective | 104; 34 | No | No | Pathology |
| Liang et al. | Predict grade G1 vs G2 or G3 PNET | CT | LASSO | Retrospective | 86; 51 | Yes | No | Pathology |
| Luo et al. | Predict grade G1 or G2 vs G3 PNET | CT | CNN | Retrospective | 93; 19 | Yes | No | Pathology |
| Zhao et al. | Predict grade G1 vs G2 PNET | CT | SVM | Retrospective | 59; 40 | No | No | Pathology |
| Kitajima et al. | Differentiate PA vs CP vs RCC | MRI | ANN | Retrospective | 43 | No | Yes | Pathology |
| Zhang et al. | Differentiate NCA vs other NFPA | MRI | SVM | Retrospective | 75; 37 | No | No | Pathology |
| Fan et al. | Differentiate PA consistency | MRI | SVM | Prospective | 100; 58 | Yes | No | Clinical criteria and surgical video |
| Niu et al. | Preoperative prediction of PA cavernous sinus invasion | MRI | LASSO, SVM | Retrospective | 97; 97 | No | No | Surgeon postoperation evaluation |
| Qian et al. | Differentiate PA vs other (sellar lesions or healthy) | MRI | CNN | Retrospective | 5,164; 1,393 | No | No | Clinical diagnosis |
| Zhu et al. | Differentiate PA consistency | MRI | GAN, CNN, CRNN | Retrospective | 70%; 30% | No | No | Imaging |
| Zhu et al. | Differentiate benign vs malignant | US | ANN | Retrospective | 464; 225 | No | No | Pathology |
| Buda et al. | Differentiate benign vs malignant | US | CNN | Retrospective | 1,278; 99 | No | Yes | Pathology |
| Li et al. | Differentiate benign vs malignant | US | CNN | Retrospective | 312,399; 19,781 | Yes | Yes | Pathology |
| Song et al. | Detect and differentiate benign vs malignant | US | CNN | Retrospective | 6,228; 367 | Yes | Yes | Pathology |
| Song et al. | Differentiate benign vs malignant | US | CNN | Prospective | 1,358; 100 | Yes | No | Pathology |
| Wang et al. | Predict aggressiveness | MRI | LASSO, GBC | Prospective | 96; 24 | No | No | Pathology |
AA, adrenal adenoma; AI, artificial intelligence; ANN, artificial neural network; CA, carcinoma; CNN, convolutional neural network; CP, craniopharyngioma; CRNN, convolutional recurrent neural network; G, grade; GAN, generative adversarial network; GBC, Gradient Boosting Classifier; LASSO, least absolute shrinkage and selection operator; LPA, lipid-poor adenoma; LRA, lipid-rich adenoma; LP, lipoma; MET, metastases; MISSTA, Medical Imaging Solution for Segmentation and Texture Analysis; NAL, non-adenomatous lesion; NCA, null cell adenoma; NFPA, non-functioning pituitary adenoma; NR, not reported; PA, pituitary adenoma; PHEO, pheochromocytoma; PNET, pancreatic neuroendocrine tumour; RF, random forest; RCC, Rathke cleft cyst; sPHEO, subclinical phaeochromocytoma; SVM, support vector machine; US, ultrasonography. aThis list is not exhaustive and provides a selection of key studies.
Fig. 4Real-time analytics with automatic picture archiving and communications systems integration.
The system named DICOM Image Analysis and Archive (DIANA) is an automated workflow solution developed by the authors’ group that provides a programming interface with the hospital picture archiving and communications systems (PACS) to streamline clinical artificial intelligence (AI) research[176]. DIANA has facilitated near-real-time monitoring of acquired images, large data queries and post-processing analyses. More importantly, DIANA is integrated with the machine learning algorithms developed for various applications. The future goal is to integrate AI endocrine cancer diagnostics (such as adrenal adenoma and pituitary adenoma) in this or other systems. HTTP, hypertext transfer protocol; PHI, protected health information. Figure 4 is adapted from ref.[176], Springer Nature Limited.
Fig. 5Exploring alternative computing platforms.
Centralized, distributed, decentralized and quantum computing frameworks are shown. The centralized network panel has a node with spokes spreading outward, representing a single, consolidated platform such as a local (on-site data centre) or remote (cloud) server. The distributed network panel shows a net-like pattern of equally spaced nodes; such a platform, with multiple local servers or devices, can be used for collaborative model training techniques such as cyclic weight transfer. The decentralized network panel has multiple centralized nodes connected in a net-like pattern; federated learning is one training paradigm that uses this platform. Previous studies[177,178] depicted quantum networks as two nodes, with the cutout region between the nodes illustrating the induction of dependent quantum states in two particles (A and B, where S refers to a shared source of squeezed light); this particle ‘entanglement’ is at the crux of quantum communications. Adapted from ref.[177], Springer Nature Limited. The quantum network is reprinted from ref.[178], CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
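The federated learning paradigm mentioned for the decentralized panel can be sketched as federated averaging: each 'institution' trains locally on data that never leaves its site, and a server averages the resulting model weights. This toy uses logistic regression on synthetic data; the client data, model, and round counts are all invented for illustration.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=50):
    """One client's local training: logistic regression via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)      # gradient step on logistic loss
    return w

def federated_average(client_weights, sizes):
    """Server step: average client models, weighted by local dataset size."""
    return np.average(np.stack(client_weights), axis=0, weights=np.asarray(sizes, float))

# three 'institutions', each with private data drawn from the same task
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(200, 2))
    y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(10):                            # federated rounds
    local_ws = [local_update(w, X, y) for X, y in clients]
    w = federated_average(local_ws, [len(y) for _, y in clients])
print(np.sign(w))  # recovers the sign pattern of true_w
```

Only the weight vectors cross the network, which is the property that makes the decentralized topology attractive when patient images cannot be pooled across institutions.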