Nicole M Thomasian1, Ihab R Kamel2, Harrison X Bai3.
Abstract
Artificial intelligence (AI) has illuminated a clear path towards an evolving health-care system replete with enhanced precision and computing capabilities. Medical imaging analysis can be strengthened by machine learning as the multidimensional data generated by imaging naturally lends itself to hierarchical classification. In this Review, we describe the role of machine intelligence in image-based endocrine cancer diagnostics. We first provide a brief overview of AI and consider its intuitive incorporation into the clinical workflow. We then discuss how AI can be applied for the characterization of adrenal, pancreatic, pituitary and thyroid masses in order to support clinicians in their diagnostic interpretations. This Review also puts forth a number of key evaluation criteria for machine learning in medicine that physicians can use in their appraisals of these algorithms. We identify mitigation strategies to address ongoing challenges around data availability and model interpretability in the context of endocrine cancer diagnosis. Finally, we delve into frontiers in systems integration for AI, discussing automated pipelines and evolving computing platforms that leverage distributed, decentralized and quantum techniques.Entities:
Mesh:
Year: 2021 PMID: 34754064 PMCID: PMC8576465 DOI: 10.1038/s41574-021-00543-9
Source DB: PubMed Journal: Nat Rev Endocrinol ISSN: 1759-5029 Impact factor: 43.330
Fig. 1Integrative diagnostics.
The convergence of different omics data with clinical intuition. Endocrinologists communicate with patients and radiologists to gain a clinical overview of their patient. Four arms give a holistic overview of disease: radiomics (for example, CT or MRI), pathomics (for example, histology of tissue samples), genomics and phenomics (for example, digital health mobile phone applications and wearable trackers). An artificial intelligence algorithm (such as a deep neural network, seen in the centre) synthesizes all the information and provides a diagnostic classification.
Fig. 2Computer vision workflow.
The four main steps in the conventional machine learning workflow are image acquisition, segmentation, feature extraction, and feature selection and analysis. Segmentation involves delineating the region of interest in the image, and feature extraction identifies pixel features that are then analysed graphically. A radiomics signature is the final output. Either machine learning or deep learning can be used for feature extraction and engineering, including the identification of pixel intensity, lesion shape, texture feature matrices and wavelets. Conventional machine learning algorithms must follow this pathway of acquisition, segmentation, feature extraction and feature selection. By contrast, deep learning can circumvent this process altogether with end-to-end processing from inputs to outputs.
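To make the conventional pipeline concrete, the following is a minimal sketch of the segmentation and feature extraction steps on a synthetic image; the particular features (mean and spread of intensity, pixel-count area, gradient-based texture) are illustrative stand-ins for the hand-crafted radiomic features described above, not the features used in any cited study.

```python
import numpy as np

def extract_features(image, mask):
    """Extract simple hand-crafted radiomic features from a segmented ROI."""
    roi = image[mask]                         # pixel intensities inside the ROI
    intensity_mean = roi.mean()               # first-order intensity feature
    intensity_std = roi.std()
    area = mask.sum()                         # crude shape feature (pixel count)
    # crude texture feature: mean absolute gradient inside the ROI
    gy, gx = np.gradient(image.astype(float))
    texture = np.abs(gy[mask]).mean() + np.abs(gx[mask]).mean()
    return np.array([intensity_mean, intensity_std, area, texture])

# synthetic 'acquired' image with a bright lesion in the centre
rng = np.random.default_rng(0)
image = rng.normal(100, 5, size=(64, 64))
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 8 ** 2   # the segmentation step
image[mask] += 40                                  # lesion is hyperintense

signature = extract_features(image, mask)          # the radiomics signature
print(signature.shape)  # → (4,)
```

In a real workflow, the feature-selection step would then prune such a signature (typically hundreds of features) before it is passed to a classifier; deep learning replaces the hand-crafted stages with learned ones.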
Examples of artificial intelligence techniques in endocrine cancer imaging
| Model | Description | Highlighted applications (not exhaustive) |
|---|---|---|
| SVM | A machine learning model that finds a ‘hyperplane’ or decision boundary to separate data of one class from another | SVMs are widely used for classification as a stand-alone approach |
| Random forest | A machine learning model made up of decision trees that classify using the combined predictions of trees in the ‘forest’ | Random forest classifiers, similar to SVMs, are also commonly used for feature extraction |
| k-means | A machine learning technique where the number of clusters is specified and the model partitions the data into non-overlapping groups | k-means can be used for unstructured image data processing, such as automated image detection and annotation; for example, it has been used for thyroid nodule segmentation on mobile devices |
| ANN | A model class designed to mimic the structure and behaviour of neurons in the brain with layers of nodes that activate based on inputs | ANNs are well suited to pattern recognition, making them good candidates for feature selection; for example, they have been used for MRI-based classification of malignant and benign adrenal masses |
| CNN | A deep neural network composed of layers that perform operations to sequentially abstract image features, followed by fully connected layers containing probability distributions for classification; some common subtypes include AlexNet, VGG, GoogLeNet, ResNet and U-Net, among others | CNNs excel at image feature learning and have been utilized in thyroid and pancreatic neuroendocrine tumour diagnostics, for example |
| SAE | Two-layered deep neural networks that learn by reducing and reconstructing input data | SAE is an unsupervised technique that has the potential to improve efficiency in data pre-processing; for example, SAEs have been utilized for multi-organ detection and segmentation on 3D and 4D dynamic contrast-enhanced MRI |
| GAN | A type of CNN with two neural networks pitted against each other using a generative network, which produces synthetic samples based on input data to fool a discriminator that tries to differentiate between the real and synthetic data | GANs can be envisioned as a workaround to low-volume data in rare adrenal cancers via synthetic data generation; their use has been demonstrated in thyroid nodule analysis and in consistency determinations of the pituitary and endocrine pancreas |
ANN, artificial neural network; CNN, convolutional neural network; GAN, generative adversarial network; SAE, stacked auto-encoder; SVM, support vector machine.
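As a concrete illustration of one entry in the table, the following is a minimal k-means sketch that partitions 1-D pixel intensities into two clusters, mimicking a crude intensity-based segmentation of a bright nodule against darker background. It is illustrative only; the thyroid-nodule work cited above operates on full 2D ultrasound images with richer features.

```python
import numpy as np

def kmeans(pixels, k=2, iters=20):
    """Minimal k-means on 1-D pixel intensities, initialized by quantiles."""
    centroids = np.quantile(pixels, np.linspace(0, 1, k))
    for _ in range(iters):
        # assignment step: each pixel joins its nearest centroid
        labels = np.abs(pixels[:, None] - centroids[None, :]).argmin(axis=1)
        # update step: recompute each centroid as its cluster mean
        new = []
        for j in range(k):
            members = pixels[labels == j]
            new.append(members.mean() if members.size else centroids[j])
        centroids = np.array(new)
    return labels, centroids

# bimodal intensities: dark background pixels vs bright 'nodule' pixels
rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(40, 5, 300), rng.normal(180, 5, 100)])
labels, centroids = kmeans(pixels, k=2)
print(sorted(centroids.round()))  # centroids recover the two intensity modes
```

Because the number of clusters k must be specified in advance, the technique is unsupervised but not fully automatic, which is why the table pairs it with annotation and segmentation tasks rather than diagnosis.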
Fig. 3A convolutional neural network.
The input is a medical image to which an overlaying grid and a kernel matrix (for example, 3 × 3) are applied. The matrix feature maps to a smaller area on a stacked convolution layer. Another smaller kernel matrix (for example, 2 × 2) is pulled from a different area on that convolutional layer to a pooling layer. This pipeline then coalesces into a classification region with the ‘fully connected’ layers, which will yield an output.
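The convolution-then-pooling pipeline in the figure can be sketched in a few lines. This is a toy single-channel example with one hand-chosen kernel (a vertical-edge detector, an assumption for illustration); in a trained CNN the kernel weights are learned and many such filters are stacked.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (really cross-correlation, as in most CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # stand-in for a medical image
kernel = np.array([[-1., 0., 1.]] * 3)              # 3x3 vertical-edge detector
fmap = np.maximum(conv2d(image, kernel), 0)         # convolution layer + ReLU
pooled = max_pool(fmap)                             # 2x2 pooling layer
print(fmap.shape, pooled.shape)  # → (4, 4) (2, 2)
```

The fully connected layers at the end of the figure would then flatten `pooled` and map it to class probabilities, typically via a softmax.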
Key evaluation metrics for computer vision applications in medicine
| Development phase | Principle | Description | |
|---|---|---|---|
| Data management | Pre-processing | Acquisition | Studies should disclose protocols used to obtain medical images as these can be different across institutions; variations in imaging machines, positioning, image capture and slicing, and data formats can limit generalizability; augmentation of acquisition protocols through automation can improve standardization |
| Segmentation | Refers to the process of making images machine-readable through annotation of ROIs, which can be performed manually or automatically; protocols can be subject to inter-observer and inter-study variability (such as whole tumour versus axial ROIs) | ||
| Heterogeneity | Refers to the sample data mix; ideally, data would include a multi-institutional and representative set of experimental and control images with both typical and atypical cases; publishing the data distributions for pathologies or demographics included in model training can help to mitigate these concerns | ||
| Size | With increasing dimensionality, models need more data for generalization; researchers can use model-specific or post hoc thresholds in performance cut-offs on validation to ‘power’ their studies but these processes are variable in practice; sample size determination practices should be reported in research studies; future work should assess for possible best practices in post hoc techniques for sample size determination | ||
| Training | Reference standard | A degree of uncertainty exists in the ground truth condition in clinical diagnosis; sample biopsy tends to be the gold standard in cancer diagnostics, but diagnosis by a specific biomarker, imaging finding or clinical criteria might be more appropriate to the clinical question and/or institutional resources; uniform reference standards are needed for endocrine neoplasms in cases where biopsy is not routinely obtained, such as small adrenal or pituitary masses | |
| Data separation | Training and validation sets must be kept separate; failure to do so limits the generalizability of findings | ||
| Testing and/or validation | Performance: efficacy (diagnostic performance); safety (potential untoward effects on overall patient health or well being); fairness (equitable algorithm performance across populations) | An expert radiologist comparison can be used to infer the clinical relevance of algorithm performance; retrospective and prospective experimental designs are typically used, with prospective studies less prone to memory bias (internal test sets) and selection bias; in algorithms intended for autonomous use in diagnosis or other high-risk applications, randomized clinical trials might be warranted to assess for efficacy, safety and fairness | |
| Implementation and quality control | Generalizability | Institutions should assess how algorithms perform in their respective clinical populations; ideally, all studies would be tested on a distinct, external dataset prior to implementation to infer generalizability; baseline variation in radiologist skill level across institutions can muddy comparisons; drawing from experts across different institutions as well as including a consensus agreement on ‘highly experienced’ expert level in existing reporting guidelines could help in assessments of model generalizability | |
| Longevity | Model performance can degrade over time owing to changing health infrastructure, cyber sabotage or shifts in population characteristics; continued performance auditing across the algorithm’s life cycle is indicated | ||
| Utility | The number of algorithms being developed to assist clinical diagnostics is exploding to the point where it can constrain bandwidth, clutter interfaces and overwhelm providers; moving forward, there will be a need for inventories of models that can guide clinical stewardship efforts to curtail their excessive use | ||
ROIs, regions of interest.
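The performance criteria above reduce, at test time, to a handful of confusion-matrix metrics. The following sketch computes the ones most often reported for diagnostic classifiers (the toy labels and predictions are invented for illustration).

```python
import numpy as np

def diagnostic_metrics(y_true, y_pred):
    """Basic test-set metrics for a binary (benign=0 / malignant=1) classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    return {
        "sensitivity": tp / (tp + fn),   # recall on malignant cases
        "specificity": tn / (tn + fp),   # recall on benign cases
        "accuracy": (tp + tn) / len(y_true),
    }

# held-out test labels vs model predictions (toy example)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
m = diagnostic_metrics(y_true, y_pred)
print(m)
```

Note that sensitivity and specificity alone do not address the fairness criterion in the table; that requires computing the same metrics stratified by population subgroup.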
Major studies in AI imaging for endocrine cancer diagnostics
| Study and yeara | Task | Modality | Model type or package | Study design | Training data size; test data size | External testing | Compared with expert? | Reference standard |
|---|---|---|---|---|---|---|---|---|
| Romeo et al. | Classify LRA vs LPA vs NAL | MRI | J48 (Weka) | Retrospective | 80 | No | Yes | LRA and LPA by imaging; NAL by pathology |
| Yi et al. | Classify LPA vs sPHEO | CT | LASSO | Retrospective | 212; 53 | No | No | Pathology |
| Yi et al. | Classify LPA vs sPHEO | CT | MaZda-B11 | Retrospective | 110 | No | No | Pathology |
| Barstugan et al. | Classify lesion benign vs malignant and by type (AA vs cyst vs LP vs MET) | MRI | SVM, ANN | Retrospective | 112 | No | No | Imaging |
| Elmohr et al. | Classify benign vs malignant large adrenal tumours | CT | RF | Retrospective | 54 | No | Yes | Pathology |
| Koyuncu et al. | Classify benign vs malignant (AA, haematoma, LP, PHEO, MET) | CT | ANN | Retrospective | 57; 57 | No | No | Imaging |
| Choi et al. | Predict grade G1 vs G2 or G3 PNET | CT | MISSTA package | Retrospective | 66 | No | Yes | Pathology |
| Gao and Wang | Predict grade G1 vs G2 vs G3 PNET | MRI | GAN, CNN | Retrospective | NR | Yes | No | Pathology |
| Gu et al. | Predict grade G1 vs G2 or G3 PNET | CT | RF | Retrospective | 104; 34 | No | No | Pathology |
| Liang et al. | Predict grade G1 vs G2 or G3 PNET | CT | LASSO | Retrospective | 86; 51 | Yes | No | Pathology |
| Luo et al. | Predict grade G1 or G2 vs G3 PNET | CT | CNN | Retrospective | 93; 19 | Yes | No | Pathology |
| Zhao et al. | Predict grade G1 vs G2 PNET | CT | SVM | Retrospective | 59; 40 | No | No | Pathology |
| Kitajima et al. | Differentiate PA vs CP vs RCC | MRI | ANN | Retrospective | 43 | No | Yes | Pathology |
| Zhang et al. | Differentiate NCA vs other NFPA | MRI | SVM | Retrospective | 75; 37 | No | No | Pathology |
| Fan et al. | Differentiate PA consistency | MRI | SVM | Prospective | 100; 58 | Yes | No | Clinical criteria and surgical video |
| Niu et al. | Preoperative prediction of PA cavernous sinus invasion | MRI | LASSO, SVM | Retrospective | 97; 97 | No | No | Surgeon postoperation evaluation |
| Qian et al. | Differentiate PA vs other (sellar lesions or healthy) | MRI | CNN | Retrospective | 5,164; 1,393 | No | No | Clinical diagnosis |
| Zhu et al. | Differentiate PA consistency | MRI | GAN, CNN, CRNN | Retrospective | 70%; 30% | No | No | Imaging |
| Zhu et al. | Differentiate benign vs malignant | US | ANN | Retrospective | 464; 225 | No | No | Pathology |
| Buda et al. | Differentiate benign vs malignant | US | CNN | Retrospective | 1,278; 99 | No | Yes | Pathology |
| Li et al. | Differentiate benign vs malignant | US | CNN | Retrospective | 312,399; 19,781 | Yes | Yes | Pathology |
| Song et al. | Detect and differentiate benign vs malignant | US | CNN | Retrospective | 6,228; 367 | Yes | Yes | Pathology |
| Song et al. | Differentiate benign vs malignant | US | CNN | Prospective | 1,358; 100 | Yes | No | Pathology |
| Wang et al. | Predict aggressiveness | MRI | LASSO, GBC | Prospective | 96; 24 | No | No | Pathology |
AA, adrenal adenoma; AI, artificial intelligence; ANN, artificial neural network; CA, carcinoma; CNN, convolutional neural network; CP, craniopharyngioma; CRNN, convolutional recurrent neural network; G, grade; GAN, generative adversarial network; GBC, Gradient Boosting Classifier; LASSO, least absolute shrinkage and selection operator; LPA, lipid-poor adenoma; LRA, lipid-rich adenoma; LP, lipoma; MET, metastases; MISSTA, Medical Imaging Solution for Segmentation and Texture Analysis; NAL, non-adenomatous lesion; NCA, null cell adenoma; NFPA, non-functioning pituitary adenoma; NR, not reported; PA, pituitary adenoma; PHEO, pheochromocytoma; PNET, pancreatic neuroendocrine tumour; RF, random forest; RCC, Rathke cleft cyst; sPHEO, subclinical phaeochromocytoma; SVM, support vector machine; US, ultrasonography. aThis list is not exhaustive and provides a selection of key studies.
Fig. 4Real-time analytics with automatic picture archiving and communications systems integration.
The system named DICOM Image Analysis and Archive (DIANA) is an automated workflow solution developed by the authors’ group that provides a programming interface with the hospital picture archiving and communications systems (PACS) to streamline clinical artificial intelligence (AI) research[176]. DIANA has facilitated near-real-time monitoring of acquired images, large data queries and post-processing analyses. More importantly, DIANA is integrated with the machine learning algorithms developed for various applications. The future goal is to integrate AI endocrine cancer diagnostics (such as adrenal adenoma and pituitary adenoma) in this or other systems. HTTP, hypertext transfer protocol; PHI, protected health information. Figure 4 is adapted from ref.[176], Springer Nature Limited.
Fig. 5Exploring alternative computing platforms.
Centralized, distributed, decentralized and quantum computing frameworks are shown. The centralized network panel has a node with spokes spreading outward, representing a single, consolidated platform such as a local (on-site data centre) or remote (cloud) server. The distributed network panel shows a net-like pattern of equally spaced nodes; such a platform, with multiple local servers or devices, can be used for collaborative model training techniques such as cyclic weight transfer. The decentralized network panel has multiple centralized nodes connected in a net-like pattern; federated learning is one training paradigm that uses this platform. Previous studies[177,178] depicted quantum networks as two nodes, with the cutout region between the nodes illustrating the induction of dependent quantum states in two particles (A and B, where S refers to a shared source of squeezed light); this particle ‘entanglement’ is at the crux of quantum communications. Adapted from ref.[177], Springer Nature Limited. The quantum network is reprinted from ref.[178], CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
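The federated learning paradigm mentioned for the decentralized panel can be sketched as federated averaging: each 'institution' trains locally on data that never leaves its site, and a server averages the resulting model weights. This toy uses logistic regression on synthetic data; the client data, model, and round counts are all invented for illustration.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=50):
    """One client's local training: logistic regression via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)      # gradient step on logistic loss
    return w

def federated_average(client_weights, sizes):
    """Server step: average client models, weighted by local dataset size."""
    return np.average(np.stack(client_weights), axis=0, weights=np.asarray(sizes, float))

# three 'institutions', each with private data drawn from the same task
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(200, 2))
    y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(10):                            # federated rounds
    local_ws = [local_update(w, X, y) for X, y in clients]
    w = federated_average(local_ws, [len(y) for _, y in clients])
print(np.sign(w))  # recovers the sign pattern of true_w
```

Only the weight vectors cross the network, which is the property that makes the decentralized topology attractive when patient images cannot be pooled across institutions.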