Mara Graziani1,2, Lidia Dutkiewicz3, Davide Calvaresi1, José Pereira Amorim4,5, Katerina Yordanova3, Mor Vered6, Rahul Nair7, Pedro Henriques Abreu4, Tobias Blanke8, Valeria Pulignano9, John O Prior10, Lode Lauwaert11, Wessel Reijers12, Adrien Depeursinge1,10, Vincent Andrearczyk1, Henning Müller1,13.
Abstract
Since its emergence in the 1960s, Artificial Intelligence (AI) has grown to conquer many technology products and their fields of application. Machine learning, as a major part of the current AI solutions, can learn from the data and through experience to reach high performance on various tasks. This growing success of AI algorithms has led to a need for interpretability to understand opaque models such as deep neural networks. Various requirements have been raised from different domains, together with numerous tools to debug, justify outcomes, and establish the safety, fairness and reliability of the models. This variety of tasks has led to inconsistencies in the terminology with, for instance, terms such as interpretable, explainable and transparent being often used interchangeably in methodology papers. These words, however, convey different meanings and are "weighted" differently across domains, for example in the technical and social sciences. In this paper, we propose an overarching terminology of interpretability of AI systems that can be referred to by the technical developers as much as by the social sciences community to pursue clarity and efficiency in the definition of regulations for ethical and reliable AI development. We show how our taxonomy and definition of interpretable AI differ from the ones in previous research and how they apply with high versatility to several domains and use cases, proposing a highly needed standard for the communication among interdisciplinary areas of AI.
Keywords: Explainable artificial intelligence; Interpretability; Machine learning
Year: 2022 PMID: 36092822 PMCID: PMC9446618 DOI: 10.1007/s10462-022-10256-8
Source DB: PubMed Journal: Artif Intell Rev ISSN: 0269-2821 Impact factor: 9.588
Fig. 1 Trends of the publications containing “interpretable AI” or “explainable AI” as keywords
Fig. 2 Graphical representation of Artificial Intelligence, Machine Learning, and Deep Learning adapted from https://www.intel.com
Analysis of the etymology of the terms related to interpretability
| ID | Word | Etymology | Meaning | ML Definition |
|---|---|---|---|---|
| 1 | Interpretability, Interpretable | From late Latin interpretabilitis from Latin interprĕtor, interprĕtāri (to interpret) | To interpret, comment, explain, expose, illustrate, to translate | To translate, expose, and comment on the generation process of one or multiple ML systems outcomes, making the overall process understandable by a human |
| 2 | Explainability, Explainable | From 1600 use of explain + -able adapted from Latin explāno, explānāre | To explain, clarify, expose, illustrate, state clearly | To indicate with precision, to illustrate what features or high-level concepts were used by the ML system to generate predictions for one or multiple inputs. In intelligent agent systems: possibly iterative process of symbolic knowledge manipulation to make it interpretable |
| 3 | Transparency, Transparent | Medieval Latin adaptation of the words trans (on the other side) and pārĕo, pārēre (to appear, to show) | To see through | A |
| 4 | Intelligibility, Intelligible | From Latin intellegibilis, intellegibilis, II class adjective | To understand, comprehend, decipher | An intelligible ML system is an understandable system with inherent interpretability |
| 5 | Accountability, Accountable | From 1770 use of accountable + -ity, adapted from Old French acont derived from Latin compŭto, compŭtāre, which has multiple meanings including to count, to estimate, to judge and to believe. | Used from the 1610s with the sense of “rendering an account", meaning providing a statement answering for conduct. | An accountable ML system is expected to justify its outcomes and behavior |
| 6 | Reliability, Reliable | From Scottish of the 1560s “raliabill", derived from Old French relier a derivation of the Latin rĕlĭgo, rĕlĭgāre (meaning to tie, to bind). | From the 1570s used with the sense of to depend, to trust, typically used in the expression “to rely on something/someone". | To be consistently good and be worthy of trust |
| 7 | Auditability, Auditable | From Latin noun auditŭs, auditŭs | The sense of hearing, the act of hearing, audition. Used in the sense of official audience, judicial hearing or examination. | An “auditable" ML system should provide information on how to perform an official audience of the model. For example, this can be done by providing extra documentation and functionalities. |
| 8 | Liability, Liable | From Anglo-French liable, derived from Latin lĭgo, lĭgāre (to tie, to bind) | Legal responsibility for acts. | Legal liability of a product implementing ML, particularly in the case where something goes wrong |
| 9 | Robustness, Robust | From French robuste, derived from Latin robustus, robustum. | The literal meaning is oaken, made of oak. Used in the figurative sense of strong, vigorous and resistant. | Robust ML systems are resistant, secure and reliable, providing consistent results even under adversarial attacks, variations in the dataset, domain shifts, and outliers |
Multiple taxonomies, part 1
| Interpretable | Explainable | Transparent | Intelligible | Refs. |
|---|---|---|---|---|
| The system operations can be understood by a human, either through introspection or through a produced explanation | To show the rationale behind each step in the decision. It is linked to justification and affects user acceptance and satisfaction | Not mentioned | Not mentioned, although they refer to introspective explanations | Biran and Cotton |
| Ability to explain or to present in understandable terms to a human | Not mentioned | Not mentioned | Not mentioned | Doshi-Velez and Kim |
| A non-monolithic concept reflecting several distinct ideas. | Solely intended as post-hoc interpretability. Post-hoc explanations can be verbal, and visual | Understanding the mechanism by which the model works. Related to simulatability and decomposability. | Understandable models are sometimes called transparent | Lipton |
| A mapping of an abstract concept into a domain that the human can make sense of | Collection of features [ | Achievable by both interpreting and explaining ML outcomes | Post-hoc interpretability should be contrasted to incorporate interpretability into the structure of the model. | Montavon et al. |
| Used more frequently than “explainable” by the ML community, referring to a powerful tool for justifying AI-based decisions | Not mentioned | Not mentioned | Understandability is characterized by no means of understanding the internal model functioning. Understandable is different from intelligible | Adadi and Berrada |
| The level to which an agent gains and can make use of both the information embedded within explanations given by the system and the information provided by the system’s transparency level | The level to which a system can provide clarification for the cause of its decisions/outputs | The level to which a system provides information about its internal workings or structure and the data it has been trained with | Not mentioned. | Tomsett et al. |
| Equated with “explainability”, it defines the degree to which an observer can understand the cause of a decision | Establishing an interaction between the explainer and the explainee (i.e. the subject on the receiving end of an explanation), that is contextual and selective, based on small subset of causes | Briefly mentioned as interlinked to trust | Not mentioned | Miller |
| Acknowledgment of multifaceted definitions from earlier studies | Answering “why" and “why not" questions to improve the user’s mental model of the system. In other cases, equated to interpretable | Providing explanations on how the system works, clearly describing model structure, equations, parameter values and assumptions | A system that is “clear enough to be understood". It is challenging to understand how an AI system should be defined in order to be “intelligible" since this would require the clarification of “complex computational processes to various types of users" | Clinciu and Hastie |
| Broadly defined, referring to the extraction of relevant knowledge (visualization, language, or equation) about domain relationships contained in the data. | Used as a synonym of interpreting | A feature engineering process to enhance the analysis of model interpretability | Not mentioned | Murdoch et al. |
Fig. 3 Differences of definitions in other domains than ML development. In this diagram, interpretable is equated to explainable since most of the social domains equate the two terms for simplicity
Taxonomy of Interpretable AI for the social and technical sciences
| Terminology | Definition in AI | Family of AI systems (technical) |
|---|---|---|
| Interpretability | (global) AI interpretability defines those AI systems for which it is possible to translate the working principles and outcomes in human-understandable language without affecting the validity of the system | Three families of AI systems may be identified by interpretable AI. These are (i) AI systems with built-in interpretability (ii) AI systems that are inherently interpretable (iii) AI systems that were explained by post-hoc methods. More details on these families in Table |
| | (EU law) AI interpretability defines the supply of meaningful information about the underlying logic, significance and envisaged consequences of the AI system | – |
| | (symbolic AI) AI interpretability includes explanations of the symbolic AI systems in symbolic language | – |
| | (sociology) AI interpretability must define a social relationship of trust between the human and the machine | – |
| Interpretability by design | (global) The translation of the system’s working principles and outcomes into human-understandable language is provided directly by the AI-system itself, interpretability being one of the tasks of the system | Two families of systems may be identified, namely (i) systems with a transparent design (e.g. introducing parameter sparsity, implementing monotonic functions Nguyen and Martínez ( |
| Post-hoc interpretability | (global) The AI system is neither inherently interpretable nor interpretable by-design, rather additional analyses are performed to generate explanations without re-training the model parameters | Six families of post-hoc interpretability methods can be identified based on the form of the generated explanations into (i) feature attribution (ii) feature visualization (iii) concept attribution (iv) surrogate explanations (v) case-based explanations and (vi) textual explanations. For further details on these categories we refer the reader to Arrieta et al. ( |
| Local interpretability | (technical) Local interpretability is provided when interpretability analysis is performed on the system’s outcome for a single input | The family of feature attribution methods contains several approaches that provide local interpretability Ribeiro et al. ( |
| Global interpretability | (technical) Global interpretability is provided when interpretability analysis is performed to explain the system behavior for a set of inputs corresponding to an entire class or multiple classes | Post-hoc interpretability methods may provide global interpretability, such as distillation techniques Frosst and Hinton ( |
| Explainability | (global) Explainable AI, also denoted as XAI, defines the branch of AI research that focuses on generating explanations for complex AI systems | The six families of post-hoc interpretability methods known as feature attribution, feature visualization, concept attribution, surrogate, case-based and textual explanations are addressed as explainable AI. |
| Transparency | (global) Transparency is used in AI to characterize those systems for which the role of internal components, paradigms and overall behaviour is known and can be simulated | The family of linear regression models and decision trees in low dimension are transparent and can be simulated |
Brackets specify the domain in which each definition applies. Global marks a definition common to both the social and technical sciences
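The distinction between local and global interpretability, and the notion of a transparent model that can be simulated, can be illustrated with a toy linear model. The following is only a minimal sketch under stated assumptions: the weights, the bias, and the function names `predict`, `local_attribution` and `global_attribution` are hypothetical and do not come from the paper, and the product of weight and feature value is used here as one simple form of feature attribution among many.

```python
from typing import List, Sequence

# Hypothetical toy model: the weights and bias are illustrative values,
# not taken from the paper.
WEIGHTS: List[float] = [2.0, -1.0, 0.5]
BIAS: float = 0.25


def predict(x: Sequence[float]) -> float:
    """Linear model that a person can simulate by hand (transparent)."""
    return BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))


def local_attribution(x: Sequence[float]) -> List[float]:
    """Local view: contribution of each feature to the output for ONE input."""
    return [w * xi for w, xi in zip(WEIGHTS, x)]


def global_attribution(inputs: Sequence[Sequence[float]]) -> List[float]:
    """Global view: mean absolute contribution per feature over a SET of inputs."""
    per_input = [local_attribution(x) for x in inputs]
    n = len(per_input)
    return [sum(abs(row[j]) for row in per_input) / n for j in range(len(WEIGHTS))]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    print(predict(x))
    print(local_attribution(x))
    print(global_attribution([x, [3.0, 0.0, -4.0]]))
```

In this sketch the local attributions plus the bias add up exactly to the prediction for that input, which is what makes the model easy to simulate; the global summary instead aggregates over several inputs and says nothing about any single one.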
Definitions of families of interpretability techniques
| Scope | Family | Definition |
|---|---|---|
| Inherent Interpretability | Interpretable Model | Models that are considered interpretable due to their low complexity and simple structure. |
| | Black-box Model | Models that are considered hard to interpret due to their high complexity and complicated structure. |
| Global Interpretability | Feature Visualization (Nguyen et al.) | Synthesis of new instances that help visualize features learned by the model or a specific part of the model. |
| | Prototype, Criticism (Kim et al.) | A prototype is a data instance that is representative of all the data. A criticism is a data instance that is not well represented by the set of prototypes. |
| | Influential Instances (Koh and Liang) | Data instances whose removal has a strong effect on the trained model. |
| | Dependency Plot | Depicts the functional relationship between a small number of input variables and predictions. |
| | Global Surrogate (Hinton et al.) | Interpretable model that is trained to approximate the predictions of a black-box model. |
| | Concept Attribution (Kim et al.) | Explains the model’s behavior based on user-friendly concepts. |
| | Feature Importance (Lundberg and Lee) | Assigns a score to input features based on how useful they are at predicting a target variable. |
| Local Interpretability | Local Surrogate (Ribeiro et al.) | Local surrogate models are interpretable models that are used to explain individual predictions of black-box models. |
| | Saliency Map (Selvaraju et al.) | Highlights the pixels that were relevant for a certain image prediction. |
| | Counterfactual Example (Wachter et al.) | A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output. |
| | Adversarial Example (Goodfellow et al.) | An adversarial example is an instance with small, intentional feature perturbations that cause an ML model to make a false prediction. |
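The counterfactual definition in the table above (the smallest change to the feature values that changes the prediction) has a closed form in the special case of a linear binary classifier. The sketch below assumes such a classifier with made-up weights and measures "smallest" by Euclidean distance; it is not the method of the cited work, and the names `score`, `classify` and `counterfactual` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical linear classifier; weights and bias are illustrative only.
WEIGHTS: List[float] = [3.0, 4.0]
BIAS: float = -5.0


def score(x: Sequence[float]) -> float:
    """Signed score; its sign decides the predicted class."""
    return BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))


def classify(x: Sequence[float]) -> int:
    """Class 1 if the score is positive, class 0 otherwise."""
    return 1 if score(x) > 0 else 0


def counterfactual(x: Sequence[float], margin: float = 1e-6) -> List[float]:
    """Move x along the weight vector just past the decision boundary.

    For a linear score this is the shortest Euclidean move that flips
    the class; `margin` is a small assumed overshoot past the boundary.
    """
    s = score(x)
    target = -margin if s > 0 else margin  # land just on the other side
    norm_sq = sum(w * w for w in WEIGHTS)
    step = (target - s) / norm_sq
    return [xi + step * w for xi, w in zip(x, WEIGHTS)]


if __name__ == "__main__":
    x = [0.0, 0.0]
    x_cf = counterfactual(x)
    change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_cf)))
    print(classify(x), classify(x_cf), change)
```

For non-linear black-box models no such closed form exists in general, and a search or optimization procedure would be needed instead.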
Classification of families of interpretability techniques
| Scope | Family | Interpretability | Explainability | Transparency | Intelligibility | Accountability | Auditability | Robustness |
|---|---|---|---|---|---|---|---|---|
| Inherent Interpretability | Interpretable Models | x | x | x | x | x | x | x |
| | Black-box Models | – | – | – | – | – | – | – |
| Global Interpretability | Feature Visualization | x | – | x | – | x | x | – |
| | Prototypes and Criticisms | x | – | x | – | x | x | x |
| | Influential Instances | x | x | x | – | x | x | x |
| | Dependency Plot | – | x | x | – | x | x | – |
| | Global Surrogate | x | x | x | x | x | x | – |
| | Concept Attribution | x | x | x | – | x | x | – |
| | Feature Importance | – | x | x | – | x | x | – |
| Local Interpretability | Local Surrogate | – | x | x | x | x | x | – |
| | Saliency Map | – | x | x | x | – | x | – |
| | Counterfactual Example | – | x | x | – | x | x | – |
| | Adversarial Example | – | – | – | – | – | x | x |
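The classification above can also be read as a lookup structure, for instance to list the families that address a given property. The sketch below transcribes the marks from the table; the string encoding (one character per property column, `x` for a mark and `-` for its absence) and the helper names `families_with` and `properties_of` are assumptions made for illustration only.

```python
from typing import Dict, List

# Property columns, in the same order as in the table.
PROPERTIES: List[str] = [
    "Interpretability", "Explainability", "Transparency", "Intelligibility",
    "Accountability", "Auditability", "Robustness",
]

# Assumed encoding: one character per property, "x" = marked, "-" = not marked.
MARKS: Dict[str, str] = {
    "Interpretable Models":      "xxxxxxx",
    "Black-box Models":          "-------",
    "Feature Visualization":     "x-x-xx-",
    "Prototypes and Criticisms": "x-x-xxx",
    "Influential Instances":     "xxx-xxx",
    "Dependency Plot":           "-xx-xx-",
    "Global Surrogate":          "xxxxxx-",
    "Concept Attribution":       "xxx-xx-",
    "Feature Importance":        "-xx-xx-",
    "Local Surrogate":           "-xxxxx-",
    "Saliency Map":              "-xxx-x-",
    "Counterfactual Example":    "-xx-xx-",
    "Adversarial Example":       "-----xx",
}


def families_with(prop: str) -> List[str]:
    """Families marked for the given property, in table order."""
    col = PROPERTIES.index(prop)
    return [family for family, marks in MARKS.items() if marks[col] == "x"]


def properties_of(family: str) -> List[str]:
    """Properties marked for the given family, in column order."""
    return [p for p, m in zip(PROPERTIES, MARKS[family]) if m == "x"]


if __name__ == "__main__":
    print(families_with("Robustness"))
    print(properties_of("Saliency Map"))
```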
Multiple taxonomies, part 2
| Interpretable | Explainable | Transparent | Intelligible | Refs. |
|---|---|---|---|---|
| Used interchangeably with explainable | Post-hoc explanations involve an auxiliary method after a model is trained. Self-explaining models generate local explanations that may not be directly interpretable | Not mentioned | A “directly interpretable" model, namely intrinsically understandable by most consumers | Arya et al. |
| It is a domain-specific notion that does not allow a general-purpose definition. An interpretable ML model is constrained in model form so that it is either useful to someone, or obeys structural knowledge of the domain [...] | Possibly unreliable and misleading, explanations are not faithful to what the original model computes. Often, they do not make sense nor do they provide enough detail to understand what the black box is doing | Fully transparent models are allowed to understand their variables and the related correlations | Not mentioned. | Rudin |
| It refers to the degree of human comprehensibility of a given black-box model or decision | It refers to the numerous ways of exchanging information about a phenomenon (a model’s functionality or the rationale and criteria for a decision) with multiple stakeholders | A model is transparent if its functionality can be comprehended in its entirety by a person | Not mentioned | Mittelstadt et al. |
| It is a passive characteristic of a model referring to the level at which it makes sense for a human observer (also referred to as transparency) | Any action or procedure to clarify the internal model functions | As in Lipton, described by Simulability, Decomposability and Algorithmic Transparency | Not mentioned. Understandable is different from intelligible | Chromik and Schuessler |
| It encompasses multiple concepts and definitions. Generally, it is associated with models with inherently interpretable behavior | It is intended as the generation of post-hoc explanations for black-box models | It is intended as an explanation of how the system works | Not mentioned | Arrieta et al. |
| Assigning meaning to an explanation | Process of describing one or more facts, facilitating the understanding of said facts by a human consumer | Not mentioned | Not mentioned | Palacio et al. |
| Assigning a subjective meaning to a model, object, or variable that is possible to be interpreted by the explainee | The activity of producing more interpretable objects manipulating symbolic information | Providing a clear representation of the black-box dynamics | Concerning the explainee, it is intended a successful consumption of an explanation | Ciatto et al. |