José Hernández-Orallo, Bao Sheng Loe, Lucy Cheke, Fernando Martínez-Plumed, Seán Ó hÉigeartaigh.
Abstract
Success in all sorts of situations is the most classical interpretation of general intelligence. Under limited resources, however, the capability of an agent must necessarily be limited too, and generality needs to be understood as comprehensive performance up to a level of difficulty. The degree of generality then refers to the way an agent's capability is distributed as a function of task difficulty. This dissects the notion of general intelligence into two non-populational measures, generality and capability, which we apply to individuals and groups of humans, other animals and AI systems, on several cognitive and perceptual tests. Our results indicate that generality and capability can decouple at the individual level: very specialised agents can show high capability and vice versa. The metrics also decouple at the population level, and we rarely see diminishing returns in generality for those groups of high capability. We relate the individual measure of generality to traditional notions of general intelligence and cognitive efficiency in humans, collectives, non-human animals and machines. The choice of the difficulty function now plays a prominent role in this new conception of generality, which brings a quantitative tool for shedding light on long-standing questions about the evolution of general intelligence and the evaluation of progress in Artificial General Intelligence.
Year: 2021 PMID: 34819537 PMCID: PMC8613222 DOI: 10.1038/s41598-021-01997-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The role of difficulty when Agent Characteristic Curves (ACCs) are derived from a result matrix, and the new metrics represented in the spread vs capability plot. Top: response matrix R, where rows represent the agents (also referred to as respondents or subjects) and columns represent the tasks (also referred to as items or instances, depending on the aggregation). The difficulty of each task (low, medium or high) is shown in parentheses just below the name of the task. While the subdomains may not be known, if we cluster the questions into four subdomains (A, B, C, D), we see that one agent completely neglects subdomain D. Bottom: the responses of the two agents plotted against difficulty (h) on the x-axis, one panel per agent. We also highlight how the two agents fare on the four subdomains into which the items are grouped. Having the same average result (0.625), which of the two agents is more general?
Figure 2. Agent characteristic curves (ACCs), showing agent responses in terms of difficulty (h). The responses are shown as grey circles. The means of these responses for each difficulty are shown in blue and connected to form an ACC. The plot shows the values for mean performance and its variance (the Bernoulli variance p(1−p) when responses are binary), capability, expected difficulty, spread and its reciprocal, generality. The more compact the curve is, like a single steep fall, the higher the generality. Note that generality cannot be calculated from curve slopes. Top: a human (left, subject #22) and a machine (right, ‘repdiff’), both on 25 instances of the letter-series problems described in the text. The human has higher generality and lower capability than the machine. Bottom: an aggregation of a human population (left) and an aggregation of Q-learning agents (right), both over 140 experiments from a benchmark comparing humans and simple reinforcement-learning algorithms[29]. With similar capability, the machines are more general.
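The captions above describe how an ACC reduces an agent's task results to a curve over difficulty, and how capability and spread summarise that curve. The sketch below is a minimal numerical illustration of that pipeline; the formulas (capability as the area under a uniformly gridded ACC, spread as the mean within-difficulty Bernoulli variance) and the toy agents `a` and `b` are illustrative stand-ins of our own, not the paper's exact definitions.

```python
import numpy as np

def acc_metrics(responses, difficulties):
    """Summarise one agent's results as ACC-style metrics.

    responses    : binary results (1 = success), one per task
    difficulties : difficulty level of each task

    Returns (acc, capability, spread). Illustrative formulas only:
    the ACC is the per-difficulty mean response; capability is the
    area under the ACC on a uniform difficulty grid; spread is the
    mean within-difficulty Bernoulli variance p*(1-p), so an agent
    that behaves like a clean step function (all easy tasks solved,
    all hard ones failed) gets spread 0, i.e. maximal generality.
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties)
    levels = np.unique(difficulties)          # sorted difficulty levels
    acc = np.array([responses[difficulties == h].mean() for h in levels])
    capability = float(acc.mean())
    spread = float(np.mean(acc * (1.0 - acc)))
    return acc, capability, spread

# Two toy agents with the same average result (0.75) but different profiles,
# echoing the "which agent is more general?" question of Fig. 1:
h = [1, 1, 2, 2, 3, 3, 4, 4]
a = [1, 1, 1, 1, 1, 1, 0, 0]   # fails only the two hardest tasks
b = [1, 0, 1, 1, 0, 1, 1, 1]   # same total score, failures scattered
for name, r in (("a", a), ("b", b)):
    acc, cap, spr = acc_metrics(r, h)
    print(f"{name}: ACC={acc}, capability={cap:.2f}, spread={spr:.3f}")
```

Under this toy spread measure, agent `a` (a clean step) comes out strictly more general than agent `b` at equal capability, matching the decoupling of capability and generality described in the abstract.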
Figure 3. Representation of the two individuals in Fig. 1 in the space of spread versus capability. As spread is the reciprocal of absolute generality, the lower the point, the more general the agent. The isometrics (dotted lines) represent the maximum normalised generality (flat at the bottom, in green), a constant ACC (in blue) and the minimum normalised generality (top, in red).
Figure 4. Capability vs spread for the intrinsic-difficulty scenarios. The space is characterised by three normalised-generality isometrics shown as dotted curves: the maximum spread in red (complete abstrusity), the minimum spread in green (complete generality) and a middle case where results are independent of difficulty (constant ACC). Top left: Elithorn's mazes, with results for all 496 participants. Top right: letter series for all participants (48 humans and 12 machines). Middle left: object-recognition scenario using the intrinsic difficulty derived from the psychophysical parameters. Middle right: 10 rats on the Odour Span Task, with difficulty being the number of scents to remember. Bottom: iris classification problem using 419 classifiers from study 7306 in OpenML (classifiers with accuracy below 0.35 removed). Left: capability vs spread using KDN difficulty. Right: using an alternative difficulty measure.
Table 1. Several modalities for applying generality analysis, depending on the situation and the transformations available. We show the input (size and types of the response matrix R and some other information), the minimum sizes (approximate rules of thumb in some cases, statistical analysis in others[43]), a short description of the approach and the output. The top row (case GA) is the canonical case, as it starts with a given difficulty function. All the others (except the last one, included for completeness, and similarly for PCA) derive the difficulty function, which jointly with the response matrix can then be used as input for the first case, GA.
| Case | Input (agents × tasks) | Approach | Output |
|---|---|---|---|
| GA | Response matrix R and a difficulty function | Generality Analysis | Capability, spread and generality |
| Opp | – | Opponent's score used as difficulty | Derived difficulty function (input to GA) |
| ARef | Results of a reference agent | Performance relative to the reference | Derived difficulty function (input to GA) |
| Rnk | Response matrix only | Rank-based difficulty | Derived difficulty function (input to GA) |
| DRef | Results of a reference population | Difficulty from the reference population | Derived difficulty function (input to GA) |
| IRT | Response matrix only | Item Response Theory | Derived difficulty function (input to GA) |
| FA | Response matrix only | Factor analysis | Loadings on factors |
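Table 1's non-canonical cases derive a difficulty function from the data and then feed it back into the canonical GA case. As a hypothetical illustration of the simplest such derivation (our own stand-in, not any of the paper's specific transformations), an item's difficulty can be taken as the fraction of agents that fail it:

```python
import numpy as np

def derived_difficulty(R):
    """Derive an item-difficulty function from a response matrix alone
    (a hypothetical stand-in for the derived-difficulty cases in Table 1):
    an item's difficulty is the fraction of agents that fail it, so items
    nobody solves get difficulty 1 and items everybody solves get 0."""
    R = np.asarray(R, dtype=float)      # rows: agents, columns: items
    return 1.0 - R.mean(axis=0)

# Three agents on three items: everyone solves item 1, two of three
# solve item 2, one of three solves item 3.
R = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1]])
print(derived_difficulty(R))            # difficulties 0, 1/3 and 2/3
```

The resulting vector can then serve as the difficulty function required by the GA row, together with the same response matrix.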
Figure 5. Capability and spread for several scenarios using the extrinsic difficulty transformations as per Table 1. First row: participants in the World Computer Chess Championship, using the final score of the opponent as difficulty. Details in §A.6. Left: Reykjavik 2005 with 12 participants. The winner (Zappa) and the last-placed participant (Fute) won and lost all matches respectively, except the one between them, which was surprisingly a draw. Right: Leiden 2015 with 9 participants. Here, no low-rank participant beat any high-rank participant, and draws were usually between participants with close scores. Accordingly, the average generality is higher in this case. Second row: ALE video games with 23 AI systems and the human reference on 45 games. Details in §A.7. Left: using the ARef transformation over a human reference. Right: using the Rnk transformation. Third row: 23 AI systems on 49 GVGAI games. Details in §A.8. Left: each game has 5 variants and they are treated separately, as 245 instances. Right: all variants are aggregated together for each game (23 items). In both cases, we use the Rnk transformation. Bottom row left: 53 orangutans on five physical cognition tasks. Details in §A.9. Individuals are labelled with the orangutans' names, with different group aggregates shown in colours. We use the Rnk transformation. Bottom row right: three different primate spaces for the Primate Cognition Test Battery[44]. Details in §A.10. We use the DRef transformation.
Table 2. Summary of the scenarios and subjects seen in the paper. Difficulty modalities as per Table 1. The last two columns show the mean normalised generality and its correlation with capability.
| Scenario | Subjects | Difficulty | Mean norm. generality | Corr. with capability |
|---|---|---|---|---|
| Elithorn's mazes | Humans | Intrinsic | 0.31 | 0.03 |
| Letter series | Humans and machines | Intrinsic | 0.41 | 0.32 |
| Object recognition | Humans, macaques and machines | Intrinsic | 0.54 | 0.92 |
| Odour span task | Rats | Intrinsic | 0.28 | 0.16 |
| Iris (KDN) | Machines | Intrinsic | 0.75 | −0.32 |
| Iris (alternative difficulty) | Machines | Intrinsic | 0.84 | 0.11 |
| Chess (Reykjavik) | Machines | Opp | 0.73 | 0.09 |
| Chess (Leiden) | Machines | Opp | 0.41 | 0.32 |
| ALE video games | Machines (+ human reference) | ARef | 0.88 | −0.01 |
| ALE video games | Machines and human | Rnk | 0.83 | −0.17 |
| GVGAI games (agg.) | Machines | Rnk | 0.78 | 0.02 |
| Physical cognition | Orangutans | Rnk | 0.74 | 0.13 |
| PCTB | Humans, chimpanzees and orangutans | DRef | 0.91 | 0.14 |