Last update: Dec/2024
Get The Memo.
AGI definition & further reading
Definition
AGI = artificial general intelligence = a machine that performs at the level of an average (median) human.
ASI = artificial superintelligence = a machine that performs at the level of an expert human in practically any field.
I use a slightly stricter definition of AGI that includes the ability to act on the physical world via embodiment. I appreciate that some approaches to AGI bypass embodiment and robotics entirely.
Artificial general intelligence (AGI) is a machine capable of understanding the world as well as—or better than—any human, in practically every field, including the ability to interact with the world via physical embodiment.
And the short version: ‘AGI is a machine which is as good as or better than a human in every aspect’.
The world and acceptance of AGI
Why does AGI need physical embodiment?
A reader asks: ‘Does AGI really need physical embodiment? Think about Stephen Hawking. Was it a knock that he couldn’t move his body? Was that a part of his human intelligence? I’d argue that is a clear no. GPT-4 and Gemini are already a kind of proto-ASI; they are just glitchy and strange. The only thing they need to be ASI is more agency and automation; that’s the final bottleneck. Clearly not ironed out yet, sure. Putting them in a Boston Dynamics robot won’t make them any smarter, the same way that if we gave Hawking a magic cure to move again, he would just be the same intelligence. Intelligence and embodiment are just not correlated at all with these systems; it wouldn’t change how smart they are, only utility, which is not the same thing.’
Here are some additional considerations for this thought experiment:
1. The definition of intelligence is not fully agreed upon, but may include ‘the ability to learn or understand or to deal with new or trying situations.’ Would it be possible to deal with a new or trying physical situation without embodiment?
2. Hawking had the benefit of more than two decades of full embodiment, including access to all 5+ of his human senses, until ALS began to weaken his physical abilities in his 20s and 30s. Would he have been able to make big discoveries in gravitational and theoretical physics without ever having fallen over, moved a pen across paper, or played with ball models?
3. All major IQ tests for children under 18 include physical object manipulation (blocks, toys, chips, cards, and other manipulatives that test fine motor skills of the hands and fingers). For Wechsler, these are the WPPSI and WISC; for Stanford-Binet, these are the SB-5 and the older Form L-M.
Some further reading for interest:
Paper: The necessity of embodiment (2019, PDF).
LessWrong: Embodiment is Indispensable for AGI (Jun/2022).
What is a median human?
More than a decade ago, the average human was a 28-year-old man from China: ‘He is Han Chinese so his ethnicity is Han. He is 28 years old. He is Christian. He speaks Mandarin. He does not have a car. He does not have a bank account.’ (NatGeo study cited by CBS, 2011).
By the way, the median American is quite different (read more via New Strategist on archive.org, 2011, and CNBC, 2018).
The median human in 2024-2025 may meet these dot points:
- A 30-year-old woman from India
- Works as a Product Manager (or in agriculture or medicine)
- Speaks 2 languages
- Will read 700 books in her lifetime
- Can recall roughly 7 items (working memory)
- Average SAT score around 1050/1600 (P50)
- Average IQ around 100 (P50)
- Can make a cup of coffee in a strange kitchen
- Can assemble IKEA furniture
- Cannot build a house
Ability | GPT-4 (2022) | Gemini (2023) | LLM + Robot (2024) | 2024 H2 model | 2025 model |
---|---|---|---|---|---|
Cognitive | |||||
Works as a Product Manager | ✅ | ✅ | ✅ | ||
Speaks 2 languages | ✅ | ✅ | ✅ | ||
Will read 700 books in her lifetime | ✅ | ✅ | ✅ | ||
Can recall roughly 7 items (working memory) | ✅ | ✅ | ✅ | ||
Average SAT score around 1050/1600 (P50) | ✅ | ✅ | ✅ | ||
Average IQ around 100 (P50) | ✅ | ✅ | ✅ | ||
Truthful: grounded in an accepted version of truth without confabulation or hallucination | ❌ | ❌ | ❌ | ||
Basic human abilities | |||||
See: can interpret images with vision | ✅ | ✅ | ✅ | ||
Hear: can detect tone in language, music | – | ✅ | ✅ | ||
Taste: can detect flavour | – | ❌ | ❌ | ||
Touch: can detect temperature, texture, pressure | – | ❌ | ❌ | ||
Smell: can detect fragrance | – | ❌ | ❌ | ||
Proprioception: awareness of where body parts are in space | – | ❌ | ✅ | ||
Embodiment (autonomous; not pre-programmed) | |||||
Can make a cup of coffee in a strange kitchen | – | – | ❌ | ||
Can assemble IKEA furniture | – | – | ❌ | ||
Who tf is Alan?
A fair question in the age of millions of newly-minted AI experts! Alan is the ‘secret weapon’ for many AI labs, companies, and governments. He started his AI journey in the early 1990s, developing AI chatbots in QBASIC, followed by a degree in Computer Science with Psychology. He spent a decade leading applied human intelligence research, including as Chairman of Mensa’s gifted families. He rejoined the artificial intelligence fold with the launch of GPT-3 in 2020. He’s also known for things like:
– Apple leveraging his research in their new AI model paper for September 2024, using it as the foundation for their visualizations.
– Microsoft, RAND, and the European Commission, a few of the 10,000+ clients reading the regular advisory, The Memo, a Substack ‘bestseller’ in 142 countries.
– Ernst & Young, Ray White, USAA, and major government agencies consistently book him as their go-to speaker and educator for artificial intelligence topics.
– The largest asset managers in the world, with trillions of dollars in assets under management, use Alan for ongoing AI advisory.
– Fortune 500 companies import his AI expertise for in-person consulting engagements.
– Brookings Institution and NBER frequently cite Alan’s research.
– NYU and other universities leverage visualizations and the popular Models Table, now detailing 400+ large language models.
– ‘What’s in my AI?’ (2022) is said to be the most comprehensive analysis of GPT datasets, and was followed up by ‘What’s in GPT-5?’ (2024).
– More than 5 million early adopters watched Leta AI, across 67 episodes. Leta was powered by GPT-3 175B, a frontier model 2½ years before ChatGPT was released.
And much more: LifeArchitect.ai/about-alan
Further reading
– Alan’s ASI checklist (first 50)
– AGI achieved internally: A re-written story
Milestones & justifications (most recent at top)
Date | Summary | Links |
---|---|---|
Dec/2024 | 88%: OpenAI o3 (reasoning model), new state-of-the-art frontier model. GPQA Diamond=87.7%. Fields Medalist Sir Timothy Gowers on the hundreds of questions in the FrontierMath benchmark (Nov/2024): | Dr Noam Brown evals, ARC-AGI writeup (Note: While I recognize ARC’s contributions, this benchmark fails to capture the essential dynamics of AI performance.) |
Dec/2024 | INFO: Tesla Optimus humanoid walking on uneven ground. | Source, YT video |
Dec/2024 | 84%: DeepMind introduces Genie 2 world model: ‘Genie 2 is the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards AGI.’ (Sidenote: DeepMind has not demonstrated specific real-world use cases, but expect applications across diverse sectors to ‘instantly’ train humanoids in agriculture, mining, manufacturing and assembly, construction, supply chain operations and transport networks, healthcare assistance, home duties including making coffee, and anywhere else you can put a robot…) | DeepMind blog |
Nov/2024 | INFO: US Govt commission recommends a secret ‘Manhattan Project’ to reach AGI: ‘The [U.S.-China Economic and Security Review] Commission recommends: Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability.’ (Sidenote: This should have begun 4½ years ago in May/2020 with the announcement of GPT-3, or within that period based on my media releases and reports to the UN: 1, 2, 3, 4) | PDF (p10), Manhattan Project (wiki), Born secret (wiki) |
Nov/2024 | 83%: DeepMind AlphaQubit, a system that ‘accurately identifies errors inside quantum computers’. This brings the total DeepMind Alpha system count to 17 (seven announced in 2024), see LifeArchitect.ai/gemini-report/#alpha | DeepMind blog, Nature paper |
Nov/2024 | 83%: Context recall hits 100% in ‘needle in the haystack’ evals for models like Qwen2.5-Turbo 1M. | Alibaba blog post, Models Table, related discussion by Steven Johnson at TheLongContext.com |
Nov/2024 | INFO: Asked: ‘What are you excited about in 2025? What’s to come?’ OpenAI CEO responds: ‘AGI.’ | Video |
Oct/2024 | 83%: Anthropic introduces ‘the first frontier AI model to offer computer use in public beta… direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text.’ Anthropic comments: ‘This example is representative of a lot of drudge work that people have to do.’ | Anthropic announce, video, my analysis |
Oct/2024 | INFO: Microsoft CEO: ‘The autoencoder we use for GitHub Copilot is being optimized by o1. So think about the recursiveness of it, which is: We are using AI to build AI tools to build better AI. It’s just a new frontier.’ | Video |
Oct/2024 | INFO: Anthropic CEO: ‘I think [powerful AI/AGI] could come as early as 2026…’ | Dario’s 15,000-word essay |
Oct/2024 | INFO: OpenAI suggests ‘you’ll be able to tell we’ve achieved AGI internally when we take down all the job listings.’ [tongue-in-cheek… maybe…] | Video, OpenAI careers (169 open jobs) |
Sep/2024 | 81%: DeepMind AlphaChip, a process in use since 2020 for ‘superhuman chip layouts’. This brings the total DeepMind Alpha system count to 16 (six announced in 2024), see LifeArchitect.ai/gemini-report/#alpha | DeepMind blog, Nature addendum |
Sep/2024 | 81%: OpenAI o1 (reasoning model) consistently scores 100% in all ALPrompts. These were hardened prompts designed for frontier models. I hadn’t expected the 2024 H2 version to be solved for a long time (prior to this, no LLM in Sep/2024 got a score of more than 2/5 for this prompt). I will be re-evaluating my life’s work… The model also hits the ‘uncontroversially correct’ ceilings on major benchmarks (GPQA Extended ceiling is 74%, MMLU ceiling is about 90%). GPQA Diamond=78.3. Update: o1 completes a new Dutch high school maths exam in 10 minutes, scores 100% (paper). And, although several earlier models achieved verbal-linguistic IQ (but not full-scale IQ) test results far above 98%, for the first time o1 would officially pass the Mensa admission based on its LSAT score (95.6%, Mensa minimum 95%, Metaculus discussion). Click to visualize the distance between o1 and other models on major benchmarks. Note that there is nowhere left to go at the top; AI has now hit the human-comprehensible ceiling across standardized testing for ‘smarts’: | OpenAI o1 announce, evals, my o1 page, Models Table |
Sep/2024 | INFO: 1X NEO humanoid robot backed by OpenAI investment, behind the scenes video. 1X NEO was able to walk into a strange house, navigate available tools(?), and make a cup of [filter] coffee from scratch (the Woz AGI test): | New S3 video 1 1/Sep/2024, New S3 video 2 8/Sep/2024, my video from a year ago May/2023 |
Jul/2024 | 76%: DeepMind AlphaProof and AlphaGeometry 2 [with Gemini] solve advanced reasoning problems in mathematics. ‘Today, we present AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of our geometry-solving system. Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time… AlphaProof [also solved] the hardest problem in the competition, solved by only five contestants at this year’s IMO.’ ‘When people saw Sputnik in 1957, they might have had the same feeling I do now. Human civilization needs to move to high alert!’ (Professor Po-Shen Loh, national coach of the United States’ International Mathematical Olympiad team, 26/Jul/2024) | DeepMind blog, NYT analysis |
Jul/2024 | INFO: OpenAI internal discussions on AGI levels. OpenAI shared the new classification system with employees on Tuesday 9/Jul/2024 during an all-hands meeting. OpenAI “believes it is… on the cusp of reaching the second, which it calls ‘Reasoners.’ This refers to systems that can do basic problem-solving tasks as well as a human with a doctorate-level education who doesn’t have access to any tools. At the same meeting, company leadership gave a demonstration of a research project involving its GPT-4 AI model that OpenAI thinks shows some new skills that rise to human-like reasoning…” See comparison with DeepMind’s levels further down this page. | Bloomberg |
Jun/2024 | INFO: Adam Unikowsky, a former law clerk to Justice Antonin Scalia who has won eight Supreme Court cases as lead counsel, says: “Claude is fully capable of acting as a Supreme Court Justice right now…I frequently was more persuaded by Claude’s analysis than the Supreme Court’s… Claude works at least 5,000 times faster than humans do, while producing work of similar or better quality…” | Source, The Memo analysis |
Jun/2024 | 75%: Claude 3.5 Sonnet: New state-of-the-art model. MMLU=90.4 (5-shot CoT). GPQA=67.2 (maj32 + 5-shot). Scores 5/5 on ALPrompt 2024H1. For the first time, a large language model has breached the 65% mark on GPQA, designed to be at the level of our smartest PhDs. ‘Regular’ PhDs score 34%, while in-domain specialized PhDs are at 65%. Claude 3.5 Sonnet scored 67.2% (maj32 + 5-shot). | Model card, Announce, Models Table |
Jun/2024 | 74%: Harvard + Google TalkTuner: LLMs “have a ‘user model’, an internal representation of the person it is talking with… LLM-based chatbots appear to tailor their answers to user characteristics…” Tested using the internal representations of the small model LLaMa2Chat-13B. | Project page, paper (related to Anthropic paper Dec/2022) |
May/2024 | 74%: GPT-4o: Full multimodal Omnimodel with MMLU=88.7. GPQA=53.6. Note: (Amended) Based on MMMU vision benchmark score and new functionality, GPT-4o represents a minor update to the AGI countdown. Full explanation. | OpenAI, ELO rating, Dr Jim Fan analysis |
May/2024 | INFO: Olfaction (smell). New datasets and device. ‘We don’t just need more data: we need entirely new data modalities… Scent is a natural new frontier for this evolution. The oldest sense known to life on earth — tangible, physical, rooted in chemistry — is a vast untapped data source.’ | Osmo |
May/2024 | 73%: GPT-4 + Unitree Go1 quadruped robot = DrEureka (UPenn, NVIDIA, UT Austin) ‘We trained a robot dog to balance and walk on top of a yoga ball purely in simulation, and then transfer zero-shot to the real world… Frontier LLMs like GPT-4 have tons of built-in physical intuition for friction, damping, stiffness, gravity, etc. We are (mildly) surprised to find that DrEureka can tune these parameters competently and explain its reasoning well.’ | Repo + videos, Twitter, NewAtlas analysis |
Apr/2024 | 72%: Wu’s Method + AlphaGeometry outperforms gold medalists at IMO Geometry ‘combining AlphaGeometry with Wu’s method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist.’ | Paper |
Apr/2024 | 72%: The Declaration on AI Consciousness & the Bill of Rights for AI. | LifeArchitect.ai |
Mar/2024 | 72%: Embodiment: Figure 01 + GPT-4V + voice ‘OpenAI models provide high-level visual and language intelligence. Figure neural networks deliver fast, low-level, dexterous robot actions. Everything in this video is a neural network’: | Video, Source, Explanation by Corey Lynch (Figure, ex-Google) |
Mar/2024 | 71%: Anthropic Claude 3 Opus. State-of-the-art frontier multimodal model for Mar/2024. Higher performance than GPT-4 across benchmarks. Percentage increases for Claude 3 over GPT-4: MMLU +0.46%, BIG-Bench Hard +4.36%, MATH +12.74%, HumanEval (code) +23.57%. Also has 1M+ context window (researchers only) and upcoming ‘advanced agentic capabilities’. | Announce, Paper (PDF), Models Table |
Feb/2024 | 70%: OpenAI Sora (‘sky’). Text-to-video diffusion transformer that can ‘understand and simulate the physical world in motion… solve problems that require real-world interaction.’ Two additional considerations: | Project page, technical report (html) |
Feb/2024 | 66%: Google DeepMind Gemini Pro 1.5 sparse MoE. ‘highly compute-efficient multimodal mixture-of-experts model… near-perfect recall on long-context retrieval tasks [1M-10M tokens] across modalities… matches or surpasses Gemini 1.0 Ultra’s state-of-the-art performance across a broad set of benchmarks.’ | Paper (PDF), Models Table |
Feb/2024 | 65%: Meta AI V-JEPA. ‘physical world model excels at detecting and understanding highly detailed interactions between objects.’ | Announce, paper |
Feb/2024 | 65%: Google Goose (Gemini) + Google Duckie chatbot: ‘descendant of Gemini… trained on the sum total of 25 years of engineering expertise at Google… can answer questions around Google-specific technologies, write code using internal tech stacks and supports novel capabilities such as editing code based on natural language prompts.’ See also: Rubber duck debugging (wiki). | BI |
Feb/2024 | 65%: Google DeepMind: OAIF: ‘online AI feedback (OAIF), uses an LLM as annotator… online DPO outperforms RLAIF and RLHF… reduced human annotation effort.’ | Paper |
Jan/2024 | 65%: Google uses Gemini to fix their code: ‘Instead of a software engineer spending an average of two hours to create each of these commits, the necessary patches are now automatically created in seconds [by Gemini].’ | PDF, The Memo |
Jan/2024 | 65%: DeepMind AlphaGeometry. Trained using 100% synthetic data, open source, ‘approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems… under human expert evaluation and discovers a generalized version of a translated IMO theorem…’ – The Metaculus prediction of an open-source AI winning an IMO Gold Medal in Jan/2028 is closer to being achieved. Human crowd-sourced estimates about exponential growth may be becoming irrelevant. – DeepMind CEO Demis: ‘[AlphaGeometry is] Another step on the road to AGI.’ (Twitter) | Paper, DeepMind blog, Author explanation (video) |
Jan/2024 | 64%: Embodiment: Figure 01 makes a coffee. ‘Learned this after watching humans make coffee… Video in, trajectories out.’ | |
Dec/2023 | 64%: DeepMind: LLMs can now produce new maths discoveries and solve real-world problems. DeepMind head of AI for science (14/Dec/2023 Guardian, MIT): ‘this is the first time that a genuine, new scientific discovery has been made by a large language model… It’s not in the training data—it wasn’t even known.’ Paper: ‘the first time a new discovery has been made for challenging open problems in science or mathematics using LLMs. FunSearch discovered new solutions… its solutions could potentially be slotted into a variety of real-world industrial systems to bring swift benefits… the power of these models [tested with Codey PaLM 2 340B] can be harnessed not only to produce new mathematical discoveries, but also to reveal potentially impactful solutions to important real-world problems.’ Sidenote: In Feb/2007, fellow Aussie Prof Terry Tao called the cap set question his ‘favorite open question’. In Jun/2023 Terry also said that LLMs would take another three years to reach this level of progress (‘2026-level AI… will be a trustworthy co-author in mathematical research’). Read more about exponential growth (wiki). | Paper, explanation |
Dec/2023 | 61%: Embodiment: Tesla Optimus Gen 2 | Bloomberg |
Dec/2023 | 61%: LLMs for optimizing hyperparameters. ‘LLMs are a promising tool for improving efficiency in the traditional decision-making problem of hyperparameter optimization.’ | Paper |
Dec/2023 | 61%: Google Gemini Ultra breaks 90% mark for MMLU. Also has proper multimodality [inputs are text, image, audio, video; outputs are text, image]. For the first time, a large language model has breached the 90% mark on MMLU, designed to be very difficult for AI. Gemini Ultra scored 90.04%; average humans are at 34.5% (AGI) while expert humans are at 89.8% (ASI). GPT-4 was at 86.4%. Watch the Gemini demo video. | Annotated paper, Models Table |
Nov/2023 | INFO: The Q* maths arch AGI rumor is probably what we in Australia might call a ‘furphy’ (wiki) or a red herring. Here’s ChatGPT lead John Schulman talking about it seven years ago… in 2016. And now, back to our regularly scheduled programming. | YouTube (1h02m 57s) |
Oct/2023 | 56%: Boston Dynamics: More embodiment using Spot + ChatGPT + LLMs. | YouTube (3m7s) |
Oct/2023 | INFO: OpenAI CEO: ‘We define AGI as the thing we don’t have quite yet. There were a lot of people who would have—ten years ago [2013 compared to 2023]—said alright, if you can make something like GPT-4, GPT-5, that would have been an AGI… I think we’re getting close enough to whatever that AGI threshold is going to be.’ | WSJ 22/Oct/2023, YouTube (5m25s) |
Oct/2023 | INFO: Google VP and Fellow Blaise Agüera y Arcas says ‘AGI is already here’: ‘The most important parts of AGI have already been achieved by the current generation of advanced AI large language models… [2023’s] most advanced AI models have many flaws, but decades from now, they will be recognized as the first true examples of artificial general intelligence.’ | NOEMA |
Oct/2023 | INFO: Even more Gobi/GPT-5 rumors and analysis, Oct/2023. | Reddit (archive) |
Oct/2023 | 55%: Microsoft: ‘GPT-4 in our proof-of-concept experiments, is capable of writing code that can call itself to improve itself.’ | Paper (arxiv) |
Sep/2023 | INFO: OpenAI Gobi/GPT-5 rumors and analysis, early rumors from Sep/2023. | Shared Google Doc |
Sep/2023 | 55%: Harvard studies BCG consultants with GPT-4, ‘Consultants using [GPT-4] AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality…)’ | Paper (SSRN) |
Sep/2023 | 55%: Google OPRO self-improves, ‘prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K [maths], and by up to 50% on Big-Bench Hard [IQ] tasks.’ | Paper (arxiv) |
Aug/2023 | 54%: GPT-4 scores in 99th percentile for Torrance Tests of Creative Thinking (wiki), questions by Scholastic Testing Service confirmed private/not part of training dataset. | Article |
Jul/2023 | 54%: Google DeepMind Robotics Transformer RT-2 (3x improvement over RT-1, 2x improvement on unseen scenarios to 62% avg. Progress towards Woz’s AGI coffee test.) | Project page |
Jul/2023 | 52%: Anthropic Claude 2: More HHH (TruthfulQA Claude 2=0.69 vs GPT-4=0.60) | Anthropic (PDF), Models Table |
Jul/2023 | 51%: Google DeepMind/ Princeton: Robots that ask for help (‘modeling uncertainty that can complement and scale with the growing capabilities of foundation models.’) | Project page |
Jul/2023 | 51%: Microsoft LongNet: 1B token sequence length (‘opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence.’) | Microsoft (arxiv) |
Jun/2023 | 50%: Google DeepMind RoboCat (‘autonomous improvement loop… RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.’) | DeepMind blog, Paper (PDF) |
Jun/2023 | 50%: Microsoft introduces monitor-guided decoding (MGD) (‘improves the ability of an LM to… generate identifiers that match the ground truth… improves compilation rates and agreement with ground truth.’) | Paper (arxiv) |
Jun/2023 | 50%: Ex-OpenAI consultant uses GPT-4 for embodied AI in chemistry (‘instructions, to robot actions, to synthesized molecule.’) | Paper (arxiv), notes |
Jun/2023 | 50%: Harvard introduces ‘inference-time intervention’ (ITI) (‘At a high level, we first identify a sparse set of attention heads with high linear probing accuracy for truthfulness. Then, during inference, we shift activations along these truth-correlated directions. We repeat the same intervention autoregressively until the whole answer is generated.’) | Harvard (arxiv) |
Jun/2023 | 49%: Google DeepMind trains an LLM (DIDACT) on iterative code in their 86TB code repository (‘the trained model can be used in a variety of surprising ways… by chaining together multiple predictions to roll out longer activity trajectories… we started with a blank file and asked the model to successively predict what edits would come next until it had written a full code file. The astonishing part is that the model developed code in a step-by-step way that would seem natural to a developer’) | Google Blog, Twitter |
May/2023 | 49%: Agility Robotics combines an LLM with their humanlike android (robot), Digit. | Agility Robotics (YouTube) |
May/2023 | 49%: PaLM 2 breaks 90% mark for WinoGrande. For the first time, a large language model has breached the 90% mark on WinoGrande, a ‘more challenging, adversarial’ version of Winograd, designed to be very difficult for AI. Fine-tuned PaLM 2 scored 90.9%; humans are at 94%. | PaLM 2 paper (PDF, Google), Models Table |
May/2023 | 49%: Robot + text-davinci-003 (‘…we show that LLMs can be directly used off-the-shelf to achieve generalization in robotics, leveraging the powerful summarization capabilities they have learned from vast amounts of text data.’). | Princeton/ Google/ others |
Apr/2023 | 48%: Boston Dynamics + ChatGPT (‘We integrated ChatGPT with our [Boston Dynamics Spot] robots.’). | Levatas |
Mar/2023 | 48%: Microsoft introduces TaskMatrix.ai (‘We illustrate how TaskMatrix.AI can perform tasks in the physical world by [LLMs] interacting with robots and IoT devices… All these cases have been implemented in practice… understand the environment with camera API, and transform user instructions to action APIs provided by robots… facilitate the handling of physical work with the assistance of robots and the construction of smart homes by connecting IoT devices…’). | Microsoft (arxiv) |
Mar/2023 | 48%: OpenAI introduces GPT-4, Microsoft research on record that GPT-4 is ‘early AGI’ (‘Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.’). Microsoft’s deleted original title of the paper was ‘First Contact With an AGI System’. Note that LLMs are still not embodied, and this countdown requires physical embodiment to get to 60%. | Microsoft Research, Models Table |
Mar/2023 | 42%: Google introduces PaLM-E 562B (PaLM-Embodied. ‘PaLM-E can successfully plan over multiple stages based on visual and language input… successfully plan a long-horizon task…’). | Google, Models Table |
Feb/2023 | 41%: Microsoft used ChatGPT in robots, it self-improved (‘we were impressed by ChatGPT’s ability to make localized code improvements using only language feedback.’). | Microsoft |
Dec/2022 | 39%: Anthropic RL-CAI 52B trained by Reinforcement Learning from AI Feedback (RLAIF) (‘we have moved further away from reliance on human supervision, and closer to the possibility of a self-supervised approach to alignment’). | LifeArchitect.ai, Anthropic paper (PDF), Models Table |
Jul/2022 | 39%: NVIDIA’s Hopper (H100) circuits designed by AI (‘The latest NVIDIA Hopper GPU architecture has nearly 13,000 instances of AI-designed circuits’). | LifeArchitect.ai, NVIDIA |
May/2022 | 39%: DeepMind Gato is the first generalist agent that can ‘play Atari, caption images, chat, stack blocks with a real robot arm, and much more’. | Paper, Watch Alan’s video about Gato, Models Table |
Jun/2021 | 31%: Google’s TPUv4 circuits designed by AI (‘allowing chip design to be performed by artificial agents with more experience than any human designer. Our method was used to design the next generation of Google’s artificial intelligence (AI) accelerators, and has the potential to save thousands of hours of human effort for each new generation. Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields’). | LifeArchitect.ai, Nature, Venturebeat |
Feb/2021 | INFO: Olfaction (smell). An odor [fragrance] map achieves human-level odor description performance and generalizes to diverse odor-prediction tasks. | Manuscript Feb/2021 (PDF), Preprint Sep/2022, Science Aug/2023 |
Nov/2020 | 30%: GPT-3. Connor Leahy, Co-founder of EleutherAI, re-creator of GPT-2, creator of GPT-J & GPT-NeoX-20B, said about OpenAI GPT-3: ‘I think GPT-3 is artificial general intelligence, AGI. I think GPT-3 is as intelligent as a human. And I think that it is probably more intelligent than a human in a restricted way… in many ways it is more purely intelligent than humans are. I think humans are approximating what GPT-3 is doing, not vice versa.’ | Watch the video (timecode) |
Aug/2017 | 20%: Google Transformer leads to big changes for search, translation, and language models. | Read the launch in plain English. |
Earlier | 0 ➜ 10%: Foundational research by Prof Warren McCulloch, Prof Walter Pitts, & Prof Frank Rosenblatt (Perceptron), Dr Alan Turing & Prof John von Neumann (intelligent machinery), Prof Marvin Minsky, Prof John McCarthy, and many others (neural networks and beyond)… | Turing 1948: prepared by ‘Gabriel’ |
Older AGI countdown graphs
AGI dates predicted based on this table (#predict)
Thanks to Dennis Xiloj. In Dec/2023, using the current milestones and percentages, GPT-4 now says AGI by 26/Jan/2025…
End of year update on #AGI, as usual using data from @dralandthompson conservative countdown to AGI. Are we accelerating? last one predicted april, now its january. pic.twitter.com/On5cYSLEgq
— Dennis Xiloj (@denjohx) December 26, 2023
Thanks to Dennis Xiloj. In Jun/2023, using the current milestones and percentages, GPT-4 says AGI by 18/Jul/2025…
As requested by @dralandthompson , here is the data updated from https://t.co/5zW33mS4VO, fitted to exponential growth. Seems we will reach AGI in july 2025? pic.twitter.com/Ei4CFwkBVz
— Dennis Xiloj (@denjohx) June 23, 2023
Thanks to The Memo reader BeginningInfluence55 for this more conservative version using polynomial regression. In Jul/2023, using the current milestones and percentages, this method says 100% AGI by Oct/2026…
A third analysis was provided by ‘SecretMan’ in Oct/2023, with this chart showing 100% AGI by Jul/2026…
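The charts above extrapolate the countdown by fitting a growth curve to the milestone percentages and solving for the date the curve reaches 100%. The exact fitting procedures used by GPT-4, Dennis Xiloj, BeginningInfluence55, and ‘SecretMan’ are not documented on this page, so the Python sketch below is only an illustration under stated assumptions: it takes a selection of the percentages from the milestones table, places each at the start of its month as a decimal year, and fits an exponential curve by ordinary least squares on the logarithm of the percentage. The names `MILESTONES`, `fit_exponential`, and `year_reaching` are hypothetical.

```python
import math

# A selection of (year, month, percent) milestones taken from the countdown table above.
MILESTONES = [
    (2017, 8, 20), (2020, 11, 30), (2021, 6, 31), (2022, 5, 39),
    (2023, 3, 48), (2023, 6, 50), (2023, 7, 54), (2023, 9, 55),
    (2023, 10, 56), (2023, 12, 64), (2024, 1, 65), (2024, 2, 70),
    (2024, 3, 72), (2024, 5, 74), (2024, 6, 75), (2024, 7, 76),
    (2024, 9, 81), (2024, 10, 83), (2024, 12, 88),
]


def to_decimal_year(year: int, month: int) -> float:
    """Map a month to a decimal year (assumption: start of month, so Jan/2024 -> 2024.0)."""
    return year + (month - 1) / 12


def fit_exponential(points):
    """Least-squares fit of ln(percent) = a + b * (t - t0), i.e. an exponential curve.

    Time is centred on its mean t0, so the fitted intercept a equals the mean of
    ln(percent). Returns (a, b, t0), where b is the continuous growth rate per year.
    """
    ts = [to_decimal_year(y, m) for y, m, _ in points]
    ys = [math.log(p) for _, _, p in points]
    n = len(ts)
    t0 = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t0) ** 2 for t in ts)
    sxy = sum((t - t0) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    a = y_mean  # a centred regression line passes through (t0, mean of ln(percent))
    return a, b, t0


def year_reaching(target_percent, a, b, t0):
    """Solve exp(a + b * (t - t0)) = target_percent for the decimal year t."""
    return t0 + (math.log(target_percent) - a) / b


if __name__ == "__main__":
    a, b, t0 = fit_exponential(MILESTONES)
    print(f"growth rate: {b:.3f} per year (about {100 * (math.exp(b) - 1):.0f}% per year)")
    print(f"fitted curve reaches 100% at decimal year {year_reaching(100, a, b, t0):.2f}")
```

Running the script prints the fitted annual growth rate and the decimal year at which the curve crosses 100%. Because the countdown is capped at 100%, an unbounded exponential is a simplification; a polynomial fit (as in the Jul/2023 version above) or a logistic curve would likely give a different date, so treat any single output as illustrative rather than a forecast.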
Key milestones 50–80%
– Around 50%: HHH: Helpful, honest, harmless as articulated by Anthropic, with a focus on groundedness and truthfulness. Mustafa Suleyman, co-founder of DeepMind and founder of Inflection AI (pi.ai), says: ‘LLM hallucinations will be largely eliminated by 2025’.
LLM hallucinations will be largely eliminated by 2025.
that’s a huge deal. the implications are far more profound than the threat of the models getting things a bit wrong today.
— Mustafa Suleyman (@mustafasuleyman) June 9, 2023
– Around 60%: Physical embodiment backed by a large language model. The AI is autonomous, and can move and manipulate. Current options include:
- OpenAI-backed 1X (formerly Halodi Robotics) EVE (wheeled) and NEO (bipedal).
- Sanctuary AI Phoenix.
- Agility Digit.
- Figure 01.
- Tesla Bot.
- Microsoft Autonomous Systems and Robotics Group.
- Google Robotics including the 2023 consolidation of Everyday Robots.
- …and more.
See related page: Humanoid robots ready for LLMs.
– Around 80%: Passes Steve Wozniak’s test of AGI: can walk into a strange house, navigate available tools, and make a cup of coffee from scratch (video with timecode).
AGI levels
Want the text of this viz? Upload the image above to a frontier vision model like GPT-4o or Claude 3.5 and ask for your desired output (text, Markdown, HTML, Bootstrap table, CSV, XLS…)
Where will AGI be born?
Viz updated 18/Dec/2024
Download PDF
Older stuff
Dr Demis Hassabis, Google DeepMind founder, former child prodigy:
Suddenly the nature of money even changes… I don’t know if company constructs would even be the right thing to think about… We don’t want to have to wait till the eve before AGI happens… we should be preparing for that now. (24/Feb/2024)
Download source (PDF)
Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.
Related videos
Cite
Thompson, A. D. (2024). Alan’s conservative countdown to AGI. LifeArchitect.ai. https://lifearchitect.ai/AGI/
Get The Memo
by Dr Alan D. Thompson · Be inside the lightning-fast AI revolution. Informs research at Apple, Google, Microsoft · Bestseller in 142 countries.
Artificial intelligence that matters, as it happens, in plain English.
Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET. Technical highlights.
This page last updated: 22/Dec/2024. https://lifearchitect.ai/agi/