Last update: Sep/2023
Milestones & justifications (most recent at top)
|Sep/2023||55%: OpenAI Gobi/GPT-5 leaks and analysis, early rumors from Sep/2023.||Shared Google Doc|
|Sep/2023||55%: Harvard studies BCG consultants with GPT-4, ‘Consultants using [GPT-4] AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality…)’||Paper (SSRN)|
|Sep/2023||55%: Google OPRO self-improves, ‘prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K [maths], and by up to 50% on Big-Bench Hard [IQ] tasks.’||Paper (arxiv)|
|Aug/2023||54%: GPT-4 scores in 99th percentile for Torrance Tests of Creative Thinking (wiki), questions by Scholastic Testing Service confirmed private/not part of training dataset.||Article|
|Jul/2023||54%: Google DeepMind Robotics Transformer RT-2 (3x improvement over RT-1, 2x improvement on unseen scenarios to 62% avg. Progress towards Woz’s AGI coffee test.)||Project page|
|Jul/2023||52%: Anthropic Claude 2: More HHH (TruthfulQA Claude 2=0.69 vs GPT-4=0.60)||Anthropic (PDF)|
|Jul/2023||51%: Google DeepMind/Princeton: Robots that ask for help (‘modeling uncertainty that can complement and scale with the growing capabilities of foundation models.’)||Project page|
|Jul/2023||51%: Microsoft LongNet: 1B token sequence length (‘opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence.’)||Microsoft (arxiv)|
|Jun/2023||50%: Google DeepMind RoboCat (‘autonomous improvement loop…RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.’)||DeepMind blog, Paper (PDF)|
|Jun/2023||50%: Microsoft introduces monitor-guided decoding (MGD) (‘improves the ability of an LM to… generate identifiers that match the ground truth… improves compilation rates and agreement with ground truth.’)||Paper (arxiv)|
|Jun/2023||50%: Ex-OpenAI consultant uses GPT-4 for embodied AI in chemistry (‘instructions, to robot actions, to synthesized molecule.’)||Paper (arxiv), notes|
|Jun/2023||50%: Harvard introduces ‘inference-time intervention’ (ITI) (‘At a high level, we first identify a sparse set of attention heads with high linear probing accuracy for truthfulness. Then, during inference, we shift activations along these truth-correlated directions. We repeat the same intervention autoregressively until the whole answer is generated.’)||Harvard (arxiv)|
|Jun/2023||49%: Google DeepMind trains an LLM (DIDACT) on iterative code in their 86TB code repository (‘the trained model can be used in a variety of surprising ways… by chaining together multiple predictions to roll out longer activity trajectories… we started with a blank file and asked the model to successively predict what edits would come next until it had written a full code file. The astonishing part is that the model developed code in a step-by-step way that would seem natural to a developer’)||Google Blog, Twitter|
|May/2023||49%: Agility Robotics combines an LLM with their humanlike android (robot), Digit.||Agility Robotics (YouTube)|
|May/2023||49%: PaLM 2 breaks 90% mark for WinoGrande. For the first time, a large language model has breached the 90% mark on WinoGrande, a ‘more challenging, adversarial’ version of Winograd, designed to be very difficult for AI. Fine-tuned PaLM 2 scored 90.9%; humans are at 94%.||PaLM 2 paper (PDF, Google)|
|May/2023||49%: Robot + text-davinci-003 (‘…we show that LLMs can be directly used off-the-shelf to achieve generalization in robotics, leveraging the powerful summarization capabilities they have learned from vast amounts of text data.’).||Princeton/Google/others|
|Apr/2023||48%: Boston Dynamics + ChatGPT (‘We integrated ChatGPT with our [Boston Dynamics Spot] robots.’).||Levatas|
|Mar/2023||48%: Microsoft introduces TaskMatrix.ai (‘We illustrate how TaskMatrix.AI can perform tasks in the physical world by [LLMs] interacting with robots and IoT devices… All these cases have been implemented in practice… understand the environment with camera API, and transform user instructions to action APIs provided by robots… facilitate the handling of physical work with the assistance of robots and the construction of smart homes by connecting IoT devices…’).||Microsoft (arxiv)|
|Mar/2023||48%: OpenAI introduces GPT-4; Microsoft Research goes on record that GPT-4 is ‘early AGI’ (‘Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.’). Microsoft’s deleted original title of the paper was ‘First Contact With an AGI System’. Note that LLMs are still not embodied, and this countdown requires physical embodiment to get to 60%.|
|Mar/2023||42%: Google introduces PaLM-E 562B (PaLM-Embodied. ‘PaLM-E can successfully plan over multiple stages based on visual and language input… successfully plan a long-horizon task…’).|
|Feb/2023||41%: Microsoft used ChatGPT in robots, it self-improved (‘we were impressed by ChatGPT’s ability to make localized code improvements using only language feedback.’).||Microsoft|
|Dec/2022||39%: Anthropic RL-CAI 52B trained by Reinforcement Learning from AI Feedback (RLAIF) (‘we have moved further away from reliance on human supervision, and closer to the possibility of a self-supervised approach to alignment’).||LifeArchitect.ai, Anthropic paper (PDF)|
|Jul/2022||39%: NVIDIA’s Hopper (H100) circuits designed by AI (‘The latest NVIDIA Hopper GPU architecture has nearly 13,000 instances of AI-designed circuits’).||LifeArchitect.ai, NVIDIA|
|May/2022||39%: DeepMind Gato is the first generalist agent; it can “play Atari, caption images, chat, stack blocks with a real robot arm, and much more”.||Watch Alan’s video about Gato.|
|Jun/2021||31%: Google’s TPUv4 circuits designed by AI (‘allowing chip design to be performed by artificial agents with more experience than any human designer. Our method was used to design the next generation of Google’s artificial intelligence (AI) accelerators, and has the potential to save thousands of hours of human effort for each new generation. Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields’).||LifeArchitect.ai, Nature, Venturebeat|
|Nov/2020||30%: Connor Leahy, Co-founder of EleutherAI, re-creator of GPT-2, creator of GPT-J & GPT-NeoX-20B, said about OpenAI GPT-3: “I think GPT-3 is artificial general intelligence, AGI. I think GPT-3 is as intelligent as a human. And I think that it is probably more intelligent than a human in a restricted way… in many ways it is more purely intelligent than humans are. I think humans are approximating what GPT-3 is doing, not vice versa.”||Watch the video (timecode)|
|Aug/2017||20%: Google Transformer leads to big changes for search, translation, and language models.||Read the launch in plain English.|
AGI dates predicted based on this table
Thanks to Dennis Xiloj. In Jun/2023, using the current milestones and percentages, GPT-4 says AGI by 18/Jul/2025…
— Dennis Xiloj (@denjohx) June 23, 2023
Thanks to The Memo reader BeginningInfluence55 for this more conservative version using polynomial regression. In Jul/2023, using the current milestones and percentages, this method says 80% probability of hitting AGI by Aug/2025…
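As a rough illustration of how a polynomial regression over the milestone table can be extrapolated, here is a minimal sketch. It is not the reader's actual method: the polynomial degree (2), the choice of one data point per month (the highest percentage where a month has several rows), the month-by-month search, and all function names are assumptions made for this example. It projects when the fitted curve first reaches 100% and does not attempt to reproduce the probability estimate quoted above.

```python
# (year, month, percent) read from the milestone table; for months with
# several rows, the highest percentage is used (an assumption of this sketch).
MILESTONES = [
    (2017, 8, 20), (2020, 11, 30), (2021, 6, 31), (2022, 5, 39),
    (2022, 7, 39), (2022, 12, 39), (2023, 2, 41), (2023, 3, 48),
    (2023, 4, 48), (2023, 5, 49), (2023, 6, 50), (2023, 7, 54),
    (2023, 8, 54), (2023, 9, 55),
]


def to_years(year, month):
    """Convert a year/month pair to fractional years since Jan/2017."""
    return (year - 2017) + (month - 1) / 12.0


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def first_crossing(coeffs, target, start, step=1 / 12, max_years=50):
    """Step forward from start until the curve reaches target.

    Returns the x value of the first step at or above target, or None if
    the curve does not get there within max_years.
    """
    x = start
    while x <= start + max_years:
        if evaluate(coeffs, x) >= target:
            return x
        x += step
    return None


if __name__ == "__main__":
    xs = [to_years(y, mo) for y, mo, _ in MILESTONES]
    ys = [pct for _, _, pct in MILESTONES]
    coeffs = polyfit(xs, ys, degree=2)
    hit = first_crossing(coeffs, target=100, start=xs[-1])
    if hit is None:
        print("Fitted curve does not reach 100% within the search window")
    else:
        year = 2017 + int(hit)
        month = int((hit % 1) * 12) + 1
        print(f"Fitted curve reaches 100% around {month:02d}/{year}")
```

The projected date is sensitive to the degree and to which rows are included, so any single output is one scenario among many, not a forecast.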
– Around 50%: HHH (helpful, honest, harmless), as articulated by Anthropic, with a focus on groundedness and truthfulness. Mustafa Suleyman, co-founder of DeepMind and of Inflection AI (heypi.com), says: ‘LLM hallucinations will be largely eliminated by 2025’.
LLM hallucinations will be largely eliminated by 2025.
that’s a huge deal. the implications are far more profound than the threat of the models getting things a bit wrong today.
— Mustafa Suleyman (@mustafasuleyman) June 9, 2023
– Around 60%: Physical embodiment backed by a large language model. The AI is autonomous, and can move and manipulate. Current options include:
- OpenAI-backed 1X (formerly Halodi Robotics) EVE (wheeled) and NEO (bipedal).
- Sanctuary AI Phoenix.
- Agility Digit.
- Figure 01.
- Tesla Bot.
- Microsoft Autonomous Systems and Robotics Group.
- Google Robotics including the 2023 consolidation of Everyday Robots.
- …and more.
– Around 80%: Passes Steve Wozniak’s test of AGI: can walk into a strange house, navigate available tools, and make a cup of coffee from scratch (video with timecode).
Where will AGI be born?
Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 3.5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET. He is open to consulting and advisory on major AI projects with intergovernmental organizations and enterprise.
This page last updated: 29/Sep/2023. https://lifearchitect.ai/agi/