Integrated AI: The sky is steadfast (2024 AI retrospective)

The image above was generated by AI for this paper (Google Imagen 3-002). [1: Image generated in a few seconds, on 19 December 2024, text prompt by Alan D. Thompson, via Google Imagen 3-002: ‘Beautiful sky, Australian rural setting, in the style of Kazuo Oga and Atey Ghailan, rolling hills, sheep and wallabies and a small cottage, cinematic lighting, otherworldly colors.’]

Alan D. Thompson
December 2024

Download original article with all footnotes/references Integrated AI: The sky is steadfast (2024 AI retrospective) (PDF)

Watch the video version of this paper at: (coming soon)

All reports in The sky is... AI retrospective report series (most recent at top)
Date Report title
Dec/2024 The sky is steadfast
Jun/2024 The sky is quickening
Dec/2023 The sky is comforting
Jun/2023 The sky is entrancing
Dec/2022 The sky is infinite
Jun/2022 The sky is bigger than we imagine
Dec/2021 The sky is on fire

‘Some modest fraction of Upwork tasks can now be done [by AI] with a handful of electrons. Suppose everyone has an agent like this they can hire. Suppose everyone has 1,000 agents like this they can hire… What does one do in a world like this?… If globalization is the metaphor, and [AI] can just write all software, is San Francisco the new Detroit?’
– Daniel Gross, Safe Superintelligence Inc founder (2024) [2: https://dcgross.com/agitrades]

Author’s note: I advise family offices, major governments, research teams like RAND, and companies like Microsoft via The Memo: LifeArchitect.ai/memo.

Artificial intelligence progress in 2024 was steady and reliable, and the exponential curve continued. Looking back on the second half of this year, humanity was presented with some massive advances in the area of AI models, most of them offered for free.

I anticipate 2025 bringing more agents, more humanoids, and more innovation. These advances will reveal artificial general intelligence (a machine performing at the level of an average human across fields), followed closely by artificial superintelligence (a system performing beyond the level of our smartest humans).

Given the firehose of information over the last six months, this report is designed to be as succinct as possible. Let’s jump in!

The BIG Stuff

On average, 20 major new large language models were released each month, or a new model every 38 hours. 

An average of 10,667 derivative models were released monthly on Hugging Face, or a new model every 4.3 minutes. [3: See my working]
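The cadence figures above are simple rate conversions, and a minimal sketch of that arithmetic follows. It assumes an average month of 365.25 / 12 days, which is my assumption and not something the report states; the function names are illustrative only. Under that assumption the monthly counts give roughly 36.5 hours and 4.1 minutes, close to the quoted 38 hours and 4.3 minutes, which presumably rest on the author's own unrounded working.

```python
# Convert a monthly release count into the average gap between releases.
# Assumption (not from the report): an average month is 365.25 / 12 days.
HOURS_PER_MONTH = 365.25 / 12 * 24  # 730.5 hours


def hours_between_releases(releases_per_month: float) -> float:
    """Average number of hours between two consecutive releases."""
    return HOURS_PER_MONTH / releases_per_month


def minutes_between_releases(releases_per_month: float) -> float:
    """Average number of minutes between two consecutive releases."""
    return hours_between_releases(releases_per_month) * 60


# Monthly counts quoted in the report.
llm_gap_hours = hours_between_releases(20)
derivative_gap_minutes = minutes_between_releases(10_667)

print(f"Major LLMs: one every {llm_gap_hours:.1f} hours")
print(f"Derivative models: one every {derivative_gap_minutes:.1f} minutes")
```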


Chart. LLMs released per month (Dec/2024). https://lifearchitect.ai/models/  

One of these models is Meta AI’s Llama. The Llama family of models reached 600 million downloads on Hugging Face. [4: ‘Llama models have been downloaded over 600 million times on Hugging Face alone, making Llama the leading open source model family.’ https://www.llama.com/ (no archive available)] That’s roughly one model download for each person in the US, UK, Canada, Australia, and Japan combined.

The first song generated entirely by AI (Udio), titled ‘Verknallt in einen Talahon’, entered the Top 50 most listened to songs in Germany. Created by Josua Waghubinger, aka Butterbro, the song rose to #48 on the German charts. [5: rnd.de, charts.de, theguardian.com, and reddit.com] It had 1.8 million YouTube views to the end of 2024, and you can listen here: https://youtu.be/1EbSpT5weWE

Companies like EY quantified gains of up to 14 hours a week in time saved by using AI. [6: Microsoft 1, 2]

The first human-led companies began failing due to AI-led business. In Nov/2024, online education platform Chegg saw its stock drop by 99%, or US$14.5B in market value. [7: https://archive.md/MFNSu] Despite attempting partnerships with OpenAI and Scale AI, Chegg may become one of the first major victims of the AI revolution.

Agentic AI (AI that can take action autonomously) hit US$31B, with a hilariously low revenue forecast of US$368B by 2033. [8: emergenresearch.com] I look forward to us hitting this number just a little earlier (maybe 2025).
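To put the forecast in perspective, here is a rough sketch of the compound annual growth rate it implies. It assumes the US$31B figure refers to 2024 and that growth compounds yearly to 2033 (nine periods); both are my assumptions, not something the cited forecast confirms, and the function name is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures in US$ billions, as quoted in the report.
# Assumption: US$31B is the 2024 base year, US$368B the 2033 forecast.
cagr = implied_cagr(start_value=31.0, end_value=368.0, years=2033 - 2024)

print(f"Implied growth rate: {cagr:.1%} per year")
```

Under these assumptions the forecast works out to a little under 32% growth per year, which helps explain why the author regards it as low against exponential progress.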

AI-driven humanoids raced forward, mainly propelled by China, but with some decent progress in the US by labs like Tesla [9: https://lifearchitect.substack.com/p/the-memo-16dec2024] and Google, [10: https://arxiv.org/abs/2407.07775v1] and with a more sensible estimate of US$24 trillion in revenue by 2030. [11: ark-invest.com]

Brain-machine interfaces showed interesting progress, with Neuralink’s second participant in the PRIME Study, Alex, playing the game Counter-Strike via thought. [12: https://youtu.be/X7OpjB_8sHQ]

Large language models

My Models Table now lists close to 500 major large language models, without a focus on derivatives or Chinese models. Here’s how that looks in a treemap chart:


Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.

In September 2024, OpenAI released an extended reasoning model that ‘thinks’ for seconds or minutes before responding. The state-of-the-art model, o1, consistently scored 100% in all ALPrompts, and would qualify for Mensa admission based on its 2024 LSAT test score. [13: https://lifearchitect.substack.com/p/the-memo-1oct2024] Dr Dan Hendrycks, creator of the popular MMLU and MATH tests, told Reuters that o1 ‘destroyed the most popular reasoning benchmarks… They’re now crushed.’ [14: https://www.reuters.com/technology/artificial-intelligence/ai-experts-ready-humanitys-last-exam-stump-powerful-tech-2024-09-16/] The early announcement of o3 in December 2024 took this even further, boasting performance in the 99.8th percentile of competition-level software development.

Several new models added modalities beyond text. In a single pretrained model, the December 2024 release of Google Gemini 2.0 Flash expanded to:

  • Input: text, images, video, audio
  • Output: text, images, text-to-speech audio

I anticipate these modalities branching out even further, allowing models to leverage unusual information streams like olfactory data, motion tracking, thermal and infrared imaging, bioelectrical signals, geospatial data, and every other available signal format, including both human-detectable and imperceptible forms of data.


Chart. GPT modalities (2024). https://lifearchitect.ai/GPT-5/ 

Datasets

There were a number of fascinating new datasets created in the second half of 2024. 

China

While the media focuses on the big 5 (Google DeepMind, OpenAI, Anthropic, Meta AI, xAI), dozens of major AI labs in China continue pumping out incredible AI models. As a former permanent resident of China, I make sure to keep tabs on the region, as its scientists and model releases regularly outperform models from the West Coast of the US. Popular models in the second half of 2024 include: [22: https://lifearchitect.ai/models-table/]

  • 01-ai Yi
  • Alibaba Qwen 72B
  • Alibaba QwQ 32B (a reasoning model like o1)
  • Baidu ERNIE 4.0 Turbo
  • DeepSeek V2.5 236B
  • SenseTime SenseNova 5.5
  • StepFun Step-2 1T
  • Tencent Hunyuan 389B

My visualization of likely AGI birthplaces is regularly updated based on AI model releases as well as training hardware capability. Google DeepMind is in an interesting position, with exclusive access to hundreds of thousands of their own TPUs. And while xAI isn’t leading the race just yet, I expect them to pull ahead very soon due to the planned expansion of their Colossus supercomputer to ‘at least one million graphics processing units (GPUs)’. [23: https://www.reuters.com/technology/artificial-intelligence/musks-xai-plans-massive-expansion-ai-supercomputer-memphis-2024-12-04/]


Chart. Where will AGI be born? (Dec/2024). https://lifearchitect.ai/AGI/

Performance

With the proliferation of new optimizations like inference-time compute, where models spend time reasoning through problems before responding, my ‘Billboard chart for language models’ based on model size is now redundant. Bigger models (more parameters) are not necessarily better models (higher performance). Instead, here’s a look at the top models by MMLU and GPQA scores. Please note that the ceilings on both tests are about 90%. [24: MMLU Jun/2024: ‘…more than 9% of the examples are incorrect, suggesting a substantial presence of errors in the MMLU.’ https://arxiv.org/pdf/2406.04127#page=5 & GPQA: ‘we estimate the proportion of questions that have uncontroversially correct answers at 74% on GPQA Extended.’ https://arxiv.org/pdf/2311.12022#page=8]

#    Model               MMLU     #    Model               GPQA
1    o1                  92.3     1    o3                  87.7
2    Claude 3.5S new     90.5     2    o1                  79
3    Hunyuan-Large       89.9     3    QwQ-32B             65.2
4    Human expert        89.8     4    Claude 3.5S new     65
5    Llama 3.1 405B      88.6     5    Gemini 2.0 Flash    62.1
6    Grok-2              87.5     6    DeepSeek-R1-Lite    58.5
7    Gemini 2.0 Flash    87       7    Phi-4 14B           56.1
8    Claude 3 Opus       86.8     8    Grok-2              56
9
10                                13   Human with PhD      34

Table: AI rankings via MMLU and GPQA to Dec/2024. LifeArchitect.ai/models-table
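The rankings in the table can be reproduced with a short sort, which also makes the comparison against the human baselines explicit. This is a minimal sketch: the scores are copied from the table, while the dictionary layout and function names are my own illustration. GPQA ranks 9 to 12 are not shown in the table, so they are left out here too.

```python
# Scores copied from the table above (percent correct).
mmlu = {
    "o1": 92.3, "Claude 3.5S new": 90.5, "Hunyuan-Large": 89.9,
    "Human expert": 89.8, "Llama 3.1 405B": 88.6, "Grok-2": 87.5,
    "Gemini 2.0 Flash": 87.0, "Claude 3 Opus": 86.8,
}
gpqa = {
    "o3": 87.7, "o1": 79.0, "QwQ-32B": 65.2, "Claude 3.5S new": 65.0,
    "Gemini 2.0 Flash": 62.1, "DeepSeek-R1-Lite": 58.5, "Phi-4 14B": 56.1,
    "Grok-2": 56.0, "Human with PhD": 34.0,
}


def rank(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def above_baseline(scores: dict[str, float], baseline: str) -> list[str]:
    """Names of entries scoring strictly higher than the named baseline."""
    threshold = scores[baseline]
    return [name for name, score in rank(scores) if score > threshold]


mmlu_above_human = above_baseline(mmlu, "Human expert")
gpqa_above_phd = above_baseline(gpqa, "Human with PhD")

print("Above human expert on MMLU:", mmlu_above_human)
print("Above human with PhD on GPQA:", gpqa_above_phd)
```

On the table's numbers, three models sit above the human expert on MMLU, and all eight listed models sit above the PhD-level human on GPQA.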

Progressing to AGI and ASI

For several years now, I’ve documented the progress of AI working its way towards performing at the level of an average (median) human across practically all fields. This is also known as artificial general intelligence (AGI). The ‘conservative countdown to AGI’ is one of my most-visited online resources. In Dec/2023, we were at 64%. Now, in Dec/2024, we are already at 88%. [25: https://lifearchitect.ai/agi/]

A couple of months ago, The Memo subscribers [26: https://lifearchitect.ai/memo/] were introduced to my newest countdown: ‘Alan’s ASI checklist or: What to expect when you’re expecting artificial superintelligence.’ [27: https://lifearchitect.ai/asi/] ASI is defined as a system whose intelligence surpasses that of the brightest and most gifted human minds.

The checklist currently sits at 0/50, and no movement is anticipated until we reach 100% AGI.

How I’m preparing for AGI and ASI

In my mid-2024 report, I talked about Avital Balwit, a 25-year-old Rhodes Scholar, Future of Humanity Institute researcher, and current Chief of Staff to the CEO at Anthropic. Avital is already counting down to her own retirement by 2027. [28: The original article said ‘three years’ (https://archive.md/80Ssy) and this has since been changed to ‘five years’ (palladiummag.com)] She writes:

‘I am 25. These next three years might be the last few years that I work. I am not ill, nor am I becoming a stay-at-home mom, nor have I been so financially fortunate to be on the brink of voluntary retirement. I stand at the edge of a technological development that seems likely, should it arrive, to end employment as I know it…

‘The economically and politically relevant comparison on most tasks is not whether the language model is better than the best human, it is whether they are better than the human who would otherwise do that task…

‘The shared goal of the field of artificial intelligence is to create a system that can do anything. I expect us to soon reach it.’

Throughout 2024, I’ve been doing my best to prepare for the advent of AI that performs at or above human level across most fields, and the likely impact to the world. There’s not actually that much we can do to get ready, except wait. Here’s a quick look at what I’m focusing on, noting that this is not advice for anyone except myself:

  1. Stay alive. For me, this means regular gym, healthy eating, and my usual no alcohol/smoking/etc. As a sidenote that I articulated in The Memo edition 1/May/2024, [29: https://lifearchitect.substack.com/p/the-memo-1may2024] there is a very real risk that some of my peers, those designing AI, may need to take extra security measures to stay alive. I do know that neural network pioneer Prof Geoffrey Hinton is ‘tidying up his affairs’, [30: https://www.youtubetranscript.com/?v=UvvdFZkhhqE&t=1415] but I don’t know what that looks like specifically.
  2. Focus on real friends and family. AI will enhance our ability to genuinely commune with each other and put ourselves in others’ shoes far more than social media or cell phones have done. But for now, basking in the humanness of real-life social connections is a priority for me.
  3. Enjoy interests. From my decades-old hobby of fragrances, to new interests like archery, ‘being human’ might just be on its last legs, as we move into an entirely new era. I enjoy exploring and discovering things that I may not be very good at, and immersing myself in them for the sake of enjoyment.
  4. Live in a technology-centric region. Early distribution of AI will depend on location (among other things). We’ve seen how many of the latest AI models have been banned from countries across the EU. Getting access to humanoids will be a priority for many, and I expect that megacities like Shanghai and Los Angeles will be among the first destinations to receive these.
  5. Don’t purchase property. More than a decade ago, I published an article called ‘Why I’ll never buy a house’. [31: https://lifearchitect.ai/articles/property/] While the reasons then were unrelated to AGI, sitting in the early stages of the Singularity further cements this position. If we leapfrog just a few years into the future, the optimizations afforded by AGI will bring more luxury, effectiveness, and freedom in where and how we live. Slaving away now for a brick-and-mortar house (not to mention the immorality of high-interest mortgages) will look absurd very soon.
  6. Keep informed. The progress of AI is exponential, not linear. While it’s not possible to read every AI paper (a new one is still published about every eight minutes), [32: https://lifearchitect.ai/the-sky-is-comforting/] or download every model (a new one is deployed every four minutes), my full-time ‘job’ remains tracking and analyzing the major milestones in the lead-up to full artificial superintelligence. And the best way to keep informed is still to join the world’s leading organisations (from Alphabet to Yandex, and many others) at The Memo: LifeArchitect.ai/memo.

We continue to live through the most exciting time in the history of humanity. We are sitting in the early stages of the technological singularity. Progress is increasing exponentially, and the improvements are visceral. You can rely on AI technology continuing this trend through 2025.

No matter how many people scream that it’s falling, and no matter how many old people yell at its clouds, the sky remains resolute. It’s there when we wake: enveloping us, protecting us, and bringing a new dawn. The sky is steadfast.



References, Further Reading, and How to Cite

To cite this paper:
Thompson, A. D. (2024). Integrated AI: The sky is steadfast (2024 AI retrospective).
https://lifearchitect.ai/sky-is-steadfast/

The previous paper in this series was:
Thompson, A. D. (2024). Integrated AI: The sky is quickening (mid-2024 AI retrospective).
https://lifearchitect.ai/sky-is-quickening/

Further reading

For brevity and readability, footnotes were used in this paper, rather than in-text citations. Additional reference papers are listed below, or please see http://lifearchitect.ai/papers for the major foundational papers in the large language model space.

Models Table
https://lifearchitect.ai/models-table/

Model: o1
https://openai.com/index/openai-o1-system-card/

Model: o3
https://lifearchitect.ai/o3/

Model: Gemini 2.0
https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/

Model: phi-4
phi-4

Benchmark: MMLU (2020)
https://arxiv.org/abs/2009.03300

Benchmark: MMLU-Pro (2024)
https://arxiv.org/abs/2406.01574

Benchmark: GPQA (2023)
https://arxiv.org/abs/2311.12022



Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET.

This page last updated: 21/Dec/2024. https://lifearchitect.ai/the-sky-is-steadfast/