Organization: OpenAI
Model name: GPT-5
Internal/project name: Gobi, Arrakis (GPT-4.5)
Model type: Multimodal
Parameter count: Alan expects 2T-5T (2,000B-5,000B), based on:
i. a 2.5× increase in compute (10,000 ➜ 25,000 NVIDIA A100 GPUs, with some H100s).
ii. a roughly doubled training time (~3 months ➜ ~4-6 months).
Dataset size (tokens): Alan expects 40T-100T (around 80TB-200TB). See also: Gemini.
Training data end date: Alan expects Dec/2022
Training start date: Alan expects Dec/2023
Training end/convergence date: Alan expects Apr/2024
Training time (total): See working, with sources.
Release date (public): Alan expects Aug/2024
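
As a rough back-of-envelope sketch of how those numbers hang together (every input is my estimate, and the ~2 bytes-per-token and C ≈ 6·N·D rules of thumb are assumptions, not OpenAI figures):

```python
# Back-of-envelope sketch of the estimates above.
# Every input here is an assumption (Alan's estimate), not a confirmed OpenAI figure.

gpus_gpt4, gpus_gpt5 = 10_000, 25_000      # A100-class accelerators
months_gpt4, months_gpt5 = 3, 5            # wall-clock training time (midpoint of 4-6)

compute_multiplier = (gpus_gpt5 / gpus_gpt4) * (months_gpt5 / months_gpt4)
print(f"Relative training compute vs GPT-4: ~{compute_multiplier:.1f}x")   # ~4.2x

# Training compute scales roughly with parameters x tokens (C ≈ 6·N·D), so a ~4x
# compute budget supports roughly 2x the parameters and 2x the tokens of GPT-4.
tokens_low, tokens_high = 40e12, 100e12    # 40T-100T tokens (estimate)
bytes_per_token = 2                        # rough average for tokenized web text
print(f"Dataset size: ~{tokens_low * bytes_per_token / 1e12:.0f}-"
      f"{tokens_high * bytes_per_token / 1e12:.0f} TB of raw text")        # ~80-200 TB
```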

2024 optimal LLM highlights

Download source (PDF)
Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.

Interview about GPT-4 and GPT-5

GPT-5 Updates

31/Jan/2024: The Memo exclusive: GPT-5 and gold datasets.

When raising a child prodigy, should we provide more learning and experiences or higher-quality learning and experiences?

When training frontier models like GPT-5, should we use more data or higher-quality data?

In Jun/2021, I published a paper called ‘Integrated AI: Dataset quality vs quantity via bonum (GPT-4 and beyond)’. It explored high-quality data aligned with ‘the ultimate good’ (in Latin, this is ‘summum bonum’).

OpenAI’s CEO recently spoke at a number of big venues, including the 54th annual meeting of the World Economic Forum (WEF) at Davos-Klosters, Switzerland, from 15 to 19 January 2024. He was recorded making a very interesting comment:

As models become smarter and better at reasoning, we need less training data. For example, no one needs to read 2000 biology textbooks; you only need a small portion of extremely high-quality data and to deeply think and chew over it. The models will work harder on thinking through a small portion of known high-quality data. (Reddit, not verbatim, 22/Jan/2024)

One researcher (22/Jan/2024) similarly notes:

…potentially ‘infinity efficient’ because they may be one-time costs to create. Depending on the details, you may simply create them once and then never again. For example, in ‘AlphaGeometry’, it seems likely that for most problems there’s going to be one and only one best & shortest proof, and that any search process would converge upon it quickly, and now you can just train all future geometry models on that ideal proof. Similarly, in chess or Go I expect that in the overwhelming majority of positions (even excluding the opening book & endgame databases), the best move is known and the engines aren’t going to change the choice no matter how long you run them. ‘Gold datasets’ may be a good moat.

For text training, we’ve now hit massive datasets like the 125TB (30 trillion token) RedPajama-Data-v2, and I continue to track the other highlights on the Datasets Table.

Nearly three years after my data quality paper, are we finally on the way to higher quality (and perhaps temporarily smaller) datasets rather than ‘more is better’?

Explore more in my Mar/2022 comprehensive analysis of datasets, ‘What’s in my AI?’.

18/Jan/2024: OpenAI CEO: ‘GPT-2 was very bad. GPT-3 was pretty bad. GPT-4 was pretty bad. But GPT-5 would be okay.’ (Korean media)

GPT-5 is much smarter (than previous models) and will offer more features. It adds inference capabilities, which is an important advance in its general-purpose ability to process tasks on behalf of users. Since people love ChatGPT’s voice feature, much better audio will be provided.

If I had to pick one thing, the writing would be greatly improved.

If you hold the iPhone released in 2007 in one hand and the (latest model) iPhone 15 in the other, you see two very different devices. I believe the same thing is true about AI.

12/Jan/2024: OpenAI CEO: ‘GPT-5 and AGI will be achieved “relatively soon”’. (Twitter)

24/Nov/2023: Coke CMO probably got GPT-V and GPT-5 confused when he said: ‘Coca-Cola Diwali has been done with GPT-5.’

Manolo Arroyo (Global Chief Marketing Officer for The Coca‑Cola Company):

I can give you maybe an insight, some pieces of new news that no one has shared so far. We have a partnership with Bain and OpenAI…

We were actually the first company that was combining GPT, which is the engine that enables ChatGPT, and DALL-E. Back then, no one knew that because of the partnership with OpenAI, we were the first company using GPT-4 and DALL-E 2 into one integrated consumer digital experience. No one knows, because it hasn’t been launched yet, that Coca Cola Diwali has been done with GPT-5 [probably referring to GPT-V] which is still not commercially available, and DALL-E 3 that has also not been launched…

And that’s how in just six months this technology is progressing… launch it for Christmas [2023] globally…
(— Coca-Cola’s Mega Marketing Transformation by The Morning Brief (The Economic Times), 17m31s – transcribed with my fingers because both otter and whisper were down…)

Edit: To be clear, the quote above most likely confuses GPT-V and GPT-5. GPT-V(ision) was announced as part of GPT-4 in Mar/2023. GPT-5 is estimated to be in training between Dec/2023 and Apr/2024.

13/Nov/2023: OpenAI CEO on GPT-5: The company is also working on GPT-5, the next generation of its AI model, Altman said, although he did not commit to a timeline for its release. It will require more data to train on, which Altman said would come from a combination of publicly available data sets on the internet, as well as proprietary data from companies. OpenAI recently put out a call for large-scale data sets from organisations that “are not already easily accessible online to the public today”, particularly for long-form writing or conversations in any format. While GPT-5 is likely to be more sophisticated than its predecessors, Altman said it was technically hard to predict exactly what new capabilities and skills the model might have. “Until we go train that model, it’s like a fun guessing game for us,” he said. “We’re trying to get better at it, because I think it’s important from a safety perspective to predict the capabilities. But I can’t tell you here’s exactly what it’s going to do that GPT-4 didn’t.” (FT)

Read more about emerging abilities.

22/Oct/2023: OpenAI CEO on GPT-5 being AGI: “We define AGI as ‘the thing we don’t have quite yet.’ There were a lot of people who would have—ten years ago [2013 compared to 2023]—said alright, if you can make something like GPT-4, GPT-5 maybe, that would have been an AGI… I think we’re getting close enough to whatever that AGI threshold is going to be.” (YouTube).

19/Oct/2023: Bill Gates says GPT-5 won’t be much better than GPT-4.

[Gates] predicts stagnation in development at first. “We have reached a plateau,” said Gates, referring to OpenAI’s GPT AI model, which has caused a stir around the world. The next version won’t be much better than the current GPT-4; a limit has been reached.

(Handelsblatt, German media)

7/Oct/2023: More Gobi rumors and analysis in Reddit thread by FeltSteam (archive).

29/Sep/2023: Gobi rumors and analysis in shared Google Doc.

18/Jul/2023: OpenAI mentions ‘GPT-V’ in job listing (may be GPT-4V/Vision as in GPT-4).

18/Jul/2023: OpenAI files to trademark the term ‘GPT-5’. Full filing at USPTO, and application table.

6/Jul/2023: OpenAI Alignment team lead comments on GPT-5 alignment (Alan: I don’t like to give airtime to AI doomers like EY, so this is mainly for Dr Jan’s response). ‘We can see how well alignment of GPT-5 will go. We’ll monitor closely how quickly the tech develops.’

7/Jun/2023: “We have a lot of work to do before we start that model [GPT-5],” Altman, the chief executive of OpenAI, said at a conference hosted by Indian newspaper Economic Times. “We’re working on the new ideas that we think we need for it, but we are certainly not close to it to start.” (TechCrunch)

2/Jun/2023: OpenAI CEO updates, requested to be removed from the web, archived here.

OpenAI CEO updates Jun/2023

Archived from:

1. OpenAI is heavily GPU limited at present

A common theme that came up throughout the discussion was that currently OpenAI is extremely GPU-limited and this is delaying a lot of their short-term plans. The biggest customer complaint was about the reliability and speed of the API. Sam acknowledged their concern and explained that most of the issue was a result of GPU shortages.

The longer 32k context can’t yet be rolled out to more people. OpenAI hasn’t overcome the O(n^2) scaling of attention, so while it seemed plausible they would have 100k–1M token context windows soon (this year), anything bigger would require a research breakthrough.
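
To illustrate why context length is hard to extend, here is a minimal sketch of standard scaled dot-product attention (plain NumPy, illustrative only, not OpenAI’s implementation) and the memory implied by its quadratic score matrix:

```python
import numpy as np

def attention(Q, K, V):
    """Standard scaled dot-product attention (single head, no masking)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # (n, n) score matrix: quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 8, 4                            # tiny toy example
out = attention(np.random.rand(n, d), np.random.rand(n, d), np.random.rand(n, d))
print(out.shape)                       # (8, 4)

for n in (10_000, 100_000, 1_000_000):  # context lengths in tokens
    # In fp16, the (n, n) score matrix alone needs n*n*2 bytes per head per layer.
    print(f"n={n:>9,}: score matrix ≈ {n * n * 2 / 1e9:,.1f} GB per head per layer")
```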

The finetuning API is also currently bottlenecked by GPU availability. They don’t yet use efficient finetuning methods like Adapters or LoRA, so finetuning is very compute-intensive to run and manage. Better support for finetuning will come in the future. They may even host a marketplace of community-contributed models.
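
For background, LoRA (low-rank adaptation) freezes the pretrained weights and trains only a small low-rank update, which is why it is so much cheaper than full finetuning. A minimal PyTorch-style sketch of the idea (illustrative only; per the summary above, OpenAI was not using it at the time):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update (W·x + B·A·x)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,} vs {4096 * 4096:,} in the full weight matrix")
```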

Dedicated capacity offering is limited by GPU availability. OpenAI also offers dedicated capacity, which provides customers with a private copy of the model. To access this service, customers must be willing to commit to a $100k spend upfront.

2. OpenAI’s near-term roadmap

Sam shared what he saw as OpenAI’s provisional near-term roadmap for the API.


  • Cheaper and faster GPT-4 — This is their top priority. In general, OpenAI’s aim is to drive “the cost of intelligence” down as far as possible and so they will work hard to continue to reduce the cost of the APIs over time.
  • Longer context windows — Context windows as high as 1 million tokens are plausible in the near future.
  • Finetuning API — The finetuning API will be extended to the latest models but the exact form for this will be shaped by what developers indicate they really want.
  • A stateful API — When you call the chat API today, you have to repeatedly pass the same conversation history and pay for the same tokens again and again (see the sketch after this list). In the future there will be a version of the API that remembers the conversation history.
  • Multimodality — This was demoed as part of the GPT-4 release but can’t be extended to everyone until after more GPUs come online.
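
To make the ‘stateless today’ point concrete, here is a minimal sketch of a chat loop that has to resend the full history every turn; `send_chat` is a hypothetical stand-in for any chat completion endpoint, not a specific OpenAI call:

```python
from typing import List, Dict

def send_chat(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for a chat completion call.

    With today's stateless APIs, every message in `messages` is re-sent
    (and re-billed as input tokens) on every single turn.
    """
    return f"(reply to {len(messages)} messages)"

history: List[Dict[str, str]] = [{"role": "system", "content": "You are helpful."}]

for user_turn in ["Hello!", "Summarise our chat so far.", "Thanks."]:
    history.append({"role": "user", "content": user_turn})
    reply = send_chat(history)                 # the whole history goes over the wire again
    history.append({"role": "assistant", "content": reply})
    words_resent = sum(len(m["content"].split()) for m in history)
    print(f"Turn {len(history) // 2}: ~{words_resent} words re-sent as context")
```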

3. Plugins “don’t have PMF” and are probably not coming to the API anytime soon

A lot of developers are interested in getting access to ChatGPT plugins via the API, but Sam said he didn’t think they’d be released any time soon. The usage of plugins, other than browsing, suggests that they don’t have PMF (product-market fit) yet. He suggested that a lot of people thought they wanted their apps to be inside ChatGPT, but what they really wanted was ChatGPT in their apps.

4. OpenAI will avoid competing with their customers — other than with ChatGPT

Quite a few developers said they were nervous about building with the OpenAI APIs when OpenAI might end up releasing products that are competitive to them. Sam said that OpenAI would not release more products beyond ChatGPT. He said there was a history of great platform companies having a killer app and that ChatGPT would allow them to make the APIs better by being customers of their own product. The vision for ChatGPT is to be a super smart assistant for work but there will be a lot of other GPT use-cases that OpenAI won’t touch.

5. Regulation is needed but so is open source

While Sam is calling for regulation of future models, he didn’t think existing models were dangerous and thought it would be a big mistake to regulate or ban them. He reiterated his belief in the importance of open source and said that OpenAI was considering open-sourcing GPT-3. Part of the reason they hadn’t open-sourced yet was that he was skeptical of how many individuals and companies would have the capability to host and serve large LLMs.

6. The scaling laws still hold

Recently, many articles have claimed that “the age of giant AI models is already over”. This wasn’t an accurate representation of what Sam meant.

OpenAI’s internal data suggests the scaling laws for model performance continue to hold, and making models larger will continue to yield better performance. The rate of scaling can’t be maintained, because OpenAI has made models millions of times bigger in just a few years, and doing that going forward won’t be sustainable. That doesn’t mean that OpenAI won’t continue to try to make the models bigger; it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude.

The fact that scaling continues to work has significant implications for the timelines of AGI development. The scaling hypothesis is the idea that we may have most of the pieces in place needed to build AGI, and that most of the remaining work will be taking existing methods and scaling them up to larger models and bigger datasets. If the era of scaling were over, we should probably expect AGI to be much further away. The fact that the scaling laws continue to hold strongly suggests shorter timelines.
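
As one concrete example of what a scaling law looks like, here is the published Chinchilla loss fit from Hoffmann et al. (2022), evaluated at a few illustrative sizes (the constants are DeepMind’s fit, not OpenAI’s, and the larger points are hypothetical):

```python
# Chinchilla parametric loss fit, L(N, D) = E + A/N^alpha + B/D^beta
# Constants are the published fit from Hoffmann et al. (2022); illustrative only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

for n_params, n_tokens in [(70e9, 1.4e12),     # Chinchilla-scale
                           (1e12, 20e12),      # hypothetical ~1T params, 20T tokens
                           (3e12, 60e12)]:     # hypothetical GPT-5-scale estimate
    print(f"N={n_params:.0e}, D={n_tokens:.0e} -> predicted loss {loss(n_params, n_tokens):.3f}")
```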


31/May/2023: OpenAI announces GPT-4 MathMix (paper).

29/May/2023: NVIDIA Announces DGX GH200 AI Supercomputer (NVIDIA). ‘New Class of AI Supercomputer Connects 256 Grace Hopper Superchips Into Massive, 1-Exaflop, 144TB GPU for Giant Models… GH200 superchips eliminate the need for a traditional CPU-to-GPU PCIe connection by combining an Arm-based NVIDIA Grace™ CPU with an NVIDIA H100 Tensor Core GPU in the same package, using NVIDIA NVLink-C2C chip interconnects.’

Expect trillion-parameter models like OpenAI GPT-5, Anthropic Claude-Next, and beyond to be trained with this groundbreaking hardware. Some have estimated that this could train language models up to 80 trillion parameters, which gets us closer to brain-scale.
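
One way such an 80T-parameter figure could be reached is from memory capacity alone; a sketch under my assumptions (fp16/bf16 weights at 2 bytes per parameter, ignoring optimizer state, gradients, and activations):

```python
# Rough ceiling implied by memory alone (assumption-heavy; ignores optimizer
# state, gradients, activations, and inter-node parallelism overheads).
shared_memory_bytes = 144e12          # DGX GH200: 144TB of unified GPU memory
bytes_per_param = 2                   # fp16 / bf16 weights (assumption)

max_params = shared_memory_bytes / bytes_per_param
print(f"Upper bound on weights alone: ~{max_params / 1e12:.0f} trillion parameters")
# ~72T parameters from weights alone -- in the ballpark of the 80T figure quoted above.
```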

20/May/2023: Updated GPT-4 chart for reference.
By request, here’s a simplified version of this full GPT-4 vs human viz; easier to read on a big screen!
Download source (PDF)

19/May/2023: OpenAI CEO (Elevate):

…with the arrival of GPT-4, people started building entire companies around it. I believe that GPT-5, 6, and 7 will continue this trajectory in future years, really increasing the utility they can provide.

This development is a big, new, exciting thing to have in the world. It’s as though all of computing got an upgrade.

I think we’ll look back at this period like we look back at the period where people were discovering fundamental physics. The fact that we’re discovering how to predict the intelligence of a trained AI before we start training it suggests that there is something close to a natural law here. We can predictably say this much compute, this big of a neural network, this training data – these will determine the capabilities of the model. Now we can predict how it’ll score on some tests.

…whether we can predict the sort of qualitative new things – the new capabilities that didn’t exist at all in GPT-4 but do exist in future versions like GPT-5. That seems important to figure out. But right now, we can say, ‘Here’s how we predict it’ll do on this evaluation or this metric.’ I really do think we’ll look back at this period as if we were all living through one of the most important periods of human discovery.

I believe that this will be a monumental deal in terms of how we think about when we go beyond human intelligence. However, I don’t think that’s quite the right framework because it’ll happen in some areas and not others. Already, these systems are superhuman in some limited areas and extremely bad in others, and I think that’s fine.

…this analogy: it’s like everybody’s going to be the CEO of all of the work they want to do. They’ll have tons of people that they’re able to coordinate and direct, provide the text and the feedback on. But they’ll also have lots of agents, for lack of a better word, that go off and do increasingly complex tasks.

16/May/2023: OpenAI CEO to Congress: ‘We are not currently training what will be GPT-5; we don’t have plans to do it in the next 6 months’.

11/May/2023: Microsoft Korea: ‘We are preparing for GPT-5, and GPT-6 will also be released.’ (Yonhap News Agency (Korean)).

13/Apr/2023: At an MIT event, OpenAI CEO confirmed previous statement from two weeks ago, saying “We are not [training GPT-5] and won’t for some time.”

Meme inspired by /r/singularity.

29/Mar/2023: Hannah Wong, a spokesperson for OpenAI, says… OpenAI is not currently training GPT-5. (Wired).

29/Mar/2023: ‘i have been told that gpt5 is scheduled to complete training this december and that openai expects it to achieve agi. which means we will all hotly debate as to whether it actually achieves agi. which means it will.’

Siqi is the founder and CEO of Runway, an a16z funded startup.

23/Mar/2023: Microsoft paper on GPT-4 and early artificial general intelligence.

20/Mar/2023: OpenAI paper on GPT and employment: ‘We investigate the potential implications of Generative Pre-trained Transformer (GPT) models and related technologies on the U.S. labor market.’

13/Feb/2023: Morgan Stanley research note:

We think that GPT 5 is currently being trained on 25k GPUs – $225 mm or so of NVIDIA hardware…

The current version of the model, GPT-5, will be trained in the same facility—announced in 2020 [May/2020, Microsoft], the supercomputer designed specifically for OpenAI has 285k CPU cores, 10k GPU cards, and 400 Gb/s connectivity for each GPU server; our understanding is that there has been substantial expansion since then. From our conversation, GPT-5 is being trained on about 25k GPUs, mostly A100s, and it takes multiple months; that’s about $225m of NVIDIA hardware, but importantly this is not the only use, and many of the same GPUs were used to train GPT-3 and GPT-4…

We also would expect the number of large language models under development to remain relatively small. IF the training hardware for GPT-5 is $225m worth of NVIDIA hardware, that’s close to $1b of overall hardware investment; that isn’t something that will be undertaken lightly. We see large language models at a similar scale being developed at every hyperscaler, and at multiple startups.
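
A quick sanity check of those figures (the per-GPU price below is my assumption; the research note does not state one):

```python
# Sanity check of the Morgan Stanley estimate (price per GPU is my assumption).
gpus = 25_000
price_per_a100 = 9_000        # assumed street price per A100 (USD); not from the note

hardware_cost = gpus * price_per_a100
print(f"GPU hardware: ~${hardware_cost / 1e6:.0f}M")     # ~$225M, matching the note

# The note's "close to $1b of overall hardware investment" implies the GPUs are
# roughly a quarter of total hardware cost (networking, CPUs, storage, power).
overall_investment = 1e9
print(f"GPUs as a share of overall hardware: ~{hardware_cost / overall_investment:.0%}")
```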

Datacenter location

Models table

Summary of current models: View the full data (Google sheets)
Download PDF version


OpenAI President, Greg Brockman (Oct/2022):

…there’s no human who’s been able to consume 40TB of text [≈20T tokens, probably trained to ≈1T parameters in line with Chinchilla scaling laws]
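
The bracketed annotation can be reproduced with two rough rules of thumb (my assumptions: ~2 bytes per token of English text, and the Chinchilla heuristic of ~20 training tokens per parameter):

```python
# Unpacking the bracketed annotation above (both ratios are rough rules of thumb).
corpus_bytes = 40e12              # "40TB of text"
bytes_per_token = 2               # assumed average for tokenized English text
tokens = corpus_bytes / bytes_per_token
print(f"~{tokens / 1e12:.0f}T tokens")                                    # ≈ 20T tokens

tokens_per_param = 20             # Chinchilla-optimal heuristic (~20 tokens/parameter)
params = tokens / tokens_per_param
print(f"Chinchilla-optimal model size: ~{params / 1e12:.0f}T parameters") # ≈ 1T
```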


Timeline to GPT-5

Date Milestone
11/Jun/2018 GPT-1 announced on the OpenAI blog.
14/Feb/2019 GPT-2 announced on the OpenAI blog.
28/May/2020 Initial GPT-3 preprint paper published to arXiv.
11/Jun/2020 GPT-3 API private beta.
22/Sep/2020 GPT-3 licensed to Microsoft.
18/Nov/2021 GPT-3 API opened to the public.
27/Jan/2022 InstructGPT released as text-davinci-002, now known as GPT-3.5. InstructGPT preprint paper Mar/2022.
28/Jul/2022 Exploring data-optimal models with FIM, paper on arXiv.
1/Sep/2022 GPT-3 model pricing cut by 66% for davinci model.
21/Sep/2022 Whisper (speech recognition) announced on the OpenAI blog.
28/Nov/2022 GPT-3.5 expanded to text-davinci-003, announced via email:
1. Higher quality writing.
2. Handles more complex instructions.
3. Better at longer form content generation.
30/Nov/2022 ChatGPT announced on the OpenAI blog.
14/Mar/2023 GPT-4 released.
31/May/2023 GPT-4 MathMix and step by step, paper on arXiv.
6/Jul/2023 GPT-4 available via API.
25/Sep/2023 GPT-4V finally released.
Next… GPT-5…

AI Race

Download source (PDF)
Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.


OpenAI’s diplomatic mission (2023)

On 9/Jun/2023, at a fireside chat in Seoul, Korea, the OpenAI CEO acknowledged he was on a “diplomatic mission.” After the release of GPT-4 in Mar/2023, OpenAI staff visited the following regions:

  1. Canada: Toronto
  2. USA: Washington D.C.
  3. Brazil: Rio De Janeiro
  4. Nigeria: Lagos
  5. Spain: Madrid (the Spanish Presidency of the Council of the European Union ran from 1/July/2023-31/Dec/2023)
  6. Belgium: Brussels
  7. Germany: Munich
  8. UK: London
  9. France: Paris
  10. Israel: Tel Aviv
  11. UAE: Dubai
  12. India: New Delhi
  13. Singapore
  14. Indonesia: Jakarta
  15. South Korea: Seoul
  16. Japan: Tokyo
  17. Australia: Melbourne


  1. Jordan
  2. Qatar
  3. China: Beijing
  4. Poland



Read more about Alan’s conservative countdown to AGI

Get The Memo

by Dr Alan D. Thompson · Be inside the lightning-fast AI revolution.
Bestseller. 10,000+ readers from 142 countries. Microsoft, Tesla, Google...
Artificial intelligence that matters, as it happens, in plain English.
Get The Memo.

Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 4.5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET. Technical highlights.

This page last updated: 6/Feb/2024.