Image above generated by AI for this analysis (Imagen 3). Image generated in a few seconds, on 21 October 2024, text prompt by Alan D. Thompson via Imagen 3: ‘a zoomed out background header for oxygen, with title ‘o2’’.
Alan’s monthly analysis, The Memo, advises the majority of Fortune 500s, informs government policy, and in Sep/2024 was used by Apple as their primary source for model sizes in their new model paper and viz. It is a Substack bestseller in 142 countries: Get The Memo.
Alan D. Thompson
October 2024 (placeholder…)
Summary
Organization | OpenAI |
Model name | o2 (OpenAI model number two) |
Internal/project name | – |
Model type | Multimodal |
Parameter count | See below: Size estimate |
Dataset size (tokens) | See report: What’s in GPT-5? |
Training data end date | Oct/2024 (est) |
Training start date | Dec/2024 (est) |
Training end/convergence date | Jun/2025 (est) |
Training time (total) | – See working, with sources. |
Release date (public) | 2025 |
Paper | – |
Playground | – |
Updates
3/Nov/2024: First mention of o2 by OpenAI’s CEO: ‘I heard o2 gets 105% on GPQA’ (Twitter, 3/Nov/2024)
Major points
Model name
OpenAI (17/Oct/2024 timecode 21m11s, transcribed by Whisper):
We plan to continue developing and releasing models in the new OpenAI o1 series, as well as our GPT series. In addition to model updates, we expect to add web browsing, file and image uploading, and other features to [o1 to] make them more useful in use cases in ChatGPT. And while even today you are able to switch between models in the same conversation, like you saw in the demo, we’re working to enable ChatGPT to automatically choose the right model for your given prompt.
Smarts
Coming soon…
Size estimate (o1)
Perhaps beginning with gpt-3.5-turbo (high performance with just 20B parameters, as revealed in a paper published and then quickly withdrawn by Microsoft in Oct/2023), the number of parameters in a model is no longer the primary indicator of the model’s power and capabilities. Other factors, such as architecture, training data quality, and inference optimization, now play equally important roles in determining a model’s overall performance.
In Apr/2021, Jones (now at Anthropic) released a paper called ‘Scaling Scaling Laws with Board Games’. It found that ‘for each additional 10× of train-time compute, about 15× of test-time compute can be eliminated.’
Reversing this relationship suggests that:
A 15× increase in inference-time compute would equate to a 10× increase in train-time compute.
Dr Noam Brown at OpenAI discussed this finding in a presentation to the Paul G. Allen School on 23/May/2024, which was released to the public on 17/Sep/2024 (timecode 28m17s).
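As a rough illustration of the reversed relationship, the sketch below converts an inference-time compute multiplier into an equivalent train-time multiplier. It assumes the 10×:15× trade-off holds as a power law at every scale, which is an extrapolation beyond the quoted finding; the function name `train_equivalent` is hypothetical.

```python
import math

# Trade-off quoted from Jones (2021): 10x train-time ~ 15x test-time compute.
TRAIN_FACTOR = 10.0
TEST_FACTOR = 15.0


def train_equivalent(test_multiplier: float) -> float:
    """Convert a test-time compute multiplier to a train-time multiplier.

    Assumption (not from the source): the trade-off is log-linear, so each
    further 15x of test-time compute is worth a further 10x of train-time
    compute.
    """
    if test_multiplier <= 0:
        raise ValueError("test_multiplier must be positive")
    steps = math.log(test_multiplier) / math.log(TEST_FACTOR)
    return TRAIN_FACTOR ** steps


if __name__ == "__main__":
    for m in (1, 15, 225):
        equiv = train_equivalent(m)
        print(f"{m}x inference-time compute ~ {equiv:.1f}x train-time compute")
```

Under this assumption, multipliers between the quoted anchor points are interpolated on a log scale, so the results should be read as indicative only.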
In Aug/2024, Google DeepMind (with UC Berkeley) released a paper called ‘Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters’. They found that:
“on problems where a smaller base model attains somewhat non-trivial success rates, test-time compute can be used to outperform a 14x larger model.
…
Test-time and pretraining compute are not 1-to-1 “exchangeable”. On easy and medium questions, which are within a model’s capabilities, or in settings with small inference requirement, test-time compute can easily cover up for additional pretraining. However, on challenging questions which are outside a given base model’s capabilities or under higher inference requirement, pretraining is likely more effective for improving performance.”
The table below is an extrapolation of DeepMind’s findings. It is oversimplified, all figures are rounded, and there may be a larger error margin as scale increases. I have added an ‘MoE equiv’ column, which shows the mixture-of-experts model size equivalent by applying a 5× multiplier rule to the Dense model size (Standard inference-time compute, ITC).
Dense (Increased ITC) | Dense (Standard ITC) ×14 | MoE equiv (Standard ITC) ×5 |
---|---|---|
1B | 14B | 70B |
7B | 98B | 490B |
8B | 112B | 560B |
20B | 280B | 1.4T |
25B | 350B | 1.76T (GPT-4) |
30B | 420B | 2.1T |
70B | 980B | 4.9T |
180B | 2.52T | 12.6T |
200B (o1) | 2.8T | 14T |
280B | 3.92T | 19.6T |
540B | 7.56T | 37.8T |
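The arithmetic behind the table can be sketched as follows. This is a minimal illustration of the ×14 and ×5 multipliers described above, not the author's own tooling; the function names `extrapolate` and `format_params` are hypothetical.

```python
DENSE_MULTIPLIER = 14  # from the DeepMind finding quoted above
MOE_MULTIPLIER = 5     # the 5x MoE rule applied to the dense (standard ITC) size


def extrapolate(dense_increased_itc_b: float) -> tuple[float, float]:
    """Return (dense standard-ITC size, MoE-equivalent size) in billions."""
    dense_standard = dense_increased_itc_b * DENSE_MULTIPLIER
    moe_equiv = dense_standard * MOE_MULTIPLIER
    return dense_standard, moe_equiv


def format_params(billions: float) -> str:
    """Render a size given in billions, switching to trillions at 1000B."""
    if billions >= 1000:
        return f"{billions / 1000:g}T"
    return f"{billions:g}B"


if __name__ == "__main__":
    for size in (1, 7, 8, 20, 25, 30, 70, 180, 200, 280, 540):
        dense, moe = extrapolate(size)
        print(f"{format_params(size)} | {format_params(dense)} | {format_params(moe)}")
```

Plain multiplication gives 1.75T for the 25B row; the table lists 1.76T there, matching the figure used for GPT-4 elsewhere on this page.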
Read more at LifeArchitect.ai/o1
Dataset
A 200B parameter model trained on 20T tokens would have a tokens:parameters ratio of 100:1, an optimal pretraining ratio in 2024. See: Chinchilla data-optimal scaling laws: In plain English.
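The ratio above is simple division, shown here as a small sketch. The helper names are hypothetical, and the 100:1 target is taken from the text rather than derived.

```python
def tokens_per_parameter(tokens: float, parameters: float) -> float:
    """Return the tokens:parameters ratio for a training run."""
    if parameters <= 0:
        raise ValueError("parameters must be positive")
    return tokens / parameters


def tokens_needed(parameters: float, ratio: float = 100.0) -> float:
    """Return the token count implied by a target tokens:parameters ratio."""
    return parameters * ratio


if __name__ == "__main__":
    ratio = tokens_per_parameter(tokens=20e12, parameters=200e9)
    print(f"{ratio:.0f}:1")
    print(f"{tokens_needed(200e9) / 1e12:.0f}T tokens")
```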
The o2 dataset is expected to use much of the initial GPT-3 dataset as detailed in my report What’s in my AI?, and to be very similar to the dataset used to train GPT-4 Classic 1.76T (available in lab Aug/2022), with some additional datasets from new synthetic data and partnerships as outlined in my GPT-5 dataset report.
A Comprehensive Analysis of Datasets Likely Used to Train GPT-5
Alan D. Thompson
LifeArchitect.ai
August 2024
27 pages incl title page, references, appendices.
Use cases
Coming soon…
2024 optimal LLM highlights
Download source (PDF). Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.
Older bubbles viz
Feb/2024
Nov/2023
Download source (PDF). Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.
Mar/2023
Download source (PDF)
Apr/2022
Download source (PDF)
Models Table
Summary of current models: View the full data (Google sheets)
Timeline to o2
Date | Milestone |
---|---|
11/Jun/2018 | GPT-1 announced on the OpenAI blog. |
14/Feb/2019 | GPT-2 announced on the OpenAI blog. |
28/May/2020 | Initial GPT-3 preprint paper published to arXiv. |
11/Jun/2020 | GPT-3 API private beta. |
22/Sep/2020 | GPT-3 licensed to Microsoft. |
18/Nov/2021 | GPT-3 API opened to the public. |
27/Jan/2022 | InstructGPT released as text-davinci-002, now known as GPT-3.5. InstructGPT preprint paper Mar/2022. |
28/Jul/2022 | Exploring data-optimal models with FIM, paper on arXiv. |
1/Sep/2022 | GPT-3 model pricing cut by 66% for davinci model. |
21/Sep/2022 | Whisper (speech recognition) announced on the OpenAI blog. |
28/Nov/2022 | GPT-3.5 expanded to text-davinci-003, announced via email: 1. Higher quality writing. 2. Handles more complex instructions. 3. Better at longer form content generation. |
30/Nov/2022 | ChatGPT announced on the OpenAI blog. |
14/Mar/2023 | GPT-4 released. |
31/May/2023 | GPT-4 MathMix and step by step, paper on arXiv. |
6/Jul/2023 | GPT-4 available via API. |
25/Sep/2023 | GPT-4V finally released. |
13/May/2024 | GPT-4o announced. |
18/Jul/2024 | GPT-4o mini announced. |
12/Sep/2024 | o1 released. |
2024 | GPT-5… |
2025 | o2… |
2025 | GPT-6… |
Videos
Coming soon…
Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 4.5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET. Technical highlights.
This page last updated: 3/Nov/2024. https://lifearchitect.ai/o2/