Models Table: Methodology

Alan D. Thompson
April 2026

Context

The Models Table is the running output of years of full-time LLM analysis, currently 10,000+ data points across 800+ models, documented by hand. Apple, Microsoft, Harvard, Stanford, MIT, and several governments use it as an independent reference for LLM details, including proprietary data on frontier models.

Since around 2023, when DeepMind stopped publishing detailed training papers for its models, frontier labs have generally kept architecture and training details confidential. Independent estimation work fills the gap, and that is what the parameter and token columns of the Models Table represent.

Every italic figure in the Models Table is a centrepoint estimate; the working comes from several lenses cross-checked against open-source releases, and a reasonable error band on any single estimated model is wide. Specific entries are documented in detail in my paid analyses (The Memo special editions) and in methodology papers such as What’s in My AI?, What’s in GPT-5?, and What’s in Grok?

Estimates and accuracy
In the Models Table, any parameter or token figure in italics is my estimate. Anything in regular weight is either disclosed by the lab, quoted directly from a paper, or otherwise on the public record. All italic figures are my best-informed estimates; ground truth can’t be derived in the current climate of trade secrets. Every italic figure in the table is a centrepoint: when I estimate ‘2T’, the range is roughly 1T–3T (and up to 4T at the wider, factor-of-two band that Epoch AI uses), and the same applies to token counts. The table shows the centrepoint to keep rows legible; my paid analyses document the ranges. A useful comparison: Ege Erdil at Epoch AI arrived at the same ~200B figure for o1 that I had, and put it like this:

This would put its total parameter count around 200 billion, though this estimate could easily be off by a factor of 2 [100B to 400B] given the rough way I’ve arrived at it. (Epoch AI)

That is the right framing for everything in italics in the Models Table.
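To make the band convention concrete, here is a minimal Python sketch. The multipliers simply restate the 1T–3T working band and the 1T–4T factor-of-two band described above; they are a convention, not a published formula.

```python
# Error-band convention for italic (estimated) figures. The multipliers
# restate the ranges above; they are a convention, not a derivation.

def band(centrepoint: float, low_mult: float, high_mult: float):
    """Return the (low, high) range around a centrepoint estimate."""
    return centrepoint * low_mult, centrepoint * high_mult

centre = 2.0e12  # a '2T' parameter centrepoint
for label, lo_m, hi_m in [("working band", 0.5, 1.5),
                          ("factor-of-two band (Epoch AI)", 0.5, 2.0)]:
    lo, hi = band(centre, lo_m, hi_m)
    print(f"{label}: {lo/1e12:.0f}T to {hi/1e12:.0f}T")
    # -> working band: 1T to 3T; factor-of-two band: 1T to 4T
```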

Total Parameters
The parameter figure shown in the Models Table is total parameters. For Mixture-of-Experts (MoE) models, this is the total parameter count, not the active-per-token count. The two can differ by an order of magnitude on modern frontier MoE models.
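As a minimal sketch of why the two counts diverge, consider a hypothetical MoE configuration; the layer sizes below are placeholders, not any lab’s disclosed architecture.

```python
# Total vs active parameter counts for a Mixture-of-Experts model.
# Shapes are hypothetical placeholders, not a disclosed architecture.

def moe_param_counts(shared: float, n_experts: int,
                     expert: float, top_k: int) -> tuple[float, float]:
    """Total counts every expert; active counts only the top_k experts
    routed per token, plus the shared (attention, embedding) weights."""
    total = shared + n_experts * expert
    active = shared + top_k * expert
    return total, active

# Hypothetical frontier-style config: 64 experts of 30B params each,
# 2 routed per token, 20B of shared weights.
total, active = moe_param_counts(20e9, 64, 30e9, 2)
print(f"total {total/1e12:.1f}T vs active {active/1e9:.0f}B "
      f"(~{total/active:.0f}x gap)")  # -> total 1.9T vs active 80B (~24x gap)
```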

Tokens seen
Token figures in the Models Table use the same italics convention. My token estimates start from the parameter estimate and use a tokens-per-parameter ratio (typically 20:1 or higher, in line with post-Chinchilla data ratios) as a first-pass sanity check. That figure then gets refined against lab statements about training compute, training duration, hardware allocation, and dataset composition. The detailed token-counting methodology for publicly-known corpora is documented in What’s in My AI?, and the Chinchilla page (https://lifearchitect.ai/chinchilla/) tracks how the ratio itself has evolved from 1.7:1 (Kaplan, 2020) through 20:1 (Chinchilla, 2022) to 80,000:1 (LFM2.5-350M, 2026).
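A minimal sketch of that first pass, assuming nothing beyond the ratios already named on this page:

```python
# First-pass token sanity check from a parameter centrepoint, using the
# tokens-per-parameter ratios named above (Kaplan 1.7:1, Chinchilla 20:1).

def first_pass_tokens(params: float, tokens_per_param: float) -> float:
    """Tokens-seen first pass; refined later against compute, duration,
    hardware, and dataset signals."""
    return params * tokens_per_param

centre = 2.0e12  # a '2T' parameter centrepoint
for label, ratio in [("Kaplan (2020)", 1.7), ("Chinchilla (2022)", 20.0)]:
    tokens = first_pass_tokens(centre, ratio)
    print(f"{label} at {ratio}:1 -> {tokens/1e12:.0f}T tokens")
    # -> Kaplan: ~3T tokens; Chinchilla: 40T tokens
```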

Lenses for estimates

My estimates come from several lenses, cross-checked against each other and against open-source releases.

Lens 1: Pricing signal
Per-token price gives some signal on active-parameter size, since inference cost scales with active params, but margin bands vary widely across labs (some price close to cost, others add substantial profit). I treat pricing as one input among many, rather than a primary driver. I maintain an older public visualisation of frontier model pricing over time: https://lifearchitect.ai/viz/#frontier-prices.
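A minimal sketch of how I use the price signal, assuming per-token inference cost scales roughly linearly with active parameters; the reference model and margin band below are hypothetical placeholders, not table entries.

```python
# Pricing lens: bracket active params from per-token price, relative to a
# reference model of known (or well-estimated) size. The reference figures
# and relative margin band are hypothetical placeholders.

def bracket_active_params(price: float, ref_price: float, ref_active: float,
                          rel_margin: tuple[float, float] = (0.7, 1.5)):
    """If price ~ cost * margin and cost ~ active params, then
    active ~ ref_active * (price / ref_price) / relative_margin."""
    ratio = price / ref_price
    lo_margin, hi_margin = rel_margin
    return ref_active * ratio / hi_margin, ref_active * ratio / lo_margin

# Hypothetical: $60/M output tokens vs a 40B-active reference at $30/M.
lo, hi = bracket_active_params(60.0, 30.0, 40e9)
print(f"~{lo/1e9:.0f}B to ~{hi/1e9:.0f}B active")  # -> ~53B to ~114B active
```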

Lens 2: Capability signal
Benchmark jumps between model generations on saturating-but-still-useful evals (MMLU-Pro, GPQA, HLE) historically track with both a parameter step and a training-compute step. Combined with the pricing lens, this helps me bracket the estimate.

Lens 3: Supply and demand signals
Three constraints together set the plausible size envelope for any frontier model; a sketch of the training-supply arithmetic follows the table:

| Signal | What it constrains | Examples |
|---|---|---|
| Training supply | Maximum training compute available | TPU/GPU allocations, datacentre capacity disclosures |
| Inference supply | Deployment footprint and serving cost | Bedrock, Vertex, Foundry, Azure, OpenRouter availability |
| Demand | Realistic serving load | User counts, usage disclosures, queue behaviour |
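The training-supply row lends itself to quick arithmetic. Below is a minimal Python sketch using the standard C ≈ 6ND approximation for training compute; the cluster figures are hypothetical placeholders, not a specific lab’s disclosure.

```python
# Training-supply envelope via the standard C ~= 6 * N * D approximation
# (C: training FLOPs, N: active params, D: tokens seen). Cluster figures
# below are hypothetical placeholders, not a specific lab's disclosure.

def max_params_from_cluster(n_gpus: int, peak_flops: float, mfu: float,
                            days: float, tokens: float) -> float:
    """Upper bound on active parameters trainable on a given cluster."""
    total_flops = n_gpus * peak_flops * mfu * days * 86_400  # seconds/day
    return total_flops / (6 * tokens)

# Hypothetical: 100,000 H100-class GPUs (~1e15 dense BF16 FLOP/s each),
# 40% utilisation, 90 days, 30T training tokens.
n = max_params_from_cluster(100_000, 1e15, 0.40, 90, 30e12)
print(f"~{n/1e12:.1f}T active-parameter envelope")  # -> ~1.7T
```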

Lens 4: Cross-checks against open-source frontier releases
When labs release detailed numbers (or full weights), we get ground truth, and I check my proprietary estimates against the known open frontier as those releases land; a sketch of the ratio cross-check follows the two tables below.

Small models circa 2024

| Model | Lab | Total params | Active params | Tokens seen | GPQA | HLE | Released |
|---|---|---|---|---|---|---|---|
| gpt-oss-120b | OpenAI | 116.8B | 5.1B | 30T | 80.1 | 19 | Aug/2025 |
| gpt-oss-20b | OpenAI | 20.9B | 3.6B | 13T | 71.5 | 17.3 | Aug/2025 |
| GPT-4o | OpenAI | 200B | – | 20T | 53.6 | 3.1 | May/2024 |

Large models circa 2025

| Model | Lab | Total params | Active params | Tokens seen | GPQA | HLE | Released |
|---|---|---|---|---|---|---|---|
| Grok-5 | xAI | 6T | – | – | – | – | – |
| DeepSeek-V4-Pro | DeepSeek | 1.6T | 49B | 33T | 90.1 | 37.7 | Apr/2026 |
| Kimi K2.6 | Moonshot | 1T | 32B | 30.5T | 90.5 | 54 | Apr/2026 |
| Grok-4 | xAI | 3T | – | – | – | – | Jul/2025 |
| Grok-3 | xAI | 3T | – | – | 84.6 | – | Feb/2025 |
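As one way to make the cross-check concrete, here is a minimal Python sketch that computes tokens-per-parameter ratios from the disclosed rows above; the figures are copied from the two tables, and the ratio arithmetic is the only thing added.

```python
# Tokens-per-parameter cross-check against open frontier releases.
# All figures are copied from the tables above.

open_models = {
    # name: (total_params, active_params, tokens_seen)
    "gpt-oss-120b":    (116.8e9, 5.1e9, 30e12),
    "gpt-oss-20b":     (20.9e9,  3.6e9, 13e12),
    "DeepSeek-V4-Pro": (1.6e12,  49e9,  33e12),
    "Kimi K2.6":       (1e12,    32e9,  30.5e12),
}

for name, (total, active, tokens) in open_models.items():
    # Ratios like these anchor the italic (estimated) rows in the table.
    print(f"{name}: {tokens/total:.0f} tokens per total param, "
          f"{tokens/active:.0f} tokens per active param")
```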

Lens 5: Direct insights from labs, plus signals from rumours, news, and disclosures
I have private channels with several labs where I receive context I can’t disclose publicly; this informs the table and in some cases directly shapes specific entries. Separately, I monitor and weight credible rumours, public commentary from researchers and executives, news reporting (Reuters in particular has broken useful detail on frontier training runs and chip allocations), and partial disclosures (Amazon’s Titan family is a good example: most of what we know publicly came through fragmented announcements and sideways comments rather than published papers). These signals get cross-checked against the lenses above before I commit a number.

Worked example: Grok-3 and Grok-4

This is a case where my estimates were carried publicly for months and then confirmed by a direct lab disclosure.

In Jul/2025, my Models Table centrepoints for the Grok family were:

| Model | My estimate (Jul/2025) | Lens reasoning |
|---|---|---|
| Grok-3 | ~3T total parameters | Capability bracket against GPT-4-class models; Colossus training-supply disclosure (100K+ H100s); public benchmark performance |
| Grok-4 | ~3T total parameters | xAI public statements that Grok-4 was a post-training/RL upgrade rather than a new pretrain; similar inference cost |

On 15/Nov/2025, the xAI CEO disclosed publicly:

[Grok-5] is a 6 trillion parameter model, whereas Grok-3 and -4 are based on a 3 trillion parameter model. (xAI CEO, 15/Nov/2025, covered at https://lifearchitect.ai/grok/)

Comparison of my carried estimate to the disclosed figure:

| Model | My estimate | Disclosed (Nov/2025) | Delta |
|---|---|---|---|
| Grok-3 | 3T (Feb/2025) | 3T (Nov/2025) | Match |
| Grok-4 | 3T (Jul/2025) | 3T (Nov/2025) | Match |

The Grok-3 and Grok-4 centrepoints landed on the disclosed figures. This is the kind of cross-check that validates the lens method, particularly Lens 3 (training supply via Colossus), Lenses 1 and 2 (the pricing and capability brackets), and Lens 4 (open-source frontier comparisons).

Worked example: Claude Mythos Preview

The following is reproduced from my paid Mythos analysis at The Memo – Special Edition – Claude Mythos. It shows the lenses applied to a single recent model (as of Apr/2026).

Size estimates

General model size is no longer a reliable indicator of performance, but I still find it interesting. With all model details kept confidential, plus the added complexity of reasoning/thinking modes, it is more challenging than ever to estimate token and parameter counts.

Now in 2026, based on my ongoing analysis, known Claude model pricing, similar known frontier MoE model sizes and pricing, and estimates of training supply (TPUs), inference supply (TPUs), and demand (users), here are my initial estimates for the Claude Mythos model. The working is based on the Glasswing pricing disclosure ($25/$125 per million input/output tokens), the benchmark jumps over Opus 4.6, and my previously established estimate of Opus 4.6 at ~5T parameters (MoE) on ~100T tokens.

Pricing as a sizing signal. Mythos at $25/$125 sits roughly 1.7x above Opus 4.6 ($15/$75) on both input and output. Frontier labs price close to inference cost plus a margin band, so a 1.7x price step usually reflects a 1.5x to 1.8x active-parameter step, not a proportional total-parameter step (MoE total params can grow much faster than active params without moving price much).
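A minimal sketch of that arithmetic follows. The prices are the disclosed figures above; the Opus 4.6 active-parameter count is a deliberately hypothetical placeholder to make the band concrete, not a Models Table entry.

```python
# Price-step arithmetic for Mythos vs Opus 4.6. Prices are the disclosed
# figures above; the reference active-parameter count is a hypothetical
# placeholder, not a Models Table entry.

mythos_out, opus_out = 125.0, 75.0   # $/M output tokens (disclosed)
price_step = mythos_out / opus_out   # ~1.67x
active_band = (1.5, 1.8)             # implied active-param step (see above)

opus_active = 150e9                  # HYPOTHETICAL reference, for scale only
lo, hi = (opus_active * s for s in active_band)
print(f"price step {price_step:.2f}x -> ~{lo/1e9:.0f}B to ~{hi/1e9:.0f}B "
      f"active, given a {opus_active/1e9:.0f}B reference")
```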

Capability as a sizing signal. The benchmark jumps over Opus 4.6 are larger than Anthropic has previously delivered on a single model generation:

| Benchmark | Claude Opus 4.6 | Claude Mythos Preview | Step |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 93.9% | +13.1pp |
| CyberGym | 66.6% | 83.1% | +16.5pp |
| HLE | 53.1% | 64.7% | +11.6pp |

The system card explicitly says Mythos has hit the ceiling on Anthropic’s own tests (‘saturates many of our most concrete, objectively-scored evaluations’). Jumps of this size historically track with both a parameter step and a meaningful training-compute step.

Training data. Anthropic has been consistent about heavy synthetic data and curriculum work (‘Claude Mythos Preview was trained on a proprietary mix of publicly available information from the internet, public and private datasets, and synthetic data generated by other models.’), and the system card flags extensive RL on long-horizon agentic tasks (‘extremely large amounts of reinforcement learning’). A reasonable read is that Mythos saw materially more tokens than Opus 4.6, with a much higher synthetic fraction, particularly for code, cyber, and tool-use trajectories.

Update from Apr/2026: Subsequent NVIDIA commentary suggests Mythos was ‘trained on fairly mundane capacity, and a fairly mundane amount of it’ (YouTube, 17/Apr/2026), pointing to looped Transformer/reasoning gains rather than an exponential compute step. So the Mythos compute story is different to what I had at the time of writing, though the original estimate may still be in the right range.

Closing

No method gets every estimate right when labs keep their architecture and training details confidential. Grok-3 and Grok-4 landed on the disclosed figures. The Claude Mythos compute estimate will move once the picture settles. I’ll continue providing informed estimates from the best signals available, documenting the working, and updating when the ground truth arrives.

The Models Table is a free public resource, updated continuously. Detailed analyses are available to full subscribers in The Memo.

Further reading

| Resource | Type | What it covers |
|---|---|---|
| Models Table | Reference table | The reference table for 10,000+ LLM data points, including the parameter and token estimates this page documents |
| What’s in My AI? | Free public paper | Token-counting methodology for publicly-known training corpora |
| What’s in GPT-5? | Methodology paper | Full parameter/token derivation for GPT-5, including synthetic-data analysis (cited by the G7) |
| What’s in Grok? | Methodology paper | Full parameter/token derivation for the Grok family |
| Frontier pricing visualisation | Viz | Older version of the price-lens method |
| The Memo – GPT-5 special edition | Paid per-model analysis | Per-model sizing working for GPT-5 |
| The Memo – Claude Mythos special edition | Paid per-model analysis | Per-model sizing working for Claude Mythos Preview |
| Datasets Table | Reference table | Largest known training datasets, updated continuously |
| Chinchilla data scaling laws page | Reference page | Tokens-to-parameters ratio research from Chinchilla onward |


Alan D. Thompson is a world expert in artificial intelligence, advising everyone from Apple to the US Government on integrated AI. Throughout Mensa International’s history, both Isaac Asimov and Alan held leadership roles, each exploring the frontier between human and artificial minds. His landmark analysis of post-2020 AI, from his widely-cited Models Table to his regular intelligence briefing The Memo, has shaped how governments and Fortune 500s approach artificial intelligence. With popular tools like the Declaration on AI Consciousness and the ASI checklist, Alan continues to illuminate humanity’s AI evolution.

This page last updated: 26/Apr/2026. https://lifearchitect.ai/models-table-methodology/