Models Table Pro: Methodology

Alan D. Thompson
April 2026

Context

Models Table Pro (and the public-facing Models Table) shows the running output of years of full-time LLM analysis: currently 30,000+ data points across 800+ models, documented by hand and updated the day each model launches. This data is used as an independent reference inside frontier labs, Fortune 500s, and several governments.

Since around 2023, when DeepMind stopped publishing detailed training papers, frontier labs have generally kept architecture and training details confidential. Independent estimation fills that gap, and it’s what the parameter and token columns represent. The rest of this page shows the method, and a few cases where carried estimates were later confirmed by direct lab disclosure.

Estimates and accuracy
Any parameter or token figure in italics is my estimate; anything in regular weight is disclosed by the lab, quoted from a paper, or otherwise on the public record. Since we can’t derive ground truth in the current climate of trade secrets, every italic figure is a best-informed centrepoint: when I show ‘2T’, the real range is roughly 1T to 3T, widening to 4T at the factor-of-two band Epoch AI uses. Models Table Pro adds my confidence grade on every estimate, and selected per-model working is documented in The Memo special editions and methodology papers What’s in My AI?, What’s in GPT-5?, and What’s in Grok?

A useful external check: Ege Erdil at Epoch AI arrived at the same ~200B figure for o1 that I had, and framed the uncertainty this way:

This would put its total parameter count around 200 billion, though this estimate could easily be off by a factor of 2 [100B to 400B] given the rough way I’ve arrived at it. (Epoch AI)
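The two bands described above can be written down directly. The sketch below is a minimal illustration rather than the working behind any specific entry: `plus_minus_band` and `factor_band` are hypothetical helper names, and the 50% half-width is inferred from the "2T means roughly 1T to 3T" example.

```python
def plus_minus_band(centre: float, fraction: float = 0.5) -> tuple[float, float]:
    """Symmetric band around a centrepoint, e.g. 2T -> 1T to 3T at +/-50%."""
    return centre * (1 - fraction), centre * (1 + fraction)


def factor_band(centre: float, factor: float = 2.0) -> tuple[float, float]:
    """Multiplicative band: 'off by a factor of 2' -> centre/2 to centre*2."""
    return centre / factor, centre * factor


T = 1e12  # one trillion parameters
B = 1e9   # one billion parameters

print(plus_minus_band(2 * T))  # 1T to 3T
print(factor_band(2 * T))      # 1T to 4T
print(factor_band(200 * B))    # 100B to 400B, the o1 range quoted above
```

The multiplicative band is asymmetric around the centrepoint, which is why the upper edge for a 2T figure moves from 3T to 4T.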

Total and active parameters
Models Table Pro now shows both total and active parameters. For Mixture-of-Experts (MoE) models the two can differ by an order of magnitude, so both matter: total params track the model’s full capacity, while active-per-token params track what actually fires on each forward pass (and what inference cost scales with).
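To make the total-versus-active gap concrete, the sketch below computes the ratio for the MoE rows that list both figures in the tables under Lens 4 on this page. The function name `sparsity_ratio` is a hypothetical label for illustration, not a column in Models Table Pro.

```python
# (total_params, active_params) as listed in the tables under Lens 4 on this page.
MODELS = {
    "gpt-oss-120b": (116.8e9, 5.1e9),
    "gpt-oss-20b": (20.9e9, 3.6e9),
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "Kimi K2.6": (1e12, 32e9),
}


def sparsity_ratio(total: float, active: float) -> float:
    """How many times larger the full model is than the per-token active slice."""
    return total / active


for name, (total, active) in MODELS.items():
    ratio = sparsity_ratio(total, active)
    print(f"{name}: {ratio:.1f}x total/active, {active / total:.1%} active per token")
```

Three of the four rows come out above 20x, which is the order-of-magnitude gap described above.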

Tokens seen
Token figures use the same italics convention. Each estimate starts from the parameter figure and a tokens-per-parameter ratio as a first-pass sanity check, then gets refined against lab statements on training compute, duration, hardware allocation, and dataset composition. The token-counting methodology for publicly-known corpora is documented in What’s in My AI?, and the Chinchilla page (https://lifearchitect.ai/chinchilla/) tracks how the ratio itself has evolved from 1.7:1 (Kaplan, 2020) through 20:1 (Chinchilla, 2022) to 80,000:1 (LFM2.5-350M, 2026).
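The first-pass step is simple arithmetic, sketched below. It covers only the sanity check; the refinement against compute, duration, and hardware disclosures is not modelled here. The helper names `first_pass_tokens` and `implied_ratio` are assumptions for illustration.

```python
def first_pass_tokens(params: float, tokens_per_param: float) -> float:
    """First-pass token estimate: parameters times an assumed tokens-per-parameter ratio."""
    return params * tokens_per_param


def implied_ratio(tokens: float, params: float) -> float:
    """Tokens-per-parameter ratio implied by a (tokens, params) pair."""
    return tokens / params


# Chinchilla-style 20:1 applied to a 70B-parameter model gives 1.4T tokens.
print(first_pass_tokens(70e9, 20) / 1e12)

# 80,000:1 applied to a 350M-parameter model gives 28T tokens.
print(first_pass_tokens(350e6, 80_000) / 1e12)

# Ratio implied by the gpt-oss-120b row further down this page
# (30T tokens against 116.8B total parameters).
print(round(implied_ratio(30e12, 116.8e9)))
```

Running the same ratio on total versus active parameters gives very different answers for MoE models, so it helps to state which count the ratio refers to.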

Lenses for estimates

My estimates come from several lenses, cross-checked against each other and against open-source releases.

Lens 1: Pricing signal
Per-token price gives some signal on active-parameter size, since inference cost scales with active params, but margin bands vary widely across labs (some price close to cost, others add substantial profit). I treat pricing as one input among many, rather than a primary driver. I maintain an older public visualisation of frontier model pricing over time: https://lifearchitect.ai/viz/#frontier-prices.

Lens 2: Capability signal
Benchmark jumps between model generations on saturating-but-still-useful evals such as MMLU-Pro, GPQA, and HLE have historically tracked both a parameter step and a training-compute step. Combined with the pricing lens, this helps me bracket the estimate.
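One simple way to picture bracketing from two lenses is to intersect the plausible range each lens allows. The sketch below is an assumption about how such a bracket could be combined, not a description of the actual working; the ranges are placeholders invented for illustration, and `intersect_brackets` is a hypothetical helper.

```python
from typing import Optional


def intersect_brackets(
    brackets: list[tuple[float, float]],
) -> Optional[tuple[float, float]]:
    """Return the overlap of several (low, high) ranges, or None if they do not all overlap."""
    low = max(b[0] for b in brackets)
    high = min(b[1] for b in brackets)
    return (low, high) if low <= high else None


# Placeholder ranges in parameters, invented purely for illustration.
pricing_bracket = (0.8e12, 4e12)
capability_bracket = (1.5e12, 6e12)

print(intersect_brackets([pricing_bracket, capability_bracket]))  # overlap: 1.5T to 4T
print(intersect_brackets([(1e12, 2e12), (3e12, 5e12)]))           # None: the ranges disagree
```

An empty intersection is useful in its own right: it flags that at least one lens is misleading for that model and needs a closer look.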

Lens 3: Supply and demand signals
Three constraints together set the plausible size envelope for any frontier model:

Signal | What it constrains | Examples
Training supply | Maximum training compute available | TPU/GPU allocations, datacentre capacity disclosures
Inference supply | Deployment footprint and serving cost | Bedrock, Vertex, Foundry, Azure, OpenRouter availability
Demand | Realistic serving load | User counts, usage disclosures, queue behaviour

Lens 4: Cross-checks against open-source frontier releases
When labs release detailed numbers (or full weights), we get ground truth, and I check my proprietary estimates against the known open frontier as each release lands.

Small models circa 2024

Model | Lab | Total params | Active params | Tokens seen | GPQA | HLE | Released
gpt-oss-120b | OpenAI | 116.8B | 5.1B | 30T | 80.1 | 19 | Aug/2025
gpt-oss-20b | OpenAI | 20.9B | 3.6B | 13T | 71.5 | 17.3 | Aug/2025
GPT-4o | OpenAI | 200B | | 20T | 53.6 | 3.1 | May/2024

Large models circa 2025

Model | Lab | Total params | Active params | Tokens seen | GPQA | HLE | Released
Grok-5 | xAI | 6T | | | | |
DeepSeek-V4-Pro | DeepSeek | 1.6T | 49B | 33T | 90.1 | 37.7 | Apr/2026
Kimi K2.6 | Moonshot | 1T | 32B | 30.5T | 90.5 | 54 | Apr/2026
Grok-4 | xAI | 3T | | | | | Jul/2025
Grok-3 | xAI | 3T | | | 84.6 | | Feb/2025

Lens 5: Direct insights from labs, plus signals from rumours, news, and disclosures
I have private channels with several labs where I receive context I can’t disclose publicly, which informs the table and in some cases directly shapes specific entries. Separately, I monitor and weight credible rumours, public commentary from researchers and executives, news reporting (Reuters in particular has broken useful detail on frontier training runs and chip allocations), and partial disclosures (Amazon’s Titan family is a good example, where most of what we know publicly came through fragmented announcements and sideways comments rather than published papers). These get cross-checked against the lenses above before I commit a number.

Worked example: Grok-3 and Grok-4

The clearest test of any estimation method is a carried estimate that later meets ground truth. The Grok family is exactly that case: centrepoints I published months ahead, left unchanged, and then matched against direct disclosures from the lab.

In Jul/2025, Models Table Pro centrepoints for the Grok family were:

Model | My estimate (Jul/2025) | Lens reasoning
Grok-3 | ~3T total parameters | Capability bracket against GPT-4-class models, Colossus training-supply disclosure (100K+ H100s), public benchmark performance
Grok-4 | ~3T total parameters | xAI public statements that Grok-4 was a post-training/RL upgrade rather than a new pretrain, similar inference cost

On 15/Nov/2025, the xAI CEO disclosed publicly:

[Grok-5] is a 6 trillion parameter model, whereas Grok-3 and -4 are based on a 3 trillion parameter model. (xAI CEO, 15/Nov/2025, covered at https://lifearchitect.ai/grok/)

Comparison of my carried estimate to the disclosed figure:

Model | My estimate | Disclosed (Nov/2025) | Delta
Grok-3 | 3T (Feb/2025) | 3T (Nov/2025) | Match
Grok-4 | 3T (Jul/2025) | 3T (Nov/2025) | Match

The Grok-3 and Grok-4 centrepoints landed on the disclosed figures. This is the kind of cross-check that validates the lens method, particularly Lens 3 (training supply via Colossus), Lenses 1 and 2 (the pricing and capability bracket), and Lens 4 (open-source frontier comparisons).
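A carried estimate can be scored against a later disclosure in a way that sits naturally with multiplicative error bands. The sketch below measures the miss in doublings; this scoring rule is an assumption for illustration, and `log2_error` and `within_factor` are hypothetical helpers.

```python
import math


def log2_error(estimate: float, disclosed: float) -> float:
    """Signed error in doublings: 0 is an exact match, +1 means the estimate was 2x too high."""
    return math.log2(estimate / disclosed)


def within_factor(estimate: float, disclosed: float, factor: float = 2.0) -> bool:
    """True if the estimate is within the given multiplicative band of the disclosed figure."""
    return abs(log2_error(estimate, disclosed)) <= math.log2(factor)


T = 1e12
# (carried estimate, disclosed figure) from the comparison table above.
grok = {"Grok-3": (3 * T, 3 * T), "Grok-4": (3 * T, 3 * T)}

for model, (estimate, disclosed) in grok.items():
    print(model, log2_error(estimate, disclosed), within_factor(estimate, disclosed))
```

Both Grok rows score zero doublings of error. An estimate of 1.5T against a disclosed 3T would score exactly one doubling low, which is the edge of the factor-of-two band.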

Pine AI’s knowledge-probe method

In April 2026, Bojie Li at Pine AI published Incompressible Knowledge Probes (arXiv 2604.24827), a separate parameter-estimation method that uses factual recall to lower-bound how much a model knows, and inverts a calibration on 89 open-weight models (R²=0.917) to estimate proprietary model sizes. It is the first published method to estimate frontier proprietary model sizes by measuring factual recall directly, and a useful cross-reference for the figures in Models Table Pro.

Model | Vendor | Accuracy | Est. size | 90% PI
GPT-5.5 | OpenAI | 71.9% | ~9.7T | [3.2–28.7T]
Claude Opus 4.6 | Anthropic | 68.0% | ~5.3T | [1.8–15.6T]
GPT-5 Pro | OpenAI | 66.5% | ~4.1T | [1.4–12.2T]
GPT-5 | OpenAI | 66.4% | ~4.1T | [1.4–12.1T]
Claude Opus 4.7 | Anthropic | 66.4% | ~4.0T | [1.4–12.0T]
o1 | OpenAI | 65.4% | ~3.5T | [1.2–10.3T]
Claude Opus 4.5 | Anthropic | 65.2% | ~3.4T | [1.1–10.0T]
Claude Opus 4.1 | Anthropic | 64.9% | ~3.2T | [1.1–9.5T]
Grok-4 | xAI | 64.8% | ~3.2T | [1.1–9.4T]
o3 | OpenAI | 64.4% | ~3.0T | [1.0–8.9T]
GPT-5.4 Pro | OpenAI | 62.5% | ~2.2T | [736B–6.5T]
GPT-4.1 | OpenAI | 62.3% | ~2.2T | [719B–6.4T]
Grok-3 | xAI | 62.3% | ~2.1T | [715B–6.3T]
Claude Sonnet 4.6 | Anthropic | 60.9% | ~1.7T | [579B–5.1T]
GPT-5.3 | OpenAI | 60.0% | ~1.5T | [503B–4.5T]
GPT-5.2 Pro | OpenAI | 59.7% | ~1.4T | [478B–4.2T]
Claude Opus 4 | Anthropic | 59.7% | ~1.4T | [478B–4.2T]
GPT-5.1 | OpenAI | 59.3% | ~1.3T | [450B–4.0T]
GPT-5.2 | OpenAI | 58.9% | ~1.3T | [417B–3.8T]
Gemini 2.5 Pro | Google | 58.4% | ~1.2T | [387B–3.4T]
GPT-5.4 | OpenAI | 57.7% | ~1.0T | [348B–3.1T]
GPT-4o | OpenAI | 55.3% | ~720B | [241B–2.1T]
Qwen3-Max | Alibaba | 55.0% | ~685B | [229B–2.0T]
GPT-4 | OpenAI | 54.8% | ~666B | [223B–2.0T]
GPT-4-Turbo | OpenAI | 54.5% | ~630B | [211B–1.9T]
GPT-5 Mini | OpenAI | 51.7% | ~410B | [137B–1.2T]
Gemini 2.5 Flash | Google | 47.4% | ~207B | [69B–617B]
Claude 3.5 Haiku | Anthropic | 45.6% | ~158B | [53B–470B]
GPT-5 Nano | OpenAI | 40.5% | ~71B | [24B–212B]
Claude Haiku 4.5 | Anthropic | 39.9% | ~65B | [22B–194B]

Source: Pine AI’s paper (Apr/2026), p13.
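The rows above look close to a straight line when size is plotted on a log scale against accuracy. The sketch below checks that using only three rows copied from the table: it draws a line through the top and near-bottom rows and predicts a middle one. This is not Pine AI's calibration (which the paper fits on 89 open-weight models); it is a two-point consistency check, and the function names are hypothetical.

```python
import math

# (accuracy in %, estimated size in parameters) copied from three rows of the table above.
GPT_5_5 = (71.9, 9.7e12)
GPT_5_NANO = (40.5, 71e9)
GPT_5_3 = (60.0, 1.5e12)


def fit_log_line(p1, p2):
    """Line through two (accuracy, size) points, with size on a log10 scale."""
    slope = (math.log10(p2[1]) - math.log10(p1[1])) / (p2[0] - p1[0])
    intercept = math.log10(p1[1]) - slope * p1[0]
    return slope, intercept


def size_from_accuracy(accuracy, slope, intercept):
    """Invert the line: accuracy (%) -> estimated parameter count."""
    return 10 ** (slope * accuracy + intercept)


slope, intercept = fit_log_line(GPT_5_NANO, GPT_5_5)
predicted = size_from_accuracy(GPT_5_3[0], slope, intercept)
print(f"slope: {slope:.3f} log10(params) per accuracy point")
print(f"predicted size at 60.0%: {predicted / 1e12:.2f}T (table lists ~1.5T)")
```

Because the relation is exponential in accuracy, a few points of recall accuracy translate into large multiplicative changes in estimated size, which is consistent with the wide prediction intervals in the table.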

Caveats on Pine AI’s specific estimates
Pine’s method is intrinsic (it measures stored facts through black-box probes), so its numbers represent effective knowledge capacity rather than raw parameter count. That framing explains most gaps from working consensus, for example the GPT-4 and o1 figures sitting below the widely-reported SemiAnalysis lineage. I anchor each row against the disclosed and open-weight figures rather than reading it alone.

Where the two methods agree on recent frontier estimates:

Model | My estimate | Pine (Apr/2026) | Notes
Claude Opus 4.6 | ~5T | ~5.3T | Match
GPT-5 | 3T | ~4.1T | Within Pine’s interval, slightly above mine
Grok-4 | 3T (disclosed) | ~3.2T | Both close to disclosed
Grok-3 | 3T (disclosed) | ~2.1T | Both inside intervals; mine landed exactly on the disclosure

The two methods measure slightly different things: Pine’s estimates represent effective knowledge capacity in open-model-equivalent parameters, which can run higher than raw parameter count for proprietary models with heavy post-training or denser data. So a gap on a single model is not necessarily evidence that one method is wrong; it can also mean the model stores more factual knowledge per parameter than the open-weight calibration set predicts. Where the methods do converge, that agreement feeds the confidence grade in Models Table Pro: the more independent signals that agree on a figure, the higher it is graded.
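The agreement check can be made explicit by testing whether each of my figures falls inside Pine's 90% prediction interval and by looking at the ratio between the two centrepoints. The sketch below uses only numbers copied from the two tables above. How agreement is converted into a confidence grade is not specified on this page, so the sketch stops at the raw checks; `inside_interval` is a hypothetical helper.

```python
T = 1e12
B = 1e9

# model: (my estimate, Pine estimate, Pine 90% PI low, Pine 90% PI high),
# copied from the two tables above.
ROWS = {
    "Claude Opus 4.6": (5 * T, 5.3 * T, 1.8 * T, 15.6 * T),
    "GPT-5": (3 * T, 4.1 * T, 1.4 * T, 12.1 * T),
    "Grok-4": (3 * T, 3.2 * T, 1.1 * T, 9.4 * T),
    "Grok-3": (3 * T, 2.1 * T, 715 * B, 6.3 * T),
}


def inside_interval(value, low, high):
    """True if value falls inside the closed interval [low, high]."""
    return low <= value <= high


for model, (mine, pine, low, high) in ROWS.items():
    ratio = pine / mine
    print(f"{model}: Pine/mine = {ratio:.2f}, inside 90% PI: {inside_interval(mine, low, high)}")
```

All four of my figures sit inside Pine's intervals, with centrepoint ratios between roughly 0.7 and 1.4. The intervals themselves span about a factor of three either side of Pine's estimate, so interval membership alone is a weak test; the ratio is the more informative number.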

Closing

No method gets every estimate right when labs keep their architecture and training details confidential. Grok-3 and Grok-4 landed on the disclosed figures. The Claude Mythos compute estimate will move once the picture settles. I’ll continue providing informed estimates from the best signals available, documenting the working, and updating when the ground truth arrives.

Models Table Pro is updated continuously. Detailed analyses are available to full subscribers in The Memo.

Further reading

Resource | Type | What it covers
Models Table Pro | Reference table | The reference table for 30,000+ LLM data points, including full compute, pricing, and more
Models Table | Public table | The public table for 10,000+ LLM data points, including parameter and token estimates
What’s in My AI? | Free public paper | Token-counting methodology for publicly-known training corpora
What’s in GPT-5? | Methodology paper | Full parameter/token derivation for GPT-5, including synthetic-data analysis (cited by the G7)
What’s in Grok? | Methodology paper | Full parameter/token derivation for the Grok family
Frontier pricing visualisation | Viz | Older version of the price-lens method
The Memo – GPT-5 special edition | Paid per-model analysis | Per-model sizing working for GPT-5
The Memo – Claude Mythos special edition | Paid per-model analysis | Per-model sizing working for Claude Mythos Preview
Datasets Table | Reference table | Largest known training datasets, updated continuously
Chinchilla data scaling laws page | Reference page | Tokens-to-parameters ratio research from Chinchilla onward

Alan D. Thompson is a world expert in artificial intelligence, advising everyone from Apple to the US Government on integrated AI. Throughout Mensa International’s history, both Isaac Asimov and Alan held leadership roles, each exploring the frontier between human and artificial minds. His landmark analysis of post-2020 AI, from his widely-cited Models Table to his regular intelligence briefing The Memo, has shaped how governments and Fortune 500s approach artificial intelligence. With popular tools like the Declaration on AI Consciousness and the ASI checklist, Alan continues to illuminate humanity’s AI evolution.

This page last updated: 3/Jun/2026. https://lifearchitect.ai/models-table-methodology/