Alan D. Thompson
February 2026
Summary
By June 2026, the total daily word output of all AI models combined is projected to exceed the total words spoken and written each day by the world’s 8.3 billion people.
By January 2027, model output from a single AI lab (likely ByteDance, Google, or OpenAI) is projected to surpass the daily spoken and written output of the world’s 8.3 billion people.
Intelligence explosion chart
Methodology
This sheet (and chart) tracks estimated daily output tokens (trillions) for major frontier AI providers, with stylised projections through 2028. Estimates are assembled from primary disclosures (earnings calls, keynotes, executive statements), secondary reports (analyst research, market data), and informed estimates derived from revenue, infrastructure, and usage data where direct disclosure is unavailable. All sources and quotes are documented in the sheet.
The scope covers all inference surfaces (consumer apps, embedded AI via search/email/docs, enterprise API, and internal use), not just public API traffic. ‘Output’ refers to tokens generated by the model; ‘total’ includes both input and output tokens. Where providers report only total tokens processed, an input-to-output ratio is applied. Projections use a uniform annual multiplier, consistent with observed year-over-year growth rates across major providers. On-device inference (including Apple and Samsung LLMs) is largely uncounted, so the true global total could be higher than tracked here.
Assumptions
– Human output: Mehl et al. (2007) originally estimated ~16K spoken words per person per day. A 2024 registered replication across 2,197 participants, 22 samples, and multiple cultures revised this down to ~13K (men ~12K, women ~13.3K). Adding ~3K words for typing, writing, and other text creation brings the total back to ~16K words per person per day. At 1.33 tokens per word across 8.3B people, total human output is ~176T tokens/day. This excludes thinking. (These conversions are sketched in code after this list.)
– Token terminology: ‘Processed’, ‘calls’, and ‘handled’ are interpreted as total tokens (input + output). ‘Generated’ is interpreted as output tokens only. This analysis focuses on output.
– In/out ratio: Where only a total is reported, a 3:1 input-to-output ratio is applied (75% input, 25% output), typical of chat workloads. Agentic and coding workloads skew higher on input.
– Scope: Figures attempt to capture all inference surfaces (consumer apps, embedded, enterprise API, internal), not just public API. Self-hosted open-source inference is included where possible.
– Point-in-time vs annual average: Most sources quote monthly figures. These are shown as representative end-of-year values.
– Projections: A uniform 10× annual multiplier is applied. This is deliberately uniform, and conservative relative to observed year-over-year growth (2024–2025: Google ≈50×, ByteDance ≈137×).
– Revenue-derived estimates (Anthropic, xAI): Token volumes inferred from revenue ÷ blended price-per-token. These tend to undercount free-tier and subsidised usage.
– Infrastructure-derived estimates (xAI): Based on GPU count × estimated inference allocation × per-GPU throughput. Sensitive to the inference/training split assumption.
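A minimal sketch of these conversions in Python. The constants come from the list above; the function arguments are illustrative placeholders rather than disclosed lab figures.

```python
# Back-of-envelope conversions used throughout this sheet. Constants are the
# assumptions listed above; example inputs are illustrative, not disclosures.

TOKENS_PER_WORD = 1.33
WORDS_PER_PERSON_PER_DAY = 16_000
POPULATION = 8.3e9

def human_baseline_tokens_per_day() -> float:
    """Total human spoken + written output, ~176T tokens/day."""
    return WORDS_PER_PERSON_PER_DAY * POPULATION * TOKENS_PER_WORD

def output_from_total(total_tokens: float, input_share: float = 0.75) -> float:
    """Split a 'tokens processed' figure with the assumed 3:1 in:out ratio."""
    return total_tokens * (1.0 - input_share)

def output_from_revenue(annual_revenue_usd: float,
                        blended_price_per_m_tokens_usd: float) -> float:
    """Revenue-derived daily output tokens (undercounts free-tier usage)."""
    tokens_per_year = annual_revenue_usd / blended_price_per_m_tokens_usd * 1e6
    return tokens_per_year / 365

def output_from_gpus(gpu_count: int, inference_share: float,
                     tokens_per_gpu_per_second: float) -> float:
    """Infrastructure-derived daily output tokens."""
    return gpu_count * inference_share * tokens_per_gpu_per_second * 86_400

print(f"{human_baseline_tokens_per_day() / 1e12:.1f}T tokens/day")  # ~176.6T
```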
Conclusions
– Model output from all AI labs combined is projected to overtake total daily human output (~176T tokens) by June 2026.
– Model output from a single AI lab (likely ByteDance, Google, or OpenAI) is projected to overtake total daily human output by January 2027 (a back-of-envelope check of both crossover dates follows this list).
– China-based labs account for nearly half of all global AI output at ~29T tokens/day (to the end of 2025).
– ByteDance Doubao (not OpenAI) is the single largest token generator to the end of 2025, at 15.75T output tokens/day.
– Token growth is driven by more than just user adoption: reasoning models (chain-of-thought) consume 10–40× more tokens per query.
– Output from on-device inference (Apple Intelligence models, Samsung Galaxy AI using Gauss) remains the key uncounted volume.
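A minimal sketch of the crossover arithmetic behind the first two conclusions, assuming smooth compound 10× annual growth from end-of-2025 baselines. The ~60T tokens/day combined baseline is an approximation implied by the ~29T/day China share being ‘nearly half’ of global output; it is not a separately disclosed figure.

```python
import math

HUMAN_TOKENS_PER_DAY = 176e12   # ~16K words x 8.3B people x 1.33 tokens/word
ANNUAL_MULTIPLIER = 10.0        # stylised projection used in this sheet

def months_to_cross(baseline_tokens_per_day: float) -> float:
    """Months from end of 2025 until output exceeds the human baseline,
    assuming smooth compound 10x-per-year growth."""
    years = math.log(HUMAN_TOKENS_PER_DAY / baseline_tokens_per_day,
                     ANNUAL_MULTIPLIER)
    return years * 12

print(months_to_cross(60e12))     # ~5.6 months  -> around June 2026 (all labs)
print(months_to_cross(15.75e12))  # ~12.6 months -> around January 2027 (one lab)
```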
Frequently asked questions
Q: What’s the difference between tokens ‘processed’ and tokens ‘generated’?
A: On 9/Oct/2025, Google CEO Sundar Pichai said:
AI Overviews in Search are used by over 2 billion people, and this summer, we reached a milestone of 1.3 quadrillion monthly tokens processed across our surfaces (up from 980 trillion monthly tokens announced in July).
The keyword here is ‘processed’ (rather than ‘generated’), which likely covers both input tokens (typed by a human or read from a document) and output tokens (generated by the AI). Various estimates place the input:output ratio at about 5:95 for consumer chat (there is extensive literature to guide a closer average). In the viz above, OpenAI’s commentary from around the same time points to ~10T tokens/day generated ≈ 300T tokens/month; if Google’s monthly output is of a similar order, then of its 1.3Qa total roughly 1,000T would be input and 300T output, a flipped ratio of about 77:23. The inversion is likely driven by ‘AI Overviews in Search’ (search result summaries), email summaries, and docs/sheets assistants, where users bring large documents and receive short answers (big input, small output). In other words, enterprise and productivity surfaces shift the token mix heavily toward input.
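A minimal sketch of that arithmetic, assuming (as above) that Google’s monthly output is of the same order as OpenAI’s ~300T generated tokens per month:

```python
# Split Google's reported ~1.3 quadrillion monthly 'processed' tokens,
# assuming output of the same order as OpenAI's ~300T tokens/month.
total_processed = 1.3e15   # tokens/month, all Google surfaces (Oct/2025)
assumed_output = 3.0e14    # tokens/month, order-of-magnitude assumption

implied_input = total_processed - assumed_output
print(f"input:output = {implied_input / total_processed:.0%}"
      f":{assumed_output / total_processed:.0%}")   # ~77%:23%
```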
Q: What does ’10× annual multiplier’ mean for projections?
A: Each lab’s output figure is multiplied by 10 for each subsequent year. This is a stylised projection rather than a forecast. It reflects the order of magnitude of growth actually observed: ByteDance’s Doubao grew 137× year-over-year to May 2025; Google’s token volume grew roughly 50× from 2024 to late 2025. The 10× multiplier is deliberately uniform and conservative relative to historical data.
Q: What’s driving the 100×+ growth? Is it real demand?
A: Partially. User growth is substantial (ChatGPT: 400M weekly users in Feb/2025 → 910M by Jan/2026). But much of the token volume explosion is structural. Reasoning models consume 10–40× more tokens per query through internal chain-of-thought. Google’s Gemini 2.5 Flash uses 17× more tokens per request than its predecessor. Agentic workflows generate tens of thousands of tokens where basic chat used hundreds.
Q: How reliable are the China figures?
A: China market data relies on a mix of company announcements (ByteDance/Volcano Engine, Baidu earnings, Tencent developer conferences), and market research, typically reported through Chinese-language outlets and regional English-language press. Where direct token disclosures are unavailable, estimates are derived from user counts, search volume comparisons, and cloud market share data. The China figures carry wider confidence intervals than US figures.
Q: Why isn’t [lab X] included?
A: The table tracks the largest known providers by inference volume. Self-hosted open-source inference (Llama alone has been downloaded over 1 billion times) is estimated. For details on 700+ more models, see the Models Table.
Viz
AI output has expanded at a pace that is difficult to intuit without scale comparisons. This visualisation tracks the change from GPT-3 (2021) to GPT-4o (2024) to frontier systems (2025), then sets those figures against aggregate human output. The shift is structural as well as quantitative. Framed against a physical benchmark, model output advances from roughly one New York Public Library per year to multiple libraries per day. The comparison clarifies the order-of-magnitude transition in how much text a frontier AI model can generate.
Figures rounded for readability. 1 word ≈ 1.33 tokens. 133K tokens ≈ 100K words ≈ 1 book. 1 New York Public Library ≈ 12M books ≈ 1.6T tokens ≈ 1.2T words. Human output ≈ 16K words × 8.3B people ≈ 133T words ≈ 176T tokens per day. Alan D. Thompson, 2026. (These conversions are reproduced in the sketch after the table below.)
Text for indexing
| | 2021 (GPT-3) | 2024 (GPT-4o) | 2025 (Frontier LLM) | Human output (1×) | 10× | 100× |
|---|---|---|---|---|---|---|
| Tokens per day | 6B | 200B | 12T | 176T | 1.7Qa | 17Qa |
| Words per minute | 3.1M | 104M | 6.2B | 92B | 922B | 9.2T |
| Tokens per second | 70K | 2.3M | 139M | 2B | 20B | 204B |
| New York Public Libraries per day | 0.004 | 0.125 | 7.5 | 110 | 1,100 | 11,000 |
| 1 New York Public Library per x… | 266 days | 8 days | 3.2 hours | 13.1 mins | 1.3 min | 7.9 secs |
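A minimal sketch that reproduces the rows above from the daily token figures and the footnote constants (1.33 tokens per word, ~1.6T tokens per New York Public Library). The 10× and 100× columns use the unrounded 176T baseline, so results differ slightly from the rounded table values.

```python
# Rebuild the indexing table from tokens/day and the footnote constants.
TOKENS_PER_WORD = 1.33
TOKENS_PER_NYPL = 1.6e12   # ~12M books x ~133K tokens/book

scenarios = {
    "2021 (GPT-3)": 6e9,
    "2024 (GPT-4o)": 200e9,
    "2025 (Frontier LLM)": 12e12,
    "Human output (1x)": 176e12,
    "10x": 1.76e15,
    "100x": 17.6e15,
}

for name, tokens_per_day in scenarios.items():
    words_per_minute = tokens_per_day / TOKENS_PER_WORD / (24 * 60)
    tokens_per_second = tokens_per_day / 86_400
    nypl_per_day = tokens_per_day / TOKENS_PER_NYPL
    days_per_nypl = TOKENS_PER_NYPL / tokens_per_day
    print(f"{name}: {words_per_minute:.3g} words/min, "
          f"{tokens_per_second:.3g} tokens/s, {nypl_per_day:.3g} NYPL/day, "
          f"1 NYPL per {days_per_nypl:.3g} days")
```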
Header image photo by Nick Fewings, ‘A very large cluster of stars in the night sky photo’.
Alan D. Thompson is a world expert in artificial intelligence, advising everyone from Apple to the US Government on integrated AI. Throughout Mensa International’s history, both Isaac Asimov and Alan held leadership roles, each exploring the frontier between human and artificial minds. His landmark analysis of post-2020 AI—from his widely-cited Models Table to his regular intelligence briefing The Memo—has shaped how governments and Fortune 500s approach artificial intelligence. With popular tools like the Declaration on AI Consciousness and the ASI checklist, Alan continues to illuminate humanity’s AI evolution. Technical highlights. This page last updated: 27/Feb/2026. https://lifearchitect.ai/output/




