What’s in Grok? (Independent Grok-3 Paper)

A Comprehensive Analysis of xAI’s Grok Models
Alan D. Thompson
LifeArchitect.ai
February 2025
30+ pages, including title page, references, and appendices.

The report

(Updated post-release of Grok-3, 17/Feb/2025. Available exclusively to full subscribers of The Memo.)

Abstract
The xAI Grok series of large language models represents one of the most fascinating disruptions in artificial intelligence history. Since launching in March 2023, the series has evolved rapidly from a 33B parameter prototype to frontier models measured in trillions of parameters. xAI maintains secrecy around Grok, releasing no comprehensive technical documentation about its architecture, training methodology, or datasets. This report presents the first quantitative analysis of xAI’s closely guarded development process, exploring Grok’s architecture, datasets, tokens, parameters, and capabilities across several major model announcements. Using data from the Twitter platform, and training on Colossus—said to be 2024’s largest AI supercomputer—Grok represents an ambitious project in scale and development speed. Building on the acclaimed reports What’s in my AI? (2022), and What’s in GPT-5? (2024), this analysis reveals the details of xAI’s rapid path to superintelligence.

Contents
1. Overview
2. xAI
    Knowledge transfer through staff migration
3. The Grok series of models
    Grok-0
    Grok-1
    Grok-1.5 (+ Grok-1.5V)
    Grok-2 (+ Grok-2-Vision, + Grok-2-mini, + Aurora)
    Grok-3 (+ Grok-3 mini, + Grok-3 reasoning)
    Grok-Video
    Grok-4
    Grok-5
4. Dataset: Twitter posts
5. Dataset: Twitter outbound links
6. Dataset: Everything else
7. Hardware: Colossus
8. Model size estimate
9. Integrated AI: Grok + Optimus humanoid robots
10. Integrated AI: Grok + brain-machine interfacing
11. Conclusion
12. Further reading
Appendix A: Datasets Table (Jan/2025 snapshot)

Report excerpts

p6: ‘xAI co-founder Igor Babuschkin was originally a physicist, and briefly worked at CERN’s Large Hadron Collider. His journey through AI research is the basis for many of today’s frontier models. From his close collaborations with Ilya Sutskever and Alec Radford to introducing large language models GPT-4 and Codex, his research footprint spans many key developments in post-2020 AI.’

p16: ‘Building on this ‘gold’ synthetic dataset, and the Twitter datasets outlined in this report, xAI is expected to leverage several high-quality datasets drawn from the Datasets Table (see Appendix for a snapshot to Jan/2025). DeepMind’s MassiveText multilingual dataset contributed 13.3B tokens of multilingual Wikipedia for the RETRO model, and this could be replicated by former DeepMind employees now at xAI. EleutherAI’s The Pile dataset still offers high-quality books and journals corpora already in use at other AI labs. More recently, Hugging Face’s FineWeb presents an enhanced ‘educational’ curation of Common Crawl, and their collaboration with ServiceNow produced The Stack v2, a refined code dataset. The final dataset composition of the Grok-3 model can now be estimated based on these corpora, adding my subjective ‘ALQual’ data quality rating and priority orders drawn from weightings documented in literature like the RETRO and GPT-3 papers.’

p21: ‘Optimizations like quantization, pruning, and compression would still allow a model of this size to be served to hundreds of millions of users. On 20 January 2025, Musk demonstrated a Grok-3 output using 4-bit quantization. We estimate that this would amount to an 87.5% reduction in memory usage, easily fitting onto a single NVIDIA DGX B200 server, or several H100s, for inference. Grok-3 is one of the main frontier models alongside GPT, Claude, Gemini, and Llama.’
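The 87.5% figure in the p21 excerpt can be checked with simple arithmetic. The sketch below assumes the baseline is 32-bit weights; the excerpt does not state the baseline, and against a 16-bit baseline the same calculation gives 75%. The function names and the 1-trillion parameter count are hypothetical placeholders for illustration only, not the report's size estimate.

```python
def quantization_memory_reduction(baseline_bits: int, quantized_bits: int) -> float:
    """Fraction of weight memory saved by storing each parameter in
    quantized_bits instead of baseline_bits."""
    if baseline_bits <= 0 or quantized_bits <= 0:
        raise ValueError("bit widths must be positive")
    return 1 - quantized_bits / baseline_bits


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.
    Ignores activations, KV cache, and quantization metadata."""
    return num_params * bits_per_param / 8 / 1e9


# ASSUMPTION: illustrative parameter count, not a figure from the report.
HYPOTHETICAL_PARAMS = 1_000_000_000_000

# ASSUMPTION: 32-bit baseline, which reproduces the excerpt's 87.5%.
print(f"32-bit -> 4-bit: {quantization_memory_reduction(32, 4):.1%}")
# A 16-bit baseline (common for inference) gives a smaller reduction.
print(f"16-bit -> 4-bit: {quantization_memory_reduction(16, 4):.1%}")

print(f"32-bit weights: {weight_memory_gb(HYPOTHETICAL_PARAMS, 32):.0f} GB")
print(f" 4-bit weights: {weight_memory_gb(HYPOTHETICAL_PARAMS, 4):.0f} GB")
```

Whether a given model then fits on a particular server depends on its actual parameter count and on the extra memory needed at inference time, neither of which this sketch models.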

p23: ‘BMI technology enables two-way communication between brain and machine: neural signals flowing from human thoughts to computers, and AI signals feeding back into human biological intelligence. This bi-directional capability could revolutionize human–AI interaction. The architecture could combine an onboard ‘Grok mini’ model, running locally on advanced neural hardware, with wireless access to the frontier model in the cloud. The local model would handle immediate cognitive enhancement: boosting processing speed through direct neural acceleration, expanding working memory by offloading to AI systems, and providing instant access to a baseline knowledge repository.’

Viz highlights

Viz. Journey to Grok-5 (2023–2026).

Viz. Igor Babuschkin: AI paper highlights. Jan/2025. LifeArchitect.ai

Image. Grok-2 + Aurora. Prompt by Musk’s four-year-old son, X: ‘Bunnies flying spaceships in Star Wars with a monster truck.’ Jan/2025. LifeArchitect.ai

Alan D. Thompson is a world expert in artificial intelligence, advising everyone from Apple to the US Government on integrated AI. Throughout Mensa International’s history, both Isaac Asimov and Alan held leadership roles, each exploring the frontier between human and artificial minds. His landmark analysis of post-2020 AI, from his widely-cited Models Table to his regular intelligence briefing The Memo, has shaped how governments and Fortune 500s approach artificial intelligence. With popular tools like the Declaration on AI Consciousness, and the ASI checklist, Alan continues to illuminate humanity’s AI evolution.

This page last updated: 18/Apr/2025. https://lifearchitect.ai/whats-in-grok/