NVIDIA Cosmos video dataset
Alan D. Thompson
January 2025
Announced in Jan/2025, the NVIDIA Cosmos video dataset became the largest publicly known training dataset in the world. With 20M hours of video (est. 2.5B videos at an average of ~30 seconds each), it has a raw size of ~45PB, or ~9 quadrillion tokens (9Qa, 9,000T tokens). It surpassed the previous record holder from Jun/2024 (the Common Crawl text dataset DCLM-Pool, at 240T tokens) by 37.5×. All working by LifeArchitect.ai, assisted by OpenAI o1 on poe.com.
Summary
Organization | NVIDIA
Dataset name | Cosmos
Announce date | Jan/2025
Documents | ≈2.5B videos
Raw dataset size (est.) | ≈45PB
Raw dataset token count (est.) | ≈9,000T tokens
Filtered dataset size (est.) | ≈2PB (4.44% of raw)
Filtered dataset token count (est.) | ≈400T tokens
Paper | research.nvidia.com
1. Raw dataset size calcs (≈45PB)
1) Paper: "20M hours of raw videos, 720p-4k". We'll assume a ~5 Mbps average bitrate.
2) 5 Mbps = 5,000,000 bits/s = 625,000 bytes/s = 0.625 MB/s.
3) MB per hour: 0.625 MB/s × 3,600 s = 2,250 MB/hour = 2.25 GB/hour.
4) For 20M hours total: 2.25 GB/hour × 20,000,000 hours = 45,000,000 GB = 45,000 TB ≈ 45 PB total for the raw dataset.
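As a quick check, here is the same estimate as a minimal Python sketch. The ~5 Mbps average bitrate is our assumption, not a figure from the paper:

```python
# Raw dataset size estimate (decimal units: 1 PB = 1e15 bytes).
BITRATE_BPS = 5_000_000   # assumed ~5 Mbps average for 720p-4K video
HOURS = 20_000_000        # 20M hours of raw video, per the paper

bytes_per_hour = BITRATE_BPS / 8 * 3600   # 2.25e9 bytes = 2.25 GB/hour
total_bytes = bytes_per_hour * HOURS      # 4.5e16 bytes
print(f"~{total_bytes / 1e15:.0f} PB")    # -> ~45 PB
```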
2. Raw dataset token calcs (≈9Qa tokens)
(Assuming "8×8×8" temporal & spatial compression, 30 fps @ 1080p.)
1) Frames in 20M hours: each hour = 3,600 s; 30 fps × 3,600 s = 108,000 frames/hour; 108,000 frames/hour × 20,000,000 hours = 2.16×10^12 frames.
2) Temporal compression (divide by 8): 2.16×10^12 frames ÷ 8 = 2.7×10^11 token-frames (270 billion).
3) Spatial compression (divide width by 8, height by 8): 1920÷8 = 240; 1080÷8 = 135; 240×135 = 32,400 spatial tokens per token-frame.
4) Multiply token-frames by spatial tokens: 2.7×10^11 × 3.24×10^4 = 8.748×10^15 ≈ 8.75×10^15 tokens ≈ 9,000 trillion tokens (≈9 quadrillion) total for the raw dataset.
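The same arithmetic as a sketch, with the assumed 8×8×8 compression and 30 fps @ 1080p made explicit:

```python
# Raw token-count estimate, assuming an 8x8x8 (temporal x spatial) tokenizer.
FPS, HOURS = 30, 20_000_000
WIDTH, HEIGHT = 1920, 1080     # assumed 1080p as the representative resolution
T_COMP, S_COMP = 8, 8          # assumed temporal and spatial compression factors

frames = FPS * 3600 * HOURS                                # 2.16e12 frames
token_frames = frames / T_COMP                             # 2.7e11 token-frames
tokens_per_frame = (WIDTH // S_COMP) * (HEIGHT // S_COMP)  # 240*135 = 32,400
total_tokens = token_frames * tokens_per_frame             # ~8.75e15
print(f"~{total_tokens / 1e12:,.0f}T tokens")              # -> ~8,748T ≈ 9,000T
```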
Confirmed by an NVIDIA interview (6/Jan/2025): 'Cosmos WFM models [world foundation models] were trained on 9,000 trillion tokens from 20 million hours of (video data)…'
3. Filtered dataset size calcs (≈2PB)
1) Paper: "100M curated clips", each ~2–60s; we'll assume a 30s average.
2) 100,000,000 clips × 30s = 3,000,000,000s total; 3,000,000,000s ÷ 3,600 ≈ 833,333 hours.
3) 2.25 GB/hour × 833,333 hours ≈ 1,875,000 GB = 1,875 TB = 1.875 PB ≈ 1.9 PB total for pretraining.
4) Plus an additional 10M clips for fine-tuning ≈ 190 TB.
5) Total: ≈2.09 PB (2,090 TB) for the filtered dataset.
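A minimal sketch of the same estimate, assuming the 30s average clip length and the 2.25 GB/hour figure from section 1 (the unrounded total is ~2.06 PB; rounding 1.9 + 0.19 first gives the 2.09 above):

```python
# Filtered dataset size estimate, assuming 30s clips at ~5 Mbps (2.25 GB/hour).
GB_PER_HOUR = 2.25                 # from the raw-size calc in section 1
PRETRAIN_CLIPS = 100_000_000
FINETUNE_CLIPS = 10_000_000
AVG_CLIP_S = 30                    # assumed average clip length

def clips_to_pb(n_clips: int) -> float:
    """Convert a clip count to petabytes (decimal: 1 PB = 1e6 GB)."""
    hours = n_clips * AVG_CLIP_S / 3600
    return GB_PER_HOUR * hours / 1e6

pretrain_pb = clips_to_pb(PRETRAIN_CLIPS)      # ~1.875 PB
finetune_pb = clips_to_pb(FINETUNE_CLIPS)      # ~0.19 PB
print(f"~{pretrain_pb + finetune_pb:.2f} PB")  # -> ~2.06 PB (≈2 PB)
```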
4. Filtered dataset token calcs (≈400T)
(Assuming "8×8×8" temporal & spatial compression, 30 fps @ 1080p.)
1) Each 30s clip has 30 fps × 30 s = 900 frames.
2) Temporal compression: 900 frames ÷ 8 = 112.5 → ~112 token-frames.
3) Spatial compression: 1920÷8 = 240; 1080÷8 = 135; 240×135 = 32,400 spatial tokens.
4) Tokens per clip: 32,400 × 112 = 3,628,800 ≈ 3.63 million tokens per clip.
5) For 100M clips: 3.63×10^6 tokens/clip × 100,000,000 clips = 3.63×10^14 = 363 trillion tokens for pretraining.
6) Plus an additional 10M clips for fine-tuning = 36.3 trillion tokens.
7) Total: ≈400T tokens for the filtered dataset.
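And the corresponding sketch, under the same assumed tokenizer, resolution, and clip length:

```python
# Filtered token-count estimate: 8x8x8 tokenizer, 30 fps @ 1080p, 30s clips.
FPS, CLIP_S = 30, 30
TOKENS_PER_FRAME = (1920 // 8) * (1080 // 8)       # 32,400 spatial tokens

token_frames = (FPS * CLIP_S) // 8                 # 900 frames -> ~112
tokens_per_clip = token_frames * TOKENS_PER_FRAME  # 3,628,800 (~3.63M)
pretrain = tokens_per_clip * 100_000_000           # ~3.63e14 = 363T
finetune = tokens_per_clip * 10_000_000            # ~36.3T
print(f"~{(pretrain + finetune) / 1e12:.0f}T tokens")  # -> ~399T ≈ 400T
```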
NVIDIA Cosmos video dataset categories
- Nature dynamics (20%)
- Spatial awareness and navigation (16%)
- Hand motion and object manipulation (16%)
- Driving (11%)
- Human motion and activity (10%)
- First person point-of-view (8%)
- Dynamic camera movements (8%)
- Synthetically rendered (4%)
- Others (7%)
All dataset reports by LifeArchitect.ai (most recent at top)
Date | Title
Jan/2025 | What's in Grok? (paper)
Jan/2025 | NVIDIA Cosmos video dataset (page)
Aug/2024 | What's in GPT-5? (paper)
Jul/2024 | Argonne National Laboratory AuroraGPT (page)
Sep/2023 | Google DeepMind Gemini: A general specialist (paper)
Aug/2022 | Google Pathways (paper)
Mar/2022 | What's in my AI? (GPT-1, GPT-2, GPT-3, MT-NLG, Chinchilla...)
Sep/2021 | Megatron the Transformer, and related language models (page)
This page last updated: 10/Jan/2025. https://lifearchitect.ai/cosmos/