Open the Models Table in a new tab | Back to LifeArchitect.ai
2024 optimal LLM highlights
Download source (PDF)
Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.
Older bubbles viz
Feb/2024
Nov/2023
Download source (PDF)
Permissions: Yes, you can use these visualizations anywhere, please leave the citation intact.
Mar/2023
Download source (PDF)
Apr/2022
Download source (PDF)
Data dictionary
Model (Text)
Name of the large language model. Sometimes uses filename syntax.
Lab (Text)
Name of the organization or group responsible for training or publishing the model. Sometimes lists a consortium such as “International”. Color highlights popular lab names.
Playground (URI)
URI pointing to a playground of the model, or HuggingFace repository for hosting weights.
Parameters (B) (Float)
Total number of parameters (weights) in the model. Using total weights for Dense, and total weights (not just active weights) for MoE.
Tokens trained (B) (Integer)
Total number of tokens (sub-words) used to train the model end-to-end, taking into account reported dataset, epochs, pretraining, and fine-tuning tokens.
Ratio Tokens:Params (Ratio)
Number of tokens trained per parameter (Tokens trained ÷ Parameters). Chinchilla scaling corresponds to a ratio of ≥ 20:1. Color highlights RED=0–7, ORANGE=8–16, GREEN=17–499, DARK GREEN=500–9999.
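As a rough illustration of this column, the ratio and its color band can be computed as below. This is a minimal sketch, not the sheet's actual formula: the function names are hypothetical, both inputs are assumed to be in billions (matching the Parameters (B) and Tokens trained (B) columns), and the handling of fractional ratios that fall between the listed integer bands (for example 7.5) is an assumption.

```python
def tokens_per_param(params_b: float, tokens_b: float) -> float:
    """Tokens trained per parameter; both inputs are in billions."""
    if params_b <= 0:
        raise ValueError("params_b must be positive")
    return tokens_b / params_b


def ratio_band(ratio: float) -> str:
    """Map a ratio to the color bands listed in the data dictionary.

    Assumption: each band extends up to (but not including) the start
    of the next one, so a fractional value such as 7.5 counts as RED.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    if ratio < 8:
        return "RED"
    if ratio < 17:
        return "ORANGE"
    if ratio < 500:
        return "GREEN"
    return "DARK GREEN"


def meets_chinchilla(ratio: float) -> bool:
    """True when the ratio is at least the 20:1 Chinchilla threshold."""
    return ratio >= 20


# Illustrative figures only: 70B parameters trained on 1,400B tokens.
example_ratio = tokens_per_param(70, 1400)
print(example_ratio, ratio_band(example_ratio), meets_chinchilla(example_ratio))
```

With these illustrative inputs the ratio sits exactly on the 20:1 threshold, so it falls in the GREEN band.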
ALScore (Float)
Quick and dirty rating of the model’s power. The formula is: √(Parameters × Tokens) ÷ 300, with Parameters and Tokens both in billions, as in the columns above. As of mid-2023, any model with an ALScore ≥ 1.0 is considered powerful. Color highlights centerpoint 15.
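The ALScore formula can be sketched in a few lines. This is a hedged example rather than the sheet's own implementation: the function name is hypothetical, and it assumes the formula reads as the square root of the product, divided by 300, with both inputs in billions.

```python
import math


def alscore(params_b: float, tokens_b: float) -> float:
    """ALScore = sqrt(parameters x tokens) / 300, inputs in billions.

    Assumption: the division by 300 applies after the square root.
    """
    if params_b < 0 or tokens_b < 0:
        raise ValueError("inputs must be non-negative")
    return math.sqrt(params_b * tokens_b) / 300


# Illustrative figures only: 70B parameters trained on 1,400B tokens.
score = alscore(70, 1400)
print(round(score, 2))  # about 1.04
print("powerful by the mid-2023 rule of thumb:", score >= 1.0)
```

Under this reading, a model with 300B parameters trained on 300B tokens scores exactly 1.0, which gives a feel for where the threshold sits.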
MMLU (Float)
Benchmark score 0–100 on Massive Multitask Language Understanding, released Sep/2020: MMLU paper. Color highlights centerpoint 80.
MMLU-Pro (Float)
Benchmark score 0–100 on Massive Multitask Language Understanding Pro, released Jun/2024: MMLU-Pro paper. Color highlights centerpoint 55.
GPQA (Float)
Benchmark score 0–100 on Google-Proof Q&A, released Nov/2023: GPQA paper. Color highlights centerpoint 40.
Training dataset (Symbol)
Rough guide of major datasets used to train the model. See key. Note increasing use of synthetic data from 2023.
Announced (Date)
Date as month/year. All data sorted by this column descending.
Public? (Symbol)
Ternary: GREEN=publicly accessible (weights, API, playground…), YELLOW=video or scripted demo only, RED=held in lab and never released.
Paper / Repo (URI)
URI pointing to official paper, technical note, or model card. Sometimes shows link to GitHub repository.
Arch (Text)
Architecture: Dense versus Mixture of Experts (MoE).
Notes (Text)
Any further comments or useful highlights.
List of all models shown in the Models Table
Models Table, list of 400+ large language models, as used by all major AI labs, including:
Amazon Olympus, OpenAI GPT-5, OpenAI GPT-6, ANL AuroraGPT (ScienceGPT), xAI Grok-2, Alibaba Qwen2, Google DeepMind Gemma 2, Microsoft MAI-1
Inflection AI Inflection-3 Productivity (3.0), Inflection AI Inflection-3 Pi (3.0), Liquid AI LFM-40B (40B), Salesforce SFR-LLaMA-3.1-70B-Judge (70B), BAAI Emu3 (8B), NVIDIA NLVM 1.0 (72B), China Telecom Artificial Intelligence Research Institute Unnamed (1T), China Telecom Artificial Intelligence Research Institute TeleChat2-115B (115B), AMD AMD-Llama-135m, Meta AI Llama 3.2 (90B), Meta AI Llama 3.2 (3.21B), Allen AI Molmo (72B), Google DeepMind Gemini-1.5-Pro-002 (1.5T), Alibaba Qwen2.5 (72B), Microsoft GRIN MoE (60B), Google DeepMind Data-Gemma (27B), OpenAI o1 (200B), Jina AI Reader-LM (1.54B), Mistral Pixtral-12b-240910 (12B), DeepSeek-AI DeepSeek-V2.5 (236B), 01-ai Yi-Coder (9B), Allen AI OLMoE-1B-7B, Salesforce xLAM (141B), Magic LTM-2-mini (20B), Cartesia Rene (1.3B), Google DeepMind Gemini 1.5 Flash-8B, Aleph Alpha Pharia-1-LLM-7B, Stanford TTT-Linear (1.3B), AI21 Jamba 1.5 (398B), Microsoft phi-3.5-MoE (60B), Microsoft phi-3.5-mini (3.8B), NVIDIA Minitron-4B, Sarvam AI sarvam-2b (2B), xAI Grok-2 (600B), LG EXAONE 3.0 (7.8B), TII Falcon Mamba 7B, Writer Palmyra-Med-70B, Writer Palmyra-Fin-70B, Zyphra Zamba2-small (2.7B), NVIDIA Minitron-8B (4B), Mistral Mistral Large 2 (123B), Meta AI Llama 3.1 405B, OpenAI GPT-4o mini 8B, Mistral NeMo 12B, Mistral Codestral Mamba (7B), Mistral Mathstral (7B), DeepL next-gen, Hugging Face SmolLM (1.7B), Vectara Mockingbird (9B), Google DeepMind FLAMe (24B), H2O.ai H2O-Danube3-4B, Microsoft Causal Axioms (67M), SenseTime SenseNova 5.5 (600B), Kyutai Helium 7B, Shanghai AI Laboratory/SenseTime InternLM2.5 (7B), Meta AI Llama 3 405B, Baidu ERNIE 4.0 Turbo, Google DeepMind Gemma 2 (27B), OpenAI CriticGPT, Apple 4M-21 (3B), EvolutionaryScale ESM3 (98B), Huawei PanGu 5.0 Super (1T), Anthropic Claude 3.5 Sonnet, DeepSeek-AI DeepSeek-Coder-V2, NVIDIA Nemotron-4-340B, Apple Apple On-Device model Jun/2024, UCSC MatMul-Free LM, Galileo Luna, Alibaba Qwen2, Alibaba Qwen2-57B-A14B, Kunlun Tech Skywork MoE 16x13B, CMU Mamba-2,
International MAP-Neo, LLM360 K2, Mistral Codestral, Cohere Aya-23, 01-ai Yi-XLarge, 01-ai Yi-Large, Cerebras Sparse Llama, Google DeepMind Gemini, OpenAI GPT-4o, TII Falcon 2, Fujitsu Fugaku-LLM, 01-ai Yi 1.5, Microsoft YOCO, DeepSeek-AI DeepSeek-V2, Independent ChuXin, RWKV RWKV-v6 Finch, ELLIS xLSTM, IBM Granite Code, Alibaba Qwen-Max, Google DeepMind Med-Gemini-L 1.0, Alibaba Qwen-1.5 (110B), Snowflake AI Research Arctic, SenseTime SenseNova 5.0, Apple OpenELM, Microsoft phi-3-medium, Microsoft phi-3-mini, Meta AI Llama 3, Amazon HLAT, Hugging Face Idefics2, Reka AI Reka Core, Microsoft WizardLM-2-8x22B, EleutherAI Pile-T5, Hugging Face H4 Zephyr 141B-A35B, Cohere Rerank 3, OpenAI gpt-4-turbo-2024-04-09, Tsinghua MiniCPM-2.4B, Mistral AI Mixtral-8x22b, Sail Sailor, MIT JetMoE-8B, Tsinghua Eurus, Cohere Command-R+, Silo AI Viking, Nous Research OLMo-Bitnet-1B, International Aurora-M, Apple ReALM-3B, Alibaba Qwen1.5-MoE-A2.7B, xAI Grok-1.5, AI21 Jamba, MosaicML DBRX, Stability AI Stable Code Instruct 3B, Sakana AI EvoLLM-JP, Rakuten Group RakutenAI-7B, Independent Parakeet, RWKV RWKV-v5 EagleX, Apple MM1, Covariant RFM-1, Cohere Command-R, DeepSeek-AI DeepSeek-VL, Fudan University AnyGPT, Stability AI Stable Beluga 2.5, Inflection AI Inflection-2.5, SRIBD/CUHK Apollo, Anthropic Claude 3 Opus, HF/ServiceNow StarCoder 2, ByteDance 530B, ByteDance 175B, Mistral AI Mistral Small, Mistral AI Mistral Large, Reliance Hanooman, Apple Ask, Reka AI Reka Edge, Reka AI Reka Flash, Google DeepMind Gemma, Google DeepMind Gemini 1.5 Pro, Alibaba Qwen-1.5, BRAIN GOODY-2, ChatDB Natural-SQL-7B, AI Singapore Sea-Lion, Google TimesFM, Allen AI OLMo, Cerebras FLOR-6.3B, AIWaves.cn Weaver, Mistral AI miqu 70b, iFlyTek iFlytekSpark-13B, iFlyTek Xinghuo 3.5 (Spark), Apple MGIE, Meta AI CodeLlama-70B, RWKV RWKV-v5 Eagle 7B, LMU MaLA-500, Cornell MambaByte, DeepSeek-AI DeepSeek-Coder, Tencent FuseLLM, Adept Fuyu-Heavy, Zhipu AI (Tsinghua) GLM-4, DeepSeek-AI DeepSeekMoE, DeepSeek-AI
DeepSeek, Tencent LLaMA Pro, SUTD/Independent TinyLlama, JPMorgan DocLLM, Allen AI Unified-IO 2, Microsoft WaveCoder-DS-6.7B, Huawei YunShan, Huawei PanGu-Pi, Wenge YAYI 2, BAAI Emu2, Google DeepMind MedLM, Upstage AI SOLAR-10.7B, Deci DeciLM-7B, Mistral AI Mistral-medium, Mistral AI mixtral-8x7b-32kseqlen, Together StripedHyena 7B, Nexusflow.ai NexusRaven-V2 13B, Google DeepMind Gemini Ultra 1.0, CMU Mamba, Berkeley/JHU LVM-3B, Alibaba SeaLLM-13b, Perplexity pplx-70b-online, Meta AI SeamlessM4T-Large v2, Google DeepMind Q-Transformer, IEIT Yuan 2.0, EPFL MEDITRON, Microsoft Transformers-Arithmetic, Berkeley Starling-7B, Inflection AI Inflection-2, Anthropic Claude 2.1, Allen AI TÜLU 2, Microsoft Orca 2, Microsoft Phi-2, Microsoft Florence-2, Google DeepMind Mirasol3B, NTU OtterHD-8B, Samsung Gauss, xAI Grok-1, xAI Grok-0, 01-ai Yi-34B, OpenAI GPT-4 Turbo, Jina AI jina-embeddings-v2, Adept Fuyu, Baidu ERNIE 4.0, Hugging Face H4 Zephyr, Google DeepMind PaLI-3, NVIDIA Retro 48B, Apple Ferret, XLANG Lab Lemur, KAUST/Shenzhen AceGPT, Reka AI Yasa-1, Google DeepMind RT-X, Waymo MotionLM, Wayve GAIA-1, Alibaba Qwen, Meta AI Llama 2 Long, Hessian AI/LAION LeoLM, Mistral AI Mistral 7B, Microsoft Kosmos-2.5, Baichuan Baichuan 2, ThirdAI BOLT2.5B, Deci DeciLM, IBM MoLM, Singapore NExT-GPT, Microsoft Phi-1.5, Apple UniLM, Adept Persimmon-8B, BAAI FLM-101B, TII Falcon 180B, Tencent Hunyuan, Independent phi-CTNL, Inception Jais, Meta AI Code Llama 34B, Hugging Face IDEFICS, UI/NVIDIA Raven, AzaleAI DukunLM, Microsoft WizardLM, Boston University Platypus, Stability AI Japanese StableLM Alpha 7B, Stability AI StableCode, Stanford Med-Flamingo, LightOn Alfred-40B-0723, Together LLaMA-2-7B-32K, Google DeepMind Med-PaLM M, Cerebras BTLM-3B-8K, Stability AI Stable Beluga 2, Stability AI Stable Beluga 1, Shanghai AI Laboratory/CUHK Meta-Transformer, Meta AI Llama 2, (Undisclosed) WormGPT, Anthropic Claude 2, IDEAS/DeepMind LongLLaMA, Tsinghua xTrimoPGLM, Salesforce XGen, 360 Zhinao 
(Intellectual Brain), Reka AI Yasa, Microsoft Kosmos-2, Google AudioPaLM, Inflection AI Inflection-1, Microsoft Phi-1, Shanghai AI Laboratory/SenseTime InternLM, Meta AI BlenderBot 3x, Microsoft Orca, ETH Zürich PassGPT, Google DeepMind DIDACT, Magic LTM-1, OpenAI GPT-4 MathMix, Cambridge/Tencent PandaGPT, TII Falcon, Refact 202305-refact2b-mqa-lion, UW Guanaco, Meta AI LIMA, Asus/TWS Formosa (FFM), Salesforce CodeT5+, Google PaLM 2, HF/ServiceNow StarCoder, MosaicML MPT, Inflection AI Pi, NVIDIA GPT-2B-001, Amazon Titan, Microsoft WizardLM, MosaicML MPT, Stability AI StableLM, Databricks Dolly 2.0, EleutherAI Pythia, Berkeley Koala-13B, Character.ai C1.2, Bloomberg BloombergGPT, LAION OpenFlamingo-9B, Nomic GPT4All-LoRa, Cerebras Cerebras-GPT, Huawei PanGu-Sigma, Google CoLT5, Google DeepMind Med-PaLM 2, OpenAI GPT-4, Stanford Alpaca, AI21 Jurassic-2, Together GPT-NeoX-Chat-Base-20B, Microsoft Kosmos-1, Meta AI LLaMA-65B, Fudan University MOSS, Writer Palmyra, Aleph Alpha Luminous Supreme Control, Meta AI Toolformer+Atlas 11B+NLLB 54B, Amazon Multimodal-CoT, Microsoft FLAME, Google DeepMind Med-PaLM 1, Meta AI OPT-IML, Anthropic RL-CAI, Baidu ERNIE-Code, Google RT-1, OpenAI ChatGPT (gpt-3.5-turbo), OpenAI text-davinci-003, Together GPT-JT, RWKV RWKV-4, Meta AI Galactica, DeepMind SED, BigScience mT0, BigScience BLOOMZ, Microsoft PACT, Google Flan-T5, Google Flan-PaLM, Google U-PaLM, NVIDIA VIMA, Tsinghua OpenChat, Wechat WeLM, Tsinghua CodeGeeX, DeepMind Sparrow, Google PaLI, NVIDIA NeMo Megatron-GPT 20B, Microsoft Z-Code++, Meta AI Atlas, Meta AI BlenderBot 3, Tsinghua GLM-130B, Amazon Alexa AI AlexaTM 20B, OpenAI 6.9B FIM, Google ‘monorepo-Transformer’, Huawei PanGu-Coder, Meta AI NLLB, AI21 J-1 RBG, BigScience BLOOM (tr11-176B-ml), Google Minerva, Microsoft GODEL-XL, Yandex YaLM 100B, Allen AI Unified-IO, DeepMind Perceiver AR, Google LIMoE, Independent GPT-4chan, Stanford Diffusion-LM, Google UL2 20B, DeepMind Gato (Cat), Google LaMDA 2, Meta AI OPT-175B, 
Hugging Face Tk-Instruct, Meta AI InCoder, TII NOOR, Sber mGPT, Google PaLM-Coder, Google PaLM, Meta AI SeeKeR, Salesforce CodeGen, LightOn VLM-4, Meta AI CM3, Aleph Alpha Luminous, DeepMind Chinchilla, EleutherAI GPT-NeoX-20B, Baidu ERNIE 3.0 Titan, Meta AI XGLM, Meta AI Fairseq, DeepMind Gopher, Google GLaM, Anthropic Anthropic-LM 52B, DeepMind RETRO, Google BERT-480, Google BERT-200, Coteries Cedille FR-Boris, Microsoft/NVIDIA MT-NLG, Google FLAN, Cohere xlarge, Baidu PLATO-XL, Allen AI Macaw, Salesforce CodeT5, OpenAI Codex, AI21 Jurassic-1, Meta AI BlenderBot 2.0, EleutherAI GPT-J, Google LaMDA, Huawei/Sberbank ruGPT-3, Google Switch, OpenAI GPT-3, Meta AI Megatron-11B, Google Meena, Google T5, Meta AI RoBERTa, OpenAI GPT-2, Google BERT, OpenAI GPT-1, Fast.ai ULMFiT
The sheet also shows a set of Chinese models, including:
Baidu Wenxin Yiyi, iFLYTEK Sibichi, Dachang Data Mooc, Huawei Cloud Daoyi Tianwen, Chongqing University MOSS, Zhixin Technology ChatGLM, Qingmang Qingmang, Qingmang+Guangcone, Qingmang-Wang, Intengine Daoyi Tianwen, Q&A Track Mountain University Bense, Shell BELLE, Baichuan Intelligence baichuan, OpenBMB CPM, Intengine Yingjie: Qingyuan, OpenMEDLab, Yunhezhi Shanhai, Beijing North University TechGPT, Zhizhongwen Shenzhen Jiwei, Lü Ying, Chinese Academy of Sciences Enhanced Dal Liu, Ideal Technology TigerBot, IDEA Research Institute Xiaozhe Technology MindBot, Shanghai Jiao Tong University K2, Baiyulan, 360 Zhineng, Yijian, Duxiaoman Qianyan, Doctoral Engineering Technology Research Institute ProactiveHealthGPT, Heihei, Huru SoulChat, Wenzi Technology Anima, Peking University Law Artificial Intelligence Research Institute ChatLaw, Xiangde Technology Co., Ltd. Muyuan, Horgos MiniMax, Tencent Cloud Tencent, Race Technology+Chongqing Replay Network Race Type XPT, Institute of Computing Technology, Chinese Academy of Sciences Baima, Beijing Language University Bangbang, SenseTime Ririxin, National Supercomputing Center in Tianjin Tianjin Tianyuan, Guoke Technology No Weight, Saisen, Race Technology+Tianjin University Haihe·Mint, Bian Sheng Electronic LightGPT, Telecom Zhike Xingyin, Xiamen Yunji Xiamen YunGPT, Zhizhuyan Jingshi, TAL MathGPT, Shugan Space Great Wall, Ideal Technology Dadao Dao, Huisheng Intelligence Zhixin, China Internet Zhigong, Chuangye Black Horse Tianqi, Together Technology Bowen, NetEase Youdao Yuchuan, NetEase Youdao Wangyan, Weiding Tianji, Zhihu Zhihu Zhihu, Yixing Network Science Uni-talk, Luwen Education Luwen, Zhongke Chuangda Magic Cube Rubik, Tencent Pao Pao, Douyin Vision Dou Tian, Leyan Technology Leyan, Didi Intelligence Xianxiang, Zhizi Engine Metaverse, Douyin Technology Douyin, Microhuan Intelligence Ronggu, Evernote Elephant GPT, Hummingbird Unity Hummingbird, Universe Leap Grace, Aomen Nuomen Kang Jianuo, Shuzu Technology SocialGPT, 
Cloud from Technology Congrong, Dianke Daxiao Xiao Ke, Agricultural Bank of China Xiaomi ChatABC, Tencent Fusion Tianlai AllMe, Taijiu Cloud Ensespers FFM, Yiyi Technology medGPT, Chaos Science MindGPT, Lingjing Multi-AI Dongni, Changhong IT Changhong Totem, Child King KidsGPT, Zhongke Wendao Daoyi, Didi Technology Lanzi, JD Jixing, ChatJD, Zhizuan Intelligence Huajun, H3C Baitian Cloud House, Tencent Blue Whale Tencent Brain·Brain Sea, Ushi Technology Huimu, China Unicom Yuxiang, Meituan Technology Dahuangfeng, Zitian Power Technology Darwin, Really Smart Zhao Bin, Jiadu Technology Jiadu Zhiyin, Smart Environment Research Institute Smart, Xinyun Research Institute Science EmoGPT, EduChat, Yandao Intelligent ArynGPT, Tencent WAI, Northwestern Polytechnical University Huawei Technology Ziguang·Observation, Singularity Intelligent Singularity OpenAPI, Lenovo Technology Lenovo, Shanghai University of Science and Technology DoctorGLM, Xuannengao Zhimei Couple System, Hong Kong University of Science and Technology Robin, Shengang Communication Source, China Mobile Datian, China Telecom TeleChat, Rongyun Cloud Fanke, Yuntian Lifly Tianshu, Smart Technology CityGPT.