Report: Google DeepMind Gemini

A general specialist

An independent report by
Alan D. Thompson
LifeArchitect.ai
February 2024
19 pages incl title page, references, appendix.

The report

Download report (PDF).

Notice: A pre-release edition of this independent report (Rev A) was made available in Sep/2023, before the release of Gemini. Following the release of the complete Gemini model family, this Feb/2024 report is the final edition (Rev 0).

Abstract

Since Google introduced the Transformer architecture in 2017, and subsequently released its pre-trained transformer language model BERT in October 2018, training large language models (LLMs) has become a new space race, bringing humanity towards its largest evolutionary change yet: ‘superintelligence.’

Between 2020 and 2024, LLMs continued to be trained on increasingly large datasets, by ever larger teams of data scientists, with compute costs now measured in the hundreds of millions of dollars. The information synthesized here covers the progress made by Google and DeepMind, operating as one organization under the Alphabet umbrella since 2023, with a focus on the massive Gemini multimodal model.

Gemini Nano and Pro were released on 6/Dec/2023, and Gemini Ultra 1.0 was released on 7/Feb/2024. Gemini Ultra 1.0 is likely to be a dense model of around 1.5 trillion parameters trained on 30 trillion tokens. Compared to the GPT-4 sparse MoE model, Gemini Ultra 1.0 has a similar parameter count while being trained on 2× more data.
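As a rough sanity check on the figures above, here is a minimal Python sketch of the arithmetic behind them. The parameter and token counts are the report's own estimates, not official numbers. The function names are hypothetical, and the 6 × N × D rule of thumb for dense-model training compute is a commonly used approximation that is assumed here rather than taken from this report.

```python
# Figures quoted in the abstract (the report's estimates, not official numbers).
ULTRA_PARAMS = 1.5e12  # ~1.5 trillion parameters (dense)
ULTRA_TOKENS = 30e12   # ~30 trillion training tokens


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def approx_training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute for a dense model: C ~ 6 * N * D.

    This approximation is an assumption added for illustration; it is not
    a figure or method stated in the report.
    """
    return 6 * params * tokens


if __name__ == "__main__":
    ratio = tokens_per_parameter(ULTRA_PARAMS, ULTRA_TOKENS)
    flops = approx_training_flops(ULTRA_PARAMS, ULTRA_TOKENS)
    print(f"tokens per parameter: {ratio:.0f}")
    print(f"approx. training FLOPs: {flops:.1e}")
```

Under these assumptions the estimate works out to roughly 20 training tokens per parameter. The FLOPs figure is only an order-of-magnitude guide, since it ignores architecture details and hardware utilization.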

Contents

Background
- 1.1 Etymology
- 1.2 Google DeepMind: Two archers with one target
- 1.3 Gemini personnel
- 1.4 Gemini compute resources
- 1.5 Large language models
- 1.6 Text-to-image and visual language models
- 1.7 The Alpha series of AI systems
- 1.8 Putting it together: LLM + VLM + Text-to-image
Datasets
- 2.1 Datasets: Text: MassiveText multilingual
- 2.2 Datasets: Visual (images and video)
- 2.3 Datasets: Audio
Gemini capabilities and performance
- 3.1 Languages
- 3.2 Visual
- 3.3 IQ
Size comparison
Implementing and applying Gemini
Conclusion
Further reading
Appendix

Selected tables and viz from the report

Gemini compute
See working, with sources.

DeepMind Alpha systems

#	System	Expertise	Date	Description
1	AlphaGo	Go	Oct/2015	First AI in the series. Designed to play the board game Go, which has a huge number of possible board configurations. Mar/2016: beat world champion Lee Sedol.
2	AlphaGo Zero	Go	Oct/2017	Improved version of AlphaGo, learned to play from scratch, no prior knowledge beyond the game’s rules.
3	AlphaZero	Board games	Dec/2017	Further generalization of AlphaGo Zero’s approach to play other board games, chess and shogi. Superhuman performance.
4	AlphaFold 1	Protein folding	Dec/2018	Change in focus from games to scientific problems. Designed to predict protein folding structures, a complex problem in biology.
5	AlphaStar	StarCraft II	Jan/2019	Designed to play the real-time strategy game StarCraft II. First AI to reach a professional (Grandmaster) level.
6	AlphaFold 2	Protein folding	Nov/2020	New version of AlphaFold, later open sourced. Allows users to predict 3D structure of arbitrary proteins with exceptional accuracy.
7	AlphaCode	Software code	Feb/2022	A coding engine that creates computer programs at a level comparable to that of an average programmer.
8	AlphaTensor	Maths	Oct/2022	The first AI system for discovering novel and provably correct algorithms for fundamental tasks such as matrix multiplication.
9	AlphaDev	Algorithms	Jun/2023	A system to discover enhanced computer science algorithms. Uses AlphaZero approach to find faster algorithms for tasks such as sorting and hashing.
10	AlphaCode 2	Software code	Dec/2023	Gemini-powered agent combining Gemini’s reasoning capabilities with search and tool-use for solving competitive programming problems.
11	AlphaGeometry	Geometry	Jan/2024	An AI system that solves complex geometry problems at a level approaching a human Olympiad gold-medalist.

Table. DeepMind Alpha systems 2015–2024. Highlights only. Assisted by GPT-4, HTML formatted by Gemini Pro.


Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 4.5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET.

This page last updated: 13/Feb/2024. https://lifearchitect.ai/gemini-report/