Report: Google DeepMind Gemini

A general specialist

An independent report by
Alan D. Thompson
LifeArchitect.ai
February 2024
19 pages, including title page, references and appendix.

 

 

The report

Download report (PDF).

Notice: A pre-release edition of this independent report (Rev A) was made available in Sep/2023, before the release of Gemini. Following the release of the complete Gemini model family, this Feb/2024 report is the final edition (Rev 0).

Abstract

Since Google introduced the Transformer architecture in 2017 and subsequently released its pre-trained transformer language model BERT in October 2018, training large language models (LLMs) has become a new space race, bringing humanity towards its largest evolutionary change yet: ‘superintelligence.’

Between 2020 and 2024, LLMs continued to be trained on increasingly large datasets by ever-larger teams of data scientists, with training compute now costing hundreds of millions of dollars. The information synthesized here covers the progress made by Google and DeepMind, which combined into a single company, Google DeepMind, under the Alphabet umbrella in 2023, with a focus on the massive Gemini multimodal model.

Gemini Nano and Pro were released on 6/Dec/2023, and Gemini Ultra 1.0 was released on 7/Feb/2024. Gemini Ultra 1.0 is likely to be a dense model of around 1.5 trillion parameters trained on 30 trillion tokens. Compared to the GPT-4 sparse mixture-of-experts (MoE) model, Gemini Ultra 1.0 has a similar parameter count but was trained on roughly twice as much data.
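
As a rough check on these figures, the widely used approximation C ≈ 6ND (training compute in FLOPs ≈ 6 × parameters × training tokens) can be applied to them. This is a minimal sketch, assuming that approximation holds for a dense transformer; the 1.5T-parameter and 30T-token values are the estimates above, not official numbers, and the helper names are hypothetical.

```python
# Back-of-envelope sketch only. Assumes the common approximation
# C ≈ 6 * N * D (training FLOPs ≈ 6 × parameters × tokens) for a dense model.
# N and D are estimates for Gemini Ultra 1.0, not published figures.

def training_flops(params: float, tokens: float) -> float:
    """Approximate dense-transformer training compute in FLOPs (C ≈ 6ND)."""
    return 6 * params * tokens


def tokens_per_param(params: float, tokens: float) -> float:
    """Number of training tokens seen per model parameter."""
    return tokens / params


N = 1.5e12  # ~1.5 trillion parameters (estimate)
D = 30e12   # ~30 trillion training tokens (estimate)

print(f"Estimated training compute: {training_flops(N, D):.1e} FLOPs")
print(f"Tokens per parameter: {tokens_per_param(N, D):.0f}")
```

Under these assumptions the estimate is about 2.7 × 10^26 FLOPs, and 30T tokens over 1.5T parameters gives 20 tokens per parameter, the ratio associated with DeepMind's Chinchilla scaling results. Both numbers carry all the uncertainty of the underlying estimates.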

Contents

  1. Background
    • 1.1 Etymology
    • 1.2 Google DeepMind: Two archers with one target
    • 1.3 Gemini personnel
    • 1.4 Gemini compute resources
    • 1.5 Large language models
    • 1.6 Text-to-image and visual language models
    • 1.7 The Alpha series of AI systems
    • 1.8 Putting it together: LLM + VLM + Text-to-image
  2. Datasets
    • 2.1 Datasets: Text: MassiveText multilingual
    • 2.2 Datasets: Visual (images and video)
    • 2.3 Datasets: Audio
  3. Gemini capabilities and performance
    • 3.1 Languages
    • 3.2 Visual
    • 3.3 IQ
  4. Size comparison
  5. Implementing and applying Gemini
  6. Conclusion
  7. Further reading
  8. Appendix

Selected tables and visualizations from the report

Gemini compute
See working, with sources.

DeepMind Alpha systems

| # | System | Expertise | Date | Description |
|---|---|---|---|---|
| 1 | AlphaGo | Go | Oct/2015 | First AI in the series. Designed to play the board game Go, which has a vast number of possible board configurations. Mar/2016: beat world champion Lee Sedol. |
| 2 | AlphaGo Zero | Go | Oct/2017 | Improved version of AlphaGo that learned to play from scratch, with no prior knowledge beyond the game’s rules. |
| 3 | AlphaZero | Board games | Dec/2017 | Generalized AlphaGo Zero’s approach to other board games, including chess and shogi. Superhuman performance. |
| 4 | AlphaFold 1 | Protein folding | Dec/2018 | Change in focus from games to scientific problems. Designed to predict protein structures, a complex problem in biology. |
| 5 | AlphaStar | StarCraft II | Jan/2019 | Designed to play the real-time strategy game StarCraft II. First AI to reach Grandmaster level. |
| 6 | AlphaFold 2 | Protein folding | Nov/2020 | New version of AlphaFold, later open sourced. Predicts the 3D structure of arbitrary proteins with exceptional accuracy. |
| 7 | AlphaCode | Software code | Feb/2022 | A coding engine that writes solutions to competitive programming problems at a level comparable to a median human competitor. |
| 8 | AlphaTensor | Maths | Oct/2022 | The first AI system for discovering novel, efficient and provably correct algorithms for fundamental tasks such as matrix multiplication. |
| 9 | AlphaDev | Algorithms | Jun/2023 | Discovers enhanced computer science algorithms, using the AlphaZero approach to find faster algorithms for tasks such as sorting and hashing. |
| 10 | AlphaCode 2 | Software code | Dec/2023 | Gemini-powered agent combining Gemini’s reasoning capabilities with search and tool use to solve competitive programming problems. |
| 11 | AlphaGeometry | Geometry | Jan/2024 | Solves complex geometry problems at a level approaching a human Olympiad gold medalist. |

Table. DeepMind Alpha systems 2015–2024. Highlights only. Assisted by GPT-4, HTML formatted by Gemini Pro.
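
For context on the AlphaTensor entry: an ‘algorithm for matrix multiplication’ in this sense is a scheme that computes a matrix product using fewer scalar multiplications. The classic human-designed example is Strassen’s 1969 scheme, which multiplies two 2×2 matrices with 7 multiplications instead of the naive 8; AlphaTensor searched for schemes of this kind. The sketch below is Strassen’s method, not an algorithm discovered by AlphaTensor, and the function names are illustrative.

```python
# Illustrative sketch: Strassen's 2x2 scheme (7 multiplications) versus the
# naive method (8 multiplications). Not AlphaTensor output.

def naive_2x2(a, b):
    """Standard 2x2 matrix product: 8 scalar multiplications."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def strassen_2x2(a, b):
    """Strassen's 2x2 matrix product: 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # The seven products m1..m7 replace the eight products of the naive method.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the products using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Applied recursively to matrix blocks, saving one multiplication per 2×2 step lowers the asymptotic cost below the naive O(n³), which is why finding schemes with fewer multiplications matters.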



Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 4.5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET.

This page last updated: 13/Feb/2024. https://lifearchitect.ai/gemini-report/