Models and datasets/corpora
OpenAI GPT-4o: OpenAI. (2024). Hello GPT-4o. https://openai.com/index/hello-gpt-4o/
Claude 3 Opus: Anthropic. (2024). The Claude 3 Model Family: Opus, Sonnet, Haiku. https://www.anthropic.com/claude-3-model-card
Gemini 1.0: Google DeepMind. (2023). Gemini Technical Report.
PaLM 2: Google. (2023). PaLM 2 Technical Report.
GPT-4: OpenAI. (2023). GPT-4 Technical Report (PDF).
Transformers (DeepMind): Phuong & Hutter. (2022). Formal Algorithms for Transformers. (PDF)
Gato (DeepMind): Reed et al. (2022). A Generalist Agent. (PDF)
Connecting LLMs to robots (Google): Ahn et al. (2022). Do As I Can, Not As I Say: Grounding Language In Robotic Affordances. (PDF)
Chinchilla scaling (DeepMind): Hoffmann et al. (2022). Training Compute-Optimal Large Language Models. (PDF)
PaLM: Pathways Language Model (Google Research): Chowdhery et al. (2022). PaLM: Scaling Language Modeling with Pathways. (PDF)
Google Pathways (LifeArchitect.ai): Thompson, A. D. (August 2022). Google Pathways: An Exploration of the Pathways Architecture from PaLM to Parti. 24 pages incl. title page, references, appendix.
InstructGPT (OpenAI): Ouyang et al. (2022). Training language models to follow instructions with human feedback. OpenAI. (PDF)
GPT-NeoX-20B (EleutherAI): Black et al. (2022). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. (PDF)
MT-NLG (Microsoft/NVIDIA): Smith et al. (2022). Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. (PDF)
LaMDA (Google): Thoppilan et al. (2022). LaMDA: Language Models for Dialog Applications. (PDF)
Fairseq (Meta AI): Artetxe et al. (2021). Efficient Large Scale Language Modeling with Mixtures of Experts. (PDF)
Gopher (DeepMind): Rae et al. (2021). Scaling Language Models: Methods, Analysis & Insights from Training Gopher. (PDF)
Yuan 1.0 (Inspur AI): Wu et al. (2021). Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning. (PDF)
Macaw (Allen AI/AI2): Tafjord & Clark. (2021). General-Purpose Question-Answering with Macaw. (PDF)
Jurassic-1 (AI21 Israel): Lieber et al. (2021). Jurassic-1: Technical Details and Evaluation. (PDF)
Blenderbot 2.0 (Facebook): Komeili et al. (2021). Internet-Augmented Dialogue Generation. (PDF)
Wudao 2.0 (BAAI): Zou & Tang et al. (2021). Controllable Generation from Pre-trained Language Models via Inverse Prompting. (Note: As of July 2021, this is the latest Wudao 2.0 paper, showing an extract of WDC-Text; the full paper is TBA.) (PDF)
Wudao 1.0 (BAAI): Yuan & Tang et al. (2021). WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models. (PDF)
PanGu Alpha (Huawei): Zeng et al. (2021). PanGu-Alpha: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. (PDF)
The Pile v1 (EleutherAI): Gao et al. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. EleutherAI. (PDF)
What’s in my AI? (LifeArchitect.ai): Thompson, A. D. (March 2022). What’s in my AI? A Comprehensive Analysis of Datasets Used to Train GPT-1, GPT-2, GPT-3, GPT-NeoX-20B, Megatron-11B, MT-NLG, and Gopher. 26 pages incl. title page, references, appendix.
Common Crawl: Dodge et al. (2021). Documenting the English Colossal Clean Crawled Corpus. (PDF)
GPT-3 (OpenAI): Brown et al. (2020). Language Models are Few-Shot Learners. OpenAI. (PDF)
This is the comprehensive 22/Jul/2020 arXiv preprint: 75 pages/6.8MB, with all sections and appendices. Note that the final NeurIPS camera-ready version (22/Oct/2020; 25 pages/1.3MB, some sections removed, no appendices) is not as comprehensive.
GPT-2 (OpenAI): Radford et al. (2019). Language Models are Unsupervised Multitask Learners. OpenAI. (PDF)
GPT-1 (OpenAI): Radford et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. (PDF)
RoBERTa (Meta AI): Liu et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. (PDF)
BERT (Google): Devlin et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (PDF)
Fine-tuning (Howard): Howard & Ruder. (2018). Universal Language Model Fine-tuning for Text Classification. (PDF)
Transformer (Google): Vaswani et al. (2017). Attention is all you need. Google. (PDF)
AI Winter caused by scientists: Olazaran. (1996). A Sociological Study of the Official History of the Perceptrons Controversy. (PDF)
The Turing Test: Turing, A. M. (1950). Computing Machinery and Intelligence. Mind 59: 433-460. (PDF)
First steps in AI by Turing: Turing, A. M. (1941-1948).
– Guinness, R. (2018). What is Artificial Intelligence? Part 2
– Re-discovering ‘Intelligent machinery’ by Alan Turing
– (1948). ‘Intelligent machinery’ by Alan Turing, prepared/typed by ‘Gabriel’
Organisations
OpenAI 2022: Johnson, S. (2022). A.I. Is Mastering Language. Should We Trust What It Says? The New York Times Magazine. (PDF)
Inside OpenAI and Neuralink offices: Hao, K. (2020). The messy, secretive reality behind OpenAI’s bid to save the world. MIT Technology Review. (PDF)
Ethics and data quality guidance
Nick Bostrom: Bostrom, N. (2022). Propositions Concerning Digital Minds and Society. (PDF)
Aleph Alpha: Andrulis, J. (2022). Ethics and bias in generalizable AI. (PDF)
Societal impacts (Anthropic): Ganguli et al. (2022). Predictability and Surprise in Large Generative Models.
(See also Anthropic’s work on reverse engineering transformer language models)
Foundation models (GPT-3, Wudao 2.0… 211-page report with 114 authors via Stanford AI): Bommasani et al. (2021). On the Opportunities and Risks of Foundation Models. (PDF – large – 16MB)
Parrots: Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Note: Google asked its affiliated researchers to withdraw this paper before publication.) (PDF)
GPT-3 quality: Strickland, E. (2021). OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language). IEEE Spectrum. (PDF)
GPT-J quality: HN discussion (2021). A discussion about GPT-J, Books3 creation, and the exclusion of datasets like Literotica and the US Congressional Record… (PDF) (Original HN link)
GPT-4Chan: “…some dark corners of the web like 4Chan that are already sometimes unfortunately part of the pre-training of these large language models (maybe to try to remove them/mitigate them?).” – Clem Delangue, co-founder and CEO at Hugging Face. https://huggingface.co/ykilcher/gpt-4chan/discussions/1. https://archive.ph/6bddZ
See also my 2021 paper: Integrated AI: Dataset quality vs quantity via bonum (GPT-4 and beyond).
Sam Altman (2021). Moore’s Law for Everything. (PDF)
OpenAI to USPTO: OpenAI (2019). Comment Regarding Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation. (PDF)
GPT-2 Ethics: Solaiman, I., et al. (2019). Release Strategies and the Social Impacts of Language Models. (PDF)
Animal rights (as potential guidance for AI rights): Cambridge. (2012). The Cambridge Declaration on Consciousness (CDC). (PDF)
Intergovernmental and governmental guidance
UN/UNESCO: UNESCO (2021). Recommendation on the Ethics of Artificial Intelligence. (PDF)
AI + the UN Sustainable Development Goals: 2030Vision (2019). AI & The Sustainable Development Goals: The state of play. (PDF)
AI Ethics: WHO (2021). Ethics and governance of artificial intelligence for health. (PDF)
AI Ethics: European Commission (2019). Ethics Guidelines for Trustworthy AI. (PDF)
Australian Govt AI: (2021). Australia’s AI Action Plan: June 2021. (External PDF)
International AI Strategies: The team at AiLab.com.au hosts the most comprehensive list of all international AI strategies, from Australia to Vietnam. (External link)
Other
Spinning Up: OpenAI (2022). Spinning Up Documentation Release. (Note: This is a study guide for learning about deep reinforcement learning.) (PDF)
Wudao usage agreement: BAAI (2021). Data Usage Agreement of Zhiyuan Enlightenment Platform. (Note: Translated to English.) (PDF)

This page last updated: 8/Jun/2024. https://lifearchitect.ai/papers/