Important external papers

Models and datasets/corpora

OpenAI GPT-4o: OpenAI. (2024). https://openai.com/index/hello-gpt-4o/

Claude 3 Opus: Anthropic. (2024). The Claude 3 Model Family: Opus, Sonnet, Haiku. https://www.anthropic.com/claude-3-model-card

Gemini 1.0: Google DeepMind. (2023). Gemini Technical Report.

PaLM 2: Google. (2023). PaLM 2 Technical Report.

GPT-4: OpenAI. (2023). GPT-4 Technical Report. (PDF)

Transformers (DeepMind): Phuong & Hutter. (2022). Formal Algorithms for Transformers. (PDF)

Gato (DeepMind): Reed et al. (2022). A Generalist Agent. (PDF)

Connecting LLMs to robots (Google): Ahn et al. (2022). Do As I Can, Not As I Say: Grounding Language In Robotic Affordances. (PDF)

Chinchilla scaling (DeepMind): Hoffmann et al. (2022). Training Compute-Optimal Large Language Models. (PDF)
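As a quick aside for readers who want the headline result in numbers: below is a minimal sketch (my own illustration, not code from the paper) of the compute-optimal rule of thumb Chinchilla reports, roughly 20 training tokens per parameter, combined with the widely used approximation C ≈ 6ND for training FLOPs. The function name and FLOP budget are illustrative assumptions only.

```python
# Sketch only: Chinchilla's compute-optimal rule of thumb (~20 tokens per
# parameter) plus the standard training-FLOPs approximation C ≈ 6 * N * D.
def chinchilla_optimal(flops_budget: float) -> tuple[float, float]:
    """Return (parameters N, training tokens D) solving C = 6*N*D with D = 20*N."""
    n = (flops_budget / 120) ** 0.5  # 6 * N * (20 * N) = 120 * N^2 = C
    d = 20 * n
    return n, d

# Example: Chinchilla's own budget of ~5.76e23 FLOPs
# gives roughly 70B parameters and 1.4T tokens, matching the paper's model.
n, d = chinchilla_optimal(5.76e23)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```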

PaLM: Pathways Language Model (Google Research): Chowdhery et al. (2022). PaLM: Scaling Language Modeling with Pathways. (PDF)


Google Pathways: An Exploration of the Pathways Architecture from PaLM to Parti

Alan D. Thompson
LifeArchitect.ai
August 2022
24 pages incl. title page, references, and appendix.

Read more…


InstructGPT (OpenAI): Ouyang et al. (2022). Training language models to follow instructions with human feedback. OpenAI. (PDF)

GPT-NeoX-20B (EleutherAI): Black et al. (2022). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. (PDF)

MT-NLG (Microsoft/NVIDIA): Smith et al. (2022). Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. (PDF)

LaMDA (Google): Thoppilan et al. (2022). LaMDA: Language Models for Dialog Applications. (PDF)

Fairseq (Meta AI): Artetxe et al. (2021). Efficient Large Scale Language Modeling with Mixtures of Experts. (PDF)

Gopher (DeepMind): Rae et al. (2021). Scaling Language Models: Methods, Analysis & Insights from Training Gopher. (PDF)

Yuan 1.0 (Inspur AI): Wu et al. (2021). Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning. (PDF)

Macaw (Allen AI/AI2): Tafjord & Clark. (2021). General-Purpose Question-Answering with Macaw. (PDF)

Jurassic-1 (AI21 Israel): Lieber et al. (2021). Jurassic-1: Technical Details and Evaluation. (PDF)

Blenderbot 2.0 (Facebook): Komeili et al. (2021). Internet-Augmented Dialogue Generation. (PDF)

Wudao 2.0 (BAAI): Zou & Tang et al. (2021). Controllable Generation from Pre-trained Language Models via Inverse Prompting. (Note: As of July 2021, this is the latest Wudao 2.0 paper showing extract of WDC-Text. Full paper TBA.) (PDF)

Wudao 1.0 (BAAI): Yuan & Tang et al. (2021). WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models. (PDF)

PanGu Alpha (Huawei): Zeng et al. (2021). PanGu-Alpha: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. (PDF)

The Pile v1 (EleutherAI): Gao et al. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. EleutherAI. (PDF)


What’s in my AI? A Comprehensive Analysis of Datasets Used to Train GPT-1, GPT-2, GPT-3, GPT-NeoX-20B, Megatron-11B, MT-NLG, and Gopher

Alan D. Thompson
LifeArchitect.ai
March 2022
26 pages incl. title page, references, and appendix.

Read more…


Common Crawl: Dodge et al. (2021). Documenting the English Colossal Clean Crawled Corpus. (PDF)

GPT-3 (OpenAI): Brown et al. (2020). Language Models are Few-Shot Learners. OpenAI. (PDF)
This is the comprehensive 22/Jul/2020 arXiv preprint (75 pages, 6.8MB) with all sections and appendices. The final NeurIPS camera-ready version (22/Oct/2020, 25 pages, 1.3MB) removed some sections and all appendices, and is not as comprehensive.

GPT-2 (OpenAI): Radford et al. (2019). Language Models are Unsupervised Multitask Learners. OpenAI. (PDF)

GPT-1 (OpenAI): Radford et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. (PDF)

RoBERTa (Meta AI): Liu et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. (PDF)

BERT (Google): Devlin et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (PDF)

Fine-tuning (Howard): Howard & Ruder. (2018). Universal Language Model Fine-tuning for Text Classification. (PDF)

Transformer (Google): Vaswani et al. (2017). Attention Is All You Need. Google. (PDF)
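As a refresher on what this paper introduced: a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, as defined in the paper. The toy shapes and random inputs below are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Toy example: 3 tokens with d_k = 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```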

AI Winter caused by scientists: Olazaran. (1996). A Sociological Study of the Official History of the Perceptrons Controversy. (PDF)

The Turing Test: Turing, A. M. (1950). Computing Machinery and Intelligence. Mind 59: 433-460. (PDF)

First steps in AI by Turing: Turing, A. M. (1941-1948). See also:
Guinness, R. (2018). What is Artificial Intelligence? Part 2.
Re-discovering ‘Intelligent machinery’ by Alan Turing.
Turing, A. M. (1948). ‘Intelligent machinery’, prepared/typed by ‘Gabriel’.


Organisations

OpenAI 2022: Johnson, S. (2022). A.I. Is Mastering Language. Should We Trust What It Says? The New York Times Magazine. (PDF)

Inside OpenAI and Neuralink offices: Hao, K. (2020). The messy, secretive reality behind OpenAI’s bid to save the world. MIT Technology Review. (PDF)


Ethics and data quality guidance

Nick Bostrom: Bostrom & Shulman. (2022). Propositions Concerning Digital Minds and Society. (PDF)

Aleph Alpha: Andrulis, J. (2022). Ethics and bias in generalizable AI. (PDF)

Societal impacts (Anthropic): Ganguli et al. (2022). Predictability and Surprise in Large Generative Models.
(See also Anthropic’s work on reverse engineering transformer language models)

Foundation models (GPT-3, Wudao 2.0… 211-page report with 114 authors via Stanford AI): Bommasani et al. (2021). On the Opportunities and Risks of Foundation Models. (PDF; large, 16MB)

Parrots: Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Note: Google asked its researchers to withdraw this paper.) (PDF)

GPT-3 quality: Strickland, E. (2021). OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language). IEEE. (PDF)

GPT-J quality: HN discussion (2021). A discussion about GPT-J, Books3 creation, and the exclusion of datasets like Literotica and the US Congressional Record… (PDF) (Original HN link)

GPT-4Chan: “…some dark corners of the web like 4Chan that are already sometimes unfortunately part of the pre-training of these large language models (maybe to try to remove them/mitigate them?).” – Clem Delangue, co-founder and CEO at Hugging Face. https://huggingface.co/ykilcher/gpt-4chan/discussions/1. https://archive.ph/6bddZ

See also my 2021 paper: Integrated AI: Dataset quality vs quantity via bonum (GPT-4 and beyond).

Sam Altman (2021). Moore’s Law for Everything. (PDF)

OpenAI to USPTO: OpenAI. (2019). Comment Regarding Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation. (PDF)

GPT-2 Ethics: Solaiman, I., et al. (2019). Release Strategies and the Social Impacts of Language Models. (PDF)

Animal rights (as potential guidance for AI rights): Cambridge. (2012). The Cambridge Declaration on Consciousness (CDC). (PDF)


Intergovernmental and governmental guidance

UN/UNESCO: UNESCO. (2021). Recommendation on the Ethics of Artificial Intelligence. (PDF)

AI + the UN Sustainable Development Goals: 2030Vision (2019). AI & The Sustainable Development Goals: The state of play. (PDF)

AI Ethics: WHO (2021). Ethics and governance of artificial intelligence for health. (PDF)

AI Ethics: European Commission (2019). Ethics Guidelines for Trustworthy AI. (PDF)

Australian Govt AI: Australian Government. (2021). Australia’s AI Action Plan: June 2021. (External PDF)

International AI Strategies: The team at AiLab.com.au hosts the most comprehensive list of all international AI strategies, from Australia to Vietnam. (External link)


Other

Spinning Up: OpenAI (2022). Spinning Up Documentation Release. (Note: This is a study guide for learning about deep reinforcement learning.) (PDF)

Wudao usage agreement: BAAI (2021). Data Usage Agreement of Zhiyuan Enlightenment Platform. (Note: Translated to English.) (PDF)

This page last updated: 8/Jun/2024. https://lifearchitect.ai/papers/