Important external papers

Models and datasets/corpora

GPT-4: OpenAI. (PDF) Note: TBA.

Jurassic-1 (AI21 Israel): Lieber et al. (2021). Jurassic-1: Technical Details and Evaluation. (PDF)

Blenderbot 2.0 (Facebook): Komeili et al. (2021). Internet-Augmented Dialogue Generation. (PDF)

Wudao 2.0 (BAAI): Zou & Tang et al. (2021). Controllable Generation from Pre-trained Language Models via Inverse Prompting. (Note: As of July 2021, this is the latest Wudao 2.0 paper, and it shows an extract of WDC-Text. Full paper TBA.) (PDF)

Wudao 1.0 (BAAI): Yuan & Tang et al. (2021). WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models. (PDF)

PanGu Alpha (Huawei): Zeng et al. (2021). PanGu-Alpha: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. (PDF)

The Pile v1 (EleutherAI): Gao et al. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. EleutherAI. (PDF)

Common Crawl: Dodge et al. (2021). Documenting the English Colossal Clean Crawled Corpus. (PDF)

GPT-3 (OpenAI): Brown et al. (2020). Language Models are Few-Shot Learners. OpenAI. (PDF)

GPT-2 (OpenAI): Radford et al. (2019). Language Models are Unsupervised Multitask Learners. OpenAI. (PDF)

GPT-1 (OpenAI): Radford et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. (PDF)

Fine-tuning (Howard): Howard & Ruder (2018). Universal Language Model Fine-tuning for Text Classification. (PDF)

Transformer (Google): Vaswani et al. (2017). Attention is all you need. Google. (PDF)

The Turing Test: Turing, A. M. (1950). Computing Machinery and Intelligence. Mind 59: 433-460. (PDF)

First steps in AI by Turing: Turing, A. M. (1941-1948).
Guinness, R. (2018). What is Artificial Intelligence? Part 2

Re-discovering ‘Intelligent machinery’ by Alan Turing

Turing, A. M. (1948). ‘Intelligent machinery’ by Alan Turing, prepared/typed by ‘Gabriel’


Inside OpenAI and Neuralink offices: Hao, K. (2020). The messy, secretive reality behind OpenAI’s bid to save the world. MIT Technology Review. (PDF)

Ethics and data quality guidance

Foundation models (GPT-3, Wudao 2.0…): Bommasani et al. (2021). On the Opportunities and Risks of Foundation Models. (PDF – large – 16MB)

Parrots: Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Note: Banned by Google.) (PDF)

GPT-3 quality: Strickland, E. (2021). OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language). IEEE. (PDF)

GPT-J quality: HN discussion (2021). A discussion about GPT-J, Books3 creation, and the exclusion of datasets like Literotica and the US Congressional Record… (PDF) (Original HN link)

See also my 2021 paper: Integrated AI: Dataset quality vs quantity via bonum (GPT-4 and beyond).

Animal rights (as potential guidance for AI rights): Cambridge. (2012). The Cambridge Declaration on Consciousness (CDC). (PDF)

Intergovernmental and governmental guidance

AI Ethics: WHO (2021). Ethics and governance of artificial intelligence for health. (PDF)

Australian Govt AI: (2021). Australia’s AI Action Plan: June 2021. (External PDF)

International AI Strategies: The team at hosts the most comprehensive list of all international AI strategies, from Australia to Vietnam. (External link)


Wudao usage agreement: BAAI (2021). Data Usage Agreement of Zhiyuan Enlightenment Platform. (Note: Translated to English.) (PDF)

