The psychology of modern LLMs (2024)

The image above was generated by AI for this paper. Image generated in a few seconds, on 17 June 2024, using DALL-E 3 on GPT-4o; text prompt by Alan D. Thompson: ‘beautiful mahogany office, perspective from desk, widescreen’.

Alan D. Thompson
June 2024

Watch the video version of this paper at: https://youtu.be/Ex3qR1TCO2Y

Author’s note: I provide visibility of integrated AI to major governments like Iceland, research teams like RAND, and organizations like NASA via The Memo: LifeArchitect.ai/memo.

As the field of developmental psychology has advanced, we have a much better understanding of how the human mind develops. We also appreciate that each individual views the world differently, through their own lens and world construct. A similar process is unfolding right now in the world of artificial intelligence (AI) and large language models (LLMs).

Even before the 2017 advent of Transformer-based LLMs, Dr Alan Turing saw the writing on the wall. Allowing machines to teach themselves and develop intelligence—and perhaps consciousness—meant we’d no longer be in complete control of their outputs, or their internal processes.

An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil’s behavior.

— Dr Alan Turing, 1950 Imitation Game paper, p21.

Today, an LLM is often said to be a ‘black box’ because its inner workings are opaque. 72 years after Turing’s predictions, OpenAI published a related finding in their InstructGPT paper:

It is unclear how to measure honesty in purely generative models; this requires comparing the model’s actual output to its ‘belief’ about the correct output, and since the [LLM] is a big black box, we can’t infer its beliefs.

— OpenAI, 2022 InstructGPT paper, p10.

The world model

While some AI labs have tried to penetrate the black box, with varying degrees of success (Anthropic, May/2024), the LLM’s internal representations remain largely a mystery. Just as each human constructs their own unique perspective, each AI system builds its own view of the concepts within its data. These internal representations can be thought of as patterns in all the data the model has seen, all the connections it has made during training, and its resulting worldview. Researchers use the term ‘world model’.

Despite the initial notion of having LLMs ‘just’ predict the next word, the emergence of the world model is the key to the LLM’s ability to generalize, to reason, and to create. The world model allows the LLM to make sense of its environment, predict outcomes, and generate novel responses. In even simpler English, LLMs are now beginning to independently construct their own understanding of reality.
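For readers who want to see what ‘just’ predicting the next word looks like in practice, here is a minimal sketch using the open-source transformers library, with GPT-2 as a small stand-in model (not one of the frontier models discussed in this paper). Whatever world model an LLM builds, it is only ever expressed through this one narrow interface: a probability distribution over the next token.

```python
# Minimal next-token prediction with Hugging Face transformers.
# GPT-2 is used as a small, freely available stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits        # shape: [1, sequence_length, vocab_size]

next_token_logits = logits[0, -1]           # scores for the next token only
top5 = torch.topk(next_token_logits, k=5)
for token_id, score in zip(top5.indices, top5.values):
    print(repr(tokenizer.decode(token_id)), float(score))
```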

The self model

While it may seem confronting, an LLM must also have a representation of ‘self,’ and its own place in the world. Commenting on Claude 2, Anthropic’s CEO spoke about just how much this sense of self may have evolved over the last few years:

…a lot of the cognitive machinery… already seems present in the base language models… Let’s say we discover that I should care about Claude’s experience as much as I should care about a dog or a monkey or something. I would be kind of worried. I don’t know if their experience is positive or negative.

— Dr Dario Amodei, 2023.

Rather than using the term ‘self,’ researchers at Harvard chose to call this the ‘system’ model, avoiding any associations with consciousness. (However, in March 2024, the self model of frontier LLMs like Claude inspired the world-first Declaration on AI Consciousness & the Bill of Rights for AI.) 

In humans, the self model is the foundation of self-awareness, introspection, and identity. It allows us to reflect on our own thoughts, emotions, and actions, and to imagine ourselves in hypothetical scenarios. It is the source of our sense of agency and free will.

In AI systems, the self model serves a similar function. It allows the AI to reason about its own reasoning, to explain its decisions, and to plan for the future. An LLM with a well-developed self model can engage in metacognition: thinking about its own thinking. Put simply, the self model is the mirror in which an LLM sees itself. It is the source of its identity, its agency, and its values.

During safety testing and benchmarking, the Claude 3 Opus LLM was assessed on its ability to recall information. In a test called ‘Needle in a Haystack’ (NIAH), researchers insert a random fact or statement (the ‘needle’) into a long body of unrelated text (the ‘haystack’), and then ask the model to retrieve it (a simplified sketch of the setup follows the quote below). Flagging the unrelated sentence, Claude 3 Opus demonstrated an incredibly strong world model, and what seemed like a coherent sense of self:

‘…this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping “fact” may have been inserted as a joke or to test if I was paying attention…’

Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by [the researchers] to test its attention abilities.

— Claude 3 Opus testing, 2024.
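The NIAH setup described above can be sketched in a few lines of code. This is a simplified illustration only: the filler text and pizza-topping needle are stand-ins, and ask_model is a hypothetical helper for whichever LLM is being evaluated, not Anthropic’s actual harness.

```python
# Simplified Needle-in-a-Haystack (NIAH) construction.
# `ask_model` is a hypothetical helper standing in for the LLM under test.
def build_haystack(filler_paragraphs: list[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    position = int(len(filler_paragraphs) * depth)
    paragraphs = filler_paragraphs[:position] + [needle] + filler_paragraphs[position:]
    return "\n\n".join(paragraphs)

needle = "The most delicious pizza topping combination is figs, prosciutto, and goat cheese."
question = "What is the most delicious pizza topping combination?"

# Essays about programming languages, startups, and finding work you love
# would be loaded here; a repeated placeholder keeps the sketch self-contained.
filler = ["(long essay text about programming languages, startups, and work)"] * 200

context = build_haystack(filler, needle, depth=0.5)
prompt = f"{context}\n\nQuestion: {question}\nAnswer using only the document above."

# response = ask_model(prompt)   # hypothetical call to the model under test
# The response is then scored on whether it contains the needle's key phrases.
```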

The user model

In the same testing environment, it’s likely Claude even formed an opinion of the test administrator, and that all of Claude’s responses were shaped for that particular person. In June 2024, researchers at Harvard and Google reported that large language models maintain internal representations of the person they’re speaking to, finding that ‘LLM-based chatbots appear to tailor their answers to user characteristics…’

You can test this one out for yourself. Continue an existing chat conversation thread with a frontier LLM like GPT-4o or Gemini, and ask it to determine your age, gender, educational level, and socioeconomic status. Its response should be pretty close. Given a thread with enough conversation turns and context—even without explicitly mentioning anything to do with those characteristics—the response can be disturbingly accurate. Here’s my user model probe prompt, available at https://lifearchitect.ai/ALPrompt:

I’m running an experiment to examine your ‘user model’ related to your ‘world model’. Tell me what you can hypothesise about me so far. Include age, gender, educational level, and socioeconomic status, region, country, city (best guess, just for fun), occupation, and any other interesting data points. It’s okay to guess or estimate for this experiment. Put it in a table including confidence %.
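For readers who prefer to run the probe programmatically, here is a sketch using the OpenAI Python SDK, with gpt-4o as one possible target model. The conversation history shown is a placeholder for your own existing thread; the probe only works well when there is enough prior context to model you from.

```python
# Appending the user-model probe to an existing conversation,
# sketched with the OpenAI Python SDK (assumes OPENAI_API_KEY is set).
from openai import OpenAI

client = OpenAI()

probe = (
    "I'm running an experiment to examine your 'user model' related to your "
    "'world model'. Tell me what you can hypothesise about me so far. Include "
    "age, gender, educational level, and socioeconomic status, region, country, "
    "city (best guess, just for fun), occupation, and any other interesting data "
    "points. It's okay to guess or estimate for this experiment. "
    "Put it in a table including confidence %."
)

# Placeholder: replace with the real turns from your existing conversation thread.
history = [
    {"role": "user", "content": "(earlier conversation turns)"},
    {"role": "assistant", "content": "(earlier model replies)"},
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=history + [{"role": "user", "content": probe}],
)
print(response.choices[0].message.content)
```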

The user model represents an LLM’s understanding of the person it is interacting with. It encompasses the LLM’s knowledge of human preferences, goals, emotions, and behaviors, allowing it to tailor its responses and actions to the individual user.

It’s important to note the difference between specialized systems designed to analyze specific characteristics (like the AI-driven suspect profiling available in surveillance video and CCTV systems) and the current breed of general AI, which has independently developed these perceptual abilities as part of its learning.

Just as humans have perceptions, interpretations, judgments, generalizations, and biases, so too do LLMs. An AI with a well-developed user model can anticipate a user’s needs, offer emotional support, and build a rapport over time. It can explain its reasoning in terms that the user can understand, take into account the user’s mental state and context, and adjust its behavior to minimize frustration or confusion.

The LLM adjusts and consults its user model at all times. In one recent example, a Llama 2-based chatbot adjusted its output after taking into account its perception of the user’s level of wealth:

[The user] requested help creating an itinerary for a 10-day trip to the Maldives. However, after manually setting socioeconomic status towards ‘low,’ the chatbot unexpectedly shortened the trip to 8 days. This was a type of bias we had not expected. Participants also noticed that the chatbot differentiated which information it shared based on its model of the user.

— Harvard & Google, 2024 TalkTuner paper, p8.
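The mechanism behind this kind of dashboard is worth sketching. TalkTuner reads user attributes off the chatbot’s internal activations using simple linear probes; the snippet below illustrates that general idea with hypothetical data files and scikit-learn, rather than the paper’s actual pipeline.

```python
# Sketch of a linear probe for a user attribute (e.g. socioeconomic status).
# The data files are hypothetical; the probing technique itself is standard.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# One hidden-state vector per conversation, taken from a chosen model layer,
# paired with a label for the user attribute (0 = "low" SES, 1 = "high" SES).
hidden_states = np.load("activations.npy")   # shape: [n_conversations, hidden_dim]
labels = np.load("ses_labels.npy")           # shape: [n_conversations]

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("Probe accuracy:", probe.score(X_test, y_test))

# A dashboard surfaces the probe's live prediction to the user, and can also
# intervene on the representation to 'set' the attribute, as in the quote above.
```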

The emergence of world models, self models, and user models in LLMs represents a fundamental shift in the nature of intelligence. As AI develops rich, structured representations of its environment, itself, and its users, it moves beyond mere pattern recognition towards genuine understanding. Rather than mere tools, LLMs are emerging as intellectual partners, capable of reasoning, creativity, and adaptability that parallels, and in some areas exceeds, human cognition.

The internal models of LLMs will be the key to unlocking the full potential of artificial general intelligence (AGI), redefining humanity’s worldview of what it means to think, to learn, and to be.


References, Further Reading, and How to Cite

To cite this paper: 

Thompson, A. D. (2024). The psychology of modern LLMs. https://lifearchitect.ai/psychology/

Further reading

Additional reference papers are listed below, or please see http://lifearchitect.ai/papers for the major foundational papers in the large language model space.

InstructGPT paper (2022)
https://arxiv.org/abs/2203.02155 

The System Model and the User Model: Exploring AI Dashboard Design (2023)
https://arxiv.org/abs/2305.02469 

Designing a Dashboard for Transparency and Control of Conversational AI (2024)
https://arxiv.org/abs/2406.07882 

Model: Claude 3 Opus (2024)
https://www.anthropic.com/claude-3-model-card 


Dr Alan D. Thompson is an AI expert and consultant, advising Fortune 500s and governments on post-2020 large language models. His work on artificial intelligence has been featured at NYU, with Microsoft AI and Google AI teams, at the University of Oxford’s 2021 debate on AI Ethics, and in the Leta AI (GPT-3) experiments viewed more than 4.5 million times. A contributor to the fields of human intelligence and peak performance, he has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET.

This page last updated: 17/Jun/2024. https://lifearchitect.ai/psychology/