An AI’s Internal Representation

Taken together, our experiments suggest that [Large Language Models] possess some genuine capacity to monitor and control their own internal states. This doesn’t mean they’re able to do so all the time, or reliably. In fact, most of the time models fail to demonstrate introspection—they’re either unaware of their internal states or unable to report on them coherently. But the pattern of results indicates that, when conditions are right, models can recognize the contents of their own representations. In addition, there are some signs that this capability may increase in future, more powerful models.

— “Emergent introspective awareness in large language models”, Anthropic Research

