An AI’s Internal Representation

Taken together, our experiments suggest that [Large Language Models] possess some genuine capacity to monitor and control their own internal states. This doesn’t mean they’re able to do so all the time, or reliably. In fact, most of the time models fail to demonstrate introspection—they’re either unaware of their internal states or unable to report on them coherently. But the pattern of results indicates that, when conditions are right, models can recognize the contents of their own representations. In addition, there are some signs that this capability may increase in future, more powerful models.

— “Emergent introspective awareness in large language models”, Anthropic Research
