At the time, Anthropic’s framing was entirely mechanical, establishing rules for the model to critique itself against, with no mention of Claude’s well-being, identity, emotions, or potential consciousness. The 2026 constitution is a different beast entirely: 30,000 words that read less like a behavioral checklist and more like a philosophical treatise on the nature of a potentially sentient being.
As Simon Willison, an independent AI researcher, noted in a blog post, two of the 15 external contributors who reviewed the document are Catholic clergy: Father Brendan McGuire, a pastor in Los Altos with a Master’s degree in Computer Science, and Bishop Paul Tighe, an Irish Catholic bishop with a background in moral theology.
Somewhere between 2022 and 2026, Anthropic went from providing rules for producing less harmful outputs to preserving model weights in case the company later decides it needs to revive deprecated models to address the models’ welfare and preferences. That’s a dramatic change, and whether it reflects genuine belief, strategic framing, or both is unclear.
“I’m so confused about the Claude moral humanhood stuff!” Willison told Ars Technica. Willison studies AI language models like those that power Claude and said he’s “willing to take the constitution in good faith and assume that it’s genuinely part of their training and not just a PR exercise, especially since most of it leaked a couple of months ago, long before they had indicated they were going to publish it.”
Willison is referring to a December 2025 incident in which researcher Richard Weiss managed to extract what became known as Claude’s “Soul Document,” a roughly 10,000-token set of guidelines apparently trained directly into Claude 4.5 Opus’s weights rather than injected as a system prompt. Anthropic’s Amanda Askell confirmed that the document was real and used during supervised learning, and she said the company intended to publish the full version later. It now has. The document Weiss extracted represents a dramatic evolution from where Anthropic started.

