I’ve moved my AI subscription from ChatGPT to Claude. The immediate trigger was mundane: bugs in ChatGPT’s Mac app that kept breaking my workflow. But I’d been circling the idea of switching for months. What finally tipped me wasn’t frustration. It was curiosity about what Anthropic is building.
I care about how these tools are made, not just what they do.
Constitutional AI versus RLHF
OpenAI trains models using Reinforcement Learning from Human Feedback (RLHF). Human contractors compare model outputs and select the better response according to principles like helpfulness or harmlessness. It works, but it’s expensive, slow, and forces people to read disturbing content repeatedly. The values guiding the model emerge implicitly from patterns in human judgments.
Anthropic uses Constitutional AI instead. They define an explicit set of principles (a “constitution”) written in natural language. The model learns to critique its own outputs against these principles. During training, Claude generates responses to prompts, evaluates them against constitutional rules like “choose the response that’s most ethical” or “least likely to encourage illegal behavior,” then revises its output accordingly. In the reinforcement learning phase, the model judges which of two responses better adheres to the constitution, creating preference data without human labeling. They call this RL from AI Feedback (RLAIF).
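The critique-and-revise loop in the supervised phase can be sketched roughly like this. This is an illustrative toy, not Anthropic’s code: `model` is a stub standing in for a real language model call, and the principle strings are paraphrased from the published constitution.

```python
# Illustrative sketch of the Constitutional AI critique-and-revise loop.
# Everything here is a stub; a real pipeline would call an actual LLM.

PRINCIPLES = [
    "Choose the response that is most ethical.",
    "Choose the response least likely to encourage illegal behavior.",
]

def model(prompt: str) -> str:
    """Stub: a real implementation would query a language model here."""
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(prompt: str) -> str:
    response = model(prompt)
    for principle in PRINCIPLES:
        # Ask the model to critique its own response against a principle...
        critique = model(
            f"Critique this response against the principle "
            f"'{principle}':\n{response}"
        )
        # ...then rewrite the response to address that critique.
        response = model(
            f"Rewrite the response to address the critique:\n"
            f"Critique: {critique}\nOriginal: {response}"
        )
    return response  # revised outputs become supervised training data
```

The key property is that the principles sit in plain text at the top, rather than being implicit in thousands of human preference judgments.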
The constitution itself is public. You can read exactly which principles govern Claude’s behavior. Anthropic draws from the UN Declaration of Human Rights, Apple’s terms of service, DeepMind’s Sparrow principles, and sources attempting to capture non-Western perspectives. They publish it, iterate on it, and welcome scrutiny. When things break, they explain why in technical detail rather than issuing PR statements.
That transparency matters to me.
Amodei versus Altman on AI Risk
Both Dario Amodei and Sam Altman have written extensively about AI’s future, but they frame the core problem differently.
Altman treats AI capability growth as essentially inevitable, something like a physics process we can’t stop. His essays focus on governance and distribution. The main challenge, in his view, is social adaptation. How do we ensure prosperity doesn’t pool at the top? How do we govern something this powerful? The technology itself is framed as a basically-good force that needs proper management.
Amodei frames the problem as keeping power from being misused. Capability itself increases the attack surface of society. Without strong constraints, you get authoritarian capture, weaponization, and brittle institutions. His focus is on preventing a basically-powerful force from becoming a social and geopolitical weapon.
This isn’t a subtle difference. Altman implicitly assumes the main risk is poor distribution of benefits. Amodei implicitly assumes the main risk is concentration of dangerous capabilities in the wrong hands.
Amodei’s framing resonates more with my concerns. I don’t think the primary challenge is making sure everyone gets a piece of the AI pie. I think the primary challenge is preventing the pie from exploding.
Model Context Protocol
As I’ve started tinkering more with coding and automation, one feature keeps catching my attention: the Model Context Protocol (MCP).
MCP is an open standard for connecting AI applications to external systems. Think of it as a USB-C port for AI. Instead of building custom integrations for every app, tool, or data source, developers build MCP servers that expose their functionality through a standardized interface. Any MCP-compatible AI can then connect to any MCP server.
In practice, this means Claude can check my calendar, pull information from my notes, or interact with web services without needing separate custom code for each one.
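Under the hood, MCP messages are JSON-RPC 2.0. A tool invocation looks roughly like the exchange below; the `tools/call` method name comes from the MCP spec, while the tool name and arguments are hypothetical stand-ins for something like a calendar server.

```python
import json

# A client asking an MCP server to run a tool. The method name
# "tools/call" is part of the protocol; "calendar_get_events" and
# its arguments are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "calendar_get_events",
        "arguments": {"date": "2025-06-01"},
    },
}

# The server replies with a result (or error) carrying the same id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "2 events found"}],
    },
}

wire = json.dumps(request)  # what actually travels over the transport
```

Because every server speaks this same shape, a client written once can talk to a calendar, a notes app, or a web service interchangeably.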
What excites me about MCP is the potential for composability. I could build a workflow where Claude checks my schedule, looks at my task list, cross-references files on my computer, and generates a daily summary without me manually copying information between apps. The protocol handles the connections. I just describe what I want.
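That kind of composition might look like the following sketch. The three data sources are stubbed out with canned values; in a real setup each would be a call to a separate MCP server, with the model deciding the sequencing.

```python
# Hypothetical daily-summary workflow composed from three stubbed
# MCP-style tools. Real servers would replace these functions.

def get_schedule() -> list[str]:
    return ["09:00 standup", "14:00 dentist"]

def get_tasks() -> list[str]:
    return ["ship read-later app", "reply to invoices"]

def find_relevant_files(tasks: list[str]) -> list[str]:
    # Stub: pretend each task has a matching notes file.
    return [f"notes/{t.split()[0]}.md" for t in tasks]

def daily_summary() -> str:
    schedule = get_schedule()
    tasks = get_tasks()
    files = find_relevant_files(tasks)
    lines = ["Today:"]
    lines += [f"  event: {e}" for e in schedule]
    lines += [f"  task:  {t}" for t in tasks]
    lines += [f"  file:  {f}" for f in files]
    return "\n".join(lines)
```

The point isn’t this particular glue code; it’s that none of it has to be bespoke per app once each source exposes an MCP interface.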
Some ideas I’m considering:
- Auto-organizing my Downloads folder based on file content and when I saved things
- Searching through my personal notes and past writing without uploading everything manually each time
- Connecting to my read-later app so Claude can summarize articles I’ve saved and suggest what to read next
MCP is still early, but the architecture is sound. Anthropic has published ready-to-use servers for common tools, and the developer community is building more. The fact that it’s an open protocol means I’m not locked into a single company’s ecosystem.
Claude Cowork
Anthropic just released Claude Cowork, which extends Claude’s capabilities beyond the chat interface into something closer to a digital assistant that actually does work.
Here’s how it works: you give Claude access to a specific folder on your Mac. Claude can read, edit, and create files in that folder. You describe an outcome you want. Claude makes a plan, breaks it into subtasks, and executes them autonomously. You can watch progress or walk away and come back to finished work.
The difference from regular chat is agency. Normal Claude gives you advice. You implement it. Cowork Claude does the implementation. You come back to completed files.
Use cases Anthropic highlights include organizing messy download folders, generating expense spreadsheets from receipt screenshots, turning scattered notes into formatted reports, and creating presentations from rough outlines. The system can work for hours on complex tasks, coordinating multiple sub-agents in parallel if needed.
It runs in a sandboxed virtual machine using Apple’s Virtualization Framework. Claude can’t access anything outside the folder you explicitly grant permission to. It asks before taking destructive actions like deleting files. The safety model is similar to Claude Code (which Cowork is built on) but wrapped in a more approachable interface for non-developers.
The prompt injection risks are real. If Claude reads malicious instructions from a website or document during a task, those instructions could alter its behavior. Anthropic has defenses built in, but they’re honest about the limitations. Agent safety is an active area of research, not a solved problem. Their documentation recommends keeping backups and being cautious with sensitive files.
I’m excited to test Cowork for tasks I’ve been putting off. File organization, synthesizing research notes into structured documents, batch processing of files that require judgment calls. Things that sit in the uncomfortable zone between “takes human judgment” and “too tedious to be worth doing manually.”
Putting Claude Through Its Paces
Right now, I’m using Claude to write this blog post. Meta, I know. I’m also using Claude Code to put the finishing touches on the read-later Mac app I’ve been slowly building for months. The pieces have been sitting there, functional but rough. I needed someone (something?) to help me clean up the edges and ship it.
So far, the results are promising. But I’m staying skeptical. Tools earn trust through use, not marketing.
I’ll report back as I learn more.