Scoopico
2025 Copyright © Scoopico. All rights reserved
Claude Code's '/goals' separates the agent that works from the one that decides it's done
Tech

Scoopico
Last updated: May 14, 2026 7:20 pm
Published: May 14, 2026
Contents
  • The two model split
  • Reliability in the loop

A code migration agent finishes its run, and the pipeline looks green. But several pieces were never compiled — and it took days to catch. That's not a model failure; that's an agent deciding it was done before it actually was.

Many enterprises are now finding that production AI agent pipelines fail not because of the model's abilities but because the model behind the agent decides to stop too early. Several methods to prevent premature task exits are now available from LangChain, Google and OpenAI, though these often rely on separate evaluation systems. The newest method comes from Anthropic: /goals on Claude Code, which formally separates task execution from task evaluation.

Coding agents work in a loop: they read files, run commands, edit code and then check whether the task is done. 

Claude Code /goals essentially adds a second layer to that loop. After a user defines a goal, Claude continues working turn by turn, but an evaluator model steps in after every turn to review the work and decide whether the goal has been achieved.
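To make the two-layer loop concrete, here is a minimal Python sketch of the general pattern. It is not Anthropic's implementation: the names run_with_evaluator, worker_step and goal_met are hypothetical, and the toy worker and judge stand in for real model calls.

```python
from typing import Callable, List


def run_with_evaluator(
    goal: str,
    worker_step: Callable[[List[str]], str],
    goal_met: Callable[[str, List[str]], bool],
    max_turns: int = 20,
) -> List[str]:
    """Run worker turns until a separate evaluator says the goal is met."""
    transcript: List[str] = []
    for _ in range(max_turns):
        # The worker takes one turn (read files, run commands, edit code).
        transcript.append(worker_step(transcript))
        # A separate judge, not the worker, decides whether to stop.
        if goal_met(goal, transcript):
            break
    return transcript


# Toy stand-ins (assumptions): the worker "fixes" one test per turn,
# and the judge is satisfied once three turns have been recorded.
def toy_worker(transcript: List[str]) -> str:
    return f"fixed test {len(transcript) + 1}"


def toy_judge(goal: str, transcript: List[str]) -> bool:
    return len(transcript) >= 3


result = run_with_evaluator("three tests fixed", toy_worker, toy_judge)
```

The max_turns cap is an added safety assumption so the sketch always terminates; the point is only that the stop decision lives outside the worker.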

The two model split

Orchestration platforms from all three vendors identified the same roadblock, but they approach it differently. OpenAI leaves the loop alone and lets the model decide when it's done, though it lets users attach their own evaluators. With LangGraph and Google's Agent Development Kit (ADK), independent evaluation is possible, but developers must define the critic node, write the termination logic and configure observability.

Claude Code /goals makes the independent evaluator the default, whether the user wants the agent to run longer or shorter. The developer sets the goal completion condition via a prompt, for example: /goal all tests in test/auth pass, and the lint step is clean. Claude Code then runs, and every time the agent attempts to end its work, the evaluator model, which is Haiku by default, checks the condition. If the condition is not met, the agent keeps running. If the condition is met, the evaluator logs the achieved condition to the agent conversation transcript and clears the goal. The evaluator makes only one binary decision, whether the task is done or not, which is why the smaller Haiku model works well.
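The stop-time check described above can be sketched as a small gate object. This is an illustration under stated assumptions, not Claude Code's internals: GoalGate, set_goal and may_stop are hypothetical names, and the evaluator is a plain function standing in for the Haiku model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GoalGate:
    """Hypothetical gate consulted whenever the worker tries to stop."""

    evaluator: Callable[[str, List[str]], bool]  # binary: met or not met
    goal: Optional[str] = None
    transcript: List[str] = field(default_factory=list)

    def set_goal(self, condition: str) -> None:
        self.goal = condition

    def may_stop(self) -> bool:
        """Return True if the worker is allowed to end its run."""
        if self.goal is None:
            return True  # no active goal, nothing to enforce
        if self.evaluator(self.goal, self.transcript):
            # Condition met: record it in the transcript and clear the goal.
            self.transcript.append(f"goal achieved: {self.goal}")
            self.goal = None
            return True
        return False  # condition not met, so the worker keeps running


# Toy evaluator (assumption): done once the transcript reports passing tests.
def toy_evaluator(goal: str, transcript: List[str]) -> bool:
    return "tests pass" in transcript


gate = GoalGate(evaluator=toy_evaluator)
gate.set_goal("all tests in test/auth pass, and the lint step is clean")
first_attempt = gate.may_stop()    # worker tries to stop too early
gate.transcript.append("tests pass")
second_attempt = gate.may_stop()   # worker tries again after more work
```

Because the gate only ever answers yes or no, the judging step stays cheap, which is consistent with the article's point about a smaller model being sufficient.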

Claude Code makes this possible by separating the model that attempts to complete a task from the evaluator model that ensures the task is actually completed. This prevents the agent from mixing up what it's already accomplished with what still needs to be done. With this method, Anthropic noted there’s no need for a third-party observability platform — though enterprises are free to continue using one alongside Claude Code — no need for a custom log, and less reliance on post-mortem reconstruction.

Competitors support similar evaluation patterns. Google ADK, for example, provides a LoopAgent, but developers have to architect the evaluation logic themselves.

In its documentation, Anthropic said the most successful conditions usually have: 

  • One measurable end state: a test result, a build exit code, a file count, an empty queue

  • A stated check: how Claude should prove it, such as “npm test exits 0” or “git status is clean”

  • Constraints that matter: anything that must not change on the way there, such as “no other test file is modified”
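A stated check such as "npm test exits 0" is easy to judge because it reduces to a single observable fact. The Python sketch below shows what that kind of check looks like in code. It is only an illustration of a measurable end state: the /goals evaluator is a model, not this function, and exits_zero is a hypothetical helper.

```python
import subprocess
import sys
from typing import Sequence


def exits_zero(command: Sequence[str]) -> bool:
    """Return True if the command finishes with exit code 0."""
    completed = subprocess.run(command, capture_output=True)
    return completed.returncode == 0


# Stand-in commands so the sketch runs anywhere; in a real project the
# command would be something like ["npm", "test"].
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

passing_ok = exits_zero(passing)
failing_ok = exits_zero(failing)
```

A condition phrased this way leaves no room for the worker's own opinion of its progress: the command either exits cleanly or it does not.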

Reliability in the loop

For enterprises already managing sprawling tool stacks, the appeal is a native evaluator that doesn't add another system to maintain.

This is part of a broader trend in the agentic space, especially as the possibility of stateful, long-running and self-learning agents becomes more of a reality. Evaluator models, verification systems and other independent adjudication systems are starting to show up in reasoning systems and, in some cases, in coding agents like Devin or SWE-agent. 

Sean Brownell, solutions director at Sprinklr, told VentureBeat in an email that there is interest in this kind of loop, where the task and judge are separate, but he feels there is nothing unique about Anthropic's approach.

"Yes, the loop works. Separating the builder from the judge is sound design because, fundamentally, you can't trust a model to judge its own homework. The model doing the work is the worst judge of whether it's done," Brownell said. "That being said, Anthropic isn't first to market. The most interesting story here is that two of the world’s biggest AI labs shipped the same command just days apart, but each of them reached entirely different conclusions about who gets to declare 'done.'"

Brownell said the loop works best "for deterministic work with a verifiable end-state like migrations, fixing broken test suites, clearing a backlog," but for more nuanced tasks or those needing design judgment, a human making that decision is far more important.

Bringing that evaluator/task split to the agent-loop level shows that companies like Anthropic are pushing agents and orchestration further toward a more auditable, observable system.
