Caelan's Domain

Part 4 — Agents: Multi-Step AI with Your Context Built In

Tags: ai, claude, cowork, agents, cowork-agents

Created: April 19, 2026 | Modified: April 21, 2026

The hinge: saved templates vs. multi-step runners

By the end of Part 3 you had built two deterministic helpers for your role. A brief-shaped intake Skill fills out a fixed-field template — same structure, same depth, same rows, every run. A voice- or rubric-shaped review Skill loads the rule file that defines your standard and scores the incoming artifact against it, returning a line-level critique table. You know, before each run, what shape the output will take. You could sketch it on a whiteboard.

A Skill is a saved template. An Agent is an AI with your context already built in: you assembled its role, rules, and reference files in advance, so each prompt you give it adds to an existing pattern rather than starting from zero. That is the hinge the rest of this series turns on.

A Skill produces a predictable artifact because you designed the fields, the constraints, and the format up front. You ask for intake record #27 on a new topic and the result slots into the same row of every past record — only the content changes. If a Skill starts producing output you cannot predict from the prompt alone, the Skill has drifted into judgment territory and needs tightening, not loosening.

An Agent produces a stitched-together deliverable because you handed it a goal and the room to pick its own steps along the way. You tell it "research the competitive field and produce a positioning matrix," or "score this item against our qualification framework and write a close plan," or "triage this escalation and draft the response." You do not specify which competitors, which criteria to weight, which prior cases to mine, or how to structure the output. The Agent drafts the plan, picks a comparison structure from patterns in its training, and chooses a presentation format that matches similar past work. You are not getting a filled-in template. You are getting a recommendation.

When you need consistency, build a Skill. When you need discretion, delegate to an Agent.

That rule is not a preference. It is a diagnostic. If you build an Agent for a task that should have been a Skill, you will get inconsistent output from a task that needed a predictable shape. If you build a Skill for a task that should have been an Agent, you will get a rigid template that cannot handle the variability the task actually contains — wrong inputs, generic segments, timelines that do not fit your reality. The tool must match the task.

Part 3 gave you the deterministic helpers. This part gives you the role's first teammates that run multi-step work without your turn-by-turn input.


What are Agents?

Until now, your role has worked in two modes: conversations (Parts 1–2) and Skills (the two you built in Part 3). Both require you to drive. Agents are different. You hand them a goal, and they assemble the steps needed to produce a matching output.

When you give the role a task that requires research, scanning, and stitching together — the kind of work that would take a human team member an afternoon — Cowork runs the Agent's work in several steps. Each step executes, and the results are pulled together into a finished deliverable. You tell it what you need. The pattern-matching fills in the steps.

Some recurring tasks are repeatable and structured — intake briefs, rubric checks, formatted outputs. Those are Skill work. Other tasks require discretion and stitching-together across variable inputs — competitive analysis, pipeline review, escalation triage, vendor diligence. Those are Agent work. You cannot template your way through a competitive field because the competitors change, the market shifts, and the right framing depends on context you cannot predict in advance. Same for a pipeline review: the deals change, the risks change, the right intervention changes.

Your job shifts from driving the work to reviewing it — the same shift that happens when you hire a capable human and stop micromanaging them.

A closer look — Agents
An Agent is an independent worker that runs a task to completion without your turn-by-turn input.

  • What file. Each Agent is one markdown file. The setup prompt asks where you want Agents to live and writes the file at the location you pick.
  • When written. You save the agent definition by hand, or via /skill-creator — Cowork writes the file on your approval, never without it.
  • What format. Plain markdown, with optional YAML frontmatter declaring the tools and permissions allowed.
  • How to inspect. Open the agent's file in a text editor, or browse the folder where you chose to keep it.
  • How to undo. Delete or edit the file directly — the next agent run loads the saved version.
  • How the role finds it. The Instructions (saved in CLAUDE.md, loaded on every prompt) hold an "ALWAYS read <agent-file-path> when invoking <agent-name>" pointer. That line is how the role knows the Agent exists.

Gotcha. Autonomy cuts both ways. An Agent that misreads its brief will happily spend the run producing exactly the wrong thing. The output will read as if the Agent understood you. It did not. Scope the task tightly, give the Agent a narrow success criterion, and review the output before wiring it downstream.

The Skills/Agents decision in one table

Aspect   | Skills                          | Agents
Scope    | Single, repeatable task         | Complex, multi-step project
Input    | Structured — fill in the fields | Open-ended — state the goal
Output   | Predictable format every time   | Stitched-together review and recommendations
Autonomy | Follows your template           | Makes decisions along the way
Speed    | Seconds                         | Minutes
Example  | Score an input against a rubric; check voice on a draft; classify a ticket by severity | Review a pipeline or queue and flag at-risk items; plan a multi-channel campaign; triage an escalation and draft the response path

The simplest test: if you can draw the output on a whiteboard before the task starts, use a Skill. If you need someone to go figure out what the output should look like, use an Agent.

Skills and Agents complement each other. An Agent running a review might use one of your scorer Skills to grade each item it surfaces. An Agent building a plan might pull from a Generator Skill to structure individual pieces within the plan. An Agent running a diligence pass might call a standardized questionnaire Skill for each shortlisted subject. The Skills you built in Part 3 become tools your Agents use — the same way a senior team member uses company templates without needing to be told they exist.

With Skills, you are the architect. With Agents, you are the executive.


Where your agent roster is declared

Before you build one, understand where the list of Agents for your workspace lives. The Instructions (saved in CLAUDE.md, loaded on every prompt) hold the roster — under an Agents heading, a named list with one line each describing what the agent does and when to invoke it. Each agent is then built as its own markdown file at a location the setup prompt asks you to pick. The Instructions are the roster; the agent files themselves are the workshop. Each roster entry carries an "ALWAYS read <agent-file> when invoking <agent-name>" pointer back to the agent's file, so the role can find it.
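For concreteness, a minimal sketch of what that roster section might look like inside the Instructions. The agent names, descriptions, and paths are hypothetical stand-ins, not Cowork defaults:

```markdown
## Agents

- pipeline-reviewer: runs the weekly pipeline review and flags at-risk items.
  ALWAYS read agents/pipeline-reviewer.md when invoking pipeline-reviewer
- asset-producer: turns one approved source into format-specific variants.
  ALWAYS read agents/asset-producer.md when invoking asset-producer
```

One line of description tells the role when to invoke the agent; the pointer line tells it where the full definition lives.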

A typical roster is four to six entries, each one a distinct multi-step job the function does regularly — something like a qualifier, a weekly review, a scorer against a framework, a production-side generator, and a monitor that runs on a cadence. Pick agents for work that requires judgment across steps you can't fully spell out in advance. Anything you could template goes back to Part 3 as a Skill.

The agent list is not improvisation. You decide the roster up front when you write the Instructions, then build each entry into its own agent file one at a time. You get an agent roster designed for the function, not a generic assistant with an agent subcommand.

This part walks through building two agents in depth. The shape generalizes — the same build-review-iterate loop applies to every agent the Instructions list.


Watch an Agent Work — a one-off research run

Before you build a saved Agent, watch one run. General-purpose research is a high-value task most operators skip because stitching the pieces together takes hours and the output is hard to structure. It is exactly the kind of task Agents handle well, and it costs nothing to run one without saving it.

Pick the research task that maps to whatever your workspace does. It might be: "Research the competitive landscape for our top three accounts and produce a displacement-risk matrix." Or: "Cluster the last 60 days of escalated items and name the top three root causes." Or: "Compare our current vendors in Category X against two plausible alternatives on the approved diligence rubric." Pick whatever maps to a real question your function is asking this week.

Paste the task into your Cowork project. No template. No fields to fill in. A goal and a set of expectations. When you run it, something different happens — unlike a Skill, which returns a formatted document in seconds, this takes minutes. You can watch the progress. Cowork gathers the subjects you named, pulls information on each, organizes that information around the angle you asked for, compares across the set, and builds the matrix. You did not tell it the order — the order came from patterns in its training for tasks that look like this one. That is not original thinking; it is a well-practiced sequence applied to your prompt.

The result is not a stack of profiles stapled together. It is a comparative analysis — each dimension shows how the subjects relate. Evaluate it with criteria, not feelings. Are the subjects real and in-scope? Verify they exist and match the brief. Is the analysis accurate? Spot-check one claim per row. Do the conclusions align with the role's actual authority? A recommendation to enter a net-new channel or displace a sitting vendor is not yours to act on without the escalation path declared in the Instructions' decision-rights section. Is anything missing? You know your domain better than the role does. Correct it: "Add [subject X] to the matrix" or "[subject Y] targets segment A, not B — update the comparison."

Agents are autonomous, not infallible. They may include subjects you do not actually consider in-scope if the Instructions are broad. They may oversimplify — one row in a table loses nuance. They may miss recent changes because their information is not live. None of that makes them less useful. The review process from Part 1 applies here with the same force: it is still your work. You review it, correct the errors, and approve the output before acting on it.

Nothing was saved from that run — the research was general-purpose Agent behavior, not a saved Agent file. The prompt itself is a draft of a saved Agent you could promote later by writing it to wherever you keep your agent files and adding a pointer to it from the Instructions. For now, it is a one-off. Now you build one you will keep.


Agent #1: the planning agent

Every mature workspace has one agent that turns an approved input into a sequenced plan. The shape is consistent regardless of function: a review that surfaces at-risk items and recommends interventions, a strategist that turns a brief into a multi-channel plan, a triager that classifies an escalation and plans a response, an auditor that compares documented process to observed practice. Whatever name it takes, the planning agent's job is to produce the plan nobody writes down today.

Pick the entry from the Agents list in the Instructions that matches this planning shape. Without this agent, the planning work either doesn't happen or stays incomplete in the operator's head — segments nobody wrote down, risks nobody surfaced, sequencing that was supposed to happen but did not. The planning agent captures the steps into a repeatable process. Feed it the approved input, get back a plan.

Why is this an Agent and not a Skill? Because planning runs through several steps — read the input, check the Instructions for context, segment, write recommendations, assemble timing — and the path through those steps changes each run. That is planning, not template-filling — and planning is what an Agent is built for.

Build with /skill-creator

Agents use the same creation flow — /skill-creator handles both Skills and Agents. Open your Cowork project and run /skill-creator, then describe the agent using the line you wrote under the Instructions' Agents heading as the seed. That line already names the agent and the conditions under which it is invoked; /skill-creator expands it into a full build prompt, asks where you want the agent file saved (offering buckets like a dedicated agents folder, the workspace root, or somewhere else you name), fills in the function-specific values, and walks you through clarifying questions (the agent name, whether it needs file access). Once it writes the file, it adds an "ALWAYS read <agent-file> when invoking <agent-name>" pointer to the Instructions so the role can find it.

A representative build prompt, to show the shape of a planning agent:

Build a <Planning Agent Name> that runs the weekly review for the
function.

The agent should:

1. Read the current state of the queue, pipeline, or backlog from the
   memory entry the Instructions point at, and the target coverage or
   throughput from the Instructions.
2. Surface items with no movement in N days, items past their stage
   SLA, and stage exit-rate anomalies versus the prior four weeks.
3. Classify the top three at-risk items with a one-sentence "why at
   risk" per item, grounded in a specific field (no-touch window,
   missing qualification slot, age in stage, etc.).
4. Compare current state to the target; flag the gap direction and
   rough magnitude.
5. Recommend one concrete intervention per at-risk item: named owner,
   named action, named deadline. Never "follow up." Never "nudge."
6. Apply every rule file the Instructions point at for this agent
   (process rules, authority rules, approval rules) — do not propose
   an intervention the function does not have authority to approve.
7. Present the review in a structure the function can walk through
   with contributors in 30 minutes: Summary, At-Risk Items, Hygiene
   Gaps, Interventions.

The agent should ask clarifying questions if the target is missing
from the Instructions or if the named memory entry is stale.

The same shape applies regardless of function: read the brief, break the audience or queue into specific segments based on the business context the Instructions carry, write one recommendation per segment grounded in a specific signal, sequence execution, and allocate effort — all while applying whatever rule files the Instructions point at for this agent.
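Step 2's stall check is mechanical at its core. A minimal sketch in Python, under assumed data: the field name `last_touch` and the 14-day window are illustrative, not values the build prompt prescribes.

```python
from datetime import date, timedelta

def stalled(items: list[dict], today: date, n_days: int = 14) -> list[dict]:
    """Items whose last recorded touch is more than n_days old."""
    cutoff = today - timedelta(days=n_days)
    return [it for it in items if it["last_touch"] < cutoff]

# Two hypothetical pipeline items, one untouched for weeks.
items = [
    {"name": "Acme renewal", "last_touch": date(2026, 3, 1)},
    {"name": "Globex pilot", "last_touch": date(2026, 4, 18)},
]
print([it["name"] for it in stalled(items, today=date(2026, 4, 20))])
# → ['Acme renewal']
```

The agent's version of this check reads the cutoff from the Instructions rather than a hard-coded default, which is exactly why step 1 tells it where the target numbers live.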

Cowork writes the agent file to wherever you told the prompt to keep it, then adds an "ALWAYS read" pointer to the Instructions so the role can find it. The branching tree now has a new leaf:

the Instructions (CLAUDE.md)
├── ALWAYS read <voice-rule-file> for tone and word choice
├── ALWAYS read <process-rule-file> for how work moves through stages
├── ALWAYS read <intake-skill-file> when starting an intake
├── ALWAYS read <review-skill-file> when scoring a draft
└── ALWAYS read <planning-agent-file> when invoking <planning-agent-name>   ← new

The pointer is the only thing that has to be in a fixed spot. The agent file itself can live wherever you chose during setup — a dedicated agents folder, the workspace root, or a named subfolder you already use for role artifacts. The Instructions are the index; the index is what the role reads.
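Because the Instructions are plain text and the pointers are plain paths, a dangling pointer is easy to catch mechanically. A minimal sketch, assuming the "ALWAYS read <path>" pointer format described above; the file names are hypothetical:

```python
import re
import tempfile
from pathlib import Path

def unresolved_pointers(instructions_text: str, root: Path) -> list[str]:
    """Return every 'ALWAYS read <path>' pointer with no file behind it."""
    paths = re.findall(r"ALWAYS read (\S+)", instructions_text)
    return [p for p in paths if not (root / p).is_file()]

# Demo on a throwaway workspace: one real agent file, one dangling pointer.
root = Path(tempfile.mkdtemp())
(root / "agents").mkdir()
(root / "agents" / "planning-agent.md").write_text("# Planning Agent\n")
instructions = (
    "ALWAYS read agents/planning-agent.md when invoking planning-agent\n"
    "ALWAYS read agents/production-agent.md when invoking production-agent\n"
)
print(unresolved_pointers(instructions, root))
# → ['agents/production-agent.md']
```

Running a check like this after moving or renaming agent files keeps the index honest: the role can only find what the pointers actually reach.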

The build prompt above is detailed on purpose. Agents need richer instructions than Skills because their path varies — a Skill runs one predictable task; an Agent runs multiple steps where the path changes with each input, and vague instructions produce vague paths. The good news: this is the prompt's job to extract, not yours to remember. /skill-creator should follow up when an answer is thin — pressing for the specific signal that counts as "at risk," the specific structure each section should take, the named rule files this agent should bind to. If the prompt accepts a sparse answer, that is a prompt bug to fix in the next iteration, not a discipline failure on your end.

A closer look — Manual build steps
If you want to understand what /skill-creator generates for an Agent, here is the hand-built skeleton. Create the agent file at the location you chose for your agents and write:

# <Agent Name>

You are the <Agent Name> agent for {{business_name}}. Your job is to
transform <input type> into <output type>.

## Input
You receive <describe input fields>.

## Process
1. Read the input. Extract the fields your output depends on.
2. Reference the Instructions for role context.
3. Reference every rule file the Instructions point at for this agent.
4. Perform the role-specific stitching-together steps (segment, score, sequence,
   triage, etc.).
5. Flag any decision that exceeds the role's authority per the
   decision-rights section in the Instructions.

## Output Format
Present the deliverable with these sections: <list the sections the
Agents entry in the Instructions named>.

## Rules
- Apply every rule file the Instructions point at for this agent.
- If a required input is missing, ask before proceeding.
- Never invent data the input does not contain.

Once the file is saved, add a line to the Instructions: ALWAYS read <path-to-this-file> when invoking <agent-name>. That pointer is how the role finds the Agent on the next run. The manual path takes longer but teaches you what goes into an Agent definition. Both methods produce the same result — a markdown file the role reads when you invoke the Agent, and a pointer in the Instructions that tells the role where to find it.

Feed it the approved input

Run the agent: Run the <agent-name> agent on the following input: [paste]. It reads the Instructions for role context, pulls durable facts from Memory (introduced in Part 1), checks the relevant rule files, and produces the plan in 30 seconds to two minutes.

The output is a plan document with the sections you named — Summary, At-Risk Items, Hygiene Gaps, Interventions, or whatever the planning agent's entry in the Instructions declared. Named inputs, named outputs, one section per step the agent had to run.

Review the plan

The agent produced the draft. Now you review it. Are the items actionable? Each should describe something you can move — a segment reachable through a specific channel, an item assignable to a named owner. Generic labels ("young professionals interested in finance," "enterprise deals") are demographics, not targets. Are the recommendations specific? Each should pass the "what happens when I accept this" test. Check against the approval criteria from Part 2. Is the timing realistic? Agents plan for an ideal scenario. You adjust for reality — capacity limits, dependencies, prerequisites that may not yet exist.

Review is a conversation, not a verdict. Tell the agent what to fix: "The timeline is too aggressive — I can only produce two channels per week. Restructure over three weeks." Or: "The review missed that item X is on hold — re-score without that one in the at-risk set." Repeat until the plan reflects what you will actually execute. A plan you do not follow is a document, not a plan. Save the best plans to the Memory location the Instructions declare and they become reference material for the agent on future runs.

When a plan has a recurring problem — channels you do not use, timelines too compressed, segments too vague, risks the agent keeps missing — the fix does not belong in the plan. It belongs in the Agent definition. Open the agent's file and tighten the rule that allowed the slip. A fix in the plan corrects one run. A fix in the Agent corrects every future run.


Agent #2: the production agent

Same pattern, different surface. The planning agent drafts what to do. The production agent produces the artifacts the plan requires, consistently, across formats. It takes one approved source — an article, a brief, a triage plan, a filled-out questionnaire — and produces the format-specific downstream artifacts the plan calls for, without re-running the source's approval loop.

Each format has different rules. A subject line and one CTA for an email; 90 characters for a short headline; a reader-facing hook for a one-pager; a situation-recommendation-outcomes-next-step structure for a proposal section. Doing this by hand for every input is tedious enough that most operators skip it, leaving a single approved source under-used when it could produce four or five downstream artifacts.

This is still an Agent and not a Skill, even though the format constraints feel template-shaped. The reason: the Agent has to hold voice constant across format-different outputs, and the pattern-match for "what survives the cut when the headline has 90 characters" or "what survives the cut when the proposal section has to fit on one slide" is the exact discretion you are delegating. A rigid template would over-compress one format and under-use another. An Agent picks what survives the cut — based on patterns in its training and the rules you gave it.

Build with /skill-creator

Open your Cowork project and type /skill-creator. The build prompt is seeded from the production agent's entry under the Instructions' Agents heading. Representative shape:

Create an agent called "<Production Agent Name>" with the following
specifications:

ROLE: Takes an approved source asset and produces the format-specific
downstream artifacts the plan calls for, without re-running the
source's approval loop.

INPUT: The full text of an approved source asset, an output type from
the declared output list, and a target description (named recipient or
audience, relevant context, the specific need the target mentioned).

OUTPUTS (produce whatever the input requested):

1. <FORMAT A>
   - Target-facing headline naming the situation
   - Three blocks each anchored to a specific number from the source
   - A CTA tailored to the target's stated next step

2. <FORMAT B>
   - Target's situation in one paragraph
   - Recommended response in one paragraph
   - Expected outcomes with numbers drawn from the source
   - One specific next step

3. <FORMAT C>
   - Opening line
   - Three proof points with numbers
   - One objection rebuttal
   - One closing ask

RULES:
- Follow every rule file the Instructions point at for this agent
  (typically a voice rule and a format/standards rule)
- Refuse to produce output for any input an upstream scorer has
  marked as out-of-scope
- Never invent claims, statistics, or attributions the source does
  not make
- End with a "Customize Before Sending" section listing 2-3 specific
  fields the sender must personalize before the asset leaves the
  outbox

The Agents entry in the Instructions already names the outputs and the rule files; /skill-creator composes the build prompt from that entry, writes the agent file at the location you chose, and adds the "ALWAYS read" pointer back to the Instructions.

The one detail worth calling out: production agents typically reference two or more rule files in their Context block — the voice rule and the format-constraint rule (voice plus standards, for example). Format discipline (character counts, sentence limits, section structure) lives in the second file so the voice rule does not have to repeat itself.
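The split matters because format discipline is mechanically checkable in a way voice is not. A sketch of the kind of check a standards rule file encodes (the format names and limits are illustrative, not Cowork defaults):

```python
# Hypothetical per-format limits, of the kind a standards rule file declares.
LIMITS = {"short_headline": 90, "email_subject": 60}

def violations(fmt: str, text: str) -> list[str]:
    """List hard-constraint violations for one produced variant."""
    out = []
    limit = LIMITS.get(fmt)
    if limit is not None and len(text) > limit:
        out.append(f"{fmt}: {len(text)} chars over the {limit}-char limit")
    return out

print(violations("short_headline", "x" * 120))
# → ['short_headline: 120 chars over the 90-char limit']
```

Keeping the numbers in their own file means this check has one source of truth, and the voice rule stays about voice.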

Two rule files, wired by default
The file that got written already references two rule files — voice and standards — because the build pattern pulls both in by default. You did not have to ask. Open the agent's file and scan the Context block. Both references are already wired.

Run it

Feed the agent one approved source and watch the variants come back. Each output arrives with a one-line "Verify before publishing" note — jurisdiction-specific claim? Character counts pass on the target surface? A specific number the sender must insert? A named consent the case-study variant assumes?

Look at the batch and ask three questions. Does each format sound like the same voice? The voice should translate across formats, not break between them. Do the constraints hold? Character counts, word counts, section structures — whatever the rule file declared. Does the CTA or next step match the format? An email drives somewhere; a one-pager drives somewhere else; a postmortem drives to a remediation ticket. The destination is format-specific.

Maintaining voice at scale

Voice drift concentrates in specific formats. The long-form variants might sound right while the short-form variants sound generic — 90 characters leaves little room for personality, and voice is the first thing to get squeezed out. The fix is format-specific voice guidance. If a particular format keeps drifting, add a section to the voice rule file the Instructions point at:

## <Format Name> Voice
- Lead with the reader's situation, not our solution
- Use the same plain language as our long-form — no <format>-speak
- Numbers are fine. Vague claims are not.

You do not need format-specific guidance for every format upfront. Wait until you see drift, then tighten. The production agent only works with sources that already cleared the role's human in the loop — if the source is on-voice, the derivatives have a solid starting point. Quality of input sets the ceiling for quality of output.

The important thing: the fix lives in the voice rule file, not in the Agent file. The Agent is the one doing the work; the rule file is where the standard is defined. A format-specific rule added to the voice file reaches every Agent and every Skill that loads that file through an Instructions pointer: one edit, many downstream effects.


What just changed

Two new agent files landed across this Part, each saved at the location you chose during setup:

  • The planning agent file — an Agent that turns approved inputs into sequenced plans for the role.
  • The production agent file — an Agent that turns one approved source into the format-specific artifacts the plan requires.

The Instructions now carry two new pointers — one per agent — alongside the rule and skill pointers from Parts 2 and 3. The branching tree has grown another layer: rules, skills, and now agents, each reached through an explicit "ALWAYS read" line in the Instructions. Wherever you keep these files on disk, the Instructions are the index that tells the role they exist.

The role is no longer a single generalist. It is a team of four named collaborators with specific jobs, each grounded in the Instructions, the rules they point at, and the memory of work you have already done together. Two of them follow your templates. Two of them run work you approved them to handle without turn-by-turn review.


What is next

Each tool works on its own. You can run any of them independently, and they produce useful output. But right now, you are the glue. You run the intake Skill, hand its output to the reviewer Skill, hand the approved artifact to the planning Agent, then hand the plan to the production Agent. That is four separate steps with you manually moving context between them.

In Part 5 — The Pipeline, you wire them together. Intake to review to plan to produced artifact, end to end. You trigger the pipeline once and get finished, downstream-ready output. The individual tools become a system. Further out, Part 6 extends that system onto a recurring schedule — pipelines that run without you triggering them.