RFC 0009: Agent Loop Extension
- Status: Draft
- Author(s): AltairaLabs Team
- Created: 2026-03-26
- Updated: 2026-03-26
- Related Issues: N/A
Summary
Extend the PromptPack workflow model (RFC 0005) to support agent loops — iterative, self-correcting execution patterns where an agent repeatedly observes, acts, and evaluates until a goal is reached or a budget is exhausted. The extension adds four concepts to the existing workflow state machine: terminal states, loop guards, artifacts (lightweight transition metadata that enables time-travel debugging and observability), and execution budgets. No new orchestration model is introduced — loops are expressed as cycles in the existing state graph.
Motivation
The workflow extension (RFC 0005) introduced a state machine over prompts with event-driven transitions. The existing model already permits cycles — a state's on_event can point back to itself or to an earlier state. This means the mechanism for looping exists, but the guardrails and context management for safe, productive loops do not.
Agent loops are the dominant pattern in autonomous AI systems. They appear across domains:
- Code generation: plan → write code → run tests → if tests fail, revise code → repeat
- Data exploration: form hypothesis → query data → analyze results → refine hypothesis → repeat
- Research synthesis: search → read → synthesize → identify gaps → search again → repeat
- Document generation: draft → review against criteria → revise → repeat until quality bar met
- Autonomous ops: monitor → detect anomaly → diagnose → remediate → verify fix → repeat
- ETL / data pipelines: discover schema → map fields → transform → validate → fix mapping errors → repeat
- Simulation / what-if analysis: set parameters → run simulation → analyze results → adjust → repeat
- Security audit: scan → triage findings → investigate → classify (true/false positive) → next finding → repeat
- Planning and re-planning: plan → execute step → observe outcome → re-plan if unexpected → repeat
These patterns share a common structure: iterate with accumulated context until a termination condition is met or resources run out.
Current Limitations
Without loop-specific extensions, developers must:
- Hope cycles terminate — no
max_visitsguard means a self-transition can run forever, consuming unbounded tokens and time - Lose intermediate state — the
persistencehint (transient/persistent) controls conversation history, but there's no structured way to carry transition metadata (commit refs, result summaries, error classifications) across loop iterations - Guess at completion — no explicit terminal states means a state with an empty
on_eventis ambiguous: is it finished, or is it misconfigured? - Implement budgets in runtime code — the
engineblock accepts arbitrary hints, but there's no standard vocabulary for resource limits
This leads to:
- Unbounded execution — agent loops with no termination guarantee are a safety and cost risk
- Lost transition context — each loop iteration starts without structured access to what prior iterations produced or decided
- Implicit termination — readers can't distinguish terminal states from incomplete workflow definitions
- Non-portable budgets — every runtime invents its own budget mechanism
Goals
- Enable safe, bounded agent loops using the existing workflow state machine
- Provide structured artifact passing between loop iterations
- Define explicit terminal states for clear workflow completion
- Standardize execution budget vocabulary in the
engineblock - Maintain full backward compatibility with existing workflow definitions
Non-Goals
- Introduce imperative control flow (if/else, while loops, expressions) into the spec
- Define parallel state execution or fork/join patterns (deferred per RFC 0005, Question 5)
- Specify how runtimes implement loop execution, context windowing, or artifact storage
- Define session memory, conversation history, or long-term memory — these are runtime concerns
- Replace or subsume the agents extension (RFC 0007) — agent loops are intra-agent, agents are inter-agent
- Define nested loop constructs — nested loops are achieved by composing agents (see Design Notes)
- Define artifact schemas or validation beyond type declarations
Detailed Design
Core Insight
An agent loop is a cycle in a directed graph — and the workflow state machine is already a directed graph. Rather than introducing a new loop construct, this RFC adds the missing pieces that make cycles practical:
- Terminal states — explicit "the workflow is done" markers
- Loop guards — per-state visit limits with forced-exit transitions
- Artifacts — lightweight, typed metadata (pointers, summaries, compact results) captured at each state transition, enabling both context passing and time-travel debugging
- Budgets — standardized resource limits in the
engineblock
Together, these turn the workflow model from a conversational state machine into an agent loop runtime — without changing its fundamental nature.
The generic agent loop pattern looks like this:
Schema Changes
Modified Definition: WorkflowState
Three new optional properties are added to WorkflowState:
{
"WorkflowState": {
"type": "object",
"description": "A single state in the workflow state machine. References a prompt task and declares event-driven transitions to other states. May be marked as terminal to indicate workflow completion, or guarded with max_visits to bound loop iterations.",
"required": ["prompt_task"],
"additionalProperties": false,
"properties": {
"prompt_task": {
"type": "string",
"description": "Reference to a prompt key defined in the pack's prompts object."
},
"description": {
"type": "string",
"description": "Human-readable description of this state's purpose."
},
"on_event": {
"type": "object",
"description": "Map of event name to target state name. When the named event fires, the workflow transitions to the target state.",
"additionalProperties": {
"type": "string"
}
},
"persistence": {
"type": "string",
"description": "Whether conversation context is kept (persistent) or reset (transient) on entry.",
"examples": ["transient", "persistent"]
},
"orchestration": {
"type": "string",
"description": "How this state is orchestrated: internal (runtime manages), external (caller manages), or hybrid.",
"examples": ["internal", "external", "hybrid"]
},
"skills": {
"type": "string",
"description": "Skill filter for this workflow state (per RFC 0008)."
},
"terminal": {
"type": "boolean",
"description": "If true, this state is a terminal state. The workflow completes after this state's prompt executes. Terminal states should not declare on_event transitions.",
"default": false,
"examples": [true]
},
"max_visits": {
"type": "integer",
"description": "Maximum number of times this state can be entered during a single workflow execution. When the limit is reached, the workflow transitions to the state named in on_max_visits. If on_max_visits is not set, the workflow terminates.",
"minimum": 1,
"examples": [5, 10, 20]
},
"on_max_visits": {
"type": "string",
"description": "Target state to transition to when max_visits is reached. Must reference a key in the states object. If omitted and max_visits is reached, the workflow terminates with a budget-exhausted status.",
"examples": ["review", "summarize", "error_handler"]
},
"artifacts": {
"type": "object",
"description": "Named artifact slots for lightweight, structured metadata that flows across state visits. Artifacts should be pointers (commit SHAs, URIs), compact representations (schemas, summaries, diffs), or small structured results — not bulk data. Artifact values are available to the prompt as template variables under the 'artifacts' namespace (e.g., {{artifacts.commit_sha}}).",
"additionalProperties": {
"$ref": "#/$defs/ArtifactDef"
},
"examples": [
{
"commit_sha": { "type": "text/plain", "description": "Git commit of the latest generated code" },
"test_report": { "type": "application/json", "description": "Structured test runner summary" }
}
]
}
}
}
}
New Definition: ArtifactDef
{
"ArtifactDef": {
"type": "object",
"description": "Declares a named artifact slot for carrying lightweight, structured metadata across workflow state visits. Artifacts are typically pointers (commit SHAs, file paths, URIs), compact representations (schemas, summaries, diffs), or small structured results — not bulk data. Values are captured at each state transition, forming an observable trace that enables time-travel debugging and workflow audit. They persist across loop iterations and are accessible to prompts as template variables.",
"required": ["type"],
"additionalProperties": false,
"properties": {
"type": {
"type": "string",
"description": "MIME type indicating the artifact's content type. Used by runtimes to determine serialization and presentation.",
"examples": ["text/plain", "application/json", "text/markdown", "text/x-python"]
},
"description": {
"type": "string",
"description": "Human-readable description of what this artifact contains and how it's used."
},
"mode": {
"type": "string",
"description": "How the artifact is updated across visits. 'replace' overwrites the previous value on each visit. 'append' accumulates content across visits (e.g., a log). Defaults to 'replace'.",
"enum": ["replace", "append"],
"default": "replace",
"examples": ["replace", "append"]
}
}
}
}
New Definition: WorkflowBudget
Added as a standardized property within the engine block:
{
"WorkflowBudget": {
"type": "object",
"description": "Resource budget for workflow execution. Provides a safety net to prevent unbounded loops. When any budget limit is reached, the workflow terminates with a budget-exhausted status. All fields are optional — omitting a field means no limit for that resource.",
"additionalProperties": false,
"properties": {
"max_total_visits": {
"type": "integer",
"description": "Maximum total state visits across all states in the workflow. This is a global safety net independent of per-state max_visits.",
"minimum": 1,
"examples": [50, 100, 200]
},
"max_tool_calls": {
"type": "integer",
"description": "Maximum total tool calls across all states in the workflow.",
"minimum": 1,
"examples": [100, 500]
},
"max_wall_time_sec": {
"type": "integer",
"description": "Maximum wall-clock time in seconds for the entire workflow execution.",
"minimum": 1,
"examples": [300, 600, 3600]
}
}
}
}
Modified Definition: WorkflowConfig
The engine block gains a standardized budget property alongside its existing additionalProperties: true behavior:
{
"WorkflowConfig": {
"properties": {
"engine": {
"type": "object",
"description": "Optional runtime engine configuration for workflow execution.",
"properties": {
"budget": {
"$ref": "#/$defs/WorkflowBudget",
"description": "Resource budget for workflow execution. Provides safety limits to prevent unbounded loops."
}
},
"additionalProperties": true
}
}
}
}
How Agent Loops Work
-
Define the cycle: Use
on_eventtransitions to create cycles in the state graph. A state can transition to itself (self-loop) or to any earlier state (back-edge). -
Guard the cycle: Set
max_visitson states that participate in loops. When the visit count is reached, the workflow transitions toon_max_visits(or terminates if omitted). -
Carry context: Declare
artifactson states that need structured working memory. The runtime populates artifact values from tool results or LLM output and makes them available to the prompt as template variables on subsequent visits. -
Set a budget: Declare
engine.budgetwith global resource limits as a safety net. Per-statemax_visitshandles expected iteration counts; the global budget catches unexpected runaway behavior. -
Mark the exit: Set
terminal: trueon states that represent workflow completion. This makes the state graph's exit points explicit and unambiguous.
Artifact Lifecycle
Artifacts follow a simple lifecycle:
- Declaration: States declare artifact slots with a MIME type and optional constraints.
- Population: During state execution, the runtime captures values from tool results or LLM output and writes them to artifact slots. The mapping from tool results to artifacts is a runtime concern (e.g., convention-based, explicit tool-result bindings in
engine). - Capture: The runtime snapshots artifact values at each state transition — the artifact value, the source state, the triggering event, and the target state form a single transition record.
- Access: On subsequent visits to the state (or transitions to other states that declare the same artifact name), the current artifact values are available as template variables under
{{artifacts.<name>}}. - Update: Each visit can update the artifact. In
replacemode, the new value overwrites the current value. Inappendmode, the new value is appended. Both modes produce a new transition record in the history. - Completion: When the workflow reaches a terminal state or is budget-exhausted, final artifact values are available as workflow output. The full transition history is available for debugging and audit.
Artifacts share a workflow-scoped namespace: if two states both declare artifacts.code, they reference the same artifact. State A can write it; state B can read and update it. This is the simplest model and matches how developers think about working memory. See Design Notes: Relationship to Runtime Memory for how artifacts relate to session memory and why this distinction enables time-travel debugging.
Design Notes
Relationship to Runtime Memory
Artifacts are not a memory system. They are typed, workflow-scoped data slots that are captured as part of each state transition — making them the observable, replayable record of what happened during workflow execution.
This distinction is fundamental:
-
Memory is point-in-time. A runtime's session memory tells you what the agent knows right now — but not how it got there, what it tried and discarded, or what the data looked like three iterations ago. You could add time-travel to a memory system, but the snapshots would be devoid of transition context (which state? which event? why did the agent move on?).
-
Artifacts are transition-scoped. Because they're captured at each state visit alongside the event that triggered the transition, the full artifact history gives you a structured trace: state A produced this code → event
TestsFailed→ state B saw these errors → eventCodeReady→ state A produced revised code. This enables time-travel debugging and workflow observability as first-class capabilities — runtimes can replay, inspect, and visualize every step of an agent loop.
| Concern | Mechanism | Scope | Defined by | Observable? |
|---|---|---|---|---|
| Conversation history | Runtime session memory | Across turns within a state | Runtime | Point-in-time only |
| Cross-state conversational context | Runtime session memory + persistence hint | Across state transitions | Runtime (informed by spec) | Point-in-time only |
| Structured transition metadata (pointers, summaries, compact results) | Artifacts | Across state visits within a workflow execution | Pack author (this RFC) | Full history at every transition |
| Long-term recall across sessions | Runtime memory system | Across workflow executions | Runtime | Point-in-time only |
A runtime that captures artifact values at every state transition gets a complete, structured execution trace for free. This is far more useful for debugging, auditing, and optimization than replaying conversation logs — because artifacts carry the data that drove each decision, not just the words around it.
Artifacts are opt-in and intentionally lightweight. They should be pointers to immutable state (a commit SHA, an S3 URI, a query ID), compact representations (a schema diff, a structured summary, a BAML description of what changed), or small structured results (a test report, a list of findings) — not bulk data. The actual data lives wherever it naturally lives (git, a database, object storage); the artifact is the reference that makes the transition trace meaningful and replayable.
A runtime with rich session memory can carry conversational context implicitly — artifacts add value for structured metadata that you want in the observable transition trace.
Example artifact trace for a codegen loop — each row captures artifact values at that state transition:
Transition │ Event │ commit_sha │ test_report │
──────────────────┼──────────────┼────────────┼─────────────┤
→ plan │ (entry) │ — │ — │
→ implement (1) │ PlanReady │ — │ — │
→ test (1) │ CodeReady │ abc123 │ — │
→ implement (2) │ TestsFailed │ abc123 │ 2/5 pass │
→ test (2) │ CodeReady │ def456 │ 2/5 pass │
→ review │ TestsPassed │ def456 │ 5/5 pass │
→ done │ Approved │ def456 │ 5/5 pass │
The full trace is a replayable record of what the agent produced and knew at every step — each row is a snapshot you can inspect, diff, or replay.
Nested Loops
Some use cases involve nested iteration — for example, a security audit that loops over findings (outer loop), where investigating each finding is itself an iterative process (inner loop). This RFC does not introduce nested state machines.
Instead, nested loops are achieved by composing agents via RFC 0007. The outer loop is one agent's workflow; the inner loop is a separate agent's workflow. The outer agent delegates to the inner agent via a tool call (resolved as an A2A call by the runtime):
# Outer agent's workflow
states:
triage:
prompt_task: triage_findings
max_visits: 20
on_event:
NextFinding: investigate
AllTriaged: report
investigate:
prompt_task: coordinator
# This prompt calls the 'deep_investigator' tool,
# which the runtime resolves to a separate agent
# that has its own workflow with its own loop
on_event:
InvestigationComplete: triage
This keeps the workflow model flat and composes complexity through agents — consistent with RFC 0007's design philosophy that workflows are intra-agent and agents are inter-agent.
Human-in-the-Loop
Many agent loop patterns benefit from human checkpoints — a reviewer approving generated code, a data scientist confirming a hypothesis direction, an operator authorizing a remediation action. This RFC does not prescribe a single HITL mechanism because multiple valid patterns exist:
- External orchestration: Set
orchestration: "external"on a state. The runtime pauses and waits for an external system (or human) to emit the transition event. - Async tool call: A tool like
request_human_reviewcreates a review request and returns asynchronously when the human responds. The agent loop waits on the tool result like any other tool call. - Event from outside: The workflow pauses at a state with no self-transition. An external system injects an event (e.g.,
ApprovedorChangesNeeded) when the human acts.
All three patterns work with the existing workflow model and the additions in this RFC. The max_visits guard ensures the loop doesn't run away while waiting for human input, and terminal states provide clear completion points.
Specification Impact
WorkflowStategains three optional properties:terminal,max_visits/on_max_visits, andartifacts- A new
ArtifactDefdefinition is added to$defs - A new
WorkflowBudgetdefinition is added to$defs - The
engineblock onWorkflowConfiggains a standardizedbudgetproperty - All existing fields and definitions remain unchanged
- Workflows without these new fields are unaffected
Validation Rules
-
Terminal state consistency: If
terminalis true,on_eventshould be empty or omitted. Validators should warn if a terminal state declares transitions (they would never fire). -
Loop guard references:
on_max_visitsmust reference a valid key in thestatesobject. If the target state itself hasmax_visits, ensure the workflow can still terminate (no infinite forced-exit chains). -
Artifact name uniqueness: Artifact names must be unique within a state's
artifactsobject (enforced by JSON object key uniqueness). -
Artifact template references: If a prompt template references
{{artifacts.<name>}}, the corresponding workflow state should declare that artifact. Validators should warn on mismatches. -
Budget coherence: If
engine.budget.max_total_visitsis set, it should be >= the sum of all reachable states'max_visitsvalues. Warn if per-state limits exceed the global budget (they'd never be reached). -
Backward compatibility: All new fields are optional. Existing workflows without
terminal,max_visits,artifacts, orbudgetremain valid and behave as before. -
Reachability: Validators should warn if a non-terminal state has no outgoing transitions and no
max_visits— this is ambiguous (is it terminal, or is it misconfigured?).
Examples
Example 1: Codegen Agent Loop
A code generation workflow that plans, implements, tests, and reviews — with the implement/test cycle bounded by max_visits.
YAML shown for readability (per RFC 0002). All examples are equally valid as JSON.
id: codegen-agent
name: Code Generation Agent
version: 1.0.0
template_engine:
version: v1
syntax: "{{variable}}"
prompts:
planner:
id: planner
name: Implementation Planner
version: 1.0.0
system_template: |
You are a software architect. Given the requirements, create a
step-by-step implementation plan.
Requirements: {{requirements}}
variables:
- name: requirements
type: string
required: true
coder:
id: coder
name: Code Writer
version: 1.0.0
system_template: |
You are an expert programmer. Write code according to the plan.
Plan: {{plan}}
Previous attempt (empty on first iteration): {{artifacts.commit_sha}}
What was changed: {{artifacts.change_summary}}
Test results: {{artifacts.test_report}}
If there are test failures from a previous attempt, fix them
while preserving passing behavior.
tools: [write_file, read_file]
parameters:
temperature: 0.2
test_runner:
id: test_runner
name: Test Runner
version: 1.0.0
system_template: |
Run the test suite against the generated code and report results.
Code revision: {{artifacts.commit_sha}}
tools: [run_tests, read_file]
parameters:
temperature: 0.0
reviewer:
id: reviewer
name: Code Reviewer
version: 1.0.0
system_template: |
Review the final code for quality, correctness, and style.
Code revision: {{artifacts.commit_sha}}
Changes: {{artifacts.change_summary}}
Test results: {{artifacts.test_report}}
parameters:
temperature: 0.3
summarizer:
id: summarizer
name: Summary Generator
version: 1.0.0
system_template: |
Summarize what was built, key decisions made, and any remaining concerns.
Final revision: {{artifacts.commit_sha}}
Change log: {{artifacts.change_summary}}
tools:
write_file:
name: Write File
description: Write content to a file
parameters:
type: object
properties:
path: { type: string }
content: { type: string }
required: [path, content]
read_file:
name: Read File
description: Read content from a file
parameters:
type: object
properties:
path: { type: string }
required: [path]
run_tests:
name: Run Tests
description: Execute the test suite and return results
parameters:
type: object
properties:
test_path: { type: string }
required: [test_path]
workflow:
version: 2
entry: plan
engine:
budget:
max_total_visits: 30
max_tool_calls: 200
max_wall_time_sec: 600
states:
plan:
prompt_task: planner
description: Create an implementation plan from requirements
on_event:
PlanReady: implement
implement:
prompt_task: coder
description: Write or revise code based on plan and test feedback
max_visits: 10
on_max_visits: review
artifacts:
commit_sha:
type: text/plain
description: Git commit SHA of the latest generated code
change_summary:
type: text/markdown
description: Compact summary of what was changed and why
test_report:
type: application/json
description: Structured test result summary (pass/fail counts, failure messages)
on_event:
CodeReady: test
NeedsRethink: plan
test:
prompt_task: test_runner
description: Run tests against the generated code
max_visits: 10
on_max_visits: review
artifacts:
test_report:
type: application/json
description: Structured test result summary
on_event:
TestsPassed: review
TestsFailed: implement
review:
prompt_task: reviewer
description: Review the final code for quality
on_event:
Approved: done
ChangesNeeded: implement
done:
prompt_task: summarizer
description: Generate a summary of what was built
terminal: true
Example 2: Agentic Data Exploration
An iterative data exploration workflow where the agent forms hypotheses, queries data, and refines its understanding.
id: data-explorer
name: Agentic Data Explorer
version: 1.0.0
template_engine:
version: v1
syntax: "{{variable}}"
prompts:
hypothesizer:
id: hypothesizer
name: Hypothesis Former
version: 1.0.0
system_template: |
You are a data scientist. Given the dataset description and any
previous findings, form the next hypothesis to investigate.
Dataset: {{dataset_description}}
Previous findings (empty on first iteration):
{{artifacts.findings}}
Queries already executed (avoid repeating these):
{{artifacts.queries_run}}
variables:
- name: dataset_description
type: string
required: true
querier:
id: querier
name: Data Querier
version: 1.0.0
system_template: |
Write and execute a SQL query to test the current hypothesis.
Return the query ID and summary statistics, not full result sets.
Hypothesis: {{artifacts.current_hypothesis}}
Dataset: {{dataset_description}}
tools: [run_query, describe_table]
parameters:
temperature: 0.1
analyst:
id: analyst
name: Results Analyst
version: 1.0.0
system_template: |
Analyze the query results in the context of the hypothesis.
Determine whether the hypothesis is supported, refuted, or inconclusive.
Hypothesis: {{artifacts.current_hypothesis}}
Query summary: {{artifacts.query_result_ref}}
Previous findings: {{artifacts.findings}}
parameters:
temperature: 0.3
reporter:
id: reporter
name: Report Generator
version: 1.0.0
system_template: |
Generate a comprehensive analysis report from all findings.
Findings:
{{artifacts.findings}}
tools:
run_query:
name: Run SQL Query
description: Execute a SQL query against the dataset
parameters:
type: object
properties:
sql: { type: string }
required: [sql]
describe_table:
name: Describe Table
description: Get schema and sample rows for a table
parameters:
type: object
properties:
table_name: { type: string }
required: [table_name]
workflow:
version: 2
entry: hypothesize
engine:
budget:
max_total_visits: 40
max_tool_calls: 100
max_wall_time_sec: 300
states:
hypothesize:
prompt_task: hypothesizer
description: Form or refine a hypothesis based on current knowledge
max_visits: 8
on_max_visits: report
artifacts:
current_hypothesis:
type: text/plain
description: The current hypothesis being investigated
findings:
type: application/json
description: Structured list of findings — each entry has hypothesis, verdict, and key metric
mode: append
queries_run:
type: application/json
description: Log of query IDs and their one-line summaries
mode: append
on_event:
HypothesisFormed: query
AnalysisComplete: report
query:
prompt_task: querier
description: Execute queries to test the current hypothesis
artifacts:
query_result_ref:
type: application/json
description: Query ID and summary statistics from the most recent query (full results available via tool)
on_event:
QueryComplete: analyze
QueryFailed: hypothesize
analyze:
prompt_task: analyst
description: Analyze query results against the hypothesis
on_event:
NeedsMoreData: hypothesize
HypothesisConfirmed: hypothesize
HypothesisRefuted: hypothesize
AnalysisComplete: report
report:
prompt_task: reporter
description: Generate final analysis report
terminal: true
Example 3: Minimal Self-Correcting Loop
The simplest possible agent loop — a single state that retries until it succeeds.
{
"$schema": "https://promptpack.org/schema/v1/promptpack.schema.json",
"id": "self-correcting",
"name": "Self-Correcting Agent",
"version": "1.0.0",
"template_engine": {
"version": "v1",
"syntax": "{{variable}}"
},
"prompts": {
"worker": {
"id": "worker",
"name": "Worker",
"version": "1.0.0",
"system_template": "Complete the task. If your previous attempt had errors, review them and try again.\n\nPrevious error (empty on first attempt): {{artifacts.error_summary}}"
},
"fallback": {
"id": "fallback",
"name": "Fallback Handler",
"version": "1.0.0",
"system_template": "The task could not be completed after multiple attempts. Summarize what was tried and why it failed.\n\nError summary:\n{{artifacts.error_summary}}"
}
},
"workflow": {
"version": 2,
"entry": "work",
"states": {
"work": {
"prompt_task": "worker",
"description": "Attempt the task, retrying on failure",
"max_visits": 3,
"on_max_visits": "give_up",
"artifacts": {
"error_summary": {
"type": "text/plain",
"description": "One-line error classification and message from the most recent failed attempt"
}
},
"on_event": {
"Success": "complete",
"Error": "work"
}
},
"complete": {
"prompt_task": "worker",
"terminal": true
},
"give_up": {
"prompt_task": "fallback",
"terminal": true
}
}
}
}
Example 4: Autonomous Remediation with Human Checkpoint
An ops agent that diagnoses and remediates infrastructure issues, with a human approval gate before applying fixes.
id: ops-remediation
name: Autonomous Remediation Agent
version: 1.0.0
template_engine:
version: v1
syntax: "{{variable}}"
prompts:
diagnostician:
id: diagnostician
name: Diagnostician
version: 1.0.0
system_template: |
You are an SRE investigating a production issue.
Alert: {{alert_description}}
Previous investigation steps (empty on first visit):
{{artifacts.diagnosis_log}}
Investigate the issue. Use available tools to gather data.
When you have a diagnosis and proposed fix, emit DiagnosisReady.
tools: [query_metrics, read_logs, check_service_health]
variables:
- name: alert_description
type: string
required: true
proposer:
id: proposer
name: Fix Proposer
version: 1.0.0
system_template: |
Based on the diagnosis, propose a remediation action.
Root cause: {{artifacts.diagnosis}}
Investigation steps: {{artifacts.diagnosis_log}}
Propose a specific, reversible fix as a structured action
(action type, target, parameters, rollback steps).
executor:
id: executor
name: Fix Executor
version: 1.0.0
system_template: |
Apply the approved remediation action.
Approved fix: {{artifacts.proposed_fix}}
Execute the action and run a health check.
tools: [apply_config, restart_service, run_healthcheck]
verifier:
id: verifier
name: Fix Verifier
version: 1.0.0
system_template: |
Verify that the remediation was successful.
Original alert: {{alert_description}}
Applied fix: {{artifacts.proposed_fix}}
Check that the original issue is resolved and no new issues
were introduced.
tools: [query_metrics, check_service_health]
postmortem:
id: postmortem
name: Postmortem Generator
version: 1.0.0
system_template: |
Generate a brief incident summary.
Root cause: {{artifacts.diagnosis}}
Investigation steps: {{artifacts.diagnosis_log}}
Fix applied: {{artifacts.proposed_fix}}
tools:
query_metrics:
name: Query Metrics
description: Query Prometheus/Grafana for metric data
parameters:
type: object
properties:
query: { type: string }
range: { type: string }
required: [query]
read_logs:
name: Read Logs
description: Read application or system logs
parameters:
type: object
properties:
service: { type: string }
since: { type: string }
required: [service]
check_service_health:
name: Check Service Health
description: Run health checks against a service
parameters:
type: object
properties:
service: { type: string }
required: [service]
apply_config:
name: Apply Config
description: Apply a configuration change
parameters:
type: object
properties:
target: { type: string }
config: { type: object }
required: [target, config]
restart_service:
name: Restart Service
description: Restart a service
parameters:
type: object
properties:
service: { type: string }
required: [service]
run_healthcheck:
name: Run Health Check
description: Run post-remediation health check
parameters:
type: object
properties:
service: { type: string }
required: [service]
workflow:
version: 2
entry: diagnose
engine:
budget:
max_total_visits: 20
max_tool_calls: 50
max_wall_time_sec: 1800
states:
diagnose:
prompt_task: diagnostician
description: Investigate the alert and gather diagnostic data
max_visits: 5
on_max_visits: propose
artifacts:
diagnosis:
type: text/plain
description: One-line root cause classification and affected component
diagnosis_log:
type: application/json
description: Structured list of investigation steps — tool called, what was found
mode: append
on_event:
DiagnosisReady: propose
NeedMoreData: diagnose
propose:
prompt_task: proposer
description: Propose a specific remediation action
artifacts:
proposed_fix:
type: application/json
description: Structured fix proposal — action type, target, parameters, rollback steps
on_event:
FixProposed: await_approval
await_approval:
prompt_task: proposer
description: Wait for human approval of the proposed fix
# Human-in-the-loop: the runtime pauses here.
# An external system (PagerDuty, Slack, etc.) injects
# the Approved or Rejected event when the human acts.
orchestration: external
on_event:
Approved: execute
Rejected: diagnose
execute:
prompt_task: executor
description: Apply the approved fix
on_event:
FixApplied: verify
verify:
prompt_task: verifier
description: Verify the fix resolved the issue
max_visits: 3
on_max_visits: postmortem
on_event:
Healthy: postmortem
StillBroken: diagnose
postmortem:
prompt_task: postmortem
description: Generate incident summary
terminal: true
Example 5: Backward-Compatible Workflow (No New Features)
Existing workflows remain valid. No new fields required — terminal, max_visits, artifacts, and budget are all optional:
workflow:
version: 1
entry: intake
states:
intake:
prompt_task: intake
on_event:
RequirementsGathered: planning
persistence: transient
planning:
prompt_task: planning
on_event:
PlanReady: execution
persistence: persistent
Drawbacks
Increased State Complexity
WorkflowStategains three new properties, making it the most complex definition in the schema- Authors need to understand the interaction between
max_visits,on_max_visits,terminal, andon_event
Mitigation: All new properties are optional. Simple workflows don't use them. The properties are orthogonal — each addresses one concern (termination, bounding, context, resources).
Artifact-Template Coupling
- Artifacts introduce a runtime-populated variable namespace (
{{artifacts.*}}), creating a dependency between workflow state definitions and prompt templates - If an artifact isn't populated (e.g., first visit), the template must handle the missing value
Mitigation: This is analogous to optional template variables — prompts already handle missing values via conditionals or defaults. The coupling is intentional: artifacts are the contract between iterations. They serve a dual purpose — carrying lightweight metadata forward and producing an observable transition trace. Conversational context that doesn't need to be in the trace flows through the runtime's session memory instead.
Budget Enforcement Is Runtime-Dependent
- The spec declares budgets but can't enforce them — a non-compliant runtime could ignore limits
- Different runtimes may measure "wall time" or "tool calls" differently
Mitigation: This is consistent with the spec's philosophy: define what, not how. Budgets are a contract between the pack author and the runtime, just like max_rounds on tool_policy.
Workflow Version Bump
- The examples use
version: 2to indicate the new capabilities, which may confuse runtimes that only supportversion: 1
Mitigation: Runtimes should treat unknown workflow versions as "unsupported" and fail gracefully. The version bump signals that new features are in use. Alternatively, version 1 workflows can use the new fields — see Unresolved Questions.
Alternatives
Alternative 1: Dedicated Loop Construct
Introduce a new loop block alongside workflow:
{
"loop": {
"entry": "implement",
"max_iterations": 10,
"exit_condition": "tests_pass",
"states": { "..." }
}
}
Rejected: Duplicates the workflow model. Loops are cycles in a state graph — the workflow already provides the graph. Adding a separate construct would create two overlapping orchestration mechanisms. Authors would need to choose between workflow and loop for the same patterns.
Alternative 2: Imperative Loop DSL
Add loop primitives (while, for, retry) to the workflow:
{
"states": {
"implement": {
"while": "tests_failing",
"do": ["write_code", "run_tests"],
"max_iterations": 10
}
}
}
Rejected: Violates the declarative philosophy established in RFC 0005. The workflow RFC explicitly rejected an imperative DSL (Alternative 4) to avoid embedding business logic and introducing security concerns. Event-driven transitions with guards preserve the declarative stance.
Alternative 3: Runtime-Only Solution
Leave loop management entirely to runtimes. Authors use the existing engine block for custom budget hints.
Rejected: This is the status quo and it's insufficient. Without standardized max_visits and terminal, every runtime invents its own vocabulary. Pack portability — a core PromptPack principle — suffers. The engine block is the right place for runtime-specific extensions, but termination, bounding, and working memory are universal concerns.
Alternative 4: Artifacts as a Separate Top-Level Section
Declare artifacts at the pack level rather than per-state:
{
"artifacts": {
"code": { "type": "text/plain" },
"test_results": { "type": "application/json" }
}
}
Rejected: Artifacts are scoped to workflow execution, not the pack. A pack without a workflow has no use for artifacts. Declaring them on states makes the relationship explicit: this state produces and consumes these transition metadata. It also documents intent — a reader can see at a glance what structured data flows through each state, which is essential for understanding and debugging agent loops.
Alternative 5: Extend tool_policy Instead
Add loop semantics to the existing tool_policy:
{
"tool_policy": {
"max_rounds": 50,
"on_error": "retry",
"result_bindings": { "run_tests.output": "artifacts.test_results" }
}
}
Rejected: tool_policy governs tool usage within a single prompt turn. Agent loops span multiple state transitions — they're a workflow-level concern. Overloading tool_policy with loop semantics would conflate two distinct scopes (intra-turn vs. cross-state).
Adoption Strategy
Backward Compatibility
- Fully backward compatible
- Requires migration
- Breaking change
All new fields are optional. Existing workflows without terminal, max_visits, artifacts, or budget remain valid and behave identically. Runtimes that don't support the new fields can ignore them.
Migration Path
No migration required. Adoption is incremental:
- Phase 1: Continue using workflows without loop features (current behavior)
- Phase 2: Add
terminal: trueto exit states for explicit termination - Phase 3: Add
max_visitsandon_max_visitsto states involved in cycles - Phase 4: Add
artifactsto states that need structured context across iterations - Phase 5: Add
engine.budgetfor global resource limits
Runtime Support Levels
- Level 0: Ignore new fields (backward compatible, existing workflow execution)
- Level 1: Enforce
terminalandmax_visits/on_max_visits— bounded loop execution - Level 2: Full artifact management with template variable injection and budget enforcement
Unresolved Questions
1. Should workflow.version bump to 2?
The new fields are optional and backward compatible — a version: 1 workflow with max_visits is valid. But a version bump signals to runtimes that new features may be in use.
Options:
- Option A: Bump to
version: 2when any new field is used (examples use this approach) - Option B: Keep
version: 1— new fields are additive, no version bump needed - Option C: Make
versiona minimum-supported-version —version: 2means "requires a runtime that understands agent loops"
Leaning toward Option C: It's the most informative for runtime negotiation.
2. Should runtimes be required to capture artifact history?
Artifacts are designed to be captured at each state transition, enabling time-travel debugging and observability. But should the spec require this, or merely enable it?
Options:
- Option A: Require runtimes to maintain the full artifact transition history for the duration of the workflow execution. This guarantees observability but increases storage and memory requirements.
- Option B: Recommend but don't require history capture. Runtimes at Level 1 support only enforce current artifact values; Level 2 runtimes capture full history.
- Option C: Leave it to runtimes entirely — the spec defines the slots and their semantics, not the observability infrastructure.
Leaning toward Option B: The Runtime Support Levels already distinguish between basic execution (Level 1) and full artifact management (Level 2). History capture fits naturally into Level 2. The spec should document the intended observability benefits so runtime authors understand why capturing history is valuable, without mandating a specific implementation.
3. How do runtimes map tool results to artifacts?
The spec declares artifact slots but doesn't specify how tool outputs populate them. Options:
- Option A: Convention-based — runtime maps tool results to same-named artifacts automatically
- Option B: Explicit bindings in
engine— e.g.,"result_bindings": {"run_tests": "test_results"} - Option C: LLM-mediated — the prompt instructs the LLM to produce structured output that maps to artifact slots
- Option D: Runtime-specific — leave it to runtimes entirely
Leaning toward Option D: This is consistent with the spec's non-goal of defining runtime execution semantics. The spec declares the slots; runtimes decide how to fill them. A future RFC could standardize bindings if a dominant pattern emerges.
4. Should there be a max_total_iterations alias?
engine.budget.max_total_visits is precise but verbose. Should we also accept max_iterations as an alias?
Proposal: No. One name for one concept. Aliases create confusion. "Visits" is consistent with the state machine mental model.
5. Should terminal states still execute their prompt?
When a workflow reaches a terminal state, should the prompt still execute (producing a final summary/output), or should terminal mean "stop immediately"?
Proposal: Terminal states execute their prompt. The terminal flag means "this is the last state" — the prompt produces the workflow's final output. If you want to stop without executing, use on_max_visits pointing to a no-op state.
Revision History
- 2026-03-26: Initial draft
- 2026-03-26: Expanded use cases; added Design Notes (runtime memory relationship, nested loops via agent composition, HITL patterns); added ops remediation example; resolved artifact sharing as workflow-scoped namespace; reframed artifacts as transition-scoped observable data enabling time-travel debugging, distinct from point-in-time runtime memory