Live view of agentic reinforcement learning
Observe the full agent loop from task intake to policy updates. Every metric is backed by traces, JSONL artifacts, and safety checks.
82%
Tasks completed end-to-end
2.15
Shaped reward (14d)
97%
Runs passing guardrails
7.4 min
Median episode duration
Evidence of learning uplift across recent runs.
Reward shaping guides the agent toward clinically safe behavior by scoring each step, not just the final outcome.
Track how cohorts split across policy versions.
Weights can be edited under Policies & Rewards.
Penalizes unsafe tool calls and missing safety documentation.
Rewards alignment with expected clinical outcomes and rubric scoring.
Rewards lower latency and fewer tool retries while keeping safety intact.
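The three reward components above can be sketched as a single shaped-reward function. This is a minimal illustration, not the product's actual scoring code: the `StepSignal` fields and the weight values are hypothetical stand-ins for whatever is configured under Policies & Rewards.

```python
from dataclasses import dataclass

@dataclass
class StepSignal:
    """Hypothetical per-step signals emitted by the agent loop."""
    unsafe_tool_call: bool     # did this step make an unsafe tool call?
    safety_doc_present: bool   # was required safety documentation attached?
    outcome_alignment: float   # rubric score in [0, 1] vs. expected clinical outcome
    latency_s: float           # step latency in seconds
    tool_retries: int          # number of tool-call retries

# Illustrative weights; the real values are editable under Policies & Rewards.
WEIGHTS = {"safety": 1.0, "outcome": 1.0, "efficiency": 0.5}

def shaped_reward(sig: StepSignal) -> float:
    """Score each step, not just the final outcome."""
    safety = -1.0 if sig.unsafe_tool_call else 0.0
    safety += -0.5 if not sig.safety_doc_present else 0.0
    outcome = sig.outcome_alignment
    efficiency = -0.01 * sig.latency_s - 0.1 * sig.tool_retries
    return (WEIGHTS["safety"] * safety
            + WEIGHTS["outcome"] * outcome
            + WEIGHTS["efficiency"] * efficiency)
```

A safe, well-aligned step with modest latency scores positively; an unsafe tool call dominates the other terms and pulls the step's reward negative.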

Task → Plan → Tool calls → Observations → Critique → Revise → Evaluate → Learn.
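The loop above can be sketched in a few lines. Every object here (`policy`, `tools`, `evaluator`) is a hypothetical stand-in for the real components; the sketch only shows the control flow from task intake to policy update.

```python
def run_episode(task, policy, tools, evaluator):
    """Minimal sketch of Task → Plan → Tool calls → Observations →
    Critique → Revise → Evaluate → Learn. All callables are assumed interfaces."""
    plan = policy.plan(task)                              # Plan
    observations = [tools.call(step) for step in plan]    # Tool calls → Observations
    critique = policy.critique(task, observations)        # Critique
    if critique.needs_revision:                           # Revise (one pass, for brevity)
        plan = policy.revise(plan, critique)
        observations = [tools.call(step) for step in plan]
    result = evaluator.evaluate(task, observations)       # Evaluate
    policy.update(result.reward)                          # Learn
    return result
```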
Use the Live Run view to inspect every tool invocation with JSON payloads and reward shaping signals.