Live view of agentic reinforcement learning
Observe the full agent loop from task intake to policy updates. Every metric is backed by traces, JSONL artifacts, and safety checks.
82%
Tasks completed end-to-end
2.15
Shaped reward (14d)
97%
Runs passing guardrails
7.4 min
Median episode duration
Evidence of learning uplift across recent runs.
Reward shaping guides the agent toward clinically safe behavior by scoring each step, not just the final outcome.
Track how cohorts split across policy versions.
Weights can be edited under Policies & Rewards.
Penalizes unsafe tool calls and missing safety documentation.
Rewards alignment with expected clinical outcomes and rubric scoring.
Rewards lower latency and fewer tool retries while keeping safety intact.
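The three reward components above can be sketched as a single shaped-reward function. This is a minimal illustration, not the product's actual scoring code: the `StepSignal` fields and the weight values are hypothetical stand-ins for whatever is configured under Policies & Rewards.

```python
from dataclasses import dataclass

@dataclass
class StepSignal:
    """Hypothetical per-step signals emitted by the agent loop."""
    unsafe_tool_call: bool     # did this step make an unsafe tool call?
    safety_doc_present: bool   # was required safety documentation attached?
    outcome_alignment: float   # rubric score in [0, 1] vs. expected clinical outcome
    latency_s: float           # step latency in seconds
    tool_retries: int          # number of tool-call retries

# Illustrative weights; the real values are editable under Policies & Rewards.
WEIGHTS = {"safety": 1.0, "outcome": 1.0, "efficiency": 0.5}

def shaped_reward(sig: StepSignal) -> float:
    """Score each step, not just the final outcome."""
    safety = -1.0 if sig.unsafe_tool_call else 0.0
    safety += -0.5 if not sig.safety_doc_present else 0.0
    outcome = sig.outcome_alignment
    efficiency = -0.01 * sig.latency_s - 0.1 * sig.tool_retries
    return (WEIGHTS["safety"] * safety
            + WEIGHTS["outcome"] * outcome
            + WEIGHTS["efficiency"] * efficiency)
```

A safe, well-aligned step with modest latency scores positively; an unsafe tool call dominates the other terms and pulls the step's reward negative.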

Task → Plan → Tool calls → Observations → Critique → Revise → Evaluate → Learn.
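The loop above can be sketched in a few lines. Every object here (`policy`, `tools`, `evaluator`) is a hypothetical stand-in for the real components; the sketch only shows the control flow from task intake to policy update.

```python
def run_episode(task, policy, tools, evaluator):
    """Minimal sketch of Task → Plan → Tool calls → Observations →
    Critique → Revise → Evaluate → Learn. All callables are assumed interfaces."""
    plan = policy.plan(task)                              # Plan
    observations = [tools.call(step) for step in plan]    # Tool calls → Observations
    critique = policy.critique(task, observations)        # Critique
    if critique.needs_revision:                           # Revise (one pass, for brevity)
        plan = policy.revise(plan, critique)
        observations = [tools.call(step) for step in plan]
    result = evaluator.evaluate(task, observations)       # Evaluate
    policy.update(result.reward)                          # Learn
    return result
```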
Use the Live Run view to inspect every tool invocation with JSON payloads and reward shaping signals.