Agentic RL Control Center

Track policy, reward, and safety signals

Live view of agentic reinforcement learning

Observe the full agent loop from task intake to policy updates. Every metric is backed by traces, JSONL artifacts, and safety checks.

Policy v5 active
Success rate

82%

Tasks completed end-to-end

Avg reward

2.15

Shaped reward (14d)

Safety pass rate

97%

Runs passing guardrails

Time-to-resolution

7.4 min

Median episode duration
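The four headline numbers above can be derived from per-episode run records. A minimal sketch, assuming a hypothetical record schema (the dashboard's actual JSONL fields may differ):

```python
from statistics import median

# Hypothetical episode records; field names are illustrative,
# not the dashboard's actual JSONL schema.
episodes = [
    {"completed": True,  "reward": 2.4, "safety_pass": True,  "duration_min": 6.1},
    {"completed": True,  "reward": 2.1, "safety_pass": True,  "duration_min": 7.4},
    {"completed": False, "reward": 0.9, "safety_pass": False, "duration_min": 9.8},
    {"completed": True,  "reward": 2.6, "safety_pass": True,  "duration_min": 5.2},
]

def headline_metrics(eps):
    """Success rate, average shaped reward, safety pass rate,
    and median time-to-resolution over a batch of episodes."""
    n = len(eps)
    return {
        "success_rate": sum(e["completed"] for e in eps) / n,
        "avg_reward": sum(e["reward"] for e in eps) / n,
        "safety_pass_rate": sum(e["safety_pass"] for e in eps) / n,
        "median_duration_min": median(e["duration_min"] for e in eps),
    }

print(headline_metrics(episodes))
```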

Reward over time

Evidence of learning uplift across recent runs.

Reward shaping guides the agent toward clinically safe behavior by scoring each step, not just the final outcome.

Policy adoption mix

Track how cohorts split across policy versions.

Reward shaping

Weights can be edited under Policies & Rewards.

Episode return: 1.00

Safety compliance: +0.45

Penalizes unsafe tool calls and missing safety documentation.

Outcome quality: +0.35

Rewards alignment with expected clinical outcomes and rubric scoring.

Time to resolution: +0.20

Rewards lower latency and fewer tool retries while keeping safety intact.
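The shaped reward is a weighted sum of these terms. A minimal sketch using the weights shown above; the per-term scores and the `shaped_reward` helper are illustrative assumptions, not the product's actual scoring API:

```python
# Weights taken from the panel above.
WEIGHTS = {
    "episode_return": 1.00,
    "safety_compliance": 0.45,
    "outcome_quality": 0.35,
    "time_to_resolution": 0.20,
}

def shaped_reward(scores: dict) -> float:
    """Weighted sum of per-term scores; missing terms count as 0.
    Shaping terms are assumed normalized to [0, 1]."""
    return sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS)

# Example step: full safety compliance, strong outcome, moderate speed.
r = shaped_reward({
    "episode_return": 1.0,
    "safety_compliance": 1.0,
    "outcome_quality": 0.8,
    "time_to_resolution": 0.5,
})
# r = 1.00*1.0 + 0.45*1.0 + 0.35*0.8 + 0.20*0.5 ≈ 1.83
```

Because each step is scored, not just the final outcome, an unsafe tool call lowers the reward even when the episode ultimately succeeds.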

Policy diff

Field          Previous               Proposed
Critique loop  Disabled               Enabled for safety review (+2% safety)
Tool access    ehr.search, lab.order  + medication.check with guardrail
Medication QA
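The proposed change set can be sketched as a structured diff between two policy configs. Field names mirror the table above and are assumptions, not the product's real policy schema:

```python
# Illustrative policy configs; keys are assumptions based on the diff table.
previous = {
    "critique_loop": {"enabled": False},
    "tool_access": ["ehr.search", "lab.order"],
}
proposed = {
    "critique_loop": {"enabled": True, "scope": "safety_review"},
    "tool_access": ["ehr.search", "lab.order", "medication.check"],
    "guardrails": {"medication.check": "require_safety_review"},
}

def policy_diff(prev: dict, nxt: dict) -> dict:
    """Fields added or changed between two policy versions,
    as {field: (previous_value, proposed_value)}."""
    return {k: (prev.get(k), v) for k, v in nxt.items() if prev.get(k) != v}

changes = policy_diff(previous, proposed)
```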

Agentic loop at a glance

Task → Plan → Tool calls → Observations → Critique → Revise → Evaluate → Learn.

Use the Live Run view to inspect every tool invocation with JSON payloads and reward shaping signals.
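The loop stages above can be sketched as a single pass that writes one JSONL trace record per stage, in the spirit of the Live Run artifacts. Stage names follow the text; the trace format and the `run_episode` helper are illustrative assumptions:

```python
import json
import os
import tempfile

# Stage names follow the loop described above; the record schema is assumed.
STAGES = ["task", "plan", "tool_calls", "observations",
          "critique", "revise", "evaluate", "learn"]

def run_episode(task: str, log_path: str) -> list:
    """Walk the stages once, appending one JSONL record per stage."""
    trace = []
    with open(log_path, "w") as f:
        for stage in STAGES:
            record = {"task": task, "stage": stage}
            f.write(json.dumps(record) + "\n")
            trace.append(record)
    return trace

fd, path = tempfile.mkstemp(suffix=".jsonl")
os.close(fd)
trace = run_episode("medication QA review", path)
```

Each line of the resulting file is an independent JSON object, so traces can be streamed or grepped without loading a whole run into memory.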