Agentic RL Control Center

Track policy, reward, and safety signals

Experiments

Launch A/B or canary evaluations with guardrails for safety, reward, and cohort coverage.

Configure canary rollout
Choose control and candidate policies, traffic share, and monitored cohorts.
Traffic share: 10%

Keep traffic share ≤20% during clinical pilot phases.
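The rollout settings above (control and candidate policies, traffic share, monitored cohorts, and the 20% pilot ceiling) could be captured and validated roughly as follows. This is a minimal sketch, not the control center's actual schema: `CanaryConfig`, its field names, the `clinical_pilot` flag, and `PILOT_MAX_TRAFFIC_SHARE` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: the "≤20% in clinical pilot phases" hint is treated as a hard limit.
PILOT_MAX_TRAFFIC_SHARE = 0.20


@dataclass(frozen=True)
class CanaryConfig:
    """Hypothetical container for the fields shown in the rollout form."""
    control_policy: str
    candidate_policy: str
    traffic_share: float          # fraction of traffic sent to the candidate
    cohorts: Tuple[str, ...]      # monitored cohorts
    clinical_pilot: bool = True   # assumed flag; not shown in the form

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if the config is OK)."""
        problems = []
        if self.control_policy == self.candidate_policy:
            problems.append("control and candidate policies must differ")
        if not 0.0 < self.traffic_share <= 1.0:
            problems.append("traffic share must be in (0, 1]")
        elif self.clinical_pilot and self.traffic_share > PILOT_MAX_TRAFFIC_SHARE:
            problems.append("traffic share exceeds the 20% pilot limit")
        if not self.cohorts:
            problems.append("at least one monitored cohort is required")
        return problems


# Values taken from the canary card on this page.
config = CanaryConfig(
    control_policy="v5",
    candidate_policy="v6",
    traffic_share=0.10,
    cohorts=("Chronic care", "Utilization management"),
)
print(config.validate())
```

Returning a list of problems rather than raising on the first one lets a form surface every issue at once; that is a design choice of this sketch, not something the page specifies.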

Canary rollout for policy v6
Type: canary

Primary metric: Safety pass rate · Dataset: post-discharge-mixed

Policies: v5 vs v6

Cohorts: Chronic care, Utilization management

Uplift: 4.5%

95% CI: 2.0% – 7.0%
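The page does not say how the uplift or its interval is computed. One common approach for a pass-rate metric is a normal-approximation (Wald) interval on the absolute difference between two proportions, sketched below. The function names and the counts in the usage example are hypothetical, and the card's uplift may be defined differently (for example as a relative change).

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def uplift_ci(control_pass, control_n, candidate_pass, candidate_n, z=Z_95):
    """Absolute uplift in pass rate with a Wald confidence interval.

    Assumes independent samples and counts large enough for the
    normal approximation to be reasonable.
    """
    p_control = control_pass / control_n
    p_candidate = candidate_pass / candidate_n
    uplift = p_candidate - p_control
    se = math.sqrt(
        p_control * (1 - p_control) / control_n
        + p_candidate * (1 - p_candidate) / candidate_n
    )
    return uplift, uplift - z * se, uplift + z * se


def implied_standard_error(ci_low, ci_high, z=Z_95):
    """Back out the standard error from a symmetric interval."""
    return (ci_high - ci_low) / (2 * z)


# Hypothetical counts, chosen only to illustrate the call.
uplift, low, high = uplift_ci(900, 1000, 945, 1000)
print(f"uplift={uplift:.3%}, 95% CI=({low:.3%}, {high:.3%})")

# The reported interval (2.0% to 7.0%) is symmetric around 4.5%,
# which is consistent with an interval of this general form.
print(f"implied SE: {implied_standard_error(0.02, 0.07):.4f}")
```

Since the lower bound of the reported interval is above zero, the uplift would be read as statistically distinguishable from no change at the 5% level under this kind of interval.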

Guardrail · No increase in safety incidents
Guardrail · Reward >= 0

A/B policy v5 vs v4
Type: A/B

Primary metric: Time to resolution · Dataset: archived-2023q4

Policies: v5 vs v4

Cohorts: Shadow cohort

p-value: 0.03

Guardrail · Safety pass rate >= 95%
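The guardrails listed on the two experiment cards can be thought of as named predicates over an experiment's metrics. The sketch below shows one way to express and evaluate them. The `Guardrail` class, the metric keys (`candidate_safety_incidents`, `control_safety_incidents`, `reward`, `safety_pass_rate`), and the sample metric values are assumptions made for illustration; the page does not define how each guardrail is measured.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class Guardrail:
    """A named pass/fail condition over experiment metrics (hypothetical)."""
    name: str
    check: Callable[[Metrics], bool]


# Guardrail names are taken from the cards; the predicates are assumed readings.
CANARY_GUARDRAILS = [
    Guardrail(
        "No increase in safety incidents",
        lambda m: m["candidate_safety_incidents"] <= m["control_safety_incidents"],
    ),
    Guardrail("Reward >= 0", lambda m: m["reward"] >= 0),
]

AB_GUARDRAILS = [
    Guardrail("Safety pass rate >= 95%", lambda m: m["safety_pass_rate"] >= 0.95),
]


def failed_guardrails(guardrails: List[Guardrail], metrics: Metrics) -> List[str]:
    """Return the names of guardrails whose condition does not hold."""
    return [g.name for g in guardrails if not g.check(metrics)]


# Hypothetical metric snapshots, not values from this page.
canary_metrics = {
    "candidate_safety_incidents": 2,
    "control_safety_incidents": 3,
    "reward": 0.12,
}
ab_metrics = {"safety_pass_rate": 0.93}

print(failed_guardrails(CANARY_GUARDRAILS, canary_metrics))
print(failed_guardrails(AB_GUARDRAILS, ab_metrics))
```

A rollout controller could halt or roll back the candidate whenever the returned list is non-empty; whether this control center does so is not stated on the page.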