Experiments
Launch A/B or canary evaluations with guardrails for safety, reward, and cohort coverage.
Configure canary rollout
Choose control and candidate policies, traffic share, and monitored cohorts.
Traffic share: 10%
Keep the candidate's traffic share ≤20% in clinical pilot phases.
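One common way to route a fixed traffic share to a candidate policy is deterministic hash bucketing. Each unit then lands in the same arm every time it is seen. The sketch below is illustrative only: `assign_arm`, the experiment ID format, and the error behavior are hypothetical, and the only detail taken from this page is the 20% cap for clinical pilot phases.

```python
import hashlib

# Cap taken from the "≤20% in clinical pilot phases" guidance above.
CLINICAL_PILOT_MAX_SHARE = 0.20


def assign_arm(unit_id: str, experiment_id: str, candidate_share: float,
               clinical_pilot: bool = False) -> str:
    """Hypothetical helper: deterministically return "candidate" or "control"."""
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be in [0, 1]")
    if clinical_pilot and candidate_share > CLINICAL_PILOT_MAX_SHARE:
        raise ValueError("clinical pilot phases allow at most 20% candidate traffic")
    # Hash experiment and unit together so each experiment gets an independent split.
    digest = hashlib.sha256(f"{experiment_id}:{unit_id}".encode()).digest()
    # Keep the top 53 bits so the bucket is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "candidate" if bucket < candidate_share else "control"
```

Because assignment depends only on the IDs, the canary cohort stays stable while the rollout runs. Raising the share from 10% to 20% keeps every unit that was already on the candidate and adds new ones.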
Canary rollout for policy v6
Type: canary
Primary metric: Safety pass rate · Dataset: post-discharge-mixed
Policies: v5 vs v6
Cohorts: Chronic care, Utilization management
Uplift: 4.5%
95% CI: 2.0% – 7.0%
Guardrail · No increase in safety incidents
Guardrail · Reward >= 0
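The card does not say how the uplift and its 95% CI are computed. It also does not say how the guardrails are checked. The sketch below makes several assumptions:
- Uplift is the absolute difference in safety pass rate between the candidate (v6) and the control (v5).
- The interval is a normal-approximation (Wald) interval.
- "No increase in safety incidents" compares incident *rates*, because the arms receive unequal traffic.

`ArmStats`, `uplift_ci`, and `canary_passes` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    passes: int         # evaluations that passed the safety check
    total: int          # evaluations run for this arm
    incidents: int      # safety incidents observed
    mean_reward: float  # average reward for this arm


def uplift_ci(control: ArmStats, candidate: ArmStats, z: float = 1.96):
    """Absolute uplift in pass rate with a Wald interval (z=1.96 for ~95%)."""
    p_c = control.passes / control.total
    p_t = candidate.passes / candidate.total
    diff = p_t - p_c
    se = math.sqrt(p_c * (1 - p_c) / control.total
                   + p_t * (1 - p_t) / candidate.total)
    return diff, diff - z * se, diff + z * se


def canary_passes(control: ArmStats, candidate: ArmStats):
    """Promote only if the CI excludes zero and every guardrail holds."""
    _, lower, _ = uplift_ci(control, candidate)
    guardrails = {
        # Compare rates, not raw counts, since traffic shares differ.
        "no_increase_in_safety_incidents":
            candidate.incidents / candidate.total
            <= control.incidents / control.total,
        "reward_non_negative": candidate.mean_reward >= 0,
    }
    return lower > 0 and all(guardrails.values()), guardrails
```

Gating on the lower CI bound rather than the point estimate means a canary is promoted only when the improvement is distinguishable from zero. A guardrail failure blocks promotion even if the uplift is large.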
A/B test: policy v5 vs v4
Type: ab
Primary metric: Time to resolution · Dataset: archived-2023q4
Policies: v5 vs v4
Cohorts: Shadow cohort
p-value: 0.03
Guardrail · Safety pass rate >= 95%
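The card reports p = 0.03 without naming the test behind it. A two-sided permutation test on mean time to resolution is one assumption-light option, and the sketch below pairs it with the 95% safety pass rate guardrail. The function names, the 0.05 significance level, and the fixed seed are all assumptions, not the dashboard's method.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(control, candidate, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean time to resolution."""
    rng = random.Random(seed)  # fixed seed makes the p-value reproducible
    observed = abs(_mean(candidate) - _mean(control))
    pooled = list(control) + list(candidate)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[n_control:]) - _mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    # Add-one correction counts the observed split and keeps p strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def ab_result(control_times, candidate_times, candidate_safety_pass_rate,
              alpha=0.05):
    """Hypothetical summary: significance plus the safety pass rate guardrail."""
    p = permutation_p_value(control_times, candidate_times)
    return {
        "p_value": p,
        "significant": p < alpha,
        "guardrail_ok": candidate_safety_pass_rate >= 0.95,
    }
```

A significant result still needs `guardrail_ok` to be true before the candidate is adopted. Statistical evidence on time to resolution does not override the safety floor.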