Managing 10,000 Agents Through a Product Transition

June 14, 2022 · 15 min read · Case Study / Playbook

Rolling out AI-assisted workflows to 10,000 call center agents at a national tax services company taught me that technology adoption is 20% technology and 80% change management. We used a 4-phase rollout -- 1% to 10% to 50% to 100% -- over 5 months. Agent satisfaction dropped 22 points in the first two weeks, then recovered to 11 points above baseline by month 4. The difference between success and failure was not the AI. It was the champion network: 340 peer advocates embedded across 47 teams who made the transition feel like an upgrade rather than a replacement.

Why do most AI rollouts to large workforces fail?

According to a 2022 BCG study, 74% of enterprise AI transformations fail to scale beyond the pilot phase. The most common cause is not technical failure. It is workforce resistance. At a national tax services company with 6,000 franchise locations and over 10,000 call center agents, we were rolling out an AI system that would fundamentally change how agents processed tax inquiries, verified client information, and routed cases.

The agents had an average tenure of 4.3 years. They were good at their jobs. They had built muscle memory around existing workflows that took months to develop. And we were about to tell them that an AI system would now handle the parts of their job they had spent years mastering.

I had seen the Gartner statistic: organizations that invest in change management are 6x more likely to meet AI project objectives. But statistics do not tell you what to do on the morning when 200 agents refuse to log into the new system. That happened on day 3 of our pilot.

What did the 4-phase rollout look like?

Phase 1: Proof of Concept (1% -- 100 agents, Weeks 1-3)

Hand-selected agents from 8 locations. Criteria: mix of tenure levels, mix of performance levels, at least 3 vocal skeptics. The skeptics were intentional -- if we could convert them, they became our most credible advocates. AI ran in "shadow mode" alongside the existing workflow. Agents could see AI suggestions but were not required to use them.

Phase 2: Validated Pilot (10% -- 1,000 agents, Weeks 4-8)

Expanded to 1,000 agents across 22 locations. AI moved from shadow mode to "assist mode" where agents received AI recommendations but made all final decisions. First real resistance emerged. Daily feedback sessions. 14 workflow adjustments made during this phase based on agent input.

Phase 3: Scaled Deployment (50% -- 5,000 agents, Weeks 9-16)

The hardest phase. Champion network activated. The AI moved to "default mode," in which recommendations were pre-populated and agents confirmed or overrode them. Productivity dip of 18% in weeks 9-10, recovered by week 13. This is where most rollouts die -- the productivity dip scares leadership into pulling back.

Phase 4: Full Deployment (100% -- 10,000 agents, Weeks 17-22)

Remaining 5,000 agents onboarded. By this point, Phase 3 agents were outperforming pre-AI baselines by 23%. The transition was smoother because agents in the remaining locations had heard from peers that the system worked. Social proof had replaced our marketing.
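
For concreteness, here is a minimal sketch of the phase structure expressed as configuration. The percentages, week ranges, and modes are the ones described above; the `RolloutPhase` type, field names, and helper function are illustrative assumptions, not our production schema.

```python
from dataclasses import dataclass

@dataclass
class RolloutPhase:
    name: str
    agent_pct: float        # cumulative share of the 10,000-agent workforce
    weeks: tuple[int, int]  # (start week, end week)
    ai_mode: str            # "shadow", "assist", or "default"

# The four phases described above. shadow = suggestions visible but optional;
# assist = recommendations shown, agent makes all final decisions; default =
# recommendations pre-populated, agent confirms or overrides.
PHASES = [
    RolloutPhase("Proof of Concept",  0.01, (1, 3),   "shadow"),
    RolloutPhase("Validated Pilot",   0.10, (4, 8),   "assist"),
    RolloutPhase("Scaled Deployment", 0.50, (9, 16),  "default"),
    RolloutPhase("Full Deployment",   1.00, (17, 22), "default"),
]

def agents_on_new_workflow(phase: RolloutPhase, workforce: int = 10_000) -> int:
    """Cumulative agents on the AI workflow once a phase is underway."""
    return int(phase.agent_pct * workforce)
```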

How did agent satisfaction actually change over time?

We measured agent satisfaction weekly using a 7-question pulse survey (1-100 scale). The pattern surprised us -- it was not a straight line in either direction. It was a J-curve.

Period | Agents Active | Satisfaction Score | Change from Baseline
--- | --- | --- | ---
Pre-rollout baseline | 10,000 | 68 | --
Week 1-2 (Phase 1) | 100 | 61 | -7
Week 4-5 (Phase 2 start) | 1,000 | 46 | -22
Week 7-8 (Phase 2 end) | 1,000 | 58 | -10
Week 9-10 (Phase 3 start) | 5,000 | 52 | -16
Week 13-14 (Phase 3 mid) | 5,000 | 67 | -1
Week 16 (Phase 3 end) | 5,000 | 74 | +6
Week 22 (Phase 4 end) | 10,000 | 79 | +11
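
The deltas in the table are simply the weekly pulse averages minus the pre-rollout baseline. A tiny sketch of that bookkeeping, using the table's own values (the data structure itself is illustrative):

```python
BASELINE = 68  # pre-rollout average on the 1-100 pulse scale

# (period, average pulse score) from the table above
pulse = [("Week 1-2", 61), ("Week 4-5", 46), ("Week 7-8", 58),
         ("Week 9-10", 52), ("Week 13-14", 67), ("Week 16", 74),
         ("Week 22", 79)]

for period, score in pulse:
    print(f"{period}: {score} ({score - BASELINE:+d} vs. baseline)")
# Trough of -22 at week 4-5, back above baseline (+6, then +11) by
# week 16: the J-curve.
```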

The critical moment was week 4-5, when satisfaction bottomed at 46. That 22-point drop represented real anger. Agents felt the AI was second-guessing them. One agent told me, "I have been doing this for 7 years. Now a computer is telling me I am wrong." That quote kept me up at night because she was not wrong to feel that way. We had designed the AI feedback to sound corrective rather than collaborative.

According to Prosci's research on enterprise change management, the average adoption dip during technology transitions lasts 6-10 weeks. Ours lasted 9 weeks (weeks 4-13). The recovery to above-baseline happened because of two deliberate interventions: redesigning AI suggestions to feel like a colleague offering help rather than a system correcting errors, and activating the champion network.

What is the champion network pattern?

The champion network was the single most impactful element of the entire rollout. It was not in our original plan. We invented it in week 5, when satisfaction hit 46 and we realized that no amount of top-down communication would fix a bottom-up trust problem.

The structure was simple:

  1. Identify early adopters: From the Phase 1 and early Phase 2 agents, we identified 47 who had genuinely embraced the AI workflow -- not because they were told to, but because they found it useful. Their average satisfaction score was 78 while the overall was 46.
  2. Recruit them as champions: We asked each of the 47 to recruit 5-8 peers from their teams. This was voluntary. 340 agents agreed. They received no extra pay, but they received early access to new features and a direct feedback channel to the product team.
  3. Equip with stories, not scripts: Champions were not given talking points. They were given permission to share their honest experience. Most said some version of: "I was skeptical too. Here is what changed my mind." Peer testimony is 4x more effective than management communication according to McKinsey's 2021 change management research.
  4. Create a feedback loop: Champions reported issues and suggestions through a dedicated Slack channel. We committed to responding within 4 hours and shipping improvements within 2 weeks. Over 5 months, champions submitted 283 suggestions. We implemented 94 of them. That 33% implementation rate was itself a powerful signal -- agents could see their feedback changing the product. (A sketch of this loop's bookkeeping follows the list.)
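
Here is a minimal sketch of the feedback-loop bookkeeping in step 4, assuming a simple suggestion record; the field names and helper are hypothetical, not the tracker we actually ran.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RESPONSE_SLA = timedelta(hours=4)  # commitment: first response within 4 hours
SHIP_TARGET = timedelta(weeks=2)   # commitment: ship improvements within 2 weeks

@dataclass
class Suggestion:
    champion_id: str
    text: str
    submitted_at: datetime
    first_response_at: datetime | None = None
    implemented: bool = False

def implementation_rate(suggestions: list[Suggestion]) -> float:
    """Share of champion suggestions that shipped -- the visible trust signal."""
    return sum(s.implemented for s in suggestions) / max(len(suggestions), 1)

# Over 5 months: 283 submitted, 94 shipped -> a 33% implementation rate.
```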

The champion-to-agent ratio was 1:29 (340 champions for 10,000 agents). According to organizational change research by Kotter, effective change networks need a ratio between 1:20 and 1:50. We were in the sweet spot.

What did we change based on agent feedback?

The 94 implemented suggestions fell into three categories. The breakdown reveals what agents actually cared about:

Category | Suggestions Received | Implemented | Example
--- | --- | --- | ---
AI suggestion tone/framing | 112 | 41 | "Try: Dependent may qualify for EITC" instead of "Error: EITC not claimed"
Workflow efficiency | 98 | 36 | Keyboard shortcuts for accepting/rejecting AI suggestions
Override experience | 73 | 17 | One-click override without justification for low-risk suggestions

The tone/framing category was revelatory. Nearly 40% of all feedback was about how the AI communicated, not what it recommended. When we changed "Error: Missing Schedule B" to "Suggestion: Client may have interest income -- ask about bank accounts," agent acceptance of AI recommendations jumped from 54% to 71% in two weeks. The AI's accuracy had not changed. Only its voice had.
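
In practice this implied a framing layer between the rules engine and the agent: same recommendation, different surface string. A sketch under that assumption -- the rule codes and dictionary are hypothetical; only the before/after wording comes from the examples above.

```python
# Map rule-engine findings to collaborative phrasing. The corrective
# "before" strings are kept only for reference; agents never see them.
FRAMING = {
    "EITC_NOT_CLAIMED": {
        "before": "Error: EITC not claimed",
        "after":  "Try: Dependent may qualify for EITC",
    },
    "MISSING_SCHEDULE_B": {
        "before": "Error: Missing Schedule B",
        "after":  "Suggestion: Client may have interest income -- "
                  "ask about bank accounts",
    },
}

def frame(rule_code: str) -> str:
    """Return the collaborative wording for a rule-engine finding."""
    return FRAMING[rule_code]["after"]
```

The recommendation engine is untouched; only the last-mile string changes, which is exactly why acceptance could jump 17 points with no change in accuracy.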

This directly applied the compliance-first framework we had developed: treat user experience requirements -- even internal user experience -- as first-class design constraints.

How do you handle the productivity dip without losing executive support?

The 18% productivity dip in weeks 9-10 of Phase 3 was expected. We had modeled it. But expecting a number on a spreadsheet and watching 5,000 agents process 18% fewer cases per day are different experiences for an executive team.

Three tactics preserved executive support:

  1. Pre-committed metrics and timeline: Before Phase 3 began, we presented the executive team with our prediction: a 15-20% productivity dip for 3-4 weeks, recovery by week 13, and above-baseline performance by week 15. We documented this prediction in writing. When the dip hit 18%, we pointed to the prediction. The actuals matched the forecast, which built credibility rather than alarm.
  2. Leading indicators alongside lagging: While cases per day dropped 18%, the error rate simultaneously dropped 31%. We reported both, side by side (see the sketch after this list). The productivity dip was agents being careful with a new system. The error reduction showed the system was working. According to Harvard Business Review, organizations that track leading indicators during transitions are 2.7x more likely to maintain executive commitment through the adoption dip.
  3. Weekly champion stories: Every Monday, we shared one champion story with the executive team. Not metrics -- a story. "Maria in the Phoenix office was processing 34 cases/day manually. In week 3 with AI, she dropped to 28. By week 8 she hit 41 and told her team lead the old way felt slow." Stories are more persuasive than dashboards when executives are nervous.
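
A minimal sketch of the paired reporting behind tactic 2. The function name is an assumption, and the week 9-10 inputs (roughly 27.9 cases/day and a 3.3% error rate) are illustrative back-calculations from the percentages above.

```python
def transition_report(cases_per_day: float, baseline_cases: float,
                      error_rate: float, baseline_error_rate: float) -> str:
    """Show the lagging indicator (throughput) next to the leading one
    (quality), so a dip is never reported without its counterweight."""
    throughput = (cases_per_day - baseline_cases) / baseline_cases
    quality = (error_rate - baseline_error_rate) / baseline_error_rate
    return (f"Throughput {throughput:+.0%} vs. baseline | "
            f"Error rate {quality:+.0%} vs. baseline")

# Weeks 9-10 of Phase 3: cases/day down ~18%, errors down ~31%
print(transition_report(27.9, 34.0, 0.033, 0.048))
```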

What were the final outcomes?

At the end of 22 weeks, the rollout was complete. Here are the before-and-after numbers across all 10,000 agents:

  • Cases per agent per day: 34 (baseline) to 42 (post-transition) -- a 23% improvement
  • Error rate per case: 4.8% to 2.1% -- a 56% reduction
  • Average handle time: 14.2 minutes to 11.8 minutes -- a 17% reduction
  • Agent satisfaction: 68 to 79 -- an 11-point increase
  • Agent attrition (monthly): 3.4% to 2.8% -- a 0.6 percentage point decrease
  • Customer satisfaction (CSAT): 72 to 81 -- a 9-point increase

The attrition decrease was worth approximately $1.2 million annually. According to the Society for Human Resource Management, replacing a call center agent costs 50-75% of their annual salary. At 10,000 agents with an average salary of $38,000, each percentage point of attrition reduction saved roughly $2 million. The 0.6-point decrease: $1.2 million.
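
Spelled out, the calculation looks like the sketch below. It takes the figures above at face value: a replacement cost near the low end of SHRM's 50-75% band (chosen so the per-point figure lands at the quoted ~$2 million) and each attrition point treated as roughly 100 avoided departures per year.

```python
AGENTS = 10_000
AVG_SALARY = 38_000
REPLACEMENT_COST_PCT = 0.53  # within SHRM's 50-75% range; picked to match
                             # the ~$2M-per-point figure quoted above

cost_per_replacement = REPLACEMENT_COST_PCT * AVG_SALARY   # ~$20,140
savings_per_point = 0.01 * AGENTS * cost_per_replacement   # ~$2.0M per year
annual_savings = 0.6 * savings_per_point                   # the 0.6-point drop

print(f"${annual_savings:,.0f} per year")  # -> $1,208,400, i.e. ~$1.2M
```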

The CSAT improvement was a surprise. We had not predicted it. But faster, more accurate case resolution directly improved the customer experience. The processing efficiency improvements we later measured showed this was not coincidental -- speed and quality moved together when AI augmented rather than replaced human judgment.

The Core Lesson: At scale, the product is the change management. The best AI system in the world fails if the people using it do not trust it, understand it, or feel ownership over it. The champion network worked because it turned adoption from a corporate mandate into a peer recommendation. You cannot memo your way to adoption. You have to earn it, 340 champions at a time.

Frequently Asked Questions

How do you identify good champions?

Look for agents who are respected by peers (not necessarily top performers), who were initially skeptical but converted, and who give specific feedback rather than vague complaints. Our best champions were not the earliest adopters -- they were the early skeptics who became believers. Their conversion story was more credible than someone who was enthusiastic from day one.

What is the right pace for a phased rollout?

For 10,000 users, our 22-week timeline (roughly 5 months) was appropriate. The rule of thumb: spend 30% of the timeline in Phases 1-2 (learning) and 70% in Phases 3-4 (scaling). Rushing Phases 1-2 means you scale problems instead of solutions. We made 14 workflow changes during Phase 2 that would have been catastrophic at 10,000-agent scale.

How do you handle agents who never adopt?

At week 22, approximately 4% of agents (roughly 400) were still resistant. We did not force adoption. We created a "classic workflow" option that used the old process with AI running in background analytics mode. Over the next 3 months, attrition in this group was 2x higher than the AI-adopter group -- most left voluntarily. The remaining holdouts gradually transitioned as they saw peers benefiting. By month 8, the classic workflow group was under 1%.

Does the champion network pattern work for smaller teams?

Yes, but the ratio changes. For teams under 100, you need 1 champion per 5-10 people. For 100-1,000, 1 per 15-20. For 1,000+, 1 per 25-35. The key is that every agent should know a champion personally -- not just know of one. Personal relationship is what makes peer testimony work.
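
As a sketch, that rule of thumb compresses to a small function; the tier boundaries are the ones above, and the bands are guidance, not hard thresholds.

```python
def champion_band(team_size: int) -> tuple[int, int]:
    """(min, max) champions suggested for a team, per the ratios above."""
    if team_size < 100:
        per_champion = (10, 5)    # 1 champion per 5-10 people
    elif team_size <= 1_000:
        per_champion = (20, 15)   # 1 per 15-20
    else:
        per_champion = (35, 25)   # 1 per 25-35
    lo, hi = per_champion
    return (team_size // lo, team_size // hi)

print(champion_band(10_000))  # -> (285, 400); the rollout above used 340
```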

Last updated: June 14, 2022