Pilot Program Design Principles
Learning Objectives
Design pilot programs with clear objectives and appropriate scope
Define meaningful success metrics beyond vanity metrics
Structure pilot phases from closed alpha through open pilot
Establish go/no-go criteria that enable honest assessment
Avoid common pilot design mistakes that lead to false conclusions
The purpose of a pilot is to learn—not to prove that CBDC works. This distinction matters enormously.
Proving pilots are designed to generate positive headlines. They select favorable user groups, measure vanity metrics, declare success regardless of results, and proceed to launch unprepared. They fail in production.
Learning pilots are designed to discover problems before they become catastrophic. They test with diverse users, measure adoption and retention honestly, acknowledge failures, and either fix problems or stop. They enable informed decisions.
Every failed CBDC was launched either without meaningful pilot testing or with pilots designed to prove rather than learn. This lesson provides the framework for designing pilots that generate genuine insight.
PILOT PROGRAM OBJECTIVES
TECHNOLOGY QUESTIONS:
├── Does the platform perform under real load?
├── Is security adequate against real threats?
├── Do integrations work in production?
├── Can the system scale as designed?
├── What operational issues emerge?
└── Metric: Uptime, latency, security incidents
USER EXPERIENCE QUESTIONS:
├── Can users complete basic tasks?
├── Is onboarding friction acceptable?
├── What support issues emerge?
├── Do users return after first use?
├── What features are unused?
└── Metric: Task completion, support tickets, retention
OPERATIONAL QUESTIONS:
├── Can support handle real user issues?
├── Are fraud patterns detectable?
├── Do reconciliation processes work?
├── Can issues be resolved quickly?
├── What operational costs emerge?
└── Metric: Resolution time, cost per user
BUSINESS CASE QUESTIONS:
├── Will users adopt voluntarily?
├── Can merchant network be built?
├── What does user acquisition cost?
├── Is the value proposition validated?
├── What's the path to scale?
└── Metric: Adoption rate, retention, transaction frequency
THE HONEST QUESTION:
Should we proceed to production, or stop?
PILOT SCOPE DIMENSIONS
GEOGRAPHIC SCOPE:
Level 1: Single location/city
├── Easier to manage
├── Local partnerships
├── Limited learning
└── Duration: 3-6 months
Level 2: Multiple locations
├── Tests geographic variation
├── More complex operations
├── Better generalizability
└── Duration: 6-12 months
Level 3: Regional/national
├── Tests scale
├── Significant complexity
├── Pre-production validation
└── Duration: 12-18 months
USER SCOPE:
Level 1: Invited users (100-500)
├── Controlled group
├── High engagement expectation
├── Selection bias risk
└── Phase: Closed alpha
Level 2: Limited registration (1K-10K)
├── Self-selected users
├── Some organic discovery
├── More realistic behavior
└── Phase: Limited beta
Level 3: Open registration (10K-100K)
├── General public access
├── Organic adoption patterns
├── Production-like behavior
└── Phase: Open pilot
TRANSACTION SCOPE:
Level 1: P2P transfers only
├── Simplest use case
├── Limited utility
└── Tests basic functionality
Level 2: P2P + limited merchant
├── Real spending possible
├── Merchant onboarding tested
└── Tests value proposition
Level 3: Full functionality
├── All use cases enabled
├── Production feature set
└── Final validation
VALUE LIMITS:
├── Strict limits initially (e.g., $500 max balance)
├── Loosen as confidence builds
├── Protect against losses
└── Protect against systemic risk
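The value limits above can be sketched as a simple balance-cap check. This is a minimal illustration only; the names (`Wallet`, `credit`) and the caps are assumptions for this sketch, not taken from any real CBDC platform.

```python
# Sketch of pilot value-limit enforcement against an in-memory wallet.
# MAX_BALANCE_CENTS and MAX_TX_CENTS are illustrative pilot caps.
from dataclasses import dataclass

MAX_BALANCE_CENTS = 500_00   # e.g. $500 max balance during early phases
MAX_TX_CENTS = 200_00        # per-transaction cap (assumed)

@dataclass
class Wallet:
    owner: str
    balance_cents: int = 0

def credit(wallet: Wallet, amount_cents: int) -> bool:
    """Apply an incoming transfer, rejecting anything that would
    breach the pilot's value limits."""
    if amount_cents <= 0 or amount_cents > MAX_TX_CENTS:
        return False
    if wallet.balance_cents + amount_cents > MAX_BALANCE_CENTS:
        return False  # would exceed the pilot balance cap
    wallet.balance_cents += amount_cents
    return True

w = Wallet("alice", balance_cents=450_00)
assert credit(w, 40_00) is True    # 490 <= 500, allowed
assert credit(w, 20_00) is False   # would reach 510, rejected
```

Loosening the limits as confidence builds then becomes a configuration change rather than a code change, which makes the staged relaxation auditable.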
VANITY METRICS (Avoid as primary measures)
"Wallets created": 1,000,000
├── Measures: Marketing reach
├── Doesn't measure: Actual usage
├── Reality: 98.5% may never be used (Nigeria)
└── Problem: Success theater
"Total transactions": 10,000,000
├── Measures: Total activity
├── Doesn't measure: Per-user engagement
├── Reality: Could be 100 users transacting 100K times
└── Problem: Concentration hidden
"Value transferred": $1,000,000,000
├── Measures: Volume
├── Doesn't measure: Utility
├── Reality: Could be incentive-driven churning
└── Problem: Unsustainable without incentives
MEANINGFUL METRICS (Track these)
Adoption rate: X% of eligible users
├── Measures: Market penetration
├── Target: 10-30% in pilot area
└── Reality: <5% indicates problem
Activation rate: X% of registered users active
├── Measures: Wallet-to-usage conversion
├── Target: 50%+ activation
└── Reality: <20% indicates onboarding problem
Retention rate: X% still active after 30/60/90 days
├── Measures: Sustained engagement
├── Target: 60%+ 30-day, 40%+ 90-day
└── Reality: <30% indicates value proposition problem
Transaction frequency: X transactions/user/month
├── Measures: Engagement depth
├── Target: 4+ transactions/month for regular users
└── Reality: <2 indicates limited utility
Net Promoter Score: Would users recommend?
├── Measures: User satisfaction
├── Target: NPS > 30 (good), > 50 (excellent)
└── Reality: Negative NPS indicates serious problems
Organic growth: Users acquired without marketing
├── Measures: Word-of-mouth
├── Target: 20%+ of new users organic
└── Reality: 0% organic indicates no value
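The meaningful metrics above fall out of ordinary event logs. A minimal sketch, assuming a toy event schema (registration dates and dated transactions) invented for illustration:

```python
# Computing activation and retention from raw event logs.
from datetime import date, timedelta

registrations = {            # user_id -> registration date (sample data)
    "u1": date(2024, 1, 1),
    "u2": date(2024, 1, 1),
    "u3": date(2024, 1, 5),
}
transactions = [             # (user_id, transaction date)
    ("u1", date(2024, 1, 2)), ("u1", date(2024, 2, 10)),
    ("u2", date(2024, 1, 3)),
]

def activation_rate() -> float:
    """Share of registered users with at least one transaction."""
    active = {u for u, _ in transactions}
    return len(active & registrations.keys()) / len(registrations)

def retained(user: str, days: int) -> bool:
    """Did the user transact at least `days` after registering?"""
    cutoff = registrations[user] + timedelta(days=days)
    return any(u == user and d >= cutoff for u, d in transactions)

print(f"activation: {activation_rate():.0%}")        # 2 of 3 users
print(f"30-day retention (u1): {retained('u1', 30)}")
```

The point is that none of these require survey work: a pilot that logs transactions with timestamps can report activation and retention cohorts continuously, which is what makes the vanity metrics above so avoidable.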
PILOT MEASUREMENT FRAMEWORK
DAILY METRICS (Operational health):
├── Active users
├── Transactions processed
├── System uptime
├── Error rates
├── Support tickets opened
└── Dashboard: Real-time monitoring
WEEKLY METRICS (Trend analysis):
├── New registrations
├── Activation rate (this week's signups)
├── Transaction frequency
├── Activity by user segment
├── Top support issues
└── Report: Weekly progress
MONTHLY METRICS (Strategic review):
├── Retention cohorts (30/60/90 day)
├── NPS survey results
├── Merchant growth
├── Cost per user
├── Feature usage analysis
└── Report: Monthly steering committee
MILESTONE METRICS (Phase gate decisions):
├── Cumulative adoption vs. target
├── Retention vs. threshold
├── Incident history
├── Operational cost vs. budget
├── Stakeholder satisfaction
└── Report: Phase gate recommendation
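The monthly NPS figure referenced above is promoters (scores 9-10) minus detractors (scores 0-6), as a share of all respondents. A quick sketch with made-up survey data:

```python
# Net Promoter Score: % promoters minus % detractors, in [-100, 100].
def nps(scores: list[int]) -> float:
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

survey = [10, 9, 9, 8, 7, 6, 4, 10, 9, 3]
print(round(nps(survey)))  # 5 promoters, 3 detractors, 10 responses -> 20
```

An NPS of 20 would clear the Phase 2 bar (>20 is borderline) but not the Phase 3 bar of >30.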
CBDC PILOT PHASES
PHASE 1: CLOSED ALPHA (3 months)
┌─────────────────────────────────────────────────────────────┐
│ Participants: 100-500 invited users │
│ Selection: Central bank staff, bank partners, trusted users│
│ Environment: Production-like but isolated │
│ Support: High-touch, dedicated team │
│ Purpose: Find critical bugs before wider exposure │
│ │
│ Success criteria: │
│ □ No critical security incidents │
│ □ Core user journeys complete >90% │
│ □ System uptime >99% │
│ □ Key bugs identified and fixed │
│ │
│ Go/No-Go decision point at end │
└─────────────────────────────────────────────────────────────┘
PHASE 2: LIMITED BETA (6 months)
┌─────────────────────────────────────────────────────────────┐
│ Participants: 1,000-10,000 users │
│ Selection: Application/waitlist, bank customer invites │
│ Environment: Full production │
│ Support: Structured but scaling │
│ Purpose: Test operations, gather diverse feedback │
│ │
│ Success criteria: │
│ □ Activation rate >50% │
│ □ 30-day retention >60% │
│ □ Transaction frequency >2/user/month │
│ □ NPS >20 │
│ □ Support scalable │
│ │
│ Go/No-Go decision point at end │
└─────────────────────────────────────────────────────────────┘
PHASE 3: OPEN PILOT (6-12 months)
┌─────────────────────────────────────────────────────────────┐
│ Participants: 10,000-100,000 users │
│ Selection: Open registration (possibly geographic limit) │
│ Environment: Full production at scale │
│ Support: Production support model │
│ Purpose: Validate at scale, refine for launch │
│ │
│ Success criteria: │
│ □ Adoption rate >10% in pilot area │
│ □ 90-day retention >40% │
│ □ Transaction frequency >4/user/month │
│ □ NPS >30 │
│ □ Organic growth >15% │
│ □ Merchant network viable │
│ □ Operational costs sustainable │
│ │
│ Production readiness decision at end │
└─────────────────────────────────────────────────────────────┘
TOTAL PILOT DURATION: 15-21 months minimum
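A phase gate is easiest to keep honest when the criteria are encoded as data before results arrive. A sketch for the Phase 2 (limited beta) criteria above; the thresholds come from the text, the measured values are invented examples:

```python
# Phase 2 gate check: list every criterion that failed (empty == all met).
PHASE2_CRITERIA = {          # metric -> minimum acceptable value
    "activation_rate": 0.50,
    "retention_30d": 0.60,
    "tx_per_user_month": 2.0,
    "nps": 20,
}

def gate_check(measured: dict) -> list[str]:
    return [name for name, threshold in PHASE2_CRITERIA.items()
            if measured.get(name, float("-inf")) < threshold]

measured = {"activation_rate": 0.55, "retention_30d": 0.48,
            "tx_per_user_month": 2.4, "nps": 12}
print(gate_check(measured))  # ['retention_30d', 'nps']
```

A non-empty failure list does not automatically mean "stop"; it feeds the go/no-go framework below, where the review decides whether the failures are fixable, fundamental, or artifacts of insufficient data.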
PILOT GO/NO-GO FRAMEWORK
AT EACH PHASE GATE:
Option 1: PROCEED
├── Criteria met or exceeded
├── Issues minor and manageable
├── Confidence in next phase
└── Action: Begin next phase
Option 2: PROCEED WITH CONDITIONS
├── Most criteria met
├── Specific issues to address
├── Confidence conditional on fixes
└── Action: Address issues, then proceed
Option 3: EXTEND
├── Insufficient data for decision
├── Need more time to evaluate
├── No critical failures
└── Action: Extend current phase
Option 4: SIGNIFICANT PIVOT
├── Fundamental issues discovered
├── Current approach unlikely to succeed
├── Alternative approach possible
└── Action: Major redesign before proceeding
Option 5: PAUSE
├── External factors require delay
├── Issues require resolution time
├── Not a failure, but not ready
└── Action: Pause, address issues, reassess
Option 6: TERMINATE
├── Fundamental viability questions
├── Cannot achieve success criteria
├── Continued investment not justified
└── Action: End program, document learnings
CRITICAL PRINCIPLE:
Termination is a legitimate outcome
Better to stop than launch a failure
Pilots should enable this decision
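The six options can be thought of as a decision table over a few gate-review signals. The sketch below is illustrative only; the input signals and rule ordering are assumptions, not a prescribed procedure.

```python
# Mapping a gate assessment to one of the six outcomes.
def gate_decision(criteria_met: float, critical_failure: bool,
                  enough_data: bool, fixable: bool) -> str:
    """criteria_met: fraction of phase criteria satisfied (0.0-1.0)."""
    if critical_failure and not fixable:
        return "TERMINATE"
    if critical_failure:
        return "SIGNIFICANT PIVOT"
    if not enough_data:
        return "EXTEND"
    if criteria_met >= 1.0:
        return "PROCEED"
    if criteria_met >= 0.75:
        return "PROCEED WITH CONDITIONS"
    return "PAUSE"

print(gate_decision(criteria_met=0.8, critical_failure=False,
                    enough_data=True, fixable=True))
```

Note that TERMINATE is reachable from the table like any other outcome; a framework where no input combination leads there is a proving pilot in disguise.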
---
PILOT DESIGN MISTAKES
MISTAKE 1: SELECTION BIAS
├── What it looks like:
│ └── Pilot with tech-savvy early adopters only
├── Why it's dangerous:
│ └── Success doesn't predict mainstream adoption
├── How to avoid:
│ └── Include diverse demographics, especially target segments
└── Reality check: If your target is unbanked, include unbanked
MISTAKE 2: INCENTIVE DISTORTION
├── What it looks like:
│ └── Heavy cashback/rewards during pilot
├── Why it's dangerous:
│ └── Adoption disappears when incentives end
├── How to avoid:
│ └── Moderate incentives, track organic behavior
└── Reality check: China e-CNY usage drops between promotions
MISTAKE 3: INSUFFICIENT DURATION
├── What it looks like:
│ └── 3-month pilot declared success
├── Why it's dangerous:
│ └── Can't measure retention, habituation
├── How to avoid:
│ └── Minimum 12-18 months across all phases
└── Reality check: 90-day retention requires 90-day pilot minimum
MISTAKE 4: VANITY METRIC FOCUS
├── What it looks like:
│ └── "1 million wallets created!"
├── Why it's dangerous:
│ └── Masks actual usage patterns
├── How to avoid:
│ └── Focus on activation, retention, frequency
└── Reality check: Nigeria's 98.5% unused wallets
MISTAKE 5: CONFIRMATION BIAS
├── What it looks like:
│ └── "Users love it" (based on selected quotes)
├── Why it's dangerous:
│ └── Ignores negative feedback
├── How to avoid:
│ └── Systematic feedback collection, include critics
└── Reality check: Track complaints as carefully as compliments
MISTAKE 6: INFRASTRUCTURE GAP
├── What it looks like:
│ └── Pilot in urban area with perfect connectivity
├── Why it's dangerous:
│ └── Production will include poor-connectivity areas
├── How to avoid:
│ └── Include challenging environments in pilot
└── Reality check: Rural, offline, low-bandwidth testing essential
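Selection bias (Mistake 1) can be caught with a simple representativeness check: compare the demographic shares in the pilot to those in the target population. The shares below are invented for illustration.

```python
# Largest absolute gap between pilot and target-population shares.
population = {"urban": 0.55, "rural": 0.45}   # target market
pilot      = {"urban": 0.92, "rural": 0.08}   # actual pilot enrolment

def max_deviation(pilot: dict, population: dict) -> float:
    return max(abs(pilot[g] - population[g]) for g in population)

dev = max_deviation(pilot, population)
print(f"max deviation: {dev:.0%}")  # rural users badly underrepresented
```

A 37-point gap on any segment is a red flag: whatever the pilot measures, it is not measuring the market. The same check works across income bands and digital-literacy levels, not just geography.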
HONEST PILOT DESIGN PRINCIPLES
PRINCIPLE 1: DESIGN FOR DISCONFIRMATION
├── Ask: "What would prove CBDC won't work?"
├── Include: Difficult user segments
├── Measure: Failure indicators, not just success
└── Accept: Negative findings as valuable learning
PRINCIPLE 2: REPRESENTATIVE PARTICIPATION
├── Include: Target demographics, not just easy ones
├── Geographic: Mix of urban, suburban, rural
├── Economic: Range of income levels
└── Technical: Range of digital literacy
PRINCIPLE 3: REALISTIC CONDITIONS
├── Incentives: No more than production plan
├── Support: Scalable model, not white-glove
├── Merchant network: Organic growth, not forced
└── Infrastructure: Real-world conditions
PRINCIPLE 4: LONG ENOUGH TO LEARN
├── Duration: 15-21 months minimum
├── Retention: Must observe 90-day behavior
├── Seasonality: Ideally full year
└── Iteration: Time for improvements
PRINCIPLE 5: TRANSPARENT REPORTING
├── Report: All metrics, not selected ones
├── Acknowledge: Problems and failures
├── Publish: Methodology and limitations
└── Enable: External scrutiny
PRINCIPLE 6: WILLINGNESS TO STOP
├── Pre-define: Termination criteria
├── Commit: To honest assessment
├── Accept: That stopping is success (avoiding worse failure)
└── Prepare: For either outcome
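Principle 6 asks for termination criteria defined in advance. One way to honor that commitment is to pre-register the criteria as data, so gate reviews check results against thresholds fixed before any results existed. The specific thresholds below are illustrative, not recommendations.

```python
# Pre-registered termination criteria: (metric, trip condition, reason).
TERMINATION_CRITERIA = [
    ("retention_90d", lambda v: v < 0.20,
     "90-day retention below 20% after open pilot"),
    ("nps", lambda v: v < 0,
     "negative NPS sustained across monthly surveys"),
    ("critical_incidents_open", lambda v: v > 0,
     "unresolved critical security incident"),
]

def should_terminate(metrics: dict) -> list[str]:
    """Return the termination reasons triggered by current metrics."""
    return [reason for key, trips, reason in TERMINATION_CRITERIA
            if key in metrics and trips(metrics[key])]

print(should_terminate({"retention_90d": 0.15, "nps": 12,
                        "critical_incidents_open": 0}))
```

Writing the criteria down this way makes sunk-cost reasoning harder: the review either shows a tripped criterion or it does not.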
PILOT SUPPORT STRUCTURE
TIER 1: SELF-SERVICE
├── In-app help and FAQ
├── Chatbot assistance
├── Community forums
├── Video tutorials
└── Target: 60-70% of issues
TIER 2: BASIC SUPPORT
├── Email/ticket support
├── Chat support (business hours)
├── Response: 24 hours
├── Resolution: 48-72 hours
└── Target: 25-30% of issues
TIER 3: ESCALATED SUPPORT
├── Dedicated support agents
├── Phone support for complex issues
├── Response: 4 hours
├── Resolution: 24 hours
└── Target: 5-10% of issues
TIER 4: CRITICAL ISSUES
├── Incident response team
├── Technical specialists
├── Response: 1 hour
├── Resolution: As needed
└── Target: <1% of issues
SUPPORT METRICS:
├── First response time
├── Resolution time
├── Customer satisfaction
├── Issue categorization
└── Trend analysis
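The tier structure above implies a routing rule plus a first-response SLA per tier. A sketch, with an assumed routing rule chosen for illustration (the tier SLAs mirror the text):

```python
# Route a ticket to a support tier and look up its first-response SLA.
from datetime import timedelta

TIERS = {
    1: {"name": "self-service", "response": None},           # no SLA
    2: {"name": "basic",        "response": timedelta(hours=24)},
    3: {"name": "escalated",    "response": timedelta(hours=4)},
    4: {"name": "critical",     "response": timedelta(hours=1)},
}

def route(severity: str, self_service_resolved: bool) -> int:
    """Assumed routing rule: critical issues jump straight to tier 4,
    resolved self-service stays at tier 1, the rest split by severity."""
    if severity == "critical":
        return 4
    if self_service_resolved:
        return 1
    return 3 if severity == "high" else 2

tier = route("high", self_service_resolved=False)
print(tier, TIERS[tier]["response"])
```

Instrumenting the router also yields the tier-mix percentages the text targets (60-70% tier 1, and so on) for free: if tier 1 is absorbing far less than 60% of issues, the self-service content is failing.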
PILOT FEEDBACK SYSTEMS
CONTINUOUS FEEDBACK:
├── In-app feedback button
├── Post-transaction rating
├── Support ticket analysis
├── App store reviews
└── Social media monitoring
PERIODIC SURVEYS:
├── Onboarding experience (Day 1)
├── Early usage (Day 7)
├── Engagement survey (Day 30)
├── Retention survey (Day 90)
├── NPS survey (Monthly)
└── Exit survey (For churned users)
QUALITATIVE RESEARCH:
├── Focus groups (Monthly)
├── User interviews (Ongoing)
├── Usability testing (Each release)
├── Merchant interviews (Monthly)
└── Ethnographic observation (Quarterly)
BEHAVIORAL ANALYTICS:
├── User journey analysis
├── Feature usage heatmaps
├── Drop-off point identification
├── Session recordings (anonymized)
└── Cohort analysis
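Drop-off point identification, listed under behavioral analytics above, reduces to computing step-to-step conversion through the onboarding funnel. The step names and counts below are invented sample data:

```python
# Conversion between consecutive funnel steps; the smallest ratio
# marks the worst drop-off point.
funnel = [                   # ordered steps, users reaching each
    ("downloaded", 1000),
    ("registered", 640),
    ("verified_id", 310),
    ("first_transaction", 180),
]

def drop_offs(funnel):
    return [(a[0], b[0], b[1] / a[1])
            for a, b in zip(funnel, funnel[1:])]

for src, dst, rate in drop_offs(funnel):
    print(f"{src} -> {dst}: {rate:.0%}")
# In this sample the registered -> verified_id step converts worst (48%),
# which would point at identity-verification friction.
```

Run on real pilot data, the worst step tells you where onboarding friction concentrates, which feeds directly into the activation-rate diagnosis from the metrics section.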
✅ Pilot duration matters: Pilots under 12 months cannot measure meaningful retention. Rushed pilots produce unreliable data.
✅ Vanity metrics mislead: "Wallets created" tells you nothing about adoption success. Focus on activation, retention, and frequency.
✅ Selection bias distorts results: Pilots with only early adopters don't predict mainstream behavior.
⚠️ Whether any pilot predicts retail CBDC success: Even well-designed pilots may not predict production adoption, especially for behavior change challenges.
⚠️ Optimal pilot scale: Whether 10K or 100K users provides better signal is context-dependent.
🔴 Confirmation bias design: Designing pilots to prove success rather than enable honest learning.
🔴 Incentive-driven adoption: Heavy incentives during pilot that won't continue in production.
🔴 Proceeding despite warning signs: Ignoring retention or NPS problems because of sunk costs.
Pilots should generate information that enables better decisions—including the decision to stop. A pilot that reveals fatal problems is successful if it prevents a larger failure. Design pilots for learning, not for proving.
Assignment: Design a complete pilot program for a retail CBDC implementation.
Requirements:
Key questions the pilot must answer
Technology, UX, operational, and business case objectives
What would cause you to recommend stopping
Geographic scope and rationale
User selection criteria and target numbers per phase
Transaction scope (what's enabled when)
Value limits and rationale
Metrics for each objective category
Targets for each metric (with rationale)
Measurement methodology
Reporting frequency and format
Three phases with duration, participants, focus
Phase gate criteria (proceed/extend/pivot/stop)
Go/no-go decision process
Top 5 pilot risks
Mitigation strategies
Warning signs to monitor
Grading criteria:
Objective clarity (20%)
Scope appropriateness (20%)
Metric sophistication (25%)
Phase structure completeness (20%)
Risk awareness (15%)
Time investment: 3-4 hours
Q1: What is the primary purpose of a CBDC pilot?
A) Prove CBDC works B) Generate positive press C) Learn and enable honest decisions D) Acquire users
Answer: C
Q2: Which is a vanity metric?
A) 30-day retention rate B) Wallets created C) Transaction frequency per user D) NPS score
Answer: B
Q3: Minimum recommended total pilot duration?
A) 3 months B) 6 months C) 15-21 months D) 36 months
Answer: C
Q4: What activation rate target indicates healthy pilot?
A) 10% B) 25% C) 50%+ D) 90%
Answer: C
Q5: When should termination be considered a successful pilot outcome?
A) Never B) Only if budget runs out C) When it prevents a larger production failure D) Only after 3 years
Answer: C
End of Lesson 8
Total words: ~4,200
Estimated completion time: 55 minutes reading + 3-4 hours for deliverable
Key Takeaways
Pilots are for learning, not proving: Design to discover problems, not confirm success.
Measure what matters: Activation rate, retention, transaction frequency, and NPS—not wallets created or total volume.
Duration must be adequate: 15-21 months minimum across three phases. 90-day retention requires 90+ days of observation.
Include diverse users: Representative demographics including target segments, not just early adopters.
Termination is a valid outcome: Define exit criteria upfront. Stopping is success if it prevents a larger failure.