Operational Risk Management: Business Continuity and Disaster Recovery
Learning Objectives
Assess operational risks in custody arrangements
Evaluate custodian business continuity capabilities
Develop institutional contingency plans
Design custodian transition procedures
Manage key person and operational dependencies
No system is perfect. Custodians can fail, people can leave, systems can break, and natural disasters happen. Operational risk management means preparing for these scenarios on the assumption that they will occur: the question is when, not if.
This lesson provides frameworks for building resilience into your custody operations.
CUSTODY OPERATIONAL RISKS:
Custodian Risks:
- Custodian insolvency
- Custodian operational failure
- Custodian security breach
- Custodian regulatory action
- Custodian service degradation
Technology Risks:
- System outages
- Data loss
- Cyber attacks
- Integration failures
- Software bugs
People Risks:
- Key person departure
- Institutional knowledge loss
- Fraud or misconduct
- Skills gaps
- Succession failures
Process Risks:
- Transaction errors
- Settlement failures
- Authorization breakdowns
- Reconciliation failures
- Communication failures
External Risks:
- Natural disasters
- Political instability
- Regulatory changes
- Market disruption
- Counterparty failures
RISK ASSESSMENT MATRIX:
1. Probability (1-5)
2. Impact (1-5)
3. Detectability (1-5)
4. Current Controls
5. Residual Risk Score
6. Mitigation Actions
PROBABILITY SCALE:
1 - Rare (<1% annually)
2 - Unlikely (1-10%)
3 - Possible (10-25%)
4 - Likely (25-50%)
5 - Almost Certain (>50%)
IMPACT SCALE:
1 - Negligible (<$10K, minor operational)
2 - Minor ($10K-$100K, recoverable)
3 - Moderate ($100K-$1M, significant)
4 - Major ($1M-$10M, severe)
5 - Catastrophic (>$10M, existential)
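The two scales above can be applied mechanically once an annual probability and a loss estimate are available. The following is a minimal sketch, not a prescribed method: the function names are hypothetical, and because the published bands share their boundary values (for example, 10% appears in both "Unlikely" and "Possible"), the code assumes a boundary value falls into the higher band, except where the top band is defined as strictly greater than the limit.

```python
def probability_score(annual_probability: float) -> int:
    """Map an annual probability (0.0-1.0) to the 1-5 probability scale."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if annual_probability < 0.01:
        return 1  # Rare
    if annual_probability < 0.10:
        return 2  # Unlikely
    if annual_probability < 0.25:
        return 3  # Possible
    if annual_probability <= 0.50:
        return 4  # Likely (50% stays here; level 5 is strictly > 50%)
    return 5      # Almost Certain


def impact_score(loss_usd: float) -> int:
    """Map an estimated loss in USD to the 1-5 impact scale."""
    if loss_usd < 0:
        raise ValueError("loss_usd must not be negative")
    if loss_usd < 10_000:
        return 1  # Negligible
    if loss_usd < 100_000:
        return 2  # Minor
    if loss_usd < 1_000_000:
        return 3  # Moderate
    if loss_usd <= 10_000_000:
        return 4  # Major ($10M stays here; level 5 is strictly > $10M)
    return 5      # Catastrophic


print(probability_score(0.05), impact_score(25_000_000))
```

Whichever boundary convention is chosen, applying it consistently across the register matters more than the convention itself.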
SAMPLE RISK ASSESSMENT:
Risk: Custodian Insolvency
Probability: 2 (Unlikely)
Impact: 5 (Catastrophic)
Detectability: 3 (Some warning signs)
Risk Score: 2 × 5 = 10
Controls: Diversification, monitoring, contractual protections
Residual: Medium-High
Mitigations: Multi-custodian, financial monitoring
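The sample assessment multiplies probability by impact to get the risk score. A short sketch of that arithmetic follows, using a hypothetical `RiskEntry` class; the second register entry and its scores are invented purely for illustration. Detectability is recorded but, as in the sample, does not enter the score.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One row of a risk register (hypothetical structure)."""
    name: str
    probability: int    # 1-5 scale
    impact: int         # 1-5 scale
    detectability: int  # 1-5 scale, recorded but not used in the score

    def __post_init__(self) -> None:
        for field_name in ("probability", "impact", "detectability"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5")

    @property
    def score(self) -> int:
        # Risk score as in the sample assessment: probability x impact
        return self.probability * self.impact


register = [
    RiskEntry("Partial system outage", 3, 2, 2),  # illustrative values only
    RiskEntry("Custodian insolvency", 2, 5, 3),   # the sample assessment
]

# Rank the register so the highest-scoring risks are reviewed first.
ranked = sorted(register, key=lambda entry: entry.score, reverse=True)
for entry in ranked:
    print(f"{entry.name}: {entry.score}")
```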
```
RISK PRIORITIZATION:
High Priority:
- Custodian security breach with asset loss
- Key person departure with sole access
- Total custodian system failure
- Regulatory action blocking access
Action: Immediate mitigation required
Review: Monthly
Medium Priority:
- Custodian operational degradation
- Technology integration failures
- Transaction processing errors
- Partial system outages
Action: Mitigation plan within 90 days
Review: Quarterly
Low Priority:
- Minor reconciliation differences
- Administrative errors
- Temporary service issues
- Documentation gaps
Action: Monitor, address in normal course
Review: Annually
RISK REGISTER:
Maintain register including:
□ Risk identification
□ Assessment scores
□ Control description
□ Residual risk
□ Mitigation status
□ Owner
□ Review date
---
CUSTODIAN BCP/DR ASSESSMENT:
DOCUMENTATION REQUEST:
□ Business Continuity Plan summary
□ Disaster Recovery Plan summary
□ Recovery Time Objectives (RTOs)
□ Recovery Point Objectives (RPOs)
□ Testing schedule and results
□ Incident history
KEY EVALUATION AREAS:
Infrastructure:
- Geographic diversity of systems?
- Data center redundancy?
- Network redundancy?
- Power backup systems?
Good Indicators:
✅ Multiple data centers
✅ Geographic distribution
✅ Redundant networks
✅ Generator backup
Data Backup and Recovery:
- Backup frequency?
- Backup location diversity?
- Encryption of backups?
- Recovery testing?
Good Indicators:
✅ Real-time replication
✅ Geographically distributed
✅ Encrypted at rest
✅ Regular restoration tests
People and Facilities:
- Alternate work sites?
- Remote work capability?
- Cross-training of staff?
- Succession planning?
Good Indicators:
✅ Multiple operational sites
✅ Proven remote capability
✅ Role redundancy
✅ Documented succession
```
RECOVERY TIME OBJECTIVES (RTO):
Definition: Maximum acceptable time to restore service after disruption
Custody RTO Expectations:
Critical Systems (Trading/Withdrawals):
Target: < 4 hours
Rationale: Market access, liquidity
Core Systems (Reporting/Access):
Target: < 24 hours
Rationale: Operational continuity
Support Systems (Analytics/Optimization):
Target: < 72 hours
Rationale: Non-critical to operations
RECOVERY POINT OBJECTIVES (RPO):
Definition: Maximum acceptable data loss, measured in time
Custody RPO Expectations:
Transaction Data:
Target: 0 (no data loss)
Method: Real-time replication
Position Data:
Target: < 1 hour
Method: Frequent snapshots
Historical Data:
Target: < 24 hours
Method: Daily backups
EVALUATION:
- What are your RTOs for custody services?
- What are your RPOs for transaction data?
- When were objectives last tested?
- What were the test results?
- Have the recovery plans ever been invoked in a real incident, and were the objectives met?
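Once a custodian answers these questions, its stated objectives can be compared against the expectations above. The sketch below assumes hypothetical tier names (`critical`, `core`, `support`, and so on) and treats an unanswered tier as a gap; it reads "< 4 hours" as strictly less than the limit and a target of 0 as requiring exactly zero data loss.

```python
from datetime import timedelta

# Expectations from this lesson; the dictionary keys are hypothetical labels.
RTO_TARGETS = {
    "critical": timedelta(hours=4),
    "core": timedelta(hours=24),
    "support": timedelta(hours=72),
}
RPO_TARGETS = {
    "transaction": timedelta(0),
    "position": timedelta(hours=1),
    "historical": timedelta(hours=24),
}


def meets_target(reported: timedelta, limit: timedelta) -> bool:
    """A zero limit means no loss at all; other limits are strict upper bounds."""
    if limit == timedelta(0):
        return reported == timedelta(0)
    return reported < limit


def find_gaps(reported: dict[str, timedelta],
              targets: dict[str, timedelta]) -> list[str]:
    """Return the tiers where the custodian's figure is missing or too slow."""
    gaps = []
    for tier, limit in targets.items():
        value = reported.get(tier)
        if value is None or not meets_target(value, limit):
            gaps.append(tier)
    return gaps


# Example: the custodian reports 2h for critical systems, 24h for core systems,
# and gives no figure for support systems.
custodian_rto = {"critical": timedelta(hours=2), "core": timedelta(hours=24)}
print(find_gaps(custodian_rto, RTO_TARGETS))
```

A reported figure is only a claim; the test results asked for in the questions above are what show whether it holds.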
BCP/DR TESTING EVALUATION:
TESTING TYPES:
Tabletop Exercise:
- Discussion-based
- Scenario walkthrough
- Identifies gaps
- Minimum acceptable
Functional Test:
- Specific component testing
- Recovery procedure validation
- Integration point testing
- Better assurance
Full Simulation:
- Complete DR invocation
- All systems activated
- Real-time processing
- Best assurance
EVALUATION QUESTIONS:
□ What types of tests are performed?
□ How frequently?
□ What was tested in most recent test?
□ What were the results?
□ Were any gaps identified?
□ How were gaps remediated?
□ Is there independent verification?
GOOD INDICATORS:
✅ Annual full simulation
✅ Quarterly functional tests
✅ Results documented
✅ Gaps remediated
✅ Third-party verification
CONCERNS:
⚠️ Tabletop only
⚠️ Infrequent testing
⚠️ Unresolved gaps
⚠️ No documentation
⚠️ Never tested for real
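The good indicators above can be turned into a simple checklist evaluation. This is a sketch under stated assumptions: the `CustodianTestProgram` fields are hypothetical, "annual" is read as at least one full simulation per year, and "quarterly" as at least four functional tests per year.

```python
from dataclasses import dataclass


@dataclass
class CustodianTestProgram:
    """Facts gathered from a custodian's answers (hypothetical structure)."""
    full_simulations_per_year: float
    functional_tests_per_year: float
    results_documented: bool
    open_gaps: int
    third_party_verified: bool


def unmet_indicators(program: CustodianTestProgram) -> list[str]:
    """Return the good indicators from this lesson that the program misses."""
    checks = [
        ("Annual full simulation", program.full_simulations_per_year >= 1),
        ("Quarterly functional tests", program.functional_tests_per_year >= 4),
        ("Results documented", program.results_documented),
        ("Gaps remediated", program.open_gaps == 0),
        ("Third-party verification", program.third_party_verified),
    ]
    return [label for label, satisfied in checks if not satisfied]


# Example: tabletop-heavy program with two functional tests a year and one open gap.
program = CustodianTestProgram(
    full_simulations_per_year=0,
    functional_tests_per_year=2,
    results_documented=True,
    open_gaps=1,
    third_party_verified=False,
)
for label in unmet_indicators(program):
    print("Missing:", label)
```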
---
CUSTODIAN TRANSITION PLAN:
PURPOSE:
Enable orderly transition from one custodian
to another under various scenarios
SCENARIOS:
Planned Transition:
- Service quality issues
- Cost optimization
- Strategic change
Timeline: 60-120 days
Urgent Transition:
- Material service failure
- Regulatory action
- Financial distress signs
Timeline: 30-60 days
Emergency Transition:
- Custodian failure
- Asset security concern
- Regulatory mandate
Timeline: As fast as possible
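The timelines above translate directly into a target completion window once a trigger date is known. A minimal sketch follows; the scenario keys `planned`, `urgent`, and `emergency` are hypothetical labels for the three timeline groups, and the emergency case returns no window because its only target is "as fast as possible".

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# (earliest, latest) completion in days after the trigger; None = no fixed window.
TIMELINES = {
    "planned": (60, 120),
    "urgent": (30, 60),
    "emergency": None,
}


def completion_window(scenario: str, trigger: date) -> Optional[Tuple[date, date]]:
    """Return the (earliest, latest) target completion dates for a transition."""
    if scenario not in TIMELINES:
        raise ValueError(f"unknown scenario: {scenario}")
    days = TIMELINES[scenario]
    if days is None:
        return None  # emergency: complete as fast as possible
    earliest, latest = days
    return trigger + timedelta(days=earliest), trigger + timedelta(days=latest)


print(completion_window("urgent", date(2024, 1, 1)))
```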
TRANSITION PLANNING ELEMENTS:
Alternative Custodian Identification
Asset Inventory
Transfer Procedures
Operational Transition
BACKUP CUSTODIAN STRATEGY:
OPTION 1: COLD STANDBY
Description: Pre-qualified but not active
Status:
- Due diligence completed
- Documentation prepared
- Relationship established
- Account not opened
Advantages:
- Lower ongoing cost
- Flexibility
- Multiple options possible
Disadvantages:
- Slower activation
- Integration not tested
- Relationship untested
OPTION 2: WARM STANDBY
Description: Active but minimal use
Status:
- Account opened
- Small position maintained
- Integration tested
- Operational relationship
Advantages:
- Faster transition
- Tested processes
- Active relationship
Disadvantages:
- Ongoing costs
- Operational overhead
- Multiple relationships to manage
OPTION 3: ACTIVE DIVERSIFICATION
Description: Multiple custodians active
Status:
- Multiple active relationships
- Positions distributed
- Full integration
- Ongoing operations
Advantages:
- Immediate resilience
- No transition needed
- Continuous validation
Disadvantages:
- Highest cost
- Operational complexity
- Reconciliation challenges
RECOMMENDATION:
- Warm standby as minimum
- Active diversification if scale permits
- Cold standby only for cost-constrained institutions
TRANSITION EXECUTION:
PRE-TRANSITION (BEFORE TRIGGER):
Standing Preparation:
□ Backup custodian identified
□ Documentation current
□ Procedures documented
□ Team trained
□ Authority delegated
Monitoring for Triggers:
□ Service degradation tracking
□ Financial news monitoring
□ Regulatory action alerts
□ Industry intelligence
TRANSITION INITIATION:
Decision Point:
□ Trigger event identified
□ Assessment completed
□ Decision documented
□ Authority approval obtained
Immediate Actions:
□ Notify backup custodian
□ Activate account (if not active)
□ Freeze new deposits to primary (if appropriate)
□ Prepare transfer instructions
EXECUTION:
Transfer Process:
□ Submit withdrawal requests
□ Verify delivery addresses
□ Monitor transfer status
□ Confirm receipt at new custodian
□ Reconcile balances
System Transition:
□ Update system configurations
□ Test integrations
□ Verify reporting
□ Update documentation
POST-TRANSITION:
Verification:
□ Complete reconciliation
□ Verify all assets transferred
□ Confirm no assets left behind
□ Update records
Administrative:
□ Close old account (when appropriate)
□ Final fee settlement
□ Record retention
□ Lessons learned documentation
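The reconciliation steps in the checklist above amount to comparing what left the old custodian with what arrived at the new one, asset by asset. The following sketch assumes balances are available as simple mappings; the asset identifiers are placeholders, and `Decimal` is used so that fractional balances compare exactly.

```python
from decimal import Decimal


def reconcile(expected: dict[str, Decimal],
              received: dict[str, Decimal]) -> dict[str, Decimal]:
    """Return per-asset differences (received minus expected), omitting matches.

    An asset missing from either side is treated as a zero balance, so assets
    left behind or received unexpectedly both show up as differences.
    """
    differences = {}
    for asset in sorted(set(expected) | set(received)):
        delta = received.get(asset, Decimal("0")) - expected.get(asset, Decimal("0"))
        if delta != 0:
            differences[asset] = delta
    return differences


# Placeholder asset identifiers and balances, for illustration only.
sent_from_old = {"ASSET_A": Decimal("12.5"), "ASSET_B": Decimal("300")}
held_at_new = {"ASSET_A": Decimal("12.5"), "ASSET_B": Decimal("299.5")}

print(reconcile(sent_from_old, held_at_new))
```

An empty result supports the "no assets left behind" check; any non-empty result needs an explanation (for example, transfer fees) before the old account is closed.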
---
KEY PERSON RISK ASSESSMENT:
IDENTIFICATION:
- Custody relationship owner
- Primary operational contact
- Technical integration owner
- Compliance oversight
- Executive sponsor
For Each Key Person:
□ Role criticality
□ Knowledge uniqueness
□ Authority level
□ Backup identified
□ Documentation status
ASSESSMENT MATRIX:
Person/Role     Criticality   Backup    Documentation   Risk
────────────────────────────────────────────────────────────
CCO             High          Partial   Good            Med
Ops Manager     High          Yes       Good            Low
Tech Lead       High          No        Limited         High
Relationship    Medium        Yes       Good            Low
HIGH RISK INDICATORS:
🚩 No backup identified
🚩 Limited documentation
🚩 Unique knowledge
🚩 Single point of failure
🚩 Long tenure without succession
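The lesson does not state how the Risk column in the assessment matrix is derived. One rule that happens to reproduce the four sample rows is sketched below; it is an assumption for illustration, not the author's method. It rates a role High when no backup exists, Med when the backup is partial or the documentation is less than good, and Low otherwise. Criticality is ignored because every sample row is High or Medium.

```python
def key_person_risk(backup: str, documentation: str) -> str:
    """Rate key person risk from backup coverage and documentation quality.

    Hypothetical rule chosen to match the sample matrix in this lesson.
    """
    if backup not in {"Yes", "Partial", "No"}:
        raise ValueError(f"unexpected backup value: {backup}")
    if backup == "No":
        return "High"  # single point of failure
    if backup == "Partial" or documentation != "Good":
        return "Med"
    return "Low"


# (backup, documentation) for each role in the sample matrix
roles = {
    "CCO": ("Partial", "Good"),
    "Ops Manager": ("Yes", "Good"),
    "Tech Lead": ("No", "Limited"),
    "Relationship": ("Yes", "Good"),
}

ratings = {role: key_person_risk(*facts) for role, facts in roles.items()}
for role, rating in ratings.items():
    print(f"{role}: {rating}")
```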
KEY PERSON RISK MITIGATION:
Redundancy:
- Cross-train team members
- Assign backup for each role
- Rotate responsibilities
- Share relationships
Implementation:
□ Identify backup for each key role
□ Create cross-training plan
□ Include backup in key meetings
□ Document handover procedures
Documentation:
- Procedural documentation
- Relationship documentation
- System access documentation
- Decision rationale documentation
Implementation:
□ Document all key procedures
□ Maintain relationship logs
□ Secure credential management
□ Regular documentation review
Knowledge Transfer:
- Regular team updates
- Written briefings
- Training sessions
- Institutional memory preservation
Implementation:
□ Monthly knowledge sharing
□ Quarterly team updates
□ Annual procedure review
□ Exit documentation requirement
Succession Planning:
- Succession planning
- Career development
- Retention strategies
- Notice period requirements
Implementation:
□ Succession plan for key roles
□ Retention incentives
□ Notice period enforcement
□ Staged transitions
DEPENDENCY MANAGEMENT:
Internal Dependencies:
- Systems requiring custody data
- Processes depending on custody
- Reports derived from custody
- Stakeholders requiring access
External Dependencies:
- Custodian systems
- Third-party integrations
- Market infrastructure
- Communication channels
DEPENDENCY MAPPING:
Steps:
- Identify dependencies
- Assess single points of failure
- Document contingencies
- Test alternatives
Example: NAV Calculation
Dependencies:
- Custodian position feed
- Pricing source
- Calculation system
- Reporting platform
Single Points of Failure:
- Custodian feed (if only source)
- Pricing source (if single)
Contingencies:
- Manual position entry backup
- Alternative pricing source
- Spreadsheet calculation backup
- Manual reporting capability
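The NAV example can be expressed as a small dependency map in which any dependency with only one available provider is flagged as a single point of failure. This is a sketch only: the provider names are hypothetical, and a real map would also record who owns each contingency and when it was last tested.

```python
def single_points_of_failure(dependencies: dict[str, list[str]]) -> list[str]:
    """Return dependencies that have at most one provider available."""
    return [name for name, providers in dependencies.items() if len(providers) <= 1]


# NAV calculation dependencies before any contingencies are in place.
# Provider names are placeholders.
nav_dependencies = {
    "Custodian position feed": ["Custodian feed"],
    "Pricing source": ["Pricing vendor"],
    "Calculation system": ["Primary system", "Spreadsheet calculation backup"],
    "Reporting platform": ["Primary platform", "Manual reporting"],
}

print(single_points_of_failure(nav_dependencies))

# Adding the contingencies from the example removes the remaining flags.
nav_dependencies["Custodian position feed"].append("Manual position entry")
nav_dependencies["Pricing source"].append("Alternative pricing source")
print(single_points_of_failure(nav_dependencies))
```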
CONTINGENCY TESTING:
TEST TYPES:
Tabletop Exercise:
Frequency: Semi-annual
Participants: Key stakeholders
Scope: Scenario walkthrough
Output: Gap identification
Functional Test:
Frequency: Annual
Participants: Operations + backup
Scope: Specific procedures
Output: Procedure validation
Full Simulation:
Frequency: Every two years (biennial)
Participants: All relevant staff
Scope: End-to-end scenario
Output: Full readiness assessment
TEST SCENARIOS:
Custodian Outage:
- 24-hour system unavailability
- No transaction processing
- No position reporting
Custodian Financial Distress:
- News of financial distress
- Withdrawal concerns
- Transition consideration
Key Person Departure:
- Primary contact unavailable
- No notice period
- Immediate absence
Security Incident:
- Suspected breach at custodian
- Asset security uncertain
- Immediate response needed
PLAN MAINTENANCE:
REVIEW SCHEDULE:
Quarterly:
□ Contact information updates
□ Procedure minor updates
□ Lessons-learned incorporation
□ Test result review
Annually:
□ Full plan review
□ Scenario updates
□ Dependency reassessment
□ Backup custodian validation
Event-Driven:
□ After any incident
□ Organizational changes
□ Custodian changes
□ Regulatory requirements
DOCUMENTATION UPDATES:
After Each Review:
□ Update version number
□ Document changes
□ Distribute updates
□ Archive prior version
□ Update training materials
□ Communicate to stakeholders
CONTINUOUS IMPROVEMENT:
Inputs:
- Test results
- Actual incidents
- Near misses
- Industry learnings
- Regulatory guidance
Process:
- Identify improvement opportunity
- Assess feasibility
- Implement change
- Test effectiveness
- Document and train
✅ Contingency planning reduces incident impact - Prepared organizations recover faster
✅ Diversification reduces custodian concentration risk - Multiple providers enhance resilience
✅ Testing validates plans - Untested plans fail in real incidents
✅ Documentation enables continuity - Knowledge captured survives personnel changes
⚠️ Optimal diversification level - Balance between resilience and complexity
⚠️ Transition timeline in crisis - Real-world constraints may differ from plans
⚠️ Test scenario adequacy - Unknown unknowns remain
⚠️ Plan maintenance discipline - Easy to deprioritize
📌 Plans that exist only on paper - Untested, unstaffed, unfunded
📌 Assuming custodian resilience - Single point of failure regardless of their capabilities
📌 Key person dependencies without mitigation - One departure away from crisis
📌 Maintenance neglect - Outdated plans can be worse than no plans because they create false confidence
Operational resilience requires sustained investment—not just in planning, but in testing, maintaining, and actually staffing contingencies. Most institutions have plans; fewer have truly tested and operational contingencies.
Assignment: Develop a custody contingency plan for your institution.
- Part 1: Risk Assessment (1.5 pages)
- Part 2: Custodian Transition Plan (1.5 pages)
- Part 3: Key Person Mitigation (1 page)
- Part 4: Testing Program (1 page)
Format: Professional contingency plan, 5 pages maximum
Time Investment: 4-5 hours
1. What is the primary purpose of a backup custodian strategy?
Answer: B - Enable orderly transition if primary custodian fails or becomes unsuitable
2. What is a Recovery Time Objective (RTO)?
Answer: C - Maximum acceptable time to restore service after disruption
3. Why is contingency testing important?
Answer: A - Untested plans often fail when actually needed
4. How should key person risk be mitigated?
Answer: D - Redundancy, documentation, knowledge transfer, and succession planning
5. How often should contingency plans be reviewed?
Answer: B - Quarterly for minor updates, annually for full review, plus event-driven
End of Lesson 14
Total Words: ~4,200
Estimated Completion Time: 55 minutes reading + 4-5 hours for deliverable
Key Takeaways
Operational risks are real and varied
- Systematic assessment required
Evaluate custodian BCP/DR critically
- Their capabilities affect your resilience
Maintain custodian transition capability
- Backup ready before you need it
Address key person risk proactively
- Redundancy, documentation, succession
Test and maintain continuously
- Plans age; continuous attention required