Advanced • 50 min

Routine Maintenance Procedures

Name: Running an XRPL Validator

Learning Objectives

Establish a maintenance schedule covering all routine validator tasks

Execute software updates safely with minimal downtime

Manage logs and database growth to prevent disk exhaustion

Apply security updates promptly while maintaining stability

Document maintenance activities for operational history

Introduction: Maintenance Prevents Emergencies

The best incident is one that never happens. Regular maintenance prevents four kinds of problems:

Disk exhaustion:
- Log rotation prevents full disks
- Database cleanup maintains free space
- Monitoring alerts you before conditions become critical

Security vulnerabilities:
- Timely updates patch vulnerabilities
- Reviews catch misconfigurations
- Audits reveal unauthorized changes

Performance degradation:
- Resource monitoring catches trends
- Cleanup prevents accumulation
- Tuning maintains efficiency

Unplanned downtime:
- Proactive updates vs. emergency patches
- Planned restarts vs. crashes
- Scheduled maintenance windows

Maintenance is scheduled, controlled change—far better than unscheduled, uncontrolled failures.

---

Section 1: Maintenance Schedule Framework

Maintenance Types:

Daily:
- Health check review
- Alert response
- Quick status verification

Weekly:
- Detailed health audit
- Log review
- Resource trend analysis
- Security update check

Monthly:
- Comprehensive system audit
- Performance analysis
- Backup verification
- Documentation update

Quarterly:
- Capacity planning
- Security review
- Procedure testing
- Long-term trend analysis

As needed:
- Software updates (when released)
- Security patches (urgent)
- Incident response
- Configuration changes

Monthly Maintenance Calendar:

Week 1:
- Monday: Weekly health audit
- Ongoing: Daily monitoring

Week 2:
- Monday: Weekly health audit
- Wednesday: Security update review
- Ongoing: Daily monitoring

Week 3:
- Monday: Weekly health audit
- Friday: Monthly comprehensive audit
- Ongoing: Daily monitoring

Week 4:
- Monday: Weekly health audit
- Thursday: Backup verification
- Friday: Documentation update
- Ongoing: Daily monitoring
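Standard cron cannot directly express "the Friday of week 3". When both the day-of-month and day-of-week fields are restricted, cron runs the job when either one matches. A common workaround is a weekly cron job that checks the week number itself. Below is a minimal sketch. It assumes GNU `date` and follows the calendar's convention that days 1-7 are week 1, days 8-14 are week 2, and so on. `week_of_month` is a hypothetical helper name.

```bash
# week_of_month [DATE] -> prints 1-5 for DATE (default: today)
# Days 1-7 -> week 1, 8-14 -> week 2, ... (hypothetical helper, assumes GNU date)
week_of_month() {
  local day
  day=$(date -d "${1:-today}" +%d) || return 1
  # 10# forces base 10 so "08" and "09" are not parsed as invalid octal
  echo $(( (10#$day - 1) / 7 + 1 ))
}
```

For example, a script run every Friday could start with `[ "$(week_of_month)" -eq 3 ] || exit 0`, so the monthly audit only runs in week 3. Months with 29 or more days have a partial week 5, which the four-week calendar above leaves free.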

Scheduling Maintenance:

Best times:
- Low-traffic periods (varies by use case)
- Hours when you are awake and available
- Not during major network events

Avoid:
- Peak usage times
- Times when you can't monitor the aftermath
- Multiple changes at once
- The period right before vacations or other unavailability

For validators:
- The network doesn't have a true "low traffic" period
- Choose times you can monitor closely
- Avoid times when major announcements are expected
- Have a rollback plan ready
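For a validator, the maintenance window is mostly about when you can watch what happens afterward. One option is to make maintenance scripts refuse to start outside the hours you have set aside. Below is a minimal sketch. It assumes the window starts and ends on the same day and is given in whole local hours. The function name and the 9-17 window in the usage line are illustrative.

```bash
# in_maintenance_window START END [HOUR]
# Succeeds (exit 0) when HOUR (default: current local hour) is in [START, END).
# Hypothetical guard; the window must not wrap past midnight.
in_maintenance_window() {
  local start="$1" end="$2" hour="${3:-$(date +%H)}"
  hour=$((10#$hour))   # accept "09" as well as "9"
  [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
}
```

For example, putting `in_maintenance_window 9 17 || { echo "Outside maintenance window"; exit 1; }` near the top of an update script stops it from running at 3 a.m.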

---

Section 2: Software Update Procedures

rippled Update Types:

Major releases (e.g., 1.x → 2.0):
- Significant changes
- May include breaking changes
- Thorough testing required
- Longer observation period

Minor releases (e.g., 2.0 → 2.1):
- New features, improvements
- Should be backward compatible
- Standard testing required
- Normal observation period

Patch releases (e.g., 2.1.0 → 2.1.1):
- Bug fixes
- Security patches
- Minimal testing needed
- Can be expedited if security-critical
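To decide which testing level an update needs, you can compare the installed version with the candidate version. Below is a minimal sketch. It assumes plain `MAJOR.MINOR.PATCH` versions, optionally followed by a Debian-style revision or pre-release suffix after a hyphen (e.g., `2.0.0-1`, `2.2.0-rc1`), which is stripped before comparing. `update_type` is a hypothetical helper name.

```bash
# update_type OLD NEW -> prints major, minor, patch, or none
update_type() {
  local old="${1%%-*}" new="${2%%-*}"   # drop "-1", "-rc1", etc.
  local o_maj o_min o_pat n_maj n_min n_pat
  IFS=. read -r o_maj o_min o_pat <<< "$old"
  IFS=. read -r n_maj n_min n_pat <<< "$new"
  if [ "$o_maj" != "$n_maj" ]; then
    echo major
  elif [ "$o_min" != "$n_min" ]; then
    echo minor
  elif [ "$o_pat" != "$n_pat" ]; then
    echo patch
  else
    echo none
  fi
}
```

To get the installed version, something like `/opt/ripple/bin/rippled --version | head -1 | awk '{print $NF}'` may work, assuming the first line ends with the version number. Check the output on your own server first.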

```bash
# Create update procedure script
sudo nano /opt/ripple/bin/update-rippled.sh
```

```bash
#!/bin/bash
#===============================================================================
# rippled Update Procedure
# Safe update with rollback capability
# Run it on a testnet server first; use it on mainnet only after observation
#===============================================================================

set -e # Exit on error

echo "=============================================="
echo "rippled Update Procedure"
echo "Started: $(date)"
echo "=============================================="

# Pre-flight checks
echo ""
echo "=== Pre-Update Checks ==="

# Record current version
CURRENT_VERSION=$(/opt/ripple/bin/rippled --version | head -1)
echo "Current version: $CURRENT_VERSION"

# Check current state
STATE=$(/opt/ripple/bin/rippled server_info 2>/dev/null | grep -o '"server_state" : "[^"]*"' | cut -d'"' -f4)
echo "Current state: $STATE"

if [ "$STATE" != "proposing" ]; then
    echo "WARNING: Not in proposing state. Continue? (y/n)"
    read -r response
    [ "$response" = "y" ] || exit 1
fi

# Backup configuration (/opt/ripple is owned by root, hence sudo)
echo ""
echo "=== Backing Up Configuration ==="
BACKUP_DIR="/opt/ripple/backups/$(date +%Y%m%d_%H%M%S)"
sudo mkdir -p "$BACKUP_DIR"
sudo cp /opt/ripple/etc/rippled.cfg "$BACKUP_DIR/"
sudo cp /opt/ripple/etc/validators.txt "$BACKUP_DIR/" 2>/dev/null || true
echo "Backed up to: $BACKUP_DIR"

# Stop service
echo ""
echo "=== Stopping rippled ==="
sudo systemctl stop rippled
sleep 5

# Update package
echo ""
echo "=== Updating Package ==="
sudo apt update
sudo apt install --only-upgrade rippled -y

# Start service
echo ""
echo "=== Starting rippled ==="
sudo systemctl start rippled

# Wait for startup (syncing can take longer than this; re-check the state if needed)
echo ""
echo "=== Waiting for Synchronization (120 seconds) ==="
sleep 120

# Post-update verification
echo ""
echo "=== Post-Update Verification ==="

NEW_VERSION=$(/opt/ripple/bin/rippled --version | head -1)
echo "New version: $NEW_VERSION"

NEW_STATE=$(/opt/ripple/bin/rippled server_info 2>/dev/null | grep -o '"server_state" : "[^"]*"' | cut -d'"' -f4)
echo "New state: $NEW_STATE"

PEERS=$(/opt/ripple/bin/rippled server_info 2>/dev/null | grep -o '"peers" : [0-9]*' | awk '{print $3}')
echo "Peers: $PEERS"

# Check success
if [ "$NEW_STATE" = "proposing" ] || [ "$NEW_STATE" = "full" ]; then
    echo ""
    echo "=== UPDATE SUCCESSFUL ==="
    echo "Old version: $CURRENT_VERSION"
    echo "New version: $NEW_VERSION"
    echo "State: $NEW_STATE"
else
    echo ""
    echo "=== WARNING: Not in expected state ==="
    echo "Current state: $NEW_STATE"
    echo "Consider rollback if state doesn't improve"
fi

echo ""
echo "=============================================="
echo "Update complete: $(date)"
echo "Monitor closely for next 24 hours"
echo "=============================================="
```

```bash
sudo chmod +x /opt/ripple/bin/update-rippled.sh
```
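A fixed `sleep 120` has two problems. It can report failure while the server is still catching up, and it can wait longer than necessary. A hedged alternative is to poll until the state is `full` or `proposing`, with an upper time limit. This sketch takes the state-reading command as a parameter so it can be tested with a stub. `wait_for_sync` and `rippled_state` are hypothetical names; `rippled_state` reuses the script's grep/cut parsing of `server_info`.

```bash
# wait_for_sync TIMEOUT INTERVAL FETCH
# Calls FETCH every INTERVAL seconds until it prints "full" or "proposing",
# or until TIMEOUT seconds have passed. Prints the last state seen.
# Returns 0 when synced, 1 on timeout.
wait_for_sync() {
  local timeout="$1" interval="$2" fetch="$3" waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    state=$("$fetch")
    case "$state" in
      full|proposing) echo "$state"; return 0 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "${state:-unknown}"
  return 1
}

# Default fetcher: same parsing as the update script (hypothetical helper)
rippled_state() {
  /opt/ripple/bin/rippled server_info 2>/dev/null \
    | grep -o '"server_state" : "[^"]*"' | cut -d'"' -f4
}
```

In the update script, `NEW_STATE=$(wait_for_sync 900 15 rippled_state) || echo "Not synced after 15 minutes"` could replace the fixed `sleep 120`. The 900-second limit is an arbitrary example.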

```bash
# Create rollback script
sudo nano /opt/ripple/bin/rollback-rippled.sh
```

```bash
#!/bin/bash
#===============================================================================
# rippled Rollback Procedure
# Revert to previous version if update fails
#===============================================================================

echo "=============================================="
echo "rippled Rollback Procedure"
echo "=============================================="

# Check available versions (apt can only install a version listed here)
echo ""
echo "=== Available Versions ==="
apt-cache policy rippled

echo ""
echo "Enter version to install (e.g., 2.0.0-1):"
read -r VERSION

if [ -z "$VERSION" ]; then
    echo "No version specified. Exiting."
    exit 1
fi

# Confirm
echo ""
echo "This will install rippled version: $VERSION"
echo "Continue? (y/n)"
read -r response
[ "$response" = "y" ] || exit 1

# Stop service
echo ""
echo "=== Stopping rippled ==="
sudo systemctl stop rippled

# Install specific version
# --allow-downgrades is required: with -y, apt refuses to downgrade without it
echo ""
echo "=== Installing Version $VERSION ==="
sudo apt install rippled="$VERSION" --allow-downgrades -y

# Restore configuration if needed
echo ""
echo "=== Check if configuration restore needed ==="
ls -la /opt/ripple/backups/ | tail -5
echo "Restore from backup? (y/n)"
read -r restore
if [ "$restore" = "y" ]; then
    echo "Enter backup directory name:"
    read -r backup_dir
    sudo cp "/opt/ripple/backups/$backup_dir/rippled.cfg" /opt/ripple/etc/
fi

# Start service
echo ""
echo "=== Starting rippled ==="
sudo systemctl start rippled

echo ""
echo "=== Rollback Complete ==="
echo "Monitor server state and verify operation"
```

```bash
sudo chmod +x /opt/ripple/bin/rollback-rippled.sh
```
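The rollback script asks you to type a backup directory name, which is easy to get wrong under pressure. The update script names backups with `date +%Y%m%d_%H%M%S`, and those names sort alphabetically in chronological order. A small helper can therefore pick the newest backup. Below is a minimal sketch; `latest_backup` is a hypothetical name.

```bash
# latest_backup ROOT -> prints the newest YYYYmmdd_HHMMSS directory name under ROOT
# Returns non-zero (and prints nothing) when no such directory exists.
latest_backup() {
  local root="$1" dir name latest=""
  for dir in "$root"/*/; do
    name=$(basename "$dir")
    # ignore anything not named like the update script's timestamps
    [[ "$name" =~ ^[0-9]{8}_[0-9]{6}$ ]] || continue
    [[ "$name" > "$latest" ]] && latest="$name"
  done
  [ -n "$latest" ] && echo "$latest"
}
```

Before copying anything back into `/opt/ripple/etc/`, check that the result is non-empty and that the directory contains `rippled.cfg`.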

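Section 3: Log Management

```bash
# Configure logrotate for rippled
sudo nano /etc/logrotate.d/rippled
# After saving, dry-run the rule (-d shows what would happen without rotating):
# sudo logrotate -d /etc/logrotate.d/rippled
```

```
/var/log/rippled/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 rippled rippled
    sharedscripts
    # Ask rippled to close and reopen its log file via its "logrotate" admin command
    postrotate
        /opt/ripple/bin/rippled --conf /opt/ripple/etc/rippled.cfg logrotate > /dev/null 2>&1 || true
    endscript
}
```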
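With `daily`, `rotate 14` and `delaycompress`, the steady state on disk is the active log, one uncompressed rotated file (`.1`, left uncompressed by `delaycompress`), and 13 compressed archives. Below is a small arithmetic sketch for sizing `/var/log`. The daily volume and the compression ratio are inputs you measure on your own server, not figures from this lesson. `log_footprint_mb` is a hypothetical name.

```bash
# log_footprint_mb DAILY_MB COMPRESSED_PERCENT [ROTATE]
# Estimated steady-state size in MB under daily rotation with delaycompress.
log_footprint_mb() {
  local daily="$1" pct="$2" rotate="${3:-14}"
  # active log + uncompressed .1 + (ROTATE - 1) compressed archives
  echo $(( daily + daily + (rotate - 1) * daily * pct / 100 ))
}
```

For example, `log_footprint_mb 500 10` prints 1650. That is about 1.6 GB for a log that grows 500 MB per day and compresses to 10% of its size.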

```bash
# Create log cleanup script
sudo nano /opt/ripple/bin/cleanup-logs.sh
```

```bash
#!/bin/bash
#===============================================================================
# Log Cleanup Script
# Removes old logs and maintains disk space
# Run as root (e.g., from root's crontab): it deletes files under /var/log
#===============================================================================

LOG_DIR="/var/log/rippled"
METRICS_DIR="/var/log/validator-metrics"
RETENTION_DAYS=14

echo "Log Cleanup - $(date)"
echo "========================"

# rippled logs (rotated files such as debug.log.3 and debug.log.4.gz)
echo "Cleaning rippled logs older than $RETENTION_DAYS days..."
find "$LOG_DIR" -name "*.log.*" -mtime +"$RETENTION_DAYS" -delete
find "$LOG_DIR" -name "*.gz" -mtime +"$RETENTION_DAYS" -delete

# Metrics files
echo "Cleaning metrics files older than $RETENTION_DAYS days..."
find "$METRICS_DIR" -name "*.json" -mtime +"$RETENTION_DAYS" -delete

# Health check logs
echo "Cleaning health check logs..."
find /var/log -name "validator-health*.log" -mtime +"$RETENTION_DAYS" -delete

# Journal cleanup
echo "Cleaning systemd journal..."
sudo journalctl --vacuum-time="${RETENTION_DAYS}d"

# Report disk usage
echo ""
echo "Current disk usage:"
df -h /var/log
df -h /var/lib/rippled

echo ""
echo "Cleanup complete"
```

```bash
sudo chmod +x /opt/ripple/bin/cleanup-logs.sh
```

```bash
# Schedule weekly cleanup (Sundays at 03:00)
# Add to root's crontab (sudo crontab -e):
0 3 * * 0 /opt/ripple/bin/cleanup-logs.sh >> /var/log/cleanup.log 2>&1
```
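`find ... -delete` cannot be undone, and a wrong `-name` pattern either deletes too much or silently matches nothing. Before enabling the cron job, you can list what the same name and age tests would select. Below is a minimal sketch; it adds `-type f` and replaces `-delete` with `-print`. `preview_cleanup` is a hypothetical name.

```bash
# preview_cleanup DIR PATTERN DAYS
# Lists (sorted) the files the cleanup find command would delete; deletes nothing.
preview_cleanup() {
  local dir="$1" pattern="$2" days="$3"
  find "$dir" -type f -name "$pattern" -mtime +"$days" -print | sort
}
```

Run `preview_cleanup /var/log/rippled '*.log.*' 14` as root and compare the list with what you expect.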

```bash
# Create log analysis script
sudo nano /opt/ripple/bin/analyze-logs.sh
```

```bash
#!/bin/bash
#===============================================================================
# Log Analysis Script
# Weekly review of rippled logs
#===============================================================================

LOG_DIR="/var/log/rippled"
LOG_FILE="$LOG_DIR/debug.log"

# Print the current and rotated debug logs modified in the last 7 days
# (zcat -f passes uncompressed files through unchanged)
recent_logs() {
    sudo find "$LOG_DIR" -name 'debug.log*' -mtime -7 -exec zcat -f {} +
}

echo "=============================================="
echo "Log Analysis Report - $(date)"
echo "=============================================="

# Field 4 is assumed to be the Partition:Severity tag, as in
# "2024-Mar-15 12:34:56.789 UTC LedgerMaster:ERR ..."; adjust if your lines differ
echo ""
echo "=== Error Summary (Last 7 Days) ==="
recent_logs | grep -Ei ':ERR|error' | awk '{print $4}' | sort | uniq -c | sort -rn | head -10

echo ""
echo "=== Warning Summary (Last 7 Days) ==="
recent_logs | grep -Ei ':WRN|warning' | awk '{print $4}' | sort | uniq -c | sort -rn | head -10

echo ""
echo "=== Connection Issues ==="
sudo grep -Ei "disconnect|connection" "$LOG_FILE" | tail -20

echo ""
echo "=== Validation Messages (Sample) ==="
sudo grep -i "validation" "$LOG_FILE" | tail -10

echo ""
echo "=== Log File Sizes ==="
ls -lh "$LOG_DIR"/

echo ""
echo "=============================================="
```

```bash
sudo chmod +x /opt/ripple/bin/analyze-logs.sh
```
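The analysis above selects whole files by modification time. To filter individual lines by date, you can match on each line's leading timestamp instead. This sketch assumes that lines in rippled's debug.log start with a UTC timestamp such as `2024-Mar-15 12:34:56.789...`. Check a few lines of your own log first, and adjust the `date` format if it differs. `recent_date_regex` is a hypothetical helper name.

```bash
# recent_date_regex DAYS [REFERENCE_DATE]
# Prints an extended regex matching lines that begin with any of the last DAYS
# dates (REFERENCE_DATE included, default today), formatted like 2024-Mar-15.
recent_date_regex() {
  local days="$1" ref="${2:-today}" i dates=""
  for ((i = 0; i < days; i++)); do
    # UTC because the assumed log timestamps are UTC; C locale for English month names
    dates+="${dates:+|}$(TZ=UTC LC_ALL=C date -d "$ref $i days ago" +%Y-%b-%d)"
  done
  echo "^($dates)"
}
```

For example, `sudo zcat -f /var/log/rippled/debug.log* | grep -E "$(recent_date_regex 7)" | grep -c ':ERR'` counts lines tagged as errors over the last seven days, provided your log uses the assumed format.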

Section 4: Database Maintenance

Database Growth Management:

The online_delete setting:
- Controls ledger history retention
- Higher values = more history = more disk
- Lower values = less history = less disk

Typical values (ledgers retained, at roughly 3-4 seconds per ledger):
- 256: ~15-20 minutes of history, minimal disk (the minimum allowed)
- 512: ~30 minutes, small footprint
- 2048: ~2 hours, moderate disk
- 32768: ~1-1.5 days, significant disk

For validators:
- 512-2048 is typically sufficient
- Full history is not required for validation
- Balance history needs against disk space
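The durations above follow from the ledger close interval. That interval is a few seconds and varies with network conditions. Below is a small arithmetic sketch for converting between a retention time and an `online_delete` value. The seconds-per-ledger figure is an input you choose; 3.5 is used in the examples only for illustration. Both function names are hypothetical.

```bash
# retention_hours LEDGERS SECONDS_PER_LEDGER -> approximate hours of history kept
retention_hours() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n * s / 3600 }'
}

# ledgers_for_hours HOURS SECONDS_PER_LEDGER -> online_delete value covering HOURS
ledgers_for_hours() {
  awk -v h="$1" -v s="$2" 'BEGIN {
    x = h * 3600 / s
    n = int(x); if (n < x) n++      # round up so the whole window is covered
    if (n < 256) n = 256            # 256 is the documented minimum for online_delete
    print n
  }'
}
```

For example, `ledgers_for_hours 24 3.5` prints 24686, and `retention_hours 2048 3.5` prints 2.0.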

```bash
# Create database monitoring script
sudo nano /opt/ripple/bin/monitor-database.sh
```

```bash
#!/bin/bash
#===============================================================================
# Database Space Monitoring
# Run once a day (as root) so the growth figure covers about 24 hours
#===============================================================================

DB_PATH="/var/lib/rippled/db"

echo "Database Space Report - $(date)"
echo "================================"

# Total database size
echo ""
echo "=== Database Size ==="
du -sh "$DB_PATH"

# Breakdown by directory
echo ""
echo "=== Directory Breakdown ==="
du -sh "$DB_PATH"/*

# Disk space available
echo ""
echo "=== Disk Space ==="
df -h "$DB_PATH"

# Growth rate (compare to yesterday)
# /var/run is cleared on reboot, so the baseline restarts after a reboot
TODAY_SIZE=$(du -sb "$DB_PATH" | awk '{print $1}')
YESTERDAY_FILE="/var/run/db_size_yesterday"
GROWTH=0   # default for the first run, when there is no baseline yet

if [ -f "$YESTERDAY_FILE" ]; then
    YESTERDAY_SIZE=$(cat "$YESTERDAY_FILE")
    GROWTH=$((TODAY_SIZE - YESTERDAY_SIZE))
    GROWTH_MB=$((GROWTH / 1024 / 1024))
    echo ""
    echo "=== Growth Since Yesterday ==="
    echo "Growth: ${GROWTH_MB} MB"
fi

echo "$TODAY_SIZE" > "$YESTERDAY_FILE"

# Projection (only when the database grew)
# -P keeps one line per filesystem; -k reports free space in 1024-byte blocks
DISK_FREE=$(df -Pk "$DB_PATH" | tail -1 | awk '{print $4}')
if [ "$GROWTH" -gt 0 ]; then
    DAYS_REMAINING=$((DISK_FREE * 1024 / GROWTH))
    echo ""
    echo "=== Projection ==="
    echo "At current growth rate: ~$DAYS_REMAINING days until disk full"
fi
```

```bash
sudo chmod +x /opt/ripple/bin/monitor-database.sh
```
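With online deletion enabled, database size tends to rise and then drop when old ledgers are removed. A single day-to-day difference can therefore be misleading, or even negative. Averaging over several days gives a smoother estimate. Below is a minimal sketch. It assumes you append the day's `du -sb` byte count to a history file once per day, oldest line first. The file and function names are hypothetical.

```bash
# avg_daily_growth HISTORY_FILE [DAYS]
# HISTORY_FILE holds one byte count per line, one line per day, oldest first.
# Prints the average growth per day in bytes across the last DAYS entries
# (0 if there are fewer than two entries).
avg_daily_growth() {
  local file="$1" days="${2:-7}"
  tail -n "$days" "$file" | awk '
    NR == 1 { first = $1 }
    { last = $1; count = NR }
    END {
      if (count < 2) { print 0; exit }
      printf "%.0f\n", (last - first) / (count - 1)
    }'
}
```

A daily cron job could append the size with `du -sb /var/lib/rippled/db | awk '{print $1}' >> /var/log/validator-metrics/db_size.history` (a hypothetical file). The projection would then divide free space by `avg_daily_growth` instead of a single day's change.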

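```bash
# If database needs cleanup (changing online_delete)
# This requires careful procedure:

# 1. Update configuration
sudo nano /opt/ripple/etc/rippled.cfg
# Change online_delete (in the [node_db] stanza) to a lower value.
# Keep [ledger_history] at or below the new online_delete value.

# 2. Restart rippled
sudo systemctl restart rippled

# 3. Monitor deletion progress
# Deletion happens gradually, not immediately; the first ledger in the
# complete_ledgers range should move forward as older history is removed
/opt/ripple/bin/rippled server_info | grep complete_ledgers

# WARNING: Don't set online_delete too low (256 is the minimum allowed)
# Very low values can cause issues during network stress
```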
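Before the restart in step 2, it can help to sanity-check the values you edited. rippled is documented to require an `online_delete` of at least 256 and a `[ledger_history]` no larger than `online_delete`. Below is a rough sketch that reads both settings from the config file. It assumes `online_delete=` appears inside the `[node_db]` stanza and that `[ledger_history]` holds a single value on its own line. It also treats a missing `[ledger_history]` as 256, which is an assumption about the default. `check_retention` is a hypothetical name.

```bash
# check_retention CONFIG_FILE
# Prints a one-line verdict; exits non-zero when a problem is found.
check_retention() {
  awk '
    /^[ \t]*#/ { next }                          # skip comment lines
    /^\[/      { section = $1; next }            # remember the current [stanza]
    section == "[node_db]" && /^[ \t]*online_delete[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, ""); od = $1 + 0      # keep only the number after "="
    }
    section == "[ledger_history]" && NF { lh = $1 }
    END {
      if (od == "")     { print "online_delete is not set"; exit 1 }
      if (od < 256)     { print "online_delete (" od ") is below the minimum of 256"; exit 1 }
      if (lh == "")     lh = 256                 # assumed default when the stanza is absent
      if (lh == "full") { print "ledger_history=full cannot be combined with online_delete"; exit 1 }
      if (lh + 0 > od)  { print "ledger_history (" lh ") exceeds online_delete (" od ")"; exit 1 }
      print "ok: online_delete=" od ", ledger_history=" lh
    }' "$1"
}
```

Run `check_retention /opt/ripple/etc/rippled.cfg` before restarting. It only inspects these two settings and is not a full configuration validator.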

Section 5: Security Maintenance

```bash
# Create security update script
sudo nano /opt/ripple/bin/security-updates.sh
```

```bash
#!/bin/bash
#===============================================================================
# Security Update Script
# Apply security updates with minimal risk
#===============================================================================

echo "Security Update Check - $(date)"
echo "=================================="

# Refresh package lists first so the check does not rely on stale data
sudo apt update > /dev/null 2>&1

# Check for security updates (origins such as jammy-security)
echo ""
echo "=== Available Security Updates ==="
apt list --upgradable 2>/dev/null | grep -i security

# Count updates and collect their package names (text before the first "/")
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -ci security)
SECURITY_PKGS=$(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1)

if [ "$SECURITY_UPDATES" -gt 0 ]; then
    echo ""
    echo "Found $SECURITY_UPDATES security updates"
    echo ""
    echo "Apply security updates? (y/n)"
    read -r response

    if [ "$response" = "y" ]; then
        echo ""
        echo "=== Applying Security Updates ==="
        # Upgrade only the listed packages. A plain "apt upgrade -y" would upgrade
        # every package, including rippled, bypassing the rippled update procedure.
        sudo apt install --only-upgrade -y $SECURITY_PKGS

        echo ""
        echo "=== Checking if Reboot Required ==="
        if [ -f /var/run/reboot-required ]; then
            echo "REBOOT REQUIRED"
            echo "Schedule reboot at convenient time"
        else
            echo "No reboot required"
        fi
    fi
else
    echo ""
    echo "No security updates available"
fi

echo ""
echo "Update check complete"
```

```bash
sudo chmod +x /opt/ripple/bin/security-updates.sh
```

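```bash
# Configure automatic security updates
sudo apt install unattended-upgrades -y
sudo dpkg-reconfigure -plow unattended-upgrades

# Verify configuration
cat /etc/apt/apt.conf.d/50unattended-upgrades

# Preview what would be installed, without changing anything
sudo unattended-upgrade --dry-run --debug
```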

Monthly Security Tasks:

Week 1:
□ Review failed login attempts
□ Check for unauthorized users
□ Verify SSH configuration
□ Review firewall rules

Week 2:
□ Run security scanner (Lynis)
□ Review AIDE integrity report
□ Check certificate expiration
□ Audit cron jobs

Week 3:
□ Review security updates status
□ Check for new vulnerabilities
□ Verify backup encryption
□ Test incident response

Week 4:
□ Update security documentation
□ Review access permissions
□ Check monitoring coverage
□ Plan next month's tasks

Section 6: Maintenance Documentation

Entry Format:

# Maintenance Log

Date: YYYY-MM-DD
Time: HH:MM
Operator: Name
Type: [Routine/Update/Emergency/Security]
Duration: X minutes
Description: What was done
Outcome: Result
Notes: Additional observations
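Appending entries from the shell keeps the fields in the same order every time. Below is a minimal sketch that writes the template above and limits Type to the four listed values. The function name and argument order are hypothetical.

```bash
# log_maintenance FILE OPERATOR TYPE MINUTES DESCRIPTION OUTCOME [NOTES]
# Appends one entry in the template format, stamped with the current date/time.
log_maintenance() {
  local file="$1" operator="$2" type="$3" minutes="$4" desc="$5" outcome="$6" notes="${7:-}"
  local now
  case "$type" in
    Routine|Update|Emergency|Security) ;;
    *) echo "unknown type: $type" >&2; return 1 ;;
  esac
  now=$(date '+%Y-%m-%d %H:%M')   # one timestamp so Date and Time always agree
  {
    echo "Date: ${now% *}"
    echo "Time: ${now#* }"
    echo "Operator: $operator"
    echo "Type: $type"
    echo "Duration: $minutes minutes"
    echo "Description: $desc"
    echo "Outcome: $outcome"
    echo "Notes: $notes"
    echo ""
  } >> "$file"
}
```

Example: `log_maintenance /var/log/validator-maintenance.log "$USER" Update 25 "Upgraded rippled" "Success, proposing after restart"`. The log path is hypothetical.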

After Each Maintenance:

1. Review Procedure
2. Update Documentation
3. Capture Lessons

Track Over Time:

- Time spent on maintenance (hours/month)
- Planned vs. unplanned maintenance ratio
- Update success rate
- Mean time between incidents

Warning signs:
- Increasing maintenance time → investigate
- More unplanned maintenance → prevention needed
- Update failures → process improvement
- Frequent incidents → root cause analysis
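If every entry follows the template, two of these metrics can be computed directly from the maintenance log. The sketch below reads durations from the "Duration: N minutes" lines. It treats "Emergency" entries as unplanned and all other types as planned; that mapping is an assumption you may want to adjust. `maintenance_summary` is a hypothetical name.

```bash
# maintenance_summary FILE -> entry count, planned/unplanned split, total minutes
maintenance_summary() {
  awk '
    /^Type: / {
      total++
      if ($2 == "Emergency") unplanned++; else planned++
    }
    /^Duration: / { minutes += $2 }
    END {
      printf "entries=%d planned=%d unplanned=%d minutes=%d\n",
             total, planned, unplanned, minutes
    }' "$1"
}
```

With three entries (Routine 10 min, Emergency 45 min, Update 30 min), it prints `entries=3 planned=2 unplanned=1 minutes=85`.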

---

Section 7: Maintenance Checklist

Daily Maintenance (5-10 minutes):

□ Review monitoring dashboard
□ Check for alerts
□ Verify server state = "proposing"
□ Quick resource check (disk, memory)
□ Review any automated reports

If Issues Found:
□ Document in maintenance log
□ Investigate and resolve
□ Update monitoring if needed
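The quick disk check can be scripted so the daily review only needs to read its output. Below is a minimal sketch based on POSIX `df -P` output. The function names and the 80% threshold in the usage line are illustrative assumptions.

```bash
# disk_usage_pct PATH -> prints the Use% of the filesystem holding PATH (number only)
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH THRESHOLD -> prints an OK/WARN line; returns 1 when usage >= THRESHOLD
check_disk() {
  local pct
  pct=$(disk_usage_pct "$1") || return 2
  if [ "$pct" -ge "$2" ]; then
    echo "WARN $1 at ${pct}% (threshold $2%)"
    return 1
  fi
  echo "OK $1 at ${pct}%"
}
```

For example, `for p in /var/lib/rippled /var/log; do check_disk "$p" 80; done` gives a two-line summary for the daily review.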

Weekly Maintenance (30-60 minutes):

□ Detailed health audit
□ Log analysis (errors, warnings)
□ Resource trend review
□ Security update check
□ Peer connectivity analysis
□ Backup verification spot-check
□ Documentation review

Deliverables:
□ Weekly status summary
□ Any issues documented
□ Next week's planned maintenance

Monthly Maintenance (2-4 hours):

□ Comprehensive system audit
□ Full security review
□ Backup restoration test
□ Performance analysis
□ Capacity planning review
□ Documentation update
□ Procedure verification
□ Incident review (if any)

Deliverables:
□ Monthly maintenance report
□ Updated documentation
□ Next month's maintenance plan
□ Any improvement recommendations

Critical Analysis

✅ Scheduled maintenance prevents emergencies - Regular updates and cleanup prevent accumulation of issues

✅ Documentation enables consistency - Written procedures ensure maintenance is done correctly regardless of who performs it

✅ Log management prevents disk exhaustion - Without rotation and cleanup, logs fill disks

✅ Security updates are essential - Timely patching prevents exploitation of known vulnerabilities

⚠️ Optimal maintenance frequency - Balance between thoroughness and time investment varies by situation

⚠️ Best update timing - Depends on your monitoring capability and availability

⚠️ Automation extent - Some maintenance benefits from human review; full automation may miss issues

📌 Skipping maintenance - Deferred maintenance accumulates until something breaks

📌 Updates without testing - Direct mainnet updates risk outages

📌 No rollback plan - Updates without rollback capability are high-risk

📌 Undocumented changes - Changes without documentation become troubleshooting obstacles

Maintenance isn't exciting, but it's essential. A validator that receives consistent, documented maintenance will outlast and outperform one that's only touched when something breaks.

Start with the basics: daily quick checks, weekly reviews, monthly audits. As you build comfort, refine your procedures. The goal is sustainable operation—maintenance that's routine, not heroic.

Deliverable: Maintenance Framework

Assignment: Establish a comprehensive maintenance framework for your validator.

Requirements:

Maintenance schedule:
- Create daily, weekly, and monthly checklists
- Define maintenance windows
- Schedule recurring tasks
- Document responsible parties

Update procedures:
- Document the update procedure (script or steps)
- Create a rollback procedure
- Define testing requirements
- Document the notification process

Log management:
- Configure log rotation
- Create cleanup scripts
- Document retention policies
- Verify disk space monitoring

Maintenance documentation:
- Create a log template
- Document recent maintenance activities
- Track metrics (time, outcomes)
- Plan upcoming maintenance

Submission:
- PDF or Markdown document
- Scripts and configurations
- Completed checklists
- Sample log entries

Grading:
- Comprehensive maintenance schedule (25%)
- Working update procedures (25%)
- Proper log management (25%)
- Documented maintenance activities (25%)

Time investment: 4-6 hours
Value: Sustainable maintenance framework for long-term operation

Assessment Questions

1. Update Procedure (Tests Process Knowledge):

What should you do BEFORE applying a rippled update to your mainnet validator?

A) Nothing—just apply the update
B) Notify other validators
C) Test the update on testnet and observe for 24-48 hours
D) Back up the database

Correct Answer: C
Explanation: Updates should always be tested on testnet first, with observation for at least 24-48 hours to identify any issues. Only after successful testnet operation should you apply to mainnet. This prevents applying problematic updates to production.

2. Log Management (Tests Technical Knowledge):

Why is log rotation important for validator operation?

A) It makes logs easier to read
B) It prevents disk exhaustion which would cause validator failure
C) It's required by XRPL protocol
D) It improves validation speed

Correct Answer: B
Explanation: Without log rotation, logs grow indefinitely until they fill the disk. A full disk causes rippled to fail, taking down your validator. Log rotation limits log size and removes old logs, preventing disk exhaustion.

3. Maintenance Frequency (Tests Operational Understanding):

What is an appropriate frequency for a comprehensive system audit?

A) Daily
B) Weekly
C) Monthly
D) Annually

Correct Answer: C
Explanation: Monthly comprehensive audits provide thorough review without being excessive. Daily is too frequent for deep audits, weekly is appropriate for regular health checks, and annually is too infrequent to catch developing issues.

4. Rollback Capability (Tests Risk Management):

Why should you maintain rollback capability for updates?

A) To save disk space
B) To revert to a previous version if the update causes problems
C) To comply with regulations
D) To improve performance

Correct Answer: B
Explanation: Rollback capability allows you to quickly revert to a known-good state if an update causes issues. Without rollback, a problematic update requires troubleshooting under pressure. With rollback, you can restore service quickly and troubleshoot at leisure.

5. Documentation Value (Tests Process Understanding):

What is the primary benefit of maintaining a maintenance log?

A) Regulatory compliance
B) Enables trend analysis, troubleshooting, and knowledge transfer
C) Reduces maintenance time
D) Automates maintenance tasks

Correct Answer: B
Explanation: Maintenance logs document what was done, when, and with what outcome. This enables trend analysis (is maintenance time increasing?), troubleshooting (what changed before this problem?), and knowledge transfer (new operators can understand history).


For Next Lesson:
With maintenance procedures established, Lesson 15 will cover troubleshooting and incident response—how to diagnose and resolve issues when they occur.

End of Lesson 14

Total words: ~5,100
Estimated completion time: 50 minutes reading + 4-6 hours implementation

Key Takeaways

Scheduled maintenance prevents unscheduled outages: regular updates, cleanup, and review prevent the accumulation of issues that cause incidents.

Test updates on testnet first: never apply updates directly to mainnet; verify on testnet and observe before production deployment.

Maintain rollback capability: every update should have a documented rollback procedure in case of problems.

Log rotation prevents disk exhaustion: configure automatic log rotation and periodic cleanup to prevent storage issues.

Document all maintenance activities: maintenance logs enable trend analysis, troubleshooting, and knowledge transfer.
