
Validator & Node Optimization - Infrastructure Engineering

Learning Objectives

Specify optimal hardware configurations for validators and stock nodes at different scales

Tune operating systems (Linux) for XRPL workloads

Configure rippled for maximum performance and reliability

Design monitoring and alerting for proactive issue detection

Calculate total cost of ownership for different deployment options

Every optimization discussed in previous lessons depends on solid infrastructure. A poorly configured server wastes CPU cycles on unnecessary work, hits I/O bottlenecks prematurely, and fails under load that well-tuned hardware handles easily.

This lesson provides the complete infrastructure playbook—from hardware specs to kernel parameters—for running production XRPL infrastructure.


Hardware Tiers

Tier 1: Development / Testing

Purpose: Local development, testing
Capacity: <50 TPS

CPU: 4+ cores, 2.5 GHz+
Intel i5/Ryzen 5 or equivalent

Memory: 16 GB DDR4

Storage: 500 GB SSD (SATA acceptable)
Consumer NVMe preferred

Network: 100 Mbps

Estimated Cost: $500-1,000 (used/refurbished)

Use cases:

  • Development environments
  • Testnet participation
  • Learning/experimentation

Tier 2: Production Stock Node

Purpose: Serving application traffic, API access
Capacity: 100-500 TPS

CPU: 8+ cores, 3.0 GHz+
Intel Xeon E-series / AMD EPYC

Memory: 64 GB DDR4 ECC

Storage: 2 TB NVMe SSD
Enterprise grade (1+ DWPD)
Samsung PM1733 or equivalent

Network: 1 Gbps dedicated
Low latency to XRPL network

Estimated Cost: $3,000-6,000

Use cases:

  • Production applications
  • API service providers
  • Exchange integrations

Tier 3: Production Validator

Purpose: Consensus participation
Capacity: 500-1,500 TPS

CPU: 16+ cores, 3.5 GHz+
Intel Xeon Gold / AMD EPYC 7003
High single-thread performance important

Memory: 128 GB DDR4 ECC

Storage: 4 TB NVMe SSD
Enterprise grade (3+ DWPD)
RAID 1 for redundancy

Network: 1-10 Gbps dedicated
Multiple ISP redundancy recommended
Low latency global connectivity

Estimated Cost: $10,000-25,000

Use cases:

  • Mainnet validators
  • High-availability deployments
  • Institutional infrastructure

Tier 4: Enterprise / High-Performance

Purpose: Maximum throughput, full history
Capacity: 1,500+ TPS with headroom

CPU: 32+ cores, 4.0 GHz+
Intel Xeon Platinum / AMD EPYC 7003+
Maximum single-thread performance

Memory: 256 GB - 1 TB DDR4 ECC
Enable huge pages

Storage: 10+ TB NVMe SSD
RAID 10 configuration
Enterprise datacenter grade (10+ DWPD)

Network: 10-25 Gbps dedicated
Global anycast capability
DDoS protection

Estimated Cost: $50,000-100,000+

Use cases:

  • Infrastructure providers
  • Enterprise deployments
  • Research/testing at scale
CPU Selection

Priorities (in order):

  1. Single-thread performance (signature verification)
  2. Core count (parallel processing)
  3. Cache size (working set fits in L3)

Recommended:

  • AMD EPYC 7003/9004 series (best value)
  • Intel Xeon Gold 6300+ series
  • For maximum single-thread: Intel Xeon W-3300

Avoid:

  • ARM processors (limited rippled optimization)
  • Low-clock server chips (many slow cores)
  • Consumer desktop chips (no ECC, limited lifespan)

Memory Selection

Requirements:

  • ECC (Error Correcting Code) - mandatory for validators
  • Registered/buffered for large capacity
  • Speed: 3200 MHz+ DDR4

Sizing (typical working set):

  • Active state cache: ~10-20 GB
  • Transaction processing: ~20-40 GB
  • OS and overhead: ~10 GB
  • Headroom: 2× the above
  • Minimum production: 64 GB
  • Recommended: 128 GB

Channel configuration:

  • Populate all channels for maximum bandwidth
  • Matched DIMMs for dual/quad channel

Storage Selection

Performance targets:

  • Sequential write: 3+ GB/s
  • Random write IOPS: 200K+
  • Endurance: 1+ DWPD (Drive Writes Per Day)
  • Power-loss protection: required for validators

Recommended drives:

  • Samsung PM1733/PM1735
  • Intel P5510/P5800X
  • Micron 9400 series
  • Kioxia CM6/CM7 series

Avoid:

  • Consumer NVMe (QLC, low endurance)
  • SATA SSDs (too slow for high throughput)
  • HDDs (completely unsuitable)

RAID options:

  • RAID 1: basic redundancy
  • RAID 10: best performance + redundancy
  • Hardware RAID controller with BBU/flash cache
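The memory sizing bullets above can be rolled into a quick provisioning estimate. The sketch below uses the upper estimates from this lesson with the 2× headroom rule (illustrative figures only):

```shell
#!/bin/bash
# mem-sizing.sh - rough memory budget from the upper estimates above (GB).
state_cache=20      # active state cache, upper estimate
tx_processing=40    # transaction processing, upper estimate
os_overhead=10      # OS and overhead
working_set=$(( state_cache + tx_processing + os_overhead ))
provision=$(( working_set * 2 ))   # 2x headroom
echo "working set: ${working_set} GB, provision: ${provision} GB"
```

The 140 GB result lands between the 128 GB recommendation and the Tier 4 floor; in practice you round to the nearest standard DIMM configuration (128 or 256 GB).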

Network Selection

Bandwidth:

Traffic analysis at different loads:

TPS    | Inbound    | Outbound   | Total
-------|------------|------------|--------
20     | 50 Kbps    | 200 Kbps   | 250 Kbps
100    | 250 Kbps   | 1 Mbps     | 1.25 Mbps
500    | 1.25 Mbps  | 5 Mbps     | 6.25 Mbps
1,500  | 3.75 Mbps  | 15 Mbps    | 18.75 Mbps

Provisioning guidance:

- Minimum: 10× expected peak traffic
- Stock node: 100 Mbps - 1 Gbps
- Validator: 1 Gbps - 10 Gbps
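The table's figures imply roughly 2.5 Kbps inbound and 10 Kbps outbound per TPS, so combined with the 10× provisioning rule, link sizing reduces to simple arithmetic (a sketch using those implied per-TPS rates):

```shell
#!/bin/bash
# bandwidth-estimate.sh - link sizing from expected peak TPS, using the
# per-TPS rates implied by the table above (~2.5 Kbps in, ~10 Kbps out).
peak_tps=500
in_kbps=$(( peak_tps * 25 / 10 ))     # inbound, Kbps
out_kbps=$(( peak_tps * 10 ))         # outbound, Kbps
total_kbps=$(( in_kbps + out_kbps ))
provision_kbps=$(( total_kbps * 10 )) # 10x peak rule of thumb
echo "peak: ${total_kbps} Kbps, provision at least: ${provision_kbps} Kbps"
```

For 500 TPS this gives a 6.25 Mbps peak and ~62.5 Mbps provisioned, consistent with the 100 Mbps - 1 Gbps stock-node guidance.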

Latency Requirements:

Target latencies to major XRPL hubs:

Location       | Target RTT | Impact
---------------|------------|------------------
US East        | <20ms      | Fastest consensus
US West        | <50ms      | Good
Europe         | <100ms     | Acceptable
Asia-Pacific   | <150ms     | Workable
Global average | <200ms     | Should be below

Effects of higher latency:

- Delayed proposal receipt
- Possible transaction omission from ledgers
- Not disqualifying, but suboptimal
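As a rough triage aid, the RTT bands from the table can be encoded directly. This is a sketch; the thresholds are guidelines from this lesson, not protocol limits:

```shell
#!/bin/bash
# classify_rtt MS - map a measured round-trip time (ms) to the
# impact buckets from the latency table above (guideline values).
classify_rtt() {
    local ms=$1
    if   [ "$ms" -lt 20 ];  then echo "fastest consensus"
    elif [ "$ms" -lt 50 ];  then echo "good"
    elif [ "$ms" -lt 100 ]; then echo "acceptable"
    elif [ "$ms" -lt 150 ]; then echo "workable"
    else                         echo "suboptimal"
    fi
}

# Example: feed it an RTT measured with ping to an XRPL hub
classify_rtt 35
```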

---

Operating System Tuning

Network Tuning (/etc/sysctl.conf):

```
# Increase network buffer sizes
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 31457280
net.core.wmem_default = 31457280

# TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# Connection handling
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# TCP keepalive (for WebSocket connections)
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 10

# Enable TCP Fast Open
net.ipv4.tcp_fastopen = 3

# Disable slow start after idle
net.ipv4.tcp_slow_start_after_idle = 0
```

Memory Tuning:

```
# Virtual memory
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.vfs_cache_pressure = 50

# Huge pages (calculate based on memory)
# For a 128 GB system reserving 64 GB for huge pages:
vm.nr_hugepages = 32768

# Memory overcommit (note: overcommit_ratio only takes effect
# when overcommit_memory = 2; with mode 1 it is ignored)
vm.overcommit_memory = 1
vm.overcommit_ratio = 80
```
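The nr_hugepages value follows from the default 2 MiB huge page size on x86-64; a quick sanity check of the figure above:

```shell
#!/bin/bash
# hugepages-calc.sh - pages needed for a given reservation, assuming
# the default 2 MiB huge page size on x86-64.
reserve_gb=64
page_mib=2
nr_hugepages=$(( reserve_gb * 1024 / page_mib ))
echo "vm.nr_hugepages = $nr_hugepages"
```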

File System Tuning:

```
# Increase file descriptor limits
fs.file-max = 2097152
fs.nr_open = 2097152

# Inotify limits (for monitoring)
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512
```

Process Limits (/etc/security/limits.conf):

```
# For the rippled user
rippled soft nofile 1000000
rippled hard nofile 1000000
rippled soft nproc 65535
rippled hard nproc 65535
rippled soft memlock unlimited
rippled hard memlock unlimited

# Core dumps for debugging
rippled soft core unlimited
rippled hard core unlimited
```

I/O Scheduler:

```
# For NVMe drives, use 'none' (no scheduler)
echo "none" > /sys/block/nvme0n1/queue/scheduler

# For SATA SSDs, use 'mq-deadline'
echo "mq-deadline" > /sys/block/sda/queue/scheduler

# Increase queue depth
echo 1024 > /sys/block/nvme0n1/queue/nr_requests

# Disable NCQ for problematic drives (rarely needed)
echo 1 > /sys/block/sda/device/queue_depth
```
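The echo commands above do not persist across reboots. One common way to reapply them at boot is a udev rule; the path and match patterns below are a typical convention, not the only one:

```
# /etc/udev/rules.d/60-ioscheduler.rules
# NVMe: no scheduler
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"
# Non-rotational SATA (SSDs): mq-deadline
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
```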


Mount Options:

```
# Recommended mount options for XRPL data (ext4)
UUID=xxxx /var/lib/rippled ext4 defaults,noatime,nodiratime,discard,barrier=0 0 2

# For XFS (alternative)
UUID=xxxx /var/lib/rippled xfs defaults,noatime,nodiratime,discard,allocsize=64k 0 2
```

Notes:

- noatime/nodiratime: don't update access times
- discard: enable TRIM for SSDs
- barrier=0: disable write barriers only if using BBU-backed RAID (careful!)
- allocsize: XFS allocation size for large files



rippled Configuration (rippled.cfg)

Server Section:

```
[server]
port_rpc_admin_local
port_peer
port_ws_admin_local
port_ws_public

[port_peer]
ip=0.0.0.0
port=51235
protocol=peer

[port_ws_public]
ip=0.0.0.0
port=6006
protocol=wss

[port_rpc_admin_local]
ip=127.0.0.1
port=5005
protocol=http
admin=127.0.0.1

[port_ws_admin_local]
ip=127.0.0.1
port=6007
protocol=ws
admin=127.0.0.1
```

Performance Section:

```
[node_size]
huge

# Ledger history (memory for ledger cache)
[ledger_history]
full

# Or, for limited history:
# [ledger_history]
# 256

[fetch_depth]
full

# Primary node database
[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb
online_delete=256
advisory_delete=0

# Or, for RocksDB:
# [node_db]
# type=RocksDB
# path=/var/lib/rippled/db/rocksdb
# compression=lz4
# online_delete=256

# Transaction database
[transaction_db]
type=SQLite
path=/var/lib/rippled/db/transaction.db

# Temporary database
[temp_db]
type=RocksDB
path=/var/lib/rippled/db/tempdb
```
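online_delete is a ledger count, so the retention it buys depends on ledger close frequency. A rough conversion, assuming ~4 seconds per ledger close (actual closes typically run 3-5 s):

```shell
#!/bin/bash
# retention-calc.sh - approximate online_delete value for a target
# retention window, assuming ~4 s per ledger (varies in practice).
retention_hours=6
seconds_per_ledger=4
online_delete=$(( retention_hours * 3600 / seconds_per_ledger ))
echo "online_delete=$online_delete"
```

By the same arithmetic, the online_delete=256 above retains only ~17 minutes of ledgers locally, which is fine when history is served elsewhere.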

Network Section:

```
[peers_max]
50

[peer_private]
0

# Fixed peers (reliable, well-connected nodes)
[ips_fixed]
s1.ripple.com 51235
s2.ripple.com 51235

# Cluster configuration for multiple servers you operate:
# [cluster_nodes]
# nHUhG...nodepubkey1
# nHUhG...nodepubkey2

[sntp_servers]
time.google.com
time.cloudflare.com
time.apple.com
```

Validator Configuration (if validating):

```
[validator_token]
eyJ2YWxpZGF0aW9uX...your_token_here

[validators_file]
validators.txt

# For an inline UNL (usually kept in the external file instead):
# [validators]
# nHUXe...validator1
# nHBta...validator2
```

Cache Tuning:

```
# Ledger cache size (adjust based on available RAM)
# Larger = more ledgers in memory = faster access
[ledger_history]
full

# For memory-constrained systems:
# [ledger_history]
# 256

# State cache: auto - rippled calculates based on available RAM

[fetch_depth]
full

# SQLite cache (negative value = size in KiB)
[sqlite]
cache_size=-2097152  # 2 GB cache
```
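The negative cache_size follows SQLite's convention that negative values specify the cache in KiB rather than pages; the figure above works out as:

```shell
#!/bin/bash
# SQLite cache_size: negative values mean "this many KiB".
cache_gb=2
cache_kib=$(( cache_gb * 1024 * 1024 ))
echo "cache_size=-$cache_kib"
```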

Pre-flight Check Script:

```
#!/bin/bash
# pre-flight-check.sh - Verify the system is optimized for rippled

echo "=== XRPL Node Pre-flight Check ==="

# Check CPU
echo -n "CPU cores: "
nproc
echo -n "CPU frequency: "
lscpu | grep "MHz" | head -1

# Check memory
echo -n "Total RAM: "
free -h | grep Mem | awk '{print $2}'

# Check storage (rotational=0 means solid state)
echo -n "Storage type: "
if [ -e /sys/block/nvme0n1 ]; then
    echo "NVMe"
elif [ "$(cat /sys/block/sda/queue/rotational 2>/dev/null)" = "0" ]; then
    echo "SSD"
else
    echo "Check manually"
fi

# Check file limits
echo -n "File descriptor limit: "
ulimit -n

# Check kernel parameters
echo "=== Kernel Parameters ==="
sysctl net.core.rmem_max
sysctl vm.swappiness
sysctl fs.file-max

# Check disk scheduler
echo -n "Disk scheduler: "
cat /sys/block/nvme0n1/queue/scheduler 2>/dev/null || echo "N/A"

echo "=== Check Complete ==="
```


Monitoring and Alerting

Node Health:

```
metrics:
  - name: rippled_server_state
    description: Current server state
    alert_if: != "full"
  - name: rippled_complete_ledgers
  - name: rippled_peer_count
  - name: rippled_uptime
```

Performance:

```
metrics:
  - name: rippled_ledger_close_time
    description: Time to close ledger
    warning_if: > 5000  # ms
    alert_if: > 10000
  - name: rippled_transaction_queue
  - name: rippled_fetch_duration
```

Resource Utilization:

```
metrics:
  - name: cpu_utilization
    warning_if: > 70%
    alert_if: > 90%
  - name: memory_utilization
  - name: disk_io_utilization
  - name: disk_space
  - name: network_bandwidth
```
Prometheus + Grafana Setup:

```
# prometheus.yml
scrape_configs:
  - job_name: 'rippled'
    static_configs:
      - targets: ['localhost:5005']
    metrics_path: '/metrics'
    scheme: 'http'
  - job_name: 'node_exporter'
```
Grafana Dashboard Panels:

Dashboard: XRPL Node Health
  • Server State (stat)
  • Peer Count (gauge)
  • Uptime (stat)
  • Ledger Range (stat)
  • Ledger Close Time (time series)
  • Transaction Queue Depth (time series)
  • Transactions per Second (time series)
  • CPU Usage (time series)
  • Memory Usage (time series)
  • Disk I/O (time series)
  • Network Traffic (time series)
  • Active Alerts Table
  • Alert History
```
# alerting_rules.yml
groups:
  - name: rippled
    rules:
      - alert: RippledDown
        expr: up{job="rippled"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "rippled is down"
      - alert: RippledNotSynced
      - alert: HighLedgerCloseTime
```

Cloud vs. Bare Metal:

Scenario: Production Validator (Tier 3)

Cloud (major provider):
  • Instance: c6i.4xlarge or equivalent
  • Storage: 4 TB gp3 NVMe
  • Network: 5 TB transfer
  • Total monthly: $1,250-1,550
  • Annual: $15,000-18,600

Colocation:
  • Hardware: $15,000 (amortized over 4 years)
  • Colocation: 2U, 10A, 1 Gbps
  • Bandwidth: included or $100-200
  • Total monthly: $700-1,000
  • Annual: $8,400-12,000

On-premise:
  • Hardware: $15,000 (amortized over 4 years)
  • Power: ~300 W × $0.12/kWh
  • Internet: business 1 Gbps
  • Total monthly: $550-850
  • Annual: $6,600-10,200

Rules of thumb:
  • Testing/development: cloud (flexibility)
  • Production <6 months: cloud
  • Production >6 months: bare metal
Scenario: High-Availability Validator (2 nodes)

                 | Cloud    | Colo      | On-Premise
-----------------|----------|-----------|-----------
Year 1           | $36,000  | $38,000*  | $40,000*
Year 2           | $36,000  | $15,000   | $10,000
Year 3           | $36,000  | $15,000   | $10,000
Year 4           | $36,000  | $38,000** | $40,000**
Year 5           | $36,000  | $15,000   | $10,000
5-Year Total     | $180,000 | $121,000  | $110,000
Per-Year Average | $36,000  | $24,200   | $22,000

* Includes initial hardware purchase
** Includes hardware refresh

Observations:
  • Cloud provides fastest deployment, highest flexibility
  • Colo provides best reliability/cost balance
  • On-premise is cheapest but requires expertise
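The 5-year totals in the table reduce to simple sums; the sketch below reproduces them and is easy to adapt with your own quotes:

```shell
#!/bin/bash
# tco-calc.sh - 5-year totals for the three models in the table above.
cloud_total=$(( 36000 * 5 ))
colo_total=$(( 38000 + 15000 + 15000 + 38000 + 15000 ))
onprem_total=$(( 40000 + 10000 + 10000 + 40000 + 10000 ))
echo "cloud: \$$cloud_total  colo: \$$colo_total  on-prem: \$$onprem_total"
```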
Additional operational costs to consider:

Staffing:
- Part-time monitoring: $5,000-10,000/year
- Full-time ops engineer: $100,000-150,000/year
- On-call coverage: $10,000-20,000/year

Services:
- Monitoring (PagerDuty, etc.): $1,000-5,000/year
- Security scanning: $2,000-10,000/year
- Backup services: $1,000-5,000/year

Risk management:
- Cyber insurance: $5,000-20,000/year
- Business continuity planning: variable

Typical totals:
- Small deployment: $10,000-20,000/year
- Medium deployment: $50,000-100,000/year
- Enterprise deployment: $200,000+/year

---

What we know:

OS tuning provides measurable improvement—kernel parameters affect throughput

Monitoring prevents failures—proactive alerting catches issues early

TCO varies significantly by deployment model—cloud vs. bare metal trade-offs are real

What's uncertain:

⚠️ Long-term hardware requirements—depends on network growth

⚠️ Cloud cost trajectory—pricing changes frequently

Common mistakes:

📌 Over-provisioning initially—wasteful if growth doesn't materialize

📌 Ignoring monitoring—problems discovered by users, not ops

📌 Single points of failure—no redundancy = inevitable outage

Infrastructure optimization provides the foundation for everything else. A properly configured server handles 2-3× the load of a default installation. The investment in proper hardware, tuning, and monitoring pays for itself in reliability and performance.


Assignment: Create a complete infrastructure specification for an XRPL deployment.

Requirements:

1. Use case definition

  • Define your use case and capacity requirements
  • Specify availability and latency targets
  • Document compliance/security requirements

2. Hardware specification

  • Complete BOM (Bill of Materials) with specific parts
  • Justify each selection
  • Calculate total hardware cost

3. Configuration

  • OS tuning parameters with explanations
  • Complete rippled.cfg
  • Monitoring configuration

4. Cost analysis

  • 5-year TCO calculation
  • Cloud vs. bare metal comparison
  • Recommendation with justification

Grading criteria:

  • Complete, specific hardware specs (25%)
  • Correct configuration parameters (25%)
  • Thorough monitoring plan (25%)
  • Realistic cost analysis (25%)

Time investment: 3-4 hours


1. Why is ECC memory recommended for validators?

A) ECC is faster
B) ECC prevents bit-flip errors that could cause incorrect consensus
C) ECC uses less power
D) ECC is required by XRPL protocol

Correct Answer: B


2. What's the recommended I/O scheduler for NVMe SSDs running rippled?

A) cfq
B) deadline
C) none (no scheduler)
D) bfq

Correct Answer: C


3. Which node_size setting is appropriate for a production validator?

A) tiny
B) small
C) medium
D) huge

Correct Answer: D


4. At what peer count should alerts trigger?

A) < 100 peers
B) < 50 peers
C) < 10 peers
D) < 5 peers

Correct Answer: C


5. For a 2-year production deployment, which hosting model typically has lowest TCO?

A) Major cloud provider
B) Bare metal colocation
C) Home server
D) They're all the same

Correct Answer: B


Further Reading:

  • rippled documentation on configuration
  • XRPL Foundation validator guides
  • Ripple server requirements
  • Brendan Gregg's Linux performance resources
  • Red Hat Performance Tuning Guide
  • Linux kernel documentation

For Next Lesson:
Lesson 10 covers Production Performance Patterns—real-world case studies and lessons learned.


End of Lesson 9

Total words: ~6,000
Estimated completion time: 55 minutes reading + 3-4 hours for deliverable

Key Takeaways

1. Hardware tiers exist for a reason: Match your hardware to your actual needs—overprovisioning wastes money, underprovisioning causes failures.

2. OS tuning is free performance: Kernel parameters, file limits, and I/O scheduling can improve performance 20-50% with no hardware changes.

3. rippled configuration matters: node_size, cache settings, and peer configuration significantly affect behavior.

4. Monitoring is not optional: You can't optimize what you don't measure. Instrument everything.

5. TCO analysis drives smart decisions: Cloud is fastest to start; bare metal wins long-term for committed deployments.