Cryptographic Optimization - The Signature Verification Challenge | XRPL Performance & Scaling | XRP Academy - XRP Academy
Advanced · 60 min

Cryptographic Optimization - The Signature Verification Challenge

Learning Objectives

Analyze signature verification costs for ECDSA (secp256k1) and Ed25519 algorithms used by XRPL

Design parallel verification strategies that leverage multi-core processors effectively

Evaluate hardware acceleration options including GPUs, FPGAs, and HSMs

Calculate ROI for cryptographic optimizations based on throughput requirements

Plan HSM integration for institutional deployments requiring hardware security

Every XRPL transaction carries a digital signature proving the sender authorized it. Verifying this signature takes two steps:

  1. **Hash the transaction** (SHA-512Half): Fast (~1 microsecond)
  2. **Verify the signature** against the public key: Slow (~0.3-1 millisecond)

At XRPL's current capacity of about 1,500 TPS, with a typical verification cost of 0.5ms:

  • 1,500 × 0.5ms = 750ms of CPU time per second
  • That's 75% of one CPU core just for signatures
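
The CPU budget above is simple arithmetic, and a small helper makes it easy to rerun with your own numbers. This is a minimal sketch; the function names are illustrative and not part of any XRPL tooling.

```javascript
// Fraction of one CPU core consumed by signature verification.
// tps: transactions per second; verifyMs: milliseconds per verification.
function coreUtilization(tps, verifyMs) {
  const busyMsPerSecond = tps * verifyMs; // CPU-milliseconds needed each second
  return busyMsPerSecond / 1000;          // 1.0 means one fully busy core
}

// Upper bound on verifications per second for a single core.
function maxSingleCoreTps(verifyMs) {
  return 1000 / verifyMs;
}

console.log(coreUtilization(1500, 0.5)); // 0.75
console.log(maxSingleCoreTps(0.5));      // 2000
```

A result above 1.0 from `coreUtilization` means a single thread cannot keep up at that rate.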

The signature verification rate determines the maximum transaction throughput a single-threaded validator can achieve.

This lesson explores how to break through that limit.


XRPL supports two signature algorithms:

**ECDSA with secp256k1:**

Algorithm: Elliptic Curve Digital Signature Algorithm
Curve: secp256k1 (same as Bitcoin)
Key size: 256 bits
Signature size: 64-72 bytes (DER encoded)

Characteristics:
- Older, more widely deployed
- Slower than Ed25519
- Requires careful implementation (the per-signature k-value must never repeat or leak)
- Standard in cryptocurrency

Typical single-core performance:
- Sign: ~200μs
- Verify: ~500-800μs
- Throughput: ~1,500 verifications/second

**Ed25519:**

Algorithm: Edwards-curve Digital Signature Algorithm
Curve: Curve25519 (in twisted Edwards form)
Key size: 256 bits
Signature size: 64 bytes (fixed)

Characteristics:
  • Modern, designed for performance
  • Faster than ECDSA
  • Simpler implementation (no random k-value to manage)
  • Deterministic signatures

Typical single-core performance:
  • Sign: ~50μs
  • Verify: ~150-200μs
  • Throughput: ~5,000-6,000 verifications/second

Benchmark comparison:


Operation          | ECDSA (secp256k1) | Ed25519  | Ratio
-------------------|-------------------|----------|-------
Key generation     | ~100μs            | ~30μs    | 3.3×
Signing            | ~200μs            | ~50μs    | 4×
Verification       | ~600μs            | ~170μs   | 3.5×
Batch verify (100) | ~50ms             | ~8ms     | 6×

Single-core throughput implied by the verification times:
  • ECDSA: ~1,600 verifications/second
  • Ed25519: ~5,800 verifications/second
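
To see how these figures compare on your own machine, the following sketch times single-threaded verification with Node.js's built-in `crypto` module. It is an assumption-laden stand-in: Node uses OpenSSL rather than the libraries rippled links against, SHA-256 is used as the ECDSA digest purely for convenience, and `benchmarkVerify` is a hypothetical helper name. Expect the absolute numbers to differ from the table.

```javascript
const crypto = require('node:crypto');

// Time `iterations` verifications of one signature and report the mean cost.
function benchmarkVerify(keyPair, digest, iterations = 200) {
  const message = crypto.randomBytes(64);
  const signature = crypto.sign(digest, message, keyPair.privateKey);

  let allValid = true;
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    allValid = crypto.verify(digest, message, keyPair.publicKey, signature) && allValid;
  }
  const elapsedUs = Number(process.hrtime.bigint() - start) / 1000;

  const usPerVerify = elapsedUs / iterations;
  return { allValid, usPerVerify, verifiesPerSecond: 1e6 / usPerVerify };
}

// Ed25519 takes no separate digest (null); ECDSA needs one (assumed: sha256).
const ed25519 = benchmarkVerify(crypto.generateKeyPairSync('ed25519'), null);
const secp256k1 = benchmarkVerify(
  crypto.generateKeyPairSync('ec', { namedCurve: 'secp256k1' }),
  'sha256'
);

console.log('Ed25519  :', ed25519);
console.log('secp256k1:', secp256k1);
```

Run it several times and on an otherwise idle machine; short benchmarks like this are noisy.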

Key Insight

Ed25519 is roughly 3-4× faster than ECDSA for key generation, signing, and single verification, and the gap widens for batch verification. New XRPL accounts should prefer Ed25519.

Multi-signature transactions multiply verification costs:

Signature Config | Verifications | Time (ECDSA) | Time (Ed25519)
-----------------|---------------|--------------|---------------
1-of-1           | 1             | 0.6ms        | 0.17ms
2-of-3           | 2             | 1.2ms        | 0.34ms
3-of-5           | 3             | 1.8ms        | 0.51ms
5-of-8           | 5             | 3.0ms        | 0.85ms

Single-core throughput impact:
- 1-of-1 ECDSA: 1,600 TPS
- 3-of-5 ECDSA: 550 TPS
- 1-of-1 Ed25519: 5,800 TPS
- 3-of-5 Ed25519: 1,950 TPS
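
The throughput figures follow from dividing one second by the total verification time per transaction. A short sketch of that calculation, using the per-signature costs from the table (0.6ms for ECDSA, 0.17ms for Ed25519); the lesson's list rounds the results, so the computed values differ slightly.

```javascript
// Single-core TPS when each transaction needs `signatures` verifications.
function multiSigTps(verifyMs, signatures) {
  return 1000 / (verifyMs * signatures);
}

const configs = [
  { name: '1-of-1 ECDSA', verifyMs: 0.6, signatures: 1 },
  { name: '3-of-5 ECDSA', verifyMs: 0.6, signatures: 3 },
  { name: '1-of-1 Ed25519', verifyMs: 0.17, signatures: 1 },
  { name: '3-of-5 Ed25519', verifyMs: 0.17, signatures: 3 },
];

for (const { name, verifyMs, signatures } of configs) {
  console.log(`${name}: ${Math.round(multiSigTps(verifyMs, signatures))} TPS`);
}
```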

**Institutional Impact:** Many institutional accounts use multi-sig for security. This dramatically increases verification load and makes Ed25519 adoption even more important.

---

Most XRPL transactions are independent—verifying one signature doesn't require results from another:

Transaction Set in Ledger:
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ Tx1 │ │ Tx2 │ │ Tx3 │ │ Tx4 │ │ Tx5 │
└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
   │       │       │       │       │
   ▼       ▼       ▼       ▼       ▼
┌──────────────────────────────────────┐
│     Parallel Signature Verification   │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐     │
│  │Core1│ │Core2│ │Core3│ │Core4│     │
│  │ Tx1 │ │ Tx2 │ │ Tx3 │ │ Tx4 │     │
│  └─────┘ └─────┘ └─────┘ └─────┘     │
│           Core1 then Tx5              │
└──────────────────────────────────────┘

Result: 5 transactions verified in ~2× single-verify time (two rounds on 4 cores) instead of 5×

Optimal thread pool configuration:

CPU Cores | Worker Threads | Speedup    | Notes
----------|----------------|------------|------------------
4         | 4              | ~3.5×      | Good for testing
8         | 8              | ~7×        | Production minimum
16        | 14-16          | ~13×       | Leave cores for I/O
32        | 28-30          | ~25×       | Diminishing returns
64        | 50-56          | ~45×       | Enterprise scale

Recommended: (CPU cores - 2) worker threads
Reserve 2 cores for consensus, networking, I/O
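
As a starting point, the "cores minus two" rule can be computed at startup. This is a minimal sketch assuming Node.js; `recommendedWorkers` is a hypothetical helper, and the right reserve depends on how much consensus, networking, and I/O work your node actually does.

```javascript
const os = require('node:os');

// Reserve some cores for consensus, networking, and I/O,
// but never return fewer than one verification worker.
function recommendedWorkers(cpuCores, reservedCores = 2) {
  return Math.max(1, cpuCores - reservedCores);
}

const cores = os.cpus().length;
console.log(`${cores} cores -> ${recommendedWorkers(cores)} verification workers`);
```

On small machines the reserve dominates, which is why it is worth measuring before committing to a fixed rule.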

Implementation Pattern:

// Pseudocode for parallel verification.
// ThreadPool and verifySignature are placeholders, not a specific library API.
class SignatureVerificationPool {
  constructor(numWorkers) {
    this.pool = new ThreadPool(numWorkers);
  }

  async verifyBatch(transactions) {
    // Distribute transactions across workers
    const futures = transactions.map(tx =>
      this.pool.submit(() => verifySignature(tx))
    );

    // Wait for all verifications
    const results = await Promise.all(futures);

    // Return map of tx hash -> valid/invalid
    return new Map(
      transactions.map((tx, i) => [tx.hash, results[i]])
    );
  }
}

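
For readers who want something runnable, here is one way the same pattern might look with Node.js `worker_threads` and Ed25519. It is a sketch under several assumptions: it spawns one worker per chunk on every call instead of keeping a persistent pool, the transaction shape (`hash`, `message`, `signature`, `publicKeyPem`) is invented for the demo, and rippled itself is C++ and does not work this way.

```javascript
const { Worker } = require('node:worker_threads');
const crypto = require('node:crypto');

// Code run inside each worker: verify every item in its chunk and post
// back an array of booleans in the same order.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { verify } = require('node:crypto');
  parentPort.postMessage(
    workerData.map(tx => verify(null, tx.message, tx.publicKeyPem, tx.signature))
  );
`;

function verifyChunk(chunk) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: chunk });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Split the batch into contiguous chunks (one per worker) and verify in parallel.
async function verifyBatchParallel(transactions, numWorkers) {
  const chunkSize = Math.max(1, Math.ceil(transactions.length / numWorkers));
  const chunks = [];
  for (let i = 0; i < transactions.length; i += chunkSize) {
    chunks.push(transactions.slice(i, i + chunkSize));
  }
  const results = (await Promise.all(chunks.map(verifyChunk))).flat();
  return new Map(transactions.map((tx, i) => [tx.hash, results[i]]));
}

// Demo data: Ed25519-signed messages, with one signature deliberately corrupted.
function makeDemoTransactions(count) {
  const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
  const publicKeyPem = publicKey.export({ type: 'spki', format: 'pem' });
  return Array.from({ length: count }, (_, i) => {
    const message = Buffer.from(`demo-tx-${i}`);
    const signature = crypto.sign(null, message, privateKey);
    if (i === 3) signature[0] ^= 0xff; // tamper with one signature
    return { hash: `tx${i}`, message, signature, publicKeyPem };
  });
}

verifyBatchParallel(makeDemoTransactions(8), 4).then(results => {
  console.log([...results.entries()]);
});
```

Starting a worker costs far more than one verification, so a production design would keep workers alive and feed them batches; this version only illustrates the fan-out and result mapping.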
Parallelization Results (Ed25519):

Cores | Single-Thread | Parallel    | Speedup | Efficiency
------|---------------|-------------|---------|------------
1     | 5,800 TPS     | 5,800 TPS   | 1.0×    | 100%
4     | 5,800 TPS     | 21,500 TPS  | 3.7×    | 93%
8     | 5,800 TPS     | 41,000 TPS  | 7.1×    | 88%
16    | 5,800 TPS     | 75,000 TPS  | 12.9×   | 81%
32    | 5,800 TPS     | 130,000 TPS | 22.4×   | 70%

Efficiency drops as cores are added because of:
- Memory bandwidth contention
- Thread synchronization overhead
- Cache coherency traffic
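
Speedup and efficiency in the table are derived values: speedup is parallel throughput divided by single-thread throughput, and efficiency is speedup divided by core count. A small sketch that recomputes them from the table's throughput figures:

```javascript
// speedup = parallel / single-thread; efficiency = speedup / cores
function scaling(singleTps, parallelTps, cores) {
  const speedup = parallelTps / singleTps;
  return { speedup, efficiency: speedup / cores };
}

const SINGLE_THREAD_TPS = 5800;
const measurements = [
  { cores: 4, parallelTps: 21500 },
  { cores: 8, parallelTps: 41000 },
  { cores: 16, parallelTps: 75000 },
  { cores: 32, parallelTps: 130000 },
];

for (const { cores, parallelTps } of measurements) {
  const { speedup, efficiency } = scaling(SINGLE_THREAD_TPS, parallelTps, cores);
  console.log(`${cores} cores: ${speedup.toFixed(1)}x, ${(efficiency * 100).toFixed(0)}%`);
}
```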

Key Insight

An 8-core server with Ed25519 and parallel verification can handle ~40,000 signature verifications per second—far more than XRPL's current 1,500 TPS capacity. The signature bottleneck is solvable with standard hardware.


GPUs excel at parallel cryptographic operations:

GPU vs CPU Comparison (Ed25519):

Device               | Verify/sec | Power (W) | $/verify
---------------------|------------|-----------|----------
Intel i7 (8 core)    | 41,000     | 65        | Baseline
NVIDIA RTX 3080      | 2,000,000  | 320       | 0.01×
NVIDIA A100          | 5,000,000  | 400       | 0.005×
AMD MI100            | 4,000,000  | 300       | 0.006×

GPU provides 50-100× throughput improvement.

GPU Verification Architecture:

┌──────────────────────────────────────────────────┐
│                    Host CPU                       │
│  ┌────────────────────────────────────────────┐  │
│  │         Transaction Queue (batched)         │  │
│  └────────────────────────┬───────────────────┘  │
│                           │                       │
│                    PCIe Transfer                  │
│                           │                       │
└───────────────────────────┼──────────────────────┘
                            ▼
┌──────────────────────────────────────────────────┐
│                       GPU                         │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐            │
│  │SM 0  │ │SM 1  │ │SM 2  │ │...   │            │
│  │1024  │ │1024  │ │1024  │ │      │            │
│  │verify│ │verify│ │verify│ │      │            │
│  └──────┘ └──────┘ └──────┘ └──────┘            │
│                                                   │
│        10,000+ parallel verifications             │
└──────────────────────────────────────────────────┘

Batch Size Considerations:

Batch Size | PCIe Overhead | GPU Efficiency | Latency
-----------|---------------|----------------|----------
10         | High (50%)    | Low (20%)      | ~1ms
100        | Medium (15%)  | Medium (60%)   | ~2ms
1,000      | Low (5%)      | High (90%)     | ~5ms
10,000     | Minimal (1%)  | Maximum (98%)  | ~20ms

Optimal batch: 1,000-10,000 signatures
Trade-off: Larger batches = better throughput, higher latency
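
One way to act on this trade-off is to pick the largest batch that still fits a latency budget. The sketch below uses the approximate latencies from the table; `chooseBatch` is a hypothetical helper, and the throughput estimate assumes only one batch is in flight at a time, which understates what a pipelined design could reach.

```javascript
// Rows from the batch-size table, sorted by batch size.
const BATCH_TABLE = [
  { batchSize: 10, latencyMs: 1 },
  { batchSize: 100, latencyMs: 2 },
  { batchSize: 1000, latencyMs: 5 },
  { batchSize: 10000, latencyMs: 20 },
];

// Largest batch whose latency fits the budget, or null if none fits.
function chooseBatch(latencyBudgetMs) {
  const candidates = BATCH_TABLE.filter(row => row.latencyMs <= latencyBudgetMs);
  if (candidates.length === 0) return null;
  const best = candidates[candidates.length - 1];
  return {
    ...best,
    // Signatures per second with one batch in flight at a time (assumption).
    signaturesPerSecond: (best.batchSize / best.latencyMs) * 1000,
  };
}

console.log(chooseBatch(5));
console.log(chooseBatch(0.5));
```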

FPGAs provide dedicated hardware for cryptography:

FPGA vs CPU/GPU:

Aspect        | CPU      | GPU       | FPGA
--------------|----------|-----------|------------
Throughput    | Low      | Highest   | High
Latency       | Low      | Medium    | Lowest
Power         | Medium   | High      | Low
Cost          | Low      | Medium    | High
Flexibility   | High     | Medium    | Low
Development   | Easy     | Medium    | Hard

FPGA ideal for: Low-latency, power-constrained environments
GPU ideal for: Maximum throughput with batch processing

FPGA Performance:

Device              | Verify/sec | Latency | Power
--------------------|------------|---------|-------
Xilinx Alveo U50    | 500,000    | 10μs    | 75W
Intel Stratix 10    | 800,000    | 8μs     | 100W
Xilinx Alveo U250   | 1,200,000  | 12μs    | 150W

FPGA advantages:
- Deterministic, consistent latency
- Lower power than GPU for same throughput
- Can be integrated into network cards (SmartNICs)

For institutional deployments, HSMs provide secure key storage with hardware acceleration:

HSM Performance Comparison:

HSM Model           | RSA/sec | ECDSA/sec | Ed25519/sec | Cost
--------------------|---------|-----------|-------------|--------
Thales Luna 7       | 10,000  | 8,000     | 12,000      | $20K+
AWS CloudHSM        | 2,000   | 1,500     | N/A         | $1.5/hr
Utimaco CryptoServer| 15,000  | 12,000    | 18,000      | $25K+
YubiHSM 2           | 100     | 80        | 150         | $650

Note: HSMs are for signing, not bulk verification.
      Validators verify signatures; HSMs sign transactions.

HSM Integration Architecture:

┌─────────────────────────────────────────────────────┐
│                    Application                       │
│  ┌─────────────┐    ┌─────────────┐                 │
│  │ Transaction │───▶│  HSM Client │                 │
│  │   Builder   │    │   Library   │                 │
│  └─────────────┘    └──────┬──────┘                 │
└────────────────────────────┼────────────────────────┘
                             │ PKCS#11 / Network
                             ▼
┌─────────────────────────────────────────────────────┐
│                        HSM                           │
│  ┌─────────────────────────────────────────────┐   │
│  │         Private Key (never exported)         │   │
│  ├─────────────────────────────────────────────┤   │
│  │            Signing Engine                    │   │
│  │         (Hardware accelerated)               │   │
│  └─────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘

Some signature schemes allow verifying multiple signatures faster than individual verification:

Ed25519 Batch Verification:

Standard verification of N signatures:
  • Time: N × 170μs = 170N μs

Batch verification of N signatures:
  • Time: 170μs + (N × 20μs) = 170 + 20N μs

For N = 100:
  • Standard: 17,000μs
  • Batch: 2,170μs
  • Speedup: 7.8×

How it works:
  • Batch verification combines multiple scalar multiplications
  • Single multi-scalar multiplication is faster than N separate ones
  • Requires all-or-nothing result (if one fails, batch fails)
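
The cost model above is easy to explore for other batch sizes. This sketch encodes the lesson's two formulas directly; the 170μs and 20μs constants are the lesson's figures, not measurements.

```javascript
const SINGLE_VERIFY_US = 170;   // cost of one standalone verification
const BATCH_PER_SIG_US = 20;    // marginal cost per signature inside a batch

function standardTimeUs(n) {
  return n * SINGLE_VERIFY_US;
}

function batchTimeUs(n) {
  return SINGLE_VERIFY_US + n * BATCH_PER_SIG_US;
}

function batchSpeedup(n) {
  return standardTimeUs(n) / batchTimeUs(n);
}

for (const n of [10, 100, 1000]) {
  console.log(`N=${n}: ${standardTimeUs(n)}us vs ${batchTimeUs(n)}us, ` +
              `${batchSpeedup(n).toFixed(1)}x`);
}
```

Under this model the speedup approaches 170/20 = 8.5× as N grows, so very large batches add latency without much further gain.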

Implementation Considerations:

// ed25519BatchVerify and individualVerify are placeholders for library calls:
// the first resolves to { success: boolean }, the second to a boolean.
async function batchVerify(signatures) {
  // Try batch verification first
  const batchResult = await ed25519BatchVerify(signatures);

  if (batchResult.success) {
    // All signatures valid
    return signatures.map(s => ({ ...s, valid: true }));
  }

  // At least one invalid: fall back to individual verification.
  // A binary search over sub-batches could find the invalid signatures faster.
  return Promise.all(
    signatures.map(async s => ({ ...s, valid: await individualVerify(s) }))
  );
}
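
The binary-search idea mentioned in the comment can be sketched as a recursive bisection: if a batch fails, split it in half and retest each half. In this sketch `batchIsValid` stands in for a real all-or-nothing batch verifier, and the demo uses a fake one that simply reads an `ok` flag, so nothing here performs actual cryptography.

```javascript
// Return the items that fail verification, using batch checks plus bisection.
// batchIsValid(items) must return true only if every item in `items` is valid.
function findInvalid(items, batchIsValid) {
  if (items.length === 0 || batchIsValid(items)) return [];
  if (items.length === 1) return [items[0]];

  const mid = Math.floor(items.length / 2);
  return [
    ...findInvalid(items.slice(0, mid), batchIsValid),
    ...findInvalid(items.slice(mid), batchIsValid),
  ];
}

// Stand-in verifier for the demo: counts calls and trusts the `ok` flag.
let batchCalls = 0;
const fakeBatchVerify = items => {
  batchCalls++;
  return items.every(s => s.ok);
};

// 64 signatures, exactly one of them invalid.
const signatures = Array.from({ length: 64 }, (_, i) => ({ id: i, ok: i !== 37 }));
const invalid = findInvalid(signatures, fakeBatchVerify);

console.log('invalid ids:', invalid.map(s => s.id));
console.log('batch calls:', batchCalls);
```

Bisection pays off when invalid signatures are rare. If many signatures in a batch are bad, the repeated batch calls can cost more than verifying each one individually.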

ECDSA Batch Verification:

ECDSA does not support efficient batch verification.
Each signature must be verified individually.
This is another reason to prefer Ed25519.

Proposed Amendment: Aggregate Signatures

Current: Each transaction has individual signature(s)
Proposed: Multiple transactions share aggregate signature

Example:
- 100 transactions from different accounts
- Instead of 100 signatures (6,400 bytes)
- Single aggregate signature (64 bytes)
- Single verification instead of 100

Benefits:
- Bandwidth: 99% reduction in signature data
- CPU: 99% reduction in verification time
- Storage: Smaller ledgers

Status: Theoretical - requires protocol amendment
Timeline: Years away if ever implemented

Step 1: Identify Your Bottleneck

Current TPS | CPU Util | Memory | I/O   | Bottleneck
------------|----------|--------|-------|------------
<500        | <30%     | Low    | Low   | Not signatures
500-1,000   | 50-70%   | Medium | Low   | Maybe signatures
1,000-1,500 | 70-90%   | Medium | Medium| Likely signatures
>1,500      | >90%     | High   | High  | Definitely signatures

Step 2: Calculate Optimization ROI

Optimization      | Cost     | TPS Gain  | $/TPS  | Payback
------------------|----------|-----------|--------|----------
Ed25519 migration | Dev time | 3-4×      | Low    | Immediate
Parallel (8 core) | $500     | 7×        | $0.03  | Immediate
GPU (RTX 3080)    | $1,500   | 50×       | $0.04  | Weeks
FPGA (Alveo U50)  | $5,000   | 25×       | $0.10  | Months
HSM (Luna 7)      | $20,000  | N/A*      | N/A    | Security req

*HSM is for signing security, not verification throughput

Path for Growing Throughput Needs:

Stage 1: Software Optimization (Free-$500)
├── Migrate accounts to Ed25519 where possible
├── Implement parallel verification
├── Optimize thread pool configuration
└── Target: 20,000-40,000 verifications/second

Stage 2: Hardware Upgrade ($1,000-$5,000)
├── Upgrade to higher core count CPU
├── Add GPU for batch verification
├── Optimize memory/storage for verification pipeline
└── Target: 100,000-500,000 verifications/second

Stage 3: Specialized Hardware ($10,000+)
├── Deploy FPGA accelerators
├── Consider custom ASIC (very high volume)
├── Multi-GPU configurations
└── Target: 1,000,000+ verifications/second

Stage 4: Protocol Level (Future)
├── Batch verification amendments
├── Aggregate signature support
├── Algorithm upgrades (post-quantum?)
└── Target: Millions of verifications/second

5-Year TCO Comparison (Target: 50,000 TPS capacity)

Solution                | Capital  | Annual   | 5-Year TCO
------------------------|----------|----------|-----------
CPU-only (many servers) | $50,000  | $30,000  | $200,000
CPU + GPU (fewer)       | $20,000  | $15,000  | $95,000
FPGA solution           | $40,000  | $10,000  | $90,000

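The CPU + GPU and FPGA solutions both come in at less than half the CPU-only TCO. FPGA is slightly cheaper over five years, while CPU + GPU needs half the upfront capital.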
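
The 5-year figures are capital cost plus five years of annual cost. A short sketch that recomputes them, which can be adapted to your own quotes; the dollar amounts are the lesson's illustrative numbers.

```javascript
// Total cost of ownership: upfront capital plus recurring annual cost.
function totalCostOfOwnership(capital, annual, years = 5) {
  return capital + annual * years;
}

const solutions = [
  { name: 'CPU-only (many servers)', capital: 50000, annual: 30000 },
  { name: 'CPU + GPU (fewer)', capital: 20000, annual: 15000 },
  { name: 'FPGA solution', capital: 40000, annual: 10000 },
];

for (const { name, capital, annual } of solutions) {
  console.log(`${name}: $${totalCostOfOwnership(capital, annual).toLocaleString('en-US')}`);
}
```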


Parallel verification scales nearly linearly to 8-16 cores with proper implementation

GPU acceleration provides 50-100× improvement for batch verification

Current XRPL throughput is not signature-bound—signatures only become the bottleneck above current capacity

⚠️ GPU integration complexity—no production XRPL GPU verification implementations exist

⚠️ Future algorithm requirements—post-quantum signatures will be slower

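📌 Assuming Ed25519 migration is simple: an existing account's master key pair, and therefore its algorithm, can't be changed. The account can only add an Ed25519 regular key or move to a new account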

📌 Ignoring HSM latency for high-frequency signing—HSMs add network round-trip

📌 Betting on batch verification amendments—may never be implemented

Cryptographic signature verification is a solvable bottleneck with known solutions. An 8-core server with proper parallelization handles more than current XRPL capacity, and GPU acceleration provides another 50× headroom if needed. The real challenge is not the cryptography but the engineering: rippled may need work to fully exploit these optimizations. For most deployments, ensuring parallel verification is enabled and using Ed25519 for new accounts is sufficient.


Assignment: Analyze signature verification performance and design an optimization plan.

Requirements:

  • Benchmark signature verification on your hardware
  • Measure single-threaded ECDSA and Ed25519 performance
  • Calculate theoretical maximum TPS
  • Design thread pool configuration for your CPU
  • Estimate expected speedup with parallel verification
  • Identify synchronization overhead
  • Research GPU options for your budget
  • Calculate batch sizes and latency trade-offs
  • Estimate TCO for different solutions
  • For a target of 10,000 TPS, specify:

Grading criteria:

  • Accurate performance measurements (25%)
  • Sound parallelization design (25%)
  • Realistic hardware evaluation (25%)
  • Practical, cost-effective recommendation (25%)

Time investment: 3-4 hours


1. How much faster is Ed25519 signature verification compared to ECDSA secp256k1?

A) 10% faster
B) 50% faster
C) 3-4× faster
D) 10× faster

Correct Answer: C


2. An 8-core server achieves 41,000 Ed25519 verifications/second with parallel processing. What's the efficiency compared to theoretical 8× speedup?

A) 100% (perfect scaling)
B) ~88% (41,000 / 46,400)
C) ~50% (significant overhead)
D) Cannot be calculated

Correct Answer: B


3. Why can't an existing ECDSA XRPL account switch its master key to Ed25519?

A) Ed25519 is not supported on mainnet
B) The signature algorithm is determined by the account's master key pair, which cannot be changed
C) Ed25519 requires more storage
D) Regulatory restrictions prevent algorithm changes

Correct Answer: B


4. What is the primary advantage of FPGA over GPU for signature verification?

A) Higher throughput
B) Lower cost
C) Deterministic low latency
D) Easier programming

Correct Answer: C


5. At what point does GPU acceleration become cost-effective for XRPL deployment?

A) Any deployment should use GPU
B) When throughput requirements exceed what CPU parallelization provides (~50,000+ TPS)
C) Only for mainnet validators
D) GPU is never cost-effective for signatures

Correct Answer: B



For Next Lesson:
Lesson 7 covers Network Optimization—reducing propagation latency and bandwidth usage.


End of Lesson 6

Estimated completion time: 60 minutes reading + 3-4 hours for deliverable

Key Takeaways

1. **Ed25519 beats ECDSA by 3-4×** for all operations. New accounts should use Ed25519; an existing account's ECDSA master key pair can't be changed.

2. **Parallel verification is the first optimization**: 8 cores provide ~7× throughput improvement with minimal cost. This should be enabled by default.

3. **GPU acceleration provides 50-100× improvement** for batch verification but adds latency. Appropriate for high-throughput, latency-tolerant applications.

4. **HSMs are for security, not speed**: Institutional deployments use HSMs to protect private keys, not to accelerate verification.

5. **Optimization follows Amdahl's Law**: Once signatures aren't the bottleneck, optimizing them further provides diminishing returns. Know when to stop.