WebSocket Fundamentals - Persistent Connection Patterns | XRPL APIs & Integration | XRP Academy - XRP Academy
Intermediate · 55 min

WebSocket Fundamentals - Persistent Connection Patterns

Learning Objectives

Establish WebSocket connections to XRPL servers using client libraries and raw WebSocket APIs

Implement connection lifecycle handling (connect, disconnect, error states)

Design reconnection logic with exponential backoff and jitter

Manage subscriptions across connection interruptions

Build production-ready connection wrappers that handle real-world network conditions

If you've used REST APIs, WebSocket will feel unfamiliar. REST is stateless—each request is independent, and you don't care what happened before or after. WebSocket is stateful—you establish a connection, maintain it, and must handle what happens when it breaks.

The Core Challenge:

REST API:                    WEBSOCKET:
Request → Response → Done    Connect → Maintain → Handle Failures → Reconnect

Your app must handle:
                             • Connection establishment
                             • Keeping connection alive
                             • Detecting disconnection
                             • Reconnecting automatically
                             • Resubscribing after reconnect
                             • Handling messages during reconnection
                             • Correlating responses to requests

Most XRPL tutorials show connection code that works perfectly in demos:

// The "hello world" that breaks in production
const client = new xrpl.Client('wss://s1.ripple.com:51233')
await client.connect()
const response = await client.request({ command: 'server_info' })
console.log(response)

This code assumes:

  • Connection succeeds on first try
  • Connection never drops
  • No network interruptions
  • Server is always available

None of these assumptions hold in production. This lesson teaches you to build code that handles reality.


The xrpl.js library abstracts away much of the WebSocket complexity, but you still need to understand what's happening:

```javascript
const xrpl = require('xrpl')

async function connect() {
  const client = new xrpl.Client('wss://s1.ripple.com:51233')

  // Event handlers should be registered BEFORE connect()
  client.on('connected', () => {
    console.log('Connected to XRPL')
  })

  client.on('disconnected', (code) => {
    console.log(`Disconnected with code: ${code}`)
  })

  client.on('error', (error) => {
    console.error('Connection error:', error)
  })

  try {
    await client.connect()
    console.log('Connection established')
    return client
  } catch (error) {
    console.error('Failed to connect:', error)
    throw error
  }
}
```

Critical Insight: Register event handlers BEFORE calling connect(). If you register them after, you might miss events that fire during connection establishment.
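
To see why the ordering matters, here is a minimal sketch using Node's built-in `events` module rather than xrpl.js itself. `FakeClient` is a hypothetical stand-in that emits `connected` synchronously inside `connect()`; a real client emits asynchronously, but the same rule applies: an emitter does not replay past events to listeners that are added later.

```javascript
const EventEmitter = require('events')

// Hypothetical stand-in for a client; not part of xrpl.js.
class FakeClient extends EventEmitter {
  connect() {
    this.emit('connected') // fires during connection establishment
  }
}

function runDemo() {
  const fired = []

  const early = new FakeClient()
  early.on('connected', () => fired.push('early')) // registered before connect()
  early.connect()

  const late = new FakeClient()
  late.connect()
  late.on('connected', () => fired.push('late')) // registered after: never called

  return fired
}

console.log(runDemo()) // [ 'early' ]
```

Only the listener attached before `connect()` runs; the late one silently misses the event.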

Understanding raw WebSocket helps when debugging or when library behavior isn't what you need:

```javascript
const WebSocket = require('ws')

function rawConnect(url) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url)

    ws.on('open', () => {
      console.log('WebSocket connection opened')
      resolve(ws)
    })

    ws.on('error', (error) => {
      console.error('WebSocket error:', error)
      reject(error) // No effect if the promise has already resolved
    })

    ws.on('close', (code, reason) => {
      console.log(`WebSocket closed: ${code} - ${reason}`)
    })

    ws.on('message', (data) => {
      // ws delivers a Buffer; convert it to a string before parsing
      const message = JSON.parse(data.toString())
      console.log('Received:', message)
    })
  })
}

// Making a request with raw WebSocket
let nextRequestId = 1 // A counter avoids the collisions Date.now() allows within one millisecond

function rawRequest(ws, command, params = {}) {
  const requestId = nextRequestId++

  return new Promise((resolve, reject) => {
    const handler = (data) => {
      const response = JSON.parse(data.toString())
      if (response.id === requestId) {
        clearTimeout(timeout)
        ws.off('message', handler)
        resolve(response)
      }
    }

    const timeout = setTimeout(() => {
      ws.off('message', handler) // Don't leak the listener when the request times out
      reject(new Error('Request timeout'))
    }, 10000)

    ws.on('message', handler)

    ws.send(JSON.stringify({
      id: requestId,
      command: command,
      ...params
    }))
  })
}
```

Why Raw WebSocket Matters:

  • Debug library issues
  • Implement in languages without good libraries
  • Understand what libraries are doing under the hood
  • Build custom optimizations when needed
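
One thing libraries do under the hood is correlate responses to requests. The sketch below shows one common way to do it: a single message listener plus a map of pending requests keyed by `id`. `RequestCorrelator` and its method names are hypothetical, chosen for illustration; the transport is injected as a `send` function so the logic can be exercised without a network connection.

```javascript
// Hypothetical helper: routes each response to the promise waiting for its id.
class RequestCorrelator {
  constructor(send) {
    this.send = send          // function that transmits a string, e.g. (s) => ws.send(s)
    this.nextId = 1
    this.pending = new Map()  // id -> { resolve, reject, timer }
  }

  request(command, params = {}, timeoutMs = 10000) {
    const id = this.nextId++
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id)
        reject(new Error(`Request ${id} timed out`))
      }, timeoutMs)
      this.pending.set(id, { resolve, reject, timer })
      this.send(JSON.stringify({ id, command, ...params }))
    })
  }

  // Call this from the socket's single 'message' listener.
  handleMessage(raw) {
    const message = JSON.parse(raw.toString())
    const entry = this.pending.get(message.id)
    if (!entry) return false  // stream event or unknown id: not a response we await
    clearTimeout(entry.timer)
    this.pending.delete(message.id)
    entry.resolve(message)
    return true
  }
}
```

With a real socket this would be wired as `ws.on('message', (data) => correlator.handleMessage(data))`. Messages that return `false` are subscription events and can be dispatched separately.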

WebSocket connections go through distinct states:

CONNECTION STATE MACHINE:

┌──────────────┐
│  CONNECTING  │ ─────── Connection attempt in progress
└──────┬───────┘
       │ success
       ▼
┌──────────────┐
│     OPEN     │ ─────── Connected, can send/receive
└──────┬───────┘
       │ close event (intentional or error)
       ▼
┌──────────────┐
│    CLOSED    │ ─────── Connection terminated
└──────────────┘

WebSocket readyState values:
0 = CONNECTING
1 = OPEN
2 = CLOSING
3 = CLOSED

Checking Connection State:

function isConnected(ws) {
  return ws && ws.readyState === WebSocket.OPEN
}

function safeRequest(client, request) {
  if (!client.isConnected()) {
    throw new Error('Not connected to XRPL')
  }
  return client.request(request)
}

Connections can hang indefinitely without proper timeouts:

async function connectWithTimeout(url, timeoutMs = 10000) {
  const client = new xrpl.Client(url)
  let timer

  const timeoutPromise = new Promise((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Connection timeout after ${timeoutMs}ms`))
    }, timeoutMs)
  })

  try {
    await Promise.race([
      client.connect(),
      timeoutPromise
    ])
    return client
  } catch (error) {
    // Clean up on failure
    try {
      await client.disconnect()
    } catch (disconnectError) {
      // Ignore disconnect errors during cleanup
    }
    throw error
  } finally {
    // Always clear the timer so it cannot fire later or keep the process alive
    clearTimeout(timer)
  }
}

Connections drop for many reasons:

DISCONNECTION CAUSES:

Network Issues
• Internet connectivity lost
• Route changes (mobile networks)
• Network congestion/packet loss
• Firewall/proxy interruptions

Server Side
• Server maintenance/restart
• Load balancer rotation
• Server overload
• Idle connection cleanup

Client Side
• Application backgrounded (mobile)
• System sleep/hibernate
• Resource constraints
• Intentional disconnect

Protocol Level
• Ping/pong timeout (no heartbeat response)
• TLS/SSL issues
• Message size exceeded
• Protocol violations

Key Insight: Disconnections are not errors to prevent; they're normal events to handle gracefully.

xrpl.js fires a disconnected event:

client.on('disconnected', (code) => {
  console.log(`Disconnected with code: ${code}`)

  // Common close codes:
  // 1000 - Normal closure
  // 1001 - Going away (server shutdown)
  // 1006 - Abnormal closure (no close frame)
  // 1011 - Server error
  // 1012 - Server restart

  if (code === 1000) {
    console.log('Clean disconnect, likely intentional')
  } else {
    console.log('Unexpected disconnect, should reconnect')
    initiateReconnect() // Placeholder for your reconnection routine
  }
})

Close Codes Matter:

CODE    MEANING                 ACTION
────────────────────────────────────────────
1000    Normal closure          Don't reconnect (intentional)
1001    Server going away       Reconnect to different server
1006    Abnormal closure        Reconnect with backoff
1011    Server error            Wait longer, then reconnect
1012    Server restart          Wait, then reconnect
1013    Try again later         Wait, then reconnect
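
The table can be turned into a small decision function. This is a sketch only: `reconnectPolicy`, the shape of its return value, and the extra wait times are illustrative assumptions, not values taken from rippled or xrpl.js.

```javascript
// Maps a WebSocket close code to a reconnection decision, following the table above.
// The field names and extra wait values are illustrative assumptions.
function reconnectPolicy(code) {
  switch (code) {
    case 1000: // Normal closure: intentional, stay disconnected
      return { reconnect: false, switchServer: false, extraWaitMs: 0 }
    case 1001: // Server going away: prefer a different server
      return { reconnect: true, switchServer: true, extraWaitMs: 0 }
    case 1011: // Server error: wait longer before retrying
      return { reconnect: true, switchServer: false, extraWaitMs: 10000 }
    case 1012: // Server restart
    case 1013: // Try again later
      return { reconnect: true, switchServer: false, extraWaitMs: 5000 }
    default:   // 1006 and unknown codes: ordinary backoff
      return { reconnect: true, switchServer: false, extraWaitMs: 0 }
  }
}

console.log(reconnectPolicy(1001)) // { reconnect: true, switchServer: true, extraWaitMs: 0 }
```

Keeping the policy in one function makes it easy to log the decision and to adjust the waits without touching the reconnection loop.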

TCP connections can become "half-open"—one side thinks it's connected, the other has disconnected. Without active checking, you might not know your connection is dead.

How rippled Handles This:

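rippled implements WebSocket ping/pong frames. The server sends periodic ping frames, and the client must answer each one with a pong; most WebSocket implementations, including browsers and the Node.js ws package, send that pong automatically. If the server doesn't receive a pong, it closes the connection. This protects the server's view of the connection, but the client still needs its own check.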

Client-Side Heartbeat:

class HeartbeatManager {
  constructor(client, intervalMs = 30000) {
    this.client = client
    this.intervalMs = intervalMs
    this.heartbeatTimer = null
    this.lastPong = Date.now()
    this.missedPongs = 0
    this.maxMissedPongs = 3
  }

  start() {
    this.stop() // Avoid stacking timers if start() is called twice
    this.heartbeatTimer = setInterval(async () => {
      try {
        // The ping command is a lightweight way to check the connection
        const start = Date.now()
        await this.client.request({ command: 'ping' })
        this.lastPong = Date.now()
        this.missedPongs = 0

        const latency = this.lastPong - start
        console.log(`Heartbeat OK, latency: ${latency}ms`)
      } catch (error) {
        this.missedPongs++
        console.warn(`Heartbeat failed (${this.missedPongs}/${this.maxMissedPongs})`)

        if (this.missedPongs >= this.maxMissedPongs) {
          console.error('Too many missed heartbeats, connection presumed dead')
          this.stop()
          // Swallow cleanup errors: the connection is already considered dead
          this.client.disconnect().catch(() => {})
        }
      }
    }, this.intervalMs)
  }

  stop() {
    if (this.heartbeatTimer) {
      clearInterval(this.heartbeatTimer)
      this.heartbeatTimer = null
    }
  }
}
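
An active ping is not the only way to notice a dead connection. If you are subscribed to the ledger stream, messages normally arrive every few seconds, so prolonged silence is itself a signal. The sketch below is a hypothetical passive watchdog; `StalenessWatchdog`, its method names, and the 15-second threshold in the usage note are assumptions for illustration.

```javascript
// Hypothetical helper: flags a connection as stale when no message has arrived recently.
class StalenessWatchdog {
  constructor(maxSilenceMs, now = Date.now) {
    this.maxSilenceMs = maxSilenceMs
    this.now = now                 // injectable clock, which keeps the class testable
    this.lastMessageAt = now()
  }

  // Call on every incoming message (ledgerClosed, transaction, responses).
  touch() {
    this.lastMessageAt = this.now()
  }

  // True when the silence has lasted longer than the allowed window.
  isStale() {
    return this.now() - this.lastMessageAt > this.maxSilenceMs
  }
}
```

A typical use would be to create `new StalenessWatchdog(15000)`, call `touch()` from the `ledgerClosed` handler, and check `isStale()` on a timer, treating a stale result the same way as repeated heartbeat failures.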

// BAD: Reconnect immediately on disconnect
client.on('disconnected', async () => {
  await client.connect()  // Don't do this!
})

Why this fails:

  • If server is down, you hammer it with connection attempts
  • If network is flaky, you create connection storms
  • No backoff means no recovery time
  • Could overwhelm both client and server

The standard pattern: wait longer after each failed attempt.

```javascript
class ExponentialBackoff {
  constructor(options = {}) {
    this.baseDelay = options.baseDelay || 1000      // Start with 1 second
    this.maxDelay = options.maxDelay || 60000       // Cap at 60 seconds
    this.multiplier = options.multiplier || 2       // Double each time
    this.attempt = 0
  }

  getDelay() {
    const delay = Math.min(
      this.baseDelay * Math.pow(this.multiplier, this.attempt),
      this.maxDelay
    )
    this.attempt++
    return delay
  }

  reset() {
    this.attempt = 0
  }
}

// Usage
const backoff = new ExponentialBackoff()
// First attempt: 1000ms
// Second: 2000ms
// Third: 4000ms
// Fourth: 8000ms
// Fifth: 16000ms
// ... until maxDelay (60000ms)
```
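
It helps to know how long a client will keep trying before it gives up. The sketch below computes the schedule from the formula delay(n) = min(base × multiplier^n, max) and sums it. The limit of ten attempts is an assumption chosen for the example; `backoffSchedule` is a hypothetical helper.

```javascript
// Delay before each attempt: min(base * multiplier^n, max), for n = 0, 1, 2, ...
function backoffSchedule(attempts, base = 1000, multiplier = 2, max = 60000) {
  const delays = []
  for (let n = 0; n < attempts; n++) {
    delays.push(Math.min(base * Math.pow(multiplier, n), max))
  }
  return delays
}

const delays = backoffSchedule(10)
const totalMs = delays.reduce((sum, d) => sum + d, 0)
console.log(delays)   // [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000, 60000, 60000]
console.log(totalMs)  // 303000, just over five minutes of waiting in total
```

With these defaults the cap is reached on the seventh attempt, so raising the attempt limit adds a full minute per extra attempt.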

Pure exponential backoff has a problem: if many clients disconnect simultaneously (for example, on a server restart), they all reconnect at the same intervals, creating a thundering herd.

Add randomness (jitter) to spread out reconnection attempts:

class BackoffWithJitter {
  constructor(options = {}) {
    this.baseDelay = options.baseDelay || 1000
    this.maxDelay = options.maxDelay || 60000
    this.multiplier = options.multiplier || 2
    this.jitterFactor = options.jitterFactor || 0.3  // ±30%
    this.attempt = 0
  }

  getDelay() {
    const baseDelay = Math.min(
      this.baseDelay * Math.pow(this.multiplier, this.attempt),
      this.maxDelay
    )

    // Add jitter: random value between -jitterFactor and +jitterFactor
    const jitter = baseDelay * this.jitterFactor * (Math.random() * 2 - 1)
    const delay = Math.max(0, baseDelay + jitter)

    this.attempt++
    return Math.round(delay)
  }

  reset() {
    this.attempt = 0
  }
}

// Example delays with 30% jitter:
// Base 1000ms → actual 700-1300ms
// Base 2000ms → actual 1400-2600ms
// Base 4000ms → actual 2800-5200ms
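
A common alternative to ±30% jitter is "full jitter", described in the AWS Architecture Blog article listed in the further reading: pick a uniformly random delay between zero and the capped exponential value. The sketch below is one way to write it; `fullJitterDelay` and its option names are hypothetical, and the random source is injectable so the function can be tested deterministically.

```javascript
// "Full jitter": pick uniformly between 0 and the capped exponential delay.
// `random` defaults to Math.random and can be replaced in tests.
function fullJitterDelay(attempt, { baseDelay = 1000, maxDelay = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(baseDelay * Math.pow(2, attempt), maxDelay)
  return Math.floor(random() * ceiling)
}

// With a fixed "random" value of 0.5, attempt 3 gives half of 8000ms
console.log(fullJitterDelay(3, { random: () => 0.5 })) // 4000
```

Unlike the ±30% version above, which can exceed `maxDelay` by up to 30% once the cap is reached, full jitter never returns more than the cap. The trade-off is that some retries happen almost immediately.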

Here's production-ready reconnection:

class ResilientXRPLClient {
  constructor(url, options = {}) {
    this.url = url
    this.client = null
    this.backoff = new BackoffWithJitter({
      baseDelay: options.baseDelay || 1000,
      maxDelay: options.maxDelay || 60000,
      jitterFactor: options.jitterFactor || 0.3
    })
    this.maxAttempts = options.maxAttempts || 10
    this.currentAttempt = 0
    this.shouldReconnect = true
    this.reconnecting = false  // Guards against overlapping reconnect loops
    this.subscriptions = []  // Track active subscriptions
    this.onReconnect = options.onReconnect || (() => {})
  }

  async connect() {
    this.shouldReconnect = true
    this.client = new xrpl.Client(this.url)
    this.setupEventHandlers()

    try {
      await this.client.connect()
      this.backoff.reset()
      this.currentAttempt = 0
      console.log('Connected to XRPL')
      return this.client
    } catch (error) {
      console.error('Initial connection failed:', error)
      throw error
    }
  }

  setupEventHandlers() {
    this.client.on('disconnected', async (code) => {
      console.log(`Disconnected with code: ${code}`)

      if (code === 1000 || !this.shouldReconnect) {
        console.log('Clean disconnect, not reconnecting')
        return
      }

      if (this.reconnecting) return  // A reconnect loop is already running

      try {
        await this.attemptReconnect()
      } catch (error) {
        // Catch here: a rejection inside an event handler would otherwise go unhandled
        console.error('Giving up on reconnection:', error.message)
      }
    })

    this.client.on('error', (error) => {
      console.error('Connection error:', error)
    })
  }

  async attemptReconnect() {
    this.reconnecting = true

    try {
      while (this.currentAttempt < this.maxAttempts && this.shouldReconnect) {
        this.currentAttempt++
        const delay = this.backoff.getDelay()

        console.log(`Reconnection attempt ${this.currentAttempt}/${this.maxAttempts} in ${delay}ms`)

        await this.sleep(delay)

        if (!this.shouldReconnect) return

        try {
          this.client.removeAllListeners()  // Detach our handlers from the dead client
          this.client = new xrpl.Client(this.url)
          this.setupEventHandlers()
          await this.client.connect()

          console.log('Reconnected successfully')
          this.backoff.reset()
          this.currentAttempt = 0

          // Restore subscriptions
          await this.restoreSubscriptions()

          // Notify application of reconnection
          this.onReconnect()

          return
        } catch (error) {
          console.error(`Reconnection attempt ${this.currentAttempt} failed:`, error)
        }
      }
    } finally {
      this.reconnecting = false
    }

    console.error('Max reconnection attempts reached')
    throw new Error('Failed to reconnect after maximum attempts')
  }

  async restoreSubscriptions() {
    for (const sub of this.subscriptions) {
      try {
        await this.client.request({
          command: 'subscribe',
          ...sub.params
        })
        console.log(`Restored subscription: ${JSON.stringify(sub.params)}`)
      } catch (error) {
        console.error(`Failed to restore subscription:`, error)
      }
    }
  }

  // Track subscriptions for restoration after reconnect
  async subscribe(params) {
    const response = await this.client.request({
      command: 'subscribe',
      ...params
    })

    // Store subscription for restoration
    this.subscriptions.push({ params })

    return response
  }

  async disconnect() {
    this.shouldReconnect = false
    if (this.client) {
      await this.client.disconnect()
    }
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms))
  }

  isConnected() {
    return this.client && this.client.isConnected()
  }

  async request(req) {
    if (!this.isConnected()) {
      throw new Error('Not connected')
    }
    return this.client.request(req)
  }
}

Subscriptions tell the server to push updates when events occur:

// Subscribe to ledger closings
await client.request({
  command: 'subscribe',
  streams: ['ledger']
})

// Subscribe to specific account transactions
await client.request({
  command: 'subscribe',
  accounts: ['rN7n3473SaZBCG4dFL83w7a1RXtXtbk2D9']
})

// Subscribe to order book changes
await client.request({
  command: 'subscribe',
  books: [{
    taker_gets: { currency: 'XRP' },
    taker_pays: { currency: 'USD', issuer: 'rvYAfWj5gh67oV6fW32ZzP3Aw4Eubs59B' }
  }]
})

Events arrive as messages on the WebSocket connection:

// xrpl.js handles parsing; you register handlers
client.on('ledgerClosed', (ledger) => {
  console.log(`Ledger ${ledger.ledger_index} closed`)
  console.log(`Transactions: ${ledger.txn_count}`)
  console.log(`Time: ${new Date(ledger.ledger_time * 1000 + 946684800000)}`)
})

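// Field names below follow the original API format; newer API versions
// rename some of them, so check the subscribe reference for the version you use
client.on('transaction', (tx) => {
  console.log(`Transaction: ${tx.transaction.hash}`)
  console.log(`Type: ${tx.transaction.TransactionType}`)
  console.log(`Result: ${tx.meta.TransactionResult}`)
})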
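
The `ledger_time` arithmetic in the handler above works because XRPL timestamps count seconds since the "Ripple epoch" (2000-01-01T00:00:00Z), which is 946684800 seconds after the Unix epoch. A pair of small helpers makes the conversion explicit; the function names here are hypothetical.

```javascript
const RIPPLE_EPOCH_OFFSET_SECONDS = 946684800 // 2000-01-01T00:00:00Z expressed as Unix time

// Converts an XRPL timestamp (seconds since the Ripple epoch) to a JavaScript Date.
function rippleTimeToDate(rippleSeconds) {
  return new Date((rippleSeconds + RIPPLE_EPOCH_OFFSET_SECONDS) * 1000)
}

// Converts a JavaScript Date back to seconds since the Ripple epoch.
function dateToRippleTime(date) {
  return Math.floor(date.getTime() / 1000) - RIPPLE_EPOCH_OFFSET_SECONDS
}

console.log(rippleTimeToDate(0).toISOString()) // 2000-01-01T00:00:00.000Z
```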

When the connection drops and reconnects, you miss the events that occurred during the gap:

```
Timeline:
──────────────────────────────────────────────────────────
t=0     Connected, subscribed to account
t=10    Ledger 100 closes (you receive notification)
t=15    Ledger 101 closes (you receive notification)
t=20    CONNECTION DROPS
t=21    Ledger 102 closes (you DON'T receive this)
t=25    Ledger 103 closes (you DON'T receive this)
t=30    Reconnected, resubscribed
t=35    Ledger 104 closes (you receive notification)

GAP: You missed ledgers 102 and 103
```
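
Detecting the gap is simple arithmetic on ledger indexes: compare the last ledger seen before the drop with the first one seen after resubscribing. A minimal sketch follows; `findLedgerGap` is a hypothetical helper.

```javascript
// Returns the inclusive range of ledgers missed between the last one seen
// before the drop and the first one seen after resubscribing, or null if none.
function findLedgerGap(lastSeenIndex, firstIndexAfterReconnect) {
  if (lastSeenIndex == null) return null // nothing seen yet, so there is no baseline
  const from = lastSeenIndex + 1
  const to = firstIndexAfterReconnect - 1
  return from <= to ? { from, to, count: to - from + 1 } : null
}

// The timeline above: last seen 101, first after reconnect 104
console.log(findLedgerGap(101, 104)) // { from: 102, to: 103, count: 2 }
```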

Handling the Gap:

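class GapAwareSubscriptionManager {
  constructor(client, monitoredAccount) {
    this.client = client
    this.monitoredAccount = monitoredAccount  // Account whose transactions we track
    this.lastLedgerSeen = null
    this.lastDisconnectTime = null
  }

  async handleReconnect() {
    if (this.lastLedgerSeen !== null && this.lastDisconnectTime !== null) {
      // Query for missed data
      await this.fillGap()
    }

    // Resubscribe
    await this.subscribe()
  }

  async fillGap() {
    console.log(`Filling gap from ledger ${this.lastLedgerSeen}`)

    // Get current validated ledger
    const serverInfo = await this.client.request({ command: 'server_info' })
    const currentLedger = serverInfo.result.info.validated_ledger.seq

    if (currentLedger > this.lastLedgerSeen) {
      // Query transactions we might have missed, following pagination markers
      let marker
      do {
        const response = await this.client.request({
          command: 'account_tx',
          account: this.monitoredAccount,
          ledger_index_min: this.lastLedgerSeen + 1,
          ledger_index_max: currentLedger,
          forward: true,  // Oldest first, so transactions are processed in order
          marker          // undefined on the first page; omitted from the JSON
        })

        // Process missed transactions
        for (const tx of response.result.transactions) {
          // The hash is nested under tx.tx in the original API format and
          // may sit at the top level in newer API versions
          console.log(`Gap fill: Found transaction ${tx.tx?.hash ?? tx.hash}`)
          this.processTransaction(tx)
        }

        marker = response.result.marker
      } while (marker)
    }

    this.lastLedgerSeen = currentLedger
  }

  // Resubscribe to ledger closes and the monitored account
  async subscribe() {
    await this.client.request({
      command: 'subscribe',
      streams: ['ledger'],
      accounts: [this.monitoredAccount]
    })
  }

  // Application-specific: replace with your own handling
  processTransaction(tx) {}

  onLedgerClosed(ledger) {
    this.lastLedgerSeen = ledger.ledger_index
  }

  onDisconnect() {
    this.lastDisconnectTime = Date.now()
  }
}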

```
DO:
✓ Track last seen ledger for gap detection
✓ Store subscription parameters for restoration
✓ Handle events idempotently (same event processed twice is safe)
✓ Log subscription state changes for debugging

DON'T:
✗ Assume you'll receive every event (gaps happen)
✗ Subscribe to everything (server may disconnect heavy subscribers)
✗ Process events synchronously if slow (buffer and process async)
✗ Ignore subscription errors (they indicate problems)
```
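
Idempotent handling matters most right after a reconnect, when a gap-fill query and the live stream can both deliver the same transaction. One simple approach is to remember recently processed transaction hashes. The sketch below is illustrative: `makeIdempotentHandler` is a hypothetical helper, and its in-memory set would need to be persisted in a system that must survive restarts.

```javascript
// Hypothetical wrapper: runs `handler` at most once per transaction hash.
// The seen-set lives in memory and is bounded to avoid unbounded growth.
function makeIdempotentHandler(handler, maxRemembered = 10000) {
  const seen = new Set()
  return function handle(hash, payload) {
    if (seen.has(hash)) return false // duplicate: already processed
    seen.add(hash)
    if (seen.size > maxRemembered) {
      // Sets iterate in insertion order, so this evicts the oldest hash
      seen.delete(seen.values().next().value)
    }
    handler(hash, payload)
    return true
  }
}

const processed = []
const handlePayment = makeIdempotentHandler((hash) => processed.push(hash))
handlePayment('ABC123', {}) // processed
handlePayment('ABC123', {}) // ignored as a duplicate
console.log(processed)      // [ 'ABC123' ]
```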


Here's a full-featured connection manager combining all patterns:

const xrpl = require('xrpl')
const EventEmitter = require('events')

class ProductionXRPLClient extends EventEmitter {
  constructor(servers, options = {}) {
    super()

    // Support multiple servers for failover
    this.servers = Array.isArray(servers) ? servers : [servers]
    this.currentServerIndex = 0

    this.client = null
    this.options = {
      connectionTimeout: 10000,
      requestTimeout: 15000,
      heartbeatInterval: 30000,
      maxReconnectAttempts: 10,
      baseReconnectDelay: 1000,
      maxReconnectDelay: 60000,
      ...options  // Caller-supplied values override the defaults above
    }

    this.state = 'disconnected'
    this.subscriptions = new Map()
    this.lastLedgerIndex = null
    this.heartbeatTimer = null
    this.reconnectAttempts = 0
    this.shouldReconnect = true
  }

  get currentServer() {
    return this.servers[this.currentServerIndex]
  }

  nextServer() {
    this.currentServerIndex = (this.currentServerIndex + 1) % this.servers.length
    return this.currentServer
  }

  async connect() {
    this.shouldReconnect = true
    this.state = 'connecting'

    try {
      await this.establishConnection()
      return this
    } catch (error) {
      this.state = 'disconnected'
      throw error
    }
  }

  async establishConnection() {
    this.client = new xrpl.Client(this.currentServer, {
      timeout: this.options.requestTimeout
    })

    this.setupEventHandlers()

    // Connect with timeout
    let timer
    const timeoutPromise = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('Connection timeout')),
        this.options.connectionTimeout)
    })

    try {
      await Promise.race([this.client.connect(), timeoutPromise])
    } finally {
      clearTimeout(timer)  // Don't leave the timer pending once the race settles
    }

    this.state = 'connected'
    this.reconnectAttempts = 0

    // Start heartbeat
    this.startHeartbeat()

    // Restore subscriptions
    await this.restoreSubscriptions()

    // Fill any gaps
    await this.fillGaps()

    this.emit('connected', { server: this.currentServer })

    console.log(`Connected to ${this.currentServer}`)
  }

  setupEventHandlers() {
    this.client.on('disconnected', async (code) => {
      // A failed attempt inside handleReconnect() can also fire this event
      const alreadyReconnecting = this.state === 'reconnecting'

      this.stopHeartbeat()
      if (!alreadyReconnecting) this.state = 'disconnected'

      this.emit('disconnected', { code, server: this.currentServer })

      if (alreadyReconnecting) return

      if (code !== 1000 && this.shouldReconnect) {
        try {
          await this.handleReconnect()
        } catch (error) {
          // Already reported through the 'connectionFailed' event;
          // catching here avoids an unhandled promise rejection
        }
      }
    })

    this.client.on('error', (error) => {
      // EventEmitter throws if 'error' is emitted with no listener attached
      if (this.listenerCount('error') > 0) {
        this.emit('error', error)
      } else {
        console.error('Connection error:', error)
      }
    })

    this.client.on('ledgerClosed', (ledger) => {
      this.lastLedgerIndex = ledger.ledger_index
      this.emit('ledgerClosed', ledger)
    })

    this.client.on('transaction', (tx) => {
      this.emit('transaction', tx)
    })
  }

  async handleReconnect() {
    while (this.reconnectAttempts < this.options.maxReconnectAttempts &&
           this.shouldReconnect) {
      this.state = 'reconnecting'
      this.reconnectAttempts++

      // Calculate delay with exponential backoff and jitter
      const baseDelay = Math.min(
        this.options.baseReconnectDelay * Math.pow(2, this.reconnectAttempts - 1),
        this.options.maxReconnectDelay
      )
      const jitter = baseDelay * 0.3 * (Math.random() * 2 - 1)
      const delay = Math.round(baseDelay + jitter)

      console.log(`Reconnection attempt ${this.reconnectAttempts}/${this.options.maxReconnectAttempts} in ${delay}ms`)

      this.emit('reconnecting', {
        attempt: this.reconnectAttempts,
        maxAttempts: this.options.maxReconnectAttempts,
        delay
      })

      await this.sleep(delay)

      if (!this.shouldReconnect) return

      // Try next server if we've failed on current one
      if (this.reconnectAttempts > 1 && this.servers.length > 1) {
        const newServer = this.nextServer()
        console.log(`Trying alternate server: ${newServer}`)
      }

      try {
        await this.establishConnection()
        this.emit('reconnected', {
          server: this.currentServer,
          attempts: this.reconnectAttempts
        })
        return
      } catch (error) {
        console.error(`Reconnection attempt ${this.reconnectAttempts} failed:`, error.message)
      }
    }

    this.state = 'failed'
    const error = new Error('Max reconnection attempts exceeded')
    this.emit('connectionFailed', error)
    throw error
  }

  startHeartbeat() {
    this.stopHeartbeat()  // Never run two heartbeat timers at once
    this.heartbeatTimer = setInterval(async () => {
      try {
        await this.client.request({ command: 'ping' })
      } catch (error) {
        // Only logged here; escalate after repeated misses
        // (as HeartbeatManager does) if your application needs it
        console.warn('Heartbeat failed:', error.message)
      }
    }, this.options.heartbeatInterval)
  }

  stopHeartbeat() {
    if (this.heartbeatTimer) {
      clearInterval(this.heartbeatTimer)
      this.heartbeatTimer = null
    }
  }

  async subscribe(id, params) {
    const response = await this.client.request({
      command: 'subscribe',
      ...params
    })

    // Store for restoration after reconnect
    this.subscriptions.set(id, params)

    return response
  }

  async unsubscribe(id) {
    const params = this.subscriptions.get(id)
    if (!params) return

    await this.client.request({
      command: 'unsubscribe',
      ...params
    })

    this.subscriptions.delete(id)
  }

  async restoreSubscriptions() {
    for (const [id, params] of this.subscriptions) {
      try {
        await this.client.request({
          command: 'subscribe',
          ...params
        })
        console.log(`Restored subscription: ${id}`)
      } catch (error) {
        console.error(`Failed to restore subscription ${id}:`, error.message)
      }
    }
  }

  async fillGaps() {
    // Override in subclass for application-specific gap filling
  }

  async request(req) {
    if (this.state !== 'connected') {
      throw new Error(`Cannot make request: client is ${this.state}`)
    }
    return this.client.request(req)
  }

  async disconnect() {
    this.shouldReconnect = false
    this.stopHeartbeat()

    if (this.client) {
      await this.client.disconnect()
    }

    this.state = 'disconnected'
  }

  isConnected() {
    return this.state === 'connected'
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms))
  }
}

module.exports = ProductionXRPLClient

```javascript
// Usage example (in a separate file from the class above)
const ProductionXRPLClient = require('./ProductionXRPLClient')

async function main() {
  const client = new ProductionXRPLClient([
    'wss://s1.ripple.com:51233',
    'wss://s2.ripple.com:51233',
    'wss://xrplcluster.com'
  ], {
    maxReconnectAttempts: 15,
    heartbeatInterval: 30000
  })

  // Event handlers
  client.on('connected', ({ server }) => {
    console.log(`Connected to ${server}`)
  })

  client.on('disconnected', ({ code }) => {
    console.log(`Disconnected with code ${code}`)
  })

  client.on('reconnecting', ({ attempt, maxAttempts, delay }) => {
    console.log(`Reconnecting (${attempt}/${maxAttempts}) in ${delay}ms`)
  })

  client.on('reconnected', ({ server, attempts }) => {
    console.log(`Reconnected to ${server} after ${attempts} attempts`)
  })

  client.on('ledgerClosed', (ledger) => {
    console.log(`Ledger ${ledger.ledger_index} closed`)
  })

  // Always listen for 'error': an EventEmitter throws on an unhandled 'error' event
  client.on('error', (error) => {
    console.error('Client error:', error)
  })

  // Connect
  await client.connect()

  // Subscribe to ledger stream
  await client.subscribe('ledger', { streams: ['ledger'] })

  // Make requests
  const info = await client.request({ command: 'server_info' })
  console.log('Server version:', info.result.info.build_version)

  // Keep running
  process.on('SIGINT', async () => {
    console.log('Shutting down...')
    await client.disconnect()
    process.exit(0)
  })
}

main().catch(console.error)
```


Exponential backoff with jitter is the standard: It is the widely accepted way to prevent thundering herds and resource exhaustion

WebSocket connections require active maintenance: Half-open connections are a real problem; heartbeats are necessary

Subscriptions must survive reconnection: Applications that don't restore subscriptions silently fail

Multiple servers improve reliability: Failover to alternate servers handles single-server outages

⚠️ Optimal heartbeat interval: 30 seconds is common, but depends on network and server configuration

⚠️ Maximum reconnection attempts: 10 is arbitrary; depends on application requirements and user experience

⚠️ Gap detection completeness: Some events may be impossible to recover after extended disconnection

⚠️ xrpl.js reconnection behavior: Library behavior may change between versions; test with your version

🔴 Assuming connection is stable: Production connections WILL drop; code must handle this

🔴 Not tracking subscriptions: After reconnect, you'll have no subscriptions and miss events

🔴 Immediate reconnection: Hammering servers without backoff can get you rate-limited or banned

🔴 Ignoring gaps: Business-critical applications must detect and fill subscription gaps

WebSocket connection management is unglamorous but essential. The code in this lesson isn't exciting—it's defensive programming against network reality. Skip this work at your peril: your demo will work perfectly, and your production system will silently stop receiving events when the network hiccups. Every hour spent on connection resilience saves days of debugging production incidents.


Assignment: Build a production-ready WebSocket client for XRPL.

Requirements:

Part 1: Core Connection Management (40%)

  • Connects with configurable timeout
  • Detects disconnection via close event and heartbeat failure
  • Reconnects with exponential backoff (configurable base, max, jitter)
  • Supports multiple servers with automatic failover
  • Emits events for all state changes

Part 2: Subscription Management (30%)

  • Track active subscriptions by ID
  • Restore all subscriptions after reconnection
  • Support subscribe/unsubscribe methods
  • Log subscription state changes

Part 3: Gap Detection (30%)

  • Track last seen ledger index
  • Detect gaps after reconnection
  • Log detected gaps (actual gap filling is application-specific)

Tests (include the results in your submission):

  • Connects to server successfully
  • Reconnects after server-initiated disconnect
  • Backs off appropriately (verify timing)
  • Fails over to alternate server
  • Restores subscriptions after reconnect
  • Detects ledger gaps

Grading Criteria:

  • Correctness of reconnection logic: 30%
  • Proper backoff implementation: 25%
  • Subscription restoration: 25%
  • Code quality and documentation: 20%

Time Investment: 3-4 hours

Submission: JavaScript/TypeScript module with usage example and test results

Value: This client becomes the foundation for all your XRPL applications. Invest in getting it right.


1. Reconnection Strategy (Tests Understanding):

Why is exponential backoff with jitter preferred over fixed-interval reconnection?

A) It's faster—you reconnect more quickly
B) It prevents thundering herd when many clients reconnect simultaneously
C) It's required by the WebSocket specification
D) It uses less bandwidth

Correct Answer: B

Explanation: When a server restarts, all connected clients disconnect simultaneously. With fixed intervals, they all try to reconnect at the same times, overwhelming the server. Exponential backoff spreads attempts over time, and jitter ensures clients don't synchronize on the exponential intervals. Option A is wrong—backoff is slower, not faster. Option C is wrong—this is a best practice, not a requirement. Option D is tangentially related but not the primary reason.


2. Subscription Behavior (Tests Knowledge):

What happens to your WebSocket subscriptions when the connection drops and reconnects?

A) They are automatically restored by the server
B) They persist because WebSocket maintains state
C) They are lost—you must resubscribe after reconnecting
D) They are queued and resume when connection is restored

Correct Answer: C

Explanation: WebSocket subscriptions exist only for the lifetime of a connection. When connection drops, the server forgets your subscriptions. After reconnecting, you have a new connection with no subscriptions. Your application must track subscriptions and restore them after reconnection. The server has no memory of previous connection state.


3. Gap Handling (Tests Critical Thinking):

Your payment monitoring application was disconnected for 30 seconds. During that time, ledgers 1000-1005 closed. After reconnecting, what should your application do?

A) Nothing—subscriptions will catch you up automatically
B) Query account_tx for the missed ledger range to find any transactions
C) Assume no payments were received since you didn't see them
D) Restart the application to ensure clean state

Correct Answer: B

Explanation: Subscriptions only deliver events in real-time; they don't provide history. During disconnection, you missed any events. For payment monitoring, you must query the ledger for transactions that occurred in the gap (ledgers 1000-1005) to ensure you don't miss payments. Option A is wrong—subscriptions don't catch up. Option C could cause you to miss real payments. Option D doesn't address the gap.


4. Heartbeat Purpose (Tests Comprehension):

What problem does implementing a client-side heartbeat solve?

A) It makes the connection faster
B) It detects half-open connections where the client thinks it's connected but the server has disconnected
C) It prevents the server from disconnecting idle clients
D) It reduces bandwidth usage

Correct Answer: B

Explanation: TCP connections can become "half-open"—one side has closed but the other hasn't received notification (network issue, abrupt failure). Without heartbeats, your application might try to use a dead connection indefinitely. Periodic heartbeats (ping/pong or request/response) detect this condition so you can reconnect. Option C is a secondary benefit but not the primary purpose.


5. Server Failover (Tests Application):

Your application is configured with three servers: [A, B, C]. Server A drops your connection. What's the correct failover behavior?

A) Immediately connect to server B
B) Retry server A with backoff, then try B, then C
C) Try all servers simultaneously and use first to connect
D) Alert an operator to manually select a server

Correct Answer: B

Explanation: Proper failover uses backoff before trying alternate servers. The first disconnection might be transient; immediately jumping to B abandons A too quickly. Retry A with backoff first, then cycle through other servers. Option A doesn't use backoff. Option C wastes resources. Option D isn't automated failover.


  • "Release It!" by Michael Nygard (Circuit breaker, bulkhead patterns)
  • AWS Architecture Blog: Exponential Backoff and Jitter

For Next Lesson:
Lesson 3 covers JSON-RPC—the simpler but stateless alternative to WebSocket. We'll examine when stateless requests make more sense and how to implement them efficiently.


End of Lesson 2

Total words: ~5,200
Estimated completion time: 55 minutes reading + 3-4 hours for deliverable


  1. Transforms students from "it works in demo" to "it works in production" mindset
  2. Provides copy-paste production code (the deliverable is genuinely useful)
  3. Covers failure modes before students discover them in production
  4. Establishes patterns used throughout the course
  • "My app stopped receiving payments" → No reconnection logic
  • "Why did my app miss transactions?" → No gap detection
  • "Server is getting hammered" → No backoff
  • "Works on my machine" → No handling of real network conditions

Code Provided:
The lesson provides substantial code. This is intentional—WebSocket resilience is boilerplate that everyone needs. Students should focus on understanding and customizing, not reinventing.

Lesson 3 Setup:
After the complexity of WebSocket, Lesson 3's JSON-RPC will feel refreshingly simple. This contrast helps students appreciate when simpler approaches are appropriate.

Key Takeaways

1. WebSocket requires lifecycle management: Connect, maintain, detect failures, reconnect. Your code must handle the full lifecycle, not just the happy path.

2. Exponential backoff with jitter is mandatory: Immediate reconnection causes thundering herd problems. Always back off, always add randomness.

3. Subscriptions don't survive disconnection: You must track what you're subscribed to and restore subscriptions after reconnecting.

4. Gaps happen and must be handled: During disconnection, you miss events. Production applications detect gaps and query for missed data.

5. Multiple servers provide resilience: Configure fallback servers. If one is down, try another. Don't rely on a single endpoint.