Network Synchronization and Stock Server Operations
Learning Objectives
- Verify complete synchronization with the XRPL network using multiple diagnostic methods
- Interpret server_info output to assess server health and synchronization quality
- Establish baseline performance metrics for your specific hardware configuration
- Execute basic API operations to confirm functional network participation
- Determine readiness for validator activation based on stability criteria
You've installed and configured rippled. Your server is running, connecting to peers, and synchronizing with the network. But before becoming a validator—accepting the responsibility of participating in consensus—you need to answer a critical question:
Is this server stable enough to be trusted with validation responsibilities?
A validator that frequently loses synchronization, restarts unexpectedly, or performs erratically damages both your reputation and, to a small degree, network health. The validator community expects operators to demonstrate competence before enabling validation.
This lesson establishes the checkpoint criteria. By the end, you'll know whether your server is ready to proceed—or whether additional troubleshooting is needed first.
Think of this as a "stock server graduation ceremony." Your server must prove itself worthy of promotion to validator status.
A stock server (also called a tracking server) is a rippled instance that:
Stock Server Capabilities:
✓ Connects to the P2P network
✓ Downloads and validates all ledger data
✓ Maintains synchronized ledger state
✓ Can submit transactions to the network
✓ Provides API access for queries
✓ Follows validated ledgers
Stock Server Limitations:
✗ Does NOT participate in consensus voting
✗ Does NOT issue validation messages
✗ Does NOT influence which transactions are included
✗ Has no validator identity or reputation
Running as a stock server before enabling validation lets you:
- Verify infrastructure stability without reputation risk
- Establish baseline performance metrics
- Identify and resolve issues in a low-stakes environment
- Build familiarity with rippled operations
- Confirm network connectivity and synchronization

The stakes are asymmetric:
- Problems discovered as a stock server: no reputation impact
- Problems discovered as a validator: reputation damage
- Time invested here saves reputation later
Server states comparison:

Stock server states:
- connected: initial connection to the network
- syncing: downloading ledger data
- tracking: following the network, may have gaps
- full: fully synchronized, following consensus

A validator passes through all of the above, plus:
- proposing: actively participating in consensus

Goal for this lesson: a server consistently in the 'full' state, ready for 'proposing' after validator enablement.
```
# Check current server state
/opt/ripple/bin/rippled server_info

# Look for this in the output:
#   "server_state" : "full",

# Alternative: quick state check
/opt/ripple/bin/rippled server_info | grep server_state
```
Interpreting server_state:
"server_state" : "connected"
→ Problem: Not yet synchronizing
→ Action: Wait or check peer connectivity
"server_state" : "syncing"
→ Status: Normal during initial sync
→ Action: Wait for synchronization to complete
"server_state" : "tracking"
→ Status: Following network but may have gaps
→ Action: Usually transitions to full; if stuck, investigate
"server_state" : "full"
→ Status: Fully synchronized - this is the goal
→ Action: Ready for validation (after stability period)
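The interpretation table above can be captured in a small shell helper, handy in wait-for-sync scripts. This is a sketch; the function name `classify_state` is illustrative, not part of rippled.

```shell
# classify_state: map a server_state value to a suggested action
# (sketch based on the interpretation table above; name is illustrative)
classify_state() {
  case "$1" in
    full)              echo "ready" ;;       # goal state
    tracking)          echo "almost" ;;      # usually transitions to full
    syncing|connected) echo "wait" ;;        # normal during startup
    proposing)         echo "validating" ;;  # validator already enabled
    *)                 echo "unknown" ;;
  esac
}

# Example: block until the server reaches "full", checking every 30 seconds
# while [ "$(classify_state "$(/opt/ripple/bin/rippled server_info \
#     | grep -o '"server_state" : "[^"]*"' | cut -d'"' -f4)")" != "ready" ]; do
#   sleep 30
# done
```

The commented loop is the typical use: poll until "ready", then begin the stability observation window.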
```
# Get detailed server info
/opt/ripple/bin/rippled server_info
```

Key fields to examine:
- "complete_ledgers": the range of ledgers your server holds
- "validated_ledger": latest validated ledger info
Understanding complete_ledgers:

```
"complete_ledgers" : "32570-75892341"
```

The first number is the oldest ledger retained; the second is the current ledger. This example means:
- Continuous history from ledger 32570
- Current ledger is around 75892341
- No gaps in this range

Warning signs:
- "complete_ledgers" : "empty" → not synchronized
- Gaps like "32570-100000,100500-75892341" → missing data
- A very small range → sync just started
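The three warning signs above are easy to check mechanically. A hedged sketch (the function name `check_ledger_range` is illustrative):

```shell
# check_ledger_range: classify a complete_ledgers value (sketch; name illustrative)
check_ledger_range() {
  case "$1" in
    empty) echo "not synchronized" ;;
    *,*)   echo "gaps present"     ;;  # comma-separated ranges mean missing data
    *-*)   echo "continuous"       ;;
    *)     echo "unexpected"       ;;
  esac
}

# Examples:
# check_ledger_range "32570-75892341"                 # continuous
# check_ledger_range "32570-100000,100500-75892341"   # gaps present
```

Feed it the value extracted from server_info to alert on gaps during the observation period.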
Verify your server matches the live network:

```
# Get your validated ledger
/opt/ripple/bin/rippled server_info | grep -A5 "validated_ledger"

# Compare to the public network
# Visit: https://livenet.xrpl.org
# Or use a public API:
curl -s https://s1.ripple.com:51234 -X POST \
  -H "Content-Type: application/json" \
  -d '{"method":"server_info"}' | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print(d['result']['info']['validated_ledger']['seq'])"
```
Synchronization quality check:

```
Your validated_ledger.seq:    75892341
Network validated_ledger.seq: 75892345
Difference: 4 ledgers
```

- Within 10 ledgers: excellent synchronization
- Within 50 ledgers: good, slight lag
- Within 200 ledgers: acceptable; monitor
- More than 200: investigate connectivity/performance
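Those thresholds translate directly into a small helper you can run after fetching both sequence numbers. A sketch; the function name `ledger_lag` is illustrative:

```shell
# ledger_lag: classify the gap between your validated seq and the network's,
# using the thresholds above (sketch; name illustrative)
ledger_lag() {
  local mine=$1
  local network=$2
  local diff=$(( network - mine ))
  [ "$diff" -lt 0 ] && diff=$(( -diff ))   # absolute value
  if   [ "$diff" -le 10 ];  then echo "excellent ($diff ledgers)"
  elif [ "$diff" -le 50 ];  then echo "good ($diff ledgers)"
  elif [ "$diff" -le 200 ]; then echo "acceptable ($diff ledgers)"
  else                           echo "investigate ($diff ledgers)"
  fi
}

# Example:
# ledger_lag 75892341 75892345   # excellent (4 ledgers)
```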
```
# Count connected peers
/opt/ripple/bin/rippled peers | grep -c '"address"'

# View peer details
/opt/ripple/bin/rippled peers
```

When examining peer quality, look for:
- Geographic diversity in peer IPs
- A mix of inbound and outbound connections
- Stable connection times (uptime)
Peer count assessment:
0-2 peers: Critical - Investigate immediately
3-5 peers: Concerning - May have connectivity issues
6-10 peers: Acceptable - Monitor for stability
11-20 peers: Good - Normal operation
21+ peers: Excellent - Well-connected
Factors affecting peer count:
- Firewall configuration (port 51235)
- NAT/router configuration
- Geographic location
- Network reputation (new servers may have fewer peers)
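Geographic diversity is hard to measure exactly, but counting distinct first octets among peer IPs gives a crude proxy for network spread. A sketch; the function name `first_octet_spread` is illustrative:

```shell
# first_octet_spread: count distinct first octets among peer IP addresses,
# a very rough proxy for network/geographic diversity (sketch; name illustrative)
first_octet_spread() {
  printf '%s\n' "$@" | cut -d. -f1 | sort -u | wc -l
}

# Feed it addresses pulled from `rippled peers`, for example:
# first_octet_spread $(/opt/ripple/bin/rippled peers \
#     | grep -o '"address" : "[0-9.]*' | cut -d'"' -f4)
```

A result of 1-2 with many peers suggests clustering in one provider or region; higher spread is better.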
---
```
# Capture full server_info output with a timestamp
/opt/ripple/bin/rippled server_info 2>&1 | tee ~/server_info_$(date +%Y%m%d_%H%M%S).json
```

Key fields to analyze:
```
{
  "info": {
    "build_version": "2.x.x",
    "complete_ledgers": "32570-75892341",
    "hostid": "VALIDATOR",
    "io_latency_ms": 1,
    "jq_trans_overflow": "0",
    "last_close": {
      "converge_time_s": 2.001,
      "proposers": 35
    },
    "load_factor": 1,
    "peer_disconnects": "123",
    "peer_disconnects_resources": "0",
    "peers": 21,
    "server_state": "full",
    "server_state_duration_us": "1234567890",
    "state_accounting": { ... },
    "uptime": 86400,
    "validated_ledger": {
      "age": 3,
      "hash": "ABC123...",
      "seq": 75892341
    },
    "validation_quorum": 28
  }
}
```

Field interpretations:
build_version: Should be current release
→ Check against https://github.com/XRPLF/rippled/releases
io_latency_ms: Storage latency
→ Should be 1-5ms; higher indicates storage issues
jq_trans_overflow: Job queue overflows
→ Should be "0"; non-zero indicates overload
last_close.converge_time_s: Consensus timing
→ Normal: 2-4 seconds
→ High values may indicate connectivity issues
load_factor: Current load multiplier
→ 1 = normal; higher = server under stress
peer_disconnects_resources: Disconnects due to overload
→ Should be "0" or very low
peers: Connected peer count
→ Target: 10+ for reliable operation
server_state_duration_us: Time in current state
→ Higher = more stable; want millions (hours)
uptime: Server uptime in seconds
→ 86400 = 1 day; want stability over days
validated_ledger.age: Seconds since last validated ledger
→ Should be < 10; higher indicates sync issues
validation_quorum: Required validators for consensus
→ Information only; set by network
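The fields above can also be pulled out programmatically. This sketch wraps a python3 one-liner (the same approach as the earlier curl example) in a shell function; the function name `summarize_info` is illustrative, and the example uses an inline sample rather than a live server:

```shell
# summarize_info: print health-critical fields from server_info JSON on stdin
# (sketch; name illustrative; assumes python3 is available)
summarize_info() {
  python3 -c '
import sys, json
info = json.load(sys.stdin)["result"]["info"]
for key in ("server_state", "peers", "io_latency_ms", "load_factor", "uptime"):
    print(f"{key}: {info[key]}")
print("ledger_age:", info["validated_ledger"]["age"])
'
}

# Example with an inline sample; in practice pipe in real CLI output:
#   /opt/ripple/bin/rippled server_info | summarize_info
echo '{"result":{"info":{"server_state":"full","peers":21,"io_latency_ms":1,
  "load_factor":1,"uptime":86400,"validated_ledger":{"age":3,"seq":75892341}}}}' \
  | summarize_info
```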
Create a comprehensive health check script:

```
sudo nano /opt/ripple/bin/health-check.sh
```

Script contents:

```
#!/bin/bash
#===============================================================================
# rippled Health Check Script
# Run periodically to verify server health
#===============================================================================

echo "=============================================="
echo "rippled Health Check - $(date)"
echo "=============================================="

# Get server info
INFO=$(/opt/ripple/bin/rippled server_info 2>/dev/null)
if [ $? -ne 0 ]; then
    echo "ERROR: Cannot connect to rippled"
    exit 1
fi

# Extract key metrics (rippled CLI output puts spaces around colons)
STATE=$(echo "$INFO" | grep -o '"server_state" : "[^"]*"' | cut -d'"' -f4)
PEERS=$(echo "$INFO" | grep -o '"peers" : [0-9]*' | awk '{print $3}')
LEDGER_AGE=$(echo "$INFO" | grep -o '"age" : [0-9]*' | head -1 | awk '{print $3}')
IO_LATENCY=$(echo "$INFO" | grep -o '"io_latency_ms" : [0-9]*' | awk '{print $3}')
LOAD=$(echo "$INFO" | grep -o '"load_factor" : [0-9]*' | awk '{print $3}')
UPTIME=$(echo "$INFO" | grep -o '"uptime" : [0-9]*' | awk '{print $3}')

# Calculate uptime in human-readable format
UPTIME_DAYS=$((UPTIME / 86400))
UPTIME_HOURS=$(((UPTIME % 86400) / 3600))

echo ""
echo "Server State: $STATE"
echo "Peers: $PEERS"
echo "Ledger Age: ${LEDGER_AGE}s"
echo "IO Latency: ${IO_LATENCY}ms"
echo "Load Factor: $LOAD"
echo "Uptime: ${UPTIME_DAYS}d ${UPTIME_HOURS}h"
echo ""
echo "--- Health Assessment ---"

# State check
if [ "$STATE" = "full" ]; then
    echo "✓ Server state: OK (full)"
else
    echo "✗ Server state: WARNING ($STATE)"
fi

# Peer check
if [ "$PEERS" -ge 10 ]; then
    echo "✓ Peer count: OK ($PEERS peers)"
elif [ "$PEERS" -ge 5 ]; then
    echo "! Peer count: MARGINAL ($PEERS peers)"
else
    echo "✗ Peer count: LOW ($PEERS peers)"
fi

# Ledger age check
if [ "$LEDGER_AGE" -le 10 ]; then
    echo "✓ Ledger age: OK (${LEDGER_AGE}s)"
else
    echo "✗ Ledger age: HIGH (${LEDGER_AGE}s)"
fi

# IO latency check
if [ "$IO_LATENCY" -le 5 ]; then
    echo "✓ IO latency: OK (${IO_LATENCY}ms)"
else
    echo "✗ IO latency: HIGH (${IO_LATENCY}ms)"
fi

# Load check
if [ "$LOAD" -eq 1 ]; then
    echo "✓ Load factor: OK"
else
    echo "! Load factor: ELEVATED ($LOAD)"
fi

echo ""
echo "=============================================="

# Resource usage
echo "--- System Resources ---"
echo "Memory:"
free -h | grep -E "Mem|Swap"
echo ""
echo "Disk:"
df -h /var/lib/rippled/db
echo ""
echo "CPU (rippled):"
ps aux | grep [r]ippled | awk '{printf "CPU: %s%%, MEM: %s%%\n", $3, $4}'
echo "=============================================="
```
```
# Make executable
sudo chmod +x /opt/ripple/bin/health-check.sh

# Run health check
sudo /opt/ripple/bin/health-check.sh
```
```
# Real-time resource monitoring
top -p $(pgrep rippled)

# Memory usage details
ps aux | grep [r]ippled
```
Expected resource usage (varies by node_size):
- CPU: 5-30% normal, spikes during load
- Memory: 15-40 GB for 'large' node_size
- Should be stable, not continuously growing
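"Stable, not continuously growing" can be checked by sampling rippled's resident memory twice, some hours apart, and comparing. A sketch; the function name `growth_pct` is illustrative:

```shell
# growth_pct: integer percentage change between two memory samples
# (sketch; name illustrative)
growth_pct() {
  echo $(( ( $2 - $1 ) * 100 / $1 ))
}

# Sample rippled's resident memory (kB) twice, a few hours apart:
# RSS1=$(ps -o rss= -p "$(pgrep -x rippled)")
# ... wait ...
# RSS2=$(ps -o rss= -p "$(pgrep -x rippled)")
# growth_pct "$RSS1" "$RSS2"
```

Growth after startup caching is normal; sustained positive growth sample after sample suggests a leak worth investigating before enabling validation.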
Verify your server can handle queries correctly:

```
# Account info query (uses a well-known account)
/opt/ripple/bin/rippled account_info rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh
# Returns account information; confirms the server can query ledger state

# Server info (already verified above)
/opt/ripple/bin/rippled server_info

# Fee query
/opt/ripple/bin/rippled fee
# Returns current transaction fee information; confirms the server tracks network fee state

# Ledger query
/opt/ripple/bin/rippled ledger current
# Returns current ledger information
```
If you have a funded testnet account, you can test transaction submission:

```
# Note: this requires a funded account
# For mainnet testing, use a small amount
# For testnet, use testnet faucet funds

# Example: check account status
/opt/ripple/bin/rippled account_info YOUR_ACCOUNT_ADDRESS

# Transaction submission would use:
# /opt/ripple/bin/rippled submit <signed_transaction>
```
```
# Test WebSocket connectivity (requires wscat or similar)
# Install wscat: npm install -g wscat

# Connect to the admin WebSocket
wscat -c ws://127.0.0.1:6006

# Once connected, send:
#   {"command": "server_info"}
# You should receive a server_info response; press Ctrl+C to exit
```
Create a baseline metrics log:

```
# Create metrics directory
mkdir -p ~/validator-metrics

# Create metrics collection script
nano ~/validator-metrics/collect-metrics.sh
```
```
#!/bin/bash
# Collect metrics for baseline establishment

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
METRICS_FILE=~/validator-metrics/metrics_$TIMESTAMP.txt

echo "Metrics Collection: $TIMESTAMP" > "$METRICS_FILE"

# Server info
/opt/ripple/bin/rippled server_info >> "$METRICS_FILE" 2>&1
echo "---" >> "$METRICS_FILE"

# Peer count
echo "Peer Count: $(/opt/ripple/bin/rippled peers | grep -c '"address"')" >> "$METRICS_FILE"

# System resources
echo "---" >> "$METRICS_FILE"
echo "Memory:" >> "$METRICS_FILE"
free -h >> "$METRICS_FILE"
echo "---" >> "$METRICS_FILE"
echo "Disk:" >> "$METRICS_FILE"
df -h /var/lib/rippled/db >> "$METRICS_FILE"

echo "Metrics saved to $METRICS_FILE"
```
```
# Make executable
chmod +x ~/validator-metrics/collect-metrics.sh

# Add to crontab for regular collection
crontab -e

# Add this line (collect every hour), then save and exit:
# 0 * * * * /home/validator/validator-metrics/collect-metrics.sh
```
After 24-48 hours of collection, establish your baseline:
Expected Baseline Metrics (64GB RAM, NVMe, good network):
- State: "full" consistently
- State duration: > 1 hour between state changes
- Ledger age: < 10 seconds typically
- Complete ledgers: Continuous range, no gaps
- IO latency: 1-5 ms
- Load factor: 1 (normal)
- Peer count: 10-25 typical
- Peer disconnects: Low rate, not increasing rapidly
- CPU: 5-30% average
- Memory: Stable (not continuously growing)
- Disk: Stable growth, predictable
Your baseline will vary based on:
- Hardware specifications
- Geographic location
- Network connectivity
- node_size setting
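Once the hourly files accumulate, you can summarize them. A sketch that averages the "Peer Count" lines written by the collection script above; the function name `avg_peers` is illustrative:

```shell
# avg_peers: average the "Peer Count" lines across collected metric files
# (sketch; name illustrative; expects files written by collect-metrics.sh)
avg_peers() {
  grep -h '^Peer Count:' "$1"/metrics_*.txt \
    | awk '{s += $3; n++} END {if (n) printf "%.1f\n", s / n}'
}

# Example:
# avg_peers ~/validator-metrics
```

The same pattern (grep a labeled line, average with awk) works for ledger age or IO latency pulled from the stored server_info dumps.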
---
Before enabling validation, verify ALL of the following:
MANDATORY CRITERIA (must meet all):
□ Server state "full" for 24+ continuous hours
□ No unplanned restarts in 48+ hours
□ Peer count stable at 10+ peers
□ Ledger age consistently < 10 seconds
□ IO latency < 10ms average
□ No job queue overflows (jq_trans_overflow = 0)
□ Memory usage stable (not continuously growing)
□ Disk usage predictable and sustainable
For best validator performance, also verify:
RECOMMENDED CRITERIA (strongly advised):
□ Server state "full" for 72+ continuous hours
□ Peer count averaging 15+
□ Ledger age typically < 5 seconds
□ IO latency < 5ms average
□ CPU usage averaging < 25%
□ No error storms in logs
□ Automated monitoring in place
□ Health check script working
Complete this checklist before proceeding to Phase 2:
PHASE 1 COMPLETION CHECKLIST:
Infrastructure:
□ Server meets recommended specifications
□ NVMe storage verified
□ Adequate disk space for growth
Operating System:
□ SSH hardened (key-only, fail2ban)
□ Firewall configured (port 51235 open)
□ NTP synchronized
□ Automatic security updates enabled
rippled Installation:
□ Current version installed
□ Service enabled for auto-start
□ Configuration documented
Synchronization:
□ "full" state achieved
□ Consistent peer connectivity
□ Validated ledger matches network
Stability:
□ 24+ hours stable operation (48-72 preferred)
□ No restarts required
□ Resources stable
Documentation:
□ Configuration annotated
□ Baseline metrics established
□ Health check script operational
If ALL items checked: READY FOR PHASE 2
If ANY items unchecked: RESOLVE BEFORE PROCEEDING
Server won't reach "full" state:

```
# Check peer count
/opt/ripple/bin/rippled peers | grep -c '"address"'

# If peers are low, verify the firewall
sudo ufw status | grep 51235

# Check whether the port is externally reachable (from another machine):
nc -zv YOUR_SERVER_IP 51235

# Check logs for connection issues
sudo grep -i "error\|warning" /var/log/rippled/debug.log | tail -50
```
Frequent state transitions:

```
# Monitor state over time
watch -n 30 '/opt/ripple/bin/rippled server_info | grep server_state'

# Check for resource pressure
free -h
df -h /var/lib/rippled/db
top -p $(pgrep rippled)

# Potential causes:
# - Insufficient memory (swap usage)
# - Storage too slow
# - Network connectivity issues
# - Insufficient peers
```
High ledger age:

```
# Check network connectivity
/opt/ripple/bin/rippled peers

# Verify time synchronization
timedatectl | grep synchronized

# Check for a processing backlog
/opt/ripple/bin/rippled server_info | grep -E "load_factor|io_latency"
```
```
# Create the documentation directory and file
mkdir -p ~/validator-docs
nano ~/validator-docs/server-documentation.md
```

Template ("XRPL Validator Server Documentation"):
- **Hostname:** validator.example.com
- **IP Address:** x.x.x.x
- **Provider:** [Provider name]
- **Location:** [Data center location]
- **Purpose:** XRPL Validator (currently stock server)
- **CPU:** [Specs]
- **RAM:** [Amount]
- **Storage:** [Type and size]
- **Network:** [Bandwidth]
- **OS:** Ubuntu [version]
- **rippled version:** [version]
- **Configuration:** See /opt/ripple/etc/rippled.cfg
- **SSH Port:** [port]
- **Peer Port:** 51235
- **Admin Ports:** 5005, 6006 (localhost only)
- **Typical peer count:** [range]
- **Typical ledger age:** [seconds]
- **IO latency:** [ms]
- **Memory usage:** [amount]
- **SSH:** ssh -p [port] validator@[ip]
- **Admin user:** validator
- **rippled commands:** /opt/ripple/bin/rippled [command]
- **Health check:** sudo /opt/ripple/bin/health-check.sh
- **Metrics collection:** Hourly via cron
- **Configuration:** /opt/ripple/etc/rippled.cfg
- **Logs:** /var/log/rippled/debug.log
- **Database:** /var/lib/rippled/db/
- **Primary operator:** [name/contact]
- **Backup operator:** [name/contact]
- [Date]: Initial deployment
- [Date]: Stock server stability confirmed
Create a completion report for your records:

```
nano ~/validator-docs/phase1-completion.md
```

Template ("Phase 1 Completion Report"):
- Provider: [name]
- Location: [location]
- Specifications: [summary]
- Monthly cost: $[amount]
- Continuous "full" state: [X] hours
- Unplanned restarts: [0]
- Average peer count: [X]
- Average ledger age: [X] seconds
[Document any issues and resolutions]
[Include key metrics from collection]
□ All mandatory criteria met
□ All recommended criteria met (or noted exceptions)
Ready for Phase 2: Validator Configuration
Signed: [Operator name]
Date: [Date]
✅ Stock server stability predicts validator stability - Issues discovered as stock server will persist or worsen as validator
✅ 24+ hours minimum stability is essential - Short observation periods miss intermittent issues that damage validator reputation
✅ Peer connectivity affects performance - Low peer count correlates with synchronization issues and reduced effectiveness
✅ Baseline metrics enable troubleshooting - Understanding normal behavior helps identify abnormal conditions
⚠️ Exact peer count needed - 10+ is guidance; optimal count depends on network topology and server location
⚠️ Ideal observation period - 24 hours is minimum; 72+ hours provides more confidence but delays progress
⚠️ Future stability prediction - Passing current checks doesn't guarantee future stability; ongoing monitoring essential
📌 Rushing to enable validation - Skipping stability verification leads to reputation damage that takes months to repair
📌 Ignoring intermittent issues - "It works most of the time" is not acceptable for validator operation
📌 Proceeding without documentation - Future troubleshooting requires understanding of baseline and configuration decisions
📌 No monitoring infrastructure - Operating without health checks means discovering problems from external reports
This lesson may feel like artificial delay if your server "seems fine." It's not. The stability checkpoint exists because validator reputation is built over months but can be damaged in hours. A validator that goes down, loses sync, or behaves erratically gets noticed. Other operators may remove you from their UNLs, and UNL publishers will note poor performance.
The 24-72 hours of stock server operation is your final opportunity to find issues with zero reputation risk. Use it.
Assignment: Document complete synchronization verification and stability confirmation.
Requirements:
- Screenshot of server_info showing "full" state
- Comparison of your validated_ledger.seq to the public network
- complete_ledgers range showing continuous history
Timestamp evidence (24+ hours between first and last check)
Run health check script at 0, 12, and 24 hours minimum
Document all checks passing
Note any warnings or issues observed
Include peer count at each check point
Establish baseline for: peer count, ledger age, IO latency, memory usage
Document typical ranges observed
Note any anomalies and explanations
Include resource usage trends
Complete the full readiness checklist from Section 6.3
Document any items that required remediation
Include approval statement for Phase 2 commencement
Sign and date completion report
PDF or Markdown document
Screenshots with timestamps
Completed checklists
Phase 1 completion report
Verified synchronization with evidence (25%)
Health check results across time period (25%)
Complete baseline metrics documentation (25%)
Readiness checklist and completion report (25%)
Time investment: 24-72 hours observation + 2 hours documentation
Value: Verified foundation for reliable validator operation
1. Server State Understanding (Tests Technical Knowledge):
What does a server_state of "full" indicate?
A) The server has downloaded the complete ledger history since genesis
B) The server is fully synchronized with the current network state and following validated ledgers
C) The server's disk is full
D) The server is ready to issue validation messages
Correct Answer: B
Explanation: "full" state means the server is fully synchronized with the network's current state and is following validated ledgers as they close. It does NOT mean complete history since genesis (that would require terabytes of storage), and it doesn't indicate validation capability (that requires a validator token). It's the prerequisite state for becoming a validator.
2. Peer Connectivity (Tests Operational Knowledge):
A server shows server_state "full" but only has 3 peers. What's the appropriate response?
A) Proceed with validation—the server is synchronized
B) Investigate connectivity issues; low peer count may affect validation message propagation even though currently synchronized
C) Restart the server to get more peers
D) Peer count doesn't matter for validators
Correct Answer: B
Explanation: Low peer count (3) is a warning sign even if currently synchronized. Validators need good connectivity to propagate validation messages effectively. With few peers, messages may not reach the network reliably, affecting agreement percentage. Investigate firewall settings, NAT configuration, and network reachability before enabling validation.
3. Stability Verification (Tests Best Practices):
Why is 24+ hours of stock server stability recommended before enabling validation?
A) rippled requires a warmup period before validation works
B) To identify intermittent issues that would damage validator reputation if discovered after validation is enabled
C) The network requires new servers to wait 24 hours
D) To accumulate enough ledger history for validation
Correct Answer: B
Explanation: The stability period allows operators to discover intermittent issues—memory leaks, occasional sync loss, connectivity problems—before they affect validator reputation. A validator that frequently loses sync or restarts develops poor agreement statistics visible to UNL operators. Finding issues as a stock server (zero reputation impact) is far better than finding them as a validator.
4. Baseline Metrics (Tests Operational Understanding):
What is the primary purpose of establishing baseline metrics?
A) To report to Ripple for UNL consideration
B) To enable identification of abnormal conditions by comparison to normal operation
C) To calculate validator rewards
D) To configure automatic scaling
Correct Answer: B
Explanation: Baseline metrics establish "normal" for your specific server—typical peer count, resource usage, latency, etc. When investigating issues later, you can compare current metrics to baseline to identify what's abnormal. "Memory at 80%" is concerning if baseline is 40%, but normal if baseline is 75%. Without baselines, troubleshooting is guesswork.
5. Readiness Assessment (Tests Critical Thinking):
A server has been in "full" state for 26 hours but had one brief period where it dropped to "syncing" for 5 minutes before recovering. Should you proceed to enable validation?
A) Yes—26 hours exceeds the 24-hour minimum and brief interruptions are normal
B) No—investigate why synchronization was lost; validators should maintain consistent state
C) Yes—but document the incident
D) No—you need to start the 24-hour count over
Correct Answer: B
Explanation: A sync loss, even brief, indicates potential issues that should be understood before enabling validation. Was it a network partition? Resource pressure? A software bug? As a validator, similar events would affect your agreement percentage. Investigate the cause, resolve it if possible, and then observe for another stable period. Don't ignore warning signs.
- XRPL.org, "server_info Method" - Complete field reference
- XRPL.org, "peers Method" - Peer connection details
- XRPL.org, "Diagnosing Problems" - Troubleshooting guidance
- rippled monitoring best practices
- Prometheus/Grafana integration guides
- Log analysis techniques
- https://livenet.xrpl.org - Live network status
- https://xrpscan.com/validators - Validator registry
For Next Lesson:
With a verified, stable stock server, you're ready for Phase 2: Validator Configuration & Security. Lesson 7 will cover validator key generation—creating the cryptographic identity that will represent your validator on the network. This is the point of no return; after enabling validation, your server's behavior affects your reputation.
End of Lesson 6
Congratulations on completing Phase 1: Infrastructure Foundations!
You now have a hardened, configured, synchronized, and verified XRPL server. The foundation is solid. Phase 2 will transform this stock server into an active validator.
Estimated completion time: 55 minutes reading + 24-72 hours observation + 2 hours documentation
Key Takeaways
- "full" state must be consistent and sustained: achieving "full" once isn't enough; the server must maintain synchronization through varied network conditions and time periods.
- A peer count of 10+ is the reliability threshold: fewer peers indicates connectivity issues that will affect validation message propagation; investigate and resolve before enabling validation.
- Baseline metrics are essential for troubleshooting: document normal resource usage, peer counts, and latency so you can identify abnormal conditions later.
- 24 hours minimum, 72 hours recommended for stock server stability verification before enabling validation; this investment prevents reputation damage.
- Complete documentation enables future success: configuration rationale, baseline metrics, and health check scripts serve you when problems arise at 3 AM.