Data Sources and Tools - Where to Get Reliable Information
Learning Objectives
- Identify and evaluate primary data sources including XRPL APIs, full-history nodes, and direct ledger access
- Navigate major block explorers (XRPScan, Bithomp, XRPL.org) and understand their strengths and limitations
- Use third-party analytics platforms critically, understanding what they add and what biases they may introduce
- Set up a basic data collection workflow using free tools and APIs
- Cross-reference sources to identify discrepancies and verify accuracy
You want to know how many transactions occurred on XRPL yesterday. Seems simple. But where do you get that number?
Option 1: Query the ledger directly via API
Option 2: Check a block explorer like XRPScan
Option 3: Look at an analytics dashboard
Option 4: Find a tweet from a "data analyst"
Each source has different reliability, accessibility, and use cases:
| Source | Reliability | Accessibility | Effort Required |
|---|---|---|---|
| Direct API | Highest | Medium (technical) | High |
| Block explorers | High | Easy | Low |
| Analytics platforms | Medium | Easy | Low |
| Social media | Low | Very easy | None |
The trap: Most people default to the easiest option, not the most reliable.
This lesson teaches you to choose the right source for each situation—matching reliability requirements to practical constraints. Sometimes a quick explorer check is fine. Sometimes you need to verify against raw ledger data.
The most authoritative data comes directly from XRPL nodes:
What's a Public Node?
XRPL NETWORK:
├── Validator nodes: Participate in consensus
├── Full-history nodes: Store complete ledger history
├── Stock nodes: Store recent history (configurable)
└── Public nodes: Accept external API connections
Public nodes operated by:
├── XRPL Foundation: wss://xrplcluster.com
├── Ripple: wss://s1.ripple.com, wss://s2.ripple.com
└── Third parties: various; quality varies
Connecting to a Public Node:

```javascript
// Using the xrpl.js library (npm install xrpl)
const xrpl = require('xrpl');

async function main() {
  // Connect to a public mainnet node
  const client = new xrpl.Client('wss://xrplcluster.com');
  await client.connect();

  // Example: get the latest validated ledger
  const ledgerResponse = await client.request({
    command: 'ledger',
    ledger_index: 'validated'
  });
  console.log('Current ledger:', ledgerResponse.result.ledger_index);

  await client.disconnect();
}

main().catch(console.error);
```
Reliability Assessment:
DIRECT NODE ACCESS:
├── Data accuracy: Cryptographically verified
├── Data completeness: Depends on node (full-history vs pruned)
├── Update frequency: Real-time (new ledger every ~4 seconds)
├── Manipulation risk: None (data is consensus-validated)
└── Limitations: Technical skill required, rate limits on public nodes
The XRPL Foundation provides infrastructure for the ecosystem:

Public node cluster:
- WebSocket and JSON-RPC access
- Rate-limited but reliable
- Good for: development, moderate-volume queries

Read-optimized API servers:
- Optimized for read-heavy workloads
- PostgreSQL backend for efficient queries
- Better performance for historical queries
- Good for: analytics, reporting, historical research

Aggregated metrics services:
- Pre-aggregated metrics
- Simplified access to common queries
- Good for: quick metrics without raw data processing
For serious analysis, consider running your own node:
- No rate limits
- Full control over data retention
- No dependency on third parties
- Can run specialized queries
Requirements:
HARDWARE REQUIREMENTS (Full History):
├── Storage: 15+ TB SSD (and growing)
├── RAM: 64GB+ recommended
├── CPU: 8+ cores
└── Network: High bandwidth, stable connection
HARDWARE REQUIREMENTS (Recent History Only):
├── Storage: 500GB-2TB SSD
├── RAM: 32GB+
├── CPU: 4+ cores
└── More accessible for individual analysts
Practical Reality: Running your own node is worth the cost only if:
- You're doing high-volume automated queries
- You need guaranteed uptime and performance
- You're building production applications
- You need complete historical data access
XRPScan (xrpscan.com) offers:
- Transaction search and details
- Account analysis
- Rich list and distribution data
- DEX activity tracking
- Validator monitoring
Key Features:
XRPSCAN CAPABILITIES:
TRANSACTION ANALYSIS:
├── Search by hash, account, or ledger
├── Decoded transaction details
├── Visual flow diagrams
└── Metadata interpretation
ACCOUNT ANALYSIS:
├── Balance history
├── Transaction history
├── Trust lines
├── NFT holdings
└── Labeled known accounts (exchanges, etc.)
NETWORK STATS:
├── Daily transaction counts
├── Active accounts
├── Fee burn tracking
└── Validator status
Reliability Assessment:
XRPSCAN:
├── Data source: Direct XRPL node connection
├── Processing: Some aggregation and interpretation
├── Known accounts: Community-contributed labels
├── Bias risk: Low (independent project)
├── Best for: Individual transaction/account lookup
└── Limitation: Historical aggregated data may have gaps
Bithomp (bithomp.com) offers:
- NFT-focused tools
- Token and trustline analysis
- Rich list tracking
- Historical snapshots
Differentiating Features:
BITHOMP UNIQUE TOOLS:
NFT EXPLORER:
├── Collection browsing
├── Rarity analysis
├── Price history
└── Creator analytics
TOKEN ANALYSIS:
├── Issued currency details
├── Holder distribution
├── Supply changes over time
└── Trust line tracking
DEVELOPER TOOLS:
├── API for integration
├── Webhook services
└── Bulk data access
Reliability Assessment:
BITHOMP:
├── Data source: Own full-history infrastructure
├── Processing: Significant value-added analysis
├── Specialization: Strong on NFTs and tokens
├── Best for: Token/NFT ecosystem analysis
└── Limitation: Some features require account/payment
The XRPL.org Explorer offers:
- Clean, simple interface
- Transaction and account lookup
- Network status monitoring
- Validator information
Reliability Assessment:
XRPL.ORG EXPLORER:
├── Data source: XRPL Foundation infrastructure
├── Processing: Minimal—close to raw data
├── Bias risk: Very low (foundation operated)
├── Best for: Quick lookups, verification
└── Limitation: Fewer advanced features than alternatives
| Feature | XRPScan | Bithomp | XRPL.org |
|---|---|---|---|
| Transaction lookup | ✅ | ✅ | ✅ |
| Account history | ✅ | ✅ | ✅ |
| Known account labels | ✅ | ✅ | Limited |
| DEX analysis | ✅ | ✅ | Basic |
| NFT tools | Basic | ✅✅ | Basic |
| API access | ✅ | ✅ | Via XRPL API |
| Network stats | ✅ | ✅ | ✅ |
| Historical aggregates | ✅ | ✅ | Limited |
- **Quick lookup:** Any explorer works
- **Account analysis:** XRPScan or Bithomp
- **NFT research:** Bithomp
- **Verification:** Use multiple sources
XRPLMeta (xrplmeta.org)
FOCUS: XRPL ecosystem tracking
PROVIDES:
├── Token metrics and rankings
├── DEX volume tracking
├── AMM pool data
└── Historical statistics
RELIABILITY:
├── Data: Aggregated from XRPL
├── Methodology: Usually documented
├── Bias: Low (community project)
└── Use for: Ecosystem overview, token research
XRPL Services (xrpl.services)
FOCUS: Developer and analytics tools
PROVIDES:
├── Network statistics
├── Validator monitoring
├── API services
└── Data exports
RELIABILITY:
├── Data: Direct from XRPL
├── Quality: Generally high
└── Best for: Developer integration, monitoring
Utility Scan / ODL Trackers
FOCUS: ODL corridor tracking
PROVIDES:
├── Estimated ODL volume by corridor
├── Exchange flow analysis
└── Commercial activity patterns
RELIABILITY:
├── Data: Inferred from on-chain patterns
├── Methodology: Pattern matching (not verified)
├── Uncertainty: Significant (ODL vs arbitrage unclear)
└── Use for: Directional indicators, not precise volumes
CoinGecko / CoinMarketCap
RELEVANT DATA:
├── XRP price and volume
├── Market cap rankings
├── Exchange trading data
└── Historical price charts
RELIABILITY CONCERNS:
├── Volume data: Subject to wash trading
├── CoinGecko Trust Score: Helpful filter
└── Best practice: Use adjusted/filtered metrics
USE FOR:
├── Market context (price, rank)
├── Exchange volume comparison
└── NOT for on-chain metrics
Glassnode / Santiment / CryptoQuant
FOCUS: On-chain analytics (primarily BTC/ETH)
XRP COVERAGE: Limited but growing
PROVIDES:
├── Active address metrics
├── Exchange flow tracking
├── Whale watching
└── Custom indicators
RELIABILITY:
├── Methodology: Often proprietary
├── Quality: Varies by metric
├── Cost: Premium tiers for most data
└── Caution: Verify XRP-specific accuracy
Quarterly XRP Markets Reports
PUBLISHED BY: Ripple Labs
FREQUENCY: Quarterly
CONTAINS:
├── XRP sales from Ripple
├── ODL volume estimates
├── Escrow status
├── RippleNet updates
└── Market commentary
RELIABILITY ASSESSMENT:
├── Self-reported data (incentive bias possible)
├── ODL volumes cannot be independently verified
├── Escrow data matches on-chain observation
├── Useful for: Official Ripple perspective
└── Verify: Against on-chain data where possible
RippleNet Partner Announcements
RELIABILITY: Low for actual usage
├── "Partnership" ≠ active ODL usage
├── Announcements often years ahead of implementation
├── No verification of actual transaction volume
├── Use for: Awareness of potential developments
└── NOT for: Confirming actual adoption
For any important metric, verify across at least two independent sources (three when possible):
VERIFICATION WORKFLOW:
Step 1: Primary Query (Direct or Explorer)
├── Query XRPL API or block explorer
├── Get raw or semi-raw data
└── Record value and methodology
Step 2: Cross-Reference (Different Explorer)
├── Use different explorer
├── Same query, different source
└── Note any discrepancies
Step 3: Third Source Check (If Available)
├── Analytics platform
├── Or second API query method
└── Resolve discrepancies through investigation
Example: "How many active addresses yesterday?"
├── Source 1: XRPScan network stats
├── Source 2: Bithomp active accounts
├── Source 3: Own query via API
└── If sources differ, investigate why (different definitions?)
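The cross-referencing step can be sketched in code. A minimal example (source names, numbers, and the 5% tolerance are hypothetical choices, not standard values) that flags readings deviating from the median:

```javascript
// Compare the same metric from several sources and flag outliers.
// Source names and numbers below are hypothetical examples.
function crossCheck(readings, tolerance = 0.05) {
  const values = Object.values(readings);
  const median = values.slice().sort((a, b) => a - b)[Math.floor(values.length / 2)];
  const flags = [];
  for (const [source, value] of Object.entries(readings)) {
    const deviation = Math.abs(value - median) / median;
    if (deviation > tolerance) {
      flags.push({ source, value, deviation });
    }
  }
  return { median, flags }; // empty flags = sources agree within tolerance
}

const result = crossCheck({
  xrpscan: 1_520_000,   // hypothetical daily transaction count
  bithomp: 1_498_000,
  own_api: 1_515_000
});
console.log(result.flags.length === 0 ? 'Sources agree' : 'Investigate', result.flags);
```

A non-empty `flags` array is not proof of error; as noted above, it often reflects definitional differences worth investigating.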
Set up a simple querying environment:

```javascript
// Install: npm install xrpl
const xrpl = require('xrpl');

async function getBasicMetrics() {
  // Connect to a public node
  const client = new xrpl.Client('wss://xrplcluster.com');
  await client.connect();

  // Get current validated ledger info
  const ledger = await client.request({
    command: 'ledger',
    ledger_index: 'validated',
    transactions: true,
    expand: false
  });
  console.log('Ledger Index:', ledger.result.ledger.ledger_index);
  console.log('Close Time:', ledger.result.ledger.close_time_human);
  console.log('Transactions:', ledger.result.ledger.transactions?.length || 0);

  // Get server info (includes the available ledger range)
  const serverInfo = await client.request({ command: 'server_info' });
  console.log('Validated Ledger Range:', serverInfo.result.info.complete_ledgers);

  await client.disconnect();
}

getBasicMetrics().catch(console.error);
```
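Public nodes rate-limit aggressive clients, so queries you depend on benefit from retry logic. A generic sketch (the attempt count and delays are arbitrary choices, not xrpl.js settings), demonstrated here against a stubbed flaky call rather than a live node:

```javascript
// Retry an async operation with exponential backoff.
// Delays and attempt count are illustrative defaults, not library settings.
async function withRetry(fn, { attempts = 4, baseDelayMs = 250 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 250ms, 500ms, 1000ms, ... between attempts
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

// Demo with a stub that fails twice, then succeeds
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error('rate limited');
  return { ledger_index: 12345 }; // hypothetical response
};
withRetry(flaky).then((res) => console.log('Succeeded after', calls, 'calls:', res.ledger_index));
```

In practice you would wrap `client.request(...)` calls in `withRetry` rather than a stub.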
Alternative for Python users:

```python
# Install: pip install xrpl-py
from xrpl.clients import JsonRpcClient
from xrpl.models.requests import Ledger, ServerInfo

# Connect to a public node (JSON-RPC endpoint)
client = JsonRpcClient("https://xrplcluster.com")

# Get the latest validated ledger
ledger_response = client.request(Ledger(ledger_index="validated"))
ledger_info = ledger_response.result
print(f"Ledger Index: {ledger_info['ledger_index']}")
print(f"Close Time: {ledger_info['ledger']['close_time_human']}")

# Get server info (includes the available ledger range)
server_response = client.request(ServerInfo())
server_info = server_response.result['info']
print(f"Ledger Range: {server_info['complete_ledgers']}")
```
For non-programmers, a spreadsheet workflow works:

MANUAL DATA COLLECTION WORKFLOW:

Daily:
1. Open XRPScan network stats
2. Record key metrics in your spreadsheet

Weekly:
1. Cross-check against Bithomp
2. Note any discrepancies
3. Calculate weekly averages
4. Update trend charts

Monthly:
1. Full reconciliation across sources
2. Update month-over-month calculations
3. Compare to previous periods
4. Identify anomalies for investigation
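The weekly-average and month-over-month steps reduce to simple percentage-change arithmetic. A small helper (the sample numbers are hypothetical daily transaction counts):

```javascript
// Percent change between two period values, e.g. week-over-week tx counts.
function pctChange(current, previous) {
  if (previous === 0) return null; // change from a zero base is undefined
  return ((current - previous) / previous) * 100;
}

// Average of one week of daily readings
function weeklyAverage(dailyValues) {
  return dailyValues.reduce((sum, v) => sum + v, 0) / dailyValues.length;
}

// Hypothetical daily transaction counts for two consecutive weeks
const lastWeek = weeklyAverage([1.41e6, 1.38e6, 1.45e6, 1.40e6, 1.39e6, 1.37e6, 1.44e6]);
const thisWeek = weeklyAverage([1.52e6, 1.49e6, 1.55e6, 1.50e6, 1.51e6, 1.48e6, 1.53e6]);
console.log(`WoW change: ${pctChange(thisWeek, lastWeek).toFixed(1)}%`);
```

The same formulas translate directly to spreadsheet cells, e.g. `=(B2-B1)/B1*100`.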
For ongoing tracking, set up simple automation:

```javascript
// Simple monitoring script: logs stats for every closed ledger
const xrpl = require('xrpl');
const fs = require('fs');

async function logLedgerStats() {
  const client = new xrpl.Client('wss://xrplcluster.com');
  await client.connect();

  // Subscribe to ledger close events
  await client.request({
    command: 'subscribe',
    streams: ['ledger']
  });

  client.on('ledgerClosed', (ledger) => {
    const stats = {
      timestamp: new Date().toISOString(),
      ledger_index: ledger.ledger_index,
      txn_count: ledger.txn_count,
      fee_base: ledger.fee_base
    };
    // Append one JSON object per line (newline-delimited JSON)
    fs.appendFileSync('ledger_log.json', JSON.stringify(stats) + '\n');
    console.log(`Ledger ${ledger.ledger_index}: ${ledger.txn_count} txns`);
  });
}

logLedgerStats().catch(console.error);
```
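Once the monitor has run for a while, its per-ledger log can be rolled up into daily totals. A sketch that operates on already-parsed records with the same fields the monitor writes (the sample records are hand-written examples):

```javascript
// Aggregate per-ledger records into per-day transaction totals.
function dailyTotals(records) {
  const totals = {};
  for (const rec of records) {
    const day = rec.timestamp.slice(0, 10); // 'YYYY-MM-DD' from the ISO timestamp
    totals[day] = (totals[day] || 0) + rec.txn_count;
  }
  return totals;
}

// Example with three hand-written records
const sample = [
  { timestamp: '2024-05-01T10:00:01Z', ledger_index: 100, txn_count: 40 },
  { timestamp: '2024-05-01T10:00:05Z', ledger_index: 101, txn_count: 55 },
  { timestamp: '2024-05-02T09:00:00Z', ledger_index: 120, txn_count: 30 }
];
console.log(dailyTotals(sample)); // { '2024-05-01': 95, '2024-05-02': 30 }
```

To use it on the real log, read the file, split on newlines, and `JSON.parse` each non-empty line before passing the array in.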
Use this framework to select appropriate sources:
SOURCE SELECTION DECISION TREE:
Q1: What type of data do you need?
├── Current state (balance, order book) → Any node/explorer
├── Single transaction/account → Any explorer
├── Historical aggregate → Full-history source + aggregation
└── Real-time streaming → Direct node connection
Q2: How important is accuracy?
├── Critical (financial decision) → Primary sources + verification
├── Important (analysis) → Explorer + one verification
└── Informational (general interest) → Single reliable source OK
Q3: What technical capability do you have?
├── Can code → API direct access, automation possible
├── Spreadsheet capable → Manual workflow from explorers
└── Non-technical → Rely on analytics platforms (acknowledge limitations)
Q4: What's your time budget?
├── Minutes → Quick explorer lookup
├── Hours → Cross-verified analysis
└── Days → Deep research with multiple methods
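The first branch of the tree can be encoded directly. A toy lookup (the key names are my own shorthand for the categories in Q1 above):

```javascript
// Map a data need to a recommended source class, following Q1 of the tree.
function recommendSource(need) {
  const table = {
    current_state: 'any node or explorer',
    single_lookup: 'any explorer',
    historical_aggregate: 'full-history source plus aggregation',
    realtime_stream: 'direct node connection'
  };
  return table[need] || 'unknown need: start from a primary source';
}

console.log(recommendSource('realtime_stream')); // direct node connection
```

Questions 2 through 4 then adjust how much verification effort you layer on top of that choice.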
Watch for these warning signs:
SOURCE RED FLAGS:
DATA QUALITY ISSUES:
⚠️ Numbers don't match any primary source
⚠️ Methodology not documented
⚠️ Data updates inconsistently
⚠️ Historical data changes retroactively
⚠️ Impossible precision (e.g., "exactly 47.3% institutional")
BIAS INDICATORS:
⚠️ Only positive/negative metrics highlighted
⚠️ Source has financial interest in XRP price
⚠️ Anonymous or unverifiable source
⚠️ Claims can't be reproduced
PRESENTATION ISSUES:
⚠️ No sources or methodology cited
⚠️ Screenshot of data without context
⚠️ Conflates correlation with causation
⚠️ Cherry-picked timeframes
Rate your sources systematically:
SOURCE CONFIDENCE RATING:
LEVEL A - VERIFICATION GRADE:
├── Direct XRPL API queries
├── Raw ledger data
├── Cryptographically verifiable
└── Use for: Final verification, critical decisions
LEVEL B - PRIMARY GRADE:
├── Major block explorers (XRPScan, Bithomp, XRPL.org)
├── XRPL Foundation resources
├── Direct from ledger with interpretation
└── Use for: Regular analysis, monitoring
LEVEL C - SECONDARY GRADE:
├── Analytics platforms with documented methodology
├── Ripple self-reported data (with caveats)
├── Established crypto data providers
└── Use for: Context, trends, comparison
LEVEL D - TERTIARY GRADE:
├── Community analysis (verified methodology)
├── News reports citing primary sources
├── Academic research
└── Use for: Ideas, hypotheses to verify
LEVEL E - UNRATED:
├── Social media claims
├── Anonymous analysis
├── Promotional content
└── Use for: Entertainment only
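The rating ladder can double as a quick programmatic filter when deciding whether a source is good enough for a given task. A sketch (the policy of which grade each task requires is a hypothetical example, not a standard):

```javascript
// Ordered grades from the ladder above; 'A' is strongest.
const GRADE_ORDER = ['A', 'B', 'C', 'D', 'E'];

// Does a source's grade meet the minimum required for a task?
function meetsGrade(sourceGrade, requiredGrade) {
  return GRADE_ORDER.indexOf(sourceGrade) <= GRADE_ORDER.indexOf(requiredGrade);
}

// A Level B explorer is fine for routine monitoring (say, C or better required)...
console.log(meetsGrade('B', 'C')); // true
// ...but not for final verification of a critical decision (A required)
console.log(meetsGrade('B', 'A')); // false
```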
✅ Multiple reliable data sources exist for XRPL analysis
✅ Direct API access provides verifiable, manipulation-resistant data
✅ Major block explorers offer reasonable accuracy for most use cases
✅ Cross-referencing catches errors and discrepancies
⚠️ Third-party analytics platform methodologies may not be transparent
⚠️ Self-reported data (Ripple quarterly reports) has inherent bias potential
⚠️ "Best" source depends on specific use case and requirements
⚠️ Historical data quality varies across sources
📌 Trusting a single source without verification for important decisions
📌 Using social media "analysis" as if it were verified data
📌 Assuming all explorers use the same definitions for metrics
📌 Ignoring source incentives and potential biases
The good news: Excellent data sources exist for XRPL analysis, from direct API access to well-maintained block explorers. The challenge isn't finding data—it's choosing the right source for each need and maintaining healthy skepticism. A few minutes of cross-referencing prevents hours of analysis built on faulty data. Build the verification habit now.
Assignment: Set up and document a complete data collection workflow for ongoing XRPL analysis, including source selection, verification procedures, and practical implementation.
Requirements:
Part 1: Source Inventory (25%)
Create a comprehensive inventory of data sources you'll use:
- Name and URL
- Type (primary/secondary/tertiary)
- Confidence level (A-E rating)
- Data available
- Access method (web, API, export)
- Cost (free/paid)
- Best use cases
- Limitations
Part 2: Metrics-to-Source Mapping (25%)
- Primary source for each metric
- Backup source for verification
- Any definitional differences between sources
- Expected update frequency
Create a matrix:
| Metric | Primary Source | Verification Source | Definition Notes |
|---|---|---|---|
Part 3: Hands-On Implementation (35%)
Choose ONE of these paths based on your technical level:

Path A (technical):
- Write a script (Python or JavaScript) that queries at least 5 different XRPL metrics
- Include error handling and rate limiting
- Output data in structured format (JSON or CSV)
- Document the script with comments

Path B (non-technical):
- Create a spreadsheet template for manual data collection
- Include data from at least 3 different sources
- Set up formulas for week-over-week and month-over-month calculations
- Include comparison columns for cross-source verification
Part 4: Verification Protocol (15%)
Document:
- What triggers verification (always? threshold changes? specific metrics?)
- Step-by-step verification process
- How to handle discrepancies
- Documentation requirements
Grading:
- Completeness of source inventory (20%)
- Quality of metrics-to-source mapping (20%)
- Functionality of implementation (35%)
- Rigor of verification protocol (15%)
- Documentation quality (10%)
Time investment: 4-5 hours
Value: This deliverable creates your actual working infrastructure for XRPL analysis. Unlike theoretical exercises, you'll use this workflow throughout the course and beyond.
1. Source Selection:
You need to verify that an exchange received a specific XRP deposit. What is the MOST reliable approach?
A) Check the exchange's transaction history on their platform
B) Search for the transaction hash on XRPScan
C) Query the XRPL API directly using the transaction hash
D) Look for the transaction in CoinGecko's XRP activity feed
Correct Answer: C
Explanation: Direct API queries (C) provide cryptographically verified data from the ledger itself—this is the gold standard for verification. XRPScan (B) is reliable but adds a layer of interpretation/processing. Exchange platforms (A) show their internal records, which may differ from actual on-chain settlement. CoinGecko (D) doesn't provide individual transaction lookup. For verification purposes, primary sources beat secondary sources.
2. Understanding Data Limitations:
You're comparing XRP daily transaction counts from XRPScan and Bithomp and find a 15% difference. What is the MOST likely explanation?
A) One of the explorers is showing incorrect data
B) They're using different definitions of "transaction" or different spam filters
C) XRP Ledger data is inconsistent across different nodes
D) The difference is within normal margin of error for blockchain data
Correct Answer: B
Explanation: Different explorers may apply different spam filters, include/exclude different transaction types, or use different time zone cutoffs for "daily" counts. Both could be showing accurate data by their own definitions (B). XRPL data is consistent across nodes (C is wrong—consensus ensures this). 15% is too large for "margin of error" in deterministic blockchain data (D). "Incorrect data" (A) is possible but less likely than definitional differences for reputable explorers.
3. API Access:
A researcher wants to analyze all Payment transactions from the last 30 days. Which approach is MOST appropriate?
A) Use XRPScan's API to download pre-aggregated payment statistics
B) Query each ledger from the past 30 days via public node API
C) Use a full-history node or data export service optimized for bulk queries
D) Screen-scrape transaction lists from block explorer websites
Correct Answer: C
Explanation: 30 days represents ~650,000 ledgers with potentially tens of millions of transactions. Option B (querying each ledger via public node) would be extremely slow and likely hit rate limits. Option A (pre-aggregated) might work if available, but may not provide transaction-level detail. Option D (screen-scraping) is unreliable and usually against terms of service. Option C—using infrastructure designed for bulk historical queries—is the appropriate approach for this scale of analysis.
4. Source Reliability:
Ripple's quarterly report states that ODL processed $2 billion in volume last quarter. How should you treat this data?
A) Accept it as verified—Ripple has access to internal data we can't see
B) Reject it entirely—self-reported data from interested parties is unreliable
C) Treat as useful estimate with caveats—note it's self-reported and cross-reference where possible
D) Use it only after an independent auditor confirms the figures
Correct Answer: C
Explanation: Ripple's self-reported data is useful but comes from a party with interest in XRP's success. Complete acceptance (A) ignores potential bias. Complete rejection (B) throws away potentially valuable information. Waiting for audit (D) may never happen and is impractical. The balanced approach (C) uses the data while acknowledging its limitations and seeking corroboration where possible. This is standard practice for any self-reported corporate data.
5. Practical Application:
You're setting up weekly monitoring of XRPL network health. Which combination provides the best balance of reliability, coverage, and practical effort?
A) Daily API queries for all metrics, automated with scripts, verified against two explorers
B) Weekly manual check of one trusted explorer, no verification
C) Weekly data pull from XRPScan, spot-verification of key metrics via API or second explorer
D) Monthly review of Ripple's quarterly report plus social media sentiment analysis
Correct Answer: C
Explanation: Option C balances practicality with reliability—regular data collection from a trusted source with verification of the most important metrics. Option A is ideal but potentially more effort than needed for weekly monitoring. Option B lacks verification and single-source risk. Option D is too infrequent (quarterly reports are... quarterly) and relies on unreliable sources (social media). For sustainable ongoing monitoring, efficient verification of key metrics beats comprehensive verification of everything.
- XRPL.org WebSocket/JSON-RPC API reference
- xrpl.js library documentation (JavaScript)
- xrpl-py library documentation (Python)
- XRPScan.com - API documentation section
- Bithomp.com - Developer tools
- XRPL.org Explorer
- XRPLMeta.org
- XRPL.Services
- Ripple Quarterly XRP Markets Reports
- XRPL Foundation blog and resources
For Next Lesson:
Lesson 4 establishes your metrics categories framework—organizing the metrics you'll track into a coherent system that prevents cherry-picking and ensures comprehensive analysis.
End of Lesson 3
Total words: ~6,800
Estimated completion time: 55 minutes reading + 4-5 hours for deliverable
Key Takeaways
Primary sources beat all others: Direct XRPL API access provides cryptographically verifiable data. When accuracy matters, go to the source. Explorers and platforms add convenience but also add potential error.
Block explorers are reliable but not identical: XRPScan, Bithomp, and XRPL.org all pull from XRPL, but their aggregations and definitions may differ. Understand what each offers and cross-reference important metrics.
Third-party platforms require calibration: Analytics services add value through aggregation and visualization but may have undocumented methodologies. Use them for context and trends; verify specifics independently.
Self-reported data needs skepticism: Ripple's quarterly reports contain valuable data but come from a party with interest in XRP's success. Verify against on-chain data where possible.
Build a verification workflow: For any metric that matters, check at least two independent sources. Discrepancies are information; investigate them rather than ignoring them.