On-Chain Intelligence for Jupiter - Vision
TLDR
Jupiter sits on one of the richest behavioral datasets in DeFi. The same intelligence systems I've built in commercial SaaS -- health scoring, churn prediction, behavioral clustering, competitive monitoring -- translate directly to on-chain data. The data layer is actually better: public, real-time, cross-platform, and exhaustive. This document outlines a layered intelligence architecture that converts dormant data into operational advantage.
L0 The Data Landscape
In DeFi, the data IS the product. The blockchain is the warehouse.
What's missing isn't data. It's interpretation.
|
v
L1 What Exists
Raw instructions --> Decoded tables --> Cross-platform aggregations
What's visible, what's dark, what the data layer enables.
|
v
L2 The Intelligence Layer
Wallet Vitals | Product Coupling | Clustering |
Competitor Intelligence | Revenue Forecasting
Each module reads from L1, writes signals that feed the others.
|
v
L3 What Becomes Possible
A/B tested campaigns | Product signals | Competitive response |
Opportunity identification | Agentic readiness | Feedback loop
Layer 0: The Data Landscape
Everyone has the data. Nobody has the intelligence.
In Web2 SaaS, data is a shadow of the product. You build a video editor, then you instrument it -- add event tracking, pipe logs to a warehouse, build marts on top. The product exists first. The data is a lossy, delayed reflection of what happened.
In DeFi, the data IS the product. A swap on Jupiter isn't instrumented after the fact -- the transaction on the ledger is the swap. The execution and the measurement are the same object. Every trade, every perp position, every fee paid exists as a first-class record on a public ledger, replicated across every validator node, indexed by services like Dune, and accessible to anyone without credentials or NDAs.
This changes the problem completely. Jupiter doesn't need to build a data warehouse. The blockchain is the warehouse. What's missing isn't data collection, transformation, or access. What's missing is interpretation.
Case in point: all of Jupiter's transaction data is public, yet most external analysis draws the wrong conclusions from it. Naive market share calculations overstate competitors by counting Jupiter-routed Ultra volume as OKX organic volume. Campaign effectiveness gets measured by aggregate volume rather than per-wallet behavioral change. Wallet counts get inflated by bots that anyone could filter out with basic metadata analysis. The data is there. The intelligence isn't.
The competitive advantage in DeFi analytics is not who has the data. Everyone has the data. The advantage is who builds the intelligence layer on top -- who converts public transaction logs into private strategic insight. That's the gap. And because competitor data is equally public, the same intelligence layer that understands Jupiter's users can simultaneously understand the competitive landscape, market structure, and emerging opportunities across the entire Solana DeFi ecosystem.
Layer 1: What Exists
Three tiers of public on-chain data give full visibility into Jupiter and every competitor -- except Perps, which is the biggest blind spot.
The Public Data Layer
Three tiers of on-chain data are available today, each at a different level of abstraction:
Raw instructions (solana.instruction_calls) -- Every function call ever executed on Solana. The universal audit log. You can reconstruct anything from here, but it requires parsing binary payloads and understanding program-specific discriminators. This is where I extracted the Rewards Hub claimant list -- no decoded table existed, so I parsed the raw contract instructions directly.
Decoded protocol tables (jupiter_solana.aggregator_swaps) -- Community-maintained decodings of popular contracts into structured tables with human-readable columns. Jupiter's aggregator swaps are well-decoded: wallet, token pair, USD values, AMM routing. But coverage is incomplete -- Perps has no decoded table on Dune.
Cross-platform aggregations (dex_solana.trades) -- A unified view of ALL DEX trades on Solana across every platform: Jupiter, OKX, DFlow, Raydium, Meteora, everything. Per trade: wallet, platform, token pair, USD amount, fees. This is the strategic table -- it lets you see not just what your users do on Jupiter, but what they do everywhere. And critically, what everyone else's users do too.
What's Visible and What's Not
| Domain | Visibility | Source |
|---|---|---|
| Aggregator swaps (all versions) | Full | jupiter_solana.aggregator_swaps |
| Cross-platform trading (all Solana DEXes) | Full | dex_solana.trades |
| Fee revenue per trade | Full | dex_solana.trades.fee_usd |
| Competitor user behavior | Full | Same tables, different project filter |
| Perps positions & liquidations | Raw only | solana.instruction_calls (no decoded table) |
| Ultra routing attribution | Partial | CPI pattern: Jupiter outer instruction wrapping OKX/DFlow inner |
| DCA and limit orders | Partial | Some decoded tables exist |
| Off-chain eligibility (campaigns) | None | Determined by Jupiter's API, not on-chain |
The most significant gap is Perps. It generates ~60% of Jupiter's revenue, and the Rewards Hub analysis showed that 60% of claimants were invisible in aggregator tables -- almost certainly Perps users. Building a decoded view of the Perps contract (or accessing internal data) would unlock the majority of fee-paying user behavior.
The second gap is Ultra attribution. Ultra routes through OKX/DFlow for best execution, meaning ~40-50% of OKX's on-chain volume actually originates from Jupiter. Naive market share analysis overstates competitors. Resolving this requires either CPI pattern detection on-chain or internal routing logs.
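The correction described above can be sketched in a few lines of Python. The trade records, the `outer_program` field, and the `JUPITER_PROGRAM` marker are hypothetical stand-ins for whatever a CPI-pattern detector or internal routing log would actually emit. The sketch only shows how re-attributing wrapped trades shifts the shares.

```python
from collections import defaultdict

# Hypothetical marker meaning "a Jupiter instruction wrapped this trade".
# Real detection would inspect outer vs. inner instructions on-chain.
JUPITER_PROGRAM = "jupiter"

def market_share(trades, adjust_for_routing):
    """Return {platform: share of USD volume}.

    Each trade is a dict with:
      platform      -- venue where the fill was recorded (e.g. "okx")
      outer_program -- program that issued the outer instruction, or None
      usd           -- trade size in USD
    """
    volume = defaultdict(float)
    for t in trades:
        origin = t["platform"]
        if adjust_for_routing and t["outer_program"] == JUPITER_PROGRAM:
            origin = JUPITER_PROGRAM  # credit the originator, not the venue
        volume[origin] += t["usd"]
    total = sum(volume.values())
    return {p: v / total for p, v in volume.items()}

# Illustrative data only, not real volumes.
trades = [
    {"platform": "jupiter", "outer_program": "jupiter", "usd": 500.0},
    {"platform": "okx", "outer_program": "jupiter", "usd": 200.0},  # Ultra-routed
    {"platform": "okx", "outer_program": None, "usd": 300.0},       # organic
]

naive = market_share(trades, adjust_for_routing=False)
adjusted = market_share(trades, adjust_for_routing=True)
```

On this toy data the naive view splits volume evenly between Jupiter and OKX, while the adjusted view credits the Ultra-routed trade back to Jupiter.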
What This Enables
Because the data is the product, you get properties that Web2 analytics teams spend years trying to approximate:
- Complete behavioral visibility. No dark funnel. Every transaction is public.
- Competitor intelligence for free. The same tables that show Jupiter's users show every competitor's users, their volumes, their growth, and their vulnerabilities.
- Real-time signals. No waiting for nightly syncs. Solana produces a new block roughly every 400 ms.
- Universal benchmarking. Compare your users against all Solana traders, not just your own instrumented subset.
- Forward-looking extensibility. New products -- prediction markets, agentic trading, whatever comes next -- will generate transactions on the same ledger, queryable through the same infrastructure. The intelligence layer doesn't need to be rebuilt for each new product.
Layer 1.5: Market Regime Detection
Before scoring individual wallets, understand the environment they're operating in -- bear/bull, fear/greed, volatility regime.
Individual wallet behavior doesn't exist in a vacuum. A wallet that reduces trading frequency 30% during a market-wide drawdown is behaving rationally. A wallet that reduces 30% while the market is surging is showing genuine disengagement. The intelligence layer needs to distinguish the two.
Market regime features, computed continuously from on-chain data:
Where V_t = aggregate Solana DEX volume (directional trend), \sigma_t = volatility of daily volume (stability), S_t = SOL price momentum, F_t = net flow direction (are wallets depositing or withdrawing from DeFi).
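How these four features combine is not spelled out here. One simple possibility, offered as an assumption rather than a stated design, is a weighted sum of standardized features, where the weights \beta and the trailing-window mean \mu_x and standard deviation s_x are illustrative choices:

```latex
R_t = \beta_V \, z(V_t) + \beta_\sigma \, z(\sigma_t) + \beta_S \, z(S_t) + \beta_F \, z(F_t),
\qquad
z(x_t) = \frac{x_t - \mu_x}{s_x}
```

A Hidden Markov Model over the same four features would replace this continuous index with a discrete regime state.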
Together these form a regime index -- a compact representation of "what the market is doing right now" -- that every downstream model conditions on. Wallet health scores are regime-adjusted: a wallet holding steady during a bear market is healthier than a wallet showing the same metrics during a bull market. Clustering accounts for regime-dependent behavior: some wallets are bear-market specialists (short-biased perps users), others only appear during bull runs (memecoin degens). Others maintain steady cadence regardless (bots, institutional DCA). The regime index makes these patterns legible.
For strategically important wallets (whales, market makers, high-fee generators), the regime index enables a deeper read: how does this wallet respond to fear vs. greed? Do they increase activity during volatility (opportunity-seeking) or retreat (risk-averse)? This behavioral fingerprint under stress is more revealing than any steady-state metric.
Layer 2: The Intelligence Layer
Five interconnected models that convert raw on-chain activity into wallet-level scores, segments, competitive positioning, and revenue forecasts.
This is what turns public data into private strategic advantage. Each module reads from the data landscape, conditions on the market regime, and writes signals that feed the others.
Wallet Vitals (Wallet Health Score)
The probability that a given wallet stays active in the next 30 days, updated continuously -- the single number everything downstream depends on.
Where H_w = health score for wallet w, P(\text{active}_{t+30}) = probability that the wallet executes at least one trade in the next 30 days, \mathbf{x}_w = the wallet's feature vector (frequency, recency, product breadth, platform loyalty, fee generation), and R_t = current market regime index from L1.5. Regime-conditioning is critical: the same activity pattern that signals disengagement in a bull market can be entirely healthy in a bear market.
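Read together, these definitions imply a health score of the following form (a reconstruction from the symbol list, not a quoted formula):

```latex
H_w = P\left(\text{active}_{t+30} \mid \mathbf{x}_w, R_t\right)
```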
Inputs: trading frequency trend (acceleration or deceleration), fee generation consistency, product breadth (how many Jupiter products used), platform loyalty (Jupiter's share of the wallet's total Solana DEX activity), recency of last trade. Crucially, this also includes cross-platform execution behavior -- how the wallet routes trades through competitors. A wallet that starts splitting volume between Jupiter and OKX isn't just a data point; it's an early signal of loyalty erosion that precedes full churn.
Different wallet segments need different treatment. A whale wallet with 100K+ monthly volume operates on different dynamics than a retail wallet doing 100 swaps. The model needs segment-specific learning rates -- the same principle I applied at Synthesia, where enterprise accounts churn on institutional timelines while SMB accounts churn on individual decision timelines.
Output: a score per wallet, bucketed as healthy, at-risk, dormant, or churned. Everything downstream depends on this.
Product Coupling Map
Which product combinations drive retention, and where do users fall off the multi-product journey?
Where C_{ij} = coupling strength between product i and product j, P(\text{adopt}_j \mid \text{uses}_i) = probability a wallet adopts product j given it already uses product i, and P(\text{adopt}_j) = baseline adoption rate of product j across all wallets. Positive values = reinforcing products (e.g., aggregator swap users are more likely to try Perps). Negative = substitutes or irrelevant pairings.
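One form consistent with these definitions is log lift, which is positive when usage of product i raises adoption of product j above baseline and negative when it lowers it. This is an assumed form; a plain difference P(\text{adopt}_j \mid \text{uses}_i) - P(\text{adopt}_j) would have the same sign behavior.

```latex
C_{ij} = \log \frac{P(\text{adopt}_j \mid \text{uses}_i)}{P(\text{adopt}_j)}
```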
The inter-product interaction graph. For every wallet: which Jupiter products they use, in what sequence, and how usage of one product correlates with adoption and retention of others.
This is the analytical backbone of the "products reinforcing each other" vision from the CatLumpurr talk. It answers: does a Perps user who also swaps churn less than a Perps-only user? Is DCA adoption a leading indicator of long-term retention? What's the natural product journey -- and where do users fall off?
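To make the question concrete, here is a minimal Python sketch that estimates coupling from per-wallet product sets, assuming coupling is measured as log lift. The product names and wallet data are made up, and co-usage stands in for adoption. A stricter version would require that product j was first used after product i.

```python
import math
from itertools import permutations

def coupling(wallet_products, products):
    """Log lift C[(i, j)] = log(P(j | i) / P(j)) from per-wallet product sets.

    wallet_products: list of sets, one set of product names per wallet.
    Pairs where the lift is undefined or zero are omitted from the result.
    """
    n = len(wallet_products)
    result = {}
    for i, j in permutations(products, 2):
        uses_i = [s for s in wallet_products if i in s]
        uses_j = sum(1 for s in wallet_products if j in s)
        both = sum(1 for s in uses_i if j in s)
        if not uses_i or uses_j == 0 or both == 0:
            continue  # no data, or log(0): skip rather than emit -inf
        p_j_given_i = both / len(uses_i)
        p_j = uses_j / n
        result[(i, j)] = math.log(p_j_given_i / p_j)
    return result

# Illustrative wallets only.
wallets = [
    {"swap", "perps"},
    {"swap", "perps"},
    {"swap"},
    {"dca"},
]
C = coupling(wallets, ["swap", "perps", "dca"])
```

Here swap users adopt perps at 2/3 against a baseline of 1/2, so `C[("swap", "perps")]` is positive. The swap/dca pair never co-occurs and is left out instead of being reported as negative infinity.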
This framework also extends naturally to future products. When Jupiter launches prediction markets or agentic trading integrations, the coupling map immediately shows how new products interact with the existing ecosystem -- whether they cannibalize, complement, or create entirely new user journeys.
Behavioral Clustering
Meaningful wallet personas -- whale, retail, bot, degen, methodical -- from on-chain metadata alone, no KYC required.
Where z_w = cluster membership vector for wallet w (e.g., [0.7 methodical, 0.2 degen, 0.1 bot]), \text{GMM} = Gaussian Mixture Model that learns the natural groupings from data, and \mathbf{x}_w = the wallet's behavioral feature vector (frequency, timing, token preferences, size distribution, regime response). Soft assignments mean a wallet can belong to multiple personas simultaneously -- more realistic than hard labels.
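In standard GMM notation, the soft assignment is the posterior responsibility of each component for the wallet. The mixture weights \pi_k, means \boldsymbol{\mu}_k, covariances \boldsymbol{\Sigma}_k, and number of personas K are the fitted model parameters, and the symbols are introduced here for illustration:

```latex
z_w = \text{GMM}(\mathbf{x}_w),
\qquad
z_{w,k} = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_w \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
               {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_w \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}
```

The components of z_w sum to 1, which is why a vector like [0.7, 0.2, 0.1] is a valid membership.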
Group wallets into meaningful personas without identity. Not KYC -- behavioral fingerprints.
Clustering signals: trading frequency distribution, timing patterns (time-of-day, day-of-week), token preferences, transaction size distributions, product usage mix, bot-vs-human detection. And critically, regime response -- how each wallet behaves during market stress. Some wallets increase activity during drawdowns (opportunity-seeking, short-biased). Others go dormant (retail fear). Others maintain steady cadence regardless (bots, institutional DCA). The regime response pattern is often more diagnostic than any steady-state feature.
The cascade of liquidation events over the past month is a good example. You don't need to know who these wallets are to identify the pattern -- the timing, sizing, and sequential structure of the cascade are identifiable through metadata alone. The same principle applies to all wallet behavior: humans trade differently from bots, whales differently from retail, methodical DCA users differently from memecoin degens.
This gives Jupiter a language for talking about user segments that goes beyond "wallet address" and "volume tier." And it creates the foundation for understanding how new user types emerge -- the agentic economy will generate a new class of wallet behavior (AI agents executing trades with non-human patterns, latency profiles, and decision logic) that needs to be identified, understood, and served differently.
Competitor Intelligence
In DeFi, seeing when your users start using a competitor is a query -- and so is finding their vulnerable segments.
Where L_w^t = loyalty score for wallet w at time t, V_w^{\text{Jupiter}} = wallet's trading volume on Jupiter in that period, and V_w^{\text{total}} = wallet's total trading volume across all Solana DEXes (Jupiter + OKX + DFlow + Raydium + ...). A score of 1.0 = Jupiter-exclusive. Track the derivative \Delta L_w over time -- a declining loyalty score is a churn precursor, visible weeks before the wallet goes inactive.
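Written out from these definitions, the score is a volume share and its derivative is a period-over-period difference (the one-period lag is an assumption, and a smoothed slope would work as well):

```latex
L_w^t = \frac{V_w^{\text{Jupiter}}}{V_w^{\text{total}}},
\qquad
\Delta L_w = L_w^t - L_w^{t-1}
```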
Because competitor data is public, the same analytical framework that scores Jupiter wallets can profile the entire market.
Market structure: Who's growing, who's shrinking, where volume is migrating, and why. Not just top-line numbers -- segment-level: which types of wallets are DFlow gaining? Are they taking retail or whales? Bot flow or human flow?
Competitive churn detection: Per-wallet, per-week: what share of their Solana DEX activity is on Jupiter vs. competitors? Track over time. When a wallet's Jupiter share drops from 80% to 40%, you see exactly where they're going and can infer why.
Wallet overlap as leading indicator: The P01 market share analysis already measures this -- weekly intersection of wallet populations across aggregators. When |Jup ∩ OKX| grows faster than |Jup| alone, wallets are shopping around. When it shrinks, one side is winning exclusivity. The overlap trend is a competitive health metric that updates weekly.
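The overlap comparison reduces to set arithmetic. A minimal Python sketch, assuming weekly wallet populations are already available as sets of addresses (the function name and the toy addresses are hypothetical, and this is not the P01 implementation):

```python
def overlap_trend(weekly_jup, weekly_okx):
    """Week-over-week growth of |Jup ∩ OKX| versus |Jup|.

    Both arguments are lists of wallet-address sets, one set per week.
    Returns one entry per consecutive week pair: a tuple
    (overlap_growth, jup_growth), or None when a growth rate is undefined.
    """
    rows = []
    for prev in range(len(weekly_jup) - 1):
        cur = prev + 1
        overlap_prev = len(weekly_jup[prev] & weekly_okx[prev])
        overlap_cur = len(weekly_jup[cur] & weekly_okx[cur])
        jup_prev, jup_cur = len(weekly_jup[prev]), len(weekly_jup[cur])
        if overlap_prev == 0 or jup_prev == 0:
            rows.append(None)  # cannot divide by an empty baseline
            continue
        rows.append((overlap_cur / overlap_prev - 1, jup_cur / jup_prev - 1))
    return rows

# Two illustrative weeks of wallet populations.
jup = [{"a", "b", "c", "d"}, {"a", "b", "c", "d", "e"}]
okx = [{"a", "x"}, {"a", "b", "y"}]

trend = overlap_trend(jup, okx)
shopping_around = trend[0][0] > trend[0][1]
```

In this toy case the overlap doubles while Jupiter's population grows 25%, so the wallets are shopping around.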
Opportunity identification: Where are competitors' users underserved? If a segment of Raydium-heavy wallets shows behavioral patterns similar to Jupiter power users but hasn't discovered Jupiter's products, that's an acquisition opportunity. If OKX wallets are generating high fees on trades that Jupiter Ultra could route more cheaply, that's a conversion argument backed by data.
This is structurally impossible in Web2. No SaaS company can see when a customer starts using a competitor's product. In DeFi, it's a query.
Revenue Forecasting
Fee revenue as a function of wallet health, trade frequency by segment, and product mix -- probabilistic bands, not point estimates.
Where N_s^{\text{active}} = predicted active wallets in segment s (from health scores), \bar{f}_s = average trade frequency for segment s (from clustering), \bar{\phi}_s = average fee per trade in segment s (from product coupling -- Ultra at 10 bps vs. aggregator at 0 bps), and g(R_t) = regime adjustment factor.
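Assembled from these terms, the forecast is a segment-level sum scaled by the regime factor. This is a reconstruction from the definitions, and the symbol \widehat{\text{Rev}}_t is introduced here for illustration:

```latex
\widehat{\text{Rev}}_t = g(R_t) \sum_{s} N_s^{\text{active}} \, \bar{f}_s \, \bar{\phi}_s
```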
Each component is predictable from the models above. Wallet health scores predict the active wallet count. Behavioral clustering predicts per-wallet trade frequency by segment. Product coupling predicts fee tier. Layer in exogenous variables -- SOL price, overall DeFi activity, market regime -- and you get probabilistic forecast bands instead of point estimates. Aggregated over a wallet's predicted active lifespan, this yields Wallet Lifetime Value (inferred from Wallet Vitals) -- the expected total fee contribution of a wallet given its current health, segment, and product mix.
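As a rough illustration of the lifetime-value aggregation, the Python sketch below treats the 30-day health score as a constant monthly survival probability. That is a simplifying assumption: a real survival curve would come from the health model and would rarely be constant. The function name and numbers are hypothetical.

```python
def wallet_lifetime_value(monthly_fee_usd, monthly_survival, horizon_months):
    """Expected total fees over a horizon, assuming constant monthly survival.

    monthly_survival is the probability that a wallet active this month
    is still active next month.
    """
    if not 0.0 <= monthly_survival <= 1.0:
        raise ValueError("monthly_survival must be a probability")
    value = 0.0
    alive = 1.0  # probability the wallet is active in the current month
    for _ in range(horizon_months):
        value += alive * monthly_fee_usd  # expected fees this month
        alive *= monthly_survival         # decay into the next month
    return value

ltv = wallet_lifetime_value(
    monthly_fee_usd=10.0, monthly_survival=0.5, horizon_months=3
)
```

With these inputs the expected contribution is 10 + 5 + 2.5 = 17.5 USD over three months.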
This becomes more valuable, not less, as Jupiter's product surface grows. Prediction markets, agentic integrations, and new DeFi primitives each add new fee streams that the forecasting model absorbs as additional terms in the same framework.
Layer 3: What Becomes Possible
The intelligence layer feeds operational capabilities -- from A/B tested campaigns to competitive early warning to agentic readiness.
Campaign Targeting & A/B Testing
- Pre-filtering: Use wallet health scores to exclude dormant wallets from reward eligibility. Season 1 distributed to everyone; Season 2 ($2M, live now) could allocate only to active or at-risk segments.
- A/B framework: Randomly hold out 5-10% of eligible wallets as a control group. Measure swap frequency, volume, and retention for treatment vs. control. This is the only way to distinguish "the campaign caused this behavior" from "this behavior would have happened anyway."
- Segment-specific treatment: Different wallet personas respond to different incentives. A whale might respond to fee rebates; a retail degen might respond to gamified rewards. Clustering enables differentiated campaigns tested against each other.
- Continuous optimization: Each campaign round generates experimental data that feeds back into the health model. Season 2 informs Season 3. The system learns from its own interventions.
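The holdout design above can be sketched in Python. Hash-based bucketing is used here as a reproducible stand-in for random sampling. It is my choice for illustration, not something Jupiter's campaign system is known to do, and the wallet IDs and campaign name are hypothetical.

```python
import hashlib

def assign_group(wallet, campaign, holdout_pct=10):
    """Deterministically assign a wallet to 'control' or 'treatment'.

    Hashing campaign + wallet yields a stable, effectively random bucket
    in [0, 100); wallets below holdout_pct form the control group.
    """
    digest = hashlib.sha256(f"{campaign}:{wallet}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "control" if bucket < holdout_pct else "treatment"

def lift(treatment_metric, control_metric):
    """Relative lift of the treatment mean over the control mean."""
    t = sum(treatment_metric) / len(treatment_metric)
    c = sum(control_metric) / len(control_metric)
    return t / c - 1

# Assign 1,000 hypothetical wallets for one campaign.
groups = [assign_group(f"wallet{i}", "season2") for i in range(1000)]
```

Because assignment depends only on the wallet and the campaign name, it can be recomputed at analysis time without storing the split, and changing the campaign name reshuffles the control group for the next season.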
Product Development Signals
The coupling map turns "should we build this?" from intuition into a quantifiable question about addressable segments and cannibalization risk.
- Cross-product retention: If wallets using aggregator + Perps have 3x the retention of aggregator-only wallets, that's a signal to reduce friction between those products. If DCA users never discover Perps, that's a surface area problem.
- Pre-launch intelligence: Before building a prediction market: which existing Jupiter wallets show behavioral patterns consistent with prediction market interest? What's the addressable segment? What's the likely cannibalization vs. net-new effect?
- Feature prioritization: Which product improvements would move the highest-value wallet segments? Data-informed product strategy, not intuition.
Competitive Response
Detect market share shifts as they happen -- at the segment level, not in a quarterly review.
- Continuous monitoring: When Jupiter's share drops across an entire wallet segment at once, something changed: a new competitor feature, an improved routing algorithm, a liquidity incentive. The system surfaces this in the weekly share refresh rather than in a quarterly review.
- Acquisition windows: When a competitor's user segment shows signs of dissatisfaction (declining frequency, platform-hopping), that's a window to acquire them before they settle elsewhere.
- Ultra attribution defense: Quantify exactly how much of OKX/DFlow volume is Jupiter-originated through Ultra routing. The "true" market share narrative is a strategic asset.
Agentic Readiness
The bots running arbitrage today are the ancestors of the autonomous agents that will dominate DeFi tomorrow -- the intelligence layer is ready for both.
- Continuity, not disruption. Today's MEV bots, arbitrage scripts, and automated market makers already generate wallet-level behavioral data with non-human patterns. AI agents are the next evolution of the same phenomenon -- more sophisticated, more autonomous, but generating the same on-chain footprint.
- Detection built in. The clustering module already separates bot behavior from human behavior (sub-second execution, no time-of-day patterns, deterministic sizing). As agents grow more sophisticated, the same framework adapts -- it models wallet behavior, not human behavior.
- Agent-as-user. An AI agent managing a portfolio through Jupiter is just another wallet with a behavioral signature. It needs health scoring (is it still active?), clustering (what type of agent?), and competitive monitoring (is it routing through competitors?). The framework is agent-agnostic by design.
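The three bot signals named above (sub-second execution, no time-of-day pattern, deterministic sizing) can each be turned into a per-wallet feature. A minimal Python sketch with a hypothetical input format and no claim about where the thresholds should sit:

```python
import math
from collections import Counter
from statistics import mean, pstdev

def bot_features(trades):
    """Compute three simple bot-likeness features for one wallet.

    trades: list of (unix_timestamp_seconds, usd_size), at least two entries.
    """
    times = sorted(t for t, _ in trades)
    sizes = [s for _, s in trades]
    gaps = [b - a for a, b in zip(times, times[1:])]

    # Entropy of the UTC hour-of-day histogram, normalized to [0, 1].
    hours = Counter(int(t // 3600) % 24 for t in times)
    n = len(times)
    entropy = -sum((c / n) * math.log(c / n) for c in hours.values())
    hour_entropy = entropy / math.log(24)

    return {
        "min_gap_s": min(gaps),                  # sub-second suggests automation
        "hour_entropy": hour_entropy,            # near 1 means no daily rhythm
        "size_cv": pstdev(sizes) / mean(sizes),  # near 0 means fixed sizing
    }

# Synthetic wallet: one same-sized trade per hour for a day, plus a 0.5 s follow-up.
bot = [(h * 3600.0, 100.0) for h in range(24)] + [(0.5, 100.0)]
features = bot_features(bot)
```

These features would feed the clustering model as inputs, not act as a classifier on their own. An agent with randomized sizing or human-like scheduling would need the richer feature set.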
The Feedback Loop
Every intervention generates new data -- this is what separates a one-off analysis from infrastructure.
- Campaigns as experiments. A targeted campaign with a holdout group produces a natural experiment. The health model learns which wallet segments respond to which incentives.
- Product changes as signals. A new feature shifts the coupling map. A fee change shifts the revenue model. Every operational decision generates data that updates every model.
- Compounding returns. Season 1 data informed this analysis. This analysis informs Season 2 design. Season 2 results improve the models for Season 3. The system gets better with each cycle.
How the Modules Connect
No model operates alone -- the value is in the interconnection, where each module's output becomes another module's input.
Health scores feed campaign targeting. Clustering feeds health (different segments, different models). Product coupling feeds both health (breadth = stickiness) and revenue forecasting (product mix = fee prediction). Competitive intelligence feeds health (loyalty decline = risk signal) and opportunity identification (competitor weakness = acquisition window). The regime index conditions everything -- no model interprets wallet behavior without knowing the market context. Every intervention feeds back as new data.
Track Record
Nine production modules built in the last 8 months at Synthesia, each with a direct Jupiter isomorph -- the math is domain-agnostic, only the data changes.
Nine production modules, interconnected, influencing 100M+ EUR in decisions:
| Module | Result | Impact | Jupiter Isomorph |
|---|---|---|---|
| Churn Risk | 77% accuracy, 4-5 month advance warning | 2.3M ARR saved through intervention | Wallet attrition prediction from trading frequency decay, platform loyalty drop |
| Expansion Scoring | 96.6% AUC, quota-integrated | Integrated into 100M+ EUR quota planning | Wallet growth prediction -- who increases volume, adopts next product |
| Health Score | Mission-critical daily operational infrastructure | Determines compensation for customer-facing teams, optimized for growth and productivity | Wallet vitals: P(active next 30d) from on-chain activity signals |
| User Intelligence | Behavioral personas with soft assignments | Persona-targeted outreach campaigns | Wallet clustering: whale/retail/bot/degen/methodical from trade metadata |
| Territory Optimization | 18% efficiency increase | Headcount reallocation across segments | Campaign budget allocation across wallet segments by expected ROI |
| Revenue Forecasting | 15% accuracy improvement | Board-level planning confidence | Fee revenue forecasting from wallet health x product mix x market regime |
| Attribution | 95%+ accuracy | Resolved $4M+ misattributed pipeline | Ultra routing attribution -- Jupiter-originated volume vs organic competitor |
| Intent Signals | 2.5M freemium users scored | Conversion rate lift on scored leads | Dormant wallet reactivation scoring from historical on-chain patterns |
| Marketing Funnel | Pipeline velocity optimization | Reduced stage-to-stage drop-off 22% | User journey: first swap --> repeat --> multi-product --> power user |
Mathematical Frameworks
| Module | Primary Framework | Technique |
|---|---|---|
| Wallet Vitals (Health Score) | Bayesian inference | Posterior probability of activity given behavioral evidence, with segment-specific priors and regime conditioning |
| Behavioral Clustering | Unsupervised learning | Gaussian Mixture Models (GMM) with soft cluster assignments; regime-response features for stress-state profiling |
| Product Coupling | Conditional probability | Lift analysis -- adoption rates conditioned on existing product usage vs. baseline; association rule mining |
| Competitor Intelligence | Information theory | Mutual information between wallet behavior and platform choice; loyalty score as volume share ratio |
| Revenue Forecasting | Decomposition model | Segment-level summation: active wallets x frequency x fee tier x regime adjustment; survival analysis for Wallet Lifetime Value |
| Market Regime (L1.5) | Time series / regime detection | Hidden Markov Model or rolling feature windows over aggregate volume, volatility, price momentum, and net flow |
The mathematical frameworks are domain-agnostic. Bayesian inference works the same whether evidence comes from product telemetry or on-chain transactions. Survival analysis models time-to-churn whether the "customer" is a SaaS account or a wallet. Mutual information ranks features regardless of what those features measure.
What changes between Synthesia and Jupiter is not the mathematics. It's the data. And Jupiter's data -- public, real-time, cross-platform, exhaustive -- is in many ways better to build intelligence on.