On-Chain Intelligence for Jupiter - Vision

TLDR

Jupiter sits on one of the richest behavioral datasets in DeFi. The same intelligence systems I've built in commercial SaaS -- health scoring, churn prediction, behavioral clustering, competitive monitoring -- translate directly to on-chain data. The data layer is actually better: public, real-time, cross-platform, and exhaustive. This document outlines a layered intelligence architecture that converts dormant data into operational advantage.

L0  The Data Landscape
    In DeFi, the data IS the product. The blockchain is the warehouse.
    What's missing isn't data. It's interpretation.
         |
         v
L1  What Exists
    Raw instructions --> Decoded tables --> Cross-platform aggregations
    What's visible, what's dark, what the data layer enables.
         |
         v
L2  The Intelligence Layer
    Wallet Vitals | Product Coupling | Clustering |
    Competitor Intelligence | Revenue Forecasting
    Each module reads from L1, writes signals that feed the others.
         |
         v
L3  What Becomes Possible
    A/B tested campaigns | Product signals | Competitive response |
    Opportunity identification | Agentic readiness | Feedback loop

Layer 0: The Data Landscape

Everyone has the data. Nobody has the intelligence.

In Web2 SaaS, data is a shadow of the product. You build a video editor, then you instrument it -- add event tracking, pipe logs to a warehouse, build marts on top. The product exists first. The data is a lossy, delayed reflection of what happened.

In DeFi, the data IS the product. A swap on Jupiter isn't instrumented after the fact -- the transaction on the ledger is the swap. The execution and the measurement are the same object. Every trade, every perp position, every fee paid exists as a first-class record on a public ledger, replicated across every validator node, indexed by services like Dune, and accessible to anyone without credentials or NDAs.

This changes the problem completely. Jupiter doesn't need to build a data warehouse. The blockchain is the warehouse. What's missing isn't data collection, transformation, or access. What's missing is interpretation.

Case in point: all of Jupiter's transaction data is public, yet most external analysis draws the wrong conclusions from it. Naive market share calculations overstate competitors by counting Jupiter-routed Ultra volume as OKX organic volume. Campaign effectiveness gets measured by aggregate volume rather than per-wallet behavioral change. Wallet counts get inflated by bots that anyone could filter out with basic metadata analysis. The data is there. The intelligence isn't.

The competitive advantage in DeFi analytics is not who has the data. Everyone has the data. The advantage is who builds the intelligence layer on top -- who converts public transaction logs into private strategic insight. That's the gap. And because competitor data is equally public, the same intelligence layer that understands Jupiter's users can simultaneously understand the competitive landscape, market structure, and emerging opportunities across the entire Solana DeFi ecosystem.

Layer 1: What Exists

Three tiers of public on-chain data give full visibility into Jupiter and every competitor -- except Perps, which is the biggest blind spot.

The Public Data Layer

Three tiers of on-chain data are available today, each at a different level of abstraction:

Raw instructions (solana.instruction_calls) -- Every function call ever executed on Solana. The universal audit log. You can reconstruct anything from here, but it requires parsing binary payloads and understanding program-specific discriminators. This is where I extracted the Rewards Hub claimant list -- no decoded table existed, so I parsed the raw contract instructions directly.

Decoded protocol tables (jupiter_solana.aggregator_swaps) -- Community-maintained decodings of popular contracts into structured tables with human-readable columns. Jupiter's aggregator swaps are well-decoded: wallet, token pair, USD values, AMM routing. But coverage is incomplete -- Perps has no decoded table on Dune.

Cross-platform aggregations (dex_solana.trades) -- A unified view of ALL DEX trades on Solana across every platform: Jupiter, OKX, DFlow, Raydium, Meteora, everything. Per trade: wallet, platform, token pair, USD amount, fees. This is the strategic table -- it lets you see not just what your users do on Jupiter, but what they do everywhere. And critically, what everyone else's users do too.

What's Visible and What's Not

Domain	Visibility	Source
Aggregator swaps (all versions)	Full	`jupiter_solana.aggregator_swaps`
Cross-platform trading (all Solana DEXes)	Full	`dex_solana.trades`
Fee revenue per trade	Full	`dex_solana.trades.fee_usd`
Competitor user behavior	Full	Same tables, different `project` filter
Perps positions & liquidations	Raw only	`solana.instruction_calls` (no decoded table)
Ultra routing attribution	Partial	CPI pattern: Jupiter outer instruction wrapping OKX/DFlow inner
DCA and limit orders	Partial	Some decoded tables exist
Off-chain eligibility (campaigns)	None	Determined by Jupiter's API, not on-chain

The most significant gap is Perps. It generates ~60% of Jupiter's revenue, and the Rewards Hub analysis showed that 60% of claimants were invisible in aggregator tables -- almost certainly Perps users. Building a decoded view of the Perps contract (or accessing internal data) would unlock the majority of fee-paying user behavior.

The second gap is Ultra attribution. Ultra routes through OKX/DFlow for best execution, meaning ~40-50% of OKX's on-chain volume actually originates from Jupiter. Naive market share analysis overstates competitors. Resolving this requires either CPI pattern detection on-chain or internal routing logs.
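To make the attribution problem concrete, here is a minimal sketch of how market share shifts once Ultra-routed volume is credited to the originating platform. The trade records and the `routed_by_jupiter` flag are hypothetical: the flag stands in for whatever CPI pattern detection or internal routing logs would produce, and is not an existing column in `dex_solana.trades`.

```python
from collections import defaultdict

# Hypothetical trade records: (executing_platform, routed_by_jupiter, usd_volume).
# `routed_by_jupiter` is an assumed output of CPI detection or routing logs.
trades = [
    ("jupiter", False, 500.0),
    ("okx", True, 300.0),    # Ultra order executed via OKX
    ("okx", False, 400.0),
    ("dflow", True, 100.0),  # Ultra order executed via DFlow
    ("dflow", False, 200.0),
]

def market_share(trades, attribute_to_origin):
    """Return each platform's share of total USD volume."""
    volume = defaultdict(float)
    for platform, routed_by_jupiter, usd in trades:
        # Optionally credit Jupiter-originated flow back to Jupiter.
        if attribute_to_origin and routed_by_jupiter:
            platform = "jupiter"
        volume[platform] += usd
    total = sum(volume.values())
    return {p: v / total for p, v in volume.items()}

naive = market_share(trades, attribute_to_origin=False)
adjusted = market_share(trades, attribute_to_origin=True)
```

With these made-up numbers, the naive view gives OKX the largest share, while the adjusted view moves the routed volume back to Jupiter. The real figures depend entirely on how reliably the routing flag can be derived.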

What This Enables

Because the data is the product, you get properties that Web2 analytics teams spend years trying to approximate:

Complete behavioral visibility. No dark funnel. Every transaction is public.
Competitor intelligence for free. The same tables that show Jupiter's users show every competitor's users, their volumes, their growth, and their vulnerabilities.
Real-time signals. No waiting for nightly syncs. The ledger updates roughly every 400ms (Solana's target slot time).
Universal benchmarking. Compare your users against all Solana traders, not just your own instrumented subset.
Forward-looking extensibility. New products -- prediction markets, agentic trading, whatever comes next -- will generate transactions on the same ledger, queryable through the same infrastructure. The intelligence layer doesn't need to be rebuilt for each new product.

Layer 1.5: Market Regime Detection

Before scoring individual wallets, understand the environment they're operating in -- bear/bull, fear/greed, volatility regime.

Individual wallet behavior doesn't exist in a vacuum. A wallet that reduces trading frequency 30% during a market-wide drawdown is behaving rationally. A wallet that reduces 30% while the market is surging is showing genuine disengagement. The intelligence layer needs to distinguish the two.

Market regime features, computed continuously from on-chain data:

$$R_t = \{V_t, \sigma_t, S_t, F_t\}$$

Where $V_t$ = aggregate Solana DEX volume (directional trend), $\sigma_t$ = volatility of daily volume (stability), $S_t$ = SOL price momentum, $F_t$ = net flow direction (are wallets depositing or withdrawing from DeFi).

This creates a regime index -- a compact representation of "what the market is doing right now" -- that every downstream model conditions on. Wallet health scores are regime-adjusted: a wallet holding steady during a bear market is healthier than the same metrics during a bull market. Clustering accounts for regime-dependent behavior: some wallets are bear-market specialists (short-biased perps users), others only appear during bull runs (memecoin degens). Others maintain steady cadence regardless (bots, institutional DCA). The regime index makes these patterns legible.
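As a rough illustration of the rolling-window option, the four regime features could be computed from daily series like this. The window length, the choice of coefficient of variation for $\sigma_t$, and the input series are all assumptions for the sketch, not a settled specification.

```python
from statistics import mean, pstdev

def regime_features(volume, sol_price, net_flow, window=7):
    """Compute R_t = (V_t, sigma_t, S_t, F_t) from trailing daily series.

    All three inputs are equal-length lists, oldest observation first.
    """
    if len(volume) < 2 * window:
        raise ValueError("need at least 2 * window daily observations")
    recent = volume[-window:]
    prior = volume[-2 * window:-window]
    v_t = mean(recent) / mean(prior) - 1.0          # volume trend vs. the prior window
    sigma_t = pstdev(recent) / mean(recent)         # coefficient of variation of daily volume
    s_t = sol_price[-1] / sol_price[-window] - 1.0  # SOL price momentum across the window
    f_t = sum(net_flow[-window:])                   # net deposits (+) or withdrawals (-)
    return {"V": v_t, "sigma": sigma_t, "S": s_t, "F": f_t}

# Illustrative two weeks of daily data (made-up numbers).
volume = [100.0] * 7 + [120.0] * 7
sol_price = [100.0] * 8 + [102.0, 104.0, 106.0, 108.0, 109.0, 110.0]
net_flow = [0.0] * 7 + [-5.0] * 7
r_t = regime_features(volume, sol_price, net_flow)
```

A Hidden Markov Model over the same features would replace the fixed window with learned regime states. The rolling version is simply the cheapest thing to stand up first.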

For strategically important wallets (whales, market makers, high-fee generators), the regime index enables a deeper read: how does this wallet respond to fear vs. greed? Do they increase activity during volatility (opportunity-seeking) or retreat (risk-averse)? This behavioral fingerprint under stress is more revealing than any steady-state metric.

Layer 2: The Intelligence Layer

Five interconnected models that convert raw on-chain activity into wallet-level scores, segments, competitive positioning, and revenue forecasts.

This is what turns public data into private strategic advantage. Each module reads from the data landscape, conditions on the market regime, and writes signals that feed the others.

Wallet Vitals (Wallet Health Score)

The probability that a given wallet stays active in the next X days (X = 30 in the formulation below), updated continuously -- the single number everything downstream depends on.

$$H_w = P(\text{active}_{t+30} \mid \mathbf{x}_w, R_t)$$

Where $H_w$ = health score for wallet $w$, $P(\text{active}_{t+30})$ = probability that the wallet executes at least one trade in the next 30 days, $\mathbf{x}_w$ = the wallet's feature vector (frequency, recency, product breadth, platform loyalty, fee generation), and $R_t$ = current market regime index from L1.5. Regime-conditioning is critical: a wallet scoring 0.6 in a bear market is healthier than 0.6 in a bull market.

Inputs: trading frequency trend (acceleration or deceleration), fee generation consistency, product breadth (how many Jupiter products used), platform loyalty (Jupiter's share of the wallet's total Solana DEX activity), recency of last trade. Crucially, this also includes cross-platform execution behavior -- how the wallet routes trades through competitors. A wallet that starts splitting volume between Jupiter and OKX isn't just a data point; it's an early signal of loyalty erosion that precedes full churn.

Different wallet segments need different treatment. A whale wallet with $100K+ monthly volume operates on different dynamics than a retail wallet doing $100 swaps. The model needs segment-specific learning rates -- the same principle I applied at Synthesia, where enterprise accounts churn on institutional timelines while SMB accounts churn on individual decision timelines.

Output: a score per wallet. Healthy, at-risk, dormant, churned. Everything downstream depends on this.
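A minimal sketch of what a regime-adjusted score could look like, using a plain logistic model as a stand-in for the Bayesian formulation. The coefficients, feature names, and label cut-offs are hypothetical placeholders, not fitted values. The one idea it demonstrates is measuring a wallet's frequency change relative to the market's, so a 30% slowdown during a 30% market drawdown is not penalised.

```python
import math

# Hypothetical coefficients; in practice these would be fitted per segment.
WEIGHTS = {
    "relative_trend": 2.0,           # wallet frequency change minus market volume change
    "days_since_last_trade": -0.05,
    "products_used": 0.4,
    "jupiter_share": 1.5,
}
BIAS = -0.5

def health_score(wallet, market_volume_change):
    """Logistic estimate of P(active in the next 30 days | x_w, R_t)."""
    x = dict(wallet)
    # Regime adjustment: compare the wallet's trend to the market's trend.
    x["relative_trend"] = x.pop("freq_change") - market_volume_change
    z = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def label(score):
    """Map a score to the four states named above (cut-offs are illustrative)."""
    if score >= 0.7:
        return "healthy"
    if score >= 0.4:
        return "at-risk"
    if score >= 0.1:
        return "dormant"
    return "churned"

wallet = {"freq_change": -0.3, "days_since_last_trade": 3,
          "products_used": 2, "jupiter_share": 0.8}
bear = health_score(wallet, market_volume_change=-0.3)  # market also down 30%
bull = health_score(wallet, market_volume_change=0.2)   # market up 20%
```

The same wallet lands in a healthier bucket when the whole market is contracting than when it is slowing down against a rising market.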

Product Coupling Map

Which product combinations drive retention, and where do users fall off the multi-product journey?

$$C_{ij} = P(\text{adopt}_j \mid \text{uses}_i) - P(\text{adopt}_j)$$

Where $C_{ij}$ = coupling strength between product $i$ and product $j$, $P(\text{adopt}_j \mid \text{uses}_i)$ = probability a wallet adopts product $j$ given it already uses product $i$, and $P(\text{adopt}_j)$ = baseline adoption rate of product $j$ across all wallets. Positive values = reinforcing products (e.g., aggregator swap users are more likely to try Perps). Negative = substitutes or irrelevant pairings.
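The coupling definition can be sketched directly from a wallet-to-products mapping. This version ignores adoption order and timing (it treats "uses both" as "adopted $j$ given $i$"), which is a simplification. The wallet data is made up for illustration.

```python
def coupling(wallet_products, i, j):
    """C_ij = P(adopt_j | uses_i) - P(adopt_j) over a dict of wallet -> set of products."""
    all_sets = list(wallet_products.values())
    users_i = [products for products in all_sets if i in products]
    if not users_i:
        raise ValueError("need at least one wallet using product i")
    p_j = sum(j in products for products in all_sets) / len(all_sets)
    p_j_given_i = sum(j in products for products in users_i) / len(users_i)
    return p_j_given_i - p_j

# Illustrative wallets (hypothetical product usage).
wallets = {
    "w1": {"swap", "perps"},
    "w2": {"swap", "perps"},
    "w3": {"swap"},
    "w4": {"dca"},
    "w5": {"dca"},
}
swap_to_perps = coupling(wallets, "swap", "perps")  # positive: reinforcing
dca_to_perps = coupling(wallets, "dca", "perps")    # negative: no co-usage here
```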

The intra-product interaction graph. For every wallet: which Jupiter products they use, in what sequence, and how usage of one product correlates with adoption and retention of others.

This is the analytical backbone of the "products reinforcing each other" vision from the CatLumpurr talk. It answers: does a Perps user who also swaps churn less than a Perps-only user? Is DCA adoption a leading indicator of long-term retention? What's the natural product journey -- and where do users fall off?

This framework also extends naturally to future products. When Jupiter launches prediction markets or agentic trading integrations, the coupling map immediately shows how new products interact with the existing ecosystem -- whether they cannibalize, complement, or create entirely new user journeys.

Behavioral Clustering

Meaningful wallet personas -- whale, retail, bot, degen, methodical -- from on-chain metadata alone, no KYC required.

$$z_w = \text{GMM}(\mathbf{x}_w) \rightarrow \text{soft cluster assignments}$$

Where $z_w$ = cluster membership vector for wallet $w$ (e.g., [0.7 methodical, 0.2 degen, 0.1 bot]), $\text{GMM}$ = Gaussian Mixture Model that learns the natural groupings from data, and $\mathbf{x}_w$ = the wallet's behavioral feature vector (frequency, timing, token preferences, size distribution, regime response). Soft assignments mean a wallet can belong to multiple personas simultaneously -- more realistic than hard labels.
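To show what a soft assignment means in practice, the sketch below computes the posterior responsibilities (the E-step of a GMM) for one wallet, given components that are assumed to be already fitted. The component parameters, the two features, and the persona names attached to each component are hypothetical. A real pipeline would learn them from data with a library such as scikit-learn.

```python
import math

# Hypothetical fitted components over two features:
# (trades per day, share of trades placed between 00:00 and 08:00 local time).
COMPONENTS = {
    "methodical": {"weight": 0.5, "mean": (0.5, 0.2), "var": (0.25, 0.02)},
    "degen": {"weight": 0.3, "mean": (8.0, 0.4), "var": (9.0, 0.04)},
    "bot": {"weight": 0.2, "mean": (200.0, 0.66), "var": (2500.0, 0.01)},
}

def log_density(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def soft_assign(x):
    """Posterior probability of each component given feature vector x."""
    logs = {
        name: math.log(c["weight"]) + log_density(x, c["mean"], c["var"])
        for name, c in COMPONENTS.items()
    }
    top = max(logs.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(v - top) for name, v in logs.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}

z_human = soft_assign((1.0, 0.25))
z_auto = soft_assign((210.0, 0.65))
```

Each result is a membership vector that sums to 1, so a wallet sitting between two personas keeps weight on both instead of being forced into one.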

Group wallets into meaningful personas without identity. Not KYC -- behavioral fingerprints.

Clustering signals: trading frequency distribution, timing patterns (time-of-day, day-of-week), token preferences, transaction size distributions, product usage mix, bot-vs-human detection. And critically, regime response -- how each wallet behaves during market stress. Some wallets increase activity during drawdowns (opportunity-seeking, short-biased). Others go dormant (retail fear). Others maintain steady cadence regardless (bots, institutional DCA). The regime response pattern is often more diagnostic than any steady-state feature.

The cascade of liquidation events over the past month is a good example. You don't need to know who these wallets are to identify the pattern -- the timing, sizing, and sequential structure of the cascade are identifiable through metadata alone. The same principle applies to all wallet behavior: humans trade differently from bots, whales differently from retail, methodical DCA users differently from memecoin degens.

This gives Jupiter a language for talking about user segments that goes beyond "wallet address" and "volume tier." And it creates the foundation for understanding how new user types emerge -- the agentic economy will generate a new class of wallet behavior (AI agents executing trades with non-human patterns, latency profiles, and decision logic) that needs to be identified, understood, and served differently.

Competitor Intelligence

In DeFi, seeing when your users start using a competitor is a query -- and so is finding their vulnerable segments.

$$L_w^t = \frac{V_w^{\text{Jupiter}}}{V_w^{\text{total}}}$$

Where $L_w^t$ = loyalty score for wallet $w$ at time $t$, $V_w^{\text{Jupiter}}$ = wallet's trading volume on Jupiter in that period, and $V_w^{\text{total}}$ = wallet's total trading volume across all Solana DEXes (Jupiter + OKX + DFlow + Raydium + ...). A score of 1.0 = Jupiter-exclusive. Track the derivative $\Delta L_w$ over time -- a declining loyalty score is a churn precursor, visible weeks before the wallet goes inactive.
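A small sketch of the loyalty score and its week-over-week change for a single wallet. The weekly volumes are invented, and the platform keys are assumed to match whatever labels the trades table uses.

```python
def loyalty(volume_by_platform):
    """L_w^t: Jupiter's share of one wallet's DEX volume in one period."""
    total = sum(volume_by_platform.values())
    if total == 0:
        return None  # no trades this period, so the share is undefined
    return volume_by_platform.get("jupiter", 0.0) / total

# Three consecutive weeks for one hypothetical wallet (USD volume per platform).
weekly = [
    {"jupiter": 800.0, "okx": 200.0},
    {"jupiter": 600.0, "okx": 400.0},
    {"jupiter": 400.0, "okx": 400.0, "dflow": 200.0},
]
scores = [loyalty(week) for week in weekly]
deltas = [later - earlier for earlier, later in zip(scores, scores[1:])]
```

Here the share falls from 0.8 to 0.4 over three weeks while the wallet is still trading, which is exactly the window where an intervention is still possible. Weeks with no activity return `None` and would need explicit handling before differencing.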

Because competitor data is public, the same analytical framework that scores Jupiter wallets can profile the entire market.

Market structure: Who's growing, who's shrinking, where volume is migrating, and why. Not just top-line numbers -- segment-level: which types of wallets are DFlow gaining? Are they taking retail or whales? Bot flow or human flow?

Competitive churn detection: Per-wallet, per-week: what share of their Solana DEX activity is on Jupiter vs. competitors? Track over time. When a wallet's Jupiter share drops from 80% to 40%, you see exactly where they're going and can infer why.

Wallet overlap as leading indicator: The P01 market share analysis already measures this -- weekly intersection of wallet populations across aggregators. When |Jup ∩ OKX| grows faster than |Jup| alone, wallets are shopping around. When it shrinks, one side is winning exclusivity. The overlap trend is a competitive health metric that updates weekly.

Opportunity identification: Where are competitors' users underserved? If a segment of Raydium-heavy wallets shows behavioral patterns similar to Jupiter power users but hasn't discovered Jupiter's products, that's an acquisition opportunity. If OKX wallets are generating high fees on trades that Jupiter Ultra could route more cheaply, that's a conversion argument backed by data.
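The wallet overlap metric described above reduces to a set intersection per week. A minimal sketch, with made-up wallet sets, tracking the overlap as a share of Jupiter's wallets so that a rising value means the intersection is growing faster than Jupiter's own base:

```python
def overlap_share(jupiter_wallets, other_wallets):
    """|Jup ∩ Other| / |Jup| for one week; both arguments are sets of addresses."""
    if not jupiter_wallets:
        return 0.0
    return len(jupiter_wallets & other_wallets) / len(jupiter_wallets)

# Two illustrative weeks: (Jupiter wallets, OKX wallets).
weeks = [
    ({"a", "b", "c", "d"}, {"d", "e"}),
    ({"a", "b", "c", "d", "f"}, {"c", "d", "e"}),
]
trend = [overlap_share(jup, okx) for jup, okx in weeks]
```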

This is structurally impossible in Web2. No SaaS company can see when a customer starts using a competitor's product. In DeFi, it's a query.

Revenue Forecasting

Fee revenue as a function of wallet health, trade frequency by segment, and product mix -- probabilistic bands, not point estimates.

$$\hat{F}_{t+1} = \sum_{s \in \text{segments}} N_s^{active} \cdot \bar{f}_s \cdot \bar{\phi}_s \cdot g(R_t)$$

Where $N_s^{active}$ = predicted active wallets in segment $s$ (from health scores), $\bar{f}_s$ = average trade frequency for segment (from clustering), $\bar{\phi}_s$ = average fee per trade (from product coupling -- Ultra at 10bp vs. aggregator at 0bp), and $g(R_t)$ = regime adjustment factor.

Each component is predictable from the models above. Wallet health scores predict the active wallet count. Behavioral clustering predicts per-wallet trade frequency by segment. Product coupling predicts fee tier. Layer in exogenous variables -- SOL price, overall DeFi activity, market regime -- and you get probabilistic forecast bands instead of point estimates. Aggregated over a wallet's predicted active lifespan, this yields Wallet Lifetime Value (inferred from Wallet Vitals) -- the expected total fee contribution of a wallet given its current health, segment, and product mix.
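The decomposition is simple enough to sketch end to end. The segment values and the three regime factors below are invented, and the low/base/high band is a crude scenario spread standing in for a proper probabilistic interval.

```python
# Hypothetical per-segment inputs for the next period.
segments = {
    "whale": {"n_active": 1_000, "trades_per_wallet": 40.0, "fee_per_trade": 5.0},
    "retail": {"n_active": 50_000, "trades_per_wallet": 6.0, "fee_per_trade": 0.10},
}

def forecast_fees(segments, regime_factor):
    """F_{t+1} = sum over segments of N_s * f_s * phi_s, scaled by g(R_t)."""
    base = sum(
        s["n_active"] * s["trades_per_wallet"] * s["fee_per_trade"]
        for s in segments.values()
    )
    return base * regime_factor

# Scenario band from three assumed values of g(R_t).
band = {
    name: forecast_fees(segments, g)
    for name, g in {"low": 0.8, "base": 1.0, "high": 1.2}.items()
}
```

In this toy example the whale segment contributes most of the forecast despite being a fraction of the wallet count, which is why the active-wallet prediction needs to be segment-level rather than a single aggregate.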

This becomes more valuable, not less, as Jupiter's product surface grows. Prediction markets, agentic integrations, and new DeFi primitives each add new fee streams that the forecasting model absorbs as additional terms in the same framework.

Layer 3: What Becomes Possible

The intelligence layer feeds operational capabilities -- from A/B tested campaigns to competitive early warning to agentic readiness.

Campaign Targeting & A/B Testing

Pre-filtering: Use wallet health scores to exclude dormant wallets from reward eligibility. Season 1 distributed to everyone; Season 2 ($2M, live now) could allocate only to active or at-risk segments.
A/B framework: Randomly hold out 5-10% of eligible wallets as a control group. Measure swap frequency, volume, and retention for treatment vs. control. This is the only way to distinguish "the campaign caused this behavior" from "this behavior would have happened anyway."
Segment-specific treatment: Different wallet personas respond to different incentives. A whale might respond to fee rebates; a retail degen might respond to gamified rewards. Clustering enables differentiated campaigns tested against each other.
Continuous optimization: Each campaign round generates experimental data that feeds back into the health model. Season 2 informs Season 3. The system learns from its own interventions.
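One way to implement the holdout in the A/B framework above is deterministic hash-based assignment, sketched below. This is a swapped-in technique and only one of several ways to randomise. The campaign label and wallet strings are placeholders. Hashing the wallet together with the campaign name gives a stable, reproducible split without storing an assignment table, and a different split for each campaign.

```python
import hashlib

def assign_arm(wallet, campaign, holdout_share=0.10):
    """Deterministically assign a wallet to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{campaign}:{wallet}".encode()).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "control" if bucket < holdout_share else "treatment"

arm = assign_arm("wallet_1", "season_2")
```

Because the assignment depends only on the inputs, anyone re-running the analysis later recovers the same control group, which keeps the treatment-vs-control comparison auditable.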

Product Development Signals

The coupling map turns "should we build this?" from intuition into a quantifiable question about addressable segments and cannibalization risk.

Cross-product retention: If wallets using aggregator + Perps have 3x the retention of aggregator-only wallets, that's a signal to reduce friction between those products. If DCA users never discover Perps, that's a surface area problem.
Pre-launch intelligence: Before building a prediction market: which existing Jupiter wallets show behavioral patterns consistent with prediction market interest? What's the addressable segment? What's the likely cannibalization vs. net-new effect?
Feature prioritization: Which product improvements would move the highest-value wallet segments? Data-informed product strategy, not intuition.

Competitive Response

Detect market share shifts as they happen -- at the segment level, not in a quarterly review.

Real-time monitoring: When Jupiter share drops across a wallet segment simultaneously, something changed. A new competitor feature, an improved routing algorithm, a liquidity incentive. The system detects this weekly.
Acquisition windows: When a competitor's user segment shows signs of dissatisfaction (declining frequency, platform-hopping), that's a window to acquire them before they settle elsewhere.
Ultra attribution defense: Quantify exactly how much of OKX/DFlow volume is Jupiter-originated through Ultra routing. The "true" market share narrative is a strategic asset.

Agentic Readiness

The bots running arbitrage today are the ancestors of the autonomous agents that will dominate DeFi tomorrow -- the intelligence layer is ready for both.

Continuity, not disruption. Today's MEV bots, arbitrage scripts, and automated market makers already generate wallet-level behavioral data with non-human patterns. AI agents are the next evolution of the same phenomenon -- more sophisticated, more autonomous, but generating the same on-chain footprint.
Detection built in. The clustering module already separates bot behavior from human behavior (sub-second execution, no time-of-day patterns, deterministic sizing). As agents grow more sophisticated, the same framework adapts -- it models wallet behavior, not human behavior.
Agent-as-user. An AI agent managing a portfolio through Jupiter is just another wallet with a behavioral signature. It needs health scoring (is it still active?), clustering (what type of agent?), and competitive monitoring (is it routing through competitors?). The framework is agent-agnostic by design.

The Feedback Loop

Every intervention generates new data -- this is what separates a one-off analysis from infrastructure.

Campaigns as experiments. A targeted campaign with a holdout group produces a natural experiment. The health model learns which wallet segments respond to which incentives.
Product changes as signals. A new feature shifts the coupling map. A fee change shifts the revenue model. Every operational decision generates data that updates every model.
Compounding returns. Season 1 data informed this analysis. This analysis informs Season 2 design. Season 2 results improve the models for Season 3. The system gets better with each cycle.

How the Modules Connect

No model operates alone -- the value is in the interconnection, where each module's output becomes another module's input.

graph TD
    L0["L0: Data Landscape The data IS the product"]
    L1["L1: What Exists instruction_calls | aggregator_swaps | dex_solana.trades"]
    L15["L1.5: Market Regime volume trend | volatility | SOL momentum | net flow"]
    CLUST["Behavioral Clustering personas, regime response"]
    COMP["Competitor Intelligence overlap, migration, loyalty"]
    VITALS["Wallet Vitals P(active | Xd) per wallet"]
    COUPLE["Product Coupling which combos drive LTV"]
    REV["Revenue Forecasting F = N × f × φ × g(regime) + Wallet Lifetime Value"]
    CAMP["A/B Tested Campaigns"]
    PROD["Product Signals"]
    COMPETE["Competitive Response"]
    AGENT["Agentic Readiness"]
    FEED["Feedback Loop"]
    L0 --> L1
    L1 --> L15
    L15 --> CLUST
    L15 --> COMP
    L15 --> VITALS
    CLUST -->|segments inform| VITALS
    CLUST -->|personas inform| COUPLE
    COMP -->|loyalty decline = risk signal| VITALS
    COMP -->|competitor weakness| REV
    VITALS -->|active wallet count| REV
    COUPLE -->|product mix = fee prediction| REV
    COUPLE -->|breadth = stickiness| VITALS
    VITALS --> CAMP
    COUPLE --> PROD
    COMP --> COMPETE
    CLUST --> AGENT
    REV --> CAMP
    CAMP -->|experimental data| FEED
    PROD -->|shifts coupling map| FEED
    COMPETE -->|new signals| FEED
    AGENT -->|new behavioral class| FEED
    FEED -->|improves every model| L15
    style L0 fill:#1a1a2e,stroke:#e94560,color:#fff
    style L1 fill:#16213e,stroke:#e94560,color:#fff
    style L15 fill:#0f3460,stroke:#e94560,color:#fff
    style VITALS fill:#533483,stroke:#e94560,color:#fff
    style CLUST fill:#533483,stroke:#e94560,color:#fff
    style COMP fill:#533483,stroke:#e94560,color:#fff
    style COUPLE fill:#533483,stroke:#e94560,color:#fff
    style REV fill:#533483,stroke:#e94560,color:#fff
    style CAMP fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style PROD fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style COMPETE fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style AGENT fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style FEED fill:#e94560,stroke:#fff,color:#fff

Health scores feed campaign targeting. Clustering feeds health (different segments, different models). Product coupling feeds both health (breadth = stickiness) and revenue forecasting (product mix = fee prediction). Competitive intelligence feeds health (loyalty decline = risk signal) and opportunity identification (competitor weakness = acquisition window). The regime index conditions everything -- no model interprets wallet behavior without knowing the market context. Every intervention feeds back as new data.

Track Record

Nine production modules built in the last 8 months at Synthesia, each with a direct Jupiter isomorph -- the math is domain-agnostic, only the data changes.

Nine production modules, interconnected, influencing 100M+ EUR in decisions:

Module	Result	Impact	Jupiter Isomorph
Churn Risk	77% accuracy, 4-5 month advance warning	2.3M ARR saved through intervention	Wallet attrition prediction from trading frequency decay, platform loyalty drop
Expansion Scoring	96.6% AUC, quota-integrated	Integrated into 100M+ EUR quota planning	Wallet growth prediction -- who increases volume, adopts next product
Health Score	Mission-critical daily operational infrastructure	Determines compensation of customer-facing teams optimised for growth and productivity	Wallet vitals: P(active next 30d) from on-chain activity signals
User Intelligence	Behavioral personas with soft assignments	Persona-targeted outreach campaigns	Wallet clustering: whale/retail/bot/degen/methodical from trade metadata
Territory Optimization	18% efficiency increase	Headcount reallocation across segments	Campaign budget allocation across wallet segments by expected ROI
Revenue Forecasting	15% accuracy improvement	Board-level planning confidence	Fee revenue forecasting from wallet health x product mix x market regime
Attribution	95%+ accuracy	Resolved $4M+ misattributed pipeline	Ultra routing attribution -- Jupiter-originated volume vs organic competitor
Intent Signals	2.5M freemium users scored	Conversion rate lift on scored leads	Dormant wallet reactivation scoring from historical on-chain patterns
Marketing Funnel	Pipeline velocity optimization	Reduced stage-to-stage drop-off 22%	User journey: first swap --> repeat --> multi-product --> power user

Mathematical Frameworks

Module	Primary Framework	Technique
Wallet Vitals (Health Score)	Bayesian inference	Posterior probability of activity given behavioral evidence, with segment-specific priors and regime conditioning
Behavioral Clustering	Unsupervised learning	Gaussian Mixture Models (GMM) with soft cluster assignments; regime-response features for stress-state profiling
Product Coupling	Conditional probability	Lift analysis -- adoption rates conditioned on existing product usage vs. baseline; association rule mining
Competitor Intelligence	Information theory	Mutual information between wallet behavior and platform choice; loyalty score as volume share ratio
Revenue Forecasting	Decomposition model	Segment-level summation: active wallets x frequency x fee tier x regime adjustment; survival analysis for Wallet Lifetime Value
Market Regime (L1.5)	Time series / regime detection	Hidden Markov Model or rolling feature windows over aggregate volume, volatility, price momentum, and net flow

The mathematical frameworks are domain-agnostic. Bayesian inference works the same whether evidence comes from product telemetry or on-chain transactions. Survival analysis models time-to-churn whether the "customer" is a SaaS account or a wallet. Mutual information ranks features regardless of what those features measure.

What changes between Synthesia and Jupiter is not the mathematics. It's the data. And Jupiter's data -- public, real-time, cross-platform, exhaustive -- is in many ways better to build intelligence on.
