On-Chain Intelligence for Jupiter - Vision


TLDR

Jupiter sits on one of the richest behavioral datasets in DeFi. The same intelligence systems I've built in commercial SaaS -- health scoring, churn prediction, behavioral clustering, competitive monitoring -- translate directly to on-chain data. The data layer is actually better: public, real-time, cross-platform, and exhaustive. This document outlines a layered intelligence architecture that converts dormant data into operational advantage.

L0  The Data Landscape
    In DeFi, the data IS the product. The blockchain is the warehouse.
    What's missing isn't data. It's interpretation.
         |
         v
L1  What Exists
    Raw instructions --> Decoded tables --> Cross-platform aggregations
    What's visible, what's dark, what the data layer enables.
         |
         v
L2  The Intelligence Layer
    Wallet Vitals | Product Coupling | Clustering |
    Competitor Intelligence | Revenue Forecasting
    Each module reads from L1, writes signals that feed the others.
         |
         v
L3  What Becomes Possible
    A/B tested campaigns | Product signals | Competitive response |
    Opportunity identification | Agentic readiness | Feedback loop

Layer 0: The Data Landscape

Everyone has the data. Nobody has the intelligence.

In Web2 SaaS, data is a shadow of the product. You build a video editor, then you instrument it -- add event tracking, pipe logs to a warehouse, build marts on top. The product exists first. The data is a lossy, delayed reflection of what happened.

In DeFi, the data IS the product. A swap on Jupiter isn't instrumented after the fact -- the transaction on the ledger is the swap. The execution and the measurement are the same object. Every trade, every perp position, every fee paid exists as a first-class record on a public ledger, replicated across every validator node, indexed by services like Dune, and accessible to anyone without credentials or NDAs.

This changes the problem completely. Jupiter doesn't need to build a data warehouse. The blockchain is the warehouse. What's missing isn't data collection, transformation, or access. What's missing is interpretation.

Case in point: all of Jupiter's transaction data is public, yet most external analysis draws the wrong conclusions from it. Naive market share calculations overstate competitors by counting Jupiter-routed Ultra volume as OKX organic volume. Campaign effectiveness gets measured by aggregate volume rather than per-wallet behavioral change. Wallet counts get inflated by bots that anyone could filter out with basic metadata analysis. The data is there. The intelligence isn't.

The competitive advantage in DeFi analytics is not who has the data. Everyone has the data. The advantage is who builds the intelligence layer on top -- who converts public transaction logs into private strategic insight. That's the gap. And because competitor data is equally public, the same intelligence layer that understands Jupiter's users can simultaneously understand the competitive landscape, market structure, and emerging opportunities across the entire Solana DeFi ecosystem.


Layer 1: What Exists

Three tiers of public on-chain data give full visibility into Jupiter and every competitor -- except Perps, which is the biggest blind spot.

The Public Data Layer

Three tiers of on-chain data are available today, each at a different level of abstraction:

Raw instructions (solana.instruction_calls) -- Every function call ever executed on Solana. The universal audit log. You can reconstruct anything from here, but it requires parsing binary payloads and understanding program-specific discriminators. This is where I extracted the Rewards Hub claimant list -- no decoded table existed, so I parsed the raw contract instructions directly.

Decoded protocol tables (jupiter_solana.aggregator_swaps) -- Community-maintained decodings of popular contracts into structured tables with human-readable columns. Jupiter's aggregator swaps are well-decoded: wallet, token pair, USD values, AMM routing. But coverage is incomplete -- Perps has no decoded table on Dune.

Cross-platform aggregations (dex_solana.trades) -- A unified view of ALL DEX trades on Solana across every platform: Jupiter, OKX, DFlow, Raydium, Meteora, everything. Per trade: wallet, platform, token pair, USD amount, fees. This is the strategic table -- it lets you see not just what your users do on Jupiter, but what they do everywhere. And critically, what everyone else's users do too.
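As a rough illustration of what the cross-platform table enables, the sketch below sums USD volume per wallet and platform. The rows are made up, and the field names (`wallet`, `project`, `amount_usd`) are assumptions chosen for illustration rather than the confirmed schema of dex_solana.trades.

```python
from collections import defaultdict

# Hypothetical rows shaped like a cross-platform trades export.
# Field names and values are assumptions for illustration only.
trades = [
    {"wallet": "A", "project": "jupiter", "amount_usd": 500.0},
    {"wallet": "A", "project": "okx", "amount_usd": 250.0},
    {"wallet": "B", "project": "raydium", "amount_usd": 100.0},
    {"wallet": "B", "project": "jupiter", "amount_usd": 300.0},
    {"wallet": "A", "project": "jupiter", "amount_usd": 250.0},
]


def volume_by_wallet_and_project(rows):
    """Sum USD volume for every (wallet, project) pair."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row["wallet"]][row["project"]] += row["amount_usd"]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {wallet: dict(per_project) for wallet, per_project in totals.items()}


volumes = volume_by_wallet_and_project(trades)
```

The per-wallet, per-platform totals are the raw material for the loyalty and overlap metrics discussed later in this document.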

What's Visible and What's Not

| Domain | Visibility | Source |
|---|---|---|
| Aggregator swaps (all versions) | Full | jupiter_solana.aggregator_swaps |
| Cross-platform trading (all Solana DEXes) | Full | dex_solana.trades |
| Fee revenue per trade | Full | dex_solana.trades.fee_usd |
| Competitor user behavior | Full | Same tables, different project filter |
| Perps positions & liquidations | Raw only | solana.instruction_calls (no decoded table) |
| Ultra routing attribution | Partial | CPI pattern: Jupiter outer instruction wrapping OKX/DFlow inner |
| DCA and limit orders | Partial | Some decoded tables exist |
| Off-chain eligibility (campaigns) | None | Determined by Jupiter's API, not on-chain |

The most significant gap is Perps. It generates ~60% of Jupiter's revenue, and the Rewards Hub analysis showed that 60% of claimants were invisible in aggregator tables -- almost certainly Perps users. Building a decoded view of the Perps contract (or accessing internal data) would unlock the majority of fee-paying user behavior.

The second gap is Ultra attribution. Ultra routes through OKX/DFlow for best execution, meaning ~40-50% of OKX's on-chain volume actually originates from Jupiter. Naive market share analysis overstates competitors. Resolving this requires either CPI pattern detection on-chain or internal routing logs.
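To make the attribution effect concrete, here is a minimal sketch of the arithmetic. The volumes are invented, and the 0.45 routed fraction is simply the midpoint of the ~40-50% range above, not a measured value. The function name and signature are hypothetical.

```python
def adjusted_shares(volumes, routed_fraction, source="jupiter", venue="okx"):
    """Reassign the Jupiter-originated part of a venue's volume to Jupiter,
    then return each platform's share of the total."""
    adjusted = dict(volumes)
    moved = adjusted[venue] * routed_fraction
    adjusted[venue] -= moved
    adjusted[source] += moved
    total = sum(adjusted.values())
    return {name: vol / total for name, vol in adjusted.items()}


# Hypothetical weekly volumes in USD millions; not real figures.
naive_volumes = {"jupiter": 500.0, "okx": 300.0, "dflow": 200.0}
naive_total = sum(naive_volumes.values())
naive_shares = {name: vol / naive_total for name, vol in naive_volumes.items()}

# Assumed routed fraction: midpoint of the ~40-50% range quoted in the text.
shares = adjusted_shares(naive_volumes, routed_fraction=0.45)
```

With these made-up inputs, the naive view gives OKX a 30% share, while the adjusted view moves most of the gap back to Jupiter. The total is unchanged; only the attribution shifts.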

What This Enables

Because the data is the product, you get properties that Web2 analytics teams spend years trying to approximate:


Layer 1.5: Market Regime Detection

Before scoring individual wallets, understand the environment they're operating in -- bear/bull, fear/greed, volatility regime.

Individual wallet behavior doesn't exist in a vacuum. A wallet that reduces trading frequency 30% during a market-wide drawdown is behaving rationally. A wallet that reduces 30% while the market is surging is showing genuine disengagement. The intelligence layer needs to distinguish the two.

Market regime features, computed continuously from on-chain data:

R_t = \{V_t, \sigma_t, S_t, F_t\}

Where V_t = aggregate Solana DEX volume (directional trend), \sigma_t = volatility of daily volume (stability), S_t = SOL price momentum, F_t = net flow direction (are wallets depositing or withdrawing from DeFi).

This creates a regime index -- a compact representation of "what the market is doing right now" -- that every downstream model conditions on. Wallet health scores are regime-adjusted: a wallet holding steady during a bear market is healthier than a wallet showing the same metrics during a bull market. Clustering accounts for regime-dependent behavior: some wallets are bear-market specialists (short-biased perps users), others only appear during bull runs (memecoin degens), and others maintain a steady cadence regardless (bots, institutional DCA). The regime index makes these patterns legible.

For strategically important wallets (whales, market makers, high-fee generators), the regime index enables a deeper read: how does this wallet respond to fear vs. greed? Do they increase activity during volatility (opportunity-seeking) or retreat (risk-averse)? This behavioral fingerprint under stress is more revealing than any steady-state metric.
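One possible way to compute the four regime features R_t from daily series is sketched below. The rolling-window definitions (window length, ratio of averages for the trend, coefficient of variation for volatility) are assumptions chosen for simplicity. The document does not fix them, and the input data is invented.

```python
from statistics import mean, pstdev


def regime_features(volume, sol_price, net_flow, window=7):
    """Return (V_t, sigma_t, S_t, F_t) from daily series, oldest value first.

    `volume` needs at least 2 * window entries; the others at least `window`.
    """
    recent = volume[-window:]
    prior = volume[-2 * window:-window]
    v_trend = mean(recent) / mean(prior) - 1.0            # V_t: change in average volume
    sigma = pstdev(recent) / mean(recent)                 # sigma_t: relative volatility of volume
    momentum = sol_price[-1] / sol_price[-window] - 1.0   # S_t: SOL return across the window
    flow = sum(net_flow[-window:])                        # F_t: net deposits minus withdrawals
    return v_trend, sigma, momentum, flow


# Invented example: volume up 20% week over week, SOL down 10%, net outflows.
volume = [100.0] * 7 + [120.0] * 7
sol_price = [100.0] * 13 + [90.0]
net_flow = [0.0] * 7 + [-5.0] * 7

v_t, sigma_t, s_t, f_t = regime_features(volume, sol_price, net_flow)
```

A Hidden Markov Model over these features, as mentioned later in the frameworks table, would replace the raw tuple with a discrete regime label. This sketch stops at the feature step.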


Layer 2: The Intelligence Layer

Five interconnected models that convert raw on-chain activity into wallet-level scores, segments, competitive positioning, and revenue forecasts.

This is what turns public data into private strategic advantage. Each module reads from the data landscape, conditions on the market regime, and writes signals that feed the others.

Wallet Vitals (Wallet Health Score)

The probability that a given wallet stays active in the next X days, updated continuously -- the single number everything downstream depends on.

H_w = P(\text{active}_{t+30} \mid \mathbf{x}_w, R_t)

Where H_w = health score for wallet w, P(\text{active}_{t+30}) = probability that the wallet executes at least one trade in the next 30 days, \mathbf{x}_w = the wallet's feature vector (frequency, recency, product breadth, platform loyalty, fee generation), and R_t = current market regime index from L1.5. Regime-conditioning is critical: the same activity metrics indicate a healthier wallet in a bear market than in a bull market.

Inputs: trading frequency trend (acceleration or deceleration), fee generation consistency, product breadth (how many Jupiter products used), platform loyalty (Jupiter's share of the wallet's total Solana DEX activity), recency of last trade. Crucially, this also includes cross-platform execution behavior -- how the wallet routes trades through competitors. A wallet that starts splitting volume between Jupiter and OKX isn't just a data point; it's an early signal of loyalty erosion that precedes full churn.

Different wallet segments need different treatment. A whale wallet with 100K+ monthly volume operates on different dynamics than a retail wallet doing 100 swaps. The model needs segment-specific learning rates -- the same principle I applied at Synthesia, where enterprise accounts churn on institutional timelines while SMB accounts churn on individual decision timelines.

Output: a score per wallet, bucketed as healthy, at-risk, dormant, or churned. Everything downstream depends on this.
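The shape of H_w can be sketched with a simple logistic score. This is a stand-in for illustration, not the Bayesian model named elsewhere in this document: the feature names, weights, bias, and the sign of the regime term are all hypothetical, and in practice the weights would be fitted on labelled history per segment.

```python
import math

# Hypothetical, hand-set weights for illustration only.
WEIGHTS = {
    "freq_trend": 1.5,          # acceleration (+) or deceleration (-) of trading
    "fee_consistency": 0.8,     # 0..1, steadiness of fee generation
    "product_breadth": 0.5,     # number of Jupiter products used
    "jupiter_share": 1.2,       # Jupiter's share of the wallet's DEX activity
    "days_since_trade": -0.05,  # recency penalty
}
BIAS = -1.0
# Assumed convention: regime_index > 0 is bullish, < 0 is bearish. A negative
# weight means identical features score higher in a bear market.
REGIME_WEIGHT = -0.7


def health_score(features, regime_index):
    """Logistic estimate of P(active in next 30 days | features, regime)."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    z += REGIME_WEIGHT * regime_index
    return 1.0 / (1.0 + math.exp(-z))


wallet = {
    "freq_trend": 0.0,
    "fee_consistency": 0.5,
    "product_breadth": 2,
    "jupiter_share": 0.8,
    "days_since_trade": 10,
}
score_bear = health_score(wallet, regime_index=-1.0)
score_bull = health_score(wallet, regime_index=1.0)
```

With the same features, the score is higher under the bearish regime than the bullish one, which is the regime adjustment described above expressed as a single interaction term.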

Product Coupling Map

Which product combinations drive retention, and where do users fall off the multi-product journey?

C_{ij} = P(\text{adopt}_j \mid \text{uses}_i) - P(\text{adopt}_j)

Where C_{ij} = coupling strength between product i and product j, P(\text{adopt}_j \mid \text{uses}_i) = probability a wallet adopts product j given it already uses product i, and P(\text{adopt}_j) = baseline adoption rate of product j across all wallets. Positive values = reinforcing products (e.g., aggregator swap users are more likely to try Perps). Negative values = substitutes. Values near zero = unrelated pairings.
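A minimal sketch of estimating C_{ij} from a snapshot of which products each wallet uses. The wallet data is invented, and this version uses co-occurrence only: it ignores the order in which products were adopted, which a fuller analysis would need to respect.

```python
def coupling(wallet_products, i, j):
    """Estimate C_ij = P(adopt_j | uses_i) - P(adopt_j) from usage sets.

    Returns None when there are no wallets or no users of product i.
    """
    all_sets = list(wallet_products.values())
    users_i = [products for products in all_sets if i in products]
    if not all_sets or not users_i:
        return None
    p_j = sum(j in products for products in all_sets) / len(all_sets)
    p_j_given_i = sum(j in products for products in users_i) / len(users_i)
    return p_j_given_i - p_j


# Invented wallets and product sets, for illustration only.
wallet_products = {
    "w1": {"swap", "perps"},
    "w2": {"swap", "perps"},
    "w3": {"swap"},
    "w4": {"dca"},
    "w5": {"perps"},
    "w6": {"dca"},
    "w7": {"swap", "dca"},
    "w8": {"swap", "perps"},
}

c_swap_perps = coupling(wallet_products, "swap", "perps")
```

Here 3 of 5 swap users also use perps (0.6) against a baseline of 4 of 8 wallets (0.5), so the coupling is about +0.1.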

The inter-product interaction graph. For every wallet: which Jupiter products they use, in what sequence, and how usage of one product correlates with adoption and retention of others.

This is the analytical backbone of the "products reinforcing each other" vision from the CatLumpurr talk. It answers: does a Perps user who also swaps churn less than a Perps-only user? Is DCA adoption a leading indicator of long-term retention? What's the natural product journey -- and where do users fall off?

This framework also extends naturally to future products. When Jupiter launches prediction markets or agentic trading integrations, the coupling map immediately shows how new products interact with the existing ecosystem -- whether they cannibalize, complement, or create entirely new user journeys.

Behavioral Clustering

Meaningful wallet personas -- whale, retail, bot, degen, methodical -- from on-chain metadata alone, no KYC required.

z_w = \text{GMM}(\mathbf{x}_w) \rightarrow \text{soft cluster assignments}

Where z_w = cluster membership vector for wallet w (e.g., [0.7 methodical, 0.2 degen, 0.1 bot]), \text{GMM} = Gaussian Mixture Model that learns the natural groupings from data, and \mathbf{x}_w = the wallet's behavioral feature vector (frequency, timing, token preferences, size distribution, regime response). Soft assignments mean a wallet can belong to multiple personas simultaneously -- more realistic than hard labels.
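To show what a soft assignment looks like, the sketch below computes the posterior membership step of a Gaussian mixture for a single one-dimensional feature. The component parameters are hypothetical stand-ins for fitted values. The fitting step itself (EM over the full feature vector) is omitted.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def soft_assign(x, components):
    """Posterior membership per persona for a 1-D feature value x.

    components maps persona name -> (mixture weight, mean, std).
    """
    joint = {
        name: weight * gaussian_pdf(x, mean, std)
        for name, (weight, mean, std) in components.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Hypothetical fitted components over log10(trades per week).
components = {
    "methodical": (0.5, 0.5, 0.4),
    "degen": (0.3, 1.5, 0.4),
    "bot": (0.2, 3.0, 0.5),
}

z_w = soft_assign(1.0, components)
```

The result is a membership vector that sums to 1, with this wallet split mainly between "methodical" and "degen" rather than forced into a single label.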

Group wallets into meaningful personas without identity. Not KYC -- behavioral fingerprints.

Clustering signals: trading frequency distribution, timing patterns (time-of-day, day-of-week), token preferences, transaction size distributions, product usage mix, bot-vs-human detection. And critically, regime response -- how each wallet behaves during market stress. Some wallets increase activity during drawdowns (opportunity-seeking, short-biased). Others go dormant (retail fear). Others maintain steady cadence regardless (bots, institutional DCA). The regime response pattern is often more diagnostic than any steady-state feature.

The cascade of liquidation events over the past month is a good example. You don't need to know who these wallets are to identify the pattern -- the timing, sizing, and sequential structure of the cascade is identifiable through metadata alone. The same principle applies to all wallet behavior: humans trade differently from bots, whales differently from retail, methodical DCA users differently from memecoin degens.

This gives Jupiter a language for talking about user segments that goes beyond "wallet address" and "volume tier." And it creates the foundation for understanding how new user types emerge -- the agentic economy will generate a new class of wallet behavior (AI agents executing trades with non-human patterns, latency profiles, and decision logic) that needs to be identified, understood, and served differently.

Competitor Intelligence

In DeFi, seeing when your users start using a competitor is a query -- and so is finding their vulnerable segments.

L_w^t = \frac{V_w^{\text{Jupiter}}}{V_w^{\text{total}}}

Where L_w^t = loyalty score for wallet w at time t, V_w^{\text{Jupiter}} = wallet's trading volume on Jupiter in that period, and V_w^{\text{total}} = wallet's total trading volume across all Solana DEXes (Jupiter + OKX + DFlow + Raydium + ...). A score of 1.0 = Jupiter-exclusive. Track the derivative \Delta L_w over time -- a declining loyalty score is a churn precursor, visible weeks before the wallet goes inactive.
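The loyalty score and its change over time are simple to compute once per-platform volumes exist. The sketch below uses invented weekly volumes for one wallet. The function name and platform keys are illustrative assumptions.

```python
def loyalty(volume_by_platform, home="jupiter"):
    """L_w^t: share of a wallet's period volume executed on the home platform.

    Returns None if the wallet did not trade in the period.
    """
    total = sum(volume_by_platform.values())
    if total == 0:
        return None
    return volume_by_platform.get(home, 0.0) / total


# Invented weekly USD volumes for a single wallet.
weekly_volumes = [
    {"jupiter": 800.0, "okx": 200.0},
    {"jupiter": 600.0, "okx": 400.0},
    {"jupiter": 400.0, "okx": 500.0, "dflow": 100.0},
]

scores = [loyalty(week) for week in weekly_volumes]
# Week-over-week change in loyalty; persistently negative values flag erosion.
deltas = [later - earlier for earlier, later in zip(scores, scores[1:])]
```

In this example the wallet's Jupiter share falls from 0.8 to 0.4 over three weeks while it remains active, which is the pattern described above as a churn precursor.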

Because competitor data is public, the same analytical framework that scores Jupiter wallets can profile the entire market.

Market structure: Who's growing, who's shrinking, where volume is migrating, and why. Not just top-line numbers -- segment-level: which types of wallets are DFlow gaining? Are they taking retail or whales? Bot flow or human flow?

Competitive churn detection: Per-wallet, per-week: what share of their Solana DEX activity is on Jupiter vs. competitors? Track over time. When a wallet's Jupiter share drops from 80% to 40%, you see exactly where they're going and can infer why.

Wallet overlap as leading indicator: The P01 market share analysis already measures this -- weekly intersection of wallet populations across aggregators. When |Jup ∩ OKX| grows faster than |Jup| alone, wallets are shopping around. When it shrinks, one side is winning exclusivity. The overlap trend is a competitive health metric that updates weekly.

Opportunity identification: Where are competitors' users underserved? If a segment of Raydium-heavy wallets shows behavioral patterns similar to Jupiter power users but hasn't discovered Jupiter's products, that's an acquisition opportunity. If OKX wallets are generating high fees on trades that Jupiter Ultra could route more cheaply, that's a conversion argument backed by data.

This is structurally impossible in Web2. No SaaS company can see when a customer starts using a competitor's product. In DeFi, it's a query.
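The wallet overlap metric described above reduces to set operations on weekly active-wallet populations. A minimal sketch with invented wallet IDs follows. Comparing growth rates between two weeks is one simple reading of "grows faster", not necessarily how the P01 analysis computes it.

```python
def overlap_stats(jup_wallets, okx_wallets):
    """Return (|Jup ∩ OKX|, |Jup|) for one week's active wallet sets."""
    return len(jup_wallets & okx_wallets), len(jup_wallets)


# Invented weekly active-wallet sets.
week1 = overlap_stats({"a", "b", "c", "d"}, {"c", "x"})
week2 = overlap_stats({"a", "b", "c", "d", "e"}, {"c", "d", "e", "x"})

overlap_growth = week2[0] / week1[0] - 1.0   # growth of the shared population
base_growth = week2[1] / week1[1] - 1.0      # growth of Jupiter's own population
shopping_around = overlap_growth > base_growth
```

Here the shared population triples while Jupiter's own wallet count grows 25%, so the flag is raised: wallets are increasingly active on both platforms.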

Revenue Forecasting

Fee revenue as a function of wallet health, trade frequency by segment, and product mix -- probabilistic bands, not point estimates.

\hat{F}_{t+1} = \sum_{s \in \text{segments}} N_s^{\text{active}} \cdot \bar{f}_s \cdot \bar{\phi}_s \cdot g(R_t)

Where N_s^{active} = predicted active wallets in segment s (from health scores), \bar{f}_s = average trade frequency for segment (from clustering), \bar{\phi}_s = average fee per trade (from product coupling -- Ultra at 10bp vs. aggregator at 0bp), and g(R_t) = regime adjustment factor.
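The decomposition can be evaluated directly once each component has an estimate. The sketch below produces a point estimate only, with invented segment values and an assumed regime factor. The probabilistic bands would come from propagating uncertainty in each input, which is not shown here.

```python
def forecast_fees(segments, regime_factor):
    """Sum of N_s * f_s * phi_s over segments, scaled by g(R_t).

    g(R_t) is the same for every segment in the formula, so it is
    factored out of the sum here.
    """
    base = sum(s["n_active"] * s["freq"] * s["fee_usd"] for s in segments.values())
    return regime_factor * base


# Invented inputs: predicted active wallets, trades per wallet per period,
# and average fee per trade in USD.
segments = {
    "whale": {"n_active": 100, "freq": 50.0, "fee_usd": 20.0},
    "retail": {"n_active": 10_000, "freq": 5.0, "fee_usd": 0.5},
}

# Assumed regime adjustment of 0.8, e.g. a subdued market.
fee_forecast = forecast_fees(segments, regime_factor=0.8)
```

With these numbers the whale segment contributes 100,000 and retail 25,000 before adjustment, giving 100,000 after the 0.8 regime factor. The example shows how a small segment can dominate the forecast.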

Each component is predictable from the models above. Wallet health scores predict the active wallet count. Behavioral clustering predicts per-wallet trade frequency by segment. Product coupling predicts fee tier. Layer in exogenous variables -- SOL price, overall DeFi activity, market regime -- and you get probabilistic forecast bands instead of point estimates. Aggregated over a wallet's predicted active lifespan, this yields Wallet Lifetime Value (inferred from Wallet Vitals) -- the expected total fee contribution of a wallet given its current health, segment, and product mix.

This becomes more valuable, not less, as Jupiter's product surface grows. Prediction markets, agentic integrations, and new DeFi primitives each add new fee streams that the forecasting model absorbs as additional terms in the same framework.


Layer 3: What Becomes Possible

The intelligence layer feeds operational capabilities -- from A/B tested campaigns to competitive early warning to agentic readiness.

Campaign Targeting & A/B Testing

Product Development Signals

The coupling map turns "should we build this?" from intuition into a quantifiable question about addressable segments and cannibalization risk.

Competitive Response

Detect market share shifts as they happen -- at the segment level, not in a quarterly review.

Agentic Readiness

The bots running arbitrage today are the ancestors of the autonomous agents that will dominate DeFi tomorrow -- the intelligence layer is ready for both.

The Feedback Loop

Every intervention generates new data -- this is what separates a one-off analysis from infrastructure.


How the Modules Connect

No model operates alone -- the value is in the interconnection, where each module's output becomes another module's input.

graph TD
    L0["L0: Data Landscape<br/><i>The data IS the product</i>"]
    L1["L1: What Exists<br/>instruction_calls | aggregator_swaps | dex_solana.trades"]
    L15["L1.5: Market Regime<br/>volume trend | volatility | SOL momentum | net flow"]
    CLUST["Behavioral Clustering<br/>personas, regime response"]
    COMP["Competitor Intelligence<br/>overlap, migration, loyalty"]
    VITALS["Wallet Vitals<br/>P(active | Xd) per wallet"]
    COUPLE["Product Coupling<br/>which combos drive LTV"]
    REV["Revenue Forecasting<br/>F = N × f × φ × g(regime)<br/>+ Wallet Lifetime Value"]
    CAMP["A/B Tested Campaigns"]
    PROD["Product Signals"]
    COMPETE["Competitive Response"]
    AGENT["Agentic Readiness"]
    FEED["Feedback Loop"]
    L0 --> L1
    L1 --> L15
    L15 --> CLUST
    L15 --> COMP
    L15 --> VITALS
    CLUST -->|segments inform| VITALS
    CLUST -->|personas inform| COUPLE
    COMP -->|loyalty decline = risk signal| VITALS
    COMP -->|competitor weakness| REV
    VITALS -->|active wallet count| REV
    COUPLE -->|product mix = fee prediction| REV
    COUPLE -->|breadth = stickiness| VITALS
    VITALS --> CAMP
    COUPLE --> PROD
    COMP --> COMPETE
    CLUST --> AGENT
    REV --> CAMP
    CAMP -->|experimental data| FEED
    PROD -->|shifts coupling map| FEED
    COMPETE -->|new signals| FEED
    AGENT -->|new behavioral class| FEED
    FEED -->|improves every model| L15
    style L0 fill:#1a1a2e,stroke:#e94560,color:#fff
    style L1 fill:#16213e,stroke:#e94560,color:#fff
    style L15 fill:#0f3460,stroke:#e94560,color:#fff
    style VITALS fill:#533483,stroke:#e94560,color:#fff
    style CLUST fill:#533483,stroke:#e94560,color:#fff
    style COMP fill:#533483,stroke:#e94560,color:#fff
    style COUPLE fill:#533483,stroke:#e94560,color:#fff
    style REV fill:#533483,stroke:#e94560,color:#fff
    style CAMP fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style PROD fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style COMPETE fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style AGENT fill:#2b2d42,stroke:#4ecdc4,color:#fff
    style FEED fill:#e94560,stroke:#fff,color:#fff

Health scores feed campaign targeting. Clustering feeds health (different segments, different models). Product coupling feeds both health (breadth = stickiness) and revenue forecasting (product mix = fee prediction). Competitive intelligence feeds health (loyalty decline = risk signal) and opportunity identification (competitor weakness = acquisition window). The regime index conditions everything -- no model interprets wallet behavior without knowing the market context. Every intervention feeds back as new data.


Track Record

Nine production modules built in the last 8 months at Synthesia, each with a direct Jupiter isomorph -- the math is domain-agnostic, only the data changes.

Nine production modules, interconnected, influencing 100M+ EUR in decisions:

| Module | Result | Impact | Jupiter Isomorph |
|---|---|---|---|
| Churn Risk | 77% accuracy, 4-5 month advance warning | 2.3M ARR saved through intervention | Wallet attrition prediction from trading frequency decay, platform loyalty drop |
| Expansion Scoring | 96.6% AUC, quota-integrated | Integrated into 100M+ EUR quota planning | Wallet growth prediction -- who increases volume, adopts next product |
| Health Score | Mission-critical daily operational infrastructure | Determines compensation of customer-facing teams optimised for growth and productivity | Wallet vitals: P(active next 30d) from on-chain activity signals |
| User Intelligence | Behavioral personas with soft assignments | Persona-targeted outreach campaigns | Wallet clustering: whale/retail/bot/degen/methodical from trade metadata |
| Territory Optimization | 18% efficiency increase | Headcount reallocation across segments | Campaign budget allocation across wallet segments by expected ROI |
| Revenue Forecasting | 15% accuracy improvement | Board-level planning confidence | Fee revenue forecasting from wallet health x product mix x market regime |
| Attribution | 95%+ accuracy | Resolved $4M+ misattributed pipeline | Ultra routing attribution -- Jupiter-originated volume vs organic competitor |
| Intent Signals | 2.5M freemium users scored | Conversion rate lift on scored leads | Dormant wallet reactivation scoring from historical on-chain patterns |
| Marketing Funnel | Pipeline velocity optimization | Reduced stage-to-stage drop-off 22% | User journey: first swap --> repeat --> multi-product --> power user |

Mathematical Frameworks

| Module | Primary Framework | Technique |
|---|---|---|
| Wallet Vitals (Health Score) | Bayesian inference | Posterior probability of activity given behavioral evidence, with segment-specific priors and regime conditioning |
| Behavioral Clustering | Unsupervised learning | Gaussian Mixture Models (GMM) with soft cluster assignments; regime-response features for stress-state profiling |
| Product Coupling | Conditional probability | Lift analysis -- adoption rates conditioned on existing product usage vs. baseline; association rule mining |
| Competitor Intelligence | Information theory | Mutual information between wallet behavior and platform choice; loyalty score as volume share ratio |
| Revenue Forecasting | Decomposition model | Segment-level summation: active wallets x frequency x fee tier x regime adjustment; survival analysis for Wallet Lifetime Value |
| Market Regime (L1.5) | Time series / regime detection | Hidden Markov Model or rolling feature windows over aggregate volume, volatility, price momentum, and net flow |

The mathematical frameworks are domain-agnostic. Bayesian inference works the same whether evidence comes from product telemetry or on-chain transactions. Survival analysis models time-to-churn whether the "customer" is a SaaS account or a wallet. Mutual information ranks features regardless of what those features measure.

What changes between Synthesia and Jupiter is not the mathematics. It's the data. And Jupiter's data -- public, real-time, cross-platform, exhaustive -- is in many ways better to build intelligence on.