
Under the Hood: How Zesty Actually Works

A technical deep-dive into the statistical machinery powering Zesty's VCI scores. Ordinal regression, Bayesian priors, MRP correction, and the engineering decisions behind dating profile science.


If you like priors, posteriors, and a healthy disrespect for noisy data, this one's for you.

Zesty scores dating profiles with a Vibe Compatibility Index (VCI) on a 1–10 scale, wraps it in calibrated uncertainty, and tells you which levers actually move the needle. Here's the system end-to-end—math included.

The Target We Estimate: VCI (1–10)

We model each incoming vote on our 4-point scale {nah, mid, might, would} as an ordinal outcome driven by a latent attractiveness signal η. The expected VCI is a probability-weighted average of fixed anchors [1, 4, 7, 10].

Ordinal logit (cumulative link):

P(Y ≤ k | η) = σ(c_k - η), where σ is the logistic function and c_1 < c_2 < c_3
P(Y = k | η) = P(Y ≤ k) - P(Y ≤ k-1)

Cutpoints c are learned offline from historical data; η is inferred online for each profile. This gives us a principled way to convert messy human votes into clean numerical scores.
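A minimal sketch of this mapping in Python (the cutpoints here are illustrative stand-ins for the offline-learned values):

```python
import math

ANCHORS = [1.0, 4.0, 7.0, 10.0]   # fixed VCI anchors for {nah, mid, might, would}
CUTPOINTS = [-2.0, 0.0, 2.0]      # illustrative c_1 < c_2 < c_3; learned offline in practice

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def class_probs(eta: float) -> list:
    """P(Y = k | eta) as differences of the cumulative link P(Y <= k) = sigmoid(c_k - eta)."""
    cdf = [sigmoid(c - eta) for c in CUTPOINTS] + [1.0]
    return [hi - lo for lo, hi in zip([0.0] + cdf, cdf)]

def expected_vci(eta: float) -> float:
    """Probability-weighted average of the fixed anchors."""
    return sum(p * a for p, a in zip(class_probs(eta), ANCHORS))
```

Raising η shifts probability mass toward "would" and pulls the expected VCI toward 10; lowering it does the opposite.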

Vibe-Based Priors (No Gender Stereotypes)

Users choose up to 3 personality vibes with weights (50/30/20 when three are selected). Each vibe carries a learned offset on the VCI scale—think "Adventurer +0.2, Athlete +0.3" based on historical performance.

Prior construction:

μ_vci = 5.0 + Σ(w_v × Δ_v)
σ²_vci = tight if vibes present, neutral otherwise

We invert this mapping to find the latent η₀ whose expected VCI equals μ_vci, then transform the prior variance to the latent scale via the local derivative dE[VCI]/dη.

Why this matters: Vibes provide structure and stability with limited data without hard-coding demographic assumptions. Gender never enters the prior—your baseline comes from chosen personality markers, not chromosomes.
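To make the inversion concrete, here's a sketch (again with illustrative cutpoints): bisection recovers η₀ because E[VCI] is monotone in η, and the delta method rescales the prior variance by the local slope.

```python
import math

ANCHORS = [1.0, 4.0, 7.0, 10.0]
CUTPOINTS = [-2.0, 0.0, 2.0]      # illustrative; learned offline in practice

def expected_vci(eta: float) -> float:
    cdf = [1.0 / (1.0 + math.exp(-(c - eta))) for c in CUTPOINTS] + [1.0]
    probs = [b - a for a, b in zip([0.0] + cdf, cdf)]
    return sum(p * a for p, a in zip(probs, ANCHORS))

def vibe_prior_mean(vibe_offsets, weights=(0.5, 0.3, 0.2)) -> float:
    """mu_vci = 5.0 + sum(w_v * delta_v) for up to three chosen vibes."""
    return 5.0 + sum(w * d for w, d in zip(weights, vibe_offsets))

def invert_prior(mu_vci: float, var_vci: float, lo=-10.0, hi=10.0, tol=1e-8):
    """Find eta0 with E[VCI](eta0) = mu_vci by bisection (E[VCI] is increasing in eta),
    then map the prior variance to the latent scale via the slope dE[VCI]/d(eta)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_vci(mid) < mu_vci:
            lo = mid
        else:
            hi = mid
    eta0 = 0.5 * (lo + hi)
    h = 1e-5
    slope = (expected_vci(eta0 + h) - expected_vci(eta0 - h)) / (2 * h)
    var_eta = var_vci / slope**2   # delta method: Var(VCI) ~= slope^2 * Var(eta)
    return eta0, var_eta
```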

Per-Audience Posteriors (Partial Pooling)

Votes get bucketed by audience segments: men 25–34, women 35–44, non-binary 18–24, etc. For each segment, we run a MAP (Maximum A Posteriori) fit for η with the vibe prior:

  • Gradient/Hessian from ordinal logit likelihood
  • Newton updates with backtracking line search
  • Posterior variance from inverse observed information

We map (η, Var(η)) back to VCI mean and 95% CI via the delta method, then compute an overall VCI by weighting segment posteriors—default weights proportional to effective sample size, optionally accepting a target audience mix.
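A compact sketch of the per-segment fit, with numeric central differences standing in for the closed-form ordinal-logit gradient and Hessian (and illustrative cutpoints):

```python
import math

CUTPOINTS = [-2.0, 0.0, 2.0]   # illustrative; learned offline

def class_probs(eta):
    cdf = [1.0 / (1.0 + math.exp(-(c - eta))) for c in CUTPOINTS] + [1.0]
    return [b - a for a, b in zip([0.0] + cdf, cdf)]

def log_post(eta, votes, weights, eta0, tau2):
    """Weighted ordinal-logit log-likelihood plus the Gaussian vibe prior on eta."""
    ll = sum(w * math.log(max(class_probs(eta)[k], 1e-12))
             for k, w in zip(votes, weights))
    return ll - (eta - eta0) ** 2 / (2.0 * tau2)

def map_fit(votes, weights, eta0=0.0, tau2=1.0):
    """Newton updates with backtracking; posterior variance from inverse observed information."""
    f = lambda x: log_post(x, votes, weights, eta0, tau2)
    eta, h = eta0, 1e-4
    for _ in range(50):
        g = (f(eta + h) - f(eta - h)) / (2 * h)          # numeric gradient
        H = (f(eta + h) - 2 * f(eta) + f(eta - h)) / h**2  # numeric Hessian
        if H >= 0.0:
            H = -1e-6        # guard; the true log-posterior is concave in eta
        step, t = -g / H, 1.0
        while f(eta + t * step) < f(eta) and t > 1e-6:
            t *= 0.5         # backtracking line search
        eta += t * step
        if abs(t * step) < 1e-9:
            break
    return eta, -1.0 / H     # (MAP estimate, posterior variance)
```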

Weighting: Time Decay + Voter Reliability → ESS

Each vote carries a composite weight:

Time decay: 90-day half-life. Recent opinions matter more because dating preferences shift.

Voter reliability: Bayesian-shrunk calibration via Brier scores on historical accuracy, clipped to [0.1, 0.9] to prevent "oracle voter" problems.

Effective Sample Size:

ESS = (Σw_i)² / Σw_i²

This captures a fundamental truth: 100 spam taps ≠ 100 thoughtful judgments.
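The weighting scheme above, sketched with the constants from the text (the reliability input is assumed to already be a Brier-based, Bayesian-shrunk score in [0, 1]):

```python
def vote_weight(age_days: float, reliability: float, half_life: float = 90.0) -> float:
    """Composite weight: exponential time decay (90-day half-life) x clipped reliability."""
    decay = 0.5 ** (age_days / half_life)
    r = min(max(reliability, 0.1), 0.9)   # clip to [0.1, 0.9] to prevent 'oracle voters'
    return decay * r

def effective_sample_size(weights) -> float:
    """ESS = (sum w_i)^2 / sum w_i^2 -- equal weights give full ESS, skew shrinks it."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)
```

One high-quality recent vote can outweigh a pile of stale, low-reliability taps, and the ESS reflects exactly that.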

Confidence: Calibrated Uncertainty

We expose confidence as a score in [0, 1] (displayed as a percentage), increasing with ESS and decreasing with posterior variance:

conf = clip(1 - √(σ²_post / σ²_ref(ESS)), 0, 1)

The reference variance σ²_ref is empirically calibrated so bundles of 40–60 quality votes land in a healthy 60–80% confidence band. We tune this using hold-out fits to keep confidence neither stingy nor delusional.
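A sketch of the confidence formula; `v0` and `k` are made-up placeholder constants, since the real reference curve σ²_ref is fit empirically:

```python
import math

def confidence(var_post: float, ess: float, v0: float = 2.0, k: float = 50.0) -> float:
    """conf = clip(1 - sqrt(var_post / var_ref(ESS)), 0, 1).
    var_ref below is an illustrative placeholder that grows with ESS,
    so confidence rises as more effective evidence accumulates."""
    var_ref = v0 * (1.0 + ess / k)
    return min(max(1.0 - math.sqrt(var_post / var_ref), 0.0), 1.0)
```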

Coverage testing: Our 95% CIs actually hit ≈95% on held-out profiles. If they don't, we adjust the calibration curve, not the posterior math.

MRP: Post-Stratification to Your Market

Raw vote samples can skew badly—imagine this week's voters happened to be 80% college-aged when you're targeting working professionals.

We apply Multilevel Regression & Post-stratification (borrowed from political polling) by reweighting segment VCI means to match your target demographic distribution.

Output includes:

  • Raw vs. MRP-adjusted VCI
  • Bias magnitude (absolute difference)
  • Reweighting strength via Total Variation distance

Result: Market-representative scores even when the voter pool is temporarily unbalanced.
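The reweighting step reduces to two weighted averages plus a Total Variation distance; the segment names and numbers below are hypothetical:

```python
def mrp_adjust(segment_vci: dict, sample_mix: dict, target_mix: dict) -> dict:
    """Reweight per-segment VCI means from the observed voter mix to the target mix."""
    raw = sum(sample_mix[s] * segment_vci[s] for s in segment_vci)
    adj = sum(target_mix[s] * segment_vci[s] for s in segment_vci)
    tv = 0.5 * sum(abs(sample_mix[s] - target_mix[s]) for s in segment_vci)
    return {"raw": raw, "adjusted": adj, "bias": abs(adj - raw), "tv_distance": tv}

# Hypothetical example: voters skew college-aged, target is an even split.
report = mrp_adjust(
    segment_vci={"college": 6.0, "professional": 8.0},
    sample_mix={"college": 0.8, "professional": 0.2},
    target_mix={"college": 0.5, "professional": 0.5},
)
```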

Photo vs. Profile Attribution

Votes are nominally "for the whole profile," but human behavior is photo-anchored. We estimate an attribution coefficient λ ∈ [0,1]:

η_vote = λ × θ_photo + (1-λ) × θ_profile  

Signals for λ estimation:

  • AI photo quality priors (lighting, framing, sharpness, face visibility)
  • Tag mix patterns ("lighting/crop" → photo-heavy; "bio/tone" → profile-heavy)
  • Time-on-card telemetry when available

We report "Main Photo: 62% of appeal" so users know where to focus optimization effort.
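A deliberately simplified, hypothetical λ estimator — the coefficients and inputs here are illustrative, not Zesty's actual model — followed by the blend from the formula above:

```python
def estimate_lambda(photo_quality: float, tag_photo_share: float) -> float:
    """Hypothetical: start at an even split and nudge toward the photo when the
    AI quality prior and photo-heavy tag mix point that way. Coefficients are made up."""
    lam = 0.5 + 0.3 * (photo_quality - 0.5) + 0.2 * (tag_photo_share - 0.5)
    return min(max(lam, 0.0), 1.0)

def vote_signal(lam: float, theta_photo: float, theta_profile: float) -> float:
    """eta_vote = lambda * theta_photo + (1 - lambda) * theta_profile."""
    return lam * theta_photo + (1.0 - lam) * theta_profile
```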

Polarization & Risk Assessment

Opinion diversity matters strategically. We compute entropy of the 4-class posterior:

H = -Σp_k log p_k
safe_score = 1 - (H / log 4)

High safe score: Clustered opinions (consistent brand, lower exploration risk).
Low safe score: Split reactions (edgier profile—risky but can yield stronger matches).

This helps users understand their risk/reward profile in the dating market.
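A direct transcription of the two formulas above:

```python
import math

def safe_score(probs) -> float:
    """Normalized-entropy 'playing it safe' score: 1 at consensus, 0 at a uniform split."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - h / math.log(len(probs))
```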

Current Limitations & Roadmap

Honest assessment: Our current implementation approximates voter agreement patterns based on voting distributions (moderate vs. extreme votes), which serves as a reasonable proxy but isn't ideal.

The challenge is anchor agreement—we need to know not just how people vote, but why they vote that way. Right now, we infer this from patterns, but we're building toward something better.

What's coming:

  • Improved baseline model: Once we have sufficient data, we'll train a proper anchor agreement model that captures voter calibration more precisely
  • State-space trending: Random walk components for temporal drift beyond simple decay
  • Richer attribution: Better photo/profile decomposition using click telemetry and gaze patterns
  • Big Five integration: Text-based personality features as weak, shrink-to-zero predictors

Systems Engineering That Keeps It Sane

Debounced updates: We batch VCI recomputes at strategic intervals (1, 5, 10, 15, then every 5 votes) to avoid write contention when votes arrive simultaneously.

Fast MAP inference: No heavy MCMC on the write path. Full Bayesian re-fits run offline for audits and calibration.

Explicit schema: The UI receives means, confidence intervals, ESS, polarization metrics, MRP deltas, and attribution percentages—all on stable, interpretable scales.

Validation & Continuous Calibration

We constantly back-test three critical metrics:

  • Coverage reliability: Confidence intervals should contain the true value at their stated rate
  • Confidence calibration: Empirical error rates should decrease as shown confidence increases
  • Decision utility: Expected vs. observed lifts after users accept recommendations

When calibration drifts (it always does), we adjust the confidence curve—never the underlying posterior mathematics.

What Users Actually See

Despite all this machinery, the interface stays clean:

  • VCI 1–10 with likely range and confidence %
  • Attribution: "Main photo driving ~62% of score"
  • Segment breakdown: Key audiences with local VCI + votes
  • Polarization indicator: "Playing it safe: 74%"
  • MRP adjustment badge when relevant
  • One concrete recommendation with expected lift

The Philosophy

Zesty is fundamentally an ordinal-Bayesian, vibe-prior, reliability-weighted, MRP-corrected, entropy-aware system that happens to work on dating profiles.

We borrowed techniques from political polling, clinical trials, and machine learning because dating profile feedback is a hard statistical problem that deserves serious treatment.

The goal: Extract maximum signal from minimum noise, wrap it in honest uncertainty, and give users one concrete action that actually moves the needle.

Nerds see the scaffolding. Everyone else just gets better matches.


For more implementation details or to discuss the statistical methodology, find us in the app or reach out directly.