One of the hardest parts of training a sports model on UFC data is drift. In 2016 the Vegas odds were only about 61% accurate at picking winners; by 2024 they had climbed to roughly 70%. Right now we're seeing a small decline in Vegas accuracy again. This is all part of the normal ebb and flow of variance, compounded by the general unpredictability of sports, especially the UFC.
LEAST(GREATEST(
    (observed_strikes
     - (oh.n_eff_strikes / (oh.n_eff_strikes + K_mean_strikes) * oh.opp_strikes_mean
        + (K_mean_strikes / (oh.n_eff_strikes + K_mean_strikes)) * wc.strikes_mean))
    / GREATEST(
        oh.n_eff_strikes / (oh.n_eff_strikes + K_mad_strikes) * oh.opp_strikes_mad
        + (K_mad_strikes / (oh.n_eff_strikes + K_mad_strikes)) * wc.strikes_mad,
        mad_floor),
    -7), 7)
ELSE (observed_strikes - wc.strikes_mean) / GREATEST(wc.strikes_mad, mad_floor)
END AS strikes_adjperf,
-- Similar pattern for grappling with grappling-specific parameters
CASE WHEN oh.n_eff_grappling >= 1
     THEN -- Use grappling-specific K values and opponent history
     ELSE -- Fall back to weight-class only
END AS grappling_adjperf
FROM fight_stats fs
LEFT JOIN opponent_history_per_column oh USING (fight_id, fighter_id)
JOIN weight_class_priors wc ON fs.weightclass = wc.weightclass
)
-- Typical K values I use:
-- K_mean_strikes = 8, K_mad_strikes = 12
-- K_mean_grappling = 5, K_mad_grappling = 8
-- Higher K_mad provides more stability for variance estimates
Critical implementation insights:
- Per-column effective n: Compute per-column n_eff and set weights to zero when that column has no opponent history, instead of COALESCE(..., 0) pretending it does.
- Global priors as safety net: Always have a global fallback row in weight-class prior CTEs so sparse classes never collapse toward zero.
- Per-stat K tuning: Tune K_mean vs K_mad per stat family. Generally K_mad > K_mean for stability with small samples.
- Time decay implementation: Use EXP(-λ * years_ago) with λ ≈ 0.13 for moderate decay and a roughly 5-year half-life.
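The ingredients above (decay weights, Kish effective n, shrinkage toward a prior) can be sketched in plain Python. This is an illustrative translation of the SQL logic, not the production pipeline; the K and λ values are the ones quoted in the text, and the fight ages and stat values are made up:

```python
import math

def decay_weight(years_ago, lam=0.13):
    """Exponential time-decay weight; lam = 0.13 gives a ~5.3-year half-life."""
    return math.exp(-lam * years_ago)

def kish_n_eff(weights):
    """Kish effective sample size from a list of decay weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights) if weights else 0.0

def shrink(observed, prior, n_eff, K):
    """Reliability-weighted shrinkage toward a weight-class prior."""
    return n_eff / (n_eff + K) * observed + K / (n_eff + K) * prior

# Three opponent fights at 0.5, 2, and 6 years ago
w = [decay_weight(y) for y in (0.5, 2.0, 6.0)]
n = kish_n_eff(w)

# With thin history the estimate stays closer to the prior (K_mean_strikes = 8)
est = shrink(observed=95.0, prior=60.0, n_eff=n, K=8)
```

With no history at all (n_eff = 0) the estimate collapses to the prior exactly, which is the behavior the per-column n_eff rule is protecting.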
2) Poisson-Gamma Smoothing (Counts)
What was wrong before
- Under-specified exposure: I mixed count smoothing and rate logic inconsistently. The posterior mean was formed from (mean, variance) but I didn't consistently convert through exposure time when mapping back to counts. Short fights could look "low activity" for the wrong reason.
- Ad-hoc variance branch: When var ≤ mean I used a hand-rolled blend (e.g., (prior*3 + observed)/4). It stabilized things, but it was a heuristic, not a model.
- One-size priors by year slice: Priors came from a fixed date window with mean/variance; no explicit rate per minute, and no principled pseudo-exposure strength.
- Binary/duration leakage risk: The old pass didn't clearly wall off duration (control time) or binary signals from count smoothing; at best they were excluded by name heuristics.
What I do now
- Rate-based Bayesian updating: I compute weight-class rates per minute (μ_w) and update with exposure time t (minutes). The posterior is λ_post = (μ_w * τ + X) / (τ + t), and I map back to counts via X_smooth = t * λ_post.
- Validated pseudo-minutes: τ (the prior strength) is per stat and sometimes per weight class (only when cross-validation showed ≥0.5% lift). Otherwise I fall back to a global τ for consistency.
- Explicit exposure rules: Round-1 columns use min(time_sec_rd1, 300)/60; everything else uses time_sec/60. No more implicit exposure guessing.
- Strict scope: Count data only (e.g., *_land, *_att, kd, rev). Binary and duration stats are handled elsewhere.
Implementation Details for Your Own Model
Here's the specific SQL pattern I use for Poisson-Gamma smoothing:
-- Step 1: Compute weight-class rate priors
WITH wc_priors AS (
SELECT weightclass,
SUM(stat_count) / NULLIF(SUM(time_sec / 60.0), 0) AS wc_rate
FROM fight_stats
WHERE time_sec > 0
GROUP BY weightclass
),
-- Step 2: Apply Poisson-Gamma smoothing
smoothed AS (
SELECT fight_id, fighter_id,
-- Posterior rate: (prior_rate * pseudo_minutes + observed_count) / (pseudo_minutes + actual_minutes)
(p.wc_rate * τ + fs.stat_count) / (τ + fs.time_sec / 60.0) AS posterior_rate,
-- Convert back to smoothed count
(fs.time_sec / 60.0) *
((p.wc_rate * τ + fs.stat_count) / (τ + fs.time_sec / 60.0)) AS stat_count_smooth
FROM fight_stats fs
JOIN wc_priors p ON fs.weightclass = p.weightclass
)
-- Key insight: τ values I use:
-- Striking stats: τ = 15-25 minutes
-- Grappling stats: τ = 8-12 minutes
-- Submission attempts: τ = 5-8 minutes
Critical implementation notes:
- Filter out zero exposure: Always add WHERE time_sec > 0 when computing priors to avoid division by zero.
- Round-1 exposure cap: For first-round stats, cap exposure at 300 seconds (5 minutes), since a round can't exceed this.
- Pseudo-minute tuning: Start with τ=15 for most stats, then use cross-validation to optimize. Higher τ = more shrinkage toward prior.
- Weight-class fallbacks: Always have a global prior for weight classes with insufficient data.
Why this fixes the old issues
- Short/long fights handled correctly: A two-minute brawl vs. a fifteen-minute chess match are now comparable because smoothing happens on rates and returns properly scaled counts.
- Less hand-waving, more model: Pseudo-minutes encode prior confidence without ad-hoc branches; per-class overrides exist only where they earned their keep.
3) Beta-Binomial Smoothing (Binary)
What was wrong before
- There wasn't any. KO/Win/Decision/Sub were effectively raw indicators or simple ratios. That's noisy, mis-calibrated, and conflates "skill" with "opportunity."
- Attempts were undefined: I had no principled "denominator" for success probability. For example, calling every strike a KO attempt is just wrong.
What I do now
- Proper Beta-Binomial: For each binary family I define attempts and successes:
  - ko/win/decision: each fight is one opportunity.
  - sub_land: opportunities = sub_att.
  - ctrl (duration): modeled as a time-share (Bernoulli per second); smoothed rate times exposure seconds.
- Weight-class priors, validated strength: I use WC success rates as p priors and add pseudo-counts τ tuned per stat, optionally per WC where cross-validation justifies it (e.g., Featherweight subs, LHW/HW control).
- Zero-attempt guard: If attempts are zero, I return the WC/global prior rate rather than fabricating a fraction.
Implementation Details for Beta-Binomial Smoothing
Here's the exact approach for different binary outcomes:
-- For KO/Win/Decision rates (attempts = 1 per fight)
WITH wc_binary_priors AS (
SELECT weightclass,
SUM(CASE WHEN outcome = 'ko' THEN 1 ELSE 0 END)::float / COUNT(*) AS ko_rate,
SUM(CASE WHEN outcome = 'win' THEN 1 ELSE 0 END)::float / COUNT(*) AS win_rate
FROM fight_results
GROUP BY weightclass
),
-- For submission success rates (attempts = sub_att, successes = sub_land)
sub_priors AS (
SELECT weightclass,
SUM(sub_land)::float / NULLIF(SUM(sub_att), 0) AS sub_success_rate
FROM fight_stats
WHERE sub_att > 0
GROUP BY weightclass
),
-- Apply Beta-Binomial smoothing
smoothed_binaries AS (
SELECT fight_id, fighter_id,
-- KO rate: (prior_rate * pseudo_attempts + successes) / (pseudo_attempts + attempts)
(p.ko_rate * τ_ko + ko_success) / (τ_ko + 1) AS ko_prob_smooth,
-- Sub rate: fall back to the prior when sub_att = 0 (the zero-attempt guard)
CASE WHEN sub_att = 0 THEN sp.sub_success_rate
     ELSE (sp.sub_success_rate * τ_sub + sub_land) / (τ_sub + sub_att)
END AS sub_prob_smooth
FROM fight_stats fs
JOIN wc_binary_priors p ON fs.weightclass = p.weightclass
JOIN sub_priors sp ON fs.weightclass = sp.weightclass
)
-- Pseudo-attempt values I use:
-- KO/Win/Decision: τ = 8-12 fights
-- Submission success: τ = 3-5 attempts
-- Control time share: τ = 600-900 seconds
Key implementation considerations:
- Zero-attempt handling: When sub_att = 0, use the weight-class prior rate directly rather than trying to compute a ratio.
- Control time as time-share: Model control duration as Bernoulli per second, then multiply smoothed probability by total seconds.
- Minimum attempts: Use GREATEST(attempts, 1) to avoid division by zero in edge cases.
- Cross-validation tuning: Start with the τ values above, then optimize per weight class only if CV shows >0.5% improvement.
Why this fixes the old issues
- Calibration: Probabilities stop over- or under-shooting because we borrow signal from the division (the weight class) at a realistic strength.
- Correct denominators: A sub success isn't "one per fight"; it's "out of sub attempts." A KO isn't "per strike"; it's "per fight." This matters.
- Principled uncertainty: Small sample sizes get more shrinkage, large samples trust their own data more.
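A minimal Python sketch of that shrinkage behavior, assuming a hypothetical 12% weight-class sub-success prior and τ = 4 pseudo-attempts:

```python
def beta_binomial_smooth(successes, attempts, prior_rate, tau):
    """Posterior success probability with τ pseudo-attempts at the prior rate.
    Zero-attempt guard: with no opportunities, return the prior unchanged."""
    if attempts == 0:
        return prior_rate
    return (prior_rate * tau + successes) / (tau + attempts)

p0 = beta_binomial_smooth(0, 0, 0.12, 4)         # no attempts -> prior
p_small = beta_binomial_smooth(2, 3, 0.12, 4)    # 2/3 raw, heavily shrunk
p_large = beta_binomial_smooth(20, 30, 0.12, 4)  # 20/30 raw, lightly shrunk
```

Both fighters land at the same raw rate (2/3), but the 3-attempt sample ends up much closer to the prior than the 30-attempt sample, which is the calibration behavior described above.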
4) Pipeline & Naming Hygiene
- Order matters: Binary smoothing runs before count smoothing (so sub attempts are raw counts when they need to be). After smoothing, I temporarily keep *_raw, compute derived stats (totals, accuracy, defense, ratios, per), then I drop *_raw. AdjPerf waits until opponent and weight-class aggregates are ready.
- Consistent feature families: Head/Body/Leg and Distance/Clinch/Ground share a naming pattern, so per/ratio/opp calculations can be reliably applied and diffed.
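One way to keep the ordering honest is to encode the dependencies in the pipeline driver itself. This is a hypothetical sketch (the step names are mine, not from the actual pipeline):

```python
# Hypothetical pipeline driver: binary smoothing must see raw sub_att counts,
# so it runs before count smoothing; derived stats run while *_raw still exists.
PIPELINE = [
    "extract_base_features",
    "beta_binomial_smoothing",   # needs raw sub_att as attempts
    "poisson_gamma_smoothing",   # smooths counts, keeps *_raw around
    "derive_totals_and_ratios",  # reads smoothed values
    "drop_raw_columns",
    "adjusted_performance",      # waits for opponent/WC aggregates
]

def check_order(steps):
    """Fail fast if any step runs before its prerequisite."""
    prereqs = {
        "poisson_gamma_smoothing": "beta_binomial_smoothing",
        "drop_raw_columns": "derive_totals_and_ratios",
    }
    for step, prereq in prereqs.items():
        assert steps.index(prereq) < steps.index(step), f"{prereq} must precede {step}"

check_order(PIPELINE)
```

A cheap guard like this turns "no accidental toggling" from a convention into an assertion.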
Why this fixes the old issues
- No accidental toggling: Derived features don't unknowingly mix raw and smoothed values.
- Less glue code, fewer footguns: Calculators can target families by suffix/prefix without bespoke filters for every one-off.
Complete Implementation Roadmap
If you're building your own MMA prediction model, here's the exact order of operations that prevents data leakage and ensures all derived features use properly smoothed inputs:
Step-by-Step Pipeline
1. Base feature extraction: Raw fight stats → fight_stats_derived table
2. Beta-Binomial smoothing: Binary outcomes (KO, win, decision, sub success, control time-share)
3. Poisson-Gamma smoothing: Count statistics (strikes, takedowns, etc.)
4. Temporary raw preservation: Keep *_raw columns during the derivation phase
5. Derived feature computation: Totals, accuracy, defense rates, ratios (all using smoothed values)
6. Raw column cleanup: Drop *_raw columns after derived features are complete
7. Per-minute and ratio features: Rates, pressure metrics, position-specific stats
8. Feature family tables: Separate tables for striking, grappling, position stats
9. Opponent aggregation: Build opponent history tables with time decay
10. Weight-class priors: Compute means, MADs, and minimum MAD floors
11. Adjusted Performance: Apply opponent-aware, reliability-weighted standardization
Key SQL Patterns You Can Reuse
-- 1. Time decay weight calculation
SELECT EXP(-0.13 * (current_date - fight_date) / 365.25) AS decay_weight
-- (In Postgres, date - date yields integer days, so divide by 365.25 for years)
-- 2. Effective sample size (Kish formula)
SELECT POWER(SUM(w), 2) / NULLIF(SUM(POWER(w, 2)), 0) AS n_effective
-- 3. Shrinkage toward prior
SELECT (n/(n + K) * observed + K/(n + K) * prior) AS shrunk_estimate
-- 4. Robust MAD calculation
WITH medians AS (
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY stat_value) AS median_val
FROM stats_table
),
deviations AS (
SELECT ABS(stat_value - m.median_val) AS abs_dev
FROM stats_table s CROSS JOIN medians m
)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY abs_dev) AS mad_value
FROM deviations
-- 5. Exposure-corrected rate calculation
SELECT stat_count / GREATEST(time_minutes, 0.1) AS rate_per_minute
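To sanity-check these patterns, the same math can be reproduced in a few lines of plain Python (the stat values below are illustrative only):

```python
import math
import statistics

# Pattern 1: half-life implied by λ = 0.13
half_life_years = math.log(2) / 0.13   # roughly 5.3 years

# Pattern 4: MAD = median of absolute deviations from the median.
# Note how the single outlier barely moves it, unlike a standard deviation.
vals = [40, 45, 50, 55, 120]
med = statistics.median(vals)
mad = statistics.median([abs(v - med) for v in vals])

# Pattern 5: exposure floor keeps very short fights from exploding the rate
rate = 12 / max(1.8, 0.1)   # 12 strikes in 1.8 minutes
```

The MAD's robustness to the 120 outlier is exactly why it backs the AdjPerf denominators instead of a plain standard deviation.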
Hyperparameters That Matter Most
- Time decay λ: 0.13 gives ~5 year half-life for fight relevance
- Poisson-Gamma τ (pseudo-minutes): 15-25 for striking, 8-12 for grappling
- Beta-Binomial τ (pseudo-attempts): 8-12 for outcomes, 3-5 for submission success
- Shrinkage K values: K_mean = 5-8, K_mad = 8-12 (higher for stability)
- Adjusted performance clipping: [-7, +7] to prevent extreme outliers
Common Pitfalls to Avoid
- Data leakage: Always filter opponent history to event_date < current_fight_date.
- Zero exposure: Add WHERE time_sec > 0 filters when computing rate priors.
- Missing weight classes: Include global fallback rows in all prior computations.
- Mixed raw/smoothed: Ensure derived features use smoothed inputs consistently.
- Inadequate sample sizes: Set minimum thresholds for opponent history reliability.
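The first pitfall boils down to an as-of filter. A minimal Python sketch, assuming fights are represented as dicts with an event_date field (a made-up schema for illustration):

```python
from datetime import date

def opponent_history(fights, current_fight_date):
    """Keep only fights strictly before the current one; the as-of rule
    that prevents future results from leaking into features."""
    return [f for f in fights if f["event_date"] < current_fight_date]

fights = [
    {"event_date": date(2021, 3, 6), "strikes": 88},
    {"event_date": date(2023, 7, 29), "strikes": 104},
    {"event_date": date(2024, 4, 13), "strikes": 97},  # the fight being predicted
]
hist = opponent_history(fights, date(2024, 4, 13))  # excludes the current fight
```

The strict < (not <=) matters: including same-day results would leak the very outcome you are trying to predict.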