From data collection to model training to serving live win probabilities via REST — with working code.
Published April 12, 2026 · 8 min read
If you've ever wanted to predict the outcome of a live NBA game, an MLB matchup, or a CS2 esports match using real data, you're in the right place. In this guide, we'll walk through the core components of a sports prediction API — from data collection to model training to serving live predictions.
By the end, you'll understand the architecture behind systems like ZenHodl's prediction API, which serves calibrated win probabilities for 10 sports in real-time.
A prediction API takes in a game state (score, time remaining, team quality) and returns a probability:
GET /v1/games?sport=NBA
{
  "game_id": "nba_2026041201",
  "home_team": "BOS",
  "away_team": "MIA",
  "home_score": 58,
  "away_score": 45,
  "period": 3,
  "home_win_probability": 0.847,
  "model": "xgboost_v3_calibrated",
  "updated_at": "2026-04-12T02:30:00Z"
}
The key challenge isn't building the API endpoint — it's making the probability estimate accurate and well-calibrated. A calibrated model means: when it says 70%, the team actually wins ~70% of the time.
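That property can be sanity-checked by binning predictions and comparing each bin's mean predicted probability with the observed win rate. A minimal sketch:

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Bin predictions and compare mean predicted prob vs. actual win rate."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Assign each prediction to a probability bin: [0.0-0.1), [0.1-0.2), ...
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        rows.append({
            "bin": f"{b / n_bins:.1f}-{(b + 1) / n_bins:.1f}",
            "predicted": probs[mask].mean(),
            "actual": outcomes[mask].mean(),
            "n": int(mask.sum()),
        })
    return rows
```

In a well-calibrated model, `predicted` and `actual` track each other closely in every bin that has enough games.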
You need play-by-play or score-update snapshots with timestamps. The best free source in 2026 is the unofficial ESPN API:
import requests

def get_nba_scoreboard():
    """Fetch today's NBA games from ESPN's unofficial scoreboard endpoint."""
    url = "https://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    data = resp.json()

    games = []
    for event in data.get("events", []):
        competition = event["competitions"][0]
        # Don't rely on array order; ESPN labels each competitor home/away.
        home = next(c for c in competition["competitors"] if c["homeAway"] == "home")
        away = next(c for c in competition["competitors"] if c["homeAway"] == "away")
        games.append({
            "game_id": event["id"],
            "home_team": home["team"]["abbreviation"],
            "away_team": away["team"]["abbreviation"],
            "home_score": int(home["score"]),
            "away_score": int(away["score"]),
            "period": competition.get("status", {}).get("period", 0),
            "clock": competition.get("status", {}).get("displayClock", ""),
        })
    return games
For historical data, you'll want 3-5 seasons. Sources include Basketball Reference for NBA, FanGraphs for MLB, and Jeff Sackmann's GitHub for tennis.
Raw scores aren't enough. You need features that capture game context:
def build_features(game_state, team_stats, elo_ratings):
    home = game_state["home_team"]
    away = game_state["away_team"]
    score_diff = game_state["home_score"] - game_state["away_score"]
    # 2880 = 48 minutes of NBA regulation, in seconds
    time_fraction = 1 - (game_state["seconds_remaining"] / 2880)
    return {
        "score_diff": score_diff,
        "seconds_remaining": game_state["seconds_remaining"],
        "period": game_state["period"],
        "time_fraction": time_fraction,
        "elo_diff": elo_ratings.get(home, 1500) - elo_ratings.get(away, 1500),
        # offensive/defensive rating differentials
        "ortg_diff": team_stats[home]["ortg"] - team_stats[away]["ortg"],
        "drtg_diff": team_stats[home]["drtg"] - team_stats[away]["drtg"],
        # interaction: a lead matters more as time runs out
        "score_diff_x_tf": score_diff * time_fraction,
        "score_diff_sq": score_diff ** 2,
    }
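The elo_diff feature assumes you maintain per-team Elo ratings. A minimal sketch of updating them from historical results, using the standard Elo formula (the K-factor of 20 is a common choice, not taken from the article):

```python
def update_elo(ratings, home, away, home_won, k=20.0):
    """Standard Elo update after one game; unseen teams start at 1500."""
    r_home = ratings.get(home, 1500.0)
    r_away = ratings.get(away, 1500.0)
    # Expected score of the home team from the rating gap
    expected_home = 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))
    actual = 1.0 if home_won else 0.0
    shift = k * (actual - expected_home)
    ratings[home] = r_home + shift
    ratings[away] = r_away - shift
    return ratings
```

Replay your historical games in chronological order through this function and the final dict is the `elo_ratings` input above.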
By XGBoost feature importance, score_diff, time_fraction, and their interaction (score_diff_x_tf) typically dominate: a 10-point lead means far more with two minutes left than with a full half to play.
XGBoost is the standard for tabular sports prediction:
import xgboost as xgb
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.isotonic import IsotonicRegression  # used in the calibration step

# X, y must be sorted chronologically so this index split is walk-forward
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]

model = xgb.XGBClassifier(
    n_estimators=300, max_depth=5,
    learning_rate=0.05, subsample=0.8,
    colsample_bytree=0.8, eval_metric="logloss",
)
model.fit(X_train, y_train)

raw_probs = model.predict_proba(X_test)[:, 1]
print(f"Brier score: {brier_score_loss(y_test, raw_probs):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, raw_probs):.4f}")
Critical: use walk-forward splits, not random splits. Random splits leak future game information into training. Walk-forward means all training data is older than all test data.
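scikit-learn ships a walk-forward splitter, `TimeSeriesSplit`, which enforces exactly this property (the toy data below is illustrative):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy chronological data: rows must already be sorted oldest-first.
X = np.arange(20).reshape(-1, 1)
y = np.arange(20) % 2

tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    # Walk-forward property: every training row precedes every test row.
    assert train_idx.max() < test_idx.min()
```

Each successive fold trains on a longer prefix of history and tests on the games that follow it, which mirrors how the model will actually be used in production.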
Raw XGBoost outputs are often overconfident. Isotonic regression fixes this:
# cal_probs / y_cal: raw model outputs and labels from a held-out
# calibration fold (still older than the final test set)
calibrator = IsotonicRegression(y_min=0.01, y_max=0.99, out_of_bounds="clip")
calibrator.fit(cal_probs, y_cal)

calibrated = calibrator.transform(raw_probs)
print(f"Brier AFTER calibration: {brier_score_loss(y_test, calibrated):.4f}")
After calibration, when your model says 70%, teams should actually win ~70% of the time. For reference, ZenHodl's production models achieve Expected Calibration Error of 0.002 across NBA, NHL, MLB, and LoL.
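Expected Calibration Error summarizes calibration in a single number: the bin-weighted average gap between predicted and observed win rates. A minimal binned implementation (a generic sketch, not ZenHodl's exact metric):

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average |mean predicted - observed win rate| over probability bins."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Weight each bin's gap by the fraction of samples it contains
            ece += mask.mean() * abs(probs[mask].mean() - outcomes[mask].mean())
    return ece
```

Lower is better; a perfectly calibrated model scores 0.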
from fastapi import FastAPI
import pandas as pd
import pickle

app = FastAPI()

with open("wp_model_NBA.pkl", "rb") as f:
    model_data = pickle.load(f)
model = model_data["model"]
calibrator = model_data["calibrator"]

@app.get("/v1/predict")
def predict(home_team: str, away_team: str, home_score: int,
            away_score: int, period: int, seconds_remaining: int):
    features = build_features(...)  # dict of model features, as defined above
    X = pd.DataFrame([features])    # XGBoost expects a 2D input, not a raw dict
    raw = model.predict_proba(X)[0][1]
    calibrated = calibrator.transform([raw])[0]
    return {
        "home_team": home_team,
        "home_win_probability": round(float(calibrated), 4),
    }
A static model isn't enough for live prediction. You need overlays: real-time adjustments that stack on top of the base prediction and are capped at ±20% total adjustment.
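A minimal sketch of that capping logic, assuming overlays are additive probability adjustments (the overlay names and clamping bounds here are illustrative, not ZenHodl's actual implementation):

```python
def apply_overlays(base_prob, overlays, cap=0.20):
    """Add overlay adjustments to a base probability, capping the total
    shift at +/- cap and clamping the result to [0.01, 0.99]."""
    total = sum(overlays.values())
    total = max(-cap, min(cap, total))  # cap the combined adjustment
    return max(0.01, min(0.99, base_prob + total))

# e.g. hypothetical momentum and injury overlays on a 70% base estimate
p = apply_overlays(0.70, {"momentum": 0.05, "injury": -0.02})
```

Capping keeps a noisy overlay from overwhelming the calibrated base model, and the final clamp keeps the output a valid probability.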
ZenHodl's API serves real-time win probabilities for 10 sports with a 7-day free trial.