Building a sports prediction API on top of free public data sources is one of those projects that looks easy in a notebook and gets dramatically harder when you put it in production. The model is a small fraction of the work. The pipeline around it — scraping, schema design, calibration, latency management, monitoring — is where production-grade lives.
This post walks through the architecture of the live API we run, from the ESPN scoreboard scrape at the bottom to the calibrated probability response at the top. Code is in Python; the pattern generalizes to any language.
Every game on ESPN.com is backed by a JSON feed at a predictable URL. The base scoreboard endpoint for any sport follows the pattern:
https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard
For example, the NBA scoreboard lives at basketball/nba/scoreboard and the NHL's at hockey/nhl/scoreboard. The response is JSON with all live games, scores, time remaining, and team identifiers. No authentication is required, and no rate limit is published, but roughly one request per second per sport is safe in practice.
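As a concrete sketch, here is one way to flatten a scoreboard response into per-game rows. The field names (`events`, `competitions`, `competitors`, `homeAway`) match the shape ESPN's own apps consume, but the feed is unofficial, so every lookup is guarded rather than assumed:

```python
def list_live_games(scoreboard: dict) -> list[dict]:
    """Flatten an ESPN scoreboard payload into one row per game.

    The feed is unofficial, so every lookup is defensive: missing keys
    produce None values instead of exceptions.
    """
    games = []
    for event in scoreboard.get("events", []):
        comp = (event.get("competitions") or [{}])[0]
        row = {
            "game_id": event.get("id"),
            # "pre", "in", or "post" in observed responses
            "status": event.get("status", {}).get("type", {}).get("state"),
        }
        for team in comp.get("competitors", []):
            side = team.get("homeAway", "home")
            row[side] = team.get("team", {}).get("abbreviation")
            row[f"score_{side}"] = team.get("score")  # ESPN sends scores as strings
        games.append(row)
    return games
```

Note that scores arrive as strings; convert them at the schema boundary, not scattered through the model code.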
For per-game detail, the summary endpoint provides play-by-play, win-probability fields populated by ESPN's own model, and starter lineups:
https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/summary?event={game_id}
These endpoints have been stable since at least 2020. They are not officially documented — ESPN publishes them implicitly to power their own apps. Treat them as a stable but unofficial source: poll defensively, cache aggressively, and have a fallback (official league API or paid Sportradar) for when ESPN occasionally rate-limits or returns garbage.
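"Cache aggressively" can be as simple as a TTL wrapper around the fetch: serve a recent copy from memory, and when the network fails, fall back to the last good copy even if it has expired. A minimal sketch (the `fetch_cached` helper and its signature are this post's own invention, not an ESPN or requests API):

```python
import time

# url -> (monotonic timestamp of fetch, parsed response)
_TTL_CACHE: dict[str, tuple[float, dict]] = {}

def fetch_cached(url: str, fetch, ttl_sec: float = 5.0):
    """Serve a recent response from memory; only hit the network when expired.

    `fetch` is any callable that returns a parsed dict, or None on failure.
    On failure we degrade to the last good copy rather than returning nothing.
    """
    now = time.monotonic()
    hit = _TTL_CACHE.get(url)
    if hit and now - hit[0] < ttl_sec:
        return hit[1]
    data = fetch(url)
    if data is not None:
        _TTL_CACHE[url] = (now, data)
        return data
    # Network failed: fall back to the stale copy if we have one.
    return hit[1] if hit else None
```

Serving a slightly stale scoreboard beats serving an error; the `as_of` field in the response (introduced below) tells consumers exactly how stale.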
The naive approach is requests.get(url).json() in a loop. The production approach handles every failure mode you will hit at 3 AM on a major game night:
```python
import logging
import time
from typing import Optional

import requests

logger = logging.getLogger(__name__)

SESSION = requests.Session()
SESSION.headers.update({"User-Agent": "ZenHodl/1.0 (+https://zenhodl.net)"})


def fetch_espn(url: str, timeout: float = 8.0, retries: int = 3) -> Optional[dict]:
    last_err = None
    for attempt in range(retries):
        try:
            r = SESSION.get(url, timeout=timeout)
            if r.status_code == 429:
                # Respect Retry-After if present, else exponential backoff
                wait = int(r.headers.get("Retry-After", 2 ** attempt))
                logger.warning(f"ESPN 429, waiting {wait}s")
                time.sleep(wait)
                continue
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            last_err = e
            time.sleep(2 ** attempt)
    logger.error(f"ESPN fetch failed after {retries} attempts: {url} ({last_err})")
    return None
```
Three things this pattern gets right that the naive version does not: it respects the Retry-After header on 429 responses, it does exponential backoff on transient errors, and it returns None on total failure rather than crashing the entire poll loop. Your bot keeps going on the games that succeeded.
The temptation in a single-sport API is to expose ESPN's response shape directly. The mistake becomes obvious at sport three: NBA and NHL both report an integer period, but soccer reports clock.displayValue as "45'+3'", with no integer period at all.
The right design splits the response into common metadata and sport-specific state. Common: game_id, sport, home_team, away_team, start_time, as_of. Sport-specific state lives in a nested object whose schema is documented separately per sport. Outcomes live in a list of named-probability objects (binary for most sports, three-way for soccer).
```json
{
  "sport": "NBA",
  "game_id": "401705412",
  "home_team": "LAL",
  "away_team": "BOS",
  "start_time": "2026-05-11T02:30:00Z",
  "as_of": "2026-05-11T03:14:33Z",
  "outcomes": [
    {"name": "home", "prob": 0.617, "prob_calibrated": 0.604},
    {"name": "away", "prob": 0.383, "prob_calibrated": 0.396}
  ],
  "state": {
    "quarter": 3,
    "score_home": 67,
    "score_away": 71,
    "time_remaining_sec": 442
  }
}
```
This shape stays the same across sports. The state object varies. A consumer that only cares about probabilities reads the outcomes array. A consumer that wants to render the live game reads state based on the sport.
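One lightweight way to keep the per-sport state schemas honest is a required-field registry checked at parse time. The field sets below are illustrative: the NBA set matches the example above, while the NHL and soccer sets are plausible shapes, not ESPN-defined names:

```python
# Required keys per sport's state object; illustrative, documented per sport.
STATE_SCHEMAS = {
    "NBA": {"quarter", "score_home", "score_away", "time_remaining_sec"},
    "NHL": {"period", "score_home", "score_away", "time_remaining_sec"},
    "soccer": {"clock_display", "score_home", "score_away"},
}

def validate_state(sport: str, state: dict) -> dict:
    """Fail loudly at parse time instead of serving a half-built envelope."""
    required = STATE_SCHEMAS.get(sport)
    if required is None:
        raise ValueError(f"No state schema registered for sport {sport!r}")
    missing = required - state.keys()
    if missing:
        raise ValueError(f"{sport} state missing fields: {sorted(missing)}")
    return state
```

Failing at parse time keeps a schema surprise from one sport out of the cache, where it would otherwise surface as a confusing 500 on the read path.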
For most sports, an XGBoost classifier with isotonic calibration on top is the right starting point. XGBoost handles the feature interactions; isotonic regression rescales the probabilities to be calibrated against observed outcomes.
```python
from sklearn.isotonic import IsotonicRegression
from xgboost import XGBClassifier

# Train base model
clf = XGBClassifier(
    n_estimators=400,
    max_depth=5,
    learning_rate=0.05,
    objective="binary:logistic",
)
clf.fit(X_train, y_train)

# Fit isotonic calibrator on a held-out set
raw_probs = clf.predict_proba(X_calib)[:, 1]
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw_probs, y_calib)

# At inference: predict, then calibrate
def predict_calibrated(features):
    raw = clf.predict_proba(features)[:, 1]
    return iso.transform(raw)
```
Always evaluate calibration. Expected Calibration Error is the right metric: bin predictions into 10-15 buckets, compute the absolute difference between the bin's mean predicted probability and its observed frequency, weight by bin size. Aim for ECE under 0.05 in production.
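That recipe translates directly into a few lines of NumPy. The sketch below uses equal-width bins, with the top bin closed so a prediction of exactly 1.0 is still counted:

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """ECE: bin predictions, compare each bin's mean predicted probability
    to its observed frequency, and weight the gaps by bin size."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins [lo, hi), except the last bin which includes 1.0
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if not mask.any():
            continue
        gap = abs(probs[mask].mean() - outcomes[mask].mean())
        ece += mask.mean() * gap
    return float(ece)
```

Recompute this over a trailing window of settled games, not just once at train time; calibration drifts as teams, rosters, and ESPN's upstream fields change.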
FastAPI is the right choice for a Python prediction API. Async-friendly, auto-generated OpenAPI docs, Pydantic models for request/response validation. A single FastAPI worker can comfortably handle a few hundred requests per second; a fleet of four workers behind a process manager handles thousands.
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class PredictionResponse(BaseModel):
    sport: str
    game_id: str
    fair_prob_home: float
    fair_prob_calibrated: float
    as_of: str


@app.get("/v1/predict/{sport}/{game_id}", response_model=PredictionResponse)
async def predict(sport: str, game_id: str):
    state = get_live_state(sport, game_id)
    if state is None:
        raise HTTPException(404, f"No live state for {sport}/{game_id}")
    raw, calibrated = predict_for(sport, state)
    return PredictionResponse(
        sport=sport,
        game_id=game_id,
        fair_prob_home=raw,
        fair_prob_calibrated=calibrated,
        as_of=state["as_of"],
    )
```
The get_live_state function reads from a hot in-memory cache populated by a separate background poller. The API never blocks on an external HTTP call — if the cache does not have current state, the request returns 404 rather than waiting on ESPN.
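A sketch of what get_live_state can look like, assuming the poller stamps every cached envelope with its as_of field; the staleness threshold and the sport check are illustrative choices, not fixed API:

```python
from datetime import datetime, timezone

CACHE: dict[str, dict] = {}   # game_id -> envelope, written by the background poller
MAX_STALENESS_SEC = 60        # illustrative; tune relative to each sport's cadence

def get_live_state(sport: str, game_id: str):
    """Pure in-memory read: a missing or stale entry both map to None (-> 404)."""
    state = CACHE.get(game_id)
    if state is None or state.get("sport") != sport:
        return None
    as_of = datetime.strptime(state["as_of"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    age_sec = (datetime.now(timezone.utc) - as_of).total_seconds()
    return state if age_sec <= MAX_STALENESS_SEC else None
```

Treating stale data as missing is a deliberate choice: a 404 is honest, while a confidently served probability computed from a five-minute-old score is not.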
The polling layer runs as a separate task pool, one task per sport. Each task polls its sport's scoreboard on a sport-appropriate cadence (NBA every 5s, MLB every 10s, soccer every 15s), updates the in-memory cache, and emits diagnostic logs.
```python
import asyncio

CACHE = {}  # game_id -> state


async def poll_sport(sport: str, cadence_sec: int):
    # scoreboard_url and parse_event are the sport-specific helpers
    # that map ESPN's payload into the envelope schema above.
    while True:
        try:
            # fetch_espn blocks on requests; run it off the event loop
            # so one slow sport cannot stall every other poller.
            data = await asyncio.to_thread(fetch_espn, scoreboard_url(sport))
            if data:
                for event in data.get("events", []):
                    state = parse_event(event)
                    CACHE[state["game_id"]] = state
        except Exception as e:
            logger.exception(f"Polling failed for {sport}: {e}")
        await asyncio.sleep(cadence_sec)


@app.on_event("startup")
async def startup():
    sports = [("NBA", 5), ("NHL", 5), ("MLB", 10), ("soccer", 15)]
    for sport, cadence in sports:
        asyncio.create_task(poll_sport(sport, cadence))
```
This pattern decouples the API response latency from the upstream poll latency. Even if ESPN is having a bad day, the API stays responsive.
Three metrics matter most:
- Freshness: the age of as_of at read time. A stalled poller shows up here long before a user complains.
- Latency: p99 response time at the API edge. The handler only reads memory, so this should stay in the low milliseconds.
- Calibration drift: ECE recomputed over a trailing window of settled games. A model can keep its accuracy while slowly losing calibration.
Production will surface failure modes your local dev environment never showed: scoreboards that return valid JSON with an empty events list mid-game, 429 storms on playoff nights, schema quirks that only appear once you add sport three, and a single sport's poller task dying silently while the others keep running.
A production-grade sports prediction API is 10% modeling and 90% pipeline. ESPN's JSON feeds give you the raw material for free. Defensive scraping, decoupled polling, careful schema design, and continuous calibration monitoring are what turn the raw material into a service you can charge for.
ZenHodl runs the architecture described in this post across 11 sports. Calibrated probabilities, sub-30-second updates, free seven-day trial.
Try ZenHodl free