Building a sports prediction API on top of free public data sources is one of those projects that looks easy in a notebook and gets dramatically harder when you put it in production. The model is a small fraction of the work. The pipeline around it — scraping, schema design, calibration, latency management, monitoring — is where production-grade lives.
This post walks through the architecture of the live API we run, from the ESPN scoreboard scrape at the bottom to the calibrated probability response at the top. Code is in Python; the pattern generalizes to any language.
Every game on ESPN.com is backed by a JSON feed at a predictable URL. The base scoreboard endpoint for any sport follows the pattern:
https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard
For example, the NBA scoreboard lives at basketball/nba/scoreboard and the NHL's at hockey/nhl/scoreboard. The response is JSON with all live games, scores, time remaining, and team identifiers. No authentication is required, and no rate limit is published, but roughly one request per second per sport is safe in practice.
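As a concrete sketch, here is one way to flatten a scoreboard response into per-game rows. The field names (`events`, `competitions`, `competitors`, `homeAway`) match the shape ESPN's own apps consume, but the feed is unofficial, so every lookup is guarded rather than assumed:

```python
def list_live_games(scoreboard: dict) -> list[dict]:
    """Flatten an ESPN scoreboard payload into one row per game.

    The feed is unofficial, so every lookup is defensive: missing keys
    produce None values instead of exceptions.
    """
    games = []
    for event in scoreboard.get("events", []):
        comp = (event.get("competitions") or [{}])[0]
        row = {
            "game_id": event.get("id"),
            # "pre", "in", or "post" in observed responses
            "status": event.get("status", {}).get("type", {}).get("state"),
        }
        for team in comp.get("competitors", []):
            side = team.get("homeAway", "home")
            row[side] = team.get("team", {}).get("abbreviation")
            row[f"score_{side}"] = team.get("score")  # ESPN sends scores as strings
        games.append(row)
    return games
```

Note that scores arrive as strings; convert them at the schema boundary, not scattered through the model code.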
For per-game detail, the summary endpoint provides play-by-play, win-probability fields populated by ESPN's own model, and starter lineups:
https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/summary?event={game_id}
These endpoints have been stable since at least 2020. They are not officially documented — ESPN publishes them implicitly to power their own apps. Treat them as a stable but unofficial source: poll defensively, cache aggressively, and have a fallback (official league API or paid Sportradar) for when ESPN occasionally rate-limits or returns garbage.
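"Cache aggressively" can be as simple as a TTL wrapper around the fetch: serve a recent copy from memory, and when the network fails, fall back to the last good copy even if it has expired. A minimal sketch (the `fetch_cached` helper and its signature are this post's own invention, not an ESPN or requests API):

```python
import time

# url -> (monotonic timestamp of fetch, parsed response)
_TTL_CACHE: dict[str, tuple[float, dict]] = {}

def fetch_cached(url: str, fetch, ttl_sec: float = 5.0):
    """Serve a recent response from memory; only hit the network when expired.

    `fetch` is any callable that returns a parsed dict, or None on failure.
    On failure we degrade to the last good copy rather than returning nothing.
    """
    now = time.monotonic()
    hit = _TTL_CACHE.get(url)
    if hit and now - hit[0] < ttl_sec:
        return hit[1]
    data = fetch(url)
    if data is not None:
        _TTL_CACHE[url] = (now, data)
        return data
    # Network failed: fall back to the stale copy if we have one.
    return hit[1] if hit else None
```

Serving a slightly stale scoreboard beats serving an error; the `as_of` field in the response (introduced below) tells consumers exactly how stale.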
The naive approach is requests.get(url).json() in a loop. The production approach handles every failure mode you will hit at 3 AM on a major game night:
```python
import logging
import time
from typing import Optional

import requests

logger = logging.getLogger(__name__)

SESSION = requests.Session()
SESSION.headers.update({"User-Agent": "ZenHodl/1.0 (+https://zenhodl.net)"})


def fetch_espn(url: str, timeout: float = 8.0, retries: int = 3) -> Optional[dict]:
    last_err = None
    for attempt in range(retries):
        try:
            r = SESSION.get(url, timeout=timeout)
            if r.status_code == 429:
                # Respect Retry-After if present, else exponential backoff
                wait = int(r.headers.get("Retry-After", 2 ** attempt))
                logger.warning(f"ESPN 429, waiting {wait}s")
                time.sleep(wait)
                continue
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            last_err = e
            time.sleep(2 ** attempt)
    logger.error(f"ESPN fetch failed after {retries} attempts: {url} ({last_err})")
    return None
```
Three things this pattern gets right that the naive version does not: it respects the Retry-After header on 429 responses, it does exponential backoff on transient errors, and it returns None on total failure rather than crashing the entire poll loop. Your bot keeps going on the games that succeeded.
The temptation in a single-sport API is to expose ESPN's response shape directly. The mistake becomes obvious at sport three: NBA and NHL both report an integer period, but soccer reports clock.displayValue as "45'+3'", with no integer period at all.
The right design splits the response into common metadata and sport-specific state. Common: game_id, sport, home_team, away_team, start_time, as_of. Sport-specific state lives in a nested object whose schema is documented separately per sport. Outcomes live in a list of named-probability objects (binary for most sports, three-way for soccer).
```json
{
  "sport": "NBA",
  "game_id": "401705412",
  "home_team": "LAL",
  "away_team": "BOS",
  "start_time": "2026-05-11T02:30:00Z",
  "as_of": "2026-05-11T03:14:33Z",
  "outcomes": [
    {"name": "home", "prob": 0.617, "prob_calibrated": 0.604},
    {"name": "away", "prob": 0.383, "prob_calibrated": 0.396}
  ],
  "state": {
    "quarter": 3,
    "score_home": 67,
    "score_away": 71,
    "time_remaining_sec": 442
  }
}
```
This shape stays the same across sports. The state object varies. A consumer that only cares about probabilities reads the outcomes array. A consumer that wants to render the live game reads state based on the sport.
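One lightweight way to keep the per-sport state schemas honest is a required-field registry checked at parse time. The field sets below are illustrative: the NBA set matches the example above, while the NHL and soccer sets are plausible shapes, not ESPN-defined names:

```python
# Required keys per sport's state object; illustrative, documented per sport.
STATE_SCHEMAS = {
    "NBA": {"quarter", "score_home", "score_away", "time_remaining_sec"},
    "NHL": {"period", "score_home", "score_away", "time_remaining_sec"},
    "soccer": {"clock_display", "score_home", "score_away"},
}

def validate_state(sport: str, state: dict) -> dict:
    """Fail loudly at parse time instead of serving a half-built envelope."""
    required = STATE_SCHEMAS.get(sport)
    if required is None:
        raise ValueError(f"No state schema registered for sport {sport!r}")
    missing = required - state.keys()
    if missing:
        raise ValueError(f"{sport} state missing fields: {sorted(missing)}")
    return state
```

Failing at parse time keeps a schema surprise from one sport out of the cache, where it would otherwise surface as a confusing 500 on the read path.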
For most sports, an XGBoost classifier with isotonic calibration on top is the right starting point. XGBoost handles the feature interactions; isotonic regression rescales the probabilities to be calibrated against observed outcomes.
```python
from sklearn.isotonic import IsotonicRegression
from xgboost import XGBClassifier

# Train base model
clf = XGBClassifier(
    n_estimators=400,
    max_depth=5,
    learning_rate=0.05,
    objective="binary:logistic",
)
clf.fit(X_train, y_train)

# Fit isotonic calibrator on a held-out set
raw_probs = clf.predict_proba(X_calib)[:, 1]
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw_probs, y_calib)

# At inference: predict, then calibrate
def predict_calibrated(features):
    raw = clf.predict_proba(features)[:, 1]
    return iso.transform(raw)
```
Always evaluate calibration. Expected Calibration Error is the right metric: bin predictions into 10-15 buckets, compute the absolute difference between the bin's mean predicted probability and its observed frequency, weight by bin size. Aim for ECE under 0.05 in production.
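That recipe translates directly into a few lines of NumPy. The sketch below uses equal-width bins, with the top bin closed so a prediction of exactly 1.0 is still counted:

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """ECE: bin predictions, compare each bin's mean predicted probability
    to its observed frequency, and weight the gaps by bin size."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins [lo, hi), except the last bin which includes 1.0
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if not mask.any():
            continue
        gap = abs(probs[mask].mean() - outcomes[mask].mean())
        ece += mask.mean() * gap
    return float(ece)
```

Recompute this over a trailing window of settled games, not just once at train time; calibration drifts as teams, rosters, and ESPN's upstream fields change.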
FastAPI is the right choice for a Python prediction API. Async-friendly, auto-generated OpenAPI docs, Pydantic models for request/response validation. A single FastAPI worker can comfortably handle a few hundred requests per second; a fleet of four workers behind a process manager handles thousands.
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class PredictionResponse(BaseModel):
    sport: str
    game_id: str
    fair_prob_home: float
    fair_prob_calibrated: float
    as_of: str


@app.get("/v1/predict/{sport}/{game_id}", response_model=PredictionResponse)
async def predict(sport: str, game_id: str):
    state = get_live_state(sport, game_id)
    if state is None:
        raise HTTPException(404, f"No live state for {sport}/{game_id}")
    raw, calibrated = predict_for(sport, state)
    return PredictionResponse(
        sport=sport,
        game_id=game_id,
        fair_prob_home=raw,
        fair_prob_calibrated=calibrated,
        as_of=state["as_of"],
    )
```
The get_live_state function reads from a hot in-memory cache populated by a separate background poller. The API never blocks on an external HTTP call — if the cache does not have current state, the request returns 404 rather than waiting on ESPN.
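A sketch of what get_live_state can look like, assuming the poller stamps every cached envelope with its as_of field; the staleness threshold and the sport check are illustrative choices, not fixed API:

```python
from datetime import datetime, timezone

CACHE: dict[str, dict] = {}   # game_id -> envelope, written by the background poller
MAX_STALENESS_SEC = 60        # illustrative; tune relative to each sport's cadence

def get_live_state(sport: str, game_id: str):
    """Pure in-memory read: a missing or stale entry both map to None (-> 404)."""
    state = CACHE.get(game_id)
    if state is None or state.get("sport") != sport:
        return None
    as_of = datetime.strptime(state["as_of"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    age_sec = (datetime.now(timezone.utc) - as_of).total_seconds()
    return state if age_sec <= MAX_STALENESS_SEC else None
```

Treating stale data as missing is a deliberate choice: a 404 is honest, while a confidently served probability computed from a five-minute-old score is not.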
The polling layer runs as a separate task pool, one task per sport. Each task polls its sport's scoreboard on a sport-appropriate cadence (NBA every 5s, MLB every 10s, soccer every 15s), updates the in-memory cache, and emits diagnostic logs.
```python
import asyncio

CACHE = {}  # game_id -> state


async def poll_sport(sport: str, cadence_sec: int):
    # scoreboard_url and parse_event are the sport-specific helpers
    # that map ESPN's payload into the envelope schema above.
    while True:
        try:
            # fetch_espn blocks on requests; run it off the event loop
            # so one slow sport cannot stall every other poller.
            data = await asyncio.to_thread(fetch_espn, scoreboard_url(sport))
            if data:
                for event in data.get("events", []):
                    state = parse_event(event)
                    CACHE[state["game_id"]] = state
        except Exception as e:
            logger.exception(f"Polling failed for {sport}: {e}")
        await asyncio.sleep(cadence_sec)


@app.on_event("startup")
async def startup():
    sports = [("NBA", 5), ("NHL", 5), ("MLB", 10), ("soccer", 15)]
    for sport, cadence in sports:
        asyncio.create_task(poll_sport(sport, cadence))
```
This pattern decouples the API response latency from the upstream poll latency. Even if ESPN is having a bad day, the API stays responsive.
Three metrics matter most:
- Freshness: the age of as_of at read time. A stalled poller shows up here long before a user complains.
- Latency: p99 response time at the API edge. The handler only reads memory, so this should stay in the low milliseconds.
- Calibration drift: ECE recomputed over a trailing window of settled games. A model can keep its accuracy while slowly losing calibration.
Production will surface failure modes your local dev environment never showed: scoreboards that return valid JSON with an empty events list mid-game, 429 storms on playoff nights, schema quirks that only appear once you add sport three, and a single sport's poller task dying silently while the others keep running.
A production-grade sports prediction API is 10% modeling and 90% pipeline. ESPN's JSON feeds give you the raw material for free. Defensive scraping, decoupled polling, careful schema design, and continuous calibration monitoring are what turn the raw material into a service you can charge for.
ZenHodl runs the architecture described in this post across 11 sports. Calibrated probabilities, sub-30-second updates, free seven-day trial.
Try ZenHodl free