The Circuit Breaker Pattern for Trading Bots in Python

Published May 5, 2026 · 12 min read · By SsysTech Softwares

Your trading bot has just lost money three days in a row on one strategy. The model is bad, or the regime changed, or the data feed is corrupt; you don't know yet. By the time you notice and step in, it may well have lost for three more days. That's the gap a circuit breaker closes.

The circuit breaker pattern, borrowed from electrical engineering and microservices, gives a system the ability to automatically suspend a failing component until the failure passes. In a trading bot, "failing component" is usually a single strategy or instrument that's bleeding while the rest of the portfolio is fine. You want to disable just that strategy, not the whole bot, and you want it to come back online automatically when conditions normalize — not next Tuesday after you remember to flip a switch.

This post walks through a production circuit breaker we run across 10 sports on automated prediction market bots. It's hysteresis-protected, fail-open, JSON-persisted, and roughly 200 lines of Python including tests. All code targets Python 3.10+: the runtime check uses only the standard library, and the nightly evaluator adds pandas for the rolling P&L calculation.

Contents

  1. Why every trading bot needs one
  2. Design: hysteresis, fail-open, sample minimums
  3. Persistent state with JSON
  4. The core breaker class
  5. Wiring into the trading loop
  6. The nightly evaluation cron
  7. Anti-patterns: what not to do
  8. Testing the breaker

Why Every Trading Bot Needs One

Without a circuit breaker, a bot in drawdown has only two states: trading and not trading. Switching between them requires a human noticing, deciding, and acting. That loop has a typical latency of 24-72 hours, during which a degraded strategy continues to lose money.

A daily P&L kill-switch (cap session losses at −$50, restart tomorrow) handles intra-day failures. The circuit breaker handles multi-day failures — the slower, more expensive class. The two are complementary: kill-switch is intra-session, breaker is multi-session.
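
For contrast, the intra-session kill-switch can be sketched in a few lines. This is an illustrative sketch rather than the production implementation: the class name DailyKillSwitch, its method names, and the idea of feeding it realized dollar P&L per fill are all assumptions. Only the −$50 cap and the "restart tomorrow" behavior come from the paragraph above.

```python
from datetime import date
from typing import Optional


class DailyKillSwitch:
    """Hypothetical intra-session guard: stop trading once today's losses hit a cap."""

    def __init__(self, max_daily_loss: float = 50.0):
        self.max_daily_loss = max_daily_loss   # dollars; the -$50 cap from the text
        self._day: Optional[date] = None
        self._pnl = 0.0

    def _roll(self, today: Optional[date]) -> None:
        # A new calendar day starts a fresh session with a zeroed counter.
        today = today or date.today()
        if today != self._day:
            self._day, self._pnl = today, 0.0

    def record_fill(self, pnl_dollars: float, today: Optional[date] = None) -> None:
        """Add the realized P&L of one settled trade to today's running total."""
        self._roll(today)
        self._pnl += pnl_dollars

    def allow(self, today: Optional[date] = None) -> bool:
        """True while today's losses are still inside the cap."""
        self._roll(today)
        return self._pnl > -self.max_daily_loss
```

The kill-switch has no memory across days, which is exactly why it cannot catch a strategy that loses a little every session. That multi-day view is what the breaker adds.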

Design: Hysteresis, Fail-Open, Sample Minimums

Three properties make the difference between a useful breaker and one that constantly false-fires:

1. Hysteresis. The disable threshold and the re-enable threshold are different: trip below −5% rolling 30-day ROI, re-enable only above 0% rolling 30-day ROI. With a single threshold, the breaker flaps on and off every time ROI wobbles across that one line, which happens constantly. With the gap, the strategy has to actually recover before resuming.

2. Fail-open semantics. If the breaker can't read its state file, can't compute P&L, or hits any exception, it returns "allow trade." A breaker that fails closed on its own bug is worse than no breaker — it silently halts your entire system. Always default to permissive.

3. Minimum sample size. A strategy with 8 trades and −7% ROI over the last 30 days isn't necessarily broken — that's noise. Require at least 30 trades in the window before the breaker is allowed to trip. Otherwise you'll disable strategies for being unlucky on small samples.
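
To make the flapping argument concrete, here is a small sketch that counts status changes with and without the gap. The function count_flips and the ROI series are made up for illustration, and the sketch ignores the minimum-sample rule to keep the comparison focused on the thresholds.

```python
def count_flips(roi_series, trip, recover):
    """Count how often a breaker with these thresholds changes state."""
    blocked, flips = False, 0
    for roi in roi_series:
        if blocked and roi > recover:
            blocked, flips = False, flips + 1     # re-enable
        elif not blocked and roi < trip:
            blocked, flips = True, flips + 1      # trip
    return flips


# Made-up rolling 30-day ROI readings that hover around -5% before recovering.
roi = [-0.02, -0.06, -0.04, -0.06, -0.03, -0.055, -0.01, 0.01]

single_threshold = count_flips(roi, trip=-0.05, recover=-0.05)
with_hysteresis = count_flips(roi, trip=-0.05, recover=0.0)
print(single_threshold, with_hysteresis)
```

On this toy series the single-threshold variant changes state six times, while the hysteresis variant trips once and re-enables once.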

Production note: we tuned these parameters by walking forward across a year of live trading data and asking, for each candidate (trip threshold, recovery threshold, min sample), how many real degradations would have been caught and how many healthy strategies would have been false-tripped. −5%/+0%/N=30 was the sweet spot.
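
A parameter sweep of that kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the replay and sweep helpers, the shape of the history data (one (roi_30d, n_30d) reading per day per strategy), and the hand-labelled set of strategies that really degraded. It also simplifies the small-sample rule to "never change state" rather than modelling a separate monitor status.

```python
from itertools import product


def replay(history, trip, recover, min_n):
    """Replay the rule over daily (roi_30d, n_30d) readings.

    Returns (number_of_trips, days_blocked).
    """
    blocked, trips, days_blocked = False, 0, 0
    for roi, n in history:
        if n >= min_n:                       # simplification: small samples change nothing
            if blocked and roi > recover:
                blocked = False
            elif not blocked and roi < trip:
                blocked, trips = True, trips + 1
        days_blocked += blocked
    return trips, days_blocked


def sweep(histories, degraded, grid):
    """Count real degradations caught and healthy strategies tripped per candidate."""
    rows = []
    for trip, recover, min_n in grid:
        caught = false_trips = 0
        for name, history in histories.items():
            tripped = replay(history, trip, recover, min_n)[0] > 0
            if name in degraded:
                caught += tripped
            else:
                false_trips += tripped
        rows.append({"trip": trip, "recover": recover, "min_n": min_n,
                     "caught": caught, "false_trips": false_trips})
    return rows


# Toy data: "A" really degraded; "B" was only unlucky on a small sample.
histories = {
    "A": [(0.01, 60), (-0.03, 62), (-0.07, 65), (-0.08, 66)],
    "B": [(0.02, 12), (-0.09, 14), (-0.02, 20), (0.03, 25)],
}
grid = list(product([-0.03, -0.05], [0.0], [10, 30]))
rows = sweep(histories, degraded={"A"}, grid=grid)
```

In this toy run, every candidate catches the real degradation, but only the candidates with a minimum sample of 30 avoid tripping the unlucky strategy.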

Persistent State with JSON

The breaker state needs to survive process restarts and be readable by humans (so you can see what's blocked and why). A flat JSON file is the simplest workable choice:

{
  "as_of": "2026-05-05T03:49:12Z",
  "as_of_unix": 1777952952,
  "sports": {
    "NBA":     {"status": "active",   "roi_30d": 0.034, "n_30d": 87,  "tripped_at": null},
    "NHL":     {"status": "blocked",  "roi_30d": -0.071, "n_30d": 42, "tripped_at": "2026-05-03T03:49:08Z", "reason": "roi_below_threshold"},
    "MLB":     {"status": "active",   "roi_30d": 0.012, "n_30d": 134, "tripped_at": null},
    "NCAAMB":  {"status": "active",   "roi_30d": 0.058, "n_30d": 211, "tripped_at": null},
    "TENNIS":  {"status": "active",   "roi_30d": 0.021, "n_30d": 71,  "tripped_at": null},
    "CS2":     {"status": "monitor",  "roi_30d": -0.018, "n_30d": 18, "tripped_at": null, "note": "below_min_n"}
  }
}

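Three statuses: active (allow trades), blocked (reject trades), and monitor (allow but flag, because the sport is below the minimum sample). The cron writes this file once per day, stamping it with both as_of (ISO 8601, for humans) and as_of_unix (epoch seconds, for the runtime staleness check); the bot reads it on every signal evaluation.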
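
Because the file is plain JSON, a quick "what's blocked and why" view takes only a few lines. The helper below is a hypothetical convenience, not part of the breaker itself. The name summarize_state and the output layout are assumptions, while the field names follow the sample file above.

```python
def summarize_state(state: dict) -> list[str]:
    """Return one human-readable line per sport, blocked sports first."""
    order = {"blocked": 0, "monitor": 1, "active": 2}
    sports = state.get("sports", {})
    ranked = sorted(
        sports.items(),
        key=lambda kv: (order.get(kv[1].get("status"), 3), kv[0]),
    )
    lines = []
    for name, s in ranked:
        status = s.get("status", "active")
        roi = s.get("roi_30d")
        roi_txt = f"{roi:+.1%}" if isinstance(roi, (int, float)) else "n/a"
        detail = s.get("reason") or s.get("note") or ""
        line = f"{name:<8}{status:<9}roi_30d={roi_txt} n={s.get('n_30d', '?')} {detail}"
        lines.append(line.rstrip())
    return lines
```

Calling it on the parsed state file and printing the lines puts anything blocked at the top, followed by sports still under the minimum sample.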

The Core Breaker Class

The breaker has two halves: the daily evaluator (decides who's blocked, writes JSON) and the runtime check (reads JSON, returns allow/deny). Here's the runtime side — the hot path the bot calls before placing every order:

import json
import time
from pathlib import Path
from typing import Tuple

STATE_PATH = Path(__file__).resolve().parent / "sport_circuit_state.json"
STALE_AFTER_SECONDS = 60 * 60 * 30   # 30h: cron runs daily, allow some slack


class CircuitBreaker:
    """Reads JSON state, returns allow/deny per sport. Fail-open."""

    def __init__(self, state_path: Path = STATE_PATH):
        self.state_path = state_path
        self._cache = None
        self._cache_mtime = 0.0

    def _load(self) -> dict:
        try:
            mtime = self.state_path.stat().st_mtime
            if self._cache is not None and mtime == self._cache_mtime:
                return self._cache
            with self.state_path.open() as f:
                state = json.load(f)
            self._cache, self._cache_mtime = state, mtime
            return state
        except (FileNotFoundError, json.JSONDecodeError, OSError):
            return {}

    def check(self, sport: str) -> Tuple[bool, str]:
        """Return (allow, reason). Fail open on any error."""
        try:
            state = self._load()
            if not state:
                return True, "state_missing_fail_open"

            # Stale state == fail open: cron probably broke
            as_of = state.get("as_of_unix", 0)
            if time.time() - as_of > STALE_AFTER_SECONDS:
                return True, "state_stale_fail_open"

            sport_state = state.get("sports", {}).get(sport)
            if sport_state is None:
                return True, "sport_unknown_fail_open"

            status = sport_state.get("status", "active")
            if status == "blocked":
                return False, "circuit_breaker_blocked"
            return True, status
        except Exception:
            return True, "exception_fail_open"

Notice the fail-open paths: missing file, stale state, missing sport, exception — all return True. The breaker can never accidentally block trading because of its own bug. The price you pay for this is that a broken breaker will silently let through trades you wanted to block. That's the right trade-off for a non-critical safety net; if you want hard guarantees, layer a second mechanism (kill-switch, manual review).

Wiring into the Trading Loop

The breaker should sit immediately before order submission, after all your signal logic. That way you keep generating signals (so your CLV log accumulates and you can later analyze what would have happened) but you don't trade them:

breaker = CircuitBreaker()

def evaluate_signal(candidate):
    # ... existing checks: edge, fair_prob, max_entry, etc ...
    if not basic_checks_pass(candidate):
        return None

    # Last gate before placing the order
    allow, reason = breaker.check(candidate.sport)
    if not allow:
        log_rejected_signal(candidate, reason=reason)
        return None

    return submit_order(candidate)

Logging rejected signals is important — it gives you the counterfactual. After a sport recovers, you can backtest: "what would my P&L have been if I'd kept trading?" If the breaker correctly avoided losses, the answer validates the design. If you would have made money, you tune the threshold.
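
A counterfactual check along those lines could be sketched as follows. It assumes the rejected-signal log is later backfilled with the same pnl_c and size fields the trade log uses once each market resolves. That backfill step and the function name counterfactual_pnl are assumptions for illustration, not something the logging call above does on its own.

```python
from collections import defaultdict


def counterfactual_pnl(rejected_rows):
    """Dollar P&L the blocked signals would have produced, per sport."""
    totals = defaultdict(float)
    for row in rejected_rows:
        if row.get("pnl_c") is None:      # market not resolved yet: nothing to count
            continue
        # pnl_c is cents per contract, size is contracts, so divide by 100 for dollars
        totals[row["sport"]] += row["pnl_c"] * row["size"] / 100
    return dict(totals)
```

A clearly negative total for a sport while it was blocked supports the thresholds. A clearly positive one is the cue to revisit them offline.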

The Nightly Evaluation Cron

The other half of the breaker is the daily job that recomputes status. Read your trade log, slice the last 30 days, compute size-weighted ROI per sport, apply the trip/recovery rules, write the JSON.

import json
import time
import pandas as pd
from datetime import datetime, timezone, timedelta
from pathlib import Path

TRADES_PATH = Path("trades.jsonl")
STATE_PATH = Path("sport_circuit_state.json")
ROI_TRIP = -0.05        # disable below -5% ROI
ROI_RECOVER = 0.0       # re-enable above 0% ROI
MIN_N = 30              # minimum trades in window
WINDOW_DAYS = 30


def load_trades() -> pd.DataFrame:
    rows = []
    with TRADES_PATH.open() as f:
        for line in f:
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                continue
    return pd.DataFrame(rows)


def evaluate_breaker():
    df = load_trades()
    df["ts"] = pd.to_datetime(df["ts"], utc=True, errors="coerce")
    df = df[df["resolved"].notna()]
    cutoff = datetime.now(timezone.utc) - timedelta(days=WINDOW_DAYS)
    df = df[df["ts"] >= cutoff]

    # Size-weighted ROI per sport
    df["dollar_pnl"] = df["pnl_c"] * df["size"] / 100
    df["dollar_size"] = df["entry_price_c"] * df["size"] / 100
    grouped = df.groupby("sport").agg(
        pnl=("dollar_pnl", "sum"),
        cost=("dollar_size", "sum"),
        n=("dollar_pnl", "count"),
    )
    grouped["roi_30d"] = grouped["pnl"] / grouped["cost"]

    # Read previous state for hysteresis
    try:
        with STATE_PATH.open() as f:
            prev = json.load(f).get("sports", {})
    except (FileNotFoundError, json.JSONDecodeError):
        prev = {}

    new_state = {}
    for sport, row in grouped.iterrows():
        prev_status = prev.get(sport, {}).get("status", "active")

        if row["n"] < MIN_N:
            new_state[sport] = {
                "status": "monitor", "roi_30d": float(row["roi_30d"]),
                "n_30d": int(row["n"]), "note": "below_min_n",
            }
            continue

        if prev_status == "blocked":
            # Hysteresis: only re-enable when above recovery threshold
            if row["roi_30d"] > ROI_RECOVER:
                status, reason = "active", "recovered"
            else:
                status, reason = "blocked", "still_below_threshold"
        else:
            if row["roi_30d"] < ROI_TRIP:
                status, reason = "blocked", "roi_below_threshold"
            else:
                status, reason = "active", "ok"

        new_state[sport] = {
            "status": status, "roi_30d": float(row["roi_30d"]),
            "n_30d": int(row["n"]), "reason": reason,
            "tripped_at": (
                datetime.now(timezone.utc).isoformat()
                if status == "blocked" and prev_status != "blocked"
                else prev.get(sport, {}).get("tripped_at")
            ),
        }

    out = {
        "as_of": datetime.now(timezone.utc).isoformat(),
        "as_of_unix": int(time.time()),
        "sports": new_state,
    }
    tmp = STATE_PATH.with_suffix(".tmp")
    with tmp.open("w") as f:
        json.dump(out, f, indent=2)
    tmp.replace(STATE_PATH)


if __name__ == "__main__":
    evaluate_breaker()
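
The size-weighted ROI in the evaluator is easy to check by hand. The sketch below redoes the same arithmetic in plain Python on two made-up trades. The helper name size_weighted_roi is an assumption, and the zero-cost guard is an addition that the evaluator above does not have.

```python
def size_weighted_roi(trades):
    """Total dollar P&L divided by total dollars staked."""
    pnl = sum(t["pnl_c"] * t["size"] for t in trades) / 100
    cost = sum(t["entry_price_c"] * t["size"] for t in trades) / 100
    return pnl / cost if cost else float("nan")   # guard against an empty window


# Made-up trades: a large winner and a small loser.
trades = [
    {"entry_price_c": 40, "pnl_c": 60, "size": 100},   # staked $40, made $60
    {"entry_price_c": 50, "pnl_c": -50, "size": 10},   # staked $5, lost $5
]
weighted = size_weighted_roi(trades)                   # 55 / 45
unweighted = sum(t["pnl_c"] / t["entry_price_c"] for t in trades) / len(trades)
```

Here the weighted figure is 55/45, about 1.22, while the plain average of per-trade ROIs is 0.25. Weighting by size keeps a string of tiny positions from dominating the breaker's view of a sport.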

Schedule via cron, daily, after your data-aggregation jobs:

# 03:49 daily, 10 minutes after the CLV bucket aggregator
49 3 * * * cd /opt/yourapp && python3 sport_circuit_breaker.py >> /var/log/breaker.log 2>&1

Without this cron, the breaker is a no-op. The state file goes stale, the runtime check fails open on staleness, and your bot trades through any drawdown. We learned this the hard way after deploying the breaker code without the cron entry — everything looked fine until we noticed nothing was ever blocking, three weeks later. Always pair the runtime code with a cron-health alert.
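
A cron-health check can be as simple as reading the same as_of_unix stamp from a separate monitoring job. The sketch below is one possible shape under stated assumptions: the function names, the 26-hour limit (chosen to fire before the runtime's 30-hour staleness cutoff), and the choice to return a message instead of calling a specific alerting service are all hypothetical.

```python
import json
import time
from pathlib import Path

MAX_AGE_SECONDS = 26 * 60 * 60   # assumed: daily cron plus two hours of slack


def state_health(state, now=None, max_age=MAX_AGE_SECONDS):
    """Return (healthy, message) from the as_of_unix stamp in a loaded state dict."""
    now = time.time() if now is None else now
    try:
        age = now - float(state["as_of_unix"])
    except (KeyError, TypeError, ValueError):
        return False, "as_of_unix missing or malformed"
    hours = age / 3600
    if age > max_age:
        return False, f"state is {hours:.1f}h old, cron has probably stopped"
    return True, f"state is {hours:.1f}h old"


def check_state_file(path, now=None):
    """Unlike the runtime check, this fails closed: unreadable means unhealthy."""
    try:
        state = json.loads(Path(path).read_text())
    except (OSError, ValueError) as exc:      # JSONDecodeError is a ValueError
        return False, f"state unreadable: {type(exc).__name__}"
    return state_health(state, now=now)
```

The asymmetry is deliberate. The runtime check fails open so a bug never halts trading, while the monitor fails closed so the same bug never goes unnoticed.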

Anti-Patterns: What Not To Do

Things we tried that didn't work:

Testing the Breaker

Three tests we run in CI for the breaker:

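import json
import time

# CircuitBreaker is the runtime class shown earlier; import it from wherever
# that module lives in your project.


def test_fail_open_on_missing_file(tmp_path):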
    breaker = CircuitBreaker(state_path=tmp_path / "missing.json")
    allow, reason = breaker.check("NBA")
    assert allow is True
    assert "fail_open" in reason


def test_blocks_when_state_says_blocked(tmp_path):
    state = {
        "as_of_unix": int(time.time()),
        "sports": {"NBA": {"status": "blocked", "reason": "roi_below_threshold"}},
    }
    p = tmp_path / "s.json"
    p.write_text(json.dumps(state))
    breaker = CircuitBreaker(state_path=p)
    allow, reason = breaker.check("NBA")
    assert allow is False


def test_fail_open_on_stale_state(tmp_path):
    state = {
        "as_of_unix": int(time.time()) - 60 * 60 * 48,    # 48h old
        "sports": {"NBA": {"status": "blocked"}},
    }
    p = tmp_path / "s.json"
    p.write_text(json.dumps(state))
    breaker = CircuitBreaker(state_path=p)
    allow, reason = breaker.check("NBA")
    assert allow is True
    assert "stale" in reason

The stale-state test is the most important one to keep around. It's the one failure mode that's both common (cron fails silently for two days) and catastrophic if handled wrong (entire bot blocked on one bad cron run).

Putting It All Together

The full circuit breaker is roughly 200 lines of Python: 80 for the runtime check class, 100 for the cron evaluator, 20 for tests. It runs in production daily across 10 sports and has caught two material model degradations that would otherwise have eaten 5-7 days of P&L each before manual intervention.

The hardest part isn't writing it — it's choosing the parameters and committing to letting the breaker make decisions. Operators tend to override their own circuit breakers ("I think the strategy will recover, let me re-enable") and then watch the strategy keep losing. If you build the breaker, trust the breaker. Tune the thresholds offline; don't override them mid-drawdown.

See it running in production

ZenHodl's automated bots use this exact pattern across 10 sports. Per-sport breaker state, hysteresis, daily cron, fail-open semantics. The whole stack is part of the platform.
