
[QST] cuOpt self-hosted client: large MILP times out in Colab, while tiny/easy models return results #355

@Egecan33

Description

Summary

I’m solving a bipartite assignment-style MILP (binary matching with per-org side balances and slack variables) via the cuOpt self-hosted client from Google Colab.
• ✅ Small/easy models: solve successfully via the same client and endpoint.
• ❌ Large instance (~985 transactions × 326 organizations): HTTP request times out even with high polling_timeout (420–600s) and solver time_limit (180–600s).

I’m trying to determine whether this is an environment/network constraint, a request serialization/size limit, or a server-side issue with large models.

Environment
• Client: Google Colab (Python 3.12)
• cuOpt client packages:
  • cuopt_mps_parser — [TODO: paste pip show cuopt_mps_parser]
  • data_model (from cuOpt SDK) — [TODO]
  • cuopt_sh_client — [TODO]
• Other deps: numpy, polars, pandas, openpyxl (optionally fastexcel)
• Service host/port: localhost:5000 (self-hosted cuOpt service)
• Behavior: Easy toy models return solutions; the large Excel-backed MILP times out.
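
To fill in the package-version TODOs above, a quick snippet using the standard-library importlib.metadata works in the same Colab runtime (adjust the distribution names if your install differs):
Code
from importlib.metadata import version, PackageNotFoundError

for pkg in ("cuopt_mps_parser", "cuopt_sh_client", "data_model", "numpy", "polars", "pandas", "openpyxl"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: no distribution metadata found (vendored/local module?)")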

Expected vs Actual

Expected: The self-hosted service returns status, objective/best bound/gap, and (when available) variable values, even if the solve stops on time_limit.

Actual: For the large model, the request times out at the HTTP layer (examples below) despite the long polling_timeout; note that the trace reports Read timed out. (read timeout=30), so a 30-second read timeout still seems to apply somewhere. For small models, the same client call succeeds.

Observed errors (representative)

Connection aborted / Read timeout

ConnectionError: ('Connection aborted.', TimeoutError('timed out'))

ReadTimeout

ReadTimeout: HTTPConnectionPool(host='localhost', port=5000): Read timed out. (read timeout=30)

Earlier runs: ConnectionRefused

ConnectionRefusedError: [Errno 111] Connection refused
MaxRetryError: HTTPConnectionPool(host='localhost', port=5000) ...
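
To separate "nothing is listening" (the earlier Errno 111 runs) from "the service is up but stalls on the request" (the ReadTimeout runs), a plain TCP check against the port is enough; this is stdlib only and independent of the cuOpt API:
Code
import socket

try:
    with socket.create_connection(("localhost", 5000), timeout=5):
        print("TCP connect OK -> a service process is listening on localhost:5000")
except ConnectionRefusedError:
    print("Connection refused -> nothing is listening on port 5000")
except socket.timeout:
    print("TCP connect timed out -> port unreachable/filtered")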

Steps to reproduce
1. Load Excel → dicts (transactions & organizations).
2. Build MILP (binary x_l(i,j), x_s(i,j); per-org side balance with slacks sp/sn; each txn-side assigned exactly once; no org on both sides of same txn).
3. Build cuOpt DataModel and call client.get_LP_solve with high time_limit and polling_timeout.

  1. Minimal cuOpt client
Code
from typing import Dict, List, Tuple, Optional
import numpy as np
import cuopt_mps_parser
from data_model import DataModel
from cuopt_sh_client import ThinClientSolverSettings, CuOptServiceSelfHostClient

class CuOptModel:
    def __init__(self, name: str = "model", sense: str = "max"):
        self.name = name
        self.sense = sense.lower()
        self._var_order: List[str] = []
        self._var_meta: Dict[str, Dict] = {}
        self._constraints: List[Dict] = []
        self._objective: Dict[str, float] = {}

    def add_var(self, name: str, lb: float = 0.0, ub: float = np.inf, vtype: str = "C"):
        if name in self._var_meta:
            raise ValueError(f"Variable {name} already exists.")
        if vtype not in ("C","I"):
            raise ValueError("vtype must be 'C' or 'I'.")
        self._var_order.append(name)
        self._var_meta[name] = {"lb": lb, "ub": ub, "type": vtype}

    def set_objective(self, terms: List[Tuple[str, float]], sense: Optional[str] = None):
        if sense:
            self.sense = sense.lower()
        self._objective = {}
        for v, c in terms:
            if v not in self._var_meta:
                raise KeyError(f"Objective references unknown variable {v}")
            self._objective[v] = self._objective.get(v, 0.0) + float(c)

    def add_constraint(self, pairs: List[Tuple[str, float]],
                       lb: float = -np.inf, ub: float = np.inf, name: Optional[str] = None):
        for v, _ in pairs:
            if v not in self._var_meta:
                raise KeyError(f"Constraint references unknown variable {v}")
        self._constraints.append({
            "pairs": [(v, float(c)) for v, c in pairs],
            "lb": float(lb), "ub": float(ub), "name": name or f"c{len(self._constraints)+1}"
        })

    def build_datamodel(self) -> DataModel:
        dm = DataModel()
        var_names = np.array(self._var_order)
        lb = np.array([self._var_meta[n]["lb"] for n in self._var_order], dtype=np.float64)
        ub = np.array([self._var_meta[n]["ub"] for n in self._var_order], dtype=np.float64)
        vtypes = np.array([self._var_meta[n]["type"] for n in self._var_order])

        dm.set_variable_names(var_names)
        dm.set_variable_lower_bounds(lb)
        dm.set_variable_upper_bounds(ub)
        dm.set_variable_types(vtypes)

        obj = np.zeros(len(self._var_order), dtype=np.float64)
        for i, n in enumerate(self._var_order):
            obj[i] = float(self._objective.get(n, 0.0))
        dm.set_objective_coefficients(obj)
        dm.set_maximize(self.sense == "max")

        A_vals: List[float] = []
        A_idx: List[int] = []
        A_off: List[int] = [0]
        clb: List[float] = []
        cub: List[float] = []

        name2idx = {n:i for i, n in enumerate(self._var_order)}
        for cons in self._constraints:
            for (v, coef) in cons["pairs"]:
                A_vals.append(coef)
                A_idx.append(name2idx[v])
            A_off.append(len(A_vals))
            clb.append(cons["lb"])
            cub.append(cons["ub"])

        dm.set_csr_constraint_matrix(
            np.array(A_vals, dtype=np.float64),
            np.array(A_idx, dtype=np.int32),
            np.array(A_off, dtype=np.int32),
        )
        dm.set_constraint_lower_bounds(np.array(clb, dtype=np.float64))
        dm.set_constraint_upper_bounds(np.array(cub, dtype=np.float64))
        return dm

def solve_cuopt(dm: DataModel, time_limit: float = 180.0,
                ip: str = "localhost", port: int = 5000, polling_timeout=420):
    ss = ThinClientSolverSettings()
    ss.set_parameter("time_limit", time_limit)
    data = cuopt_mps_parser.toDict(dm)
    data["solver_config"] = ss.toDict()
    client = CuOptServiceSelfHostClient(ip=ip, port=port, polling_timeout=polling_timeout)
    return client.get_LP_solve(data, response_type="dict")
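
For reference, a tiny model along these lines is the kind that returns successfully through this exact wrapper (a minimal smoke test, using only the classes defined above):
Code
toy = CuOptModel(name="toy", sense="max")
toy.add_var("x", lb=0, ub=10, vtype="I")
toy.add_var("y", lb=0, ub=10, vtype="C")
toy.set_objective([("x", 1.0), ("y", 2.0)])
toy.add_constraint([("x", 1.0), ("y", 1.0)], lb=-np.inf, ub=10.0, name="cap")
print(solve_cuopt(toy.build_datamodel(), time_limit=10, polling_timeout=60))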
  2. Excel → dict loader (Polars + fallback)
Code
import polars as pl
import pandas as pd

file_name = "/content/contract_dash_example.xlsx"  # uploaded via Colab "files"

def read_excel_pl(path: str, sheet_name: str) -> pl.DataFrame:
    try:
        return pl.read_excel(path, sheet_name=sheet_name, engine="fastexcel")
    except Exception:
        pdf = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
        return pl.from_pandas(pdf)

# Transactions
txn_df = read_excel_pl(file_name, "idm_txn_log").drop(["dt", "c", "price"], strict=False)
txn_list = txn_df.to_dicts()
txn_d = {str(r["id"]): {k: v for k, v in r.items() if k != "id"} for r in txn_list}
txn_id_l = list(txn_d.keys())

# Organizations
org_df = (
    read_excel_pl(file_name, "org_summary")
    .filter((pl.col("idm_long") > 0) | (pl.col("idm_short") > 0))
    .with_columns(
        long=(pl.col("idm_long") * 10).cast(pl.Int64),
        short=(pl.col("idm_short") * 10).cast(pl.Int64),
        org_id=pl.int_range(0, pl.len()).add(1).cast(pl.String),
    )
    .select(["long", "short", "org_name", "org_id"])
)
org_d = {r["org_id"]: {k: v for k, v in r.items() if k != "org_id"} for r in org_df.to_dicts()}
org_id_l = list(org_d.keys())

print(f"Loaded {len(txn_d)} transactions and {len(org_d)} organizations.")
# -> Loaded 985 transactions and 326 organizations.
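
The model construction in the next section assumes each transaction row carries a numeric lots field and each organization row carries long/short, so a quick sanity check before building:
Code
# Verify the fields the MILP construction below relies on.
missing_lots = [i for i, r in txn_d.items() if r.get("lots") is None]
missing_pos = [j for j, r in org_d.items() if r.get("long") is None or r.get("short") is None]
assert not missing_lots, f"transactions missing 'lots': {missing_lots[:5]}"
assert not missing_pos, f"organizations missing 'long'/'short': {missing_pos[:5]}"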

  3. Build the MILP
Code
import numpy as np

m = CuOptModel(name='txn_match', sense='min')

x_l_vars, x_s_vars = {}, {}
sp_vars, sn_vars = {}, {}

# Variables
for side_symbol, side_label in [('l', 'long'), ('s', 'short')]:
    for j in org_id_l:
        any_var = False
        for i in txn_id_l:
            if org_d[j][side_label] >= txn_d[i]['lots']:
                vname = f"x_{side_symbol}({i}_{j})"
                m.add_var(vname, lb=0, ub=1, vtype='I')
                (x_l_vars if side_symbol=='l' else x_s_vars)[(j, i)] = vname
                any_var = True
        if any_var:
            sp_name, sn_name = f"sp_{side_symbol}_{j}", f"sn_{side_symbol}_{j}"
            m.add_var(sp_name, lb=0, ub=np.inf, vtype='C')
            m.add_var(sn_name, lb=0, ub=np.inf, vtype='C')
            sp_vars[(side_symbol, j)] = sp_name
            sn_vars[(side_symbol, j)] = sn_name

# Objective: minimize sum of slacks
obj_terms = [(nm,1.0) for nm in sp_vars.values()] + [(nm,1.0) for nm in sn_vars.values()]
m.set_objective(obj_terms, sense='min')

# (a) Per-org side balance: sum_i x*lots + sp - sn = position
for side_symbol, side_label in [('l','long'), ('s','short')]:
    for j in org_id_l:
        pairs = []
        if side_symbol == 'l':
            pairs += [(v, txn_d[i]['lots']) for (jj,i), v in x_l_vars.items() if jj == j]
        else:
            pairs += [(v, txn_d[i]['lots']) for (jj,i), v in x_s_vars.items() if jj == j]
        sp_name, sn_name = sp_vars.get((side_symbol,j)), sn_vars.get((side_symbol,j))
        if sp_name: pairs.append((sp_name, +1.0))
        if sn_name: pairs.append((sn_name, -1.0))
        if pairs:
            pos = float(org_d[j][side_label])
            m.add_constraint(pairs, lb=pos, ub=pos, name=f"pos_{side_symbol}_{j}")

# (b) Each txn side assigned to exactly one org
for side_symbol in ['l','s']:
    for i in txn_id_l:
        if side_symbol == 'l':
            pairs = [(v,1.0) for (j,i2), v in x_l_vars.items() if i2 == i]
        else:
            pairs = [(v,1.0) for (j,i2), v in x_s_vars.items() if i2 == i]
        if pairs:
            m.add_constraint(pairs, lb=1.0, ub=1.0, name=f"assign_{side_symbol}_{i}")

# (c) No org can take both sides of the same txn
for j in org_id_l:
    for i in txn_id_l:
        v_l, v_s = x_l_vars.get((j,i)), x_s_vars.get((j,i))
        if v_l and v_s:
            m.add_constraint([(v_l,1.0),(v_s,1.0)], lb=-np.inf, ub=1.0, name=f"no_both_{j}_{i}")
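
Printing the assembled model size helps quantify the request payload (relevant to questions 1 and 4 below); this just reads the builder's internal state:
Code
# Model size statistics for the large instance.
n_vars = len(m._var_order)
n_int = sum(1 for meta in m._var_meta.values() if meta["type"] == "I")
n_cons = len(m._constraints)
n_nnz = sum(len(c["pairs"]) for c in m._constraints)
print(f"variables: {n_vars} ({n_int} integer), constraints: {n_cons}, nonzeros: {n_nnz}")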
  4. Solve (fails for large instance)
Code
dm = m.build_datamodel()
print("Solving with cuOpt... (3-minute time limit)")

sol = solve_cuopt(
    dm,
    time_limit=180,      # also tried 600
    ip='localhost',
    port=5000,
    polling_timeout=420  # also tried 600
)

-> HTTP timeouts / connection aborted (see errors above)
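
Wrapping the same call to record the exception class and elapsed time shows whether the abort happens before or after time_limit; the traces above are requests/urllib3 exceptions, so the thin client appears to propagate them:
Code
import time
import requests

t0 = time.time()
try:
    sol = solve_cuopt(dm, time_limit=180, polling_timeout=420)
    print(f"solved after {time.time() - t0:.0f}s")
except requests.exceptions.ReadTimeout as e:
    print(f"ReadTimeout after {time.time() - t0:.0f}s (server accepted the request but never answered): {e}")
except requests.exceptions.ConnectionError as e:
    print(f"ConnectionError after {time.time() - t0:.0f}s (aborted/refused at the TCP/HTTP layer): {e}")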

What I’ve tried already
• Verified the same endpoint & client solves small models.
• Increased time_limit (180 → 600) and polling_timeout (420 → 600).
• Avoided the progress spinner / streaming helpers and called the old client path (get_LP_solve) directly.
• (Earlier) attempted to spin up the local service process in Colab; still hit timeouts for the large instance.

Questions for the cuOpt team
1. Are there request size / serialization limits or recommendations for large MILPs (payload size, model build time, etc.)?
2. Are there server-side timeouts or reverse proxies that could terminate long requests even when time_limit and client polling_timeout are large?
3. What’s the recommended pattern for long-running solves over HTTP (e.g., job submission + a polling endpoint) with the self-hosted cuOpt service?
4. Is there a suggested upper bound on variables/constraints per request, or guidance on decomposition for these models?
5. How can I enable server logs / debug to determine whether the service receives the request and where it stalls?

Checklist
• Small toy models solve via the same client and endpoint.
• Increased both solver time_limit and client polling_timeout.
• Confirmed Excel parsing & model assembly run without error.
• Collected server logs (how do I enable these for the self-hosted Python module? Please advise).
• Happy to test suggested patches or diagnostics.

Thanks in advance!
