
[QST] cuOpt self-hosted client: large MILP times out in Colab, while tiny/easy models return results #355

@Egecan33

Description

Summary

I’m solving a bipartite assignment-style MILP (binary matching with per-org side balances and slack variables) via the cuOpt self-hosted client from Google Colab.
• ✅ Small/easy models: solve successfully via the same client and endpoint.
• ❌ Large instance (~985 transactions × 326 organizations): HTTP request times out even with high polling_timeout (420–600s) and solver time_limit (180–600s).

I’m trying to determine whether this is an environment/network constraint, a request serialization/size limit, or a server-side issue with large models.

Environment
• Client: Google Colab (Python 3.12)
• cuOpt client packages:
  • cuopt_mps_parser — [TODO: paste pip show cuopt_mps_parser]
  • data_model (from cuOpt SDK) — [TODO]
  • cuopt_sh_client — [TODO]
• Other deps: numpy, polars, pandas, openpyxl (optionally fastexcel)
• Service host/port: localhost:5000 (self-hosted cuOpt service)
• Behavior: Easy toy models return solutions; the large Excel-backed MILP times out.
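
To fill in the package-version TODOs above, a quick snippet using the standard-library importlib.metadata works in the same Colab runtime (adjust the distribution names if your install differs):
Code
from importlib.metadata import version, PackageNotFoundError

for pkg in ("cuopt_mps_parser", "cuopt_sh_client", "data_model", "numpy", "polars", "pandas", "openpyxl"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: no distribution metadata found (vendored/local module?)")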

Expected vs Actual

Expected: The self-hosted service returns status, objective/best bound/gap, and (when available) variable values, even if the solve stops on time_limit.

Actual: For the large model, the request times out at the HTTP layer (examples below) despite the long polling_timeout; note that the trace reports Read timed out. (read timeout=30), so a 30-second read timeout still seems to apply somewhere. For small models, the same client call succeeds.

Observed errors (representative)

Connection aborted / Read timeout

ConnectionError: ('Connection aborted.', TimeoutError('timed out'))

ReadTimeout

ReadTimeout: HTTPConnectionPool(host='localhost', port=5000): Read timed out. (read timeout=30)

Earlier runs: ConnectionRefused

ConnectionRefusedError: [Errno 111] Connection refused
MaxRetryError: HTTPConnectionPool(host='localhost', port=5000) ...
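
To separate "nothing is listening" (the earlier Errno 111 runs) from "the service is up but stalls on the request" (the ReadTimeout runs), a plain TCP check against the port is enough; this is stdlib only and independent of the cuOpt API:
Code
import socket

try:
    with socket.create_connection(("localhost", 5000), timeout=5):
        print("TCP connect OK -> a service process is listening on localhost:5000")
except ConnectionRefusedError:
    print("Connection refused -> nothing is listening on port 5000")
except socket.timeout:
    print("TCP connect timed out -> port unreachable/filtered")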

Steps to reproduce
1. Load Excel → dicts (transactions & organizations).
2. Build MILP (binary x_l(i,j), x_s(i,j); per-org side balance with slacks sp/sn; each txn-side assigned exactly once; no org on both sides of same txn).
3. Build cuOpt DataModel and call client.get_LP_solve with high time_limit and polling_timeout.

  1. Minimal cuOpt client
Code
from typing import Dict, List, Tuple, Optional
import numpy as np
import cuopt_mps_parser
from data_model import DataModel
from cuopt_sh_client import ThinClientSolverSettings, CuOptServiceSelfHostClient

class CuOptModel:
    def __init__(self, name: str = "model", sense: str = "max"):
        self.name = name
        self.sense = sense.lower()
        self._var_order: List[str] = []
        self._var_meta: Dict[str, Dict] = {}
        self._constraints: List[Dict] = []
        self._objective: Dict[str, float] = {}

    def add_var(self, name: str, lb: float = 0.0, ub: float = np.inf, vtype: str = "C"):
        if name in self._var_meta:
            raise ValueError(f"Variable {name} already exists.")
        if vtype not in ("C","I"):
            raise ValueError("vtype must be 'C' or 'I'.")
        self._var_order.append(name)
        self._var_meta[name] = {"lb": lb, "ub": ub, "type": vtype}

    def set_objective(self, terms: List[Tuple[str, float]], sense: Optional[str] = None):
        if sense:
            self.sense = sense.lower()
        self._objective = {}
        for v, c in terms:
            if v not in self._var_meta:
                raise KeyError(f"Objective references unknown variable {v}")
            self._objective[v] = self._objective.get(v, 0.0) + float(c)

    def add_constraint(self, pairs: List[Tuple[str, float]],
                       lb: float = -np.inf, ub: float = np.inf, name: Optional[str] = None):
        for v, _ in pairs:
            if v not in self._var_meta:
                raise KeyError(f"Constraint references unknown variable {v}")
        self._constraints.append({
            "pairs": [(v, float(c)) for v, c in pairs],
            "lb": float(lb), "ub": float(ub), "name": name or f"c{len(self._constraints)+1}"
        })

    def build_datamodel(self) -> DataModel:
        dm = DataModel()
        var_names = np.array(self._var_order)
        lb = np.array([self._var_meta[n]["lb"] for n in self._var_order], dtype=np.float64)
        ub = np.array([self._var_meta[n]["ub"] for n in self._var_order], dtype=np.float64)
        vtypes = np.array([self._var_meta[n]["type"] for n in self._var_order])

        dm.set_variable_names(var_names)
        dm.set_variable_lower_bounds(lb)
        dm.set_variable_upper_bounds(ub)
        dm.set_variable_types(vtypes)

        obj = np.zeros(len(self._var_order), dtype=np.float64)
        for i, n in enumerate(self._var_order):
            obj[i] = float(self._objective.get(n, 0.0))
        dm.set_objective_coefficients(obj)
        dm.set_maximize(self.sense == "max")

        A_vals: List[float] = []
        A_idx: List[int] = []
        A_off: List[int] = [0]
        clb: List[float] = []
        cub: List[float] = []

        name2idx = {n:i for i, n in enumerate(self._var_order)}
        for cons in self._constraints:
            for (v, coef) in cons["pairs"]:
                A_vals.append(coef)
                A_idx.append(name2idx[v])
            A_off.append(len(A_vals))
            clb.append(cons["lb"])
            cub.append(cons["ub"])

        dm.set_csr_constraint_matrix(
            np.array(A_vals, dtype=np.float64),
            np.array(A_idx, dtype=np.int32),
            np.array(A_off, dtype=np.int32),
        )
        dm.set_constraint_lower_bounds(np.array(clb, dtype=np.float64))
        dm.set_constraint_upper_bounds(np.array(cub, dtype=np.float64))
        return dm

def solve_cuopt(dm: DataModel, time_limit: float = 180.0,
                ip: str = "localhost", port: int = 5000, polling_timeout=420):
    ss = ThinClientSolverSettings()
    ss.set_parameter("time_limit", time_limit)
    data = cuopt_mps_parser.toDict(dm)
    data["solver_config"] = ss.toDict()
    client = CuOptServiceSelfHostClient(ip=ip, port=port, polling_timeout=polling_timeout)
    return client.get_LP_solve(data, response_type="dict")
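
For reference, a tiny model along these lines is the kind that returns successfully through this exact wrapper (a minimal smoke test, using only the classes defined above):
Code
toy = CuOptModel(name="toy", sense="max")
toy.add_var("x", lb=0, ub=10, vtype="I")
toy.add_var("y", lb=0, ub=10, vtype="C")
toy.set_objective([("x", 1.0), ("y", 2.0)])
toy.add_constraint([("x", 1.0), ("y", 1.0)], lb=-np.inf, ub=10.0, name="cap")
print(solve_cuopt(toy.build_datamodel(), time_limit=10, polling_timeout=60))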
  2. Excel → dict loader (Polars + fallback)
Code
import polars as pl
import pandas as pd

file_name = "/content/contract_dash_example.xlsx"  # uploaded via Colab "files"

def read_excel_pl(path: str, sheet_name: str) -> pl.DataFrame:
    try:
        return pl.read_excel(path, sheet_name=sheet_name, engine="fastexcel")
    except Exception:
        pdf = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
        return pl.from_pandas(pdf)

# Transactions
txn_df = read_excel_pl(file_name, "idm_txn_log").drop(["dt", "c", "price"], strict=False)
txn_list = txn_df.to_dicts()
txn_d = {str(r["id"]): {k: v for k, v in r.items() if k != "id"} for r in txn_list}
txn_id_l = list(txn_d.keys())

# Organizations
org_df = (
    read_excel_pl(file_name, "org_summary")
    .filter((pl.col("idm_long") > 0) | (pl.col("idm_short") > 0))
    .with_columns(
        long=(pl.col("idm_long") * 10).cast(pl.Int64),
        short=(pl.col("idm_short") * 10).cast(pl.Int64),
        org_id=pl.int_range(0, pl.len()).add(1).cast(pl.String),
    )
    .select(["long", "short", "org_name", "org_id"])
)
org_d = {r["org_id"]: {k: v for k, v in r.items() if k != "org_id"} for r in org_df.to_dicts()}
org_id_l = list(org_d.keys())

print(f"Loaded {len(txn_d)} transactions and {len(org_d)} organizations.")
# -> Loaded 985 transactions and 326 organizations.
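
The model construction in the next section assumes each transaction row carries a numeric lots field and each organization row carries long/short, so a quick sanity check before building:
Code
# Verify the fields the MILP construction below relies on.
missing_lots = [i for i, r in txn_d.items() if r.get("lots") is None]
missing_pos = [j for j, r in org_d.items() if r.get("long") is None or r.get("short") is None]
assert not missing_lots, f"transactions missing 'lots': {missing_lots[:5]}"
assert not missing_pos, f"organizations missing 'long'/'short': {missing_pos[:5]}"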

  3. Build the MILP
Code
import numpy as np

m = CuOptModel(name='txn_match', sense='min')

x_l_vars, x_s_vars = {}, {}
sp_vars, sn_vars = {}, {}

# Variables
for side_symbol, side_label in [('l', 'long'), ('s', 'short')]:
    for j in org_id_l:
        any_var = False
        for i in txn_id_l:
            if org_d[j][side_label] >= txn_d[i]['lots']:
                vname = f"x_{side_symbol}({i}_{j})"
                m.add_var(vname, lb=0, ub=1, vtype='I')
                (x_l_vars if side_symbol=='l' else x_s_vars)[(j, i)] = vname
                any_var = True
        if any_var:
            sp_name, sn_name = f"sp_{side_symbol}_{j}", f"sn_{side_symbol}_{j}"
            m.add_var(sp_name, lb=0, ub=np.inf, vtype='C')
            m.add_var(sn_name, lb=0, ub=np.inf, vtype='C')
            sp_vars[(side_symbol, j)] = sp_name
            sn_vars[(side_symbol, j)] = sn_name

# Objective: minimize sum of slacks
obj_terms = [(nm,1.0) for nm in sp_vars.values()] + [(nm,1.0) for nm in sn_vars.values()]
m.set_objective(obj_terms, sense='min')

# (a) Per-org side balance: sum_i x*lots + sp - sn = position
for side_symbol, side_label in [('l','long'), ('s','short')]:
    for j in org_id_l:
        pairs = []
        if side_symbol == 'l':
            pairs += [(v, txn_d[i]['lots']) for (jj,i), v in x_l_vars.items() if jj == j]
        else:
            pairs += [(v, txn_d[i]['lots']) for (jj,i), v in x_s_vars.items() if jj == j]
        sp_name, sn_name = sp_vars.get((side_symbol,j)), sn_vars.get((side_symbol,j))
        if sp_name: pairs.append((sp_name, +1.0))
        if sn_name: pairs.append((sn_name, -1.0))
        if pairs:
            pos = float(org_d[j][side_label])
            m.add_constraint(pairs, lb=pos, ub=pos, name=f"pos_{side_symbol}_{j}")

# (b) Each txn side assigned to exactly one org
for side_symbol in ['l','s']:
    for i in txn_id_l:
        if side_symbol == 'l':
            pairs = [(v,1.0) for (j,i2), v in x_l_vars.items() if i2 == i]
        else:
            pairs = [(v,1.0) for (j,i2), v in x_s_vars.items() if i2 == i]
        if pairs:
            m.add_constraint(pairs, lb=1.0, ub=1.0, name=f"assign_{side_symbol}_{i}")

# (c) No org can take both sides of the same txn
for j in org_id_l:
    for i in txn_id_l:
        v_l, v_s = x_l_vars.get((j,i)), x_s_vars.get((j,i))
        if v_l and v_s:
            m.add_constraint([(v_l,1.0),(v_s,1.0)], lb=-np.inf, ub=1.0, name=f"no_both_{j}_{i}")
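
Printing the assembled model size helps quantify the request payload (relevant to questions 1 and 4 below); this just reads the builder's internal state:
Code
# Model size statistics for the large instance.
n_vars = len(m._var_order)
n_int = sum(1 for meta in m._var_meta.values() if meta["type"] == "I")
n_cons = len(m._constraints)
n_nnz = sum(len(c["pairs"]) for c in m._constraints)
print(f"variables: {n_vars} ({n_int} integer), constraints: {n_cons}, nonzeros: {n_nnz}")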
  4. Solve (fails for large instance)
Code
dm = m.build_datamodel()
print("Solving with cuOpt... (3-minute time limit)")

sol = solve_cuopt(
    dm,
    time_limit=180,      # also tried 600
    ip='localhost',
    port=5000,
    polling_timeout=420  # also tried 600
)

-> HTTP timeouts / connection aborted (see errors above)
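
Wrapping the same call to record the exception class and elapsed time shows whether the abort happens before or after time_limit; the traces above are requests/urllib3 exceptions, so the thin client appears to propagate them:
Code
import time
import requests

t0 = time.time()
try:
    sol = solve_cuopt(dm, time_limit=180, polling_timeout=420)
    print(f"solved after {time.time() - t0:.0f}s")
except requests.exceptions.ReadTimeout as e:
    print(f"ReadTimeout after {time.time() - t0:.0f}s (server accepted the request but never answered): {e}")
except requests.exceptions.ConnectionError as e:
    print(f"ConnectionError after {time.time() - t0:.0f}s (aborted/refused at the TCP/HTTP layer): {e}")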

What I’ve tried already
• Verified the same endpoint & client solves small models.
• Increased time_limit (180 → 600) and polling_timeout (420 → 600).
• Avoided the progress spinner / streaming helpers and called the old client path (get_LP_solve) directly.
• (Earlier) attempted to spin up the local service process in Colab; still hit timeouts for the large instance.

Questions for the cuOpt team
1. Are there request size / serialization limits or recommendations for large MILPs (payload size, model build time, etc.)?
2. Are there server-side timeouts or reverse proxies that could terminate long requests even when time_limit and client polling_timeout are large?
3. What’s the recommended pattern for long-running solves over HTTP (e.g., job submission + a polling endpoint) with the self-hosted cuOpt service?
4. Is there a suggested upper bound on variables/constraints per request, or guidance on decomposition for these models?
5. How can I enable server logs / debug to determine whether the service receives the request and where it stalls?

Checklist
• Small toy models solve via the same client and endpoint.
• Increased both solver time_limit and client polling_timeout.
• Confirmed Excel parsing & model assembly run without error.
• Collected server logs (how do I enable these for the self-hosted Python module? Please advise).
• Happy to test suggested patches or diagnostics.

Thanks in advance!
