Summary
I’m solving a bipartite assignment-style MILP (binary matching with per-org side balances and slack variables) via the cuOpt self-hosted client from Google Colab.
• ✅ Small/easy models: solve successfully via the same client and endpoint.
• ❌ Large instance (~985 transactions × 326 organizations): HTTP request times out even with high polling_timeout (420–600s) and solver time_limit (180–600s).
I’m trying to determine whether this is an environment/network constraint, request serialization/size limit, or a server-side issue under large models.
⸻
Environment
• Client: Google Colab (Python 3.12)
• cuOpt client packages
• cuopt_mps_parser — [TODO: paste pip show cuopt_mps_parser]
• data_model (from cuOpt SDK) — [TODO]
• cuopt_sh_client — [TODO]
• Other deps: numpy, polars, pandas, openpyxl (optionally fastexcel)
• Service host/port: localhost:5000 (self-hosted cuOpt service)
• Behavior: Easy toy models return solutions; the large Excel-backed MILP times out.
⸻
Expected vs Actual
Expected: The self-hosted service returns status, objective/best bound/gap, and (when available) variable values, even if the solve stops on time_limit.
Actual: For the large model, the request times out at the HTTP layer (examples below), despite long polling_timeout. For small models, the same client call succeeds.
⸻
Observed errors (representative)
Connection aborted / read timeout:

```text
ConnectionError: ('Connection aborted.', TimeoutError('timed out'))
ReadTimeout: HTTPConnectionPool(host='localhost', port=5000): Read timed out. (read timeout=30)
```

Earlier runs — connection refused:

```text
ConnectionRefusedError: [Errno 111] Connection refused
MaxRetryError: HTTPConnectionPool(host='localhost', port=5000) ...
```
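A quick way to separate the two failure modes above (nothing listening vs. listening but slow to respond) is a plain TCP probe before the solve call. This is a minimal stdlib sketch, not part of the cuOpt client; the host/port match the setup described here:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on (host, port).

    ConnectionRefused cases should return False (service not up);
    ReadTimeout cases should return True (service up, HTTP response slow).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("localhost", 5000))
```

Running this right before each solve attempt would show whether the service process itself died between the small and large runs.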
⸻
Steps to reproduce
1. Load Excel → dicts (transactions & organizations).
2. Build MILP (binary x_l(i,j), x_s(i,j); per-org side balance with slacks sp/sn; each txn-side assigned exactly once; no org on both sides of same txn).
3. Build cuOpt DataModel and call client.get_LP_solve with high time_limit and polling_timeout.
- Minimal cuOpt client

```python
from typing import Dict, List, Tuple, Optional
import numpy as np

import cuopt_mps_parser
from data_model import DataModel
from cuopt_sh_client import ThinClientSolverSettings, CuOptServiceSelfHostClient


class CuOptModel:
    """Small name-based model builder that emits a cuOpt DataModel."""

    def __init__(self, name: str = "model", sense: str = "max"):
        self.name = name
        self.sense = sense.lower()
        self._var_order: List[str] = []
        self._var_meta: Dict[str, Dict] = {}
        self._constraints: List[Dict] = []
        self._objective: Dict[str, float] = {}

    def add_var(self, name: str, lb: float = 0.0, ub: float = np.inf, vtype: str = "C"):
        if name in self._var_meta:
            raise ValueError(f"Variable {name} already exists.")
        if vtype not in ("C", "I"):
            raise ValueError("vtype must be 'C' or 'I'.")
        self._var_order.append(name)
        self._var_meta[name] = {"lb": lb, "ub": ub, "type": vtype}

    def set_objective(self, terms: List[Tuple[str, float]], sense: Optional[str] = None):
        if sense:
            self.sense = sense.lower()
        self._objective = {}
        for v, c in terms:
            if v not in self._var_meta:
                raise KeyError(f"Objective references unknown variable {v}")
            self._objective[v] = self._objective.get(v, 0.0) + float(c)

    def add_constraint(self, pairs: List[Tuple[str, float]],
                       lb: float = -np.inf, ub: float = np.inf,
                       name: Optional[str] = None):
        for v, _ in pairs:
            if v not in self._var_meta:
                raise KeyError(f"Constraint references unknown variable {v}")
        self._constraints.append({
            "pairs": [(v, float(c)) for v, c in pairs],
            "lb": float(lb), "ub": float(ub),
            "name": name or f"c{len(self._constraints) + 1}",
        })

    def build_datamodel(self) -> DataModel:
        dm = DataModel()
        var_names = np.array(self._var_order)
        lb = np.array([self._var_meta[n]["lb"] for n in self._var_order], dtype=np.float64)
        ub = np.array([self._var_meta[n]["ub"] for n in self._var_order], dtype=np.float64)
        vtypes = np.array([self._var_meta[n]["type"] for n in self._var_order])
        dm.set_variable_names(var_names)
        dm.set_variable_lower_bounds(lb)
        dm.set_variable_upper_bounds(ub)
        dm.set_variable_types(vtypes)

        obj = np.zeros(len(self._var_order), dtype=np.float64)
        for i, n in enumerate(self._var_order):
            obj[i] = float(self._objective.get(n, 0.0))
        dm.set_objective_coefficients(obj)
        dm.set_maximize(self.sense == "max")

        # Assemble the constraint matrix in CSR form.
        A_vals: List[float] = []
        A_idx: List[int] = []
        A_off: List[int] = [0]
        clb: List[float] = []
        cub: List[float] = []
        name2idx = {n: i for i, n in enumerate(self._var_order)}
        for cons in self._constraints:
            for (v, coef) in cons["pairs"]:
                A_vals.append(coef)
                A_idx.append(name2idx[v])
            A_off.append(len(A_vals))
            clb.append(cons["lb"])
            cub.append(cons["ub"])
        dm.set_csr_constraint_matrix(
            np.array(A_vals, dtype=np.float64),
            np.array(A_idx, dtype=np.int32),
            np.array(A_off, dtype=np.int32),
        )
        dm.set_constraint_lower_bounds(np.array(clb, dtype=np.float64))
        dm.set_constraint_upper_bounds(np.array(cub, dtype=np.float64))
        return dm


def solve_cuopt(dm: DataModel, time_limit: float = 180.0,
                ip: str = "localhost", port: int = 5000,
                polling_timeout: int = 420):
    ss = ThinClientSolverSettings()
    ss.set_parameter("time_limit", time_limit)
    data = cuopt_mps_parser.toDict(dm)
    data["solver_config"] = ss.toDict()
    client = CuOptServiceSelfHostClient(ip=ip, port=port, polling_timeout=polling_timeout)
    return client.get_LP_solve(data, response_type="dict")
```
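To test the request-size hypothesis from the Summary, it may help to measure the serialized body before sending. This sketch assumes the dict passed to get_LP_solve is JSON-serializable; the client's actual wire format may differ (e.g. compression or a binary encoding), so treat the number as a rough estimate only:

```python
import json

def payload_mib(data: dict) -> float:
    """Approximate request size in MiB if `data` were sent as JSON.

    The self-hosted client may encode the request differently, so this
    is only a back-of-the-envelope figure for the bug report.
    """
    return len(json.dumps(data).encode("utf-8")) / 2**20

# Toy payload standing in for the real model dict (hypothetical shape):
toy = {"csr_constraint_matrix": {"values": [1.0] * 100_000},
       "solver_config": {"time_limit": 180}}
print(f"{payload_mib(toy):.2f} MiB")
```

Reporting this figure for the 985×326 instance would let the team say whether it is anywhere near a request-size limit.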
- Excel → dict loader (Polars + fallback)

```python
import polars as pl
import pandas as pd

file_name = "/content/contract_dash_example.xlsx"  # uploaded via Colab "files"


def read_excel_pl(path: str, sheet_name: str) -> pl.DataFrame:
    """Prefer fastexcel; fall back to pandas/openpyxl if unavailable."""
    try:
        return pl.read_excel(path, sheet_name=sheet_name, engine="fastexcel")
    except Exception:
        pdf = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
        return pl.from_pandas(pdf)


# Transactions
txn_df = read_excel_pl(file_name, "idm_txn_log").drop(["dt", "c", "price"], strict=False)
txn_list = txn_df.to_dicts()
txn_d = {str(r["id"]): {k: v for k, v in r.items() if k != "id"} for r in txn_list}
txn_id_l = list(txn_d.keys())

# Organizations
org_df = (
    read_excel_pl(file_name, "org_summary")
    .filter((pl.col("idm_long") > 0) | (pl.col("idm_short") > 0))
    .with_columns(
        long=(pl.col("idm_long") * 10).cast(pl.Int64),
        short=(pl.col("idm_short") * 10).cast(pl.Int64),
        org_id=pl.int_range(0, pl.len()).add(1).cast(pl.String),
    )
    .select(["long", "short", "org_name", "org_id"])
)
org_d = {r["org_id"]: {k: v for k, v in r.items() if k != "org_id"} for r in org_df.to_dicts()}
org_id_l = list(org_d.keys())

print(f"Loaded {len(txn_d)} transactions and {len(org_d)} organizations.")
# -> Loaded 985 transactions and 326 organizations.
```
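Before building the model it is worth validating the loaded dicts, since a bad `lots` or position value would change the model silently rather than cause an HTTP error. A small sketch; the field names `lots`, `long`, and `short` match the columns used in the model build, but the specific checks are my assumptions about what valid data looks like:

```python
def sanity_check(txn_d: dict, org_d: dict) -> None:
    """Fail fast if the loaded data violates assumptions the model relies on."""
    for tid, row in txn_d.items():
        assert "lots" in row, f"txn {tid}: missing 'lots'"
        assert row["lots"] > 0, f"txn {tid}: non-positive lot size {row['lots']}"
    for oid, row in org_d.items():
        assert row["long"] >= 0 and row["short"] >= 0, f"org {oid}: negative position"

# Toy data shaped like the real dicts:
sanity_check(
    {"1": {"lots": 5}},
    {"1": {"long": 10, "short": 0, "org_name": "A"}},
)
print("sanity checks passed")
```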
- Build the MILP

```python
import numpy as np

m = CuOptModel(name="txn_match", sense="min")
x_l_vars, x_s_vars = {}, {}
sp_vars, sn_vars = {}, {}

# Variables: one binary per feasible (txn, org, side), plus a slack pair per org side
for side_symbol, side_label in [("l", "long"), ("s", "short")]:
    for j in org_id_l:
        any_var = False
        for i in txn_id_l:
            if org_d[j][side_label] >= txn_d[i]["lots"]:
                vname = f"x_{side_symbol}({i}_{j})"
                m.add_var(vname, lb=0, ub=1, vtype="I")
                (x_l_vars if side_symbol == "l" else x_s_vars)[(j, i)] = vname
                any_var = True
        if any_var:
            sp_name, sn_name = f"sp_{side_symbol}_{j}", f"sn_{side_symbol}_{j}"
            m.add_var(sp_name, lb=0, ub=np.inf, vtype="C")
            m.add_var(sn_name, lb=0, ub=np.inf, vtype="C")
            sp_vars[(side_symbol, j)] = sp_name
            sn_vars[(side_symbol, j)] = sn_name

# Objective: minimize sum of slacks
obj_terms = [(nm, 1.0) for nm in sp_vars.values()] + [(nm, 1.0) for nm in sn_vars.values()]
m.set_objective(obj_terms, sense="min")

# (a) Per-org side balance: sum_i x*lots + sp - sn = position
for side_symbol, side_label in [("l", "long"), ("s", "short")]:
    for j in org_id_l:
        pairs = []
        if side_symbol == "l":
            pairs += [(v, txn_d[i]["lots"]) for (jj, i), v in x_l_vars.items() if jj == j]
        else:
            pairs += [(v, txn_d[i]["lots"]) for (jj, i), v in x_s_vars.items() if jj == j]
        sp_name, sn_name = sp_vars.get((side_symbol, j)), sn_vars.get((side_symbol, j))
        if sp_name: pairs.append((sp_name, +1.0))
        if sn_name: pairs.append((sn_name, -1.0))
        if pairs:
            pos = float(org_d[j][side_label])
            m.add_constraint(pairs, lb=pos, ub=pos, name=f"pos_{side_symbol}_{j}")

# (b) Each txn side assigned to exactly one org
for side_symbol in ["l", "s"]:
    for i in txn_id_l:
        if side_symbol == "l":
            pairs = [(v, 1.0) for (j, i2), v in x_l_vars.items() if i2 == i]
        else:
            pairs = [(v, 1.0) for (j, i2), v in x_s_vars.items() if i2 == i]
        if pairs:
            m.add_constraint(pairs, lb=1.0, ub=1.0, name=f"assign_{side_symbol}_{i}")

# (c) No org can take both sides of the same txn
for j in org_id_l:
    for i in txn_id_l:
        v_l, v_s = x_l_vars.get((j, i)), x_s_vars.get((j, i))
        if v_l and v_s:
            m.add_constraint([(v_l, 1.0), (v_s, 1.0)], lb=-np.inf, ub=1.0,
                             name=f"no_both_{j}_{i}")
```
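Reporting the instance dimensions alongside this bug would let the team correlate size with the timeout. A standalone diagnostic helper that counts rows and nonzeros from a list shaped like `CuOptModel._constraints` (it reads a private attribute of the class above, purely for reporting):

```python
def constraint_dims(constraints) -> tuple:
    """(n_constraints, n_nonzeros) for a list shaped like CuOptModel._constraints."""
    nnz = sum(len(c["pairs"]) for c in constraints)
    return len(constraints), nnz

# Toy example mirroring the structure built above:
cons = [
    {"pairs": [("x1", 1.0), ("x2", 1.0)], "lb": 1.0, "ub": 1.0, "name": "assign_l_1"},
    {"pairs": [("x1", 1.0)], "lb": -float("inf"), "ub": 1.0, "name": "no_both_1_1"},
]
print(constraint_dims(cons))  # -> (2, 3)
```

For the real run this would be `constraint_dims(m._constraints)` plus `len(m._var_order)` for the variable count.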
- Solve (fails for large instance)

```python
dm = m.build_datamodel()
print("Solving with cuOpt... (3-minute time limit)")
sol = solve_cuopt(
    dm,
    time_limit=180,       # also tried 600
    ip="localhost",
    port=5000,
    polling_timeout=420,  # also tried 600
)
# -> HTTP timeouts / connection aborted (see errors above)
```
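When reporting, the time-to-failure helps distinguish a fixed HTTP/proxy timeout (failure at a roughly constant offset, e.g. ~30 s, regardless of model size) from a solver-side stall. A hypothetical stdlib wrapper around the solve call; `timed` is my own helper, not a cuOpt API:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn, returning (result_or_None, exception_or_None, elapsed_seconds)."""
    t0 = time.monotonic()
    try:
        return fn(*args, **kwargs), None, time.monotonic() - t0
    except Exception as exc:  # capture the HTTP error instead of crashing the notebook
        return None, exc, time.monotonic() - t0

# Usage with the solve above would look like:
#   sol, err, secs = timed(solve_cuopt, dm, time_limit=180)
#   print(type(err).__name__ if err else "ok", f"{secs:.1f}s")
res, err, secs = timed(lambda: 42)
print(res, err)
```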
⸻
What I’ve tried already
• Verified the same endpoint & client solves small models.
• Increased time_limit (180 → 600) and polling_timeout (420 → 600).
• Avoided progress spinner / streaming helpers; using old client path (get_LP_solve) directly.
• (Earlier) attempted to spin up the local service process in Colab; still hit timeouts for the large instance.
⸻
Questions for the cuOpt team
1. Are there request size / serialization limits or recommendations for large MILPs (payload size, model build time, etc.)?
2. Are there server-side timeouts or reverse proxies that could terminate long requests even when time_limit and client polling_timeout are large?
3. What’s the recommended pattern for long-running solves over HTTP (e.g., job submission plus a polling endpoint) with cuOpt self-hosted?
4. Is there a suggested upper bound on variables/constraints per request, or guidance on decomposition for these models?
5. How can I enable server logs / debug to determine whether the service receives the request and where it stalls?
⸻
Checklist
• Small toy models solve via the same client and endpoint.
• Increased both solver time_limit and client polling_timeout.
• Confirmed Excel parsing & model assembly run without error.
• Collected server logs (how to enable on self-hosted Python module? Please advise).
• Happy to test suggested patches or diagnostics.
⸻
Thanks in advance!