Mixed-integer Programming Backend

The ocean.mip module is the Gurobi-backed formulation. Most users start from ocean.mip.Explainer, while ocean.mip.Model and the variable classes are useful when you want to inspect or extend the lower-level formulation directly.

Mixed-integer programming backend for optimal counterfactual search.

class BaseModel(name='', env=None)[source]

Bases: ABC, Model

Base Gurobi model used by the MIP backend.

build_vars(*variables)[source]
class Explainer(ensemble, *, mapper, weights=None, isolation=None, isolation_threshold=None, name='OCEAN', env=None, epsilon=1.52587890625e-05, num_epsilon=1e-06, model_type=Type.MIP, flow_type=FlowType.CONTINUOUS)[source]

Bases: Model, BaseExplainer

Mixed-integer programming explainer for tree ensemble classifiers.

Initialize the explainer and its underlying MIP model for ensemble.

Parameters:
  • ensemble – Tree ensemble classifier to explain; its trees are parsed into the MIP formulation.

  • mapper – Feature mapper describing how processed query columns map to decision variables.

  • weights – Optional non-negative tree weights.

  • isolation – Optional isolation forest whose trees contribute to the auxiliary isolation constraint on the path length.

  • isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.

  • name – Name of the underlying Gurobi model.

  • env – Optional Gurobi environment.

  • epsilon – Classification margin used in the pairwise score constraints.

  • num_epsilon – Strictness constant used in numerical split implications.

  • model_type – Backend builder variant.

  • flow_type – Encoding used for the tree flow variables.
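The class scores that epsilon separates can be illustrated with a small pure-Python sketch. It assumes, for illustration only, that each parsed tree routes a point to a single leaf holding a vector of per-class scores, and that the ensemble score of class \(c\) is the weighted sum \(f_c(x) = \sum_t w_t\, v_{t,\ell_t(x)}[c]\). ToyTree and class_scores are hypothetical names, not part of ocean.mip.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ToyTree:
    # Hypothetical stand-in for a parsed tree: routes a point to a leaf index.
    route: Callable[[Sequence[float]], int]
    # leaf_values[l][c] is the score that leaf l assigns to class c.
    leaf_values: Sequence[Sequence[float]]


def class_scores(trees: Sequence[ToyTree], x: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return f_c(x) = sum_t w_t * v[t, leaf_t(x)][c] for every class c."""
    if weights is None:
        weights = [1.0] * len(trees)
    if any(w < 0 for w in weights):
        raise ValueError("tree weights must be non-negative")
    n_classes = len(trees[0].leaf_values[0])
    scores = [0.0] * n_classes
    for w, tree in zip(weights, trees):
        leaf = tree.route(x)
        for c in range(n_classes):
            scores[c] += w * tree.leaf_values[leaf][c]
    return scores


# Two stumps splitting on x[0] <= 0.5 and on x[1] <= 0.3.
t1 = ToyTree(lambda x: 0 if x[0] <= 0.5 else 1, [[0.9, 0.1], [0.2, 0.8]])
t2 = ToyTree(lambda x: 0 if x[1] <= 0.3 else 1, [[0.6, 0.4], [0.3, 0.7]])
scores = class_scores([t1, t2], [0.7, 0.1], weights=[1.0, 2.0])
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Here the point lands in leaf 1 of the first stump and leaf 0 of the second, giving scores of about 1.4 and 1.6, so class 1 is predicted.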

explain(x, *, y, norm, return_callback=False, verbose=False, max_time=60, num_workers=None, random_seed=42, clean_up=True)[source]

Solve one counterfactual query with the MIP backend.

Parameters:
  • x – Query instance in the processed feature space.

  • y – Target class enforced by the counterfactual.

  • norm – Distance norm. The MIP backend supports 0, 1, and 2.

  • return_callback – Whether to collect incumbent solutions through a Gurobi callback.

  • verbose – Whether to print Gurobi logs.

  • max_time – Time limit in seconds.

  • num_workers – Optional Gurobi thread count.

  • random_seed – Random seed passed to Gurobi.

  • clean_up – Whether to remove query-specific constraints after the solve.

Returns:

The decoded counterfactual, or None when no feasible counterfactual is found within the given limits.

Return type:

Explanation | None

Raises:

RuntimeError – If the solver stops with a status that the explainer does not handle.

get_anytime_solutions()[source]

Return incumbent solutions collected during the last solve.

Returns:

Time-stamped incumbent objective values when return_callback was enabled in explain(), otherwise None.

Return type:

list[dict[str, float]] | None

get_distance()[source]

Return the post-processed distance of the last counterfactual.

Returns:

Post-processed \(L_p\) distance for the last successful solve.

Return type:

float

Raises:

RuntimeError – If no explanation has been computed yet.

get_objective_value()[source]

Return the solver objective value of the last optimization run.

Returns:

Objective value reported by Gurobi for the latest solve.

Return type:

float

get_solving_status()[source]

Return the latest Gurobi solve status as a readable string.

Returns:

Current model status such as "OPTIMAL" or "TIME_LIMIT".

Return type:

str
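The readable status string corresponds to Gurobi's integer Model.Status code. The sketch below shows one way to translate such a code into a name, using the codes from Gurobi's optimization status table; the dictionary and the status_name helper are hypothetical, and the exact strings ocean.mip returns may differ.

```python
# Gurobi optimization status codes (Model.Status) mapped to their names.
GUROBI_STATUS_NAMES = {
    1: "LOADED",
    2: "OPTIMAL",
    3: "INFEASIBLE",
    4: "INF_OR_UNBD",
    5: "UNBOUNDED",
    6: "CUTOFF",
    7: "ITERATION_LIMIT",
    8: "NODE_LIMIT",
    9: "TIME_LIMIT",
    10: "SOLUTION_LIMIT",
    11: "INTERRUPTED",
    12: "NUMERIC",
    13: "SUBOPTIMAL",
    14: "INPROGRESS",
    15: "USER_OBJ_LIMIT",
    16: "WORK_LIMIT",
    17: "MEM_LIMIT",
}


def status_name(code: int) -> str:
    """Return a readable name for a Gurobi status code (hypothetical helper)."""
    return GUROBI_STATUS_NAMES.get(code, f"UNKNOWN({code})")
```

A TIME_LIMIT status does not by itself mean failure: the solve may still hold an incumbent, which is why explain() can return a counterfactual when max_time is reached.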

vget(i)[source]

Return the solver variable for processed coordinate i of x.

Returns:

Solver variable representing the requested coordinate.

Return type:

gp.Var

class Explanation(mapping=None, *, columns=None, validate=True)[source]

Bases: Mapper[FeatureVar], BaseExplanation

Concrete explanation container returned by the MIP backend.

Initialize a mapper from feature metadata and transformed columns.

Parameters:
  • mapping – Mapping from original feature names to metadata objects.

  • columns – Processed pandas index describing the transformed coordinates.

  • validate – Whether to verify that mapping and columns are consistent.

format_discrete_value(f, val, thresholds)[source]
format_value(f, idx, levels)[source]
property query
to_numpy()[source]
to_series()[source]
property value
vget(i)[source]
property x
class FeatureVar(feature, name)[source]

Bases: Var, FeatureKeeper

MIP variable bundle associated with a single parsed feature.

X_VAR_NAME_FMT = 'x[{name}]'
build(model)[source]
mget(key)[source]
weighted_x(mu)[source]
xget(code=None)[source]
class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, name='OCEAN', env=None, epsilon=1.52587890625e-05, num_epsilon=1e-06, model_type=Type.MIP, flow_type=FlowType.CONTINUOUS)[source]

Bases: BaseModel, FeatureManager, TreeManager, GarbageManager

Mixed-integer programming formulation for tree ensemble explanations.

The feature variables encode the counterfactual point \(x\) and the tree variables encode the active leaf or path decisions \(p_{t,\ell}\).

Initialize an empty MIP model for a parsed ensemble.

Parameters:
  • trees – Parsed trees whose leaves define the ensemble class scores.

  • mapper – Feature mapper describing how processed query columns map to decision variables.

  • weights – Optional non-negative tree weights.

  • n_isolators – Number of isolation trees contributing to the auxiliary isolation constraint on the path length.

  • max_samples – Reference sample count used by the isolation-forest extension.

  • isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.

  • name – Name of the underlying Gurobi model.

  • env – Optional Gurobi environment.

  • epsilon – Classification margin used in the pairwise score constraints.

  • num_epsilon – Strictness constant used in numerical split implications.

  • model_type – Backend builder variant.

  • flow_type – Encoding used for the tree flow variables.
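The isolation parameters can be read through the standard isolation-forest anomaly score \(s = 2^{-\bar h / c(n)}\), where \(\bar h\) is the mean path length over the isolation trees and \(c(n)\) is the average path length for \(n\) = max_samples. Assuming that reading (the source does not spell out the exact constraint), requiring \(s \le\) isolation_threshold becomes the linear bound \(\sum_t h_t \ge -\text{n\_isolators} \cdot c(n) \log_2(\text{threshold})\). The sketch also assumes that the "classic" default corresponds to a cutoff of 0.5, i.e. \(\bar h \ge c(n)\), and it ignores the leaf-size correction added to each depth. Both function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful search in a BST of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def min_total_path_length(n_isolators: int, max_samples: int,
                          isolation_threshold=None) -> float:
    """Lower bound on the summed path length so that the score stays <= threshold."""
    c = average_path_length(max_samples)
    # Assumed classic cutoff when none is given: s <= 0.5, i.e. mean path >= c(n).
    threshold = 0.5 if isolation_threshold is None else isolation_threshold
    if not 0.0 < threshold <= 1.0:
        raise ValueError("isolation_threshold must lie in (0, 1]")
    # s <= threshold  <=>  mean_h >= -c * log2(threshold); scale by the tree count.
    return -n_isolators * c * math.log2(threshold)
```

A cutoff of 1 yields a bound of 0 (no restriction), and the bound grows without limit as the cutoff approaches 0, which matches the documented range (0, 1].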

DEFAULT_EPSILON = 1.52587890625e-05
DEFAULT_NUM_EPSILON = 1e-06
L1(x, v, *, is_ohe)[source]

Return the MIP \(L_1\) contribution of one coordinate of \(x\).

This creates an auxiliary variable \(u\) such that \(u \ge |x_j - \hat{x}_j|\), where v encodes the counterfactual coordinate \(x_j\) and the code parameter x stores the query coordinate \(\hat{x}_j\).

Parameters:
  • x – Query coordinate \(\hat{x}_j\).

  • v – Model variable encoding the corresponding coordinate of \(x\).

  • is_ohe – Whether this coordinate is one indicator of a one-hot encoded block. If so, the returned term is halved, so that a category switch, which flips two indicators, contributes 1 rather than 2.

Returns:

Linear expression equal to the contribution of that coordinate to \(d_1(x, \hat{x})\).

Return type:

gp.LinExpr
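The constraint \(u \ge |x_j - \hat{x}_j|\) is not linear as written; the usual linearization splits it into the two cuts \(u \ge x_j - \hat{x}_j\) and \(u \ge \hat{x}_j - x_j\). In gurobipy terms this would presumably be an auxiliary variable from Model.addVar plus two Model.addConstr calls. The solver-free sketch below (hypothetical helpers) shows why the trick works: because the objective minimizes \(u\), it settles on the larger right-hand side, which is exactly the absolute difference.

```python
def l1_cuts(x_hat: float, v: float) -> tuple:
    """Right-hand sides of the cuts u >= v - x_hat and u >= x_hat - v."""
    return (v - x_hat, x_hat - v)


def l1_contribution(x_hat: float, v: float, *, is_ohe: bool) -> float:
    # The smallest u satisfying both cuts is the larger right-hand side,
    # which equals |v - x_hat|.
    u = max(l1_cuts(x_hat, v))
    # One-hot indicators are halved, mirroring the is_ohe rule above.
    return 0.5 * u if is_ohe else u
```

For example, moving a numeric coordinate from 0.2 to 0.7 contributes 0.5, and each of the two indicators flipped by a category switch contributes 0.5.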

static L2(x, v, *, is_ohe)[source]

Return the MIP \(L_2\) contribution of one coordinate of \(x\).

The returned expression is \((x_j - \hat{x}_j)^2\), halved for one-hot encoded coordinates to preserve the same category-switch semantics as the \(L_1\) objective.

Returns:

Quadratic expression equal to the contribution of that coordinate to the squared distance \(d_2(x, \hat{x})^2\).

Return type:

gp.QuadExpr
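The halving rule can be checked numerically on a small mixed vector. The sketch below assumes one numeric coordinate followed by a three-category one-hot block and sums the per-coordinate contributions described for L1 and L2; for norm=2 it returns the squared Euclidean distance. The distance helper is hypothetical.

```python
from typing import Sequence


def distance(x_hat: Sequence[float], x: Sequence[float],
             is_ohe: Sequence[bool], *, norm: int) -> float:
    """Sum per-coordinate contributions, halving one-hot coordinates.

    norm=1 sums |x_j - x_hat_j|; norm=2 sums (x_j - x_hat_j) ** 2, the squared
    Euclidean distance (minimizing it also minimizes d_2 itself).
    """
    if norm not in (1, 2):
        raise ValueError("this sketch only covers norm=1 and norm=2")
    total = 0.0
    for xh, xv, ohe in zip(x_hat, x, is_ohe):
        diff = xv - xh
        term = abs(diff) if norm == 1 else diff * diff
        total += 0.5 * term if ohe else term
    return total


# One numeric coordinate, then a three-category one-hot block.
is_ohe = [False, True, True, True]
query = [0.25, 1.0, 0.0, 0.0]
cf = [0.25, 0.0, 1.0, 0.0]  # only the category changes
d1 = distance(query, cf, is_ohe, norm=1)
d2 = distance(query, cf, is_ohe, norm=2)
```

Switching category costs exactly 1 under both norms, while moving a numeric coordinate by \(\delta\) costs \(|\delta|\) or \(\delta^2\).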

MIN_NUMERIC_TOL = 1e-09
class Type(*values)[source]

Bases: Enum

MIP = 'MIP'
add_objective(x, *, norm=1, sense=1)[source]

Attach the distance objective \(d(x, \hat{x})\) to the MIP model.

Parameters:
  • x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named x, but mathematically it represents the query.

  • norm – Distance norm used for \(d(x, \hat{x})\). The MIP backend supports \(L_0\), \(L_1\), and \(L_2\).

  • sense – Optimization sense passed to Gurobi. Counterfactual search uses minimization, which is the default sense=1 (GRB.MINIMIZE).
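For norm=0 the objective counts changed coordinates, which requires binary variables. A common MIP encoding, assumed here because the source does not say how ocean.mip models \(L_0\), pairs each coordinate with \(z_j \in \{0, 1\}\) and the big-M bound \(|x_j - \hat{x}_j| \le M z_j\), then minimizes \(\sum_j z_j\). The brute-force check below (hypothetical helper, no solver) confirms that the optimum equals the number of coordinates that differ, provided \(M\) covers the largest change.

```python
from itertools import product
from typing import Sequence


def min_l0_indicators(x_hat: Sequence[float], x: Sequence[float],
                      big_m: float, tol: float = 1e-9) -> int:
    """Smallest sum(z) over binary z with |x_j - x_hat_j| <= big_m * z_j.

    Enumerates all 2**n assignments, so it is only suitable for tiny inputs.
    """
    best = None
    for z in product((0, 1), repeat=len(x)):
        feasible = all(abs(xv - xh) <= big_m * zj + tol
                       for xh, xv, zj in zip(x_hat, x, z))
        if feasible and (best is None or sum(z) < best):
            best = sum(z)
    if best is None:
        raise ValueError("big_m is too small to cover the observed changes")
    return best


changed = min_l0_indicators([0.0, 0.5, 1.0], [0.0, 0.9, 1.0], big_m=1.0)
```

A loose \(M\) keeps the model correct but weakens its LP relaxation, so bounds derived from the feature ranges are usually preferred.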

build()[source]

Create the decision variables and structural constraints of the model.

This step introduces the feature variables encoding \(x\), the tree-path variables \(p_{t,\ell}\), and the constraints linking both so that exactly one leaf is active in each tree and the selected leaves are consistent with the feature values.
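What the structural constraints enforce in a feasible solution can be pictured without a solver. The sketch assumes each leaf is described by the split conditions on its root-to-leaf path and computes the leaf indicators \(p_{t,\ell}\) that are consistent with a given \(x\); the Path alias and leaf_indicators helper are hypothetical and show the intended semantics, not the MIP encoding itself.

```python
from typing import Dict, List, Sequence, Tuple

# Split conditions on a root-to-leaf path: (feature index, threshold, goes_left),
# where goes_left means the path requires x[feature] <= threshold.
Path = List[Tuple[int, float, bool]]


def leaf_indicators(leaf_paths: Dict[int, Path],
                    x: Sequence[float]) -> Dict[int, int]:
    """Return p[leaf] = 1 exactly when x satisfies every split on that leaf's path."""
    p = {}
    for leaf, path in leaf_paths.items():
        ok = all((x[f] <= thr) == goes_left for f, thr, goes_left in path)
        p[leaf] = int(ok)
    return p


# Depth-2 tree: the root splits on x[0] <= 0.5, its left child on x[1] <= 0.3.
tree = {
    0: [(0, 0.5, True), (1, 0.3, True)],
    1: [(0, 0.5, True), (1, 0.3, False)],
    2: [(0, 0.5, False)],
}
p = leaf_indicators(tree, [0.4, 0.8])
```

Because the leaves partition the feature space, exactly one indicator equals 1. A MIP cannot state the strict right-branch condition \(x_f > \text{threshold}\) directly, which is where the num_epsilon strictness constant documented above comes in.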

cleanup()[source]

Remove query-specific objective auxiliaries and class constraints.

After cleanup(), the structural encoding of \(x\) and \(p_{t,\ell}\) remains, but temporary constraints created for a specific query \(\hat{x}\) are removed.
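The split between structural and query-specific constraints can be sketched as a simple bookkeeping pattern: register every temporary constraint when it is created and delete only those on cleanup. ToyModel and its methods are hypothetical stand-ins, not ocean.mip or gurobipy; with gurobipy the deletion step would go through Model.remove, which accepts variables, constraints, or lists of them.

```python
from typing import List


class ToyModel:
    """Hypothetical model that tracks which constraints are query-specific."""

    def __init__(self) -> None:
        self.constraints: List[str] = []  # every constraint currently present
        self._garbage: List[str] = []     # query-specific subset to drop later

    def add_constraint(self, name: str, *, temporary: bool = False) -> str:
        self.constraints.append(name)
        if temporary:
            self._garbage.append(name)
        return name

    def cleanup(self) -> None:
        # Remove only what was registered as temporary; the structural
        # encoding stays in place for the next query.
        drop = set(self._garbage)
        self.constraints = [c for c in self.constraints if c not in drop]
        self._garbage.clear()


model = ToyModel()
model.add_constraint("one_leaf_per_tree[0]")
model.add_constraint("l1_aux[age]", temporary=True)
model.add_constraint("margin[y=1,c=0]", temporary=True)
model.cleanup()
```

Keeping the structural part means a second query only pays for its own objective and class constraints instead of rebuilding the whole ensemble encoding.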

clear_majority_class()[source]

Remove the current target-class constraints.

This deletes the stored inequalities of the form

\[f_y(x) \ge f_c(x) + \varepsilon_c \quad \forall c \neq y\]

property epsilon
property num_epsilon
set_majority_class(y, *, op=0)[source]

Enforce the target class through pairwise score constraints.

For every competing class \(c \neq y\), this adds

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

where \(f_c(x)\) denotes the ensemble score of class \(c\).

Raises:

ValueError – If y is not a valid class index.
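The pairwise constraints can be checked on concrete scores. The sketch below (hypothetical helper) tests \(f_y \ge f_c + \varepsilon_c\) for every \(c \neq y\) and rejects invalid class indices in the same spirit as set_majority_class; it takes one margin per class to match \(\varepsilon_c\), and uses the DEFAULT_EPSILON value listed above, which equals \(2^{-16}\).

```python
from typing import Sequence

DEFAULT_EPSILON = 1.52587890625e-05  # value listed above; equals 2 ** -16


def satisfies_target(scores: Sequence[float], y: int,
                     margins: Sequence[float]) -> bool:
    """Return True when f_y >= f_c + eps_c holds for every class c != y."""
    if not 0 <= y < len(scores):
        raise ValueError(f"invalid class index {y}")
    return all(scores[y] >= scores[c] + margins[c]
               for c in range(len(scores)) if c != y)


eps = [DEFAULT_EPSILON, DEFAULT_EPSILON]
```

A tie such as scores [0.5, 0.5] fails for y=1, since a positive margin forces the target class to win strictly.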

class TreeVar(tree, name, *, flow_type=FlowType.CONTINUOUS, _adaboost=False)[source]

Bases: Var, TreeKeeper, Mapping[NonNegativeInt, Var]

MIP variables encoding the flow through one parsed tree.

FLOW_VAR_NAME_FMT = '{name}_flow'
class FlowType(*values)[source]

Bases: Enum

Available formulations for the tree flow variables.

CONTINUOUS = 'CONTINUOUS'
BINARY = 'BINARY'
build(model)[source]
property length
property value