Mixed-integer Programming Backend

The ocean.mip module provides the Gurobi-backed mixed-integer programming formulation. Most users start from ocean.mip.Explainer; ocean.mip.Model and the variable classes are useful when you want to inspect or extend the lower-level formulation directly.

Mixed-integer programming backend for optimal counterfactual search.

class BaseModel(name='', env=None)[source]

Bases: ABC, Model

Base Gurobi model used by the MIP backend.

build_vars(*variables)[source]

class Explainer(ensemble, *, mapper, weights=None, isolation=None, isolation_threshold=None, name='OCEAN', env=None, epsilon=1.52587890625e-05, num_epsilon=1e-06, model_type=Type.MIP, flow_type=FlowType.CONTINUOUS)[source]

Bases: Model, BaseExplainer

Mixed-integer programming explainer for tree ensemble classifiers.

Initialize an empty MIP model for a parsed ensemble.

Parameters:

ensemble – Tree ensemble to explain. Its parsed trees and their leaves define the ensemble class scores.
mapper – Feature mapper describing how processed query columns map to decision variables.
weights – Optional non-negative tree weights.
isolation – Optional isolation model used by the isolation-forest extension. It takes the place of the n_isolators and max_samples arguments of Model.
isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.
name – Name of the underlying Gurobi model.
env – Optional Gurobi environment.
epsilon – Classification margin used in the pairwise score constraints.
num_epsilon – Strictness constant used in numerical split implications.
model_type – Backend builder variant.
flow_type – Encoding used for the tree flow variables.

explain(x, *, y, norm, return_callback=False, verbose=False, max_time=60, num_workers=None, random_seed=42, clean_up=True)[source]

Solve one counterfactual query with the MIP backend.

Parameters:

x – Query instance in the processed feature space.
y – Target class enforced by the counterfactual.
norm – Distance norm. The MIP backend supports 0, 1, and 2.
return_callback – Whether to collect incumbent solutions through a Gurobi callback.
verbose – Whether to print Gurobi logs.
max_time – Time limit in seconds.
num_workers – Optional Gurobi thread count.
random_seed – Random seed passed to Gurobi.
clean_up – Whether to remove query-specific constraints after the solve.

Returns:

The decoded counterfactual, or None when no feasible counterfactual is found within the given limits.

Return type:

Explanation | None

Raises:

RuntimeError – If the solver stops with a status that the explainer does not handle.
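A minimal sketch of how the calls documented here might be combined for one query. The helper name solve_query and the _StubExplainer class are hypothetical and not part of ocean.mip. The stub only mimics the documented method names so that the snippet runs without Gurobi; in practice an ocean.mip.Explainer instance would be passed instead.

```python
def solve_query(explainer, x, target, *, norm=1, max_time=60):
    """Run one counterfactual query and summarize the outcome.

    `explainer` is expected to expose the documented `explain`,
    `get_solving_status`, and `get_distance` methods.
    """
    cf = explainer.explain(x, y=target, norm=norm, max_time=max_time)
    status = explainer.get_solving_status()
    # explain() returns None when no feasible counterfactual is found within
    # the limits, so the distance is only queried after a successful solve.
    distance = None if cf is None else explainer.get_distance()
    return {"status": status, "counterfactual": cf, "distance": distance}


class _StubExplainer:
    """Hypothetical stand-in that mimics the documented method names."""

    def __init__(self, result, status, distance):
        self._result, self._status, self._distance = result, status, distance

    def explain(self, x, *, y, norm, max_time=60):
        return self._result

    def get_solving_status(self):
        return self._status

    def get_distance(self):
        return self._distance


found = solve_query(_StubExplainer([0.0, 1.0], "OPTIMAL", 1.0), [0.0, 0.0], target=1)
missed = solve_query(_StubExplainer(None, "TIME_LIMIT", 0.0), [0.0, 0.0], target=1)
```

Checking the return value for None before calling get_distance() keeps the time-limit and infeasible cases on the same code path as a successful solve.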

get_anytime_solutions()[source]

Return incumbent solutions collected during the last solve.

Returns:

Time-stamped incumbent objective values when return_callback was enabled in explain(), otherwise None.

Return type:

list[dict[str, float]] | None

get_distance()[source]

Return the post-processed distance of the last counterfactual.

Returns:

Post-processed \(L_p\) distance for the last successful solve.

Return type:

float

Raises:

RuntimeError – If no explanation has been computed yet.

get_objective_value()[source]

Return the solver objective value of the last optimization run.

Returns:

Objective value reported by Gurobi for the latest solve.

Return type:

float

get_solving_status()[source]

Return the latest Gurobi solve status as a readable string.

Returns:

Current model status such as "OPTIMAL" or "TIME_LIMIT".

Return type:

str
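Gurobi reports the model status as an integer code, so a readable string requires a lookup of some kind. The sketch below shows one way to do that with a subset of Gurobi's numeric status codes. The dictionary and the status_name helper are illustrative assumptions; this reference does not show how get_solving_status() performs the conversion.

```python
# Subset of Gurobi's numeric optimization status codes (illustrative lookup).
GUROBI_STATUS_NAMES = {
    1: "LOADED",
    2: "OPTIMAL",
    3: "INFEASIBLE",
    4: "INF_OR_UNBD",
    5: "UNBOUNDED",
    9: "TIME_LIMIT",
    11: "INTERRUPTED",
}


def status_name(code):
    """Translate a numeric status code into a readable name."""
    # Codes missing from the table are reported explicitly instead of failing.
    return GUROBI_STATUS_NAMES.get(code, f"UNKNOWN({code})")


optimal = status_name(2)
timed_out = status_name(9)
```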

vget(i)[source]

Return the solver variable for processed coordinate i of x.

Returns:

Solver variable representing the requested coordinate.

Return type:

gp.Var

class Explanation(mapping=None, *, columns=None, validate=True)[source]

Bases: Mapper[FeatureVar], BaseExplanation

Concrete explanation container returned by the MIP backend.

Initialize a mapper from feature metadata and transformed columns.

Parameters:

mapping – Mapping from original feature names to metadata objects.
columns – Processed pandas index describing the transformed coordinates.
validate – Whether to verify that mapping and columns are consistent.

format_discrete_value(f, val, thresholds)[source]

format_value(f, idx, levels)[source]

property query

to_numpy()[source]

to_series()[source]

property value

vget(i)[source]

property x

class FeatureVar(feature, name)[source]

Bases: Var, FeatureKeeper

MIP variable bundle associated with a single parsed feature.

X_VAR_NAME_FMT = 'x[{name}]'

build(model)[source]

mget(key)[source]

weighted_x(mu)[source]

xget(code=None)[source]

class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, name='OCEAN', env=None, epsilon=1.52587890625e-05, num_epsilon=1e-06, model_type=Type.MIP, flow_type=FlowType.CONTINUOUS)[source]

Bases: BaseModel, FeatureManager, TreeManager, GarbageManager

Mixed-integer programming formulation for tree ensemble explanations.

The feature variables encode the counterfactual point \(x\) and the tree variables encode the active leaf or path decisions \(p_{t,\ell}\).

Initialize an empty MIP model for a parsed ensemble.

Parameters:

trees – Parsed trees whose leaves define the ensemble class scores.
mapper – Feature mapper describing how processed query columns map to decision variables.
weights – Optional non-negative tree weights.
n_isolators – Number of isolation trees contributing to the auxiliary isolation constraint on the path length.
max_samples – Reference sample count used by the isolation-forest extension.
isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.
name – Name of the underlying Gurobi model.
env – Optional Gurobi environment.
epsilon – Classification margin used in the pairwise score constraints.
num_epsilon – Strictness constant used in numerical split implications.
model_type – Backend builder variant.
flow_type – Encoding used for the tree flow variables.

DEFAULT_EPSILON = 1.52587890625e-05

DEFAULT_NUM_EPSILON = 1e-06

L1(x, v, *, is_ohe)[source]

Return the MIP \(L_1\) contribution of one coordinate of \(x\).

This creates an auxiliary variable \(u\) such that \(u \ge |x_j - \hat{x}_j|\), where v encodes the counterfactual coordinate \(x_j\) and the code parameter x stores the query coordinate \(\hat{x}_j\).

Parameters:

x – Query coordinate \(\hat{x}_j\).
v – Model variable encoding the corresponding coordinate of \(x\).
is_ohe – Whether this coordinate belongs to a one-hot block \(u_{j,k}\). If so, the returned term is halved.

Returns:

Linear expression equal to the contribution of that coordinate to \(d_1(x, \hat{x})\).

Return type:

gp.LinExpr
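An absolute value cannot be entered directly as a linear constraint, so a bound of the form \(u \ge |x_j - \hat{x}_j|\) is commonly expressed with two linear inequalities. The following is the standard linearization, given as an assumption about the formulation, since this reference does not spell out the constraints that L1 adds:

\[
u \ge x_j - \hat{x}_j, \qquad u \ge \hat{x}_j - x_j .
\]

When \(u\) enters a minimized objective with a positive coefficient, one of the two inequalities is tight at an optimum, so \(u = |x_j - \hat{x}_j|\) there.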

static L2(x, v, *, is_ohe)[source]

Return the MIP \(L_2\) contribution of one coordinate of \(x\).

The returned expression is \((x_j - \hat{x}_j)^2\), halved for one-hot encoded coordinates to preserve the same category-switch semantics as the \(L_1\) objective.

Returns:

Quadratic expression equal to the contribution of that coordinate to \(d_2(x, \hat{x})\).

Return type:

gp.QuadExpr
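The halving rule for one-hot coordinates can be checked numerically. The sketch below evaluates the per-coordinate terms on plain floats rather than on Gurobi expressions; the function names l1_term and l2_term are hypothetical. Switching a one-hot feature from one category to another changes two coordinates by 1 each, and the halved terms sum to a cost of 1 under both norms.

```python
def l1_term(query, value, *, is_ohe):
    """Per-coordinate L1 contribution, halved inside a one-hot block."""
    term = abs(value - query)
    return term / 2 if is_ohe else term


def l2_term(query, value, *, is_ohe):
    """Per-coordinate squared contribution, halved inside a one-hot block."""
    term = (value - query) ** 2
    return term / 2 if is_ohe else term


# One categorical feature with three levels, one-hot encoded.
query = [1.0, 0.0, 0.0]           # query is in the first category
counterfactual = [0.0, 1.0, 0.0]  # counterfactual is in the second category

l1_switch = sum(l1_term(q, v, is_ohe=True) for q, v in zip(query, counterfactual))
l2_switch = sum(l2_term(q, v, is_ohe=True) for q, v in zip(query, counterfactual))

# A numeric coordinate is not halved.
l1_numeric = l1_term(2.0, 3.5, is_ohe=False)
```

Without the halving, a single category switch would count as 2, twice the cost of a unit change in a numeric feature.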

MIN_NUMERIC_TOL = 1e-09

class Type(*values)[source]

Bases: Enum

MIP = 'MIP'

add_objective(x, *, norm=1, sense=1)[source]

Attach the distance objective \(d(x, \hat{x})\) to the MIP model.

Parameters:

x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named x, but mathematically it represents the query.
norm – Distance norm used for \(d(x, \hat{x})\). The MIP backend supports \(L_0\), \(L_1\), and \(L_2\).
sense – Optimization sense passed to Gurobi. Counterfactual search uses minimization.

build()[source]

Create the decision variables and structural constraints of the model.

This step introduces the feature variables encoding \(x\), the tree-path variables \(p_{t,\ell}\), and the constraints linking both so that exactly one leaf is active in each tree and the selected leaves are consistent with the feature values.
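The two structural properties named here, exactly one active leaf per tree and consistency between the active leaf and the feature values, can be illustrated without a solver. The sketch below uses a toy tree stored as nested dicts and a fixed point x. The tree layout, the "<=" split convention, and the helper names are assumptions made for illustration, not the encoding used by build().

```python
# Toy tree: internal nodes send x left when x[feature] <= threshold.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": 0},
    "right": {
        "feature": 1, "threshold": 2.0,
        "left": {"leaf": 1},
        "right": {"leaf": 2},
    },
}
N_LEAVES = 3


def active_leaf(node, x):
    """Follow the splits from the root and return the index of the leaf reached."""
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def leaf_indicators(tree, x, n_leaves):
    """Return the 0/1 vector p with p[l] = 1 only for the leaf reached by x."""
    chosen = active_leaf(tree, x)
    return [1 if leaf == chosen else 0 for leaf in range(n_leaves)]


p = leaf_indicators(TREE, [0.9, 1.0], N_LEAVES)
```

In the MIP model the same relationship holds in the other direction as well: the solver chooses \(x\) and \(p_{t,\ell}\) jointly, and the constraints rule out any pair in which the active leaf disagrees with the splits evaluated at \(x\).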

cleanup()[source]

Remove query-specific objective auxiliaries and class constraints.

After cleanup(), the structural encoding of \(x\) and \(p_{t,\ell}\) remains, but temporary constraints created for a specific query \(\hat{x}\) are removed.

clear_majority_class()[source]

Remove the current target-class constraints.

This deletes stored inequalities of the form

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

property epsilon

property num_epsilon

set_majority_class(y, *, op=0)[source]

Enforce the target class through pairwise score constraints.

For every competing class \(c \ne y\), this adds

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

Raises:

ValueError – If y is not a valid class index.
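The pairwise constraints can be read as a check on a vector of class scores. The sketch below evaluates them for fixed scores, using a single margin for all classes as a simplifying assumption. The function name satisfies_majority is hypothetical. The margin 2 ** -16 equals the documented DEFAULT_EPSILON value.

```python
EPSILON = 2.0 ** -16  # same value as the documented default 1.52587890625e-05


def satisfies_majority(scores, y, epsilon=EPSILON):
    """Check that class y beats every other class score by at least epsilon."""
    if not 0 <= y < len(scores):
        raise ValueError(f"{y} is not a valid class index")
    return all(
        scores[y] >= score + epsilon
        for c, score in enumerate(scores)
        if c != y
    )


clear_win = satisfies_majority([0.3, 0.7], y=1)
tie = satisfies_majority([0.5, 0.5], y=1)
```

The positive margin is what excludes ties: with equal scores the inequality fails, so a counterfactual that only reaches the decision boundary is not accepted.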

class TreeVar(tree, name, *, flow_type=FlowType.CONTINUOUS, _adaboost=False)[source]

Bases: Var, TreeKeeper, Mapping[NonNegativeInt, Var]

MIP variables encoding the flow through one parsed tree.

FLOW_VAR_NAME_FMT = '{name}_flow'

class FlowType(*values)[source]

Bases: Enum

Available formulations for the tree flow variables.

CONTINUOUS = 'CONTINUOUS'

BINARY = 'BINARY'

build(model)[source]

property length

property value