Mathematical Model

This page summarizes the optimization problem solved by OCEAN at the modeling level. The goal is not to reproduce every implementation detail of every backend, but to make the shared mathematical structure explicit.

Problem statement

Let:

\(\hat{x} \in \mathcal{X}\) be the query instance,
\(y\) be the desired target class,
\(x\) be the counterfactual decision vector.

The ensemble prediction rule is:

\[\hat{y}(x) \in \arg\max_{c \in \mathcal{Y}} f_c(x).\]

At a high level, OCEAN solves:

\[x^\star \in \arg\min_{x \in \mathcal{X}} d(x, \hat{x}) \quad \text{s.t.} \quad f_y(x) \ge f_c(x) + \varepsilon_c, \qquad \forall c \neq y.\]

Here \(f_c(x)\) is the class-\(c\) component of the ensemble decision function, evaluated at the candidate counterfactual \(x\). The tie-breaking constant is defined as:

\[\begin{split}\varepsilon_c = \begin{cases} \varepsilon & \text{if } c < y, \\ 0 & \text{otherwise.} \end{cases}\end{split}\]

This enforces that the target class is strictly preferred, including in the presence of ties. The default value of \(\varepsilon\) depends on the backend:

MIP uses \(\varepsilon = 2^{-16} = 1/65536 \approx 1.5259 \times 10^{-5}\).
CP uses \(\varepsilon = 1\) because the score expressions are integer scaled.
MaxSAT uses \(\varepsilon = 1\) after integerizing the score differences used in the pseudo-Boolean encoding.

The MIP backend also uses a numerical strictness constant \(\eta = \texttt{num\_epsilon} = 10^{-6}\) for strict right-branch comparisons in numerical splits.

Optional isolation-forest constraint

When an isolation forest is supplied to the MIP or CP explainer, OCEAN parses those isolation trees after the predictive ensemble and adds a plausibility constraint based on the aggregate path length. If \(\mathcal{T}_{iso}\) denotes the isolation trees and \(L_t(x)\) the path length selected in tree \(t\), the additional condition is

\[\sum_{t \in \mathcal{T}_{iso}} L_t(x) \ge |\mathcal{T}_{iso}| \cdot \lambda(\texttt{max\_samples}, \tau),\]

where \(\tau = \texttt{isolation\_threshold}\) and

\[\begin{split}\lambda(m, \tau) = \begin{cases} c(m) & \text{if } \tau \text{ is omitted}, \\ -c(m)\log_2(\tau) & \text{otherwise}. \end{cases}\end{split}\]

Here \(c(\cdot)\) is the usual isolation-forest average path-length normalizer. The historical behavior corresponds to isolation_threshold=0.5 because \(-\log_2(0.5)=1\).

This means the threshold is monotone in the opposite direction one might first guess:

larger values such as 0.9 are weaker because they require a smaller minimum path length,
smaller values such as 0.1 are stricter because they require a much larger minimum path length.

Intuitively, this keeps the counterfactual away from points that would be isolated too quickly by the auxiliary forest.

The extra inequality can also make a target infeasible. That happens when the classifier still has valid target-class regions, but every target-class point that respects the tree ensemble is isolated too quickly by the auxiliary forest. In that case the classification constraints and the isolation-path constraint have no common solution.

Feature domains

The feasible set \(\mathcal{X}\) depends on the feature type produced by ocean.feature.parse_features().

Continuous feature: \[x_j \in [a_j, b_j]\]
Ordinal discrete feature: \[x_j \in D_j = \{d_{j,1}, \dots, d_{j,m_j}\}\]
Binary feature: \[x_j \in \{0, 1\}\]
Unordered nominal feature: Encoded as a one-hot block \(u_{j,k}\) with:

\[u_{j,k} \in \{0, 1\}, \qquad \sum_{k \in K_j} u_{j,k} = 1.\]

This is why ordinal features and one-hot encoded features are treated differently in OCEAN: ordinal values belong to an ordered finite set, while unordered categories are modeled as mutually exclusive indicator variables.

Tree path variables

For each tree \(t\) and each leaf \(\ell\) of that tree, OCEAN creates path or leaf-selection variables. Abstractly, they satisfy:

\[p_{t,\ell} \in \{0, 1\}, \qquad \sum_{\ell \in \mathcal{L}_t} p_{t,\ell} = 1.\]

The equality means exactly one leaf is active in each tree. Path consistency constraints ensure that if \(p_{t,\ell} = 1\), then the feature vector \(x\) satisfies every split on the root-to-leaf path leading to \(\ell\).

For a numerical split of the form \(x_j \le \tau\), the abstract logic is:

\[p_{t,\ell} = 1 \implies x_j \le \tau \qquad \text{for every left branch on the path,}\]

\[p_{t,\ell} = 1 \implies x_j \ge \tau + \eta \qquad \text{for every right branch on the path,}\]

where \(\eta\) is the numerical strictness constant described above. In practice, MIP uses explicit linear implications, CP works with threshold intervals, and MaxSAT reasons over Boolean encodings of the same branch conditions.

For one-hot encoded nominal features, branch tests are expressed on the indicator variables rather than on an ordered scalar value. If a branch accepts exactly the category subset \(S \subseteq K_j\), the corresponding path constraint can be written as:

\[p_{t,\ell} = 1 \implies \sum_{k \in S} u_{j,k} = 1.\]

Because the one-hot block already satisfies \(\sum_{k \in K_j} u_{j,k}=1\), this enforces membership in the allowed subset without inventing an artificial ordering between categories.

Ensemble score aggregation

Let \(v_{t,\ell,c}\) denote the class contribution of leaf \(\ell\) in tree \(t\) for class \(c\), and let \(w_t\) be the tree weight. Then the ensemble decision function used by the optimization model can be written as:

\[f_c(x) = b_c + \sum_{t \in \mathcal{T}} \sum_{\ell \in \mathcal{L}_t} w_t \, v_{t,\ell,c} \, p_{t,\ell}.\]

The optional base term \(b_c\) appears for models such as XGBoost where a base score is part of the ensemble definition.

Distance objectives

The optimization objective measures how far the counterfactual is from the query. OCEAN uses different realizations depending on the backend, but the common modeling idea is the same.

For an \(L_1\) objective, a compact description is:

\[d_1(x, \hat{x}) = \sum_{j \in \mathcal{C} \cup \mathcal{O} \cup \mathcal{B}} |x_j - \hat{x}_j| + \frac{1}{2} \sum_{j \in \mathcal{E}} \sum_{k \in K_j} |u_{j,k} - \hat{x}_{j,k}|.\]

The factor \(\tfrac{1}{2}\) on one-hot blocks avoids double counting a single category switch, because moving from one active code to another changes two binary indicators.

The MIP backend also supports an \(L_2\) variant:

\[d_2(x, \hat{x}) = \sum_{j \in \mathcal{C} \cup \mathcal{O} \cup \mathcal{B}} (x_j - \hat{x}_j)^2 + \frac{1}{2} \sum_{j \in \mathcal{E}} \sum_{k \in K_j} (u_{j,k} - \hat{x}_{j,k})^2.\]

More generally, for any integer \(p \ge 1\), minimizing \(d_p(x, \hat{x})\) is equivalent to minimizing \(d_p(x, \hat{x})^p\) because the map \(t \mapsto t^p\) is strictly increasing on \(\mathbb{R}_{\ge 0}\).

Backend realizations

The three backends solve the same conceptual problem with different modeling primitives.

MIP: Uses continuous and binary decision variables directly and optimizes either \(L_1\) or \(L_2\) with linear or quadratic expressions.
CP: Uses finite-domain integer variables and a scaled integer \(L_p^p\) objective for integer norm values. Continuous features are modeled through interval indices derived from tree thresholds.
MaxSAT: Encodes the \(L_1\) objective as weighted soft clauses and the target prediction constraints as hard clauses or pseudo-Boolean constraints. In hard-voting mode for random forests, the target-class comparison is encoded with vote-count cardinality constraints instead of probability differences.

Reading the implementation

If you want to map these equations back to code, the main entry points are:

ocean.mip._model
ocean.cp._model
ocean.maxsat._model

Those modules differ in how they encode the variables and constraints, but they all implement the same optimization pattern described above.

Backend method reference

The following internal backend model classes use the same notation as this page:

\(x\) is the counterfactual decision vector encoded by feature variables,
\(\hat{x}\) is the processed query instance; in the implementation, the objective-building methods receive it through a parameter named x,
\(y\) is the target class,
\(f_c(x)\) is the class score for class \(c\),
\(p_{t,\ell}\) denotes the active leaf or path variable.

Mixed-integer programming model

class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, name='OCEAN', env=None, epsilon=1.52587890625e-05, num_epsilon=1e-06, model_type=Type.MIP, flow_type=FlowType.CONTINUOUS)[source]

Bases: BaseModel, FeatureManager, TreeManager, GarbageManager

Mixed-integer programming formulation for tree ensemble explanations.

The feature variables encode the counterfactual point x and the tree variables encode the active leaf or path decisions p_(t,l).

Initialize an empty MIP model for a parsed ensemble.

Parameters:

trees – Parsed trees whose leaves define the ensemble class scores.
mapper – Feature mapper describing how processed query columns map to decision variables.
weights – Optional non-negative tree weights.
n_isolators – Number of isolation trees contributing to the auxiliary isolation constraint on the path length.
max_samples – Reference sample count used by the isolation-forest extension.
isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.
name – Name of the underlying Gurobi model.
env – Optional Gurobi environment.
epsilon – Classification margin used in the pairwise score constraints.
num_epsilon – Strictness constant used in numerical split implications.
model_type – Backend builder variant.
flow_type – Encoding used for the tree flow variables.

build()[source]

Create the decision variables and structural constraints of the model.

This step introduces the feature variables encoding \(x\), the tree-path variables \(p_{t,\ell}\), and the constraints linking both so that exactly one leaf is active in each tree and the selected leaves are consistent with the feature values.

add_objective(x, *, norm=1, sense=1)[source]

Attach the distance objective \(d(x, \hat{x})\) to the MIP model.

Parameters:

x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named x, but mathematically it represents the query.
norm – Distance norm used for \(d(x, \hat{x})\). The MIP backend supports \(L_0\), \(L_1\), and \(L_2\).
sense – Optimization sense passed to Gurobi. Counterfactual search uses minimization.

set_majority_class(y, *, op=0)[source]

Enforce the target class through pairwise score constraints.

For every competing class, this adds

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

Raises:: ValueError – If y is not a valid class index.

clear_majority_class()[source]

Remove the current target-class constraints.

This deletes stored inequalities of the form

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

cleanup()[source]

Remove query-specific objective auxiliaries and class constraints.

After cleanup(), the structural encoding of \(x\) and \(p_{t,\ell}\) remains, but temporary constraints created for a specific query \(\hat{x}\) are removed.

_set_majority_class(y, *, op)[source]

Encode the pairwise score dominance constraints.

The stored constraints are

\[f_y(x) - f_c(x) \ge \varepsilon_c, \qquad \forall c \neq y.\]

_set_isolation()[source]

Add the optional isolation-forest length constraint.

When isolation trees are present, this constrains the aggregate path length variable to remain above the minimum admissible value.

_add_objective(x, norm)[source]

Build the symbolic objective expression for \(d(x, \hat{x})\).

Each feature contributes an \(L_0\), \(L_1\), or \(L_2\) term. One-hot encoded coordinates use a factor \(1/2\) for \(L_1\) and \(L_2\) so that switching a category is not double counted.

Returns:: Linear or quadratic expression representing \(d(x, \hat{x})\).
Return type:: Objective
Raises:: ValueError – If x does not have the expected size or norm is unsupported.

L1(x, v, *, is_ohe)[source]

Return the MIP \(L_1\) contribution of one coordinate of \(x\).

This creates an auxiliary variable \(u\) such that \(u \ge |x_j - \hat{x}_j|\), where v encodes the counterfactual coordinate \(x_j\) and the code parameter x stores the query coordinate \(\hat{x}_j\).

Parameters:

x – Query coordinate \(\hat{x}_j\).
v – Model variable encoding the corresponding coordinate of \(x\).
is_ohe – Whether this coordinate belongs to a one-hot block \(u_{j,k}\). If so, the returned term is halved.

Returns:

Linear expression equal to the contribution of that coordinate to \(d_1(x, \hat{x})\).

Return type:

gp.LinExpr

static L2(x, v, *, is_ohe)[source]

Return the MIP \(L_2\) contribution of one coordinate of \(x\).

The returned expression is \((x_j - \hat{x}_j)^2\), halved for one-hot encoded coordinates to preserve the same category-switch semantics as the \(L_1\) objective.

Returns:: Quadratic expression equal to the contribution of that coordinate to \(d_2(x, \hat{x})\).
Return type:: gp.QuadExpr

Constraint-programming model

class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, epsilon=1, model_type=Type.CP)[source]

Bases: BaseModel, FeatureManager, TreeManager, GarbageManager

Constraint-programming formulation behind ocean.cp.Explainer.

The feature variables encode the counterfactual point x and the tree variables encode the active leaf decisions p_(t,l).

Initialize an empty CP-SAT model for a parsed ensemble.

Parameters:

trees – Parsed trees contributing to the ensemble class scores.
mapper – Feature mapper aligning processed query coordinates with the finite-domain decision variables.
weights – Optional tree weights.
n_isolators – Number of isolation trees appended after the predictive ensemble.
max_samples – Reference sample count used by the isolation-forest extension.
isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.
epsilon – Integer classification margin used in the pairwise score constraints.
model_type – Backend builder variant.

build()[source]

Create the CP variables and structural constraints of the formulation.

This builds the finite-domain feature variables for \(x\), the leaf/path variables \(p_{t,\ell}\), and the consistency constraints connecting both representations. When isolation trees are present, it also adds the minimum aggregate path-length constraint.

add_objective(x, *, norm=1)[source]

Minimize the scaled integer approximation of \(d_p(x, \hat{x})\).

Parameters:

x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named x, but mathematically it represents the query.
norm – Distance norm. The CP backend supports non-negative integer norms, with \(L_1\) as the default.

set_majority_class(y, *, op=0)[source]

Enforce the target class through pairwise score inequalities.

For each competing class, this adds the integer-scaled constraint

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

Raises:: ValueError – If y is not a valid class index.

_set_majority_class(y, *, op)[source]

Encode the target-class dominance constraints.

The created constraints encode

\[f_y(x) - f_c(x) \ge \varepsilon_c, \qquad \forall c \neq y.\]

cleanup()[source]

Remove query-specific constraints from the CP model.

The persistent structure over \(x\) and \(p_{t,\ell}\) is kept, while objective auxiliaries and class constraints for the last query \(\hat{x}\) are discarded.

_set_isolation()[source]

Add the optional isolation-forest length constraint.

When isolation trees are present, this constrains the scaled aggregate path length to remain above the scaled minimum admissible value.

_add_objective(x, norm)[source]

Build the scaled linear expression for \(d_p(x, \hat{x})^p\).

Continuous coordinates are costed through interval indices, while discrete, binary, and one-hot encoded features contribute exact scaled powered deviations.

Returns:: Linear objective expression approximating \(d_p(x, \hat{x})^p\).
Return type:: cp.ObjLinearExprT
Raises:: ValueError – If x does not have the expected size.

get_intervals_cost(levels, x, *, norm=1)[source]

Return interval costs for a continuous feature encoded by thresholds.

For a continuous coordinate \(x_j\), CP does not optimize directly over the real value. Instead it selects an interval between successive threshold levels. This helper assigns each interval the scaled \(L_p^p\) distance to the query coordinate \(\hat{x}_j\).

Returns:: Scaled costs for the threshold intervals representing that continuous feature.
Return type:: list[int]

Weighted MaxSAT model

class Model(trees, mapper, *, weights=None, hard_voting=False, max_samples=0, epsilon=1, model_type=Type.MAXSAT)[source]

Bases: BaseModel, FeatureManager, GarbageManager, TreeManager

Weighted MaxSAT formulation for tree ensemble explanations.

The Boolean feature variables encode the counterfactual point x and the Boolean tree variables encode the active leaf decisions p_(t,l).

Initialize an empty weighted MaxSAT model for a parsed ensemble.

Parameters:

trees – Parsed trees whose leaf activations define the ensemble class scores.
mapper – Feature mapper aligning processed query coordinates with the Boolean feature encoding.
weights – Optional tree weights.
hard_voting – Whether to build the hard-voting MaxSAT encoding.
max_samples – Reserved parameter used by compatible higher-level explainers.
epsilon – Integer tie-breaking margin used in the score constraints.
model_type – Backend builder variant.

build()[source]

Create the Boolean encoding of features, leaves, and path consistency.

After build(), the model contains the Boolean variables encoding the counterfactual point \(x\), the active leaves \(p_{t,\ell}\), and the hard clauses linking both.

add_objective(x, *, norm=1)[source]

Encode the \(L_1\) distance \(d_1(x, \hat{x})\) as soft clauses.

Parameters:

x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named x, but mathematically it represents the query.
norm – Distance norm. The MaxSAT backend currently supports only \(L_1\).

Raises:

ValueError – If x does not have the expected size or norm is unsupported.

_add_soft_l1_binary(x_val, v)[source]

Add the soft clause corresponding to a binary coordinate of \(x\).

The clause penalizes a deviation from the query coordinate \(\hat{x}_j\) with weight proportional to \(|x_j - \hat{x}_j|\).

_add_soft_l1_ohe(x_val, v, code)[source]

Add the soft clause for one coordinate of a one-hot block.

The weight is halved so that changing category in an unordered nominal feature contributes one unit of \(L_1\) distance instead of two.

_add_soft_l1_continuous(x_val, v)[source]

Add soft clauses for an interval-encoded continuous coordinate.

Each Boolean interval selector receives a weight equal to the scaled distance between the query coordinate \(\hat{x}_j\) and that interval.

_add_soft_l1_discrete(x_val, v)[source]

Add soft clauses for an ordinal discrete feature.

If mu[i] denotes \(x_j = d_{j,i}\), this method assigns the scaled cost \(|d_{j,i} - \hat{x}_j|\) to every admissible level different from the query value.

_get_intervals_cost(levels, x)[source]

Compute interval costs relative to \(\hat{x}_j\).

This helper is used for continuous features whose Boolean encoding selects one interval between consecutive threshold levels.

Returns:: Scaled costs for the interval selectors attached to that continuous feature.
Return type:: list[int]

set_majority_class(y, *, op=0)[source]

Enforce the target class through hard score constraints.

For every competing class, the backend encodes the pairwise dominance condition

\[f_y(x) \ge f_c(x) + \varepsilon_c\]

Raises:: ValueError – If y is not a valid class index.

_set_majority_class(y, *, op=0)[source]

Encode the target-class dominance constraints in MaxSAT form.

The intended inequality is

\[f_y(x) - f_c(x) \ge \varepsilon_c, \qquad \forall c \neq y.\]

Since MaxSAT works on Boolean clauses, the leaf contributions of each tree are converted into an integer pseudo-Boolean constraint.

_set_hard_voting_majority_class(y, *, op=0)[source]

Encode hard-voting dominance using cardinality constraints.

For hard voting, each tree contributes one unit vote to the class of its active leaf. The target class is enforced through

\[\sum_t \mathbb{1}[\hat{y}_t(x)=y] \ge \sum_t \mathbb{1}[\hat{y}_t(x)=c] + \varepsilon_c, \qquad \forall c \neq y.\]

_encode_cardinality_difference_constraint(positive_lits, negative_lits, threshold)[source]

Encode sum(pos) - sum(neg) >= threshold as a cardinality bound.

The difference is rewritten as

\[\sum pos + \sum \lnot neg \ge |neg| + threshold.\]

This method is used for hard-voting score comparisons, where the contributions are Boolean and the threshold is an integer margin.

Parameters:

positive_lits – List of positive literals (e.g., votes for the target class).
negative_lits – List of negative literals (e.g., votes for the competing class).
threshold – Integer margin for the difference.

Raises:

ImportError – If pysat.card is not installed.

_encode_weighted_sum_constraint(tree_contributions, threshold)[source]

Encode a hard pseudo-Boolean constraint on tree contributions.

This method encodes

\[\sum_i a_i \ell_i \ge \tau,\]

where threshold provides the integer bound \(\tau\). This is the MaxSAT realization of the score comparison

\[f_y(x) - f_c(x) \ge \varepsilon_c\]

Parameters:

tree_contributions (list[list[tuple[int, int]]]) – List of contributions for each tree.
threshold (int) – Threshold for the sum of contributions.

Raises:

ImportError – If pysat.pb is not installed.

static _normalize_tree_contributions(tree_contributions, threshold)[source]

Normalize contributions using the one-active-leaf property per tree.

Subtracting the minimum contribution of each tree preserves the score inequality and removes avoidable negative coefficients. A final GCD reduction keeps the PB coefficients smaller before calling PySAT.

Parameters:

tree_contributions (list[list[tuple[int, int]]]) – List of contributions for each tree.
threshold (int) – Threshold for the sum of contributions.

Returns:

Normalized contributions and the adjusted threshold.

Return type:

tuple[list[list[tuple[int, int]]], int]

cleanup()[source]

Remove query-specific soft and hard clauses from the MaxSAT model.

The structural Boolean encoding of \(x\) and \(p_{t,\ell}\) stays in place, while the objective clauses and target-class clauses for the previous query \(\hat{x}\) are removed.