Mathematical Model
This page summarizes the optimization problem solved by OCEAN at the modeling level. The goal is not to reproduce every implementation detail of every backend, but to make the shared mathematical structure explicit.
Problem statement
Let:
\(\hat{x} \in \mathcal{X}\) be the query instance,
\(y\) be the desired target class,
\(x\) be the counterfactual decision vector.
The ensemble prediction rule is:
At a high level, OCEAN solves:
Here \(f_c(x)\) is the class-\(c\) component of the ensemble decision function, evaluated at the candidate counterfactual \(x\). The tie-breaking constant is defined as:
This enforces that the target class is strictly preferred, including in the presence of ties. The default value of \(\varepsilon\) depends on the backend:
MIP uses \(\varepsilon = 2^{-16} = 1/65536 \approx 1.5259 \times 10^{-5}\).
CP uses \(\varepsilon = 1\) because the score expressions are integer scaled.
MaxSAT uses \(\varepsilon = 1\) after integerizing the score differences used in the pseudo-Boolean encoding.
The MIP backend also uses a numerical strictness constant \(\eta = \texttt{num\_epsilon} = 10^{-6}\) for strict right-branch comparisons in numerical splits.
Optional isolation-forest constraint
When an isolation forest is supplied to the MIP or CP explainer, OCEAN parses those isolation trees after the predictive ensemble and adds a plausibility constraint based on the aggregate path length. If \(\mathcal{T}_{iso}\) denotes the isolation trees and \(L_t(x)\) the path length selected in tree \(t\), the additional condition is
where \(\tau = \texttt{isolation\_threshold}\) and
Here \(c(\cdot)\) is the usual isolation-forest average path-length
normalizer. The historical behavior corresponds to isolation_threshold=0.5
because \(-\log_2(0.5)=1\).
This means the threshold is monotone in the opposite direction one might first guess:
larger values such as
0.9are weaker because they require a smaller minimum path length,smaller values such as
0.1are stricter because they require a much larger minimum path length.
Intuitively, this keeps the counterfactual away from points that would be isolated too quickly by the auxiliary forest.
The extra inequality can also make a target infeasible. That happens when the classifier still has valid target-class regions, but every target-class point that respects the tree ensemble is isolated too quickly by the auxiliary forest. In that case the classification constraints and the isolation-path constraint have no common solution.
Feature domains
The feasible set \(\mathcal{X}\) depends on the feature type produced by
ocean.feature.parse_features().
- Continuous feature
- \[x_j \in [a_j, b_j]\]
- Ordinal discrete feature
- \[x_j \in D_j = \{d_{j,1}, \dots, d_{j,m_j}\}\]
- Binary feature
- \[x_j \in \{0, 1\}\]
- Unordered nominal feature
Encoded as a one-hot block \(u_{j,k}\) with:
\[u_{j,k} \in \{0, 1\}, \qquad \sum_{k \in K_j} u_{j,k} = 1.\]
This is why ordinal features and one-hot encoded features are treated differently in OCEAN: ordinal values belong to an ordered finite set, while unordered categories are modeled as mutually exclusive indicator variables.
Tree path variables
For each tree \(t\) and each leaf \(\ell\) of that tree, OCEAN creates path or leaf-selection variables. Abstractly, they satisfy:
The equality means exactly one leaf is active in each tree. Path consistency constraints ensure that if \(p_{t,\ell} = 1\), then the feature vector \(x\) satisfies every split on the root-to-leaf path leading to \(\ell\).
For a numerical split of the form \(x_j \le \tau\), the abstract logic is:
where \(\eta\) is the numerical strictness constant described above. In practice, MIP uses explicit linear implications, CP works with threshold intervals, and MaxSAT reasons over Boolean encodings of the same branch conditions.
For one-hot encoded nominal features, branch tests are expressed on the indicator variables rather than on an ordered scalar value. If a branch accepts exactly the category subset \(S \subseteq K_j\), the corresponding path constraint can be written as:
Because the one-hot block already satisfies \(\sum_{k \in K_j} u_{j,k}=1\), this enforces membership in the allowed subset without inventing an artificial ordering between categories.
Ensemble score aggregation
Let \(v_{t,\ell,c}\) denote the class contribution of leaf \(\ell\) in tree \(t\) for class \(c\), and let \(w_t\) be the tree weight. Then the ensemble decision function used by the optimization model can be written as:
The optional base term \(b_c\) appears for models such as XGBoost where a base score is part of the ensemble definition.
Distance objectives
The optimization objective measures how far the counterfactual is from the query. OCEAN uses different realizations depending on the backend, but the common modeling idea is the same.
For an \(L_1\) objective, a compact description is:
The factor \(\tfrac{1}{2}\) on one-hot blocks avoids double counting a single category switch, because moving from one active code to another changes two binary indicators.
The MIP backend also supports an \(L_2\) variant:
More generally, for any integer \(p \ge 1\), minimizing \(d_p(x, \hat{x})\) is equivalent to minimizing \(d_p(x, \hat{x})^p\) because the map \(t \mapsto t^p\) is strictly increasing on \(\mathbb{R}_{\ge 0}\).
Backend realizations
The three backends solve the same conceptual problem with different modeling primitives.
- MIP
Uses continuous and binary decision variables directly and optimizes either \(L_1\) or \(L_2\) with linear or quadratic expressions.
- CP
Uses finite-domain integer variables and a scaled integer \(L_p^p\) objective for integer
normvalues. Continuous features are modeled through interval indices derived from tree thresholds.- MaxSAT
Encodes the \(L_1\) objective as weighted soft clauses and the target prediction constraints as hard clauses or pseudo-Boolean constraints. In hard-voting mode for random forests, the target-class comparison is encoded with vote-count cardinality constraints instead of probability differences.
Reading the implementation
If you want to map these equations back to code, the main entry points are:
ocean.mip._modelocean.cp._modelocean.maxsat._model
Those modules differ in how they encode the variables and constraints, but they all implement the same optimization pattern described above.
Backend method reference
The following internal backend model classes use the same notation as this page:
\(x\) is the counterfactual decision vector encoded by feature variables,
\(\hat{x}\) is the processed query instance; in the implementation, the objective-building methods receive it through a parameter named
x,\(y\) is the target class,
\(f_c(x)\) is the class score for class \(c\),
\(p_{t,\ell}\) denotes the active leaf or path variable.
Mixed-integer programming model
- class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, name='OCEAN', env=None, epsilon=1.52587890625e-05, num_epsilon=1e-06, model_type=Type.MIP, flow_type=FlowType.CONTINUOUS)[source]
Bases:
BaseModel,FeatureManager,TreeManager,GarbageManagerMixed-integer programming formulation for tree ensemble explanations.
The feature variables encode the counterfactual point
xand the tree variables encode the active leaf or path decisionsp_(t,l).Initialize an empty MIP model for a parsed ensemble.
- Parameters:
trees – Parsed trees whose leaves define the ensemble class scores.
mapper – Feature mapper describing how processed query columns map to decision variables.
weights – Optional non-negative tree weights.
n_isolators – Number of isolation trees contributing to the auxiliary isolation constraint on the path length.
max_samples – Reference sample count used by the isolation-forest extension.
isolation_threshold – Optional isolation-score cutoff in
(0, 1]. When omitted, the classic average-path-length threshold is used.name – Name of the underlying Gurobi model.
env – Optional Gurobi environment.
epsilon – Classification margin used in the pairwise score constraints.
num_epsilon – Strictness constant used in numerical split implications.
model_type – Backend builder variant.
flow_type – Encoding used for the tree flow variables.
- build()[source]
Create the decision variables and structural constraints of the model.
This step introduces the feature variables encoding \(x\), the tree-path variables \(p_{t,\ell}\), and the constraints linking both so that exactly one leaf is active in each tree and the selected leaves are consistent with the feature values.
- add_objective(x, *, norm=1, sense=1)[source]
Attach the distance objective \(d(x, \hat{x})\) to the MIP model.
- Parameters:
x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named
x, but mathematically it represents the query.norm – Distance norm used for \(d(x, \hat{x})\). The MIP backend supports \(L_0\), \(L_1\), and \(L_2\).
sense – Optimization sense passed to Gurobi. Counterfactual search uses minimization.
- set_majority_class(y, *, op=0)[source]
Enforce the target class through pairwise score constraints.
For every competing class, this adds
\[f_y(x) \ge f_c(x) + \varepsilon_c\]- Raises:
ValueError – If
yis not a valid class index.
- clear_majority_class()[source]
Remove the current target-class constraints.
This deletes stored inequalities of the form
\[f_y(x) \ge f_c(x) + \varepsilon_c\]
- cleanup()[source]
Remove query-specific objective auxiliaries and class constraints.
After
cleanup(), the structural encoding of \(x\) and \(p_{t,\ell}\) remains, but temporary constraints created for a specific query \(\hat{x}\) are removed.
- _set_majority_class(y, *, op)[source]
Encode the pairwise score dominance constraints.
The stored constraints are
\[f_y(x) - f_c(x) \ge \varepsilon_c, \qquad \forall c \neq y.\]
- _set_isolation()[source]
Add the optional isolation-forest length constraint.
When isolation trees are present, this constrains the aggregate path length variable to remain above the minimum admissible value.
- _add_objective(x, norm)[source]
Build the symbolic objective expression for \(d(x, \hat{x})\).
Each feature contributes an \(L_0\), \(L_1\), or \(L_2\) term. One-hot encoded coordinates use a factor \(1/2\) for \(L_1\) and \(L_2\) so that switching a category is not double counted.
- Returns:
Linear or quadratic expression representing \(d(x, \hat{x})\).
- Return type:
Objective
- Raises:
ValueError – If
xdoes not have the expected size ornormis unsupported.
- L1(x, v, *, is_ohe)[source]
Return the MIP \(L_1\) contribution of one coordinate of \(x\).
This creates an auxiliary variable \(u\) such that \(u \ge |x_j - \hat{x}_j|\), where
vencodes the counterfactual coordinate \(x_j\) and the code parameterxstores the query coordinate \(\hat{x}_j\).- Parameters:
x – Query coordinate \(\hat{x}_j\).
v – Model variable encoding the corresponding coordinate of \(x\).
is_ohe – Whether this coordinate belongs to a one-hot block \(u_{j,k}\). If so, the returned term is halved.
- Returns:
Linear expression equal to the contribution of that coordinate to \(d_1(x, \hat{x})\).
- Return type:
gp.LinExpr
- static L2(x, v, *, is_ohe)[source]
Return the MIP \(L_2\) contribution of one coordinate of \(x\).
The returned expression is \((x_j - \hat{x}_j)^2\), halved for one-hot encoded coordinates to preserve the same category-switch semantics as the \(L_1\) objective.
- Returns:
Quadratic expression equal to the contribution of that coordinate to \(d_2(x, \hat{x})\).
- Return type:
gp.QuadExpr
Constraint-programming model
- class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, epsilon=1, model_type=Type.CP)[source]
Bases:
BaseModel,FeatureManager,TreeManager,GarbageManagerConstraint-programming formulation behind
ocean.cp.Explainer.The feature variables encode the counterfactual point
xand the tree variables encode the active leaf decisionsp_(t,l).Initialize an empty CP-SAT model for a parsed ensemble.
- Parameters:
trees – Parsed trees contributing to the ensemble class scores.
mapper – Feature mapper aligning processed query coordinates with the finite-domain decision variables.
weights – Optional tree weights.
n_isolators – Number of isolation trees appended after the predictive ensemble.
max_samples – Reference sample count used by the isolation-forest extension.
isolation_threshold – Optional isolation-score cutoff in
(0, 1]. When omitted, the classic average-path-length threshold is used.epsilon – Integer classification margin used in the pairwise score constraints.
model_type – Backend builder variant.
- build()[source]
Create the CP variables and structural constraints of the formulation.
This builds the finite-domain feature variables for \(x\), the leaf/path variables \(p_{t,\ell}\), and the consistency constraints connecting both representations. When isolation trees are present, it also adds the minimum aggregate path-length constraint.
- add_objective(x, *, norm=1)[source]
Minimize the scaled integer approximation of \(d_p(x, \hat{x})\).
- Parameters:
x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named
x, but mathematically it represents the query.norm – Distance norm. The CP backend supports non-negative integer norms, with \(L_1\) as the default.
- set_majority_class(y, *, op=0)[source]
Enforce the target class through pairwise score inequalities.
For each competing class, this adds the integer-scaled constraint
\[f_y(x) \ge f_c(x) + \varepsilon_c\]- Raises:
ValueError – If
yis not a valid class index.
- _set_majority_class(y, *, op)[source]
Encode the target-class dominance constraints.
The created constraints encode
\[f_y(x) - f_c(x) \ge \varepsilon_c, \qquad \forall c \neq y.\]
- cleanup()[source]
Remove query-specific constraints from the CP model.
The persistent structure over \(x\) and \(p_{t,\ell}\) is kept, while objective auxiliaries and class constraints for the last query \(\hat{x}\) are discarded.
- _set_isolation()[source]
Add the optional isolation-forest length constraint.
When isolation trees are present, this constrains the scaled aggregate path length to remain above the scaled minimum admissible value.
- _add_objective(x, norm)[source]
Build the scaled linear expression for \(d_p(x, \hat{x})^p\).
Continuous coordinates are costed through interval indices, while discrete, binary, and one-hot encoded features contribute exact scaled powered deviations.
- Returns:
Linear objective expression approximating \(d_p(x, \hat{x})^p\).
- Return type:
cp.ObjLinearExprT
- Raises:
ValueError – If
xdoes not have the expected size.
- get_intervals_cost(levels, x, *, norm=1)[source]
Return interval costs for a continuous feature encoded by thresholds.
For a continuous coordinate \(x_j\), CP does not optimize directly over the real value. Instead it selects an interval between successive threshold levels. This helper assigns each interval the scaled \(L_p^p\) distance to the query coordinate \(\hat{x}_j\).
- Returns:
Scaled costs for the threshold intervals representing that continuous feature.
- Return type:
list[int]
Weighted MaxSAT model
- class Model(trees, mapper, *, weights=None, hard_voting=False, max_samples=0, epsilon=1, model_type=Type.MAXSAT)[source]
Bases:
BaseModel,FeatureManager,GarbageManager,TreeManagerWeighted MaxSAT formulation for tree ensemble explanations.
The Boolean feature variables encode the counterfactual point
xand the Boolean tree variables encode the active leaf decisionsp_(t,l).Initialize an empty weighted MaxSAT model for a parsed ensemble.
- Parameters:
trees – Parsed trees whose leaf activations define the ensemble class scores.
mapper – Feature mapper aligning processed query coordinates with the Boolean feature encoding.
weights – Optional tree weights.
hard_voting – Whether to build the hard-voting MaxSAT encoding.
max_samples – Reserved parameter used by compatible higher-level explainers.
epsilon – Integer tie-breaking margin used in the score constraints.
model_type – Backend builder variant.
- build()[source]
Create the Boolean encoding of features, leaves, and path consistency.
After
build(), the model contains the Boolean variables encoding the counterfactual point \(x\), the active leaves \(p_{t,\ell}\), and the hard clauses linking both.
- add_objective(x, *, norm=1)[source]
Encode the \(L_1\) distance \(d_1(x, \hat{x})\) as soft clauses.
- Parameters:
x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named
x, but mathematically it represents the query.norm – Distance norm. The MaxSAT backend currently supports only \(L_1\).
- Raises:
ValueError – If
xdoes not have the expected size ornormis unsupported.
- _add_soft_l1_binary(x_val, v)[source]
Add the soft clause corresponding to a binary coordinate of \(x\).
The clause penalizes a deviation from the query coordinate \(\hat{x}_j\) with weight proportional to \(|x_j - \hat{x}_j|\).
- _add_soft_l1_ohe(x_val, v, code)[source]
Add the soft clause for one coordinate of a one-hot block.
The weight is halved so that changing category in an unordered nominal feature contributes one unit of \(L_1\) distance instead of two.
- _add_soft_l1_continuous(x_val, v)[source]
Add soft clauses for an interval-encoded continuous coordinate.
Each Boolean interval selector receives a weight equal to the scaled distance between the query coordinate \(\hat{x}_j\) and that interval.
- _add_soft_l1_discrete(x_val, v)[source]
Add soft clauses for an ordinal discrete feature.
If
mu[i]denotes \(x_j = d_{j,i}\), this method assigns the scaled cost \(|d_{j,i} - \hat{x}_j|\) to every admissible level different from the query value.
- _get_intervals_cost(levels, x)[source]
Compute interval costs relative to \(\hat{x}_j\).
This helper is used for continuous features whose Boolean encoding selects one interval between consecutive threshold levels.
- Returns:
Scaled costs for the interval selectors attached to that continuous feature.
- Return type:
list[int]
- set_majority_class(y, *, op=0)[source]
Enforce the target class through hard score constraints.
For every competing class, the backend encodes the pairwise dominance condition
\[f_y(x) \ge f_c(x) + \varepsilon_c\]- Raises:
ValueError – If
yis not a valid class index.
- _set_majority_class(y, *, op=0)[source]
Encode the target-class dominance constraints in MaxSAT form.
The intended inequality is
\[f_y(x) - f_c(x) \ge \varepsilon_c, \qquad \forall c \neq y.\]Since MaxSAT works on Boolean clauses, the leaf contributions of each tree are converted into an integer pseudo-Boolean constraint.
- _set_hard_voting_majority_class(y, *, op=0)[source]
Encode hard-voting dominance using cardinality constraints.
For hard voting, each tree contributes one unit vote to the class of its active leaf. The target class is enforced through
\[\sum_t \mathbb{1}[\hat{y}_t(x)=y] \ge \sum_t \mathbb{1}[\hat{y}_t(x)=c] + \varepsilon_c, \qquad \forall c \neq y.\]
- _encode_cardinality_difference_constraint(positive_lits, negative_lits, threshold)[source]
Encode
sum(pos) - sum(neg) >= thresholdas a cardinality bound.The difference is rewritten as
\[\sum pos + \sum \lnot neg \ge |neg| + threshold.\]This method is used for hard-voting score comparisons, where the contributions are Boolean and the threshold is an integer margin.
- Parameters:
positive_lits – List of positive literals (e.g., votes for the target class).
negative_lits – List of negative literals (e.g., votes for the competing class).
threshold – Integer margin for the difference.
- Raises:
ImportError – If pysat.card is not installed.
- _encode_weighted_sum_constraint(tree_contributions, threshold)[source]
Encode a hard pseudo-Boolean constraint on tree contributions.
This method encodes
\[\sum_i a_i \ell_i \ge \tau,\]where
thresholdprovides the integer bound \(\tau\). This is the MaxSAT realization of the score comparison\[f_y(x) - f_c(x) \ge \varepsilon_c\]- Parameters:
tree_contributions (list[list[tuple[int, int]]]) – List of contributions for each tree.
threshold (int) – Threshold for the sum of contributions.
- Raises:
ImportError – If pysat.pb is not installed.
- static _normalize_tree_contributions(tree_contributions, threshold)[source]
Normalize contributions using the one-active-leaf property per tree.
Subtracting the minimum contribution of each tree preserves the score inequality and removes avoidable negative coefficients. A final GCD reduction keeps the PB coefficients smaller before calling PySAT.
- Parameters:
tree_contributions (list[list[tuple[int, int]]]) – List of contributions for each tree.
threshold (int) – Threshold for the sum of contributions.
- Returns:
Normalized contributions and the adjusted threshold.
- Return type:
tuple[list[list[tuple[int, int]]], int]