Constraint Programming Backend

The ocean.cp module exposes the OR-Tools CP-SAT formulation. In addition to the standard counterfactual search API, the public explainer also accepts an optional IsolationForest through isolation=... when you want to add the same plausibility constraint available in the MIP backend.

Constraint programming backend for optimal counterfactual search.

class BaseModel[source]

Bases: ABC, CpModel

Base OR-Tools model used by the constraint programming backend.

build_vars(*variables)[source]
class ConstraintProgramBuilder(*args, **kwargs)[source]

Bases: ModelBuilder

Build CP constraints linking feature variables to tree paths.

build(model, *, trees, mapper)[source]

Build the model constraints for the given trees and features.

Parameters:
  • model (BaseModel) – The model to which the constraints will be added.

  • trees (tuple[TreeVar, ...]) – The tree variables for which the constraints will be built.

  • mapper (Mapper[FeatureVar]) – The feature variables for which the constraints will be built.

class Explainer(ensemble, *, mapper, weights=None, isolation=None, isolation_threshold=None, epsilon=1, model_type=Type.CP)[source]

Bases: Model, BaseExplainer

Constraint programming explainer for tree ensemble classifiers.

Initialize an empty CP-SAT model for a parsed ensemble.

Parameters:
  • trees – Parsed trees contributing to the ensemble class scores.

  • mapper – Feature mapper aligning processed query coordinates with the finite-domain decision variables.

  • weights – Optional tree weights.

  • n_isolators – Number of isolation trees appended after the predictive ensemble.

  • max_samples – Reference sample count used by the isolation-forest extension.

  • isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.

  • epsilon – Integer classification margin used in the pairwise score constraints.

  • model_type – Backend builder variant.

explain(x, *, y, norm, return_callback=False, verbose=False, max_time=60, num_workers=None, random_seed=42, clean_up=True)[source]

Solve one counterfactual query with the CP-SAT backend.

Parameters:
  • x – Query instance in the processed feature space.

  • y – Target class enforced by the counterfactual.

  • norm – Non-negative integer distance norm used by the CP objective.

  • return_callback – Whether to record incumbent solutions during the search.

  • verbose – Whether to enable CP-SAT search logging.

  • max_time – Time limit in seconds.

  • num_workers – Optional number of CP-SAT workers.

  • random_seed – Random seed passed to CP-SAT.

  • clean_up – Whether to remove query-specific constraints after the solve.

Returns:

The decoded counterfactual, or None when no feasible counterfactual is found within the given limits.

Return type:

Explanation | None

Raises:

RuntimeError – If CP-SAT reports an invalid model or an unexpected status.

get_anytime_solutions()[source]

Return intermediate solutions collected during the last solve.

Returns:

Time-stamped incumbent objective values when return_callback was enabled in explain(), otherwise None.

Return type:

list[dict[str, float]] | None

get_distance()[source]

Return the post-processed distance of the last CF.

Returns:

Post-processed \(L_p\) distance for the last successful solve.

Return type:

float

Raises:

RuntimeError – If no explanation has been computed yet.

get_objective_value()[source]

Return the scaled objective value of the last CP-SAT solve.

Returns:

Objective value rescaled back to the user-facing distance units.

Return type:

float

get_solving_status()[source]

Return the status string from the latest CP-SAT solve.

Returns:

Solver status such as "OPTIMAL", "FEASIBLE", or "INFEASIBLE".

Return type:

str

class Explanation(mapping=None, *, columns=None, validate=True)[source]

Bases: Mapper[FeatureVar], BaseExplanation

Concrete explanation container returned by the CP backend.

Initialize a mapper from feature metadata and transformed columns.

Parameters:
  • mapping – Mapping from original feature names to metadata objects.

  • columns – Processed pandas index describing the transformed coordinates.

  • validate – Whether to verify that mapping and columns are consistent.

format_discrete_value(f, val, thresholds)[source]
format_value(f, idx, levels)[source]
property query
to_numpy()[source]
to_series()[source]
property value
vget(i)[source]
property x
class FeatureManager(mapper)[source]

Bases: object

Manage CP feature variables for the counterfactual point \(x\).

This manager owns the finite-domain variables representing the processed coordinates of the counterfactual explanation. The query \(\hat{x}\) only appears later in the objective terms.

Wrap the parsed feature mapper in CP-specific feature variables.

FEATURE_VAR_FMT = 'feature[{key}]'
build_features(model)[source]

Create CP variables for the coordinates of \(x\).

property explanation
property mapper
property n_columns
property n_features
vget(i)[source]

Return the CP variable for processed coordinate i of x.

Returns:

CP variable representing the requested coordinate.

Return type:

cp.IntVar

class FeatureVar(feature, name)[source]

Bases: Var, FeatureKeeper

CP variable bundle associated with a single parsed feature.

X_VAR_NAME_FMT = 'x[{name}]'
build(model)[source]
objvarget()[source]
xget(code=None)[source]
class Model(trees, mapper, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, epsilon=1, model_type=Type.CP)[source]

Bases: BaseModel, FeatureManager, TreeManager, GarbageManager

Constraint-programming formulation behind ocean.cp.Explainer.

The feature variables encode the counterfactual point x and the tree variables encode the active leaf decisions p_(t,l).

Initialize an empty CP-SAT model for a parsed ensemble.

Parameters:
  • trees – Parsed trees contributing to the ensemble class scores.

  • mapper – Feature mapper aligning processed query coordinates with the finite-domain decision variables.

  • weights – Optional tree weights.

  • n_isolators – Number of isolation trees appended after the predictive ensemble.

  • max_samples – Reference sample count used by the isolation-forest extension.

  • isolation_threshold – Optional isolation-score cutoff in (0, 1]. When omitted, the classic average-path-length threshold is used.

  • epsilon – Integer classification margin used in the pairwise score constraints.

  • model_type – Backend builder variant.

DEFAULT_EPSILON = 1
Lp(x, v, code=None, *, norm=1)[source]

Build the CP contribution of one feature to \(d_p(x, \hat{x})^p\).

Parameters:
  • x – Query coordinate \(\hat{x}_j\).

  • v – CP feature variable representing one original feature or one one-hot block in the counterfactual point \(x\).

  • code – Optional category code \(k\) when the feature is represented by one-hot variables \(u_{j,k}\).

  • norm – Distance norm \(p\) used to build \(d_p(x, \hat{x})^p\).

Returns:

Scaled linear expression for the contribution of that feature to \(d_p(x, \hat{x})^p\).

Return type:

cp.LinearExpr

class Type(*values)[source]

Bases: Enum

CP = 'CP'
add_objective(x, *, norm=1)[source]

Minimize the scaled integer approximation of \(d_p(x, \hat{x})\).

Parameters:
  • x – Query point \(\hat{x}\) in the processed feature space. The code parameter is named x, but mathematically it represents the query.

  • norm – Distance norm. The CP backend supports non-negative integer norms, with \(L_1\) as the default.

build()[source]

Create the CP variables and structural constraints of the formulation.

This builds the finite-domain feature variables for \(x\), the leaf/path variables \(p_{t,\ell}\), and the consistency constraints connecting both representations. When isolation trees are present, it also adds the minimum aggregate path-length constraint.

cleanup()[source]

Remove query-specific constraints from the CP model.

The persistent structure over \(x\) and \(p_{t,\ell}\) is kept, while objective auxiliaries and class constraints for the last query \(\hat{x}\) are discarded.

static get_discrete_base_values(v)[source]
get_discrete_values(v, x)[source]
get_intervals_cost(levels, x, *, norm=1)[source]

Return interval costs for a continuous feature encoded by thresholds.

For a continuous coordinate \(x_j\), CP does not optimize directly over the real value. Instead it selects an interval between successive threshold levels. This helper assigns each interval the scaled \(L_p^p\) distance to the query coordinate \(\hat{x}_j\).

Returns:

Scaled costs for the threshold intervals representing that continuous feature.

Return type:

list[int]

get_value_idx_var(v)[source]
get_values_cost(values, x, *, norm=1)[source]

Return scaled \(L_p^p\) costs for one finite-value domain.

Returns:

Scaled powered costs for each admissible value.

Return type:

list[int]

set_majority_class(y, *, op=0)[source]

Enforce the target class through pairwise score inequalities.

For each competing class, this adds the integer-scaled constraint

\[f_y(x) \ge f_c(x) + \varepsilon_c\]
Raises:

ValueError – If y is not a valid class index.

class TreeManager(trees, *, weights=None, n_isolators=0, max_samples=0, isolation_threshold=None, scale=10000000000)

Bases: object

Manage CP tree variables and the scaled decision function \(f\).

Each TreeVar encodes the active leaf decisions \(p_{t,\ell}\) of one tree. Aggregating those variables yields the integer-scaled CP representation of \(f(x)\).

Initialize the tree manager and the score-scaling factor.

DEFAULT_SCORE_SCALE = 10000000000
NUM_BINARY_CLASS = 2
TREE_VAR_FMT = 'tree[{t}]'
XGBOOST_DEFAULT_CLASS = 1
build_trees(model)

Create the CP tree variables and cache \(f(x)\).

property estimators
property function
property isolation_threshold
property isolators
property length
property length_scale
property max_samples
property min_average_length
property min_length
property min_length_scaled
property n_classes
property n_estimators
property n_isolators
property n_trees
property score_scale
property shape
property trees
weighted_function(weights)

Return the integer-scaled CP representation of \(f(x)\).

The returned dictionary maps each output-class pair (op, c) to the corresponding linear expression for \(f_c(x)\).

Returns:

Dictionary of scaled class-score expressions.

Return type:

dict[tuple[NonNegativeInt, NonNegativeInt], cp.LinearExpr]

property weights
class TreeVar(tree, name, *, length_scale=100000000)[source]

Bases: Var, TreeKeeper, Mapping[NonNegativeInt, _IntVar]

CP variables encoding the active root-to-leaf path of one tree.

DEFAULT_LENGTH_SCALE = 100000000
PATH_VAR_NAME_FMT = '{name}_path'
build(model)[source]
property length
property length_scale