Overview
OCEAN turns fitted tree ensembles into optimization models that search for the closest counterfactual satisfying a target prediction. The library is centered around a simple idea: parse the ensemble into a solver-friendly tree structure, link those tree decisions to feature variables, and optimize the smallest change that flips the model output.
For the mathematical formulation behind that workflow, see Mathematical Model.
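To make the objective concrete, here is a minimal sketch of what "closest counterfactual" means on a hand-written toy ensemble. It enumerates a small integer grid by brute force, which is not how OCEAN works (OCEAN builds an optimization model instead); the tree tuples, `TREES`, and all function names are hypothetical and exist only for this illustration.

```python
from itertools import product

# Toy "ensemble": each tree is (feature, threshold, left, right) or an int
# leaf holding a class label. Hypothetical data, not an OCEAN structure.
TREES = [
    (0, 2.5, 0, 1),               # x[0] <= 2.5 -> class 0, else class 1
    (1, 1.5, 0, (0, 1.5, 0, 1)),  # split on x[1], then on x[0]
    (1, 3.5, 0, 1),               # x[1] <= 3.5 -> class 0, else class 1
]


def predict_tree(tree, x):
    """Walk one tree until a leaf label is reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def predict(x):
    """Majority vote over the toy trees."""
    votes = [predict_tree(tree, x) for tree in TREES]
    return max(set(votes), key=votes.count)


def closest_counterfactual(x, target, grid):
    """Search the grid exhaustively for the L1-closest point of class target."""
    best, best_dist = None, float("inf")
    for cand in product(grid, repeat=len(x)):
        if predict(cand) != target:
            continue
        dist = sum(abs(a - b) for a, b in zip(cand, x))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best, best_dist


query = (1, 1)  # predicted as class 0 by all three trees
cf, dist = closest_counterfactual(query, target=1, grid=range(6))
```

Enumeration grows exponentially with the number of features, which is the reason for encoding the trees as solver constraints rather than searching candidates one by one.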
What OCEAN expects
OCEAN works best when you separate the workflow into two stages:
1. Convert raw tabular data into a numerical matrix with ocean.feature.parse_features().
2. Train a supported tree ensemble on that processed matrix and keep the returned mapper alongside the fitted model.
The mapper is the bridge between the transformed columns seen by the ensemble and the original feature names used by the explanation objects.
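The role of the mapper can be illustrated with a small, self-contained sketch. This is not what `ocean.feature.parse_features()` does internally, and the real mapper type is not shown on this page; the `encode` helper, the sample rows, and the dictionary layout below are all assumptions made for the example.

```python
# Hypothetical raw records: "age" is numeric, "color" is categorical.
rows = [
    {"age": 31, "color": "red"},
    {"age": 45, "color": "blue"},
    {"age": 27, "color": "green"},
]


def encode(rows, categorical):
    """One-hot encode categorical features and record where each one landed."""
    columns = []  # processed column labels, in matrix order
    mapper = {}   # original feature name -> processed column indices
    for name in rows[0]:
        if name in categorical:
            levels = sorted({row[name] for row in rows})
            labels = [(name, level) for level in levels]
        else:
            labels = [(name, None)]
        mapper[name] = list(range(len(columns), len(columns) + len(labels)))
        columns.extend(labels)

    matrix = [
        [
            float(row[name]) if level is None else float(row[name] == level)
            for name, level in columns
        ]
        for row in rows
    ]
    return matrix, columns, mapper


matrix, columns, mapper = encode(rows, categorical={"color"})
```

The ensemble only ever sees `matrix`; the `mapper` dictionary is what lets a later step report a change in columns 1 to 3 as a change in the single original feature `color`.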
Supported models
At the public explainer level, OCEAN supports these fitted classifiers:
- sklearn.ensemble.RandomForestClassifier
- sklearn.ensemble.AdaBoostClassifier
- xgboost.XGBClassifier
At the lower-level tree parsing layer, xgboost.Booster and
sklearn.ensemble.IsolationForest are also supported where the backend uses
those structures.
Backend summary
| Backend | Public class | Supported norms | Notes |
|---|---|---|---|
| MIP | | | Requires Gurobi. Also supports adding isolation-forest constraints. |
| CP | | Integer | Uses OR-Tools CP-SAT, is the easiest exact backend to run locally, and also supports adding isolation-forest constraints. |
| MaxSAT | | | Uses a weighted MaxSAT encoding backed by PySAT and supports an optional hard-voting mode for random forests. |
Common workflow
1. Prepare data with ocean.feature.parse_features() or a packaged dataset loader such as ocean.datasets.load_adult().
2. Fit a supported ensemble on the processed matrix.
3. Instantiate one of the public explainers from ocean with the model and mapper.
4. Select a query x as a one-dimensional numpy array in the processed feature space.
5. Call explainer.explain(x, y=target_class, norm=...).
6. Inspect the result through explanation.x, explanation.to_series(), the more human-readable explanation.value mapping, or explainer.get_distance() for the post-processed query-to-CF distance.
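The workflow above might look roughly like the following sketch. Several details are assumptions that this page does not confirm: the class name `ConstraintProgrammingExplainer`, the `((data, target), mapper)` return shape of `load_adult()`, the `mapper=` keyword, and the use of `norm=1`. The wrapper function `explain_first_row` is hypothetical. Check the API reference before relying on any of them.

```python
def explain_first_row():
    """Sketch of the common workflow on the packaged Adult dataset."""
    # Imports are deferred so this module loads even when the optional
    # dependencies are not installed.
    from sklearn.ensemble import RandomForestClassifier

    # Assumption: the CP explainer is exported from `ocean` under this name.
    from ocean import ConstraintProgrammingExplainer
    from ocean.datasets import load_adult

    # Assumption: the loader returns ((data, target), mapper).
    (data, target), mapper = load_adult()

    # Fit a supported ensemble on the processed matrix.
    model = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42)
    model.fit(data, target)

    # Pick a query row and record the class the model currently predicts.
    row = data.iloc[0].to_frame().T
    predicted = int(model.predict(row).item())
    x = row.to_numpy().flatten()  # one-dimensional, processed feature space

    # Assumption: the mapper is passed by keyword.
    explainer = ConstraintProgrammingExplainer(model, mapper=mapper)

    # Ask for the opposite class of this binary problem.
    explanation = explainer.explain(x, y=1 - predicted, norm=1)

    return explanation.x, explanation.to_series(), explainer.get_distance()
```

Nothing runs until `explain_first_row()` is called, at which point scikit-learn and OCEAN need to be importable.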
What the explanation object gives back
Every backend returns a backend-specific explanation object, but the user-level surface is intentionally similar.
- .x returns the counterfactual as a flat numpy array aligned with the processed training columns.
- .to_series() returns the same information as a pandas series.
- .value returns a mapping keyed by the original feature names, decoding one-hot encoded groups back into a categorical value when possible.
- repr(explanation) prints that value-oriented mapping, which is usually the most readable form for reports and notebooks.
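The difference between the flat vector and the value-oriented mapping can be sketched in plain Python. This only illustrates the idea of collapsing a one-hot group back into one categorical value; the `COLUMNS` layout and the `decode` helper are hypothetical and do not reflect how OCEAN implements `.value`.

```python
# Hypothetical layout of the processed columns: (original feature, level).
# A level of None marks a numeric column; any other level marks one
# indicator column of a one-hot group.
COLUMNS = [
    ("age", None),
    ("color", "blue"),
    ("color", "green"),
    ("color", "red"),
]


def decode(flat):
    """Turn a flat processed vector into a mapping keyed by original names."""
    decoded = {}
    best_score = {}
    for (name, level), value in zip(COLUMNS, flat):
        if level is None:
            decoded[name] = value
        elif value > best_score.get(name, float("-inf")):
            # Keep the level whose indicator is largest within the group.
            best_score[name] = value
            decoded[name] = level
    return decoded


counterfactual = [38.0, 0.0, 1.0, 0.0]  # flat form, analogous to .x
readable = decode(counterfactual)       # mapping form, analogous to .value
```

Here `readable` holds one entry per original feature, with the three `color` indicator columns collapsed into the single value `"green"`.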
If you want to solve multiple queries with the same explainer instance,
you usually do not need any extra step, because all three explainers default to
clean_up=True inside explain. Call cleanup() manually only if
you have disabled that behavior with clean_up=False and want to reuse the same
instance safely.
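The toy class below shows why reusing an instance without cleanup can be unsafe. It is a hypothetical stand-in that says nothing about OCEAN's internals; only the `clean_up` flag and the `cleanup()` method name are taken from the text above.

```python
class ToyExplainer:
    """Hypothetical stand-in that tracks query-specific constraints."""

    def __init__(self):
        self.query_constraints = []

    def explain(self, x, clean_up=True):
        # Each query adds constraints that only make sense for that query.
        self.query_constraints.append(("distance_to", tuple(x)))
        active = list(self.query_constraints)
        if clean_up:
            self.cleanup()
        return active

    def cleanup(self):
        """Drop query-specific state so the next query starts fresh."""
        self.query_constraints.clear()


explainer = ToyExplainer()

# Default behavior: each call sees only its own constraints.
first = explainer.explain([1, 2])
second = explainer.explain([3, 4])

# With clean_up=False, stale constraints leak into the next query
# unless cleanup() is called in between.
explainer.explain([5, 6], clean_up=False)
leaked = explainer.explain([7, 8], clean_up=False)

explainer.cleanup()
fresh = explainer.explain([9, 10])
```

In this toy, `leaked` contains the constraints of two different queries, which is the kind of stale state a manual `cleanup()` call is meant to clear.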