Overview
========

OCEAN turns fitted tree ensembles into optimization models that search for the closest counterfactual satisfying a target prediction. The library is built around a simple idea: parse the ensemble into a solver-friendly tree structure, link the tree decisions to feature variables, and find the smallest change that flips the model output.

For the mathematical formulation behind that workflow, see :doc:`modelisation`.

What OCEAN expects
------------------

OCEAN works best when you split the workflow into two stages:

1. Convert raw tabular data into a numerical matrix with :func:`ocean.feature.parse_features`.
2. Train a supported tree ensemble on that processed matrix and keep the returned mapper alongside the fitted model.

The mapper is the bridge between the transformed columns seen by the ensemble and the original feature names used by the explanation objects.

Supported models
----------------

At the public explainer level, OCEAN supports these fitted classifiers:

- ``sklearn.ensemble.RandomForestClassifier``
- ``sklearn.ensemble.AdaBoostClassifier``
- ``xgboost.XGBClassifier``

At the lower-level tree parsing layer, ``xgboost.Booster`` and ``sklearn.ensemble.IsolationForest`` are also supported wherever a backend relies on those structures.

Backend summary
---------------

.. list-table:: Backend comparison
   :header-rows: 1

   * - Backend
     - Public class
     - Supported norms
     - Notes
   * - MIP
     - ``ocean.MixedIntegerProgramExplainer``
     - ``1`` and ``2``
     - Requires Gurobi. Also supports adding isolation-forest constraints.
   * - CP
     - ``ocean.ConstraintProgrammingExplainer``
     - Integer ``p >= 1`` (default ``1``)
     - Uses OR-Tools CP-SAT and is the easiest exact backend to run locally. Also supports adding isolation-forest constraints.
   * - MaxSAT
     - ``ocean.MaxSATExplainer``
     - ``1``
     - Uses a weighted MaxSAT encoding backed by PySAT. Supports an optional hard-voting mode for random forests.

Common workflow
---------------

1. Prepare data with :func:`ocean.feature.parse_features` or a packaged dataset loader such as :func:`ocean.datasets.load_adult`.
2. Fit a supported ensemble on the processed matrix.
3. Instantiate one of the public explainers from :mod:`ocean` with the fitted model and the mapper.
4. Select a query ``x`` as a one-dimensional NumPy array in the processed feature space.
5. Call ``explainer.explain(x, y=target_class, norm=...)``.
6. Inspect the result through ``explanation.x``, ``explanation.to_series()``, or the more human-readable ``explanation.value`` mapping, and call ``explainer.get_distance()`` for the post-processed distance between the query and the counterfactual.

What the explanation object gives back
--------------------------------------

Every backend returns its own explanation class, but the user-facing interface is intentionally similar:

- ``.x`` returns the counterfactual as a flat NumPy array aligned with the processed training columns.
- ``.to_series()`` returns the same information as a pandas Series.
- ``.value`` returns a mapping keyed by the original feature names, decoding one-hot encoded groups back into a categorical value when possible.
- ``repr(explanation)`` prints that value-oriented mapping, which is usually the most readable form for reports and notebooks.

To solve multiple queries with the same explainer instance, you usually do not need any extra step: all three explainers default to ``clean_up=True`` inside ``explain``. Call ``cleanup()`` manually only when you have disabled that behavior with ``clean_up=False`` and want to reuse the same instance safely.
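
The ``.value`` mapping collapses each one-hot group back into a single categorical value. The sketch below imitates that decoding with a hypothetical column layout: a list of ``(feature, level)`` pairs, with ``level`` set to ``None`` for numeric columns. OCEAN's real mapper has its own structure, so ``COLUMNS`` and ``decode`` are illustrative names only, and the fallback for ambiguous groups is an assumption of this sketch.

.. code-block:: python

   # Hypothetical processed layout: one numeric column followed by a one-hot
   # group for the categorical feature "workclass". This is NOT OCEAN's mapper
   # format, only an illustration of the decoding idea.
   COLUMNS = [
       ("age", None),
       ("workclass", "Private"),
       ("workclass", "Self-emp"),
       ("workclass", "Government"),
   ]


   def decode(row, columns=COLUMNS):
       """Map a flat processed vector back to {original feature: value}."""
       decoded = {}
       groups = {}
       for (feature, level), v in zip(columns, row):
           if level is None:
               decoded[feature] = v  # numeric column: keep the value as is
           else:
               groups.setdefault(feature, []).append((level, v))
       for feature, levels in groups.items():
           active = [level for level, v in levels if v == 1]
           # Decode only when exactly one indicator is set; otherwise fall back
           # to the raw {level: indicator} dictionary.
           decoded[feature] = active[0] if len(active) == 1 else dict(levels)
       return decoded


   print(decode([37.0, 0, 1, 0]))  # {'age': 37.0, 'workclass': 'Self-emp'}

Keeping the column-to-feature bookkeeping outside the model is what lets the ensemble see only numbers while the explanation still reports readable categories.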
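
The central idea from the overview, finding the closest input whose ensemble prediction equals a target class, can be illustrated without any solver. The sketch below is a brute-force stand-in, not OCEAN's MIP, CP, or MaxSAT encoding: it hand-codes a hypothetical majority-vote ensemble of three decision stumps over two numeric features, and tries only the candidate values where a split threshold is crossed. The names ``STUMPS``, ``predict``, ``distance``, and ``closest_counterfactual`` are illustrative, and treating the ``norm`` argument as a plain ``p``-norm in the processed feature space is an assumption.

.. code-block:: python

   import itertools

   # Hypothetical toy ensemble of decision stumps over two numeric features.
   # Each stump is (feature index, threshold, class if x[f] <= threshold, class otherwise).
   STUMPS = [
       (0, 0.50, 0, 1),
       (1, 0.30, 0, 1),
       (0, 0.65, 0, 1),
   ]


   def predict(x):
       """Hard majority vote of the stumps (binary classes, odd number of stumps)."""
       votes = [left if x[f] <= t else right for f, t, left, right in STUMPS]
       return max(set(votes), key=votes.count)


   def distance(a, b, p=1):
       """Plain p-norm distance between two points."""
       return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


   def closest_counterfactual(x, target, p=1, eps=1e-6):
       """Brute-force the closest point whose ensemble vote equals ``target``."""
       # Per feature, the prediction only changes when a threshold is crossed, so
       # (up to the eps offset) it suffices to try the original value, each
       # threshold (goes left), and each threshold + eps (goes right).
       candidates = []
       for f in range(len(x)):
           values = {x[f]}
           for g, t, _, _ in STUMPS:
               if g == f:
                   values.update({t, t + eps})
           candidates.append(sorted(values))

       best = None
       for point in itertools.product(*candidates):
           if predict(point) != target:
               continue
           d = distance(x, point, p)
           if best is None or d < best[1]:
               best = (point, d)
       return best  # (counterfactual, distance), or None if no candidate reaches target


   query = (0.2, 0.1)
   print(predict(query))  # 0
   # L1: a single larger move on feature 0, just past the 0.65 threshold.
   print(closest_counterfactual(query, target=1, p=1))
   # L2: two smaller moves, just past 0.50 on feature 0 and 0.30 on feature 1.
   print(closest_counterfactual(query, target=1, p=2))

The two calls show why the norm matters: under ``p=1`` one large move on a single feature is cheapest, while ``p=2`` prefers spreading smaller moves over both features. Enumerating the product of candidate values grows exponentially with the number of features, which is why the backends hand an encoding of the trees to a solver instead.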