Mathematical Model
==================

This page summarizes the optimization problem solved by OCEAN at the modeling
level. The goal is not to reproduce every implementation detail of every
backend, but to make the shared mathematical structure explicit.

Problem statement
-----------------

Let:

- :math:`\hat{x} \in \mathcal{X}` be the query instance,
- :math:`y` be the desired target class,
- :math:`x` be the counterfactual decision vector.

The ensemble prediction rule is:

.. math::

   \hat{y}(x) \in \arg\max_{c \in \mathcal{Y}} f_c(x).

At a high level, OCEAN solves:

.. math::

   x^\star \in \arg\min_{x \in \mathcal{X}} d(x, \hat{x})
   \quad \text{s.t.} \quad
   f_y(x) \ge f_c(x) + \varepsilon_c,
   \qquad \forall c \neq y.

Here :math:`f_c(x)` is the class-:math:`c` component of the ensemble decision
function, evaluated at the candidate counterfactual :math:`x`. The tie-breaking
constant is defined as:

.. math::

   \varepsilon_c =
   \begin{cases}
   \varepsilon & \text{if } c < y, \\
   0 & \text{otherwise.}
   \end{cases}

This enforces that the target class is strictly preferred, including in the
presence of ties. The default value of :math:`\varepsilon` depends on the
backend:

- MIP uses :math:`\varepsilon = 2^{-16} = 1/65536 \approx 1.5259 \times 10^{-5}`.
- CP uses :math:`\varepsilon = 1` because the score expressions are integer
  scaled.
- MaxSAT uses :math:`\varepsilon = 1` after integerizing the score
  differences used in the pseudo-Boolean encoding.

The MIP backend also uses a numerical strictness constant
:math:`\eta = \texttt{num\_epsilon} = 2^{-6} = 1/64 = 0.015625` for strict
right-branch comparisons in numerical splits.

Feature domains
---------------

The feasible set :math:`\mathcal{X}` depends on the feature type produced by
:func:`ocean.feature.parse_features`.

Continuous feature
   .. math:: x_j \in [a_j, b_j]

Ordinal discrete feature
   .. math:: x_j \in D_j = \{d_{j,1}, \dots, d_{j,m_j}\}

Binary feature
   .. math:: x_j \in \{0, 1\}

Unordered nominal feature
   Encoded as a one-hot block :math:`u_{j,k}` with:

   .. math::

      u_{j,k} \in \{0, 1\},
      \qquad
      \sum_{k \in K_j} u_{j,k} = 1.

This is why ordinal features and one-hot encoded features are treated
differently in OCEAN: ordinal values belong to an ordered finite set, while
unordered categories are modeled as mutually exclusive indicator variables.

Tree path variables
-------------------

For each tree :math:`t` and each leaf :math:`\ell` of that tree, OCEAN creates
path or leaf-selection variables. Abstractly, they satisfy:

.. math::

   p_{t,\ell} \in \{0, 1\},
   \qquad
   \sum_{\ell \in \mathcal{L}_t} p_{t,\ell} = 1.

The equality means exactly one leaf is active in each tree. Path consistency
constraints ensure that if :math:`p_{t,\ell} = 1`, then the feature vector
:math:`x` satisfies every split on the root-to-leaf path leading to
:math:`\ell`.

For a numerical split of the form :math:`x_j \le \tau`, the abstract logic is:

.. math::

   p_{t,\ell} = 1 \implies x_j \le \tau
   \qquad \text{for every left branch on the path,}

.. math::

   p_{t,\ell} = 1 \implies x_j \ge \tau + \eta
   \qquad \text{for every right branch on the path,}

where :math:`\eta` is the numerical strictness constant described above. In
practice, MIP uses explicit linear implications, CP works with threshold
intervals, and MaxSAT reasons over Boolean encodings of the same branch
conditions.

For one-hot encoded nominal features, branch tests are expressed on the
indicator variables rather than on an ordered scalar value. If a branch accepts
exactly the category subset :math:`S \subseteq K_j`, the corresponding path
constraint can be written as:

.. math::

   p_{t,\ell} = 1 \implies \sum_{k \in S} u_{j,k} = 1.

Because the one-hot block already satisfies :math:`\sum_{k \in K_j} u_{j,k}=1`,
this enforces membership in the allowed subset without inventing an artificial
ordering between categories.

Ensemble score aggregation
--------------------------

Let :math:`v_{t,\ell,c}` denote the class contribution of leaf :math:`\ell` in
tree :math:`t` for class :math:`c`, and let :math:`w_t` be the tree weight.
Then the ensemble decision function used by the optimization model can be
written as:

.. math::

   f_c(x) = b_c + \sum_{t \in \mathcal{T}} \sum_{\ell \in \mathcal{L}_t}
   w_t \, v_{t,\ell,c} \, p_{t,\ell}.

The optional base term :math:`b_c` appears for models such as XGBoost where a
base score is part of the ensemble definition.

Distance objectives
-------------------

The optimization objective measures how far the counterfactual is from the
query. OCEAN uses different realizations depending on the backend, but the
common modeling idea is the same.

For an :math:`L_1` objective, a compact description is:

.. math::

   d_1(x, \hat{x}) =
   \sum_{j \in \mathcal{C} \cup \mathcal{O} \cup \mathcal{B}}
   |x_j - \hat{x}_j|
   + \frac{1}{2}
   \sum_{j \in \mathcal{E}} \sum_{k \in K_j}
   |u_{j,k} - \hat{x}_{j,k}|.

The factor :math:`\tfrac{1}{2}` on one-hot blocks avoids double counting a
single category switch, because moving from one active code to another changes
two binary indicators.

The MIP backend also supports an :math:`L_2` variant:

.. math::

   d_2(x, \hat{x}) =
   \sum_{j \in \mathcal{C} \cup \mathcal{O} \cup \mathcal{B}}
   (x_j - \hat{x}_j)^2
   + \frac{1}{2}
   \sum_{j \in \mathcal{E}} \sum_{k \in K_j}
   (u_{j,k} - \hat{x}_{j,k})^2.

More generally, for any integer :math:`p \ge 1`, minimizing
:math:`d_p(x, \hat{x})` is equivalent to minimizing :math:`d_p(x, \hat{x})^p`
because the map :math:`t \mapsto t^p` is strictly increasing on
:math:`\mathbb{R}_{\ge 0}`.

Backend realizations
--------------------

The three backends solve the same conceptual problem with different modeling
primitives.

MIP
   Uses continuous and binary decision variables directly and optimizes either
   :math:`L_1` or :math:`L_2` with linear or quadratic expressions.

CP
   Uses finite-domain integer variables and a scaled integer
   :math:`L_p^p` objective for integer ``norm`` values. Continuous features
   are modeled through interval indices derived from tree thresholds.

MaxSAT
   Encodes the :math:`L_1` objective as weighted soft clauses and the target
   prediction constraints as hard clauses or pseudo-Boolean constraints.

Reading the implementation
--------------------------

If you want to map these equations back to code, the main entry points are:

- ``ocean.mip._model``
- ``ocean.cp._model``
- ``ocean.maxsat._model``

Those modules differ in how they encode the variables and constraints, but they
all implement the same optimization pattern described above.

Backend method reference
------------------------

The following internal backend model classes use the same notation as this
page:

- :math:`x` is the counterfactual decision vector encoded by feature variables,
- :math:`\hat{x}` is the processed query instance; in the implementation, the
  objective-building methods receive it through a parameter named ``x``,
- :math:`y` is the target class,
- :math:`f_c(x)` is the class score for class :math:`c`,
- :math:`p_{t,\ell}` denotes the active leaf or path variable.

Mixed-integer programming model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ocean.mip._model.Model
   :members: build, add_objective, set_majority_class, clear_majority_class, cleanup, _set_majority_class, _set_isolation, _add_objective, L1, L2
   :member-order: bysource

Constraint-programming model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ocean.cp._model.Model
   :members: build, add_objective, set_majority_class, cleanup, _set_majority_class, _add_objective, get_intervals_cost, L1
   :member-order: bysource

Weighted MaxSAT model
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ocean.maxsat._model.Model
   :members: build, add_objective, _add_soft_l1_binary, _add_soft_l1_ohe, _add_soft_l1_continuous, _add_soft_l1_discrete, _get_intervals_cost, set_majority_class, _set_majority_class, _encode_weighted_sum_constraint, cleanup
   :member-order: bysource