Feature Processing

Feature metadata and preprocessing helpers.

class Feature(ftype, *, levels=(), thresholds=(), codes=())

Bases: object

Description of a single feature after OCEAN preprocessing.

class Type(*values)

Bases: Enum

Supported feature categories.

CONTINUOUS = 'continuous'
DISCRETE = 'discrete'
ONE_HOT_ENCODED = 'one-hot-encoded'
BINARY = 'binary'
add(*levels)
property codes
property ftype
property is_binary
property is_continuous
property is_discrete
property is_numeric
property is_one_hot_encoded
property levels
property thresholds
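
The category flags listed above can be pictured with a small stand-in class. This is a minimal sketch, not OCEAN's implementation: `FeatureSketch` is a hypothetical name, only the four enum values are taken from the listing, and each `is_*` property is assumed to be a plain comparison against `ftype`. The `levels`, `thresholds`, `codes`, and `is_numeric` members are left out because the reference does not describe their behavior.

```python
from enum import Enum


class Type(Enum):
    """Feature categories, using the values listed in the reference."""

    CONTINUOUS = "continuous"
    DISCRETE = "discrete"
    ONE_HOT_ENCODED = "one-hot-encoded"
    BINARY = "binary"


class FeatureSketch:
    """Hypothetical stand-in for Feature; illustrates the is_* flags only."""

    def __init__(self, ftype: Type) -> None:
        self._ftype = ftype

    @property
    def ftype(self) -> Type:
        return self._ftype

    # Assumption: each flag is a simple identity check against ftype.
    @property
    def is_binary(self) -> bool:
        return self._ftype is Type.BINARY

    @property
    def is_continuous(self) -> bool:
        return self._ftype is Type.CONTINUOUS

    @property
    def is_discrete(self) -> bool:
        return self._ftype is Type.DISCRETE

    @property
    def is_one_hot_encoded(self) -> bool:
        return self._ftype is Type.ONE_HOT_ENCODED


feature = FeatureSketch(Type.DISCRETE)
print(feature.ftype.value, feature.is_discrete, feature.is_continuous)
```

Because `Type` is a standard `Enum`, a member can also be looked up from its string value, for example `Type("binary")`.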
parse_features(data, *, discretes=(), encoded=(), drop_na=True, drop_constant=True, scale=True)

Parse a tabular dataset into OCEAN’s feature representation.

Parameters:
  • data (pd.DataFrame) – The DataFrame to be processed.

  • discretes (tuple[Key, ...], optional) – Column names to treat as ordered discrete (ordinal) features, such as integer-valued counts or ranked buckets. Defaults to (). If None, no column is treated as discrete.

  • encoded (tuple[Key, ...], optional) – Column names to treat as one-hot encoded features, typically unordered nominal categories. Defaults to ().

  • drop_na (bool, optional) – Whether to drop columns that contain NaN values. Defaults to True.

  • drop_constant (bool, optional) – Whether to drop columns that hold a single constant value. Defaults to True.

  • scale (bool, optional) – Whether to scale continuous features to the centered interval [-0.5, 0.5]. Defaults to True.
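
To make the target interval of `scale` concrete, the following sketch maps a column onto [-0.5, 0.5]. It assumes plain min-max scaling followed by a shift of 0.5; the reference does not state which formula OCEAN uses, and `scale_centered` is a hypothetical helper that works on a list of numbers rather than a DataFrame column.

```python
from typing import List, Sequence


def scale_centered(values: Sequence[float]) -> List[float]:
    """Min-max scale values, then shift them so they lie in [-0.5, 0.5]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to scale. parse_features can
        # remove such columns via drop_constant; this sketch rejects them.
        raise ValueError("cannot scale a constant column")
    # (v - lo) / (hi - lo) lies in [0, 1]; subtracting 0.5 centers it.
    return [(v - lo) / (hi - lo) - 0.5 for v in values]


scaled = scale_centered([10.0, 15.0, 20.0])
print(scaled)  # [-0.5, 0.0, 0.5]
```

Under this assumption the column minimum always maps to -0.5, the maximum to 0.5, and the midpoint of the range to 0.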

Returns:

A tuple (processed_data, mapper), where processed_data is ready for training a tree ensemble and mapper records the relationship between the original feature names and the transformed columns.

Return type:

Parsed

Raises:

ValueError – If a column in discretes is not found in the input frame.
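
The kind of check behind this error can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `check_columns` is a hypothetical helper, and the error message wording is invented.

```python
from typing import Hashable, Iterable, Sequence


def check_columns(columns: Iterable[Hashable],
                  requested: Sequence[Hashable]) -> None:
    """Raise ValueError if any requested name is absent from columns."""
    available = set(columns)
    # Collect every missing name so the error reports all of them at once.
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(f"columns not found: {missing}")


# Passes silently: "rank" is among the available column names.
check_columns(["age", "income", "rank"], ("rank",))

try:
    check_columns(["age", "income"], ("rank",))
except ValueError as err:
    print(err)
```

With a pandas DataFrame, the first argument would be `data.columns`.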