Feature Processing
Feature metadata and preprocessing helpers.
- class Feature(ftype, *, levels=(), thresholds=(), codes=())
Bases: object
Description of a single feature after OCEAN preprocessing.
- class Type(*values)
Bases: Enum
Supported feature categories.
- CONTINUOUS = 'continuous'
- DISCRETE = 'discrete'
- ONE_HOT_ENCODED = 'one-hot-encoded'
- BINARY = 'binary'
- property codes
- property ftype
- property is_binary
- property is_continuous
- property is_discrete
- property is_numeric
- property is_one_hot_encoded
- property levels
- property thresholds
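The relationship between the Type enum and the is_* properties can be illustrated with a minimal sketch. The enum values are the ones listed above; FeatureSketch is a hypothetical stand-in written for illustration, not OCEAN's implementation, and it omits is_numeric because this page does not say which categories that property covers.

```python
from enum import Enum


class Type(Enum):
    """Feature categories, mirroring the values listed above."""

    CONTINUOUS = "continuous"
    DISCRETE = "discrete"
    ONE_HOT_ENCODED = "one-hot-encoded"
    BINARY = "binary"


class FeatureSketch:
    """Hypothetical stand-in for Feature (assumption, not OCEAN's code)."""

    def __init__(self, ftype, *, levels=(), thresholds=(), codes=()):
        # Type(...) accepts either a Type member or its string value.
        self._ftype = Type(ftype)
        self._levels = tuple(levels)
        self._thresholds = tuple(thresholds)
        self._codes = tuple(codes)

    @property
    def ftype(self):
        return self._ftype

    @property
    def levels(self):
        return self._levels

    @property
    def thresholds(self):
        return self._thresholds

    @property
    def codes(self):
        return self._codes

    @property
    def is_continuous(self):
        return self._ftype is Type.CONTINUOUS

    @property
    def is_discrete(self):
        return self._ftype is Type.DISCRETE

    @property
    def is_one_hot_encoded(self):
        return self._ftype is Type.ONE_HOT_ENCODED

    @property
    def is_binary(self):
        return self._ftype is Type.BINARY


# Usage: a one-hot encoded feature carrying two category codes.
color = FeatureSketch("one-hot-encoded", codes=("red", "blue"))
```

In this sketch each is_* property is a plain identity check against the stored category, so exactly one of them is true for any given feature.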
- parse_features(data, *, discretes=(), encoded=(), drop_na=True, drop_constant=True, scale=True)
Parse a tabular dataset into OCEAN’s feature representation.
- Parameters:
data (pd.DataFrame) – The DataFrame to be processed.
discretes (tuple[Key, ...], optional) – Column names to treat as ordered discrete (ordinal) features, such as integer-valued counts or ranked buckets. Defaults to (). If None, no column is treated as discrete.
encoded (tuple[Key, ...], optional) – Column names to treat as one-hot encoded features, typically unordered nominal categories. Defaults to ().
drop_na (bool, optional) – Whether to drop columns that contain NaN values. Defaults to True.
drop_constant (bool, optional) – Whether to drop columns whose values are constant. Defaults to True.
scale (bool, optional) – Whether to scale continuous features to the centered interval [-0.5, 0.5]. Defaults to True.
- Returns:
A tuple (processed_data, mapper), where processed_data is ready to train a tree ensemble and mapper keeps the relationship between original feature names and transformed columns.
- Return type:
Parsed
- Raises:
ValueError – If a column in discretes is not found in the input frame.
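The effect of the drop_na, drop_constant, and scale options, and of the ValueError check, can be approximated with plain pandas. The sketch below is not OCEAN's implementation: preprocess_sketch is a hypothetical helper, min-max scaling is an assumed way of reaching the [-0.5, 0.5] interval, and one-hot encoding and the mapper are left out entirely.

```python
import pandas as pd


def preprocess_sketch(data, *, discretes=(), drop_na=True,
                      drop_constant=True, scale=True):
    """Hypothetical approximation of the documented options (assumption)."""
    # Mirror the documented error: every discrete column must exist.
    missing = [col for col in discretes if col not in data.columns]
    if missing:
        raise ValueError(f"Columns not found in the input frame: {missing}")

    out = data.copy()
    if drop_na:
        # Drop every column that contains at least one NaN.
        out = out.dropna(axis=1)
    if drop_constant:
        # Keep only columns with more than one distinct value.
        out = out.loc[:, out.nunique() > 1].copy()
    if scale:
        # Assumed scheme: min-max scale to [0, 1], then shift by 0.5.
        for col in out.select_dtypes(include="number").columns:
            if col in discretes:
                continue  # discrete columns keep their original values
            lo, hi = out[col].min(), out[col].max()
            if hi > lo:
                out[col] = (out[col] - lo) / (hi - lo) - 0.5
    return out


# Usage: "const" and "gap" are dropped, "age" is scaled, "rank" is kept as is.
frame = pd.DataFrame({
    "age": [20, 30, 40],
    "rank": [1, 2, 3],
    "const": [1, 1, 1],
    "gap": [1.0, None, 2.0],
})
result = preprocess_sketch(frame, discretes=("rank",))
```

The existence check runs before any column is dropped, so a name in discretes that is absent from the input raises even if the column would have been removed later.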