Skip to content
This repository was archived by the owner on Jun 22, 2022. It is now read-only.

Commit d7ce69d

Browse files
authored
Dev (#43)
* updated config * added categorical_features to lgbm * reverted to old config * To english translate - notebook * Replace 'talking-data' with 'avito' in logger * Translate * Delete notebook * Text features * Update pipelines.py * Improve TextCleaner * Improve TextCounter * Fix * added categorical encoding and missing and timestamp, not working yet * updated config * dropped image des * added hashing trick for categorical encoding * updated encoder configs * Include suggested changes * updated random search * Update * easthetic change with translation * updated configs * refactored text features * fixed inference mode remnants * added more groupings, moved item_seq to numericals, starter hyperparam search * fixed pipelines eval mode defs * running random search, feature extraction working * fixed label encoding, and feature extraction for text, added train to lgb.train evals * dropped user_id from categoricals, refactored _groupby_features * week of year * added log price * added input missing, dropped feature dispatcher * added tfidf, changed lightgbm to expect numpy, adjusted pipelines and configs * params updated, small fixes
1 parent 72d41b2 commit d7ce69d

12 files changed

+938
-459
lines changed

feature_cleaning.py

+23
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
import numpy as np
2+
3+
from steps.base import BaseTransformer
4+
5+
6+
class InputMissing(BaseTransformer):
7+
def __init__(self, text_columns,
8+
categorical_columns,
9+
numerical_columns,
10+
timestamp_columns):
11+
self.text_columns = text_columns
12+
self.categorical_columns = categorical_columns
13+
self.numerical_columns = numerical_columns
14+
self.timestamp_columns = timestamp_columns
15+
16+
def transform(self, X, **kwargs):
17+
X_ = X.copy()
18+
for col, input_value in [self.text_columns,
19+
self.categorical_columns,
20+
self.numerical_columns,
21+
self.timestamp_columns]:
22+
X_[col] = X_[col].fillna(input_value)
23+
return {'clean_features': X_}

0 commit comments

Comments
 (0)