Data Leakage in pre-processing #20

parthkl021 · 2024-03-24T03:08:43Z

This refers to the following snippet from the preprocessing section of model_notebook.ipynb:

for col in data.columns:
    if col not in categorical:
        data[col] = (data[col].astype('float') - np.mean(data[col].astype('float'))) / np.std(data[col].astype('float'))

The normalization is applied to the whole dataset, i.e. before the train/test split. This means the mean and standard deviation used to scale the training set are computed with information from the test set, which is data leakage.
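One possible way to avoid this, as a minimal sketch rather than the notebook's actual code: split first, compute the mean and standard deviation on the training split only, and reuse those values for the test split. The helper name `standardize_train_test` and the toy DataFrame are hypothetical and only illustrate the idea; `categorical` is assumed to be a list of column names, as in the snippet above.

```python
import pandas as pd


def standardize_train_test(train, test, categorical):
    """Scale non-categorical columns using training-split statistics only."""
    train = train.copy()
    test = test.copy()
    for col in train.columns:
        if col not in categorical:
            train_values = train[col].astype("float")
            mean = train_values.mean()
            # ddof=0 matches the default of np.std used in the original snippet
            std = train_values.std(ddof=0)
            train[col] = (train_values - mean) / std
            # The test split reuses the training mean and std
            test[col] = (test[col].astype("float") - mean) / std
    return train, test


# Hypothetical toy data, not taken from the notebook
data = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0, 100.0], "kind": ["a", "b", "a", "b", "a"]}
)
categorical = ["kind"]

# Split first, then scale
train, test = data.iloc[:4], data.iloc[4:]
train_scaled, test_scaled = standardize_train_test(train, test, categorical)
```

With this ordering, the outlier in the test row does not influence how the training rows are scaled.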

parthkl021 mentioned this issue Mar 27, 2024

Minor bug fixes #22

Open
