
This function scales the numeric features in the training set, then uses the scaling parameters estimated on the training set (mean and variance) to scale the corresponding variables in the test set.

Usage

scale_numeric_features_in_train_and_test(tt)

Arguments

tt

A train-test list, as returned by ttsplit().

Value

The input tt, with the numeric features in both train and test scaled.

Details

The test set is scaled with statistics computed on the training set to avoid data leakage: model performance should not be evaluated on a test set whose transformation depends on information that would be unavailable at prediction time (e.g. the mean and variance computed over the other test observations).
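The principle described above can be illustrated with a minimal base R sketch. It assumes tt is a list with data-frame elements train and test (the structure ttsplit() reports), and it standardises with the training mean and standard deviation (the square root of the variance). The function name scale_with_train_stats is hypothetical; this is not the package's own implementation.

```r
# Hypothetical helper, NOT the package's implementation.
# Assumes `tt` is a list with data frames `tt$train` and `tt$test`.
scale_with_train_stats <- function(tt) {
  # Numeric columns are identified on the training set only
  num_cols <- names(tt$train)[vapply(tt$train, is.numeric, logical(1))]
  for (col in num_cols) {
    # Scaling parameters are estimated on the training set only
    mu <- mean(tt$train[[col]], na.rm = TRUE)
    s  <- sd(tt$train[[col]], na.rm = TRUE)
    # Assumption: constant columns are centred but not divided by zero
    if (is.na(s) || s == 0) s <- 1
    # The same train-derived transformation is applied to both sets
    tt$train[[col]] <- (tt$train[[col]] - mu) / s
    tt$test[[col]]  <- (tt$test[[col]] - mu) / s
  }
  tt
}

# Toy data: one numeric column `x` and one character column `g`
tt <- list(
  train = data.frame(x = c(1, 2, 3, 4), g = c("a", "b", "a", "b")),
  test  = data.frame(x = c(5, 2.5), g = c("a", "b"))
)
scaled <- scale_with_train_stats(tt)
```

After scaling, the training column has mean 0 and standard deviation 1, while the test column generally does not: its values are expressed relative to the training distribution (here the test value 2.5 equals the training mean and maps to 0). Non-numeric columns are left unchanged.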

Examples

# `df` is assumed to be a data frame with numeric columns
tt <- df |>
   ttsplit() |>
   scale_numeric_features_in_train_and_test()
#> [1] "Split df into list with: train, test (proportion train = 0.7)"
#> [1] "build_scales: I will compute scale on  6 numeric columns."
#> [1] "build_scales: it took me: 0s to compute scale for 6 numeric columns."
#> [1] "fast_scale: I will scale 6 numeric columns."
#> [1] "fast_scale: it took me: 0s to scale 6 numeric columns."
#> [1] "fast_scale: I will scale 6 numeric columns."
#> [1] "fast_scale: it took me: 0s to scale 6 numeric columns."
#> [1] "Created numerical scales using train set and rescaled numerical features in train and test"