Scale numeric features in train and test sets
Source: R/ml.R
This function scales numeric features in the training set and uses the scaling information from the training set (mean and variance) to scale the corresponding variables in the test set.
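The idea of fitting scaling statistics on the training set and reusing them on the test set can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `scale_train_test_sketch()` is hypothetical, it assumes `tt` is a list with data frames `train` and `test`, and it assumes z-score standardisation (subtract the training mean, divide by the training standard deviation).

```r
# Hypothetical helper (not part of the package): z-score scaling of numeric
# columns, with statistics computed on the training set only.
scale_train_test_sketch <- function(tt) {
  # Numeric columns are identified in the training set
  num_cols <- names(tt$train)[vapply(tt$train, is.numeric, logical(1))]
  # Centring and scaling statistics come from the training set only
  mu <- vapply(tt$train[num_cols], mean, numeric(1), na.rm = TRUE)
  sigma <- vapply(tt$train[num_cols], sd, numeric(1), na.rm = TRUE)
  for (col in num_cols) {
    # The same training statistics are applied to both sets
    tt$train[[col]] <- (tt$train[[col]] - mu[[col]]) / sigma[[col]]
    tt$test[[col]] <- (tt$test[[col]] - mu[[col]]) / sigma[[col]]
  }
  tt
}

# Usage on a small synthetic data set with a manual 70/30 split
set.seed(1)
df <- data.frame(
  x = rnorm(100, mean = 10, sd = 2),
  y = runif(100),
  g = sample(letters[1:3], 100, replace = TRUE)  # non-numeric, left as-is
)
idx <- sample(nrow(df), 70)
tt <- list(train = df[idx, ], test = df[-idx, ])
tt_scaled <- scale_train_test_sketch(tt)

mean(tt_scaled$train$x)  # close to 0 (up to floating-point error)
sd(tt_scaled$train$x)    # close to 1
```

Only the training set is guaranteed to have mean 0 and standard deviation 1 after scaling; the test set generally will not, which is expected. A constant training column (standard deviation 0) would produce `NaN` or `Inf` values in this sketch.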
Arguments
- tt
A train-test list with elements train and test, as returned by ttsplit().
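The example output suggests that `ttsplit()` returns a list with elements `train` and `test` (default proportion train = 0.7). For readers who want to see the expected shape of `tt`, here is a hand-built stand-in. The function `make_tt_sketch()` is hypothetical; the real `ttsplit()` may choose rows or round sizes differently.

```r
# Hypothetical stand-in for ttsplit(): a random split of rows into
# a training set (prop_train) and a test set (the remaining rows).
make_tt_sketch <- function(df, prop_train = 0.7) {
  n_train <- floor(prop_train * nrow(df))
  idx <- sample(seq_len(nrow(df)), n_train)
  list(
    train = df[idx, , drop = FALSE],
    test = df[-idx, , drop = FALSE]
  )
}

set.seed(42)
tt <- make_tt_sketch(mtcars)  # mtcars has 32 rows
str(tt, max.level = 1)        # a list of two data frames: train, test
```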
Details
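The test set is scaled with statistics computed on the training set to avoid data leakage (train-test contamination). If the test set were scaled with its own statistics, each scaled test value would depend on the mean and variance of the other test observations, information that is not available at prediction time, when a model typically sees one new observation at a time. Evaluating a model on such a test set could give an overly optimistic estimate of its performance.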
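The following base-R sketch illustrates the point about test-time information, using simulated values rather than the package. With training statistics, the scaled value of a test observation is the same whether it is scaled alone or in a batch. With test-set statistics (here via base `scale()`), the same observation gets a different scaled value depending on which other test observations are present.

```r
set.seed(7)
train_x <- rnorm(50, mean = 5, sd = 2)
test_x <- rnorm(10, mean = 5, sd = 2)

# Training statistics are fixed before any test observation is seen
mu <- mean(train_x)
sigma <- sd(train_x)

# Train-based scaling: the result for test_x[1] depends only on test_x[1]
scaled_batch <- (test_x - mu) / sigma
scaled_single <- (test_x[1] - mu) / sigma

# Test-based scaling: the result for test_x[1] changes with the other rows
self_scaled_all <- as.numeric(scale(test_x))[1]
self_scaled_subset <- as.numeric(scale(test_x[1:5]))[1]

c(
  train_based_diff = scaled_batch[1] - scaled_single,    # 0
  test_based_diff = self_scaled_all - self_scaled_subset # nonzero
)
```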
Examples
# df: a data frame with numeric columns (not defined on this page)
tt <- df |>
  ttsplit() |>
  scale_numeric_features_in_train_and_test()
#> [1] "Split df into list with: train, test (proportion train = 0.7)"
#> [1] "build_scales: I will compute scale on 6 numeric columns."
#> [1] "build_scales: it took me: 0s to compute scale for 6 numeric columns."
#> [1] "fast_scale: I will scale 6 numeric columns."
#> [1] "fast_scale: it took me: 0s to scale 6 numeric columns."
#> [1] "fast_scale: I will scale 6 numeric columns."
#> [1] "fast_scale: it took me: 0s to scale 6 numeric columns."
#> [1] "Created numerical scales using train set and rescaled numerical features in train and test"