Remove variables that are unique to either the train or the test set
Source: R/ml.R
This function removes variables that appear in only one of the training and test sets. Removing variables from the training set that cannot be found in the test set is particularly useful because it avoids training a model on information that will not be available at testing time.
Arguments
- tt
train-test list (see ttsplit())
- remove_from_train_only
(logical) whether to leave the test set untouched
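The behaviour described above can be approximated in base R. The sketch below is not the package's implementation: the helper name keep_common_vars_sketch() is hypothetical, the default value of remove_from_train_only is an assumption, and tt is assumed to be a list with data frame elements named train and test.

```r
# Hypothetical helper, NOT the package's implementation.
# Assumes `tt` is a list with data frame elements `train` and `test`.
# The default for `remove_from_train_only` is an assumption.
keep_common_vars_sketch <- function(tt, remove_from_train_only = FALSE) {
  # Variables present in both sets
  common <- intersect(names(tt$train), names(tt$test))

  # Always drop train-only variables: they are unavailable at testing time
  tt$train <- tt$train[, common, drop = FALSE]

  # Trim the test set too, unless it should be left untouched
  if (!remove_from_train_only) {
    tt$test <- tt$test[, common, drop = FALSE]
  }
  tt
}

# Toy data: one variable unique to each set
tt <- list(
  train = data.frame(id = 1:3, x = c(0.1, 0.2, 0.3), only_train = c("a", "b", "c")),
  test  = data.frame(id = 4:5, x = c(0.4, 0.5), only_test = c(TRUE, FALSE))
)

both <- keep_common_vars_sketch(tt)
names(both$train)
#> [1] "id" "x"
names(both$test)
#> [1] "id" "x"

train_only <- keep_common_vars_sketch(tt, remove_from_train_only = TRUE)
names(train_only$test)
#> [1] "id"        "x"         "only_test"
```

With remove_from_train_only = TRUE, the test set keeps only_test, while the training set is still restricted to the shared variables.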
Examples
tt <- df |>
  ttsplit() |>
  keep_only_vars_in_both_train_and_test()
#> [1] "Split df into list with: train, test (proportion train = 0.7)"
#> [1] "Selected only variables in common in both train and test sets"