
This function removes variables (columns) that appear in only one of the training and test sets. Removing training-only variables is particularly useful because it prevents a model from being trained on information that will not be available at test time.

Usage

keep_only_vars_in_both_train_and_test(tt, remove_from_train_only = FALSE)

Arguments

tt

A train-test list, as returned by ttsplit()

remove_from_train_only

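(logical) If TRUE, remove train-only variables from the training set and leave the test set untouched. If FALSE (the default), also remove test-only variables from the test set.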

Value

tt with variables unique to one set removed from the training set and, unless remove_from_train_only = TRUE, from the test set
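
The behaviour described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name keep_common_vars_sketch() is hypothetical, and it assumes tt is a list whose train and test elements are data frames.

```r
# Hypothetical sketch, NOT the package's implementation.
# Assumes `tt` is a list with data frame elements `train` and `test`.
keep_common_vars_sketch <- function(tt, remove_from_train_only = FALSE) {
  # Variables present in both sets (kept in training-set column order)
  common <- intersect(names(tt$train), names(tt$test))
  # Drop train-only variables from the training set
  tt$train <- tt$train[, common, drop = FALSE]
  # Unless told to leave it untouched, drop test-only variables from the test set
  if (!remove_from_train_only) {
    tt$test <- tt$test[, common, drop = FALSE]
  }
  tt
}

# Toy data: `a` exists only in train, `d` exists only in test
tt <- list(
  train = data.frame(a = 1:3, b = 4:6, c = 7:9),
  test  = data.frame(b = 1:2, c = 3:4, d = 5:6)
)

names(keep_common_vars_sketch(tt)$train)
#> [1] "b" "c"
names(keep_common_vars_sketch(tt)$test)
#> [1] "b" "c"
names(keep_common_vars_sketch(tt, remove_from_train_only = TRUE)$test)
#> [1] "b" "c" "d"
```

Using drop = FALSE keeps each set a data frame even when only one variable survives.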

Examples

tt <- df |>
   ttsplit() |>
   keep_only_vars_in_both_train_and_test()
#> [1] "Split df into list with: train, test (proportion train = 0.7)"
#> [1] "Selected only variables in common in both train and test sets"