Remove target from test set and back up values in reference table
Source: R/ml.R
remove_target_from_test_and_add_ref_to_env.Rd
This function removes the target values from the test set (replacing them with NA)
and backs them up in a new variable in the global environment. This is particularly
useful for avoiding target leakage (i.e. accidentally using the target value during testing).
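The idea can be sketched in base R as follows. This is a minimal illustration rather than the package's implementation: the helper name `mask_target_sketch`, its `envir` argument, and the toy data are hypothetical, and the train-test list is assumed to hold data frames named `train` and `test`.

```r
# Hypothetical helper illustrating the idea; not the package's actual code.
mask_target_sketch <- function(tt, target, unique_id, ref_name,
                               envir = globalenv()) {
  # Back up the ID -> target mapping for the test rows
  ref <- tt$test[, c(unique_id, target), drop = FALSE]
  assign(ref_name, ref, envir = envir)
  # Blank out the target in the test set so it cannot leak into predictions
  tt$test[[target]] <- NA
  tt
}

# Toy train-test list (assumed structure: elements `train` and `test`)
toy <- data.frame(id = 1:10, x = (1:10) / 10, y1 = (1:10) * 2)
tt <- list(train = toy[1:7, ], test = toy[8:10, ])

# A scratch environment is used here so the sketch leaves the global
# environment untouched
sketch_env <- new.env()
tt <- mask_target_sketch(tt, "y1", "id", "target_values", envir = sketch_env)
```

After the call, `tt$test$y1` contains only NA, while `sketch_env$target_values` keeps the original `id` and `y1` pairs for later evaluation.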
Arguments
- tt
train-test list (see ttsplit())
- target
(character) name of target variable
- unique_id
(character) name of unique ID variable
- ref_name
(character) name of new variable that stores the mapping between ID and target
Note
This function implements the method described in Preventing Target Leakage (D22 QuantCafé, 2021).
Examples
# df: a data frame that contains the target column y1
tt <- df |>
  dplyr::mutate(id = seq_len(nrow(df))) |>
  ttsplit() |>
  remove_target_from_test_and_add_ref_to_env("y1", "id", "target_values")
#> [1] "Split df into list with: train, test (proportion train = 0.7)"
#> [1] "Target column y1 replaced with NA in test set"
#> [1] "Added reference table 'target_values' to global environment"
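Once predictions have been made on the masked test set, the reference table can be joined back by ID to score them. The following is a minimal sketch that assumes the reference table holds the ID and target columns (`id` and `y1` here); the table contents, the `preds` data frame, and the `y1_pred` column are made up for illustration.

```r
# Made-up reference table standing in for the one created by the example above
target_values <- data.frame(id = c(8, 9, 10), y1 = c(16, 18, 20))

# Made-up predictions keyed by the same ID, deliberately in a different order
preds <- data.frame(id = c(10, 8, 9), y1_pred = c(19, 17, 18))

# Join on ID so that row order does not matter, then compute the RMSE
scored <- merge(target_values, preds, by = "id")
rmse <- sqrt(mean((scored$y1 - scored$y1_pred)^2))
```

Joining on the ID rather than relying on row position keeps the evaluation correct even if the test set was reordered or filtered between masking and prediction.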