This function fuzzy-matches each string in a character vector against a pool of candidate strings. The distance between each string and each candidate is calculated using the specified method, and for each string the candidate with the shortest distance is returned.

Usage

get_fuzzy_match(
  old,
  new,
  method = c("osa", "lv", "dl", "lcs", "qgram", "cosine", "jaccard", "jw"),
  nthread = parallel::detectCores() - 1
)

Arguments

old

(character) vector of strings to fuzzy-match

new

(character) vector of strings to use as possible matches

method

method used to calculate distances between strings (see Details)

nthread

number of parallel threads (default: all detected cores minus one)

Value

(character) the strings in old, each replaced with its best match in new

Details

This function uses stringdist::stringdistmatrix() to calculate the distance between each string in old and each candidate in new. The method must be one of: osa (default), lv, dl, lcs, qgram, cosine, jaccard, or jw. See the stringdist::stringdistmatrix() documentation for a description of each method.
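
The approach described above can be sketched roughly as follows. This is an illustration only, not the package's actual source: the name fuzzy_match_sketch() is hypothetical, the sketch assumes that neither old nor new contains NA, and it assumes ties are broken in favour of the earlier candidate.

```r
# Hypothetical sketch of the matching logic; not the package's real implementation.
library(stringdist)

fuzzy_match_sketch <- function(old, new, method = "osa", nthread = 1L) {
  # Distance matrix: one row per string in `old`, one column per candidate in `new`.
  d <- stringdist::stringdistmatrix(old, new, method = method, nthread = nthread)
  # For each row, find the column holding the smallest distance.
  # which.min() returns the first minimum, so ties go to the earlier candidate.
  best <- apply(d, 1, which.min)
  # Return the closest candidate for each input string.
  new[best]
}

candidates <- c("France", "Germany", "United States")
fuzzy_match_sketch(c("Untied States", "Frnace"), candidates)
```

With the default osa method, each of the two misspellings is a single adjacent transposition away from its intended candidate, so the sketch should pick "United States" and "France" respectively.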

Examples

if (FALSE) {
# Replace each country name with its closest match in a reference vector
df$Country <- df$Country |> get_fuzzy_match(new = list_of_countries)
}