
This function assigns each row of a data.frame to a cluster, based on the Gower distance matrix between rows and on either a pre-specified or an optimal number of clusters.

Usage

get_cluster(df, v_cluster = NULL, k = NULL, weights = NULL)

Arguments

df

data.frame

v_cluster

variables used to compute Gower distances between rows (if NULL, all columns are used)

k

number of clusters (if NULL, determined optimally; see Details)

weights

(numeric vector) variable weights for calculating Gower distances (if NULL, all weights are 1)

Value

(factor) cluster assignment for each row of df, with levels 0 to k-1

Details

First, a distance matrix is computed using cluster::daisy() with metric = "gower" and stand = TRUE. Next, the rows are clustered by partitioning around medoids (PAM), a more robust alternative to k-means clustering, as implemented in cluster::pam().
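The two steps above can be sketched directly with the cluster package. The helper `gower_pam()` below is hypothetical and is not the package's own implementation. It assumes that `v_cluster` is a character vector of column names and that `weights` has one entry per selected variable. A few columns of the built-in `iris` data serve purely as toy input.

```r
library(cluster)

# Hypothetical helper mirroring the daisy() + pam() steps (not the package's code)
gower_pam <- function(df, v_cluster = NULL, k, weights = NULL) {
  # Restrict to the clustering variables; NULL means all columns
  if (is.null(v_cluster)) v_cluster <- names(df)
  x <- df[, v_cluster, drop = FALSE]
  # Equal weights by default, one per selected variable
  if (is.null(weights)) weights <- rep(1, ncol(x))
  # Gower dissimilarities between rows
  d <- daisy(x, metric = "gower", stand = TRUE, weights = weights)
  # Partitioning around medoids on the precomputed dissimilarities
  fit <- pam(d, k = k, diss = TRUE)
  # pam() labels clusters 1..k; shift to 0..k-1 and return a factor
  factor(fit$clustering - 1L, levels = seq_len(k) - 1L)
}

cl <- gower_pam(iris, v_cluster = c("Sepal.Length", "Petal.Width", "Species"), k = 3)
table(cl)
```

According to the cluster::daisy() documentation, when not all columns are numeric (as with the factor `Species` here), `stand` is ignored and Gower's range-based standardization is applied in any case.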

If no number of clusters k is specified, the optimal number of clusters is determined from this distance matrix using NbClust::NbClust() with method = "median" and index = "silhouette".
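The choice of k can be approximated without NbClust. The hypothetical `choose_k_silhouette()` below is a swapped-in, base-R sketch, not NbClust's code: it builds a median-linkage tree with stats::hclust(), cuts it into `min_k` to `max_k` groups, and keeps the k with the largest average silhouette width from cluster::silhouette(), which is the quantity index = "silhouette" maximizes. The default range 2 to 15 mirrors NbClust's default `min.nc` and `max.nc`; whether get_cluster() uses the same range is an assumption.

```r
library(cluster)

# Hypothetical helper approximating NbClust(method = "median", index = "silhouette")
choose_k_silhouette <- function(d, min_k = 2, max_k = 15) {
  # Median-linkage hierarchical clustering on the dissimilarities
  tree <- hclust(d, method = "median")
  ks <- min_k:max_k
  avg_width <- vapply(ks, function(k) {
    groups <- cutree(tree, k = k)
    # Average silhouette width of this k-group partition
    mean(silhouette(groups, d)[, "sil_width"])
  }, numeric(1))
  # The k with the largest average silhouette width is chosen
  ks[which.max(avg_width)]
}

x <- iris[, c("Sepal.Length", "Petal.Width", "Species")]
d <- daisy(x, metric = "gower")
k <- choose_k_silhouette(d, max_k = 8)
k
```

Because k is picked from a hierarchical clustering while the final assignment comes from cluster::pam(), the two partitions need not coincide exactly.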

Examples

df |> get_cluster() |> table()
#> x1 x2 x3 y1 y2 y3 
#>  1  1  1  1  1  1 
#> 
#>  Only frey, mcclain, cindex, sihouette and dunn can be computed. To compute the other indices, data matrix is needed 
#> k chosen by NbClust(): 2
#> 
#>   1   2 
#> 236 264 
#> 
#>   0   1 
#> 236 264