gClusters() returns the clusters generated by k-means and produces an elbow plot to help find the optimal number of clusters.

Usage

gClusters(data, ncluster = 20, elbow.max = 50, ...)

Arguments

data

A scaled data.frame that contains the variables ID, value on time_1, ..., value on time_k, and type, used for extracting patterns over time. See also pre_process().

ncluster

The number of clusters. It is related to the complexity of the information in the network: when choosing ncluster, consider how many nodes you want to show in the visualization and how representative each clustered pattern should be.

elbow.max

The maximum value on the x-axis of the elbow plot. It should be larger than the expected ncluster and smaller than the sample size.

iter.max

The maximum number of iterations allowed in k-means. See also stats::kmeans.

nstart

The number of random initial configurations to try. The k-means algorithm keeps the best result among these attempts. For larger data, nstart can be set lower, or even to 1. See also stats::kmeans.
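The effect of nstart can be illustrated directly with stats::kmeans. The following is a minimal sketch on synthetic data, not the internals of gClusters(); the matrix x and the chosen values of centers, iter.max, and nstart are assumptions made for illustration only.

```r
# Synthetic stand-in for scaled input (hypothetical data, for illustration only).
set.seed(123)
x <- scale(matrix(rnorm(200 * 8), nrow = 200))

# Same seed before each call, so both runs share the first random start.
set.seed(1)
fit_one <- kmeans(x, centers = 10, iter.max = 100, nstart = 1)

set.seed(1)
fit_many <- kmeans(x, centers = 10, iter.max = 100, nstart = 25)

# Total within-cluster sum of squares: lower means tighter clusters.
c(nstart_1 = fit_one$tot.withinss, nstart_25 = fit_many$tot.withinss)
```

Because kmeans keeps the attempt with the lowest total within-cluster sum of squares, a larger nstart should not give a worse fit for the same seed, at the cost of extra computation.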

Value

This function returns a list of two elements: a k-means clustering result and an elbow plot.

Details

To choose the number of clusters (ncluster), examine the elbow plot and look for the point where the curve bends sharply (the 'elbow'). This point often indicates a suitable cluster count. For large or complex datasets, consider increasing elbow.max so that a wider range of cluster counts is explored.

The function can be run with only the data argument at first. After this initial run, adjust the other parameters based on the clustering results and the elbow plot.
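The idea behind the elbow plot can be sketched with stats::kmeans alone. This is an illustrative approximation on synthetic data and is not necessarily how gClusters() builds its plot; the names x, elbow_max, and wss are assumptions introduced for the example.

```r
# Sketch of the elbow method with stats::kmeans (not the gClusters() internals).
set.seed(1)

# Hypothetical stand-in for scaled input: 100 rows, 10 time points.
x <- scale(matrix(rnorm(100 * 10), nrow = 100))

elbow_max <- 15  # must be smaller than nrow(x)

# Total within-cluster sum of squares for k = 1, ..., elbow_max.
wss <- vapply(
  seq_len(elbow_max),
  function(k) kmeans(x, centers = k, iter.max = 50, nstart = 5)$tot.withinss,
  numeric(1)
)

# Draw the curve only in an interactive session.
if (interactive()) {
  plot(seq_len(elbow_max), wss, type = "b",
       xlab = "Number of clusters", ylab = "Total within-cluster SS")
}
```

The value of k after which wss stops dropping quickly is a reasonable first candidate for ncluster.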

Examples

data(test_data)
reslist <- gClusters(test_data_processed)
# k-means result
reslist[[1]]
#> K-means clustering with 20 clusters of sizes 6, 3, 6, 5, 5, 8, 6, 5, 5, 6, 4, 11, 3, 3, 3, 2, 4, 2, 2, 11
#> 
#> Cluster means:
#>            T1          T2          T3         T4         T5         T6
#> 1   2.0919348 -0.33320285 -0.50786563 -0.5579659 -0.5579659  1.5455977
#> 2   0.8277105  1.32036794 -0.09261332  1.2325986  0.4199176 -0.9576634
#> 3   0.5077879 -0.24564283  0.53503708  0.2363183  1.0793829  1.3583848
#> 4   0.2914366 -0.41308350 -0.76541563 -0.9724353 -0.9724353  1.6040887
#> 5   1.2081037 -0.66790409 -0.66790409 -0.9275526 -0.4204221  1.2719890
#> 6   0.1155971 -0.19000668 -0.25629794 -0.1487697 -0.4383348  2.5830148
#> 7   1.2574004 -0.64914997 -0.87502695  0.9353966  0.4760273  1.2681050
#> 8   0.9389212 -0.86278238  0.43887930  1.0009533  1.3733646 -1.0423264
#> 9  -0.8135539 -0.84696982 -0.32573017  1.1218534  2.0955812 -0.5620731
#> 10 -0.3802219 -0.36840199  0.02254271  1.6459653  1.9633618 -0.3246455
#> 11 -0.1225506 -0.44139873 -0.19548572  1.0299038  2.3840629 -0.2147528
#> 12  2.5784883 -0.02446156 -0.60355041 -0.2547486 -0.5120230 -0.3947043
#> 13  0.9922755  0.32399254 -0.94565021 -0.7125240 -0.5104935 -0.9456502
#> 14  0.2699801 -0.08094677 -0.28926312 -0.7813127 -0.6065051 -0.1494523
#> 15  1.7947637 -0.47746004 -0.52783931 -0.7635958 -0.7635958 -0.6438957
#> 16  0.1789247  1.26157166  0.82115649  0.3899368  1.2707671 -0.2614905
#> 17  1.4517382  0.26236463  1.31600581 -0.2531012  0.2004547 -1.1238942
#> 18 -0.2784769 -0.82258408 -0.82258408 -0.8225841  1.2338761 -0.8225841
#> 19  2.1235865  1.05868538 -0.39467611  0.1315587 -0.1315587  0.3946761
#> 20 -0.2262821 -0.73107613 -0.66098092 -0.4323193 -0.4035074 -0.3836848
#>              T7           T8          T9          T10
#> 1  -0.425513174 -0.557965933 -0.30571480 -0.391338269
#> 2  -1.160107459 -0.295057401 -0.64757654 -0.647576540
#> 3  -1.006282312 -0.766188425 -0.64803067 -1.050766722
#> 4   0.301678216 -0.371684852  0.66435231  0.633498705
#> 5   1.209152596 -0.095960244 -0.32347149 -0.586030729
#> 6  -0.006327004 -0.327677972 -0.66559889 -0.665598891
#> 7  -0.354972016 -0.601904415 -0.82778139 -0.628094550
#> 8  -1.159101258 -0.345888528 -0.27147562 -0.070544200
#> 9  -0.651292750 -0.498839228  0.42446013  0.056564270
#> 10 -0.647007074 -0.667438225 -0.63229313 -0.611861976
#> 11 -0.400726188 -0.616324716 -0.71136401 -0.711364008
#> 12 -0.526022073 -0.185303954 -0.06989626 -0.007778173
#> 13 -0.945650212  0.121962031  1.19430596  1.427432164
#> 14 -0.763902351 -0.412975440  0.39220455  2.422173070
#> 15 -0.260645101  0.389799714 -0.08327821  1.335746686
#> 16 -0.912917717 -0.481698036 -1.13312530 -1.133125299
#> 17 -0.987867401 -0.340013670 -0.33085684 -0.194830041
#> 18 -0.822584085  1.233876127  1.23387613  0.689768939
#> 19 -0.795567972 -0.795567972 -0.79556797 -0.795567972
#> 20 -0.612869917 -0.006483243  1.70560793  1.751595921
#> 
#> Clustering vector:
#>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
#>  20   1  18  16   3  20   5  19  10   9   8  14  15  12   2   9   6  20  17  12 
#>  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
#>   7  20   6   1  12   9   7  12   6  11   5  11  16  18  10   3   8  13  20   2 
#>  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
#>   6   1  20  20   7  20   6   3   1   6  15   5   4  13   6   5   4  10  12  10 
#>  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80 
#>   8   3   7  13   3   1  12   7  17  17  14   9   6  19  14  10   2  11   8   7 
#>  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
#>   8  20  15   4   4   1  17  12   4  12  10   5  11  12  20   3   9  12  20  12 
#> 
#> Within cluster sum of squares by cluster:
#>  [1]  3.0260293  5.0357352 13.2343206 13.1118010 10.3695240  8.0412334
#>  [7] 11.7251902  8.3255313  3.4617088  2.4693324  1.6353685 13.0645655
#> [13]  2.9416715  3.1842269  5.0413438  2.4669004  9.9418120  0.9921913
#> [19]  0.9833964 12.4042363
#>  (between_SS / total_SS =  83.1 %)
#> 
#> Available components:
#> 
#> [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
#> [6] "betweenss"    "size"         "iter"         "ifault"      
# elbow plot
reslist[[2]]
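
The components listed under "Available components" in the printed output match those of an object returned by stats::kmeans. Assuming the first list element is such an object, its parts can be accessed as in the sketch below, which uses synthetic data and a direct kmeans call rather than gClusters(); x, km, and members are hypothetical names.

```r
# Hypothetical stand-in data; km plays the role of reslist[[1]].
set.seed(42)
x <- scale(matrix(rnorm(60 * 6), nrow = 60))
km <- kmeans(x, centers = 4, nstart = 10)

# Cluster label assigned to each row.
head(km$cluster)

# Number of rows in each cluster.
km$size

# Share of total variation explained by the clustering
# (the between_SS / total_SS figure in the printed summary).
km$betweenss / km$totss

# Row indices of the members of cluster 1.
members <- which(km$cluster == 1)
```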