Here is the code I am running. I loop over k in range(2, 10) to find the optimum value of k:
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

# Feature matrix: every column except the neighborhood name
X = vancouver_grouped.drop(columns='Neighborhood')

dist = []
for k in range(2, 10):
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X)
    dist.append(kmeans.inertia_)  # within-cluster sum of squared distances

fig = plt.figure(figsize=(10, 10))
plt.plot(range(2, 10), dist, '-o')
plt.grid(True)
plt.xlabel('Number of clusters k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
I just ran the same code 4 times; screenshots are attached. I wasn't sure which value of k to use, so I first used 5 and then 4 (I am aware that I can take an arbitrary value). Currently I am using 4. Which value of k do you think I should use?
------------------------------
Aparna Mookerjee
------------------------------
Original Message:
Sent: Fri March 20, 2020 08:40 AM
From: Muhammad Tehseen
Subject: k-means elbow method
It's normal to see the curve change between runs. K-means starts from randomly chosen initial centroids, so unless you fix the seed (the random_state parameter in scikit-learn), each run can converge to a slightly different clustering, and the distortion for a given K varies accordingly. You will keep seeing these differences until you settle on a suitable value of K.
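To illustrate the point about run-to-run variation, here is a minimal sketch that computes the elbow curve twice with the same seed and checks that the two curves agree. The synthetic data from make_blobs, the helper name elbow_curve, and the seed value 42 are assumptions for illustration only; they stand in for the real Vancouver feature matrix, which is not shown in this thread.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real feature matrix (an assumption for illustration).
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)


def elbow_curve(X, k_values, seed):
    """Return the inertia (distortion) for each k, using a fixed random seed."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


k_values = range(2, 10)
run_a = elbow_curve(X_demo, k_values, seed=42)
run_b = elbow_curve(X_demo, k_values, seed=42)

# With the same seed, the two curves should agree up to floating-point noise.
print(np.allclose(run_a, run_b))
```

Leaving random_state unset is what lets the curve move from one run to the next; raising n_init (the number of random restarts per fit) should also make individual runs less sensitive to an unlucky initialization.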
One way is to keep trying different values of K by hand and comparing the results until you find the point where the distortion stops dropping sharply.
Another way is to use grid search for hyperparameter tuning. The scikit-learn library lets you define a grid of parameter values and picks the best-scoring one. Define the grid so that the number of clusters for K-means runs from 1 to 10.
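A rough sketch of the grid-search idea, under some stated assumptions: it uses scikit-learn's GridSearchCV over n_clusters, but scores each candidate with the silhouette coefficient instead of distortion, because distortion tends to keep falling as K grows and would usually favor the largest K in the grid. The grid starts at 2 because the silhouette coefficient is undefined for a single cluster. The synthetic data and the helper name silhouette_scorer are hypothetical and only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real feature matrix (an assumption for illustration).
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)


def silhouette_scorer(estimator, X, y=None):
    """Score a fitted KMeans model by the silhouette of its labels on X."""
    labels = estimator.predict(X)
    return silhouette_score(X, labels)


search = GridSearchCV(
    estimator=KMeans(n_init=10, random_state=42),  # fixed seed for repeatability
    param_grid={"n_clusters": list(range(2, 10))},
    scoring=silhouette_scorer,
    cv=3,  # each candidate is fit on two thirds of the rows, scored on the rest
)
search.fit(X_demo)

best_k = search.best_params_["n_clusters"]
print(best_k)
```

The value this prints depends on the data, so treat it as one more piece of evidence alongside the elbow plot and not as a definitive answer.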
------------------------------
Muhammad Tehseen
Original Message:
Sent: Thu March 19, 2020 11:22 AM
From: Aparna Mookerjee
Subject: k-means elbow method
I am trying to use K-means clustering to group areas in Vancouver. Every time I run the code to find the optimum value of k, the distortion vs. k curve changes. What could be the reason for this?
Thanks,
------------------------------
Aparna Mookerjee
------------------------------