Global AI and Data Science

  • 1.  Into Data Science: Understanding K-Means --> Follow-up question

    Posted Mon April 29, 2019 07:48 PM
    Hi Jacques,

    In cell #9 of the Jupyter Notebook for this webinar, I have a question regarding the for loop that is being used to calculate the distortions for the elbow method.

    I understand the following line within the for loop where you are fitting the K-Means model on the scaled X values:
    KmeanModel = KMeans(n_clusters=k).fit(X_scaled)

    However, I don't understand the purpose of the next line in the cell:
    KmeanModel.fit(X)

    Why are you refitting the model on the original, unscaled values of X before calculating the distortions?

    Thanks!

    Tom

    ------------------------------
    Thomas Weichle
    ------------------------------

    #GlobalAIandDataScience
    #GlobalDataScience


  • 2.  RE: Into Data Science: Understanding K-Means --> Follow-up question

    Posted Tue April 30, 2019 07:15 PM
    Congratulations, you found an error in the code.

    I don't know why I did not see it before...
    That line should not be there at all.

    I updated the notebook and re-ran it. It turns out there is no "elbow".
    The distortion seems to be decreasing steadily.

    You can view the updated notebook here:
    http://bit.ly/W002-ClusteringCustomersViewNotebook
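
    For anyone reading without the notebook open, here is a minimal sketch of what a corrected elbow loop could look like. It is not the notebook's actual code: the synthetic data, the variable names other than X and X_scaled, the n_init and random_state settings, and the definition of distortion as the mean distance to the nearest centroid are all assumptions made for illustration. The point is that calling fit(X) after fit(X_scaled) discards the centers learned from the scaled data and refits on the unscaled data, so the model should be fit only once.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's data (assumption): three features
# on very different scales, which is why scaling matters for K-Means.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])

# Scale once, then use the scaled values for both fitting and distortion.
X_scaled = StandardScaler().fit_transform(X)

distortions = []
K = range(1, 10)
for k in K:
    # Fit once, on the scaled data only. A second fit on X would
    # replace these centers with ones learned from the unscaled data.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)

    # transform() returns the distance from each point to every center;
    # the row-wise minimum is the distance to the nearest center.
    nearest = model.transform(X_scaled).min(axis=1)

    # Distortion (assumed definition): mean distance to the nearest center.
    distortions.append(nearest.mean())

for k, d in zip(K, distortions):
    print(k, round(float(d), 4))
```

    Plotting distortions against K gives the elbow chart. If the curve just decreases steadily, as in the updated notebook, the data may simply not have a clear-cut number of clusters.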

    ------------------------------
    Jacques Roy
    Digital Technical Engagement
    Watson Data and AI

    Test Drive Our Digital Offerings!  ibm.biz/dte-live
    Engage DTE at: ibm.biz/dte-request
    Byte-size data science channel: youtube.com/c/ByteSizeDataScience
    ------------------------------



  • 3.  RE: Into Data Science: Understanding K-Means --> Follow-up question

    Posted Tue April 30, 2019 08:23 PM
    Hi Jacques,

    Thanks for looking into it and clarifying the error in the code!

    Best,

    Tom


    ------------------------------
    Tom Weichle
    ------------------------------