Global Data Science Forum


Evaluation of Data Analysis of Sales dataset

  • 1.  Evaluation of Data Analysis of Sales dataset

    Posted 25 days ago
    Please evaluate my data analysis and visualization of a sales dataset from a KPMG client company, which I obtained through a virtual internship.
    I am also unsure which ML algorithm is appropriate for this sales data to achieve the desired goal. The link to my Jupyter notebook is https://github.com/ShuklaPrashant21/KPMG_Virtual_Internship/blob/master/Task_Solutions/KPMG_Virtual_Internship_Data_Visualization.ipynb
    Please guide me on model development and on how I could deploy a model on this dataset.

    ------------------------------
    Prashant Shukla
    ------------------------------


  • 2.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 24 days ago
    Prashant, thank you for sharing your work.
    After reviewing your work I have a general comment and a few specific comments.
    I trust you will take my comments as constructive feedback.
    1. General comment. Be very careful when making generalizations about the causes behind data results.
        Remember the most fundamental of rules: correlation does not imply causation. To be more specific,
        as data scientists we must not assume facts that have not been placed into evidence. I will give specific examples below.
    2. Input [16]: you state "Their future customer is more likely from NSW". This is not what the data shows. It only shows that historically more customers live in NSW. You cannot automatically infer that future customers will come from NSW. What if the market in NSW is saturated? Wouldn't this data instead raise questions such as:
    Is our advertising connecting well in the other states? Are our product offerings right for consumers in the other states? You make the same kind of leap in Input [17].
    3. Input [17]: this has the same issue as [16]. Does the data show that car owners are better off financially than non-car owners? Could the non-car owners live in urban areas where car ownership is much more expensive, or unnecessary because everything is close to where they work and live?
    4. Finally, you did a great job of reporting and visualizing the data. Now you need to analyze it using machine learning algorithms. Look into clustering, classification, association, and regression algorithms to quantify and predict the behavior you are interested in. So far you have only presented assumptions based on reporting, and no predictions based on the data provided to you.
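    As one illustration of the clustering option in point 4, here is a minimal k-means sketch in plain Python. It assumes each customer has been reduced to a few numeric features; the (age, purchase count) pairs in the usage line are made-up illustration values, not data from the KPMG file, and in practice a library implementation such as scikit-learn's KMeans would be the usual choice. Note that the segments it finds describe the data; like the reporting above, they do not by themselves explain why customers behave as they do.

```python
import math
import random
import statistics


def standardize(points):
    """Rescale each feature to mean 0 and population standard deviation 1."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for a constant feature to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stdevs)) for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (equal-length tuples) into `k` groups.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # assignments and centroids are stable: converged
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical (age, past purchases) pairs, for illustration only.
customers = [(25, 80), (27, 85), (26, 90), (60, 10), (62, 15), (65, 5)]
centroids, labels = kmeans(standardize(customers), k=2)
print(labels)
```

    Standardizing first matters because k-means uses Euclidean distance, so a feature with a larger numeric range would otherwise dominate the clustering.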

    ------------------------------
    Lee Allan
    ------------------------------



  • 3.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 24 days ago
    Lee Allan, thank you very much for your comments and suggestions.
    After taking many MOOCs, this is my first genuine project, and I really need someone to point out my mistakes. I really appreciate you spending your valuable time evaluating my work. I will definitely work on it.


    ------------------------------
    Prashant Shukla
    ------------------------------



  • 4.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 20 days ago
    Please delete all personal data from your file. I can see names and dates of birth. In Europe this is a violation of the GDPR, and since KPMG is also located in the Netherlands, this will not be appreciated by customers or by the Privacy Chamber. If European persons are involved, this could lead to a large fine.
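    A minimal sketch of the kind of clean-up described above, assuming the file is loaded as one dict per customer. The column names (first_name, last_name, address, DOB) and the example record are hypothetical; it drops direct identifiers and replaces the exact date of birth with a coarse age band so the field stays usable for analysis. Other remaining fields combined could still identify someone, so this is a first step rather than full anonymization.

```python
from datetime import date

# Hypothetical column names; adjust to the actual file.
DIRECT_IDENTIFIERS = {"first_name", "last_name", "address"}


def age_band(dob: date, today: date, width: int = 10) -> str:
    """Convert an exact date of birth into a coarse age band such as '30-39'."""
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def strip_personal_data(record: dict, today: date) -> dict:
    """Drop direct identifiers and replace 'DOB' with an age band."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "DOB"
    }
    if "DOB" in record:
        cleaned["age_band"] = age_band(record["DOB"], today)
    return cleaned


# Made-up record, for illustration only.
record = {"customer_id": 1, "first_name": "Jane", "last_name": "Doe",
          "DOB": date(1990, 5, 17), "state": "NSW"}
print(strip_personal_data(record, today=date(2020, 6, 1)))
# {'customer_id': 1, 'state': 'NSW', 'age_band': '30-39'}
```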

    ------------------------------
    Peter Moorer
    ------------------------------



  • 5.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 20 days ago
    Peter, FYI the data is fictitious. There is no actual customer data including names and addresses.
    Although it is not my project, I have discussed the data with the author and have seen the instructions, which state that it is made-up data.
    Your point would be accurate if it were REAL data.

    ------------------------------
    Lee Allan
    ------------------------------



  • 6.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 19 days ago
    Peter, please check out this link: https://in.insidesherpa.com/virtual-internships/m7W4GMqeT3bh9Nb2c

    ------------------------------
    Prashant Shukla
    ------------------------------



  • 7.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 13 days ago
    Interesting project, please keep us updated. Glad to hear the PII is fictitious. :)

    ------------------------------
    Mike Brassil
    New York
    ------------------------------



  • 8.  RE: Evaluation of Data Analysis of Sales dataset

    Posted 9 days ago
    Thanks Mike, glad you found it interesting!
    I still have to make many changes to my project, such as model development, but before moving on to that I want to strengthen my foundation in EDA and visualization. I recently worked on a Kaggle dataset on campus recruitment at an XYZ campus. This is my Kaggle Notebook. I would be pleased if you could look into it; any suggestions or feedback are welcome.

    ------------------------------
    Prashant Shukla
    ------------------------------