Global AI and Data Science


Evaluation of Data Analysis of Sales dataset

  • 1.  Evaluation of Data Analysis of Sales dataset

    Posted Tue July 14, 2020 06:51 PM
    Edited by System Test Fri January 20, 2023 04:14 PM
    Please evaluate my data analysis and visualization of a sales dataset from a KPMG client company, which I received through a virtual internship.
    I am unsure which ML algorithm is appropriate for this sales data to achieve the desired goal. My Jupyter notebook is at https://github.com/ShuklaPrashant21/Virtual-Internship/tree/master/KPMG%20Virtual%20Internship
    Please guide me on model development and on how I can deploy a model on this dataset.

    ------------------------------
    Prashant Shukla
    ------------------------------
    #GlobalAIandDataScience
    #GlobalDataScience


  • 2.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Thu July 16, 2020 10:50 AM
    Prashant, thank you for sharing your work.
    After reviewing your work I have a general comment and a few specific comments.
    I trust you will take my comments as constructive feedback.
    1. General comment. Be very careful when generalizing about what causes the results you see in the data.
        Remember the most fundamental of rules: correlation does not imply causation. To be more specific,
        as data scientists we must not assume facts that have not been placed into evidence. I will give specific examples below.
    2. Input [16]: you state "Their future customer is more likely from NSW". This is not what the data shows. It only shows that historically more customers live in NSW. You cannot automatically infer that future customers will come from NSW. What if the market in NSW is saturated? Wouldn't this data raise the questions:
    Is our advertising connecting well in the other states? Are our product offerings right for the consumers in the other states? You did this in Input [17].
    3. Input [17]: this has the same issue as [16]. Does the data show that car owners are better off financially than non-car owners? Could the non-car owners live in urban areas where car ownership is much more expensive, or unnecessary because everything is close to where they work and live?
    4. Finally, you did a great job of reporting and visualizing the data. Now you need to analyze the data using machine learning algorithms. Look into clustering, classification, association, and regression algorithms to quantify and predict the behavior you are interested in. So far you have only presented assumptions based on reporting, and no predictions based on the data provided to you.
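
    As a rough illustration of the clustering suggestion in point 4, here is a minimal k-means sketch in plain Python. The two features (age, number of past purchases) and all six customer rows are made up for this example; they are not taken from the notebook or the internship dataset. In practice a library implementation such as scikit-learn's KMeans, with scaled features, would probably be the better choice.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Deterministic start: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (age, purchases in the past 3 years).
customers = [(25, 80), (60, 15), (30, 75), (55, 20), (28, 90), (58, 10)]

labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

    The resulting segments only describe groups that exist in the historical data. Whether a segment predicts future purchases still has to be tested, for example with a classification or regression model evaluated on held-out data.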

    ------------------------------
    Lee Allan
    ------------------------------



  • 3.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Thu July 16, 2020 03:58 PM
    Lee Allan, thank you very much for your comments and suggestions.
    After taking many MOOC courses, this is my first genuine piece of work, and I really need someone to point out my mistakes. I really appreciate you spending your valuable time evaluating my work. I will definitely work on it.


    ------------------------------
    Prashant Shukla
    ------------------------------



  • 4.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Mon July 20, 2020 11:41 AM
    Please delete all personal data from your file. I can see names and dates of birth. In Europe this is a violation of the GDPR and as KPMG is also located in the Netherlands, this will not be appreciated by customers and the Privacy Chamber. If European persons are involved, this could lead to a big fine.

    ------------------------------
    Peter Moorer
    ------------------------------



  • 5.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Mon July 20, 2020 12:30 PM
    Peter, FYI the data is fictitious. There is no actual customer data including names and addresses.
    Although it is not my project, I have discussed the data with the author and have seen the instructions, which state that the data is made up.
    You would be right if it were REAL data.

    ------------------------------
    Lee Allan
    ------------------------------



  • 6.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Tue July 21, 2020 03:19 AM
    Peter, please check out this https://in.insidesherpa.com/virtual-internships/m7W4GMqeT3bh9Nb2c

    ------------------------------
    Prashant Shukla
    ------------------------------



  • 7.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Mon July 27, 2020 09:23 AM
    Interesting project, please keep us updated. Glad to hear the PII is fictitious. :)

    ------------------------------
    Mike Brassil
    New York
    ------------------------------



  • 8.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Fri July 31, 2020 05:10 PM
    Thanks Mike, glad you found it interesting!
    I have many changes to make to my project, such as model development, but before that I want to strengthen my foundation in EDA and visualization. I recently worked on a Kaggle dataset on campus recruitment at an XYZ campus. This is my Kaggle Notebook. I would be pleased if you looked at it; any suggestions or feedback are welcome.

    ------------------------------
    Prashant Shukla
    ------------------------------



  • 9.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Wed August 12, 2020 12:09 AM
    Hello Prashant.
    I am a newbie in these topics. Could you share your code or recommend a resource?
    Thank you
    Regards.

    ------------------------------
    Ruben Quispe Llacctarimay
    Engineering
    Idat
    Lima
    915196604
    ------------------------------



  • 10.  RE: Evaluation of Data Analysis of Sales dataset

    Posted Wed August 12, 2020 05:34 AM
    Ruben sir,
    Sorry, I forgot to update my GitHub code link, but you can now check out the complete project. I would be pleased if you also checked out my other project, as well as my NumPy and pandas tutorials for learning, on GitHub.

    Sir, I am also a learner, so my recommendation may not be perfect, but I strongly recommend Kaggle. It offers mini courses for learning and a large variety of datasets on which to apply your knowledge. There are also many YouTube channels that provide good material, such as sentdex and Corey Schafer.

    I hope this will help you.



    ------------------------------
    Prashant Shukla
    ------------------------------