Global AI and Data Science

  • 1.  Evaluate EDA performed on IMDB data

    Posted Wed May 12, 2021 09:00 AM
    Hello, 

    Kindly evaluate the work I have done using the publicly available IMDB dataset. The idea was to come up with a few questions about the data and answer them using data analysis and EDA.

    Please note that I'm a recent graduate and attempted this with all the knowledge I've gained over the past few months. The link to my Jupyter Notebook is Ab2207/General-Projects-Data-Analysis.

    I'd appreciate it if you could go through it and share your feedback.

    Thanks, 
    Abhinay Kattingeri

    ------------------------------
    Abhinay Kattingeri
    ------------------------------

    #GlobalAIandDataScience
    #GlobalDataScience


  • 2.  RE: Evaluate EDA performed on IMDB data

    Posted Thu May 13, 2021 04:28 PM
    Hello Abhinay,
    I think you did a great job!
    I'm an L1 tech-sales data scientist, so I may not be a real expert in this domain.
    Thank you for sharing your knowledge and experience with us :)

    ------------------------------
    Lourdes Martinez Medina
    ------------------------------



  • 3.  RE: Evaluate EDA performed on IMDB data

    Posted Thu May 13, 2021 05:15 PM
    Hi Lourdes, 

    Thank you for taking the time to evaluate my work. This is my first post on this forum, and I'm very happy with how quickly I received a response.

    I've been taking on various extracurricular projects in data analysis and machine learning. I will continue to share these projects and see what industry experts such as you think of them. Hopefully I will learn a lot more :)

    Thanks again!

    ------------------------------
    Abhinay Kattingeri
    ------------------------------



  • 4.  RE: Evaluate EDA performed on IMDB data

    Posted Fri May 14, 2021 02:18 PM
    This is great, Abhinay! I like how you structured the project; the questions you asked and the visuals are very nice. My only constructive feedback is to remove the last 5 empty cells of the notebook, unless you kept them there for a reason. :)

    Great analysis!

    Madalina Lupu 




    ------------------------------
    Madalina Lupu
    ------------------------------



  • 5.  RE: Evaluate EDA performed on IMDB data

    Posted Sun May 16, 2021 12:15 PM
    Hi Abhinay,

    Good job.
    Since you started recently, your EDA looks fine. But I would also suggest not restricting yourself when you do EDA; you can get a lot of information indirectly from the data.
    You could also include box plots in your analysis.
    With respect to the correlations, you have only looked at the numerical features and applied Pearson's correlation. For example, you treated an ordinal feature (Rank) as an ordinary numerical feature and included it in the correlations, which you don't need to do. Consider it categorical instead.
    Also look at categorical-vs-categorical and categorical-vs-numerical correlations. There are many commonly used methods, such as the chi-square test, Cramér's V, etc. Have a look at them. All the best!!
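The association measures suggested above can be sketched in plain Python as below. This is a minimal, dependency-free sketch that assumes two equal-length lists of labels (for example, two DataFrame columns converted with `.tolist()`). For the categorical-vs-numerical case it uses the correlation ratio (eta), which is one common choice but is not named in the post; the function names and toy columns are hypothetical. In a notebook, `pandas.crosstab` plus `scipy.stats.chi2_contingency` can compute the chi-square statistic instead; note that SciPy applies Yates' continuity correction to 2x2 tables by default, so pass `correction=False` to match this sketch.

```python
from collections import Counter, defaultdict
from math import sqrt


def chi_square_statistic(x, y):
    """Pearson's chi-square statistic for two paired categorical sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and the same length")
    observed = Counter(zip(x, y))  # contingency-table cell counts
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            # Expected count under independence; always > 0 here because
            # every row and column label occurs at least once.
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected
    return chi2


def cramers_v(x, y):
    """Cramér's V (no bias correction): 0 = no association, 1 = perfect."""
    k = min(len(set(x)), len(set(y)))
    if k < 2:
        raise ValueError("each variable needs at least two categories")
    return sqrt(chi_square_statistic(x, y) / (len(x) * (k - 1)))


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numerical variable."""
    n = len(values)
    if n == 0 or n != len(categories):
        raise ValueError("inputs must be non-empty and the same length")
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    overall_mean = sum(values) / n
    # Share of the total variance explained by the per-category means.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((v - overall_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant values: nothing to explain
    return sqrt(ss_between / ss_total)


if __name__ == "__main__":
    # Hypothetical toy columns, not taken from the IMDB notebook.
    genre = ["Action", "Action", "Drama", "Drama", "Comedy", "Comedy"]
    rating_band = ["high", "high", "low", "low", "high", "low"]
    runtime = [130, 125, 110, 115, 95, 100]
    print("Cramer's V, genre vs rating band:", cramers_v(genre, rating_band))
    print("Correlation ratio, genre vs runtime:", correlation_ratio(genre, runtime))
```

Both measures range from 0 to 1, so they can sit alongside Pearson coefficients in a single summary table, with an ordinal column such as Rank handled as categorical.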

    Thanks,
    Rajkumar Rajasekaran

    ------------------------------
    Rajkumar Rajasekaran
    ------------------------------



  • 6.  RE: Evaluate EDA performed on IMDB data

    Posted Sun May 16, 2021 06:17 PM
    Hi Rajkumar, 

    Thank you for spending your valuable time going through my work. 

    I certainly understand the usefulness of box plots: they give additional information about the data, such as outliers, the median and the inter-quartile range. I will definitely try to use them in my upcoming projects. Can you think of other purposes box plots serve? I'm asking because these statistics can also be obtained by plotting the data distribution (for outliers) and with df.describe() (for the mean, median, standard deviation, quartiles, etc.).
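To make the comparison with df.describe() concrete, here is a minimal sketch of what a default box plot computes. It assumes Tukey-style whiskers at 1.5 × IQR (the default `whis` in `matplotlib.pyplot.boxplot` and seaborn's `boxplot`) and linearly interpolated quartiles like the 25%/50%/75% rows of `describe()`. The helper name `box_plot_summary` and the sample data are hypothetical. The difference it shows: `describe()` reports the quartiles and extremes, while the box plot also applies the fence rule to decide which individual points are drawn as outliers.

```python
import statistics


def box_plot_summary(values, whis=1.5):
    """Statistics drawn by a Tukey-style box plot (hypothetical helper)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    # 'inclusive' uses linear interpolation, matching numpy/pandas defaults.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Points beyond the fences are drawn individually as outliers.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


if __name__ == "__main__":
    # Hypothetical sample: five ordinary values and one extreme value.
    print(box_plot_summary([1, 2, 3, 4, 5, 100]))
```

For this sample the quartiles are 2.25 and 4.75, so the fences fall at -1.5 and 8.5: the whiskers end at 1 and 5 and only 100 is flagged as an outlier, whereas `describe()` would simply list 100 as the max.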

    Secondly, I completely overlooked the fact that Rank should be treated as ordinal. It makes much more sense now.
    I have never used chi-square or Cramér's V for this use case, so this is a learning for me. I will test them and see how it goes.

    Lastly, I'd be interested in learning a thing or two about how IBM approaches data analysis. Which do data analysts/scientists at IBM use for their day-to-day tasks, R or Python?

    ------------------------------
    Abhinay Kattingeri
    ------------------------------