Hi Rajkumar,
Thank you for spending your valuable time going through my work.
I certainly understand the usefulness of box plots; as you mentioned, they give additional information about the data, such as outliers and other statistical descriptions (median, quartiles, and inter-quartile range). I will try to use them in my upcoming projects for sure. Can you think of other purposes a box plot serves? I'm asking because these descriptions can also be obtained by plotting the data distribution (for outliers) and calling df.describe() (for mean, median, quartiles, std. deviation, etc.).
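To make the comparison concrete, here is a minimal sketch of how the numbers behind a box plot relate to df.describe(). The `ratings` values are made up for illustration, and the 1.5 * IQR fence is the common Tukey rule (also matplotlib's default whisker setting), not something taken from the notebook.

```python
import pandas as pd

# Hypothetical ratings column; the values are made up for illustration.
ratings = pd.Series(
    [6.1, 6.8, 7.0, 7.2, 7.4, 7.5, 7.9, 8.1, 8.3, 2.0], name="Rating"
)

# describe() reports count, mean, std, min, quartiles (25%, 50%, 75%) and max.
summary = ratings.describe()

# A box plot draws the same quartiles, then flags points lying more than
# 1.5 * IQR beyond the box as outliers.
q1, q3 = summary["25%"], summary["75%"]
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ratings[(ratings < lower_fence) | (ratings > upper_fence)]

print(summary)
print("IQR:", iqr)
print("Outliers:", outliers.tolist())
```

So describe() already carries the quartiles, but the outlier fences still have to be computed by hand, which is the step a box plot does visually.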
Secondly, I completely overlooked the fact that Rank should be considered as Ordinal. It makes much more sense now.
I have never used Chi-Square or Cramér's V for this use case, so this is a learning for me. I will test these and see how it goes.
Lastly, I'd be interested in knowing a thing or two about how IBM goes about data analysis. Which do Data Analysts/Scientists at IBM use for their day-to-day tasks, R or Python?
------------------------------
Abhinay Kattingeri
------------------------------
Original Message:
Sent: Sun May 16, 2021 12:14 PM
From: Rajkumar Rajasekaran
Subject: Evaluate EDA performed on IMDB data
Hi Abhinay,
Good job.
As you have started recently, your EDA seems fine. But I would also suggest that you not restrict yourself when you do EDA. You can get a lot of information indirectly from the data.
You could also include box plots in your analysis.
With respect to the correlations, you have only looked at the numerical features and applied Pearson's correlation. For example, you have treated the ordinal feature (Rank) as a normal numerical feature and included it in the correlations, which you don't need to do. Consider it as categorical.
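As a rough sketch of that suggestion, Rank can be marked as an ordered categorical so that it drops out of the Pearson correlation matrix. The column names and values below are hypothetical; the real notebook's columns may differ.

```python
import pandas as pd

# Hypothetical columns and made-up values, for illustration only.
df = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Rating": [8.1, 7.9, 8.4, 7.0, 6.5],
    "Votes": [1200, 950, 400, 300, 800],
})

# Mark Rank as an ordered categorical so it is no longer a plain number.
df["Rank"] = pd.Categorical(df["Rank"], ordered=True)

# Pearson correlation over the remaining truly numerical columns only.
numeric_corr = df.select_dtypes(include="number").corr(method="pearson")
print(numeric_corr)
```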
Use categorical vs. categorical and categorical vs. numerical feature correlations as well. There are many commonly used methods, such as chi-square and Cramér's V. Have a look at them. All the best!
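For the categorical vs. categorical case, a minimal sketch of Cramér's V built on the chi-square statistic could look like this. The helper name `cramers_v`, the column names, and the values are all assumptions made for illustration; it also assumes each variable has at least two categories. `scipy.stats.chi2_contingency` returns the same statistic together with a p-value, if SciPy is available.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramér's V for two categorical series (0 = no association, 1 = perfect)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # Normalise by the smaller table dimension minus one.
    min_dim = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Hypothetical columns; the values are made up for illustration.
df = pd.DataFrame({
    "Genre": ["Action", "Action", "Drama", "Drama", "Comedy", "Comedy"],
    "RatingBand": ["High", "High", "Low", "Low", "High", "Low"],
})

print(cramers_v(df["Genre"], df["RatingBand"]))
print(cramers_v(df["Genre"], df["Genre"]))
```

This version applies no bias or continuity correction, so on small samples the value should be read as a rough indication only.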
Thanks,
Rajkumar Rajasekaran
------------------------------
Original Message:
Sent: Tue May 11, 2021 08:22 AM
From: Abhinay Kattingeri
Subject: Evaluate EDA performed on IMDB data
Hello,
Kindly evaluate the work I have done using a publicly available IMDB dataset. The idea was to come up with a few questions about the data and answer them with the help of data analysis and EDA.
Please note that I'm a recent graduate and attempted this with the knowledge I have gained over the past few months. The link to my Jupyter Notebook is Ab2207/General-Projects-Data-Analysis.
I request you to go through it and share your valuable feedback.
Thanks,
Abhinay Kattingeri
------------------------------
#GlobalAIandDataScience
#GlobalDataScience