Global AI and Data Science

Spam Or Ham

By Moloy De posted Thu November 26, 2020 09:02 PM

Running classification algorithms to separate spam mails from ham mails is a commonplace exercise. I took data from Kaggle containing 2551 ham mails and 501 spam mails and ran several machine learning classification models on it. For pre-processing, the "tm" package in R was used to create the corpus and apply various data transformations. 432 frequently used words were selected as features to train the models. A 70%-30% split was used to create the training data and the testing data. The models, along with their performances, are given below:

1) Naive Bayes Algorithm

2) Logistic Regression Model

3) Decision Tree Algorithm

4) Random Forest Algorithm

5) XGBoost Algorithm
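The pre-processing and splitting steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the R "tm" workflow used for the post. The helper names (`tokenize`, `top_words`, `to_features`, `stratified_split`) and the toy messages are hypothetical. Two things are my assumptions and are not stated in the post: words are ranked by the number of documents they appear in, and the 70%-30% split is stratified by class.

```python
import random
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only; a rough stand-in for
    # the corpus transformations applied with "tm" in the post.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def top_words(docs, k):
    # Count each word once per document, then keep the k most common.
    # Assumption: ranking by document frequency.
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return [word for word, _ in counts.most_common(k)]

def to_features(doc, vocab):
    # One count per vocabulary word; words outside vocab are ignored.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]

def stratified_split(labels, train_frac=0.7, seed=0):
    # Shuffle the indices of each class separately so that both parts
    # keep roughly the same spam/ham ratio (assumption: stratified).
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

# Toy messages, made up for illustration only.
mails = ["Free cash offer!!", "Free prize, claim now", "Meeting notes attached"]
vocab = top_words(mails, 1)
features = [to_features(m, vocab) for m in mails]

# Class sizes from the post: 2551 ham and 501 spam.
labels = ["ham"] * 2551 + ["spam"] * 501
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With the post's figures the vocabulary size would be 432 rather than 1, and the feature rows for `train_idx` and `test_idx` would be passed to the five models listed above.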

QUESTION I: An accuracy of 1 is a bit improbable. What are the basic reasons for obtaining an accuracy of 1 other than perfect classification?
QUESTION II: Could this accuracy measure be extended to a three-way classification? What are the pitfalls?
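As one possible starting point for both questions, here is a small Python sketch that computes accuracy from a confusion matrix with any number of classes. It is not taken from the models in the post. The function names `accuracy` and `per_class_recall` are hypothetical, and the three-class matrix contains made-up numbers. The two-class matrix uses only the class sizes given above (2551 ham, 501 spam) and shows what a classifier that always answers "ham" would score on the full data.

```python
def accuracy(confusion):
    # confusion[i][j] = number of items of true class i predicted as class j.
    # Accuracy is the diagonal sum divided by the grand total,
    # which works unchanged for two, three or more classes.
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def per_class_recall(confusion):
    # Fraction of each true class that was predicted correctly.
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(confusion)]

# Always predicting "ham" on 2551 ham and 501 spam mails.
always_ham = [[2551, 0],
              [501, 0]]
print(round(accuracy(always_ham), 3))   # 0.836
print(per_class_recall(always_ham))     # spam recall is 0.0

# Hypothetical three-way example; the counts are invented.
three_way = [[90, 5, 5],
             [10, 80, 10],
             [30, 30, 40]]
print(accuracy(three_way))
print(per_class_recall(three_way))
```

The always-ham case suggests one pitfall to keep in mind: with unbalanced classes, a single accuracy figure can look respectable while one class is never detected, and in a three-way setting the same figure also hides which pairs of classes are being confused.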

REFERENCE: Data available at Kaggle here.
#GlobalAIandDataScience
#GlobalDataScience

Permalink

https://community.ibm.com/community/user/blogs/moloy-de1/2020/11/26/points-to-ponder
