Global AI and Data Science

Spam Or Ham

By Moloy De posted Thu November 26, 2020 09:02 PM

  
Classifying mails as Spam or Ham is a commonplace exercise for Classification Algorithms. I took a data set of 2551 Ham mails and 501 Spam mails from Kaggle and ran several Machine Learning Classification Models on it. For pre-processing, the "tm" package in R was used to create the corpus and apply various data transformations. 432 frequently used words were selected as features to train the models. A 70% - 30% split was used to create the Training Data and Testing Data. The models, along with their performances, are given below:
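The post did these steps in R with the "tm" package. Purely as an illustration, here is a minimal Python sketch of the two ideas described above: keeping frequently used words as features and making a 70% - 30% split. The function names, the whitespace tokenisation, picking the top-k words by raw count, and stratifying the split by class are all assumptions made for the sketch, not the author's code.

```python
import random
from collections import Counter

def top_terms(docs, k):
    """Return the k most frequent lowercase tokens across all documents."""
    counts = Counter(tok for d in docs for tok in d.lower().split())
    return [term for term, _ in counts.most_common(k)]

def to_features(doc, vocab):
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(doc.lower().split())
    return [counts[t] for t in vocab]

def stratified_split(labels, train_frac=0.7, seed=42):
    """Split indices so each class keeps roughly the same train/test ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

# Class sizes quoted in the post: 2551 Ham and 501 Spam mails.
labels = ["ham"] * 2551 + ["spam"] * 501
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training mails,", len(test_idx), "testing mails")
```

With the class sizes from the post, this split puts 2137 mails in training and 915 in testing, and both parts keep roughly the same Spam share as the full data.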

1) Naive Bayes Algorithm

2) Logistic Regression Model

3) Decision Tree Algorithm

4) Random Forest Algorithm

5) XGBoost Algorithm

QUESTION I: An Accuracy of 1 is a bit improbable. What are the basic reasons behind obtaining an Accuracy of 1 other than perfect classification?
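
One frequently cited reason for a perfect score is leakage: the same mail, or a trivially altered copy, sits in both the Training Data and the Testing Data, or the model is scored on the data it was trained on. A feature that directly encodes the label has the same effect. The Python sketch below checks for the first case only; it assumes mails are plain strings, and `normalise` and `leaked_items` are hypothetical helper names.

```python
def normalise(text):
    """Lowercase and collapse whitespace so trivial copies compare equal."""
    return " ".join(text.lower().split())

def leaked_items(train_docs, test_docs):
    """Return test documents whose normalised text also occurs in training."""
    seen = {normalise(d) for d in train_docs}
    return [d for d in test_docs if normalise(d) in seen]

# Toy mails, made up for the illustration.
train_docs = ["Win a FREE prize now", "Meeting at 10 am tomorrow"]
test_docs = ["win a free   prize now", "Lunch on Friday?"]

overlap = leaked_items(train_docs, test_docs)
print(f"{len(overlap)} of {len(test_docs)} test mails also appear in training")
```

A non-empty overlap does not prove the score is inflated, but it is worth removing such duplicates before trusting an Accuracy of 1.
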
QUESTION II: Could this Accuracy Measure be extended to a three-way classification? What are the pitfalls?
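
Accuracy carries over to three or more classes unchanged: it is the sum of the diagonal of the confusion matrix divided by the total count. A known pitfall is that it hides per-class behaviour when classes are imbalanced. Even in the two-class data of this post, a model that always answers Ham would score 2551/3052, about 0.836. The Python sketch below uses a made-up three-class confusion matrix (the third class "promo" and all counts are illustrative assumptions) and reports per-class recall next to Accuracy.

```python
def accuracy(cm):
    """Overall accuracy: correct predictions (diagonal) over all predictions."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def per_class_recall(cm):
    """Recall for each true class: diagonal entry over its row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

# Rows are true classes, columns are predicted classes: ham, spam, promo.
# Hypothetical counts: the model predicts "ham" for almost everything.
cm = [
    [900, 0, 0],   # true ham
    [60, 10, 0],   # true spam
    [30, 0, 0],    # true promo
]

print("accuracy:", accuracy(cm))
print("recall per class:", per_class_recall(cm))
```

Here Accuracy is 0.91 even though only 10 of the 70 spam mails and none of the 30 promo mails are recognised, which is why per-class measures are usually reported alongside it.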

REFERENCE: The data is available on Kaggle.