Spam Or Ham
By Moloy De, posted Thu November 26, 2020 09:02 PM
Classifying e-mails as spam or ham is a common exercise for classification algorithms. I took a dataset of 2551 ham mails and 501 spam mails from Kaggle and ran several machine learning classification models on it. For pre-processing, the "tm" package in R was used to create the corpus and apply various data transformations. 432 frequently used words were selected to train the models. A 70%-30% split was used to create the training data and the testing data. The models, along with their performances, are given below:
1) Naive Bayes Algorithm
2) Logistic Regression Model
3) Decision Tree Algorithm
4) Random Forest Algorithm
5) XGBoost Algorithm
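The pre-processing steps described above (select frequent words, build word-count features, split 70%-30%) can be sketched roughly as follows. This is only an illustration in plain Python, not the R "tm" code used for the post. The function names, the whitespace tokenizer, the document-frequency ranking, and the fixed random seed are all assumptions made for the example.

```python
import random
from collections import Counter


def frequent_terms(docs, top_k):
    """Return the top_k words ranked by how many documents contain them.

    Assumption: words are split on whitespace and lower-cased; the post's
    actual "tm" transformations are not specified in detail.
    """
    doc_freq = Counter()
    for text in docs:
        doc_freq.update(set(text.lower().split()))
    # Sort by descending document frequency, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def to_features(text, vocab):
    """Turn one mail into a vector of word counts over the selected vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def split_70_30(rows, seed=0):
    """Shuffle the rows and return (training 70%, testing 30%)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]
```

With the real data, `top_k` would be 432 and `rows` would hold the 3052 labelled mails. Whether the vocabulary is chosen before or after the split is not stated in the post; choosing it on the training part only is the more cautious option.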
QUESTION I: An accuracy of 1 is a bit improbable. What are the basic reasons for obtaining an accuracy of 1 other than perfect classification?
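As a starting point for Question I, two quick sanity checks may help: compare the reported accuracy with the majority-class baseline implied by the class counts above, and check whether identical mails appear in both the training and testing data. The sketch below is in Python rather than R, and the helper names are hypothetical. Train/test overlap is only one possible explanation, offered here as an assumption, not a finding from the post.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(n_ham, n_spam):
    """Accuracy obtained by always predicting the larger class."""
    return max(n_ham, n_spam) / (n_ham + n_spam)


def train_test_overlap(train_texts, test_texts):
    """Mails that occur verbatim in both sets (a possible source of leakage)."""
    return set(train_texts) & set(test_texts)


# Class counts taken from the post: 2551 ham and 501 spam mails.
baseline = majority_baseline(2551, 501)  # 2551 / 3052, roughly 0.836
```

A model that scores well above this baseline on held-out mails is doing real work. A score of exactly 1 is worth checking against `train_test_overlap` and against any feature that might encode the label directly.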
QUESTION II: Could this accuracy measure be extended to a three-way classification? What are the pitfalls?
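For Question II, accuracy does carry over to three classes as the sum of the confusion-matrix diagonal divided by the total count. The sketch below, in Python with hypothetical helper names and a made-up third label "promo", shows one pitfall under class imbalance: overall accuracy can look acceptable while the recall of the minority classes is zero.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Sum of the diagonal divided by the total number of cases."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """Diagonal entry divided by its row sum; NaN when a class has no cases."""
    return [
        row[i] / sum(row) if sum(row) else float("nan")
        for i, row in enumerate(matrix)
    ]


# Illustrative data only: 8 ham, 1 spam, 1 promo, with everything predicted as ham.
labels = ["ham", "spam", "promo"]
y_true = ["ham"] * 8 + ["spam", "promo"]
y_pred = ["ham"] * 10
cm = confusion_matrix(y_true, y_pred, labels)
```

Here `accuracy_from_matrix(cm)` is 0.8 while `per_class_recall(cm)` is `[1.0, 0.0, 0.0]`, so reporting per-class figures alongside the single accuracy number gives a fuller picture.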
REFERENCE: Data available at Kaggle here.
#GlobalAIandDataScience
#GlobalDataScience