Global AI and Data Science
Tukey Fences for Outliers
By Moloy De, posted Thu March 25, 2021 10:32 PM
An outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses.
Outliers can occur by chance in any distribution, but they often indicate either measurement error or that the population has a heavy-tailed distribution. In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. A frequent cause of outliers is a mixture of two distributions, which may be two distinct sub-populations, or may indicate 'correct trial' versus 'measurement error'; this is modeled by a mixture model.
Deletion of outlier data is a controversial practice frowned upon by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known. An outlier resulting from an instrument reading error may be excluded but it is desirable that the reading is at least verified.
There is no rigid mathematical definition of what constitutes an outlier; determining whether or not an observation is an outlier is ultimately a subjective exercise. There are various methods of outlier detection. Some are graphical such as normal probability plots. Others are model-based. Box plots are a hybrid.
Model-based methods which are commonly used for identification assume that the data are from a normal distribution, and identify observations which are deemed "unlikely" based on mean and standard deviation:
1. Chauvenet's criterion
2. Grubbs's test for outliers
3. Dixon's Q test
4. ASTM E178 Standard Practice for Dealing With Outlying Observations
5. Mahalanobis distance and leverage are often used to detect outliers, especially in the development of linear regression models.
6. Subspace and correlation based techniques for high-dimensional numerical data.
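As an illustration of how a normal-theory method from this list works, here is a minimal sketch of Chauvenet's criterion (item 1), using only the Python standard library. It assumes the data are roughly normal and flags a point when the expected number of observations at least that far from the mean, N times the two-sided tail probability, falls below 0.5. The function name and the sample readings are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def chauvenet_outliers(data):
    """Return the values flagged by Chauvenet's criterion (single pass)."""
    n = len(data)
    mu = mean(data)
    sigma = stdev(data)          # sample standard deviation
    std_normal = NormalDist()    # standard normal, mean 0 and sd 1
    flagged = []
    for x in data:
        z = abs(x - mu) / sigma
        # Two-sided probability of a deviation at least this large
        tail = 2 * (1 - std_normal.cdf(z))
        # Flag if fewer than half an observation this extreme is expected
        if n * tail < 0.5:
            flagged.append(x)
    return flagged

# Hypothetical measurements with one suspicious reading
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 15.0]
print(chauvenet_outliers(readings))
```

Note that the suspect value itself inflates the mean and standard deviation used to judge it, which is one reason such methods can miss outliers in small samples.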
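Tukey's fences are a nonparametric outlier detection method. With Q1 and Q3 the first and third quartiles and IQR = Q3 - Q1 the interquartile range, a “fence” is drawn a distance of k IQR beyond each quartile, and any observation outside the range

$$[\,Q_1 - k\,(Q_3 - Q_1),\; Q_3 + k\,(Q_3 - Q_1)\,]$$

for some nonnegative constant k is considered an outlier. John Tukey proposed this test, where k = 1.5 indicates an "outlier", and k = 3 indicates data that is "far out". Because the fences depend only on the quartiles and assume no particular distribution, they are far less affected by extreme values than rules based on the mean and standard deviation.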
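The fence rule can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is one of several common quartile conventions; other conventions can move the fences slightly.

```python
from statistics import quantiles

def tukey_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def tukey_outliers(data, k=1.5):
    """Return the observations lying strictly outside the fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]

# Hypothetical sample with one extreme value
sample = [2, 3, 4, 5, 6, 7, 8, 9, 50]
print(tukey_fences(sample))         # fences for k = 1.5
print(tukey_outliers(sample))       # "outliers"
print(tukey_outliers(sample, k=3))  # "far out" points
```

For this sample Q1 = 4 and Q3 = 8, so the k = 1.5 fences sit at -2 and 14, and the value 50 falls outside both the k = 1.5 and the k = 3 fences.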
QUESTION I: Could we apply Tukey fences to ranked observations?
QUESTION II: Could we find the probability of an observation falling outside the Tukey fences?
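One possible starting point for Question II, under a strong assumption: if the data were exactly normal and the fences were computed from the true population quartiles rather than from sample estimates, the probability of a single observation falling outside them follows directly from the normal distribution function. The sketch below uses the standard library's `NormalDist`; the function name is hypothetical, and the result does not carry over to other distributions or to fences estimated from small samples.

```python
from statistics import NormalDist

def escape_probability(k=1.5):
    """P(observation lies outside the Tukey fences) for a normal population."""
    z = NormalDist()                   # standard normal; the result is scale-free
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Probability mass below the lower fence plus mass above the upper fence
    return z.cdf(lower) + (1 - z.cdf(upper))

print(escape_probability(1.5))  # roughly 0.7% of observations
print(escape_probability(3.0))  # on the order of a few in a million
```

So under normality roughly 7 observations in 1,000 would be flagged with k = 1.5 even when nothing is wrong, which is worth keeping in mind for large data sets.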
REFERENCE:
Wikipedia
#GlobalAIandDataScience
#GlobalDataScience
Copyright © 2019 IBM Data Science Community. All rights reserved.