Global AI and Data Science
Tukey Fences for Outliers
By Moloy De, posted Thu March 25, 2021 10:32 PM
An outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses.
Outliers can occur by chance in any distribution, but they often indicate either measurement error or that the population has a heavy-tailed distribution. In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. A frequent cause of outliers is a mixture of two distributions, which may be two distinct sub-populations, or may indicate 'correct trial' versus 'measurement error'; this is modeled by a mixture model.
Deletion of outlier data is a controversial practice frowned upon by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known. An outlier resulting from an instrument reading error may be excluded but it is desirable that the reading is at least verified.
There is no rigid mathematical definition of what constitutes an outlier; determining whether or not an observation is an outlier is ultimately a subjective exercise. There are various methods of outlier detection. Some are graphical, such as normal probability plots. Others are model-based. Box plots are a hybrid.
Commonly used model-based methods assume that the data come from a normal distribution and flag observations that are deemed "unlikely" given the mean and standard deviation:
1. Chauvenet's criterion
2. Grubbs's test for outliers
3. Dixon's Q test
4. ASTM E178 Standard Practice for Dealing With Outlying Observations
5. Mahalanobis distance and leverage are often used to detect outliers, especially in the development of linear regression models.
6. Subspace and correlation based techniques for high-dimensional numerical data.
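To make the "unlikely given the mean and standard deviation" idea concrete, here is a minimal sketch of Chauvenet's criterion, the first method in the list. It assumes the usual textbook form of the rule: an observation is flagged when the expected number of equally extreme values in a normal sample of the same size is below 0.5. The function name, the use of the sample standard deviation, and the example data are illustrative assumptions, not part of the original post.

```python
from statistics import NormalDist, mean, stdev

def chauvenet_outliers(values, threshold=0.5):
    """Flag values under Chauvenet's criterion (assumes normal data).

    A value is flagged when n * P(|Z| >= z) < threshold, where z is its
    distance from the sample mean in sample standard deviations.
    """
    n = len(values)
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption)
    std_normal = NormalDist()
    flagged = []
    for v in values:
        z = abs(v - mu) / sigma
        # Two-sided tail probability of a deviation at least this large.
        tail = 2.0 * (1.0 - std_normal.cdf(z))
        if n * tail < threshold:
            flagged.append(v)
    return flagged

sample = [2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up data for illustration
print(chauvenet_outliers(sample))
```

Note that the extreme value itself inflates the mean and standard deviation used to judge it, which is one reason quartile-based rules are often preferred.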
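Tukey's fences are a nonparametric outlier detection method. With Q1 and Q3 the first and third quartiles and IQR = Q3 - Q1 the interquartile range, a “fence” is placed a distance of 1.5 IQR below Q1 and above Q3, and any data beyond these fences are considered to be outliers. More generally, an observation is flagged if it falls outside the range
[Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)]
for some nonnegative constant k. John Tukey proposed this test, where k = 1.5 indicates an "outlier" and k = 3 indicates data that is "far out". Because the fences are built from quartiles rather than the mean and standard deviation, and assume no particular distribution, they are comparatively robust to the very outliers they are meant to detect.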
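The fence rule can be sketched in a few lines of Python using only the standard library. The function names and the sample data are hypothetical, and the quartiles are computed with linear interpolation (`method="inclusive"`), which is one of several common conventions; other quartile definitions can shift the fences slightly.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outside_fences(values, k=1.5):
    """Return the values that fall strictly outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

sample = [2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up data for illustration
print(tukey_fences(sample))            # fences for k = 1.5
print(outside_fences(sample))          # "outliers"
print(outside_fences(sample, k=3.0))   # "far out" values
```

For this sample Q1 = 4 and Q3 = 8, so the k = 1.5 fences are -2 and 14, and only the value 50 lies outside them.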
QUESTION I: Could we apply Tukey fences to ranked observations?
QUESTION II: Could we find the probability of an observation falling outside the Tukey fences?
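As a partial illustration of Question II, the probability can be worked out under one specific assumption: the data are normally distributed and the fences are built from the population quartiles rather than estimated from a sample. The sketch below uses Python's `statistics.NormalDist`; the function name is hypothetical, and the result does not carry over to other distributions or to small samples, where the quartiles are themselves random.

```python
from statistics import NormalDist

def normal_escape_probability(k=1.5):
    """P(observation lies outside the Tukey fences) for a normal population.

    Assumes exact population quartiles. The answer does not depend on the
    mean or standard deviation, so the standard normal is used.
    """
    z = NormalDist()             # standard normal, mean 0 and sd 1
    q3 = z.inv_cdf(0.75)         # third quartile; Q1 = -q3 by symmetry
    iqr = 2.0 * q3
    upper = q3 + k * iqr         # upper fence; lower fence is -upper
    return 2.0 * (1.0 - z.cdf(upper))

print(f"k = 1.5: {normal_escape_probability(1.5):.4%}")
print(f"k = 3.0: {normal_escape_probability(3.0):.6%}")
```

Under these assumptions the k = 1.5 fences sit at about 2.7 standard deviations from the mean, so roughly 0.7% of observations fall outside them.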
REFERENCE:
Wikipedia
#GlobalAIandDataScience
#GlobalDataScience