
Outliers

Introduction: Outliers (Statistical Deviations)

Outliers are data points that significantly differ from the rest of a dataset. They may be unusually high or low values that lie far outside the expected range of variation. In statistics and Lean Six Sigma, identifying outliers is essential for understanding process stability, detecting measurement errors, and improving data quality.

Background

The concept of outliers dates back to early statistical analysis, with Sir Francis Galton and Karl Pearson developing methods to detect abnormal values. Outliers can arise from random variation, special causes, or data recording errors.
In Lean Six Sigma, detecting outliers helps distinguish between common cause variation (normal process fluctuation) and special cause variation (specific, correctable issues). Recognising this distinction supports effective problem-solving in the Analyse phase of DMAIC.

Key Elements / Features

  • Definition: Data points that fall outside the typical pattern of a dataset.
  • Causes: Measurement errors, process changes, environmental effects, or rare events.
  • Detection Methods:
    • Statistical tests: Using z-scores or interquartile ranges (IQR).
    • Visual tools: Boxplots, scatter plots, or control charts.
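The IQR-based check listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `iqr_outliers`, the sample weights, and the fence multiplier `k=1.5` (the conventional boxplot whisker rule) are assumptions and do not come from this article.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional boxplot fence; it is an assumption here.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical part weights in grams, with one suspicious reading.
weights = [49.4, 50.2, 49.8, 50.5, 50.0, 49.6, 50.3, 49.9, 50.1, 57.4]
print(iqr_outliers(weights))
```

Because the fences are built from quartiles rather than from the mean and standard deviation, the extreme reading has little influence on the limits used to judge it.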

Impact: Outliers can distort averages, standard deviations, and regression models if not properly handled.
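To make the distortion concrete, the sketch below compares summary statistics for the same data with and without one extreme reading. The data values and the helper name `summarize` are hypothetical and serve only as illustration.

```python
import statistics

def summarize(data):
    """Return mean, median and sample standard deviation of data."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
    }

# Hypothetical part weights in grams.
clean = [49.4, 50.2, 49.8, 50.5, 50.0, 49.6, 50.3, 49.9, 50.1]
with_outlier = clean + [57.4]  # same data plus one extreme reading

for label, data in (("without outlier", clean), ("with outlier", with_outlier)):
    stats = summarize(data)
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

In this sample the single extreme value pulls the mean upward and inflates the standard deviation several times over, while the median barely moves. This is one reason robust statistics such as the median are often reported alongside the mean when outliers are suspected.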

Rule of Thumb (Normal Distribution):

|z| = \left|\frac{x - \bar{x}}{s}\right| > 3

Where:

  • x = observed value
  • x̄ = mean of the dataset
  • s = sample standard deviation of the dataset

A value with a z-score greater than 3 (or less than -3) is often considered an outlier.
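The rule of thumb above can be sketched as follows. The function name `z_score_outliers` and the sample readings are hypothetical; the mean and standard deviation are estimated from the same data being checked, which is an assumption of this sketch rather than something the article specifies.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose |z| = |x - mean| / s exceeds the threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical readings: 19 typical values and one extreme value.
readings = [50.1, 49.8, 50.3, 49.9, 50.0, 50.2, 49.7, 50.1, 49.9, 50.0,
            50.2, 49.8, 50.1, 49.9, 50.0, 50.3, 49.7, 50.0, 50.1, 58.0]
print(z_score_outliers(readings))
```

One caveat when the mean and s come from the sample itself: the largest |z| any single point can reach is (n - 1) / √n, so with 10 or fewer observations no value can exceed 3, however extreme it is. For small samples, the IQR approach or limits based on historical process data may be more useful.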

Applications / Examples

  • Quality Control: Detecting unusually high defect rates or extreme measurement readings.
  • Finance: Identifying abnormal price movements or fraud indicators.
  • Healthcare: Recognising patients with outlier recovery times or lab results.
  • Manufacturing: Flagging production cycles with exceptional scrap or rework rates.

Example:
If the average part weight is 50 g with a standard deviation of 2 g, the |z| > 3 rule gives limits of 50 ± 3 × 2 g. Any measurement above 56 g or below 44 g may therefore be treated as an outlier and investigated for root cause.
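The same arithmetic can be written as a short check against a known process mean and standard deviation. The function name `three_sigma_limits` and the list of measurements are hypothetical; only the 50 g mean and 2 g standard deviation come from the example above.

```python
def three_sigma_limits(mean, sd, k=3.0):
    """Return (lower, upper) limits at mean - k*sd and mean + k*sd."""
    return mean - k * sd, mean + k * sd

lower, upper = three_sigma_limits(50.0, 2.0)
print(lower, upper)  # 44.0 56.0

# Hypothetical new measurements in grams.
measurements = [51.2, 43.5, 56.0, 56.3]
flagged = [m for m in measurements if m < lower or m > upper]
print(flagged)  # [43.5, 56.3]
```

A reading of exactly 56.0 g is not flagged here because the rule uses a strict inequality. Whether boundary values count as outliers is a convention to agree on in advance.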

Relevance / Impact

Identifying and managing outliers is essential for maintaining data accuracy and process control. In Lean Six Sigma, understanding whether an outlier reflects random noise or a true process issue ensures that improvement efforts are focused and effective.
Removing invalid outliers improves the reliability of statistical analyses, while investigating valid ones often leads to breakthrough insights.


Anend Harkhoe
Lean Consultant & Trainer | MBA in Lean & Six Sigma | Founder of Dmaic.com & Lean.nl
With extensive experience in healthcare (hospitals, elderly care, mental health, GP practices), banking and insurance, manufacturing, the food industry, consulting, IT services, and government, Anend is eager to guide you into the world of Lean and Six Sigma. He believes in the power of people, action, and experimentation. At Dmaic.com and Lean.nl, everything revolves around practical knowledge and hands-on training. Lean is not just a theory—it’s a way of life that you need to experience. From Tokyo’s karaoke bars to Toyota’s lessons—Anend makes Lean tangible and applicable. Lean.nl organises inspiring training sessions and study trips to Lean companies in Japan, such as Toyota. Contact: info@dmaic.com
