HAHS.

Box Plot

chart

Also known as: box and whisker plot, box-and-whisker diagram, five-number summary

Show distribution, Compare, Show deviation | Numerical, Categorical | Bar/Column

Description

The box plot (or box-and-whisker plot) is one of the most compact ways to summarize a numerical distribution. In its simplest form it encodes the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The “box” spans the interquartile range (IQR, from Q1 to Q3) and contains the middle 50% of the data, while a line inside the box marks the median. In the common Tukey convention described here, each “whisker” extends from one end of the box to the most extreme data point that lies within 1.5 times the IQR of that end, and individual points beyond the whiskers are plotted as outliers. When there are no outliers, the whisker ends coincide with the minimum and maximum.
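As a rough illustration of the quantities described above, the sketch below computes the quartiles, IQR, whisker ends, and outliers for a small array. The function names (`quantile`, `boxStats`) and the sample values are hypothetical, and the quartiles use linear interpolation, which is only one of several conventions; other tools may report slightly different Q1 and Q3 values for the same data.

```javascript
// Linear-interpolation quantile on an already sorted, non-empty array.
// This is one common convention; libraries differ in how they define quartiles.
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Box plot statistics with whiskers limited to 1.5 * IQR beyond the box.
function boxStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const median = quantile(sorted, 0.5);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lowerFence = q1 - 1.5 * iqr;
  const upperFence = q3 + 1.5 * iqr;
  // Whiskers end at the most extreme data points that are still inside the fences.
  const inside = sorted.filter((v) => v >= lowerFence && v <= upperFence);
  return {
    q1,
    median,
    q3,
    iqr,
    whiskerLow: inside[0],
    whiskerHigh: inside[inside.length - 1],
    outliers: sorted.filter((v) => v < lowerFence || v > upperFence),
  };
}

// Made-up sample: eight evenly spaced values plus one extreme value.
const stats = boxStats([1, 2, 3, 4, 5, 6, 7, 8, 30]);
```

In this sample the value 30 falls beyond Q3 + 1.5 * IQR, so it is reported as an outlier and the upper whisker stops at 8 rather than at the maximum.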

Box plots are especially powerful when used side by side to compare distributions across categories. Because each box occupies minimal horizontal space, dozens of groups can be displayed simultaneously — something that would be overwhelming with histograms or density plots. The standardized anatomy also makes it easy for trained readers to quickly assess center, spread, skewness (from median position within the box), and the presence of outliers.

However, box plots hide the shape of the distribution within the box. A bimodal distribution and a uniform distribution can produce identical box plots if their quartiles match. For this reason, box plots are often paired with jittered points, strip plots, or violin plots to reveal the underlying data density.
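To make this limitation concrete, here is a small sketch with two made-up datasets: one evenly spread and one clustered around two values. The helper `fiveNumberSummary` is hypothetical and uses linear-interpolation quartiles (an assumption; other quartile conventions could give different results). Under that convention both datasets yield the same five numbers, so their box plots would look the same.

```javascript
// Five-number summary using linear-interpolation quartiles (one common convention).
function fiveNumberSummary(values) {
  const s = [...values].sort((a, b) => a - b);
  const q = (p) => {
    const pos = (s.length - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return s[lo] + (s[hi] - s[lo]) * (pos - lo);
  };
  return [s[0], q(0.25), q(0.5), q(0.75), s[s.length - 1]];
}

// Evenly spread values versus values clustered around 2 and 6.
const evenlySpread = [0, 1, 2, 3, 4, 5, 6, 7, 8];
const twoClusters = [0, 2, 2, 2, 4, 6, 6, 6, 8];

const summaryA = fiveNumberSummary(evenlySpread);
const summaryB = fiveNumberSummary(twoClusters);

// True when every element of the two summaries matches.
const sameBox = summaryA.every((v, i) => v === summaryB[i]);
```

Overlaying the raw points on each box would immediately show the two clusters that the box alone conceals.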

Box Plot — interactive example

When to Use

  • Comparing distributions of a continuous variable across multiple categories
  • Identifying outliers in a dataset quickly
  • Summarizing large datasets where individual points would create visual clutter
  • Assessing symmetry and skewness of distributions
  • Showing the spread and central tendency in a compact space

When NOT to Use

  • When the audience is unfamiliar with statistical quartiles — consider a histogram or violin plot with annotations
  • When the shape of the distribution matters (bimodal, multi-modal) — use a violin plot or density plot instead
  • When you have very few data points (fewer than 10) — show the raw points directly as a strip plot or dot plot
  • When comparing single values such as means or totals rather than whole distributions: use a bar chart, optionally with error bars

Anatomy

  • Box: A rectangle spanning from Q1 to Q3 (the interquartile range), representing the middle 50% of values
  • Median line: A line (or notch) within the box at the median (Q2)
  • Whiskers: Lines extending from each end of the box to the farthest data point within 1.5 * IQR of that end (no lower than Q1 - 1.5 * IQR, no higher than Q3 + 1.5 * IQR)
  • Outliers: Individual points plotted beyond the whiskers
  • Category axis: The axis along which different groups are arranged (horizontal or vertical)
  • Value axis: The continuous scale encoding the measured variable

Variations

  • Notched box plot: Notches around the median approximate a confidence interval for the median; if the notches of two boxes don’t overlap, that is rough evidence (at about the 95% level) that the medians differ, though it is not a formal test
  • Variable-width box plot: Box width is proportional to sample size, so readers can assess group sizes
  • Box plot with jittered points: Raw data points are overlaid (with random horizontal jitter) to show density within the box
  • Grouped / nested box plot: Multiple box plots per category for comparing sub-groups (e.g., gender within age groups)
  • Horizontal box plot: Rotated 90 degrees, sometimes easier to read when category labels are long
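For the notched variation, a commonly used approximation places the notch at median ± 1.57 × IQR / √n, where n is the group's sample size. The sketch below assumes that formula; the constant differs slightly between tools (some use 1.58), the function names are hypothetical, and the quartiles again use linear interpolation. Treat the overlap check as a visual heuristic, not a formal test.

```javascript
// Approximate notch interval around the median: median ± k * IQR / sqrt(n).
// k = 1.57 is a commonly used constant; some tools use a slightly different value.
function notchInterval(values, k = 1.57) {
  const s = [...values].sort((a, b) => a - b);
  const q = (p) => {
    const pos = (s.length - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return s[lo] + (s[hi] - s[lo]) * (pos - lo);
  };
  const median = q(0.5);
  const iqr = q(0.75) - q(0.25);
  const half = (k * iqr) / Math.sqrt(s.length);
  return { median, lower: median - half, upper: median + half };
}

// Two closed intervals overlap when each one starts before the other ends.
function notchesOverlap(a, b) {
  return a.lower <= b.upper && b.lower <= a.upper;
}

// Made-up groups with clearly separated medians (5 and 15).
const groupA = notchInterval([1, 2, 3, 4, 5, 6, 7, 8, 9]);
const groupB = notchInterval([11, 12, 13, 14, 15, 16, 17, 18, 19]);
const overlap = notchesOverlap(groupA, groupB);
```

Because the half-width shrinks with √n, larger groups get narrower notches, which is why small groups often show wide notches that overlap even when their medians look different.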

Code Reference

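// Observable Plot box plot
import * as Plot from "@observablehq/plot";

// `data` is assumed to be an array of objects with `category` and `value` fields,
// e.g. [{category: "A", value: 12}, {category: "A", value: 15}, {category: "B", value: 9}]
Plot.plot({
  y: {grid: true, label: "Value"},
  marks: [
    // One vertical box per category, colored by category
    Plot.boxY(data, {x: "category", y: "value", fill: "category"}),
    // Baseline at y = 0; this also pulls zero into the y domain,
    // so consider removing it when the values sit far from zero
    Plot.ruleY([0])
  ]
})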