Description
A histogram divides a continuous numerical variable into contiguous, non-overlapping intervals (bins) and draws a bar for each bin whose height represents the count or density of observations falling within that range. Unlike a bar chart, the bars are adjacent with no gaps, visually communicating that the underlying variable is continuous rather than categorical.
The histogram is the most common tool for understanding the shape of a univariate distribution. It reveals modality (unimodal, bimodal, multimodal), skewness (left-skewed, right-skewed, or symmetric), spread, and the presence of outliers or gaps. Summary statistics such as the mean and standard deviation cannot show these features, since samples with very different shapes can share the same mean and standard deviation. This makes histograms essential for exploratory data analysis.
Bin width is the critical design choice. Bins that are too wide (too few bins) over-smooth the data and hide structure; bins that are too narrow (too many bins) produce noisy, hard-to-read charts. Common heuristics include Sturges’ rule, the Freedman-Diaconis rule, and Scott’s rule, but visual inspection and domain knowledge should guide the final decision.
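The three heuristics named above can be sketched in plain JavaScript. This is a minimal illustration using the textbook formulas, not the implementation of any particular library; the function names and the sample `ages` array are made up for the example, and the quantile helper uses linear interpolation, which is only one of several conventions.

```javascript
// Sketch of three bin-size heuristics. All names here are illustrative.

// Quantile by linear interpolation on an ascending-sorted array
// (one common convention; other tools may interpolate differently).
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Sample standard deviation (divides by n - 1).
function sampleStdDev(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const squares = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(squares / (n - 1));
}

// Sturges: number of bins k = ceil(log2(n)) + 1
function sturgesBinCount(values) {
  return Math.ceil(Math.log2(values.length)) + 1;
}

// Scott: bin width h = 3.49 * s * n^(-1/3)
function scottBinWidth(values) {
  return 3.49 * sampleStdDev(values) * Math.pow(values.length, -1 / 3);
}

// Freedman-Diaconis: bin width h = 2 * IQR * n^(-1/3)
function freedmanDiaconisBinWidth(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  return 2 * iqr * Math.pow(values.length, -1 / 3);
}

// Example with made-up ages.
const ages = [23, 25, 31, 34, 35, 38, 41, 44, 52, 67];
console.log("Sturges bin count:", sturgesBinCount(ages));
console.log("Scott bin width:", scottBinWidth(ages));
console.log("Freedman-Diaconis bin width:", freedmanDiaconisBinWidth(ages));
```

Sturges gives a bin count while the other two give a bin width, so they are not directly interchangeable: a width converts to a count by dividing the data range by it. The Freedman-Diaconis width is zero when the interquartile range is zero, so real code would need a fallback for that case.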
Prompt Examples
Try these prompts with Claude, ChatGPT, or other AI tools:
“Create a histogram of the customer age distribution using 5-year bins.”
“Create a histogram of test scores with 10-point bins. Add a normal distribution overlay.”
When to Use
- Understanding the shape of a continuous variable’s distribution (salary, temperature, test scores)
- Checking for skewness, outliers, or multi-modality before statistical modeling
- Comparing the distribution of a variable across groups using overlapping or faceted histograms
- Quality control: verifying that measurements fall within expected ranges
When NOT to Use
- Comparing values across categories: use a bar chart (histograms are for continuous data)
- When the sample size is very small (fewer than 20 observations): a dot plot of the individual data points is more informative
- Comparing distributions across many groups: a box plot or violin plot is more compact
- When temporal ordering matters: use a line graph
Anatomy
- Bins (bars): Contiguous rectangular bars. Width spans the bin interval; height encodes frequency or density.
- X-axis: The continuous variable, divided into bin intervals.
- Y-axis: Count (frequency), relative frequency, or density.
- Bin edges: The boundaries between adjacent bins. They define the intervals.
- No gaps: Bars touch each other, signaling a continuous variable. This distinguishes histograms from bar charts.
- Rug plot: Optional tick marks along the x-axis showing individual data points, adding detail that binning hides.
- Density curve: An optional smoothed overlay (kernel density estimate) summarizing the distributional shape.
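To make the roles of bins and bin edges concrete, here is a small sketch of equal-width binning in plain JavaScript. The function name `binCounts` and its parameters are hypothetical. The edge convention (each bin includes its left edge and excludes its right edge, except the last bin, which includes both) is an assumption, and plotting libraries may choose differently.

```javascript
// Sketch: count observations into equal-width bins.
// Assumed convention: bins are [x0, x1), except the last bin, which is [x0, x1].
function binCounts(values, start, width, binCount) {
  const end = start + width * binCount;
  const counts = new Array(binCount).fill(0);

  for (const v of values) {
    let index = Math.floor((v - start) / width);
    if (v === end) index = binCount - 1; // right-most edge goes in the last bin
    if (index >= 0 && index < binCount) counts[index] += 1; // skip out-of-range values
  }

  // Return one record per bin: lower edge, upper edge, and frequency.
  return counts.map((count, i) => ({
    x0: start + i * width,
    x1: start + (i + 1) * width,
    count
  }));
}

// Example with made-up ages and 5-year bins starting at 20.
const exampleAges = [21, 25, 29, 30, 34, 40];
console.log(binCounts(exampleAges, 20, 5, 4));
```

Each returned record corresponds to one bar: `x0` and `x1` are its bin edges and `count` is its height on a frequency axis. Values outside the covered range are dropped silently in this sketch; a real chart would usually extend the bins or report them.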
Variations
- Density histogram: Y-axis shows probability density rather than counts, so the total area sums to 1. Enables comparison across datasets of different sizes.
- Cumulative histogram: Bars show running totals, useful for percentile analysis.
- Stacked histogram: Bars are subdivided by a categorical variable to compare group distributions.
- Overlapping histograms: Semi-transparent histograms from multiple groups drawn on the same axes.
- 2D histogram (heatmap or hexbin): Extends the concept to two dimensions, binning the points of a scatterplot into rectangular or hexagonal cells colored by count.
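The density and cumulative variations are simple transformations of the bin counts. The sketch below assumes bins are given as objects with `x0`, `x1`, and `count` fields; that shape and the function names are illustrative choices, not a library format.

```javascript
// Sketch: derive density and cumulative values from binned counts.
// Assumed input shape: [{ x0, x1, count }, ...] in ascending order.

// Density: count / (total * bin width), so that the bar areas sum to 1.
function toDensity(bins) {
  const total = bins.reduce((sum, b) => sum + b.count, 0);
  return bins.map(b => ({
    ...b,
    density: b.count / (total * (b.x1 - b.x0))
  }));
}

// Cumulative: running total of counts up to and including each bin.
function toCumulative(bins) {
  let running = 0;
  return bins.map(b => {
    running += b.count;
    return { ...b, cumulative: running };
  });
}

// Example with made-up bins.
const exampleBins = [
  { x0: 20, x1: 25, count: 1 },
  { x0: 25, x1: 30, count: 2 },
  { x0: 30, x1: 35, count: 2 },
  { x0: 35, x1: 40, count: 1 }
];
console.log(toDensity(exampleBins));
console.log(toCumulative(exampleBins));
```

Because density divides by each bin's own width, the same function works for unequal bin widths. Dividing the cumulative values by the total count gives the cumulative proportion, which is the form used for reading off percentiles.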
Code Reference
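// Observable Plot: histogram with automatic binning
// Assumes `data` is an array of objects with a numeric `age` field.
// The import is needed outside Observable notebooks, where Plot is built in.
import * as Plot from "@observablehq/plot";

Plot.plot({
  marks: [
    // Bin the x values and draw one rectangle per bin, with height = count
    Plot.rectY(data, Plot.binX({ y: "count" }, {
      x: "age",
      fill: "steelblue",
      tip: true // show bin details on hover
    })),
    Plot.ruleY([0]) // baseline at y = 0
  ],
  x: { label: "Age" },
  y: { label: "Count" }
})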