Description
A beeswarm plot displays every individual data point along a single quantitative axis, using a force-directed simulation or a deterministic packing algorithm to push overlapping points sideways. The result resembles a swarm of bees clustered along a line: dense regions bulge outward while sparse regions remain narrow. This creates a shape similar to a violin plot but composed of discrete, countable marks rather than a smooth density estimate.
The beeswarm’s greatest advantage over box plots and violin plots is that it preserves individual data points. Every observation is visible, making the plot ideal for small-to-medium datasets (roughly 20-500 points per group) where individual values carry meaning — for example, experiment results, student scores, or salary distributions where each dot represents a real entity.
Because the lateral displacement encodes no variable, readers can read values off the quantitative axis and perceive the distribution from the emergent shape. Color or shape can encode an additional categorical variable, enabling within-group comparisons. The main limitation is scalability: beyond a few hundred points per group, the swarm becomes too wide or too dense to resolve, at which point a violin plot or histogram is preferable.
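The sideways push described above can be sketched as a greedy placement: process points in value order and give each one the offset nearest the center line that does not collide with dots already placed. This is a minimal illustration in plain JavaScript, not the algorithm used by any particular library. The function name beeswarmLayout and its parameters are hypothetical, and the values are assumed to be already scaled to the same pixel units as the dot radius.

```javascript
// Greedy beeswarm layout: an illustrative sketch, not a library implementation.
// values:  positions along the value axis, already scaled to pixels (assumption)
// radius:  dot radius in pixels; padding: extra gap between neighboring dots
function beeswarmLayout(values, radius = 3, padding = 1) {
  const minDist = 2 * radius + padding; // minimum center-to-center distance
  const placed = []; // dots positioned so far: {index, value, offset}
  const order = values
    .map((value, index) => ({ value, index }))
    .sort((a, b) => a.value - b.value);

  for (const { value, index } of order) {
    // Only placed dots that are close along the value axis can collide.
    const neighbors = placed.filter((p) => Math.abs(p.value - value) < minDist);

    // Candidates: the center line, plus offsets that just touch each neighbor.
    const candidates = [0];
    for (const n of neighbors) {
      const dv = n.value - value;
      const dx = Math.sqrt(minDist * minDist - dv * dv);
      candidates.push(n.offset + dx, n.offset - dx);
    }
    candidates.sort((a, b) => Math.abs(a) - Math.abs(b));

    // Pick the candidate nearest the center that overlaps no neighbor.
    // (The outermost candidate is always free, so a match always exists.)
    const offset = candidates.find((x) =>
      neighbors.every(
        (n) => Math.hypot(n.offset - x, n.value - value) >= minDist - 1e-9
      )
    );
    placed.push({ index, value, offset });
  }

  // Return lateral offsets in the original input order.
  const offsets = new Array(values.length);
  for (const p of placed) offsets[p.index] = p.offset;
  return offsets;
}
```

With the defaults, beeswarmLayout([10, 10, 10, 50]) returns [0, 7, -7, 0]: the three tied values fan out around the center line, while the isolated value stays on it. The filter over all placed dots makes this sketch quadratic in the number of points, which is acceptable at the dataset sizes a beeswarm suits.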
When to Use
- Showing every data point in a distribution for small-to-medium datasets (20-500 points per group)
- Comparing distributions across categories while preserving individual identity
- Revealing clusters, gaps, and outliers that summary statistics would hide
- Communicating sample size visually — the number of dots is the number of observations
When NOT to Use
- When datasets are very large (>500 points per group) — the swarm becomes unresolvable; use a violin plot or histogram
- When the distribution shape is more important than individual values — use a violin plot for smoother density estimation
- When precise positional comparisons matter on both axes — lateral displacement is arbitrary; use a scatterplot if both axes encode real variables
- When you have only summary statistics (no raw data) — use a box plot or bar chart with error bars
Anatomy
- Points (dots): Each data point is drawn as a small circle, positioned along the quantitative axis
- Lateral displacement: Points are nudged sideways to prevent overlap; the displacement direction carries no data meaning
- Category axis: Groups are spaced along one axis, with the swarm centered on each category position
- Value axis: The quantitative measurement axis (usually vertical)
- Color encoding: Optional fill color to distinguish sub-groups within each category
- Reference lines: Optional median or mean lines overlaid on each swarm
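For the optional reference lines, the per-category medians have to be computed before they can be drawn. A minimal sketch in plain JavaScript follows. The helper names median and medianByCategory are hypothetical, and the default field names "category" and "value" are assumptions taken from the Code Reference snippet.

```javascript
// Median of a numeric array (NaN for an empty array).
function median(values) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group rows by category and compute one median per group.
// Returns a Map from category to median, in first-seen category order.
function medianByCategory(rows, categoryKey = "category", valueKey = "value") {
  const groups = new Map();
  for (const row of rows) {
    const key = row[categoryKey];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row[valueKey]);
  }
  return new Map([...groups].map(([key, vals]) => [key, median(vals)]));
}
```

Each entry of the returned Map gives the position along the value axis at which to draw a short horizontal line across the corresponding swarm.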
Variations
- Strip plot (jitter plot): Random lateral jitter instead of a collision-avoiding layout; simpler, but points may still overlap
- Sina plot: Scales the jitter width with the local density, so the swarm outline follows a violin shape; combines features of beeswarm and violin
- Beeswarm + box plot: A box plot drawn behind the swarm provides summary statistics alongside individual points
- Categorical beeswarm: Points are colored or shaped by a sub-category, enabling within-group comparisons
- Connected beeswarm: Lines connect the same entity across two groups, showing paired differences
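The difference between strip-plot jitter and sina-style jitter can be sketched in a few lines of plain JavaScript. This is an illustration under stated assumptions rather than any library's method: the function names are hypothetical, density is approximated by counting points in equal-width bins (implementations often use a kernel density estimate instead), and the random source is passed in so that results can be made reproducible.

```javascript
// Strip plot: uniform random offset in [-maxWidth/2, +maxWidth/2) for every point.
function stripOffsets(values, maxWidth, random = Math.random) {
  return values.map(() => (random() - 0.5) * maxWidth);
}

// Sina-style jitter: same random offset, but the allowed width shrinks
// where the data are sparse. Density here is a simple per-bin count.
function sinaOffsets(values, maxWidth, bins = 10, random = Math.random) {
  if (values.length === 0) return [];
  const min = Math.min(...values);
  const max = Math.max(...values);
  const binWidth = (max - min) / bins || 1; // fall back to 1 when all values are equal
  const binOf = (v) => Math.min(bins - 1, Math.floor((v - min) / binWidth));

  const counts = new Array(bins).fill(0);
  for (const v of values) counts[binOf(v)] += 1;
  const peak = Math.max(...counts);

  // Scale each point's jitter by its bin's share of the densest bin.
  return values.map(
    (v) => (random() - 0.5) * maxWidth * (counts[binOf(v)] / peak)
  );
}
```

Points in the densest bin use the full jitter width, while points in sparse bins stay close to the center line, which is what gives the sina plot its violin-like outline.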
Code Reference
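// Observable Plot: beeswarm via the dodge transform, one facet per category
// Assumes `data` is an array of {category, value} objects
import * as Plot from "@observablehq/plot"; // not needed in Observable notebooks

Plot.plot({
  marks: [
    // Box plot listed first so it renders behind the dots
    Plot.boxY(data, {fx: "category", y: "value", stroke: "#999"}),
    // dodgeX shifts dots sideways so they do not overlap
    Plot.dot(data, Plot.dodgeX("middle", {
      fx: "category",
      y: "value",
      fill: "category",
      r: 3,
      fillOpacity: 0.6,
      tip: true
    }))
  ],
  y: {grid: true}
})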