Beeswarm Plot

Description

A beeswarm plot displays every individual data point along a single quantitative axis, using a force-directed simulation or a deterministic packing algorithm to push overlapping points sideways. The result resembles a swarm of bees clustered along a line: dense regions bulge outward while sparse regions stay narrow. The outline looks like a violin plot, but it is built from discrete, countable marks rather than a smooth density estimate.
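
The sideways push does not require a physics simulation. Below is a minimal greedy layout. It is one of several possible algorithms and not necessarily the one any particular library uses. Points are placed in order of value, and each one takes the lateral offset closest to the center line at which its circle touches, but does not overlap, the circles already placed. The sketch assumes every dot has the same radius `r` and that values are already in screen units along the value axis. The name `beeswarmOffsets` is hypothetical.

```javascript
// Greedy beeswarm layout (illustrative sketch, hypothetical function name).
// Returns a lateral offset for each value so that circles of radius r centered
// at (offset, value) do not overlap. Values are assumed to be in screen units.
function beeswarmOffsets(values, r) {
  const diameter = 2 * r;
  const eps = 1e-9; // tolerance so exactly touching circles count as non-overlapping
  // Visit points in ascending value order (stable for ties).
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
  const offsets = new Array(values.length);
  const placed = []; // {x, y} of circles already positioned

  for (const i of order) {
    const y = values[i];
    // Each placed circle closer than one diameter along the value axis
    // forbids an open interval of lateral positions.
    const blocked = [];
    for (const p of placed) {
      const dy = y - p.y;
      if (Math.abs(dy) < diameter) {
        const dx = Math.sqrt(diameter * diameter - dy * dy);
        blocked.push([p.x - dx, p.x + dx]);
      }
    }
    // Candidates: the center line, plus every position where this circle
    // would just touch a neighbor on its left or right side.
    const candidates = [0];
    for (const [lo, hi] of blocked) candidates.push(lo, hi);
    candidates.sort((a, b) => Math.abs(a) - Math.abs(b));

    // Take the free candidate nearest the center. The largest right edge is
    // always free, so find() never returns undefined.
    const x = candidates.find((c) =>
      blocked.every(([lo, hi]) => c <= lo + eps || c >= hi - eps)
    );
    offsets[i] = x;
    placed.push({ x, y });
  }
  return offsets;
}
```

For example, `beeswarmOffsets([5, 5, 5, 5], 1)` returns `[0, -2, 2, -4]`. Each new dot takes the free slot nearest the center, alternating sides. The pairwise check is O(n²), which is acceptable for the 20-500 points per group this plot targets. A force simulation, such as d3-force with its `forceCollide` force, is the iterative alternative.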

The beeswarm's main advantage over box plots and violin plots is that it preserves individual data points. Every observation is visible, which makes the plot well suited to small-to-medium datasets (roughly 20-500 points per group) where individual values carry meaning: for example, experiment results, student scores, or salary distributions where each dot represents a real entity.

Because the lateral displacement is purely a layout device (it encodes no variable), readers can read values from the quantitative axis and perceive the distribution from the overall shape of the swarm. Color or shape can encode an additional categorical variable, enabling within-group comparisons. The main limitation is scalability: with thousands of points, the swarm becomes too wide or too dense to resolve, and a violin plot or histogram is the better choice.
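
A little arithmetic shows where the scalability limit comes from. Dots whose values lie within one diameter of each other cannot sit at the same lateral position, so the most crowded stretch of the value axis roughly determines how wide the swarm must grow. The sketch below bins values into bands one diameter tall and multiplies the largest band count by the diameter. It uses the hypothetical name `estimateSwarmWidth` and is a back-of-envelope approximation only. It ignores the partial interleaving of neighboring bands, so treat the result as an order-of-magnitude figure.

```javascript
// Rough estimate of the total width a beeswarm needs (hypothetical helper).
// Values within one dot diameter of each other compete for lateral space,
// so the most crowded band of height 2 * r approximately sets the width.
// The result is in the same units as the values and r.
function estimateSwarmWidth(values, r) {
  const diameter = 2 * r;
  const counts = new Map(); // band index -> number of values in that band
  for (const v of values) {
    const band = Math.floor(v / diameter);
    counts.set(band, (counts.get(band) ?? 0) + 1);
  }
  const maxCount = Math.max(0, ...counts.values()); // 0 for empty input
  return maxCount * diameter;
}
```

For example, 60 values in the same 6-unit band (dot radius 3) give an estimate of 60 × 6 = 360 units of width. A single crowded band can therefore dominate the chart's width long before the group reaches thousands of points.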

When to Use

Showing every data point in a distribution for small-to-medium datasets (20-500 points per group)
Comparing distributions across categories while preserving individual identity
Revealing clusters, gaps, and outliers that summary statistics would hide
Communicating sample size visually: the number of dots is the number of observations

When NOT to Use

When datasets are very large (more than about 500 points per group): the swarm becomes unresolvable; use a violin plot or histogram
When the distribution shape matters more than individual values: use a violin plot for a smoother density estimate
When precise positional comparisons matter on both axes: the lateral displacement is arbitrary, so use a scatterplot if both axes encode real variables
When you have only summary statistics and no raw data: use a box plot or a bar chart with error bars

Anatomy

Points (dots): Each data point is drawn as a small circle positioned along the quantitative axis
Lateral displacement: Points are nudged sideways to prevent overlap; neither the direction nor the amount of displacement carries data meaning
Category axis: Groups are spaced along one axis, with each swarm centered on its category position
Value axis: The axis for the quantitative measurement (usually vertical)
Color encoding: Optional fill color that distinguishes sub-groups within each category
Reference lines: Optional median or mean lines overlaid on each swarm
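
The optional reference lines are per-group summaries computed from the same raw values as the swarm. Here is a minimal sketch that assumes records shaped like `{category, value}`, the field names used in the code reference. The helper name `groupMedians` is hypothetical.

```javascript
// Median value of each category, for drawing one reference line per swarm.
// Assumes records shaped like {category: string, value: number}.
function groupMedians(records) {
  const groups = new Map(); // category -> array of values
  for (const { category, value } of records) {
    if (!groups.has(category)) groups.set(category, []);
    groups.get(category).push(value);
  }
  const medians = new Map(); // category -> median
  for (const [category, vals] of groups) {
    const sorted = [...vals].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Odd-sized groups use the middle value; even-sized groups average the two middle values.
    medians.set(
      category,
      sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
    );
  }
  return medians;
}
```

Each median can then be drawn as a short horizontal rule across its category's swarm. For a mean line, replace the median computation with the arithmetic mean.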

Variations

Strip plot (jitter plot): Uses random lateral jitter instead of a collision-avoiding layout; simpler, but points can still overlap
Sina plot: Scales the jitter width by the local density, combining features of the beeswarm and the violin plot
Beeswarm + box plot: A box plot drawn behind the swarm shows summary statistics alongside the individual points
Categorical beeswarm: Points are colored or shaped by a sub-category, enabling within-group comparisons
Connected beeswarm: Lines connect the same entity across two groups, showing paired differences
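
A strip plot and a sina plot differ mainly in how the random lateral offset is scaled. The sketch below makes simplifying assumptions. The strip-plot version draws uniform jitter within a fixed half-width. The sina-style version scales that half-width by each point's local density relative to the densest point. Local density here is a crude count of neighbors within a hypothetical `bandwidth` window. It stands in for the kernel density estimate a real sina implementation would use. The function names are hypothetical. The random number generator is passed in so that results can be reproduced.

```javascript
// Strip plot: uniform random jitter in [-halfWidth, halfWidth], ignoring density.
function stripJitter(values, halfWidth, random = Math.random) {
  return values.map(() => (random() * 2 - 1) * halfWidth);
}

// Sina-style jitter (simplified): the allowed half-width shrinks where data is sparse.
// Local density is approximated by counting values within `bandwidth` of each value,
// which is O(n^2) and much cruder than a kernel density estimate.
function sinaJitter(values, halfWidth, bandwidth, random = Math.random) {
  const density = values.map((v) =>
    values.filter((w) => Math.abs(w - v) <= bandwidth).length
  );
  const maxDensity = Math.max(...density);
  return values.map(
    (_, i) => (random() * 2 - 1) * halfWidth * (density[i] / maxDensity)
  );
}
```

Neither function prevents overlap, which is the trade-off both variations accept in exchange for simplicity. Packaged implementations such as `geom_sina` in the R package ggforce use a proper density estimate instead of a neighbor count.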

Code Reference

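// Observable Plot: beeswarm using the dodgeX transform, one facet per category.
// Assumes `data` is an array of objects with `category` and `value` fields.
import * as Plot from "@observablehq/plot";

const chart = Plot.plot({
  y: {grid: true},
  marks: [
    // Box plot first, so it is drawn behind the swarm
    Plot.boxY(data, {fx: "category", y: "value", stroke: "#999"}),
    // dodgeX packs the dots sideways around each facet's center without overlap
    Plot.dot(data, Plot.dodgeX("middle", {
      fx: "category",
      y: "value",
      fill: "category",
      r: 3,
      fillOpacity: 0.6,
      tip: true
    }))
  ]
});
document.body.append(chart);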