Hexbin Plot

Description

A hexbin plot addresses the overplotting that arises when a scatterplot contains thousands or millions of points. Instead of rendering each point, it divides the 2D plane into a regular grid of hexagonal bins and counts the data points that fall within each hexagon. The count (or an aggregate statistic of a third variable, such as its mean or median) is then encoded as the hexagon's fill color, opacity, or, in a 3D variant, height.

Hexagons are preferred over squares for binning for three reasons. First, a hexagon is closer to a circle than a square is, so the points in a bin lie nearer to its center on average. Second, every hexagon shares an edge with each of its six neighbors, and all six neighbor centers are equidistant; a square has four edge neighbors plus four more distant corner neighbors. Third, a hexagonal grid has no strong horizontal or vertical alignment, which reduces the visual banding that rectangular grids produce. For the same bin area, hexagons therefore represent point positions with less error, so a hexagonal grid can capture density patterns with somewhat fewer bins than a square grid of comparable resolution.
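
The geometric argument can be made concrete with a short calculation. The sketch below assumes bins of equal area $A$ and compares the worst-case distance from a bin's center to a point inside it, along with the distances to neighboring bin centers. The symbols $s$ (square side) and $R$ (hexagon circumradius) are introduced here for illustration only.

```latex
\begin{align*}
\text{Square:}\quad  & A = s^2, &
  d_{\max} &= \frac{s}{\sqrt{2}} = \sqrt{\frac{A}{2}} \approx 0.707\,\sqrt{A} \\
\text{Hexagon:}\quad & A = \frac{3\sqrt{3}}{2}\,R^2, &
  d_{\max} &= R = \sqrt{\frac{2A}{3\sqrt{3}}} \approx 0.620\,\sqrt{A} \\[6pt]
\text{Square neighbors:}\quad  & \text{4 at distance } s, & & \text{4 at distance } s\sqrt{2} \\
\text{Hexagon neighbors:}\quad & \text{6 at distance } \sqrt{3}\,R
\end{align*}
```

Under these assumptions the hexagon's worst-case distance is roughly 12% smaller than the square's, and all of its neighbors sit at a single distance.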

The hexbin plot is essentially a 2D histogram: where a standard histogram bins data along one axis, a hexbin plot bins along two axes simultaneously. The resulting display reveals the joint distribution of two variables, making it easy to identify clusters, correlations, and outliers even in massive datasets. Choosing the bin size (hexagon radius) is analogous to choosing a histogram's bin width: bins that are too small produce noisy, sparse hexagons, while bins that are too large smooth away interesting structure.
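
To illustrate the binning step described above, here is a minimal sketch of one way to assign points to hexagonal bins. It treats the hexagon centers as two interleaved rectangular lattices and picks whichever candidate center is closer to the point. The function name `hexbinCounts` and its input format (an array of `[x, y]` pairs plus a circumradius) are assumptions made for this example; libraries such as Observable Plot use their own implementations.

```javascript
// Sketch: count points per hexagonal bin (pointy-topped hexagons).
// `points` is an array of [x, y] pairs; `radius` is the hexagon circumradius.
// Both names are illustrative, not part of any library API.
function hexbinCounts(points, radius) {
  const w = Math.sqrt(3) * radius; // horizontal spacing between centers in a row
  const h = 3 * radius;            // vertical spacing within one sub-lattice
  const bins = new Map();

  for (const [x, y] of points) {
    // Candidate A: nearest center on the lattice (i * w, k * h).
    const ai = Math.round(x / w);
    const ak = Math.round(y / h);
    const ax = ai * w;
    const ay = ak * h;

    // Candidate B: nearest center on the lattice ((i + 0.5) * w, (k + 0.5) * h).
    const bi = Math.round(x / w - 0.5);
    const bk = Math.round(y / h - 0.5);
    const bx = (bi + 0.5) * w;
    const by = (bk + 0.5) * h;

    // The point belongs to whichever candidate center is closer.
    const da = (x - ax) ** 2 + (y - ay) ** 2;
    const db = (x - bx) ** 2 + (y - by) ** 2;
    const useA = da <= db;

    const key = useA ? `A:${ai}:${ak}` : `B:${bi}:${bk}`;
    let bin = bins.get(key);
    if (!bin) {
      bin = { x: useA ? ax : bx, y: useA ? ay : by, count: 0 };
      bins.set(key, bin);
    }
    bin.count += 1;
  }
  return Array.from(bins.values());
}

// Usage: two points near the origin share a bin; the third lands in a neighbor.
const exampleBins = hexbinCounts([[0.1, 0.1], [0.2, -0.1], [0.9, 1.4]], 1);
```

Each returned object holds a bin center and its count. Bins that receive no points never appear in the result, which is why empty regions are left undrawn.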

When to Use

Visualizing the density of very large scatterplots (thousands to millions of points) where overplotting hides individual points
Showing the joint distribution of two continuous variables as a 2D density map
Identifying clusters, ridges, and voids in bivariate data
Replacing a scatterplot when the shape of the distribution matters more than the identity of individual points

When NOT to Use

When individual data points carry identity or meaning that must be visible -- use a scatterplot with transparency or jitter
When you have fewer than ~200 points -- the scatterplot is already readable, and hexbinning would over-aggregate
When the relationship between the variables is more important than density -- use a scatterplot with a trend line
When one or both variables are categorical -- use a heatmap or strip plot instead
When precise x/y values need to be read -- hexagonal aggregation obscures individual positions

Anatomy

Hexagonal bins: Regular hexagons tiling the 2D plane, each covering a small region
Fill color: Encodes the count (or other aggregate) within each hexagon, using a sequential color scale (light = few, dark = many)
Bin size (radius): Controls the resolution of the binning; smaller bins reveal finer structure, larger bins show broader patterns
Color scale legend: A gradient legend mapping colors to count values
Axes: Standard x and y axes showing the two continuous variables
Empty space: Hexagons with zero points are typically not drawn, leaving white space that shows the data boundary
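
The parts listed above can be sketched in a few lines: the outline of one hexagonal bin, and a light-to-dark fill derived from its count. The function names `hexagonCorners` and `countToColor` are hypothetical. The two default colors approximate the ends of a yellow-green-blue ramp, and the linear RGB interpolation is a simplification chosen for this example rather than how any particular library computes its color scale.

```javascript
// Sketch: corner coordinates of a pointy-topped hexagon centered at (cx, cy).
// The function name and signature are illustrative.
function hexagonCorners(cx, cy, radius) {
  const corners = [];
  for (let i = 0; i < 6; i++) {
    // Corners sit at 30, 90, 150, ... degrees around the center.
    const angle = (Math.PI / 3) * i + Math.PI / 6;
    corners.push([cx + radius * Math.cos(angle), cy + radius * Math.sin(angle)]);
  }
  return corners;
}

// Sketch: map a count to a color on a light-to-dark sequential ramp.
// The default endpoint colors are assumptions (a pale yellow and a dark blue).
function countToColor(count, maxCount, light = [255, 255, 217], dark = [8, 29, 88]) {
  // Normalize the count to [0, 1]; an empty dataset maps everything to `light`.
  const t = maxCount > 0 ? Math.min(Math.max(count / maxCount, 0), 1) : 0;
  const channel = (k) => Math.round(light[k] + t * (dark[k] - light[k]));
  return `rgb(${channel(0)}, ${channel(1)}, ${channel(2)})`;
}

// Usage: outline and fill for a bin at the origin holding 10 of a maximum 10 points.
const outline = hexagonCorners(0, 0, 1);
const fill = countToColor(10, 10);
```

In practice a perceptually uniform color scheme from a plotting library is usually a better choice than hand-rolled interpolation.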

Variations

Hexbin with size encoding: Hexagon radius varies with count (in addition to color), creating a bubble-like effect
Hexbin map: Hexagonal binning applied to geographic coordinates, aggregating point data on a map
Hexbin with contours: Density contour lines overlaid on the hexbin grid for smoother boundary perception
Square binning: Uses square bins instead of hexagons -- simpler to implement but with more visual artifacts
Adaptive hexbin: Bin size varies locally based on data density for multi-resolution views
3D hexbin: Hexagons are extruded into prisms, with height encoding count -- visually striking but harder to read
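
For the size-encoded variation, one common design choice is to make the drawn hexagon's area, rather than its radius, proportional to the count, so that large counts are not visually exaggerated. The sketch below assumes that choice. The function name `scaledRadius` is hypothetical, and the square-root scaling is an assumption of this example rather than something the variation requires.

```javascript
// Sketch: radius for a size-encoded hexagon, scaled so that area is
// proportional to count. `binRadius` is the full radius of the bin;
// all names here are illustrative.
function scaledRadius(count, maxCount, binRadius) {
  if (maxCount <= 0) return 0;
  // Area grows with radius squared, so take the square root of the ratio.
  const ratio = Math.min(Math.max(count / maxCount, 0), 1);
  return binRadius * Math.sqrt(ratio);
}

// Usage: a bin holding a quarter of the maximum count is drawn at half the radius.
const quarterRadius = scaledRadius(25, 100, 10);
```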

Code Reference

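// Observable Plot: hexbin via the hexbin transform.
// Assumes `data` is an array of objects with numeric fields "var1" and "var2".
import * as Plot from "@observablehq/plot";

Plot.plot({
  color: {scheme: "YlGnBu", legend: true, label: "Count"},
  marks: [
    // Faint background grid; binWidth matches the hexbin transform below.
    Plot.hexgrid({binWidth: 20}),
    // Draw one hexagon per non-empty bin using the hexagon mark.
    Plot.hexagon(data, Plot.hexbin(
      {fill: "count", r: "count"}, // color and size both encode the bin count
      {x: "var1", y: "var2", binWidth: 20} // binWidth: distance between bin centers, in pixels
    ))
  ]
})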