Cross-filter

Description

Cross-filtering is a multi-view interaction technique in which brushing or filtering in any one chart simultaneously constrains the data shown in all other charts. Its defining characteristic is that each chart shows the distribution of its own dimension, filtered by the selections active in every other chart but not by its own selection. This asymmetric filtering lets each chart serve as both a filter control and a data display, creating a tightly coupled exploratory environment.
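
The asymmetry described above can be illustrated without any library. The following is a minimal sketch, assuming a small in-memory array of records; the field names, the sample data, and the helper `distributionFor` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical records; field names are illustrative only.
const flights = [
  { delay: 5, hour: 8, carrier: "AA" },
  { delay: 45, hour: 18, carrier: "DL" },
  { delay: 60, hour: 19, carrier: "DL" },
  { delay: 10, hour: 9, carrier: "UA" },
  { delay: 35, hour: 18, carrier: "AA" },
];

// Active selections keyed by dimension; each value is a predicate on that field.
const selections = {
  delay: (v) => v >= 30,
  carrier: (v) => v === "DL",
};

// Count records per value of `dim`, applying every selection EXCEPT the one on `dim`.
function distributionFor(dim, records, active) {
  const counts = new Map();
  for (const r of records) {
    const passesOthers = Object.entries(active).every(
      ([otherDim, test]) => otherDim === dim || test(r[otherDim])
    );
    if (passesOthers) counts.set(r[dim], (counts.get(r[dim]) || 0) + 1);
  }
  return counts;
}

// The carrier chart ignores its own "DL" selection, so "AA" still appears.
const byCarrier = distributionFor("carrier", flights, selections);
// The hour chart has no selection of its own, so both filters apply to it.
const byHour = distributionFor("hour", flights, selections);
```

Here `byCarrier` still contains a bar for "AA" because the carrier selection is not applied to the carrier chart, while `byHour` reflects both the delay and carrier selections.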

The technique was popularized by the Crossfilter.js library (developed by Mike Bostock and others at Square, 2012), which demonstrated that multi-dimensional filtering over hundreds of thousands of records could run at interactive speeds in the browser. The library uses sorted indices and incremental filtering to keep each update fast (its documentation cites interaction in under 30 ms), which makes the tight feedback loop that this interaction depends on achievable in JavaScript.
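
To make "sorted indices and incremental filtering" concrete, here is a simplified sketch of the general idea. It is not Crossfilter's actual implementation; `buildSortedIndex`, `lowerBound`, and `RangeFilter` are hypothetical names, and the sketch assumes numeric values and `min <= max`. The point is that when a brush moves, only the records entering or leaving the selected slice of the sorted index need to be touched.

```javascript
// Positions of records ordered by value; built once per dimension.
function buildSortedIndex(values) {
  return values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
}

// First position p in `index` with values[index[p]] >= x (binary search).
function lowerBound(values, index, x) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (values[index[mid]] < x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

class RangeFilter {
  constructor(values) {
    this.values = values;
    this.index = buildSortedIndex(values);
    this.lo = 0; // the selection is the slice index[lo, hi)
    this.hi = this.index.length; // initially everything is selected
  }

  // Select values in [min, max) and report only the records whose
  // membership changed, rather than re-testing every record.
  setRange(min, max) {
    const { index, lo: lo0, hi: hi0 } = this;
    const lo1 = lowerBound(this.values, index, min);
    const hi1 = lowerBound(this.values, index, max);
    const added = [];
    const removed = [];
    if (lo1 < lo0) added.push(...index.slice(lo1, Math.min(lo0, hi1)));
    else if (lo1 > lo0) removed.push(...index.slice(lo0, Math.min(lo1, hi0)));
    if (hi1 > hi0) added.push(...index.slice(Math.max(lo1, hi0), hi1));
    else if (hi1 < hi0) removed.push(...index.slice(Math.max(lo0, hi1), hi0));
    this.lo = lo1;
    this.hi = hi1;
    return { added, removed };
  }
}

const delays = [30, 5, 60, 45, 10, 35]; // hypothetical delay values
const filter = new RangeFilter(delays);
const first = filter.setRange(30, 50); // drops records 1, 4, 2 (values 5, 10, 60)
const second = filter.setRange(40, 70); // drops records 0, 5; re-adds record 2
```

The second call touches three records instead of all six. A caller would use `added` and `removed` to increment or decrement the bin counts of the other charts.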

Cross-filtering excels at exploratory data analysis tasks where the user wants to understand how different data dimensions relate. By brushing a range on one histogram (e.g., flight delay > 30 minutes) and observing how the distributions shift in the other histograms (time of day, carrier, origin airport), the user can rapidly build a mental model of the data's multivariate structure. This makes it one of the most powerful techniques for multi-dimensional data exploration, but it requires careful engineering to maintain performance as the dataset and the number of views grow.

When to Use

When exploring a dataset with 3 to 8 numerical or categorical dimensions that might interact in interesting ways.
When each dimension has a natural chart type (histograms for continuous, bar charts for categorical) and the goal is to understand joint distributions.
In dashboard contexts where users need to answer "show me the distribution of Y for items where X is in this range."
When the dataset is small-to-medium (up to a few hundred thousand rows) and real-time filtering is feasible.
For data auditing, anomaly detection, and quality checking workflows.

When NOT to Use

When there are only 1 or 2 dimensions; a simple dynamic query or brush suffices.
When the dataset is so large that real-time cross-filtering is computationally infeasible without server-side processing.
When the dimensions are largely redundant rather than meaningfully independent; cross-filtering them wastes screen space.
When the audience expects a guided narrative rather than open exploration.
When the number of charts exceeds what the user can cognitively track (more than 8 to 10 simultaneous views becomes overwhelming).

How It Works

Multiple charts are arranged in a dashboard layout, each showing a different dimension of the same dataset (e.g., histograms of price, rating, year; bar charts of category, region).
The user brushes a range in one chart, for example selecting years 2010–2015 on the year histogram.
All other charts immediately update to show the distribution of their own dimension, restricted to the items within the brushed year range.
The brushed chart itself does NOT re-filter its own bars. It continues to show its distribution (still subject to any brushes on other charts) under the brush overlay, so the user can see the context and adjust the selection.
The user can brush additional charts, adding intersection filters. Selecting carrier "Delta" on the carrier bar chart further constrains the data in all views except the carrier chart.
Clearing a brush on any chart removes that dimension's filter, and all other charts update accordingly.
Summary statistics (total count, mean values) update alongside the charts.
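
The steps above can be sketched as a small coordinator that stores one brush per dimension and refreshes the other charts whenever a brush is set or cleared. This is an illustrative sketch only: `CrossFilterCoordinator`, its methods, and the sample records are hypothetical, and the naive full scan on every update stands in for the indexed filtering a real library would use.

```javascript
class CrossFilterCoordinator {
  constructor(records) {
    this.records = records;
    this.brushes = new Map(); // dimension name -> predicate on that field
    this.charts = new Map(); // dimension name -> render callback
  }

  addChart(dim, render) {
    this.charts.set(dim, render);
  }

  // Records passing every active brush except the one on `exceptDim`.
  rowsExcluding(exceptDim) {
    return this.records.filter((r) =>
      [...this.brushes].every(([dim, test]) => dim === exceptDim || test(r[dim]))
    );
  }

  // Summary statistics apply all active brushes (no dimension is excluded).
  summary(field) {
    const rows = this.rowsExcluding(null);
    const total = rows.reduce((sum, r) => sum + r[field], 0);
    return { count: rows.length, mean: rows.length ? total / rows.length : NaN };
  }

  // Set a brush (predicate) or clear it (null), then refresh every other chart.
  setBrush(dim, test) {
    if (test) this.brushes.set(dim, test);
    else this.brushes.delete(dim);
    for (const [chartDim, render] of this.charts) {
      if (chartDim !== dim) render(this.rowsExcluding(chartDim));
    }
  }
}

// Hypothetical data and "charts" that just log how many rows they received.
const items = [
  { year: 2009, carrier: "Delta", delay: 10 },
  { year: 2011, carrier: "Delta", delay: 20 },
  { year: 2012, carrier: "United", delay: 30 },
  { year: 2014, carrier: "Delta", delay: 40 },
  { year: 2016, carrier: "United", delay: 50 },
];
const renderLog = [];
const coordinator = new CrossFilterCoordinator(items);
coordinator.addChart("year", (rows) => renderLog.push(["year", rows.length]));
coordinator.addChart("carrier", (rows) => renderLog.push(["carrier", rows.length]));

coordinator.setBrush("year", (y) => y >= 2010 && y <= 2015); // carrier chart updates
coordinator.setBrush("carrier", (c) => c === "Delta"); // year chart updates
const stats = coordinator.summary("delay"); // both brushes applied
coordinator.setBrush("year", null); // clearing the year brush updates the carrier chart
```

The brushed chart is skipped in `setBrush` because its own data does not change when its own brush moves; only its overlay does.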

Variations

Histogram cross-filter: The classic form. Each dimension gets a histogram with a 1D brush. Popularized by Crossfilter.js.
Chart-type heterogeneous cross-filter: Different chart types per dimension, such as a scatterplot for two continuous variables, a bar chart for a categorical variable, and a timeline for a temporal one.
Server-side cross-filter: For large datasets, the filtering is computed on a server or database, with the client sending brush extents as query parameters. Systems such as Falcon and OmniSci Immerse take this approach.
Approximate cross-filter: Uses data cubes, sampling, or approximate query processing to maintain interactive speeds on very large data.
Cross-filter with aggregation change: As filters narrow, the aggregation level can change (e.g., from monthly to daily bins as the time range shrinks).
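
For the server-side variation, each chart typically needs its own request, because each chart must omit its own brush from the filter. The sketch below shows one way a client might encode brush extents as query parameters. The parameter names (`group_by`, `<dim>_min`, `<dim>_max`) and the helper `buildChartQuery` are assumptions made for illustration and do not correspond to the API of any particular system.

```javascript
// Build the query string one chart would send to a hypothetical aggregation endpoint.
// `brushes` maps a dimension name to its brushed [min, max] extent.
function buildChartQuery(chartDim, brushes) {
  const params = new URLSearchParams();
  params.set("group_by", chartDim);
  for (const [dim, [min, max]] of Object.entries(brushes)) {
    if (dim === chartDim) continue; // a chart is never filtered by its own brush
    params.set(`${dim}_min`, String(min));
    params.set(`${dim}_max`, String(max));
  }
  return params.toString();
}

const activeBrushes = { year: [2010, 2015], price: [20, 80] };
// The rating chart has no brush, so it sends both extents.
const ratingQuery = buildChartQuery("rating", activeBrushes);
// The year chart sends only the price extent.
const yearQuery = buildChartQuery("year", activeBrushes);
```

With N charts, one brush movement triggers up to N - 1 such requests, which is why server-side designs usually pair this with debouncing, caching, or precomputed aggregates.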

Code Reference

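// Conceptual cross-filter with D3 and Crossfilter.js.
// Assumes `data` is an array of records with numeric `year` and `price` fields,
// that `renderPriceChart` is defined elsewhere, and that the library is loaded
// from the community-maintained crossfilter2 package.
import crossfilter from "crossfilter2";

const cf = crossfilter(data);
const dimYear = cf.dimension(d => d.year);
const dimPrice = cf.dimension(d => d.price);
// A group observes every filter except the one on its own dimension.
const grpYear = dimYear.group();
const grpPrice = dimPrice.group(d => Math.floor(d / 10) * 10); // bins of width 10

// When the user brushes the year chart (`extent` is null once the brush is cleared):
function onYearBrush(extent) {
  if (extent) {
    dimYear.filterRange(extent); // half-open interval [y0, y1)
  } else {
    dimYear.filterAll(); // remove the year filter
  }
  // grpPrice.all() now reflects only records in the year range,
  // while grpYear.all() is unaffected by the year filter itself.
  renderPriceChart(grpPrice.all());
}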