Word Cloud

Description

A word cloud arranges words in a compact layout where each word's font size scales with its frequency, importance, or another quantitative metric. Words are typically placed at various angles (horizontal, vertical, or diagonal) and packed tightly to fill the available space. The result is an immediately engaging visual that conveys the dominant themes or terms in a text corpus at a glance.

Word clouds are popular because they are intuitive: larger words signal greater importance, and the collection of words gives a quick thematic summary. They are commonly used to summarize survey responses, social media content, speech transcripts, and keyword analyses. Their visual appeal also makes them a popular choice for presentations, infographics, and editorial contexts where engagement matters as much as precision.

However, word clouds are among the most criticized chart types in the data visualization community. Font size is a poor channel for precise quantitative comparison: readers cannot reliably tell whether one word is 1.3x or 1.5x larger than another. Word placement is essentially arbitrary (determined by a packing algorithm), so spatial position carries no meaning, even though the layout visually suggests that it might. Longer words also appear more prominent simply because they occupy more horizontal space. For any analysis that requires accurate comparison of word frequencies, a bar chart sorted by frequency is far more effective. Word clouds are best understood as a qualitative overview tool, not an analytical one.
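A rough calculation makes the length bias concrete. The sketch below approximates each word by its bounding box and assumes an average glyph width of 0.6 em, a crude rule of thumb rather than a real font metric; `approxArea` and `AVG_CHAR_WIDTH_EM` are hypothetical names used only for illustration.

```javascript
// Rough illustration of the length bias in word clouds.
// ASSUMPTION: average glyph width of about 0.6 em, a crude rule of thumb for
// sans-serif Latin text; real widths vary by font and by letter.
const AVG_CHAR_WIDTH_EM = 0.6;

// Approximate area (px^2) of a word's bounding box at a given font size.
function approxArea(word, fontSize) {
  const width = word.length * AVG_CHAR_WIDTH_EM * fontSize;
  const height = fontSize; // roughly one em tall
  return width * height;
}

// Two hypothetical terms with identical frequency, hence identical font size.
const fontSize = 40;
const shortArea = approxArea("war", fontSize);
const longArea = approxArea("international", fontSize);

// The 13-letter word covers 13 / 3 (about 4.3) times the area of the
// 3-letter word, so it looks far more important while encoding the same value.
console.log(`length-driven area ratio: ${(longArea / shortArea).toFixed(2)}`);

// Area also grows with the square of font size: doubling it quadruples the box.
console.log(`size-driven area ratio: ${(approxArea("war", 80) / approxArea("war", 40)).toFixed(2)}`);
```

A bar chart avoids both effects because the bar's length, not the label's length or area, carries the value.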

When to Use

Providing a quick, visually engaging overview of the most common terms in a text corpus
Summarizing open-ended survey responses or social media discussions at a high level
Creating an entry point for exploration, where users can click a word to drill into the associated content
Adding visual variety to presentations when the goal is a thematic impression rather than precision

When NOT to Use

When accurate frequency comparison is needed -- use a bar chart sorted by value
When word context or co-occurrence matters -- word clouds strip words from their context; use a network diagram of co-occurring terms instead
When the text has not been preprocessed -- stop-word removal, stemming or lemmatization, and case normalization are essential; without them, function words such as "the" and "and" dominate, and variants such as "user" and "users" are counted separately
When the audience needs to draw quantitative conclusions -- the size-to-value mapping is too imprecise for analytical tasks
When there are only a few distinct terms (fewer than 10) -- a bar chart or a simple ranked list is clearer and more informative
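The preprocessing called for above (normalization, stop-word removal, stemming) can be sketched in a few lines. This is a minimal illustration under stated assumptions: English text, ASCII-only tokenization, a short hand-picked stop-word list, and crude plural stripping standing in for a proper stemmer or lemmatizer. `STOP_WORDS`, `crudeStem`, and `termFrequencies` are hypothetical names.

```javascript
// Minimal text-preprocessing sketch for a word cloud.
// ASSUMPTIONS: English input, ASCII-only tokenization, a tiny illustrative
// stop-word list, and crude plural stripping standing in for a real stemmer
// or lemmatizer (e.g. a Porter stemmer).
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "as", "at", "be", "for", "in", "is",
  "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
]);

// Crude plural stripping: "users" -> "user", but "class" and "bus" stay as-is.
function crudeStem(token) {
  if (token.length > 3 && token.endsWith("s") && !token.endsWith("ss")) {
    return token.slice(0, -1);
  }
  return token;
}

// Returns [term, count] pairs sorted by descending count, which is also
// exactly the input a frequency-sorted bar chart needs.
function termFrequencies(text) {
  const counts = new Map();
  const tokens = text
    .toLowerCase() // normalization: "Design" and "design" become one term
    .split(/[^a-z0-9']+/) // split on anything that is not a letter, digit, or apostrophe
    .filter(t => t.length > 0 && !STOP_WORDS.has(t));
  for (const token of tokens) {
    const term = crudeStem(token);
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

For example, `termFrequencies("Users love the new design. Several users praised the design, and one user disliked it.")` puts `["user", 3]` and `["design", 2]` at the top. Without the plural stripping, "users" and "user" would be counted separately; without the stop-word filter, "the" would tie "design" for second place.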

Anatomy

Words: The primary visual elements; each word is rendered as text whose font size encodes its metric
Font size scale: Maps the quantitative variable (frequency, TF-IDF, importance) to font size; a square-root or log scale is often used so that the largest words do not dominate
Color encoding: Typically aesthetic (a random palette or semantic grouping), though it can encode a second variable such as sentiment or category
Layout algorithm: Determines word positions to maximize packing density while avoiding overlap; the most common approach is the Wordle-style spiral placement implemented in Jason Davies' d3-cloud
Rotation: Words may be placed at 0°, 90°, or other angles to improve space utilization
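To see how the choice of font-size scale changes the picture, the sketch below hand-rolls linear, square-root, and log mappings without D3 (`makeFontScale` is a hypothetical helper). Each one transforms the count and then interpolates linearly into the pixel range, which is the same idea behind `d3.scaleLinear`, `d3.scaleSqrt`, and `d3.scaleLog`.

```javascript
// Hand-rolled sketch of three font-size scales. Each maps a count in
// [minCount, maxCount] to a font size in [minPx, maxPx] by transforming the
// count (identity, square root, or log) and interpolating linearly.
function makeFontScale(kind, [minCount, maxCount], [minPx, maxPx]) {
  const transforms = {
    linear: x => x,
    sqrt: Math.sqrt,
    log: Math.log, // only valid for counts > 0
  };
  const f = transforms[kind];
  if (!f) throw new Error(`unknown scale kind: ${kind}`);
  const lo = f(minCount);
  const hi = f(maxCount);
  return count => {
    if (hi === lo) return (minPx + maxPx) / 2; // every count is the same
    const t = (f(count) - lo) / (hi - lo); // 0 at minCount, 1 at maxCount
    return minPx + t * (maxPx - minPx);
  };
}

// Example: counts from 1 to 100 mapped to 12-80 px; where does a count of 25 land?
for (const kind of ["linear", "sqrt", "log"]) {
  const scale = makeFontScale(kind, [1, 100], [12, 80]);
  console.log(kind, scale(25).toFixed(1)); // linear 28.5, sqrt 42.2, log 59.5
}
```

Compressing the range this way lifts mid-frequency words, so the top word no longer dwarfs everything else; the trade-off is that size differences understate the true frequency ratios even more than a linear mapping would.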

Variations

Shaped word cloud: Words are constrained to fill a specific silhouette (a heart, a map outline, a logo), prioritizing aesthetics
Comparative word cloud: Two corpora are compared side by side or through color encoding (e.g., red for words more common in corpus A, blue for words more common in corpus B)
Weighted tag cloud: The original web-era form, a simple list of tags with varying font sizes, often in alphabetical order
Bubble cloud: Words are placed inside circles whose size encodes frequency; circle sizes can be easier to compare than text sizes
Hierarchical word cloud: Words are grouped by topic or category, with spatial proximity reflecting semantic similarity
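For the comparative variant, each word needs a color that says which corpus it leans toward. One way to do this, sketched below, compares each word's relative frequency in the two corpora. The add-one smoothing, the log2 ratio, the red/blue/gray palette, and the names `compareCorpora`, `countsA`, and `countsB` are illustrative assumptions rather than a standard method.

```javascript
// Sketch: assign colors for a comparative word cloud.
// ASSUMPTIONS: countsA and countsB are Map<word, count> produced by some
// earlier preprocessing step; the add-one smoothing, the log2 ratio, and the
// red/blue/gray colors are illustrative choices, not a standard.
function compareCorpora(countsA, countsB) {
  const total = counts => [...counts.values()].reduce((sum, c) => sum + c, 0);
  const vocab = new Set([...countsA.keys(), ...countsB.keys()]);
  const denomA = total(countsA) + vocab.size;
  const denomB = total(countsB) + vocab.size;
  const rows = [];
  for (const word of vocab) {
    // Add-one smoothing: a word absent from one corpus still gets a finite ratio.
    const relA = ((countsA.get(word) ?? 0) + 1) / denomA;
    const relB = ((countsB.get(word) ?? 0) + 1) / denomB;
    const logRatio = Math.log2(relA / relB); // > 0 means relatively more common in A
    rows.push({
      word,
      logRatio,
      color: logRatio > 0 ? "red" : logRatio < 0 ? "blue" : "gray",
      weight: Math.abs(logRatio), // how strongly the word leans; could drive font size
    });
  }
  // Most distinctive words first.
  return rows.sort((a, b) => b.weight - a.weight);
}
```

Comparing relative rather than raw frequencies matters when the corpora differ in size; otherwise the larger corpus would claim nearly every shared word.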

Code Reference

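// D3 word cloud using d3-cloud
import * as d3 from "d3";
import cloud from "d3-cloud";

// Example input (hypothetical values): one record per preprocessed term.
const words = [
  {word: "data", freq: 120},
  {word: "chart", freq: 80},
  {word: "scale", freq: 45},
  {word: "axis", freq: 30},
  {word: "legend", freq: 12}
];
const width = 640;
const height = 400;
const fontFamily = "sans-serif"; // lay out and render with the same font, or words may overlap

// Square-root scale: font size grows more slowly than frequency.
const fontScale = d3.scaleSqrt()
  .domain(d3.extent(words, d => d.freq))
  .range([12, 80]);

cloud()
  .size([width, height])
  .words(words.map(d => ({text: d.word, size: fontScale(d.freq)})))
  .padding(2)
  .rotate(() => (~~(Math.random() * 2)) * 90) // each word is randomly 0° or 90°
  .font(fontFamily)
  .fontSize(d => d.size)
  .on("end", draw) // receives only the words that could be placed
  .start();

// d3-cloud reports x/y relative to the centre of the layout, so the viewBox is centred on (0, 0).
function draw(placed) {
  d3.select("#chart").append("svg")
    .attr("viewBox", [-width / 2, -height / 2, width, height])
    .attr("font-family", fontFamily)
    .selectAll("text").data(placed).join("text")
    .attr("transform", d => `translate(${d.x},${d.y}) rotate(${d.rotate})`)
    .attr("font-size", d => d.size)
    .attr("text-anchor", "middle")
    .attr("fill", (_, i) => d3.schemeTableau10[i % 10])
    .text(d => d.text);
}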
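The `cloud()` call above leaves word placement to d3-cloud. For intuition about what that layout step does, here is a simplified, dependency-free sketch of Wordle-style spiral placement: sort words by size, walk each one outward along an Archimedean spiral from the centre, and keep the first position that stays inside the canvas and clears every word already placed. It is not d3-cloud's implementation: d3-cloud tests overlap with pixel-level sprite masks and supports rotation, whereas this sketch assumes bounding rectangles, a hypothetical 0.6 em-per-character width estimate, no rotation, and arbitrary step sizes. `spiralLayout`, `rectsOverlap`, `pad`, and `maxSteps` are illustrative names.

```javascript
// Simplified spiral placement in the spirit of Wordle and d3-cloud.
// ASSUMPTIONS (not how d3-cloud works internally): each word is an
// axis-aligned rectangle rather than a pixel sprite, width is estimated at
// 0.6 em per character, words are not rotated, and the spiral step is arbitrary.
const AVG_CHAR_WIDTH_EM = 0.6;

// True if rectangles a and b come closer than `pad` pixels to each other.
function rectsOverlap(a, b, pad) {
  return (
    a.x - pad < b.x + b.w && b.x - pad < a.x + a.w &&
    a.y - pad < b.y + b.h && b.y - pad < a.y + a.h
  );
}

// words: [{text, size}] with size in px. Coordinates are relative to the
// centre of a width x height canvas, like d3-cloud's output.
function spiralLayout(words, width, height, { pad = 2, maxSteps = 5000 } = {}) {
  const placed = [];
  // Largest words first, so they claim the centre of the cloud.
  const ordered = [...words].sort((a, b) => b.size - a.size);
  for (const { text, size } of ordered) {
    const w = text.length * AVG_CHAR_WIDTH_EM * size;
    const h = size;
    for (let step = 0; step < maxSteps; step++) {
      // Archimedean spiral: the radius grows linearly with the angle.
      const theta = step * 0.1;
      const r = 0.5 * theta;
      const box = {
        x: r * Math.cos(theta) - w / 2,
        y: r * Math.sin(theta) - h / 2,
        w,
        h,
      };
      const inBounds =
        box.x >= -width / 2 && box.x + w <= width / 2 &&
        box.y >= -height / 2 && box.y + h <= height / 2;
      if (inBounds && placed.every(p => !rectsOverlap(box, p, pad))) {
        placed.push({ text, size, ...box });
        break;
      }
    }
    // A word that finds no free spot within maxSteps is simply left out.
  }
  return placed;
}
```

Sorting by size first matters: the centre is where the spiral starts, so the most important words get the most prominent positions, and small words fill the gaps that remain.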