'2024/10/12 글 목록

'2024/10/12'에 해당되는 글 1건

2024.10.12 [pg] GIN index 특징 및 활용 예제

생계/PostgreSQL2024. 10. 12. 15:11

[pg] GIN index 특징 및 활용 예제

GIN(Generalized Inverted Index) 인덱스란 ?
PostgreSQL에서 복합적인 데이터를 효율적으로 검색하기 위해 사용되는 특수한 인덱스 유형
주요 특징은 아래와 같다.

1. 복합 데이터 처리: 배열, JSON, hstore, tsvector 등 복합적인 데이터 타입에 적합
2. 키워드 기반 검색: 문서나 텍스트에서 특정 단어를 빠르게 찾을 수 있다.
3. 역인덱스 구조: 키워드나 토큰을 기반으로 데이터를 인덱싱
4. 전문 검색(Full-text search) 지원: 텍스트 검색 속도를 크게 향상
5. 다양한 연산자 지원: LIKE, ILIKE, ~, ~* 등 다양한 검색 연산자를 지원

GIN 인덱스는 특히 대량의 텍스트 데이터를 처리할 때 B-tree 인덱스보다 훨씬 효과적이며,
검색 성능을 크게 향상시킬 수 있다.
다만, 인덱스 생성과 업데이트에는 상대적으로 많은 시간이 소요될 수 있습니다

## 복합 데이터 처리

테스트 테이블을 만들고 GIN 인덱스 사용 전후의 플랜을 비교해보자

-- Create a table with a JSONB column
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    details JSONB
);

-- Insert sample data
INSERT INTO products (name, details) VALUES
    ('Laptop', '{"brand": "TechCo", "specs": {"ram": "16GB", "storage": "512GB SSD"}, "tags": ["electronics", "computer"]}'),
    ('Smartphone', '{"brand": "Gadgetron", "specs": {"ram": "8GB", "storage": "256GB"}, "tags": ["electronics", "mobile"]}'),
    ('Headphones', '{"brand": "AudioPro", "specs": {"type": "over-ear", "wireless": true}, "tags": ["audio", "accessories"]}');

-- Query using the GIN index to find products with specific attributes
EXPLAIN ANALYZE
SELECT name, details
FROM products
WHERE details @> '{"brand": "TechCo"}';

-- Query to find products with specific tags
EXPLAIN ANALYZE
SELECT name, details
FROM products
WHERE details->'tags' @> '"electronics"'::jsonb;

-- Create a GIN index on the JSONB column
CREATE INDEX idx_product_details ON products USING GIN (details);

## 전문검색 처리

-- 문서 테이블 생성
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    content_tsv tsvector
);

-- 함수 생성: 랜덤 문장 생성
CREATE FUNCTION random_sentence() RETURNS TEXT AS $$
DECLARE
    words TEXT[] := ARRAY['PostgreSQL', 'database', 'index', 'query', 'performance', 'GIN', 'B-tree', 'full-text', 'search', 'optimization', 'SQL', 'data', 'management', 'system', 'relational', 'ACID', 'transaction', 'concurrency', 'scalability', 'replication'];
    sentence TEXT := '';
    word_count INT;
BEGIN
    word_count := 5 + random() * 10; -- 5에서 15 단어 사이의 문장 생성
    FOR i IN 1..word_count LOOP
        sentence := sentence || ' ' || words[1 + random() * (array_length(words, 1) - 1)];
    END LOOP;
    RETURN trim(sentence);
END;
$$
LANGUAGE plpgsql;

-- 10,000개의 샘플 문서 생성
INSERT INTO documents (content, content_tsv)
SELECT
    random_sentence(),
    to_tsvector('english', random_sentence())
FROM generate_series(1, 10000);

-- GIN 인덱스 생성
CREATE INDEX idx_gin_content ON documents USING gin(content_tsv);

-- 통계 업데이트
ANALYZE documents;

-- 검색 쿼리 실행 및 실행 계획 확인
EXPLAIN ANALYZE
SELECT id, content
FROM documents
WHERE content_tsv @@ to_tsquery('english', 'postgresql & index');

저작자표시 비영리 변경금지

'생계 > PostgreSQL' 카테고리의 다른 글

[pg] postgresql psql 설치 접속 간단한 사용법 알아보자 (0)	2024.10.23
[pg] 자주 사용되는 시스템 뷰 알아보자 postgresql (0)	2024.08.26

Posted by 돌고래트레이너

«이전 1 다음»

일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

따봉 구걸남's 블로그

'2024/10/12'에 해당되는 글 1건

[pg] GIN index 특징 및 활용 예제

'생계 > PostgreSQL' 카테고리의 다른 글

카테고리

공지사항

태그목록

최근에 올라온 글

최근에 달린 댓글

글 보관함

달력

링크

티스토리툴바