PostgreSQL 18 with zhparser (Traditional Chinese)

A custom PostgreSQL 18 Docker image with:

zhparser - Chinese full-text search with SCWS segmentation
Traditional Chinese dictionary - Pre-configured for Traditional Chinese (Cantonese/HK/TW)
pgvector - Vector similarity search for AI/ML applications
pg_trgm - Trigram extension for fuzzy text matching

Multi-arch support for amd64 and arm64 (Apple Silicon).

Quick Start

# Build the image
docker build -t postgres-18-zhparser-cht:latest .

# Run a container
docker run -d \
  --name postgres-zhparser \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=mydb \
  -p 5432:5432 \
  postgres-18-zhparser-cht:latest

# Connect and verify
docker exec -it postgres-zhparser psql -U postgres -d mydb -c "SELECT * FROM pg_extension WHERE extname IN ('zhparser', 'vector', 'pg_trgm');"

Docker Compose

services:
  postgres:
    image: postgres-18-zhparser-cht:latest
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: mydb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

Features

Chinese Full-Text Search

The image automatically initializes the chinese_zh text search configuration for the first database created:

-- Search Chinese text
SELECT * FROM articles
WHERE to_tsvector('chinese_zh', content) @@ to_tsquery('chinese_zh', '人工智能');

-- Create a GIN index for Chinese search
CREATE INDEX idx_articles_search ON articles
USING GIN (to_tsvector('chinese_zh', title || ' ' || content));
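PostgreSQL only considers an expression index when the query uses the same expression, so a search meant to use idx_articles_search should repeat `to_tsvector('chinese_zh', title || ' ' || content)`. The sketch below assumes the `articles` table has the `title` and `content` columns used in the index definition. `plainto_tsquery` and `ts_debug` are built-in PostgreSQL functions, and the sample sentence is only illustrative.

```sql
-- Query with the same expression as the index so the planner can use it.
SELECT title
FROM articles
WHERE to_tsvector('chinese_zh', title || ' ' || content)
      @@ plainto_tsquery('chinese_zh', '人工智能');

-- Inspect how the chinese_zh configuration segments a sentence:
-- one row per token, with its token type and the resulting lexemes.
SELECT alias, token, lexemes
FROM ts_debug('chinese_zh', '香港金融科技發展迅速');
```

Note that `title || ' ' || content` is NULL when either column is NULL, so such rows will not match. Wrapping each column in `coalesce(..., '')` in both the index and the query is one way to avoid that.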

Custom Dictionary

Add custom words for better segmentation:

-- Add custom words
INSERT INTO zhparser.zhprs_custom_word (word) VALUES
    ('中美關係'),
    ('人工智能'),
    ('金融科技');

-- Sync the dictionary (writes to zh_custom.txt)
SELECT sync_zhprs_custom_word();

The sync_zhprs_custom_word() function writes to /usr/local/share/postgresql/tsearch_data/zh_custom.txt. The directory permissions are pre-configured so the postgres user can write to it.
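To check whether a synced word changes segmentation, the output of `to_tsvector` can be compared before and after the sync. This is a sketch under one assumption: upstream zhparser documentation indicates that dictionary changes are picked up by new connections, so the second query is meant to run in a fresh session. The sample sentence is illustrative.

```sql
-- List the custom words currently registered.
SELECT word FROM zhparser.zhprs_custom_word;

-- Run in a new session after sync_zhprs_custom_word().
-- Check whether the custom words appear as single lexemes in the result.
SELECT to_tsvector('chinese_zh', '中美關係與金融科技的發展');
```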

Testing

Run the test suite to validate the build:

# Run all tests
./test.sh

# Test a specific image
./test.sh my-image:tag

The test suite validates:

Docker image build
Extension installation (zhparser, pgvector, pg_trgm)
Chinese text search configuration
Word segmentation (Traditional and Simplified)
Full-text search with GIN index
Custom dictionary functionality (including sync permissions)
Vector similarity search
Trigram fuzzy search

Vector Search (pgvector)

-- Enable the vector extension (no-op if it is already enabled)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with embedding column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1024)
);

-- Create index for similarity search
CREATE INDEX idx_documents_embedding ON documents
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
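A nearest-neighbour query against this table might look like the sketch below. It uses pgvector's `<=>` cosine distance operator, which matches the `vector_cosine_ops` operator class of the index. It assumes the `documents` table already contains rows and that a row with `id = 1` exists to serve as the query vector. The `ivfflat.probes` value is an arbitrary illustration.

```sql
-- Probe more lists per query: better recall, slower search (default is 1).
SET ivfflat.probes = 10;

-- Find the five documents closest to document 1 by cosine distance.
SELECT id, content
FROM documents
WHERE id <> 1
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 5;
```

The pgvector documentation recommends building an ivfflat index after the table has some data, since the lists are derived from the rows present at index creation time.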

Trigram Search (pg_trgm)

-- Enable the pg_trgm extension (no-op if it is already enabled)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Fuzzy search
SELECT * FROM users WHERE name % 'Jonh';

-- Create GIN index for fuzzy matching
CREATE INDEX idx_users_name ON users USING GIN (name gin_trgm_ops);
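The `%` operator only filters rows. To rank matches, pg_trgm's `similarity()` function can be used alongside it. The sketch below assumes the same hypothetical `users` table with a text `name` column. The threshold value is an arbitrary example.

```sql
-- Show the trigrams pg_trgm extracts from the search term.
SELECT show_trgm('Jonh');

-- Lower the match threshold used by % for this session (default is 0.3).
SET pg_trgm.similarity_threshold = 0.2;

-- Return the closest names first.
SELECT name, similarity(name, 'Jonh') AS score
FROM users
WHERE name % 'Jonh'
ORDER BY score DESC
LIMIT 10;
```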

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `POSTGRES_USER` | - | Superuser name |
| `POSTGRES_PASSWORD` | - | Superuser password |
| `POSTGRES_DB` | - | Database to create |
| `PGDATA` | `/var/lib/postgresql/data` | Data directory |

Building

# Standard build
docker build -t postgres-18-zhparser-cht:latest .

# Build with no cache (clean build)
docker build --no-cache -t postgres-18-zhparser-cht:latest .

# Build for specific platform
docker buildx build --platform linux/arm64 -t postgres-18-zhparser-cht:latest .
docker buildx build --platform linux/amd64 -t postgres-18-zhparser-cht:latest .

# Multi-arch build on macOS
docker buildx build --platform linux/arm64,linux/amd64 -t postgres-18-zhparser-cht:latest .

Version Information

| Component | Version |
| --- | --- |
| PostgreSQL | 18.x |
| zhparser | v2.3 |
| SCWS | 1.2.3 |
| pgvector | v0.8.2 |
| pg_trgm | REL_18_2 |
| Dictionary | Traditional Chinese (cht.utf8) |

All components use pinned versions for reproducible builds. Dictionary files are bundled in the repository for reliable builds without external network dependencies.

License

PostgreSQL: PostgreSQL License
zhparser: BSD License
SCWS: Apache 2.0
pgvector: PostgreSQL License
