A Node.js package for clustering keywords using OpenAI embeddings. This package can be used to cluster Google Ads keywords, AI chat topics, project tasks, or any other text-based items into semantically similar groups.
- Generate embeddings for keywords using OpenAI API
- Calculate distances between keywords
- Create clusters of semantically similar keywords
- Automatically generate cluster names and descriptions using OpenAI API
- Multiple clustering algorithms:
- Simple: Basic distance-based clustering
- K-means: Centroid-based clustering with configurable number of clusters
- Hierarchical: Agglomerative clustering with different linkage methods
- Direct: LLM-based clustering without embeddings (sends keywords directly to the model)
npm install clusterkwimport { KeywordClusterer } from 'clusterkw';
// Initialize with your OpenAI API key
const clusterer = new KeywordClusterer({
apiKey: 'your-openai-api-key',
// Optional configuration
embeddingModel: 'text-embedding-3-small',
completionModel: 'gpt-4o-mini-2024-07-18',
minClusterSize: 3,
distanceThreshold: 0.2,
algorithm: 'direct',
context: 'AI chat topics'
});
// Example keywords
const keywords = [
'machine learning algorithms',
'deep learning frameworks',
'neural networks',
'data visualization tools',
'business intelligence software',
'data analytics platforms',
'artificial intelligence applications',
'natural language processing',
'computer vision systems',
'reinforcement learning'
];
// Cluster the keywords
const clusters = await clusterer.clusterKeywords(keywords);
// Output the clusters with their names and descriptions
console.log(clusters);The main class for clustering keywords.
apiKey(string, required): Your OpenAI API keyembeddingModel(string, optional): The OpenAI model to use for embeddings. Default: 'text-embedding-3-small'completionModel(string, optional): The OpenAI model to use for generating cluster names and descriptions. Default: 'gpt-3.5-turbo'minClusterSize(number, optional): Minimum number of items to form a cluster. Default: 2distanceThreshold(number, optional): Maximum cosine distance for items to be considered similar. Default: 0.3algorithm(string, optional): Clustering algorithm to use ('simple', 'kmeans', or 'hierarchical'). Default: 'simple'k(number, optional): Number of clusters for k-means algorithm. Default: auto-calculated based on dataset sizemaxIterations(number, optional): Maximum iterations for k-means algorithm. Default: 100linkage(string, optional): Linkage method for hierarchical clustering ('single', 'complete', or 'average'). Default: 'average'
clusterKeywords(keywords: string[]): Promise<Cluster[]>: Clusters the provided keywords and returns the clusters with names and descriptions.getEmbeddings(texts: string[]): Promise<number[][]>: Gets embeddings for the provided texts.calculateDistances(embeddings: number[][]): number[][]: Calculates the cosine distances between all embeddings.generateClusters(keywords: string[], distances: number[][], embeddings?: number[][]): Cluster[]: Generates clusters based on the selected algorithm.nameAndDescribeClusters(clusters: Cluster[]): Promise<Cluster[]>: Generates names and descriptions for the clusters.
The package includes a CLI tool for clustering keywords directly from the command line:
# Install globally
npm install -g clusterkw
# Basic usage with a file
clusterkw --file keywords.txt
# Using a CSV file with a specific column
clusterkw --file keywords.csv --column keyword
# Specify output file
clusterkw --file keywords.txt --output clusters.json
# Customize clustering parameters
clusterkw --file keywords.txt --min-cluster-size 3 --distance 0.25
# Use different clustering algorithms
clusterkw --file keywords.txt --algorithm kmeans --clusters 5
clusterkw --file keywords.txt --algorithm hierarchical --linkage complete
clusterkw --file keywords.txt --algorithm direct --gpt-model gpt-4o-mini-2024-07-18
# Use context to guide clustering
clusterkw --file keywords.txt --context "AI chat topics"
clusterkw --file keywords.txt --algorithm direct --context "Google Ads keywords"
# Provide API key directly (alternatively, set OPENAI_API_KEY or OPENAI_KEY environment variable)
clusterkw --file keywords.txt --api-key your-openai-api-key| Option | Alias | Description | Default |
|---|---|---|---|
--file |
-f |
Path to file containing keywords (supports txt, csv, json) | (required) |
--column |
-c |
Column name containing keywords (for CSV files) | keyword |
--output |
-o |
Output file path (supports json, csv) | (none) |
--api-key |
OpenAI API key (overrides environment variables) | (from env) | |
--min-cluster-size |
-m |
Minimum cluster size | 2 |
--distance |
-d |
Maximum distance threshold | 0.3 |
--embedding-model |
-e |
OpenAI embedding model | text-embedding-3-small |
--gpt-model |
-g |
OpenAI completion model | gpt-4o-mini-2024-07-18 |
--algorithm |
-a |
Clustering algorithm (simple, kmeans, hierarchical, direct) | simple |
--clusters |
-k |
Number of clusters for k-means algorithm | 5 |
--max-iterations |
Maximum iterations for k-means algorithm | 100 |
|
--linkage |
Linkage method for hierarchical clustering | average |
|
--context |
Context description to guide clustering | (none) | |
--delimiter |
CSV delimiter | , |
|
--no-header |
CSV file has no header row | (false) |
This package follows Semantic Versioning (SemVer):
- MAJOR: Incompatible API changes
- MINOR: Add functionality (backward-compatible)
- PATCH: Bug fixes (backward-compatible)
You can update the version using npm scripts:
# Increment patch version (1.0.0 -> 1.0.1)
npm run version:patch
# Increment minor version (1.0.0 -> 1.1.0)
npm run version:minor
# Increment major version (1.0.0 -> 2.0.0)
npm run version:majorThis package uses GitHub Actions to automatically publish to npm when:
- A new version is pushed to the main branch (package.json changes)
- A new GitHub Release is created
- The publish workflow is manually triggered
MIT