Note: This package is under active development. Please open issues if you run into anything unclear.
A Nuxt module for easily integrating local Hugging Face transformer models into your Nuxt 4 application.
- Easily use local models in your Nuxt app
- Supports any Hugging Face task and model you want to configure
- Auto-imported composable,
useLocalModel()by default for frontend Vue code - Auto-imported
prewarmLocalModel()helper for eager browser model loading - Server-safe helper,
getLocalModel()forserver/apiand utilities - Explicit
serverPrewarmandbrowserPrewarmcontrols instead of implicit startup warmup - Fully configurable via
nuxt.config.ts - Supports changing model names, tasks, and settings per usage
- Optional worker-backed execution on the server or in the browser
- Server runtime support for Node, Bun, and Deno
- Works across macOS, Linux, Windows, and Docker
- Supports persistent model cache directories so models are not re-downloaded on every deploy
Install the module into your Nuxt application with one command:
```bash
npx nuxi module add nuxt-local-model
```

If you prefer to install manually, run:

```bash
# Using npm
npm install nuxt-local-model

# Using yarn
yarn add nuxt-local-model

# Using pnpm
pnpm add nuxt-local-model

# Using bun
bun add nuxt-local-model
```

Then, add it to your Nuxt config:
```ts
export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
})
```

Once installed, you can use `useLocalModel()` in your Vue app code.
For server routes and utilities, use `getLocalModel()`.
If you want a browser model to start loading before a user interacts with the UI, use
`prewarmLocalModel()` or enable `browserPrewarm` in `nuxt.config.ts`.
If you want models to warm on the server during Nuxt startup, enable `serverPrewarm`
explicitly. Server routes using `getLocalModel()` do not automatically imply startup warmup.
```vue
<script setup lang="ts">
const embedder = await useLocalModel("embedding")
const output = await embedder("Nuxt local model example")
</script>
```

```ts
// server/api/demo/search.get.ts
import { getLocalModel } from "nuxt-local-model/server"

export default defineEventHandler(async () => {
  const embedder = await getLocalModel("embedding")
  return await embedder("hello world")
})
```

```ts
export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    runtime: "auto", // auto-detect Node, Bun, or Deno on the server
    cacheDir: "./.ai-models", // one cache folder for downloads and reuse
    allowRemoteModels: true, // allow fetching missing models from Hugging Face
    allowLocalModels: true, // allow reusing cached / mounted model files
    defaultTask: "feature-extraction", // default pipeline type when a model entry does not override it
    serverPrewarm: false, // false disables startup warmup, true warms all aliases on server startup, or pass ["embedding"] for specific aliases
    serverWorker: false, // run inference in a server worker thread on Node, Bun, or Deno
    browserWorker: false, // run inference in a browser Web Worker; avoid this for very large models
    browserPrewarm: false, // false disables browser prewarm, true warms all aliases after app mount, or pass ["embedding"] to warm specific aliases
    models: {
      embedding: {
        task: "feature-extraction", // the pipeline type for this alias
        model: "Xenova/all-MiniLM-L6-v2", // the Hugging Face model id
        options: {
          dtype: "q8", // model loading option passed through to Transformers.js
        },
      },
    },
  },
})
```

Tip: a plain `localModel: { ... }` object is enough for Nuxt config IntelliSense, and configured
model aliases now flow into `useLocalModel("...")` / `getLocalModel("...")` suggestions automatically.
If you want to reuse the config as a separate constant elsewhere, `as const satisfies LocalModelRuntimeConfig`
is the most Nuxt-native way to preserve literal alias keys without a helper.
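The `as const satisfies` pattern can be sketched with a simplified stand-in type. The interface below is only an approximation for illustration; the real `LocalModelRuntimeConfig` exported by the module may have a different shape:

```typescript
// Simplified stand-in for the module's config type; the real
// LocalModelRuntimeConfig exported by nuxt-local-model may differ.
interface LocalModelRuntimeConfig {
  defaultTask?: string
  models: Record<string, { task: string; model: string }>
}

// `as const satisfies` checks the object against the type while
// keeping "embedding" as a literal key instead of widening to string.
export const localModelConfig = {
  defaultTask: "feature-extraction",
  models: {
    embedding: {
      task: "feature-extraction",
      model: "Xenova/all-MiniLM-L6-v2",
    },
  },
} as const satisfies LocalModelRuntimeConfig

// The alias union stays narrow: exactly "embedding", not string.
export type Alias = keyof typeof localModelConfig.models
```

A plain `const localModelConfig: LocalModelRuntimeConfig = { ... }` annotation would type-check the same object but widen the alias keys to `string`, which is why `satisfies` is preferred here.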
If you are writing server routes, import `getLocalModel()` from `nuxt-local-model/server`.
In Vue app code, `useLocalModel()` is auto-imported once the module is installed.
You can still provide per-call options where the model is used:
```vue
<script setup lang="ts">
const model = await useLocalModel("embedding", {
  pooling: "mean",
  normalize: true,
})
</script>
```

```vue
<script setup lang="ts">
onMounted(() => {
  void prewarmLocalModel("embedding", {
    pooling: "mean",
    normalize: true,
  })
})
</script>
```

```ts
export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    browserWorker: true,
    browserPrewarm: ["embedding"],
    models: {
      embedding: {
        task: "feature-extraction",
        model: "Xenova/all-MiniLM-L6-v2",
      },
    },
  },
})
```

Set `browserPrewarm: true` to warm every configured alias on app mount, or pass a string array to warm only selected aliases.
```ts
export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    serverPrewarm: ["embedding"],
    models: {
      embedding: {
        task: "feature-extraction",
        model: "Xenova/all-MiniLM-L6-v2",
      },
    },
  },
})
```

Set `serverPrewarm: true` to warm every configured alias during Nuxt startup, or pass a string array to warm only selected aliases.

This is separate from browser prewarm:

- `serverPrewarm` runs during Nuxt startup on the server
- `browserPrewarm` runs after `app:mounted` in the browser
- calling `getLocalModel()` inside a server route stays on-demand and does not automatically prewarm at startup
You can configure the module in your `nuxt.config.ts`:

```ts
export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    runtime: "auto", // or "node", "bun", or "deno"
    cacheDir: "./.ai-models", // persistent cache folder for downloaded model assets
    allowRemoteModels: true, // download from Hugging Face if not yet cached
    allowLocalModels: true, // reuse local cache or mounted volume contents
    defaultTask: "feature-extraction", // default for aliases that do not override task
    serverPrewarm: false, // eager server-side prewarm: false, true, or a list of aliases
    serverWorker: true, // use a server worker thread so inference does not block the main server thread
    browserWorker: false, // enable only if you intentionally want browser-side inference
    browserPrewarm: false, // eager browser-side prewarm: false, true, or a list of aliases
    models: {
      embedding: {
        task: "feature-extraction", // embeddings usually use feature-extraction
        model: "Xenova/all-MiniLM-L6-v2", // any Hugging Face model id you choose
        options: {
          dtype: "q8", // loading/config option forwarded to Transformers.js
        },
      },
    },
  },
})
```

If `onnxruntime-node` is not available in your server runtime, the module now falls back to the default Transformers.js backend instead of crashing during startup.

The module now separates on-demand model usage from startup warmup:

- `useLocalModel()` loads a model in browser code when you call it
- `getLocalModel()` loads a model on the server when you call it
- `serverPrewarm` is the only thing that triggers eager server startup warmup
- `browserPrewarm` is the only thing that triggers eager browser warmup
This makes static sites and mixed environments much easier to reason about. For example:

- a static docs site can use `browserPrewarm: ["embedding"]` without warming models during server startup
- an API service can use `serverPrewarm: true` if it wants lower-latency first requests
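The on-demand/prewarm split boils down to a memoized loader: calling the getter loads once and caches, while prewarm just triggers that same load early. A generic sketch of the pattern (an illustration of the semantics, not the module's actual internals):

```typescript
// Illustration of on-demand use vs. prewarm as a memoized loader.
// This is a sketch of the pattern, not nuxt-local-model's real code.
type Loader<T> = () => Promise<T>

function createLazyModel<T>(load: Loader<T>) {
  let cached: Promise<T> | undefined
  // On-demand: the first call starts the load, later calls reuse it.
  const get: Loader<T> = () => (cached ??= load())
  // Prewarm: kicks off the same load early, ignoring the result.
  const prewarm = () => { void get() }
  return { get, prewarm }
}

// Both calls return the same cached promise, so the loader runs once
// whether the model was prewarmed or first touched inside a request.
const model = createLazyModel(async () => "ready")
console.log(model.get() === model.get()) // true
```

Skipping prewarm never changes correctness in this pattern; it only moves the load cost to the first `get()` call.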
The cache directory controls where downloaded model files are stored and reused.
Recommended defaults:
- local development: `./.ai-models`
- Docker: mount a persistent volume to the same path

Important:

- the cache path in `nuxt.config.ts` must match the path inside the Docker container
- the folder name on your laptop does not have to match the Docker folder name
- what matters in production is the path the app reads inside the container
Example Docker runtime setup:
```bash
docker run \
  -e NUXT_LOCAL_MODEL_CACHE_DIR=/data/local-models \
  -v local-models:/data/local-models \
  your-image:latest
```

This ensures the model files stay available across redeploys and container restarts.

What this does:

- `NUXT_LOCAL_MODEL_CACHE_DIR=/data/local-models` tells the app which folder to use for model caching
- `-v local-models:/data/local-models` mounts a persistent Docker volume at that same folder
- the first container start downloads missing models into the mounted cache folder
- later starts reuse the models already stored there

You can rename the host-facing volume however you want. What matters is that the path inside the container matches the cache path used by the module.
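The same runtime setup can be expressed with docker-compose. This is a sketch only: the service name, image tag, and volume name are placeholders, while the environment variable and container path mirror the `docker run` example above:

```yaml
# docker-compose sketch; "app", "your-image:latest", and the volume
# name are placeholders. Only the container-side path has to match
# the cache path the module is configured to use.
services:
  app:
    image: your-image:latest
    environment:
      NUXT_LOCAL_MODEL_CACHE_DIR: /data/local-models
    volumes:
      - local-models:/data/local-models

volumes:
  local-models:
```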
In Docker, the environment variable and volume path point the app to the mounted folder:
```dockerfile
ENV NUXT_LOCAL_MODEL_CACHE_DIR=/models-cache
VOLUME ["/models-cache"]
```

That means the Nuxt app will use `/models-cache` inside the container, and Docker will
attach a persistent volume there when you run the container with `-v`.
If you want Docker to download model files on first launch and reuse them on later redeploys, mount a persistent volume at the same cache path the app uses.
The build does not need to copy model files manually. The first container start writes them into the mounted volume, and subsequent starts reuse whatever is already there.
```dockerfile
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile

FROM deps AS build
WORKDIR /app
COPY . .
ENV NUXT_LOCAL_MODEL_CACHE_DIR=/models-cache
RUN pnpm run build

FROM node:22-alpine
WORKDIR /app
ENV NUXT_LOCAL_MODEL_CACHE_DIR=/models-cache
VOLUME ["/models-cache"]
COPY --from=build /app/.output ./.output
COPY --from=deps /app/node_modules ./node_modules
CMD ["node", ".output/server/index.mjs"]
```

Use this as a template in your Nuxt Docker build if you want a persistent cache path.
At runtime, the mounted volume should be attached to `/models-cache`, and the app will
download missing models into that volume the first time it runs.

In other words:

- your local dev cache can be `./.ai-models`
- your Docker cache can be `/models-cache`
- both are fine as long as the app config matches the environment it runs in
- `useLocalModel()` is for frontend Vue components, pages, and composables
- `getLocalModel()` is for `server/api` routes and Nitro utilities
Both use the same underlying model-loading logic, so the runtime behavior stays consistent.
You can choose where the model runs:
- `serverWorker: true` runs model inference in a Node worker thread on your Nuxt server
- `browserWorker: true` runs model inference in a browser Web Worker
This is useful if you want to keep heavy inference off the main request or UI thread.
Be careful with `browserWorker` and large models:

- the model must be downloaded into the user’s browser
- models weighing hundreds of megabytes can be slow or impractical to deliver to the client
- server worker mode is usually the better default for large models
| Mode | Where it runs | Best for | Tradeoff |
|---|---|---|---|
| `serverWorker` | Nuxt server / Node worker thread | Large models, shared cache, server-rendered apps | Uses server CPU and memory |
| `browserWorker` | User’s browser Web Worker | Small client-side models, privacy-sensitive local inference | Model must be downloaded into the browser |
For model/task behavior and runtime options, see the official Transformers.js docs:
This package includes a minimal playground app with an embedding example inside `playground/`.
The playground keeps the note list in the page and uses server routes for embeddings and search, so it demonstrates the server-backed flow end to end without a database.
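The search side of that flow typically ranks notes by cosine similarity between the query embedding and each stored note embedding. A standalone sketch of that ranking helper, which is illustrative only and not part of the module's API:

```typescript
// Cosine similarity between two embedding vectors, as a search route
// might use to rank stored note embeddings against a query embedding.
// This helper is illustrative and not exported by nuxt-local-model.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Identical directions score 1, orthogonal directions score 0.
console.log(cosineSimilarity([1, 0], [1, 0])) // 1
console.log(cosineSimilarity([1, 0], [0, 1])) // 0
```

With normalized embeddings (e.g. `normalize: true` in the call options shown earlier), the dot product alone gives the same ranking, since both norms are already 1.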
Run it with:
```bash
npm run dev
```

- This module is intentionally generic and does not ship opinionated preset models.
- The example playground shows how to wire an embedding model, but you can register any task/model combination supported by `@huggingface/transformers`.