Skip to content

Aft1n/nuxt-local-model

Repository files navigation

Nuxt Local Model

npm version npm downloads License

Scalable local inference for Nuxt

Nuxt Local Model banner

Note: This package is under active development. Please open issues if you run into anything unclear.

A Nuxt module for easily integrating local Hugging Face transformer models into your Nuxt 4 application.

Features

  • Easily use local models in your Nuxt app
  • Supports any Hugging Face task and model you want to configure
  • Auto-imported composable, useLocalModel() by default for frontend Vue code
  • Auto-imported prewarmLocalModel() helper for eager browser model loading
  • Server-safe helper, getLocalModel() for server/api and utilities
  • Explicit serverPrewarm and browserPrewarm controls instead of implicit startup warmup
  • Fully configurable via nuxt.config.ts
  • Supports changing model names, tasks, and settings per usage
  • Optional worker-backed execution on the server or in the browser
  • Server runtime support for Node, Bun, and Deno
  • Works across macOS, Linux, Windows, and Docker
  • Supports persistent model cache directories so models are not re-downloaded on every deploy

Quick Setup

Install the module into your Nuxt application with one command:

npx nuxi module add nuxt-local-model

Manual Installation

If you prefer to install manually, run:

# Using npm
npm install nuxt-local-model

# Using yarn
yarn add nuxt-local-model

# Using pnpm
pnpm add nuxt-local-model

# Using bun
bun add nuxt-local-model

Then, add it to your Nuxt config:

export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
})

Usage

Once installed, you can use useLocalModel() in your Vue app code.

For server routes and utilities, use getLocalModel().

If you want a browser model to start loading before a user interacts with the UI, use prewarmLocalModel() or enable browserPrewarm in nuxt.config.ts.

If you want models to warm on the server during Nuxt startup, enable serverPrewarm explicitly. Server routes using getLocalModel() do not automatically imply startup warmup.

Basic Example

<script setup lang="ts">
const embedder = await useLocalModel("embedding")
const output = await embedder("Nuxt local model example")
</script>

Server Example

// server/api/demo/search.get.ts
import { getLocalModel } from "nuxt-local-model/server"

export default defineEventHandler(async () => {
  const embedder = await getLocalModel("embedding")
  return await embedder("hello world")
})

Defining Models in nuxt.config.ts

export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    runtime: "auto", // auto-detect Node, Bun, or Deno on the server
    cacheDir: "./.ai-models", // one cache folder for downloads and reuse
    allowRemoteModels: true, // allow fetching missing models from Hugging Face
    allowLocalModels: true, // allow reusing cached / mounted model files
    defaultTask: "feature-extraction", // default pipeline type when a model entry does not override it
    serverPrewarm: false, // false disables startup warmup, true warms all aliases on server startup, or pass ["embedding"] for specific aliases
    serverWorker: false, // run inference in a server worker thread on Node, Bun, or Deno
    browserWorker: false, // run inference in a browser Web Worker; avoid this for very large models
    browserPrewarm: false, // false disables browser prewarm, true warms all aliases after app mount, or pass ["embedding"] to warm specific aliases
    models: {
      embedding: {
        task: "feature-extraction", // the pipeline type for this alias
        model: "Xenova/all-MiniLM-L6-v2", // the Hugging Face model id
        options: {
          dtype: "q8", // model loading option passed through to Transformers.js
        },
      },
    },
  },
})

Tip: a plain localModel: { ... } object is enough for Nuxt config IntelliSense, and configured model aliases now flow into useLocalModel("...") / getLocalModel("...") suggestions automatically. If you want to reuse the config as a separate constant elsewhere, as const satisfies LocalModelRuntimeConfig is the most Nuxt-native way to preserve literal alias keys without a helper. If you are writing server routes, import getLocalModel() from nuxt-local-model/server. In Vue app code, useLocalModel() is auto-imported once the module is installed.

Overriding Settings at the Call Site

You can still provide the options for the model call where it is used:

<script setup lang="ts">
const model = await useLocalModel("embedding", {
  pooling: "mean",
  normalize: true,
})
</script>

Prewarming a Model in the Browser

<script setup lang="ts">
onMounted(() => {
  void prewarmLocalModel("embedding", {
    pooling: "mean",
    normalize: true,
  })
})
</script>

Automatic Browser Prewarm

export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    browserWorker: true,
    browserPrewarm: ["embedding"],
    models: {
      embedding: {
        task: "feature-extraction",
        model: "Xenova/all-MiniLM-L6-v2",
      },
    },
  },
})

Set browserPrewarm: true to warm every configured alias on app mount, or pass a string array to warm only selected aliases.

Explicit Server Prewarm

export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    serverPrewarm: ["embedding"],
    models: {
      embedding: {
        task: "feature-extraction",
        model: "Xenova/all-MiniLM-L6-v2",
      },
    },
  },
})

Set serverPrewarm: true to warm every configured alias during Nuxt startup, or pass a string array to warm only selected aliases.

This is separate from browser prewarm:

  • serverPrewarm runs during Nuxt startup on the server
  • browserPrewarm runs after app:mounted in the browser
  • calling getLocalModel() inside a server route stays on-demand and does not automatically prewarm at startup

Configuration Options

You can configure the module in your nuxt.config.ts:

export default defineNuxtConfig({
  modules: ["nuxt-local-model"],
  localModel: {
    runtime: "auto", // or "node", "bun", or "deno"
    cacheDir: "./.ai-models", // persistent cache folder for downloaded model assets
    allowRemoteModels: true, // download from Hugging Face if not yet cached
    allowLocalModels: true, // reuse local cache or mounted volume contents
    defaultTask: "feature-extraction", // default for aliases that do not override task
    serverPrewarm: false, // eager server-side prewarm: false, true, or a list of aliases
    serverWorker: true, // use a server worker thread so inference does not block the main server thread
    browserWorker: false, // enable only if you intentionally want browser-side inference
    browserPrewarm: false, // eager browser-side prewarm: false, true, or a list of aliases
    models: {
      embedding: {
        task: "feature-extraction", // embeddings usually use feature-extraction
        model: "Xenova/all-MiniLM-L6-v2", // any Hugging Face model id you choose
        options: {
          dtype: "q8", // loading/config option forwarded to Transformers.js
        },
      },
    },
  },
})

If onnxruntime-node is not available in your server runtime, the module now falls back to the default Transformers.js backend instead of crashing during startup.

Warmup Behavior

The module now separates on-demand model usage from startup warmup:

  • useLocalModel() loads a model in browser code when you call it
  • getLocalModel() loads a model on the server when you call it
  • serverPrewarm is the only thing that triggers eager server startup warmup
  • browserPrewarm is the only thing that triggers eager browser warmup

This makes static sites and mixed environments much easier to reason about. For example:

  • a static docs site can use browserPrewarm: ["embedding"] without warming models during server startup
  • an API service can use serverPrewarm: true if it wants lower-latency first requests

Cache Directory

The cache directory controls where downloaded model files are stored and reused.

Recommended defaults:

  • local development: ./.ai-models
  • Docker: mount a persistent volume to the same path

Important:

  • the cache path in nuxt.config.ts must match the path inside the Docker container
  • the folder name on your laptop does not have to match the Docker folder name
  • what matters in production is the path the app reads inside the container

Example Docker runtime setup:

docker run \
  -e NUXT_LOCAL_MODEL_CACHE_DIR=/data/local-models \
  -v local-models:/data/local-models \
  your-image:latest

This ensures the model files stay available across redeploys and container restarts.

What this does:

  • NUXT_LOCAL_MODEL_CACHE_DIR=/data/local-models tells the app which folder to use for model caching
  • -v local-models:/data/local-models mounts a persistent Docker volume at that same folder
    • the first container start downloads missing models into the mounted cache folder
    • later starts reuse the models already stored there

You can rename the host-facing volume however you want. What matters is that the path inside the container matches the cache path used by the module.

In Docker, the environment variable and volume path point the app to the mounted folder:

ENV NUXT_LOCAL_MODEL_CACHE_DIR=/models-cache
VOLUME ["/models-cache"]

That means the Nuxt app will use /models-cache inside the container, and Docker will attach a persistent volume there when you run the container with -v.

Docker Volume Cache Example

If you want Docker to download model files on first launch and reuse them on later redeploys, mount a persistent volume at the same cache path the app uses.

The build does not need to copy model files manually. The first container start writes them into the mounted volume, and subsequent starts reuse whatever is already there.

FROM node:22-alpine AS deps
WORKDIR /app

COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile

FROM deps AS build
WORKDIR /app

COPY . .

ENV NUXT_LOCAL_MODEL_CACHE_DIR=/models-cache
RUN pnpm run build

FROM node:22-alpine
WORKDIR /app

ENV NUXT_LOCAL_MODEL_CACHE_DIR=/models-cache
VOLUME ["/models-cache"]

COPY --from=build /app/.output ./.output
COPY --from=deps /app/node_modules ./node_modules

CMD ["node", ".output/server/index.mjs"]

Use this as a template in your Nuxt Docker build if you want a persistent cache path. At runtime, the mounted volume should be attached to /models-cache, and the app will download missing models into that volume the first time it runs.

In other words:

  • your local dev cache can be ./.ai-models
  • your Docker cache can be /models-cache
  • both are fine as long as the app config matches the environment it runs in

Naming Rule

  • useLocalModel() is for frontend Vue components, pages, and composables
  • getLocalModel() is for server/api routes and Nitro utilities

Both use the same underlying model-loading logic, so the runtime behavior stays consistent.

Worker Mode

You can choose where the model runs:

  • serverWorker: true runs model inference in a Node worker thread on your Nuxt server
  • browserWorker: true runs model inference in a browser Web Worker

This is useful if you want to keep heavy inference off the main request or UI thread.

Be careful with browserWorker and large models:

  • the model must be downloaded into the user’s browser
  • 100s of MB models can be slow or impractical for client delivery
  • server worker mode is usually the better default for large models

Server Worker vs Browser Worker

Mode Where it runs Best for Tradeoff
serverWorker Nuxt server / Node worker thread Large models, shared cache, server-rendered apps Uses server CPU and memory
browserWorker User’s browser Web Worker Small client-side models, privacy-sensitive local inference Model must be downloaded into the browser

Transformers.js Docs

For model/task behavior and runtime options, see the official Transformers.js docs:

Playground

This package includes a minimal playground app with an embedding example inside playground/.

The playground keeps the note list in the page and uses server routes for embeddings and search, so it demonstrates the server-backed flow end to end without a database.

Run it with:

npm run dev

Notes

  • This module is intentionally generic and does not ship opinionated preset models.
  • The example playground shows how to wire an embedding model, but you can register any task/model combination supported by @huggingface/transformers.

About

A Nuxt module for local in-app Hugging Face transformer models usage

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors