diff --git a/docs.json b/docs.json index 5deb71c..9507528 100644 --- a/docs.json +++ b/docs.json @@ -104,6 +104,13 @@ }, "documentation/serverless/workergroup-parameters", "documentation/serverless/creating-new-pyworkers", + { + "group": "Monitoring and Debugging", + "pages": [ + "documentation/serverless/worker-states", + "documentation/serverless/logging" + ] + }, "documentation/serverless/pricing", { "group": "Pre-built Templates", diff --git a/documentation/serverless/SDKoverview.mdx b/documentation/serverless/SDKoverview.mdx index 3a2ab5e..7002db5 100644 --- a/documentation/serverless/SDKoverview.mdx +++ b/documentation/serverless/SDKoverview.mdx @@ -34,6 +34,6 @@ The SDK manages the following core functions for the client: ## Why Use the SDK -While there are other ways to interact with serverless endpoint—such as the CLI and the REST API—the SDK is the **most powerful and easiest** method to use. It is the recommended approach for most applications due to its higher-level abstractions, reliability, and ease of integration into Python-based workflows. +While there are other ways to interact with serverless endpoints — such as the CLI and the REST API — the SDK is the **most powerful and easiest** method to use, as it incorporates all best practices for using the API. It is the **recommended approach** for most applications due to its higher-level abstractions, reliability, and ease of integration into Python-based workflows. -If the Python SDK is not usable for your application, please contact support to request further assistance. We're happy to help. \ No newline at end of file +If the Python SDK or CLI is not usable for your application, please contact support to request further assistance. We're happy to help. 
\ No newline at end of file diff --git a/documentation/serverless/architecture.mdx b/documentation/serverless/architecture.mdx index ff6b5e6..0369cf7 100644 --- a/documentation/serverless/architecture.mdx +++ b/documentation/serverless/architecture.mdx @@ -34,7 +34,7 @@ An **Endpoint** is the highest-level construct in Vast Serverless. Endpoints are An endpoint consists of: - A named endpoint identifier -- One or more Workergroups +- Typically one Workergroup - Endpoint parameters such as `max_workers`, `min_load`, `min_workers`, `cold_mult`, `min_cold_load`, and `target_util` Users typically create one endpoint per **use case** (for example, text generation or image generation) and per **environment** (production, staging, development). Each endpoint acts as a router and load balances requests across its pool of managed workers based on worker queue time. @@ -51,7 +51,7 @@ Each Workergroup includes: - Hardware requirements such as `gpu_ram` - A set of GPU instances (workers) created from the template -Multiple Workergroups can exist within a single Endpoint, each with different configurations. This enables advanced use cases such as hardware comparison, gradual model rollout, or mixed-model serving. For many applications, a single Workergroup is sufficient. +Multiple Workergroups can exist within a single Endpoint, each with different configurations. For most users, a single Workergroup is sufficient and recommended. Advanced use cases such as mixed-model serving and hardware comparisons can be enabled with multiple Workergroups. For such use cases, please contact Vast for assistance and best practices. 
### Workers diff --git a/documentation/serverless/comfyui-quickstart.mdx b/documentation/serverless/comfyui-quickstart.mdx new file mode 100644 index 0000000..e29d0d8 --- /dev/null +++ b/documentation/serverless/comfyui-quickstart.mdx @@ -0,0 +1,337 @@ +# ComfyUI Serverless Quickstart + +Get a ComfyUI image generation endpoint running on Vast.ai Serverless and call it from Python. This guide assumes you understand the basics of Vast Serverless (PyWorker, WorkerConfig, HandlerConfig). If not, start with the [Hello World guide](serverless-hello-world.md) first. + +## Setup + +Install the SDK and set your API key: + +```bash +pip install vastai-sdk +export VAST_API_KEY="<your-api-key>" +``` + +- **`<your-api-key>`** -- Your Vast.ai API key from [Account settings](https://cloud.vast.ai/account/). + +## Create the Endpoint + +1. Go to the [Serverless dashboard](https://cloud.vast.ai/serverless/). +2. Click **Create Endpoint**. +3. Select the **ComfyUI** serverless template. +4. Name it `my-comfy-endpoint`. +5. Click **Create**. + +The template includes a pre-configured PyWorker. 
For reference, here is what it looks like: + +```python +import random +import sys +from vastai import Worker, WorkerConfig, HandlerConfig, LogActionConfig, BenchmarkConfig + +benchmark_prompts = [ + "Cartoon hoodie hero; orc, anime cat, bunny; black goo; buff; vector on white.", + "Cozy farming-game scene with fine details.", + "Realistic futuristic downtown of low buildings at sunset.", + "Perfect wave front view; sunny seascape; ultra-detailed water; artful feel.", + "Medieval village inside glass sphere; volumetric light; macro focus.", +] + +benchmark_dataset = [ + { + "input": { + "request_id": f"test-{random.randint(1000, 99999)}", + "modifier": "Text2Image", + "modifications": { + "prompt": prompt, + "width": 512, + "height": 512, + "steps": 20, + "seed": random.randint(0, sys.maxsize) + } + } + } for prompt in benchmark_prompts +] + +worker_config = WorkerConfig( + model_server_url='http://127.0.0.1', + model_server_port=18288, + model_log_file='/var/log/portal/comfyui.log', + model_healthcheck_url="/health", + handlers=[ + HandlerConfig( + route="/generate/sync", + allow_parallel_requests=False, + max_queue_time=10.0, + benchmark_config=BenchmarkConfig(dataset=benchmark_dataset) + ) + ], + log_action_config=LogActionConfig( + on_load=["To see the GUI go to: "], + on_error=["MetadataIncompleteBuffer", "Value not in list: ", "[ERROR] Provisioning Script failed"], + on_info=['"message":"Downloading'] + ) +) + +Worker(worker_config).run() +``` + +You do not need to modify this. It is bundled with the template and starts automatically when a Workergroup is created. + +--- + +## Making Requests + +### Single Request + +Send one request and wait for the result. + +```python +import asyncio +from vastai import Serverless +import random + +ENDPOINT_NAME = "<ENDPOINT_NAME>" # e.g. "my-comfy-endpoint" + +async def main(): + async with Serverless() as client: + endpoint = await client.get_endpoint(name=ENDPOINT_NAME) + + payload = { + "input": { + "modifier": "<modifier>", # e.g. 
"Text2Image" + "modifications": { + "prompt": "<prompt>", # e.g. "A cat in a spacesuit on Mars." + "width": <width>, # e.g. 512 + "height": <height>, # e.g. 512 + "steps": <steps>, # e.g. 10 + "seed": <seed> # e.g. random.randint(1, 1000) + } + } + } + try: + result = await endpoint.request("/generate/sync", payload) + if result["ok"]: + print(result["response"]["output"][0]["local_path"]) + else: + print(f"Request failed. Status={result.get('status')}, Msg={result.get('text')}") + except Exception as ex: + print(f"Request failed with exception: {ex}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +- **`<ENDPOINT_NAME>`** -- The name you gave your endpoint in the dashboard (e.g., `"my-comfy-endpoint"`). Must match exactly. +- **`<modifier>`** -- The ComfyUI workflow to run. Use `"Text2Image"` for text-to-image generation. +- **`<prompt>`** -- The text description of the image you want to generate. +- **`<width>`, `<height>`** -- Output image dimensions in pixels. Common values: `512`, `768`, `1024`. +- **`<steps>`** -- Number of diffusion steps. Higher values (e.g., `20`--`50`) produce more detailed images but take longer. Lower values (e.g., `10`) are faster for previews. +- **`<seed>`** -- Random seed for reproducibility. Use a fixed integer to get the same image every time, or `random.randint(1, 1000)` for variety. + +### Continuous Load + +Fire requests continuously with callbacks. Useful for production services or load testing. + +```python +import asyncio +from vastai import Serverless, ServerlessRequest +import random + +ENDPOINT_NAME = "<ENDPOINT_NAME>" # e.g. "my-comfy-endpoint" +COST_PER_REQUEST = <COST_PER_REQUEST> # e.g. 100 +TARGET_LOAD = <TARGET_LOAD> # e.g. 300 + +async def main(): + async with Serverless() as client: + endpoint = await client.get_endpoint(name=ENDPOINT_NAME) + + payload = { + "input": { + "modifier": "<modifier>", # e.g. "Text2Image" + "modifications": { + "prompt": "<prompt>", # e.g. "A cat in a spacesuit on Mars." + "width": <width>, # e.g. 512 + "height": <height>, # e.g. 512 + "steps": <steps>, # e.g. 10 + "seed": <seed> # e.g. 
random.randint(1, 1000) + } + } + } + + responses = [] + + while True: + req = ServerlessRequest() + + def work_finished_callback(response): + if response.get("ok"): + print(f"{len([x for x in responses if x.status != 'Complete'])} in flight") + else: + print(f"Request failed in callback. Status={response.get('status')}") + + req.then(work_finished_callback) + responses.append( + endpoint.request( + route="/generate/sync", + payload=payload, + serverless_request=req, + cost=COST_PER_REQUEST + ) + ) + await asyncio.sleep(COST_PER_REQUEST / TARGET_LOAD) + +if __name__ == "__main__": + asyncio.run(main()) +``` + +- **`<ENDPOINT_NAME>`** -- The name you gave your endpoint in the dashboard. Must match exactly. +- **`<COST_PER_REQUEST>`** -- Numeric weight for load balancing and autoscaling. For constant-cost workloads like image generation, a fixed value (e.g., `100`) works well. +- **`<TARGET_LOAD>`** -- Controls the submission rate. The loop sleeps for `COST_PER_REQUEST / TARGET_LOAD` seconds between requests. With `COST_PER_REQUEST=100` and `TARGET_LOAD=300`, that is one request every ~0.33 seconds. +- **`<modifier>`** -- The ComfyUI workflow to run (e.g., `"Text2Image"`). +- **`<prompt>`** -- The text description of the image to generate. In production, replace with the prompt from each user request. +- **`<width>`, `<height>`** -- Output image dimensions in pixels. +- **`<steps>`** -- Number of diffusion steps. +- **`<seed>`** -- Random seed. Use `random.randint(1, 1000)` for unique images, or a fixed integer for reproducibility. + +### Session-Based Requests + +Sessions pin all requests to a single worker. Use this when: + +- **Your workflow is multi-step.** Generate a base image, then refine it, then upscale. Each step needs intermediate files from the previous step, which live on one machine. +- **The worker holds state between requests.** ComfyUI caches loaded LoRA weights, precomputed latents, and partial outputs. A different worker would not have that cache. 
+- **You need predictable latency.** A session reserves a worker so requests do not compete with other users for routing. + +Single request example: + +```python +import asyncio +from vastai import Serverless +import random + +ENDPOINT_NAME = "<ENDPOINT_NAME>" # e.g. "my-comfy-endpoint" +SESSION_COST = <SESSION_COST> # e.g. 100 +SESSION_LIFETIME = <SESSION_LIFETIME> # e.g. 30 (seconds) + +async def main(): + async with Serverless() as client: + endpoint = await client.get_endpoint(name=ENDPOINT_NAME) + session = await endpoint.session(cost=SESSION_COST, lifetime=SESSION_LIFETIME) + + payload = { + "input": { + "modifier": "<modifier>", # e.g. "Text2Image" + "modifications": { + "prompt": "<prompt>", # e.g. "A cat in a spacesuit on Mars." + "width": <width>, # e.g. 512 + "height": <height>, # e.g. 512 + "steps": <steps>, # e.g. 10 + "seed": <seed> # e.g. random.randint(1, 1000) + } + } + } + + try: + response = await session.request("/generate", payload) + if not response.get("ok"): + print(f"Request failed: {response.get('text')}") + else: + print("Request succeeded") + except Exception as ex: + print(f"Request failed: {ex}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +- **`<ENDPOINT_NAME>`** -- The name you gave your endpoint in the dashboard. Must match exactly. +- **`<SESSION_COST>`** -- Compute budget allocated to this session. +- **`<SESSION_LIFETIME>`** -- How long (in seconds) the session reserves a worker before automatically closing. Set this long enough to cover all planned requests. +- **`<modifier>`** -- The ComfyUI workflow to run (e.g., `"Text2Image"`). +- **`<prompt>`** -- The text description of the image to generate. +- **`<width>`, `<height>`** -- Output image dimensions in pixels. +- **`<steps>`** -- Number of diffusion steps. +- **`<seed>`** -- Random seed. + +Multi-step example (generate then refine on the same worker): + +```python +import asyncio +from vastai import Serverless +import random + +ENDPOINT_NAME = "<ENDPOINT_NAME>" # e.g. "my-comfy-endpoint" +SESSION_COST = <SESSION_COST> # e.g. 100 +SESSION_LIFETIME = <SESSION_LIFETIME> # e.g. 
120 (seconds) + +async def main(): + async with Serverless() as client: + endpoint = await client.get_endpoint(name=ENDPOINT_NAME) + session = await endpoint.session(cost=SESSION_COST, lifetime=SESSION_LIFETIME) + + # Step 1: Generate a base image + base_payload = { + "input": { + "modifier": "<modifier>", # e.g. "Text2Image" + "modifications": { + "prompt": "<base_prompt>", # e.g. "A robot exploring an ancient temple, cinematic lighting." + "width": <width>, # e.g. 512 + "height": <height>, # e.g. 512 + "steps": <steps>, # e.g. 20 + "seed": <seed> # e.g. 42 + } + } + } + + response = await session.request("/generate", base_payload) + if not response.get("ok"): + print(f"Base generation failed: {response.get('text')}") + return + print("Base image generated") + + # Step 2: Refine with a variation on the same worker + refined_payload = { + "input": { + "modifier": "<modifier>", + "modifications": { + "prompt": "<refined_prompt>", # e.g. "A robot exploring an ancient temple, cinematic lighting, mossy stone walls, volumetric fog." + "width": <width>, + "height": <height>, + "steps": <steps>, + "seed": <seed> + } + } + } + + response = await session.request("/generate", refined_payload) + if not response.get("ok"): + print(f"Refinement failed: {response.get('text')}") + return + print("Refined image generated") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +- **`<ENDPOINT_NAME>`** -- The name you gave your endpoint in the dashboard. Must match exactly. +- **`<SESSION_COST>`** -- Compute budget for this session. Should cover all planned requests. +- **`<SESSION_LIFETIME>`** -- For multi-step workflows, set this to cover the total expected time for all steps plus buffer (e.g., `120` for two generation steps). +- **`<modifier>`** -- The ComfyUI workflow to run. Use the same modifier for both steps if they use the same pipeline. +- **`<base_prompt>`** -- The text prompt for the initial image generation. +- **`<refined_prompt>`** -- The modified prompt for the refinement step. Typically the base prompt with added or adjusted details. +- **`<width>`, `<height>`** -- Output image dimensions in pixels. Use the same dimensions across steps for consistency. 
+- **`<steps>`** -- Number of diffusion steps. You can use different values per step if needed. +- **`<seed>`** -- Random seed. Using the same seed across steps produces more coherent iterations. + +Both requests are guaranteed to hit the same machine, so cached models and intermediate state from the first generation are available for the second. + +--- + +## Troubleshooting + +**"Endpoint not found"** -- The `name` in `get_endpoint()` must match the dashboard name exactly. + +**Requests timing out** -- Workers may still be provisioning. Check endpoint status in the dashboard. Workers must download the model and pass benchmarks before accepting requests. + +**"Request failed" with no status** -- Check that `VAST_API_KEY` is set. You can also pass it directly: `Serverless(api_key="your-key")`. + +**Slow first request** -- The first request to a cold endpoint triggers worker provisioning. Subsequent requests are faster once workers are warm. diff --git a/documentation/serverless/logging.mdx b/documentation/serverless/logging.mdx new file mode 100644 index 0000000..4c6d163 --- /dev/null +++ b/documentation/serverless/logging.mdx @@ -0,0 +1,65 @@ +--- +title: Endpoint and Worker Logs +description: Learn how to access Vast serverless logs +canonical: "/documentation/serverless/logging" +--- + 