From 8ecc14bca9c9ad07c09e106679c192c60080729e Mon Sep 17 00:00:00 2001 From: David O'Keeffe Date: Sun, 19 Apr 2026 22:01:24 +1000 Subject: [PATCH 1/5] feat(databricks-skills): add databricks-mlflow-ml skill for classic ML Fills the gap between databricks-mlflow-evaluation (GenAI agent eval) and databricks-model-serving (real-time endpoints). Covers: - Classic ML model training with MLflow tracking (sklearn / XGBoost / PyTorch) - Experiment creation with UC volume artifact_location (required in UC-enforced workspaces) - Unity Catalog model registration with three-level names - @champion / @challenger alias management - Batch inference via mlflow.pyfunc.load_model (notebook, up to ~10k rows) - Distributed batch via mlflow.pyfunc.spark_udf in Lakeflow SDP pipelines Structure mirrors databricks-mlflow-evaluation: - SKILL.md: workflows + trigger description + quick start - references/GOTCHAS.md: 12 common mistakes with symptoms + fixes - references/CRITICAL-interfaces.md: exact API signatures + models:/ URI format - references/patterns-experiment-setup.md: UC volume artifact_location setup - references/patterns-training.md: logging with signature + input_example - references/patterns-uc-registration.md: register + alias + verify + A/B - references/patterns-batch-inference.md: pyfunc.load_model + spark_udf + ai_query anti-pattern - references/user-journeys.md: 7 end-to-end workflows including debugging Key gotchas covered that other MLflow guides miss: - Experiment creation now requires UC volume artifact_location in UC-enforced workspaces (DBFS root writes are rejected) - mlflow.set_registry_uri('databricks-uc') is required; silent workspace registry fallback is the #1 support question - ai_query does NOT work on custom UC-registered models unless they're deployed to a serving endpoint; use pyfunc.load_model or spark_udf instead - UC aliases (@champion/@challenger) replace deprecated stage transitions (transition_model_version_stage is a no-op on UC models) - 
mlflow.pyfunc.spark_udf must be constructed at module scope in Lakeflow SDP pipelines, not inside the function body Tested against MLflow 2.16+ on Databricks Runtime 15.4 LTS. Content battle- tested in the Coles Vibe Workshop (classic-ML track running in an airgapped environment where online MLflow docs aren't reachable). --- .../databricks-mlflow-ml/SKILL.md | 125 +++++++++ .../references/CRITICAL-interfaces.md | 219 +++++++++++++++ .../references/GOTCHAS.md | 265 ++++++++++++++++++ .../references/patterns-batch-inference.md | 244 ++++++++++++++++ .../references/patterns-experiment-setup.md | 141 ++++++++++ .../references/patterns-training.md | 205 ++++++++++++++ .../references/patterns-uc-registration.md | 232 +++++++++++++++ .../references/user-journeys.md | 195 +++++++++++++ 8 files changed, 1626 insertions(+) create mode 100644 databricks-skills/databricks-mlflow-ml/SKILL.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-training.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/user-journeys.md diff --git a/databricks-skills/databricks-mlflow-ml/SKILL.md b/databricks-skills/databricks-mlflow-ml/SKILL.md new file mode 100644 index 00000000..43d4a2ed --- /dev/null +++ b/databricks-skills/databricks-mlflow-ml/SKILL.md @@ -0,0 +1,125 @@ +--- +name: databricks-mlflow-ml +description: "Classic ML model lifecycle on Databricks with MLflow and Unity Catalog. 
Use when training scikit-learn / XGBoost / PyTorch models with MLflow tracking, registering models to Unity Catalog (three-level names, @champion / @challenger aliases), setting mlflow.set_registry_uri('databricks-uc'), logging experiments with UC volume artifact_location, loading registered models via mlflow.pyfunc.load_model or mlflow.pyfunc.spark_udf, and running batch inference (notebook or Lakeflow SDP pipeline). Not for GenAI agent evaluation — use databricks-mlflow-evaluation for that. Not for Model Serving endpoints — use databricks-model-serving for that." +--- + +# MLflow + Unity Catalog — Classic ML + +## Before Writing Any Code + +1. **Read `GOTCHAS.md`** — 12 common mistakes that cause silent failures or wasted time +2. **Read `CRITICAL-interfaces.md`** — exact API signatures and the `models:/` URI format + +## End-to-End Workflows + +Follow the workflow that matches your goal. Each step indicates which reference files to read. + +### Workflow 1: Train → Register → Batch Score (most common) + +For building a production-shape classic ML model with UC-native lineage. Covers the full path from raw features to predictions in a downstream table. 
+ +| Step | Action | Reference Files | +|------|--------|-----------------| +| 1 | Create experiment with UC volume artifact_location | `patterns-experiment-setup.md` (Pattern 1) | +| 2 | Train model with signature + input_example | `patterns-training.md` (Patterns 1–3) | +| 3 | Register to Unity Catalog with three-level name | `patterns-uc-registration.md` (Patterns 1–2) | +| 4 | Set `@champion` alias | `patterns-uc-registration.md` (Pattern 3) | +| 5 | Verify registration (Navigator check) | `patterns-uc-registration.md` (Pattern 4) + `GOTCHAS.md` #5 | +| 6 | Load + score in notebook (Tier 1) | `patterns-batch-inference.md` (Patterns 1–2) | +| 7 | Optional: Lakeflow SDP batch via `spark_udf` | `patterns-batch-inference.md` (Patterns 3–4) | + +### Workflow 2: Retrain + Promote (A/B pattern) + +For adding a new version of an already-registered model and promoting it without touching downstream loader code. + +| Step | Action | Reference Files | +|------|--------|-----------------| +| 1 | Train new version, log to same UC model name | `patterns-training.md` (Pattern 4) | +| 2 | Register as new version | `patterns-uc-registration.md` (Pattern 2) | +| 3 | Set `@challenger` alias | `patterns-uc-registration.md` (Pattern 3) | +| 4 | Validate `@challenger` predictions vs `@champion` | `patterns-batch-inference.md` (Pattern 5) | +| 5 | Swap aliases (`@challenger` → `@champion`) | `patterns-uc-registration.md` (Pattern 5) | + +Downstream loader code that uses `models:/catalog.schema.model@champion` picks up the new version on next load — no code change needed. + +### Workflow 3: Debugging a Failed Registration or Load + +For the two most common support questions: "why did my model go to workspace registry?" and "why does pyfunc.load_model fail?" 
+
+| Step | Action | Reference Files |
+|------|--------|-----------------|
+| 1 | Verify registry URI is set to `databricks-uc` | `GOTCHAS.md` #1 |
+| 2 | Verify three-level name | `GOTCHAS.md` #2 |
+| 3 | Confirm model appears in Catalog Explorer | `patterns-uc-registration.md` (Pattern 4) |
+| 4 | Check `CREATE MODEL` permissions | `GOTCHAS.md` #7 |
+| 5 | Diagnose load failures | `GOTCHAS.md` #3, #8, #11 |
+
+## Quick Start
+
+The minimum viable path from untrained model to UC-registered, notebook-scored:
+
+```python
+import mlflow
+from mlflow.models import infer_signature
+from mlflow import MlflowClient
+
+# 1. Configure: UC registry + UC volume for artifacts (both required)
+mlflow.set_registry_uri("databricks-uc")
+EXPERIMENT = "/Users/me@company.com/forecasting"
+if mlflow.get_experiment_by_name(EXPERIMENT) is None:
+    # artifact_location is create-time-only: create_experiment, not set_experiment
+    mlflow.create_experiment(
+        EXPERIMENT,
+        artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting",
+    )
+mlflow.set_experiment(EXPERIMENT)
+
+# 2. Train + log
+with mlflow.start_run() as run:
+    model.fit(X_train, y_train)
+    signature = infer_signature(X_train, model.predict(X_train[:5]))
+    mlflow.sklearn.log_model(
+        sk_model=model,
+        artifact_path="model",
+        signature=signature,
+        input_example=X_train.iloc[:5],
+    )
+
+# 3. Register + alias
+MODEL_NAME = "my_catalog.my_schema.my_model"
+result = mlflow.register_model(f"runs:/{run.info.run_id}/model", MODEL_NAME)
+MlflowClient().set_registered_model_alias(MODEL_NAME, "champion", result.version)
+
+# 4. Load + predict (in any notebook, anywhere)
+model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion")
+predictions = model.predict(X_test)
+```
+
+## Why This Skill Exists
+
+Three skills in the AI Dev Kit touch MLflow; this one owns **classic ML training + UC registration + batch inference**. 
The distinction matters because the APIs diverged:
+
+| Skill | Scope | MLflow API Surface |
+|-------|-------|--------------------|
+| `databricks-mlflow-evaluation` | GenAI agent evaluation | `mlflow.genai.evaluate()`, scorers, judges, traces |
+| `databricks-model-serving` | Real-time serving endpoints | Deployment APIs, endpoint management, `ai_query` |
+| `databricks-mlflow-ml` *(this skill)* | Classic ML + UC registration + batch inference | `mlflow.sklearn.log_model`, `register_model`, `set_registered_model_alias`, `pyfunc.load_model`, `pyfunc.spark_udf` |
+
+If you're training a forecasting / classification / regression model, registering it to UC, and scoring it in a notebook or Lakeflow pipeline — this skill. If you're evaluating an LLM agent's output quality — evaluation skill. If you're exposing a model behind an HTTP endpoint — model-serving skill.
+
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **Model registered but not visible in Catalog Explorer** | Missing `mlflow.set_registry_uri("databricks-uc")`. See `GOTCHAS.md` #1. |
+| **`RestException: INVALID_PARAMETER_VALUE` on `register_model`** | Two-level name used. UC requires `catalog.schema.name`. See `GOTCHAS.md` #2. |
+| **Experiment creation fails with storage errors** | Missing `artifact_location` pointing at a UC volume. See `GOTCHAS.md` #4. |
+| **`PERMISSION_DENIED: CREATE MODEL`** | Pair/user needs `CREATE MODEL ON SCHEMA <schema>`. See `GOTCHAS.md` #7. |
+| **`pyfunc.load_model` returns but `predict()` fails** | Signature wasn't logged; inputs don't coerce. See `GOTCHAS.md` #8. |
+| **Agent proposes `ai_query` for batch inference** | Wrong primitive — that requires a serving endpoint. Use `pyfunc.load_model` or `spark_udf`. See `GOTCHAS.md` #9. 
| 
+
+## Reference Files
+
+- [`GOTCHAS.md`](references/GOTCHAS.md) — 12 common mistakes + fixes
+- [`CRITICAL-interfaces.md`](references/CRITICAL-interfaces.md) — API signatures + `models:/` URI format
+- [`patterns-experiment-setup.md`](references/patterns-experiment-setup.md) — experiment creation with UC volume artifact_location
+- [`patterns-training.md`](references/patterns-training.md) — logging models with signature + input_example + autologging
+- [`patterns-uc-registration.md`](references/patterns-uc-registration.md) — register + alias + verify + A/B promotion
+- [`patterns-batch-inference.md`](references/patterns-batch-inference.md) — notebook (`pyfunc.load_model`) + Lakeflow (`spark_udf`) + champion-vs-challenger
+- [`user-journeys.md`](references/user-journeys.md) — end-to-end workflows with decision points
diff --git a/databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md b/databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md
new file mode 100644
index 00000000..a40483c4
--- /dev/null
+++ b/databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md
@@ -0,0 +1,219 @@
+# CRITICAL-interfaces — Exact API signatures
+
+The minimum set of APIs that every classic-ML + UC workflow touches. Copy-pasteable, with the exact arguments that matter.
+
+---
+
+## Registry URI configuration
+
+```python
+mlflow.set_registry_uri("databricks-uc")  # Call at the start of every session
+mlflow.get_registry_uri()                 # Returns "databricks-uc" if set correctly
+```
+
+**Must be called BEFORE** any `register_model` or `load_model` call. Idempotent — safe to repeat.
+
+---
+
+## Experiment creation with UC volume artifact_location
+
+```python
+# artifact_location can only be set at creation time — use create_experiment();
+# set_experiment() cannot set or change it
+mlflow.create_experiment(
+    name="/Users/<you>/<experiment_name>",
+    artifact_location="dbfs:/Volumes/<catalog>/<schema>/<volume>/<experiment_name>",
+)
+mlflow.set_experiment("/Users/<you>/<experiment_name>")
+```
+
+**`artifact_location` is required** for UC-enforced workspaces. 
The volume must exist:
+
+```sql
+CREATE VOLUME IF NOT EXISTS <catalog>.<schema>.<volume>;
+```
+
+---
+
+## `models:/` URI format
+
+All load / deploy / spark_udf calls use this URI. **One format to memorize:**
+
+```
+models:/<catalog>.<schema>.<model>@<alias>
+```
+
+Examples:
+```
+models:/my_catalog.my_schema.grocery_forecaster@champion
+models:/my_catalog.my_schema.grocery_forecaster@challenger
+```
+
+**Avoid** these forms (either legacy, or not-UC-native):
+```
+models:/grocery_forecaster/3            # workspace registry, version number
+models:/my_schema.grocery_forecaster/3  # invalid in UC
+```
+
+---
+
+## Model logging (sklearn-flavored)
+
+```python
+mlflow.sklearn.log_model(
+    sk_model=<fitted_model>,
+    artifact_path="model",                   # convention — keep as "model"
+    signature=<signature>,                   # REQUIRED — use infer_signature()
+    input_example=<five_real_rows>,          # REQUIRED — 5 real rows
+    registered_model_name=None,              # leave None; register separately (cleaner)
+    code_paths=<optional_list>,
+    extra_pip_requirements=<optional_list>,  # only if custom deps beyond environment
+)
+```
+
+**Signature inference:**
+```python
+from mlflow.models import infer_signature
+signature = infer_signature(X_train, model.predict(X_train[:5]))
+```
+
+**Other flavors with identical signature:**
+- `mlflow.xgboost.log_model(xgb_model=..., ...)`
+- `mlflow.pytorch.log_model(pytorch_model=..., ...)`
+- `mlflow.tensorflow.log_model(model=..., ...)`
+- `mlflow.pyfunc.log_model(python_model=..., artifact_path=..., ...)` — for custom PythonModel wrappers
+
+---
+
+## Explicit registration
+
+```python
+result = mlflow.register_model(
+    model_uri=f"runs:/{run_id}/model",  # "runs:/<run_id>/<artifact_path>"
+    name="<catalog>.<schema>.<model>",  # three-level, not optional
+    tags=<optional_dict>,
+)
+# result.name: str — fully qualified name
+# result.version: str — newly-created version (e.g., "1", "2")
+```
+
+---
+
+## Alias management
+
+```python
+from mlflow import MlflowClient
+client = MlflowClient()
+
+# Set (creates if missing, moves if exists)
+client.set_registered_model_alias(
+    name="<catalog>.<schema>.<model>",
+    alias="champion",     # or "challenger", or custom
+    version="<version>",  # accepts str or int
+)
+
+# 
Get current alias mapping
+model = client.get_registered_model("<catalog>.<schema>.<model>")
+print(model.aliases)  # {"champion": "3", "challenger": "4"}
+
+# Delete
+client.delete_registered_model_alias(
+    name="<catalog>.<schema>.<model>",
+    alias="challenger",
+)
+```
+
+---
+
+## Loading — notebook / single-node
+
+```python
+model = mlflow.pyfunc.load_model(
+    model_uri="models:/<catalog>.<schema>.<model>@champion",
+)
+
+# Predict on a pandas DataFrame matching the signature
+predictions = model.predict(features_df)
+```
+
+**Returns:** `mlflow.pyfunc.PyFuncModel`, regardless of the original flavor. Inspect `.metadata.signature` for the schema.
+
+---
+
+## Loading — distributed / Lakeflow SDP
+
+```python
+predict_udf = mlflow.pyfunc.spark_udf(
+    spark,
+    model_uri="models:/<catalog>.<schema>.<model>@champion",
+    result_type="double",  # or e.g. "array<double>" for multi-output
+    env_manager="local",   # "local" | "virtualenv" | "conda"
+)
+
+# Apply to a Spark DataFrame
+df_with_predictions = df.withColumn(
+    "prediction",
+    predict_udf("feature_a", "feature_b", "feature_c"),
+)
+```
+
+**Construct ONCE at module scope** in Lakeflow pipelines. See `GOTCHAS.md` #11.
+
+---
+
+## Model introspection
+
+```python
+from mlflow.models import get_model_info
+
+info = get_model_info("models:/<catalog>.<schema>.<model>@champion")
+info.signature         # ModelSignature with inputs/outputs
+info.flavors           # {"sklearn": {...}, "python_function": {...}}
+info.utc_time_created
+info.model_uuid
+```
+
+Useful when debugging load-vs-predict mismatches.
+
+---
+
+## Run + experiment queries (introspection)
+
+```python
+runs = mlflow.search_runs(
+    experiment_names=["/Users/me@company.com/forecasting"],
+    filter_string="metrics.r2 > 0.8",
+    order_by=["metrics.r2 DESC"],
+    max_results=5,
+)
+# Returns a pandas DataFrame with run_id, metrics, params, etc.
+
+best_run_id = runs.iloc[0]["run_id"]
+```
+
+---
+
+## SQL introspection (UC-native)
+
+```sql
+-- Does the model exist and which aliases are set? 
+DESCRIBE MODEL <catalog>.<schema>.<model>;
+
+-- List all model versions
+SHOW MODEL VERSIONS ON MODEL <catalog>.<schema>.<model>;
+
+-- Check grants
+SHOW GRANTS ON MODEL <catalog>.<schema>.<model>;
+SHOW GRANTS ON SCHEMA <catalog>.<schema>;
+```
+
+---
+
+## What's NOT in this skill
+
+If you see these in code, you're likely in the wrong skill:
+
+| API | Belongs in |
+|-----|------------|
+| `mlflow.genai.evaluate(...)` | `databricks-mlflow-evaluation` |
+| `@scorer` decorator, `GuidelinesJudge`, etc. | `databricks-mlflow-evaluation` |
+| `databricks.sdk.service.serving.EndpointCoreConfigInput` | `databricks-model-serving` |
+| `ai_query('<endpoint>', ...)` | Wrong pattern — use `pyfunc.load_model` or `spark_udf` instead (see `GOTCHAS.md` #9) |
+| `transition_model_version_stage(...)` | Deprecated — use aliases (see `GOTCHAS.md` #6) |
diff --git a/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md b/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md
new file mode 100644
index 00000000..92615de4
--- /dev/null
+++ b/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md
@@ -0,0 +1,265 @@
+# GOTCHAS — Classic ML on MLflow + Unity Catalog
+
+Twelve mistakes that silently waste hours. Read before writing any code.
+
+---
+
+## 1. Missing `mlflow.set_registry_uri("databricks-uc")` → workspace registry
+
+**Symptom:** `register_model` succeeds, but the model doesn't appear in Catalog Explorer. It's in the legacy **workspace registry** (visible under the MLflow icon in the left nav), not Unity Catalog.
+
+**Fix:**
+```python
+import mlflow
+mlflow.set_registry_uri("databricks-uc")  # MUST come before register_model / load_model
+```
+
+**Verification:**
+```python
+assert mlflow.get_registry_uri() == "databricks-uc"
+```
+
+**Why it bites:** defaults still route to the workspace registry for backward compatibility. The only indicator you missed it is a URL that shows `/ml/models/` instead of `/explore/data/models/<catalog>/<schema>/<model>/`.
+
+---
+
+## 2. 
Two-level model names → rejected or wrong registry + +**Symptom:** `RestException: INVALID_PARAMETER_VALUE: Invalid model name`, or the model registers to the workspace registry silently. + +**Fix:** always use three-level names: `catalog.schema.model_name`. + +```python +# WRONG +mlflow.register_model(model_uri, "my_model") +mlflow.register_model(model_uri, "my_schema.my_model") + +# CORRECT +mlflow.register_model(model_uri, "my_catalog.my_schema.my_model") +``` + +**Why it bites:** the error message depends on the registry URI. With UC URI + two-level name → parameter error. With workspace URI + two-level name → registers successfully to workspace (the silently-wrong case). + +--- + +## 3. Loading with version number instead of alias + +**Symptom:** works today, breaks tomorrow when someone registers a new version. You've hard-coded a version number into every downstream consumer. + +**Fix:** load via alias, never version. + +```python +# FRAGILE — every retrain requires updating every loader +model = mlflow.pyfunc.load_model("models:/my_catalog.my_schema.my_model/3") + +# STABLE — promote a new version by moving @champion; no loader changes +model = mlflow.pyfunc.load_model("models:/my_catalog.my_schema.my_model@champion") +``` + +**Why it bites:** aliases are the UC-native way to decouple loader code from model lifecycle. Version numbers are legacy. New infrastructure (Lakeflow, Genie) assumes alias-based loading. + +--- + +## 4. Experiment creation without UC volume `artifact_location` + +**Symptom:** experiment creates, but any `log_model` call fails with storage / permission errors. Or artifacts land in DBFS root (deprecated) and can't be loaded downstream. + +**Fix:** when you create the experiment, pin it to a UC volume. 
+
+```python
+# Prerequisite: the UC volume must exist
+# CREATE VOLUME my_catalog.my_schema.mlflow_artifacts;
+
+EXPERIMENT = "/Users/me@company.com/forecasting"
+if mlflow.get_experiment_by_name(EXPERIMENT) is None:
+    # artifact_location is create-time-only — set_experiment() cannot set or change it
+    mlflow.create_experiment(
+        EXPERIMENT,
+        artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting",
+    )
+mlflow.set_experiment(EXPERIMENT)
+```
+
+**Why it bites:** the default `artifact_location` used to be DBFS root. Unity-Catalog-enforced workspaces reject DBFS root writes, so `log_model` fails with opaque errors. Pointing at a UC volume makes artifact storage first-class-governed and keeps lineage intact.
+
+**When the experiment already exists without a UC volume:** you can't retroactively change `artifact_location`. Either (a) delete + recreate, or (b) create a new experiment. Don't try to relocate artifacts manually.
+
+---
+
+## 5. Trusting `register_model` success without verifying in UC
+
+**Symptom:** `register_model` returns a `ModelVersion` object. Feels successful. But the model is in workspace registry, or the version number is stale, or an alias wasn't set.
+
+**Fix:** always verify explicitly.
+
+```sql
+-- In a SQL cell or notebook:
+DESCRIBE MODEL my_catalog.my_schema.my_model;
+```
+
+Or via Python:
+```python
+from mlflow import MlflowClient
+model = MlflowClient().get_registered_model("my_catalog.my_schema.my_model")
+assert "champion" in model.aliases, "Missing @champion alias"
+```
+
+Or visually: open Catalog Explorer → `my_catalog` → `my_schema` → **Models** tab. If the model is under MLflow's workspace UI instead, you registered to the wrong place (see #1).
+
+**Why it bites:** `register_model`'s return value only tells you a version was created. It doesn't tell you *where* or *with what aliases*. The Navigator's V-step in pair programming: verify before trusting.
+
+---
+
+## 6. Setting the alias to `"production"` or `"staging"` (legacy MLflow stages)
+
+**Symptom:** you remember MLflow had `stage="Production"` / `"Staging"` transitions. You try the same with aliases and nothing recognizes them. 
+ +**Fix:** UC model aliases are free-form labels. The conventions are `@champion` (current winner) and `@challenger` (under evaluation). MLflow stages are deprecated in the UC registry. + +```python +# WRONG (legacy stage concept) +MlflowClient().set_registered_model_alias(name, "Production", version) + +# CORRECT +MlflowClient().set_registered_model_alias(name, "champion", version) +``` + +**Why it bites:** the old `transition_model_version_stage()` API still exists but is a no-op on UC-registered models. No error, no effect. + +--- + +## 7. Missing `CREATE MODEL ON SCHEMA` permission + +**Symptom:** `RestException: PERMISSION_DENIED: User ... does not have CREATE MODEL permission`. + +**Fix:** grant the permission at the schema level. + +```sql +GRANT CREATE MODEL ON SCHEMA my_catalog.my_schema TO `user@company.com`; +-- Or for a group: +GRANT CREATE MODEL ON SCHEMA my_catalog.my_schema TO `data-science-team`; +``` + +**Why it bites:** workspace admins often assume `USE SCHEMA` covers model registration. It doesn't — `CREATE MODEL` is a separate UC privilege that must be granted explicitly. + +**Verification:** +```sql +SHOW GRANTS ON SCHEMA my_catalog.my_schema; +``` + +--- + +## 8. Logging a model without `signature` or `input_example` + +**Symptom:** `mlflow.pyfunc.load_model(...)` returns an object, but `.predict(spark_df)` raises cryptic coercion errors. Or predictions silently cast (int → float, string → category) and produce wrong numbers. + +**Fix:** always log both. 
+ +```python +from mlflow.models import infer_signature + +signature = infer_signature(X_train, model.predict(X_train[:5])) +mlflow.sklearn.log_model( + sk_model=model, + artifact_path="model", + signature=signature, + input_example=X_train.iloc[:5], # 5 real rows for the pyfunc wrapper to introspect +) +``` + +**Why it bites:** without a signature, the pyfunc wrapper can't coerce inputs — it accepts whatever you pass, then downstream operations (especially `spark_udf`) fail or produce wrong results. `input_example` is what `pyfunc.load_model` reads to build the wrapper's input coercer. + +--- + +## 9. `ai_query` used for batch inference on a custom UC model + +**Symptom:** you want batch inference on your custom-registered model. You see `ai_query()` in Genie docs and assume it works. It doesn't (for custom models) — `ai_query` only invokes **serving endpoints**, and your UC-registered model isn't behind one unless you deployed a serving endpoint for it. + +**Fix:** for batch inference, use `pyfunc.load_model` (notebook) or `pyfunc.spark_udf` (Lakeflow SDP pipeline). + +```python +# WRONG for custom UC models — requires a serving endpoint +spark.sql(f"SELECT ai_query('{MODEL_NAME}', features) FROM silver_features") + +# CORRECT — notebook batch (single node) +model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion") +predictions = model.predict(features_pandas_df) + +# CORRECT — Lakeflow SDP batch (distributed) +predict_udf = mlflow.pyfunc.spark_udf(spark, f"models:/{MODEL_NAME}@champion", result_type="double") +silver_features.withColumn("prediction", predict_udf(*feature_cols)) +``` + +**Why it bites:** `ai_query` *is* the right call for Foundation Model API endpoints (`ai_query('databricks-dbrx-instruct', prompt)`). The naming overlap leads to wrong assumptions for custom models. + +--- + +## 10. Trying to delete / re-register a model at the same version number + +**Symptom:** `RestException: ALREADY_EXISTS` when re-registering. 
You can't reuse version numbers. + +**Fix:** UC versions are monotonically-increasing and immutable. To supersede a bad version, register a new version and move `@champion` to it. The old version stays in history for lineage. + +```python +new_result = mlflow.register_model(new_run_uri, MODEL_NAME) +MlflowClient().set_registered_model_alias(MODEL_NAME, "champion", new_result.version) +# Old version is still there; that's correct. Lineage preserved. +``` + +**Why it bites:** habits from the workspace registry (where deletion was forgiving) don't transfer. UC treats model versions as first-class auditable artifacts. + +--- + +## 11. `pyfunc.spark_udf` constructed inside a function call + +**Symptom:** in a Lakeflow SDP `@dp.materialized_view`, the UDF is constructed every time the view evaluates — slow and sometimes fails with serialization errors. + +**Fix:** construct the UDF at module scope, reuse it inside the view. + +```python +import mlflow +import databricks.declarative_pipelines as dp + +# Construct ONCE, at module scope +mlflow.set_registry_uri("databricks-uc") +predict_udf = mlflow.pyfunc.spark_udf( + spark, + f"models:/{MODEL_NAME}@champion", + result_type="double", +) + +@dp.materialized_view +def gold_forecast(): + return spark.read.table("silver_features").withColumn( + "prediction", + predict_udf("feat_a", "feat_b", "feat_c"), + ) +``` + +**Why it bites:** Lakeflow SDP may evaluate the function definition multiple times. Model deserialization is expensive — don't repeat it. + +--- + +## 12. Custom preprocessing not captured in the logged model + +**Symptom:** in the training notebook, predictions are accurate. After `pyfunc.load_model(...)`, predictions are garbage. The pipeline works in training because you're calling `scaler.transform()` manually; at inference time, nobody calls the scaler. + +**Fix:** wrap preprocessing + model in an `sklearn.pipeline.Pipeline` (or a custom `PythonModel` for non-sklearn preprocessing). Log the whole pipeline. 
+
+```python
+import mlflow
+from mlflow.models import infer_signature
+from sklearn.pipeline import Pipeline
+from sklearn.preprocessing import StandardScaler
+from sklearn.ensemble import GradientBoostingRegressor
+
+pipeline = Pipeline([
+    ("scaler", StandardScaler()),
+    ("model", GradientBoostingRegressor()),
+])
+pipeline.fit(X_train, y_train)
+
+# Logs both the fitted scaler AND the model as a single artifact
+mlflow.sklearn.log_model(
+    sk_model=pipeline,
+    artifact_path="model",
+    signature=infer_signature(X_train, pipeline.predict(X_train[:5])),
+    input_example=X_train.iloc[:5],
+)
+```
+
+**Why it bites:** the most painful post-registration bug. Training and inference code paths are different files; the divergence is invisible until predictions are obviously wrong.
diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md b/databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md
new file mode 100644
index 00000000..ed4d86ae
--- /dev/null
+++ b/databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md
@@ -0,0 +1,244 @@
+# patterns-batch-inference
+
+Loading a UC-registered model and scoring features in batch. Two scales — interactive notebook (Patterns 1–2) and distributed Lakeflow pipeline (Patterns 3–4). Plus A/B validation (Pattern 5) and streaming (Pattern 6).
+
+---
+
+## Pattern 1: Notebook batch inference — pandas path
+
+For interactive exploration, ad-hoc scoring, and sample sizes up to ~10k rows. 
+ +```python +import mlflow + +mlflow.set_registry_uri("databricks-uc") + +model = mlflow.pyfunc.load_model( + "models:/my_catalog.my_schema.grocery_forecaster@champion" +) + +# Load a sample of features (LIMIT in SQL to avoid loading full table) +features = ( + spark.table("my_catalog.my_schema.silver_features") + .orderBy("month_date") + .limit(1000) + .toPandas() +) + +# The model's signature determines which columns it expects +feature_cols = model.metadata.get_input_schema().input_names() + +predictions = model.predict(features[feature_cols]) + +# Attach predictions for display/export +features["prediction"] = predictions +display(spark.createDataFrame(features)) +``` + +--- + +## Pattern 2: Notebook batch inference with chart + +Same pattern, adds a predicted-vs-actual visual. Useful as a demo artifact. + +```python +import matplotlib.pyplot as plt + +# (continuing from Pattern 1) +features_with_pred = features.sort_values("month_date") + +fig, ax = plt.subplots(figsize=(10, 5)) +ax.plot(features_with_pred["month_date"], features_with_pred["actual"], + label="Actual", linewidth=2) +ax.plot(features_with_pred["month_date"], features_with_pred["prediction"], + label="Predicted", linestyle="--", linewidth=2) +ax.set_xlabel("Month") +ax.set_ylabel("Turnover (millions)") +ax.set_title(f"Forecast — {model.metadata.run_id[:8]}") +ax.legend() +plt.xticks(rotation=45) +plt.tight_layout() +display(fig) +``` + +--- + +## Pattern 3: Lakeflow SDP batch via `spark_udf` + +For scheduled batch inference at scale. Distributes across Spark executors — no per-row Python overhead, no serving endpoint. 
+ +```python +# src/gold/gold_forecast.py +import mlflow +import databricks.declarative_pipelines as dp + +# Construct the UDF ONCE at module scope — see GOTCHAS #11 +mlflow.set_registry_uri("databricks-uc") + +MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" +predict_udf = mlflow.pyfunc.spark_udf( + spark, + model_uri=f"models:/{MODEL_NAME}@champion", + result_type="double", + env_manager="local", # "local" avoids conda/virtualenv setup overhead +) + +@dp.materialized_view( + comment="Grocery turnover forecast from @champion model", +) +def gold_forecast(): + return ( + spark.read.table("my_catalog.my_schema.silver_features") + .withColumn( + "forecast_turnover_millions", + predict_udf( + "turnover_lag_1", + "turnover_lag_12", + "rolling_3m_avg", + "state_share_of_national", + # ... pass each signature input column in the order the signature declares + ), + ) + ) +``` + +**What this gives you:** +- A `gold_forecast` table that refreshes on every pipeline run +- Distributed scoring (no serving endpoint, no auth token) +- Full UC lineage: `silver_features` → `gold_forecast` via `grocery_forecaster@champion` +- Genie can query it: *"what's the forecast for each state next month?"* + +--- + +## Pattern 4: `spark_udf` with `result_type` for multi-output models + +Multi-output regressors or classifiers need a richer result type. 
+
+```python
+from pyspark.sql.types import ArrayType, DoubleType, StringType, StructType, StructField
+
+# Multi-output regression — model returns 2 predictions per row
+predict_udf = mlflow.pyfunc.spark_udf(
+    spark,
+    model_uri=f"models:/{MODEL_NAME}@champion",
+    result_type=ArrayType(DoubleType()),
+)
+
+# Classifier with probabilities
+predict_udf = mlflow.pyfunc.spark_udf(
+    spark,
+    model_uri=f"models:/{MODEL_NAME}@champion",
+    result_type=StructType([
+        StructField("class", StringType(), True),
+        StructField("confidence", DoubleType(), True),
+    ]),
+)
+```
+
+---
+
+## Pattern 5: A/B validation — compare `@challenger` vs `@champion`
+
+Run both models on a validation set, compare error metrics, decide whether to promote.
+
+```python
+import mlflow
+from sklearn.metrics import mean_absolute_error, root_mean_squared_error
+
+mlflow.set_registry_uri("databricks-uc")
+MODEL_NAME = "my_catalog.my_schema.grocery_forecaster"
+
+champion = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion")
+challenger = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@challenger")
+
+# Hold-out validation set (not seen during training)
+validation = spark.table(f"{MODEL_NAME.rsplit('.', 1)[0]}.validation_features").toPandas()
+feature_cols = champion.metadata.get_input_schema().input_names()
+actuals = validation["turnover_millions"]
+
+champion_preds = champion.predict(validation[feature_cols])
+challenger_preds = challenger.predict(validation[feature_cols])
+
+# Compute each RMSE once; reuse in the report and the decision below
+champion_rmse = root_mean_squared_error(actuals, champion_preds)
+challenger_rmse = root_mean_squared_error(actuals, challenger_preds)
+print(f"Champion RMSE:   {champion_rmse:.2f}")
+print(f"Challenger RMSE: {challenger_rmse:.2f}")
+print(f"Champion MAE:    {mean_absolute_error(actuals, champion_preds):.2f}")
+print(f"Challenger MAE:  {mean_absolute_error(actuals, challenger_preds):.2f}")
+
+# Decision logic — promote if challenger beats champion by >2%
+if challenger_rmse < champion_rmse * 0.98:
+    print("→ Promote @challenger. 
See patterns-uc-registration.md Pattern 5.") +else: + print("→ Keep @champion. Delete @challenger.") +``` + +--- + +## Pattern 6: Structured streaming inference + +For models scoring events as they arrive (not batch-scheduled). + +```python +from pyspark.sql.functions import col + +predict_udf = mlflow.pyfunc.spark_udf( + spark, + model_uri=f"models:/{MODEL_NAME}@champion", + result_type="double", +) + +events = ( + spark.readStream + .format("delta") + .table("my_catalog.my_schema.silver_events") +) + +scored = events.withColumn( + "prediction", + predict_udf(*[col(c) for c in feature_cols]), +) + +( + scored.writeStream + .format("delta") + .outputMode("append") + .option("checkpointLocation", "dbfs:/Volumes/my_catalog/my_schema/checkpoints/scoring") + .toTable("my_catalog.my_schema.gold_scored_events") +) +``` + +For most classic-ML batch use cases, Pattern 3 (Lakeflow SDP) is simpler. Use streaming only when event-time scoring matters. + +--- + +## What NOT to do for batch inference + +### Do not use `ai_query` for custom UC models + +`ai_query('', )` requires the model to be deployed as a **Model Serving endpoint**. UC-registered models are NOT automatically behind an endpoint. Use `pyfunc.load_model` (Pattern 1) or `pyfunc.spark_udf` (Pattern 3) instead. + +`ai_query` IS the right call for: +- Foundation Model API endpoints: `ai_query('databricks-dbrx-instruct', prompt)` +- Model Serving endpoints you've explicitly provisioned + +See `GOTCHAS.md` #9. + +### Do not use `mlflow.pyfunc.load_model` for billion-row batches on a single node + +Pattern 1 collects to pandas — fine up to ~10k rows, painful beyond ~100k, impossible for millions. For distributed scale, use Pattern 3 (`spark_udf`). + +### Do not construct `spark_udf` inside the function body + +See `GOTCHAS.md` #11. Construct once at module scope, reuse inside `@dp.materialized_view` / `@dp.table`. 
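The row-count guidance above can be condensed into a rule-of-thumb picker. A minimal sketch (the thresholds are this guide's heuristics, not hard MLflow limits, and the helper name is ours):

```python
def pick_batch_strategy(row_count: int) -> str:
    """Map expected batch size to the loading pattern described above.

    Thresholds are heuristics from this guide, not hard limits.
    """
    if row_count <= 10_000:
        return "pyfunc.load_model"   # Pattern 1: single-node pandas scoring
    return "pyfunc.spark_udf"        # Pattern 3: distributed scoring (Lakeflow SDP)


print(pick_batch_strategy(5_000))       # small notebook batch
print(pick_batch_strategy(50_000_000))  # nightly full-table scoring
```

The only input is the expected batch size; everything else (model URI, alias) is identical between the two patterns, which is what makes swapping them cheap later.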
+
+---
+
+## Troubleshooting batch inference
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `RESOURCE_DOES_NOT_EXIST` on load | Wrong registry URI or two-level name | `GOTCHAS.md` #1, #2 |
+| Predictions are NaN | Input columns in wrong order | Pass columns in the order `model.metadata.get_input_schema().input_names()` declares |
+| `PERMISSION_DENIED: EXECUTE ON MODEL` | No read access to model | `GRANT EXECUTE ON MODEL ... TO <principal>` |
+| `spark_udf` raises `PicklingError` | Model has un-picklable state (e.g., Spark session) | Re-train ensuring the model is pure Python/numpy — don't capture `spark` at training time |
+| Pipeline hangs on `gold_forecast` | Model artifact is large; first load is slow | Normal — subsequent runs are fast (UDF is cached per executor) |
+| Column type mismatch in Spark | UDF expects double; column is int/string | Cast explicitly: `col("feature").cast("double")` |
diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md b/databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md
new file mode 100644
index 00000000..00c6e2ba
--- /dev/null
+++ b/databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md
@@ -0,0 +1,141 @@
+# patterns-experiment-setup
+
+Experiments in UC-enforced workspaces need more setup than older MLflow guides show. The critical change: you must pin the experiment's `artifact_location` to a Unity Catalog volume, or `log_model` will fail with storage errors.
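Every example in this file follows the same `dbfs:/Volumes/<catalog>/<schema>/<volume>/<experiment-leaf>` layout. A tiny helper keeps that convention consistent across teams; the naming scheme is this guide's convention, not an MLflow requirement, and the function is illustrative:

```python
def uc_artifact_location(catalog: str, schema: str, experiment_path: str,
                         volume: str = "mlflow_artifacts") -> str:
    """Build a per-experiment artifact folder under a UC volume (convention only)."""
    leaf = experiment_path.rstrip("/").rsplit("/", 1)[-1]
    return f"dbfs:/Volumes/{catalog}/{schema}/{volume}/{leaf}"


print(uc_artifact_location("my_catalog", "my_schema", "/Users/me@company.com/forecasting"))
# → dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting
```

Deriving the leaf from the experiment path means two experiments never share an artifact folder, which keeps cleanup and UC lineage per-experiment.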
+
+---
+
+## Pattern 1: Create experiment with UC volume artifact_location
+
+```python
+import mlflow
+
+mlflow.set_registry_uri("databricks-uc")  # always first
+
+# Prerequisite: the UC volume must exist
+# CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts;
+
+# artifact_location is settable only at creation time; mlflow.set_experiment()
+# does not accept it. Create first, then select:
+EXPERIMENT = "/Users/me@company.com/forecasting"
+if mlflow.get_experiment_by_name(EXPERIMENT) is None:
+    mlflow.create_experiment(
+        name=EXPERIMENT,
+        artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting",
+    )
+mlflow.set_experiment(experiment_name=EXPERIMENT)
+```
+
+**Why both are required:**
+- `experiment_name` — the workspace-visible path (browsable from the Experiments UI)
+- `artifact_location` — where logged artifacts (model binaries, plots, datasets) physically live
+
+In older workspaces, `artifact_location` defaulted to DBFS root. UC-enforced workspaces reject DBFS root writes, so `log_model` fails with opaque errors like:
+
+```
+MlflowException: API request to endpoint /api/2.0/mlflow/runs/log-artifact failed
+with error code 403 != 200. Response body: PERMISSION_DENIED ...
+```
+
+Pointing at a UC volume resolves this AND makes artifacts first-class-governed under UC lineage.
+
+---
+
+## Pattern 2: Create the volume if it doesn't exist (idempotent)
+
+Run once per schema, before any experiment creation:
+
+```python
+spark.sql("""
+    CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts
+    COMMENT 'MLflow experiment artifacts for forecasting models'
+""")
+```
+
+Or via SQL editor:
+
+```sql
+CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts;
+```
+
+**Permissions needed:** `USE SCHEMA` + `CREATE VOLUME`. If missing, request `CREATE VOLUME ON SCHEMA my_catalog.my_schema` from the schema owner.
+
+---
+
+## Pattern 3: Experiment already exists, wrong `artifact_location`
+
+You can't retroactively change `artifact_location`. Three options, in order of preference:
+
+**Option A — New experiment** (cleanest, keeps old runs intact):
+```python
+NEW_EXPERIMENT = "/Users/me@company.com/forecasting_v2"  # v2 suffix
+if mlflow.get_experiment_by_name(NEW_EXPERIMENT) is None:
+    mlflow.create_experiment(
+        name=NEW_EXPERIMENT,
+        artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting_v2",
+    )
+mlflow.set_experiment(experiment_name=NEW_EXPERIMENT)
+# New runs land in v2. Old runs stay in v1 (archive them if you like).
+```
+
+**Option B — Delete + recreate** (loses history; use only if no good runs exist):
+```python
+from mlflow import MlflowClient
+client = MlflowClient()
+
+exp = client.get_experiment_by_name("/Users/me@company.com/forecasting")
+client.delete_experiment(exp.experiment_id)
+
+# NOTE: recreating under the same name fails while the deleted experiment is
+# still in the trash; permanently purge it first (UI or API) if so.
+mlflow.create_experiment(
+    name="/Users/me@company.com/forecasting",
+    artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting",
+)
+mlflow.set_experiment(experiment_name="/Users/me@company.com/forecasting")
+```
+
+**Option C — Manual relocation of DBFS artifacts to UC volume**: do not do this. Storage paths are resolved at log time and encoded in the run's metadata; moving files doesn't update the pointers.
+
+---
+
+## Pattern 4: Verify experiment is correctly configured
+
+After setup, before training:
+
+```python
+exp = mlflow.get_experiment_by_name("/Users/me@company.com/forecasting")
+assert exp is not None, "Experiment not created"
+assert exp.artifact_location.startswith("dbfs:/Volumes/"), (
+    f"artifact_location is not a UC volume: {exp.artifact_location}"
+)
+print(f"Experiment ID: {exp.experiment_id}")
+print(f"Artifact location: {exp.artifact_location}")
+```
+
+If the assert fails, you have an old experiment pointing at DBFS root. Apply Pattern 3.
+
+---
+
+## Pattern 5: Workspace-path vs Repo-path experiments
+
+MLflow accepts two conventions for `experiment_name`:
+
+```python
+# Workspace-path convention (recommended for collaborative experiments)
+mlflow.set_experiment(experiment_name="/Users/me@company.com/forecasting")
+
+# Repo-path convention (only if you're running from a Git folder)
+mlflow.set_experiment(experiment_name="/Repos/me@company.com/my-repo/forecasting")
+```
+
+**Prefer workspace path** for experiments shared across pairs/teams. Repo-path experiments become orphans when the repo is deleted.
+
+**Both need `artifact_location` pointing at a UC volume.** The path convention only affects where the experiment metadata is browsable, not where artifacts live.
+
+---
+
+## Pattern 6: Running from a notebook cell with autoselected experiment
+
+Databricks notebooks auto-associate runs with an experiment matching the notebook's workspace path:
+
+```python
+# In a notebook at /Users/me@company.com/Notebooks/train.py
+# Databricks will auto-set experiment_name to the notebook path
+# BUT the default artifact_location is still DBFS root; create the experiment
+# with a UC volume artifact_location first, then select it:
+
+NOTEBOOK_EXPERIMENT = "/Users/me@company.com/Notebooks/train"
+if mlflow.get_experiment_by_name(NOTEBOOK_EXPERIMENT) is None:
+    mlflow.create_experiment(
+        name=NOTEBOOK_EXPERIMENT,
+        artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/train",
+    )
+mlflow.set_experiment(experiment_name=NOTEBOOK_EXPERIMENT)
+```
+
+Or call `set_experiment` explicitly before the first `start_run` — the artifact_location fix must be applied regardless of notebook auto-association.
diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-training.md b/databricks-skills/databricks-mlflow-ml/references/patterns-training.md
new file mode 100644
index 00000000..017e3cfb
--- /dev/null
+++ b/databricks-skills/databricks-mlflow-ml/references/patterns-training.md
@@ -0,0 +1,205 @@
+# patterns-training
+
+How to log classic ML models (sklearn / XGBoost / PyTorch) so they register cleanly and load correctly downstream. The two load-bearing decisions: `signature` and `input_example`.
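Before the patterns, a sketch of why the signature is load-bearing: conceptually it records ordered column names and dtypes, which is what lets pyfunc validate and realign caller input. The toy frame below illustrates the idea and is not MLflow's internal representation:

```python
import pandas as pd

X_train = pd.DataFrame({
    "turnover_lag_1": [1.0, 2.0],
    "rolling_3m_avg": [1.5, 1.8],
})

# What infer_signature captures, in spirit: ordered (column, dtype) pairs
recorded = [(c, str(t)) for c, t in X_train.dtypes.items()]
assert recorded == [("turnover_lag_1", "float64"), ("rolling_3m_avg", "float64")]

# At inference, input arriving with columns in a different order can be
# realigned from the recorded schema; without a signature, nothing does this
incoming = pd.DataFrame({"rolling_3m_avg": [2.0], "turnover_lag_1": [3.0]})
aligned = incoming[[c for c, _ in recorded]]
assert list(aligned.columns) == ["turnover_lag_1", "rolling_3m_avg"]
```

Skip the signature and a column-order mismatch at inference goes undetected, which is exactly the "NaN predictions" row in the batch-inference troubleshooting table.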
+
+---
+
+## Pattern 1: Baseline sklearn training loop
+
+```python
+import mlflow
+import mlflow.sklearn
+from sklearn.ensemble import GradientBoostingRegressor
+from sklearn.metrics import root_mean_squared_error, mean_absolute_error
+from sklearn.model_selection import train_test_split
+from mlflow.models import infer_signature
+
+mlflow.set_registry_uri("databricks-uc")
+# Experiment must already exist with a UC volume artifact_location;
+# see patterns-experiment-setup.md Pattern 1 (create_experiment, then set_experiment)
+mlflow.set_experiment(experiment_name="/Users/me@company.com/forecasting")
+
+X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)
+
+with mlflow.start_run(run_name="gbr_baseline"):
+    model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
+    model.fit(X_train, y_train)
+
+    # Signature + input_example are both load-bearing
+    signature = infer_signature(X_train, model.predict(X_train[:5]))
+
+    mlflow.sklearn.log_model(
+        sk_model=model,
+        artifact_path="model",
+        signature=signature,
+        input_example=X_train.iloc[:5],
+    )
+
+    # Log everything needed to reproduce
+    mlflow.log_params({"n_estimators": 100, "max_depth": 3})
+    predictions = model.predict(X_test)
+    mlflow.log_metrics({
+        "rmse": root_mean_squared_error(y_test, predictions),
+        "mae": mean_absolute_error(y_test, predictions),
+    })
+```
+
+---
+
+## Pattern 2: Preprocessing + model as a Pipeline
+
+Always log preprocessing alongside the model. See `GOTCHAS.md` #14 — inference-time preprocessing drift is the most painful post-registration bug.
+ +```python +from sklearn.pipeline import Pipeline +from sklearn.preprocessing import StandardScaler +from sklearn.compose import ColumnTransformer + +numeric_features = ["turnover_lag_1", "turnover_lag_12", "rolling_3m_avg"] +categorical_features = ["state", "industry"] + +preprocessor = ColumnTransformer([ + ("num", StandardScaler(), numeric_features), + ("cat", "passthrough", categorical_features), # handle in the model if needed +]) + +pipeline = Pipeline([ + ("preprocessor", preprocessor), + ("model", GradientBoostingRegressor(n_estimators=100)), +]) + +with mlflow.start_run(): + pipeline.fit(X_train, y_train) + + signature = infer_signature(X_train, pipeline.predict(X_train[:5])) + mlflow.sklearn.log_model( + sk_model=pipeline, # logs both preprocessor AND model as one artifact + artifact_path="model", + signature=signature, + input_example=X_train.iloc[:5], + ) +``` + +At inference time, callers never need to know about `StandardScaler` — they pass raw features, `pyfunc.load_model` dispatches through the pipeline. + +--- + +## Pattern 3: XGBoost / PyTorch — same interface, different flavor + +```python +# XGBoost +import mlflow.xgboost +import xgboost as xgb + +model = xgb.XGBRegressor(n_estimators=100, max_depth=3) +model.fit(X_train, y_train) + +with mlflow.start_run(): + mlflow.xgboost.log_model( + xgb_model=model, + artifact_path="model", + signature=infer_signature(X_train, model.predict(X_train[:5])), + input_example=X_train.iloc[:5], + ) + +# PyTorch +import mlflow.pytorch +import torch + +class Forecaster(torch.nn.Module): + ... + +model = Forecaster() +# ... training loop ... 
+ +with mlflow.start_run(): + # For PyTorch, input_example must be a tensor or numpy array + example = X_train.iloc[:5].to_numpy() + mlflow.pytorch.log_model( + pytorch_model=model, + artifact_path="model", + signature=infer_signature(example, model(torch.tensor(example)).detach().numpy()), + input_example=example, + ) +``` + +--- + +## Pattern 4: Retraining — same experiment, new run + +Retraining for an A/B test or a scheduled refresh. Log to the same experiment; register as a new version in Workflow 2. + +```python +with mlflow.start_run(run_name="gbr_v2_with_seasonality") as run: + model = GradientBoostingRegressor(n_estimators=200, max_depth=4) + model.fit(X_train_with_seasonality, y_train) + + mlflow.sklearn.log_model( + sk_model=model, + artifact_path="model", + signature=infer_signature(X_train_with_seasonality, + model.predict(X_train_with_seasonality[:5])), + input_example=X_train_with_seasonality.iloc[:5], + ) + # Remember the run_id for the register step + print(f"New run: {run.info.run_id}") +``` + +--- + +## Pattern 5: Autologging (quick path for iteration) + +Autologging wraps `fit()` and logs params + metrics + model automatically. Convenient during experimentation; less explicit than manual logging. + +```python +mlflow.sklearn.autolog( + log_models=True, + log_input_examples=True, # IMPORTANT — otherwise no input_example is captured + log_model_signatures=True, # IMPORTANT — otherwise no signature is captured + silent=False, +) + +# Any subsequent fit() call auto-logs +model = GradientBoostingRegressor(n_estimators=100) +model.fit(X_train, y_train) +# Autolog handled the MLflow calls +``` + +**Caveat:** autologging infers signature + input_example heuristically. For production runs, prefer manual logging (Pattern 1) — you control what gets captured. 
+
+---
+
+## Pattern 6: Searching runs to pick the best one for registration
+
+Before registering, you typically want the best run from an experiment:
+
+```python
+runs = mlflow.search_runs(
+    experiment_names=["/Users/me@company.com/forecasting"],
+    filter_string="metrics.rmse < 100 AND tags.mlflow.runName LIKE 'gbr_%'",
+    order_by=["metrics.rmse ASC"],
+    max_results=1,
+)
+
+if runs.empty:
+    raise RuntimeError("No runs match criteria")
+
+best_run_id = runs.iloc[0]["run_id"]
+best_rmse = runs.iloc[0]["metrics.rmse"]
+print(f"Best run: {best_run_id} (RMSE={best_rmse:.2f})")
+
+# Now register this run's model — see patterns-uc-registration.md Pattern 1
+```
+
+---
+
+## Common logging mistakes
+
+| Mistake | Effect | Fix |
+|---------|--------|-----|
+| No `signature` | `pyfunc.load_model` works, but `.predict()` may silently mis-coerce inputs | Always call `infer_signature(X_train, y_hat[:5])` |
+| No `input_example` | `pyfunc.load_model` can't introspect input schema | Pass `X_train.iloc[:5]` (or `.to_numpy()[:5]` for non-pandas) |
+| `artifact_path` changes between logs | Same model name → different paths → broken load URIs | Always use `artifact_path="model"` |
+| Log preprocessing separately | Inference callers must reapply preprocessing manually | Wrap in a sklearn `Pipeline` and log the pipeline |
+| Use `pickle.dump` directly | Loses MLflow's flavor dispatch | Always use `mlflow.<flavor>.log_model` |
diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md b/databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md
new file mode 100644
index 00000000..4d8929ed
--- /dev/null
+++ b/databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md
@@ -0,0 +1,232 @@
+# patterns-uc-registration
+
+Register a logged model to Unity Catalog, set aliases, verify, and handle promotion / rollback.
+
+---
+
+## Pattern 1: Explicit register from a specific run
+
+Cleanest workflow.
Train (separate step) → pick best run → register. + +```python +import mlflow +from mlflow import MlflowClient + +mlflow.set_registry_uri("databricks-uc") + +MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" + +# run_id from a specific training run (see patterns-training.md Pattern 6) +run_id = "abc123def456" + +result = mlflow.register_model( + model_uri=f"runs:/{run_id}/model", + name=MODEL_NAME, + tags={ + "trained_by": "forecasting_team", + "dataset_version": "2024-Q4", + }, +) +print(f"Registered {MODEL_NAME} version {result.version}") +``` + +`result` is a `ModelVersion` object: +- `result.name` — fully qualified three-level name +- `result.version` — the new version (string, e.g., `"3"`) +- `result.status` — should be `"READY"` by the time this returns + +--- + +## Pattern 2: Log-and-register in one call + +Shorter but couples logging and registration. Use when you *know* the current run is the one worth registering. + +```python +with mlflow.start_run(): + model.fit(X_train, y_train) + mlflow.sklearn.log_model( + sk_model=model, + artifact_path="model", + signature=infer_signature(X_train, model.predict(X_train[:5])), + input_example=X_train.iloc[:5], + registered_model_name="my_catalog.my_schema.grocery_forecaster", + ) + # Model is registered as a new version; you still need to set alias separately. +``` + +**Still need a separate alias call** — `log_model` doesn't set aliases. + +--- + +## Pattern 3: Set aliases (`@champion`, `@challenger`) + +Aliases decouple the loader from the version. Moving `@champion` to a new version silently updates every `models:/...@champion` loader. + +```python +from mlflow import MlflowClient +client = MlflowClient() + +# Set or move an alias +client.set_registered_model_alias( + name="my_catalog.my_schema.grocery_forecaster", + alias="champion", + version=result.version, +) +``` + +**Conventions:** +- `@champion` — the current production winner. Exactly one version at a time. 
- `@challenger` — a candidate under evaluation. Exactly one at a time.
+- Custom aliases — free-form, e.g., `@pair_team_07`, `@nightly`, `@reviewed`.
+
+**Read existing aliases:**
+```python
+model = client.get_registered_model("my_catalog.my_schema.grocery_forecaster")
+print(model.aliases)  # e.g., {"champion": "3", "challenger": "4"}
+```
+
+**Delete an alias:**
+```python
+client.delete_registered_model_alias(
+    name="my_catalog.my_schema.grocery_forecaster",
+    alias="challenger",
+)
+```
+
+---
+
+## Pattern 4: Verify registration (Navigator's V-step)
+
+Don't trust `register_model`'s success message alone. See `GOTCHAS.md` #5.
+
+### Via SQL
+
+```sql
+DESCRIBE MODEL my_catalog.my_schema.grocery_forecaster;
+```
+
+Expected output includes the model metadata and (if set) aliases. If the result is "table or view not found," the model didn't register to UC — check `set_registry_uri` (GOTCHAS #1).
+
+### Via Catalog Explorer UI
+
+1. Open Catalog Explorer
+2. Navigate to `my_catalog` → `my_schema` → **Models** tab
+3. Confirm `grocery_forecaster` appears with an `@champion` badge
+
+If the model appears under the workspace MLflow icon instead (left sidebar, under MLflow), you registered to the workspace registry. See GOTCHAS #1.
+
+### Via Python assertion (scriptable)
+
+```python
+from mlflow import MlflowClient
+client = MlflowClient()
+
+MODEL_NAME = "my_catalog.my_schema.grocery_forecaster"
+
+# get_registered_model raises a RestException if the model doesn't exist in UC
+model = client.get_registered_model(MODEL_NAME)
+
+# NOTE: don't rely on model.latest_versions; it is stage-based and not
+# populated for UC models. Count versions via search_model_versions instead:
+versions = client.search_model_versions(f"name='{MODEL_NAME}'")
+
+# Assertions that should always hold post-registration
+assert len(versions) > 0, "No versions exist"
+assert "champion" in model.aliases, "@champion alias not set"
+print(f"✓ {model.name} v{model.aliases['champion']} is @champion")
+```
+
+---
+
+## Pattern 5: A/B promotion — swap `@challenger` to `@champion`
+
+You've trained a new version, registered it, and validated its predictions against the current champion.
Now promote: + +```python +client = MlflowClient() +MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" + +# Get current state +model = client.get_registered_model(MODEL_NAME) +old_champion = model.aliases.get("champion") +new_champion = model.aliases.get("challenger") + +if new_champion is None: + raise RuntimeError("No @challenger set — nothing to promote") + +# Move the alias (atomic — downstream loaders see the switch on next load) +client.set_registered_model_alias(MODEL_NAME, "champion", new_champion) + +# Optional: archive the old champion version with a custom alias +if old_champion: + client.set_registered_model_alias(MODEL_NAME, f"archived_{old_champion}", old_champion) + +# Remove the @challenger alias +client.delete_registered_model_alias(MODEL_NAME, "challenger") + +print(f"Promoted v{new_champion} from @challenger to @champion (was v{old_champion})") +``` + +**Rollback** is the inverse — move `@champion` back to the previous version. + +--- + +## Pattern 6: List all model versions + +Useful for lineage inspection or cleanup. + +```sql +SHOW MODEL VERSIONS ON MODEL my_catalog.my_schema.grocery_forecaster; +``` + +Or via Python: +```python +from mlflow import MlflowClient +client = MlflowClient() + +versions = client.search_model_versions( + filter_string=f"name='my_catalog.my_schema.grocery_forecaster'", + order_by=["version_number DESC"], +) +for v in versions: + print(f"v{v.version}: run_id={v.run_id}, status={v.status}, aliases={v.aliases}") +``` + +--- + +## Pattern 7: Tags — richer metadata without new versions + +Tags are key-value metadata on the registered model (or a specific version). 
Useful for: +- Team ownership: `set_model_version_tag(name, "1", "team", "forecasting")` +- Dataset provenance: `set_model_version_tag(name, "1", "dataset_version", "2024-Q4")` +- Review status: `set_model_version_tag(name, "1", "reviewed", "true")` + +```python +from mlflow import MlflowClient +client = MlflowClient() + +# Tag on the registered model (applies to all versions) +client.set_registered_model_tag( + name="my_catalog.my_schema.grocery_forecaster", + key="domain", + value="retail", +) + +# Tag on a specific version +client.set_model_version_tag( + name="my_catalog.my_schema.grocery_forecaster", + version="3", + key="reviewed_by", + value="jane@company.com", +) +``` + +Tags are queryable via `search_model_versions(filter_string="tags.reviewed = 'true'")`. + +--- + +## Permission requirements + +| Operation | Permission needed | Granted via | +|-----------|-------------------|-------------| +| `register_model` (first version of a model) | `CREATE MODEL ON SCHEMA ` | `GRANT CREATE MODEL ON SCHEMA ... TO ...` | +| `register_model` (new version of existing) | `EDIT ON MODEL ` | Automatic for model owner; otherwise grant | +| `set_registered_model_alias` | `EDIT ON MODEL ` | Same as above | +| `get_registered_model` / `DESCRIBE MODEL` | `USE CATALOG` + `USE SCHEMA` + `EXECUTE ON MODEL` | Standard read grants | +| `load_model` | `EXECUTE ON MODEL ` | `GRANT EXECUTE ON MODEL ... TO ...` | + +If any of these fail, request the specific grant from the schema owner. See `GOTCHAS.md` #7. diff --git a/databricks-skills/databricks-mlflow-ml/references/user-journeys.md b/databricks-skills/databricks-mlflow-ml/references/user-journeys.md new file mode 100644 index 00000000..a72f9106 --- /dev/null +++ b/databricks-skills/databricks-mlflow-ml/references/user-journeys.md @@ -0,0 +1,195 @@ +# user-journeys + +End-to-end workflows with decision points. Read the journey that matches your situation. 
+
+---
+
+## Journey 1: First model (train → register → score) — the 90%-case
+
+Most users arrive here. Goal: a UC-registered model with a `@champion` alias, producing batch predictions.
+
+**Prerequisites:**
+- UC catalog + schema where you have `CREATE MODEL` permission
+- A UC volume for MLflow artifacts (create if missing — `patterns-experiment-setup.md` Pattern 2)
+- Features in a Spark table (Bronze → Silver → Gold already done)
+
+**Steps:**
+
+1. **Set up the experiment** (`patterns-experiment-setup.md` Pattern 1)
+   - `mlflow.set_registry_uri("databricks-uc")`
+   - `mlflow.create_experiment(name=..., artifact_location=...)` if missing, then `mlflow.set_experiment(...)`
+2. **Train + log** (`patterns-training.md` Pattern 1 or 2)
+   - Always include `signature` and `input_example`
+   - If you have preprocessing, wrap in `sklearn.Pipeline` (Pattern 2)
+3. **Register** (`patterns-uc-registration.md` Pattern 1)
+   - `mlflow.register_model(f"runs:/{run_id}/model", "catalog.schema.model")`
+4. **Set alias** (`patterns-uc-registration.md` Pattern 3)
+   - `client.set_registered_model_alias(name, "champion", version)`
+5. **Verify** (`patterns-uc-registration.md` Pattern 4)
+   - `DESCRIBE MODEL catalog.schema.model` OR Catalog Explorer UI
+6. **Load + score** (`patterns-batch-inference.md` Pattern 1 or 2)
+   - `model = mlflow.pyfunc.load_model("models:/catalog.schema.model@champion")`
+   - `model.predict(features_df)`
+
+**Done.** You have a UC-registered model with a canonical loading URI that downstream code can depend on.
+
+---
+
+## Journey 2: Retrain + promote (A/B)
+
+You already have `@champion`. You trained a new version and want to decide whether to promote it.
+
+**Prerequisites:**
+- Model exists in UC with `@champion` set (you did Journey 1)
+- New training run logged to the same experiment
+
+**Steps:**
+
+1. **Register new version** (`patterns-uc-registration.md` Pattern 1)
+   - Same `MODEL_NAME` as before — UC auto-increments version
+2. **Set `@challenger`** (`patterns-uc-registration.md` Pattern 3)
+   - `client.set_registered_model_alias(name, "challenger", new_version)`
+3. **A/B validate** (`patterns-batch-inference.md` Pattern 5)
+   - Load both aliases, score validation set, compare metrics
+4. **Decide**:
+   - Challenger wins → **Pattern 5 in `patterns-uc-registration.md`**: swap aliases
+   - Champion wins → delete `@challenger` alias, keep current `@champion`
+5. **Verify** downstream loaders picked up the new version (after swap)
+   - Any code using `models:/...@champion` will see the new version on next load
+
+---
+
+## Journey 3: Lakeflow SDP batch pipeline
+
+You want predictions to land in a scheduled gold table, not an ad-hoc notebook.
+
+**Prerequisites:**
+- Model registered with `@champion` (Journey 1 complete)
+- Lakeflow SDP pipeline defined (one already running is ideal)
+
+**Steps:**
+
+1. **Add a new file** to the pipeline source: `src/gold/gold_forecast.py`
+2. **Construct the UDF at module scope** (`patterns-batch-inference.md` Pattern 3)
+   - `mlflow.set_registry_uri("databricks-uc")`
+   - `predict_udf = mlflow.pyfunc.spark_udf(spark, "models:/...@champion", result_type="double")`
+3. **Define the `@dp.materialized_view`** that reads silver features, applies the UDF
+4. **Deploy + run** the pipeline
+   - `databricks bundle deploy && databricks bundle run <pipeline_name>`
+5. **Verify** the `gold_forecast` table materializes
+   - Row count matches `silver_features`
+   - Query from Genie or SQL editor
+
+**Do NOT use `ai_query`** in this pipeline — see `GOTCHAS.md` #9.
+
+---
+
+## Journey 4: Debug a registration that went to workspace registry
+
+The #1 support question. Symptoms: model doesn't appear in Catalog Explorer; URL contains `/ml/models/` instead of `/explore/data/models/`.
+
+**Steps:**
+
+1. Confirm the diagnosis:
+   - Catalog Explorer → catalog → schema → Models tab: **missing**
+   - MLflow icon (left sidebar) → Models: **present**
+   - That's the workspace registry, not UC
+2. Verify registry URI in the training session
+   - `mlflow.get_registry_uri()` — should return `"databricks-uc"`, not a workspace URI
+3. If the URI was wrong, fix it and re-register:
+   - Add `mlflow.set_registry_uri("databricks-uc")` at the top of the training code
+   - Re-run `mlflow.register_model(...)` — this creates a new entry in UC
+   - The orphaned workspace-registry entry can be deleted via MLflow UI (optional)
+4. Set the `@champion` alias on the new UC version
+5. Verify via `DESCRIBE MODEL` — see `patterns-uc-registration.md` Pattern 4
+
+---
+
+## Journey 5: Debug a `pyfunc.load_model` that fails or predicts wrong
+
+Model loaded successfully, but `.predict()` raises or produces nonsense.
+
+**Steps:**
+
+1. **Check the signature was logged:**
+   ```python
+   from mlflow.models import get_model_info
+   info = get_model_info("models:/...@champion")
+   print(info.signature)
+   ```
+   If `None` — see `GOTCHAS.md` #8. Re-log the model with `signature=infer_signature(...)`.
+
+2. **Check the input column order:**
+   ```python
+   expected = model.metadata.get_input_schema().input_names()
+   print(f"Model expects: {expected}")
+   print(f"You passed: {list(features_df.columns)}")
+   ```
+   If the order differs, pass `features_df[expected]`.
+
+3. **Check preprocessing coverage:**
+   - Does the training notebook call a scaler / encoder / imputer before fitting?
+   - Is that preprocessing in the logged artifact?
+   - If not — see `GOTCHAS.md` #14. Re-train with preprocessing wrapped in `sklearn.Pipeline`.
+
+4. **Check for type coercion:**
+   - Integer column becoming float (or vice versa) — fine for sklearn, sometimes breaks for xgboost/pytorch
+   - Categorical as string vs int — depends on the flavor
+   - Fix: cast `features_df` to match `model.metadata.get_input_schema()` dtypes before predicting
+
+---
+
+## Journey 6: Schema evolution — your features changed since the model was logged
+
+The silver features pipeline added a new column.
Your deployed `@champion` model was trained without it. Predictions still work (extra columns are ignored), but you want to include the new feature. + +**Steps:** + +1. Retrain with the new feature: + ```python + # Same Journey 1 steps, but with expanded feature set + mlflow.sklearn.log_model( + sk_model=new_pipeline, + artifact_path="model", + signature=infer_signature(X_train_expanded, new_pipeline.predict(X_train_expanded[:5])), + input_example=X_train_expanded.iloc[:5], + ) + ``` +2. Register as a new version +3. Validate via A/B (Journey 2) +4. Promote to `@champion` + +Schema changes are always a new version. Never mutate a logged model in place. + +--- + +## Journey 7: "Everything is on fire, I have 10 minutes to demo" + +Someone registered a fallback model. Load it. + +```python +import mlflow +mlflow.set_registry_uri("databricks-uc") +model = mlflow.pyfunc.load_model( + "models:/..@fallback" +) +features = spark.table("..sample_features").limit(500).toPandas() +features["prediction"] = model.predict(features) +display(spark.createDataFrame(features)) +``` + +Every escape-hatch pattern should pre-register a `@fallback` version for exactly this case. + +--- + +## When to use which journey + +| Situation | Journey | +|-----------|---------| +| I'm starting from zero | 1 | +| I have `@champion`, trained something new | 2 | +| I want predictions in a scheduled table | 3 | +| Registered but can't find in Catalog Explorer | 4 | +| `load_model` succeeds but `predict` fails | 5 | +| My features changed | 6 | +| Demo in 10 minutes, nothing works | 7 | From deb6a30a3f850245ae09bea0d7dd3e481cb5cefa Mon Sep 17 00:00:00 2001 From: David O'Keeffe Date: Sun, 19 Apr 2026 23:04:18 +1000 Subject: [PATCH 2/5] docs(mlflow-ml): add two gotchas from real-world test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Field-tested the skill end-to-end from a local Python environment against a live Databricks workspace. 
Surfaced two gotchas not in the original set: #12 mlflow[databricks] extras missing when running outside Databricks: plain `pip install mlflow` omits azure-core / boto3 / google.cloud SDKs that UC registration needs to stage artifacts. Training + log_model work; register_model fails with opaque "No module named 'azure'". Databricks clusters ship the extras pre-installed, so this only bites laptops / CI. #13 artifact_path= deprecated in favour of name= (MLflow 2.16+): emits warning on every log_model call. Non-blocking, but worth flagging since most online tutorials + training courses still use the old param. Both verified against the workshop's test run — skill workflow 1 now completes cleanly with these fixes documented. --- .../references/GOTCHAS.md | 40 ++++++++++++++++++- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md b/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md index 92615de2..a2ab11d4 100644 --- a/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md +++ b/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md @@ -1,6 +1,6 @@ # GOTCHAS — Classic ML on MLflow + Unity Catalog -Twelve mistakes that silently waste hours. Read before writing any code. +Fourteen mistakes that silently waste hours. Read before writing any code. --- @@ -236,7 +236,43 @@ def gold_forecast(): --- -## 12. Custom preprocessing not captured in the logged model +## 12. `mlflow[databricks]` extras missing when running outside Databricks + +**Symptom:** training + logging works; `register_model` fails with `MlflowException: Unable to import necessary dependencies to access model version files in Unity Catalog` — root cause `ModuleNotFoundError: No module named 'azure'` (for Azure-hosted workspaces) or `'boto3'` (AWS) / `'google.cloud'` (GCP). + +**Fix:** install the `databricks` extras, which pull cloud-storage SDKs MLflow needs to stage artifacts into the UC-managed location. 
+ +```bash +pip install 'mlflow[databricks]' +# or, for a lighter install: +pip install 'mlflow-skinny[databricks]' +``` + +**Why it bites:** plain `pip install mlflow` leaves out the cloud-provider SDKs because they're large and most local workflows don't need them. UC registration REQUIRES them because the registry stages artifacts into cloud-managed storage (Azure ADLS / S3 / GCS), and MLflow uses the provider's SDK for the upload. Local `log_model` works fine (artifacts go to the tracking server); registration doesn't. + +**When it most commonly hits:** running training scripts from a laptop, CI runner, or non-Databricks compute — anywhere that isn't a Databricks cluster (which ships the extras pre-installed). + +--- + +## 13. `artifact_path=` parameter is deprecated; new name is `name=` + +**Symptom:** warning in logs: `WARNING mlflow.models.model: `artifact_path` is deprecated. Please use `name` instead.` Still works today; may break in a future MLflow major version. + +**Fix:** use `name=` instead of `artifact_path=` in `log_model` calls. + +```python +# OLD (still works, warns) +mlflow.sklearn.log_model(sk_model=model, artifact_path="model", ...) + +# NEW (preferred, no warning) +mlflow.sklearn.log_model(sk_model=model, name="model", ...) +``` + +**Why it bites:** most online tutorials and training courses still use `artifact_path`. The rename shipped in MLflow 2.16. `name=` semantics are identical — still the within-run artifact folder. It's a rename of the parameter, not of what the parameter represents. + +--- + +## 14. Custom preprocessing not captured in the logged model +**Symptom:** in the training notebook, predictions are accurate. After `pyfunc.load_model(...)`, predictions are garbage. The pipeline works in training because you're calling `scaler.transform()` manually; at inference time, nobody calls the scaler.
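The fix for the preprocessing gotcha above is to log the whole sklearn `Pipeline` object rather than the bare estimator, so the scaler travels with the model. A minimal sketch with synthetic data (all names here are illustrative, not from the skill):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=25.0, size=(200, 3))  # deliberately unscaled features
y = (X[:, 0] > 100.0).astype(int)

# Bundle the scaler INTO the model object: pipeline.predict() scales internally,
# so whoever loads the logged model gets the preprocessing for free.
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
pipeline.fit(X, y)

# Equivalent manual path — this is the step people forget to replay at inference time.
scaler = StandardScaler().fit(X)
clf = LogisticRegression().fit(scaler.transform(X), y)
manual = clf.predict(scaler.transform(X))

# Same predictions, but only the Pipeline carries them through serialization.
assert (pipeline.predict(X) == manual).all()

# Then log the Pipeline, not `clf`:
#   mlflow.sklearn.log_model(sk_model=pipeline, name="model", ...)
```

The point is that `mlflow.sklearn.log_model(sk_model=pipeline, ...)` serializes the fitted scaler and estimator together, so `pyfunc.load_model(...).predict(raw_df)` reproduces training-time behaviour.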
From cf211958d34aa090f205345cc244f17941828abd Mon Sep 17 00:00:00 2001 From: David O'Keeffe Date: Sun, 19 Apr 2026 23:10:15 +1000 Subject: [PATCH 3/5] =?UTF-8?q?docs(mlflow-ml):=20runtime=20claim=20?= =?UTF-8?q?=E2=80=94=20MLflow=203.11=20on=20serverless=20compute=20v5?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Original SKILL.md didn't state a runtime target. Adds a "Runtime compatibility" section anchored on what the skill was actually tested against — MLflow 3.11 on Lakeflow SDP serverless compute v5 — with a compat note for MLflow 2.16+ (classic DBR 15.4 LTS still ships 2.x). Points at GOTCHAS.md for the 3.x-vs-2.x divergence (artifact_path deprecation, etc.). --- databricks-skills/databricks-mlflow-ml/SKILL.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/databricks-skills/databricks-mlflow-ml/SKILL.md b/databricks-skills/databricks-mlflow-ml/SKILL.md index 43d4a2ed..cb3f7d0b 100644 --- a/databricks-skills/databricks-mlflow-ml/SKILL.md +++ b/databricks-skills/databricks-mlflow-ml/SKILL.md @@ -123,3 +123,7 @@ If you're training a forecasting / classification / regression model, registerin - [`patterns-uc-registration.md`](references/patterns-uc-registration.md) — register + alias + verify + A/B promotion - [`patterns-batch-inference.md`](references/patterns-batch-inference.md) — notebook (`pyfunc.load_model`) + Lakeflow (`spark_udf`) + champion-vs-challenger - [`user-journeys.md`](references/user-journeys.md) — end-to-end workflows with decision points + +## Runtime compatibility + +Patterns verified against **MLflow 3.11** on **Lakeflow SDP serverless compute version 5** (default at time of writing). All APIs used (`set_registry_uri`, `log_model`, `register_model`, `set_registered_model_alias`, `pyfunc.load_model`, `pyfunc.spark_udf`) are compatible with MLflow 2.16+ as well, so the patterns work on older classic Databricks Runtimes that still ship 2.x. 
Where 3.x behaviour diverges (e.g., `artifact_path` deprecation → use `name=`), GOTCHAS.md calls it out. From 1a4a608d75a87488e7287f4b92ab96b9057dc8e7 Mon Sep 17 00:00:00 2001 From: David O'Keeffe Date: Sat, 9 May 2026 15:26:17 +1000 Subject: [PATCH 4/5] docs(mlflow-ml): densify per Quentin's audit (gpt-5.5 in logfood) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Quentin posted a Claude-generated audit on PR #474 specifying the restructure. Ran gpt-5.5 in logfood with the audit as the spec. Changes: 8 files / 1,666 lines → 3 files / 485 lines (71% reduction). Structure: - SKILL.md (91 lines) — frontmatter, 3-skill comparison table, hard rules, Quick Start, decision table for situation→recipe routing, read-order instruction at top, negative list ("don't read X-pattern.md for sklearn 101"). - references/gotchas.md (161 lines) — only Databricks/UC-specific failures: silently-wrong workspace registry, three-level UC names, artifact_location UC volume in UC-enforced workspaces, alias-on-stage no-op, CREATE MODEL ON SCHEMA grant, ai_query vs custom-model batch, spark_udf module-scope in Lakeflow SDP, mlflow[databricks] extras, artifact_path→name deprecation. Each entry: symptom + silent/loud + fix + one-sentence why. - references/recipes.md (233 lines) — UC-specific code shapes only: experiment + UC volume setup, log→register→alias canonical pattern, Lakeflow SDP spark_udf module-scope, A/B alias swap order, verification one-liners. 
Deleted (per Quentin's audit): - references/CRITICAL-interfaces.md (90% plain MLflow API) - references/GOTCHAS.md (replaced by lowercase gotchas.md, dropping the generic entries: alias-not-version, verify-after-register, signature basics, version reuse, Pipeline preprocessing — all generic MLflow / sklearn knowledge) - references/user-journeys.md (pure pointer-shuffling) - references/patterns-experiment-setup.md - references/patterns-training.md - references/patterns-uc-registration.md - references/patterns-batch-inference.md Workflow tables in SKILL.md replaced by a 6-row decision table. Common Issues table consolidated into gotchas.md. Reference Files list dropped — Claude can ls. Co-authored-by: Isaac --- .../databricks-mlflow-ml/SKILL.md | 136 ++++----- .../references/CRITICAL-interfaces.md | 219 --------------- .../references/GOTCHAS.md | 260 ++++-------------- .../references/patterns-batch-inference.md | 244 ---------------- .../references/patterns-experiment-setup.md | 141 ---------- .../references/patterns-training.md | 205 -------------- .../references/patterns-uc-registration.md | 232 ---------------- .../references/recipes.md | 233 ++++++++++++++++ .../references/user-journeys.md | 195 ------------- 9 files changed, 342 insertions(+), 1523 deletions(-) delete mode 100644 databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md delete mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md delete mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md delete mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-training.md delete mode 100644 databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md create mode 100644 databricks-skills/databricks-mlflow-ml/references/recipes.md delete mode 100644 databricks-skills/databricks-mlflow-ml/references/user-journeys.md diff --git a/databricks-skills/databricks-mlflow-ml/SKILL.md 
b/databricks-skills/databricks-mlflow-ml/SKILL.md index cb3f7d0b..26286e8c 100644 --- a/databricks-skills/databricks-mlflow-ml/SKILL.md +++ b/databricks-skills/databricks-mlflow-ml/SKILL.md @@ -5,125 +5,87 @@ description: "Classic ML model lifecycle on Databricks with MLflow and Unity Cat # MLflow + Unity Catalog — Classic ML -## Before Writing Any Code +Read this file fully; consult `references/gotchas.md` before writing UC code; consult `references/recipes.md` only for the alias-swap and `spark_udf` patterns. -1. **Read `GOTCHAS.md`** — 12 common mistakes that cause silent failures or wasted time -2. **Read `CRITICAL-interfaces.md`** — exact API signatures and the `models:/` URI format +If you're tempted to read `patterns-training.md`, `patterns-experiment-setup.md`, `patterns-uc-registration.md`, or `patterns-batch-inference.md` to figure out basic sklearn training, stop — you don't need them. This skill is only about the Databricks / Unity Catalog parts that are easy to miss. -## End-to-End Workflows - -Follow the workflow that matches your goal. Each step indicates which reference files to read. - -### Workflow 1: Train → Register → Batch Score (most common) - -For building a production-shape classic ML model with UC-native lineage. Covers the full path from raw features to predictions in a downstream table. 
- -| Step | Action | Reference Files | -|------|--------|-----------------| -| 1 | Create experiment with UC volume artifact_location | `patterns-experiment-setup.md` (Pattern 1) | -| 2 | Train model with signature + input_example | `patterns-training.md` (Patterns 1–3) | -| 3 | Register to Unity Catalog with three-level name | `patterns-uc-registration.md` (Patterns 1–2) | -| 4 | Set `@champion` alias | `patterns-uc-registration.md` (Pattern 3) | -| 5 | Verify registration (Navigator check) | `patterns-uc-registration.md` (Pattern 4) + `GOTCHAS.md` #5 | -| 6 | Load + score in notebook (Tier 1) | `patterns-batch-inference.md` (Patterns 1–2) | -| 7 | Optional: Lakeflow SDP batch via `spark_udf` | `patterns-batch-inference.md` (Patterns 3–4) | - -### Workflow 2: Retrain + Promote (A/B pattern) - -For adding a new version of an already-registered model and promoting it without touching downstream loader code. +## Why This Skill Exists -| Step | Action | Reference Files | -|------|--------|-----------------| -| 1 | Train new version, log to same UC model name | `patterns-training.md` (Pattern 4) | -| 2 | Register as new version | `patterns-uc-registration.md` (Pattern 2) | -| 3 | Set `@challenger` alias | `patterns-uc-registration.md` (Pattern 3) | -| 4 | Validate `@challenger` predictions vs `@champion` | `patterns-batch-inference.md` (Pattern 5) | -| 5 | Swap aliases (`@challenger` → `@champion`) | `patterns-uc-registration.md` (Pattern 5) | +Three skills in the AI Dev Kit touch MLflow; this one owns **classic ML training + UC registration + batch inference**. -Downstream loader code that uses `models:/catalog.schema.model@champion` picks up the new version on next load — no code change needed. 
+| Skill | Scope | MLflow API Surface | +|-------|-------|--------------------| +| `databricks-mlflow-evaluation` | GenAI agent evaluation | `mlflow.genai.evaluate()`, scorers, judges, traces | +| `databricks-model-serving` | Real-time serving endpoints | Deployment APIs, endpoint management, `ai_query` | +| `databricks-mlflow-ml` *(this skill)* | Classic ML + UC registration + batch inference | `mlflow.sklearn.log_model`, `register_model`, `set_registered_model_alias`, `pyfunc.load_model`, `pyfunc.spark_udf` | -### Workflow 3: Debugging a Failed Registration or Load +Use this skill when training forecasting / classification / regression models, registering them to Unity Catalog, and scoring them in a notebook or Lakeflow pipeline. Do not use it for GenAI evaluation or Model Serving endpoint management. -For the two most common support questions: "why did my model go to workspace registry?" and "why does pyfunc.load_model fail?" +## Hard Rules -| Step | Action | Reference Files | -|------|--------|-----------------| -| 1 | Verify registry URI is set to `databricks-uc` | `GOTCHAS.md` #1 | -| 2 | Verify three-level name | `GOTCHAS.md` #2 | -| 3 | Confirm model appears in Catalog Explorer | `patterns-uc-registration.md` (Pattern 4) | -| 4 | Check `CREATE MODEL` permissions | `GOTCHAS.md` #7 | -| 5 | Diagnose load failures | `GOTCHAS.md` #3, #8, #11 | +1. Call `mlflow.set_registry_uri("databricks-uc")` before registering or loading UC models. +2. UC model names are always three-level: `catalog.schema.model_name`. +3. Load by alias, not version: `models:/catalog.schema.model@champion`, not `models:/catalog.schema.model/3`. +4. In UC-enforced workspaces, experiments need `artifact_location="dbfs:/Volumes/<catalog>/<schema>/<volume>/<path>"`. +5. `register_model` creates a version; it does **not** set `@champion` or `@challenger`. +6. Use aliases for lifecycle. Legacy stages like `Production` / `Staging` are deprecated for UC models.
## Quick Start -The minimum viable path from untrained model to UC-registered, notebook-scored: +Minimum viable path from trained model object to UC-registered, notebook-scored model: ```python import mlflow -from mlflow.models import infer_signature +import mlflow.sklearn from mlflow import MlflowClient +from mlflow.models import infer_signature + +CATALOG = "my_catalog" +SCHEMA = "my_schema" +MODEL_NAME = f"{CATALOG}.{SCHEMA}.my_model" -# 1. Configure: UC registry + UC volume for artifacts (both required) +# 1. Configure UC registry + UC volume-backed experiment. mlflow.set_registry_uri("databricks-uc") -mlflow.set_experiment( - experiment_name="/Users/me@company.com/forecasting", - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", -) +EXPERIMENT = "/Users/me@company.com/forecasting" +# artifact_location can only be set at creation time; mlflow.set_experiment() does not accept it. +if mlflow.get_experiment_by_name(EXPERIMENT) is None: + mlflow.create_experiment( + EXPERIMENT, + artifact_location=f"dbfs:/Volumes/{CATALOG}/{SCHEMA}/mlflow_artifacts/forecasting", + ) +mlflow.set_experiment(EXPERIMENT) -# 2. Train + log +# 2. Train + log. Use name="model" in MLflow 3.x; artifact_path="model" only for older code. with mlflow.start_run() as run: model.fit(X_train, y_train) signature = infer_signature(X_train, model.predict(X_train[:5])) + mlflow.sklearn.log_model( - sk_model=model, - artifact_path="model", + sk_model=model, # log the full Pipeline if preprocessing exists + name="model", signature=signature, input_example=X_train.iloc[:5], ) -# 3. Register + alias -MODEL_NAME = "my_catalog.my_schema.my_model" -result = mlflow.register_model(f"runs:/{run.info.run_id}/model", MODEL_NAME) +# 3. Register + set alias. register_model returns a ModelVersion; alias is a separate call. +result = mlflow.register_model( + model_uri=f"runs:/{run.info.run_id}/model", + name=MODEL_NAME, +) MlflowClient().set_registered_model_alias(MODEL_NAME, "champion", result.version) -# 4. Load + predict (in any notebook, anywhere) -model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion") -predictions = model.predict(X_test) +# 4. Load by alias, never by hard-coded version.
+loaded = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion") +predictions = loaded.predict(X_test) ``` -## Why This Skill Exists - -Three skills in the AI Dev Kit touch MLflow; this one owns **classic ML training + UC registration + batch inference**. The distinction matters because the APIs diverged: - -| Skill | Scope | MLflow API Surface | -|-------|-------|--------------------| -| `databricks-mlflow-evaluation` | GenAI agent evaluation | `mlflow.genai.evaluate()`, scorers, judges, traces | -| `databricks-model-serving` | Real-time serving endpoints | Deployment APIs, endpoint management, `ai_query` | -| `databricks-mlflow-ml` *(this skill)* | Classic ML + UC registration + batch inference | `mlflow.sklearn.log_model`, `register_model`, `set_registered_model_alias`, `pyfunc.load_model`, `pyfunc.spark_udf` | - -If you're training a forecasting / classification / regression model, registering it to UC, and scoring it in a notebook or Lakeflow pipeline — this skill. If you're evaluating an LLM agent's output quality — evaluation skill. If you're exposing a model behind an HTTP endpoint — model-serving skill. - -## Common Issues - -| Issue | Solution | -|-------|----------| -| **Model registered but not visible in Catalog Explorer** | Missing `mlflow.set_registry_uri("databricks-uc")`. See `GOTCHAS.md` #1. | -| **`RestException: INVALID_PARAMETER_VALUE` on `register_model`** | Two-level name used. UC requires `catalog.schema.name`. See `GOTCHAS.md` #2. | -| **Experiment creation fails with storage errors** | Missing `artifact_location` pointing at a UC volume. See `GOTCHAS.md` #4. | -| **`PERMISSION_DENIED: CREATE MODEL`** | Pair/user needs `CREATE MODEL ON SCHEMA `. See `GOTCHAS.md` #7. | -| **`pyfunc.load_model` returns but `predict()` fails** | Signature wasn't logged; inputs don't coerce. See `GOTCHAS.md` #8. | -| **Agent proposes `ai_query` for batch inference** | Wrong primitive — that requires a serving endpoint. 
Use `pyfunc.load_model` or `spark_udf`. See `GOTCHAS.md` #9. | - -## Reference Files +## Decision Table -- [`GOTCHAS.md`](references/GOTCHAS.md) — 12 common mistakes + fixes -- [`CRITICAL-interfaces.md`](references/CRITICAL-interfaces.md) — API signatures + `models:/` URI format -- [`patterns-experiment-setup.md`](references/patterns-experiment-setup.md) — experiment creation with UC volume artifact_location -- [`patterns-training.md`](references/patterns-training.md) — logging models with signature + input_example + autologging -- [`patterns-uc-registration.md`](references/patterns-uc-registration.md) — register + alias + verify + A/B promotion -- [`patterns-batch-inference.md`](references/patterns-batch-inference.md) — notebook (`pyfunc.load_model`) + Lakeflow (`spark_udf`) + champion-vs-challenger -- [`user-journeys.md`](references/user-journeys.md) — end-to-end workflows with decision points +| Situation | Do this | +|-----------|---------| +| Starting a first UC-registered classic ML model | Quick Start, then `recipes.md` §1–2; check `gotchas.md` #1, #2, #4, #7 | +| Model registered but missing from Catalog Explorer | Diagnose `set_registry_uri` and three-level names in `gotchas.md` #1–2 | +| Need notebook batch scoring | Use `mlflow.pyfunc.load_model("models:/catalog.schema.model@champion")`; keep the alias rule above | +| Need scheduled / distributed batch scoring in Lakeflow SDP | Use `recipes.md` §3 and `gotchas.md` #11; construct `spark_udf` at module scope | +| Retrained a challenger and need promotion | Use `recipes.md` §4 exactly; delete old `@champion` before setting new `@champion` | +| Load or predict behaves oddly | Use `recipes.md` §5 for `get_model_info` / signature checks, then `gotchas.md` for UC-specific failures | -## Runtime compatibility +## Runtime Compatibility -Patterns verified against **MLflow 3.11** on **Lakeflow SDP serverless compute version 5** (default at time of writing). 
All APIs used (`set_registry_uri`, `log_model`, `register_model`, `set_registered_model_alias`, `pyfunc.load_model`, `pyfunc.spark_udf`) are compatible with MLflow 2.16+ as well, so the patterns work on older classic Databricks Runtimes that still ship 2.x. Where 3.x behaviour diverges (e.g., `artifact_path` deprecation → use `name=`), GOTCHAS.md calls it out. +MLflow 3.x prefers `name=` in `log_model`; MLflow 2.x examples often use `artifact_path=`, which works but warns in newer versions. UC model stages are deprecated across modern Databricks runtimes; use aliases. diff --git a/databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md b/databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md deleted file mode 100644 index a40483c5..00000000 --- a/databricks-skills/databricks-mlflow-ml/references/CRITICAL-interfaces.md +++ /dev/null @@ -1,219 +0,0 @@ -# CRITICAL-interfaces — Exact API signatures - -The minimum set of APIs that every classic-ML + UC workflow touches. Copy-pasteable, with the exact arguments that matter. - ---- - -## Registry URI configuration - -```python -mlflow.set_registry_uri("databricks-uc") # Call at the start of every session -mlflow.get_registry_uri() # Returns "databricks-uc" if set correctly -``` - -**Must be called BEFORE** any `register_model` or `load_model` call. Idempotent to repeat. - ---- - -## Experiment creation with UC volume artifact_location - -```python -# artifact_location is a create-time setting; set_experiment() has no such parameter. -mlflow.create_experiment( - name="/Users/<user>/<experiment>", - artifact_location="dbfs:/Volumes/<catalog>/<schema>/<volume>/<experiment>", -) -mlflow.set_experiment("/Users/<user>/<experiment>") -``` - -**`artifact_location` is required** for UC-enforced workspaces. The volume must exist: - -```sql -CREATE VOLUME IF NOT EXISTS <catalog>.<schema>.<volume>; -``` - ---- - -## `models:/` URI format - -All load / deploy / spark_udf calls use this URI.
**One format to memorize:** - -``` -models:/<catalog>.<schema>.<model>@<alias> -``` - -Examples: -``` -models:/my_catalog.my_schema.grocery_forecaster@champion -models:/my_catalog.my_schema.grocery_forecaster@challenger -``` - -**Avoid** these forms (either legacy, or not-UC-native): -``` -models:/grocery_forecaster/3 # workspace registry, version number -models:/my_schema.grocery_forecaster/3 # invalid in UC -``` - ---- - -## Model logging (sklearn-flavored) - -```python -mlflow.sklearn.log_model( - sk_model=<fitted estimator or Pipeline>, - artifact_path="model", # convention — keep as "model" - signature=<ModelSignature>, # REQUIRED — use infer_signature() - input_example=<5-row DataFrame>, # REQUIRED — 5 real rows - registered_model_name=None, # leave None; register separately (cleaner) - code_paths=<list of local code dirs>, - extra_pip_requirements=<list>, # only if custom deps beyond environment -) -``` - -**Signature inference:** -```python -from mlflow.models import infer_signature -signature = infer_signature(X_train, model.predict(X_train[:5])) -``` - -**Other flavors with identical signature:** -- `mlflow.xgboost.log_model(xgb_model=..., ...)` -- `mlflow.pytorch.log_model(pytorch_model=..., ...)` -- `mlflow.tensorflow.log_model(model=..., ...)` -- `mlflow.pyfunc.log_model(python_model=..., artifact_path=..., ...)` — for custom PythonModel wrappers - ---- - -## Explicit registration - -```python -result = mlflow.register_model( - model_uri=f"runs:/{run_id}/model", # "runs:/<run_id>/<artifact_path>" - name="<catalog>.<schema>.<model>", # three-level, not optional - tags=<dict>, -) -# result.name: str — fully qualified name -# result.version: str — newly-created version (e.g., "1", "2") -``` - ---- - -## Alias management - -```python -from mlflow import MlflowClient -client = MlflowClient() - -# Set (creates if missing, moves if exists) -client.set_registered_model_alias( - name="<catalog>.<schema>.<model>", - alias="champion", # or "challenger", or custom - version="<version>", # accepts str or int -) - -# Get current alias mapping -model = client.get_registered_model("<catalog>.<schema>.<model>") -print(model.aliases) # {"champion": "3", "challenger": "4"} - -# Delete
-client.delete_registered_model_alias( - name="<catalog>.<schema>.<model>", - alias="challenger", -) -``` - ---- - -## Loading — notebook / single-node - -```python -model = mlflow.pyfunc.load_model( - model_uri="models:/<catalog>.<schema>.<model>@champion", -) - -# Predict on a pandas DataFrame matching the signature -predictions = model.predict(features_df) -``` - -**Returns:** `mlflow.pyfunc.PyFuncModel`, regardless of the original flavor. Inspect `.metadata.signature` for the schema. - ---- - -## Loading — distributed / Lakeflow SDP - -```python -predict_udf = mlflow.pyfunc.spark_udf( - spark, - model_uri="models:/<catalog>.<schema>.<model>@champion", - result_type="double", # or "array<double>" for multi-output - env_manager="local", # "local" | "virtualenv" | "conda" -) - -# Apply to a Spark DataFrame -df_with_predictions = df.withColumn( - "prediction", - predict_udf("feature_a", "feature_b", "feature_c"), -) -``` - -**Construct ONCE at module scope** in Lakeflow pipelines. See `GOTCHAS.md` #11. - ---- - -## Model introspection - -```python -from mlflow.models import get_model_info - -info = get_model_info("models:/<catalog>.<schema>.<model>@champion") -info.signature # ModelSignature with inputs/outputs -info.flavors # {"sklearn": {...}, "python_function": {...}} -info.utc_time_created -info.model_uuid -``` - -Useful when debugging load-vs-predict mismatches. - ---- - -## Run + experiment queries (introspection) - -```python -runs = mlflow.search_runs( - experiment_names=["/Users/me@company.com/forecasting"], - filter_string="metrics.r2 > 0.8", - order_by=["metrics.r2 DESC"], - max_results=5, -) -# Returns a pandas DataFrame with run_id, metrics, params, etc. - -best_run_id = runs.iloc[0]["run_id"] -``` - ---- - -## SQL introspection (UC-native) - -```sql --- Does the model exist and which aliases are set?
-DESCRIBE MODEL <catalog>.<schema>.<model>; - --- List all model versions -SHOW MODEL VERSIONS ON MODEL <catalog>.<schema>.<model>; - --- Check grants -SHOW GRANTS ON MODEL <catalog>.<schema>.<model>; -SHOW GRANTS ON SCHEMA <catalog>.<schema>; -``` - ---- - -## What's NOT in this skill - -If you see these in code, you're likely in the wrong skill: - -| API | Belongs in | -|-----|------------| -| `mlflow.genai.evaluate(...)` | `databricks-mlflow-evaluation` | -| `@scorer` decorator, `GuidelinesJudge`, etc. | `databricks-mlflow-evaluation` | -| `databricks.sdk.service.serving.EndpointCoreConfigInput` | `databricks-model-serving` | -| `ai_query('<endpoint>', ...)` | Wrong pattern — use `pyfunc.load_model` or `spark_udf` instead (see `GOTCHAS.md` #9) | -| `transition_model_version_stage(...)` | Deprecated — use aliases (see `GOTCHAS.md` #6) | diff --git a/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md b/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md index a2ab11d4..586b8ce6 100644 --- a/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md +++ b/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md @@ -1,301 +1,161 @@ -# GOTCHAS — Classic ML on MLflow + Unity Catalog +# Databricks / Unity Catalog Gotchas -Fourteen mistakes that silently waste hours. Read before writing any code. +Only the Databricks + Unity Catalog-specific failures are here. Generic MLflow, sklearn, and modeling advice intentionally lives elsewhere. ---- - -## 1. Missing `mlflow.set_registry_uri("databricks-uc")` → workspace registry - -**Symptom:** `register_model` succeeds, but the model doesn't appear in Catalog Explorer. It's in the legacy **workspace registry** (visible under the MLflow icon in the left nav), not Unity Catalog.
- -**Fix:** -```python -import mlflow -mlflow.set_registry_uri("databricks-uc") # MUST come before register_model / load_model -``` +## Runtime Gotcha Matrix -**Verification:** -```python -assert mlflow.get_registry_uri() == "databricks-uc" -``` - -**Why it bites:** defaults still route to the workspace registry for backward compatibility. The only indicator you missed it is a URL that shows `/ml/models/` instead of `/explore/data/models///`. +| Area | MLflow 2.x | MLflow 3.x / newer Databricks guidance | +|------|------------|-----------------------------------------| +| Model artifact argument | `artifact_path="model"` is common | Prefer `name="model"`; `artifact_path` warns and may disappear later | +| UC lifecycle | Stages already deprecated for UC | Use aliases only: `@champion`, `@challenger`, custom aliases | +| Registry target | Workspace registry remains default unless changed | Still call `mlflow.set_registry_uri("databricks-uc")` explicitly | --- -## 2. Two-level model names → rejected or wrong registry +## 1. Missing `mlflow.set_registry_uri("databricks-uc")` -**Symptom:** `RestException: INVALID_PARAMETER_VALUE: Invalid model name`, or the model registers to the workspace registry silently. +**How it fails:** Silent. `register_model` succeeds, but the model lands in the legacy workspace registry, not Unity Catalog; Catalog Explorer cannot find it. -**Fix:** always use three-level names: `catalog.schema.model_name`. +**Fix:** call this before any register or load: ```python -# WRONG -mlflow.register_model(model_uri, "my_model") -mlflow.register_model(model_uri, "my_schema.my_model") - -# CORRECT -mlflow.register_model(model_uri, "my_catalog.my_schema.my_model") +mlflow.set_registry_uri("databricks-uc") +assert mlflow.get_registry_uri() == "databricks-uc" ``` -**Why it bites:** the error message depends on the registry URI. With UC URI + two-level name → parameter error. 
With workspace URI + two-level name → registers successfully to workspace (the silently-wrong case). +**Why:** MLflow keeps workspace-registry defaults for backward compatibility, so the API call can succeed in the wrong registry. --- -## 3. Loading with version number instead of alias +## 2. Not using a three-level UC model name -**Symptom:** works today, breaks tomorrow when someone registers a new version. You've hard-coded a version number into every downstream consumer. +**How it fails:** Loud with UC registry (`INVALID_PARAMETER_VALUE`), but silent-wrong if you also forgot `set_registry_uri`: two-level names can register to the workspace registry. -**Fix:** load via alias, never version. +**Fix:** always use `catalog.schema.model_name`. ```python -# FRAGILE — every retrain requires updating every loader -model = mlflow.pyfunc.load_model("models:/my_catalog.my_schema.my_model/3") +# Wrong +"my_model" +"my_schema.my_model" -# STABLE — promote a new version by moving @champion; no loader changes -model = mlflow.pyfunc.load_model("models:/my_catalog.my_schema.my_model@champion") +# Correct +"my_catalog.my_schema.my_model" ``` -**Why it bites:** aliases are the UC-native way to decouple loader code from model lifecycle. Version numbers are legacy. New infrastructure (Lakeflow, Genie) assumes alias-based loading. +**Why:** Unity Catalog models are securable objects under a catalog and schema; workspace-registry names are not. --- -## 4. Experiment creation without UC volume `artifact_location` +## 3. Experiment artifact location is not a UC volume -**Symptom:** experiment creates, but any `log_model` call fails with storage / permission errors. Or artifacts land in DBFS root (deprecated) and can't be loaded downstream. +**How it fails:** Usually loud later, not at setup: `log_model` or artifact upload fails with storage / permission errors. In older patterns, artifacts may silently land in DBFS root, which breaks UC governance expectations. 
-**Fix:** when you create the experiment, pin it to a UC volume. +**Fix:** set a UC volume-backed artifact location when creating the experiment. ```python -mlflow.set_experiment( - experiment_name="/Users/me@company.com/forecasting", - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", -) +EXPERIMENT = "/Users/me@company.com/forecasting" +# artifact_location is create-time only; mlflow.set_experiment() does not accept it. +if mlflow.get_experiment_by_name(EXPERIMENT) is None: + mlflow.create_experiment( + EXPERIMENT, + artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", + ) +mlflow.set_experiment(EXPERIMENT) ``` -**Why it bites:** the default `artifact_location` used to be DBFS root. Unity-Catalog-enforced workspaces reject DBFS root writes, so `log_model` fails with opaque errors. Pointing at a UC volume makes artifact storage first-class-governed and keeps lineage intact. - -**When the experiment already exists without a UC volume:** you can't retroactively change `artifact_location`. Either (a) delete + recreate, or (b) create a new experiment. Don't try to relocate artifacts manually. - ---- - -## 5. Trusting `register_model` success without verifying in UC - -**Symptom:** `register_model` returns a `ModelVersion` object. Feels successful. But the model is in workspace registry, or the version number is stale, or an alias wasn't set. - -**Fix:** always verify explicitly. - -```sql --- In a SQL cell or notebook: -DESCRIBE MODEL my_catalog.my_schema.my_model; -``` - -Or via Python: -```python -from mlflow import MlflowClient -model = MlflowClient().get_registered_model("my_catalog.my_schema.my_model") -assert "champion" in model.aliases, "Missing @champion alias" -``` - -Or visually: open Catalog Explorer → `my_catalog` → `my_schema` → **Models** tab. If the model is under MLflow's workspace UI instead, you registered to the wrong place (see #1). - -**Why it bites:** `register_model`'s return value only tells you a version was created. It doesn't tell you *where* or *with what aliases*. The Navigator's V-step in pair programming: verify before trusting.
+**Why:** UC-enforced workspaces reject unmanaged DBFS-root artifact writes; UC volumes keep model artifacts governed and loadable. --- -## 6. Setting the alias to `"production"` or `"staging"` (legacy MLflow stages) +## 4. Using legacy `Production` / `Staging` stages -**Symptom:** you remember MLflow had `stage="Production"` / `"Staging"` transitions. You try the same with aliases and nothing recognizes them. +**How it fails:** Silent or misleading. Stage APIs such as `transition_model_version_stage()` are deprecated / ineffective for UC models; aliases named `"Production"` may exist as labels but are not treated as lifecycle stages. -**Fix:** UC model aliases are free-form labels. The conventions are `@champion` (current winner) and `@challenger` (under evaluation). MLflow stages are deprecated in the UC registry. +**Fix:** use UC aliases by convention: ```python -# WRONG (legacy stage concept) -MlflowClient().set_registered_model_alias(name, "Production", version) - -# CORRECT MlflowClient().set_registered_model_alias(name, "champion", version) +MlflowClient().set_registered_model_alias(name, "challenger", version) ``` -**Why it bites:** the old `transition_model_version_stage()` API still exists but is a no-op on UC-registered models. No error, no effect. +**Why:** Unity Catalog model lifecycle moved from stages to free-form aliases; downstream loaders should use `models:/name@champion`. --- -## 7. Missing `CREATE MODEL ON SCHEMA` permission +## 5. Missing `CREATE MODEL ON SCHEMA` -**Symptom:** `RestException: PERMISSION_DENIED: User ... does not have CREATE MODEL permission`. +**How it fails:** Loud. `register_model` raises `PERMISSION_DENIED: User ... does not have CREATE MODEL permission`. -**Fix:** grant the permission at the schema level. +**Fix:** ask the schema owner for the schema-level model-creation grant. 
```sql GRANT CREATE MODEL ON SCHEMA my_catalog.my_schema TO `user@company.com`; --- Or for a group: -GRANT CREATE MODEL ON SCHEMA my_catalog.my_schema TO `data-science-team`; -``` - -**Why it bites:** workspace admins often assume `USE SCHEMA` covers model registration. It doesn't — `CREATE MODEL` is a separate UC privilege that must be granted explicitly. - -**Verification:** -```sql SHOW GRANTS ON SCHEMA my_catalog.my_schema; ``` ---- - -## 8. Logging a model without `signature` or `input_example` - -**Symptom:** `mlflow.pyfunc.load_model(...)` returns an object, but `.predict(spark_df)` raises cryptic coercion errors. Or predictions silently cast (int → float, string → category) and produce wrong numbers. - -**Fix:** always log both. - -```python -from mlflow.models import infer_signature - -signature = infer_signature(X_train, model.predict(X_train[:5])) -mlflow.sklearn.log_model( - sk_model=model, - artifact_path="model", - signature=signature, - input_example=X_train.iloc[:5], # 5 real rows for the pyfunc wrapper to introspect -) -``` - -**Why it bites:** without a signature, the pyfunc wrapper can't coerce inputs — it accepts whatever you pass, then downstream operations (especially `spark_udf`) fail or produce wrong results. `input_example` is what `pyfunc.load_model` reads to build the wrapper's input coercer. - ---- - -## 9. `ai_query` used for batch inference on a custom UC model - -**Symptom:** you want batch inference on your custom-registered model. You see `ai_query()` in Genie docs and assume it works. It doesn't (for custom models) — `ai_query` only invokes **serving endpoints**, and your UC-registered model isn't behind one unless you deployed a serving endpoint for it. - -**Fix:** for batch inference, use `pyfunc.load_model` (notebook) or `pyfunc.spark_udf` (Lakeflow SDP pipeline). 
- -```python -# WRONG for custom UC models — requires a serving endpoint -spark.sql(f"SELECT ai_query('{MODEL_NAME}', features) FROM silver_features") - -# CORRECT — notebook batch (single node) -model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion") -predictions = model.predict(features_pandas_df) - -# CORRECT — Lakeflow SDP batch (distributed) -predict_udf = mlflow.pyfunc.spark_udf(spark, f"models:/{MODEL_NAME}@champion", result_type="double") -silver_features.withColumn("prediction", predict_udf(*feature_cols)) -``` - -**Why it bites:** `ai_query` *is* the right call for Foundation Model API endpoints (`ai_query('databricks-dbrx-instruct', prompt)`). The naming overlap leads to wrong assumptions for custom models. +**Why:** `USE CATALOG` and `USE SCHEMA` are not enough; model creation is a separate UC privilege. --- -## 10. Trying to delete / re-register a model at the same version number +## 6. Assuming `ai_query` is batch inference for custom UC models -**Symptom:** `RestException: ALREADY_EXISTS` when re-registering. You can't reuse version numbers. +**How it fails:** Loud or wrong-primitive. `ai_query` calls serving endpoints; a UC-registered custom model is not automatically a serving endpoint. -**Fix:** UC versions are monotonically-increasing and immutable. To supersede a bad version, register a new version and move `@champion` to it. The old version stays in history for lineage. +**Fix:** for batch inference, use: ```python -new_result = mlflow.register_model(new_run_uri, MODEL_NAME) -MlflowClient().set_registered_model_alias(MODEL_NAME, "champion", new_result.version) -# Old version is still there; that's correct. Lineage preserved. +mlflow.pyfunc.load_model("models:/catalog.schema.model@champion") # notebook / pandas path +mlflow.pyfunc.spark_udf(spark, "models:/catalog.schema.model@champion", result_type="double") ``` -**Why it bites:** habits from the workspace registry (where deletion was forgiving) don't transfer. 
UC treats model versions as first-class auditable artifacts. +**Why:** registration and serving are separate. `ai_query` belongs to Model Serving / Foundation Model endpoint workflows, not ordinary UC batch scoring. --- -## 11. `pyfunc.spark_udf` constructed inside a function call +## 7. Constructing `spark_udf` inside a Lakeflow SDP function -**Symptom:** in a Lakeflow SDP `@dp.materialized_view`, the UDF is constructed every time the view evaluates — slow and sometimes fails with serialization errors. +**How it fails:** Often loud and slow: repeated model deserialization, serialization errors, or pipeline refreshes that hang / retry. Sometimes just silently expensive. -**Fix:** construct the UDF at module scope, reuse it inside the view. +**Fix:** construct the UDF once at module scope and call it inside `@dp.table` / `@dp.materialized_view`. ```python -import mlflow -import databricks.declarative_pipelines as dp - -# Construct ONCE, at module scope mlflow.set_registry_uri("databricks-uc") predict_udf = mlflow.pyfunc.spark_udf( spark, - f"models:/{MODEL_NAME}@champion", + "models:/catalog.schema.model@champion", result_type="double", ) - -@dp.materialized_view -def gold_forecast(): - return spark.read.table("silver_features").withColumn( - "prediction", - predict_udf("feat_a", "feat_b", "feat_c"), - ) ``` -**Why it bites:** Lakeflow SDP may evaluate the function definition multiple times. Model deserialization is expensive — don't repeat it. +**Why:** Lakeflow SDP can evaluate dataset functions repeatedly; model loading belongs at module import time, not inside the dataset function body. --- -## 12. `mlflow[databricks]` extras missing when running outside Databricks +## 8. 
Missing `mlflow[databricks]` extras outside Databricks compute -**Symptom:** training + logging works; `register_model` fails with `MlflowException: Unable to import necessary dependencies to access model version files in Unity Catalog` — root cause `ModuleNotFoundError: No module named 'azure'` (for Azure-hosted workspaces) or `'boto3'` (AWS) / `'google.cloud'` (GCP). +**How it fails:** Loud. Local laptop / CI / non-Databricks jobs may train and log, then fail on UC registration with missing cloud SDK imports such as `azure`, `boto3`, or `google.cloud`. -**Fix:** install the `databricks` extras, which pull cloud-storage SDKs MLflow needs to stage artifacts into the UC-managed location. +**Fix:** ```bash pip install 'mlflow[databricks]' -# or, for a lighter install: +# or pip install 'mlflow-skinny[databricks]' ``` -**Why it bites:** plain `pip install mlflow` leaves out the cloud-provider SDKs because they're large and most local workflows don't need them. UC registration REQUIRES them because the registry stages artifacts into cloud-managed storage (Azure ADLS / S3 / GCS), and MLflow uses the provider's SDK for the upload. Local `log_model` works fine (artifacts go to the tracking server); registration doesn't. - -**When it most commonly hits:** running training scripts from a laptop, CI runner, or non-Databricks compute — anywhere that isn't a Databricks cluster (which ships the extras pre-installed). +**Why:** UC registration stages artifacts through cloud-managed storage; the Databricks extras include the provider SDKs that plain `mlflow` may omit. --- -## 13. `artifact_path=` parameter is deprecated; new name is `name=` +## 9. Using deprecated `artifact_path=` instead of `name=` -**Symptom:** warning in logs: `WARNING mlflow.models.model: `artifact_path` is deprecated. Please use `name` instead.` Still works today; may break in a future MLflow major version. +**How it fails:** Noisy now, possibly loud later. 
Newer MLflow warns that `artifact_path` is deprecated; future major versions may remove it. -**Fix:** use `name=` instead of `artifact_path=` in `log_model` calls. +**Fix:** prefer: ```python -# OLD (still works, warns) -mlflow.sklearn.log_model(sk_model=model, artifact_path="model", ...) - -# NEW (preferred, no warning) -mlflow.sklearn.log_model(sk_model=model, name="model", ...) -``` - -**Why it bites:** most online tutorials and training courses still use `artifact_path`. The rename shipped in MLflow 2.16. `name=` semantics are identical — still the within-run artifact folder. The rename aliases the old parameter to the preferred name; it doesn't change what the parameter represents. - ---- - -## 14. Custom preprocessing not captured in the logged model - -**Symptom:** in the training notebook, predictions are accurate. After `pyfunc.load_model(...)`, predictions are garbage. The pipeline works in training because you're calling `scaler.transform()` manually; at inference time, nobody calls the scaler. - -**Fix:** wrap preprocessing + model in an `sklearn.pipeline.Pipeline` (or a custom `PythonModel` for non-sklearn preprocessing). Log the whole pipeline. - -```python -from sklearn.pipeline import Pipeline -from sklearn.preprocessing import StandardScaler -from sklearn.ensemble import GradientBoostingRegressor - -pipeline = Pipeline([ - ("scaler", StandardScaler()), - ("model", GradientBoostingRegressor()), -]) -pipeline.fit(X_train, y_train) - -# Logs both the fitted scaler AND the model as a single artifact mlflow.sklearn.log_model( - sk_model=pipeline, - artifact_path="model", - signature=infer_signature(X_train, pipeline.predict(X_train[:5])), - input_example=X_train.iloc[:5], + sk_model=model, + name="model", + signature=signature, + input_example=input_example, ) ``` - -**Why it bites:** the most painful post-registration bug. Training and inference code paths are different files; the divergence is invisible until predictions are obviously wrong.
+**Why:** MLflow renamed the within-run model artifact argument; the value still becomes the path used by `runs:/<run_id>/model`. diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md b/databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md deleted file mode 100644 index ed4d86ae..00000000 --- a/databricks-skills/databricks-mlflow-ml/references/patterns-batch-inference.md +++ /dev/null @@ -1,244 +0,0 @@ -# patterns-batch-inference - -Loading a UC-registered model and scoring features in batch. Two scales — interactive notebook (Pattern 1–2) and distributed Lakeflow pipeline (Patterns 3–4). Plus A/B validation (Pattern 5). - ---- - -## Pattern 1: Notebook batch inference — pandas path - -For interactive exploration, ad-hoc scoring, and sample sizes up to ~10k rows. - -```python -import mlflow - -mlflow.set_registry_uri("databricks-uc") - -model = mlflow.pyfunc.load_model( - "models:/my_catalog.my_schema.grocery_forecaster@champion" -) - -# Load a sample of features (LIMIT in SQL to avoid loading full table) -features = ( - spark.table("my_catalog.my_schema.silver_features") - .orderBy("month_date") - .limit(1000) - .toPandas() -) - -# The model's signature determines which columns it expects -feature_cols = model.metadata.get_input_schema().input_names() - -predictions = model.predict(features[feature_cols]) - -# Attach predictions for display/export -features["prediction"] = predictions -display(spark.createDataFrame(features)) -``` - ---- - -## Pattern 2: Notebook batch inference with chart - -Same pattern, adds a predicted-vs-actual visual. Useful as a demo artifact.
- -```python -import matplotlib.pyplot as plt - -# (continuing from Pattern 1) -features_with_pred = features.sort_values("month_date") - -fig, ax = plt.subplots(figsize=(10, 5)) -ax.plot(features_with_pred["month_date"], features_with_pred["actual"], - label="Actual", linewidth=2) -ax.plot(features_with_pred["month_date"], features_with_pred["prediction"], - label="Predicted", linestyle="--", linewidth=2) -ax.set_xlabel("Month") -ax.set_ylabel("Turnover (millions)") -ax.set_title(f"Forecast — {model.metadata.run_id[:8]}") -ax.legend() -plt.xticks(rotation=45) -plt.tight_layout() -display(fig) -``` - ---- - -## Pattern 3: Lakeflow SDP batch via `spark_udf` - -For scheduled batch inference at scale. Distributes across Spark executors — no per-row Python overhead, no serving endpoint. - -```python -# src/gold/gold_forecast.py -import mlflow -import databricks.declarative_pipelines as dp - -# Construct the UDF ONCE at module scope — see GOTCHAS #11 -mlflow.set_registry_uri("databricks-uc") - -MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" -predict_udf = mlflow.pyfunc.spark_udf( - spark, - model_uri=f"models:/{MODEL_NAME}@champion", - result_type="double", - env_manager="local", # "local" avoids conda/virtualenv setup overhead -) - -@dp.materialized_view( - comment="Grocery turnover forecast from @champion model", -) -def gold_forecast(): - return ( - spark.read.table("my_catalog.my_schema.silver_features") - .withColumn( - "forecast_turnover_millions", - predict_udf( - "turnover_lag_1", - "turnover_lag_12", - "rolling_3m_avg", - "state_share_of_national", - # ... 
pass each signature input column in the order the signature declares - ), - ) - ) ``` - -**What this gives you:** -- A `gold_forecast` table that refreshes on every pipeline run -- Distributed scoring (no serving endpoint, no auth token) -- Full UC lineage: `silver_features` → `gold_forecast` via `grocery_forecaster@champion` -- Genie can query it: *"what's the forecast for each state next month?"* - ---- - -## Pattern 4: `spark_udf` with `result_type` for multi-output models - -Multi-output regressors or classifiers need a richer result type. - -```python -from pyspark.sql.types import ArrayType, DoubleType, StringType, StructType, StructField - -# Multi-output regression — model returns 2 predictions per row -predict_udf = mlflow.pyfunc.spark_udf( - spark, - model_uri=f"models:/{MODEL_NAME}@champion", - result_type=ArrayType(DoubleType()), -) - -# Classifier with probabilities -predict_udf = mlflow.pyfunc.spark_udf( - spark, - model_uri=f"models:/{MODEL_NAME}@champion", - result_type=StructType([ - StructField("class", StringType(), True), - StructField("confidence", DoubleType(), True), - ]), -) -``` - ---- - -## Pattern 5: A/B validation — compare `@challenger` vs `@champion` - -Run both models on a validation set, compare error metrics, decide whether to promote.
- -```python -import mlflow -from sklearn.metrics import mean_absolute_error, root_mean_squared_error - -mlflow.set_registry_uri("databricks-uc") -MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" - -champion = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@champion") -challenger = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@challenger") - -# Hold-out validation set (not seen during training) -validation = spark.table(f"{MODEL_NAME.rsplit('.', 1)[0]}.validation_features").toPandas() -feature_cols = champion.metadata.get_input_schema().input_names() -actuals = validation["turnover_millions"] - -champion_preds = champion.predict(validation[feature_cols]) -challenger_preds = challenger.predict(validation[feature_cols]) - -print(f"Champion RMSE: {root_mean_squared_error(actuals, champion_preds):.2f}") -print(f"Challenger RMSE: {root_mean_squared_error(actuals, challenger_preds):.2f}") -print(f"Champion MAE: {mean_absolute_error(actuals, champion_preds):.2f}") -print(f"Challenger MAE: {mean_absolute_error(actuals, challenger_preds):.2f}") - -# Decision logic — promote if challenger beats champion by >2% -if root_mean_squared_error(actuals, challenger_preds) < root_mean_squared_error(actuals, champion_preds) * 0.98: - print("→ Promote @challenger. See patterns-uc-registration.md Pattern 5.") -else: - print("→ Keep @champion. Delete @challenger.") -``` - ---- - -## Pattern 6: Structured streaming inference - -For models scoring events as they arrive (not batch-scheduled). 
- -```python -from pyspark.sql.functions import col - -predict_udf = mlflow.pyfunc.spark_udf( - spark, - model_uri=f"models:/{MODEL_NAME}@champion", - result_type="double", -) - -events = ( - spark.readStream - .format("delta") - .table("my_catalog.my_schema.silver_events") -) - -scored = events.withColumn( - "prediction", - predict_udf(*[col(c) for c in feature_cols]), -) - -( - scored.writeStream - .format("delta") - .outputMode("append") - .option("checkpointLocation", "dbfs:/Volumes/my_catalog/my_schema/checkpoints/scoring") - .toTable("my_catalog.my_schema.gold_scored_events") -) -``` - -For most classic-ML batch use cases, Pattern 3 (Lakeflow SDP) is simpler. Use streaming only when event-time scoring matters. - ---- - -## What NOT to do for batch inference - -### Do not use `ai_query` for custom UC models - -`ai_query('<endpoint_name>', <request>)` requires the model to be deployed as a **Model Serving endpoint**. UC-registered models are NOT automatically behind an endpoint. Use `pyfunc.load_model` (Pattern 1) or `pyfunc.spark_udf` (Pattern 3) instead. - -`ai_query` IS the right call for: -- Foundation Model API endpoints: `ai_query('databricks-dbrx-instruct', prompt)` -- Model Serving endpoints you've explicitly provisioned - -See `GOTCHAS.md` #9. - -### Do not use `mlflow.pyfunc.load_model` for billion-row batches on a single node - -Pattern 1 collects to pandas — fine up to ~10k rows, painful beyond ~100k, impossible for millions. For distributed scale, use Pattern 3 (`spark_udf`). - -### Do not construct `spark_udf` inside the function body - -See `GOTCHAS.md` #11. Construct once at module scope, reuse inside `@dp.materialized_view` / `@dp.table`.
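Several rows in the troubleshooting table that follows trace back to a malformed model URI (two-level name, missing alias). A stdlib-only pre-flight sketch — the helper and its name are hypothetical, not an MLflow API — that validates the `models:/catalog.schema.model@alias` form before `load_model` is ever called:

```python
import re

# Hypothetical pre-flight helper — NOT an MLflow API. Validates the alias-based
# UC model URI form used throughout these patterns before load_model is called.
_UC_MODEL_URI = re.compile(
    r"^models:/"
    r"(?P<catalog>[\w-]+)\.(?P<schema>[\w-]+)\.(?P<model>[\w-]+)"
    r"@(?P<alias>[\w-]+)$"
)

def check_uc_model_uri(uri: str) -> dict:
    """Return the URI parts, or raise ValueError with a pointed message."""
    match = _UC_MODEL_URI.match(uri)
    if match is None:
        if uri.startswith("models:/") and uri.count(".") < 2:
            # The classic two-level-name mistake (see GOTCHAS.md #1/#2)
            raise ValueError(f"{uri!r} is not a three-level catalog.schema.model name")
        raise ValueError(f"{uri!r} does not match models:/catalog.schema.model@alias")
    return match.groupdict()
```

`check_uc_model_uri("models:/my_catalog.my_schema.grocery_forecaster@champion")` returns the four parts; a workspace-registry-style two-level name fails fast here instead of surfacing later as `RESOURCE_DOES_NOT_EXIST` at load time.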
- ---- - -## Troubleshooting batch inference - -| Error | Cause | Fix | -|-------|-------|-----| -| `RESOURCE_DOES_NOT_EXIST` on load | Wrong registry URI or two-level name | `GOTCHAS.md` #1, #2 | -| Predictions are NaN | Input columns in wrong order | Pass columns in the order `model.metadata.get_input_schema().input_names()` declares | -| `PERMISSION_DENIED: EXECUTE ON MODEL` | No read access to model | `GRANT EXECUTE ON MODEL ... TO <principal>` | -| `spark_udf` raises `PicklingError` | Model has un-picklable state (e.g., Spark session) | Re-train ensuring the model is pure Python/numpy — don't capture `spark` at training time | -| Pipeline hangs on `gold_forecast` | Model artifact is large; first load is slow | Normal — subsequent runs are fast (UDF is cached per executor) | -| Column type mismatch in Spark | UDF expects double; column is int/string | Cast explicitly: `col("feature").cast("double")` | diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md b/databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md deleted file mode 100644 index 00c6e2ba..00000000 --- a/databricks-skills/databricks-mlflow-ml/references/patterns-experiment-setup.md +++ /dev/null @@ -1,141 +0,0 @@ -# patterns-experiment-setup - -Experiments in UC-enforced workspaces need more setup than older MLflow guides show. The critical change: you must pin the experiment's `artifact_location` to a Unity Catalog volume, or `log_model` will fail with storage errors.
- ---- - -## Pattern 1: Create experiment with UC volume artifact_location - -```python -import mlflow - -mlflow.set_registry_uri("databricks-uc") # always first - -# Prerequisite: the UC volume must exist -# CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts; - -mlflow.set_experiment( - experiment_name="/Users/me@company.com/forecasting", - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", -) -``` - -**Why both are required:** -- `experiment_name` — the workspace-visible path (browsable from the Experiments UI) -- `artifact_location` — where logged artifacts (model binaries, plots, datasets) physically live - -In older workspaces, `artifact_location` defaulted to DBFS root. UC-enforced workspaces reject DBFS root writes, so `log_model` fails with opaque errors like: - -``` -MlflowException: API request to endpoint /api/2.0/mlflow/runs/log-artifact failed -with error code 403 != 200. Response body: PERMISSION_DENIED ... -``` - -Pointing at a UC volume resolves this AND makes artifacts first-class-governed under UC lineage. - ---- - -## Pattern 2: Create the volume if it doesn't exist (idempotent) - -Run once per schema, before any experiment creation: - -```python -spark.sql(f""" - CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts - COMMENT 'MLflow experiment artifacts for forecasting models' -""") -``` - -Or via SQL editor: - -```sql -CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts; -``` - -**Permissions needed:** `USE SCHEMA` + `CREATE VOLUME`. If missing, request `CREATE VOLUME ON SCHEMA my_catalog.my_schema` from the schema owner. - ---- - -## Pattern 3: Experiment already exists, wrong `artifact_location` - -You can't retroactively change `artifact_location`. 
Three options, in order of preference: - -**Option A — New experiment** (cleanest, keeps old runs intact): -```python -mlflow.set_experiment( - experiment_name="/Users/me@company.com/forecasting_v2", # v2 suffix - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting_v2", -) -# New runs land in v2. Old runs stay in v1 (archive them if you like). -``` - -**Option B — Delete + recreate** (loses history; use only if no good runs exist): -```python -from mlflow import MlflowClient -client = MlflowClient() - -exp = client.get_experiment_by_name("/Users/me@company.com/forecasting") -client.delete_experiment(exp.experiment_id) - -mlflow.set_experiment( - experiment_name="/Users/me@company.com/forecasting", - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", -) -``` - -**Option C — Manual relocation of DBFS artifacts to UC volume**: do not do this. Storage paths are resolved at log time and encoded in the run's metadata; moving files doesn't update the pointers. - ---- - -## Pattern 4: Verify experiment is correctly configured - -After setup, before training: - -```python -exp = mlflow.get_experiment_by_name("/Users/me@company.com/forecasting") -assert exp is not None, "Experiment not created" -assert exp.artifact_location.startswith("dbfs:/Volumes/"), ( - f"artifact_location is not a UC volume: {exp.artifact_location}" -) -print(f"Experiment ID: {exp.experiment_id}") -print(f"Artifact location: {exp.artifact_location}") -``` - -If the assert fails, you have an old experiment pointing at DBFS root. Apply Pattern 3. 
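The Pattern 4 assertion above generalizes to a reusable check that setup scripts can run before any training job. A stdlib-only sketch (the helper name is hypothetical):

```python
# Hypothetical helper — mirrors the Pattern 4 assert. Classifies an experiment's
# artifact_location so setup scripts fail fast on DBFS-root experiments.
def is_uc_volume_location(artifact_location: str) -> bool:
    """True only for UC volume paths: dbfs:/Volumes/<catalog>/<schema>/<volume>/..."""
    prefix = "dbfs:/Volumes/"
    if not artifact_location.startswith(prefix):
        return False
    # Require at least catalog/schema/volume components after the prefix
    parts = artifact_location[len(prefix):].strip("/").split("/")
    return len(parts) >= 3 and all(parts[:3])
```

`is_uc_volume_location(exp.artifact_location)` returns `False` for legacy DBFS-root locations, so Pattern 3 can be applied before any run is logged.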
- ---- - -## Pattern 5: Workspace-path vs Repo-path experiments - -MLflow accepts two conventions for `experiment_name`: - -```python -# Workspace-path convention (recommended for collaborative experiments) -mlflow.set_experiment(experiment_name="/Users/me@company.com/forecasting") - -# Repo-path convention (only if you're running from a Git folder) -mlflow.set_experiment(experiment_name="/Repos/me@company.com/my-repo/forecasting") -``` - -**Prefer workspace path** for experiments shared across pairs/teams. Repo-path experiments become orphans when the repo is deleted. - -**Both need `artifact_location` pointing at a UC volume.** The path convention only affects where the experiment metadata is browsable, not where artifacts live. - ---- - -## Pattern 6: Running from a notebook cell with autoselected experiment - -Databricks notebooks auto-associate runs with an experiment matching the notebook's workspace path: - -```python -# In a notebook at /Users/me@company.com/Notebooks/train.py -# Databricks will auto-set experiment_name to the notebook path -# BUT the default artifact_location is still DBFS root — you still need to override: - -mlflow.set_experiment( - experiment_name="/Users/me@company.com/Notebooks/train", - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/train", -) -``` - -Or call `set_experiment` explicitly before the first `start_run` — the artifact_location fix must be applied regardless of notebook auto-association. diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-training.md b/databricks-skills/databricks-mlflow-ml/references/patterns-training.md deleted file mode 100644 index 017e3cfb..00000000 --- a/databricks-skills/databricks-mlflow-ml/references/patterns-training.md +++ /dev/null @@ -1,205 +0,0 @@ -# patterns-training - -How to log classic ML models (sklearn / XGBoost / PyTorch) so they register cleanly and load correctly downstream. The two load-bearing decisions: `signature` and `input_example`. 
- ---- - -## Pattern 1: Baseline sklearn training loop - -```python -import mlflow -import mlflow.sklearn -from sklearn.ensemble import GradientBoostingRegressor -from sklearn.metrics import root_mean_squared_error, mean_absolute_error -from sklearn.model_selection import train_test_split -from mlflow.models import infer_signature - -mlflow.set_registry_uri("databricks-uc") -mlflow.set_experiment( - experiment_name="/Users/me@company.com/forecasting", - artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", -) - -X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2) - -with mlflow.start_run(run_name="gbr_baseline"): - model = GradientBoostingRegressor(n_estimators=100, max_depth=3) - model.fit(X_train, y_train) - - # Signature + input_example are both load-bearing - signature = infer_signature(X_train, model.predict(X_train[:5])) - - mlflow.sklearn.log_model( - sk_model=model, - artifact_path="model", - signature=signature, - input_example=X_train.iloc[:5], - ) - - # Log everything needed to reproduce - mlflow.log_params({"n_estimators": 100, "max_depth": 3}) - predictions = model.predict(X_test) - mlflow.log_metrics({ - "rmse": root_mean_squared_error(y_test, predictions), - "mae": mean_absolute_error(y_test, predictions), - }) -``` - ---- - -## Pattern 2: Preprocessing + model as a Pipeline - -Always log preprocessing alongside the model. See `GOTCHAS.md` #12 — inference-time preprocessing drift is the most painful post-registration bug. 
- -```python -from sklearn.pipeline import Pipeline -from sklearn.preprocessing import StandardScaler -from sklearn.compose import ColumnTransformer - -numeric_features = ["turnover_lag_1", "turnover_lag_12", "rolling_3m_avg"] -categorical_features = ["state", "industry"] - -preprocessor = ColumnTransformer([ - ("num", StandardScaler(), numeric_features), - ("cat", "passthrough", categorical_features), # handle in the model if needed -]) - -pipeline = Pipeline([ - ("preprocessor", preprocessor), - ("model", GradientBoostingRegressor(n_estimators=100)), -]) - -with mlflow.start_run(): - pipeline.fit(X_train, y_train) - - signature = infer_signature(X_train, pipeline.predict(X_train[:5])) - mlflow.sklearn.log_model( - sk_model=pipeline, # logs both preprocessor AND model as one artifact - artifact_path="model", - signature=signature, - input_example=X_train.iloc[:5], - ) -``` - -At inference time, callers never need to know about `StandardScaler` — they pass raw features, `pyfunc.load_model` dispatches through the pipeline. - ---- - -## Pattern 3: XGBoost / PyTorch — same interface, different flavor - -```python -# XGBoost -import mlflow.xgboost -import xgboost as xgb - -model = xgb.XGBRegressor(n_estimators=100, max_depth=3) -model.fit(X_train, y_train) - -with mlflow.start_run(): - mlflow.xgboost.log_model( - xgb_model=model, - artifact_path="model", - signature=infer_signature(X_train, model.predict(X_train[:5])), - input_example=X_train.iloc[:5], - ) - -# PyTorch -import mlflow.pytorch -import torch - -class Forecaster(torch.nn.Module): - ... - -model = Forecaster() -# ... training loop ... 
- -with mlflow.start_run(): - # For PyTorch, input_example must be a tensor or numpy array - example = X_train.iloc[:5].to_numpy() - mlflow.pytorch.log_model( - pytorch_model=model, - artifact_path="model", - signature=infer_signature(example, model(torch.tensor(example, dtype=torch.float32)).detach().numpy()), - input_example=example, - ) -``` - ---- - -## Pattern 4: Retraining — same experiment, new run - -Retraining for an A/B test or a scheduled refresh. Log to the same experiment; register as a new version in Workflow 2. - -```python -with mlflow.start_run(run_name="gbr_v2_with_seasonality") as run: - model = GradientBoostingRegressor(n_estimators=200, max_depth=4) - model.fit(X_train_with_seasonality, y_train) - - mlflow.sklearn.log_model( - sk_model=model, - artifact_path="model", - signature=infer_signature(X_train_with_seasonality, - model.predict(X_train_with_seasonality[:5])), - input_example=X_train_with_seasonality.iloc[:5], - ) - # Remember the run_id for the register step - print(f"New run: {run.info.run_id}") -``` - ---- - -## Pattern 5: Autologging (quick path for iteration) - -Autologging wraps `fit()` and logs params + metrics + model automatically. Convenient during experimentation; less explicit than manual logging. - -```python -mlflow.sklearn.autolog( - log_models=True, - log_input_examples=True, # IMPORTANT — otherwise no input_example is captured - log_model_signatures=True, # IMPORTANT — otherwise no signature is captured - silent=False, -) - -# Any subsequent fit() call auto-logs -model = GradientBoostingRegressor(n_estimators=100) -model.fit(X_train, y_train) -# Autolog handled the MLflow calls -``` - -**Caveat:** autologging infers signature + input_example heuristically. For production runs, prefer manual logging (Pattern 1) — you control what gets captured.
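`search_runs` pushes the filter and ordering to the tracking server; as a mental model, the selection in Pattern 6 below is equivalent to this pure-Python sketch (the rows are hypothetical stand-ins for `search_runs` output):

```python
# Pure-Python equivalent of the Pattern 6 search (illustrative only):
# keep runs under the RMSE threshold whose name matches, then take the best.
runs = [  # hypothetical rows mimicking mlflow.search_runs output
    {"run_id": "abc123", "metrics.rmse": 92.4, "tags.mlflow.runName": "gbr_baseline"},
    {"run_id": "def456", "metrics.rmse": 88.1, "tags.mlflow.runName": "gbr_v2_with_seasonality"},
    {"run_id": "zzz999", "metrics.rmse": 140.0, "tags.mlflow.runName": "gbr_overfit"},
]

candidates = [
    r for r in runs
    if r["metrics.rmse"] < 100 and r["tags.mlflow.runName"].startswith("gbr_")
]
best = min(candidates, key=lambda r: r["metrics.rmse"])
```

Here `best["run_id"]` is `"def456"`; the real call expresses the same predicate via `filter_string` and `order_by`.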
- ---- - -## Pattern 6: Searching runs to pick the best one for registration - -Before registering, you typically want the best run from an experiment: - -```python -runs = mlflow.search_runs( - experiment_names=["/Users/me@company.com/forecasting"], - filter_string="metrics.rmse < 100 AND tags.`mlflow.runName` LIKE 'gbr_%'", - order_by=["metrics.rmse ASC"], - max_results=1, -) - -if runs.empty: - raise RuntimeError("No runs match criteria") - -best_run_id = runs.iloc[0]["run_id"] -best_rmse = runs.iloc[0]["metrics.rmse"] -print(f"Best run: {best_run_id} (RMSE={best_rmse:.2f})") - -# Now register this run's model — see patterns-uc-registration.md Pattern 1 -``` - ---- - -## Common logging mistakes - -| Mistake | Effect | Fix | -|---------|--------|-----| -| No `signature` | `pyfunc.load_model` works, but `.predict()` coerces wrong | Always call `infer_signature(X_train, y_hat[:5])` | -| No `input_example` | `pyfunc.load_model` can't introspect input schema | Pass `X_train.iloc[:5]` (or `.to_numpy()[:5]` for non-pandas) | -| `artifact_path` changes between logs | Same model name → different paths → broken load URIs | Always use `artifact_path="model"` | -| Log preprocessing separately | Inference callers must reapply preprocessing manually | Wrap in a sklearn `Pipeline` and log the pipeline | -| Use `pickle.dump` directly | Loses MLflow's flavor dispatch | Always use `mlflow.<flavor>.log_model` | diff --git a/databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md b/databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md deleted file mode 100644 index 4d8929ed..00000000 --- a/databricks-skills/databricks-mlflow-ml/references/patterns-uc-registration.md +++ /dev/null @@ -1,232 +0,0 @@ -# patterns-uc-registration - -Register a logged model to Unity Catalog, set aliases, verify, and handle promotion / rollback. - ---- - -## Pattern 1: Explicit register from a specific run - -Cleanest workflow.
Train (separate step) → pick best run → register. - -```python -import mlflow -from mlflow import MlflowClient - -mlflow.set_registry_uri("databricks-uc") - -MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" - -# run_id from a specific training run (see patterns-training.md Pattern 6) -run_id = "abc123def456" - -result = mlflow.register_model( - model_uri=f"runs:/{run_id}/model", - name=MODEL_NAME, - tags={ - "trained_by": "forecasting_team", - "dataset_version": "2024-Q4", - }, -) -print(f"Registered {MODEL_NAME} version {result.version}") -``` - -`result` is a `ModelVersion` object: -- `result.name` — fully qualified three-level name -- `result.version` — the new version (string, e.g., `"3"`) -- `result.status` — should be `"READY"` by the time this returns - ---- - -## Pattern 2: Log-and-register in one call - -Shorter but couples logging and registration. Use when you *know* the current run is the one worth registering. - -```python -with mlflow.start_run(): - model.fit(X_train, y_train) - mlflow.sklearn.log_model( - sk_model=model, - artifact_path="model", - signature=infer_signature(X_train, model.predict(X_train[:5])), - input_example=X_train.iloc[:5], - registered_model_name="my_catalog.my_schema.grocery_forecaster", - ) - # Model is registered as a new version; you still need to set alias separately. -``` - -**Still need a separate alias call** — `log_model` doesn't set aliases. - ---- - -## Pattern 3: Set aliases (`@champion`, `@challenger`) - -Aliases decouple the loader from the version. Moving `@champion` to a new version silently updates every `models:/...@champion` loader. - -```python -from mlflow import MlflowClient -client = MlflowClient() - -# Set or move an alias -client.set_registered_model_alias( - name="my_catalog.my_schema.grocery_forecaster", - alias="champion", - version=result.version, -) -``` - -**Conventions:** -- `@champion` — the current production winner. Exactly one version at a time. 
-- `@challenger` — a candidate under evaluation. Exactly one at a time. -- Custom aliases — free-form, e.g., `@pair_team_07`, `@nightly`, `@reviewed`. - -**Read existing aliases:** -```python -model = client.get_registered_model("my_catalog.my_schema.grocery_forecaster") -print(model.aliases) # e.g., {"champion": "3", "challenger": "4"} -``` - -**Delete an alias:** -```python -client.delete_registered_model_alias( - name="my_catalog.my_schema.grocery_forecaster", - alias="challenger", -) -``` - ---- - -## Pattern 4: Verify registration (Navigator's V-step) - -Don't trust `register_model`'s success message alone. See `GOTCHAS.md` #5. - -### Via SQL - -```sql -DESCRIBE MODEL my_catalog.my_schema.grocery_forecaster; -``` - -Expected output includes the model metadata and (if set) aliases. If the result is "table or view not found," the model didn't register to UC — check `set_registry_uri` (GOTCHAS #1). - -### Via Catalog Explorer UI - -1. Open Catalog Explorer -2. Navigate to `my_catalog` → `my_schema` → **Models** tab -3. Confirm `grocery_forecaster` appears with an `@champion` badge - -If the model appears under the workspace MLflow icon instead (left sidebar, under MLflow), you registered to the workspace registry. See GOTCHAS #1. - -### Via Python assertion (scriptable) - -```python -from mlflow import MlflowClient -client = MlflowClient() - -model = client.get_registered_model("my_catalog.my_schema.grocery_forecaster") - -# Three assertions that should always hold post-registration -assert model is not None, "Model not registered to UC" -assert len(model.latest_versions) > 0, "No versions exist" -assert "champion" in model.aliases, "@champion alias not set" -print(f"✓ {model.name} v{model.aliases['champion']} is @champion") -``` - ---- - -## Pattern 5: A/B promotion — swap `@challenger` to `@champion` - -You've trained a new version, registered it, and validated its predictions against the current champion. 
Now promote: - -```python -client = MlflowClient() -MODEL_NAME = "my_catalog.my_schema.grocery_forecaster" - -# Get current state -model = client.get_registered_model(MODEL_NAME) -old_champion = model.aliases.get("champion") -new_champion = model.aliases.get("challenger") - -if new_champion is None: - raise RuntimeError("No @challenger set — nothing to promote") - -# Move the alias (atomic — downstream loaders see the switch on next load) -client.set_registered_model_alias(MODEL_NAME, "champion", new_champion) - -# Optional: archive the old champion version with a custom alias -if old_champion: - client.set_registered_model_alias(MODEL_NAME, f"archived_{old_champion}", old_champion) - -# Remove the @challenger alias -client.delete_registered_model_alias(MODEL_NAME, "challenger") - -print(f"Promoted v{new_champion} from @challenger to @champion (was v{old_champion})") -``` - -**Rollback** is the inverse — move `@champion` back to the previous version. - ---- - -## Pattern 6: List all model versions - -Useful for lineage inspection or cleanup. - -```sql -SHOW MODEL VERSIONS ON MODEL my_catalog.my_schema.grocery_forecaster; -``` - -Or via Python: -```python -from mlflow import MlflowClient -client = MlflowClient() - -versions = client.search_model_versions( - filter_string=f"name='my_catalog.my_schema.grocery_forecaster'", - order_by=["version_number DESC"], -) -for v in versions: - print(f"v{v.version}: run_id={v.run_id}, status={v.status}, aliases={v.aliases}") -``` - ---- - -## Pattern 7: Tags — richer metadata without new versions - -Tags are key-value metadata on the registered model (or a specific version). 
Useful for: -- Team ownership: `set_model_version_tag(name, "1", "team", "forecasting")` -- Dataset provenance: `set_model_version_tag(name, "1", "dataset_version", "2024-Q4")` -- Review status: `set_model_version_tag(name, "1", "reviewed", "true")` - -```python -from mlflow import MlflowClient -client = MlflowClient() - -# Tag on the registered model (applies to all versions) -client.set_registered_model_tag( - name="my_catalog.my_schema.grocery_forecaster", - key="domain", - value="retail", -) - -# Tag on a specific version -client.set_model_version_tag( - name="my_catalog.my_schema.grocery_forecaster", - version="3", - key="reviewed_by", - value="jane@company.com", -) -``` - -Tags are queryable via `search_model_versions(filter_string="tags.reviewed = 'true'")`. - ---- - -## Permission requirements - -| Operation | Permission needed | Granted via | -|-----------|-------------------|-------------| -| `register_model` (first version of a model) | `CREATE MODEL ON SCHEMA <schema>` | `GRANT CREATE MODEL ON SCHEMA ... TO ...` | -| `register_model` (new version of existing) | `EDIT ON MODEL <model>` | Automatic for model owner; otherwise grant | -| `set_registered_model_alias` | `EDIT ON MODEL <model>` | Same as above | -| `get_registered_model` / `DESCRIBE MODEL` | `USE CATALOG` + `USE SCHEMA` + `EXECUTE ON MODEL` | Standard read grants | -| `load_model` | `EXECUTE ON MODEL <model>` | `GRANT EXECUTE ON MODEL ... TO ...` | - -If any of these fail, request the specific grant from the schema owner. See `GOTCHAS.md` #7. diff --git a/databricks-skills/databricks-mlflow-ml/references/recipes.md b/databricks-skills/databricks-mlflow-ml/references/recipes.md new file mode 100644 index 00000000..db326fad --- /dev/null +++ b/databricks-skills/databricks-mlflow-ml/references/recipes.md @@ -0,0 +1,233 @@ +# UC-Specific Recipes + +These are code shapes, not full sklearn implementations. Use them to get Databricks / Unity Catalog arguments and ordering right. + +## 1.
Experiment + UC Volume Setup + +Do this before training if the workspace enforces Unity Catalog storage. + +- Set the registry URI every session: + ```python + mlflow.set_registry_uri("databricks-uc") + ``` +- Create the artifact volume once per schema: + ```sql + CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.mlflow_artifacts; + ``` +- Create / select the experiment with a UC volume artifact location. Note that `artifact_location` can only be set at creation time via `mlflow.create_experiment`; `mlflow.set_experiment` does not accept it: + ```python + exp_name = "/Users/me@company.com/forecasting" + if mlflow.get_experiment_by_name(exp_name) is None: + mlflow.create_experiment( + exp_name, + artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts/forecasting", + ) + mlflow.set_experiment(exp_name) + ``` + +If the experiment already exists with a non-UC artifact location, create a new experiment path. Do not try to move MLflow artifacts manually; run metadata already points at the original location. + +## 2. Log → Register → Alias + +### Logging UC essentials + +When logging the model: + +- Include `signature=infer_signature(X_train, model.predict(X_train[:5]))`. +- Include `input_example=X_train.iloc[:5]` or equivalent real rows. +- Use `name="model"` for MLflow 3.x / newer code; `artifact_path="model"` is the older spelling. +- If preprocessing exists, log the whole pipeline / wrapper, not just the final estimator. + +Shape: + +```python +with mlflow.start_run() as run: + # train your estimator or pipeline here + mlflow.<flavor>.log_model( + <model_arg>=model_or_pipeline, + name="model", + signature=signature, + input_example=input_example, + ) +``` + +### Register + champion alias + +After training: + +```python +result = mlflow.register_model( + f"runs:/{run_id}/model", + "my_catalog.my_schema.my_model", +) +MlflowClient().set_registered_model_alias( + "my_catalog.my_schema.my_model", + "champion", + result.version, +) +``` + +`register_model` returns a `ModelVersion`; `result.version` is a string such as `"1"`. It does **not** set aliases — the alias call is separate and required.
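Both calls hinge on getting the name and URI shapes right, so it can help to build and sanity-check them in one place. A minimal pure-Python sketch — the helper name `uc_model_uris` is hypothetical, not an MLflow API:

```python
def uc_model_uris(run_id: str, model_name: str, alias: str = "champion") -> dict:
    """Build the source and loader URI shapes used above.

    Hypothetical helper: `model_name` must be a three-level UC name
    (catalog.schema.model), which is worth checking before registering.
    """
    if model_name.count(".") != 2:
        raise ValueError(f"not a three-level UC name: {model_name!r}")
    return {
        # passed to mlflow.register_model
        "source": f"runs:/{run_id}/model",
        # used by pyfunc.load_model / spark_udf
        "loader": f"models:/{model_name}@{alias}",
    }

uris = uc_model_uris("abc123def456", "my_catalog.my_schema.my_model")
# uris["source"] == "runs:/abc123def456/model"
# uris["loader"] == "models:/my_catalog.my_schema.my_model@champion"
```

Catching a two-level name here, before calling `register_model`, gives a clearer error than the registry's own failure mode.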
+ +### Tags syntax + +Tags can be set at registration time: + +```python +result = mlflow.register_model( + f"runs:/{run_id}/model", + MODEL_NAME, + tags={"dataset_version": "2024-Q4", "trained_by": "forecasting_team"}, +) +``` + +Or after registration: + +```python +client.set_registered_model_tag(MODEL_NAME, "domain", "retail") +client.set_model_version_tag(MODEL_NAME, result.version, "reviewed", "true") +``` + +### Minimal UC permission checklist + +| Operation | Required UC privilege | +|-----------|-----------------------| +| First registration of a model in a schema | `CREATE MODEL ON SCHEMA catalog.schema` | +| Registering a new version | `EDIT ON MODEL catalog.schema.model` | +| Setting aliases / tags | `EDIT ON MODEL catalog.schema.model` | +| Loading for inference | `EXECUTE ON MODEL catalog.schema.model` plus `USE CATALOG` / `USE SCHEMA` | + +## 3. Lakeflow SDP `spark_udf` Shape + +For Lakeflow SDP, create the UDF at module scope, not inside the decorated dataset function. + +```python +# src/gold/score_model.py +import mlflow +from pyspark import pipelines as dp + +mlflow.set_registry_uri("databricks-uc") + +MODEL_NAME = "my_catalog.my_schema.my_model" + +predict_udf = mlflow.pyfunc.spark_udf( + spark, + model_uri=f"models:/{MODEL_NAME}@champion", + result_type="double", + env_manager="local", +) + +@dp.materialized_view +def gold_predictions(): + return ( + spark.read.table("my_catalog.my_schema.silver_features") + .withColumn( + "prediction", + predict_udf("feature_a", "feature_b", "feature_c"), + ) + ) +``` + +Pass feature columns in the order expected by the model signature.
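Column-order mistakes fail silently here — the UDF simply sees misaligned features — so deriving the order from the logged signature is safer than hard-coding it. A minimal pure-Python sketch; the helper name is hypothetical, and `expected` would come from the logged signature (e.g. `get_model_info(...).signature.inputs.input_names()`):

```python
def signature_ordered_cols(expected, available):
    """Return the model's feature columns in signature order, failing loudly
    if the source table is missing any of them. Extra table columns are
    simply not selected."""
    missing = [c for c in expected if c not in set(available)]
    if missing:
        raise ValueError(f"source table is missing model features: {missing}")
    return list(expected)

cols = signature_ordered_cols(
    ["feature_a", "feature_b", "feature_c"],
    ["feature_c", "feature_a", "feature_b", "extra_col"],
)
# cols == ["feature_a", "feature_b", "feature_c"] — safe to pass as predict_udf(*cols)
```

The point of the loud failure is to surface a schema drift at pipeline start rather than as wrong predictions downstream.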
+ +`result_type` shapes: + +| Model output | `result_type` | +|--------------|---------------| +| Single numeric prediction | `"double"` | +| Integer class id | `"long"` | +| String class label | `"string"` | +| Multi-output numeric vector | `"array<double>"` | +| Named outputs | `StructType([...])` | + +Do not use `ai_query` here unless you have explicitly deployed a Model Serving endpoint. + +## 4. A/B Promotion Alias Swap + +This order is intentional: delete old `@champion` before setting the new one. Otherwise, during a botched sequence or retry, the pre-existing alias can still point consumers at the wrong version. + +```python +from mlflow import MlflowClient + +client = MlflowClient() +MODEL_NAME = "my_catalog.my_schema.my_model" + +model = client.get_registered_model(MODEL_NAME) +old_champion = model.aliases.get("champion") +new_champion = model.aliases.get("challenger") + +if new_champion is None: + raise RuntimeError("No @challenger alias set; nothing to promote") + +# Optional: preserve an explicit rollback handle before moving champion. +if old_champion: + client.set_registered_model_alias( + MODEL_NAME, + f"archived_{old_champion}", + old_champion, + ) + +# Required order: remove old champion, then set new champion. +if old_champion: + client.delete_registered_model_alias(MODEL_NAME, "champion") + +client.set_registered_model_alias(MODEL_NAME, "champion", new_champion) + +# Remove challenger after it has become champion. +client.delete_registered_model_alias(MODEL_NAME, "challenger") +``` + +Downstream code using `models:/my_catalog.my_schema.my_model@champion` picks up the new version on next load. No loader code changes. + +Rollback shape: + +```python +client.delete_registered_model_alias(MODEL_NAME, "champion") +client.set_registered_model_alias(MODEL_NAME, "champion", old_champion) +``` + +## 5.
Verification One-Liners + +### SQL + +```sql +DESCRIBE MODEL my_catalog.my_schema.my_model; +SHOW MODEL VERSIONS ON MODEL my_catalog.my_schema.my_model; +SHOW GRANTS ON MODEL my_catalog.my_schema.my_model; +SHOW GRANTS ON SCHEMA my_catalog.my_schema; +``` + +If `DESCRIBE MODEL` cannot find it but `register_model` succeeded, suspect the workspace-registry trap: missing `mlflow.set_registry_uri("databricks-uc")`. + +### Alias dictionary shape + +```python +model = MlflowClient().get_registered_model("my_catalog.my_schema.my_model") +model.aliases +# Expected shape: {"champion": "3", "challenger": "4"} +``` + +Use this to confirm that `@champion` exists and points at the version you intended. + +### Signature debugging + +```python +from mlflow.models import get_model_info + +info = get_model_info("models:/my_catalog.my_schema.my_model@champion") +info.signature +info.flavors +``` + +If `info.signature` is missing or does not match the DataFrame columns you pass to `predict`, re-log the model with a signature and input example. + +### Load URI sanity check + +```python +mlflow.pyfunc.load_model("models:/my_catalog.my_schema.my_model@champion") +``` + +Correct URI shape is: + +```text +models:/<catalog>.<schema>.<model>@<alias> +``` + +Avoid version-pinned loaders such as `models:/catalog.schema.model/3` unless you are doing forensic debugging. diff --git a/databricks-skills/databricks-mlflow-ml/references/user-journeys.md b/databricks-skills/databricks-mlflow-ml/references/user-journeys.md deleted file mode 100644 index a72f9106..00000000 --- a/databricks-skills/databricks-mlflow-ml/references/user-journeys.md +++ /dev/null @@ -1,195 +0,0 @@ -# user-journeys - -End-to-end workflows with decision points. Read the journey that matches your situation. - ---- - -## Journey 1: First model (train → register → score) — the 90%-case - -Most users arrive here. Goal: a UC-registered model with a `@champion` alias, producing batch predictions.
- -**Prerequisites:** -- UC catalog + schema where you have `CREATE MODEL` permission -- A UC volume for MLflow artifacts (create if missing — `patterns-experiment-setup.md` Pattern 2) -- Features in a Spark table (Bronze → Silver → Gold already done) - -**Steps:** - -1. **Set up the experiment** (`patterns-experiment-setup.md` Pattern 1) - - `mlflow.set_registry_uri("databricks-uc")` - - `mlflow.create_experiment(name=..., artifact_location=<UC volume path>)` if it doesn't exist, then `mlflow.set_experiment(...)` -2. **Train + log** (`patterns-training.md` Pattern 1 or 2) - - Always include `signature` and `input_example` - - If you have preprocessing, wrap in `sklearn.Pipeline` (Pattern 2) -3. **Register** (`patterns-uc-registration.md` Pattern 1) - - `mlflow.register_model(f"runs:/{run_id}/model", "catalog.schema.model")` -4. **Set alias** (`patterns-uc-registration.md` Pattern 3) - - `client.set_registered_model_alias(name, "champion", version)` -5. **Verify** (`patterns-uc-registration.md` Pattern 4) - - `DESCRIBE MODEL catalog.schema.model` OR Catalog Explorer UI -6. **Load + score** (`patterns-batch-inference.md` Pattern 1 or 2) - - `model = mlflow.pyfunc.load_model("models:/catalog.schema.model@champion")` - - `model.predict(features_df)` - -**Done.** You have a UC-registered model with a canonical loading URI that downstream code can depend on. - ---- - -## Journey 2: Retrain + promote (A/B) - -You already have `@champion`. You trained a new version and want to decide whether to promote it. - -**Prerequisites:** -- Model exists in UC with `@champion` set (you did Journey 1) -- New training run logged to the same experiment - -**Steps:** - -1. **Register new version** (`patterns-uc-registration.md` Pattern 1) - - Same `MODEL_NAME` as before — UC auto-increments version -2. **Set `@challenger`** (`patterns-uc-registration.md` Pattern 3) - - `client.set_registered_model_alias(name, "challenger", new_version)` -3.
**A/B validate** (`patterns-batch-inference.md` Pattern 5) - - Load both aliases, score validation set, compare metrics -4. **Decide**: - - Challenger wins → **Pattern 5 in `patterns-uc-registration.md`**: swap aliases - - Champion wins → delete `@challenger` alias, keep current `@champion` -5. **Verify** downstream loaders picked up the new version (after swap) - - Any code using `models:/<model_name>@champion` will see the new version on next load - ---- - -## Journey 3: Lakeflow SDP batch pipeline - -You want predictions to land in a scheduled gold table, not an ad-hoc notebook. - -**Prerequisites:** -- Model registered with `@champion` (Journey 1 complete) -- Lakeflow SDP pipeline defined (one already running is ideal) - -**Steps:** - -1. **Add a new file** to the pipeline source: `src/gold/gold_forecast.py` -2. **Construct the UDF at module scope** (`patterns-batch-inference.md` Pattern 3) - - `mlflow.set_registry_uri("databricks-uc")` - - `predict_udf = mlflow.pyfunc.spark_udf(spark, "models:/...@champion", result_type="double")` -3. **Define the `@dp.materialized_view`** that reads silver features, applies the UDF -4. **Deploy + run** the pipeline - - `databricks bundle deploy && databricks bundle run <pipeline_name>` -5. **Verify** the `gold_forecast` table materializes - - Row count matches `silver_features` - - Query from Genie or SQL editor - -**Do NOT use `ai_query`** in this pipeline — see `GOTCHAS.md` #9. - ---- - -## Journey 4: Debug a registration that went to workspace registry - -The #1 support question. Symptoms: model doesn't appear in Catalog Explorer; URL contains `/ml/models/` instead of `/explore/data/models/`. - -**Steps:** - -1. Confirm the diagnosis: - - Catalog Explorer → catalog → schema → Models tab: **missing** - - MLflow icon (left sidebar) → Models: **present** - - That's the workspace registry, not UC -2. Verify registry URI in the training session - - `mlflow.get_registry_uri()` — should return `"databricks-uc"`, not a workspace URI -3.
If the URI was wrong, fix it and re-register: - - Add `mlflow.set_registry_uri("databricks-uc")` at the top of the training code - - Re-run `mlflow.register_model(...)` — this creates a new entry in UC - - The orphaned workspace-registry entry can be deleted via MLflow UI (optional) -4. Set the `@champion` alias on the new UC version -5. Verify via `DESCRIBE MODEL` — see `patterns-uc-registration.md` Pattern 4 - ---- - -## Journey 5: Debug a `pyfunc.load_model` that fails or predicts wrong - -Model loaded successfully, but `.predict()` raises or produces nonsense. - -**Steps:** - -1. **Check the signature was logged:** - ```python - from mlflow.models import get_model_info - info = get_model_info("models:/<model_name>@champion") - print(info.signature) - ``` - If `None` — see `GOTCHAS.md` #8. Re-log the model with `signature=infer_signature(...)`. - -2. **Check the input column order:** - ```python - expected = model.metadata.get_input_schema().input_names() - print(f"Model expects: {expected}") - print(f"You passed: {list(features_df.columns)}") - ``` - If the order differs, pass `features_df[expected]`. - -3. **Check preprocessing coverage:** - - Does the training notebook call a scaler / encoder / imputer before fitting? - - Is that preprocessing in the logged artifact? - - If not — see `GOTCHAS.md` #12. Re-train with preprocessing wrapped in `sklearn.Pipeline`. - -4. **Check for type coercion:** - - Integer column becoming float (or vice versa) — fine for sklearn, sometimes breaks for xgboost/pytorch - - Categorical as string vs int — depends on the flavor - - Fix: cast `features_df` to match `model.metadata.get_input_schema()` dtypes before predicting - ---- - -## Journey 6: Schema evolution — your features changed since the model was logged - -The silver features pipeline added a new column. Your deployed `@champion` model was trained without it. Predictions still work (extra columns are ignored), but you want to include the new feature. - -**Steps:** - -1.
Retrain with the new feature: - ```python - # Same Journey 1 steps, but with expanded feature set - mlflow.sklearn.log_model( - sk_model=new_pipeline, - artifact_path="model", - signature=infer_signature(X_train_expanded, new_pipeline.predict(X_train_expanded[:5])), - input_example=X_train_expanded.iloc[:5], - ) - ``` -2. Register as a new version -3. Validate via A/B (Journey 2) -4. Promote to `@champion` - -Schema changes are always a new version. Never mutate a logged model in place. - ---- - -## Journey 7: "Everything is on fire, I have 10 minutes to demo" - -Someone registered a fallback model. Load it. - -```python -import mlflow -mlflow.set_registry_uri("databricks-uc") -model = mlflow.pyfunc.load_model( - "models:/<catalog>.<schema>.<model>@fallback" -) -features = spark.table("<catalog>.<schema>.sample_features").limit(500).toPandas() -features["prediction"] = model.predict(features) -display(spark.createDataFrame(features)) -``` - -Every escape-hatch pattern should pre-register a `@fallback` version for exactly this case. - ---- - -## When to use which journey - -| Situation | Journey | -|-----------|---------| -| I'm starting from zero | 1 | -| I have `@champion`, trained something new | 2 | -| I want predictions in a scheduled table | 3 | -| Registered but can't find in Catalog Explorer | 4 | -| `load_model` succeeds but `predict` fails | 5 | -| My features changed | 6 | -| Demo in 10 minutes, nothing works | 7 | From b424134f23d9e4a81a3f93a3f8beb5c1ef600248 Mon Sep 17 00:00:00 2001 From: David O'Keeffe Date: Sat, 9 May 2026 15:26:49 +1000 Subject: [PATCH 5/5] =?UTF-8?q?chore(mlflow-ml):=20rename=20GOTCHAS.md=20?= =?UTF-8?q?=E2=86=92=20gotchas.md=20(case=20fix)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit macOS case-insensitive filesystem hid this from the previous commit. The content was already lowercased in references; this commit makes the git index match.
Co-authored-by: Isaac --- .../databricks-mlflow-ml/references/{GOTCHAS.md => gotchas.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename databricks-skills/databricks-mlflow-ml/references/{GOTCHAS.md => gotchas.md} (100%) diff --git a/databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md b/databricks-skills/databricks-mlflow-ml/references/gotchas.md similarity index 100% rename from databricks-skills/databricks-mlflow-ml/references/GOTCHAS.md rename to databricks-skills/databricks-mlflow-ml/references/gotchas.md