Skip to content

imtkain/Fabric-Pathfinder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Fabric Pathfinder

Tenant-wide inventory collection and cross-environment artifact mapping for Microsoft Fabric CI/CD workflows.


STRONGLY RECOMMENDED: Run as a Service Principal (SPN) with Admin API Access

For the most complete and accurate inventory data, run this notebook as a Service Principal with Admin API permissions. This provides:

  • Tenant-wide visibility: See all workspaces and items, not just those you have access to
  • Creator metadata: Item creator IDs, names, types, and UPNs
  • Item state and timestamps: Last updated dates and state information
  • Capacity information: Which capacity each item belongs to

Without admin access, the notebook falls back to non-admin APIs which only return items in workspaces you can access, and many metadata fields will be NULL.


TL;DR

  1. Import nb_000_fabric_pathfinder.ipynb into a Fabric workspace
  2. Configure SINK_TYPE ("warehouse" or "lakehouse") in the notebook
  3. Run the notebook to populate inventory tables
  4. Execute fabric_pathfinder_views.sql in the warehouse/lakehouse to create environment mapping views
  5. Query vw_env_map_all to get DEV/UAT/PROD artifact mappings for CI/CD parameterization
image

Table of Contents

Why This Exists

Promoting Fabric artifacts between environments (DEV to UAT to PROD) requires replacing environment-specific identifiers: workspace IDs, connection GUIDs, SQL connection strings, and item references. Manually tracking these mappings is error-prone and doesn't scale.

Fabric Pathfinder solves this by:

  1. Inventorying your tenant via Fabric REST APIs (the notebook)
  2. Mapping artifacts across environments using naming conventions (the SQL views)

Vacation and Offboarding Coverage

Beyond CI/CD, the inventory provides critical visibility into artifact ownership. When someone goes on vacation or leaves the organization, you need to know what they own. Microsoft Entra ID's 30-day sign-in requirement can result in account deletion, and with it, orphaned artifacts that nobody knows exist until something breaks.

The workspace_role_assignments table captures who has access to what, enabling proactive ownership transfers before access is lost.

Coming Soon: Fabric Usurp

A companion project using undocumented APIs that enables non-SPN users to take over all objects in a specified workspace in one shot, without the UI, from a notebook. This will streamline bulk ownership transfers during offboarding.

The output supports two CI/CD approaches:

  • fabric-cicd: Generate parameter files for Azure DevOps deployments
  • Fabric Deployment Pipelines: Validate Variable Library completeness by comparing mapped artifacts against configured value sets before promotion

What's New

Lakehouse Sink Support

You can now write inventory data to either a Data Warehouse or a Lakehouse:

  • Warehouse (SINK_TYPE = "warehouse"): Uses the synapsesql connector. Best for SQL-based analytics.
  • Lakehouse (SINK_TYPE = "lakehouse"): Uses Delta format via abfss paths. Best for Spark-based workflows.

The lakehouse is created with schema support enabled by default (lh_integration).

Non-Admin Fallback

The notebook now automatically detects whether you have admin API access:

  • Admin mode: Uses tenant-wide Admin APIs for complete visibility
  • Non-admin mode: Falls back to per-workspace Core APIs if admin access is unavailable

In non-admin mode, the same table schema is used, but certain columns will be NULL (see Admin vs Non-Admin Mode for details).

Prerequisites

For Full Functionality (Recommended)

Admin API Access Required for Complete Data

For the most complete inventory data, use admin API access:

  • User mode: The executing user must have Fabric tenant administrator rights.

  • SPN mode (recommended): A Fabric Administrator must enable the following tenant settings:

    • Service principals can use Fabric APIs (Developer settings)
    • Service principals can access read-only admin APIs (Admin API settings)

    The SPN's security group must be added to both settings.

For Limited Functionality (Non-Admin)

If admin access is unavailable, the notebook will still run but with reduced data:

  • Only inventories workspaces you have access to
  • Creator metadata, state, timestamps, and capacity IDs will be NULL
  • Git connections use per-workspace API (slower but functional)

Additional Requirements

  • A Fabric workspace with capacity (F or P SKU)
  • Permission to create a Warehouse or Lakehouse in the target workspace
  • Artifacts following the naming conventions described below (or willingness to refactor the views)

Files

File Description
nb_000_fabric_pathfinder.ipynb PySpark notebook that collects tenant inventory via REST APIs and writes to wh_integration or lh_integration
fabric_pathfinder_views.sql SQL views that map artifacts across DEV/UAT/PROD environments
claude.md Code style guide for contributors

How It Works

Step 1: Notebook Populates Inventory Tables

The notebook calls Fabric REST APIs to collect tenant metadata and writes to these tables:

Table Admin API Non-Admin Fallback Description
workspace_items Full metadata Limited (no creator, state, timestamps) All Fabric items with workspace context
tenant_connections - Same data Tenant-level connection definitions
sql_endpoint_connections - Same data SQL connection strings for Warehouses/Lakehouses
item_schedules - Same data Refresh schedules for Dataflows and Semantic Models
workspace_role_assignments - Same data User/group/SPN role assignments per workspace
capacities - Same data Fabric capacities (F-SKUs) with region and state
gateways - Same data On-premises, personal, and VNet gateways
workspace_git_connections Tenant-wide Per-workspace (slower) Git integration status per workspace

Step 2: Views Map Artifacts Across Environments

The SQL views read from the inventory tables and correlate artifacts across environments using naming conventions:

workspace_items ─────────────────► vw_env_map_workspace_items ───┐
workspace_items (grouped) ───────► vw_env_map_workspaces ────────┤
tenant_connections ──────────────► vw_env_map_connections ───────┼──► vw_env_map_all
sql_endpoint_connections ────────► vw_env_map_sql_endpoints ─────┘

All views produce a consistent output schema:

Column Description
item_name Canonical artifact name (environment suffix stripped)
item_type Category: Workspace, Connection, SQL Connection String, or item type
id_dev DEV environment identifier (GUID, connection string, etc.)
id_uat UAT environment identifier (NULL if not found)
id_prod PROD environment identifier (NULL if not found)

DEV is the canonical source. All views LEFT JOIN from DEV, so:

  • id_dev is never NULL
  • Artifacts that exist only in UAT or PROD will not appear
  • NULL values in id_uat or id_prod indicate missing artifacts in those environments

Sink Type Selection

Configure where inventory data is written:

# In the notebook configuration section
SINK_TYPE = "warehouse"  # or "lakehouse"

Warehouse Mode (Default)

SINK_TYPE = "warehouse"
WAREHOUSE_NAME = "wh_integration"
  • Creates/uses wh_integration warehouse
  • Writes using synapsesql connector with mode("overwrite")
  • Best for SQL-based analytics and reporting

Lakehouse Mode

SINK_TYPE = "lakehouse"
LAKEHOUSE_NAME = "lh_integration"
  • Creates/uses lh_integration lakehouse with schema support enabled
  • Writes Delta format via abfss path
  • Removes existing files before writing for clean overwrite
  • Best for Spark-based data engineering workflows

Admin vs Non-Admin Mode

The notebook automatically detects admin access at startup and displays the mode in the execution summary.

Execution Summary Example

============================================================
EXECUTION SUMMARY
============================================================
API Mode: ADMIN (full metadata available)
Sink: warehouse (wh_integration)
------------------------------------------------------------
workspace_items              [OK] 1234 rows
...

Or for non-admin:

============================================================
EXECUTION SUMMARY
============================================================
API Mode: NON-ADMIN (limited metadata - some columns will be NULL)
Sink: warehouse (wh_integration)
------------------------------------------------------------
workspace_items              [OK] 567 rows
...

Data Differences by Mode

Column Admin Mode Non-Admin Mode
guid, name, type, description Available Available
workspace_id, workspace_name Available Available
state Available NULL
last_updated_date Available NULL
capacity_id Available NULL
creator_id Available NULL
creator_name Available NULL
creator_type Available NULL
creator_upn Available NULL

Why Non-Admin Mode Exists

Non-admin mode enables teams without tenant admin rights to still benefit from the inventory system. While the data is less complete, it's sufficient for:

  • Basic artifact tracking within accessible workspaces
  • CI/CD parameterization for workspaces you manage
  • Workspace role assignment auditing (uses non-admin API)

Usage

1. Import and Configure the Notebook

Upload nb_000_fabric_pathfinder.ipynb to your Fabric workspace. Configure the authentication and sink sections:

# Authentication mode: "user" for interactive, "spn" for service principal
auth_mode = "user"

# Sink type: "warehouse" or "lakehouse"
SINK_TYPE = "warehouse"

# For SPN mode, configure Key Vault secrets:
KEY_VAULT_NAME = "your-keyvault-name"

2. Run the Notebook

Execute all cells or call refresh_all() directly. The notebook will:

  1. Check admin API access (falls back to non-admin if unavailable)
  2. Create the warehouse/lakehouse if it doesn't exist
  3. Fetch data from Fabric REST APIs
  4. Write to all 8 inventory tables (full overwrite)
  5. Display execution summary with API mode and results

Typical runtime: 2-10 minutes depending on tenant size and API mode.

3. Create the Views

Open the warehouse/lakehouse in Fabric and execute fabric_pathfinder_views.sql. This creates:

  • vw_env_map_connections
  • vw_env_map_sql_endpoints
  • vw_env_map_workspace_items
  • vw_env_map_workspaces
  • vw_env_map_all (unified view)

4. Query Environment Mappings

-- All mappings
SELECT * FROM dbo.vw_env_map_all;

-- Only artifacts present in all three environments
SELECT * FROM dbo.vw_env_map_all
WHERE id_uat IS NOT NULL
  AND id_prod IS NOT NULL;

-- Find artifacts missing from PROD
SELECT * FROM dbo.vw_env_map_all
WHERE id_prod IS NULL;

-- Specific item types
SELECT * FROM dbo.vw_env_map_all
WHERE item_type = 'Lakehouse';

Naming Convention Requirements

The views rely on consistent naming conventions to identify environments. If your naming differs, you must modify the views.

Workspaces

Environment Pattern Example
DEV <Name> - DEV DATA Sales - DEV
UAT <Name> - UAT DATA Sales - UAT
PROD <Name> (no suffix) DATA Sales

Additionally, workspaces must start with DATA to be included in vw_env_map_workspaces.

Connections

Environment Pattern Example
DEV conn_<name>_dev conn_sales_dev
UAT conn_<name>_uat conn_sales_uat
PROD conn_<name>_prd conn_sales_prd

Connections must start with conn_ to be included.

Workspace Items (Lakehouses, Warehouses, etc.)

Items are matched by name and type across workspaces. Environment detection uses the workspace name pattern (same as above).

Example: A Lakehouse named lh_sales in workspace DATA Sales - DEV maps to lh_sales in DATA Sales - UAT and DATA Sales.

SQL Endpoints

SQL endpoint connection strings are matched by item name and type across workspaces, using the same workspace naming pattern.

Customizing the Views

If your naming conventions differ, modify the views in fabric_pathfinder_views.sql.

Changing Workspace Patterns

Find the environment detection logic in each view:

-- Current: workspace ends with ' - DEV'
WHERE UPPER(workspace_name) LIKE '% - DEV'

-- Example: workspace contains '_DEV_'
WHERE UPPER(workspace_name) LIKE '%\_DEV\_%' ESCAPE '\'

Changing Connection Patterns

Find the connection suffix logic in vw_env_map_connections:

-- Current: connection ends with '_dev', '_uat', '_prd'
WHERE UPPER(display_name) LIKE 'CONN\_%' ESCAPE '\'
  AND UPPER(display_name) LIKE '%\_DEV' ESCAPE '\'

-- Example: connection ends with '.dev', '.uat', '.prod'
WHERE UPPER(display_name) LIKE 'CONN\_%' ESCAPE '\'
  AND UPPER(display_name) LIKE '%.DEV'

Changing Included Item Types

Find the item type filter in vw_env_map_workspace_items:

d.type IN (
    'Lakehouse',
    'Warehouse',
    'SQLEndpoint',
    'DataPipeline',
    'Report',
    'SemanticModel',
    'Reflex'
)

Add or remove types as needed.

Changing Exclusions

The vw_env_map_workspace_items view excludes certain items. Modify these patterns:

-- Exclude auto-generated usage metrics reports
AND UPPER(d.name) NOT LIKE '%USAGE METRICS%'

-- Exclude specific sample items by name
AND d.name NOT IN (
    'Potential Outage Detection for Orders',
    'Example'
)

Changing Notebook Inclusion Patterns

Only notebooks matching specific patterns are included:

OR UPPER(d.name) LIKE 'NB\_100%' ESCAPE '\'
OR UPPER(d.name) LIKE 'NB\_400%' ESCAPE '\'

Modify to match your notebook naming convention, or remove to exclude all notebooks.

Authentication

The notebook supports two authentication modes:

User Mode (Default)

auth_mode = "user"

Uses the identity of the user running the notebook. For full functionality, requires Fabric tenant administrator rights.

Service Principal Mode (Recommended for Production)

auth_mode = "spn"

KEY_VAULT_NAME = "your-keyvault-name"
KEY_VAULT_URL = f"https://{KEY_VAULT_NAME}.vault.azure.net/"

TENANT_ID = notebookutils.credentials.getSecret(KEY_VAULT_URL, "aad-tenant-id")
SP_CLIENT_ID = notebookutils.credentials.getSecret(KEY_VAULT_URL, "fabric-spn-client-id")
SP_CLIENT_SECRET = notebookutils.credentials.getSecret(KEY_VAULT_URL, "fabric-spn-client-secret")

Required tenant settings for full admin access:

  • Service principals can use Fabric APIs
  • Service principals can access read-only admin APIs

The SPN's security group must be added to both settings.

Why SPN is recommended:

  • Consistent, automated execution
  • No dependency on individual user availability
  • Full admin API access when properly configured
  • Required for scheduled/automated inventory refreshes

Troubleshooting

Error Cause Solution
401 Unauthorized Token expired or insufficient permissions Check auth_mode and refresh the notebook
403 Forbidden on admin API User/SPN lacks admin rights Configure admin access or accept non-admin mode
429 Too Many Requests Rate limit exceeded Reduce REQUESTS_PER_MINUTE or wait
Empty vw_env_map_* results Naming conventions don't match Verify workspace/connection names match expected patterns
NULL values in id_uat/id_prod Artifact missing in that environment Create the artifact or exclude from mapping
NULL creator/state columns Running in non-admin mode Configure admin API access for full metadata
Warehouse/Lakehouse creation timeout Capacity overloaded Retry or use existing item
API Mode: NON-ADMIN in summary Admin API access unavailable Check tenant settings or user permissions

Limitations

  • Full overwrite only: All tables are completely replaced on each run. No incremental updates.
  • Point-in-time snapshot: Data reflects tenant state at execution time only.
  • Naming convention dependency: Views require consistent naming to match artifacts across environments.
  • DEV is canonical: Artifacts existing only in UAT/PROD are not captured.
  • No secrets storage: Connection strings are stored in plain text. Use Variable Libraries for sensitive values.
  • Non-admin data gaps: Without admin access, creator metadata, state, and timestamps are unavailable.
  • Non-admin workspace scope: Without admin access, only workspaces the identity can access are inventoried.

References

About

Maps Microsoft Fabric artifacts across DEV/UAT/PROD environments via Admin API. Great for creating variable libraries. Generates YAML find/replace config for fabric-cicd / ADO parameterized deployments. Solves the GUID-per-environment problem for automated Fabric CI/CD.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors