
Dev#108

Merged
AB-Law merged 39 commits into main from dev
Mar 17, 2026

Conversation

@AB-Law
Owner

@AB-Law AB-Law commented Mar 17, 2026

Closes #100
Closes #101
Closes #99

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved scraper error handling to fail-fast on load failures, preventing silent execution issues.
  • Refactor

    • Consolidated authentication token resolution pipeline across services for streamlined credential handling.
    • Enhanced request tracing and metadata propagation throughout the system for improved debugging and system observability.
  • Tests

    • Added comprehensive unit tests for token resolution and digest processing workflows.

AB-Law and others added 30 commits March 15, 2026 21:08
- Rewrite QUICKSTART.md with Azurite + Cosmos emulator setup, container
  creation steps, port reference, and 5-tab run guide
- Add CONTRIBUTING.md: branch strategy, tech stack, per-layer conventions,
  PR checklist, and secrets policy
- Add local.settings.json.example for both PluckIt.Functions and
  PluckIt.Processor pre-filled with emulator connection strings and
  placeholder values for Azure OpenAI credentials

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…in permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* feat: integrate ESLint for improved code quality

- Added ESLint configuration to the project for TypeScript and Angular files.
- Updated `angular.json` to include linting options.
- Installed necessary ESLint packages for Angular and TypeScript support.
- Refactored code to replace `any` with `unknown` in several service files for better type safety.
- Enhanced unit tests to align with the new type definitions and ensure consistency.

* refactor: remove unused page

* fix: correct Cosmos emulator key (invalid base64 Tg== -> g==)

* fix: update Cosmos DB keys and enhance Azurite compatibility for local development

* fix: enhance UUID generation and improve error handling in BlobSasService and setup-local-cosmos.py

* chore: update CI/CD workflows to include 'dev' branch for push and pull request triggers

* fix: improve UUID generation logic in DashboardComponent and update CI to disable watch mode for tests
* feat: enhance wardrobe loading logic
- Updated the `_load_user_wardrobe` function to prefetch item IDs and limit the number of items returned based on wear count.
- Enhanced unit tests for `_load_user_wardrobe` to verify correct behavior and query execution.

* test: enhance user wardrobe loading tests to verify wearCount conditions
- Added assertions to check for the presence of "IS_DEFINED(c.wearCount)" and "c.wearCount > 0" in the SQL query for loading user wardrobe items.
* feat: Enhance deduplication logic in RunDeduplicator by implementing prefix-based pHash storage. Introduced methods for registering pHashes by prefix and identifying candidate buckets for Hamming distance checks. Updated tests to validate detection of duplicates with nearby prefix variations.

* refactor: Replace candidate bucket method with a cached version for improved performance in deduplication logic. The new method optimizes pHash prefix bucket retrieval by caching results, enhancing efficiency during duplicate checks.

* refactor: Simplify candidate bucket retrieval by removing max value check in deduplicator. Added new tests to validate deduplication behavior at threshold boundaries and just under threshold conditions.
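The prefix-bucketed pHash deduplication described in the commits above can be sketched as follows. Class, constant, and method names here are illustrative, not the actual RunDeduplicator API, and the real implementation additionally caches candidate-bucket retrieval:

```python
# Sketch of prefix-bucketed pHash deduplication (hypothetical names).
# Hashes are grouped by their top PREFIX_BITS bits so a lookup only does
# Hamming-distance checks against a few candidate buckets, not every hash.

PHASH_BITS = 64
PREFIX_BITS = 16
HAMMING_THRESHOLD = 6

class PrefixDeduplicator:
    def __init__(self) -> None:
        self._buckets: dict[int, list[int]] = {}

    @staticmethod
    def _prefix(phash: int) -> int:
        return phash >> (PHASH_BITS - PREFIX_BITS)

    def register(self, phash: int) -> None:
        self._buckets.setdefault(self._prefix(phash), []).append(phash)

    def _candidate_buckets(self, phash: int):
        # Exact prefix plus single-bit flips of it, so near-duplicates whose
        # differing bits fall inside the prefix are still found.
        base = self._prefix(phash)
        yield base
        for bit in range(PREFIX_BITS):
            yield base ^ (1 << bit)

    def is_duplicate(self, phash: int) -> bool:
        for prefix in self._candidate_buckets(phash):
            for existing in self._buckets.get(prefix, []):
                if bin(phash ^ existing).count("1") <= HAMMING_THRESHOLD:
                    return True
        return False
```

With single-bit prefix flips, a pair differing by one bit inside the prefix still lands in a checked bucket, which matches the test the commit describes ("duplicates with nearby prefix variations").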
* Refactor wardrobe item loading and enhance query limits

- Updated `_load_user_wardrobe` to first fetch item IDs and then retrieve top scored items with a limit of 50.
- Introduced a constant `_WARDROBE_SCAN_LIMIT` to cap the number of items fetched in wardrobe queries to 500.
- Enhanced `_load_wardrobe_items` in both `gaps.py` and `wear_patterns.py` to include the new limit and ensure recent wear events are capped.
- Added unit tests to verify the new loading logic and query limits for wardrobe items.

* refactor: rework item loading in `_load_user_wardrobe` for an improved query

* fix: remove duplicate sorting in `get_wear_patterns` function

* test: Update assertions in `test_load_user_wardrobe_fetches_ids_then_top_scored_items` to validate query parameters and limits

- Refactored test to extract query parameters from the second call of `sync_wardrobe.query_items`.
- Updated assertions to check for the correct limit parameter in the SQL query.
- Introduced a new Cosmos DB container for image cleanup indexing, enhancing the wardrobe management system.
- Updated the `WardrobeRepository` to support syncing and deleting entries in the image cleanup index.
- Modified `CleanupFunctions` to utilize the new index for identifying known item IDs.
- Enhanced local settings and infrastructure scripts to accommodate the new container.
- Added unit tests to ensure correct behavior of the image cleanup index operations.
- Removed the `_normalise_query_text` and `_get_cache_scope` functions to simplify the codebase.
- Updated `_expand_query_cached` to use a shared cache key based on normalized terms, improving cache efficiency across users and sessions.
- Adjusted unit tests to reflect changes in caching behavior and ensure correct functionality with normalized queries.
- Added in-memory caching to the GenerateSasUrl method to improve performance by returning cached SAS URLs for repeated requests within a validity window.
- Introduced MemoryCache for managing cached SAS URLs and defined cache skew and minimum cache duration constants.
- Updated unit tests to verify caching behavior for allowed containers and validity windows, ensuring correct functionality under various scenarios.
- Included Microsoft.Extensions.Caching.Memory package for caching support.
- Implemented Redis caching for SAS URL generation in BlobSasService to enhance performance and scalability.
- Updated local.settings.json.example and QUICKSTART.md to include configuration for Redis cache.
- Modified Program.cs to conditionally use Redis or in-memory caching based on configuration settings.
- Enhanced unit tests to validate caching behavior with Redis and ensure correct functionality.
- Updated Terraform scripts to support new SAS cache settings.
- Added `enable_cross_partition_query=True` to various query items in `function_app.py` and `scraper_runner.py` to enhance data retrieval across partitions.
- Updated relevant functions to ensure efficient querying of active and global scraper sources, improving overall performance and reliability.
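The cross-partition query pattern added here looks roughly like this; the function name and query text are illustrative stand-ins for the calls in `function_app.py` and `scraper_runner.py`:

```python
# Sketch: without enable_cross_partition_query=True, the azure-cosmos SDK
# rejects queries that are not scoped to a single partition key value.

def load_active_sources(container):
    """Fetch enabled scraper sources regardless of partition placement."""
    query = "SELECT * FROM c WHERE c.enabled = true"
    return list(
        container.query_items(
            query=query,
            enable_cross_partition_query=True,
        )
    )
```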
…seconds for improved queue processing efficiency
…inclusion

- Added a new robots.txt file to allow all user agents and specify the sitemap location.
- Updated staticwebapp.config.json to exclude robots.txt and well-known paths from navigation fallback.
Limit wardrobe and wear-event scans for vault insights and extend cache TTL to reduce RU usage while preserving bounded analytics freshness.
Prevent unbounded profile reads during weekly digest startup by switching
from list(read_all_items()) to a paginated query on user profile id only.
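A minimal sketch of that switch, assuming the azure-cosmos SDK's `query_items` with `max_item_count` (the function name is hypothetical):

```python
def iter_user_profile_ids(container, page_size=100):
    """Stream user profile ids page by page instead of materialising every
    full profile document with list(read_all_items())."""
    return container.query_items(
        query="SELECT VALUE c.id FROM c",
        enable_cross_partition_query=True,
        max_item_count=page_size,
    )
```

Because only `c.id` is projected and results are paged, digest startup no longer holds every profile document in memory at once.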
AB-Law and others added 9 commits March 16, 2026 22:43
Canonicalized mood names are now re-embedded in one Azure OpenAI batch call
instead of one request per item, with a safe fallback to existing embeddings
when batch re-embedding fails.
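The batch-with-fallback shape can be sketched as below; the helper name and signature are hypothetical, with `embed_batch` standing in for the single Azure OpenAI batch call:

```python
def reembed_canonical_moods(names, existing, embed_batch):
    """Re-embed canonicalized mood names in one batch request; if the batch
    call fails, fall back to the embeddings we already have."""
    try:
        vectors = embed_batch(list(names))  # one request for all names
    except Exception:
        return existing  # safe fallback: keep prior embeddings
    return dict(zip(names, vectors))
```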
…mprove error logging in digest agent. Added partition key to user data fetch in vault insights. Updated tests for mood processor and vault insights to reflect changes.
…digest_agent (#103)

- Added support for extracting trace ID from headers in `_metadata_request_id_from_headers`.
- Updated `_set_trace_identifier_kwargs` to accept additional metadata and conditionally include it in the kwargs.
- Modified `_build_langfuse_callbacks` to accept metadata for improved context in callback generation.
- Enhanced `run_digest_now` to forward trace ID from the request headers.
- Introduced `_build_digest_langfuse_callbacks` in `digest_agent` for consistent metadata handling.
- Updated tests to verify trace ID propagation and metadata inclusion in LLM calls.
Propagate load errors from run_global_scrapers so scraper jobs fail visibly and can retry.
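The fail-fast change amounts to re-raising after logging instead of silently returning; a minimal sketch, with `load_sources` standing in for the actual source-loading step:

```python
import logging

def run_global_scrapers(load_sources):
    """Load scraper sources, re-raising load failures so the job run is
    marked failed (and can be retried) instead of silently succeeding."""
    try:
        return load_sources()
    except Exception:
        logging.exception("failed to load scraper sources")
        raise  # surface the failure to the function host
```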
* Fix: reuse digest LLM instance across batch runs

Digest generation now keeps a single Azure OpenAI client in-process to avoid
creating a fresh client per user while preserving prompt and save behavior.

* Fix test mock for digest llm invoke config path
@coderabbitai

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

This PR consolidates duplicated token resolution logic across three HTTP function handlers (AuthFunctions, CollectionFunctions, StylistFunctions) into a shared ITokenResolver interface and implementation. It also introduces LLM client singleton caching in digest processing, enhances error visibility in scraper runner, and adds trace ID propagation to Langfuse callbacks for improved observability.

Changes

Cohort / File(s) Summary
Token Resolution Abstraction
PluckIt.Functions/Auth/TokenResolver.cs, PluckIt.Functions/Program.cs
New ITokenResolver interface and TokenResolver sealed class centralize bearer token extraction, Google validation, session fallback, and local dev user resolution with secure logging. Registered as DI singleton.
Auth Functions Refactoring
PluckIt.Functions/Functions/AuthFunctions.cs, PluckIt.Functions/Functions/CollectionFunctions.cs, PluckIt.Functions/Functions/StylistFunctions.cs
Replaced inline token validation and per-handler authentication logic with injected ITokenResolver dependency, removing duplicate validation and session resolution code. Simplified constructor signatures.
C# Unit Tests
PluckIt.Tests/Unit/Auth/TokenResolverTests.cs, PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs, PluckIt.Tests/Unit/Functions/StylistFunctionsTests.cs
Added comprehensive tests for TokenResolver covering token validation, session fallback, and local dev user scenarios. Updated existing function tests to use simplified TestTokenResolver mock.
Digest Agent Singleton & Tracing
PluckIt.Processor/agents/digest_agent.py
Introduced per-process LLM singleton (_get_digest_llm) to reuse client across batch digest runs. Added parameterized _build_digest_llm with Langfuse callback support. Extended run_digest_for_user and _generate_suggestions with optional trace_id for trace propagation.
Function App Metadata & Trace Propagation
PluckIt.Processor/function_app.py, PluckIt.Processor/agents/scraper_runner.py
Enhanced Langfuse callback construction to accept and propagate extra metadata (component, trace_label). Updated run_digest_now to extract X-Trace-Id header. Changed scraper runner error handling from silent return to exception re-raise for operational visibility.
Python Unit Tests
PluckIt.Processor/tests/unit/test_digest_agent.py, PluckIt.Processor/tests/unit/test_routes.py, PluckIt.Processor/tests/unit/test_scraper_runner.py
Added tests for LLM singleton behavior, trace/metadata propagation through digest pipeline, and scraper runner failure modes. Verified X-Trace-Id header forwarding to digest and metadata inference paths.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HttpFunction as HTTP Function<br/>(AuthFunctions/CollectionFunctions<br/>/StylistFunctions)
    participant TokenResolver
    participant GoogleValidator as Google<br/>Validator
    participant SessionStore as RefreshSession<br/>Store
    participant Business as Business<br/>Logic

    Client->>HttpFunction: POST with Authorization header
    HttpFunction->>TokenResolver: ResolveUserIdAsync(request)
    
    alt Bearer Token Present
        TokenResolver->>TokenResolver: TryGetBearerToken()
        TokenResolver->>GoogleValidator: validateToken(token)
        alt Validation Success
            GoogleValidator-->>TokenResolver: userId
            TokenResolver-->>HttpFunction: userId
        else Validation Fails
            TokenResolver->>SessionStore: ResolveUserIdFromSessionToken()
            alt Session Resolves
                SessionStore-->>TokenResolver: userId
                TokenResolver-->>HttpFunction: userId
            else Session Fails
                TokenResolver-->>HttpFunction: null
            end
        end
    else No Bearer Token
        TokenResolver-->>HttpFunction: localDevUserId
    end
    
    HttpFunction->>Business: Process with resolved userId
    Business-->>Client: Response
sequenceDiagram
    participant Trigger as Timer Trigger
    participant FunctionApp as Function App<br/>(function_app.py)
    participant DigestAgent as Digest Agent
    participant DigestLLM as Digest LLM<br/>(Singleton)
    participant Langfuse as Langfuse<br/>Callbacks
    participant AzureOpenAI as Azure<br/>OpenAI

    Trigger->>FunctionApp: run_digest_now(trace_id)
    FunctionApp->>FunctionApp: Extract X-Trace-Id from headers
    FunctionApp->>DigestAgent: run_digest_for_user_with_status(user_id, trace_id)
    
    DigestAgent->>Langfuse: _build_digest_langfuse_callbacks(trace_id, user_id)
    Langfuse-->>DigestAgent: callbacks
    
    DigestAgent->>DigestAgent: _infer_climate_zone(callbacks, trace_id, user_id)
    DigestAgent->>DigestLLM: _get_digest_llm() [cached]
    DigestLLM->>AzureOpenAI: invoke with callbacks
    AzureOpenAI-->>DigestLLM: result
    DigestLLM-->>DigestAgent: climate_result
    
    DigestAgent->>DigestAgent: _generate_suggestions(callbacks, trace_id, user_id)
    DigestAgent->>DigestLLM: _get_digest_llm() [reused]
    DigestLLM->>AzureOpenAI: invoke with callbacks & metadata
    AzureOpenAI-->>DigestLLM: suggestions
    DigestLLM-->>DigestAgent: suggestions
    
    DigestAgent->>Langfuse: _flush_digest_langfuse_callbacks(callbacks)
    Langfuse-->>DigestAgent: flushed
    
    DigestAgent-->>FunctionApp: digest_result
    FunctionApp-->>Trigger: success

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 A Rabbit's Refactoring Rhyme

Once tokens sprawled in triplicate mess,
Each handler a copy—a duplicative stress!
Now TokenResolver reigns, singular and bright,
While digest LLMs cache for the night.
🥕 Traces now flow from tail to toe—
One source of truth: observe and grow!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

  • Title check (⚠️ Warning): The title 'Dev' is generic and vague, providing no meaningful information about the primary change in this substantial refactoring pull request. Resolution: revise the title to clearly summarize the main objective, such as 'Consolidate token resolution and optimize digest LLM initialization' or 'Refactor auth token validation and fix scraper error handling'.
  • Description check (⚠️ Warning): The PR description is incomplete, listing only linked issue numbers without addressing required template sections (Summary, Changes, Testing, Risks/Checklist). Resolution: complete the description by filling in Summary (what/why), Changes (bullet list), Testing (which tests ran), and Risks/Notes sections per the repository template.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Linked Issues check (✅ Passed): All three linked issues (#99, #100, #101) have corresponding implementations: TokenResolver consolidates auth logic [#100], scraper_runner re-raises exceptions [#101], and digest_agent implements LLM singleton caching [#99].
  • Out of Scope Changes check (✅ Passed): All changes directly support the three linked issues: token resolver refactoring, scraper error handling, digest LLM caching, and related test updates are all in scope.




@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs (1)

229-237: Consider centralizing TestTokenResolver in shared test helpers.

This test double is now duplicated across function test classes; moving it into a shared helper would reduce drift and maintenance overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs` around lines 229 -
237, Centralize the duplicated test double by extracting the sealed
TestTokenResolver(string? userId) implementation (which implements
ITokenResolver and its ResolveUserIdAsync(HttpRequestData, CancellationToken)
method) into a shared test helper class in the test project and update all
function test classes to reference this single helper instead of their local
copies; remove the duplicated inner classes from tests and ensure the shared
TestTokenResolver is public/internal as needed so existing tests compile and
continue returning Task.FromResult<string?>(userId).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 669f017a-b5c1-43db-a355-4aef80667b43

📥 Commits

Reviewing files that changed from the base of the PR and between 6f16e95 and 31fa30e.

📒 Files selected for processing (14)
  • PluckIt.Functions/Auth/TokenResolver.cs
  • PluckIt.Functions/Functions/AuthFunctions.cs
  • PluckIt.Functions/Functions/CollectionFunctions.cs
  • PluckIt.Functions/Functions/StylistFunctions.cs
  • PluckIt.Functions/Program.cs
  • PluckIt.Processor/agents/digest_agent.py
  • PluckIt.Processor/agents/scraper_runner.py
  • PluckIt.Processor/function_app.py
  • PluckIt.Processor/tests/unit/test_digest_agent.py
  • PluckIt.Processor/tests/unit/test_routes.py
  • PluckIt.Processor/tests/unit/test_scraper_runner.py
  • PluckIt.Tests/Unit/Auth/TokenResolverTests.cs
  • PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs
  • PluckIt.Tests/Unit/Functions/StylistFunctionsTests.cs

@AB-Law AB-Law merged commit 854d10f into main Mar 17, 2026
14 checks passed
