
Dev#108

Merged
AB-Law merged 39 commits into main from dev
Mar 17, 2026

Conversation

@AB-Law
Owner

@AB-Law AB-Law commented Mar 17, 2026

Closes #100
Closes #101
Closes #99

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved scraper error handling to fail-fast on load failures, preventing silent execution issues.
  • Refactor

    • Consolidated authentication token resolution pipeline across services for streamlined credential handling.
    • Enhanced request tracing and metadata propagation throughout the system for improved debugging and system observability.
  • Tests

    • Added comprehensive unit tests for token resolution and digest processing workflows.

AB-Law and others added 30 commits March 15, 2026 21:08
- Rewrite QUICKSTART.md with Azurite + Cosmos emulator setup, container
  creation steps, port reference, and 5-tab run guide
- Add CONTRIBUTING.md: branch strategy, tech stack, per-layer conventions,
  PR checklist, and secrets policy
- Add local.settings.json.example for both PluckIt.Functions and
  PluckIt.Processor pre-filled with emulator connection strings and
  placeholder values for Azure OpenAI credentials

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…in permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* feat: integrate ESLint for improved code quality

- Added ESLint configuration to the project for TypeScript and Angular files.
- Updated `angular.json` to include linting options.
- Installed necessary ESLint packages for Angular and TypeScript support.
- Refactored code to replace `any` with `unknown` in several service files for better type safety.
- Enhanced unit tests to align with the new type definitions and ensure consistency.

* refactor: remove unused page

* fix: correct Cosmos emulator key (invalid base64 Tg== -> g==)

* fix: update Cosmos DB keys and enhance Azurite compatibility for local development

* fix: enhance UUID generation and improve error handling in BlobSasService and setup-local-cosmos.py

* chore: update CI/CD workflows to include 'dev' branch for push and pull request triggers

* fix: improve UUID generation logic in DashboardComponent and update CI to disable watch mode for tests
* feat: enhance wardrobe loading logic
- Updated the `_load_user_wardrobe` function to prefetch item IDs and limit the number of items returned based on wear count.
- Enhanced unit tests for `_load_user_wardrobe` to verify correct behavior and query execution.

* test: enhance user wardrobe loading tests to verify wearCount conditions
- Added assertions to check for the presence of "IS_DEFINED(c.wearCount)" and "c.wearCount > 0" in the SQL query for loading user wardrobe items.
* feat: Enhance deduplication logic in RunDeduplicator by implementing prefix-based pHash storage. Introduced methods for registering pHashes by prefix and identifying candidate buckets for Hamming distance checks. Updated tests to validate detection of duplicates with nearby prefix variations.

* refactor: Replace candidate bucket method with a cached version for improved performance in deduplication logic. The new method optimizes pHash prefix bucket retrieval by caching results, enhancing efficiency during duplicate checks.

* refactor: Simplify candidate bucket retrieval by removing max value check in deduplicator. Added new tests to validate deduplication behavior at threshold boundaries and just under threshold conditions.
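The prefix-bucketed pHash deduplication described in the commits above can be sketched as follows. Class, constant, and method names here are illustrative, not the actual RunDeduplicator API, and the real implementation additionally caches candidate-bucket retrieval:

```python
# Sketch of prefix-bucketed pHash deduplication (hypothetical names).
# Hashes are grouped by their top PREFIX_BITS bits so a lookup only does
# Hamming-distance checks against a few candidate buckets, not every hash.

PHASH_BITS = 64
PREFIX_BITS = 16
HAMMING_THRESHOLD = 6

class PrefixDeduplicator:
    def __init__(self) -> None:
        self._buckets: dict[int, list[int]] = {}

    @staticmethod
    def _prefix(phash: int) -> int:
        return phash >> (PHASH_BITS - PREFIX_BITS)

    def register(self, phash: int) -> None:
        self._buckets.setdefault(self._prefix(phash), []).append(phash)

    def _candidate_buckets(self, phash: int):
        # Exact prefix plus single-bit flips of it, so near-duplicates whose
        # differing bits fall inside the prefix are still found.
        base = self._prefix(phash)
        yield base
        for bit in range(PREFIX_BITS):
            yield base ^ (1 << bit)

    def is_duplicate(self, phash: int) -> bool:
        for prefix in self._candidate_buckets(phash):
            for existing in self._buckets.get(prefix, []):
                if bin(phash ^ existing).count("1") <= HAMMING_THRESHOLD:
                    return True
        return False
```

With single-bit prefix flips, a pair differing by one bit inside the prefix still lands in a checked bucket, which matches the test the commit describes ("duplicates with nearby prefix variations").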
* Refactor wardrobe item loading and enhance query limits

- Updated `_load_user_wardrobe` to first fetch item IDs and then retrieve top scored items with a limit of 50.
- Introduced a constant `_WARDROBE_SCAN_LIMIT` to cap the number of items fetched in wardrobe queries to 500.
- Enhanced `_load_wardrobe_items` in both `gaps.py` and `wear_patterns.py` to include the new limit and ensure recent wear events are capped.
- Added unit tests to verify the new loading logic and query limits for wardrobe items.

* refactor: rework item loading in `_load_user_wardrobe` for an improved query

* fix: remove duplicate sorting in `get_wear_patterns` function

* test: Update assertions in `test_load_user_wardrobe_fetches_ids_then_top_scored_items` to validate query parameters and limits

- Refactored test to extract query parameters from the second call of `sync_wardrobe.query_items`.
- Updated assertions to check for the correct limit parameter in the SQL query.
- Introduced a new Cosmos DB container for image cleanup indexing, enhancing the wardrobe management system.
- Updated the `WardrobeRepository` to support syncing and deleting entries in the image cleanup index.
- Modified `CleanupFunctions` to utilize the new index for identifying known item IDs.
- Enhanced local settings and infrastructure scripts to accommodate the new container.
- Added unit tests to ensure correct behavior of the image cleanup index operations.
- Removed the `_normalise_query_text` and `_get_cache_scope` functions to simplify the codebase.
- Updated `_expand_query_cached` to use a shared cache key based on normalized terms, improving cache efficiency across users and sessions.
- Adjusted unit tests to reflect changes in caching behavior and ensure correct functionality with normalized queries.
- Added in-memory caching to the GenerateSasUrl method to improve performance by returning cached SAS URLs for repeated requests within a validity window.
- Introduced MemoryCache for managing cached SAS URLs and defined cache skew and minimum cache duration constants.
- Updated unit tests to verify caching behavior for allowed containers and validity windows, ensuring correct functionality under various scenarios.
- Included Microsoft.Extensions.Caching.Memory package for caching support.
- Implemented Redis caching for SAS URL generation in BlobSasService to enhance performance and scalability.
- Updated local.settings.json.example and QUICKSTART.md to include configuration for Redis cache.
- Modified Program.cs to conditionally use Redis or in-memory caching based on configuration settings.
- Enhanced unit tests to validate caching behavior with Redis and ensure correct functionality.
- Updated Terraform scripts to support new SAS cache settings.
- Added `enable_cross_partition_query=True` to various query items in `function_app.py` and `scraper_runner.py` to enhance data retrieval across partitions.
- Updated relevant functions to ensure efficient querying of active and global scraper sources, improving overall performance and reliability.
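The cross-partition query pattern added here looks roughly like this; the function name and query text are illustrative stand-ins for the calls in `function_app.py` and `scraper_runner.py`:

```python
# Sketch: without enable_cross_partition_query=True, the azure-cosmos SDK
# rejects queries that are not scoped to a single partition key value.

def load_active_sources(container):
    """Fetch enabled scraper sources regardless of partition placement."""
    query = "SELECT * FROM c WHERE c.enabled = true"
    return list(
        container.query_items(
            query=query,
            enable_cross_partition_query=True,
        )
    )
```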
…seconds for improved queue processing efficiency
…inclusion

- Added a new robots.txt file to allow all user agents and specify the sitemap location.
- Updated staticwebapp.config.json to exclude robots.txt and well-known paths from navigation fallback.
Limit wardrobe and wear-event scans for vault insights and extend cache TTL to reduce RU usage while preserving bounded analytics freshness.
Prevent unbounded profile reads during weekly digest startup by switching
from list(read_all_items()) to a paginated query on user profile id only.
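A minimal sketch of that switch, assuming the azure-cosmos SDK's `query_items` with `max_item_count` (the function name is hypothetical):

```python
def iter_user_profile_ids(container, page_size=100):
    """Stream user profile ids page by page instead of materialising every
    full profile document with list(read_all_items())."""
    return container.query_items(
        query="SELECT VALUE c.id FROM c",
        enable_cross_partition_query=True,
        max_item_count=page_size,
    )
```

Because only `c.id` is projected and results are paged, digest startup no longer holds every profile document in memory at once.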
AB-Law and others added 9 commits March 16, 2026 22:43
Canonicalized mood names are now re-embedded in one Azure OpenAI batch call
instead of one request per item, with a safe fallback to existing embeddings
when batch re-embedding fails.
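The batch-with-fallback shape can be sketched as below; the helper name and signature are hypothetical, with `embed_batch` standing in for the single Azure OpenAI batch call:

```python
def reembed_canonical_moods(names, existing, embed_batch):
    """Re-embed canonicalized mood names in one batch request; if the batch
    call fails, fall back to the embeddings we already have."""
    try:
        vectors = embed_batch(list(names))  # one request for all names
    except Exception:
        return existing  # safe fallback: keep prior embeddings
    return dict(zip(names, vectors))
```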
…mprove error logging in digest agent. Added partition key to user data fetch in vault insights. Updated tests for mood processor and vault insights to reflect changes.
…digest_agent (#103)

- Added support for extracting trace ID from headers in `_metadata_request_id_from_headers`.
- Updated `_set_trace_identifier_kwargs` to accept additional metadata and conditionally include it in the kwargs.
- Modified `_build_langfuse_callbacks` to accept metadata for improved context in callback generation.
- Enhanced `run_digest_now` to forward trace ID from the request headers.
- Introduced `_build_digest_langfuse_callbacks` in `digest_agent` for consistent metadata handling.
- Updated tests to verify trace ID propagation and metadata inclusion in LLM calls.
Propagate load errors from run_global_scrapers so scraper jobs fail visibly and can retry.
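The fail-fast change amounts to re-raising after logging instead of silently returning; a minimal sketch, with `load_sources` standing in for the actual source-loading step:

```python
import logging

def run_global_scrapers(load_sources):
    """Load scraper sources, re-raising load failures so the job run is
    marked failed (and can be retried) instead of silently succeeding."""
    try:
        return load_sources()
    except Exception:
        logging.exception("failed to load scraper sources")
        raise  # surface the failure to the function host
```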
* Fix: reuse digest LLM instance across batch runs

Digest generation now keeps a single Azure OpenAI client in-process to avoid
creating a fresh client per user while preserving prompt and save behavior.

* Fix test mock for digest llm invoke config path
@coderabbitai

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

This PR consolidates duplicated token resolution logic across three HTTP function handlers (AuthFunctions, CollectionFunctions, StylistFunctions) into a shared ITokenResolver interface and implementation. It also introduces LLM client singleton caching in digest processing, enhances error visibility in scraper runner, and adds trace ID propagation to Langfuse callbacks for improved observability.

Changes

Cohort / File(s) Summary
Token Resolution Abstraction
PluckIt.Functions/Auth/TokenResolver.cs, PluckIt.Functions/Program.cs
New ITokenResolver interface and TokenResolver sealed class centralize bearer token extraction, Google validation, session fallback, and local dev user resolution with secure logging. Registered as DI singleton.
Auth Functions Refactoring
PluckIt.Functions/Functions/AuthFunctions.cs, PluckIt.Functions/Functions/CollectionFunctions.cs, PluckIt.Functions/Functions/StylistFunctions.cs
Replaced inline token validation and per-handler authentication logic with injected ITokenResolver dependency, removing duplicate validation and session resolution code. Simplified constructor signatures.
C# Unit Tests
PluckIt.Tests/Unit/Auth/TokenResolverTests.cs, PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs, PluckIt.Tests/Unit/Functions/StylistFunctionsTests.cs
Added comprehensive tests for TokenResolver covering token validation, session fallback, and local dev user scenarios. Updated existing function tests to use simplified TestTokenResolver mock.
Digest Agent Singleton & Tracing
PluckIt.Processor/agents/digest_agent.py
Introduced per-process LLM singleton (_get_digest_llm) to reuse client across batch digest runs. Added parameterized _build_digest_llm with Langfuse callback support. Extended run_digest_for_user and _generate_suggestions with optional trace_id for trace propagation.
Function App Metadata & Trace Propagation
PluckIt.Processor/function_app.py, PluckIt.Processor/agents/scraper_runner.py
Enhanced Langfuse callback construction to accept and propagate extra metadata (component, trace_label). Updated run_digest_now to extract X-Trace-Id header. Changed scraper runner error handling from silent return to exception re-raise for operational visibility.
Python Unit Tests
PluckIt.Processor/tests/unit/test_digest_agent.py, PluckIt.Processor/tests/unit/test_routes.py, PluckIt.Processor/tests/unit/test_scraper_runner.py
Added tests for LLM singleton behavior, trace/metadata propagation through digest pipeline, and scraper runner failure modes. Verified X-Trace-Id header forwarding to digest and metadata inference paths.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HttpFunction as HTTP Function<br/>(AuthFunctions/CollectionFunctions<br/>/StylistFunctions)
    participant TokenResolver
    participant GoogleValidator as Google<br/>Validator
    participant SessionStore as RefreshSession<br/>Store
    participant Business as Business<br/>Logic

    Client->>HttpFunction: POST with Authorization header
    HttpFunction->>TokenResolver: ResolveUserIdAsync(request)
    
    alt Bearer Token Present
        TokenResolver->>TokenResolver: TryGetBearerToken()
        TokenResolver->>GoogleValidator: validateToken(token)
        alt Validation Success
            GoogleValidator-->>TokenResolver: userId
            TokenResolver-->>HttpFunction: userId
        else Validation Fails
            TokenResolver->>SessionStore: ResolveUserIdFromSessionToken()
            alt Session Resolves
                SessionStore-->>TokenResolver: userId
                TokenResolver-->>HttpFunction: userId
            else Session Fails
                TokenResolver-->>HttpFunction: null
            end
        end
    else No Bearer Token
        TokenResolver-->>HttpFunction: localDevUserId
    end
    
    HttpFunction->>Business: Process with resolved userId
    Business-->>Client: Response
sequenceDiagram
    participant Trigger as Timer Trigger
    participant FunctionApp as Function App<br/>(function_app.py)
    participant DigestAgent as Digest Agent
    participant DigestLLM as Digest LLM<br/>(Singleton)
    participant Langfuse as Langfuse<br/>Callbacks
    participant AzureOpenAI as Azure<br/>OpenAI

    Trigger->>FunctionApp: run_digest_now(trace_id)
    FunctionApp->>FunctionApp: Extract X-Trace-Id from headers
    FunctionApp->>DigestAgent: run_digest_for_user_with_status(user_id, trace_id)
    
    DigestAgent->>Langfuse: _build_digest_langfuse_callbacks(trace_id, user_id)
    Langfuse-->>DigestAgent: callbacks
    
    DigestAgent->>DigestAgent: _infer_climate_zone(callbacks, trace_id, user_id)
    DigestAgent->>DigestLLM: _get_digest_llm() [cached]
    DigestLLM->>AzureOpenAI: invoke with callbacks
    AzureOpenAI-->>DigestLLM: result
    DigestLLM-->>DigestAgent: climate_result
    
    DigestAgent->>DigestAgent: _generate_suggestions(callbacks, trace_id, user_id)
    DigestAgent->>DigestLLM: _get_digest_llm() [reused]
    DigestLLM->>AzureOpenAI: invoke with callbacks & metadata
    AzureOpenAI-->>DigestLLM: suggestions
    DigestLLM-->>DigestAgent: suggestions
    
    DigestAgent->>Langfuse: _flush_digest_langfuse_callbacks(callbacks)
    Langfuse-->>DigestAgent: flushed
    
    DigestAgent-->>FunctionApp: digest_result
    FunctionApp-->>Trigger: success

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 A Rabbit's Refactoring Rhyme

Once tokens sprawled in triplicate mess,
Each handler a copy—a duplicative stress!
Now TokenResolver reigns, singular and bright,
While digest LLMs cache for the night.
🥕 Traces now flow from tail to toe—
One source of truth: observe and grow!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

  • Title check (⚠️ Warning): The title 'Dev' is generic and vague, providing no meaningful information about the primary change in this substantial refactoring pull request. Resolution: revise the title to clearly summarize the main objective, such as 'Consolidate token resolution and optimize digest LLM initialization' or 'Refactor auth token validation and fix scraper error handling'.
  • Description check (⚠️ Warning): The PR description is incomplete, listing only linked issue numbers without addressing required template sections (Summary, Changes, Testing, Risks/Checklist). Resolution: complete the description by filling in Summary (what/why), Changes (bullet list), Testing (which tests ran), and Risks/Notes sections per the repository template.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Linked Issues check (✅ Passed): All three linked issues (#99, #100, #101) have corresponding implementations: TokenResolver consolidates auth logic [#100], scraper_runner re-raises exceptions [#101], and digest_agent implements LLM singleton caching [#99].
  • Out of Scope Changes check (✅ Passed): All changes directly support the three linked issues: token resolver refactoring, scraper error handling, digest LLM caching, and related test updates are all in scope.




@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs (1)

229-237: Consider centralizing TestTokenResolver in shared test helpers.

This test double is now duplicated across function test classes; moving it into a shared helper would reduce drift and maintenance overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs` around lines 229 -
237, Centralize the duplicated test double by extracting the sealed
TestTokenResolver(string? userId) implementation (which implements
ITokenResolver and its ResolveUserIdAsync(HttpRequestData, CancellationToken)
method) into a shared test helper class in the test project and update all
function test classes to reference this single helper instead of their local
copies; remove the duplicated inner classes from tests and ensure the shared
TestTokenResolver is public/internal as needed so existing tests compile and
continue returning Task.FromResult<string?>(userId).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 669f017a-b5c1-43db-a355-4aef80667b43

📥 Commits

Reviewing files that changed from the base of the PR and between 6f16e95 and 31fa30e.

📒 Files selected for processing (14)
  • PluckIt.Functions/Auth/TokenResolver.cs
  • PluckIt.Functions/Functions/AuthFunctions.cs
  • PluckIt.Functions/Functions/CollectionFunctions.cs
  • PluckIt.Functions/Functions/StylistFunctions.cs
  • PluckIt.Functions/Program.cs
  • PluckIt.Processor/agents/digest_agent.py
  • PluckIt.Processor/agents/scraper_runner.py
  • PluckIt.Processor/function_app.py
  • PluckIt.Processor/tests/unit/test_digest_agent.py
  • PluckIt.Processor/tests/unit/test_routes.py
  • PluckIt.Processor/tests/unit/test_scraper_runner.py
  • PluckIt.Tests/Unit/Auth/TokenResolverTests.cs
  • PluckIt.Tests/Unit/Functions/CollectionFunctionsTests.cs
  • PluckIt.Tests/Unit/Functions/StylistFunctionsTests.cs

@AB-Law AB-Law merged commit 854d10f into main Mar 17, 2026
14 checks passed
