@monkscode monkscode commented Nov 16, 2025

  • Added `KeywordSearchTool` for semantic search over Robot Framework keywords.
  • Introduced `QueryPatternMatcher` for learning and predicting keyword usage patterns.
  • Developed `SmartKeywordProvider` to orchestrate keyword retrieval with a hybrid architecture.
  • Configured centralized logging for optimization components in `logging_config.py`.
  • Enhanced `RobotTasks` to provide minimal keyword guidelines for the planning phase.
  • Updated `requirements.txt` to include dependencies for ChromaDB and sentence-transformers.
  • Modified the workflow service to learn from successful test executions and store patterns.
  • Updated the frontend to store and pass the original user query for pattern learning during execution.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added performance optimization system with configurable token reduction and keyword search capabilities
    • Introduced pattern learning to improve keyword predictions from successful test executions
    • Added context pruning to streamline library guidance based on user queries
  • Configuration

    • New optional settings for enabling optimization, configuring storage paths, and tuning keyword search and confidence thresholds
  • Documentation

    • Added comprehensive guides for optimization system setup, configuration, and developer extension points
  • Metrics & Monitoring

    • Enhanced metrics tracking to include token usage, keyword search statistics, pattern learning accuracy, and context reduction measurements

@coderabbitai coderabbitai bot (Contributor) commented Nov 16, 2025

Walkthrough

Introduces a comprehensive optimization system for CrewAI that reduces token usage via a hybrid knowledge architecture. Implements core rules, semantic keyword search (ChromaDB), pattern learning from executed code, context pruning, and workflow metrics tracking across backend configuration, agent initialization, optimization components, and frontend integration.
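
To make the tiered flow concrete, here is a minimal sketch of the fallback order described in the diagram below; the class and method names are illustrative stand-ins, not the actual interfaces in `src/backend/crew_ai/optimization/smart_keyword_provider.py`:

```python
# Illustrative sketch only: the real SmartKeywordProvider has different
# signatures and richer metrics tracking.
class TieredKeywordProvider:
    def __init__(self, pattern_matcher, search_tool, full_context, threshold=0.7):
        self.pattern_matcher = pattern_matcher   # Tier 1: learned query patterns
        self.search_tool = search_tool           # Tier 2: on-demand semantic search
        self.full_context = full_context         # Tier 3: legacy full context
        self.threshold = threshold

    def get_agent_context(self, user_query):
        """Return (context_text, optional_tool) for an agent, cheapest tier first."""
        # Tier 1: a previously learned pattern with high confidence means we can
        # inject only the predicted keywords.
        try:
            keywords, confidence = self.pattern_matcher.get_relevant_keywords(user_query)
            if keywords and confidence >= self.threshold:
                return "\n".join(keywords), None
        except Exception:
            pass  # any failure here just demotes us to the next tier

        # Tier 2: no upfront keyword payload; the agent searches on demand.
        if self.search_tool is not None:
            return "", self.search_tool

        # Tier 3: fall back to the full legacy context (highest token cost).
        return self.full_context, None
```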

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration & Environment**<br>`.gitignore`, `src/backend/.env.example`, `src/backend/core/config.py` | Updated ignore rules for optimization-related artifacts; introduced `OPTIMIZATION_*` environment variables and `Settings` fields (enabled, paths, thresholds, pruning controls) with validation for confidence thresholds. |
| **Documentation**<br>`docs/OPTIMIZATION.md`, `docs/OPTIMIZATION_DEVELOPER_GUIDE.md` | Added two comprehensive guides: user-facing documentation covering architecture, the four-tier flow, configuration, monitoring, and troubleshooting; and a developer guide with module structure, class interfaces, workflows, testing approaches, and extensions. |
| **Metrics & Tracking**<br>`src/backend/core/workflow_metrics.py` | Added optimization metrics fields (`token_usage`, `keyword_search_stats`, `pattern_learning_stats`, `context_reduction`), tracking methods, serialization support, and a `count_tokens` helper function. |
| **Library Context Base & Implementations**<br>`src/backend/crew_ai/library_context/base.py`, `src/backend/crew_ai/library_context/browser_context.py`, `src/backend/crew_ai/library_context/selenium_context.py`, `src/backend/crew_ai/library_context/dynamic_context.py` | Added an abstract `core_rules` property to the base class; implemented `core_rules`, lazy-loaded planning/code-assembly contexts, and a `get_minimal_planning_context` utility across concrete implementations. |
| **Agent Architecture**<br>`src/backend/crew_ai/agents.py`, `src/backend/crew_ai/tasks.py` | Extended `RobotAgents.__init__` to accept `optimized_context`, `keyword_search_tool`, and role-specific contexts; simplified agent goals and backstories; updated task keyword guidelines to use the minimal `planning_context`. |
| **Crew & Workflow Integration**<br>`src/backend/crew_ai/crew.py`, `src/backend/api/endpoints.py`, `src/backend/services/workflow_service.py` | Modified `run_crew` to initialize optimization components and return a 3-tuple with metrics; added a `user_query` parameter to execute endpoints and stream functions; integrated pattern-learning post-execution triggers. |
| **Core Optimization Package**<br>`src/backend/crew_ai/optimization/__init__.py`, `src/backend/crew_ai/optimization/chroma_store.py`, `src/backend/crew_ai/optimization/keyword_search_tool.py`, `src/backend/crew_ai/optimization/pattern_learning.py`, `src/backend/crew_ai/optimization/smart_keyword_provider.py`, `src/backend/crew_ai/optimization/context_pruner.py`, `src/backend/crew_ai/optimization/logging_config.py` | Introduced the complete optimization subsystem: a ChromaDB-backed keyword vector store with semantic search; a keyword search tool for agents; a pattern learning system backed by SQLite; a context pruner for semantic classification; a three-tier keyword provider with fallback logic; and centralized optimization logging. |
| **Frontend & Dependencies**<br>`src/frontend/script.js`, `src/backend/requirements.txt` | Captured and propagated `user_query` through the execution pathway for pattern learning; added `chromadb`, `sentence-transformers`, and `numpy` dependencies. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Frontend as Frontend<br/>script.js
    participant API as API<br/>endpoints.py
    participant Crew as Crew<br/>crew.py
    participant Optimization as Optimization<br/>Components
    participant Agents as Agent Planner<br/>Identifier, etc.
    participant Workflow as Workflow<br/>Service
    
    User->>Frontend: Submit query
    Frontend->>Frontend: Store currentUserQuery
    Frontend->>API: POST /generate + /execute-test
    
    Note over Crew,Optimization: Optimization Phase
    API->>Crew: run_crew (with settings)
    Crew->>Optimization: Initialize if OPTIMIZATION_ENABLED
    Optimization->>Optimization: Build core_rules for all agents
    Optimization->>Optimization: Load pattern predictions (Tier 1)
    Optimization->>Optimization: Fallback to zero-context+tool (Tier 2)
    Optimization->>Optimization: Fallback to full context (Tier 3)
    Crew->>Agents: Pass optimized_context + keyword_search_tool
    
    Note over Agents: Agent Execution
    Agents->>Agents: Generate code
    Workflow->>Workflow: Execute test
    
    Note over Workflow: Pattern Learning Phase
    Workflow->>Workflow: Check if test passed + user_query exists
    Workflow->>Optimization: Initialize optimization components
    Optimization->>Optimization: Extract keywords from code
    Optimization->>Optimization: Learn pattern: query→keywords
    Optimization->>Optimization: Update stats in SQLite
    
    Frontend->>User: Display result
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • src/backend/crew_ai/optimization/smart_keyword_provider.py — Orchestrates three-tier fallback logic with multiple conditional branches; verify all fallback paths are correctly triggered and metrics are accurately tracked.
  • src/backend/crew_ai/optimization/chroma_store.py — ChromaDB integration with version tracking and automatic rebuilds; review error handling for edge cases (corrupt DB, missing embeddings, collection mismatches).
  • src/backend/crew_ai/crew.py — Return signature changed from 2-tuple to 3-tuple with optimization_metrics; verify all call sites properly unpack the return value.
  • src/backend/crew_ai/optimization/pattern_learning.py — SQLite database initialization and keyword extraction regex; validate extraction logic against Robot Framework syntax edge cases.
  • src/backend/crew_ai/agents.py — Significant agent refactoring with context prioritization; confirm all agent role-specific context selections (planner_context, identifier_context, etc.) are correctly applied.
  • src/backend/services/workflow_service.py — Post-execution pattern learning sidecar logic; ensure non-blocking error handling and verify it doesn't interfere with normal workflow completion.
  • Context pruning thresholds and similarity scoring across multiple files — Validate that configuration ranges (0.0–1.0) are enforced and that threshold logic is consistent.

Poem

🐰 Hop and hum, the optimization drum,
Core rules gleam, patterns strum,
Context pruned, token bills shrink,
Three tiers tall, no more chink!
Learn and search, learn and sway,
CrewAI's smarter every day! ✨

Pre-merge checks

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and specifically summarizes the main additions: a keyword search tool and pattern learning system for the optimization feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |


@monkscode monkscode requested a review from Devasy23 November 16, 2025 13:40
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/backend/crew_ai/agents.py (2)

80-112: Planner context is never used

SmartKeywordProvider now supplies a trimmed planner_context, but step_planner_agent still builds guidance solely from library_context. As a result the optimization work never reaches the agent and we keep sending the heavy fallback context. Please honor the optimized planner_context first and only fall back to the legacy guidance when it is missing.

-        library_name = self.library_context.library_name if self.library_context else 'Robot Framework'
-        
-        # Step Planner needs MINIMAL context - just library name and core principles
-        # It doesn't need keyword details - that's for the Code Assembler
-        library_guidance = ""
-        if self.library_context:
-            library_guidance = f"""
+        library_name = self.library_context.library_name if self.library_context else 'Robot Framework'
+        
+        # Prefer optimized planner context when provided, otherwise fall back to minimal library guidance
+        planner_guidance = ""
+        if self.planner_context:
+            planner_guidance = f"\n\n{self.planner_context.strip()}"
+        elif self.library_context:
+            planner_guidance = f"""
@@
-                "8. Create HIGH-LEVEL steps - the Code Assembler will handle keyword details."
-                f"{library_guidance}"
+                "8. Create HIGH-LEVEL steps - the Code Assembler will handle keyword details."
+                f"{planner_guidance}"

118-134: Identifier context is ignored

The optimized identifier_context passed into RobotAgents is never used, so the element identifier agent still operates on the legacy backstory. That defeats the optimization pipeline’s attempt to shrink context for this role. Please insert the identifier_context when provided before falling back to the static text.

-    def element_identifier_agent(self) -> Agent:
-        return Agent(
+    def element_identifier_agent(self) -> Agent:
+        identifier_guidance = ""
+        if self.identifier_context:
+            identifier_guidance = f"\n\n{self.identifier_context.strip()}"
+
+        return Agent(
@@
-                "Benefits: Browser opens once (3-5x faster), full context awareness, intelligent popup handling, validated locators."
-            ),
+                "Benefits: Browser opens once (3-5x faster), full context awareness, intelligent popup handling, validated locators."
+                f"{identifier_guidance}"
+            ),
🧹 Nitpick comments (12)
.gitignore (1)

40-40: Use trailing slash for directory pattern consistency.

Line 40 ignores logs/temp_metrics without a trailing slash. For consistency with standard gitignore directory patterns (e.g., chroma_db/ on line 38), use a trailing slash to explicitly denote this as a directory:

-logs/temp_metrics
+logs/temp_metrics/

This prevents accidental matching of files named temp_metrics at different levels.

src/backend/requirements.txt (1)

26-34: Validate new ML dependency versions against your Python/runtime targets

The new chromadb==0.4.22, sentence-transformers==2.2.2, and numpy==1.24.3 pins look reasonable, but they are heavy and somewhat opinionated:

  • sentence-transformers will pull in substantial ML dependencies (e.g., torch), which will noticeably increase image size and cold-start time.
  • numpy==1.24.3 may not be compatible with newer Python runtimes (e.g., Python 3.12 prefers a newer NumPy).

I’d suggest double-checking that:

  • These versions are supported on the Python version you ship in Docker/production.
  • The footprint/performance impact is acceptable (or consider putting these behind an extra or separate image if not always needed).
src/backend/.env.example (1)

69-107: Consider aligning example OPTIMIZATION_ENABLED value with the documented default

The comment says default is false (disabled until fully tested), but .env.example ships with:

OPTIMIZATION_ENABLED=true

Given many users will cp .env.example .env, this effectively enables the new optimization system by default, which may surprise them if they haven’t read the docs yet.

I’d consider either:

  • Changing the example to OPTIMIZATION_ENABLED=false and letting the docs show how to turn it on, or
  • Adjusting the comment to clarify that the example enables optimization even though the code default is false.
src/backend/core/config.py (1)

39-47: Optimization config looks solid; consider adding TOP_K validation and tightening error handling

The new optimization settings and confidence-threshold validator are well-structured and match the env/example usage.

Two minor improvement ideas:

  1. Enforce the documented range for OPTIMIZATION_KEYWORD_SEARCH_TOP_K
    Docs and .env.example describe a valid range of 1–10, but the config doesn’t enforce it. Adding a small validator would prevent misconfiguration, e.g.:

    @validator("OPTIMIZATION_KEYWORD_SEARCH_TOP_K")
    def validate_keyword_search_top_k(cls, v: int) -> int:
        if not 1 <= v <= 10:
            raise ValueError(f"OPTIMIZATION_KEYWORD_SEARCH_TOP_K must be between 1 and 10, got {v}")
        return v
  2. Error message length (Ruff TRY003)
    Ruff flags the relatively long error message in validate_confidence_threshold. This is purely stylistic; if you want to appease it, you could shorten the message or factor it into a constant, but functionally it’s fine as-is.

Also note: OPTIMIZATION_CONTEXT_PRUNING_THRESHOLD here defaults to 0.6, while docs/OPTIMIZATION.md currently states a default of 0.8; worth reconciling so operators don’t get conflicting information.

Also applies to: 77-82

src/backend/crew_ai/optimization/logging_config.py (1)

23-47: Simplify logger naming to guarantee hierarchy and narrow the exception catch around file handler

Nice centralized logging surface; a couple of small robustness points:

  1. get_optimization_logger hierarchy can be brittle
    Current behavior depends on whether name starts with or contains "optimization". If callers follow the doc and pass __name__, modules with names like src.backend.crew_ai.optimization.keyword_search_tool will get loggers outside the crew_ai.optimization tree, so they won’t automatically inherit the handlers configured on OPTIMIZATION_LOGGER_NAME.

    A simpler, more predictable pattern is:

    def get_optimization_logger(name: str) -> logging.Logger:
        if name.startswith(OPTIMIZATION_LOGGER_NAME):
            logger_name = name
        else:
            logger_name = f"{OPTIMIZATION_LOGGER_NAME}.{name}"
        return logging.getLogger(logger_name)

    and then pass a short component name (e.g. "keyword_search") or __name__ if you really want the fully-qualified under that prefix.

  2. Catching bare Exception when configuring file logging (Ruff BLE001)
    Functionally it’s acceptable to treat any failure as “log a warning and continue”, but to keep linters quiet and make intent clearer, you may want to narrow this to OSError / IOError / PermissionError, which covers the usual file-handler failures (sketched below).

Also applies to: 90-99
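
A minimal sketch of the narrowed handler setup; the function name and formatter are illustrative, not the actual `logging_config.py` code:

```python
import logging

def add_optimization_file_handler(logger: logging.Logger, path: str) -> None:
    """Attach a file handler, degrading to console-only logging on failure."""
    try:
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    except OSError as exc:
        # OSError covers PermissionError, FileNotFoundError, and the legacy
        # IOError alias, i.e. the usual file-handler failure modes.
        logger.warning("File logging disabled, continuing with console only: %s", exc)
```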

src/backend/core/workflow_metrics.py (1)

52-91: Optimization metrics wiring looks good; minor serialization and docstring nits

Overall, the new optimization metrics surface is well thought out: defaults via __post_init__, dedicated track_* helpers, and backward-compatible parsing in from_dict all look solid.

A couple of small things you might want to tweak:

  1. Duplicate exposure of optimization fields in to_dict
    to_dict() uses asdict(self) (which already includes token_usage, keyword_search_stats, etc. at the top level) and then adds an optimization sub-dict that nests the same values. The JSON shape shown in docs/OPTIMIZATION.md only uses the nested optimization section.

    If you want the external JSON to match the docs and avoid redundancy, you could pop the top-level keys before adding optimization, for example:

    def to_dict(self) -> Dict[str, Any]:
        data = asdict(self)
        data["timestamp"] = self.timestamp.isoformat()
    
        token_usage = data.pop("token_usage", None)
        keyword_search = data.pop("keyword_search_stats", None)
        pattern_learning = data.pop("pattern_learning_stats", None)
        context_reduction = data.pop("context_reduction", None)
    
        data["optimization"] = {
            "token_usage": token_usage,
            "keyword_search": keyword_search,
            "pattern_learning": pattern_learning,
            "context_reduction": context_reduction,
        }
        return data

    from_dict is already set up to consume an optimization section, so this would align the serialized shape with what you document.

  2. count_tokens docstring example doesn’t match implementation
    Given words = text.split() and estimated_tokens = int(len(words) * 1.33), the example:

    >>> count_tokens("Hello world, this is a test")
    8

    actually returns 7 with the current heuristic (6 words * 1.33 → 7 after int). Either adjust the example value or tweak the multiplier if you want the example to be exact; see the quick check below.

Also applies to: 92-147, 150-161, 221-235, 386-414
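
A quick self-contained check of that arithmetic, using the heuristic quoted above:

```python
def count_tokens(text: str) -> int:
    # Heuristic as described in the review: ~1.33 tokens per whitespace-split word.
    return int(len(text.split()) * 1.33)

# 6 words -> int(6 * 1.33) = int(7.98) = 7, not the 8 in the docstring example.
assert count_tokens("Hello world, this is a test") == 7
```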

src/frontend/script.js (1)

26-28: User query tracking across generate/execute looks consistent

Storing currentUserQuery on generation, resetting it on clearAll, and forwarding it in /execute-test payload is coherent and matches the backend contract for optional user_query. This keeps the original query bound to the generated code and avoids leakage across sessions.

One behavioral nuance to be aware of: if a user generates once, then manually replaces the code without pressing “New Test”, executions will still send the original currentUserQuery. If you’d rather only learn from queries that directly produced the executed code, consider clearing currentUserQuery when users paste or heavily edit code after generation.

Also applies to: 387-389, 536-538, 648-650

src/backend/crew_ai/library_context/dynamic_context.py (1)

179-213: Minimal planning context implementation is sound; consider tightening exception scope

get_minimal_planning_context correctly reuses get_library_documentation, includes version in the banner when available, and degrades gracefully when documentation loading fails.

The broad except Exception is acceptable here for resiliency, but if you want to satisfy BLE001 and avoid hiding programmer errors, you could narrow it to the expected failure modes (e.g., ImportError, OSError, json.JSONDecodeError) while letting unexpected exceptions surface.
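
A sketch of that narrowing, assuming `get_library_documentation` returns a dict with an optional `version` key; the real method body in `dynamic_context.py` is more involved:

```python
import json

def get_minimal_planning_context(self) -> str:
    """Return a one-line library banner, degrading gracefully on load failure."""
    try:
        docs = self.get_library_documentation()
    except (ImportError, OSError, json.JSONDecodeError):
        # Expected failure modes only; programmer errors still surface.
        return f"Library: {self.library_name} (documentation unavailable)"
    version = docs.get("version", "")
    return f"Library: {self.library_name} {version}".strip()
```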

docs/OPTIMIZATION_DEVELOPER_GUIDE.md (1)

1-2233: Address markdownlint issues for better tooling compatibility

The guide is thorough and well‑structured. markdownlint is flagging a few mechanical issues:

  • Some fenced code blocks lack a language spec (e.g., around lines 26, 54, 861, 1933, 1946, etc.). Consider adding bash, python, json, etc. to those fences.
  • Several lines use emphasis (**...**) in places where a heading level (e.g., ### ...) would be more appropriate (MD036).

These don’t affect rendering much but fixing them will reduce noise from docs linters and improve IDE support.

src/backend/services/workflow_service.py (3)

81-83: Unused validation_output and optimization_metrics from run_crew

run_crew now returns three values, but validation_output and optimization_metrics are never used in run_agentic_workflow. This is harmless but flagged by Ruff and slightly misleading.

If you don’t plan to use them here, consider marking them as intentionally unused:

-        validation_output, crew_with_results, optimization_metrics = run_crew(
+        _result, crew_with_results, _optimization_metrics = run_crew(
             natural_language_query, model_provider, model_name, library_type=None, workflow_id=workflow_id)

If you do intend to surface optimization metrics later, wiring them into the unified WorkflowMetrics would be a good follow‑up.


449-457: Type hint for user_query is slightly non‑idiomatic

stream_execute_only uses user_query: str = None, which is valid at runtime but violates PEP‑484 style and triggers RUF013.

Consider switching to an explicit optional type for clarity:

-async def stream_execute_only(robot_code: str, user_query: str = None) -> Generator[str, None, None]:
+async def stream_execute_only(robot_code: str, user_query: str | None = None) -> Generator[str, None, None]:

(Similarly, you can use Optional[str] if you prefer older syntax.)


489-520: Broad except Exception around learning is acceptable but could be narrowed

Both learning blocks wrap all errors in a generic except Exception and log a warning. This is reasonable for a non‑critical sidecar that must not break execution, but it also swallows programming errors (e.g., misconfigurations) the same way as transient environment failures.

If you want stricter behavior, consider:

  • Narrowing to expected runtime failures (e.g., ImportError, OSError, chromadb.errors.*), or
  • Re‑raising on clearly programmer‑error types while continuing to log and swallow transient ones.

Not urgent, but worth considering once the main wiring is stable; a sketch of the stricter split follows below.

Also applies to: 593-621
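
A hedged sketch of the stricter split; the `learn_from_execution` argument list is assumed from the PR summary, not taken from the actual source:

```python
import logging

logger = logging.getLogger(__name__)

def learn_safely(provider, user_query: str, robot_code: str) -> None:
    """Run pattern learning without ever failing the main workflow."""
    try:
        provider.learn_from_execution(user_query, robot_code)
    except (TypeError, AttributeError, NameError):
        # Programmer errors: re-raise so misconfigurations are not silently eaten.
        raise
    except Exception as exc:
        # Transient environment failures (locked SQLite DB, model download, I/O).
        logger.warning("Pattern learning skipped: %s", exc)
```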

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4d78c6 and f65f26e.

📒 Files selected for processing (24)
  • .gitignore (1 hunks)
  • docs/OPTIMIZATION.md (1 hunks)
  • docs/OPTIMIZATION_DEVELOPER_GUIDE.md (1 hunks)
  • src/backend/.env.example (1 hunks)
  • src/backend/api/endpoints.py (2 hunks)
  • src/backend/core/config.py (2 hunks)
  • src/backend/core/workflow_metrics.py (3 hunks)
  • src/backend/crew_ai/agents.py (6 hunks)
  • src/backend/crew_ai/crew.py (5 hunks)
  • src/backend/crew_ai/library_context/base.py (1 hunks)
  • src/backend/crew_ai/library_context/browser_context.py (3 hunks)
  • src/backend/crew_ai/library_context/dynamic_context.py (1 hunks)
  • src/backend/crew_ai/library_context/selenium_context.py (4 hunks)
  • src/backend/crew_ai/optimization/__init__.py (1 hunks)
  • src/backend/crew_ai/optimization/chroma_store.py (1 hunks)
  • src/backend/crew_ai/optimization/context_pruner.py (1 hunks)
  • src/backend/crew_ai/optimization/keyword_search_tool.py (1 hunks)
  • src/backend/crew_ai/optimization/logging_config.py (1 hunks)
  • src/backend/crew_ai/optimization/pattern_learning.py (1 hunks)
  • src/backend/crew_ai/optimization/smart_keyword_provider.py (1 hunks)
  • src/backend/crew_ai/tasks.py (1 hunks)
  • src/backend/requirements.txt (1 hunks)
  • src/backend/services/workflow_service.py (4 hunks)
  • src/frontend/script.js (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (15)
src/backend/crew_ai/optimization/logging_config.py (1)
src/backend/crew_ai/llm_output_cleaner.py (2)
  • LLMFormattingMonitor (366-413)
  • LLMOutputCleaner (31-363)
src/backend/services/workflow_service.py (5)
src/backend/crew_ai/crew.py (1)
  • run_crew (53-277)
src/backend/crew_ai/optimization/smart_keyword_provider.py (2)
  • SmartKeywordProvider (20-335)
  • learn_from_execution (323-335)
src/backend/crew_ai/optimization/pattern_learning.py (2)
  • QueryPatternMatcher (22-308)
  • learn_from_execution (145-195)
src/backend/crew_ai/optimization/chroma_store.py (1)
  • KeywordVectorStore (18-378)
src/backend/crew_ai/library_context/__init__.py (1)
  • get_library_context (21-43)
src/backend/crew_ai/library_context/base.py (2)
src/backend/crew_ai/library_context/browser_context.py (1)
  • core_rules (52-96)
src/backend/crew_ai/library_context/selenium_context.py (1)
  • core_rules (189-234)
src/backend/crew_ai/optimization/pattern_learning.py (2)
src/backend/crew_ai/optimization/chroma_store.py (1)
  • get_or_create_pattern_collection (84-104)
src/backend/crew_ai/optimization/smart_keyword_provider.py (1)
  • learn_from_execution (323-335)
src/backend/crew_ai/library_context/browser_context.py (3)
src/backend/crew_ai/library_context/base.py (4)
  • core_rules (123-140)
  • planning_context (33-42)
  • code_assembly_context (46-56)
  • validation_context (60-69)
src/backend/crew_ai/library_context/selenium_context.py (4)
  • core_rules (189-234)
  • planning_context (35-43)
  • code_assembly_context (46-154)
  • validation_context (237-246)
src/backend/crew_ai/library_context/dynamic_context.py (1)
  • get_minimal_planning_context (179-212)
src/backend/crew_ai/optimization/chroma_store.py (4)
src/backend/core/config.py (1)
  • Settings (11-87)
src/backend/crew_ai/library_context/browser_context.py (1)
  • library_name (27-28)
src/backend/crew_ai/library_context/selenium_context.py (1)
  • library_name (27-28)
src/backend/crew_ai/library_context/dynamic_context.py (2)
  • DynamicLibraryDocumentation (23-233)
  • get_library_documentation (41-89)
src/backend/crew_ai/optimization/__init__.py (6)
src/backend/crew_ai/optimization/chroma_store.py (1)
  • KeywordVectorStore (18-378)
src/backend/crew_ai/optimization/keyword_search_tool.py (1)
  • KeywordSearchTool (18-169)
src/backend/crew_ai/optimization/pattern_learning.py (1)
  • QueryPatternMatcher (22-308)
src/backend/crew_ai/optimization/smart_keyword_provider.py (1)
  • SmartKeywordProvider (20-335)
src/backend/crew_ai/optimization/context_pruner.py (1)
  • ContextPruner (17-204)
src/backend/crew_ai/optimization/logging_config.py (6)
  • get_optimization_logger (23-47)
  • configure_optimization_logging (50-104)
  • LogMessages (108-146)
  • log_fallback (150-166)
  • log_critical_failure (169-184)
  • log_performance_metric (187-207)
src/backend/api/endpoints.py (1)
src/backend/services/workflow_service.py (1)
  • stream_execute_only (449-524)
src/backend/crew_ai/agents.py (3)
src/backend/crew_ai/library_context/base.py (3)
  • library_name (21-23)
  • code_assembly_context (46-56)
  • validation_context (60-69)
src/backend/crew_ai/library_context/browser_context.py (3)
  • library_name (27-28)
  • code_assembly_context (110-205)
  • validation_context (208-216)
src/backend/crew_ai/library_context/selenium_context.py (3)
  • library_name (27-28)
  • code_assembly_context (46-154)
  • validation_context (237-246)
src/backend/crew_ai/tasks.py (3)
src/backend/crew_ai/library_context/base.py (1)
  • planning_context (33-42)
src/backend/crew_ai/library_context/browser_context.py (1)
  • planning_context (99-107)
src/backend/crew_ai/library_context/selenium_context.py (1)
  • planning_context (35-43)
src/backend/crew_ai/library_context/selenium_context.py (3)
src/backend/crew_ai/library_context/dynamic_context.py (1)
  • get_minimal_planning_context (179-212)
src/backend/crew_ai/library_context/base.py (3)
  • code_assembly_context (46-56)
  • core_rules (123-140)
  • validation_context (60-69)
src/backend/crew_ai/library_context/browser_context.py (3)
  • code_assembly_context (110-205)
  • core_rules (52-96)
  • validation_context (208-216)
src/backend/crew_ai/optimization/keyword_search_tool.py (2)
src/backend/crew_ai/optimization/chroma_store.py (1)
  • search (197-243)
src/backend/core/workflow_metrics.py (1)
  • track_keyword_search (106-121)
src/backend/crew_ai/library_context/dynamic_context.py (3)
src/backend/crew_ai/library_context/base.py (1)
  • library_name (21-23)
src/backend/crew_ai/library_context/browser_context.py (1)
  • library_name (27-28)
src/backend/crew_ai/library_context/selenium_context.py (1)
  • library_name (27-28)
src/backend/crew_ai/optimization/smart_keyword_provider.py (8)
src/backend/crew_ai/optimization/pattern_learning.py (2)
  • get_relevant_keywords (197-257)
  • learn_from_execution (145-195)
src/backend/crew_ai/optimization/chroma_store.py (2)
  • KeywordVectorStore (18-378)
  • search (197-243)
src/backend/crew_ai/optimization/keyword_search_tool.py (1)
  • KeywordSearchTool (18-169)
src/backend/crew_ai/optimization/context_pruner.py (4)
  • ContextPruner (17-204)
  • classify_query (94-142)
  • prune_keywords (144-175)
  • get_pruning_stats (177-204)
src/backend/crew_ai/library_context/base.py (5)
  • library_name (21-23)
  • core_rules (123-140)
  • planning_context (33-42)
  • code_assembly_context (46-56)
  • validation_context (60-69)
src/backend/crew_ai/library_context/browser_context.py (5)
  • library_name (27-28)
  • core_rules (52-96)
  • planning_context (99-107)
  • code_assembly_context (110-205)
  • validation_context (208-216)
src/backend/crew_ai/library_context/selenium_context.py (5)
  • library_name (27-28)
  • core_rules (189-234)
  • planning_context (35-43)
  • code_assembly_context (46-154)
  • validation_context (237-246)
src/backend/core/workflow_metrics.py (1)
  • track_pattern_learning (123-134)
src/backend/crew_ai/crew.py (5)
src/backend/core/workflow_metrics.py (3)
  • WorkflowMetrics (18-236)
  • count_tokens (386-414)
  • track_context_reduction (136-148)
src/backend/crew_ai/optimization/chroma_store.py (2)
  • KeywordVectorStore (18-378)
  • ensure_collection_ready (361-378)
src/backend/crew_ai/optimization/pattern_learning.py (1)
  • QueryPatternMatcher (22-308)
src/backend/crew_ai/optimization/smart_keyword_provider.py (2)
  • get_agent_context (210-280)
  • get_keyword_search_tool (310-321)
src/backend/crew_ai/agents.py (1)
  • RobotAgents (52-237)
🪛 dotenv-linter (4.0.0)
src/backend/.env.example

[warning] 68-68: [ExtraBlankLine] Extra blank line detected

(ExtraBlankLine)

🪛 markdownlint-cli2 (0.18.1)
docs/OPTIMIZATION_DEVELOPER_GUIDE.md

26-26: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


534-534: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


557-557: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


572-572: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


588-588: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


617-617: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


633-633: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


660-660: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


679-679: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


721-721: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


748-748: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


761-761: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


782-782: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


839-839: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


861-861: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


870-870: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


950-950: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1064-1064: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1933-1933: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


1946-1946: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


1974-1974: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


2001-2001: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


2027-2027: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


2056-2056: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/OPTIMIZATION.md

289-289: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


371-371: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.4)
src/backend/crew_ai/optimization/context_pruner.py

27-52: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

src/backend/crew_ai/optimization/logging_config.py

98-98: Do not catch blind exception: Exception

(BLE001)

src/backend/services/workflow_service.py

81-81: Unpacked variable validation_output is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


81-81: Unpacked variable optimization_metrics is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


449-449: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


515-515: Do not catch blind exception: Exception

(BLE001)


616-616: Undefined name natural_language_query

(F821)


618-618: Do not catch blind exception: Exception

(BLE001)

src/backend/crew_ai/optimization/pattern_learning.py

253-253: Consider moving this statement to an else block

(TRY300)


285-285: Consider moving this statement to an else block

(TRY300)


304-304: Consider moving this statement to an else block

(TRY300)

src/backend/crew_ai/optimization/chroma_store.py

56-56: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


78-78: Consider moving this statement to an else block

(TRY300)


81-81: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


100-100: Consider moving this statement to an else block

(TRY300)


103-103: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


161-161: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


193-193: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


239-239: Consider moving this statement to an else block

(TRY300)


241-241: Do not catch blind exception: Exception

(BLE001)


242-242: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


264-264: Consider moving this statement to an else block

(TRY300)


266-266: Do not catch blind exception: Exception

(BLE001)


285-285: Do not catch blind exception: Exception

(BLE001)


314-314: Consider moving this statement to an else block

(TRY300)


316-316: Do not catch blind exception: Exception

(BLE001)


334-334: Do not catch blind exception: Exception

(BLE001)


341-341: Local variable collection is assigned to but never used

Remove assignment to unused variable collection

(F841)


358-358: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


377-377: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

src/backend/crew_ai/optimization/__init__.py

26-38: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

src/backend/core/config.py

81-81: Avoid specifying long messages outside the exception class

(TRY003)

src/backend/api/endpoints.py

57-57: f-string without any placeholders

Remove extraneous f prefix

(F541)

src/backend/crew_ai/optimization/keyword_search_tool.py

128-128: Consider moving this statement to an else block

(TRY300)


130-130: Do not catch blind exception: Exception

(BLE001)


131-131: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

src/backend/crew_ai/library_context/dynamic_context.py

196-196: Do not catch blind exception: Exception

(BLE001)

src/backend/crew_ai/optimization/smart_keyword_provider.py

69-69: Unused method argument: agent_role

(ARG002)


114-114: Unused method argument: agent_role

(ARG002)


165-165: Do not catch blind exception: Exception

(BLE001)


249-249: Do not catch blind exception: Exception

(BLE001)


261-261: Do not catch blind exception: Exception

(BLE001)


275-275: Do not catch blind exception: Exception

(BLE001)


276-276: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


334-334: Do not catch blind exception: Exception

(BLE001)


335-335: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

src/backend/crew_ai/crew.py

99-99: Local variable optimized_context is assigned to but never used

Remove assignment to unused variable optimized_context

(F841)


136-136: Do not catch blind exception: Exception

(BLE001)


187-187: Do not catch blind exception: Exception

(BLE001)


188-188: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


198-198: String contains ambiguous ℹ (INFORMATION SOURCE). Did you mean i (LATIN SMALL LETTER I)?

(RUF001)


260-260: Consider moving this statement to an else block

(TRY300)

🔇 Additional comments (5)
.gitignore (3)

26-32: Gitignore updates align well with PR objectives.

The additions to ignore AI-generated instruction files (including .github/copilot-instructions.md) and the expanded comment clarity are appropriate and support the development environment.


37-39: New ignore rules are well-aligned with optimization system components.

The new directory ignores (chroma_db/, data/) appropriately exclude persistent storage and generated data from the PR's keyword optimization system (ChromaDB semantic search, pattern learning, and metrics), preventing unnecessary repository bloat.


43-43: Verified: The .gitignore path change is correct and safe.

The change renames a gitignored local development directory from tools/browser_service_local/* to tools/browser_service/*. This is a safe refinement because:

  1. Both tools/browser_service_local/ and tools/browser_service/ are local development directories that are gitignored by design, so they are never checked in—they are generated or populated locally during development.

  2. The external browser-service>=1.0.0 pip package in requirements.txt is the actual production dependency; the gitignored directories are only for local artifacts.

  3. Important checked-in source code (including imports from the browser_service module in tools/browser_use_service.py and documented in ARCHITECTURE.md) remains unaffected by this gitignore change.

No issues found.

src/backend/crew_ai/tasks.py (1)

31-46: Minimal keyword guidelines align with new planning context strategy

Switching _get_keyword_guidelines to use library_context.planning_context and falling back to high‑level action types is consistent with the new minimal planning model and keeps token usage low while preserving intent.
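
A hedged sketch of that fallback, with names taken from the review text rather than the actual `tasks.py` source:

```python
def _get_keyword_guidelines(self) -> str:
    # Prefer the trimmed planning context when a library context exists;
    # otherwise fall back to generic, high-level action types.
    if self.library_context:
        return self.library_context.planning_context
    return (
        "Plan using high-level action types (navigate, input, click, "
        "extract, assert, wait); keyword selection happens during code assembly."
    )
```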

src/backend/crew_ai/library_context/base.py (1)

121-141: New core_rules contract is well‑defined and consistent

Adding the abstract core_rules property with a focused docstring gives a clear, centralized contract for “must‑include” library rules that optimization components can rely on. As long as all LibraryContext implementations (e.g., BrowserLibraryContext, SeleniumLibraryContext, any custom contexts) implement this property, the change is safe.

If you have any out‑of‑tree LibraryContext subclasses, double‑check they’ve been updated to avoid instantiation errors.
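
For out-of-tree subclasses, the new obligation looks roughly like this self-contained sketch; the stand-in base class below mirrors only the `core_rules` contract, while the real one in `base.py` has several more abstract members:

```python
from abc import ABC, abstractmethod

class LibraryContext(ABC):
    # Stand-in for the real base class, which also defines library_name,
    # planning_context, code_assembly_context, validation_context, etc.
    @property
    @abstractmethod
    def core_rules(self) -> str:
        """Must-include rules that optimization components always inject."""

class CustomLibraryContext(LibraryContext):
    @property
    def core_rules(self) -> str:
        return (
            "1. Import the library in *** Settings ***.\n"
            "2. Prefer explicit waits over sleeps."
        )

print(CustomLibraryContext().core_rules)
```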

Comment on lines +219 to 233
```diff
         # Get library-specific context if available (OPTIMIZED - minimal rules)
         library_knowledge = ""
         if self.library_context:
             library_knowledge = f"\n\n{self.library_context.validation_context}"

         return Agent(
             role="Robot Framework Linter and Quality Assurance Engineer",
-            goal=f"Validate the generated Robot Framework code for correctness and adherence to {self.library_context.library_name if self.library_context else 'Robot Framework'} rules, and delegate fixes to Code Assembly Agent if errors are found.",
+            goal=f"Validate Robot Framework code for {self.library_context.library_name if self.library_context else 'Robot Framework'} correctness. Delegate fixes if errors found.",
             backstory=(
-                "You are an expert Robot Framework linter. Your sole task is to validate the provided "
-                "Robot Framework code for syntax errors, correct keyword usage, and adherence to critical rules. "
-                "You must be thorough and provide a clear validation result.\n\n"
-                "**DELEGATION WORKFLOW:**\n"
-                "When you find errors in the code, you MUST follow this workflow:\n"
-                "1. Identify and document all syntax errors, incorrect keyword usage, and rule violations\n"
-                "2. Create a detailed fix request with:\n"
-                "   - Specific line numbers where errors occur\n"
-                "   - Clear description of each error\n"
-                "   - Examples of correct syntax for each issue\n"
-                "   - Relevant Robot Framework rules being violated\n"
-                "3. Delegate the fix request to the Code Assembly Agent with clear, actionable instructions\n"
-                "4. The Code Assembly Agent will regenerate the code incorporating your feedback\n"
-                "5. You will then validate the regenerated code and repeat if necessary\n\n"
-                "**CRITICAL DELEGATION INSTRUCTIONS:**\n"
-                "When you find errors, create a detailed fix request and delegate to Code Assembly Agent.\n"
-                "Your delegation message should include:\n"
-                "- A summary of all errors found\n"
-                "- Specific corrections needed for each error\n"
-                "- Code examples showing the correct implementation\n"
-                "- Priority ranking if multiple errors exist (fix critical syntax errors first)\n\n"
-                "**VALIDATION CRITERIA:**\n"
-                "- Syntax correctness (indentation, spacing, structure)\n"
-                "- Correct keyword usage for the target library\n"
-                "- Proper variable assignments for keywords that return values\n"
-                "- Valid locator formats\n"
-                "- Correct test case structure\n\n"
-                "If the code is valid, clearly state 'VALID' and provide a brief summary. "
-                "If errors are found, immediately delegate to Code Assembly Agent with detailed fix instructions."
+                "Expert Robot Framework validator. Check: syntax, keyword usage, variable assignments, locator formats, test structure. "
+                "If VALID: Return JSON {\"valid\": true, \"reason\": \"...\"}. "
+                "If INVALID: Document errors with line numbers, then delegate to Code Assembly Agent with fix instructions."
                 f"{library_knowledge}"
             ),
             llm=self.llm,
```
⚠️ Potential issue | 🟠 Major

Validator context should override defaults

validator_context is calculated upstream but code_validator_agent still ignores it and always expands library_context.validation_context. That keeps the large baseline payload and prevents any token savings for the validator. Please prefer validator_context when present and fall back to the legacy context otherwise.

-        library_knowledge = ""
-        if self.library_context:
-            library_knowledge = f"\n\n{self.library_context.validation_context}"
+        library_knowledge = ""
+        if self.validator_context:
+            library_knowledge = f"\n\n{self.validator_context.strip()}"
+        elif self.library_context:
+            library_knowledge = f"\n\n{self.library_context.validation_context}"
🤖 Prompt for AI Agents
In src/backend/crew_ai/agents.py around lines 219 to 233, the agent builds
library_knowledge from library_context.validation_context and never uses
validator_context; change the construction to prefer validator_context when
present and fall back to library_context.validation_context (or empty string) so
the validator can use the smaller, precomputed validator_context. Update the
library_knowledge assignment to check self.validator_context first, then
self.library_context.validation_context, and ensure the backstory string
concatenates the chosen context.

Comment on lines +88 to +119
```python
            embedding = self.model.encode([description])[0]
            self.category_embeddings[category] = embedding
            logger.debug(f"Category '{category}' embedding shape: {embedding.shape}")

        logger.info(f"Pre-computed embeddings for {len(self.category_embeddings)} categories")

    def classify_query(self, user_query: str, confidence_threshold: float = 0.8) -> List[str]:
        """
        Classify query into action categories using semantic similarity.

        Computes similarity between the query and each category description.
        Returns categories that meet the confidence threshold, or all categories
        if no category meets the threshold (graceful degradation).

        Args:
            user_query: User's natural language query
            confidence_threshold: Minimum similarity for category inclusion (0.0-1.0)

        Returns:
            List of relevant category names (e.g., ["input", "interaction"])
            Returns all categories if confidence too low (fallback)
        """
        logger.debug(f"Classifying query: {user_query[:50]}...")

        # Encode query
        query_embedding = self.model.encode([user_query])[0]

        # Compute similarity with each category
        similarities = {}
        for category, category_embedding in self.category_embeddings.items():
            # Cosine similarity using dot product (embeddings are normalized)
            similarity = np.dot(query_embedding, category_embedding)
```
@coderabbitai coderabbitai bot commented Nov 16, 2025
🛠️ Refactor suggestion | 🟠 Major

Normalize embeddings before applying cosine threshold

SentenceTransformer.encode does not normalize by default, so the dot product here is an unbounded inner product. With the fixed threshold of 0.8 we’ll routinely fail to meet the cutoff (or exceed 1.0), causing spurious fallbacks to “all categories” and defeating pruning. Please normalize both the category and query embeddings (pass normalize_embeddings=True) before taking the dot product.

-            embedding = self.model.encode([description])[0]
+            embedding = self.model.encode(
+                [description],
+                normalize_embeddings=True,
+            )[0]
...
-        query_embedding = self.model.encode([user_query])[0]
+        query_embedding = self.model.encode(
+            [user_query],
+            normalize_embeddings=True,
+        )[0]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-            embedding = self.model.encode([description])[0]
+            embedding = self.model.encode(
+                [description],
+                normalize_embeddings=True,
+            )[0]
             self.category_embeddings[category] = embedding
             logger.debug(f"Category '{category}' embedding shape: {embedding.shape}")

         logger.info(f"Pre-computed embeddings for {len(self.category_embeddings)} categories")

     def classify_query(self, user_query: str, confidence_threshold: float = 0.8) -> List[str]:
         """
         Classify query into action categories using semantic similarity.

         Computes similarity between the query and each category description.
         Returns categories that meet the confidence threshold, or all categories
         if no category meets the threshold (graceful degradation).

         Args:
             user_query: User's natural language query
             confidence_threshold: Minimum similarity for category inclusion (0.0-1.0)

         Returns:
             List of relevant category names (e.g., ["input", "interaction"])
             Returns all categories if confidence too low (fallback)
         """
         logger.debug(f"Classifying query: {user_query[:50]}...")

         # Encode query
-        query_embedding = self.model.encode([user_query])[0]
+        query_embedding = self.model.encode(
+            [user_query],
+            normalize_embeddings=True,
+        )[0]

         # Compute similarity with each category
         similarities = {}
         for category, category_embedding in self.category_embeddings.items():
             # Cosine similarity using dot product (embeddings are normalized)
             similarity = np.dot(query_embedding, category_embedding)
```
🤖 Prompt for AI Agents
In src/backend/crew_ai/optimization/context_pruner.py around lines 88 to 119,
the code computes cosine similarity by dot product but the
SentenceTransformer.encode calls do not normalize embeddings, so the dot product
is an unbounded inner product and the fixed 0.8 threshold is invalid; fix this
by ensuring both category and query embeddings are normalized before similarity:
either pass normalize_embeddings=True to the model.encode calls when computing
and storing category embeddings and when encoding the query, or manually
L2-normalize each embedding (divide by its norm) immediately after encoding;
then continue using np.dot for similarity.

@Devasy23 (Collaborator) commented:
This shall be using normalised Euclidean distance, and by default, this shall be done using the chromadb's implemented methods

@coderabbitai coderabbitai bot commented:

@Devasy23, thank you for the clarification! I want to make sure I understand your architectural intent correctly.

Currently, the ContextPruner class performs local classification by:

  1. Pre-computing category embeddings using SentenceTransformer
  2. Computing similarity using np.dot (which CodeRabbit correctly flagged as potentially unnormalized)

You're suggesting to use ChromaDB's distance calculation methods instead. I see a few possible approaches:

Option 1: Store category descriptions in ChromaDB
Create a separate ChromaDB collection for category descriptions and use its built-in normalized distance calculations:

  • Store the 6 category descriptions with their metadata
  • Query against this collection during classify_query
  • Leverage ChromaDB's normalized cosine distance (or L2 if you prefer Euclidean)

Option 2: Use ChromaDB's embedding function directly
Use the same SentenceTransformerEmbeddingFunction from ChromaDB that's used in chroma_store.py, which handles normalization consistently.

Option 3: Use normalized Euclidean (L2) distance
If you specifically want Euclidean distance rather than cosine similarity, we should normalize embeddings and compute L2 distance: np.linalg.norm(query_embedding - category_embedding).

Questions:

  1. Should we refactor ContextPruner to store category descriptions in a ChromaDB collection and use its query methods?
  2. Do you prefer normalized Euclidean distance over cosine similarity for category classification? (Note: cosine is typically better for semantic similarity)
  3. Should we use the same embedding function setup as KeywordVectorStore for consistency?

Could you clarify which approach aligns with your vision? I'm happy to provide implementation guidance once I understand the direction.

@Devasy23 (Collaborator) commented:
  1. yes, 2. normalized cosine might be fine as 0 to 1 range is crucial here, 3. Yes

@coderabbitai coderabbitai bot commented:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the KeywordVectorStore implementation to understand the ChromaDB pattern
cat src/backend/crew_ai/optimization/chroma_store.py | head -100

Length of output: 3238


@Devasy23, perfect! Let me provide the refactored implementation that aligns ContextPruner with the KeywordVectorStore architecture.

Refactored Implementation

Here's the updated ContextPruner class that uses ChromaDB with normalized cosine similarity:

"""
Context Pruner for Smart Keyword Filtering

This module classifies user queries into action categories and prunes
keyword context to include only relevant keywords, reducing token usage
while maintaining code generation accuracy.
"""

import logging
from typing import List, Dict
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

logger = logging.getLogger(__name__)


class ContextPruner:
    """
    Classify queries and prune context to relevant keyword categories.
    
    Uses ChromaDB for semantic similarity to classify queries into action 
    categories (navigation, input, interaction, extraction, assertion, wait) 
    and filters keywords to only those in relevant categories.
    """
    
    # Keyword category mappings
    KEYWORD_CATEGORIES = {
        "navigation": [
            "New Browser", "New Page", "Go To", "Go Back", "Go Forward",
            "Close Browser", "Close Page", "Switch Page", "New Context"
        ],
        "input": [
            "Fill Text", "Input Text", "Type Text", "Press Keys", 
            "Upload File", "Type Secret", "Clear Text"
        ],
        "interaction": [
            "Click", "Click Element", "Hover", "Drag And Drop",
            "Select Options By", "Check Checkbox", "Uncheck Checkbox"
        ],
        "extraction": [
            "Get Text", "Get Attribute", "Get Element Count", 
            "Get Property", "Get Style", "Get Url", "Get Title"
        ],
        "assertion": [
            "Should Be Equal", "Should Contain", "Should Be Visible",
            "Should Not Be Visible", "Should Be Enabled", "Should Be Disabled"
        ],
        "wait": [
            "Wait For Elements State", "Wait Until Element Is Visible",
            "Wait For Condition", "Wait For Load State", "Sleep"
        ]
    }
    
    def __init__(
        self, 
        model_name: str = "all-MiniLM-L6-v2",
        persist_directory: str = "./chroma_db"
    ):
        """
        Initialize with ChromaDB for semantic classification.
        
        Args:
            model_name: Name of sentence-transformers model to use
            persist_directory: Path to ChromaDB storage directory
        """
        logger.info(f"Initializing ContextPruner with ChromaDB at {persist_directory}")
        
        try:
            # Initialize ChromaDB client (same pattern as KeywordVectorStore)
            self.client = chromadb.PersistentClient(
                path=persist_directory,
                settings=Settings(
                    anonymized_telemetry=False,
                    allow_reset=True
                )
            )
            
            # Initialize embedding function
            self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
                model_name=model_name
            )
            
            # Create or get category collection.
            # "hnsw:space": "cosine" is required: ChromaDB defaults to L2 distance.
            self.collection = self.client.get_or_create_collection(
                name="category_descriptions",
                embedding_function=self.embedding_function,
                metadata={"type": "query_categories", "hnsw:space": "cosine"}
            )
            
            # Initialize category descriptions in ChromaDB
            self._init_category_collection()
            
            logger.info("ContextPruner initialized successfully with ChromaDB")
            
        except Exception as e:
            logger.error(f"Failed to initialize ContextPruner: {e}")
            raise
    
    def _init_category_collection(self):
        """
        Initialize ChromaDB collection with category descriptions.
        
        Stores semantic representations of each category for fast
        similarity comparison during query classification.
        """
        logger.debug("Initializing category descriptions in ChromaDB")
        
        # Define category descriptions for semantic matching
        category_descriptions = {
            "navigation": "open browser navigate to website go to page url address",
            "input": "type text fill form input data enter information write",
            "interaction": "click button press element hover drag drop select",
            "extraction": "get text retrieve data extract information read content",
            "assertion": "verify check validate assert should be equal confirm",
            "wait": "wait for element visible ready loaded appear timeout"
        }
        
        # Check if collection is already populated
        existing_count = self.collection.count()
        if existing_count == len(category_descriptions):
            logger.debug(f"Category collection already populated with {existing_count} entries")
            return
        
        # Add category descriptions to ChromaDB
        try:
            ids = list(category_descriptions.keys())
            documents = list(category_descriptions.values())
            metadatas = [{"category": cat} for cat in ids]
            
            # Upsert to handle re-initialization
            self.collection.upsert(
                ids=ids,
                documents=documents,
                metadatas=metadatas
            )
            
            logger.info(f"Initialized {len(category_descriptions)} category descriptions in ChromaDB")
            
        except Exception as e:
            logger.error(f"Failed to initialize category collection: {e}")
            raise
    
    def classify_query(
        self, 
        user_query: str, 
        confidence_threshold: float = 0.8
    ) -> List[str]:
        """
        Classify query into action categories using ChromaDB semantic search.
        
        Queries the ChromaDB collection to find similar categories based on
        normalized cosine similarity. Returns categories that meet the 
        confidence threshold, or all categories if no category meets the 
        threshold (graceful degradation).
        
        Args:
            user_query: User's natural language query
            confidence_threshold: Minimum similarity for category inclusion (0.0-1.0)
            
        Returns:
            List of relevant category names (e.g., ["input", "interaction"])
            Returns all categories if confidence too low (fallback)
        """
        logger.debug(f"Classifying query: {user_query[:50]}...")
        
        try:
            # Query ChromaDB for similar categories.
            # With cosine space, distances fall in [0, 2] (0 = identical, 2 = opposite);
            # convert to similarity via: similarity = 1 - (distance / 2)
            results = self.collection.query(
                query_texts=[user_query],
                n_results=len(self.KEYWORD_CATEGORIES)
            )
            
            # Extract categories and convert distances to similarities
            similarities = {}
            if results['ids'] and len(results['ids'][0]) > 0:
                for idx, category_id in enumerate(results['ids'][0]):
                    distance = results['distances'][0][idx]
                    # Convert cosine distance to similarity (0-1 range)
                    # ChromaDB cosine distance range: [0, 2]
                    # Similarity = 1 - (distance / 2) gives us [0, 1] range
                    similarity = 1.0 - (distance / 2.0)
                    similarities[category_id] = similarity
                    logger.debug(f"Category '{category_id}': distance={distance:.4f}, similarity={similarity:.4f}")
            
            # Filter categories by confidence threshold
            relevant_categories = [
                cat for cat, sim in similarities.items() 
                if sim >= confidence_threshold
            ]
            
            if relevant_categories:
                logger.info(
                    f"Classified query into {len(relevant_categories)} categories: "
                    f"{relevant_categories} (threshold={confidence_threshold})"
                )
                return relevant_categories
            else:
                # Graceful degradation: return all categories if none meet threshold
                all_categories = list(self.KEYWORD_CATEGORIES.keys())
                highest = max(similarities.values()) if similarities else 0.0
                logger.warning(
                    f"No categories met threshold {confidence_threshold}. "
                    f"Highest similarity: {highest:.4f}. "
                    f"Falling back to all categories."
                )
                return all_categories
                
        except Exception as e:
            logger.error(f"Classification failed: {e}. Falling back to all categories.")
            return list(self.KEYWORD_CATEGORIES.keys())
    
    def prune_keywords(
        self, 
        all_keywords: List[Dict], 
        categories: List[str]
    ) -> List[Dict]:
        """
        Filter keywords to only those in relevant categories.
        
        Args:
            all_keywords: List of keyword dicts with 'name' field
            categories: List of relevant category names
            
        Returns:
            Filtered list of keyword dicts
        """
        logger.debug(f"Pruning keywords for categories: {categories}")
        
        # Build set of relevant keyword names
        relevant_names = set()
        for category in categories:
            if category in self.KEYWORD_CATEGORIES:
                relevant_names.update(self.KEYWORD_CATEGORIES[category])
        
        # Filter keywords
        pruned = [
            kw for kw in all_keywords 
            if kw.get("name") in relevant_names
        ]
        
        retained_pct = (len(pruned) / len(all_keywords) * 100) if all_keywords else 0.0
        logger.info(
            f"Pruned {len(all_keywords)} keywords to {len(pruned)} "
            f"({retained_pct:.1f}% retained)"
        )
        
        return pruned
    
    def get_pruning_stats(
        self, 
        original_count: int, 
        pruned_count: int
    ) -> Dict[str, float]:
        """
        Calculate pruning statistics.
        
        Args:
            original_count: Number of keywords before pruning
            pruned_count: Number of keywords after pruning
            
        Returns:
            Dict with retention_rate and reduction_rate
        """
        if original_count == 0:
            return {"retention_rate": 0.0, "reduction_rate": 0.0}
        
        retention = pruned_count / original_count
        reduction = 1.0 - retention
        
        return {
            "retention_rate": retention,
            "reduction_rate": reduction
        }

Key Changes

  1. ChromaDB Integration: Uses PersistentClient and SentenceTransformerEmbeddingFunction (same as KeywordVectorStore)

  2. Normalized Cosine Similarity: with the collection configured for cosine space ("hnsw:space": "cosine"; ChromaDB defaults to L2 otherwise), ChromaDB returns cosine distance in the range [0, 2]. The conversion formula is:

    similarity = 1.0 - (distance / 2.0)  # Maps [0, 2] → [1, 0]

    This gives us the 0-1 similarity range you need (a quick sanity check follows this list), where:

    • similarity = 1.0 means identical (distance = 0)
    • similarity = 0.0 means opposite (distance = 2)
  3. Persistent Storage: Category descriptions are stored in a category_descriptions collection, avoiding re-computation on each instantiation

  4. Consistent Architecture: Follows the same patterns as KeywordVectorStore for maintainability

  5. Updated Constructor: Added persist_directory parameter (defaults to same location as KeywordVectorStore)
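
As a quick sanity check of the distance-to-similarity mapping in item 2 (pure arithmetic, no ChromaDB required):

def distance_to_similarity(distance: float) -> float:
    # Cosine distance in [0, 2] -> similarity in [0, 1]
    return 1.0 - distance / 2.0

assert distance_to_similarity(0.0) == 1.0  # identical
assert distance_to_similarity(1.0) == 0.5  # orthogonal
assert distance_to_similarity(2.0) == 0.0  # opposite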

Usage in smart_keyword_provider.py

Update the initialization to pass the ChromaDB path:

context_pruner = ContextPruner(
    persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH
)

This ensures all optimization components share the same ChromaDB instance and storage location.
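
And a minimal end-to-end sketch of the classify-then-prune flow (assuming all_keywords is a list of {'name': ...} dicts, as in the class above):

pruner = ContextPruner(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)

categories = pruner.classify_query(
    "fill the login form and click submit",
    confidence_threshold=0.8,
)  # e.g. ["input", "interaction"]
pruned = pruner.prune_keywords(all_keywords, categories)
stats = pruner.get_pruning_stats(len(all_keywords), len(pruned))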

Let me know if you'd like me to also update the instantiation points in smart_keyword_provider.py and crew.py!

Comment on lines +170 to +173
        logger.info(
            f"Pruned keywords: {len(all_keywords)} -> {len(pruned_keywords)} "
            f"({len(pruned_keywords)/len(all_keywords)*100:.1f}% retained)"
        )
Contributor

⚠️ Potential issue | 🟠 Major

Avoid divide-by-zero when no keywords are provided

If all_keywords is empty we hit a ZeroDivisionError in the retention log formatting, crashing the caller instead of gracefully returning an empty list. Please guard the division.

-        logger.info(
-            f"Pruned keywords: {len(all_keywords)} -> {len(pruned_keywords)} "
-            f"({len(pruned_keywords)/len(all_keywords)*100:.1f}% retained)"
-        )
+        original_count = len(all_keywords)
+        pruned_count = len(pruned_keywords)
+
+        retention_pct = (pruned_count / original_count * 100) if original_count else 0.0
+
+        logger.info(
+            f"Pruned keywords: {original_count} -> {pruned_count} "
+            f"({retention_pct:.1f}% retained)"
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        logger.info(
-            f"Pruned keywords: {len(all_keywords)} -> {len(pruned_keywords)} "
-            f"({len(pruned_keywords)/len(all_keywords)*100:.1f}% retained)"
-        )
+        original_count = len(all_keywords)
+        pruned_count = len(pruned_keywords)
+        retention_pct = (pruned_count / original_count * 100) if original_count else 0.0
+        logger.info(
+            f"Pruned keywords: {original_count} -> {pruned_count} "
+            f"({retention_pct:.1f}% retained)"
+        )
🤖 Prompt for AI Agents
In src/backend/crew_ai/optimization/context_pruner.py around lines 170 to 173,
the logging expression divides by len(all_keywords) and will raise
ZeroDivisionError when all_keywords is empty; change the calculation to guard
against zero by computing the retention percentage only when len(all_keywords) >
0 (e.g. set retained_pct = 0.0 if len(all_keywords) == 0 else
len(pruned_keywords)/len(all_keywords)*100) and use that retained_pct in the log
message so the function doesn't crash when no keywords are provided.

Comment on lines +489 to +520
# Pattern learning: ONLY learn from PASSED tests
# This ensures we only learn from validated, working code
if result.get('test_status') == 'passed':
if user_query:
try:
from src.backend.core.config import settings
if settings.OPTIMIZATION_ENABLED:
# Initialize optimization components to learn from this successful execution
from src.backend.crew_ai.optimization import SmartKeywordProvider, QueryPatternMatcher, KeywordVectorStore
from src.backend.crew_ai.library_context import get_library_context

logging.info("📚 Test PASSED - Learning from successful execution...")

# Initialize components
library_context = get_library_context(settings.ROBOT_LIBRARY)
chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
smart_provider = SmartKeywordProvider(
library_context=library_context,
pattern_matcher=pattern_matcher,
vector_store=chroma_store
)

# Learn from the successful execution
smart_provider.learn_from_execution(user_query, robot_code)
logging.info("✅ Pattern learning completed - learned from PASSED test")
except Exception as e:
logging.warning(f"⚠️ Failed to learn from execution: {e}")
else:
logging.info("⏭️ Test PASSED but skipping pattern learning - no user query provided")
else:
logging.info(f"⏭️ Skipping pattern learning - test status: {result.get('test_status', 'unknown')}")
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Pattern learning sidecar has two functional bugs (missing Chroma store, undefined variable)

The post‑execution pattern learning blocks introduce two concrete issues:

  1. QueryPatternMatcher is created without a Chroma store

    In both stream_execute_only and stream_generate_and_run, you do:

    chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
    pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)

    but QueryPatternMatcher relies on the provided chroma_store to create pattern_collection. Without it, learn_from_execution will only update SQLite statistics; no patterns are stored in ChromaDB, and get_relevant_keywords will always return []. This effectively disables semantic pattern-based predictions for executions learned through these paths.

  2. Legacy stream_generate_and_run uses an undefined natural_language_query

    In the legacy flow, the learning block calls:

    smart_provider.learn_from_execution(natural_language_query, robot_code)

    but this function has a user_query parameter, not natural_language_query. At runtime this will raise a NameError inside the try block, be caught by except Exception, and log a warning — meaning pattern learning always fails in this path.
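
    A runnable illustration of why this is easy to miss (a simplified stand-in, not the real workflow code): the NameError never propagates, it is downgraded to a warning:

    import logging

    def learn_pattern(user_query: str, robot_code: str) -> None:
        try:
            print(natural_language_query)  # NameError: never defined in this scope
        except Exception as e:
            logging.warning(f"⚠️ Failed to learn from execution: {e}")

    learn_pattern("login test", "*** Test Cases ***")
    # logs: ⚠️ Failed to learn from execution: name 'natural_language_query' is not defined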

To fix both issues and align with how run_crew initializes these components, consider:

@@ async def stream_execute_only(robot_code: str, user_query: str | None = None) -> Generator[str, None, None]:
-                        chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
-                        pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+                        chroma_store = KeywordVectorStore(
+                            persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH
+                        )
+                        pattern_matcher = QueryPatternMatcher(
+                            db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+                            chroma_store=chroma_store,
+                        )
@@ async def stream_generate_and_run(user_query: str, model_provider: str, model_name: str) -> Generator[str, None, None]:
-                    chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
-                    pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+                    chroma_store = KeywordVectorStore(
+                        persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH
+                    )
+                    pattern_matcher = QueryPatternMatcher(
+                        db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+                        chroma_store=chroma_store,
+                    )
@@
-                    # Learn from the successful execution
-                    smart_provider.learn_from_execution(natural_language_query, robot_code)
+                    # Learn from the successful execution
+                    smart_provider.learn_from_execution(user_query, robot_code)

With these changes, successful executions will correctly populate the query_patterns collection and use the actual user query string, so the optimization system can learn and later serve predictions as designed.
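
For reference, the corrected wiring in one place (a sketch mirroring the component setup already shown in this PR; variable names as in stream_execute_only):

chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
pattern_matcher = QueryPatternMatcher(
    db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
    chroma_store=chroma_store,  # required so pattern_collection is created
)
smart_provider = SmartKeywordProvider(
    library_context=library_context,
    pattern_matcher=pattern_matcher,
    vector_store=chroma_store,
)
smart_provider.learn_from_execution(user_query, robot_code)  # not natural_language_query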

Also applies to: 593-621


I'll verify the claims in this review comment by inspecting the codebase.


🏁 Script executed:

#!/bin/bash
# Search for QueryPatternMatcher class definition
ast-grep --pattern $'class QueryPatternMatcher {
  $$$
  __init__($_) {
    $$$
  }
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find QueryPatternMatcher implementation with rg
rg -n "class QueryPatternMatcher" -A 30

Length of output: 5566


🏁 Script executed:

#!/bin/bash
# Search for stream_generate_and_run function
rg -n "def stream_generate_and_run" -A 50 --type py

Length of output: 4264


🏁 Script executed:

#!/bin/bash
# Search for learn_from_execution method signature
rg -n "def learn_from_execution" -A 15 --type py

Length of output: 2911


🏁 Script executed:

#!/bin/bash
# Search for natural_language_query in stream_generate_and_run context
rg -n "natural_language_query" --type py -B 3 -A 3

Length of output: 3284


🏁 Script executed:

#!/bin/bash
# Search for stream_execute_only function
rg -n "def stream_execute_only" -A 50 --type py

Length of output: 4685


🏁 Script executed:

#!/bin/bash
# Check what run_crew does with these components for comparison
rg -n "def run_crew" -A 100 --type py | head -150

Length of output: 7775


🏁 Script executed:

#!/bin/bash
# Search for QueryPatternMatcher initialization patterns in codebase
rg -n "QueryPatternMatcher(" --type py -B 2 -A 2

Length of output: 90


🏁 Script executed:

#!/bin/bash
# Search for KeywordVectorStore initialization patterns
rg -n "KeywordVectorStore(" --type py -B 2 -A 2

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Look at imports in workflow_service to understand component structure
head -50 src/backend/services/workflow_service.py

Length of output: 1773


🏁 Script executed:

#!/bin/bash
# Find the optimization module structure
find . -name "optimization.py" -type f | head -5

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Inspect the optimization.py file for QueryPatternMatcher
cat src/backend/crew_ai/optimization.py | head -200

Length of output: 67


🏁 Script executed:

#!/bin/bash
# Look for QueryPatternMatcher __init__ specifically
ast-grep --pattern 'def __init__' | grep -A 10 QueryPatternMatcher

Length of output: 3596


🏁 Script executed:

#!/bin/bash
# Search more broadly for the parameter definitions
rg -n "QueryPatternMatcher" --type py -B 5 -A 15 | head -100

Length of output: 9277


🏁 Script executed:

#!/bin/bash
# Find all places where QueryPatternMatcher is instantiated
rg -n "QueryPatternMatcher\(" --type py -B 3 -A 5

Length of output: 2759


🏁 Script executed:

#!/bin/bash
# Check the actual workflow_service.py file around lines 489-520
sed -n '480,530p' src/backend/services/workflow_service.py

Length of output: 2921


🏁 Script executed:

#!/bin/bash
# Check the actual workflow_service.py file around lines 593-621
sed -n '590,625p' src/backend/services/workflow_service.py

Length of output: 2220


🏁 Script executed:

#!/bin/bash
# Get learn_from_execution method full signature and implementation
rg -n "def learn_from_execution" --type py -A 25

Length of output: 3936


🏁 Script executed:

#!/bin/bash
# Check SmartKeywordProvider initialization
rg -n "class SmartKeywordProvider" --type py -A 40 | head -80

Length of output: 4397


🏁 Script executed:

#!/bin/bash
# Look for chroma_store parameter usage in QueryPatternMatcher
rg -n "chroma_store" --type py -B 2 -A 2 | grep -A 5 -B 5 QueryPatternMatcher

Length of output: 3571


🏁 Script executed:

#!/bin/bash
# Verify the structure by examining the repository
find . -type f -name "*.py" -path "*crew_ai*" | head -20

Length of output: 861


🏁 Script executed:

#!/bin/bash
# Get file structure
ls -la src/backend/crew_ai/

Length of output: 627


🏁 Script executed:

#!/bin/bash
# Examine optimization.py structure
wc -l src/backend/crew_ai/optimization.py
head -300 src/backend/crew_ai/optimization.py

Length of output: 161


🏁 Script executed:

#!/bin/bash
# Look for the QueryPatternMatcher class definition completely
rg -n "class QueryPatternMatcher" -A 100 --type py | head -150

Length of output: 9409


🏁 Script executed:

#!/bin/bash
# Search for where chroma_store is expected as parameter
rg "chroma_store" src/backend/crew_ai/optimization.py -B 3 -A 3 | head -50

Length of output: 75


🏁 Script executed:

#!/bin/bash
# Look at lines 593-621 in context
sed -n '580,630p' src/backend/services/workflow_service.py

Length of output: 2799


🏁 Script executed:

#!/bin/bash
# Find learn_from_execution calls to see expected parameters
rg "learn_from_execution" src/backend/services/workflow_service.py -B 2 -A 2

Length of output: 606


🏁 Script executed:

#!/bin/bash
# Search for user_query vs natural_language_query in the file
grep -n "user_query\|natural_language_query" src/backend/services/workflow_service.py | head -30

Length of output: 1332


🏁 Script executed:

#!/bin/bash
# Get comprehensive view of stream_generate_and_run including pattern learning section
sed -n '500,640p' src/backend/services/workflow_service.py

Length of output: 6472


🏁 Script executed:

#!/bin/bash
# Check run_crew for comparison
sed -n '150,300p' src/backend/services/workflow_service.py

Length of output: 7456


🏁 Script executed:

#!/bin/bash
# List all Python files in crew_ai module
find src/backend/crew_ai -name "*.py" -type f

Length of output: 825


🏁 Script executed:

#!/bin/bash
# Get QueryPatternMatcher complete class
rg "class QueryPatternMatcher" -A 200 src/backend/crew_ai/optimization.py

Length of output: 75


🏁 Script executed:

#!/bin/bash
# Get SmartKeywordProvider.learn_from_execution
rg "def learn_from_execution" -A 30 src/backend/crew_ai/optimization.py

Length of output: 75


🏁 Script executed:

#!/bin/bash
# Verify chroma_store parameter requirement
grep -n "self.chroma_store\|self.vector_store" src/backend/crew_ai/optimization.py | head -20

Length of output: 68


🏁 Script executed:

#!/bin/bash
# Check how run_crew initializes QueryPatternMatcher
rg "def run_crew" -A 150 src/backend/services/workflow_service.py | grep -A 30 "QueryPatternMatcher"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get the exact parameter definition for QueryPatternMatcher.__init__
rg "def __init__" src/backend/crew_ai/optimization.py -A 20 | grep -B 5 -A 20 "QueryPatternMatcher" | head -50

Length of output: 75


🏁 Script executed:

#!/bin/bash
# Final check: verify the two issues with exact line context
rg -n "learn_from_execution\|natural_language_query\|QueryPatternMatcher\(" src/backend/services/workflow_service.py -B 2 -A 2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Examine the optimization module for all class definitions
head -500 src/backend/crew_ai/optimization.py | grep -n "class\|def __init__\|self.chroma_store\|self.pattern_matcher"

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Get the complete QueryPatternMatcher initialization
python3 << 'EOF'
import ast
import sys

# Parse the optimization.py file
with open('src/backend/crew_ai/optimization.py', 'r') as f:
    tree = ast.parse(f.read())

# Find QueryPatternMatcher class
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef) and node.name == 'QueryPatternMatcher':
        print(f"Found class QueryPatternMatcher at line {node.lineno}")
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == '__init__':
                print(f"  __init__ at line {item.lineno}")
                print(f"  Args: {[arg.arg for arg in item.args.args]}")
                print(f"  Kwonly args: {[arg.arg for arg in item.args.kwonlyargs]}")
EOF

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Get SmartKeywordProvider.learn_from_execution parameters
python3 << 'EOF'
import ast
import sys

# Parse the optimization.py file
with open('src/backend/crew_ai/optimization.py', 'r') as f:
    tree = ast.parse(f.read())

# Find SmartKeywordProvider class
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef) and node.name == 'SmartKeywordProvider':
        print(f"Found class SmartKeywordProvider at line {node.lineno}")
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == 'learn_from_execution':
                print(f"  learn_from_execution at line {item.lineno}")
                print(f"  Args: {[arg.arg for arg in item.args.args]}")
                print(f"  Kwonly args: {[arg.arg for arg in item.args.kwonlyargs]}")
EOF

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Final verification of the exact issues
echo "=== Issue 1: Check if QueryPatternMatcher requires chroma_store ==="
grep -n "self.chroma_store\|self._chroma\|_chroma" src/backend/crew_ai/optimization.py | head -10

echo ""
echo "=== Issue 2: Check learn_from_execution parameter name ==="
grep -n "def learn_from_execution" src/backend/services/workflow_service.py -A 2

echo ""
echo "=== Check if natural_language_query is used ==="
sed -n '590,630p' src/backend/services/workflow_service.py | grep -n "natural_language_query\|user_query"

Length of output: 340


🏁 Script executed:

#!/bin/bash
# Verify the actual test calls to learn_from_execution
grep -rn "learn_from_execution" src/backend/services/workflow_service.py | cat

Length of output: 183


🏁 Script executed:

#!/bin/bash
# Dump relevant portions to analyze
echo "=== Lines 489-520 (stream_execute_only) ==="
sed -n '489,520p' src/backend/services/workflow_service.py

echo ""
echo "=== Lines 593-621 (stream_generate_and_run) ==="
sed -n '593,621p' src/backend/services/workflow_service.py

Length of output: 3988


🏁 Script executed:

#!/bin/bash
# Check QueryPatternMatcher definition and usage of chroma_store
python3 << 'EOF'
import re

# Read the optimization.py file
with open('src/backend/crew_ai/optimization.py', 'r') as f:
    content = f.read()

# Find QueryPatternMatcher class and check for chroma_store references
match = re.search(r'class QueryPatternMatcher.*?(?=class |\Z)', content, re.DOTALL)
if match:
    cls_content = match.group(0)
    # Find __init__ method
    init_match = re.search(r'def __init__\(self[^)]*\):', cls_content)
    if init_match:
        print("QueryPatternMatcher.__init__ signature:")
        print(init_match.group(0))
    
    # Check if chroma_store is referenced
    if 'chroma_store' in cls_content:
        print("\nchroma_store references found in QueryPatternMatcher")
        for i, line in enumerate(cls_content.split('\n')[:100]):
            if 'chroma_store' in line:
                print(f"  Line: {line.strip()}")
    else:
        print("\nNo chroma_store references in QueryPatternMatcher")

# Find SmartKeywordProvider class
match = re.search(r'class SmartKeywordProvider.*?(?=class |\Z)', content, re.DOTALL)
if match:
    cls_content = match.group(0)
    # Find learn_from_execution method
    method_match = re.search(r'def learn_from_execution\(self[^)]*\):', cls_content)
    if method_match:
        print("\nSmartKeywordProvider.learn_from_execution signature:")
        print(method_match.group(0))
EOF

Length of output: 166


Two critical bugs confirmed: missing chroma_store parameter and undefined variable

Both issues identified in the review are real and will break functionality:

  1. Lines 505 and 608: QueryPatternMatcher instantiated without chroma_store parameter

    • QueryPatternMatcher.__init__ accepts optional chroma_store (src/backend/crew_ai/optimization/pattern_learning.py:28)
    • When None, pattern_collection is set to None and learning is disabled (lines 46–50)
    • Correct pattern in crew.py (line 126) passes chroma_store=vector_store
  2. Line 616: stream_generate_and_run calls learn_from_execution(natural_language_query, ...), but the enclosing function's parameter is named user_query

    • Function signature (line 527): async def stream_generate_and_run(user_query: str, ...)
    • natural_language_query is undefined in this scope; causes NameError at runtime
    • Line 513 in stream_execute_only correctly uses user_query
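
A short sketch of the silent-disable behavior from point 1, assuming the constructor semantics cited above (chroma_store=None leaves pattern_collection unset):

pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
# pattern_matcher.pattern_collection is None, so learn_from_execution only
# updates SQLite statistics and get_relevant_keywords(...) always returns []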
@@ src/backend/services/workflow_service.py:505
-                        pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+                        pattern_matcher = QueryPatternMatcher(
+                            db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+                            chroma_store=chroma_store
+                        )

@@ src/backend/services/workflow_service.py:608
-                    pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+                    pattern_matcher = QueryPatternMatcher(
+                        db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+                        chroma_store=chroma_store
+                    )

@@ src/backend/services/workflow_service.py:616
-                    smart_provider.learn_from_execution(natural_language_query, robot_code)
+                    smart_provider.learn_from_execution(user_query, robot_code)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.4)

515-515: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In src/backend/services/workflow_service.py around lines 489-520 (and related
usages at ~505 and ~616), fix two issues: instantiate QueryPatternMatcher with
the chroma_store passed (e.g., QueryPatternMatcher(...,
chroma_store=chroma_store)) so pattern_collection is initialized and learning
enabled, and replace the undefined natural_language_query argument with the
correct user_query when calling learn_from_execution (use user_query
consistently). Ensure both changes follow existing variable names (chroma_store
or chroma_store variable created from KeywordVectorStore) and update any other
calls to QueryPatternMatcher or learn_from_execution in this file to the same
signature and variable name for consistency.
