feat: Implement keyword search tool and pattern learning system #11
base: main

Conversation
- Added `KeywordSearchTool` for semantic search over Robot Framework keywords.
- Introduced `QueryPatternMatcher` for learning and predicting keyword usage patterns.
- Developed `SmartKeywordProvider` to orchestrate keyword retrieval with a hybrid architecture.
- Configured centralized logging for optimization components in `logging_config.py`.
- Enhanced `RobotTasks` to provide minimal keyword guidelines for the planning phase.
- Updated `requirements.txt` to include dependencies for ChromaDB and sentence-transformers.
- Modified the workflow service to learn from successful test executions and store patterns.
- Updated the frontend to store and pass the original user query for pattern learning during execution.
Walkthrough

Introduces a comprehensive optimization system for CrewAI that reduces token usage via a hybrid knowledge architecture. Implements core rules, semantic keyword search (ChromaDB), pattern learning from executed code, context pruning, and workflow metrics tracking across backend configuration, agent initialization, optimization components, and frontend integration.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Frontend as Frontend<br/>script.js
    participant API as API<br/>endpoints.py
    participant Crew as Crew<br/>crew.py
    participant Optimization as Optimization<br/>Components
    participant Agents as Agent Planner<br/>Identifier, etc.
    participant Workflow as Workflow<br/>Service

    User->>Frontend: Submit query
    Frontend->>Frontend: Store currentUserQuery
    Frontend->>API: POST /generate + /execute-test

    Note over Crew,Optimization: Optimization Phase
    API->>Crew: run_crew (with settings)
    Crew->>Optimization: Initialize if OPTIMIZATION_ENABLED
    Optimization->>Optimization: Build core_rules for all agents
    Optimization->>Optimization: Load pattern predictions (Tier 2)
    Optimization->>Optimization: Fallback to zero-context+tool (Tier 2)
    Optimization->>Optimization: Fallback to full context (Tier 3)
    Crew->>Agents: Pass optimized_context + keyword_search_tool

    Note over Agents: Agent Execution
    Agents->>Agents: Generate code
    Workflow->>Workflow: Execute test

    Note over Workflow: Pattern Learning Phase
    Workflow->>Workflow: Check if test passed + user_query exists
    Workflow->>Optimization: Initialize optimization components
    Optimization->>Optimization: Extract keywords from code
    Optimization->>Optimization: Learn pattern: query→keywords
    Optimization->>Optimization: Update stats in SQLite
    Frontend->>User: Display result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend/crew_ai/agents.py (2)
80-112: Planner context is never used

SmartKeywordProvider now supplies a trimmed planner_context, but step_planner_agent still builds guidance solely from library_context. As a result the optimization work never reaches the agent and we keep sending the heavy fallback context. Please honor the optimized planner_context first and only fall back to the legacy guidance when it is missing.
```diff
- library_name = self.library_context.library_name if self.library_context else 'Robot Framework'
-
- # Step Planner needs MINIMAL context - just library name and core principles
- # It doesn't need keyword details - that's for the Code Assembler
- library_guidance = ""
- if self.library_context:
-     library_guidance = f"""
+ library_name = self.library_context.library_name if self.library_context else 'Robot Framework'
+
+ # Prefer optimized planner context when provided, otherwise fall back to minimal library guidance
+ planner_guidance = ""
+ if self.planner_context:
+     planner_guidance = f"\n\n{self.planner_context.strip()}"
+ elif self.library_context:
+     planner_guidance = f"""
@@
-     "8. Create HIGH-LEVEL steps - the Code Assembler will handle keyword details."
-     f"{library_guidance}"
+     "8. Create HIGH-LEVEL steps - the Code Assembler will handle keyword details."
+     f"{planner_guidance}"
```
118-134: Identifier context is ignored

The optimized identifier_context passed into RobotAgents is never used, so the element identifier agent still operates on the legacy backstory. That defeats the optimization pipeline's attempt to shrink context for this role. Please insert the identifier_context when provided before falling back to the static text.
```diff
- def element_identifier_agent(self) -> Agent:
-     return Agent(
+ def element_identifier_agent(self) -> Agent:
+     identifier_guidance = ""
+     if self.identifier_context:
+         identifier_guidance = f"\n\n{self.identifier_context.strip()}"
+
+     return Agent(
@@
-     "Benefits: Browser opens once (3-5x faster), full context awareness, intelligent popup handling, validated locators."
- ),
+     "Benefits: Browser opens once (3-5x faster), full context awareness, intelligent popup handling, validated locators."
+     f"{identifier_guidance}"
+ ),
```
🧹 Nitpick comments (12)
.gitignore (1)
40-40: Use trailing slash for directory pattern consistency.

Line 40 ignores `logs/temp_metrics` without a trailing slash. For consistency with standard gitignore directory patterns (e.g., `chroma_db/` on line 38), use a trailing slash to explicitly denote this as a directory:

```diff
-logs/temp_metrics
+logs/temp_metrics/
```

This prevents accidental matching of files named `temp_metrics` at different levels.

src/backend/requirements.txt (1)
26-34: Validate new ML dependency versions against your Python/runtime targets

The new `chromadb==0.4.22`, `sentence-transformers==2.2.2`, and `numpy==1.24.3` pins look reasonable, but they are heavy and somewhat opinionated:

- `sentence-transformers` will pull in substantial ML dependencies (e.g., torch), which will noticeably increase image size and cold-start time.
- `numpy==1.24.3` may not be compatible with newer Python runtimes (e.g., Python 3.12 prefers a newer NumPy).

I'd suggest double-checking that:
- These versions are supported on the Python version you ship in Docker/production.
- The footprint/performance impact is acceptable (or consider putting these behind an extra or separate image if not always needed; a sketch of the lazy-import variant follows below).
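For illustration, one minimal sketch of the "keep it optional" idea: guard the heavy imports behind the feature flag so a deployment with `OPTIMIZATION_ENABLED=false` never pays for them. The helper name `load_optimization_deps` is hypothetical, not something in this PR:

```python
def load_optimization_deps():
    # Hypothetical lazy-import guard: only pull in the heavy ML stack
    # (torch via sentence-transformers, chromadb) when optimization is on.
    try:
        import chromadb
        from sentence_transformers import SentenceTransformer
    except ImportError as exc:
        raise RuntimeError(
            "Optimization extras are not installed; either install "
            "chromadb/sentence-transformers or set OPTIMIZATION_ENABLED=false."
        ) from exc
    return chromadb, SentenceTransformer
```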
src/backend/.env.example (1)
69-107: Consider aligning the example OPTIMIZATION_ENABLED value with the documented default

The comment says the default is `false` (disabled until fully tested), but `.env.example` ships with:

```
OPTIMIZATION_ENABLED=true
```

Given many users will `cp .env.example .env`, this effectively enables the new optimization system by default, which may surprise them if they haven't read the docs yet. I'd consider either:

- Changing the example to `OPTIMIZATION_ENABLED=false` and letting the docs show how to turn it on, or
- Adjusting the comment to clarify that the example enables optimization even though the code default is `false`.

src/backend/core/config.py (1)
39-47: Optimization config looks solid; consider adding TOP_K validation and tightening error handling

The new optimization settings and confidence-threshold validator are well-structured and match the env/example usage. Two minor improvement ideas:

**Enforce the documented range for `OPTIMIZATION_KEYWORD_SEARCH_TOP_K`**

Docs and `.env.example` describe a valid range of 1–10, but the config doesn't enforce it. Adding a small validator would prevent misconfiguration, e.g.:

```python
@validator("OPTIMIZATION_KEYWORD_SEARCH_TOP_K")
def validate_keyword_search_top_k(cls, v: int) -> int:
    if not 1 <= v <= 10:
        raise ValueError(f"OPTIMIZATION_KEYWORD_SEARCH_TOP_K must be between 1 and 10, got {v}")
    return v
```

**Error message length (Ruff TRY003)**

Ruff flags the relatively long error message in `validate_confidence_threshold`. This is purely stylistic; if you want to appease it, you could shorten the message or factor it into a constant, but functionally it's fine as-is.

Also note: `OPTIMIZATION_CONTEXT_PRUNING_THRESHOLD` here defaults to `0.6`, while `docs/OPTIMIZATION.md` currently states a default of `0.8`; worth reconciling so operators don't get conflicting information.

Also applies to: 77-82
src/backend/crew_ai/optimization/logging_config.py (1)
23-47: Simplify logger naming to guarantee hierarchy and narrow the exception catch around the file handler

Nice centralized logging surface; a couple of small robustness points:

**`get_optimization_logger` hierarchy can be brittle**

Current behavior depends on whether `name` starts with or contains `"optimization"`. If callers follow the doc and pass `__name__`, modules with names like `src.backend.crew_ai.optimization.keyword_search_tool` will get loggers outside the `crew_ai.optimization` tree, so they won't automatically inherit the handlers configured on `OPTIMIZATION_LOGGER_NAME`. A simpler, more predictable pattern is:

```python
def get_optimization_logger(name: str) -> logging.Logger:
    if name.startswith(OPTIMIZATION_LOGGER_NAME):
        logger_name = name
    else:
        logger_name = f"{OPTIMIZATION_LOGGER_NAME}.{name}"
    return logging.getLogger(logger_name)
```

and then pass a short component name (e.g. `"keyword_search"`) or `__name__` if you really want the fully qualified name under that prefix.

**Catching bare `Exception` when configuring file logging (Ruff BLE001)**

Functionally it's acceptable to treat any failure as "log a warning and continue", but to keep linters quiet and make intent clearer, you may want to narrow this to `OSError`/`IOError`/`PermissionError`, which covers the usual file-handler failures (a sketch follows below).

Also applies to: 90-99
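For illustration, a hypothetical shape for that narrower catch; the handler wiring below is a sketch, not the module's actual code:

```python
import logging

def attach_file_handler(logger: logging.Logger, log_path: str) -> None:
    # Illustrative only: treat file-system failures as non-fatal,
    # but let genuine programmer errors propagate.
    try:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    except OSError as exc:  # PermissionError and most I/O failures are OSError subclasses
        logger.warning("File logging disabled, continuing with console only: %s", exc)
```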
src/backend/core/workflow_metrics.py (1)
52-91: Optimization metrics wiring looks good; minor serialization and docstring nits

Overall, the new optimization metrics surface is well thought out: defaults via `__post_init__`, dedicated `track_*` helpers, and backward-compatible parsing in `from_dict` all look solid. A couple of small things you might want to tweak:

**Duplicate exposure of optimization fields in `to_dict`**

`to_dict()` uses `asdict(self)` (which already includes `token_usage`, `keyword_search_stats`, etc. at the top level) and then adds an `optimization` sub-dict that nests the same values. The JSON shape shown in `docs/OPTIMIZATION.md` only uses the nested `optimization` section. If you want the external JSON to match the docs and avoid redundancy, you could `pop` the top-level keys before adding `optimization`, for example:

```python
def to_dict(self) -> Dict[str, Any]:
    data = asdict(self)
    data["timestamp"] = self.timestamp.isoformat()

    token_usage = data.pop("token_usage", None)
    keyword_search = data.pop("keyword_search_stats", None)
    pattern_learning = data.pop("pattern_learning_stats", None)
    context_reduction = data.pop("context_reduction", None)

    data["optimization"] = {
        "token_usage": token_usage,
        "keyword_search": keyword_search,
        "pattern_learning": pattern_learning,
        "context_reduction": context_reduction,
    }
    return data
```

`from_dict` is already set up to consume an `optimization` section, so this would align the serialized shape with what you document.

**`count_tokens` docstring example doesn't match implementation**

Given `words = text.split()` and `estimated_tokens = int(len(words) * 1.33)`, the example:

```python
>>> count_tokens("Hello world, this is a test")
8
```

actually returns `7` with the current heuristic (6 words * 1.33 → 7 after `int`). Either adjust the example value or tweak the multiplier if you want the example to be exact.

Also applies to: 92-147, 150-161, 221-235, 386-414
src/frontend/script.js (1)
26-28: User query tracking across generate/execute looks consistent

Storing `currentUserQuery` on generation, resetting it on `clearAll`, and forwarding it in the `/execute-test` payload is coherent and matches the backend contract for the optional `user_query`. This keeps the original query bound to the generated code and avoids leakage across sessions.

One behavioral nuance to be aware of: if a user generates once, then manually replaces the code without pressing "New Test", executions will still send the original `currentUserQuery`. If you'd rather only learn from queries that directly produced the executed code, consider clearing `currentUserQuery` when users paste or heavily edit code after generation.

Also applies to: 387-389, 536-538, 648-650
src/backend/crew_ai/library_context/dynamic_context.py (1)
179-213: Minimal planning context implementation is sound; consider tightening exception scope
`get_minimal_planning_context` correctly reuses `get_library_documentation`, includes the version in the banner when available, and degrades gracefully when documentation loading fails.

The broad `except Exception` is acceptable here for resiliency, but if you want to satisfy BLE001 and avoid hiding programmer errors, you could narrow it to the expected failure modes (e.g., `ImportError`, `OSError`, `json.JSONDecodeError`) while letting unexpected exceptions surface.

docs/OPTIMIZATION_DEVELOPER_GUIDE.md (1)
1-2233: Address markdownlint issues for better tooling compatibility

The guide is thorough and well-structured. markdownlint is flagging a few mechanical issues:

- Some fenced code blocks lack a language spec (e.g., around lines 26, 54, 861, 1933, 1946, etc.). Consider adding `bash`, `python`, `json`, etc. to those fences.
- Several lines use emphasis (`**...**`) in places where a heading level (e.g., `### ...`) would be more appropriate (MD036).

These don't affect rendering much, but fixing them will reduce noise from docs linters and improve IDE support.
src/backend/services/workflow_service.py (3)
81-83: Unused `validation_output` and `optimization_metrics` from `run_crew`

`run_crew` now returns three values, but `validation_output` and `optimization_metrics` are never used in `run_agentic_workflow`. This is harmless but flagged by Ruff and slightly misleading. If you don't plan to use them here, consider marking them as intentionally unused:

```diff
- validation_output, crew_with_results, optimization_metrics = run_crew(
+ _result, crew_with_results, _optimization_metrics = run_crew(
      natural_language_query, model_provider, model_name, library_type=None, workflow_id=workflow_id)
```

If you do intend to surface optimization metrics later, wiring them into the unified `WorkflowMetrics` would be a good follow-up.
449-457: Type hint for `user_query` is slightly non-idiomatic

`stream_execute_only` uses `user_query: str = None`, which is valid at runtime but violates PEP 484 style and triggers RUF013. Consider switching to an explicit optional type for clarity:

```diff
-async def stream_execute_only(robot_code: str, user_query: str = None) -> Generator[str, None, None]:
+async def stream_execute_only(robot_code: str, user_query: str | None = None) -> Generator[str, None, None]:
```

(Similarly, you can use `Optional[str]` if you prefer the older syntax.)
489-520: Broad `except Exception` around learning is acceptable but could be narrowed

Both learning blocks wrap all errors in a generic `except Exception` and log a warning. This is reasonable for a non-critical sidecar that must not break execution, but it also swallows programming errors (e.g., misconfigurations) the same way as transient environment failures. If you want stricter behavior, consider:

- Narrowing to expected runtime failures (e.g., `ImportError`, `OSError`, `chromadb.errors.*`), or
- Re-raising on clearly programmer-error types while continuing to log and swallow transient ones.
Not urgent, but worth considering once the main wiring is stable.
Also applies to: 593-621
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
- .gitignore (1 hunks)
- docs/OPTIMIZATION.md (1 hunks)
- docs/OPTIMIZATION_DEVELOPER_GUIDE.md (1 hunks)
- src/backend/.env.example (1 hunks)
- src/backend/api/endpoints.py (2 hunks)
- src/backend/core/config.py (2 hunks)
- src/backend/core/workflow_metrics.py (3 hunks)
- src/backend/crew_ai/agents.py (6 hunks)
- src/backend/crew_ai/crew.py (5 hunks)
- src/backend/crew_ai/library_context/base.py (1 hunks)
- src/backend/crew_ai/library_context/browser_context.py (3 hunks)
- src/backend/crew_ai/library_context/dynamic_context.py (1 hunks)
- src/backend/crew_ai/library_context/selenium_context.py (4 hunks)
- src/backend/crew_ai/optimization/__init__.py (1 hunks)
- src/backend/crew_ai/optimization/chroma_store.py (1 hunks)
- src/backend/crew_ai/optimization/context_pruner.py (1 hunks)
- src/backend/crew_ai/optimization/keyword_search_tool.py (1 hunks)
- src/backend/crew_ai/optimization/logging_config.py (1 hunks)
- src/backend/crew_ai/optimization/pattern_learning.py (1 hunks)
- src/backend/crew_ai/optimization/smart_keyword_provider.py (1 hunks)
- src/backend/crew_ai/tasks.py (1 hunks)
- src/backend/requirements.txt (1 hunks)
- src/backend/services/workflow_service.py (4 hunks)
- src/frontend/script.js (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (15)
src/backend/crew_ai/optimization/logging_config.py (1)
- src/backend/crew_ai/llm_output_cleaner.py (2): LLMFormattingMonitor (366-413), LLMOutputCleaner (31-363)

src/backend/services/workflow_service.py (5)
- src/backend/crew_ai/crew.py (1): run_crew (53-277)
- src/backend/crew_ai/optimization/smart_keyword_provider.py (2): SmartKeywordProvider (20-335), learn_from_execution (323-335)
- src/backend/crew_ai/optimization/pattern_learning.py (2): QueryPatternMatcher (22-308), learn_from_execution (145-195)
- src/backend/crew_ai/optimization/chroma_store.py (1): KeywordVectorStore (18-378)
- src/backend/crew_ai/library_context/__init__.py (1): get_library_context (21-43)

src/backend/crew_ai/library_context/base.py (2)
- src/backend/crew_ai/library_context/browser_context.py (1): core_rules (52-96)
- src/backend/crew_ai/library_context/selenium_context.py (1): core_rules (189-234)

src/backend/crew_ai/optimization/pattern_learning.py (2)
- src/backend/crew_ai/optimization/chroma_store.py (1): get_or_create_pattern_collection (84-104)
- src/backend/crew_ai/optimization/smart_keyword_provider.py (1): learn_from_execution (323-335)

src/backend/crew_ai/library_context/browser_context.py (3)
- src/backend/crew_ai/library_context/base.py (4): core_rules (123-140), planning_context (33-42), code_assembly_context (46-56), validation_context (60-69)
- src/backend/crew_ai/library_context/selenium_context.py (4): core_rules (189-234), planning_context (35-43), code_assembly_context (46-154), validation_context (237-246)
- src/backend/crew_ai/library_context/dynamic_context.py (1): get_minimal_planning_context (179-212)

src/backend/crew_ai/optimization/chroma_store.py (4)
- src/backend/core/config.py (1): Settings (11-87)
- src/backend/crew_ai/library_context/browser_context.py (1): library_name (27-28)
- src/backend/crew_ai/library_context/selenium_context.py (1): library_name (27-28)
- src/backend/crew_ai/library_context/dynamic_context.py (2): DynamicLibraryDocumentation (23-233), get_library_documentation (41-89)

src/backend/crew_ai/optimization/__init__.py (6)
- src/backend/crew_ai/optimization/chroma_store.py (1): KeywordVectorStore (18-378)
- src/backend/crew_ai/optimization/keyword_search_tool.py (1): KeywordSearchTool (18-169)
- src/backend/crew_ai/optimization/pattern_learning.py (1): QueryPatternMatcher (22-308)
- src/backend/crew_ai/optimization/smart_keyword_provider.py (1): SmartKeywordProvider (20-335)
- src/backend/crew_ai/optimization/context_pruner.py (1): ContextPruner (17-204)
- src/backend/crew_ai/optimization/logging_config.py (6): get_optimization_logger (23-47), configure_optimization_logging (50-104), LogMessages (108-146), log_fallback (150-166), log_critical_failure (169-184), log_performance_metric (187-207)

src/backend/api/endpoints.py (1)
- src/backend/services/workflow_service.py (1): stream_execute_only (449-524)

src/backend/crew_ai/agents.py (3)
- src/backend/crew_ai/library_context/base.py (3): library_name (21-23), code_assembly_context (46-56), validation_context (60-69)
- src/backend/crew_ai/library_context/browser_context.py (3): library_name (27-28), code_assembly_context (110-205), validation_context (208-216)
- src/backend/crew_ai/library_context/selenium_context.py (3): library_name (27-28), code_assembly_context (46-154), validation_context (237-246)

src/backend/crew_ai/tasks.py (3)
- src/backend/crew_ai/library_context/base.py (1): planning_context (33-42)
- src/backend/crew_ai/library_context/browser_context.py (1): planning_context (99-107)
- src/backend/crew_ai/library_context/selenium_context.py (1): planning_context (35-43)

src/backend/crew_ai/library_context/selenium_context.py (3)
- src/backend/crew_ai/library_context/dynamic_context.py (1): get_minimal_planning_context (179-212)
- src/backend/crew_ai/library_context/base.py (3): code_assembly_context (46-56), core_rules (123-140), validation_context (60-69)
- src/backend/crew_ai/library_context/browser_context.py (3): code_assembly_context (110-205), core_rules (52-96), validation_context (208-216)

src/backend/crew_ai/optimization/keyword_search_tool.py (2)
- src/backend/crew_ai/optimization/chroma_store.py (1): search (197-243)
- src/backend/core/workflow_metrics.py (1): track_keyword_search (106-121)

src/backend/crew_ai/library_context/dynamic_context.py (3)
- src/backend/crew_ai/library_context/base.py (1): library_name (21-23)
- src/backend/crew_ai/library_context/browser_context.py (1): library_name (27-28)
- src/backend/crew_ai/library_context/selenium_context.py (1): library_name (27-28)

src/backend/crew_ai/optimization/smart_keyword_provider.py (8)
- src/backend/crew_ai/optimization/pattern_learning.py (2): get_relevant_keywords (197-257), learn_from_execution (145-195)
- src/backend/crew_ai/optimization/chroma_store.py (2): KeywordVectorStore (18-378), search (197-243)
- src/backend/crew_ai/optimization/keyword_search_tool.py (1): KeywordSearchTool (18-169)
- src/backend/crew_ai/optimization/context_pruner.py (4): ContextPruner (17-204), classify_query (94-142), prune_keywords (144-175), get_pruning_stats (177-204)
- src/backend/crew_ai/library_context/base.py (5): library_name (21-23), core_rules (123-140), planning_context (33-42), code_assembly_context (46-56), validation_context (60-69)
- src/backend/crew_ai/library_context/browser_context.py (5): library_name (27-28), core_rules (52-96), planning_context (99-107), code_assembly_context (110-205), validation_context (208-216)
- src/backend/crew_ai/library_context/selenium_context.py (5): library_name (27-28), core_rules (189-234), planning_context (35-43), code_assembly_context (46-154), validation_context (237-246)
- src/backend/core/workflow_metrics.py (1): track_pattern_learning (123-134)

src/backend/crew_ai/crew.py (5)
- src/backend/core/workflow_metrics.py (3): WorkflowMetrics (18-236), count_tokens (386-414), track_context_reduction (136-148)
- src/backend/crew_ai/optimization/chroma_store.py (2): KeywordVectorStore (18-378), ensure_collection_ready (361-378)
- src/backend/crew_ai/optimization/pattern_learning.py (1): QueryPatternMatcher (22-308)
- src/backend/crew_ai/optimization/smart_keyword_provider.py (2): get_agent_context (210-280), get_keyword_search_tool (310-321)
- src/backend/crew_ai/agents.py (1): RobotAgents (52-237)
🪛 dotenv-linter (4.0.0)
src/backend/.env.example
[warning] 68-68: [ExtraBlankLine] Extra blank line detected
(ExtraBlankLine)
🪛 markdownlint-cli2 (0.18.1)
docs/OPTIMIZATION_DEVELOPER_GUIDE.md
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
54-54: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
534-534: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
557-557: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
572-572: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
588-588: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
617-617: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
633-633: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
660-660: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
679-679: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
721-721: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
748-748: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
761-761: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
782-782: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
839-839: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
861-861: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
870-870: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
950-950: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1064-1064: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1933-1933: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1946-1946: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1974-1974: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
2001-2001: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
2027-2027: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
2056-2056: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/OPTIMIZATION.md
289-289: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
371-371: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
src/backend/crew_ai/optimization/context_pruner.py
27-52: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
src/backend/crew_ai/optimization/logging_config.py
98-98: Do not catch blind exception: Exception
(BLE001)
src/backend/services/workflow_service.py
81-81: Unpacked variable validation_output is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
81-81: Unpacked variable optimization_metrics is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
449-449: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
515-515: Do not catch blind exception: Exception
(BLE001)
616-616: Undefined name natural_language_query
(F821)
618-618: Do not catch blind exception: Exception
(BLE001)
src/backend/crew_ai/optimization/pattern_learning.py
253-253: Consider moving this statement to an else block
(TRY300)
285-285: Consider moving this statement to an else block
(TRY300)
304-304: Consider moving this statement to an else block
(TRY300)
src/backend/crew_ai/optimization/chroma_store.py
56-56: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
78-78: Consider moving this statement to an else block
(TRY300)
81-81: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
100-100: Consider moving this statement to an else block
(TRY300)
103-103: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
161-161: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
193-193: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
239-239: Consider moving this statement to an else block
(TRY300)
241-241: Do not catch blind exception: Exception
(BLE001)
242-242: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
264-264: Consider moving this statement to an else block
(TRY300)
266-266: Do not catch blind exception: Exception
(BLE001)
285-285: Do not catch blind exception: Exception
(BLE001)
314-314: Consider moving this statement to an else block
(TRY300)
316-316: Do not catch blind exception: Exception
(BLE001)
334-334: Do not catch blind exception: Exception
(BLE001)
341-341: Local variable collection is assigned to but never used
Remove assignment to unused variable collection
(F841)
358-358: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
377-377: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
src/backend/crew_ai/optimization/__init__.py
26-38: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
src/backend/core/config.py
81-81: Avoid specifying long messages outside the exception class
(TRY003)
src/backend/api/endpoints.py
57-57: f-string without any placeholders
Remove extraneous f prefix
(F541)
src/backend/crew_ai/optimization/keyword_search_tool.py
128-128: Consider moving this statement to an else block
(TRY300)
130-130: Do not catch blind exception: Exception
(BLE001)
131-131: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
src/backend/crew_ai/library_context/dynamic_context.py
196-196: Do not catch blind exception: Exception
(BLE001)
src/backend/crew_ai/optimization/smart_keyword_provider.py
69-69: Unused method argument: agent_role
(ARG002)
114-114: Unused method argument: agent_role
(ARG002)
165-165: Do not catch blind exception: Exception
(BLE001)
249-249: Do not catch blind exception: Exception
(BLE001)
261-261: Do not catch blind exception: Exception
(BLE001)
275-275: Do not catch blind exception: Exception
(BLE001)
276-276: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
334-334: Do not catch blind exception: Exception
(BLE001)
335-335: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
src/backend/crew_ai/crew.py
99-99: Local variable optimized_context is assigned to but never used
Remove assignment to unused variable optimized_context
(F841)
136-136: Do not catch blind exception: Exception
(BLE001)
187-187: Do not catch blind exception: Exception
(BLE001)
188-188: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
198-198: String contains ambiguous ℹ (INFORMATION SOURCE). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
260-260: Consider moving this statement to an else block
(TRY300)
🔇 Additional comments (5)
.gitignore (3)
26-32: Gitignore updates align well with PR objectives.

The additions to ignore AI-generated instruction files (including `.github/copilot-instructions.md`) and the expanded comment clarity are appropriate and support the development environment.

37-39: New ignore rules are well-aligned with optimization system components.

The new directory ignores (`chroma_db/`, `data/`) appropriately exclude persistent storage and generated data from the PR's keyword optimization system (ChromaDB semantic search, pattern learning, and metrics), preventing unnecessary repository bloat.

43-43: Verified: The .gitignore path change is correct and safe.

The change renames a gitignored local development directory from `tools/browser_service_local/*` to `tools/browser_service/*`. This is a safe refinement because:

- Both `tools/browser_service_local/` and `tools/browser_service/` are local development directories that are gitignored by design, so they are never checked in; they are generated or populated locally during development.
- The external `browser-service>=1.0.0` pip package in requirements.txt is the actual production dependency; the gitignored directories are only for local artifacts.
- Important checked-in source code (including imports from the `browser_service` module in `tools/browser_use_service.py` and documented in `ARCHITECTURE.md`) remains unaffected by this gitignore change.

No issues found.
src/backend/crew_ai/tasks.py (1)
31-46: Minimal keyword guidelines align with the new planning context strategy

Switching `_get_keyword_guidelines` to use `library_context.planning_context` and falling back to high-level action types is consistent with the new minimal planning model and keeps token usage low while preserving intent.

src/backend/crew_ai/library_context/base.py (1)

121-141: New `core_rules` contract is well-defined and consistent

Adding the abstract `core_rules` property with a focused docstring gives a clear, centralized contract for "must-include" library rules that optimization components can rely on. As long as all `LibraryContext` implementations (e.g., `BrowserLibraryContext`, `SeleniumLibraryContext`, any custom contexts) implement this property, the change is safe.

If you have any out-of-tree `LibraryContext` subclasses, double-check they've been updated to avoid instantiation errors; a minimal sketch of what such a subclass must now provide follows.
```diff
         # Get library-specific context if available (OPTIMIZED - minimal rules)
         library_knowledge = ""
         if self.library_context:
             library_knowledge = f"\n\n{self.library_context.validation_context}"

         return Agent(
             role="Robot Framework Linter and Quality Assurance Engineer",
-            goal=f"Validate the generated Robot Framework code for correctness and adherence to {self.library_context.library_name if self.library_context else 'Robot Framework'} rules, and delegate fixes to Code Assembly Agent if errors are found.",
+            goal=f"Validate Robot Framework code for {self.library_context.library_name if self.library_context else 'Robot Framework'} correctness. Delegate fixes if errors found.",
             backstory=(
-                "You are an expert Robot Framework linter. Your sole task is to validate the provided "
-                "Robot Framework code for syntax errors, correct keyword usage, and adherence to critical rules. "
-                "You must be thorough and provide a clear validation result.\n\n"
-                "**DELEGATION WORKFLOW:**\n"
-                "When you find errors in the code, you MUST follow this workflow:\n"
-                "1. Identify and document all syntax errors, incorrect keyword usage, and rule violations\n"
-                "2. Create a detailed fix request with:\n"
-                "   - Specific line numbers where errors occur\n"
-                "   - Clear description of each error\n"
-                "   - Examples of correct syntax for each issue\n"
-                "   - Relevant Robot Framework rules being violated\n"
-                "3. Delegate the fix request to the Code Assembly Agent with clear, actionable instructions\n"
-                "4. The Code Assembly Agent will regenerate the code incorporating your feedback\n"
-                "5. You will then validate the regenerated code and repeat if necessary\n\n"
-                "**CRITICAL DELEGATION INSTRUCTIONS:**\n"
-                "When you find errors, create a detailed fix request and delegate to Code Assembly Agent.\n"
-                "Your delegation message should include:\n"
-                "- A summary of all errors found\n"
-                "- Specific corrections needed for each error\n"
-                "- Code examples showing the correct implementation\n"
-                "- Priority ranking if multiple errors exist (fix critical syntax errors first)\n\n"
-                "**VALIDATION CRITERIA:**\n"
-                "- Syntax correctness (indentation, spacing, structure)\n"
-                "- Correct keyword usage for the target library\n"
-                "- Proper variable assignments for keywords that return values\n"
-                "- Valid locator formats\n"
-                "- Correct test case structure\n\n"
-                "If the code is valid, clearly state 'VALID' and provide a brief summary. "
-                "If errors are found, immediately delegate to Code Assembly Agent with detailed fix instructions."
+                "Expert Robot Framework validator. Check: syntax, keyword usage, variable assignments, locator formats, test structure. "
+                "If VALID: Return JSON {\"valid\": true, \"reason\": \"...\"}. "
+                "If INVALID: Document errors with line numbers, then delegate to Code Assembly Agent with fix instructions."
                 f"{library_knowledge}"
             ),
             llm=self.llm,
```
Validator context should override defaults
`validator_context` is calculated upstream but `code_validator_agent` still ignores it and always expands `library_context.validation_context`. That keeps the large baseline payload and prevents any token savings for the validator. Please prefer `validator_context` when present and fall back to the legacy context otherwise.
```diff
-        library_knowledge = ""
-        if self.library_context:
-            library_knowledge = f"\n\n{self.library_context.validation_context}"
+        library_knowledge = ""
+        if self.validator_context:
+            library_knowledge = f"\n\n{self.validator_context.strip()}"
+        elif self.library_context:
+            library_knowledge = f"\n\n{self.library_context.validation_context}"
```

🤖 Prompt for AI Agents
In src/backend/crew_ai/agents.py around lines 219 to 233, the agent builds
library_knowledge from library_context.validation_context and never uses
validator_context; change the construction to prefer validator_context when
present and fall back to library_context.validation_context (or empty string) so
the validator can use the smaller, precomputed validator_context. Update the
library_knowledge assignment to check self.validator_context first, then
self.library_context.validation_context, and ensure the backstory string
concatenates the chosen context.
```python
            embedding = self.model.encode([description])[0]
            self.category_embeddings[category] = embedding
            logger.debug(f"Category '{category}' embedding shape: {embedding.shape}")

        logger.info(f"Pre-computed embeddings for {len(self.category_embeddings)} categories")

    def classify_query(self, user_query: str, confidence_threshold: float = 0.8) -> List[str]:
        """
        Classify query into action categories using semantic similarity.

        Computes similarity between the query and each category description.
        Returns categories that meet the confidence threshold, or all categories
        if no category meets the threshold (graceful degradation).

        Args:
            user_query: User's natural language query
            confidence_threshold: Minimum similarity for category inclusion (0.0-1.0)

        Returns:
            List of relevant category names (e.g., ["input", "interaction"])
            Returns all categories if confidence too low (fallback)
        """
        logger.debug(f"Classifying query: {user_query[:50]}...")

        # Encode query
        query_embedding = self.model.encode([user_query])[0]

        # Compute similarity with each category
        similarities = {}
        for category, category_embedding in self.category_embeddings.items():
            # Cosine similarity using dot product (embeddings are normalized)
            similarity = np.dot(query_embedding, category_embedding)
```
🛠️ Refactor suggestion | 🟠 Major
Normalize embeddings before applying cosine threshold
`SentenceTransformer.encode` does not normalize by default, so the dot product here is an unbounded inner product. With the fixed threshold of 0.8 we'll routinely fail to meet the cutoff (or exceed 1.0), causing spurious fallbacks to "all categories" and defeating pruning. Please normalize both the category and query embeddings (pass `normalize_embeddings=True`) before taking the dot product.
```diff
- embedding = self.model.encode([description])[0]
+ embedding = self.model.encode(
+     [description],
+     normalize_embeddings=True,
+ )[0]
...
- query_embedding = self.model.encode([user_query])[0]
+ query_embedding = self.model.encode(
+     [user_query],
+     normalize_embeddings=True,
+ )[0]
```
+ )[0]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
            embedding = self.model.encode(
                [description],
                normalize_embeddings=True,
            )[0]
            self.category_embeddings[category] = embedding
            logger.debug(f"Category '{category}' embedding shape: {embedding.shape}")

        logger.info(f"Pre-computed embeddings for {len(self.category_embeddings)} categories")

    def classify_query(self, user_query: str, confidence_threshold: float = 0.8) -> List[str]:
        """
        Classify query into action categories using semantic similarity.

        Computes similarity between the query and each category description.
        Returns categories that meet the confidence threshold, or all categories
        if no category meets the threshold (graceful degradation).

        Args:
            user_query: User's natural language query
            confidence_threshold: Minimum similarity for category inclusion (0.0-1.0)

        Returns:
            List of relevant category names (e.g., ["input", "interaction"])
            Returns all categories if confidence too low (fallback)
        """
        logger.debug(f"Classifying query: {user_query[:50]}...")

        # Encode query
        query_embedding = self.model.encode(
            [user_query],
            normalize_embeddings=True,
        )[0]

        # Compute similarity with each category
        similarities = {}
        for category, category_embedding in self.category_embeddings.items():
            # Cosine similarity using dot product (embeddings are normalized)
            similarity = np.dot(query_embedding, category_embedding)
```
🤖 Prompt for AI Agents
In src/backend/crew_ai/optimization/context_pruner.py around lines 88 to 119,
the code computes cosine similarity by dot product but the
SentenceTransformer.encode calls do not normalize embeddings, so the dot product
is an unbounded inner product and the fixed 0.8 threshold is invalid; fix this
by ensuring both category and query embeddings are normalized before similarity:
either pass normalize_embeddings=True to the model.encode calls when computing
and storing category embeddings and when encoding the query, or manually
L2-normalize each embedding (divide by its norm) immediately after encoding;
then continue using np.dot for similarity.
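As a quick numeric sanity check of that point, a standalone sketch (the vectors are invented, not real model outputs):

```python
import numpy as np

# Two made-up, unnormalized embedding vectors standing in for model outputs.
a = np.array([3.0, 4.0])   # norm 5.0
b = np.array([6.0, 8.0])   # norm 10.0, same direction as a

print(np.dot(a, b))        # 50.0: unbounded inner product, meaningless against a 0.8 cutoff

# L2-normalize first, then the dot product is a true cosine similarity in [-1, 1].
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
print(np.dot(a_hat, b_hat))  # 1.0: identical direction
```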
This shall use normalised Euclidean distance, and by default this shall be done using ChromaDB's implemented methods
@Devasy23, thank you for the clarification! I want to make sure I understand your architectural intent correctly.
Currently, the ContextPruner class performs local classification by:

- Pre-computing category embeddings using SentenceTransformer
- Computing similarity using `np.dot` (which CodeRabbit correctly flagged as potentially unnormalized)

You're suggesting to use ChromaDB's distance calculation methods instead. I see a few possible approaches:

Option 1: Store category descriptions in ChromaDB
Create a separate ChromaDB collection for category descriptions and use its built-in normalized distance calculations:

- Store the 6 category descriptions with their metadata
- Query against this collection during `classify_query`
- Leverage ChromaDB's normalized cosine distance (or L2 if you prefer Euclidean)

Option 2: Use ChromaDB's embedding function directly
Use the same `SentenceTransformerEmbeddingFunction` from ChromaDB that's used in `chroma_store.py`, which handles normalization consistently.

Option 3: Use normalized Euclidean (L2) distance
If you specifically want Euclidean distance rather than cosine similarity, we should normalize embeddings and compute the L2 distance: `np.linalg.norm(query_embedding - category_embedding)`.

Questions:

1. Should we refactor `ContextPruner` to store category descriptions in a ChromaDB collection and use its query methods?
2. Do you prefer normalized Euclidean distance over cosine similarity for category classification? (Note: cosine is typically better for semantic similarity)
3. Should we use the same embedding function setup as `KeywordVectorStore` for consistency?
Could you clarify which approach aligns with your vision? I'm happy to provide implementation guidance once I understand the direction.
1. Yes, 2. Normalized cosine might be fine as the 0 to 1 range is crucial here, 3. Yes
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check the KeywordVectorStore implementation to understand the ChromaDB pattern
cat src/backend/crew_ai/optimization/chroma_store.py | head -100
```
Length of output: 3238
@Devasy23, perfect! Let me provide the refactored implementation that aligns ContextPruner with the KeywordVectorStore architecture.
Refactored Implementation
Here's the updated ContextPruner class that uses ChromaDB with normalized cosine similarity:
"""
Context Pruner for Smart Keyword Filtering
This module classifies user queries into action categories and prunes
keyword context to include only relevant keywords, reducing token usage
while maintaining code generation accuracy.
"""
import logging
from typing import List, Dict
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions
logger = logging.getLogger(__name__)
class ContextPruner:
"""
Classify queries and prune context to relevant keyword categories.
Uses ChromaDB for semantic similarity to classify queries into action
categories (navigation, input, interaction, extraction, assertion, wait)
and filters keywords to only those in relevant categories.
"""
# Keyword category mappings
KEYWORD_CATEGORIES = {
"navigation": [
"New Browser", "New Page", "Go To", "Go Back", "Go Forward",
"Close Browser", "Close Page", "Switch Page", "New Context"
],
"input": [
"Fill Text", "Input Text", "Type Text", "Press Keys",
"Upload File", "Type Secret", "Clear Text"
],
"interaction": [
"Click", "Click Element", "Hover", "Drag And Drop",
"Select Options By", "Check Checkbox", "Uncheck Checkbox"
],
"extraction": [
"Get Text", "Get Attribute", "Get Element Count",
"Get Property", "Get Style", "Get Url", "Get Title"
],
"assertion": [
"Should Be Equal", "Should Contain", "Should Be Visible",
"Should Not Be Visible", "Should Be Enabled", "Should Be Disabled"
],
"wait": [
"Wait For Elements State", "Wait Until Element Is Visible",
"Wait For Condition", "Wait For Load State", "Sleep"
]
}
def __init__(
self,
model_name: str = "all-MiniLM-L6-v2",
persist_directory: str = "./chroma_db"
):
"""
Initialize with ChromaDB for semantic classification.
Args:
model_name: Name of sentence-transformers model to use
persist_directory: Path to ChromaDB storage directory
"""
logger.info(f"Initializing ContextPruner with ChromaDB at {persist_directory}")
try:
# Initialize ChromaDB client (same pattern as KeywordVectorStore)
self.client = chromadb.PersistentClient(
path=persist_directory,
settings=Settings(
anonymized_telemetry=False,
allow_reset=True
)
)
# Initialize embedding function
self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
model_name=model_name
)
# Create or get category collection
self.collection = self.client.get_or_create_collection(
name="category_descriptions",
embedding_function=self.embedding_function,
metadata={"type": "query_categories"}
)
# Initialize category descriptions in ChromaDB
self._init_category_collection()
logger.info("ContextPruner initialized successfully with ChromaDB")
except Exception as e:
logger.error(f"Failed to initialize ContextPruner: {e}")
raise
def _init_category_collection(self):
"""
Initialize ChromaDB collection with category descriptions.
Stores semantic representations of each category for fast
similarity comparison during query classification.
"""
logger.debug("Initializing category descriptions in ChromaDB")
# Define category descriptions for semantic matching
category_descriptions = {
"navigation": "open browser navigate to website go to page url address",
"input": "type text fill form input data enter information write",
"interaction": "click button press element hover drag drop select",
"extraction": "get text retrieve data extract information read content",
"assertion": "verify check validate assert should be equal confirm",
"wait": "wait for element visible ready loaded appear timeout"
}
# Check if collection is already populated
existing_count = self.collection.count()
if existing_count == len(category_descriptions):
logger.debug(f"Category collection already populated with {existing_count} entries")
return
# Add category descriptions to ChromaDB
try:
ids = list(category_descriptions.keys())
documents = list(category_descriptions.values())
metadatas = [{"category": cat} for cat in ids]
# Upsert to handle re-initialization
self.collection.upsert(
ids=ids,
documents=documents,
metadatas=metadatas
)
logger.info(f"Initialized {len(category_descriptions)} category descriptions in ChromaDB")
except Exception as e:
logger.error(f"Failed to initialize category collection: {e}")
raise
def classify_query(
self,
user_query: str,
confidence_threshold: float = 0.8
) -> List[str]:
"""
Classify query into action categories using ChromaDB semantic search.
Queries the ChromaDB collection to find similar categories based on
normalized cosine similarity. Returns categories that meet the
confidence threshold, or all categories if no category meets the
threshold (graceful degradation).
Args:
user_query: User's natural language query
confidence_threshold: Minimum similarity for category inclusion (0.0-1.0)
Returns:
List of relevant category names (e.g., ["input", "interaction"])
Returns all categories if confidence too low (fallback)
"""
logger.debug(f"Classifying query: {user_query[:50]}...")
try:
# Query ChromaDB for similar categories
# ChromaDB returns normalized cosine distance (0 = identical, 2 = opposite)
# We need to convert to similarity: similarity = 1 - (distance / 2)
results = self.collection.query(
query_texts=[user_query],
n_results=len(self.KEYWORD_CATEGORIES)
)
# Extract categories and convert distances to similarities
similarities = {}
if results['ids'] and len(results['ids'][0]) > 0:
for idx, category_id in enumerate(results['ids'][0]):
distance = results['distances'][0][idx]
# Convert cosine distance to similarity (0-1 range)
# ChromaDB cosine distance range: [0, 2]
# Similarity = 1 - (distance / 2) gives us [0, 1] range
similarity = 1.0 - (distance / 2.0)
similarities[category_id] = similarity
logger.debug(f"Category '{category_id}': distance={distance:.4f}, similarity={similarity:.4f}")
# Filter categories by confidence threshold
relevant_categories = [
cat for cat, sim in similarities.items()
if sim >= confidence_threshold
]
if relevant_categories:
logger.info(
f"Classified query into {len(relevant_categories)} categories: "
f"{relevant_categories} (threshold={confidence_threshold})"
)
return relevant_categories
else:
# Graceful degradation: return all categories if none meet threshold
all_categories = list(self.KEYWORD_CATEGORIES.keys())
logger.warning(
f"No categories met threshold {confidence_threshold}. "
f"Highest similarity: {max(similarities.values()):.4f}. "
f"Falling back to all categories."
)
return all_categories
except Exception as e:
logger.error(f"Classification failed: {e}. Falling back to all categories.")
return list(self.KEYWORD_CATEGORIES.keys())
def prune_keywords(
self,
all_keywords: List[Dict],
categories: List[str]
) -> List[Dict]:
"""
Filter keywords to only those in relevant categories.
Args:
all_keywords: List of keyword dicts with 'name' field
categories: List of relevant category names
Returns:
Filtered list of keyword dicts
"""
logger.debug(f"Pruning keywords for categories: {categories}")
# Build set of relevant keyword names
relevant_names = set()
for category in categories:
if category in self.KEYWORD_CATEGORIES:
relevant_names.update(self.KEYWORD_CATEGORIES[category])
# Filter keywords
pruned = [
kw for kw in all_keywords
if kw.get("name") in relevant_names
]
logger.info(
f"Pruned {len(all_keywords)} keywords to {len(pruned)} "
f"({len(pruned)/len(all_keywords)*100:.1f}% retained)"
)
return pruned
def get_pruning_stats(
self,
original_count: int,
pruned_count: int
) -> Dict[str, float]:
"""
Calculate pruning statistics.
Args:
original_count: Number of keywords before pruning
pruned_count: Number of keywords after pruning
Returns:
Dict with retention_rate and reduction_rate
"""
if original_count == 0:
return {"retention_rate": 0.0, "reduction_rate": 0.0}
retention = pruned_count / original_count
reduction = 1.0 - retention
return {
"retention_rate": retention,
"reduction_rate": reduction
}Key Changes
-
ChromaDB Integration: Uses
PersistentClientandSentenceTransformerEmbeddingFunction(same asKeywordVectorStore) -
Normalized Cosine Similarity: ChromaDB automatically normalizes embeddings and returns cosine distance in the range [0, 2]. The conversion formula is:
similarity = 1.0 - (distance / 2.0) # Maps [0, 2] → [1, 0]
This gives us the 0-1 similarity range you need, where:
similarity = 1.0means identical (distance = 0)similarity = 0.0means opposite (distance = 2)
-
Persistent Storage: Category descriptions are stored in a
category_descriptionscollection, avoiding re-computation on each instantiation -
Consistent Architecture: Follows the same patterns as
KeywordVectorStorefor maintainability -
Updated Constructor: Added
persist_directoryparameter (defaults to same location asKeywordVectorStore)
Usage in smart_keyword_provider.py
Update the initialization to pass the ChromaDB path:
context_pruner = ContextPruner(
persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH
)This ensures all optimization components share the same ChromaDB instance and storage location.
Let me know if you'd like me to also update the instantiation points in smart_keyword_provider.py and crew.py!
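And a small, hypothetical end-to-end check of that API (the query string and threshold are illustrative, not tested values):

```python
pruner = ContextPruner(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)

# Expect "input" and "interaction" to rank highest for this query.
categories = pruner.classify_query("type the username and click login", confidence_threshold=0.5)

keywords = [{"name": "Fill Text"}, {"name": "Click"}, {"name": "Get Title"}]
pruned = pruner.prune_keywords(keywords, categories)
print(categories, pruned)  # "Get Title" is dropped unless "extraction" also cleared the threshold

print(pruner.get_pruning_stats(len(keywords), len(pruned)))
```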
```python
logger.info(
    f"Pruned keywords: {len(all_keywords)} -> {len(pruned_keywords)} "
    f"({len(pruned_keywords)/len(all_keywords)*100:.1f}% retained)"
)
```
Avoid divide-by-zero when no keywords are provided
If `all_keywords` is empty, we hit a `ZeroDivisionError` in the retention log formatting, crashing the caller instead of gracefully returning an empty list. Please guard the division.
```diff
- logger.info(
-     f"Pruned keywords: {len(all_keywords)} -> {len(pruned_keywords)} "
-     f"({len(pruned_keywords)/len(all_keywords)*100:.1f}% retained)"
- )
+ original_count = len(all_keywords)
+ pruned_count = len(pruned_keywords)
+
+ retention_pct = (pruned_count / original_count * 100) if original_count else 0.0
+
+ logger.info(
+     f"Pruned keywords: {original_count} -> {pruned_count} "
+     f"({retention_pct:.1f}% retained)"
+ )
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

original_count = len(all_keywords)
pruned_count = len(pruned_keywords)

retention_pct = (pruned_count / original_count * 100) if original_count else 0.0

logger.info(
    f"Pruned keywords: {original_count} -> {pruned_count} "
    f"({retention_pct:.1f}% retained)"
)
🤖 Prompt for AI Agents

In src/backend/crew_ai/optimization/context_pruner.py around lines 170 to 173, the logging expression divides by len(all_keywords) and will raise ZeroDivisionError when all_keywords is empty; guard the division by computing the retention percentage only when len(all_keywords) > 0 (e.g. retained_pct = 0.0 if len(all_keywords) == 0 else len(pruned_keywords)/len(all_keywords)*100) and use retained_pct in the log message so the function doesn't crash when no keywords are provided.
# Pattern learning: ONLY learn from PASSED tests
# This ensures we only learn from validated, working code
if result.get('test_status') == 'passed':
    if user_query:
        try:
            from src.backend.core.config import settings
            if settings.OPTIMIZATION_ENABLED:
                # Initialize optimization components to learn from this successful execution
                from src.backend.crew_ai.optimization import SmartKeywordProvider, QueryPatternMatcher, KeywordVectorStore
                from src.backend.crew_ai.library_context import get_library_context

                logging.info("📚 Test PASSED - Learning from successful execution...")

                # Initialize components
                library_context = get_library_context(settings.ROBOT_LIBRARY)
                chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
                pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
                smart_provider = SmartKeywordProvider(
                    library_context=library_context,
                    pattern_matcher=pattern_matcher,
                    vector_store=chroma_store
                )

                # Learn from the successful execution
                smart_provider.learn_from_execution(user_query, robot_code)
                logging.info("✅ Pattern learning completed - learned from PASSED test")
        except Exception as e:
            logging.warning(f"⚠️ Failed to learn from execution: {e}")
    else:
        logging.info("⏭️ Test PASSED but skipping pattern learning - no user query provided")
else:
    logging.info(f"⏭️ Skipping pattern learning - test status: {result.get('test_status', 'unknown')}")
🧩 Analysis chain

Pattern learning sidecar has two functional bugs (missing Chroma store, undefined variable)

The post-execution pattern learning blocks introduce two concrete issues:

1. QueryPatternMatcher is created without a Chroma store

   In both stream_execute_only and stream_generate_and_run, you do:

   chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
   pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)

   but QueryPatternMatcher relies on the provided chroma_store to create pattern_collection. Without it, learn_from_execution will only update SQLite statistics; no patterns are stored in ChromaDB, and get_relevant_keywords will always return []. This effectively disables semantic pattern-based predictions for executions learned through these paths (a sketch of the constructor guard follows this list).

2. Legacy stream_generate_and_run uses an undefined natural_language_query

   In the legacy flow, the learning block calls:

   smart_provider.learn_from_execution(natural_language_query, robot_code)

   but the enclosing function has a user_query parameter, not natural_language_query. At runtime this raises a NameError inside the try block, which is caught by except Exception and logged as a warning, meaning pattern learning always fails in this path.
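For illustration, a minimal sketch of the constructor guard described in item 1 (the real implementation lives in pattern_learning.py; the collection-creation helper here is hypothetical):

class QueryPatternMatcher:
    def __init__(self, db_path: str, chroma_store=None):
        self.db_path = db_path
        if chroma_store is not None:
            # Patterns can be stored and queried semantically in ChromaDB.
            self.pattern_collection = chroma_store.get_or_create_collection("query_patterns")  # hypothetical helper
        else:
            # No Chroma store: only SQLite stats are updated, and
            # get_relevant_keywords will always return [].
            self.pattern_collection = None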
To fix both issues and align with how run_crew initializes these components, consider:
@@ async def stream_execute_only(robot_code: str, user_query: str | None = None) -> Generator[str, None, None]:
- chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
- pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+ chroma_store = KeywordVectorStore(
+ persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH
+ )
+ pattern_matcher = QueryPatternMatcher(
+ db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+ chroma_store=chroma_store,
+ )
@@ async def stream_generate_and_run(user_query: str, model_provider: str, model_name: str) -> Generator[str, None, None]:
- chroma_store = KeywordVectorStore(persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH)
- pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+ chroma_store = KeywordVectorStore(
+ persist_directory=settings.OPTIMIZATION_CHROMA_DB_PATH
+ )
+ pattern_matcher = QueryPatternMatcher(
+ db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+ chroma_store=chroma_store,
+ )
@@
- # Learn from the successful execution
- smart_provider.learn_from_execution(natural_language_query, robot_code)
+ # Learn from the successful execution
+ smart_provider.learn_from_execution(user_query, robot_code)

With these changes, successful executions will correctly populate the query_patterns collection and use the actual user query string, so the optimization system can learn and later serve predictions as designed.
Also applies to: 593-621
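To make the learn/serve loop concrete, reusing the objects from the excerpt above (method names come from this review; the exact argument to get_relevant_keywords is an assumption):

# Learning side: runs only after a PASSED test, as in the excerpt above.
smart_provider.learn_from_execution(user_query, robot_code)

# Serving side: a later, similar query can now be answered from learned patterns.
predicted = pattern_matcher.get_relevant_keywords(user_query)
# With chroma_store wired in, this returns learned keyword names instead of [].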
I'll verify the claims in this review comment by inspecting the codebase.
🏁 Verification (condensed): a series of rg, ast-grep, sed, and Python AST scripts located the QueryPatternMatcher and SmartKeywordProvider definitions, inspected the __init__ and learn_from_execution signatures, listed every instantiation of QueryPatternMatcher and KeywordVectorStore, and dumped src/backend/services/workflow_service.py around lines 489-520 and 593-621 alongside the run_crew initialization for comparison. The confirmed findings are summarized below.
Two critical bugs confirmed: missing chroma_store parameter and undefined variable

Both issues identified in the review are real and will break functionality:

1. Lines 505 and 608: QueryPatternMatcher instantiated without the chroma_store parameter
   - QueryPatternMatcher.__init__ accepts an optional chroma_store (src/backend/crew_ai/optimization/pattern_learning.py:28)
   - When None, pattern_collection is set to None and learning is disabled (lines 46–50)
   - The correct pattern in crew.py (line 126) passes chroma_store=vector_store

2. Line 616: stream_generate_and_run calls learn_from_execution(natural_language_query, ...) but the parameter is user_query
   - Function signature (line 527): async def stream_generate_and_run(user_query: str, ...)
   - natural_language_query is undefined in this scope and causes a NameError at runtime
   - Line 513 in stream_execute_only correctly uses user_query
@@ src/backend/services/workflow_service.py:505
- pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+ pattern_matcher = QueryPatternMatcher(
+ db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+ chroma_store=chroma_store
+ )
@@ src/backend/services/workflow_service.py:608
- pattern_matcher = QueryPatternMatcher(db_path=settings.OPTIMIZATION_PATTERN_DB_PATH)
+ pattern_matcher = QueryPatternMatcher(
+ db_path=settings.OPTIMIZATION_PATTERN_DB_PATH,
+ chroma_store=chroma_store
+ )
@@ src/backend/services/workflow_service.py:616
- smart_provider.learn_from_execution(natural_language_query, robot_code)
+ smart_provider.learn_from_execution(user_query, robot_code)

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.4)
515-515: Do not catch blind exception: Exception (BLE001)
🤖 Prompt for AI Agents
In src/backend/services/workflow_service.py around lines 489-520 (and related usages at ~505 and ~616), fix two issues: instantiate QueryPatternMatcher with the chroma_store passed (e.g., QueryPatternMatcher(..., chroma_store=chroma_store)) so pattern_collection is initialized and learning is enabled, and replace the undefined natural_language_query argument with the correct user_query when calling learn_from_execution (use user_query consistently). Ensure both changes follow the existing variable name (the chroma_store created from KeywordVectorStore) and update any other calls to QueryPatternMatcher or learn_from_execution in this file to the same signature and variable name for consistency.



Release Notes
New Features
Configuration
Documentation
Metrics & Monitoring