Improve advisory engine robustness and standardize UTC handling #8
Kunal241207 wants to merge 3 commits into OWASP-BLT:main from
Conversation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/advisory_engine/dashboard.py (1)

Lines 70-74: ⚠️ Potential issue | 🔴 Critical — Normalize parsed feedback timestamps before comparing to the UTC cutoff.

`recent_cutoff` is timezone-aware, but `datetime.fromisoformat()` returns naive datetimes for timestamps without timezone info and for the `"1970-01-01"` fallback, causing a `TypeError: can't compare offset-naive and offset-aware datetimes` whenever older or missing timestamps are encountered.

🛠️ Proposed fix

```diff
-        recent_feedback = [
-            f for f in feedback
-            if datetime.fromisoformat(f.get("timestamp", "1970-01-01")) > recent_cutoff
-        ]
+        recent_feedback = []
+        for f in feedback:
+            raw_timestamp = f.get("timestamp")
+            if not raw_timestamp:
+                continue
+            try:
+                parsed = datetime.fromisoformat(raw_timestamp)
+            except ValueError:
+                continue
+            if parsed.tzinfo is None:
+                parsed = parsed.replace(tzinfo=timezone.utc)
+            if parsed > recent_cutoff:
+                recent_feedback.append(f)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/advisory_engine/dashboard.py` around lines 70 - 74, The comparison fails because recent_cutoff is timezone-aware but datetime.fromisoformat(...) can return naive datetimes; in the recent_feedback comprehension (variables recent_cutoff and recent_feedback, and the use of f.get("timestamp")), parse each timestamp and normalize it to a timezone-aware UTC datetime before comparing: after calling datetime.fromisoformat(timestamp_or_fallback) check dt.tzinfo and if None set/replace tzinfo=timezone.utc (or parse the fallback as a UTC-aware datetime) so all datetimes compared against recent_cutoff are offset-aware.
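Outside the proposed diff, the normalization step the review describes can be sketched as a standalone helper. This is a minimal sketch; the name `to_aware_utc` is illustrative, not from the PR.

```python
from datetime import datetime, timezone

def to_aware_utc(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp, treating naive values as UTC."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Naive timestamps (including the "1970-01-01" fallback) are assumed UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

# Both naive and aware inputs now compare safely against an aware cutoff.
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(to_aware_utc("1970-01-01") < cutoff)                   # naive input, normalized
print(to_aware_utc("2024-06-01T12:00:00+00:00") > cutoff)    # already aware
```

Normalizing at the parse site, rather than changing only the fallback string, also covers stored timestamps that happen to lack an offset.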
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/advisory_engine/core.py`:
- Around line 273-276: Normalize legacy feedback identifiers before matching and
persisting by mapping legacy titles to the new pattern key and ensuring a
non-None pattern value is used; update the matching logic around the list
comprehension that builds relevant_feedback (and the similar block at lines
300-305) to call a normalization helper (e.g., normalize_feedback_pattern) that
checks f.get("pattern"), falls back to historical fields like f.get("title") or
legacy strings such as "Security Advisory: Authentication", translates them to
advice.pattern_key, and returns a canonical pattern string (never None) before
comparing to advice.pattern_key; add the same normalization when persisting
feedback so stored rows use the canonical pattern.
In `@src/blt_preflight.py`:
- Around line 103-109: Reject and validate the provided pattern_key before
calling engine.record_feedback: check that args.pattern exactly matches a known
pattern key (use engine.record_feedback's supported keys or the configured
SecurityAdvice.pattern_key list / method that exposes valid keys), and if not,
print/raise a clear error listing available pattern keys and abort recording;
after accepting a valid key, continue to call
engine.record_feedback(pattern_key=args.pattern, ...). Also update advise
output/JSON paths (the advise command and any JSON serializer) to include the
canonical pattern_key alongside the advisory title so users can copy the exact
key for feedback and _refine_with_learning() will match.
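The validation this prompt asks for can be sketched independently of the CLI. The key set below is hypothetical; real code would pull it from the engine's configured patterns.

```python
def validate_pattern_key(pattern_key: str, known_keys: set) -> str:
    """Return the key unchanged if recognized; otherwise raise with the valid options listed."""
    if pattern_key not in known_keys:
        raise ValueError(
            f"Unknown pattern key {pattern_key!r}; "
            f"expected one of: {', '.join(sorted(known_keys))}"
        )
    return pattern_key

# Hypothetical keys for illustration only.
KNOWN_KEYS = {"auth_pattern", "sql_injection", "hardcoded_secret"}
print(validate_pattern_key("auth_pattern", KNOWN_KEYS))
```

Listing the valid keys in the error message gives users the exact string to copy into the feedback command, which is the usability point the review makes.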
---
Outside diff comments:
In `@src/advisory_engine/dashboard.py`:
- Around line 70-74: The comparison fails because recent_cutoff is
timezone-aware but datetime.fromisoformat(...) can return naive datetimes; in
the recent_feedback comprehension (variables recent_cutoff and recent_feedback,
and the use of f.get("timestamp")), parse each timestamp and normalize it to a
timezone-aware UTC datetime before comparing: after calling
datetime.fromisoformat(timestamp_or_fallback) check dt.tzinfo and if None
set/replace tzinfo=timezone.utc (or parse the fallback as a UTC-aware datetime)
so all datetimes compared against recent_cutoff are offset-aware.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 1bff5869-f06a-4fd4-ae9c-ff0bfd051cde
📒 Files selected for processing (3)
- src/advisory_engine/core.py
- src/advisory_engine/dashboard.py
- src/blt_preflight.py
@coderabbitai Thanks for the suggestions! Both normalization of legacy feedback identifiers and CLI validation for unknown pattern_key values are good improvements. To keep this PR focused on the current changes (pattern_key usage, deduplication, and timestamp updates), I'd prefer to address these in a follow-up PR.
Tip: For best results, initiate chat on the files or code changes.

Just say the word and I'll create the issue!
@coderabbitai for now, review the changes again with the current scope in mind.
✅ Actions performed: Full review triggered.
Scope reminder: Focus on
✅ Actions performed: Full review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/advisory_engine/dashboard.py (1)
Lines 70-74: ⚠️ Potential issue | 🟠 Major — Naive datetime fallback breaks comparison with timezone-aware cutoff.

When a feedback entry lacks a timestamp, the fallback `"1970-01-01"` produces a naive datetime, while `recent_cutoff` is timezone-aware. Comparing naive and aware datetimes raises a `TypeError` in Python.

🐛 Proposed fix to use a timezone-aware fallback

```diff
     recent_cutoff = datetime.now(timezone.utc) - timedelta(days=7)
     recent_feedback = [
         f for f in feedback
-        if datetime.fromisoformat(f.get("timestamp", "1970-01-01")) > recent_cutoff
+        if datetime.fromisoformat(f.get("timestamp", "1970-01-01T00:00:00+00:00")) > recent_cutoff
     ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/advisory_engine/dashboard.py` around lines 70 - 74, The code filters feedback using recent_cutoff (timezone-aware) but uses datetime.fromisoformat with a naive fallback "1970-01-01", causing TypeError when comparing naive and aware datetimes; update the fallback to a timezone-aware ISO string or ensure timestamps parsed are made aware (e.g., use a UTC-aware fallback like "1970-01-01T00:00:00+00:00" or attach timezone info after parsing) in the recent_feedback comprehension that calls datetime.fromisoformat(f.get("timestamp", "1970-01-01")) so all parsed datetimes are timezone-aware before comparing to recent_cutoff.
♻️ Duplicate comments (1)
src/advisory_engine/core.py (1)
Lines 300-308: ⚠️ Potential issue | 🟡 Minor — Guard against storing `None` as pattern key.

If both `pattern_key` and `advice_title` are `None`, `key` will be `None` and stored in feedback data. This creates orphaned records that won't match any advice in `_refine_with_learning`.

🛡️ Proposed fix to validate key

```diff
     def record_feedback(self, pattern_key: str = None, helpful: bool = False, comments: str = "", advice_title: str = None) -> None:
         """Record feedback on advice for learning loop. Accepts pattern_key (preferred) or advice_title (legacy)."""
         # Backward compatibility: accept advice_title as alias for pattern_key
         key = pattern_key if pattern_key is not None else advice_title
+        if key is None:
+            logger.warning("record_feedback called without pattern_key or advice_title; skipping")
+            return
         feedback_data = {
             "pattern": key,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/advisory_engine/core.py` around lines 300 - 308, The record_feedback function currently allows a None pattern key (key derived from pattern_key or advice_title) which would store "pattern": None; update record_feedback to validate the resolved key (the local variable key) before building feedback_data: if key is None or empty, raise a ValueError (or return early) with a clear message; ensure any callers expect this change. Reference the record_feedback method and ensure consistency with _refine_with_learning which expects non-None pattern keys.
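The key-resolution and guard logic the comment proposes can be exercised in isolation; a minimal sketch, with the function and logger names being illustrative rather than from the PR:

```python
import logging

logger = logging.getLogger("advisory_engine")

def resolve_pattern_key(pattern_key=None, advice_title=None):
    """Resolve the feedback key, preferring pattern_key; reject a fully missing key."""
    key = pattern_key if pattern_key is not None else advice_title
    if key is None:
        # Early return keeps orphaned None-pattern records out of the store.
        logger.warning("feedback recorded without pattern_key or advice_title; skipping")
        return None
    return key

print(resolve_pattern_key(pattern_key="auth_pattern"))   # -> auth_pattern
print(resolve_pattern_key(advice_title="Legacy Title"))  # -> Legacy Title
print(resolve_pattern_key())                             # -> None (with a logged warning)
```

Whether to return early (as sketched) or raise a `ValueError` is the design choice the two review rounds weigh; raising surfaces caller bugs, returning keeps the feedback path non-fatal.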
🧹 Nitpick comments (4)
src/advisory_engine/core.py (2)
Lines 47-52: Consider narrowing exception types.

Catching broad `Exception` works for resilience but can mask unexpected errors. Narrowing to `(json.JSONDecodeError, OSError)` would catch file/parsing issues while letting programming errors propagate.

♻️ Proposed narrower exception handling

```diff
         try:
             with open(self.config_path, 'r') as f:
                 return json.load(f)
-        except Exception as e:
+        except (json.JSONDecodeError, OSError) as e:
             logger.warning(f"Failed to load security patterns from {self.config_path}: {e}")
```

The same applies to `_load_learning_data` at lines 103-107.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/advisory_engine/core.py` around lines 47 - 52, Narrow the broad except Exception in the config loading and learning-data loading paths to only handle file/parse errors: catch json.JSONDecodeError and OSError (or IOError) instead of Exception around the block that opens and json.load(self.config_path) (the code referencing self.config_path and calling _get_default_patterns()), and do the same in the _load_learning_data method so programming errors still propagate while file/JSON issues are handled gracefully.
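The narrowed handling could be factored into one loader shared by both code paths; a minimal sketch, where the helper name and default shape are assumptions rather than code from the PR:

```python
import json
import logging

logger = logging.getLogger("advisory_engine")

def load_json_or_default(path: str, default: dict) -> dict:
    """Load JSON from path, falling back to default only on file or parse errors."""
    try:
        with open(path, "r") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError) as e:
        # Programming errors (TypeError, KeyError, ...) still propagate unchanged.
        logger.warning("Failed to load %s: %s", path, e)
        return default

print(load_json_or_default("/nonexistent/learning_data.json", {"patterns": []}))
```

A shared helper also keeps the exception list in one place if it ever needs to grow (e.g. `UnicodeDecodeError` for corrupt files).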
Lines 300-301: Use explicit `Optional` or union type for nullable parameters.

Per PEP 484, `pattern_key: str = None` implicitly allows `None` but the type hint says `str`. Use `str | None` (Python 3.10+) or `Optional[str]` for clarity.

♻️ Proposed fix

```diff
-    def record_feedback(self, pattern_key: str = None, helpful: bool = False, comments: str = "", advice_title: str = None) -> None:
+    def record_feedback(self, pattern_key: str | None = None, helpful: bool = False, comments: str = "", advice_title: str | None = None) -> None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/advisory_engine/core.py` around lines 300 - 301, The function signature for record_feedback uses parameters like pattern_key: str = None and advice_title: str = None which declare nullable defaults but keep the type as plain str; update these annotations to explicit nullable types (e.g., pattern_key: Optional[str] or pattern_key: str | None and advice_title: Optional[str] or advice_title: str | None depending on project Python/version) and add the necessary typing import (from typing import Optional) if using Optional, ensuring the signature and any docstrings remain consistent; adjust any callers or type checks if needed to satisfy static type checkers.

src/advisory_engine/dashboard.py (1)
Lines 19-24: Add safe JSON loading for consistency with core.py.

The PR objective mentions wrapping config and learning data loading in try/except for resilience, but this method lacks error handling. If `learning_data.json` contains malformed JSON, this will raise an unhandled exception.

♻️ Proposed fix to add error handling

```diff
     def _load_learning_data(self) -> Dict:
         """Load learning data."""
         if os.path.exists(self.learning_data_path):
-            with open(self.learning_data_path, 'r') as f:
-                return json.load(f)
+            try:
+                with open(self.learning_data_path, 'r') as f:
+                    return json.load(f)
+            except (json.JSONDecodeError, OSError) as e:
+                import logging
+                logging.getLogger(__name__).warning(
+                    f"Failed to load learning data from {self.learning_data_path}: {e}"
+                )
         return {"patterns": [], "feedback": [], "intents": []}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/advisory_engine/dashboard.py` around lines 19 - 24, The _load_learning_data method should wrap the file read + json.load in a try/except like core.py to handle malformed JSON and I/O errors: modify _load_learning_data to catch exceptions around opening/reading/parsing self.learning_data_path, log the error (including exception details) using the component logger, and return the default {"patterns": [], "feedback": [], "intents": []} on failure; keep the existing os.path.exists check but ensure any json.JSONDecodeError or OSError doesn't propagate.

src/blt_preflight.py (1)
Lines 54-67: Include `pattern_key` in JSON output for feedback usability.

The JSON output omits `pattern_key`, making it harder for users to know what value to pass to the `feedback` command. Including it would improve the feedback workflow.

♻️ Proposed fix to include pattern_key

```diff
     json_output = {
         'advice': [
             {
                 'severity': a.severity,
                 'title': a.title,
                 'message': a.message,
                 'recommendations': a.recommendations,
                 'documentation_links': a.documentation_links,
-                'timestamp': a.timestamp
+                'timestamp': a.timestamp,
+                'pattern_key': a.pattern_key
             }
             for a in advice_list
         ]
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/blt_preflight.py` around lines 54 - 67, The JSON builder for args.json currently serializes fields from each advice object in the list but omits pattern_key; update the construction of json_output (the comprehension that iterates over advice_list) to include 'pattern_key': a.pattern_key for each item so the exported JSON contains the pattern_key users need for the feedback command; ensure you reference the existing keys (severity, title, message, recommendations, documentation_links, timestamp) and add pattern_key alongside them in the same dict produced in that comprehension.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/advisory_engine/dashboard.py`:
- Around line 70-74: The code filters feedback using recent_cutoff
(timezone-aware) but uses datetime.fromisoformat with a naive fallback
"1970-01-01", causing TypeError when comparing naive and aware datetimes; update
the fallback to a timezone-aware ISO string or ensure timestamps parsed are made
aware (e.g., use a UTC-aware fallback like "1970-01-01T00:00:00+00:00" or attach
timezone info after parsing) in the recent_feedback comprehension that calls
datetime.fromisoformat(f.get("timestamp", "1970-01-01")) so all parsed datetimes
are timezone-aware before comparing to recent_cutoff.
---
Duplicate comments:
In `@src/advisory_engine/core.py`:
- Around line 300-308: The record_feedback function currently allows a None
pattern key (key derived from pattern_key or advice_title) which would store
"pattern": None; update record_feedback to validate the resolved key (the local
variable key) before building feedback_data: if key is None or empty, raise a
ValueError (or return early) with a clear message; ensure any callers expect
this change. Reference the record_feedback method and ensure consistency with
_refine_with_learning which expects non-None pattern keys.
---
Nitpick comments:
In `@src/advisory_engine/core.py`:
- Around line 47-52: Narrow the broad except Exception in the config loading and
learning-data loading paths to only handle file/parse errors: catch
json.JSONDecodeError and OSError (or IOError) instead of Exception around the
block that opens and json.load(self.config_path) (the code referencing
self.config_path and calling _get_default_patterns()), and do the same in the
_load_learning_data method so programming errors still propagate while file/JSON
issues are handled gracefully.
- Around line 300-301: The function signature for record_feedback uses
parameters like pattern_key: str = None and advice_title: str = None which
declare nullable defaults but keep the type as plain str; update these
annotations to explicit nullable types (e.g., pattern_key: Optional[str] or
pattern_key: str | None and advice_title: Optional[str] or advice_title: str |
None depending on project Python/version) and add the necessary typing import
(from typing import Optional) if using Optional, ensuring the signature and any
docstrings remain consistent; adjust any callers or type checks if needed to
satisfy static type checkers.
In `@src/advisory_engine/dashboard.py`:
- Around line 19-24: The _load_learning_data method should wrap the file read +
json.load in a try/except like core.py to handle malformed JSON and I/O errors:
modify _load_learning_data to catch exceptions around opening/reading/parsing
self.learning_data_path, log the error (including exception details) using the
component logger, and return the default {"patterns": [], "feedback": [],
"intents": []} on failure; keep the existing os.path.exists check but ensure any
json.JSONDecodeError or OSError doesn't propagate.
In `@src/blt_preflight.py`:
- Around line 54-67: The JSON builder for args.json currently serializes fields
from each advice object in the list but omits pattern_key; update the
construction of json_output (the comprehension that iterates over advice_list)
to include 'pattern_key': a.pattern_key for each item so the exported JSON
contains the pattern_key users need for the feedback command; ensure you
reference the existing keys (severity, title, message, recommendations,
documentation_links, timestamp) and add pattern_key alongside them in the same
dict produced in that comprehension.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 4ed11015-a261-43ec-a922-0d1f97062e5d
📒 Files selected for processing (3)
- src/advisory_engine/core.py
- src/advisory_engine/dashboard.py
- src/blt_preflight.py
@DonnieBLT could you review this please?
This PR improves the robustness and consistency of the advisory engine: