
Improve advisory engine robustness and standardize UTC handling#8

Open
Kunal241207 wants to merge 3 commits into OWASP-BLT:main from Kunal241207:improve-advisory-robustness

Conversation


@Kunal241207 Kunal241207 commented Feb 27, 2026

This PR improves the robustness and consistency of the advisory engine:

  • Deduplication: Prevents repeated advisories when multiple files match the same pattern.
  • Stable Feedback Matching: Uses a persistent pattern_key for feedback and learning instead of advisory titles.
  • Safe JSON Loading: Wraps config and learning data loading in try/except for resilience.
  • UTC Standardization: Ensures all timestamps are timezone-aware and consistent across modules.
  • Backward Compatibility: record_feedback supports both pattern_key (preferred) and advice_title (legacy).
  • Cleanup: Removes unused imports and minor inconsistencies.
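The deduplication and backward-compatible feedback API described above can be sketched as follows. This is a simplified illustration only; the names mirror the PR description, but the actual implementation in src/advisory_engine/core.py may differ.

```python
# Hypothetical sketch of (pattern_key, severity) deduplication and the
# backward-compatible record_feedback signature; simplified for illustration.
from datetime import datetime, timezone


def dedupe_advice(advice_list):
    """Keep only the first advisory for each (pattern_key, severity) pair."""
    seen = set()
    result = []
    for a in advice_list:
        key = (a["pattern_key"], a["severity"])
        if key not in seen:
            seen.add(key)
            result.append(a)
    return result


def record_feedback(pattern_key=None, helpful=False, comments="", advice_title=None):
    """Accept pattern_key (preferred) or advice_title (legacy alias)."""
    key = pattern_key if pattern_key is not None else advice_title
    return {
        "pattern": key,
        "helpful": helpful,
        "comments": comments,
        # Timezone-aware UTC timestamp, per the PR's UTC standardization.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

With this shape, two files triggering the same pattern yield one advisory, and legacy callers passing advice_title still produce a usable feedback record.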

Summary by CodeRabbit

  • New Features

    • Automatic fallback to default security patterns when pattern loading fails.
  • Improvements

    • All timestamps now use UTC for consistent activity and feedback times.
    • Advice results deduplicated to reduce repeated recommendations.
    • Feedback CLI updated to use a pattern-based identifier (e.g., --pattern) for clearer, more consistent feedback association.


@arnavkirti arnavkirti left a comment


LGTM. Good changes.

@coderabbitai

coderabbitai bot commented Mar 6, 2026

Warning

Rate limit exceeded

@Kunal241207 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 50 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 104e5ef9-3a33-4bb3-a158-96cf81a73133

📥 Commits

Reviewing files that changed from the base of the PR and between 94aa0a1 and c2c3540.

📒 Files selected for processing (3)
  • src/advisory_engine/core.py
  • src/advisory_engine/dashboard.py
  • src/blt_preflight.py

Walkthrough

Added pattern_key field to SecurityAdvice dataclass, updated timezone handling to use timezone-aware datetimes throughout, made security pattern loading more robust with error handling, and updated feedback recording to use pattern_key as primary identifier while maintaining backward compatibility with advice_title.

Changes

  • Core Advisory Engine Logic (src/advisory_engine/core.py): Added pattern_key field to SecurityAdvice; made pattern loading fault-tolerant with try/except; updated evaluate_context to deduplicate advice by (pattern_key, severity); refined advice matching and feedback recording to use pattern_key consistently; added timezone-aware UTC timestamps; updated the record_feedback signature to accept pattern_key as the primary parameter, with advice_title retained for backward compatibility.
  • Dashboard Time Handling (src/advisory_engine/dashboard.py): Replaced deprecated datetime.utcnow() calls with timezone-aware datetime.now(timezone.utc) for generated timestamps and activity cutoff calculations.
  • CLI Feedback Recording (src/blt_preflight.py): Updated cmd_feedback to use pattern_key instead of advice_title when recording feedback; updated help/example text to reflect the new pattern naming convention.
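The dashboard change above follows the standard migration away from the deprecated naive-UTC API. A minimal illustration (the variable name is illustrative, not the project's actual code):

```python
# Sketch of the utcnow() -> now(timezone.utc) migration; recent_cutoff is
# an illustrative name, not necessarily the project's.
from datetime import datetime, timedelta, timezone

# Deprecated since Python 3.12 and returns a *naive* datetime:
#   recent_cutoff = datetime.utcnow() - timedelta(days=7)

# Timezone-aware replacement:
recent_cutoff = datetime.now(timezone.utc) - timedelta(days=7)

# Safe to compare against other offset-aware datetimes.
assert recent_cutoff.tzinfo is not None
```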

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

quality: high

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The PR title accurately summarizes the two main objectives: improving advisory engine robustness (deduplication, safe loading, stable feedback matching) and standardizing UTC timestamp handling across modules.
  • Docstring Coverage (✅ Passed): Docstring coverage is 93.75%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/advisory_engine/dashboard.py (1)

70-74: ⚠️ Potential issue | 🔴 Critical

Normalize parsed feedback timestamps before comparing to the UTC cutoff.

recent_cutoff is timezone-aware, but datetime.fromisoformat() returns naive datetimes for timestamps without timezone info and for the "1970-01-01" fallback, causing a TypeError: can't compare offset-naive and offset-aware datetimes whenever older or missing timestamps are encountered.

🛠️ Proposed fix
-        recent_feedback = [
-            f for f in feedback 
-            if datetime.fromisoformat(f.get("timestamp", "1970-01-01")) > recent_cutoff
-        ]
+        recent_feedback = []
+        for f in feedback:
+            raw_timestamp = f.get("timestamp")
+            if not raw_timestamp:
+                continue
+            try:
+                parsed = datetime.fromisoformat(raw_timestamp)
+            except ValueError:
+                continue
+            if parsed.tzinfo is None:
+                parsed = parsed.replace(tzinfo=timezone.utc)
+            if parsed > recent_cutoff:
+                recent_feedback.append(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/advisory_engine/dashboard.py` around lines 70 - 74, The comparison fails
because recent_cutoff is timezone-aware but datetime.fromisoformat(...) can
return naive datetimes; in the recent_feedback comprehension (variables
recent_cutoff and recent_feedback, and the use of f.get("timestamp")), parse
each timestamp and normalize it to a timezone-aware UTC datetime before
comparing: after calling datetime.fromisoformat(timestamp_or_fallback) check
dt.tzinfo and if None set/replace tzinfo=timezone.utc (or parse the fallback as
a UTC-aware datetime) so all datetimes compared against recent_cutoff are
offset-aware.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/advisory_engine/core.py`:
- Around line 273-276: Normalize legacy feedback identifiers before matching and
persisting by mapping legacy titles to the new pattern key and ensuring a
non-None pattern value is used; update the matching logic around the list
comprehension that builds relevant_feedback (and the similar block at lines
300-305) to call a normalization helper (e.g., normalize_feedback_pattern) that
checks f.get("pattern"), falls back to historical fields like f.get("title") or
legacy strings such as "Security Advisory: Authentication", translates them to
advice.pattern_key, and returns a canonical pattern string (never None) before
comparing to advice.pattern_key; add the same normalization when persisting
feedback so stored rows use the canonical pattern.

In `@src/blt_preflight.py`:
- Around line 103-109: Reject and validate the provided pattern_key before
calling engine.record_feedback: check that args.pattern exactly matches a known
pattern key (use engine.record_feedback's supported keys or the configured
SecurityAdvice.pattern_key list / method that exposes valid keys), and if not,
print/raise a clear error listing available pattern keys and abort recording;
after accepting a valid key, continue to call
engine.record_feedback(pattern_key=args.pattern, ...). Also update advise
output/JSON paths (the advise command and any JSON serializer) to include the
canonical pattern_key alongside the advisory title so users can copy the exact
key for feedback and _refine_with_learning() will match.

---

Outside diff comments:
In `@src/advisory_engine/dashboard.py`:
- Around line 70-74: The comparison fails because recent_cutoff is
timezone-aware but datetime.fromisoformat(...) can return naive datetimes; in
the recent_feedback comprehension (variables recent_cutoff and recent_feedback,
and the use of f.get("timestamp")), parse each timestamp and normalize it to a
timezone-aware UTC datetime before comparing: after calling
datetime.fromisoformat(timestamp_or_fallback) check dt.tzinfo and if None
set/replace tzinfo=timezone.utc (or parse the fallback as a UTC-aware datetime)
so all datetimes compared against recent_cutoff are offset-aware.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 1bff5869-f06a-4fd4-ae9c-ff0bfd051cde

📥 Commits

Reviewing files that changed from the base of the PR and between 94aa0a1 and c2c3540.

📒 Files selected for processing (3)
  • src/advisory_engine/core.py
  • src/advisory_engine/dashboard.py
  • src/blt_preflight.py

@Kunal241207
Author

@coderabbitai Thanks for the suggestions! Both normalization of legacy feedback identifiers and CLI validation for unknown pattern_key values are good improvements. To keep this PR focused on the current changes (pattern_key usage, deduplication, and timestamp updates), I’d prefer to address these in a follow-up PR.

@coderabbitai

coderabbitai bot commented Mar 6, 2026

Tip

For best results, initiate chat on the files or code changes.

@Kunal241207 Sounds good! Keeping the PR focused is a solid approach. Would you like me to open a follow-up GitHub issue to track these two improvements so they don't get lost?

  1. Normalization of legacy feedback identifiers – ensuring advice_title values passed via the legacy path are normalized/validated before being stored.
  2. CLI validation for unknown pattern_key values – adding a check in cmd_feedback to warn or error when an unrecognized pattern_key is supplied.

Just say the word and I'll create the issue!

@Kunal241207
Author

@coderabbitai for now, review the changes again with the current scope in mind.

@coderabbitai

coderabbitai bot commented Mar 6, 2026

@Kunal241207 Sure! I'll re-review the PR with the current scope in mind — focusing on the pattern_key usage, deduplication logic, UTC timestamp standardization, safe JSON loading, backward compatibility, and cleanup changes.

✅ Actions performed

Full review triggered.

@Kunal241207 Kunal241207 requested a review from Nachiket-Roy March 6, 2026 14:48
@coderabbitai

coderabbitai bot commented Mar 6, 2026

@Kunal241207 Sure! I'll re-review the PR with the current scope in mind — focusing on the pattern_key usage, deduplication logic, UTC timestamp standardization, safe JSON loading, backward compatibility, and cleanup changes.

Scope reminder: Focus on pattern_key usage, deduplication logic, UTC timestamp standardization, safe JSON loading, backward compatibility, and cleanup. Defer normalization of legacy feedback identifiers and CLI validation for unknown pattern_key values to a follow-up PR.

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/advisory_engine/dashboard.py (1)

70-74: ⚠️ Potential issue | 🟠 Major

Naive datetime fallback breaks comparison with timezone-aware cutoff.

When a feedback entry lacks a timestamp, the fallback "1970-01-01" produces a naive datetime, while recent_cutoff is timezone-aware. Comparing naive and aware datetimes raises a TypeError in Python.

🐛 Proposed fix to use a timezone-aware fallback
         recent_cutoff = datetime.now(timezone.utc) - timedelta(days=7)
         recent_feedback = [
             f for f in feedback 
-            if datetime.fromisoformat(f.get("timestamp", "1970-01-01")) > recent_cutoff
+            if datetime.fromisoformat(f.get("timestamp", "1970-01-01T00:00:00+00:00")) > recent_cutoff
         ]
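The failure mode this fix guards against can be reproduced in isolation, independently of the project code:

```python
# Standalone repro of the naive-vs-aware comparison error: fromisoformat()
# returns a naive datetime when the string carries no offset.
from datetime import datetime, timezone

naive = datetime.fromisoformat("1970-01-01")                 # no tzinfo
aware = datetime.fromisoformat("1970-01-01T00:00:00+00:00")  # offset-aware

assert naive.tzinfo is None
assert aware.tzinfo is not None

try:
    naive > datetime.now(timezone.utc)
except TypeError as e:
    # e.g. "can't compare offset-naive and offset-aware datetimes"
    print(f"comparison failed: {e}")
```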
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/advisory_engine/dashboard.py` around lines 70 - 74, The code filters
feedback using recent_cutoff (timezone-aware) but uses datetime.fromisoformat
with a naive fallback "1970-01-01", causing TypeError when comparing naive and
aware datetimes; update the fallback to a timezone-aware ISO string or ensure
timestamps parsed are made aware (e.g., use a UTC-aware fallback like
"1970-01-01T00:00:00+00:00" or attach timezone info after parsing) in the
recent_feedback comprehension that calls
datetime.fromisoformat(f.get("timestamp", "1970-01-01")) so all parsed datetimes
are timezone-aware before comparing to recent_cutoff.
♻️ Duplicate comments (1)
src/advisory_engine/core.py (1)

300-308: ⚠️ Potential issue | 🟡 Minor

Guard against storing None as pattern key.

If both pattern_key and advice_title are None, key will be None and stored in feedback data. This creates orphaned records that won't match any advice in _refine_with_learning.

🛡️ Proposed fix to validate key
     def record_feedback(self, pattern_key: str = None, helpful: bool = False, comments: str = "", advice_title: str = None) -> None:
         """Record feedback on advice for learning loop. Accepts pattern_key (preferred) or advice_title (legacy)."""
         # Backward compatibility: accept advice_title as alias for pattern_key
         key = pattern_key if pattern_key is not None else advice_title
+        if key is None:
+            logger.warning("record_feedback called without pattern_key or advice_title; skipping")
+            return
         feedback_data = {
             "pattern": key,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/advisory_engine/core.py` around lines 300 - 308, The record_feedback
function currently allows a None pattern key (key derived from pattern_key or
advice_title) which would store "pattern": None; update record_feedback to
validate the resolved key (the local variable key) before building
feedback_data: if key is None or empty, raise a ValueError (or return early)
with a clear message; ensure any callers expect this change. Reference the
record_feedback method and ensure consistency with _refine_with_learning which
expects non-None pattern keys.
🧹 Nitpick comments (4)
src/advisory_engine/core.py (2)

47-52: Consider narrowing exception types.

Catching broad Exception works for resilience but can mask unexpected errors. Narrowing to (json.JSONDecodeError, OSError) would catch file/parsing issues while letting programming errors propagate.

♻️ Proposed narrower exception handling
             try:
                 with open(self.config_path, 'r') as f:
                     return json.load(f)
-            except Exception as e:
+            except (json.JSONDecodeError, OSError) as e:
                 logger.warning(f"Failed to load security patterns from {self.config_path}: {e}")

The same applies to _load_learning_data at lines 103-107.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/advisory_engine/core.py` around lines 47 - 52, Narrow the broad except
Exception in the config loading and learning-data loading paths to only handle
file/parse errors: catch json.JSONDecodeError and OSError (or IOError) instead
of Exception around the block that opens and json.load(self.config_path) (the
code referencing self.config_path and calling _get_default_patterns()), and do
the same in the _load_learning_data method so programming errors still propagate
while file/JSON issues are handled gracefully.

300-301: Use explicit Optional or union type for nullable parameters.

Per PEP 484, pattern_key: str = None implicitly allows None but the type hint says str. Use str | None (Python 3.10+) or Optional[str] for clarity.

♻️ Proposed fix
-    def record_feedback(self, pattern_key: str = None, helpful: bool = False, comments: str = "", advice_title: str = None) -> None:
+    def record_feedback(self, pattern_key: str | None = None, helpful: bool = False, comments: str = "", advice_title: str | None = None) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/advisory_engine/core.py` around lines 300 - 301, The function signature
for record_feedback uses parameters like pattern_key: str = None and
advice_title: str = None which declare nullable defaults but keep the type as
plain str; update these annotations to explicit nullable types (e.g.,
pattern_key: Optional[str] or pattern_key: str | None and advice_title:
Optional[str] or advice_title: str | None depending on project Python/version)
and add the necessary typing import (from typing import Optional) if using
Optional, ensuring the signature and any docstrings remain consistent; adjust
any callers or type checks if needed to satisfy static type checkers.
src/advisory_engine/dashboard.py (1)

19-24: Add safe JSON loading for consistency with core.py.

The PR objective mentions wrapping config and learning data loading in try/except for resilience, but this method lacks error handling. If learning_data.json contains malformed JSON, this will raise an unhandled exception.

♻️ Proposed fix to add error handling
     def _load_learning_data(self) -> Dict:
         """Load learning data."""
         if os.path.exists(self.learning_data_path):
-            with open(self.learning_data_path, 'r') as f:
-                return json.load(f)
+            try:
+                with open(self.learning_data_path, 'r') as f:
+                    return json.load(f)
+            except (json.JSONDecodeError, OSError) as e:
+                import logging
+                logging.getLogger(__name__).warning(
+                    f"Failed to load learning data from {self.learning_data_path}: {e}"
+                )
         return {"patterns": [], "feedback": [], "intents": []}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/advisory_engine/dashboard.py` around lines 19 - 24, The
_load_learning_data method should wrap the file read + json.load in a try/except
like core.py to handle malformed JSON and I/O errors: modify _load_learning_data
to catch exceptions around opening/reading/parsing self.learning_data_path, log
the error (including exception details) using the component logger, and return
the default {"patterns": [], "feedback": [], "intents": []} on failure; keep the
existing os.path.exists check but ensure any json.JSONDecodeError or OSError
doesn't propagate.
src/blt_preflight.py (1)

54-67: Include pattern_key in JSON output for feedback usability.

The JSON output omits pattern_key, making it harder for users to know what value to pass to the feedback command. Including it would improve the feedback workflow.

♻️ Proposed fix to include pattern_key
         json_output = {
             'advice': [
                 {
                     'severity': a.severity,
                     'title': a.title,
                     'message': a.message,
                     'recommendations': a.recommendations,
                     'documentation_links': a.documentation_links,
-                    'timestamp': a.timestamp
+                    'timestamp': a.timestamp,
+                    'pattern_key': a.pattern_key
                 }
                 for a in advice_list
             ]
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blt_preflight.py` around lines 54 - 67, The JSON builder for args.json
currently serializes fields from each advice object in the list but omits
pattern_key; update the construction of json_output (the comprehension that
iterates over advice_list) to include 'pattern_key': a.pattern_key for each item
so the exported JSON contains the pattern_key users need for the feedback
command; ensure you reference the existing keys (severity, title, message,
recommendations, documentation_links, timestamp) and add pattern_key alongside
them in the same dict produced in that comprehension.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/advisory_engine/dashboard.py`:
- Around line 70-74: The code filters feedback using recent_cutoff
(timezone-aware) but uses datetime.fromisoformat with a naive fallback
"1970-01-01", causing TypeError when comparing naive and aware datetimes; update
the fallback to a timezone-aware ISO string or ensure timestamps parsed are made
aware (e.g., use a UTC-aware fallback like "1970-01-01T00:00:00+00:00" or attach
timezone info after parsing) in the recent_feedback comprehension that calls
datetime.fromisoformat(f.get("timestamp", "1970-01-01")) so all parsed datetimes
are timezone-aware before comparing to recent_cutoff.

---

Duplicate comments:
In `@src/advisory_engine/core.py`:
- Around line 300-308: The record_feedback function currently allows a None
pattern key (key derived from pattern_key or advice_title) which would store
"pattern": None; update record_feedback to validate the resolved key (the local
variable key) before building feedback_data: if key is None or empty, raise a
ValueError (or return early) with a clear message; ensure any callers expect
this change. Reference the record_feedback method and ensure consistency with
_refine_with_learning which expects non-None pattern keys.

---

Nitpick comments:
In `@src/advisory_engine/core.py`:
- Around line 47-52: Narrow the broad except Exception in the config loading and
learning-data loading paths to only handle file/parse errors: catch
json.JSONDecodeError and OSError (or IOError) instead of Exception around the
block that opens and json.load(self.config_path) (the code referencing
self.config_path and calling _get_default_patterns()), and do the same in the
_load_learning_data method so programming errors still propagate while file/JSON
issues are handled gracefully.
- Around line 300-301: The function signature for record_feedback uses
parameters like pattern_key: str = None and advice_title: str = None which
declare nullable defaults but keep the type as plain str; update these
annotations to explicit nullable types (e.g., pattern_key: Optional[str] or
pattern_key: str | None and advice_title: Optional[str] or advice_title: str |
None depending on project Python/version) and add the necessary typing import
(from typing import Optional) if using Optional, ensuring the signature and any
docstrings remain consistent; adjust any callers or type checks if needed to
satisfy static type checkers.

In `@src/advisory_engine/dashboard.py`:
- Around line 19-24: The _load_learning_data method should wrap the file read +
json.load in a try/except like core.py to handle malformed JSON and I/O errors:
modify _load_learning_data to catch exceptions around opening/reading/parsing
self.learning_data_path, log the error (including exception details) using the
component logger, and return the default {"patterns": [], "feedback": [],
"intents": []} on failure; keep the existing os.path.exists check but ensure any
json.JSONDecodeError or OSError doesn't propagate.

In `@src/blt_preflight.py`:
- Around line 54-67: The JSON builder for args.json currently serializes fields
from each advice object in the list but omits pattern_key; update the
construction of json_output (the comprehension that iterates over advice_list)
to include 'pattern_key': a.pattern_key for each item so the exported JSON
contains the pattern_key users need for the feedback command; ensure you
reference the existing keys (severity, title, message, recommendations,
documentation_links, timestamp) and add pattern_key alongside them in the same
dict produced in that comprehension.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 4ed11015-a261-43ec-a922-0d1f97062e5d

📥 Commits

Reviewing files that changed from the base of the PR and between 94aa0a1 and c2c3540.

📒 Files selected for processing (3)
  • src/advisory_engine/core.py
  • src/advisory_engine/dashboard.py
  • src/blt_preflight.py

@Kunal241207
Author

@DonnieBLT could you review this please?

3 participants