
port leaderboard tracking from BLT-Pool#50

Open
BharathC0 wants to merge 3 commits into OWASP-BLT:main from BharathC0:feat/port-leaderboard-tracking-from-blt-pool

Conversation


@BharathC0 BharathC0 commented Apr 4, 2026

Ports the D1-backed leaderboard scoring, tracking, backfill, and comment-formatting logic out of BLT-Pool into BLT-Rewards as part of the architectural split described in OWASP-BLT/BLT-Pool#83.

src/leaderboard.py contains:

  • D1 schema management (ensure_leaderboard_schema)
  • Per-event tracking: track_pr_opened, track_pr_closed, track_pr_reopened, track_comment, track_review
  • Monthly stat helpers: inc_open_pr, inc_monthly
  • Full backfill: get_backfill_state, set_backfill_state, run_incremental_backfill, backfill_repo_month_if_needed
  • Leaderboard calculation: calculate_stats_from_d1, calculate_leaderboard_stats, reset_leaderboard_month
  • Comment formatting: format_leaderboard_comment, format_reviewer_leaderboard_comment, post_or_update_leaderboard

Summary by CodeRabbit

  • New Features
    • Added monthly contribution leaderboards tracking merged PRs, closed PRs, reviews, and comments per contributor.
    • Introduced reviewer leaderboards with ranking, medals, and current PR reviewer highlights.
    • Leaderboards now post as GitHub comments with personalized author highlighting, centered views, and scoring breakdowns.


owasp-blt bot commented Apr 4, 2026

📊 Monthly Leaderboard

Hi @BharathC0! Here's how you rank for April 2026:

| Rank | User | Open PRs | PRs (merged) | PRs (closed) | Reviews | Comments | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | e-esakman (@e-esakman) | 1 | 0 | 0 | 10 | 0 | 51 |
| 9 | BharathC0 (@BharathC0) | 6 | 1 | 2 | 0 | 15 | 42 |
| 10 | tejeshvenkat (@tejeshvenkat) | 15 | 0 | 0 | 3 | 0 | 30 |

Scoring this month (across OWASP-BLT org): Open PRs (+1 each), Merged PRs (+10), Closed (not merged) (−2), Reviews (+5; first two per PR in-month), Comments (+2, excludes CodeRabbit). Run /leaderboard on any issue or PR to see your rank!
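The scoring rules above condense into a small function. This is a hypothetical re-implementation for illustration only; the production weights live inside calculate_stats_from_d1 in src/leaderboard.py:

```python
# Hypothetical sketch of the scoring rules quoted above; not the module's
# actual code. Weights: open +1, merged +10, closed-not-merged -2,
# review +5 (first two per PR in-month), comment +2 (CodeRabbit excluded).
def monthly_score(open_prs: int, merged_prs: int, closed_prs: int,
                  reviews: int, comments: int) -> int:
    """Return the leaderboard total for one contributor."""
    return (open_prs * 1
            + merged_prs * 10
            + closed_prs * -2
            + reviews * 5
            + comments * 2)

# BharathC0's row in the table above: 6 + 10 - 4 + 0 + 30
print(monthly_score(6, 1, 2, 0, 15))  # → 42
```

The totals in the table check out against these weights (51 for @e-esakman, 42 for @BharathC0, 30 for @tejeshvenkat).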


owasp-blt bot commented Apr 4, 2026

👋 Hi @BharathC0!

This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:

  • The PR author
  • coderabbitai
  • copilot

Once a valid peer review is submitted, this check will pass automatically. Thank you!

⚠️ Peer review enforcement is active.


coderabbitai bot commented Apr 4, 2026

Walkthrough

A new src/leaderboard.py module implements a monthly GitHub-style leaderboard backed by Cloudflare D1. It adds UTC month utilities, GitHub timestamp parsing, D1 helpers, schema bootstrap, PR/comment/review event trackers, backfill state, aggregation, and markdown formatting for leaderboard comments.

Changes

Cohort / File(s): New Leaderboard Module (src/leaderboard.py)

Summary: Adds a comprehensive monthly leaderboard implementation: UTC month utilities (month_key, month_window), GitHub timestamp/avatar helpers, D1 binding and query wrappers, schema initialization (ensure_leaderboard_schema), event handlers (track_pr_opened, track_pr_closed, track_pr_reopened, track_comment, track_review), atomic review crediting, open-PR counters, backfill cursor persistence, aggregation (calculate_stats_from_d1), reset helpers, and markdown formatters for author and reviewer leaderboard comments. Includes helpers for D1 column checks and admin operations. Highlights: bot/user filtering, per-PR reviewer credit limits, and incremental backfill state.
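As a sketch of the UTC month utilities the walkthrough names: the function names (month_key, month_window) come from the summary above, but the bodies here are assumptions, not the module's actual code.

```python
import calendar
import time
from typing import Tuple

def month_key(ts: int) -> str:
    """Format a UTC epoch timestamp as a YYYY-MM key (assumed behavior)."""
    t = time.gmtime(ts)
    return f"{t.tm_year:04d}-{t.tm_mon:02d}"

def month_window(mk: str) -> Tuple[int, int]:
    """Return the [start, end) UTC epoch window for a YYYY-MM key."""
    year, month = (int(part) for part in mk.split("-"))
    start = calendar.timegm((year, month, 1, 0, 0, 0, 0, 0, 0))
    # Roll over to the first second of the following month.
    year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = calendar.timegm((year, month, 1, 0, 0, 0, 0, 0, 0))
    return start, end
```

Note that under these definitions month_key(0) yields "1970-01", which is exactly the mis-attribution the review below flags when the timestamp parser returns 0 for bad input.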

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GitHub as GitHub (webhook)
    participant Worker as Cloudflare Worker (leaderboard.py)
    participant D1 as Cloudflare D1
    participant GHAPI as GitHub API (comments)
    GitHub->>Worker: PR/comment/review event
    Worker->>Worker: parse payload, is_bot checks, determine month_key
    Worker->>D1: ensure schema / bind DB
    Worker->>D1: upsert PR state / inc open PRs / inc monthly counters / insert review credit
    D1-->>Worker: success/rows affected
    Worker->>GHAPI: post formatted leaderboard comment (if invoked)
    GHAPI-->>Worker: comment posted
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

quality: high

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'port leaderboard tracking from BLT-Pool' accurately describes the main change: introducing new leaderboard tracking functionality ported from another project. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.89%, which meets the required threshold of 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
src/leaderboard.py (5)

334-387: Unused import: console is imported but never used.

Line 336 has `from js import console`, but there are no console.log or console.error calls in track_pr_closed. The same issue appears in track_pr_reopened at line 391.

♻️ Remove unused imports
 async def track_pr_closed(payload: dict, env, is_bot_fn, d1_binding_fn) -> None:
     """Record a PR-closed event in D1 and update merged/closed monthly counters."""
-    from js import console
     db = d1_binding_fn(env)
 async def track_pr_reopened(payload: dict, env, is_bot_fn, d1_binding_fn) -> None:
     """Reverse any closed-state counters when a PR is reopened."""
-    from js import console
     db = d1_binding_fn(env)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 334 - 387, Remove the unused JavaScript
console import inside the affected functions: delete the "from js import
console" line found in track_pr_closed and the similar import in
track_pr_reopened in src/leaderboard.py; leaving no other changes to those
functions (or, if you actually intend to log, replace the import with a used
logging call) so there are no unused imports referenced by the functions
track_pr_closed and track_pr_reopened.

616-659: Static analysis false positive: table names are hardcoded literals.

The SQL injection warning (S608) on line 627 is a false positive, since `table` only takes values from a hardcoded list of table names. For additional safety, and to silence the linter, you could apply the same _VALID_TABLES whitelist suggested earlier:

♻️ Optional: Add assertion for defense-in-depth
     for table, params in [
         ("leaderboard_monthly_stats", (org, mk)),
         ...
     ]:
+        assert table in _VALID_TABLES, f"Invalid table: {table}"
         try:
             await d1_run(db, f"DELETE FROM {table} WHERE org = ? AND month_key = ?", params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 616 - 659, The loop in
reset_leaderboard_month builds SQL with an interpolated table name which tripped
an S608 linter; ensure table is validated against a whitelist (e.g.
_VALID_TABLES) before using it in the f-string: check that table in
_VALID_TABLES (or raise/skip and record an error in deleted) and only then call
d1_run with f"DELETE FROM {table} ..."; if _VALID_TABLES is not defined, add it
as a module-level tuple/list of the expected literal table names used here to
make the intent explicit.

481-510: The existence check before INSERT creates a TOCTOU window, but the atomic INSERT guards correctness.

The check at lines 481-490 followed by the INSERT at 491-503 has a race window—however, since the INSERT itself contains WHERE ... COUNT(*) < 2, the 2-reviewer limit is still enforced atomically. The pre-check is just an optimization to skip the INSERT for known duplicates.

Consider logging the exception when reading rowsWritten (lines 507-508) to aid debugging if the Workers runtime changes its API.

♻️ Log rather than silently pass
     try:
         rows_written = int(getattr(result, "rowsWritten", None) or 0)
-    except Exception:
-        pass
+    except Exception as e:
+        console.warn(f"[D1] Could not read rowsWritten: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 481 - 510, The silent except around reading
rowsWritten should log the error for visibility: in leaderboard.py, inside the
block that sets rows_written from getattr(result, "rowsWritten", None) (the
try/except around rows_written = int(...)), replace the bare except/pass with
logging of the exception (e.g., logger.exception or logging.exception) and
include contextual info (the result object,
org/repo/pr_number/month_key/reviewer_login if available) so you can debug if
the Workers runtime changes its API; keep the outer behavior the same (don’t
re-raise).
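The atomic guard this comment describes can be demonstrated against SQLite, the dialect D1 speaks. The table layout and column names below are assumptions for illustration; the point is the INSERT ... SELECT ... WHERE COUNT(*) < 2 pattern that enforces the 2-reviewer limit without a separate read.

```python
import sqlite3

# Illustrative schema only; column names are assumed, not taken from the PR.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE leaderboard_review_credits (
        org TEXT, repo TEXT, pr_number INTEGER,
        month_key TEXT, reviewer_login TEXT
    )
""")

def credit_review(org, repo, pr, mk, reviewer) -> bool:
    """Insert a review credit only while the PR has fewer than 2 credits.

    The COUNT(*) subquery runs inside the same statement as the INSERT,
    so the limit holds even if two events race past a pre-check.
    """
    cur = conn.execute(
        """
        INSERT INTO leaderboard_review_credits
        SELECT ?, ?, ?, ?, ?
        WHERE (SELECT COUNT(*) FROM leaderboard_review_credits
               WHERE org = ? AND repo = ? AND pr_number = ? AND month_key = ?) < 2
        """,
        (org, repo, pr, mk, reviewer, org, repo, pr, mk),
    )
    conn.commit()
    return cur.rowcount == 1  # plays the role of D1's rowsWritten

print(credit_review("OWASP-BLT", "BLT-Rewards", 50, "2026-04", "alice"))  # True
print(credit_review("OWASP-BLT", "BLT-Rewards", 50, "2026-04", "bob"))    # True
print(credit_review("OWASP-BLT", "BLT-Rewards", 50, "2026-04", "carol"))  # False
```

The third call inserts nothing, so the caller can tell from the write count alone that no credit was granted.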

147-231: Schema creation is called on every event—consider caching.

ensure_leaderboard_schema is invoked from every tracking function (track_pr_opened, track_pr_closed, etc.). While CREATE TABLE IF NOT EXISTS is idempotent, repeatedly executing 6 DDL statements per webhook event adds unnecessary latency and database load.

♻️ Suggested approach: module-level flag
+_schema_ensured = False
+
 async def ensure_leaderboard_schema(db) -> None:
     """Create leaderboard tables if they do not exist."""
+    global _schema_ensured
+    if _schema_ensured:
+        return
     await d1_run(
         db,
         """
         CREATE TABLE IF NOT EXISTS leaderboard_monthly_stats (
         ...
     )
+    _schema_ensured = True

Note: In a Workers environment, the flag resets per isolate, but it still reduces redundant calls within a single request/invocation batch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 147 - 231, The ensure_leaderboard_schema
function is currently running 6 CREATE TABLE IF NOT EXISTS statements on every
event (called from track_pr_opened, track_pr_closed, etc.); add a module-level
boolean flag (e.g., _leaderboard_schema_ensured) that is checked at the top of
ensure_leaderboard_schema and set to True after the first successful run so
subsequent calls return immediately; keep the flag simple (Workers will reset
per isolate) and ensure it gates the existing d1_run calls inside
ensure_leaderboard_schema to avoid redundant DDL on every webhook.

91-123: Silent exception handling in d1_all makes debugging difficult.

The try-except-pass blocks (lines 104-105, 122-123) silently swallow errors. Consider logging unexpected failures to aid troubleshooting in production.

♻️ Suggested improvement
     try:
         from js import JSON as JS_JSON
         js_json = JS_JSON.stringify(raw_result)
         parsed = json.loads(str(js_json))
         rows = parsed.get("results") if isinstance(parsed, dict) else None
         if isinstance(rows, list):
             return rows
-    except Exception:
-        pass
+    except Exception as e:
+        from js import console
+        console.warn(f"[D1.all] JSON conversion fallback triggered: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 91 - 123, The function d1_all silently
swallows exceptions in two places (the JSON parsing try/except around raw_result
and the getattr/result handling try/except), which hinders debugging; modify
d1_all to import and use a logger (e.g., import logging; logger =
logging.getLogger(__name__)) and replace the bare "except Exception: pass"
blocks with logging.exception or logger.error calls that include context
(mentioning raw_result/result and the SQL or params if available) while
preserving the existing control flow and return behavior; update the handlers
around the JSON block (working with raw_result and js_json) and around the
getattr(result, "results", None) block so unexpected errors are logged with
stack traces instead of being silently ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/leaderboard.py`:
- Around line 132-142: The PRAGMA call in d1_has_column interpolates table_name
directly and risks SQL injection; replace that by validating table_name against
a whitelist of allowed table names (or at minimum a strict regex like
^[A-Za-z0-9_]+$) before calling d1_all, and fail/raise if validation does not
pass. Keep the query call to d1_all(d1_all(db, f"PRAGMA
table_info({table_name})")) but only after verifying table_name, and document
the validation step; reference the d1_has_column function and d1_all helper when
adding the validation.
- Around line 38-49: parse_github_timestamp currently returns 0 for
invalid/empty inputs which causes events to be attributed to "1970-01"; change
parse_github_timestamp to return Optional[int] and return None (not 0) on
invalid/empty input and update its docstring accordingly, and then update
callers (e.g., the code that calls month_key(event_ts)) to explicitly fall back
to the current time (time.time()) when parse_github_timestamp returns None;
reference the parse_github_timestamp function and the month_key usage so you can
locate and update both the parser and its callers to handle None fallback.
- Around line 52-58: The alt attribute in avatar_img_tag uses raw login which
can allow HTML injection; update avatar_img_tag to HTML-escape the login for the
alt (and preferably any other HTML contexts) before embedding it in the returned
string (e.g., use html.escape on the login value), while keeping the URL-encoded
safe_login for the src; ensure you reference the avatar_img_tag function and
replace alt="{login}" with the escaped variant.
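Two of the fixes above, the Optional-returning timestamp parser and the escaped alt attribute, can be sketched together. Signatures and bodies here are assumptions for illustration; the real functions live in src/leaderboard.py, and the avatar URL pattern is assumed.

```python
import calendar
import html
import time
import urllib.parse
from typing import Optional

def parse_github_timestamp(value: Optional[str]) -> Optional[int]:
    """Parse an ISO-8601 GitHub timestamp; return None (not 0) when invalid."""
    if not value:
        return None
    try:
        return calendar.timegm(time.strptime(value, "%Y-%m-%dT%H:%M:%SZ"))
    except ValueError:
        return None

def avatar_img_tag(login: str, size: int = 20) -> str:
    """URL-encode the login for src; HTML-escape it for the alt text."""
    safe_login = urllib.parse.quote(login, safe="")
    return (f'<img src="https://github.com/{safe_login}.png?size={size}" '
            f'width="{size}" alt="{html.escape(login, quote=True)}" />')

# Callers fall back to "now" explicitly instead of letting 0 map to 1970-01:
ts = parse_github_timestamp("not-a-timestamp")
event_ts = ts if ts is not None else int(time.time())
```

The explicit `is not None` check matters: a bare `ts or fallback` would also discard a legitimate zero, which is why the review asks for None rather than 0 as the sentinel.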

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9c5d4275-45ac-4e9e-8ba2-56e4bffa4881

📥 Commits

Reviewing files that changed from the base of the PR and between 99e6b27 and b579570.

📒 Files selected for processing (1)
  • src/leaderboard.py

BharathC0 and others added 2 commits April 4, 2026 07:49
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@Jayant2908

@coderabbitai review


coderabbitai bot commented Apr 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Nachiket-Roy

@coderabbitai resume


coderabbitai bot commented Apr 4, 2026

✅ Actions performed

Reviews resumed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
src/leaderboard.py (5)

581-587: Consider extracting scoring weights as named constants.

The scoring formula is embedded inline. Extracting weights as module-level constants would improve readability and make the scoring system easier to adjust:

✨ Suggested refactor
+# Scoring weights for monthly leaderboard
+SCORE_OPEN_PR = 1
+SCORE_MERGED_PR = 10
+SCORE_CLOSED_PR = -2  # Penalty for closed-not-merged
+SCORE_REVIEW = 5
+SCORE_COMMENT = 2

 # In calculate_stats_from_d1:
     for login in user_stats:
         s = user_stats[login]
-        s["total"] = (s["openPrs"] * 1) + (s["mergedPrs"] * 10) + (s["closedPrs"] * -2) + (s["reviews"] * 5) + (s["comments"] * 2)
+        s["total"] = (
+            s["openPrs"] * SCORE_OPEN_PR +
+            s["mergedPrs"] * SCORE_MERGED_PR +
+            s["closedPrs"] * SCORE_CLOSED_PR +
+            s["reviews"] * SCORE_REVIEW +
+            s["comments"] * SCORE_COMMENT
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 581 - 587, The scoring weights are
hard-coded inside the loop that computes s["total"]; define descriptive
module-level constants (e.g., WEIGHT_OPEN_PRS, WEIGHT_MERGED_PRS,
WEIGHT_CLOSED_PRS, WEIGHT_REVIEWS, WEIGHT_COMMENTS) and replace the inline
numeric literals in the user_stats total calculation (the loop that sets
s["total"] and the sorted_users ordering logic still referencing "total",
"mergedPrs", "reviews") to use these constants so the scoring formula is clearer
and adjustable without changing the loop.

161-245: Consider caching schema initialization and adding secondary indexes.

  1. Repeated calls: ensure_leaderboard_schema is invoked on every event (PR open/close/reopen, comment, review). While CREATE TABLE IF NOT EXISTS is safe, it adds latency to every request. Consider caching whether schema has been verified for this request/worker instance.

  2. Missing indexes: Queries filter on (org, month_key) combinations frequently. Consider adding indexes for better query performance on larger datasets:

📊 Suggested index additions
CREATE INDEX IF NOT EXISTS idx_monthly_stats_org_month 
  ON leaderboard_monthly_stats(org, month_key);

CREATE INDEX IF NOT EXISTS idx_review_credits_org_pr_month 
  ON leaderboard_review_credits(org, repo, pr_number, month_key);

CREATE INDEX IF NOT EXISTS idx_pr_state_org_closed 
  ON leaderboard_pr_state(org, closed_at);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 161 - 245, ensure_leaderboard_schema
currently runs CREATE TABLE IF NOT EXISTS on every event which adds latency and
also lacks helpful secondary indexes; modify ensure_leaderboard_schema to
short-circuit using a per-worker/per-process cached flag (e.g., module-level
boolean like schema_initialized) so the CREATE statements run only once per
worker, and add CREATE INDEX IF NOT EXISTS statements for the common filter
patterns (e.g., on leaderboard_monthly_stats(org, month_key),
leaderboard_review_credits(org, repo, pr_number, month_key), and
leaderboard_pr_state(closed_at) or org/closed_at combinations) so queries
against leaderboard_monthly_stats, leaderboard_review_credits, and
leaderboard_pr_state are optimized.

23-35: Missing input validation in month_window could raise on malformed keys.

If mk doesn't match the expected YYYY-MM format, split("-") and int() conversions will raise uncaught exceptions. Since this function is called with values from the database (e.g., in reset_leaderboard_month), defensive validation prevents propagating bad data errors.

🛡️ Suggested defensive validation
 def month_window(mk: str) -> Tuple[int, int]:
     """Return start/end timestamps (UTC) for a YYYY-MM key."""
+    if not mk or len(mk) != 7 or mk[4] != '-':
+        raise ValueError(f"Invalid month key format: {mk}")
     year, month = mk.split("-")
     y = int(year)
     m = int(month)
+    if not (1 <= m <= 12):
+        raise ValueError(f"Invalid month: {m}")
     start_struct = time.struct_time((y, m, 1, 0, 0, 0, 0, 0, 0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 23 - 35, The month_window function currently
assumes mk is "YYYY-MM" and will crash on malformed input; add defensive
validation at the start of month_window: verify mk splits into exactly two parts
(year, month), that both parts are numeric and year is 4 digits and month is 2
digits (or use a regex like ^\d{4}-\d{2}$), convert to ints inside a try/except
and check 1 <= month <= 12, and if validation/parsing fails raise a clear
ValueError (e.g., "Invalid month key: <mk>") so callers like
reset_leaderboard_month get a predictable exception rather than uncaught
TypeError/ValueError from split/int conversion.

634-646: Static analysis false positive, but consider validating against _VALID_TABLES for consistency.

The S608 warning on line 641 is a false positive since table values come from a hardcoded list. However, for consistency with the _VALID_TABLES pattern used in d1_has_column, consider validating:

🛡️ Suggested consistency improvement
     for table, params in [
         ("leaderboard_monthly_stats", (org, mk)),
         ("leaderboard_backfill_repo_done", (org, mk)),
         ("leaderboard_review_credits", (org, mk)),
         ("leaderboard_backfill_state", (org, mk)),
     ]:
+        if table not in _VALID_TABLES:
+            continue  # Safety check - should never trigger
         try:
             await d1_run(db, f"DELETE FROM {table} WHERE org = ? AND month_key = ?", params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 634 - 646, The loop deletes rows from
several hardcoded tables but triggers a static-analysis S608 warning; to be
consistent with the project's whitelist pattern, validate the table name against
the existing _VALID_TABLES before running d1_run. Update the deletion block (the
loop using variable table and calling d1_run) to check that table is in
_VALID_TABLES (or map those names to allowed names) and only then call d1_run;
if not allowed, log or set deleted[table] to an error. This mirrors the pattern
used by d1_has_column and keeps the hardcoded list but satisfies the validator.

100-126: Silent exception handling obscures potential D1 parsing failures.

The try-except-pass blocks (lines 107-108, 125-126) silently swallow all exceptions. While fallback logic is appropriate for handling varied D1 result formats, completely ignoring exceptions makes debugging difficult when parsing genuinely fails. Consider logging at debug level.

🔧 Suggested improvement
     try:
         from js import JSON as JS_JSON
         js_json = JS_JSON.stringify(raw_result)
         parsed = json.loads(str(js_json))
         rows = parsed.get("results") if isinstance(parsed, dict) else None
         if isinstance(rows, list):
             return rows
     except Exception:
-        pass
+        from js import console
+        console.log("[D1] JSON stringify/parse fallback triggered")
     result = _to_py(raw_result)
     # ... rest of function
     try:
         return list(rows)
-    except Exception:
+    except Exception as e:
+        from js import console
+        console.log(f"[D1] Could not convert rows to list: {e}")
         return []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/leaderboard.py` around lines 100 - 126, Replace the silent except-pass
blocks with debug-level logging so parsing failures are visible: in the block
that tries to use JS_JSON.stringify on raw_result (around JS_JSON, js_json,
parsed, rows) catch the Exception and call the module/logger debug with the
exception (e.g., logger.debug("D1 JS JSON parse failed for raw_result=%r",
raw_result, exc_info=True)); do the same in the getattr fallback (around result,
getattr(result, "results", None)) and in the final list(rows) conversion catch —
log the exception and context (rows/result) at debug level instead of swallowing
it. Ensure you use the existing module logger (or logging.getLogger(__name__))
and keep the current fallback behavior after logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/leaderboard.py`:
- Line 12: Remove the unused constant MAX_OPEN_PRS_PER_AUTHOR or implement the
intended enforcement; specifically either delete the MAX_OPEN_PRS_PER_AUTHOR
definition from leaderboard.py if no PR-per-author limit is required, or add
logic where open PRs are processed (e.g., in the function that aggregates or
validates author PR counts—search for functions like compute_leaderboard,
tally_open_prs, or get_open_prs_by_author) to enforce the cap by counting open
PRs per author and rejecting/flagging or truncating further PRs when the count
exceeds MAX_OPEN_PRS_PER_AUTHOR; update any tests or callers accordingly.
- Around line 365-381: The timestamp parsing for closed_at and merged_at uses
parse_github_timestamp which can return 0 for malformed inputs, causing month
attribution to fall to 1970-01; update the logic around closed_ts and event_ts
to validate parse_github_timestamp results (for example treat 0 as invalid) and
fall back to int(time.time()) when parse_github_timestamp returns 0 or other
clearly invalid values; specifically adjust the closed_ts assignment (where
parse_github_timestamp(closed_at) is used) and the event_ts computation (where
parse_github_timestamp(merged_at) is used before calling month_key) so both use
the validated timestamp value instead of blindly trusting
parse_github_timestamp.
- Around line 715-716: The code that appends the row below the author omits the
medal parameter so rank 3 won't show the 🥉; update the call inside the block
that checks if author_index < len(sorted_users) - 1 to pass the correct medal
for rank (author_index + 2) to row_for (or compute the medal via an existing
helper like medal_for and pass it), ensuring the third-place row includes the 🥉
emoji when author is at rank 2.
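The medal fix above can be sketched with a tiny helper. The name medal_for is taken from the review prompt; the body is an assumed implementation:

```python
def medal_for(rank: int) -> str:
    """Return the medal emoji for a 1-based rank; ranks past third get none."""
    return {1: "🥇", 2: "🥈", 3: "🥉"}.get(rank, "")

# When the author sits at rank 2 (author_index == 1), the row rendered
# below them is rank author_index + 2 == 3 and should carry the bronze medal.
print(medal_for(3))  # → 🥉
```

Passing medal_for(author_index + 2) into the row formatter keeps the medal logic in one place instead of repeating the rank-to-emoji mapping at each call site.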

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ac07d77-30e9-4f7b-9ce7-1e1fc276899c

📥 Commits

Reviewing files that changed from the base of the PR and between b579570 and 26cdb3c.

📒 Files selected for processing (1)
  • src/leaderboard.py

@BharathC0 (Author)

@Jayant2908 please review


Projects

Status: Ready


3 participants