
feat: Implement the documented "add_repo_metadata" functionality #2304

Open
asbach wants to merge 14 commits into The-PR-Agent:main from asbach:feature/add-repo-metadata

Conversation


@asbach asbach commented Apr 3, 2026

User description

Adds the already documented "add_repo_metadata" and "add_repo_metadata_file_list" features.

Adds support for reading additional instructions from AGENTS.md, CLAUDE.md, etc., or any other custom instructions file.

fixes #2174

If the instructions become too long, the current PR-Agent implementation would start clipping the diff rather than the metadata. I can add a sensible maximum length if required.


PR Type

Enhancement


Description

  • Implement documented add_repo_metadata feature to load custom instructions

  • Add get_repo_file() method across all git providers for file retrieval

  • Validate file paths to prevent directory traversal security attacks

  • Support content-based deduplication to prevent metadata duplication

  • Add comprehensive unit tests for metadata loading functionality


Diagram Walkthrough

flowchart LR
  A["PR Metadata Config"] -->|add_repo_metadata enabled| B["apply_repo_settings()"]
  B -->|validate paths| C["_is_safe_repo_file_path()"]
  C -->|safe paths| D["git_provider.get_repo_file()"]
  D -->|fetch from head branch| E["Metadata Files<br/>AGENTS.md, QODO.md, etc"]
  E -->|combine content| F["extra_instructions"]
  F -->|append to all tools| G["pr_reviewer, pr_description,<br/>pr_code_suggestions, etc"]
  C -->|unsafe paths| H["Skip & Log Warning"]

File Walkthrough

Relevant files

Enhancement (10 files)
• utils.py: Add metadata loading and path validation logic (+87/-0)
• git_provider.py: Add abstract get_repo_file method to base class (+13/-0)
• github_provider.py: Implement get_repo_file for GitHub provider (+13/-0)
• gitlab_provider.py: Implement get_repo_file for GitLab provider (+13/-0)
• bitbucket_provider.py: Implement get_repo_file for Bitbucket provider (+17/-0)
• bitbucket_server_provider.py: Implement get_repo_file for Bitbucket Server provider (+13/-0)
• azuredevops_provider.py: Implement get_repo_file for Azure DevOps provider (+23/-0)
• gitea_provider.py: Implement get_repo_file for Gitea provider (+18/-0)
• codecommit_provider.py: Implement get_repo_file for CodeCommit provider (+8/-0)
• gerrit_provider.py: Implement get_repo_file for Gerrit provider (+7/-0)

Configuration changes (1 file)
• configuration.toml: Add configuration options for metadata feature (+2/-0)

Tests (1 file)
• test_repo_metadata.py: Add comprehensive unit tests for metadata feature (+215/-0)

@asbach asbach marked this pull request as ready for review April 3, 2026 07:40
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Implement add_repo_metadata feature to inject repository metadata files into tool instructions

✨ Enhancement


Walkthroughs

Description
• Adds get_repo_file() method to all git providers for reading arbitrary repository files
• Implements add_repo_metadata feature to load metadata files (AGENTS.md, QODO.md, CLAUDE.md) from the PR's head branch
• Injects loaded metadata content into extra_instructions for all supported tools
• Includes comprehensive unit tests for metadata loading functionality
Diagram
flowchart LR
  A["PR's Head Branch"] -->|get_repo_file| B["Git Providers"]
  B -->|Read Metadata Files| C["AGENTS.md, QODO.md, CLAUDE.md"]
  C -->|Combine Content| D["Metadata Content"]
  D -->|Append to| E["Tool extra_instructions"]
  E -->|Applied to| F["All PR Tools"]


File Changes

1. pr_agent/git_providers/git_provider.py ✨ Enhancement +13/-0

Add abstract get_repo_file method to base class

pr_agent/git_providers/git_provider.py


2. pr_agent/git_providers/github_provider.py ✨ Enhancement +8/-0

Implement get_repo_file for GitHub provider

pr_agent/git_providers/github_provider.py


3. pr_agent/git_providers/gitlab_provider.py ✨ Enhancement +9/-0

Implement get_repo_file for GitLab provider

pr_agent/git_providers/gitlab_provider.py


4. pr_agent/git_providers/bitbucket_provider.py ✨ Enhancement +12/-0

Implement get_repo_file for Bitbucket provider

pr_agent/git_providers/bitbucket_provider.py


5. pr_agent/git_providers/bitbucket_server_provider.py ✨ Enhancement +10/-0

Implement get_repo_file for Bitbucket Server provider

pr_agent/git_providers/bitbucket_server_provider.py


6. pr_agent/git_providers/azuredevops_provider.py ✨ Enhancement +15/-0

Implement get_repo_file for Azure DevOps provider

pr_agent/git_providers/azuredevops_provider.py


7. pr_agent/git_providers/gitea_provider.py ✨ Enhancement +13/-0

Implement get_repo_file for Gitea provider

pr_agent/git_providers/gitea_provider.py


8. pr_agent/git_providers/codecommit_provider.py ✨ Enhancement +7/-0

Implement get_repo_file for CodeCommit provider

pr_agent/git_providers/codecommit_provider.py


9. pr_agent/git_providers/gerrit_provider.py ✨ Enhancement +7/-0

Implement get_repo_file for Gerrit provider

pr_agent/git_providers/gerrit_provider.py


10. pr_agent/git_providers/utils.py ✨ Enhancement +43/-0

Add metadata file loading and injection logic

pr_agent/git_providers/utils.py


11. tests/unittest/test_repo_metadata.py 🧪 Tests +146/-0

Add comprehensive unit tests for metadata feature

tests/unittest/test_repo_metadata.py


12. pr_agent/settings/configuration.toml ⚙️ Configuration changes +2/-0

Add configuration options for metadata feature

pr_agent/settings/configuration.toml




@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects bot commented Apr 3, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (4) 📎 Requirement gaps (1) 🎨 UX Issues (0)



Action required

1. add_repo_metadata_file_list not validated 📘 Rule violation ≡ Correctness ⭐ New
Description
add_repo_metadata_file_list is used directly without validating/normalizing its type/contents,
which can lead to incorrect behavior (e.g., iterating over characters if a string is provided) or
runtime errors. The checklist requires normalizing and validating user-provided settings before
using them in logic.
Code

pr_agent/git_providers/utils.py[R142-179]

+    if not repo_metadata_applied and get_settings().config.get("add_repo_metadata", False):
+        try:
+            tool_sections = [
+                "pr_reviewer",
+                "pr_description",
+                "pr_code_suggestions",
+                "pr_add_docs",
+                "pr_update_changelog",
+                "pr_test",
+                "pr_improve_component",
+            ]
+
+            # In non-context runtimes (CLI, polling), restore extra_instructions to their
+            # pre-metadata baseline so metadata from a previous PR doesn't persist.
+            global _extra_instructions_baseline
+            is_context_mode = False
+            try:
+                is_context_mode = context.exists()
+            except Exception:
+                pass
+            if not is_context_mode:
+                if _extra_instructions_baseline:
+                    # Restore baseline before applying this PR's metadata
+                    for section, baseline_value in _extra_instructions_baseline.items():
+                        get_settings().set(f"{section}.extra_instructions", baseline_value)
+                else:
+                    # First run: capture the current values as the baseline
+                    for section in tool_sections:
+                        section_obj = get_settings().get(section, None)
+                        if section_obj is not None and hasattr(section_obj, "extra_instructions"):
+                            _extra_instructions_baseline[section] = section_obj.extra_instructions or ""
+
+            metadata_files = get_settings().config.get("add_repo_metadata_file_list",
+                                                        ["AGENTS.md", "QODO.md", "CLAUDE.md"])
+
+            # Collect contents of all metadata files that exist in the repo
+            metadata_content_parts = []
+            for file_name in metadata_files:
Evidence
The code reads add_repo_metadata_file_list and immediately iterates over it without checking it is
a list of strings, violating the requirement to validate/normalize settings values before use.

pr_agent/git_providers/utils.py[142-179]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`config.add_repo_metadata_file_list` is consumed without validation. If the setting is mis-typed (e.g., a string, `None`, or a list containing non-strings), metadata loading can behave incorrectly or crash.

## Issue Context
This setting is user-provided via TOML/env/config overrides and must be normalized to a predictable shape before iterating.

## Fix Focus Areas
- pr_agent/git_providers/utils.py[142-186]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
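A minimal normalization sketch for this issue, assuming plain Python values arrive from the TOML/env configuration (the helper name `normalize_metadata_file_list` and the default list are illustrative, not part of the PR):

```python
DEFAULT_METADATA_FILES = ["AGENTS.md", "QODO.md", "CLAUDE.md"]

def normalize_metadata_file_list(value):
    """Coerce the user-provided add_repo_metadata_file_list setting into a
    predictable list of non-empty file-name strings."""
    if value is None:
        return list(DEFAULT_METADATA_FILES)
    if isinstance(value, str):
        # A bare string would otherwise be iterated character by character;
        # treat it as a comma-separated list instead.
        value = value.split(",")
    if not isinstance(value, (list, tuple)):
        # Unexpected type: fall back to the documented defaults
        return list(DEFAULT_METADATA_FILES)
    return [item.strip() for item in value if isinstance(item, str) and item.strip()]
```

Callers would then iterate the normalized result instead of the raw setting, so a mis-typed value degrades to the defaults rather than crashing.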


2. _is_safe_repo_file_path() lacks type guard 📘 Rule violation ☼ Reliability ⭐ New
Description
_is_safe_repo_file_path() calls .strip() on file_path without confirming it is a string, which
can raise AttributeError if non-string values reach it (e.g., from configuration). The checklist
requires defensive type checks before calling methods on external inputs.
Code

pr_agent/git_providers/utils.py[R20-27]

+def _is_safe_repo_file_path(file_path: str) -> bool:
+    """
+    Validate that a file path is safe to read from a repository root.
+    Rejects absolute paths, paths with '..' traversal components, backslashes,
+    and percent-encoded bypass attempts.
+    """
+    if not file_path or not file_path.strip():
+        return False
Evidence
The function assumes file_path has .strip(); without an isinstance(file_path, str) guard,
non-string inputs can crash the metadata-loading path.

pr_agent/git_providers/utils.py[20-27]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`_is_safe_repo_file_path()` calls `file_path.strip()` without first ensuring `file_path` is a string. This can raise at runtime if configuration contains non-string entries.

## Issue Context
Even though the type annotation is `str`, configuration values are external inputs and can be malformed.

## Fix Focus Areas
- pr_agent/git_providers/utils.py[20-49]



3. fromRef['latestCommit'] unguarded access 📘 Rule violation ☼ Reliability ⭐ New
Description
bitbucket_server_provider.get_repo_file() directly indexes self.pr.fromRef['latestCommit']
without checking the key exists or the shape is as expected. This can raise KeyError/TypeError
on unexpected webhook/provider payload shapes.
Code

pr_agent/git_providers/bitbucket_server_provider.py[R121-122]

+            head_sha = self.pr.fromRef['latestCommit']
+            content = self.get_file(file_path, head_sha)
Evidence
The rule requires defensive access patterns for external/provider inputs; the new code indexes a
nested provider object without .get(...) or type checks.

pr_agent/git_providers/bitbucket_server_provider.py[119-123]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`self.pr.fromRef['latestCommit']` is accessed via direct indexing, which can crash if `fromRef` is missing, not a dict, or the key is absent.

## Issue Context
Provider payloads are external inputs and may vary across Bitbucket Server versions / API responses.

## Fix Focus Areas
- pr_agent/git_providers/bitbucket_server_provider.py[119-123]



4. Bitbucket file fetch can hang 🐞 Bug ☼ Reliability ⭐ New
Description
BitbucketProvider.get_repo_file performs an HTTP GET without a timeout, so enabling
add_repo_metadata can block apply_repo_settings indefinitely on stalled connections. Because
apply_repo_settings is called before request handling logic, this can stall the whole PR-agent flow.
Code

pr_agent/git_providers/bitbucket_provider.py[R92-101]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            # Read from the PR's source branch so metadata files reflect the branch under review
+            url = (f"https://api.bitbucket.org/2.0/repositories/{self.workspace_slug}/{self.repo_slug}/src/"
+                   f"{self.pr.source_branch}/{file_path}")
+            response = requests.request("GET", url, headers=self.headers)
+            if response.status_code == 404:
+                return ""
+            response.raise_for_status()
+            return response.text
Evidence
The new get_repo_file() uses requests.request(...) without a timeout. apply_repo_settings() calls
git_provider.get_repo_file() for each configured metadata filename, so a stalled Bitbucket API call
will block startup/handling.

pr_agent/git_providers/bitbucket_provider.py[92-108]
pr_agent/git_providers/utils.py[174-186]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`BitbucketProvider.get_repo_file()` makes an outbound HTTP request without a timeout. In network stalls, this can hang indefinitely and block `apply_repo_settings()`.

### Issue Context
`apply_repo_settings()` runs before PR processing and calls `get_repo_file()` for each metadata filename, so this path is on the critical startup path when `add_repo_metadata=true`.

### Fix Focus Areas
- Add an explicit `timeout=` to the Bitbucket `requests.request("GET", ...)` call.
- Prefer a configurable timeout (or a sensible constant) and handle timeout exceptions similarly to other request errors.
- pr_agent/git_providers/bitbucket_provider.py[92-108]
- pr_agent/git_providers/utils.py[174-186]

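A sketch of the fix, assuming the standard `requests` API; the function name and timeout constant are illustrative, and the timeout could instead be read from configuration:

```python
import requests

METADATA_FETCH_TIMEOUT = 10  # seconds; hypothetical constant, could be configurable

def fetch_repo_file(url: str, headers: dict) -> str:
    """Fetch a raw repository file over HTTP, treating 404 as 'missing'
    and bounding the request so it can never block startup indefinitely."""
    try:
        response = requests.get(url, headers=headers, timeout=METADATA_FETCH_TIMEOUT)
    except requests.exceptions.Timeout:
        # A stalled Bitbucket API call should degrade to "no metadata"
        return ""
    if response.status_code == 404:
        return ""
    response.raise_for_status()
    return response.text
```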


5. Cross-PR metadata leakage 🐞 Bug ≡ Correctness
Description
In runtimes without a Starlette request context (e.g., GitHub polling), apply_repo_settings()
appends repo metadata into global Dynaconf settings and never removes it, so later PRs can inherit
extra_instructions from earlier repos/PRs. The current content-based dedup only prevents exact
duplicates and does not prevent accumulation when metadata differs across PRs.
Code

pr_agent/git_providers/utils.py[R122-167]

+    repo_metadata_applied = False
+    try:
+        repo_metadata_applied = context.get("repo_metadata_applied", False)
+    except Exception:
+        pass
+    if not repo_metadata_applied and get_settings().config.get("add_repo_metadata", False):
+        try:
+            metadata_files = get_settings().config.get("add_repo_metadata_file_list",
+                                                        ["AGENTS.md", "QODO.md", "CLAUDE.md"])
+
+            # Collect contents of all metadata files that exist in the repo
+            metadata_content_parts = []
+            for file_name in metadata_files:
+                if not _is_safe_repo_file_path(file_name):
+                    get_logger().warning(f"Skipping unsafe metadata file path: '{file_name}'")
+                    continue
+                content = git_provider.get_repo_file(file_name)
+                if content and content.strip():
+                    metadata_content_parts.append(content.strip())
+                    get_logger().info(f"Loaded repository metadata file: {file_name}")
+
+            # Append combined metadata to extra_instructions for every tool that supports it.
+            # Content-based check prevents duplication in non-context runtimes (CLI, polling).
+            if metadata_content_parts:
+                combined_metadata = "\n\n".join(metadata_content_parts)
+                tool_sections = [
+                    "pr_reviewer",
+                    "pr_description",
+                    "pr_code_suggestions",
+                    "pr_add_docs",
+                    "pr_update_changelog",
+                    "pr_test",
+                    "pr_improve_component",
+                ]
+                for section in tool_sections:
+                    section_obj = get_settings().get(section, None)
+                    if section_obj is not None and hasattr(section_obj, "extra_instructions"):
+                        existing = section_obj.extra_instructions or ""
+                        if combined_metadata in existing:
+                            continue
+                        if existing:
+                            new_value = f"{existing}\n\n{combined_metadata}"
+                        else:
+                            new_value = combined_metadata
+                        get_settings().set(f"{section}.extra_instructions", new_value)
+            # Mark as applied for this request (server mode only)
Evidence
apply_repo_settings() appends combined_metadata into each tool section’s extra_instructions via
get_settings().set(...) with only an if combined_metadata in existing guard, meaning different
metadata strings will accumulate over time. In github_polling there is no request-scoped
context["settings"] initialization, and get_settings() therefore returns the process-global
global_settings, making these mutations persistent across multiple PRs handled in the same
process.

pr_agent/git_providers/utils.py[111-173]
pr_agent/config_loader.py[47-60]
pr_agent/servers/github_polling.py[11-72]
pr_agent/servers/github_app.py[47-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`apply_repo_settings()` appends repository metadata into `extra_instructions` by mutating Dynaconf settings. In non-webhook runtimes (notably `github_polling.py`) there is no request-scoped `context["settings"]`, so `get_settings()` returns `global_settings` and the appended metadata persists across multiple PRs, causing later PRs to inherit earlier PRs’ metadata.
### Issue Context
The current guard only prevents duplication when the *exact* `combined_metadata` string is already present. This does not stop accumulation when PR A metadata differs from PR B metadata.
### Fix Focus Areas
- pr_agent/git_providers/utils.py[111-173]
- pr_agent/config_loader.py[47-60]
- pr_agent/servers/github_polling.py[11-72]
### Implementation direction
Pick one (or combine):
1) **Make metadata non-additive across PRs in non-context mode**: store the last-applied metadata string (and which sections were modified) in a global variable or in `get_settings().config` (e.g., `config._repo_metadata_last_applied`). Before appending new metadata, remove the previously applied metadata block from each section.
2) **Introduce per-PR settings isolation in polling/CLI**: ensure polling creates a per-task/per-PR settings copy (similar to webhook servers), so mutations don’t persist across PRs.
3) Add/extend unit tests to cover applying metadata for PR1 then PR2 with different metadata and assert PR2 does not contain PR1 instructions.

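Direction 1 could be reduced to a pure string helper that swaps the previously applied metadata block for the new one instead of appending unconditionally. The name and shape below are illustrative; the caller would store the last-applied string per section:

```python
def replace_metadata_block(existing: str, previous_metadata: str, new_metadata: str) -> str:
    """Return extra_instructions with the previously appended metadata block
    removed (if present) and the new block appended, so metadata from one PR
    never accumulates into the next PR's instructions."""
    base = existing
    if previous_metadata and previous_metadata in base:
        base = base.replace(previous_metadata, "").strip()
    if not new_metadata:
        return base
    return f"{base}\n\n{new_metadata}" if base else new_metadata
```

Applying PR1's metadata and then PR2's leaves only the user baseline plus PR2's block.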


6. Encoded traversal bypass 🐞 Bug ⛨ Security
Description
_is_safe_repo_file_path() validates only literal '..' segments and does not decode percent-encoded
sequences, so inputs like '%2e%2e/%2e%2e/...' can pass validation. For HTTP-based providers that
interpret percent-encoded paths, this can bypass the intended directory-traversal protection and
attempt to read unintended repo paths.
Code

pr_agent/git_providers/utils.py[R15-34]

+def _is_safe_repo_file_path(file_path: str) -> bool:
+    """
+    Validate that a file path is safe to read from a repository root.
+    Rejects absolute paths, paths with '..' traversal components, and backslashes.
+    """
+    if not file_path or not file_path.strip():
+        return False
+    # Reject absolute paths (Unix and Windows-style)
+    if os.path.isabs(file_path) or file_path.startswith("/") or file_path.startswith("\\"):
+        return False
+    if len(file_path) >= 2 and file_path[1] == ":":  # e.g. C:\...
+        return False
+    # Reject backslashes (non-standard on most git providers, potential traversal vector)
+    if "\\" in file_path:
+        return False
+    # Normalize and reject any ".." components
+    normalized = posixpath.normpath(file_path)
+    if normalized.startswith("..") or "/.." in normalized:
+        return False
+    return True
Evidence
The validator uses posixpath.normpath(file_path) and checks for .. in the normalized string, but
it never URL-decodes the input before checking, so encoded dot segments are not detected.
Bitbucket’s provider interpolates file_path directly into a URL path, making percent-encoded
traversal sequences especially relevant there.

pr_agent/git_providers/utils.py[15-34]
pr_agent/git_providers/bitbucket_provider.py[92-101]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`_is_safe_repo_file_path()` can be bypassed with percent-encoded traversal tokens (e.g. `%2e%2e/`) because it validates the raw string without URL-decoding.
### Issue Context
The metadata file list is configurable (`add_repo_metadata_file_list`). If an attacker can influence that configuration (e.g., via repo settings in a PR context), the validator should not allow encoded traversal sequences to reach HTTP-based providers.
### Fix Focus Areas
- pr_agent/git_providers/utils.py[15-34]
### Implementation direction
- Apply `urllib.parse.unquote()` (or stricter decoding) to the input before normalization and checks.
- Consider rejecting any path containing `%` after decoding/normalization if you want to avoid ambiguous encodings.
- After decoding, validate that no path segment equals `..` (e.g., split on `/` and check) rather than relying only on prefix/substring checks.

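A hardened validator along these lines might look like the following sketch: decode first, reject any residual `%` to sidestep double-encoding, then check path segments rather than substrings. This also folds in the type guard from issue 2:

```python
import posixpath
from urllib.parse import unquote

def is_safe_repo_file_path(file_path) -> bool:
    """Validator sketch: reject non-strings, absolute paths, backslashes,
    percent-encoded traversal, and any '..' path segment."""
    if not isinstance(file_path, str) or not file_path.strip():
        return False
    decoded = unquote(file_path)
    # Reject remaining percent signs to avoid double-encoding ambiguity
    if "%" in decoded:
        return False
    if decoded.startswith(("/", "\\")) or "\\" in decoded:
        return False
    if len(decoded) >= 2 and decoded[1] == ":":  # Windows drive, e.g. C:...
        return False
    # Normalize, then check whole segments instead of substring matching
    segments = posixpath.normpath(decoded).split("/")
    return ".." not in segments
```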


7. Gerrit reads local files 🐞 Bug ⛨ Security
Description
GerritProvider.get_repo_file() opens files directly from the local cloned repo directory, so
add_repo_metadata can read untracked/sensitive files (e.g., .git/config) and inject them into LLM
prompts when enabled. This is a security exposure unique to the Gerrit provider because it bypasses
“git tree only” access used by other providers.
Code

pr_agent/git_providers/gerrit_provider.py[R234-239]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            with open(self.repo_path / file_path, 'r', encoding='utf-8') as f:
+                return f.read()
+        except OSError:
+            return ""
Evidence
GerritProvider clones the repo to a temp directory (which includes a .git directory) and then
get_repo_file reads arbitrary relative paths from that on-disk checkout; apply_repo_settings uses
get_repo_file to load metadata content into extra_instructions.

pr_agent/git_providers/gerrit_provider.py[102-109]
pr_agent/git_providers/gerrit_provider.py[234-240]
pr_agent/git_providers/utils.py[111-170]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`GerritProvider.get_repo_file()` reads from the local filesystem (`open(self.repo_path / file_path, ...)`). With `add_repo_metadata` enabled, this allows reading untracked internal files (not part of the git tree), such as `.git/config`, and injecting them into prompts.
### Issue Context
Other providers fetch files via the provider API at a specific ref/sha, implicitly limiting reads to repository content. Gerrit should do the same: read the file blob from the current commit/tree, not from the working directory.
### Fix Focus Areas
- pr_agent/git_providers/gerrit_provider.py[234-240]
- pr_agent/git_providers/gerrit_provider.py[102-109]
### Suggested fix approach
- Implement `get_repo_file()` by reading from `self.repo.head.commit.tree` (or `self.repo.git.show(f"HEAD:{file_path}")`) and decoding as UTF-8.
- Treat missing paths as "" (catch `KeyError`/`GitCommandError`).
- Consider decoding with `errors="replace"` to avoid crashing on non-UTF8 content.

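The suggested approach can be sketched with a plain `git show HEAD:<path>` call, which resolves the path against the commit tree instead of the working directory, so untracked files like `.git/config` are never readable. The helper name is illustrative; the Gerrit provider uses GitPython, whose `repo.git.show(f"HEAD:{file_path}")` is equivalent:

```python
import subprocess

def get_repo_file_from_tree(repo_path: str, file_path: str) -> str:
    """Read a blob from the HEAD commit tree of a local clone.
    Paths not in the tree (including untracked files) return ""."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_path, "show", f"HEAD:{file_path}"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    except subprocess.CalledProcessError:
        # Missing path, or repository has no commits yet
        return ""
```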


8. Tests leak global settings 🐞 Bug ☼ Reliability
Description
tests/unittest/test_repo_metadata.py mutates global Dynaconf config keys
(config.add_repo_metadata and config.add_repo_metadata_file_list) but only restores
extra_instructions, which can leak configuration into other tests and make test runs
order-dependent/flaky.
Code

tests/unittest/test_repo_metadata.py[R33-50]

+@pytest.fixture(autouse=True)
+def _reset_extra_instructions():
+    """Reset extra_instructions for all tool sections before each test."""
+    tool_sections = [
+        "pr_reviewer", "pr_description", "pr_code_suggestions",
+        "pr_add_docs", "pr_update_changelog", "pr_test", "pr_improve_component",
+    ]
+    original_values = {}
+    for section in tool_sections:
+        section_obj = get_settings().get(section, None)
+        if section_obj is not None:
+            original_values[section] = getattr(section_obj, 'extra_instructions', "")
+
+    yield
+
+    for section, value in original_values.items():
+        get_settings().set(f"{section}.extra_instructions", value)
+
Evidence
When no Starlette request context exists (typical unit tests), get_settings() returns a
process-global singleton (global_settings). The new test file calls get_settings().set(...) for
config keys in multiple tests, but its autouse fixture only restores tool-section
extra_instructions, not the config keys it changes.

pr_agent/config_loader.py[12-61]
tests/unittest/test_repo_metadata.py[33-50]
tests/unittest/test_repo_metadata.py[53-76]
tests/unittest/test_repo_metadata.py[134-145]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new unit tests modify global Dynaconf config (`config.add_repo_metadata`, `config.add_repo_metadata_file_list`) and do not restore it, which can affect other tests since `get_settings()` returns a singleton outside Starlette context.
### Issue Context
`get_settings()` returns `global_settings` when `context["settings"]` is unavailable (typical in unit tests). Any `.set()` persists for the whole process.
### Fix Focus Areas
- tests/unittest/test_repo_metadata.py[33-50]
- tests/unittest/test_repo_metadata.py[53-145]
### Suggested fix
- Extend the autouse fixture to snapshot and restore:
- `config.add_repo_metadata`
- `config.add_repo_metadata_file_list`
- Optionally also clear any Starlette-context flags if present (best-effort), e.g. `context.pop("repo_metadata_applied", None)` inside a try/except.
- Ensure each test starts from a clean config baseline to avoid order dependence.

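The snapshot/restore idea generalizes to a small context manager over any object with Dynaconf-style `get`/`set`. This is a sketch; keys absent before the block are restored as `None`, since Dynaconf offers no simple per-key delete:

```python
import contextlib

@contextlib.contextmanager
def preserve_settings(settings, keys):
    """Snapshot the listed dotted-path keys and restore them on exit,
    so a test cannot leak config mutations into the global singleton."""
    snapshot = {key: settings.get(key, None) for key in keys}
    try:
        yield settings
    finally:
        for key, value in snapshot.items():
            settings.set(key, value)
```

An autouse fixture could wrap each test in `preserve_settings(get_settings(), ["config.add_repo_metadata", "config.add_repo_metadata_file_list", ...])` alongside the existing extra_instructions restore.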


9. Global metadata guard leaks 🐞 Bug ≡ Correctness
Description
In non-Starlette contexts, apply_repo_settings() falls back to a process-global
config._repo_metadata_applied flag, causing repository metadata to be loaded only once per process
and then skipped for subsequent PRs handled in the same process (e.g., polling loops). This makes
add_repo_metadata behavior incorrect across multiple PRs and can lead to missing/incorrect
extra_instructions.
Code

pr_agent/git_providers/utils.py[R119-168]

+    repo_metadata_applied = False
+    try:
+        repo_metadata_applied = context.get("repo_metadata_applied", False)
+    except Exception:
+        # No request context (e.g. CLI mode) — fall back to a flag on the settings object
+        repo_metadata_applied = get_settings().get("config._repo_metadata_applied", False)
+    if not repo_metadata_applied and get_settings().config.get("add_repo_metadata", False):
+        try:
+            metadata_files = get_settings().config.get("add_repo_metadata_file_list",
+                                                        ["AGENTS.md", "QODO.md", "CLAUDE.md"])
+
+            # Collect contents of all metadata files that exist in the repo
+            metadata_content_parts = []
+            for file_name in metadata_files:
+                if not _is_safe_repo_file_path(file_name):
+                    get_logger().warning(f"Skipping unsafe metadata file path: '{file_name}'")
+                    continue
+                content = git_provider.get_repo_file(file_name)
+                if content and content.strip():
+                    metadata_content_parts.append(content.strip())
+                    get_logger().info(f"Loaded repository metadata file: {file_name}")
+
+            # Append combined metadata to extra_instructions for every tool that supports it.
+            if metadata_content_parts:
+                combined_metadata = "\n\n".join(metadata_content_parts)
+                tool_sections = [
+                    "pr_reviewer",
+                    "pr_description",
+                    "pr_code_suggestions",
+                    "pr_add_docs",
+                    "pr_update_changelog",
+                    "pr_test",
+                    "pr_improve_component",
+                ]
+                for section in tool_sections:
+                    section_obj = get_settings().get(section, None)
+                    if section_obj is not None and hasattr(section_obj, "extra_instructions"):
+                        existing = section_obj.extra_instructions or ""
+                        if existing:
+                            new_value = f"{existing}\n\n{combined_metadata}"
+                        else:
+                            new_value = combined_metadata
+                        get_settings().set(f"{section}.extra_instructions", new_value)
+            # Mark as applied so repeated calls within the same request don't re-append
+            try:
+                context["repo_metadata_applied"] = True
+            except Exception:
+                pass
+            get_settings().set("config._repo_metadata_applied", True)
+        except Exception as e:
Evidence
apply_repo_settings() reads the guard from Starlette context when available, but when context is
missing it reads and then sets a global Dynaconf value (config._repo_metadata_applied), which
persists for the lifetime of the process. get_settings() returns a global singleton when
context["settings"] is not present, and github_polling runs without initializing per-request context
settings, so subsequent PR processing in that process will see the guard as already applied and skip
metadata loading.

pr_agent/git_providers/utils.py[119-168]
pr_agent/config_loader.py[47-61]
pr_agent/servers/github_polling.py[11-17]
pr_agent/servers/github_app.py[49-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`apply_repo_settings()` uses `config._repo_metadata_applied` as a fallback guard when Starlette context is unavailable. Because `get_settings()` then returns the global settings singleton, this flag persists across PRs in the same process and suppresses metadata loading for subsequent PRs.
### Issue Context
This guard is intended to prevent duplicate *appends* within the same request, but in non-web runtimes (e.g. polling/background scripts) it becomes cross-PR state.
### Fix Focus Areas
- pr_agent/git_providers/utils.py[111-169]
- pr_agent/config_loader.py[47-61]
### Implementation direction
- Do not use a process-global Dynaconf flag for this guard.
- Prefer a request-scoped marker (Starlette `context`) only.
- For non-context runtimes, either:
- always load metadata (and optionally de-duplicate by checking whether the combined metadata is already present in `extra_instructions`), or
- keep a per-PR guard keyed by `pr_url` in a bounded in-memory cache (not in Dynaconf global settings).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
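The "per-PR guard keyed by pr_url in a bounded in-memory cache" from the implementation direction above could be sketched as follows (the class name and eviction size are illustrative assumptions, not code from this PR):

```python
from collections import OrderedDict

class PerPRMetadataGuard:
    """Bounded in-memory guard keyed by PR URL, so the flag cannot
    leak across PRs the way a process-global Dynaconf setting does."""

    def __init__(self, max_entries: int = 256):
        self._seen = OrderedDict()
        self._max = max_entries

    def already_applied(self, pr_url: str) -> bool:
        return pr_url in self._seen

    def mark_applied(self, pr_url: str) -> None:
        self._seen[pr_url] = True
        self._seen.move_to_end(pr_url)
        while len(self._seen) > self._max:
            self._seen.popitem(last=False)  # evict the oldest PR entry
```

Because the cache is bounded, long-running polling processes cannot grow it without limit, and a new PR URL always starts with the guard unset.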


10. Azure reads merged commit 🐞 Bug ≡ Correctness
Description
AzureDevopsProvider.get_repo_file() reads files at self.pr.last_merge_commit (merge result) instead
of the PR source/head commit, so metadata may reflect the merged preview rather than the branch
under review. This is inconsistent with other providers in this PR that explicitly read from the
PR/MR source branch or head SHA.
Code

pr_agent/git_providers/azuredevops_provider.py[R177-193]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            head_sha = self.pr.last_merge_commit
+            version = GitVersionDescriptor(
+                version=head_sha.commit_id, version_type="commit"
+            ) if head_sha else None
+            contents = self.azure_devops_client.get_item_content(
+                repository_id=self.repo_slug,
+                project=self.workspace_slug,
+                download=False,
+                include_content_metadata=False,
+                include_content=True,
+                path=file_path,
+                version_descriptor=version,
+            )
+            content = list(contents)[0]
+            return content.decode("utf-8") if isinstance(content, bytes) else content
Evidence
The Azure implementation uses self.pr.last_merge_commit to build the version descriptor. In the
same provider, last_merge_commit is also used as the diff ‘head’ alongside
last_merge_target_commit, indicating it represents the merge result used for diff/merge-preview,
not the PR source commit; therefore metadata can be fetched from the wrong ref.

pr_agent/git_providers/azuredevops_provider.py[177-196]
pr_agent/git_providers/azuredevops_provider.py[215-223]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`AzureDevopsProvider.get_repo_file()` fetches repository files using `self.pr.last_merge_commit`, which corresponds to the merge-result/merge-preview commit, not the PR source/head. Repository metadata should reflect the branch under review.
### Issue Context
Other providers in this PR intentionally read from the PR/MR source branch or head SHA.
### Fix Focus Areas
- pr_agent/git_providers/azuredevops_provider.py[177-193]
### Fix approach
Fetch the file using the PR source/head commit instead of `last_merge_commit`.
- Prefer Azure PR fields that represent the source commit (e.g., `last_merge_source_commit` if available on the PR model).
- If the model doesn’t expose it directly, derive the correct commit via the PR’s source ref and latest commit and use that commit id in `GitVersionDescriptor`.
- Ensure the ref used here matches what `get_pr_branch()` identifies as the PR source branch.



11. Broad except Exception hides errors 📘 Rule violation ☼ Reliability
Description
The repository-metadata loading block catches Exception and only logs at debug level, which can
mask programming errors and make failures hard to diagnose. This violates the requirement to use
targeted exception handling and avoid broad except Exception.
Code

pr_agent/git_providers/utils.py[R92-126]

+        try:
+            metadata_files = get_settings().config.get("add_repo_metadata_file_list",
+                                                        ["AGENTS.md", "QODO.md", "CLAUDE.md"])
+
+            # Collect contents of all metadata files that exist in the repo
+            metadata_content_parts = []
+            for file_name in metadata_files:
+                content = git_provider.get_repo_file(file_name)
+                if content and content.strip():
+                    metadata_content_parts.append(content.strip())
+                    get_logger().info(f"Loaded repository metadata file: {file_name}")
+
+            # Append combined metadata to extra_instructions for every tool that supports it.
+            if metadata_content_parts:
+                combined_metadata = "\n\n".join(metadata_content_parts)
+                tool_sections = [
+                    "pr_reviewer",
+                    "pr_description",
+                    "pr_code_suggestions",
+                    "pr_add_docs",
+                    "pr_update_changelog",
+                    "pr_test",
+                    "pr_improve_component",
+                ]
+                for section in tool_sections:
+                    section_obj = get_settings().get(section, None)
+                    if section_obj is not None and hasattr(section_obj, 'extra_instructions'):
+                        existing = section_obj.extra_instructions or ""
+                        if existing:
+                            new_value = f"{existing}\n\n{combined_metadata}"
+                        else:
+                            new_value = combined_metadata
+                        get_settings().set(f"{section}.extra_instructions", new_value)
+        except Exception as e:
+            get_logger().debug(f"Failed to load repository metadata files: {e}")
Evidence
The checklist disallows broad exception handling where narrower exceptions are appropriate. The new
metadata loader wraps the entire flow in except Exception as e: and swallows errors by returning
to normal execution, potentially hiding real defects.

pr_agent/git_providers/utils.py[92-126]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The metadata-loading logic uses a broad `except Exception` and suppresses failures, which can hide real bugs and reduce debuggability.
## Issue Context
The try/except currently wraps reading config, fetching files, and updating settings in one block.
## Fix Focus Areas
- pr_agent/git_providers/utils.py[92-126]

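One way to narrow the handling is to keep the try block around only the per-file fetch (a sketch; `get_file` stands in for `git_provider.get_repo_file`, and the caught exception types are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def load_metadata_parts(get_file, file_names):
    """Collect non-empty metadata file contents, with the try block
    narrowed to the single provider call per file."""
    parts = []
    for name in file_names:
        try:
            content = get_file(name)
        except (OSError, ValueError) as e:  # expected fetch/decode failures
            logger.warning("Could not load metadata file %s: %s", name, e)
            continue
        if content and content.strip():
            parts.append(content.strip())
    return parts
```

With this shape, programming errors in the config-reading or settings-mutation code are no longer swallowed by the same handler that tolerates a missing file.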


12. GitLab.get_repo_file() swallows exceptions 📘 Rule violation ☼ Reliability
Description
GitLabProvider.get_repo_file() catches Exception broadly and returns an empty string without
logging, making failures silent and hard to debug. This violates the requirement to avoid broad
except Exception and to handle expected errors explicitly.
Code

pr_agent/git_providers/gitlab_provider.py[R800-807]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            # Read from the MR's source branch so metadata files reflect the branch under review
+            contents = self.gl.projects.get(self.id_project).files.get(
+                file_path=file_path, ref=self.mr.source_branch).decode()
+            return contents.decode("utf-8") if isinstance(contents, bytes) else contents
+        except Exception:
+            return ""
Evidence
The checklist requires targeted exception handling instead of broad except Exception. The new
GitLab implementation catches all exceptions and silently returns "", which hides root causes.

pr_agent/git_providers/gitlab_provider.py[800-807]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`GitLabProvider.get_repo_file()` uses a broad `except Exception` and suppresses errors without any logging.
## Issue Context
The call to `files.get(...).decode()` has known expected failure modes (e.g., not found / permission / API errors) that should be handled explicitly.
## Fix Focus Areas
- pr_agent/git_providers/gitlab_provider.py[800-807]

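A sketch of the narrowed GitLab handler, treating 404 as an expected miss and logging everything else (the ImportError fallback only keeps the sketch self-contained; `fetch` stands in for the python-gitlab `files.get(...).decode()` call):

```python
import logging

try:
    from gitlab.exceptions import GitlabGetError
except ImportError:  # stand-in so the sketch runs without python-gitlab
    class GitlabGetError(Exception):
        def __init__(self, error_message="", response_code=None):
            super().__init__(error_message)
            self.response_code = response_code

logger = logging.getLogger(__name__)

def get_repo_file(fetch, file_path: str) -> str:
    """404 is an expected miss and stays silent; any other GitLab API
    error is logged with context instead of being swallowed."""
    try:
        contents = fetch(file_path)
    except GitlabGetError as e:
        if getattr(e, "response_code", None) != 404:
            logger.warning("Failed to get repo file '%s': %s", file_path, e)
        return ""
    return contents.decode("utf-8") if isinstance(contents, bytes) else contents
```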


13. Github.get_repo_file() broad except 📘 Rule violation ☼ Reliability
Description
GithubProvider.get_repo_file() includes a broad except Exception fallback that suppresses
unexpected failures and logs only at debug level. This violates the requirement to avoid broad
exception handling where narrower exceptions are appropriate.
Code

pr_agent/git_providers/github_provider.py[R743-754]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            # Read from the PR's head branch so metadata files reflect the branch under review
+            contents = self.repo_obj.get_contents(file_path, ref=self.pr.head.sha).decoded_content
+            return contents.decode("utf-8") if isinstance(contents, bytes) else contents
+        except GithubException as e:
+            if e.status != 404:
+                get_logger().warning(f"Failed to get repo file '{file_path}': {e}")
+            return ""
+        except Exception as e:
+            get_logger().debug(f"Failed to get repo file '{file_path}': {e}")
+            return ""
Evidence
The checklist requires catching specific exception types and avoiding broad except Exception. The
new method catches Exception in addition to GithubException, potentially masking programming
errors and other unexpected failures.

pr_agent/git_providers/github_provider.py[743-754]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`GithubProvider.get_repo_file()` uses a broad `except Exception` that suppresses errors and may hide unexpected defects.
## Issue Context
The code already handles `GithubException` (including 404), so the remaining exceptions should be narrowed to specific expected failure modes (e.g., decode issues) and logged appropriately.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[743-754]



14. Gerrit path traversal read 🐞 Bug ⛨ Security
Description
GerritProvider.get_repo_file() opens self.repo_path / file_path without validating/normalizing the
path, allowing .. or absolute paths to escape the repo checkout. Since
add_repo_metadata_file_list is configuration-driven, this can read arbitrary local files and
inject them into AI prompts.
Code

pr_agent/git_providers/gerrit_provider.py[R234-237]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            with open(self.repo_path / file_path, 'r') as f:
+                return f.read()
Evidence
apply_repo_settings() iterates over config.add_repo_metadata_file_list and passes each entry to
git_provider.get_repo_file() without sanitization. The Gerrit provider implementation directly
opens the resulting path on disk, so path traversal sequences can escape repo_path.

pr_agent/git_providers/utils.py[91-101]
pr_agent/git_providers/gerrit_provider.py[234-239]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`GerritProvider.get_repo_file()` uses `open(self.repo_path / file_path)` with no validation. A configured file path containing `..` or an absolute path can escape the repository directory and read arbitrary local files.
### Issue Context
The new feature reads file names from `config.add_repo_metadata_file_list` and forwards them directly into provider implementations.
### Fix Focus Areas
- pr_agent/git_providers/gerrit_provider.py[234-239]
- pr_agent/git_providers/utils.py[91-101]
### Proposed fix
- Implement path validation (preferably centrally in `apply_repo_settings()` and defensively in Gerrit):
- Reject absolute paths.
- Normalize/resolve `candidate = (repo_path / file_path).resolve()` and ensure it stays within `repo_path.resolve()` (e.g., `candidate.is_relative_to(repo_root)` on Py3.9+ or manual prefix check).
- Optionally restrict to a safe subset (e.g., no backslashes on *nix, no drive letters on Windows).
- Open with an explicit encoding (`utf-8`) and error handling if needed.
- Add a unit test covering a traversal attempt like `../secrets.txt` and assert it is rejected/returns empty.

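The proposed containment check can be sketched as follows (the function name is illustrative; `relative_to` is used instead of `is_relative_to` so the sketch also runs on Python < 3.9):

```python
from pathlib import Path

def read_repo_file_safely(repo_root: str, file_path: str) -> str:
    """Reject absolute paths, resolve the candidate, and confirm it is
    still inside the repo root before opening it."""
    root = Path(repo_root).resolve()
    if Path(file_path).is_absolute():
        return ""
    candidate = (root / file_path).resolve()
    try:
        candidate.relative_to(root)  # raises ValueError if the path escaped
    except ValueError:
        return ""
    try:
        return candidate.read_text(encoding="utf-8")
    except OSError:
        return ""
```

Resolving before the containment check is what defeats `..` sequences: `repo/../secrets.txt` resolves to a path outside `root` and is rejected before any file is opened.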


15. get_repo_file() uses broad except 📘 Rule violation ☼ Reliability
Description
Several new get_repo_file() implementations catch Exception and silently return an empty string,
which can mask real failures and removes debugging context. This violates the requirement to use
targeted exception handling and preserve context when handling expected errors.
Code

pr_agent/git_providers/azuredevops_provider.py[R177-190]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            contents = self.azure_devops_client.get_item_content(
+                repository_id=self.repo_slug,
+                project=self.workspace_slug,
+                download=False,
+                include_content_metadata=False,
+                include_content=True,
+                path=file_path,
+            )
+            content = list(contents)[0]
+            return content.decode("utf-8") if isinstance(content, bytes) else content
+        except Exception:
+            return ""
Evidence
PR Compliance ID 19 forbids broad except Exception handling for expected error cases and requires
preserving context. The new get_repo_file() methods in multiple providers (and the metadata
loader) all use except Exception to swallow errors and return "", losing the exception context.

pr_agent/git_providers/azuredevops_provider.py[177-190]
pr_agent/git_providers/github_provider.py[743-750]
pr_agent/git_providers/bitbucket_provider.py[92-103]
pr_agent/git_providers/codecommit_provider.py[302-308]
pr_agent/git_providers/gitea_provider.py[626-638]
pr_agent/git_providers/utils.py[91-106]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
New code uses broad `except Exception` in `get_repo_file()` (and metadata loading) and silently returns `""`, which can hide real errors and loses debugging context.
## Issue Context
Different providers should catch the *expected* provider/HTTP/file exceptions (e.g., 404/not found) and either log with context or re-raise with exception chaining when appropriate.
## Fix Focus Areas
- pr_agent/git_providers/azuredevops_provider.py[177-190]
- pr_agent/git_providers/github_provider.py[743-750]
- pr_agent/git_providers/bitbucket_provider.py[92-103]
- pr_agent/git_providers/codecommit_provider.py[302-308]
- pr_agent/git_providers/gitea_provider.py[626-638]
- pr_agent/git_providers/utils.py[91-106]



16. Azure get_repo_file lacks head-ref 📎 Requirement gap ≡ Correctness
Description
AzureDevopsProvider.get_repo_file() fetches file content without specifying the PR head
branch/commit, so it may read metadata from the default branch instead of the PR under review. This
can prevent AGENTS.md (and other instruction files) from being correctly loaded from the PR head
branch when add_repo_metadata is enabled.
Code

pr_agent/git_providers/azuredevops_provider.py[R179-186]

+            contents = self.azure_devops_client.get_item_content(
+                repository_id=self.repo_slug,
+                project=self.workspace_slug,
+                download=False,
+                include_content_metadata=False,
+                include_content=True,
+                path=file_path,
+            )
Evidence
PR Compliance ID 7 requires loading root-level AGENTS.md (and similar instruction files) from the
PR head branch when add_repo_metadata is enabled. The Azure DevOps implementation calls
get_item_content(...) with only path=file_path and no branch/commit/version selector, unlike
other providers that explicitly use the PR head SHA/source branch.

Load and include root-level AGENTS.md from the PR head branch when add_repo_metadata is enabled
pr_agent/git_providers/azuredevops_provider.py[177-189]
pr_agent/git_providers/github_provider.py[743-747]
pr_agent/git_providers/bitbucket_provider.py[94-100]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Azure DevOps `get_repo_file()` does not specify the PR head branch/commit when fetching file content, so `add_repo_metadata` may ingest instruction files from the wrong ref.
## Issue Context
Other providers explicitly read metadata files from the PR head SHA/source branch. Azure DevOps should do the same by passing the appropriate version/commit descriptor to the API used for fetching items.
## Fix Focus Areas
- pr_agent/git_providers/azuredevops_provider.py[177-189]



17. BitbucketServer ignores PR head 🐞 Bug ≡ Correctness
Description
BitbucketServerProvider.get_repo_file() reads repository files without using the PR’s head commit,
so metadata may not correspond to the PR source revision. The provider already has a commit-specific
helper and also computes head_sha, but get_repo_file doesn’t use either.
Code

pr_agent/git_providers/bitbucket_server_provider.py[R119-127]

+    def get_repo_file(self, file_path: str) -> str:
+        try:
+            content = self.bitbucket_client.get_content_of_file(self.workspace_slug, self.repo_slug, file_path)
+            return content.decode("utf-8") if isinstance(content, bytes) else content
+        except Exception as e:
+            if isinstance(e, HTTPError) and e.response.status_code == 404:
+                return ""
+            get_logger().error(f"Failed to load {file_path} file, error: {e}")
+            return ""
Evidence
get_repo_file() calls get_content_of_file(..., file_path) without a commit id. The same provider
defines get_file(path, commit_id) which calls get_content_of_file(..., path, commit_id), and
get_diff_files() computes head_sha = self.pr.fromRef['latestCommit'], showing the PR head commit
is available but unused.

pr_agent/git_providers/bitbucket_server_provider.py[119-127]
pr_agent/git_providers/bitbucket_server_provider.py[211-220]
pr_agent/git_providers/bitbucket_server_provider.py[239-244]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
BitbucketServerProvider `get_repo_file()` does not read from the PR head commit, so metadata files may come from an unrelated revision.
### Issue Context
The provider already has `get_file(path, commit_id)` and computes `head_sha` in `get_diff_files()`.
### Fix Focus Areas
- pr_agent/git_providers/bitbucket_server_provider.py[119-127]
- pr_agent/git_providers/bitbucket_server_provider.py[211-220]
- pr_agent/git_providers/bitbucket_server_provider.py[239-244]
### Suggested fix
Implement `get_repo_file()` by calling `self.get_file(file_path, head_sha)` (or equivalent PR head commit id), and decode to UTF-8 consistently.

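The suggested fix can be sketched as a thin wrapper over the existing commit-aware helper (a sketch; `provider` stands in for the BitbucketServerProvider instance, and the guarded `fromRef` access is an added defensive assumption):

```python
def get_repo_file(provider, file_path: str) -> str:
    """Read the file at the PR head commit via the provider's existing
    commit-aware get_file() helper, guarding the payload access."""
    head_sha = (getattr(provider.pr, "fromRef", None) or {}).get("latestCommit")
    if not head_sha:
        return ""
    content = provider.get_file(file_path, head_sha)
    return content.decode("utf-8") if isinstance(content, bytes) else content
```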



Remediation recommended

18. apply_repo_settings() mixes responsibilities 📘 Rule violation ⚙ Maintainability ⭐ New
Description
The added repository-metadata ingestion logic significantly expands apply_repo_settings() to
handle multiple concerns (settings loading, per-request idempotency, cross-PR baseline restoration,
metadata collection, and tool prompt mutation). This violates the single-responsibility requirement
and increases maintenance risk.
Code

pr_agent/git_providers/utils.py[R126-207]

+    # Repository metadata: fetch well-known instruction files (AGENTS.md, QODO.md, CLAUDE.md, …)
+    # from the PR's head branch root and inject their contents into every tool's extra_instructions.
+    # See: https://qodo-merge-docs.qodo.ai/usage-guide/additional_configurations/#bringing-additional-repository-metadata-to-pr-agent
+    #
+    # Guard: apply_repo_settings() can be called multiple times per request (e.g. once in the
+    # server handler and again inside PRAgent.handle_request). The TOML settings are idempotent
+    # (set/overwrite), but metadata is *appended* to extr...

Comment thread pr_agent/git_providers/azuredevops_provider.py
Comment thread pr_agent/git_providers/azuredevops_provider.py
Comment thread pr_agent/git_providers/bitbucket_server_provider.py
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 10994bc

Comment thread pr_agent/git_providers/utils.py
Comment thread pr_agent/git_providers/gitlab_provider.py
Comment thread pr_agent/git_providers/github_provider.py
Comment thread pr_agent/git_providers/gerrit_provider.py Outdated
Sanitize file paths from add_repo_metadata_file_list before passing them
to any git provider, preventing directory traversal attacks (e.g.
"../etc/passwd"). Validation rejects absolute paths, ".." components,
and backslashes. Removes redundant defensive check from GerritProvider
in favor of the single central gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 772dbaa

Catch GitlabGetError specifically for expected not-found cases and log
unexpected errors at debug level instead of silently suppressing all
exceptions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 38af09e

Comment on lines +177 to +193
def get_repo_file(self, file_path: str) -> str:
try:
head_sha = self.pr.last_merge_commit
version = GitVersionDescriptor(
version=head_sha.commit_id, version_type="commit"
) if head_sha else None
contents = self.azure_devops_client.get_item_content(
repository_id=self.repo_slug,
project=self.workspace_slug,
download=False,
include_content_metadata=False,
include_content=True,
path=file_path,
version_descriptor=version,
)
content = list(contents)[0]
return content.decode("utf-8") if isinstance(content, bytes) else content
Action required

1. Azure reads merged commit 🐞 Bug ≡ Correctness

AzureDevopsProvider.get_repo_file() reads files at self.pr.last_merge_commit (merge result) instead
of the PR source/head commit, so metadata may reflect the merged preview rather than the branch
under review. This is inconsistent with other providers in this PR that explicitly read from the
PR/MR source branch or head SHA.
Agent Prompt
### Issue description
`AzureDevopsProvider.get_repo_file()` fetches repository files using `self.pr.last_merge_commit`, which corresponds to the merge-result/merge-preview commit, not the PR source/head. Repository metadata should reflect the branch under review.

### Issue Context
Other providers in this PR intentionally read from the PR/MR source branch or head SHA.

### Fix Focus Areas
- pr_agent/git_providers/azuredevops_provider.py[177-193]

### Fix approach
Fetch the file using the PR source/head commit instead of `last_merge_commit`.
- Prefer Azure PR fields that represent the source commit (e.g., `last_merge_source_commit` if available on the PR model).
- If the model doesn’t expose it directly, derive the correct commit via the PR’s source ref and latest commit and use that commit id in `GitVersionDescriptor`.
- Ensure the ref used here matches what `get_pr_branch()` identifies as the PR source branch.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 7b9f60e

asbach commented Apr 3, 2026

Regarding

When config.add_repo_metadata is enabled, metadata files are loaded from the PR head/source and
appended into every tool’s extra_instructions, allowing the PR author to change the agent’s
instruction set by editing those files in the PR branch. This is inconsistent with repo settings
which are intentionally loaded from the default branch, making the trust model surprising.

I agree that this may be problematic. My implementation follows the existing documentation, and I think the benefit of not needing two PRs when the instructions change while working on a feature outweighs the risk of adding bad or malicious context to the reviewer instructions.

…ettings() calls

Guard metadata injection with a context flag (Starlette request context
in server mode, settings flag as fallback in CLI mode) so that repeated
calls within the same request do not re-append metadata to
extra_instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 9a85654

Comment thread pr_agent/git_providers/utils.py
@asbach asbach changed the title feat: Impmement "add_repo_metadata" feat: Implement the documented "add_repo_metadata" functionality Apr 3, 2026
…cation

The settings-based _repo_metadata_applied flag persisted across PRs in
polling/CLI mode, suppressing metadata loading after the first PR.
Replace with content-based deduplication (check if combined_metadata is
already present in extra_instructions) and keep the Starlette context
flag for server mode only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 5bdc1a9

qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 7454325

asbach commented Apr 3, 2026

/describe

Comment thread tests/unittest/test_repo_metadata.py
PR Description updated to latest commit (7454325)

The _reset_settings fixture now snapshots and restores add_repo_metadata
and add_repo_metadata_file_list in addition to extra_instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 87672ea

Comment thread pr_agent/git_providers/gerrit_provider.py
GerritProvider.get_repo_file() previously used open() on the working
directory, allowing access to untracked files like .git/config. Now
reads from self.repo.head.commit.tree to match other providers' behavior
of only serving committed content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 524ef53

Comment thread pr_agent/git_providers/utils.py
Comment thread pr_agent/git_providers/utils.py
…ile_path

URL-decode input before validation so encoded sequences like %2e%2e/
cannot bypass traversal checks. Also reject paths with residual %
(double-encoding) and validate segments individually rather than
relying solely on prefix/substring matching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
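The validation this commit describes can be sketched as follows (names are illustrative, not the PR's exact code; rejecting any residual `%` after one decode pass is the commit's stated approach to double-encoding):

```python
from urllib.parse import unquote

def is_safe_repo_file_path(file_path) -> bool:
    """Decode percent-encoding first, reject residual '%' (double
    encoding) and backslashes, then validate each path segment."""
    if not isinstance(file_path, str) or not file_path.strip():
        return False
    decoded = unquote(file_path)
    if "%" in decoded or "\\" in decoded:
        return False
    if decoded.startswith("/"):
        return False
    return all(seg not in ("", ".", "..") for seg in decoded.split("/"))
```

Validating segments individually means `..` is caught anywhere in the path, not just as a prefix, which substring or prefix matching alone can miss.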
qodo-free-for-open-source-projects bot commented Apr 3, 2026

Persistent review updated to latest commit 29b6a1d

Comment on lines +142 to +179
if not repo_metadata_applied and get_settings().config.get("add_repo_metadata", False):
try:
tool_sections = [
"pr_reviewer",
"pr_description",
"pr_code_suggestions",
"pr_add_docs",
"pr_update_changelog",
"pr_test",
"pr_improve_component",
]

# In non-context runtimes (CLI, polling), restore extra_instructions to their
# pre-metadata baseline so metadata from a previous PR doesn't persist.
global _extra_instructions_baseline
is_context_mode = False
try:
is_context_mode = context.exists()
except Exception:
pass
if not is_context_mode:
if _extra_instructions_baseline:
# Restore baseline before applying this PR's metadata
for section, baseline_value in _extra_instructions_baseline.items():
get_settings().set(f"{section}.extra_instructions", baseline_value)
else:
# First run: capture the current values as the baseline
for section in tool_sections:
section_obj = get_settings().get(section, None)
if section_obj is not None and hasattr(section_obj, "extra_instructions"):
_extra_instructions_baseline[section] = section_obj.extra_instructions or ""

metadata_files = get_settings().config.get("add_repo_metadata_file_list",
["AGENTS.md", "QODO.md", "CLAUDE.md"])

# Collect contents of all metadata files that exist in the repo
metadata_content_parts = []
for file_name in metadata_files:
Action required

1. add_repo_metadata_file_list not validated 📘 Rule violation ≡ Correctness

add_repo_metadata_file_list is used directly without validating/normalizing its type/contents,
which can lead to incorrect behavior (e.g., iterating over characters if a string is provided) or
runtime errors. The checklist requires normalizing and validating user-provided settings before
using them in logic.
Agent Prompt
## Issue description
`config.add_repo_metadata_file_list` is consumed without validation. If the setting is mis-typed (e.g., a string, `None`, or a list containing non-strings), metadata loading can behave incorrectly or crash.

## Issue Context
This setting is user-provided via TOML/env/config overrides and must be normalized to a predictable shape before iterating.

## Fix Focus Areas
- pr_agent/git_providers/utils.py[142-186]

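A normalization helper along the lines the prompt asks for might look like this (a sketch; tolerating comma-separated plain strings is an assumption about how mis-typed config should be handled):

```python
def normalize_file_list(value, default=("AGENTS.md", "QODO.md", "CLAUDE.md")):
    """Coerce a user-provided setting into a list of non-empty strings,
    falling back to the default on unusable input."""
    if value is None:
        return list(default)
    if isinstance(value, str):
        value = [part.strip() for part in value.split(",")]
    if not isinstance(value, (list, tuple)):
        return list(default)
    return [item for item in value if isinstance(item, str) and item.strip()]
```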

Comment on lines +20 to +27
def _is_safe_repo_file_path(file_path: str) -> bool:
"""
Validate that a file path is safe to read from a repository root.
Rejects absolute paths, paths with '..' traversal components, backslashes,
and percent-encoded bypass attempts.
"""
if not file_path or not file_path.strip():
return False
Action required

2. _is_safe_repo_file_path() lacks type guard 📘 Rule violation ☼ Reliability

`_is_safe_repo_file_path()` calls `.strip()` on `file_path` without confirming it is a string, which can raise `AttributeError` if non-string values reach it (e.g., from configuration). The checklist requires defensive type checks before calling methods on external inputs.
Agent Prompt
## Issue description
`_is_safe_repo_file_path()` calls `file_path.strip()` without first ensuring `file_path` is a string. This can raise at runtime if configuration contains non-string entries.

## Issue Context
Even though the type annotation is `str`, configuration values are external inputs and can be malformed.

## Fix Focus Areas
- pr_agent/git_providers/utils.py[20-49]


Comment on lines +121 to +122
```python
        head_sha = self.pr.fromRef['latestCommit']
        content = self.get_file(file_path, head_sha)
```

Action required

3. `fromRef['latestCommit']` unguarded access 📘 Rule violation ☼ Reliability

`bitbucket_server_provider.get_repo_file()` directly indexes `self.pr.fromRef['latestCommit']` without checking that the key exists or that the payload has the expected shape. This can raise `KeyError`/`TypeError` on unexpected webhook or provider payload shapes.
Agent Prompt
## Issue description
`self.pr.fromRef['latestCommit']` is accessed via direct indexing, which can crash if `fromRef` is missing, not a dict, or the key is absent.

## Issue Context
Provider payloads are external inputs and may vary across Bitbucket Server versions / API responses.

## Fix Focus Areas
- pr_agent/git_providers/bitbucket_server_provider.py[119-123]


Comment on lines +92 to +101
```python
    def get_repo_file(self, file_path: str) -> str:
        try:
            # Read from the PR's source branch so metadata files reflect the branch under review
            url = (f"https://api.bitbucket.org/2.0/repositories/{self.workspace_slug}/{self.repo_slug}/src/"
                   f"{self.pr.source_branch}/{file_path}")
            response = requests.request("GET", url, headers=self.headers)
            if response.status_code == 404:
                return ""
            response.raise_for_status()
            return response.text
```

Action required

4. Bitbucket file fetch can hang 🐞 Bug ☼ Reliability

`BitbucketProvider.get_repo_file()` performs an HTTP GET without a timeout, so enabling `add_repo_metadata` can block `apply_repo_settings()` indefinitely on stalled connections. Because `apply_repo_settings()` runs before the request handling logic, this can stall the whole PR-Agent flow.
Agent Prompt
### Issue description
`BitbucketProvider.get_repo_file()` makes an outbound HTTP request without a timeout. In network stalls, this can hang indefinitely and block `apply_repo_settings()`.

### Issue Context
`apply_repo_settings()` runs before PR processing and calls `get_repo_file()` for each metadata filename, so this path is on the critical startup path when `add_repo_metadata=true`.

### Fix Focus Areas
- Add an explicit `timeout=` to the Bitbucket `requests.request("GET", ...)` call.
- Prefer a configurable timeout (or a sensible constant) and handle timeout exceptions similarly to other request errors.
- pr_agent/git_providers/bitbucket_provider.py[92-108]
- pr_agent/git_providers/utils.py[174-186]

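A sketch of the fix, assuming the same `requests`-based fetch as the snippet above; `METADATA_FETCH_TIMEOUT` and the standalone function shape are illustrative:

```python
import requests

# Illustrative constant; a real fix might read this from configuration.
METADATA_FETCH_TIMEOUT = 10  # seconds

def fetch_repo_file(url: str, headers: dict) -> str:
    """Fetch a raw file, treating timeouts like any other fetch failure."""
    try:
        response = requests.get(url, headers=headers, timeout=METADATA_FETCH_TIMEOUT)
    except requests.exceptions.RequestException:
        return ""  # stalled or failed connections no longer block startup
    if response.status_code == 404:
        return ""  # a missing metadata file is not an error
    response.raise_for_status()
    return response.text
```

`timeout=` bounds both the connect and read phases, so a stalled socket raises `requests.exceptions.Timeout` (a `RequestException` subclass) instead of hanging `apply_repo_settings()`.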

Development

Successfully merging this pull request may close these issues.

Trying to make the add_repo_metadata flag to work with Azure Devops
