"Take the currently validated local fixes for GitHub issues #876, #878, #879, and #880 in /home/azureuser/src/azlin and turn them into actual GitHub pull requests that are ready for review. Use the re#888
Resolves #878 by propagating errors instead of swallowing them
Resolves #879 by enforcing UTF-8 encoding in all file I/O
Resolves #880 by validating required 'command' field in examples
Changes:
- Add DocumentationError exception class to models.py
- example_manager.py: propagate ValueError, use encoding='utf-8', validate required fields
- extractor.py: raise DocumentationError instead of printing warnings
- hasher.py: use encoding='utf-8', raise on failure
- sync_manager.py: use encoding='utf-8', narrow except clauses
- scripts/__init__.py: new empty file making scripts/ a Python package
- pyproject.toml: add pythonpath and dev dependencies for tests
- Add comprehensive test suite in tests/unit/test_cli_documentation.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per TDD methodology (Step 7), write tests that:
- FAIL on main (DocumentationError does not yet exist in models.py)
- PASS once the implementation in this PR is merged
Test coverage (41 tests across 5 groups):
TestDocumentationError (3):
- DocumentationError is an Exception subclass
- Can be raised with a message
- Exported in models.__all__
TestCLIHasherErrorHandling (9):
- Corrupt JSON raises DocumentationError (not silent {})
- Missing file returns empty dict (first-run behaviour)
- JSONDecodeError is chained via 'from e' (SEC-R-14)
- save_hashes() uses encoding='utf-8' (SEC-R-10)
- Unicode hash round-trip
- OSError on save raises DocumentationError
TestExampleManagerValidation (16):
- Missing/empty/None 'command' field raises DocumentationError (SEC-R-11)
- Valid 'command' field returns examples
- load/save use encoding='utf-8' (SEC-R-10)
- Unicode content survives round-trip
- ValueError from _sanitize_command_name propagates (SEC-R-08)
- yaml.safe_load rejects !!python/object (SEC-R-09)
- Corrupt YAML raises DocumentationError
TestDocSyncManagerExceptionNarrowing (8):
- DocumentationError is caught; AttributeError/TypeError propagate (SEC-R-12)
- write_text() uses encoding='utf-8' (SEC-R-10)
- Path traversal rejected in _get_output_path
TestCLIExtractorSecurity (5):
- ALLOWED_MODULES whitelist unchanged (SEC-R-13)
- Unlisted module rejected
- Parse failure raises DocumentationError (not silent None)
- Missing command returns None (not found != error)
- yaml.load() absence verified in source
Also adds tests/conftest.py with repo-root sys.path setup so both
azlin and scripts.cli_documentation are importable during tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
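The hasher behaviour listed above (missing file returns an empty dict, corrupt JSON raises a chained DocumentationError, reads use encoding='utf-8') might look roughly like the sketch below. The function name `load_hashes` and the local `DocumentationError` stand-in are assumptions for illustration, not the PR's actual code.

```python
import json
from pathlib import Path


class DocumentationError(Exception):
    """Stand-in for the exception this PR adds to models.py (assumption)."""


def load_hashes(path: Path) -> dict:
    """Hypothetical loader mirroring the behaviour the tests describe."""
    try:
        # Explicit UTF-8 so the result does not depend on the platform default.
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # First run: no hash file yet, which is not an error.
        return {}
    except json.JSONDecodeError as e:
        # Corrupt file: fail loudly and keep the original cause chained.
        raise DocumentationError(f"corrupt JSON in {path.name}") from e
    except OSError as e:
        # Any other I/O failure is also surfaced as a typed error.
        raise DocumentationError(f"cannot read {path.name}") from e
```

Chaining with `from e` keeps the original `JSONDecodeError` available on `__cause__`, so a caller sees one exception type without losing the underlying traceback.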
… in cli_documentation
_extract_from_click_command now raises DocumentationError instead of
returning None, so the `if metadata:` and `if sub_metadata:` guards in
the callers were unreachable dead code. Remove them and update the return
type annotation from `CLIMetadata | None` to `CLIMetadata`.
Also remove the redundant `{e}` interpolation in save_examples()
DocumentationError — the cause is already chained via `from e`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
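The reasoning behind removing the guards can be sketched as follows: once the extraction function raises on failure and is annotated to return a value unconditionally, a truthiness check in the caller can never take the false branch. All names here (`extract`, `collect`, the simplified `CLIMetadata`) are hypothetical stand-ins, not the project's real signatures.

```python
from dataclasses import dataclass
from types import SimpleNamespace


class DocumentationError(Exception):
    """Stand-in for the exception defined in models.py (assumption)."""


@dataclass
class CLIMetadata:
    name: str


def extract(command: object) -> CLIMetadata:
    """Raise on failure instead of returning None."""
    name = getattr(command, "name", None)
    if not name:
        raise DocumentationError("command has no usable name")
    return CLIMetadata(name=name)


def collect(commands: list) -> list:
    # No `if metadata:` guard needed: extract() either returns a value or raises.
    return [extract(command) for command in commands]


result = collect([SimpleNamespace(name="list"), SimpleNamespace(name="connect")])
```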
- rust_bridge.py: add SecurityError, copy.copy(member) safety comment, archive_path: Path | str type hint, finally-block temp file cleanup
- validator.py: pre-compile PLACEHOLDER_PATTERNS as list[re.Pattern[str]] with re.IGNORECASE for O(1) pattern reuse
- hasher.py: narrow broad except clauses to specific exceptions
- tests: add test_rust_bridge_security.py (21 tests, 4 skipped on Py<3.12)
All pre-commit hooks pass. 58 tests pass, 4 skipped (correct on Py 3.13).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
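As an illustration of the kind of per-member check that the SecurityError path implies, here is a minimal sketch. The function name `validate_member` is hypothetical, and this version always rejects non-regular files, which is a simplification; it is not the code in rust_bridge.py.

```python
import tarfile
from pathlib import PurePosixPath


class SecurityError(Exception):
    """Stand-in for the exception added to rust_bridge.py (assumption)."""


def validate_member(member: tarfile.TarInfo) -> None:
    """Reject archive members that could escape the destination directory."""
    path = PurePosixPath(member.name)
    if path.is_absolute():
        raise SecurityError(f"absolute path in archive: {member.name}")
    # Compare whole path components so a name like 'foo..bar' is not rejected.
    if ".." in path.parts:
        raise SecurityError(f"path traversal in archive: {member.name}")
    if not member.isfile():
        raise SecurityError(f"non-regular member in archive: {member.name}")
```

Checking `path.parts` rather than searching the raw string avoids false positives on names that merely contain two consecutive dots.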
        import re

        # Match 'yaml.load(' but not 'yaml.safe_load('
        forbidden = re.findall(r"\byaml\.load\s*\(", source)
Check failure
Code scanning / CodeQL
Potentially uninitialized local variable Error test
Copilot Autofix
AI 14 days ago
In general, the fix is to guarantee that source is defined before it is used, regardless of how inspect.getsource behaves. This can be done by either initializing source before the try block and only using it when set, or by broadening the exception handling so that any failure to obtain the source causes the test to skip or fail before source is referenced.
The best minimal fix without changing functionality is to broaden the except clause from except OSError: to except Exception:. This way, any exception raised by inspect.getsource(extractor_module) will cause an immediate pytest.skip(...), and execution will not proceed to the re.findall call with an uninitialized source. This preserves the intended semantics: if we cannot inspect the source for any reason, we skip the test rather than erroring. The change is confined to the test_yaml_load_not_used_in_source method in tests/unit/test_cli_documentation.py, replacing the except OSError: line with except Exception:. No new imports or additional helper methods are required.
@@ -795,7 +795,7 @@

         try:
             source = inspect.getsource(extractor_module)
-        except OSError:
+        except Exception:
             pytest.skip("Source not available for inspection")

         # Bare yaml.load( calls (not yaml.safe_load) are forbidden
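The other option the autofix text mentions, binding the variable on every path instead of broadening the except clause, could be sketched with a small helper. The name `source_or_none` is hypothetical; a stdlib module stands in for the extractor module.

```python
import inspect
import json.decoder
from typing import Optional


def source_or_none(obj: object) -> Optional[str]:
    """Return the source text of obj, or None when it cannot be inspected."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        # OSError: source file unavailable; TypeError: built-in object.
        return None


module_text = source_or_none(json.decoder)  # pure-Python stdlib module
builtin_text = source_or_none(len)          # built-in, has no Python source
```

A test could then call `pytest.skip(...)` when the helper returns None, so the later `re.findall` call only ever sees a string.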
        msg = "corrupt JSON in .cli_doc_hashes.json"
        with pytest.raises(DocumentationError) as exc_info:
            raise DocumentationError(msg)
        assert msg in str(exc_info.value)
Check warning
Code scanning / CodeQL
Unreachable code Warning test
Copilot Autofix
AI 14 days ago
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
    def test_exported_in_models_all(self) -> None:
        """DocumentationError must be listed in models.__all__ so callers can
        do 'from scripts.cli_documentation.models import DocumentationError'."""
        import scripts.cli_documentation.models as models_module
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note test
Copilot Autofix
AI 14 days ago
In general, to fix "Module is imported with 'import' and 'import from'" issues, consolidate imports so that each module is imported using only one style. Prefer a single import module (optionally aliased) and then access attributes via that module object, or keep the from module import name form and avoid also importing the whole module.
For this file, the best fix without changing functionality is to switch the existing from scripts.cli_documentation.models import (...) (lines 42–48) to a single module import import scripts.cli_documentation.models as models_module, and then use models_module.CLIArgument, models_module.CLIMetadata, etc., throughout the file. We already have a local name models_module introduced on line 102; we will simply move that import up and reuse it consistently rather than re-importing the module later. Concretely:
- Replace lines 42–48 with `import scripts.cli_documentation.models as models_module`.
- Leave the `from ... import` lines for `ExampleManager`, `CLIHasher`, `DocSyncManager`, and `CLIExtractor` alone: they import from distinct modules and are not part of this CodeQL issue.
- Update the helper `_make_metadata` to use `models_module.CLIMetadata`, `models_module.CLIArgument`, and `models_module.CLIOption`.
- Update all references to `DocumentationError` in this file to `models_module.DocumentationError`.
- Adjust the test `test_exported_in_models_all` to stop re-importing the module and instead use the already imported `models_module` (removing the inner `import`).
No new methods or external dependencies are needed; we only change imports and fully qualify existing symbol uses.
@@ -39,13 +39,7 @@
 # fails during collection until the implementation is complete — that is the
 # intended TDD behaviour.
 # ---------------------------------------------------------------------------
-from scripts.cli_documentation.models import (
-    CLIArgument,
-    CLIMetadata,
-    CLIOption,
-    CommandExample,
-    DocumentationError,  # NEW — does not exist yet
-)
+import scripts.cli_documentation.models as models_module
 from scripts.cli_documentation.example_manager import ExampleManager
 from scripts.cli_documentation.hasher import CLIHasher
 from scripts.cli_documentation.sync_manager import DocSyncManager
@@ -57,15 +51,21 @@
 # ---------------------------------------------------------------------------


-def _make_metadata(name: str = "test-cmd", full_path: str = "") -> CLIMetadata:
+def _make_metadata(name: str = "test-cmd", full_path: str = "") -> models_module.CLIMetadata:
     """Return a minimal CLIMetadata for use in tests."""
-    return CLIMetadata(
+    return models_module.CLIMetadata(
         name=name,
         full_path=full_path or name,
         help_text="A test command",
         description="Detailed description of the test command.",
-        arguments=[CLIArgument(name="env", type="TEXT", required=True)],
-        options=[CLIOption(names=["--verbose", "-v"], type="FLAG", is_flag=True)],
+        arguments=[
+            models_module.CLIArgument(name="env", type="TEXT", required=True)
+        ],
+        options=[
+            models_module.CLIOption(
+                names=["--verbose", "-v"], type="FLAG", is_flag=True
+            )
+        ],
     )


@@ -85,21 +80,20 @@
     def test_is_exception_subclass(self) -> None:
         """DocumentationError must inherit from Exception so it can be caught
         with 'except DocumentationError' or 'except Exception'."""
-        assert issubclass(DocumentationError, Exception), (
+        assert issubclass(models_module.DocumentationError, Exception), (
             "DocumentationError must be a subclass of Exception"
         )

     def test_can_be_raised_with_message(self) -> None:
         """DocumentationError must accept a message string and expose it via str()."""
         msg = "corrupt JSON in .cli_doc_hashes.json"
-        with pytest.raises(DocumentationError) as exc_info:
-            raise DocumentationError(msg)
+        with pytest.raises(models_module.DocumentationError) as exc_info:
+            raise models_module.DocumentationError(msg)
         assert msg in str(exc_info.value)

     def test_exported_in_models_all(self) -> None:
         """DocumentationError must be listed in models.__all__ so callers can
         do 'from scripts.cli_documentation.models import DocumentationError'."""
-        import scripts.cli_documentation.models as models_module

         assert hasattr(models_module, "__all__"), "models.py must define __all__"
         assert "DocumentationError" in models_module.__all__, (
        the same guarantee within the pytest suite.
        """
        import inspect
        import scripts.cli_documentation.extractor as extractor_module
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note test
Copilot Autofix
AI 14 days ago
In general, to fix “module is imported with both import and from ... import” you keep only one style of import for that module and adjust call sites accordingly. Here, we should avoid importing scripts.cli_documentation.extractor a second time inside the test; instead, reuse the already-imported symbol(s) or switch to a single consistent import pattern.
The least invasive change that preserves existing behavior is: keep the top-level from scripts.cli_documentation.extractor import CLIExtractor (used throughout the tests) and, in test_yaml_load_not_used_in_source, replace import scripts.cli_documentation.extractor as extractor_module with a local import of the already-imported class under the name extractor_module. Because inspect.getsource works on any live object whose defining module is available, passing CLIExtractor instead of the module object still yields the source of scripts/cli_documentation/extractor.py. Concretely, inside test_yaml_load_not_used_in_source, change line 794 from importing the module to from scripts.cli_documentation.extractor import CLIExtractor as extractor_module, leaving the rest of the function unchanged. No additional imports or definitions are required.
@@ -791,7 +791,7 @@
         the same guarantee within the pytest suite.
         """
         import inspect
-        import scripts.cli_documentation.extractor as extractor_module
+        from scripts.cli_documentation.extractor import CLIExtractor as extractor_module

         try:
             source = inspect.getsource(extractor_module)
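One caveat worth checking before applying this suggestion: `inspect.getsource` returns the source text of the object it is given, so passing a class should yield only that class's block rather than the whole module file. A file-wide scan for `yaml.load(` would then need the module object. A minimal sketch using a stdlib module as a stand-in (the helper name `module_source_of` is hypothetical):

```python
import inspect
import json.decoder


def module_source_of(obj: object) -> str:
    """Return the source of the module that defines obj."""
    module = inspect.getmodule(obj)
    if module is None:
        raise OSError("defining module not found")
    return inspect.getsource(module)


# Source of the class alone versus source of the whole defining module.
class_source = inspect.getsource(json.decoder.JSONDecoder)
module_source = module_source_of(json.decoder.JSONDecoder)
```

If the test is meant to cover every line of extractor.py, resolving the module through `inspect.getmodule` keeps a single import style while still scanning the full file.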
    def test_temp_file_cleaned_up_on_security_error(self, tmp_path, monkeypatch):
        """If SecurityError is raised during extraction, the temp file is removed."""
        import azlin.rust_bridge as rb
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note test
Copilot Autofix
AI 14 days ago
In general, to fix this issue you should avoid importing the same module using both import module and from module import name in the same file. Instead, pick one style: either import the module object once and access attributes via that object, or consistently import specific names from the module. If you still need a local alias (like rb) to the module object, derive it from a single module-level import rather than re-importing.
For this specific file, the least invasive change that preserves behavior is:
- Add a single import of the `azlin` package at the top of the file, alongside the other imports.
- Replace the inner `import azlin.rust_bridge as rb` inside `test_temp_file_cleaned_up_on_security_error` with an assignment `rb = azlin.rust_bridge`.
This keeps rb as a reference to the azlin.rust_bridge module, so all subsequent uses (rb._platform_suffix(), rb._download_from_release(), rb.MANAGED_BIN_DIR, etc.) continue to work unchanged. It also removes the second, conflicting import form of azlin.rust_bridge, satisfying the CodeQL rule without altering test logic.
Concretely:
- In `tests/unit/test_rust_bridge_security.py`, just after the existing imports, insert `import azlin`.
- In the body of `TestTempFileCleanup.test_temp_file_cleaned_up_on_security_error`, replace line 276 `import azlin.rust_bridge as rb` with `rb = azlin.rust_bridge`.
No other changes are required.
@@ -24,6 +24,7 @@
 from unittest.mock import MagicMock, patch

 import pytest
+import azlin

 from azlin.rust_bridge import (
     SecurityError,
@@ -273,7 +274,7 @@

     def test_temp_file_cleaned_up_on_security_error(self, tmp_path, monkeypatch):
         """If SecurityError is raised during extraction, the temp file is removed."""
-        import azlin.rust_bridge as rb
+        rb = azlin.rust_bridge

         captured_tmp: list[Path] = []

        self, tmp_path, monkeypatch
    ):
        """_download_from_release() must NOT catch SecurityError."""
        import azlin.rust_bridge as rb
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 14 days ago
General approach: remove the mixed import style by avoiding from azlin.rust_bridge import (...) and using a single import azlin.rust_bridge as rb (or similar) throughout the file. Then reference all required names via that module alias (e.g., rb.SecurityError, rb._extract_release_binary, etc.). This aligns with the recommendation to remove from xxx import yyy and instead qualify via the main import, keeping functionality identical.
Concrete best fix in this file:
- Replace the `from azlin.rust_bridge import (...)` block at lines 28–33 with `import azlin.rust_bridge as rb`.
- Update all usages of the imported symbols (`SecurityError`, `_extract_release_binary`, `_is_release_binary_member`, `_validate_release_member`) in the shown file to use the `rb.` prefix.
- In the `TestSecurityErrorPropagation` test, remove the inner `import azlin.rust_bridge as rb` and reuse the module alias from the top-level import.
All changes are confined to tests/unit/test_rust_bridge_security.py in the shown regions. No new methods or external dependencies are required; only imports and references are adjusted.
@@ -24,15 +24,9 @@
 from unittest.mock import MagicMock, patch

 import pytest
+import azlin.rust_bridge as rb

-from azlin.rust_bridge import (
-    SecurityError,
-    _extract_release_binary,
-    _is_release_binary_member,
-    _validate_release_member,
-)
-

 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
@@ -340,13 +332,12 @@
         self, tmp_path, monkeypatch
     ):
         """_download_from_release() must NOT catch SecurityError."""
-        import azlin.rust_bridge as rb

         monkeypatch.setattr(rb, "MANAGED_BIN_DIR", tmp_path)
         monkeypatch.setattr(rb, "MANAGED_BIN", tmp_path / "azlin")

         def evil_extract(tmp_path, destination):
-            raise SecurityError("path traversal detected")
+            raise rb.SecurityError("path traversal detected")

         monkeypatch.setattr(rb, "_extract_release_binary", evil_extract)

@@ -369,8 +355,8 @@
             mock_resp.read.return_value = (
                 __import__("json").dumps(fake_releases).encode()
             )
-            mock_urlopen.return_value = mock_resp
+            mock_urlopen.return_value = mock_resp

             with patch("azlin.rust_bridge.urllib.request.urlretrieve"):
-                with pytest.raises(SecurityError, match="path traversal detected"):
+                with pytest.raises(rb.SecurityError, match="path traversal detected"):
                     rb._download_from_release()
Code Review - Security Hardening & CLI Documentation Quality (Issues #876, #878, #879, #880)
Overall Assessment: Good, with several actionable issues that should be addressed before merging.
Checklist Results
Issues Found
1. Remaining bare
Test Coverage Report
PR #888: Validated local fixes for issues #876, #878, #879, #880
Coverage Impact Assessment
This PR adds 62 new unit tests across two new test files, targeting previously uncovered security-critical and correctness paths:
Estimated Coverage Change: 44% → ~52–55% (+8–11%) ✅
Newly Covered Areas
Still Uncovered — High Priority Next Steps
Suggested next tests (high ROI):
Quality Observations
✅ Tests use
Progress Toward 80% Goal
Month 1 goal (44% → 52%): This PR likely meets it in a single contribution. 🎯
Thank you for the substantial test investment here: 62 tests covering security-critical paths is exactly the right priority. The tar extraction security tests (SEC-R-01 through SEC-R-07) and the
Dependency Review
Package:
Key Changes in this PR (issues #876–#880):
Risk Assessment: Dependency Change (
Recommendation: No dependency concern. The
Risk Assessment: Security Fixes (issues #876–#880): The security changes are well-structured. A few items to verify before merge:
Action Items:
Overall Recommendation: Review carefully. The security fixes are well-reasoned and address real vulnerabilities (path traversal in tar extraction, silent exception swallowing, encoding bugs). The PR is a draft; CI is pending. Confirm test results before merging.
⚡ Performance Report: PR #888
Scope of changes analysed:
Executive Summary
Overall verdict: No performance regression detected. ✅
Detailed Analysis
1. CLI Startup Time
The Python bootstrap in
Import-time changes in this PR:
# New module-level constant — evaluated once at import
_PY312_PLUS = sys.version_info >= (3, 12)
This is a single tuple comparison at import time; cost is effectively zero (~0.1 µs). No new heavy imports were added. The existing imports (
Startup impact: < 1µs (immeasurable) ✅
2. Normal Command Execution (Rust binary already installed)
Once found,
Normal execution impact: 0ms (unchanged) ✅
3. Binary Installation Path (cold install /
| Step | Before (baseline) | After (this PR) | Delta |
|---|---|---|---|
| Archive open | tarfile.open() | tarfile.open() | none |
| Member iteration | extractall() (one call) | for member in getmembers() loop | +loop overhead |
| Security validation | None | _validate_release_member() per member | +µs per member |
| Extraction | extractall() | tar.extract(safe_member, filter='data') | Equivalent |
| Temp file cleanup | except block only | finally block (always runs) | +1 unlink |
A typical release tarball contains O(1–10) members. The added loop + PurePosixPath construction per member adds at most ~0.5ms to a download that already takes 2–5 seconds over the network.
Installation path impact: +0.5ms on a 2000–5000ms operation (<0.025%) ✅
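The per-member loop summarised in the table could look roughly like the sketch below. It is an illustration under assumptions, not the code in rust_bridge.py: the function name `extract_binary`, the use of ValueError, and the final chmod (taken from the mitigation listed in the PR's risk notes) are all placeholders.

```python
import copy
import io
import sys
import tarfile
import tempfile
from pathlib import Path, PurePosixPath

_PY312_PLUS = sys.version_info >= (3, 12)


def extract_binary(archive: Path, dest_dir: Path, binary_name: str = "azlin") -> Path:
    """Extract only the named binary, validating the member first."""
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            path = PurePosixPath(member.name)
            if path.name != binary_name:
                continue  # skip everything that is not the binary
            if path.is_absolute() or ".." in path.parts or not member.isfile():
                raise ValueError(f"unsafe archive member: {member.name}")
            # Copy before renaming so the archive's own member list is untouched.
            safe_member = copy.copy(member)
            safe_member.name = binary_name
            if _PY312_PLUS:
                tar.extract(safe_member, path=dest_dir, filter="data")
            else:
                tar.extract(safe_member, path=dest_dir)
            target = dest_dir / binary_name
            target.chmod(0o755)  # restore the execute bit after extraction
            return target
    raise ValueError("no binary found in archive")
```

The loop does a constant amount of work per member, which is consistent with the report's estimate that the overhead is small next to a multi-second download.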
4. scripts/cli_documentation/ — Offline Tooling
These scripts (hasher.py, example_manager.py, sync_manager.py, extractor.py) are invoked as a documentation build step, not on the user's CLI hot path. They are not imported or executed during azlin list, azlin connect, or any runtime command.
Changes here (adding encoding='utf-8', narrowing except clauses, adding DocumentationError) are purely correctness and error-handling improvements with no user-facing performance impact.
CLI command performance impact: 0ms ✅
Performance Budget Status
| Budget | Threshold | Status |
|---|---|---|
| Startup time | < 100ms | ✅ Unaffected |
| azlin list | < 200ms | ✅ Unaffected (Rust binary) |
| azlin connect | < 500ms | ✅ Unaffected (Rust binary) |
| azlin create / delete | < 3s | ✅ Unaffected (Rust binary) |
| Cold install download | ~2–5s (network-bound) | ✅ +0.5ms overhead negligible |
Observations
- The `_validate_release_member` guard adds minimal overhead but meaningfully improves security posture for the installation path (path traversal + symlink attacks).
- The `finally`-block temp file cleanup is a correctness improvement, with no performance impact in the success path.
- The `_PY312_PLUS` module-level constant avoids repeated `sys.version_info` comparisons inside the extraction loop; this is a micro-optimisation that is strictly better than the baseline.
- No new lazy-import opportunities were missed: all imports added (`copy`, `PurePosixPath`) were already in the file or are stdlib zero-cost.
No action required from a performance standpoint. The changes are security and correctness fixes with negligible performance cost.
Generated by CLI Performance Monitor for issue #888
Code Quality Report
Overall Quality: 8.7/10 ✅
This PR introduces security fixes and feature improvements across
Complexity Analysis
No function exceeds complexity 20. No function has more than 5 parameters. ✅
Code Smells Detected
Warnings (non-blocking):
No issues detected:
Maintainability
Docstring coverage: 100% across all source files ✅
Type hint coverage:
The missing annotations are all on
Technical Debt
Missing Test File
Recommendations
Summary
No blocking issues. The PR improves code quality overall: it introduces typed exception boundaries (
Status: ✅ PASS. No quality gates violated.
Security Review: PR #888
Reviewer: Automated Security Review Agent
Overall Assessment
This PR addresses a set of well-identified security issues (#876, #878, #879, #880) and the implementation is largely correct. The tar-extraction hardening in
Findings
CRITICAL: None
No critical-severity issues found.
HIGH: 3 issues
H-1: No SHA256 integrity verification of downloaded binary
H-2:
H-3:
MEDIUM: 3 issues
M-1:
M-2:
M-3:
LOW: 3 issues
L-1: Error messages in
L-2:
L-3:
Checklist Results
Summary of Required Changes Before Merge
Items L-1, L-2, and L-3 are low priority and can be addressed in follow-up issues.
What Is Done Well
🧘 Philosophy Guardian Review: PR #888
Philosophy Score: A-
This PR is a strong philosophy-alignment win. It eliminates the single biggest class of violations in the codebase, silent exception swallowing, and replaces it with a clean, typed exception boundary (
Strengths ✓
Error Visibility Restored (Critical Fix)
Single Typed Exception Boundary
Pre-compiled Constants at Module Level
FileNotFoundError Handled Gracefully, Everything Else Propagates
except FileNotFoundError:
    self._hashes = {}  # Expected case — file doesn't exist yet
except json.JSONDecodeError:  # Corrupt file — must be loud
    raise DocumentationError(...)
except OSError:  # I/O failure — must be loud
    raise DocumentationError(...)
This is exactly the right pattern.
Security Hardening in rust_bridge.py
Simplification via isinstance
Concerns ⚠
sync_manager.sync_command: OSError Still Escapes
except DocumentationError as e:
    return SyncResult(command_name=command.name, error=str(e), ...)
Recommended fix:
except (DocumentationError, OSError) as e:
    return SyncResult(command_name=command.name, error=str(e), ...)
This is the only structural concern. Everything else is clean.
Forbidden Pattern Violations ✗
None. Every previous violation (
Violations ✗
None critical. The
Recommendations
Regeneration Assessment
Can AI rebuild each module from spec?
Checklist
Philosophy verdict: APPROVED with one minor fix before merge (OSError scope in
🤖 Philosophy Guardian Review: amplihack philosophy compliance v1.0
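The scoping recommended in this review can be sketched as below: expected failures (DocumentationError, OSError) become a per-command result, while programming errors such as TypeError still propagate. `sync_one`, the `write` callback, and the reduced `SyncResult` are hypothetical stand-ins for the real sync_manager API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class DocumentationError(Exception):
    """Stand-in for the exception defined in models.py (assumption)."""


@dataclass
class SyncResult:
    command_name: str
    error: Optional[str] = None


def sync_one(name: str, write: Callable[[str], None]) -> SyncResult:
    """Run one sync step, converting only expected failures into a result."""
    try:
        write(name)
    except (DocumentationError, OSError) as e:
        # Expected, recoverable failures are reported per command.
        return SyncResult(command_name=name, error=str(e))
    # AttributeError, TypeError, KeyError and similar are not caught here.
    return SyncResult(command_name=name)
```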
Summary
"Take the currently validated local fixes for GitHub issues #876, #878, #879, and #880 in /home/azureuser/src/azlin and turn them into actual GitHub pull requests that are ready for review. Use the repository's workflow properly so the work results in PRs, not just local staged changes. Preserve the user's separate staged empty-env.json change and do not include it. Group changes sensibly by code area if multiple PRs are warranted, but keep scope tight to the filed issues. Run required validation, create branches/commits, push, and open PRs. Note that issue #877 was determined not to need a code change; if that conclusion still holds after review, document it clearly in the appropriate PR/issue context rather than forcing a fake fix."
Issue
Closes #887
Changes
"{"components":[{"action":"create","name":"SecurityError","purpose":"Domain exception for tar extraction failures; module-level so tests can import it directly"},{"action":"create","name":"_PY312_PLUS","purpose":"Module-level constant computed once at import time to avoid repeated sys.version_info checks"},{"action":"create","name":"_is_release_binary_member","purpose":"Pure predicate — returns True if the archive member name corresponds to the azlin binary; no I/O"},{"action":"create","name":"_validate_release_member","purpose":"Stateless guard — raises SecurityError on absolute paths, '..' traversal, or non-regular-file members (Python < 3.12)"},{"action":"create","name":"_extract_release_binary","purpose":"Orchestrator — iterates members, skips non-binary, validates, normalises name via copy.copy, extracts with filter='data' on 3.12+"},{"action":"modify","name":"_download_from_release","purpose":"Refactored to use _extract_release_binary; temp file cleanup moved to finally block; SecurityError NOT caught here"},{"action":"create","name":"DocumentationError","purpose":"Typed exception boundary for all I/O and parse failures in the cli_documentation subsystem; lives in models.py to avoid circular imports"},{"action":"modify","name":"ExampleManager","purpose":"Propagate ValueError from _sanitize_command_name (fix #878); enforce encoding='utf-8' (fix #879); validate required 'command' field (fix #880)"},{"action":"modify","name":"Hasher","purpose":"Add encoding='utf-8' to all open() calls; distinguish FileNotFoundError (return {}) from JSONDecodeError (raise DocumentationError)"},{"action":"modify","name":"SyncManager","purpose":"Narrow except Exception clauses to except DocumentationError; add encoding='utf-8' to write_text calls"},{"action":"modify","name":"Extractor","purpose":"Raise DocumentationError on parse failure instead of returning None; maintain module import whitelist unchanged"},{"action":"create","name":"scripts/init.py","purpose":"Package marker 
enabling test imports from scripts/ directory"},{"action":"modify","name":"pyproject.toml dev group","purpose":"Add pytest >=9.0.2 to dev dependency group; add scripts/ to pythonpath for test discovery"},{"action":"create","name":"TestExtractReleaseBinary","purpose":"4 targeted unit tests covering filter='data' on 3.12+, path traversal rejection, symlink rejection, and execvp passthrough documentation"},{"action":"create","name":"TestCLIDocumentation","purpose":"41 unit tests covering corrupt JSON, UTF-8 roundtrip, missing required field, exception propagation, and YAML safety"}],"files_to_change":["src/azlin/rust_bridge.py","scripts/cli_documentation/models.py","scripts/cli_documentation/example_manager.py","scripts/cli_documentation/hasher.py","scripts/cli_documentation/sync_manager.py","scripts/cli_documentation/extractor.py","pyproject.toml"],"implementation_order":["1. Modify src/azlin/rust_bridge.py — add SecurityError, _PY312_PLUS, _is_release_binary_member, _validate_release_member, _extract_release_binary; refactor _download_from_release to use new helpers with finally-block temp cleanup","2. Write tests/unit/test_rust_bridge.py — 4 tests: filter='data' on 3.12+, path traversal rejection, symlink rejection (<3.12), SecurityError raised when no binary in archive","3. Modify scripts/cli_documentation/models.py — add DocumentationError(Exception) class at module level","4. Modify scripts/cli_documentation/hasher.py — add encoding='utf-8' to all open() calls; narrow exception handling to FileNotFoundError/JSONDecodeError/OSError; raise DocumentationError on corrupt/unreadable files","5. Modify scripts/cli_documentation/example_manager.py — remove bare except ValueError catch (propagate it); add encoding='utf-8' to all open() calls; add 'command' field presence check raising DocumentationError before CommandExample construction","6. 
Modify scripts/cli_documentation/sync_manager.py — narrow except Exception to except DocumentationError; add encoding='utf-8' to write_text calls","7. Modify scripts/cli_documentation/extractor.py — raise DocumentationError on parse failure; verify yaml.safe_load() usage; confirm module import whitelist is untouched","8. Create scripts/init.py — empty package marker","9. Modify pyproject.toml — add pytest to dev dependencies; add scripts/ to pythonpath","10. Write tests/unit/test_cli_documentation.py — 41 tests covering: corrupt JSON raises DocumentationError, missing file returns empty dict, unwritable file raises DocumentationError, UTF-8 roundtrip for hasher and example_manager, missing 'command' field raises DocumentationError, present 'command' field succeeds, ValueError propagates from sanitize, yaml.safe_load rejects !!python/object, except narrowed to DocumentationError"],"new_files":["scripts/init.py"],"risks":["copy.copy(TarInfo) performs a shallow copy — verify nested attributes (e.g., pax_headers) are not needed post-copy; mitigate by testing with a real tarfile fixture","filter='data' on Python 3.12 may strip execute bits from the azlin binary — mitigate by chmod 0o755 on the destination file after extraction","_VALID_NAME_RE may be too restrictive if subcommands use dot-separated names (e.g., 'az.vm') — verify ExampleManager receives leaf names only, not full dotted paths","Existing YAML files that lack a 'command' field will now raise DocumentationError on load — run a one-time audit of all existing *.yaml example files before merging PR-2 to avoid breaking the sync script on first run","DocumentationError masking real bugs if except clauses in sync_manager.py are too broad — mitigate by only catching DocumentationError at the per-command level, letting unexpected exceptions propagate to the top-level handler","scripts/init.py creation may interfere with existing import paths if scripts/ is already on sys.path without being a package — verify no 
import collisions with existing test setup"],"security_considerations":["SEC-R-01 (CRITICAL): _validate_release_member must check PurePosixPath('..' in parts) — not a naive string contains — to avoid false positives on names like 'foo..bar'","SEC-R-02 (HIGH): _PY312_PLUS gate must use sys.version_info >= (3, 12) tuple comparison, not string comparison, to ensure correct behaviour on 3.12.x patch releases","SEC-R-03 (HIGH): copy.copy(member) before setting member.name = 'azlin' is mandatory — mutating the original TarInfo pollutes the tarfile's internal member list","SEC-R-04 (HIGH): extractall() must not appear anywhere in rust_bridge.py — enforce via grep in CI or pre-commit hook","SEC-R-05 (MEDIUM, future): Document in a follow-up issue that binary SHA256 verification against a SUMS file is the next hardening step; add a TODO comment in _download_from_release","SEC-R-06 (LOW): Temp file unlink must be in finally, not except — ensures cleanup even when SecurityError is raised mid-extraction","SEC-R-07 (LOW): The except clause in _download_from_release must be typed as (urllib.error.URLError, OSError) — SecurityError must propagate to abort installation loudly","SEC-R-08 (HIGH): ValueError from _sanitize_command_name propagating (not swallowed) is the direct fix for issue #878 — this is both a correctness fix and a security boundary","SEC-R-09 (HIGH): yaml.safe_load() is already correct — add a CI grep check (grep -r 'yaml.load(' scripts/) to prevent regression","SEC-R-10 (MEDIUM): encoding='utf-8' on every open() and write_text() call — on Windows with cp1252, omitting this causes silent data corruption that invalidates hashes and corrupts YAML","SEC-R-11 (MEDIUM): 'command' field validation must use 'command' not in ex (dict key check), not ex.get('command') falsy check — an empty string '' is a distinct invalid state that should also be caught","SEC-R-12 (MEDIUM): except DocumentationError is the correct scope; AttributeError, TypeError, and KeyError indicate 
bugs and must not be silently swallowed","SEC-R-13 (HIGH): ALLOWED_MODULES whitelist in extractor.py must not be modified — any new azlin modules requiring documentation must be explicitly added after review","SEC-R-14 (LOW): json.JSONDecodeError must be re-raised as DocumentationError (with chained 'from e') to preserve the original traceback for debugging while presenting a clean typed boundary to callers"],"test_files":["tests/unit/test_rust_bridge.py","tests/unit/test_cli_documentation.py"]}"
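For the required 'command' field, the plan's SEC-R-11 note and the test list (missing, empty, and None values all raise) suggest checking both key presence and content. One way to express that, as a sketch with a hypothetical helper name `require_command`:

```python
from typing import Any, Dict


class DocumentationError(Exception):
    """Stand-in for the exception defined in models.py (assumption)."""


def require_command(example: Dict[str, Any]) -> str:
    """Return the example's command, or raise if it is missing or empty."""
    if "command" not in example:
        raise DocumentationError("example is missing required 'command' field")
    command = example["command"]
    # None, non-strings, and blank strings are all treated as invalid.
    if not isinstance(command, str) or not command.strip():
        raise DocumentationError("'command' must be a non-empty string")
    return command
```

Keeping the two checks separate lets the error message say whether the key was absent or merely empty.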
Testing
Checklist
This PR was created as a draft for review before merging.