Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions CODEX-FIX-SPEC-2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Codex Remediation Task #2: space-hog

You are tasked with fixing the remaining architectural, security, and quality issues found in the post-Codex audit. Please carefully apply the following fixes:

## 1. Eliminate `shell=True` in `stats.py`
- **File:** `space_hog/stats.py`
- **Issue:** `run_cleanup` executes `subprocess.run(command, shell=True)`. This bypasses all the safe deletion logic and leaves the app vulnerable to command injection.
- **Fix:** Remove `shell=True`. Change the signature of `run_cleanup` to accept a list of strings if necessary, or use `shlex.split(command)` to parse the string into a list before passing it to `subprocess.run`. Ensure that `shell=False` is used implicitly or explicitly.

## 2. Fix AppleScript Injection in `memory.py`
- **File:** `space_hog/memory.py`
- **Issue:** `remove_login_item` interpolates `app_name` into an AppleScript block executed via `osascript`.
- **Fix:** Strictly sanitize `app_name` before interpolation using an allowlist regex (e.g., `import re; app_name = re.sub(r'[^a-zA-Z0-9 ._-]', '', app_name)`) to prevent AppleScript injection.

## 3. Fix TOCTOU Race and State-After-Mutation in `safe_delete.py`
- **File:** `space_hog/safe_delete.py`
- **Issue 1:** `target.is_file()` is called *after* `send2trash.send2trash(str(target))`, which fails because the file is already moved. This causes all deleted files to be incorrectly logged as directories.
- **Issue 2:** The `is_symlink()` check is separated from `send2trash`, creating a TOCTOU race.
- **Fix:** In `move_to_trash()`, evaluate and store `is_file = target.is_file()` *before* calling `send2trash`. Pass `is_file` to `_record_removal` (e.g., `'file' if is_file else 'directory'`). If possible, ensure no symlinks are passed to `send2trash` by checking `target.is_symlink()` immediately before, though the primary fix is the `is_file` order.

## 4. Fix Symlink Traversal in `utils.py`
- **File:** `space_hog/utils.py`
- **Issue:** `get_dir_size` uses `path.rglob('*')`, which follows directory symlinks on Python < 3.13.
- **Fix:** Rewrite `get_dir_size` to use `os.walk(path, topdown=True, followlinks=False)` or `os.scandir` to safely traverse the directory without following symlinks. Sum the sizes of the files found.

## 5. Improve Docker Label Sanitization
- **File:** `space_hog/docker.py`
- **Issue:** `_sanitize_label_text` only strips control characters, allowing shell metacharacters like `;`, `|`, `$()`.
- **Fix:** Change the regex to a strict allowlist: `cleaned = re.sub(r'[^a-zA-Z0-9_.-]', '', text).strip()`.

## 6. Fix `os.walk` Issues in `scanners.py`
- **File:** `space_hog/scanners.py`
- **Issue 1:** Missing `onerror` handler in `os.walk`, causing `PermissionError`s to silently abort the entire scan.
- **Fix 1:** Add an `onerror` callback to `os.walk` (e.g., `onerror=lambda e: None` or log a warning) so traversals continue past inaccessible directories.
- **Issue 2:** The pruning logic (`dirnames[:] = []`) prevents finding nested space hogs (e.g., `node_modules` inside `.venv`).
- **Fix 2:** Instead of clearing `dirnames` completely, selectively remove the matched directory name from `dirnames` or adjust the logic so it can still process independent nested hogs.

## 7. Fix Silent Exception Swallowing
- **File:** Across codebase (e.g., `safe_delete.py`, `utils.py`, `caches.py`).
- **Issue:** Too many `except Exception: pass` blocks hide errors.
- **Fix:** Where appropriate, change `except Exception:` to `except Exception as e: import logging; logging.warning(f"Error: {e}")` to ensure errors are at least visible.

## 8. Fix Build Metadata and Dependencies
- **File:** `pyproject.toml` and `space_hog/__init__.py`
- **Issue:** Version mismatch (`0.2.0` vs `0.5.0`). Missing `pytest` and `docker` dependencies.
- **Fix:** Update `pyproject.toml` version to `0.5.0` to match `__init__.py`. Add `docker` to `dependencies`. Add `[project.optional-dependencies]` with `test = ["pytest"]`.

Apply these fixes comprehensively and carefully.
30 changes: 30 additions & 0 deletions CODEX-FIX-SPEC.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Codex Remediation Task: space-hog

You are tasked with remediating critical security and performance vulnerabilities in the `space-hog` repository. Apply the following fixes:

## 1. Eliminate Command Injection (RCE)
- **File:** `space_hog/safe_delete.py`
- **Issue:** The `move_to_trash` function uses string interpolation inside an AppleScript block (`osascript`) to delete files. This allows arbitrary code execution if a filename contains quotes or AppleScript commands.
- **Fix:** Replace the entire `osascript` block with the `send2trash` Python library. Add `send2trash` to `pyproject.toml` dependencies (e.g., `send2trash>=1.8.0`). Make sure to `import send2trash` and use `send2trash.send2trash(str(target))`.

## 2. Prevent Symlink Traversal (Arbitrary File Deletion)
- **File:** `space_hog/safe_delete.py` and `space_hog/scanners.py`
- **Issue:** Directory traversals (e.g., `target.iterdir()` or `rglob`) follow symlinks blindly. If a directory contains a symlink to `/usr/local`, the tool might delete system files.
- **Fix:** Add `is_symlink()` checks. For `safe_delete.py`'s `trash_contents`, skip items where `item.is_symlink()` is true and append a warning to the `errors` list.

## 3. Fix Unbounded Resource Consumption (Performance / DoS)
- **File:** `space_hog/scanners.py`
- **Issue:** `find_space_hogs` iterates over `SPACE_HOG_PATTERNS` and calls `root.rglob(pattern)` for each one (17 separate full-disk traversals). This takes `O(P * N)` time.
- **Fix:** Refactor `find_space_hogs` to use a single `os.walk(root)` pass. During the pass, check if the current directory name matches any key in `SPACE_HOG_PATTERNS`. If it does, calculate its size, add it to the results, and remove it from `dirnames` so `os.walk` doesn't recurse into it. Also, skip any symlinks using `os.path.islink`.

## 4. Fix Silent Exception Swallowing
- **File:** `space_hog/safe_delete.py`
- **Issue:** The `_record_removal` function has an empty `except Exception: pass` block.
- **Fix:** Change it to `except Exception as e: import logging; logging.warning(f"Failed to record removal: {e}")` to ensure failures are visible without crashing the app.

## 5. Prevent Docker Label Injection
- **File:** `space_hog/docker.py`
- **Issue:** `analyze_docker_volumes` parses project names from Docker labels and later these are embedded in suggested CLI commands without sanitization.
- **Fix:** Sanitize the extracted `project` variable by stripping control characters (e.g., using a regex like `re.sub(r'[\x00-\x1f\x7f]', '', text)`). Also, update `print_docker_analysis` to use `shlex.quote(proj)` when generating the `docker volume rm` suggestion.

Please thoroughly apply these changes. Ensure all code remains functional.
11 changes: 10 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "space-hog"
version = "0.2.0"
version = "0.5.0"
description = "Find and reclaim wasted disk space on macOS"
readme = "README.md"
requires-python = ">=3.10"
Expand All @@ -13,6 +13,10 @@ authors = [
{name = "dshanklin-bv"}
]
keywords = ["disk", "space", "cleanup", "macos", "cli"]
dependencies = [
"send2trash>=1.8.0",
"psutil>=5.9.0",
]
classifiers = [
"Development Status :: 4 - Beta",
"Environment :: Console",
Expand All @@ -33,3 +37,8 @@ space-hog = "space_hog:main"
[project.urls]
Homepage = "https://github.com/rhea-impact/space-hog"
Repository = "https://github.com/rhea-impact/space-hog"

[project.optional-dependencies]
test = [
"pytest",
]
13 changes: 7 additions & 6 deletions space_hog/caches.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
"""Cache analysis for Space Hog."""

import logging
import time
from pathlib import Path

Expand All @@ -18,8 +19,8 @@ def check_caches() -> list[tuple[Path, int, str]]:
size = get_dir_size(path)
if size > 0:
results.append((path, size, description))
except (PermissionError, OSError):
pass
except (PermissionError, OSError) as e:
logging.warning(f"Failed to inspect cache location {path}: {e}")

return sorted(results, key=lambda x: x[1], reverse=True)

Expand Down Expand Up @@ -50,9 +51,9 @@ def get_downloads_analysis(min_age_days: int = 30) -> tuple[int, list[FileInfo]]
if stat.st_mtime < cutoff:
old_files.append(FileInfo(entry, stat.st_size))
total_size += stat.st_size
except (PermissionError, OSError):
pass
except (PermissionError, OSError):
pass
except (PermissionError, OSError) as e:
logging.warning(f"Failed to inspect Downloads file {entry}: {e}")
except (PermissionError, OSError) as e:
logging.warning(f"Failed to read Downloads directory {downloads}: {e}")

return total_size, sorted(old_files, key=lambda x: x.size, reverse=True)
7 changes: 4 additions & 3 deletions space_hog/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,8 @@ def print_full_report():
def run_tier_cleanup(tier: int = 1, dry_run: bool = False):
"""Run cleanup for a specific tier."""
from .advisor import collect_cleanup_opportunities, calculate_tier_savings
from .stats import run_cleanup, start_cleanup_session, end_cleanup_session, print_post_cleanup_summary
from .safe_delete import safe_cleanup
from .stats import start_cleanup_session, end_cleanup_session, print_post_cleanup_summary

print_header(f"TIER {tier} CLEANUP" + (" (DRY RUN)" if dry_run else ""))

Expand Down Expand Up @@ -181,10 +182,10 @@ def run_tier_cleanup(tier: int = 1, dry_run: bool = False):

for item in tier_info['items']:
print(f" Cleaning {item['name']}...", end=' ', flush=True)
result = run_cleanup(
result = safe_cleanup(
command=item['command'],
description=item['name'],
category=item.get('category_key', 'manual')
category=item.get('category_key', 'manual'),
)
if result['success']:
print(f"{c.GREEN}OK{c.RESET} ({result['bytes_freed_human']})")
Expand Down
30 changes: 23 additions & 7 deletions space_hog/docker.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,24 @@
"""Docker disk analysis for Space Hog."""

import json
import logging
import re
import shlex
import subprocess
from pathlib import Path

from .utils import format_size, Colors


def _sanitize_label_text(text: str | None) -> str | None:
"""Allow only safe label characters for output and shell use."""
if not text:
return None
if re.fullmatch(r'^[a-zA-Z0-9_.-]+$', text):
return text
return None


def analyze_docker() -> dict:
"""Analyze Docker disk usage including VM disk bloat."""
result = {
Expand Down Expand Up @@ -103,8 +115,8 @@ def analyze_docker() -> dict:
except json.JSONDecodeError:
continue

except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
pass
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
logging.warning(f"Failed to inspect Docker disk usage: {e}")

# Calculate totals
result['total_usage'] = (
Expand Down Expand Up @@ -181,13 +193,14 @@ def analyze_docker_volumes() -> list[dict]:
if 'com.supabase.cli.project=' in labels:
for part in labels.split(','):
if 'com.supabase.cli.project=' in part:
project = part.split('=')[1]
project = part.split('=', 1)[1]
break
elif 'com.docker.compose.project=' in labels:
for part in labels.split(','):
if 'com.docker.compose.project=' in part:
project = part.split('=')[1]
project = part.split('=', 1)[1]
break
project = _sanitize_label_text(project)

# Parse size
size_str = vol.get('Size', '0B')
Expand All @@ -204,8 +217,8 @@ def analyze_docker_volumes() -> list[dict]:
'driver': vol.get('Driver', 'local'),
})

except (subprocess.CalledProcessError, subprocess.TimeoutExpired, json.JSONDecodeError):
pass
except (subprocess.CalledProcessError, subprocess.TimeoutExpired, json.JSONDecodeError) as e:
logging.warning(f"Failed to inspect Docker volumes: {e}")

return sorted(volumes, key=lambda x: -x['size'])

Expand Down Expand Up @@ -333,7 +346,10 @@ def print_docker_analysis():
orphan_size = sum(v['size'] for v in orphaned)
print(f" {c.YELLOW}Orphaned volumes (no running containers): {format_size(orphan_size)}{c.RESET}")
print(f" To remove orphaned volumes for a project:")
print(f" docker volume rm $(docker volume ls -q -f 'label=com.docker.compose.project=PROJECT_NAME')")
orphan_projects = sorted({v.get('project') for v in orphaned if v.get('project')})
for proj in orphan_projects:
quoted_project = shlex.quote(proj)
print(f" docker volume rm $(docker volume ls -q -f label=com.docker.compose.project={quoted_project})")
print()

# JSON output for AI
Expand Down
15 changes: 11 additions & 4 deletions space_hog/memory.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@
Tracks RAM consumers, autostart items, and background processes.
"""

import logging
import re
import subprocess
from pathlib import Path
from typing import Optional
Expand Down Expand Up @@ -65,8 +67,8 @@ def get_login_items() -> list[str]:
)
if result.returncode == 0 and result.stdout.strip():
return [item.strip() for item in result.stdout.strip().split(',')]
except (subprocess.TimeoutExpired, Exception):
pass
except (subprocess.TimeoutExpired, Exception) as e:
logging.warning(f"Failed to read login items: {e}")
return []


Expand All @@ -85,7 +87,8 @@ def get_launch_agents() -> list[dict]:
capture_output=True, text=True, timeout=5
)
is_running = result.returncode == 0
except:
except Exception as e:
logging.warning(f"Failed to inspect launch agent {label}: {e}")
is_running = False

agents.append({
Expand Down Expand Up @@ -114,7 +117,8 @@ def get_launch_daemons() -> list[dict]:
capture_output=True, text=True, timeout=5
)
is_running = result.returncode == 0
except:
except Exception as e:
logging.warning(f"Failed to inspect launch daemon {label}: {e}")
is_running = False

daemons.append({
Expand Down Expand Up @@ -298,6 +302,9 @@ def remove_login_item(app_name: str) -> dict:

Returns dict with success status.
"""
if not re.fullmatch(r"^[a-zA-Z0-9 ._\-'&]+$", app_name):
return {'success': False, 'error': 'Invalid app name'}

try:
result = subprocess.run(
['osascript', '-e',
Expand Down
Loading