
ProcmonMCP - Project Review

Summary

ProcmonMCP is a Model Context Protocol (MCP) server (single Python file, ~2300 lines) that enables LLMs to autonomously analyze Process Monitor (Procmon) XML log files. The project loads a Procmon XML file into memory with string interning and index optimizations, then exposes 14 MCP tools for querying events, inspecting processes, and exporting results.

The project is functional and demonstrates solid domain knowledge of both the MCP protocol and Procmon data analysis. The review below covers architecture, code quality, security, performance, testing, and packaging.


1. Architecture & Design

Strengths:

  • Two-pass XML parsing (processes first, then events) is well-designed for building the process lookup table before event enrichment.
  • String interning via StringInterner is an effective memory optimization for repetitive Procmon data (process names, operations, paths, results).
  • Pre-built indices for process name and operation enable fast filtering without full scans.
  • Compression support (gzip, bz2, lzma) adds practical flexibility.
  • Graceful fallbacks (lxml -> stdlib ElementTree, psutil optional, MCP SDK mock) are pragmatic.
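The interning pattern praised above can be sketched in a few lines. This is a generic illustration of the technique, not the project's actual StringInterner, whose API may differ:

```python
class Interner:
    """Generic string-interning sketch: each distinct string is stored
    once and referred to by a small integer ID everywhere else."""

    def __init__(self):
        self._ids = {}       # string -> ID
        self._strings = []   # ID -> string

    def get_id(self, s):
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        return sid

    def get_string(self, sid):
        return self._strings[sid]


interner = Interner()
a = interner.get_id("explorer.exe")
b = interner.get_id("explorer.exe")  # repeated value reuses the same ID
```

With millions of events referencing a comparatively small set of distinct paths, operations, and process names, storing small integer IDs instead of repeated strings cuts memory substantially.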

Concerns:

  • Single-file monolith (2303 lines): The entire server—data structures, XML parsing, MCP tool definitions, filtering engine, export logic, CLI entry point—lives in one file. Splitting into modules (e.g., models.py, parser.py, tools.py, filters.py) would improve navigability and testability.
  • Global mutable state: LOADED_DATA is a module-level global modified by main_execution() and read by all MCP tool handlers. This makes unit testing difficult and prevents running multiple server instances in-process. Consider dependency injection or a server context object.
  • Single-file limitation: Only one Procmon file can be loaded per server instance. This is documented but architecturally baked in through the global state pattern.
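One way to remove the global, sketched with hypothetical names (the real tool signatures will differ): hold the loaded log in a context object and have the tool handlers close over it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerContext:
    """Per-instance state instead of a module-level LOADED_DATA global."""
    loaded_data: Optional[list] = None

def make_handlers(ctx: ServerContext):
    # Each handler reads the context it was constructed with, so two
    # server instances (or two unit tests) can coexist in one process.
    def count_events():
        if ctx.loaded_data is None:
            return {"error": "Log data not loaded."}
        return {"count": len(ctx.loaded_data)}
    return {"count_events": count_events}

ctx = ServerContext(loaded_data=["ev1", "ev2", "ev3"])
handlers = make_handlers(ctx)
```

This also lifts the one-file-per-instance limitation: loading a second log is just constructing a second context.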

2. Code Quality

Formatting & Style

  • Semicolons are used extensively to pack multiple statements onto one line (e.g., procmon-mcp.py:92-96, procmon-mcp.py:461, procmon-mcp.py:1583). This hurts readability; standard Python style is one statement per line.
  • Inconsistent line lengths: Some lines exceed 200 characters, making the code difficult to read without horizontal scrolling.
  • Commented-out code: Dead code blocks remain in _clear_elem() (procmon-mcp.py:178-188) and StringInterner.get_id() (procmon-mcp.py:213-218). These should be removed; version control preserves history.

Specific Issues

  1. Duplicate io import (procmon-mcp.py:7 and procmon-mcp.py:26): import io and import io as pstats_io import the same module twice under different names. Use io.StringIO directly instead.

  2. StopAsyncIteration raised inside async generator (procmon-mcp.py:1018, 1021): Per PEP 479 (enforced in Python 3.7+) and PEP 525, raising StopIteration inside a generator body, or StopAsyncIteration inside an async generator body, is converted to RuntimeError. The correct pattern is to simply return from the generator:

    # Current (problematic):
    raise StopAsyncIteration("Log data not loaded.")
    
    # Should be:
    await ctx.error("Filtering failed: Log data not loaded.")
    return
  3. exit() vs sys.exit() (procmon-mcp.py:2158, 2179, 2182, etc.): The built-in exit() is intended for the interactive interpreter. Use sys.exit() in scripts for clarity and reliability.

  4. CSV export mutates dictionaries (procmon-mcp.py:2113-2115): export_query_results converts extra_data, stack_trace, and process_details_summary to JSON strings in-place on the row_dict returned by _get_formatted_event_details. While these are temporary copies in practice, mutating dicts during iteration is a code smell. Create copies or use a serialization helper.

  5. iterparse tag parameter compatibility (procmon-mcp.py:408, 498): The tag keyword argument to iterparse is not supported in the same way across all Python/lxml versions. In CPython's xml.etree.ElementTree, tag support in iterparse was not available until Python 3.13. This could cause TypeError on older Python versions despite the 3.7+ minimum claim.

  6. Process list parsing state machine (procmon-mcp.py:421-427): The state transitions from seeking_procmon to seeking_processlist on any end-event that isn't procmon. If the XML has unexpected top-level elements, this could skip the processlist or misparse. A more robust approach would check for the processlist start tag explicitly.

  7. README typo (README.md:97): "To use GhidraMCP with Cline" should read "To use ProcmonMCP with Cline".
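The iterparse compatibility issue (item 5) can be sidestepped with a small portable wrapper that filters tags in Python instead of passing tag= to the parser; a sketch:

```python
import io
import xml.etree.ElementTree as ET

def iterparse_tag(source, tag, events=("end",)):
    """Portable replacement for iterparse(..., tag=...): filter matching
    elements in Python so the code runs on stdlib ElementTree as well."""
    for event, elem in ET.iterparse(source, events=events):
        local = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if local == tag:
            yield event, elem

xml = b"<root><event id='1'/><other/><event id='2'/></root>"
ids = [elem.get("id") for _, elem in iterparse_tag(io.BytesIO(xml), "event")]
```

Note that, unlike lxml's tag= filtering, non-matching elements still get built, so the real code should continue clearing elements as it iterates to keep peak memory bounded.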


3. Security

Well-Handled

  • Path traversal prevention in export_query_results (procmon-mcp.py:2011-2042) uses os.path.commonpath to ensure output files stay within the working directory. This is correctly implemented.
  • Security warning in the README is prominent and detailed.
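For reference, the containment pattern the export code uses looks roughly like the following; the function name and signature here are illustrative, not the project's actual API:

```python
import os

def safe_export_path(filename, base_dir=None):
    """Illustrative commonpath containment check: resolve the requested
    path and refuse anything that escapes the base directory."""
    base = os.path.realpath(base_dir or os.getcwd())
    target = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"refusing to write outside {base}: {filename}")
    return target
```

Using realpath before commonpath also defuses symlink tricks, since both paths are compared in fully resolved form.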

Concerns

  1. No authentication on SSE transport: When using --transport sse, the server binds to a network interface and accepts unauthenticated connections. Any process or user that can reach the host:port has full read access to all loaded Procmon data. The README warns about trusted environments, but the server could at minimum support a shared secret or API key.

  2. Regex Denial of Service (ReDoS): User-supplied regex patterns (filter_path_regex, filter_process_regex, filter_detail_regex) are compiled and executed against potentially millions of events with no timeout or complexity limit (procmon-mcp.py:1027-1029). A pathological regex like (a+)+$ could hang the server. Consider using re2 or adding a timeout wrapper.

  3. No input length limits on filter parameters: String filter parameters have no maximum length validation. Extremely long strings could cause memory or performance issues during filtering.

  4. os.makedirs in export (procmon-mcp.py:2032): While the path traversal check prevents writing outside CWD, the os.makedirs call allows creating arbitrary directory structures within CWD. This is low-risk but worth noting.
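Concern 2 can be mitigated by evaluating untrusted patterns in a killable worker. A Unix-only sketch using a forked process follows; per-call forking is far too slow for per-event matching, so in practice you would pre-run the pattern once against sample data to vet it, or use a linear-time engine such as the re2 binding:

```python
import multiprocessing as mp
import re

# Unix-only: fork a child so a catastrophically backtracking pattern
# (e.g. "(a+)+$") can be terminated from the parent. Python threads
# cannot be killed, so a separate process is required.
_CTX = mp.get_context("fork")

def _search(pattern, text, conn):
    conn.send(bool(re.search(pattern, text)))
    conn.close()

def search_with_timeout(pattern, text, timeout=1.0):
    parent, child = _CTX.Pipe()
    proc = _CTX.Process(target=_search, args=(pattern, text, child))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # abandon the runaway match
        proc.join()
        return None       # caller treats None as "pattern too expensive"
    return parent.recv() if parent.poll() else None
```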


4. Performance

Strengths:

  • String interning significantly reduces memory for repetitive data.
  • Pre-built indices for pname_id and op_id enable O(1) lookup of candidate event sets before applying further filters.
  • Stream-based XML parsing (iterparse) keeps peak memory manageable during loading.
  • Progress reporting during long operations provides good user feedback.
  • The optimization decision to use direct element iteration over lxml XPath (procmon-mcp.py:163-169) is well-documented with profiling justification.

Improvement Opportunities:

  1. count_events_by_process (procmon-mcp.py:1620-1662) iterates all events and resolves string names, when it could use the existing pname_id_index directly:

    # Could be:
    for pname_id, event_indices in log_data.pname_id_index.items():
        name = log_data.get_string(IK_PROCESS_NAME, pname_id) or 'Unknown'
        # Accumulate rather than assign, in case two IDs resolve to the
        # same display name (e.g. both falling back to 'Unknown'):
        event_counts[name] = event_counts.get(name, 0) + len(event_indices)

    This would be O(unique_names) instead of O(total_events).

  2. get_process_lifetime (procmon-mcp.py:1816-1852) performs a full linear scan of all events. It could use the pname_id_index to narrow the search to events for the specific process, then further filter by operation using op_id_index intersection.

  3. Sorting candidate_indices (procmon-mcp.py:1103): Converting a set to a sorted list for "cache locality" adds O(n log n) overhead. For large candidate sets, this overhead may exceed any cache benefit. Profiling should validate whether this helps.

  4. find_file_access does a full linear scan with no index support. For a frequently used analysis tool, a path-based index (even a simple inverted index over path components) could help.
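The index-intersection idea in items 1-2 can be sketched generically; the index names mirror those mentioned above, but their exact shapes here are assumptions:

```python
def events_for_process_ops(pname_id_index, op_id_index, pname_id, op_ids):
    """Intersect the per-process index with per-operation indices
    instead of scanning every event (index shapes are illustrative)."""
    candidates = set(pname_id_index.get(pname_id, ()))
    wanted = set()
    for op_id in op_ids:
        wanted |= set(op_id_index.get(op_id, ()))
    # Sorted output preserves event order for downstream consumers.
    return sorted(candidates & wanted)

# Toy data: process 1 emitted events 0, 3, 5, 9; ops 7 and 8 cover 3-6, 9.
pname_idx = {1: [0, 3, 5, 9]}
op_idx = {7: [3, 4, 9], 8: [5, 6]}
```

The work is then proportional to the sizes of the two index entries, not to the total event count.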


5. Testing

Current state: The project has a single integration test file (tests/procmon-mcp-tester.py) that:

  • Requires a running MCP server with loaded data
  • Only supports SSE transport (not stdio)
  • Performs smoke tests by calling each tool and printing results
  • Has no assertions or pass/fail criteria
  • Has no test data fixtures

Recommendations:

  • Add unit tests for core components: StringInterner, ProcessInfo.from_xml_element, _parse_timestamp_str, _strip_namespace, _find_text_ignore_ns, time filter parsing.
  • Add integration tests with small sample XML fixtures that can run without a full MCP server.
  • Add a test runner configuration (pytest) and consider CI integration.
  • Create sample Procmon XML test files (small, synthetic) for reproducible testing.
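A self-contained fixture test might look like the following; the XML shape is synthetic and simplified, not the exact Procmon schema, and the parsing helper stands in for the project's own loader:

```python
import io
import xml.etree.ElementTree as ET

# Tiny synthetic Procmon-style document usable as a committed fixture.
SAMPLE_XML = b"""<procmon>
  <processlist>
    <process><ProcessId>4242</ProcessId><ProcessName>notepad.exe</ProcessName></process>
  </processlist>
  <eventlist>
    <event><PID>4242</PID><Operation>ReadFile</Operation></event>
    <event><PID>4242</PID><Operation>WriteFile</Operation></event>
  </eventlist>
</procmon>"""

def load_fixture(data):
    root = ET.parse(io.BytesIO(data)).getroot()
    procs = {p.findtext("ProcessId"): p.findtext("ProcessName")
             for p in root.iter("process")}
    ops = [e.findtext("Operation") for e in root.iter("event")]
    return procs, ops

def test_fixture_parses():
    # Real assertions with pass/fail semantics, unlike the smoke tester.
    procs, ops = load_fixture(SAMPLE_XML)
    assert procs["4242"] == "notepad.exe"
    assert ops == ["ReadFile", "WriteFile"]
```

Under pytest this runs without any MCP server, and the same fixture can seed integration tests of the parsing pipeline.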

6. Packaging & Distribution

Missing files:

  • No requirements.txt or pyproject.toml — dependencies are only documented in the README. Users must manually install mcp[cli], lxml, and psutil.
  • No .gitignore — risks committing .prof files, __pycache__, IDE configs, exported CSV/JSON files, etc.
  • No setup.py or packaging metadata — the project can't be installed via pip.
  • No Dockerfile — could simplify deployment, especially for SSE transport mode.
  • No license file.
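A minimal pyproject.toml would address the first and third points; the version numbers, extras grouping, and entry-point module name below are assumptions:

```toml
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "procmon-mcp"
version = "0.1.0"            # placeholder
description = "MCP server for analyzing Procmon XML logs"
requires-python = ">=3.7"
dependencies = ["mcp[cli]"]

[project.optional-dependencies]
fast = ["lxml"]              # optional accelerated XML parsing
system = ["psutil"]          # optional system introspection

[project.scripts]
procmon-mcp = "procmon_mcp:main"   # hypothetical module:function
```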

7. Midnight Rollover Handling

The timestamp rollover logic (procmon-mcp.py:566-568) advances the date when a parsed time is less than the previous time. This is documented but fragile:

  • A single out-of-order event (common in multi-threaded Procmon captures) could trigger a false rollover, advancing the date permanently for all subsequent events.
  • There is no mechanism to detect or recover from false rollovers.
  • Consider requiring a minimum time gap before triggering a rollover (e.g., only advance the date when the new time is more than an hour earlier than the previous timestamp).
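The suggested guard can be sketched as follows; the one-hour threshold and function shape are illustrative:

```python
from datetime import date, datetime, time, timedelta

ROLLOVER_GUARD = timedelta(hours=1)  # illustrative threshold

def apply_rollover(current_date, prev_dt, new_time):
    """Only advance the date when the new time-of-day is *substantially*
    earlier than the previous timestamp, so a slightly out-of-order
    event cannot trigger a false midnight rollover."""
    candidate = datetime.combine(current_date, new_time)
    if prev_dt is not None and prev_dt - candidate > ROLLOVER_GUARD:
        current_date = current_date + timedelta(days=1)
        candidate = datetime.combine(current_date, new_time)
    return current_date, candidate
```

An event 10 seconds out of order stays on the same date, while a jump from 23:59 back to 00:00 still rolls over correctly.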

8. Summary of Findings by Severity

High

  1. StopAsyncIteration raised inside an async generator causes RuntimeError (procmon-mcp.py:1018, 1021)
  2. iterparse tag parameter not supported in the stdlib before Python 3.13 (procmon-mcp.py:408, 498)
  3. No authentication on the SSE transport (procmon-mcp.py:2187-2198)

Medium

  4. ReDoS risk with user-supplied regex patterns (procmon-mcp.py:1027-1029)
  5. Fragile midnight rollover detection (procmon-mcp.py:566-568)
  6. No requirements.txt or packaging metadata (project root)
  7. No .gitignore file (project root)
  8. No unit tests (tests/)

Low

  9. Duplicate io import (procmon-mcp.py:7, 26)
  10. exit() instead of sys.exit() (multiple locations)
  11. CSV export mutates source dictionaries (procmon-mcp.py:2113-2115)
  12. README references "GhidraMCP" instead of "ProcmonMCP" (README.md:97)
  13. Semicolons reduce readability (throughout)
  14. count_events_by_process could use the existing index (procmon-mcp.py:1641)