Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 75 additions & 25 deletions codebase_rag/mcp/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@

import json
import os
import sys
from pathlib import Path

from loguru import logger
Expand All @@ -20,14 +19,54 @@
from codebase_rag.services.llm import CypherGenerator


def setup_logging() -> None:
"""Configure logging to stderr for MCP stdio transport."""
def setup_logging(enable_logging: bool = False) -> None:
"""Configure logging for MCP stdio transport.

By default, logging is disabled to prevent token waste in LLM context.
Can be enabled via environment variable MCP_ENABLE_LOGGING=1 for debugging.

When enabled, logs are written to a file to avoid polluting STDIO transport.
The log file path can be configured via MCP_LOG_FILE environment variable.

Args:
enable_logging: Whether to enable logging output. Defaults to False.
Can also be controlled via MCP_ENABLE_LOGGING environment variable.
"""
Comment on lines +23 to +34
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to the project's general rules, docstrings are not allowed. Please convert this docstring into a regular block comment to adhere to the project's standards.

# Configure logging for MCP stdio transport.
#
# By default, logging is disabled to prevent token waste in LLM context.
# Can be enabled via environment variable MCP_ENABLE_LOGGING=1 for debugging.
#
# When enabled, logs are written to a file to avoid polluting STDIO transport.
# The log file path can be configured via MCP_LOG_FILE environment variable.
#
# Args:
#     enable_logging: Whether to enable logging output. Defaults to False.
#                    Can also be controlled via MCP_ENABLE_LOGGING environment variable.
References
  1. Docstrings are not allowed in this project, as enforced by a pre-commit hook.

logger.remove() # Remove default handler
logger.add(
sys.stderr,
level="INFO",
format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <level>{message}</level>",

# Check environment variable to override enable_logging parameter
env_enable = os.environ.get("MCP_ENABLE_LOGGING", "").lower() in (
"1",
"true",
"yes",
)
should_enable = enable_logging or env_enable

if should_enable:
# Get log file path from environment or use default
log_file = os.environ.get("MCP_LOG_FILE")
if not log_file:
# Use ~/.cache/code-graph-rag/mcp.log as default
cache_dir = Path.home() / ".cache" / "code-graph-rag"
cache_dir.mkdir(parents=True, exist_ok=True)
log_file = str(cache_dir / "mcp.log")

# Ensure log file directory exists
log_path = Path(log_file)
log_path.parent.mkdir(parents=True, exist_ok=True)
Comment on lines +47 to +56
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for determining the log file path and creating its parent directory can be simplified to avoid redundant mkdir calls and improve clarity. You can determine the Path object for the log file first, and then call mkdir once on its parent directory.

        log_file_path_str = os.environ.get("MCP_LOG_FILE")
        if log_file_path_str:
            log_file = Path(log_file_path_str)
        else:
            # Use ~/.cache/code-graph-rag/mcp.log as default
            cache_dir = Path.home() / ".cache" / "code-graph-rag"
            log_file = cache_dir / "mcp.log"

        # Ensure log file directory exists
        log_file.parent.mkdir(parents=True, exist_ok=True)

Comment on lines +50 to +56
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directory created twice in default path case - cache_dir.mkdir() at 51, then log_path.parent.mkdir() at 56 creates same directory.

Suggested change
cache_dir = Path.home() / ".cache" / "code-graph-rag"
cache_dir.mkdir(parents=True, exist_ok=True)
log_file = str(cache_dir / "mcp.log")
# Ensure log file directory exists
log_path = Path(log_file)
log_path.parent.mkdir(parents=True, exist_ok=True)
log_file = os.environ.get("MCP_LOG_FILE")
if not log_file:
# Use ~/.cache/code-graph-rag/mcp.log as default
cache_dir = Path.home() / ".cache" / "code-graph-rag"
log_file = str(cache_dir / "mcp.log")
# Ensure log file directory exists (handles both env and default cases)
Path(log_file).parent.mkdir(parents=True, exist_ok=True)
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/mcp/server.py
Line: 50-56

Comment:
Directory created twice in default path case - `cache_dir.mkdir()` at 51, then `log_path.parent.mkdir()` at 56 creates same directory.

```suggestion
        log_file = os.environ.get("MCP_LOG_FILE")
        if not log_file:
            # Use ~/.cache/code-graph-rag/mcp.log as default
            cache_dir = Path.home() / ".cache" / "code-graph-rag"
            log_file = str(cache_dir / "mcp.log")

        # Ensure log file directory exists (handles both env and default cases)
        Path(log_file).parent.mkdir(parents=True, exist_ok=True)
```

How can I resolve this? If you propose a fix, please make it concise.


# Add file handler - logs go to file, not STDERR/STDOUT
logger.add(
log_file,
level="INFO",
format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {message}",
colorize=False, # Disable ANSI color codes
rotation="10 MB", # Rotate when file reaches 10MB
retention="7 days", # Keep logs for 7 days
Comment on lines +63 to +65
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comments without (H) marker violate comment policy. Either prefix with (H) or remove (parameters are self-documenting).

Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/mcp/server.py
Line: 63-65

Comment:
Inline comments without `(H)` marker violate comment policy. Either prefix with `(H)` or remove (parameters are self-documenting).

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

)
else:
# Disable all logging by default for MCP mode
logger.disable("codebase_rag")
Comment on lines +38 to +69
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded strings violate coding standards. Per rule d4240b05, non-config/constants/logs files should have almost no string literals. Move to codebase_rag/mcp/constants.py:

# constants.py
from enum import StrEnum

ENV_MCP_ENABLE_LOGGING = "MCP_ENABLE_LOGGING"
ENV_MCP_LOG_FILE = "MCP_LOG_FILE"
ENABLE_VALUES = ("1", "true", "yes")
DEFAULT_CACHE_DIR = ".cache"
DEFAULT_PROJECT_NAME = "code-graph-rag"
DEFAULT_LOG_FILE = "mcp.log"
LOG_FORMAT = "{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {message}"
LOG_ROTATION = "10 MB"
LOG_RETENTION = "7 days"
LOGGER_NAME = "codebase_rag"
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/mcp/server.py
Line: 38-69

Comment:
Hardcoded strings violate coding standards. Per rule d4240b05, non-config/constants/logs files should have almost no string literals. Move to `codebase_rag/mcp/constants.py`:

```python
# constants.py
from enum import StrEnum

ENV_MCP_ENABLE_LOGGING = "MCP_ENABLE_LOGGING"
ENV_MCP_LOG_FILE = "MCP_LOG_FILE"
ENABLE_VALUES = ("1", "true", "yes")
DEFAULT_CACHE_DIR = ".cache"
DEFAULT_PROJECT_NAME = "code-graph-rag"
DEFAULT_LOG_FILE = "mcp.log"
LOG_FORMAT = "{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {message}"
LOG_ROTATION = "10 MB"
LOG_RETENTION = "7 days"
LOGGER_NAME = "codebase_rag"
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!



def get_project_root() -> Path:
Expand Down Expand Up @@ -143,34 +182,45 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:

Tool handlers are dynamically resolved from the MCPToolsRegistry,
ensuring consistency with tool definitions.

Logging is suppressed during tool execution to prevent token waste in LLM context.
"""
logger.info(f"[GraphCode MCP] Calling tool: {name}")
import io
from contextlib import redirect_stderr, redirect_stdout
Comment on lines +188 to +189
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports inside function. Move to module top per Python conventions and project code organization standards.

Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/mcp/server.py
Line: 188-189

Comment:
Imports inside function. Move to module top per Python conventions and project code organization standards.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


try:
# Resolve handler from registry
handler_info = tools.get_tool_handler(name)
if not handler_info:
error_msg = f"Unknown tool: {name}"
logger.error(f"[GraphCode MCP] {error_msg}")
error_msg = "Unknown tool"
return [TextContent(type="text", text=f"Error: {error_msg}")]
Comment on lines +195 to 196
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While returning a generic error message to the LLM is reasonable to save tokens, removing the log for an unknown tool makes debugging difficult. Please log this error. The log will be correctly routed to a file or disabled based on the setup_logging configuration.

                logger.error(f"[GraphCode MCP] Unknown tool: {name}")
                return [TextContent(type="text", text="Error: Unknown tool")]


handler, returns_json = handler_info

# Call handler with unpacked arguments
result = await handler(**arguments)

# Format result based on output type
if returns_json:
result_text = json.dumps(result, indent=2)
else:
result_text = str(result)

return [TextContent(type="text", text=result_text)]

except Exception as e:
error_msg = f"Error executing tool '{name}': {str(e)}"
logger.error(f"[GraphCode MCP] {error_msg}", exc_info=True)
return [TextContent(type="text", text=f"Error: {error_msg}")]
# Suppress all logging output during tool execution
with redirect_stdout(io.StringIO()), redirect_stderr(io.StringIO()):
logger.disable("codebase_rag")
try:
# Call handler with unpacked arguments
result = await handler(**arguments)

# Format result based on output type
if returns_json:
result_text = json.dumps(result, indent=2)
else:
result_text = str(result)

return [TextContent(type="text", text=result_text)]
finally:
logger.enable("codebase_rag")
Comment on lines +201 to +215
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The use of contextlib.redirect_stdout and logger.disable within an asynchronous function creates a race condition that can lead to a Denial of Service (DoS). In Python, sys.stdout and the loguru logger's enabled/disabled state are global. Since call_tool is an async function, multiple tool calls can execute concurrently. If one tool call redirects stdout or disables logging, it affects all other concurrent tasks, including the MCP server's own transport mechanism which relies on stdout for JSON-RPC communication. This can result in the corruption of the transport protocol, causing the server to crash or become unresponsive when multiple tools are invoked simultaneously.

Comment on lines +202 to +215
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.enable() in finally re-enables logging globally, even when setup_logging() disabled it. After first tool call, all logs will appear despite MCP_ENABLE_LOGGING not being set.

Check if logging was enabled before disabling, and only restore previous state:

# Track if logging was enabled
logging_was_enabled = not logger._core.enabled.get("codebase_rag", False)
if logging_was_enabled:
    logger.disable("codebase_rag")
try:
    result = await handler(**arguments)
    # ...
finally:
    if logging_was_enabled:
        logger.enable("codebase_rag")
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/mcp/server.py
Line: 202-215

Comment:
`logger.enable()` in finally re-enables logging globally, even when `setup_logging()` disabled it. After first tool call, all logs will appear despite `MCP_ENABLE_LOGGING` not being set.

Check if logging was enabled before disabling, and only restore previous state:

```python
# Track if logging was enabled
logging_was_enabled = not logger._core.enabled.get("codebase_rag", False)
if logging_was_enabled:
    logger.disable("codebase_rag")
try:
    result = await handler(**arguments)
    # ...
finally:
    if logging_was_enabled:
        logger.enable("codebase_rag")
```

How can I resolve this? If you propose a fix, please make it concise.


except Exception:
# Fail silently without logging or printing error details
return [
TextContent(
type="text", text="Error: There was an error executing the tool"
Comment on lines +195 to +221
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages are hardcoded strings. Move to tool_errors.py per coding standards:

# tool_errors.py  
from enum import Enum

class ToolError(str, Enum):
    UNKNOWN_TOOL = "Unknown tool"
    EXECUTION_FAILED = "There was an error executing the tool"
    
    def __call__(self, **kwargs) -> str:
        return self.value.format(**kwargs) if kwargs else self.value
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/mcp/server.py
Line: 195-221

Comment:
Error messages are hardcoded strings. Move to `tool_errors.py` per coding standards:

```python
# tool_errors.py  
from enum import Enum

class ToolError(str, Enum):
    UNKNOWN_TOOL = "Unknown tool"
    EXECUTION_FAILED = "There was an error executing the tool"
    
    def __call__(self, **kwargs) -> str:
        return self.value.format(**kwargs) if kwargs else self.value
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

)
]
Comment on lines +217 to +223
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Catching all exceptions and failing silently without logging is a dangerous pattern that can hide critical bugs and make debugging extremely difficult. While the goal is to prevent error details from reaching the LLM, they absolutely should be logged to the configured file (if logging is enabled). Using logger.exception() will handle this correctly, as it will only log if the logger is enabled and will include valuable traceback information.

        except Exception:
            # Log the exception if logging is enabled, but return a generic error to the client.
            logger.exception(f"[GraphCode MCP] Error executing tool '{name}'")
            return [
                TextContent(
                    type="text", text="Error: There was an error executing the tool"
                )
            ]


return server, ingestor

Expand Down