
feat(analyze): add data360_analyze_development_topic with LiteLLM sampling and structured output#73

Open
rafmacalaba wants to merge 3 commits into dev from feat/mcp-sampling

Conversation


@rafmacalaba (Collaborator) commented Apr 26, 2026

Summary

Supersedes #63. Rebases data360_analyze_development_topic onto upstream/dev and incorporates all review feedback from that PR.

The dependency chain for #63 (feat/multi-query-search #57, background sync #67, search-card #62, database_name #65) is now fully merged into dev, so this PR applies only the sampling-specific additions as a clean single commit.


ACTUAL TESTS

Sampling in VS Code (decomposition_method: "sampling_client"):

(screenshot)

Sampling Tier 2 on clients that don't support sampling (decomposition_method: "sampling_server"):

(screenshot)

Changes from PR #63

New module: src/data360/mcp_server/sampling.py

Extracts the sampling handler into a standalone, testable module. Replaces the hard-coded OpenAISamplingHandler with a generic LiteLLM-based handler that supports multiple providers through a unified interface (a condensed sketch follows the table):

| Provider | LITELLM_MODEL example | Credentials |
|---|---|---|
| OpenAI (default) | gpt-4o-mini | OPENAI_API_KEY |
| Anthropic | anthropic/claude-3-5-haiku-latest | ANTHROPIC_API_KEY |
| Google Gemini | gemini/gemini-2.0-flash | GEMINI_API_KEY |
| Mistral AI | mistral/mistral-small-latest | MISTRAL_API_KEY |
| AWS Bedrock (direct) | bedrock/mistral.mistral-7b-instruct-v0:2 | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME |
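A condensed sketch of the handler, assuming LiteLLM's async completion API; the handler and parameter names here are illustrative, not the exact module contents:

import os

import litellm


async def litellm_sampling_handler(messages, params, context) -> str:
    """Server-side sampling; the provider is selected by LITELLM_MODEL."""
    model = os.environ.get("LITELLM_MODEL", "gpt-4o-mini")
    # MCP SamplingMessage content is assumed to be TextContent here.
    chat = [{"role": m.role, "content": m.content.text} for m in messages]
    response = await litellm.acompletion(
        model=model,
        messages=chat,
        # Explicit None checks so caller-set 0 / 0.0 values are preserved.
        temperature=params.temperature if params.temperature is not None else 0.2,
        max_tokens=params.maxTokens if params.maxTokens is not None else 512,
    )
    return response.choices[0].message.content

The explicit is-None defaults mirror the Copilot review suggestion further down about not clobbering intentional 0 / 0.0 values.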

The handler is enabled only when at least one credential env var is present (has_llm_credentials()). If none are set, the server logs a clear message and omits the handler, so sampling relies on the client tier (if available) or falls back to the raw user query without attempting a failing LLM call.
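A minimal sketch of that gate (the actual _CREDENTIAL_ENV_VARS tuple in sampling.py may list more or fewer variables):

import os

# One entry per provider in the table above (illustrative list).
_CREDENTIAL_ENV_VARS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "MISTRAL_API_KEY",
    "AWS_ACCESS_KEY_ID",
)


def has_llm_credentials() -> bool:
    """True when at least one provider credential env var is set."""
    return any(os.environ.get(var) for var in _CREDENTIAL_ENV_VARS)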

src/data360/mcp_server/_server_definition.py

Wires up the new handler. Replaces the OpenAISamplingHandler import and the silent except Exception: pass block with explicit startup logging.
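A hedged sketch of the conditional wiring; the FastMCP server-side fallback kwargs (sampling_handler, sampling_handler_behavior) are assumed from FastMCP 2.x and may differ from the actual code:

import logging

from fastmcp import FastMCP

from data360.mcp_server.sampling import has_llm_credentials, litellm_sampling_handler

logger = logging.getLogger(__name__)

if has_llm_credentials():
    logger.info("Tier 2 enabled: server-side LiteLLM sampling handler registered.")
    mcp = FastMCP(
        "Data360",
        sampling_handler=litellm_sampling_handler,
        sampling_handler_behavior="fallback",  # client sampling (Tier 1) still wins
    )
else:
    logger.info("No LLM credentials found; server-side sampling handler omitted.")
    mcp = FastMCP("Data360")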

src/data360/api.py

Structured output via result_type=_DecompositionResult

Defines two Pydantic models:

class _SampledQueryGroup(BaseModel):
    queries: list[str]
    country: str

class _DecompositionResult(BaseModel):
    sub_queries: list[str] | None = None
    query_groups: list[_SampledQueryGroup] | None = None

ctx.sample() is called with result_type=_DecompositionResult. FastMCP enforces the schema on the LLM response and returns a validated object via result.result (a condensed sketch follows this list). This eliminates:

  • The json.loads() / json.JSONDecodeError fallback
  • The markdown-fence stripping regex
  • The broad except (ValueError, json.JSONDecodeError) handler (Copilot review item)
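A condensed sketch of the structured call (variable names are illustrative; ctx, query, and the prompt are in scope inside the tool function):

# Phase A: FastMCP appends the Pydantic JSON schema and validates the reply.
result = await ctx.sample(
    prompt,                              # e.g. f"User question: {query}"
    system_prompt=decomposition_prompt,  # intent/field-semantics prompt (illustrative)
    result_type=_DecompositionResult,
)
decomposition = result.result  # already a validated _DecompositionResult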

Simplified fallback path

Any failure from ctx.sample() (client does not support sampling, LiteLLM call failed, validation error) logs a WARNING and proceeds with the raw user query. No partial JSON fallback. No sampling_error field surfaced to the caller.

Two-phase sampling for broad MCP client compatibility

Not all MCP clients support schema-constrained sampling. VS Code Copilot (1.99+), for example, supports plain-text sampling but rejects calls that include a result_type JSON schema constraint.

To handle this, sampling now uses a two-phase approach:

Phase A — structured output (result_type=_DecompositionResult):

  • Works with FastMCP's server-side LiteLLM handler and any client that supports schema-constrained sampling.
  • FastMCP enforces the Pydantic schema and returns a validated _DecompositionResult object directly — no manual parsing needed.

Phase B — plain-text fallback (triggered only if Phase A raises):

  • Retries with a plain ctx.sample() call (no result_type).
  • Manually parses the text response as JSON using json.loads() (a parser sketch follows this list).
  • Handles three response shapes: {"sub_queries": [...]}, {"query_groups": [...]}, and the legacy flat list ["topic1", ...].
  • The system prompt includes explicit JSON format examples so models reliably produce parseable output without the schema being injected by FastMCP.
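A sketch of that shape handling; the helper name _parse_decomposition_json is hypothetical, and the models are the Pydantic classes defined above:

import json


def _parse_decomposition_json(text: str) -> _DecompositionResult | None:
    """Map the three accepted response shapes onto _DecompositionResult."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # non-JSON output: caller falls through to the raw query
    if isinstance(data, list):
        # Legacy flat list: ["topic1", "topic2", ...]
        return _DecompositionResult(sub_queries=[str(t) for t in data])
    if isinstance(data, dict):
        if data.get("sub_queries"):
            return _DecompositionResult(sub_queries=data["sub_queries"])
        if data.get("query_groups"):
            return _DecompositionResult(
                query_groups=[_SampledQueryGroup(**g) for g in data["query_groups"]]
            )
    return None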

If both phases fail (or neither produces valid output), the tool proceeds with the raw user query (decomposition_method="none").
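Condensed control flow for the whole cascade (exception granularity and attribute names such as raw.text are illustrative; this runs inside analyze_development_topic):

try:
    # Phase A: schema-constrained structured output
    result = await ctx.sample(prompt, result_type=_DecompositionResult)
    decomposition = result.result
except Exception as exc:
    logger.warning("Structured sampling failed (%s); retrying plain-text", exc)
    try:
        # Phase B: plain-text sampling plus manual JSON parse
        raw = await ctx.sample(prompt)  # no result_type
        decomposition = _parse_decomposition_json(raw.text)
    except Exception as exc2:
        logger.warning("Plain-text sampling failed (%s)", exc2)
        decomposition = None

if decomposition is None or not (decomposition.sub_queries or decomposition.query_groups):
    sub_queries = [query]  # proceed with the raw user query
    decomposition_method = "none"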

Verified working with VS Code Copilot (Phase B path) and FastMCP LiteLLM (Phase A path).

Updated system prompt

The prompt describes intent and field semantics. FastMCP automatically appends the Pydantic JSON schema when result_type is set, so the explicit JSON examples in the old prompt are no longer needed.

database_name in response (avsolatorio review item)

Each selected_indicators entry now includes database_name alongside database_id.

Per-indicator country scope for prefetch (Copilot review item)

requested_country = getattr(ind, "requested_country", None) or country_code

In multi-country query_groups mode, each indicator carries its own requested_country from the group it was found in. Prefetch now uses that scope instead of the global country_code.
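For example, given an illustrative grouped decomposition:

{"query_groups": [
    {"queries": ["unemployment rate"], "country": "Morocco"},
    {"queries": ["manufacturing value added"], "country": "Ethiopia"}
]}

indicators found for the first group are prefetched with Morocco's resolved code and those from the second group with Ethiopia's, rather than one global scope.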

tests/test_analyze_topic.py

23 tests:

  • TestScoreIndicator — indicator scoring (3 tests)
  • TestHasLlmCredentials — provider credential detection (4 tests)
  • TestAnalyzeDevelopmentTopic — integration tests with mocked search/get_data (16 tests)
    • Phase A structured output (flat and grouped)
    • Phase B plain-text fallback: flat JSON, grouped JSON, non-JSON (all 3 cases)
    • database_name assertion
    • Per-indicator country scope for prefetch
    • Both phases fail -> proceeds with raw query
    • Empty _DecompositionResult -> proceeds with raw query
    • sampling_server tier when client does not advertise native sampling

pyproject.toml

Added litellm>=1.40.0 to project dependencies.


Sampling Tier Summary

| Tier | Condition | LLM provider | On failure |
|---|---|---|---|
| 1 | Client natively supports MCP sampling | Client's own LLM | Warning -> Tier 2 |
| 2 | Any _CREDENTIAL_ENV_VARS set on server | LiteLLM (multi-provider) | Warning -> Tier 3 |
| 3 | Neither available | None | Proceed with raw user query |

Testing

Quick test with LiteLLM (any provider)

Set any supported provider's API key and run the test suite:

# Option 1: OpenAI (default model: gpt-4o-mini)
export OPENAI_API_KEY="sk-..."
uv run pytest tests/test_analyze_topic.py -v

# Option 2: Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
export LITELLM_MODEL="anthropic/claude-3-5-haiku-latest"
uv run pytest tests/test_analyze_topic.py -v

# Option 3: Mistral AI (direct API)
export MISTRAL_API_KEY="..."
export LITELLM_MODEL="mistral/mistral-small-latest"
uv run pytest tests/test_analyze_topic.py -v

# Option 4: Google Gemini
export GEMINI_API_KEY="..."
export LITELLM_MODEL="gemini/gemini-2.0-flash"
uv run pytest tests/test_analyze_topic.py -v

Manual e2e test (server-side sampling)

# Start server with sampling handler
export OPENAI_API_KEY="sk-..."
uv run poe serve

# In another terminal, call the tool via MCP client or curl
# The decomposition_method in the response should be "sampling_server"
# (or "sampling_client" if the MCP client supports native sampling)

Testing with mAI Factory APIM Bedrock (Ministral-3B)

Note: APIM Bedrock integration is planned as a follow-up PR. The instructions below document how to validate the Bedrock SLM path once it is implemented.

The mAI Factory platform provides Mistral SLMs via an Azure APIM Bedrock proxy. The recommended model for structured-output query decomposition is Ministral-3B (mistral.ministral-3-3b-instruct), which is designed for function calling and JSON-style structured outputs (256k context).

Environments available: DEV, QA (PROD in progress)

# Direct APIM Bedrock test (validates the model works for our use case)
import requests
from azure.identity import ClientSecretCredential
import os, json

# --- Auth ---
credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"],
)
# DEV scope
SCOPE = "a1dd6401-acf2-43b5-a37c-c3230ef1be7d/.default"
token = credential.get_token(SCOPE).token

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
    "x-source-type": "job",
}

# --- Call Ministral-3B via APIM Bedrock proxy ---
MODEL_ID = "mistral.ministral-3-3b-instruct"
url = f"https://azapimdev.worldbank.org/conversationalai/bedrock/model/{MODEL_ID}/converse"

payload = {
    # Bedrock Converse request shape. The response parsing below already
    # assumes the Converse format; Anthropic-style fields (including
    # "anthropic_version") do not apply to a Mistral model on /converse.
    "system": [
        {
            "text": (
                "You are a development economist. Given the user's question about "
                "development data, generate 3-5 specific, measurable topics that can "
                "be searched in a statistical database.\n\n"
                "Respond with valid JSON matching this schema:\n"
                '{"sub_queries": ["topic1", "topic2", ...]} or\n'
                '{"query_groups": [{"queries": ["topic1"], "country": "CountryName"}, ...]}\n'
                "Populate exactly one of sub_queries or query_groups."
            )
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [{"text": "User question: What are Ghana's economic challenges?"}],
        }
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
if response.ok:
    result = response.json()
    for item in result["output"]["message"]["content"]:
        if "text" in item:
            parsed = json.loads(item["text"])
            print(json.dumps(parsed, indent=2))
    usage = result.get("usage", {})
    print(f"Tokens -- in: {usage.get('inputTokens')}, out: {usage.get('outputTokens')}")
else:
    print(response.status_code, response.text)

What to verify:

  1. The model returns valid JSON matching the _DecompositionResult schema
  2. sub_queries contains 3-5 specific, searchable topics (not vague rephrases)
  3. For multi-country questions, query_groups is populated with per-country scoping
  4. Latency is acceptable for a server-side fallback (target: <2s for 3-5 sub-queries)
  5. Token usage is low (this is a small prompt; expect <200 input, <100 output tokens)

Follow-up: APIM Bedrock Integration (separate PR)

The current implementation uses LiteLLM with standard provider API keys. A follow-up PR will add native support for the mAI Factory APIM Bedrock proxy as an alternative Tier 2 fallback path. This is needed because:

  • LiteLLM's bedrock/ prefix calls AWS Bedrock directly using AWS_ACCESS_KEY_ID. It does not route through the APIM gateway.
  • The WB deployment accesses Bedrock models through Azure APIM (azapim{env}.worldbank.org/conversationalai/bedrock/model/{model_id}/converse) using Azure AD tokens.
  • These are fundamentally different auth and transport paths.

Planned architecture for follow-up PR

Tier 1: ctx.sample() --> client's LLM (no server dependency)
Tier 2a: LiteLLM (OpenAI/Anthropic/Gemini/Mistral-direct) --> if standard API keys exist
Tier 2b: APIM Bedrock proxy (Ministral-3B) --> if Azure AD creds + APIM config exist
Tier 3: Raw query fallback --> search directly with the user query

New env vars for APIM path

| Variable | Description | Example |
|---|---|---|
| MAI_APIM_ENV | Target environment | DEV, QA, PROD |
| AZURE_CLIENT_ID | Azure AD app registration | (from Key Vault) |
| AZURE_CLIENT_SECRET | Azure AD app secret | (from Key Vault) |
| AZURE_TENANT_ID | Azure AD tenant | (from Key Vault) |
| MAI_APIM_SUBSCRIPTION_KEY | APIM subscription key (Application Access only) | (from Key Vault) |
| MAI_BEDROCK_MODEL | Bedrock model ID override | mistral.ministral-3-3b-instruct (default) |
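Endpoint resolution from these variables might look like this, reusing the _APIM_ENV_CONFIG mapping from the implementation sketch below (illustrative, not final code):

import os

cfg = _APIM_ENV_CONFIG[os.environ.get("MAI_APIM_ENV", "DEV")]
model_id = os.environ.get("MAI_BEDROCK_MODEL", "mistral.ministral-3-3b-instruct")
url = f"{cfg['endpoint']}/bedrock/model/{model_id}/converse"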

Default SLM: Ministral-3B

mistral.ministral-3-3b-instruct is recommended for the server-side fallback because:

  • 3B params: Fast inference, low cost
  • Native structured output: Designed for function calling and JSON-style outputs
  • 256k context: More than sufficient for our ~200 token decomposition prompts
  • Available in DEV/QA now via APIM Bedrock, PROD in progress

Alternative: mistral.mistral-small-latest (Mistral-Small, 7B, more capable but slower).

Implementation sketch

# sampling.py (addition in follow-up PR)

_APIM_ENV_CONFIG = {
    "DEV":  {"endpoint": "https://azapimdev.worldbank.org/conversationalai", "scope": "a1dd6401-acf2-43b5-a37c-c3230ef1be7d/.default"},
    "QA":   {"endpoint": "https://azapimqa.worldbank.org/conversationalai",  "scope": "c626bd72-9ef7-4efe-9176-5c75800f7670/.default"},
    "PROD": {"endpoint": "https://azapim.worldbank.org/conversationalai",    "scope": "0b3b356c-4b5f-4d5b-97ad-c99343ad5557/.default"},
}

async def apim_bedrock_sampling_handler(messages, params, context) -> str:
    """Calls Ministral-3B via mAI Factory APIM Bedrock proxy."""
    from azure.identity import ClientSecretCredential  # lazy import
    # 1. Acquire Azure AD token
    # 2. POST to /bedrock/model/{model_id}/converse
    # 3. Parse Bedrock converse response format
    # 4. Return extracted text
    ...

Test Results

23 passed (test_analyze_topic.py)
236 passed (existing suite)

…pling and structured output

Supersedes PR #63. Implements MCP sampling integration for the
analyze_development_topic tool with the following improvements over the
original:

- Generic LiteLLM handler in src/data360/mcp_server/sampling.py
  Supports OpenAI, Anthropic, Gemini, Mistral AI, and AWS Bedrock
  (Mistral SLMs) through a single interface. Provider is selected via
  LITELLM_MODEL env var (default: gpt-4o-mini). No hard dependency on
  a single provider.

- Structured output via result_type=_DecompositionResult (Pydantic)
  Replaces the manual JSON parsing / regex extraction path. FastMCP
  enforces the schema on the LLM response, eliminating unexpected
  free-text outputs and the broad except (ValueError, json.JSONDecodeError)
  handler.

- Clean fallback path
  Any sampling failure (client does not support sampling, LiteLLM call
  fails) logs a warning and falls through to rule-based decomposition.
  No partial JSON fallback is attempted.

- database_name added to selected_indicators response dict
  Avoids LLM needing a separate lookup to get the human-readable name.

- Per-indicator requested_country for data prefetch
  Uses ind.requested_country over the global country_code when present,
  so multi-country queries prefetch data for the correct scope per indicator.

- Sampling handler extracted to standalone module
  src/data360/mcp_server/sampling.py is separately testable with clear
  provider documentation and a has_llm_credentials() helper.

- System prompt updated for Pydantic schema
  Prompt describes intent and field semantics; FastMCP appends the JSON
  schema automatically when result_type is provided.

Tier resolution (unchanged from PR #63):
  Tier 1 — client native sampling
  Tier 2 — server-side LiteLLM handler (fallback)
  Tier 3 — rule-based regex decomposition

Tests: 28 new / 236 existing all pass.
…mpatibility

Phase A tries ctx.sample() with result_type=_DecompositionResult (schema-constrained
structured output). If the MCP client rejects result_type (e.g. VS Code Copilot,
Claude Desktop), Phase B retries with a plain ctx.sample() call and parses the
text response as JSON manually.

The system prompt now includes explicit JSON format examples so models reliably
produce parseable output in both phases.

Sampling cascade:
  Phase A: result_type= (Pydantic-validated, FastMCP enforces schema)
  Phase B: plain-text + manual JSON parse (VS Code compatible)
  Fallback: proceed with raw user query (decomposition_method='none')

Tests: 3 new test cases covering Phase B flat, grouped, and non-JSON paths.
@rafmacalaba (Collaborator, Author) commented:
@avsolatorio some clients, like VS Code, don't support structured outputs as you suggested in the previous PR; this is handled with a fallback. See the "Two-phase sampling for broad MCP client compatibility" section in the description above.



Copilot AI left a comment


Pull request overview

Adds a new data360_analyze_development_topic MCP tool that decomposes broad user questions into searchable sub-queries using MCP sampling, with a server-side LiteLLM fallback and structured-output validation.

Changes:

  • Implemented analyze_development_topic with two-phase sampling (structured result_type then plain-text JSON fallback) and indicator prefetch/scoring.
  • Added a LiteLLM-based MCP sampling handler and wired it into the server definition (enabled only when credentials exist).
  • Added a dedicated test suite for sampling tiers, decomposition shapes, and response fields; added litellm dependency.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 11 comments.

| File | Description |
|---|---|
| src/data360/api.py | Adds topic-analysis tool, sampling logic, scoring, and response shaping. |
| src/data360/mcp_server/sampling.py | Introduces LiteLLM-based server-side sampling handler and credential detection. |
| src/data360/mcp_server/_server_definition.py | Enables sampling handler conditionally and configures fallback behavior. |
| src/data360/mcp_server/tools.py | Registers data360_analyze_development_topic as an MCP tool. |
| tests/test_analyze_topic.py | Adds tests for decomposition/scoring/sampling tiers and response fields. |
| pyproject.toml | Adds litellm>=1.40.0 dependency. |


Comment thread src/data360/api.py
Comment on lines +1877 to +1878
if country_code and "," in country_code:
multi_country_codes = [c.strip() for c in country_code.split(",") if c.strip()]

Copilot AI Apr 26, 2026


Multi-country detection/splitting is currently based on commas (,) in country_code, but _resolve_country_code() returns semicolon-delimited codes for multi-country input. This means Path B (cross-product query_groups) won’t trigger for real multi-country requests. Consider switching this logic to use ; (and updating docs/tests accordingly), or changing _resolve_country_code() contract consistently across the codebase.

Suggested change

- if country_code and "," in country_code:
-     multi_country_codes = [c.strip() for c in country_code.split(",") if c.strip()]
+ if country_code and ";" in country_code:
+     multi_country_codes = [c.strip() for c in country_code.split(";") if c.strip()]

Comment thread src/data360/api.py
# --- Topic Analysis Helpers ---

# Conjunctions and stopwords used for rule-based query decomposition.
# Conjunctions and stopwords used for rule-based query decomposition.

Copilot AI Apr 26, 2026


The comment above _STOPWORDS is duplicated on two consecutive lines. Remove one to avoid noise in this section.

Suggested change
# Conjunctions and stopwords used for rule-based query decomposition.

Comment on lines +359 to +364
patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
patch("data360.api.get_data", mock_data),
):
result = await analyze_development_topic(
query="Labor market Morocco vs Ethiopia",
country="Morocco, Ethiopia",

Copilot AI Apr 26, 2026


This test uses comma-delimited multi-country input and patches _resolve_country_code to return "MAR,ETH", but the real resolver uses semicolon-delimited codes for multi-country values. Update the test to reflect the production delimiter so it validates the actual routing behavior.

Suggested change

- patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
+ patch("data360.api._resolve_country_code", return_value="MAR;ETH"),
  patch("data360.api.get_data", mock_data),
  ):
      result = await analyze_development_topic(
          query="Labor market Morocco vs Ethiopia",
-         country="Morocco, Ethiopia",
+         country="Morocco; Ethiopia",

Comment on lines +551 to +560
patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
patch("data360.api.get_data", mock_data),
):
result = await analyze_development_topic(
query="labor market and manufacturing",
country="Morocco, Ethiopia",
)

assert result["decomposition_method"] == "none"
assert result["country_code"] == "MAR,ETH"

Copilot AI Apr 26, 2026


This test case also assumes comma-delimited multi-country codes ("MAR,ETH") and inputs ("Morocco, Ethiopia"), which doesn’t match _resolve_country_code()’s semicolon-delimited multi-country contract. Align the test (and the production code/docs) on one delimiter so multi-country behavior is actually exercised correctly.

Suggested change

- patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
+ patch("data360.api._resolve_country_code", return_value="MAR;ETH"),
  patch("data360.api.get_data", mock_data),
  ):
      result = await analyze_development_topic(
          query="labor market and manufacturing",
-         country="Morocco, Ethiopia",
+         country="Morocco; Ethiopia",
      )

  assert result["decomposition_method"] == "none"
- assert result["country_code"] == "MAR,ETH"
+ assert result["country_code"] == "MAR;ETH"

Comment thread src/data360/api.py
Comment on lines +1658 to +1660
country: Optional country name or 3-letter code (e.g. "Ghana", "GHA").
Can also be comma-separated for multi-country comparisons (e.g. "Morocco, Ethiopia").
max_indicators: Maximum number of indicators to return (default 4, max 6).

Copilot AI Apr 26, 2026


The tool docs and implementation treat multi-country input as comma-separated (e.g. "Morocco, Ethiopia"), but _resolve_country_code() only supports semicolon-delimited multi-country values (because some country names contain commas). As a result, multi-country resolution and routing will fail unless this is aligned (either accept semicolons here, or implement robust parsing that won’t break names like "Korea, Republic of").

Comment thread src/data360/api.py
Comment on lines 1 to +10
import asyncio
import json
import logging
import re
import zlib
from typing import Any
from urllib.parse import urlencode

from fastmcp import Context
from pydantic import BaseModel

Copilot AI Apr 26, 2026


re is imported but not used in this file. If it’s leftover from the old markdown-stripping logic, it should be removed to avoid dead imports and keep the module clean.

Comment thread src/data360/api.py
Comment on lines +1667 to +1671
query: The original question.
country / country_code: Resolved country info (if provided). May be comma-separated.
sub_queries: The decomposed search terms (from LLM or rule-based).
decomposition_method: "sampling_client", "sampling_server", or "none".
selected_indicators: List of ranked indicator dicts, each with:

Copilot AI Apr 26, 2026


The docstring and surrounding comments describe a “rule-based decomposition” fallback, but the current implementation falls back to sub_queries=[query] with decomposition_method="none" (no rule-based splitting). Please update the docstring/comments to match the actual behavior (or implement the intended rule-based decomposition).

Comment on lines +82 to +83
temperature=params.temperature or 0.2,
max_tokens=params.maxTokens or 512,

Copilot AI Apr 26, 2026


Using params.temperature or 0.2 (and similarly params.maxTokens or 512) will override valid falsy values like 0.0 temperature. Prefer an explicit is None check so 0 / 0.0 are preserved when the caller intentionally sets them.

Suggested change

- temperature=params.temperature or 0.2,
- max_tokens=params.maxTokens or 512,
+ temperature=(
+     params.temperature if params.temperature is not None else 0.2
+ ),
+ max_tokens=(
+     params.maxTokens if params.maxTokens is not None else 512
+ ),

Comment on lines +59 to +61
Raises on failure so that FastMCP can propagate the error back to the
caller (who should catch it and fall back to rule-based decomposition).
"""

Copilot AI Apr 26, 2026


This docstring says the caller will “fall back to rule-based decomposition”, but analyze_development_topic currently falls back to using the raw query (decomposition_method="none"). Please align this wording with the actual Tier 3 behavior to avoid confusing operators.

Comment on lines +257 to +267
patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
patch("data360.api.get_data", mock_data),
):
result = await analyze_development_topic(
query="Labor market Morocco vs manufacturing Ethiopia",
country="Morocco, Ethiopia",
ctx=ctx,
)

assert result["decomposition_method"] == "sampling_client"
assert result["country_code"] == "MAR,ETH"

Copilot AI Apr 26, 2026


These tests treat multi-country input as comma-delimited (and patch _resolve_country_code to return "MAR,ETH"), but _resolve_country_code() actually uses semicolons for multi-country (to avoid ambiguity with country names containing commas). Adjust the test inputs/expected values to match the real delimiter (or update the production code contract consistently).

Suggested change

- patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
+ patch("data360.api._resolve_country_code", return_value="MAR;ETH"),
  patch("data360.api.get_data", mock_data),
  ):
      result = await analyze_development_topic(
          query="Labor market Morocco vs manufacturing Ethiopia",
-         country="Morocco, Ethiopia",
+         country="Morocco; Ethiopia",
          ctx=ctx,
      )

  assert result["decomposition_method"] == "sampling_client"
- assert result["country_code"] == "MAR,ETH"
+ assert result["country_code"] == "MAR;ETH"
