feat(analyze): add data360_analyze_development_topic with LiteLLM sampling and structured output#73
rafmacalaba wants to merge 3 commits into `dev` from
Conversation
…pling and structured output

Supersedes PR #63. Implements MCP sampling integration for the `analyze_development_topic` tool with the following improvements over the original:

- Generic LiteLLM handler in `src/data360/mcp_server/sampling.py`. Supports OpenAI, Anthropic, Gemini, Mistral AI, and AWS Bedrock (Mistral SLMs) through a single interface. Provider is selected via the `LITELLM_MODEL` env var (default: `gpt-4o-mini`). No hard dependency on a single provider.
- Structured output via `result_type=_DecompositionResult` (Pydantic). Replaces the manual JSON parsing / regex extraction path. FastMCP enforces the schema on the LLM response, eliminating unexpected free-text outputs and the broad `except (ValueError, json.JSONDecodeError)` handler.
- Clean fallback path. Any sampling failure (client does not support sampling, LiteLLM call fails) logs a warning and falls through to rule-based decomposition. No partial JSON fallback is attempted.
- `database_name` added to the `selected_indicators` response dict. Avoids the LLM needing a separate lookup to get the human-readable name.
- Per-indicator `requested_country` for data prefetch. Uses `ind.requested_country` over the global `country_code` when present, so multi-country queries prefetch data for the correct scope per indicator.
- Sampling handler extracted to a standalone module. `src/data360/mcp_server/sampling.py` is separately testable, with clear provider documentation and a `has_llm_credentials()` helper.
- System prompt updated for the Pydantic schema. The prompt describes intent and field semantics; FastMCP appends the JSON schema automatically when `result_type` is provided.

Tier resolution (unchanged from PR #63):

- Tier 1 — client native sampling
- Tier 2 — server-side LiteLLM handler (fallback)
- Tier 3 — rule-based regex decomposition

Tests: 28 new / 236 existing all pass.
…mpatibility

Phase A tries `ctx.sample()` with `result_type=_DecompositionResult` (schema-constrained structured output). If the MCP client rejects `result_type` (e.g. VS Code Copilot, Claude Desktop), Phase B retries with a plain `ctx.sample()` call and parses the text response as JSON manually. The system prompt now includes explicit JSON format examples so models reliably produce parseable output in both phases.

Sampling cascade:

- Phase A: `result_type=` (Pydantic-validated, FastMCP enforces schema)
- Phase B: plain-text + manual JSON parse (VS Code compatible)
- Fallback: proceed with raw user query (`decomposition_method='none'`)

Tests: 3 new test cases covering the Phase B flat, grouped, and non-JSON paths.
@avsolatorio some clients, like VS Code, don't support structured outputs, as you suggested in the previous PR. Handled it using a fallback.
Pull request overview
Adds a new data360_analyze_development_topic MCP tool that decomposes broad user questions into searchable sub-queries using MCP sampling, with a server-side LiteLLM fallback and structured-output validation.
Changes:
- Implemented `analyze_development_topic` with two-phase sampling (structured `result_type`, then plain-text JSON fallback) and indicator prefetch/scoring.
- Added a LiteLLM-based MCP sampling handler and wired it into the server definition (enabled only when credentials exist).
- Added a dedicated test suite for sampling tiers, decomposition shapes, and response fields; added the `litellm` dependency.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `src/data360/api.py` | Adds topic-analysis tool, sampling logic, scoring, and response shaping. |
| `src/data360/mcp_server/sampling.py` | Introduces LiteLLM-based server-side sampling handler and credential detection. |
| `src/data360/mcp_server/_server_definition.py` | Enables sampling handler conditionally and configures fallback behavior. |
| `src/data360/mcp_server/tools.py` | Registers `data360_analyze_development_topic` as an MCP tool. |
| `tests/test_analyze_topic.py` | Adds tests for decomposition/scoring/sampling tiers and response fields. |
| `pyproject.toml` | Adds `litellm>=1.40.0` dependency. |
```python
if country_code and "," in country_code:
    multi_country_codes = [c.strip() for c in country_code.split(",") if c.strip()]
```
Multi-country detection/splitting is currently based on commas (,) in country_code, but _resolve_country_code() returns semicolon-delimited codes for multi-country input. This means Path B (cross-product query_groups) won’t trigger for real multi-country requests. Consider switching this logic to use ; (and updating docs/tests accordingly), or changing _resolve_country_code() contract consistently across the codebase.
```diff
-if country_code and "," in country_code:
-    multi_country_codes = [c.strip() for c in country_code.split(",") if c.strip()]
+if country_code and ";" in country_code:
+    multi_country_codes = [c.strip() for c in country_code.split(";") if c.strip()]
```
```python
# --- Topic Analysis Helpers ---

# Conjunctions and stopwords used for rule-based query decomposition.
# Conjunctions and stopwords used for rule-based query decomposition.
```
The comment above _STOPWORDS is duplicated on two consecutive lines. Remove one to avoid noise in this section.
```python
# Conjunctions and stopwords used for rule-based query decomposition.
```
```python
    patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
    patch("data360.api.get_data", mock_data),
):
    result = await analyze_development_topic(
        query="Labor market Morocco vs Ethiopia",
        country="Morocco, Ethiopia",
```
This test uses comma-delimited multi-country input and patches _resolve_country_code to return "MAR,ETH", but the real resolver uses semicolon-delimited codes for multi-country values. Update the test to reflect the production delimiter so it validates the actual routing behavior.
```diff
-    patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
-    patch("data360.api.get_data", mock_data),
-):
-    result = await analyze_development_topic(
-        query="Labor market Morocco vs Ethiopia",
-        country="Morocco, Ethiopia",
+    patch("data360.api._resolve_country_code", return_value="MAR;ETH"),
+    patch("data360.api.get_data", mock_data),
+):
+    result = await analyze_development_topic(
+        query="Labor market Morocco vs Ethiopia",
+        country="Morocco; Ethiopia",
```
```python
    patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
    patch("data360.api.get_data", mock_data),
):
    result = await analyze_development_topic(
        query="labor market and manufacturing",
        country="Morocco, Ethiopia",
    )

    assert result["decomposition_method"] == "none"
    assert result["country_code"] == "MAR,ETH"
```
This test case also assumes comma-delimited multi-country codes ("MAR,ETH") and inputs ("Morocco, Ethiopia"), which doesn’t match _resolve_country_code()’s semicolon-delimited multi-country contract. Align the test (and the production code/docs) on one delimiter so multi-country behavior is actually exercised correctly.
```diff
-    patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
-    patch("data360.api.get_data", mock_data),
-):
-    result = await analyze_development_topic(
-        query="labor market and manufacturing",
-        country="Morocco, Ethiopia",
-    )
-    assert result["decomposition_method"] == "none"
-    assert result["country_code"] == "MAR,ETH"
+    patch("data360.api._resolve_country_code", return_value="MAR;ETH"),
+    patch("data360.api.get_data", mock_data),
+):
+    result = await analyze_development_topic(
+        query="labor market and manufacturing",
+        country="Morocco; Ethiopia",
+    )
+    assert result["decomposition_method"] == "none"
+    assert result["country_code"] == "MAR;ETH"
```
```python
    country: Optional country name or 3-letter code (e.g. "Ghana", "GHA").
        Can also be comma-separated for multi-country comparisons
        (e.g. "Morocco, Ethiopia").
    max_indicators: Maximum number of indicators to return (default 4, max 6).
```
The tool docs and implementation treat multi-country input as comma-separated (e.g. "Morocco, Ethiopia"), but _resolve_country_code() only supports semicolon-delimited multi-country values (because some country names contain commas). As a result, multi-country resolution and routing will fail unless this is aligned (either accept semicolons here, or implement robust parsing that won’t break names like "Korea, Republic of").
```python
import asyncio
import json
import logging
import re
import zlib
from typing import Any
from urllib.parse import urlencode

from fastmcp import Context
from pydantic import BaseModel
```
`re` is imported but not used in this file. If it's leftover from the old markdown-stripping logic, it should be removed to avoid dead imports and keep the module clean.
```python
    query: The original question.
    country / country_code: Resolved country info (if provided). May be comma-separated.
    sub_queries: The decomposed search terms (from LLM or rule-based).
    decomposition_method: "sampling_client", "sampling_server", or "none".
    selected_indicators: List of ranked indicator dicts, each with:
```
The docstring and surrounding comments describe a “rule-based decomposition” fallback, but the current implementation falls back to sub_queries=[query] with decomposition_method="none" (no rule-based splitting). Please update the docstring/comments to match the actual behavior (or implement the intended rule-based decomposition).
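If the intended rule-based decomposition were implemented rather than documented away, a minimal version consistent with the "conjunctions and stopwords" comments in the diff could look like this. The stopword set and split pattern here are illustrative, not the repository's actual `_STOPWORDS` values.

```python
import re

# Illustrative stopword set; the repository's _STOPWORDS may differ.
_STOPWORDS = {"and", "or", "vs", "versus", "the", "of", "in", "for"}

def rule_based_decompose(query: str) -> list[str]:
    """Split a broad query on coordinating conjunctions into sub-queries."""
    parts = re.split(r"\b(?:and|or|vs|versus)\b", query, flags=re.IGNORECASE)
    sub_queries = []
    for part in parts:
        # Drop stopwords so each sub-query is a searchable topic phrase.
        words = [w for w in part.strip().split() if w.lower() not in _STOPWORDS]
        if words:
            sub_queries.append(" ".join(words))
    # Fall back to the raw query if nothing survives the split,
    # matching the current decomposition_method="none" behavior.
    return sub_queries or [query]
```

Either path works; the review point is only that the docstring and the code should describe the same fallback.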
```python
    temperature=params.temperature or 0.2,
    max_tokens=params.maxTokens or 512,
```
Using params.temperature or 0.2 (and similarly params.maxTokens or 512) will override valid falsy values like 0.0 temperature. Prefer an explicit is None check so 0 / 0.0 are preserved when the caller intentionally sets them.
```diff
-    temperature=params.temperature or 0.2,
-    max_tokens=params.maxTokens or 512,
+    temperature=(
+        params.temperature if params.temperature is not None else 0.2
+    ),
+    max_tokens=(
+        params.maxTokens if params.maxTokens is not None else 512
+    ),
```
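A quick illustration of the pitfall this suggestion fixes:

```python
# `or` substitutes the default for ANY falsy value, including a
# deliberately requested temperature of 0.0 (deterministic sampling).
requested = 0.0

with_or = requested or 0.2                                 # 0.0 is falsy -> 0.2
with_none_check = requested if requested is not None else 0.2  # -> 0.0

print(with_or)          # the caller's 0.0 is silently discarded
print(with_none_check)  # the caller's value is preserved
```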
```python
    Raises on failure so that FastMCP can propagate the error back to the
    caller (who should catch it and fall back to rule-based decomposition).
    """
```
This docstring says the caller will “fall back to rule-based decomposition”, but analyze_development_topic currently falls back to using the raw query (decomposition_method="none"). Please align this wording with the actual Tier 3 behavior to avoid confusing operators.
```python
    patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
    patch("data360.api.get_data", mock_data),
):
    result = await analyze_development_topic(
        query="Labor market Morocco vs manufacturing Ethiopia",
        country="Morocco, Ethiopia",
        ctx=ctx,
    )

    assert result["decomposition_method"] == "sampling_client"
    assert result["country_code"] == "MAR,ETH"
```
These tests treat multi-country input as comma-delimited (and patch _resolve_country_code to return "MAR,ETH"), but _resolve_country_code() actually uses semicolons for multi-country (to avoid ambiguity with country names containing commas). Adjust the test inputs/expected values to match the real delimiter (or update the production code contract consistently).
```diff
-    patch("data360.api._resolve_country_code", return_value="MAR,ETH"),
-    patch("data360.api.get_data", mock_data),
-):
-    result = await analyze_development_topic(
-        query="Labor market Morocco vs manufacturing Ethiopia",
-        country="Morocco, Ethiopia",
-        ctx=ctx,
-    )
-    assert result["decomposition_method"] == "sampling_client"
-    assert result["country_code"] == "MAR,ETH"
+    patch("data360.api._resolve_country_code", return_value="MAR;ETH"),
+    patch("data360.api.get_data", mock_data),
+):
+    result = await analyze_development_topic(
+        query="Labor market Morocco vs manufacturing Ethiopia",
+        country="Morocco; Ethiopia",
+        ctx=ctx,
+    )
+    assert result["decomposition_method"] == "sampling_client"
+    assert result["country_code"] == "MAR;ETH"
```
Summary
Supersedes #63. Rebases `data360_analyze_development_topic` onto `upstream/dev` and incorporates all review feedback from that PR. The dependency chain for #63 (`feat/multi-query-search` #57, background sync #67, search-card #62, `database_name` #65) is now fully merged into `dev`, so this PR applies only the sampling-specific additions as a clean single commit.

ACTUAL TESTS
TEST SAMPLING in VS Code:

`decomposition_method: "sampling_client"`

SAMPLING TIER 2 on clients that don't support sampling:

`decomposition_method: "sampling_server"`

Changes from PR #63
New module: `src/data360/mcp_server/sampling.py`

Extracts the sampling handler into a standalone, testable module. Replaces the hard-coded `OpenAISamplingHandler` with a generic LiteLLM-based handler supporting multiple providers through a unified interface:

| `LITELLM_MODEL` example | Required credentials |
|---|---|
| `gpt-4o-mini` | `OPENAI_API_KEY` |
| `anthropic/claude-haiku-3-5` | `ANTHROPIC_API_KEY` |
| `gemini/gemini-2.0-flash` | `GEMINI_API_KEY` |
| `mistral/mistral-small-latest` | `MISTRAL_API_KEY` |
| `bedrock/mistral.mistral-7b-instruct-v0:2` | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION_NAME` |
The handler is enabled only when at least one credential env var is present (`has_llm_credentials()`). If none are set, the server logs a clear message and the handler is omitted, so FastMCP skips to the client tier or rule-based tier without attempting a failing LLM call.

`src/data360/mcp_server/_server_definition.py`

Wires up the new handler. Replaces the `OpenAISamplingHandler` import and the silent `except Exception: pass` block with explicit startup logging.

`src/data360/api.py`

Structured output via `result_type=_DecompositionResult`

Defines two Pydantic models; `ctx.sample()` is called with `result_type=_DecompositionResult`. FastMCP enforces the schema on the LLM response and returns a validated object via `result.result`. This eliminates:

- the `json.loads()` / `json.JSONDecodeError` fallback
- the broad `except (ValueError, json.JSONDecodeError)` handler (Copilot review item)

Simplified fallback path

Any failure from `ctx.sample()` (client does not support sampling, LiteLLM call failed, validation error) logs a `WARNING` and proceeds with the raw user query. No partial JSON fallback. No `sampling_error` field surfaced to the caller.

Two-phase sampling for broad MCP client compatibility
Not all MCP clients support schema-constrained sampling. VS Code Copilot (1.99+), for example, supports plain-text sampling but rejects calls that include a `result_type` JSON schema constraint.

To handle this, sampling now uses a two-phase approach:

- Phase A — structured output (`result_type=_DecompositionResult`): returns a validated `_DecompositionResult` object directly; no manual parsing needed.
- Phase B — plain-text fallback (triggered only if Phase A raises): a plain `ctx.sample()` call (no `result_type`), parsed with `json.loads()`. Accepts `{"sub_queries": [...]}`, `{"query_groups": [...]}`, and the legacy flat list `["topic1", ...]`.

If both phases fail (or neither produces valid output), the tool proceeds with the raw user query (`decomposition_method="none"`).

Verified working with VS Code Copilot (Phase B path) and FastMCP LiteLLM (Phase A path).
Updated system prompt
The prompt describes intent and field semantics. FastMCP automatically appends the Pydantic JSON schema when `result_type` is set, so the explicit JSON examples in the old prompt are no longer needed.

`database_name` in response (avsolatorio review item)

Each `selected_indicators` entry now includes `database_name` alongside `database_id`.

Per-indicator country scope for prefetch (Copilot review item)

In multi-country `query_groups` mode, each indicator carries its own `requested_country` from the group it was found in. Prefetch now uses that scope instead of the global `country_code`.

`tests/test_analyze_topic.py`

23 tests:

- `TestScoreIndicator` — indicator scoring (3 tests)
- `TestHasLlmCredentials` — provider credential detection (4 tests)
- `TestAnalyzeDevelopmentTopic` — integration tests with mocked search/get_data (16 tests), including:
  - `database_name` assertion
  - `_DecompositionResult` failure -> proceeds with raw query
  - `sampling_server` tier when client does not advertise native sampling

`pyproject.toml`

Added `litellm>=1.40.0` to project dependencies.

Sampling Tier Summary
Tier 2 is enabled only when at least one of `_CREDENTIAL_ENV_VARS` is set on the server.

Testing
Quick test with LiteLLM (any provider)
Set any supported provider's API key and run the test suite:
Manual e2e test (server-side sampling)
Testing with mAI Factory APIM Bedrock (Ministral-3B)
The mAI Factory platform provides Mistral SLMs via an Azure APIM Bedrock proxy. The recommended model for structured-output query decomposition is Ministral-3B (`mistral.ministral-3-3b-instruct`), which is designed for function calling and JSON-style structured outputs (256k context).

Environments available: DEV, QA (PROD in progress)
What to verify:

- output validates against the `_DecompositionResult` schema
- `sub_queries` contains 3-5 specific, searchable topics (not vague rephrases)
- `query_groups` is populated with per-country scoping

Follow-up: APIM Bedrock Integration (separate PR)
The current implementation uses LiteLLM with standard provider API keys. A follow-up PR will add native support for the mAI Factory APIM Bedrock proxy as an alternative Tier 2 fallback path. This is needed because:

- LiteLLM's `bedrock/` prefix calls AWS Bedrock directly using `AWS_ACCESS_KEY_ID`; it does not route through the APIM gateway.
- The APIM proxy exposes a different endpoint (`azapim{env}.worldbank.org/conversationalai/bedrock/model/{model_id}/converse`) using Azure AD tokens.

Planned architecture for follow-up PR
New env vars for APIM path

| Variable | Value / notes |
|---|---|
| `MAI_APIM_ENV` | `DEV`, `QA`, `PROD` |
| `AZURE_CLIENT_ID` | |
| `AZURE_CLIENT_SECRET` | |
| `AZURE_TENANT_ID` | |
| `MAI_APIM_SUBSCRIPTION_KEY` | |
| `MAI_BEDROCK_MODEL` | `mistral.ministral-3-3b-instruct` (default) |

Default SLM: Ministral-3B
`mistral.ministral-3-3b-instruct` is recommended for the server-side fallback because:

Alternative: `mistral.mistral-small-latest` (Mistral-Small, 7B, more capable but slower).

Implementation sketch
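As a placeholder for the elided sketch, here is a hypothetical endpoint builder matching the proxy path quoted above (`azapim{env}.worldbank.org/conversationalai/bedrock/model/{model_id}/converse`). The scheme, host casing, and error handling are assumptions; Azure AD token acquisition and the `MAI_APIM_SUBSCRIPTION_KEY` header are deliberately omitted:

```python
def apim_converse_url(env: str, model_id: str) -> str:
    """Build the APIM Bedrock Converse endpoint for a given environment.

    Hypothetical helper: reconstructs the URL pattern described above;
    the real proxy path may differ. Callers would POST to this URL with
    an Azure AD bearer token and the APIM subscription-key header.
    """
    if env.upper() not in {"DEV", "QA", "PROD"}:
        raise ValueError(f"unknown MAI_APIM_ENV: {env!r}")
    return (
        f"https://azapim{env.lower()}.worldbank.org"
        f"/conversationalai/bedrock/model/{model_id}/converse"
    )
```

Centralizing the URL construction keeps the `MAI_APIM_ENV` switch in one place when DEV/QA/PROD are wired up in the follow-up PR.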
Test Results