diff --git a/skills/feedmob-reporting-skills/SKILL.md b/skills/feedmob-reporting-skills/SKILL.md index 5a8471e..842c768 100644 --- a/skills/feedmob-reporting-skills/SKILL.md +++ b/skills/feedmob-reporting-skills/SKILL.md @@ -211,7 +211,185 @@ For detailed report structure and formatting guide, see: **[Report Structure Gui --- -### 2. AppsFlyer MMP Client Workflow +### 2. Net Spend Verification Workflow (Partner Reports) + +Compare partner reports (Jampp, Kayzen, etc.) with direct spend data to verify net spend accuracy. + +**Applicable Partners:** +- ✅ Jampp +- ✅ Kayzen +- ✅ YouAppi +- ✅ Samsung +- ✅ Smadex +- ✅ InMobi +- ✅ Liftoff + +**When to use:** When verifying net spend from partner platforms against FeedMob direct spend records. + +**Key Difference from Gross Spend:** +- **Gross Spend**: Requires calculation (`client_paid_action_count × gross_cpi`) +- **Net Spend**: Direct comparison (`partner_net_spend` vs `feedmob_net_spend`) + +**Workflow Steps:** + +#### Step 1: Fetch Partner Report + +**Direct Partners (Jampp, Kayzen, YouAppi, Samsung):** + +```javascript +// Jampp - requires client_id +mcp__feedmob-reporting__get_jampp_reports({ + client_id: 123, + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// Kayzen - no client_id required +mcp__feedmob-reporting__get_kayzen_reports({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// YouAppi - no client_id required +mcp__feedmob-reporting__get_youappi_reports({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// Samsung - no client_id required +mcp__feedmob-reporting__get_samsung_reports({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Multi-step Partners (Smadex, InMobi, Liftoff):** + +**Smadex (3-step process):** +```javascript +// Step 1: Get report IDs +mcp__feedmob-reporting__get_smadex_report_ids({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// Step 2: Check status (repeat until ready) +mcp__feedmob-reporting__check_smadex_report_status({ 
+ report_id: "abc-123" +}) + +// Step 3: Get report data when status is "ready" +mcp__feedmob-reporting__get_smadex_reports({ + report_id: "abc-123" +}) +``` + +**InMobi (3-step process):** +```javascript +// Step 1: Get report IDs +mcp__feedmob-reporting__get_inmobi_report_ids({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// Step 2: Check status (repeat until ready) +mcp__feedmob-reporting__check_inmobi_report_status({ + report_id: "abc-123", + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// Step 3: Get report data when status is "ready" +mcp__feedmob-reporting__get_inmobi_reports({ + skan_report_id: "skan-123", + non_skan_report_id: "non-skan-456", + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Liftoff (3-step process):** +```javascript +// Step 1: Get report IDs +mcp__feedmob-reporting__get_liftoff_report_ids({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) + +// Step 2: Check status (repeat until ready) +mcp__feedmob-reporting__check_liftoff_report_status({ + stash_report_id: "stash-123", + possible_finance_report_id: "pf-456" +}) + +// Step 3: Get report data when status is "ready" +mcp__feedmob-reporting__get_liftoff_reports({ + stash_report_id: "stash-123", + possible_finance_report_id: "pf-456", + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Response contains:** +- `click_url_id`: Click URL identifier +- `partner_net_spend`: Net spend from partner report +- `date`: Report date +- Other partner-specific fields + +#### Step 2: Extract Click URL IDs and Fetch Direct Spends + +From partner report, extract all unique `click_url_id` values, then fetch direct spends: + +```javascript +mcp__feedmob-reporting__get_direct_spends({ + start_date: "2025-01-01", + end_date: "2025-01-31", + click_url_ids: ["12345", "12346", ...] 
// string array +}) +``` + +#### Step 3: Use Automation Scripts (Recommended) + +**Step 3.1: Compare Net Spend** + +First use Glob to find: `**/compare_net_spend_datafusion.py` + +```bash +python3 scripts/compare_net_spend_datafusion.py \ + +``` + +**Step 3.2: Generate Analysis Summary** + +```bash +python3 scripts/analyze_net_spend_datafusion.py \ + +``` + +**For detailed script usage, see:** +**[Scripts Usage Guide](references/scripts-usage-guide.md)** + +#### Step 4: Generate Final Report + +Compare `partner_net_spend` vs `feedmob_net_spend`: + +| Click URL | Date | Partner Net | FeedMob Net | Difference | Diff % | Status | +|-----------|------|-------------|-------------|------------|--------|--------| +| 12345 | 2025-01-01 | $1,500.00 | $1,500.00 | $0.00 | 0.00% | ✅ | +| 12346 | 2025-01-01 | $2,000.00 | $1,950.00 | $50.00 | 2.56% | 🚨 | + +**Status Icons:** +- ✅ **Perfect Match**: 0% difference +- ⚠️ **Minor Difference**: <2% difference +- 🚨 **Significant Difference**: ≥2% difference + +**For detailed report structure, see:** +**[Report Structure Guide](references/report-structure.md)** + +--- + +### 3. AppsFlyer MMP Client Workflow Use this workflow when client uses AppsFlyer as MMP (instead of Singular or Adjust). diff --git a/skills/feedmob-reporting-skills/references/mcp_tools.md b/skills/feedmob-reporting-skills/references/mcp_tools.md index c8f1dd5..30d05c3 100644 --- a/skills/feedmob-reporting-skills/references/mcp_tools.md +++ b/skills/feedmob-reporting-skills/references/mcp_tools.md @@ -25,8 +25,9 @@ API reference for all feedmob-reporting MCP tools. 2. [Koho Financial Tools](#koho-financial-tools) 3. [TextNow Tools](#textnow-tools) 4. [AppsFlyer Tools](#appsflyer-tools) -5. [Direct Spend Tools](#direct-spend-tools) -6. [AdOps Tools](#adops-tools) +5. [Partner Reports (Net Spend)](#partner-reports-net-spend) +6. [Direct Spend Tools](#direct-spend-tools) +7. 
[AdOps Tools](#adops-tools) --- @@ -296,6 +297,251 @@ mcp__feedmob-reporting__get_appsflyer_reports({ --- +## Partner Reports (Net Spend) + +These tools fetch partner platform reports for net spend verification. Unlike gross spend verification (which requires calculation), net spend verification directly compares `partner_net_spend` vs `feedmob_net_spend`. + +### Partner Reports Overview + +| Partner | Tool | Workflow Type | client_id Required | +|---------|------|---------------|-------------------| +| Jampp | `get_jampp_reports` | Direct (1-step) | ✅ Yes | +| Kayzen | `get_kayzen_reports` | Direct (1-step) | ❌ No | +| YouAppi | `get_youappi_reports` | Direct (1-step) | ❌ No | +| Samsung | `get_samsung_reports` | Direct (1-step) | ❌ No | +| Smadex | `get_smadex_reports` | Multi-step | ❌ No | +| InMobi | `get_inmobi_reports` | Multi-step | ❌ No | +| Liftoff | `get_liftoff_reports` | Multi-step | ❌ No | + +**Unified Response Field:** All partner reports return `partner_net_spend` for net spend amount. + +--- + +### get_jampp_reports + +Fetches Jampp report data for a specified date range. + +**Parameters:** +- `client_id` (number, required): Client ID +- `start_date` (string, required): Start date in YYYY-MM-DD format +- `end_date` (string, required): End date in YYYY-MM-DD format + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from Jampp + - `date`: Report date + - Other Jampp-specific fields + +**Example:** +```javascript +mcp__feedmob-reporting__get_jampp_reports({ + client_id: 123, + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Common Use Cases:** +- Net spend verification for Jampp campaigns +- Daily/weekly/monthly reconciliation +- Campaign performance analysis + +--- + +### get_kayzen_reports + +Fetches Kayzen report data for a specified date range. 
+ +**Parameters:** +- `start_date` (string, required): Start date in YYYY-MM-DD format +- `end_date` (string, required): End date in YYYY-MM-DD format + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from Kayzen + - `date`: Report date + - Other Kayzen-specific fields + +**Example:** +```javascript +mcp__feedmob-reporting__get_kayzen_reports({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +--- + +### get_youappi_reports + +Fetches YouAppi report data for a specified date range. + +**Parameters:** +- `start_date` (string, required): Start date in YYYY-MM-DD format +- `end_date` (string, required): End date in YYYY-MM-DD format + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from YouAppi + - `date`: Report date + - Other YouAppi-specific fields + +**Example:** +```javascript +mcp__feedmob-reporting__get_youappi_reports({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +--- + +### get_samsung_reports + +Fetches Samsung report data for a specified date range. + +**Parameters:** +- `start_date` (string, required): Start date in YYYY-MM-DD format +- `end_date` (string, required): End date in YYYY-MM-DD format + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from Samsung + - `date`: Report date + - Other Samsung-specific fields + +**Example:** +```javascript +mcp__feedmob-reporting__get_samsung_reports({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +--- + +### get_smadex_reports (Multi-step) + +Fetches Smadex report data. Requires 3-step workflow. 
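The three-step pattern for multi-step partners (get report IDs, poll the status, fetch the data once it is ready) can be sketched as a generic polling loop. This is a minimal sketch, not part of the MCP server: `fetch_when_ready`, `check_status`, `fetch_report`, `poll_seconds`, and `max_attempts` are hypothetical names, and it assumes the status check yields the string `"ready"` when the report can be fetched.

```python
import time
from typing import Any, Callable


def fetch_when_ready(
    check_status: Callable[[], str],
    fetch_report: Callable[[], Any],
    poll_seconds: float = 5.0,
    max_attempts: int = 60,
) -> Any:
    """Poll check_status until it returns "ready", then call fetch_report.

    Both callables are hypothetical wrappers around the partner's
    status-check and report-fetch tool calls.
    """
    for attempt in range(max_attempts):
        if check_status() == "ready":
            return fetch_report()
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"report not ready after {max_attempts} status checks")
```

Bounding the number of attempts keeps a report that never becomes ready from blocking the verification run indefinitely.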
+ +**Step 1: Get Report IDs** +```javascript +mcp__feedmob-reporting__get_smadex_report_ids({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Step 2: Check Status (repeat until ready)** +```javascript +mcp__feedmob-reporting__check_smadex_report_status({ + report_id: "abc-123" +}) +``` + +**Step 3: Get Report Data** +```javascript +mcp__feedmob-reporting__get_smadex_reports({ + report_id: "abc-123" +}) +``` + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from Smadex + - `date`: Report date + - Other Smadex-specific fields + +--- + +### get_inmobi_reports (Multi-step) + +Fetches InMobi report data. Requires 3-step workflow with SKAN and Non-SKAN reports. + +**Step 1: Get Report IDs** +```javascript +mcp__feedmob-reporting__get_inmobi_report_ids({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Step 2: Check Status (repeat until ready)** +```javascript +mcp__feedmob-reporting__check_inmobi_report_status({ + report_id: "abc-123", + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Step 3: Get Report Data** +```javascript +mcp__feedmob-reporting__get_inmobi_reports({ + skan_report_id: "skan-123", + non_skan_report_id: "non-skan-456", + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from InMobi + - `date`: Report date + - Other InMobi-specific fields + +**Note:** InMobi returns two report types (SKAN and Non-SKAN) that should be combined for full coverage. + +--- + +### get_liftoff_reports (Multi-step) + +Fetches Liftoff report data. Requires 3-step workflow with Stash and Possible Finance reports. 
+ +**Step 1: Get Report IDs** +```javascript +mcp__feedmob-reporting__get_liftoff_report_ids({ + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Step 2: Check Status (repeat until ready)** +```javascript +mcp__feedmob-reporting__check_liftoff_report_status({ + stash_report_id: "stash-123", + possible_finance_report_id: "pf-456" +}) +``` + +**Step 3: Get Report Data** +```javascript +mcp__feedmob-reporting__get_liftoff_reports({ + stash_report_id: "stash-123", + possible_finance_report_id: "pf-456", + start_date: "2025-01-01", + end_date: "2025-01-31" +}) +``` + +**Returns:** +- Array of report data containing: + - `click_url_id`: The click URL identifier + - `partner_net_spend`: Net spend from Liftoff + - `date`: Report date + - Other Liftoff-specific fields + +**Note:** Liftoff returns two report types (Stash and Possible Finance) that should be combined for full coverage. + +--- + ## Direct Spend Tools ### get_direct_spends diff --git a/skills/feedmob-reporting-skills/references/report-structure.md b/skills/feedmob-reporting-skills/references/report-structure.md index 6410d7f..90af291 100644 --- a/skills/feedmob-reporting-skills/references/report-structure.md +++ b/skills/feedmob-reporting-skills/references/report-structure.md @@ -245,3 +245,163 @@ Before sending the final report, verify: ❌ **Don't include CPM in "Perfect Match" counts** - Perfect matches should only count verifiable (Non-CPM) campaigns - CPM accuracy cannot be calculated without CPM rates + +--- + +## Net Spend Report Structure + +For partner reports (Jampp, Kayzen, YouAppi, Samsung, Smadex, InMobi, Liftoff), use this structure for net spend verification. 
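As a rough illustration of the net spend comparison this structure reports, the per-row difference, percentage, and status icon could be derived as below. This is a sketch under stated assumptions rather than the logic of the shipped scripts: `NetSpendRow` and `compare_row` are hypothetical names, Diff % is taken relative to FeedMob net spend, a difference under one cent counts as a perfect match, and a row with partner spend but no FeedMob spend is flagged as significant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetSpendRow:
    click_url_id: int
    date: str
    partner_net_spend: float
    feedmob_net_spend: float


def compare_row(row: NetSpendRow) -> dict:
    """Return difference, diff % and a status icon for one click URL / date."""
    difference = row.partner_net_spend - row.feedmob_net_spend
    diff_pct: Optional[float]
    if abs(difference) < 0.01:
        # Assumed tolerance: differences under one cent are a perfect match.
        diff_pct, status = 0.0, "✅"
    elif row.feedmob_net_spend == 0:
        # Assumption: no FeedMob record to compare against, so flag the row.
        diff_pct, status = None, "🚨"
    else:
        # Assumption: percentage is measured against FeedMob net spend.
        diff_pct = difference / row.feedmob_net_spend * 100
        status = "⚠️" if abs(diff_pct) < 2 else "🚨"
    return {
        "click_url_id": row.click_url_id,
        "date": row.date,
        "difference": round(difference, 2),
        "diff_pct": None if diff_pct is None else round(diff_pct, 2),
        "status": status,
    }


if __name__ == "__main__":
    print(compare_row(NetSpendRow(12346, "2025-01-01", 2000.00, 1950.00)))
```

No rate lookup is involved: both inputs are already net spend amounts, which is the key contrast with gross spend verification.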
+ +### Key Difference from Gross Spend + +| Aspect | Gross Spend | Net Spend | +|--------|-------------|-----------| +| **Source** | Attribution Reports (Singular/Adjust) | Partner Reports | +| **Comparison** | `calculated_gross` vs `direct_gross` | `partner_net_spend` vs `feedmob_net_spend` | +| **Calculation** | `event_count × gross_cpi` | No calculation needed | +| **Rate Required** | Yes (`gross_cpi` from histories) | No | + +--- + +### 1. Net Spend Overall Summary + +``` +Total Click URLs: X +Total Partner Net Spend: $X,XXX.XX +Total FeedMob Net Spend: $X,XXX.XX +Total Difference: $XXX.XX (X.X%) +Partner: [Jampp/Kayzen/YouAppi/etc.] +Date Range: YYYY-MM-DD to YYYY-MM-DD +``` + +--- + +### 2. Click URL Level Comparison Table + +``` +| Click URL | Campaign | Vendor | Date | Partner Net | FeedMob Net | Difference | Diff % | Status | +|-----------|----------|--------|------|-------------|-------------|------------|--------|--------| +| 12345 | Campaign_A | Jampp | 2025-01-01 | $1,500.00 | $1,500.00 | $0.00 | 0.00% | ✅ | +| 12346 | Campaign_B | Jampp | 2025-01-01 | $2,000.00 | $1,950.00 | $50.00 | 2.56% | 🚨 | +``` + +**Columns:** +- **Click URL**: Unique campaign identifier +- **Campaign**: Campaign name +- **Vendor**: Partner/vendor name +- **Date**: Report date +- **Partner Net**: Net spend from partner report (`partner_net_spend`) +- **FeedMob Net**: Net spend from FeedMob records (`feedmob_net_spend`) +- **Difference**: Partner Net - FeedMob Net +- **Diff %**: Percentage difference +- **Status**: Visual indicator (see Status Icons below) + +--- + +### 3. 
Vendor Level Summary Table + +``` +| Vendor | Click URLs | Partner Net | FeedMob Net | Difference | Diff % | Status | +|--------|-----------|-------------|-------------|------------|--------|--------| +| Jampp | 5 | $15,000.00 | $14,950.00 | $50.00 | 0.33% | ⚠️ | +``` + +**Aggregation:** +- Sum all net spend across click URLs for each vendor +- Useful for identifying which vendors have discrepancies + +--- + +### 4. Net Spend Verification Accuracy Statistics + +``` +| Accuracy Level | Click URL Count | Percentage | Total Amount | +|----------------|----------------|------------|-------------| +| Perfect (0%) | X | XX.X% | $X,XXX.XX | +| Excellent (<1%) | X | XX.X% | $X,XXX.XX | +| Good (1-2%) | X | XX.X% | $X,XXX.XX | +| Needs Attention (≥2%) | X | XX.X% | $X,XXX.XX | + +Verified Accuracy: X/X = XX.X% ✅ +``` + +**Accuracy Levels for Net Spend:** +- **Perfect**: 0% difference (partner = feedmob) +- **Excellent**: < 1% difference +- **Good**: ≥ 1% and < 2% difference +- **Needs Attention**: ≥ 2% difference + +--- + +### 5. Status Icons for Net Spend + +Use these visual indicators in tables: + +- ✅ **Perfect Match**: 0% difference +- ⚠️ **Minor Difference**: >0% and <2% difference +- 🚨 **Significant Difference**: ≥2% difference + +--- + +### 6. 
Key Findings and Recommendations + +Provide actionable insights: + +- List top-performing Click URLs and Vendors (by accuracy) +- Flag Click URLs or Vendors with significant discrepancies +- Identify: + - **Partner-only entries**: Click URLs in partner report but not in FeedMob + - **FeedMob-only entries**: Click URLs in FeedMob but not in partner report +- Provide specific action items: + - "Investigate Click URL 12346 - $50 discrepancy" + - "Excellent performance from Jampp - all campaigns within 1%" + - "Missing FeedMob records for Click URLs: [list]" + +--- + +### Net Spend Report Filtering Rules + +Apply these rules when generating reports: + +**Sorting:** +- **Click URL Table**: Sort by Absolute Difference (descending) +- **Vendor Table**: Sort by Total Partner Net Spend (descending) + +**Filtering:** +- **Filter out zero-activity rows**: Exclude rows where both Partner Net and FeedMob Net are $0.00 +- **Keep single-side entries**: Include rows where only one side has data (indicates missing records) + +**Grouping:** +- Group by Click URL ID for aggregation across dates +- Group by Vendor for vendor-level analysis + +--- + +### Example Net Spend Report Output + +``` +## Jampp Net Spend Verification Report +**Date Range:** 2025-01-01 to 2025-01-31 + +### Overall Summary +- Total Click URLs: 15 +- Total Partner Net Spend: $45,234.56 +- Total FeedMob Net Spend: $45,189.23 +- Total Difference: $45.33 (0.10%) + +### Verification Status +- ✅ Perfect Match: 10 campaigns (66.7%) +- ⚠️ Minor Difference (<2%): 4 campaigns (26.7%) +- 🚨 Significant Difference (≥2%): 1 campaign (6.7%) + +### Anomalies +| Click URL | Date | Partner | FeedMob | Diff | Action | +|-----------|------|---------|---------|------|--------| +| 12346 | 2025-01-15 | $2,000.00 | $1,950.00 | $50.00 (2.56%) | Review billing | + +### Key Findings +1. Excellent overall accuracy - 99.9% match +2. 1 campaign with >2% difference requires review +3. 
All discrepancies are in Jampp's favor (partner > feedmob) +``` + diff --git a/skills/feedmob-reporting-skills/scripts/README.md b/skills/feedmob-reporting-skills/scripts/README.md index 555a292..8dcad54 100644 --- a/skills/feedmob-reporting-skills/scripts/README.md +++ b/skills/feedmob-reporting-skills/scripts/README.md @@ -114,7 +114,7 @@ click_url_id,campaign_name,vendor_name,date,gross_cpi,net_cpi,client_paid_action 3. **Direct Spends** (direct_spends_*.csv): ``` -feedmob_click_url_id,campaign_name,date,feedmob_net_spend,feedmob_gross_spend +click_url_id,campaign_name,date,feedmob_net_spend,feedmob_gross_spend 19742,PossibleFinance_iOS_US_CPI_Agency,2026-01-01,307.81,308.0 ``` @@ -212,7 +212,7 @@ for row in singular_data: # Manual merge direct_spend_map = {} for row in direct_spend_data: - key = (int(row['feedmob_click_url_id']), row['date']) + key = (int(row['click_url_id']), row['date']) direct_spend_map[key] = float(row['feedmob_gross_spend']) merged_data = [] @@ -247,7 +247,7 @@ SELECT c.calculated_gross_spend - d.feedmob_gross_spend as difference FROM calculated c LEFT JOIN direct_spend d - ON c.click_url_id = d.feedmob_click_url_id + ON c.click_url_id = d.click_url_id AND c.date = d.date ``` diff --git a/skills/feedmob-reporting-skills/scripts/analyze_net_spend_datafusion.py b/skills/feedmob-reporting-skills/scripts/analyze_net_spend_datafusion.py new file mode 100644 index 0000000..49690ec --- /dev/null +++ b/skills/feedmob-reporting-skills/scripts/analyze_net_spend_datafusion.py @@ -0,0 +1,346 @@ +#!/usr/bin/env python3 +""" +DataFusion Analysis Script - Multi-dimensional Net Spend Comparison Report Analysis + +Features: Generates 10 dimension analysis reports (CSV format), suitable for LLM reading +Automatically installs dependencies: datafusion, pandas, pyarrow + +Usage: + python3 analyze_net_spend_datafusion.py + +Input: CSV from compare_net_spend_datafusion.py +Output: Multiple CSV files with different analysis dimensions +""" + +import sys +import 
subprocess +from pathlib import Path + + +def check_and_install_dependencies(): + """Check and automatically install dependencies""" + missing_packages = [] + + try: + import datafusion + print("✓ datafusion installed") + except ImportError: + missing_packages.append("datafusion") + + try: + import pandas + print("✓ pandas installed") + except ImportError: + missing_packages.append("pandas") + + try: + import pyarrow + print("✓ pyarrow installed") + except ImportError: + missing_packages.append("pyarrow") + + if missing_packages: + print(f"⚠️ Missing packages: {', '.join(missing_packages)}") + print("Auto-installing from requirements.txt...") + + script_dir = Path(__file__).parent + requirements_file = script_dir / "requirements.txt" + + if not requirements_file.exists(): + print(f"✗ requirements.txt not found: {requirements_file}") + print("Please install manually: pip install datafusion pandas pyarrow --user") + return False + + try: + subprocess.check_call([ + sys.executable, "-m", "pip", "install", "-r", str(requirements_file), + "--user", "--quiet" + ]) + print("✓ Dependencies installed successfully") + return True + except subprocess.CalledProcessError: + try: + subprocess.check_call([ + sys.executable, "-m", "pip", "install", "-r", str(requirements_file), + "--break-system-packages", "--quiet" + ]) + print("✓ Dependencies installed successfully") + return True + except subprocess.CalledProcessError as e: + print(f"✗ Failed to install dependencies: {e}") + print(f"Please install manually: pip install -r {requirements_file} --user") + return False + + return True + + +def run_analysis(input_csv, output_dir): + """Run multi-dimensional analysis""" + from datafusion import SessionContext + import pandas as pd + + # Create output directory + output_dir = Path(output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + print("=" * 60) + print("DataFusion Net Spend Analysis") + print("=" * 60) + print(f"Input: {input_csv}") + print(f"Output: {output_dir}/") + 
print() + + # Create DataFusion session + ctx = SessionContext() + ctx.register_csv('comparison_report', str(input_csv), has_header=True) + + print("Generating analysis reports...") + print() + + queries = { + "01_global_summary.csv": """ + SELECT + COUNT(*) as total_rows, + COUNT(DISTINCT click_url_id) as unique_campaigns, + COUNT(DISTINCT vendor_name) as unique_vendors, + COUNT(DISTINCT date) as date_range_days, + ROUND(SUM(partner_net_spend), 2) as total_partner_net, + ROUND(SUM(feedmob_net_spend), 2) as total_feedmob_net, + ROUND(SUM(difference), 2) as total_difference, + ROUND(CASE + WHEN SUM(feedmob_net_spend) = 0 THEN 0 + ELSE (SUM(difference) / SUM(feedmob_net_spend) * 100) + END, 2) as diff_pct + FROM comparison_report + """, + + "02_by_vendor.csv": """ + SELECT + vendor_name, + COUNT(*) as rows, + COUNT(DISTINCT click_url_id) as campaigns, + ROUND(SUM(partner_net_spend), 2) as partner_net, + ROUND(SUM(feedmob_net_spend), 2) as feedmob_net, + ROUND(SUM(difference), 2) as diff, + ROUND(CASE + WHEN SUM(feedmob_net_spend) = 0 THEN 0 + ELSE (SUM(difference) / SUM(feedmob_net_spend) * 100) + END, 2) as diff_pct + FROM comparison_report + GROUP BY vendor_name + ORDER BY ABS(SUM(difference)) DESC + """, + + "03_by_campaign.csv": """ + SELECT + click_url_id, + MAX(campaign_name) as campaign_name, + MAX(vendor_name) as vendor, + COUNT(DISTINCT date) as days, + COUNT(*) as rows, + ROUND(SUM(partner_net_spend), 2) as partner_net, + ROUND(SUM(feedmob_net_spend), 2) as feedmob_net, + ROUND(SUM(difference), 2) as diff, + ROUND(CASE + WHEN SUM(feedmob_net_spend) = 0 THEN 0 + ELSE (SUM(difference) / SUM(feedmob_net_spend) * 100) + END, 2) as diff_pct + FROM comparison_report + GROUP BY click_url_id + ORDER BY ABS(SUM(difference)) DESC + """, + + "04_match_status.csv": """ + SELECT + status, + COUNT(*) as rows, + ROUND(SUM(partner_net_spend), 2) as partner_net, + ROUND(SUM(feedmob_net_spend), 2) as feedmob_net, + ROUND(SUM(ABS(difference)), 2) as abs_diff + FROM 
comparison_report + GROUP BY status + ORDER BY SUM(ABS(difference)) DESC + """, + + "05_accuracy_levels.csv": """ + SELECT + CASE + WHEN ABS(difference) < 0.01 THEN 'Perfect (0%)' + WHEN ABS(difference_pct) < 1 THEN 'Excellent (<1%)' + WHEN ABS(difference_pct) < 2 THEN 'Good (1-2%)' + ELSE 'Needs Attention (≥2%)' + END as accuracy_level, + COUNT(*) as row_count, + ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage, + ROUND(SUM(partner_net_spend), 2) as partner_net, + ROUND(SUM(feedmob_net_spend), 2) as feedmob_net + FROM comparison_report + GROUP BY + CASE + WHEN ABS(difference) < 0.01 THEN 'Perfect (0%)' + WHEN ABS(difference_pct) < 1 THEN 'Excellent (<1%)' + WHEN ABS(difference_pct) < 2 THEN 'Good (1-2%)' + ELSE 'Needs Attention (≥2%)' + END + ORDER BY + CASE + WHEN ABS(difference) < 0.01 THEN 1 + WHEN ABS(difference_pct) < 1 THEN 2 + WHEN ABS(difference_pct) < 2 THEN 3 + ELSE 4 + END + """, + + "06_top50_anomalies.csv": """ + SELECT + date, + click_url_id, + campaign_name, + vendor_name, + ROUND(partner_net_spend, 2) as partner_net, + ROUND(feedmob_net_spend, 2) as feedmob_net, + ROUND(difference, 2) as diff, + ROUND(difference_pct, 2) as diff_pct, + status + FROM comparison_report + WHERE ABS(difference) > 0.01 + ORDER BY ABS(difference) DESC + LIMIT 50 + """, + + "07_daily_trend.csv": """ + SELECT + date, + COUNT(DISTINCT click_url_id) as campaigns, + ROUND(SUM(partner_net_spend), 2) as partner_net, + ROUND(SUM(feedmob_net_spend), 2) as feedmob_net, + ROUND(SUM(difference), 2) as diff, + ROUND(CASE + WHEN SUM(feedmob_net_spend) = 0 THEN 0 + ELSE (SUM(difference) / SUM(feedmob_net_spend) * 100) + END, 2) as diff_pct + FROM comparison_report + GROUP BY date + ORDER BY date DESC + """, + + "08_weekly_trend.csv": """ + SELECT + DATE_TRUNC('week', date) as week_start, + COUNT(DISTINCT click_url_id) as campaigns, + COUNT(DISTINCT date) as days, + ROUND(SUM(partner_net_spend), 2) as partner_net, + ROUND(SUM(feedmob_net_spend), 2) as feedmob_net, + 
ROUND(SUM(difference), 2) as diff, + ROUND(CASE + WHEN SUM(feedmob_net_spend) = 0 THEN 0 + ELSE (SUM(difference) / SUM(feedmob_net_spend) * 100) + END, 2) as diff_pct + FROM comparison_report + GROUP BY DATE_TRUNC('week', date) + ORDER BY DATE_TRUNC('week', date) DESC + """, + + "09_partner_only.csv": """ + SELECT + date, + click_url_id, + campaign_name, + vendor_name, + ROUND(partner_net_spend, 2) as partner_net + FROM comparison_report + WHERE feedmob_net_spend = 0 AND partner_net_spend > 0 + ORDER BY partner_net_spend DESC + """, + + "10_feedmob_only.csv": """ + SELECT + date, + click_url_id, + campaign_name, + vendor_name, + ROUND(feedmob_net_spend, 2) as feedmob_net + FROM comparison_report + WHERE partner_net_spend = 0 AND feedmob_net_spend > 0 + ORDER BY feedmob_net_spend DESC + """ + } + + query_names = { + "01_global_summary.csv": "Global Summary", + "02_by_vendor.csv": "By Vendor", + "03_by_campaign.csv": "By Campaign", + "04_match_status.csv": "Match Status", + "05_accuracy_levels.csv": "Accuracy Levels", + "06_top50_anomalies.csv": "Top 50 Anomalies", + "07_daily_trend.csv": "Daily Trend", + "08_weekly_trend.csv": "Weekly Trend", + "09_partner_only.csv": "Partner Only (No FeedMob)", + "10_feedmob_only.csv": "FeedMob Only (No Partner)" + } + + # Execute all queries + for filename, query in queries.items(): + try: + df = ctx.sql(query).to_pandas() + output_file = output_dir / filename + df.to_csv(output_file, index=False) + print(f" ✓ {query_names[filename]}") + except Exception as e: + print(f" ✗ {query_names[filename]} (error: {e})") + + print() + print("=" * 60) + print("Analysis Complete") + print("=" * 60) + print() + + # Display quick summary + summary_file = output_dir / "01_global_summary.csv" + if summary_file.exists(): + summary_df = pd.read_csv(summary_file) + if not summary_df.empty: + row = summary_df.iloc[0] + print(f"Total Rows: {int(row['total_rows'])}") + print(f"Campaigns: {int(row['unique_campaigns'])}") + print(f"Vendors: 
{int(row['unique_vendors'])}") + print(f"Date Range: {int(row['date_range_days'])} days") + print(f"Partner Net: ${row['total_partner_net']}") + print(f"FeedMob Net: ${row['total_feedmob_net']}") + print(f"Difference: ${row['total_difference']} ({row['diff_pct']}%)") + + print() + print(f"✓ All reports generated in: {output_dir}/") + print() + + +def main(): + """Main function""" + if len(sys.argv) != 3: + print("Usage: python3 analyze_net_spend_datafusion.py ") + sys.exit(1) + + input_csv = sys.argv[1] + output_dir = sys.argv[2] + + # Validate input file + if not Path(input_csv).exists(): + print(f"✗ Error: File not found: {input_csv}") + sys.exit(1) + + # Check and install dependencies + if not check_and_install_dependencies(): + sys.exit(1) + print() + + # Run analysis + try: + run_analysis(input_csv, output_dir) + except Exception as e: + print(f"✗ Error: Analysis failed") + print(f"Details: {e}") + sys.exit(1) + + +if __name__ == '__main__': + main() diff --git a/skills/feedmob-reporting-skills/scripts/calculate_gross_spend.py b/skills/feedmob-reporting-skills/scripts/calculate_gross_spend.py index 4480aea..5e54bd1 100755 --- a/skills/feedmob-reporting-skills/scripts/calculate_gross_spend.py +++ b/skills/feedmob-reporting-skills/scripts/calculate_gross_spend.py @@ -149,7 +149,7 @@ def merge_with_direct_spend(calculated_data, direct_spend_data): # Build direct spend lookup: (click_url_id, date) -> direct_gross_spend direct_spend_map = {} for row in direct_spend_data: - key = (int(row['feedmob_click_url_id']), row['date']) + key = (int(row['click_url_id']), row['date']) direct_spend_map[key] = float(row['feedmob_gross_spend']) # Merge diff --git a/skills/feedmob-reporting-skills/scripts/calculate_gross_spend_datafusion.py b/skills/feedmob-reporting-skills/scripts/calculate_gross_spend_datafusion.py index ca10685..a93f8b6 100755 --- a/skills/feedmob-reporting-skills/scripts/calculate_gross_spend_datafusion.py +++ 
b/skills/feedmob-reporting-skills/scripts/calculate_gross_spend_datafusion.py @@ -242,7 +242,7 @@ def execute_query_and_save(attribution_csv, histories_csv, direct_spend_csv, ) as difference_pct FROM calculated_with_spend c LEFT JOIN direct_spend d - ON CAST(c.click_url_id AS BIGINT) = CAST(d.feedmob_click_url_id AS BIGINT) + ON CAST(c.click_url_id AS BIGINT) = CAST(d.click_url_id AS BIGINT) AND c.date = d.date WHERE c.event_count > 0 OR COALESCE(d.feedmob_gross_spend, 0) > 0 ORDER BY c.date, c.click_url_id @@ -372,7 +372,7 @@ def print_summary(summary, available_events, diagnostics=None): def create_empty_direct_spend_csv(output_path): """Create an empty direct spend CSV with proper headers""" headers = [ - 'feedmob_click_url_id', 'date', 'campaign_name', + 'click_url_id', 'date', 'campaign_name', 'feedmob_net_spend', 'feedmob_gross_spend' ] with open(output_path, 'w', newline='') as f: diff --git a/skills/feedmob-reporting-skills/scripts/compare_net_spend_datafusion.py b/skills/feedmob-reporting-skills/scripts/compare_net_spend_datafusion.py new file mode 100644 index 0000000..c087c7a --- /dev/null +++ b/skills/feedmob-reporting-skills/scripts/compare_net_spend_datafusion.py @@ -0,0 +1,360 @@ +#!/usr/bin/env python3 +""" +Dynamic DataFusion Python - Net Spend Comparison + +Compares partner_net_spend (from partner reports) with feedmob_net_spend (from direct spends). +Unlike gross spend (which requires calculation), net spend is a direct comparison. 
+
+Works with all partner reports:
+- Jampp, Kayzen, YouAppi, Samsung (1-step API)
+- Smadex, InMobi, Liftoff (multi-step API)
+
+Usage Examples:
+    python3 compare_net_spend_datafusion.py \
+        ./tmp/jampp_reports_2025-01-01_2025-01-31.csv \
+        ./tmp/direct_spends_2025-01-01_2025-01-31.csv \
+        ./tmp/net_spend_comparison.csv
+
+    python3 compare_net_spend_datafusion.py \
+        ./tmp/smadex_reports_2025-01-01_2025-01-31.csv \
+        ./tmp/direct_spends_2025-01-01_2025-01-31.csv \
+        ./tmp/net_spend_comparison.csv
+"""
+
+import sys
+import csv
+import subprocess
+from pathlib import Path
+from datetime import datetime
+
+
+def check_and_install_dependencies():
+    """Check and automatically install dependencies (from requirements.txt)"""
+    missing_packages = []
+
+    try:
+        import datafusion
+        print("✓ datafusion installed")
+    except ImportError:
+        missing_packages.append("datafusion")
+
+    try:
+        import pandas
+        print("✓ pandas installed")
+    except ImportError:
+        missing_packages.append("pandas")
+
+    try:
+        import pyarrow
+        print("✓ pyarrow installed")
+    except ImportError:
+        missing_packages.append("pyarrow")
+
+    if missing_packages:
+        print(f"⚠️ Missing packages: {', '.join(missing_packages)}")
+        print("Auto-installing from requirements.txt...")
+
+        script_dir = Path(__file__).parent
+        requirements_file = script_dir / "requirements.txt"
+
+        if not requirements_file.exists():
+            print(f"✗ requirements.txt not found: {requirements_file}")
+            print("Please install manually: pip install datafusion pandas pyarrow --user")
+            return False
+
+        try:
+            subprocess.check_call([
+                sys.executable, "-m", "pip", "install", "-r", str(requirements_file),
+                "--user", "--quiet"
+            ])
+            print("✓ Dependencies installed successfully")
+            return True
+        except subprocess.CalledProcessError:
+            try:
+                subprocess.check_call([
+                    sys.executable, "-m", "pip", "install", "-r", str(requirements_file),
+                    "--break-system-packages", "--quiet"
+                ])
+                print("✓ Dependencies installed successfully")
+                return True
+            except subprocess.CalledProcessError as e:
+                print(f"✗ Failed to install dependencies: {e}")
+                print(f"Please install manually: pip install -r {requirements_file} --user")
+                print("Or use a virtual environment:")
+                print(" python3 -m venv venv")
+                print(" source venv/bin/activate")
+                print(f" pip install -r {requirements_file}")
+                return False
+
+    return True
+
+
+def detect_partner_report_columns(csv_path):
+    """Detect available columns in partner report CSV"""
+    with open(csv_path, 'r') as f:
+        reader = csv.DictReader(f)
+        columns = reader.fieldnames
+
+    # Find key columns
+    has_click_url_id = 'click_url_id' in columns
+    has_partner_net_spend = 'partner_net_spend' in columns
+    has_date = 'date' in columns
+
+    # Look for other useful columns
+    optional_columns = {
+        'campaign_name': 'campaign_name' in columns,
+        'vendor_name': 'vendor_name' in columns,
+    }
+
+    return columns, {
+        'has_click_url_id': has_click_url_id,
+        'has_partner_net_spend': has_partner_net_spend,
+        'has_date': has_date,
+        **optional_columns
+    }
+
+
+def create_empty_direct_spend_csv(output_path):
+    """Create an empty direct spend CSV with proper headers"""
+    headers = [
+        'click_url_id', 'date', 'campaign_name',
+        'feedmob_net_spend', 'feedmob_gross_spend'
+    ]
+    with open(output_path, 'w', newline='') as f:
+        writer = csv.writer(f)
+        writer.writerow(headers)
+
+
+def validate_and_prepare_direct_spend_csv(direct_spend_csv):
+    """Validate direct spend CSV, create empty one if missing or empty.
+    Returns: 'empty', 'has_data'"""
+    direct_spend_path = Path(direct_spend_csv)
+
+    if not direct_spend_path.exists():
+        print(f"⚠️ Direct spend file not found, creating empty file: {direct_spend_csv}")
+        create_empty_direct_spend_csv(direct_spend_csv)
+        return 'empty'
+
+    with open(direct_spend_csv, 'r') as f:
+        content = f.read().strip()
+    if not content:
+        print(f"⚠️ Direct spend file is empty, adding headers: {direct_spend_csv}")
+        create_empty_direct_spend_csv(direct_spend_csv)
+        return 'empty'
+
+    lines = content.split('\n')
+    if len(lines) <= 1:
+        print(f"⚠️ Direct spend file has no data rows, using as-is: {direct_spend_csv}")
+        return 'empty'
+
+    print(f"✓ Direct spend file has {len(lines) - 1} data rows")
+    return 'has_data'
+
+
+def execute_query_and_save(partner_csv, direct_spend_csv, output_csv, column_info):
+    """Execute DataFusion query and save results"""
+    from datafusion import SessionContext
+
+    # Create DataFusion session
+    ctx = SessionContext()
+
+    # Register CSV tables
+    ctx.register_csv('partner_report', partner_csv, has_header=True)
+    ctx.register_csv('direct_spend', direct_spend_csv, has_header=True)
+
+    # Build dynamic column selection - use direct column reference since aggregated_partner already has the column
+    campaign_name_expr = "p.campaign_name" if column_info.get('campaign_name') else "'Unknown'"
+    vendor_name_expr = "p.vendor_name" if column_info.get('vendor_name') else "'Unknown'"
+
+    # Build query - aggregate partner report by click_url_id and date
+    # Include campaign_name and vendor_name in aggregation if available
+    campaign_name_select = ", MAX(campaign_name) as campaign_name" if column_info.get('campaign_name') else ""
+    vendor_name_select = ", MAX(vendor_name) as vendor_name" if column_info.get('vendor_name') else ""
+
+    full_query = f"""
+WITH aggregated_partner AS (
+    SELECT
+        date,
+        CAST(click_url_id AS BIGINT) as click_url_id,
+        SUM(CAST(COALESCE(partner_net_spend, 0) AS DOUBLE)) as partner_net_spend
+        {campaign_name_select}
+        {vendor_name_select}
+    FROM partner_report
+    WHERE click_url_id IS NOT NULL AND click_url_id != ''
+    GROUP BY date, click_url_id
+),
+comparison AS (
+    SELECT
+        p.date,
+        p.click_url_id,
+        {campaign_name_expr} as campaign_name,
+        {vendor_name_expr} as vendor_name,
+        ROUND(p.partner_net_spend, 2) as partner_net_spend,
+        ROUND(COALESCE(d.feedmob_net_spend, 0), 2) as feedmob_net_spend,
+        ROUND(p.partner_net_spend - COALESCE(d.feedmob_net_spend, 0), 2) as difference,
+        ROUND(
+            CASE
+                WHEN COALESCE(d.feedmob_net_spend, 0) > 0
+                THEN ((p.partner_net_spend - COALESCE(d.feedmob_net_spend, 0)) / d.feedmob_net_spend * 100)
+                ELSE 0
+            END,
+            2
+        ) as difference_pct
+    FROM aggregated_partner p
+    LEFT JOIN direct_spend d
+        ON CAST(p.click_url_id AS BIGINT) = CAST(d.click_url_id AS BIGINT)
+        AND p.date = d.date
+)
+SELECT
+    date,
+    click_url_id,
+    campaign_name,
+    vendor_name,
+    partner_net_spend,
+    feedmob_net_spend,
+    difference,
+    difference_pct,
+    CASE
+        WHEN ABS(difference) < 0.01 THEN '✅ Perfect'
+        WHEN ABS(difference_pct) < 2 THEN '⚠️ Minor'
+        ELSE '🚨 Significant'
+    END as status
+FROM comparison
+WHERE partner_net_spend > 0 OR feedmob_net_spend > 0
+ORDER BY date, ABS(difference) DESC
+"""
+
+    # Execute query
+    df = ctx.sql(full_query)
+
+    # Convert to pandas and save as CSV
+    pandas_df = df.to_pandas()
+    pandas_df.to_csv(output_csv, index=False)
+
+    return pandas_df
+
+
+def generate_summary(df):
+    """Generate summary statistics"""
+    total_partner = df['partner_net_spend'].sum()
+    total_feedmob = df['feedmob_net_spend'].sum()
+    total_diff = df['difference'].sum()
+    total_diff_pct = (total_diff / total_feedmob * 100) if total_feedmob > 0 else 0
+
+    # Count by status
+    status_counts = df['status'].value_counts().to_dict() if 'status' in df.columns else {}
+
+    return {
+        'total_partner': total_partner,
+        'total_feedmob': total_feedmob,
+        'total_diff': total_diff,
+        'total_diff_pct': total_diff_pct,
+        'row_count': len(df),
+        'status_counts': status_counts
+    }
+
+
+def print_summary(summary):
+    """Print summary report"""
+    print("\n" + "="*60)
+    print("Net Spend Comparison Summary")
+    print("="*60)
+    print()
+    print(f"Total Partner Net Spend: ${summary['total_partner']:,.2f}")
+    print(f"Total FeedMob Net Spend: ${summary['total_feedmob']:,.2f}")
+    print(f"Total Difference: ${summary['total_diff']:,.2f} ({summary['total_diff_pct']:.2f}%)")
+    print()
+
+    # Status breakdown
+    status_counts = summary.get('status_counts', {})
+    if status_counts:
+        print("Status Breakdown:")
+        for status, count in status_counts.items():
+            print(f" {status}: {count}")
+        print()
+
+    # Overall status
+    abs_diff_pct = abs(summary['total_diff_pct'])
+    if summary['row_count'] == 0 or (summary['total_partner'] == 0 and summary['total_feedmob'] == 0):
+        status = "⚠️ No Data to Compare"
+    elif abs_diff_pct < 0.01:
+        status = "✅ Perfect Match"
+    elif abs_diff_pct < 2:
+        status = "⚠️ Minor Difference (<2%)"
+    else:
+        status = "🚨 Significant Difference (≥2%)"
+
+    print(f"Overall Status: {status}")
+    print()
+    print(f"Rows Compared: {summary['row_count']}")
+    print()
+
+
+def main():
+    """Main function"""
+    if len(sys.argv) != 4:
+        print("Usage: python3 compare_net_spend_datafusion.py ")
+        sys.exit(1)
+
+    partner_csv = sys.argv[1]
+    direct_spend_csv = sys.argv[2]
+    output_csv = sys.argv[3]
+
+    # Validate partner report file
+    print("Validating input files...")
+    if not Path(partner_csv).exists():
+        print(f"✗ Error: Partner report file not found: {partner_csv}")
+        sys.exit(1)
+    print("✓ Partner report file exists")
+
+    # Handle direct spend file (can be missing or empty)
+    direct_spend_status = validate_and_prepare_direct_spend_csv(direct_spend_csv)
+    print()
+
+    # Check and install dependencies
+    if not check_and_install_dependencies():
+        sys.exit(1)
+    print()
+
+    # Detect columns
+    print("Detecting columns in partner report...")
+    columns, column_info = detect_partner_report_columns(partner_csv)
+    print(f"CSV Header: {', '.join(columns)}")
+    print()
+
+    if not column_info['has_click_url_id']:
+        print("✗ Error: click_url_id column not found in partner report")
+        sys.exit(1)
+    if not column_info['has_partner_net_spend']:
+        print("✗ Error: partner_net_spend column not found in partner report")
+        sys.exit(1)
+
+    print(f"✓ Required columns found: click_url_id, partner_net_spend")
+    print()
+
+    # Execute query
+    print("Starting DataFusion Net Spend analysis...")
+    print()
+
+    try:
+        df = execute_query_and_save(partner_csv, direct_spend_csv, output_csv, column_info)
+
+        row_count = len(df)
+        print(f"✓ Query executed successfully")
+        print(f"✓ Report saved to: {output_csv}")
+        print(f"✓ Processed {row_count} rows")
+
+        # Generate summary
+        summary = generate_summary(df)
+        print_summary(summary)
+
+        print("Done!")
+
+    except Exception as e:
+        print(f"✗ Error: Query execution failed")
+        print(f"Details: {e}")
+        sys.exit(1)
+
+
+if __name__ == '__main__':
+    main()
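
Review note on the per-row status rule in `compare_net_spend_datafusion.py`: the sketch below restates the SQL `CASE` logic from the `comparison` CTE and the final `SELECT` in plain Python, so the thresholds can be checked without DataFusion. The function name `classify_row` is hypothetical and is not part of the script. Python's `round` may also differ from SQL `ROUND` on exact ties, so treat this as an approximation of the query, not a replacement for it.

```python
def classify_row(partner_net_spend, feedmob_net_spend):
    """Approximate the SQL rule: returns (difference, difference_pct, status)."""
    # Mirrors ROUND(p.partner_net_spend - COALESCE(d.feedmob_net_spend, 0), 2)
    difference = round(partner_net_spend - feedmob_net_spend, 2)

    # Mirrors the CASE expression: the percentage is only computed when the
    # FeedMob side is positive, otherwise it falls back to 0.
    if feedmob_net_spend > 0:
        difference_pct = round(
            (partner_net_spend - feedmob_net_spend) / feedmob_net_spend * 100, 2
        )
    else:
        difference_pct = 0

    # Mirrors the status CASE in the final SELECT, evaluated in the same order.
    if abs(difference) < 0.01:
        status = "✅ Perfect"
    elif abs(difference_pct) < 2:
        status = "⚠️ Minor"
    else:
        status = "🚨 Significant"
    return difference, difference_pct, status


if __name__ == "__main__":
    for partner, feedmob in [(100.0, 100.0), (101.0, 100.0), (110.0, 100.0), (50.0, 0.0)]:
        print(partner, feedmob, classify_row(partner, feedmob))
```

One thing this restatement makes visible: when a partner row has no matching direct spend (`feedmob_net_spend` is 0), `difference_pct` falls back to 0, so the row is labeled `⚠️ Minor` even if the whole partner amount is unmatched. Whether that is intended is for the author to confirm. If it is not, guarding the second `WHEN` with `feedmob_net_spend > 0` would be one possible fix.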