feat: CLI migration + progressive disclosure redesign for ultimate-scraper #33

Open
lukas-bekr wants to merge 11 commits into apify:main from lukas-bekr:feat/ultimate-scraper-cli-migration-and-workflow-upgrade
lukas-bekr commented Mar 30, 2026

Summary

Major upgrade to the apify-ultimate-scraper skill: migrates from REST API scripts to Apify CLI, restructures the information architecture using progressive disclosure, and enriches all workflow guides with 58 research-backed data pipeline patterns.

Phase 1: CLI migration

  • Replaced 3 Node.js scripts (search_actors.js, run_actor.js, fetch_actor_details.js) with CLI commands (apify actors call --json, actors search, actors info, datasets get-items)
  • --json output serves as the stable API contract, so the skill is unaffected by upcoming CLI UI changes (Markdown default output, colors)
  • OAuth-first authentication (apify login) with env var fallback. Fixed a security contradiction in the actorization skill: it used apify login -t, which exposes tokens in shell history. Now aligned with PR #31 (fix: migrate security fixes to actorization skill)
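
As a rough illustration of treating `--json` output as the contract, the sketch below wraps the `apify actors call --json` command named above and parses whatever JSON it prints. The helper names (`build_call_command`, `run_cli_json`) and the injectable `runner` are hypothetical, and the shape of the JSON payload is deliberately not assumed; this is not the skill's actual implementation.

```python
import json
import subprocess
from typing import Any, Callable, List, Optional


def build_call_command(actor_id: str) -> List[str]:
    """Build argv for `apify actors call <actor> --json` (command named in the PR)."""
    return ["apify", "actors", "call", actor_id, "--json"]


def _default_runner(argv: List[str]) -> str:
    # Runs the real CLI; requires a prior `apify login` (OAuth) or an env var token.
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


def run_cli_json(argv: List[str],
                 runner: Optional[Callable[[List[str]], str]] = None) -> Any:
    """Run a CLI command and parse stdout as JSON.

    `runner` is a hypothetical seam so the parsing can be exercised
    without the Apify CLI installed.
    """
    stdout = (runner or _default_runner)(argv)
    # Only JSON validity is relied on here, not any particular field.
    return json.loads(stdout)
```

Because the caller depends only on parseable JSON on stdout, changes to the CLI's human-facing output (Markdown, colors) should not affect it, which is the point the bullet above makes.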

Phase 2: Progressive disclosure restructure

  • Replaced monolithic 400-line Actor index with hub-and-spoke architecture
  • SKILL.md (~109 lines) routes to lean actor-index (206 lines) + 14 workflow guides + gotchas (108 lines)
  • A simple task ("scrape Nike's Instagram") loads ~300 lines; a complex pipeline loads ~500. Neither loads the other 13 guides.
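
The line budgets above can be sanity-checked with a little arithmetic. The sketch below assumes (this is not stated in the PR) that the 1,597 total lines are split across exactly SKILL.md, the actor index, the gotchas file, and the 14 guides, and that a simple task loads only SKILL.md plus the actor index. Under those assumptions the quoted figures come out close to ~300 and ~500.

```python
# Line counts quoted in the PR description.
SKILL_MD = 109
ACTOR_INDEX = 206
GOTCHAS = 108
TOTAL_LINES = 1597
GUIDE_COUNT = 14

# Assumption: everything not in the three files above belongs to the guides.
AVG_GUIDE = (TOTAL_LINES - SKILL_MD - ACTOR_INDEX - GOTCHAS) / GUIDE_COUNT


def lines_loaded(guides: int = 0, gotchas: bool = False) -> int:
    """Estimate lines loaded for a task under the hub-and-spoke layout."""
    total = SKILL_MD + ACTOR_INDEX          # hub + lean index, always loaded
    if gotchas:
        total += GOTCHAS                    # loaded only when needed
    total += round(AVG_GUIDE * guides)      # one average-sized guide per spoke
    return total


simple_task = lines_loaded()                          # hub + index only
complex_pipeline = lines_loaded(guides=1, gotchas=True)
```

With these inputs `simple_task` is 315 and `complex_pipeline` is 507, consistent with the rounded ~300 and ~500 quoted above.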

Phase 3: Research-driven workflow enrichment

  • 4-workstream research: Notion internal use cases + AI research (Perplexity/Gemini/ChatGPT) + n8n template library scraping (85+ templates, 26 use Apify) + social media scraping
  • 58 distinct workflow patterns mapped to Apify Actors, ranked by cross-source frequency
  • Every workflow guide now has 4-6 pipelines with explicit Actor chaining, data piping (results[].website -> startUrls), PPE cost estimates, and gotchas
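
The data-piping notation `results[].website -> startUrls` can be read as a small transform between two Actor runs. The sketch below is one possible reading: it takes upstream dataset items and builds a downstream input. The `{"url": ...}` entry shape and the function name `to_downstream_input` are assumptions for illustration; the exact input schema depends on the downstream Actor.

```python
from typing import Any, Dict, Iterable, List


def to_downstream_input(results: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, str]]]:
    """Pipe results[].website into a startUrls list, skipping blanks and duplicates."""
    seen = set()
    start_urls: List[Dict[str, str]] = []
    for item in results:
        # Upstream items may lack the field or carry an empty value.
        url = (item.get("website") or "").strip()
        if not url or url in seen:
            continue
        seen.add(url)
        # Assumed entry shape; check the downstream Actor's input schema.
        start_urls.append({"url": url})
    return {"startUrls": start_urls}
```

Deduplicating before the second run matters for cost as well as cleanliness, since each start URL can trigger billable work downstream.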

Phase 4: New content

  • 4 new workflow categories: e-commerce price monitoring, contact enrichment, knowledge base/RAG, company research (covers 5,000+ Store Actors that previously had zero workflow coverage)
  • Enriched gotchas with anti-bot guidance (Cloudflare, SPA, fingerprinting), platform rate limits, cost estimation protocols
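
A cost estimation step for PPE (pay-per-event) Actors can be as simple as multiplying the expected event count by the per-event price before starting a run. The sketch below is a minimal version of that idea; the function names and the prices used in the usage note are hypothetical, not figures from the guides.

```python
def estimate_ppe_cost(expected_events: int, price_per_event_usd: float) -> float:
    """Estimated run cost in USD for a pay-per-event Actor."""
    if expected_events < 0 or price_per_event_usd < 0:
        raise ValueError("event count and price must be non-negative")
    return round(expected_events * price_per_event_usd, 2)


def within_budget(expected_events: int, price_per_event_usd: float,
                  budget_usd: float) -> bool:
    """True if the estimated cost does not exceed the budget."""
    return estimate_ppe_cost(expected_events, price_per_event_usd) <= budget_usd
```

For example, with a hypothetical price of $0.005 per result, 1,000 results would be estimated at $5.00, so a $3 budget would fail the check and the run could be scaled down first.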

By the numbers

  • 17 files, 1,597 lines (was 13 files, 782 lines)
  • Token budget for simple tasks: ~300 lines (unchanged, progressive disclosure)
  • 14 workflow guides with 4-6 pipelines each (was 10 with 1-4 each)
  • Design principles: Anthropic's "Lessons from Building Skills" - skip the obvious, gotchas are highest-signal, hub-and-spoke progressive disclosure, don't railroad

Scope

  • apify-ultimate-scraper skill only (full rewrite)
  • apify-actorization auth fix (aligned with PR #31, fix: migrate security fixes to actorization skill)
  • apify-actor-development minor auth alignment (OAuth-first)
  • commands/create-actor.md auth alignment
  • Did NOT touch developer skill content (actor-development, actorization workflows) - Patrik's territory

lukas-bekr and others added 11 commits March 30, 2026 14:29
- Standardize auth to OAuth-first across all skills
- Fix security contradiction in actorization (remove -t flag)
- Delete legacy Node.js scripts (replaced by CLI commands)
- Bump version to 2.0.0
- Add design spec and implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove error handling table (moving non-obvious errors to gotchas.md),
add 4 new routing rows for e-commerce, contact enrichment, knowledge base/RAG,
and company research, and replace error section with a brief troubleshooting pointer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…low guides

Added 7 new pipelines across 3 files from combined-patterns research:
- brand-monitoring: Twitter/X real-time mention routing (P16), Reddit brand monitoring (P17), multi-platform social listening with sentiment (P18)
- review-analysis: competitor review intelligence (P21), Google Play app review monitoring (P22), multi-platform hospitality aggregation (P20)
- content-and-seo: SERP content brief generation (P23), sitemap content audit (P24), keyword rank tracking with alerts (P26), deep research agent (P54)

All pipelines include explicit pipe field paths, PPE cost estimates where applicable, and non-obvious gotchas only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…with research patterns

Added 3 new pipelines to lead-generation.md (Sales Navigator bulk, SERP discovery, Apollo icebreakers, Reddit lead mining), 3 to competitive-intel.md (website change detection, SERP position monitoring, feature benchmarking), and 3 to influencer-vetting.md (TikTok creator vetting, YouTube channel audit, cross-platform hashtag discovery). All entries include explicit field paths, cost estimates for PPE Actors, and per-pipeline gotchas.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…flow guides

Add 2 pipelines to each guide from research patterns: Instagram competitor
analysis + LinkedIn company page analytics (social); Reddit trend mining +
YouTube outlier discovery (trend); sales signal outreach + Upwork monitoring
(jobs); lead scoring/routing + construction discovery (real estate).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…esearch)

Adds workflow reference guides for the 4 new categories identified in combined-patterns.md research: e-commerce price monitoring (patterns 45-49), contact enrichment (50-52), knowledge base and RAG pipelines (53-55), and company research (56-58). Each guide follows the existing format with When/Pipeline/Output fields/Cost estimate/Gotcha sections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2 participants