Scrape any authenticated website with Claude Code in 2 minutes. Auto-captures network requests with auth headers.

Salberg87/authenticated-scrape

🔐 Authenticated Scrape

Scrape any authenticated website in 2 minutes with Claude Code

License: MIT • Claude Code • PRs Welcome • GitHub Stars

┌──────────────────────────────────────────────────────────────┐
│  🌐 Open Site → 🔑 Login → 📡 Auto-Capture → 💾 Extract    │
│                                                              │
│  No curl copying. No manual headers. Just pure automation.   │
└──────────────────────────────────────────────────────────────┘

Getting Started • Examples • Contributing


✨ What It Does

Scrape data from any authenticated website in minutes using Chrome DevTools automation. No more manually copying curl commands or dealing with complex auth flows.

The magic: Log in once in your browser, and this skill automatically captures all API requests with auth headers included.

Why This Exists

😫 The Old Way

1. Open DevTools
2. Navigate to Network tab
3. Refresh page
4. Find the right request
5. Right-click → Copy as cURL
6. Paste into terminal
7. Parse the response
8. Extract data manually
9. Repeat for every session 🔁

✨ With This Skill

1. Run /authenticated-scrape
2. Log in once
3. Done! ✅

Claude handles everything:
- Captures requests
- Parses responses
- Extracts data
- Creates reusable scripts

🎯 Use Cases

  • 📊 Export data from SaaS dashboards
  • 🔒 Download your private analytics
  • 🏢 Automate data extraction from business tools
  • 🎨 Scrape authenticated content you own
  • 🔄 Create reusable scripts for recurring data needs

🚀 Quick Start

Installation

  1. Install the skill:
mkdir -p ~/.claude/skills/authenticated-scrape
cd ~/.claude/skills/authenticated-scrape
curl -O https://raw.githubusercontent.com/Salberg87/authenticated-scrape/main/SKILL.md
  2. Restart Claude Code

  3. Invoke the skill:

/authenticated-scrape

📖 How It Works

graph LR
    A[🚀 Start Skill] --> B[🌐 Open Browser]
    B --> C[🔑 Login]
    C --> D[📡 Capture Network]
    D --> E[🎯 Select Request]
    E --> F[📦 Extract Data]
    F --> G[💾 Save/Script]

    style A fill:#7C3AED,stroke:#5B21B6,color:#fff
    style G fill:#10B981,stroke:#059669,color:#fff

Step-by-Step Process

| Step | Action | What Happens |
|------|--------|--------------|
| 1 | 🌐 Launch | Skill opens Chrome and navigates to your target site |
| 2 | 🔑 Authenticate | You log in normally (or let Claude automate it) |
| 3 | 📡 Capture | All API calls with auth headers captured automatically |
| 4 | 🎯 Select | Choose which request contains your data |
| 5 | 📦 Extract | Parse JSON/HTML and extract specific fields |
| 6 | 💾 Reuse | Generate Node.js/Python/curl scripts for future use |
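
Step 6 can be pictured with a small Python sketch that turns one captured request into a curl command. It assumes a captured request is a plain dict with `method`, `url`, and `headers` keys; that shape and the `to_curl` helper are hypothetical and are not the skill's actual internal format.

```python
import shlex


def to_curl(request: dict) -> str:
    """Render a captured request (hypothetical dict shape) as a curl command."""
    parts = ["curl", "-X", request.get("method", "GET")]
    for name, value in request.get("headers", {}).items():
        parts += ["-H", f"{name}: {value}"]
    parts.append(request["url"])
    # shlex.quote keeps tokens and URLs with special characters shell-safe.
    return " ".join(shlex.quote(part) for part in parts)


# Made-up request; a real one would come from the captured network traffic.
captured = {
    "method": "GET",
    "url": "https://dashboard.example.com/api/users?page=1",
    "headers": {"Authorization": "Bearer <token>"},
}
print(to_curl(captured))
```

Quoting every argument matters here because tokens and query strings often contain characters the shell would otherwise interpret.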

💡 Example Workflow

User: /authenticated-scrape

Claude: What website do you want to scrape?

User: my company dashboard at dashboard.example.com

Claude: Opening browser... Please log in.

[You log in normally]

Claude: Authenticated! Navigate to the page with data you need.

[You click around to load the data]

Claude: Captured 8 requests. Here are the top candidates:
1. GET /api/users - 200 OK (JSON, 45KB) ← Contains user data
2. GET /api/analytics - 200 OK (JSON, 12KB)
3. GET /api/settings - 200 OK (JSON, 2KB)

Which one has your data?

User: The first one

Claude: Found Authorization: Bearer eyJ...
Response has 200 users with: id, email, name, created_at

What data do you want?

User: All emails

Claude: Extracted 200 emails. Save to emails.json?

I can also create a reusable Node.js script with the auth headers. Want that?
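
The reusable script offered at the end of this workflow could look roughly like the Python sketch below (the transcript mentions Node.js; Python is used here for illustration). It assumes the endpoint returns a top-level JSON array of user objects, and the helper names `build_request`, `extract_field`, and `fetch_emails` are hypothetical. The URL and token are placeholders.

```python
import json
import urllib.request


def build_request(url: str, headers: dict) -> urllib.request.Request:
    """Attach the captured auth headers to a GET request."""
    return urllib.request.Request(url, headers=headers, method="GET")


def extract_field(records: list, field: str) -> list:
    """Collect one field from each record, skipping records that lack it."""
    return [record[field] for record in records if field in record]


def fetch_emails(url: str, headers: dict) -> list:
    """Fetch the endpoint and return every email (performs a real HTTP call)."""
    with urllib.request.urlopen(build_request(url, headers), timeout=30) as resp:
        return extract_field(json.load(resp), "email")


# Offline demonstration with a made-up response body.
sample = json.loads(
    '[{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]'
)
emails = extract_field(sample, "email")
print(emails)
```

To run it against a live endpoint, call `fetch_emails("https://dashboard.example.com/api/users", {"Authorization": "Bearer <token>"})` with the headers captured from your own session.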

🛠️ Technical Details

Under the Hood

| 🔧 Technology | ⚡ Performance | 🔒 Security |
|---------------|----------------|-------------|
| Chrome DevTools Protocol via MCP | Real-time network capture | Local execution, no cloud storage |

Supported Auth Types

| Auth Method | Status | Notes |
|-------------|--------|-------|
| 🍪 Cookie-based | ✅ Supported | Most common web apps |
| 🎫 JWT/Bearer | ✅ Supported | Modern APIs |
| 🔐 OAuth 2.0 | ✅ Supported | Social logins, enterprise |
| 🛡️ Custom Headers | ✅ Supported | API keys, tokens |
| 🔑 Multi-Factor Auth | ✅ Supported | You handle MFA manually |
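
Whichever method a site uses, the captured credentials end up as ordinary request headers. A minimal sketch, assuming cookies are available as name/value pairs and the token as a string; the helper names `cookie_header` and `bearer_header` are hypothetical and not part of the skill.

```python
def cookie_header(cookies: dict) -> dict:
    """Build a Cookie header from name/value pairs (cookie-based auth)."""
    return {"Cookie": "; ".join(f"{name}={value}" for name, value in cookies.items())}


def bearer_header(token: str) -> dict:
    """Build an Authorization header for JWT/Bearer and OAuth 2.0 access tokens."""
    return {"Authorization": f"Bearer {token}"}


# Made-up values; real ones come from the captured request.
headers = {
    **cookie_header({"session": "abc123", "csrftoken": "xyz"}),
    **bearer_header("<token>"),
}
print(headers)
```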

Output Formats

┌─────────────────────────────────────────┐
│  📝 JSON         → Structured data      │
│  📊 CSV          → Spreadsheet ready    │
│  🐍 Python       → pip install requests │
│  📦 Node.js      → Production ready     │
│  🔧 cURL         → Command line tool    │
└─────────────────────────────────────────┘
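
As an illustration of the JSON-to-CSV path, the sketch below flattens a list of records into CSV text with Python's standard library. The `records_to_csv` helper and the sample record are hypothetical; the skill's actual export code may differ.

```python
import csv
import io


def records_to_csv(records: list, fields: list) -> str:
    """Serialize a list of dicts to CSV text, keeping only the given fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # Missing fields are written as empty cells.
    return buffer.getvalue()


# Made-up record resembling the users response in the example workflow.
users = [{"id": 1, "email": "a@example.com", "name": "A", "created_at": "2024-01-01"}]
print(records_to_csv(users, ["id", "email"]))
```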

⚑ Advanced Usage

Automating Login

If you want to automate the login process:

User: /authenticated-scrape

Claude: What site?

User: example.com - can you auto-login? Username is user@email.com

Claude: I'll automate the login for you...

Handling Pagination

The skill can help identify pagination parameters:

Claude: I see the request has a 'page' parameter.
Want me to create a script that fetches all pages?
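
A script that fetches all pages might follow the pattern below. This is a sketch under two assumptions that will not hold for every API: the `page` parameter is 1-based, and the server returns an empty list once you go past the last page. `fetch_all_pages` and its parameters are hypothetical names; `fetch_page` stands in for whatever function performs the authenticated request.

```python
import time
from typing import Callable, Iterator, List


def fetch_all_pages(
    fetch_page: Callable[[int], List[dict]],
    start: int = 1,
    max_pages: int = 1000,
    delay: float = 0.0,
) -> Iterator[dict]:
    """Yield records page by page until an empty page is returned."""
    for page in range(start, start + max_pages):
        records = fetch_page(page)
        if not records:
            break
        yield from records
        time.sleep(delay)  # Pause between pages to stay polite to the server.


# Offline demonstration: a fake three-page API standing in for a real request.
FAKE_PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}], 3: []}
all_records = list(fetch_all_pages(lambda page: FAKE_PAGES.get(page, [])))
print(len(all_records))  # 3
```

The `max_pages` cap guards against an endpoint that never returns an empty page, and a non-zero `delay` keeps the request rate modest.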

Session Reuse

Save auth tokens for repeated use:

Claude: I'll save these headers to auth-headers.json
You can reuse them until the token expires.
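
A sketch of what saving and reusing headers could involve, assuming the token is a JWT sent as `Authorization: Bearer ...`. Opaque tokens and cookies carry no readable expiry, so the check then reports "not known to be expired". The helper names are hypothetical, and the payload is decoded without verifying the signature, which is fine for reading `exp` but not for trusting the token.

```python
import base64
import json
import os
import time
from typing import Optional


def save_headers(headers: dict, path: str = "auth-headers.json") -> None:
    """Write headers to disk with owner-only permissions where the OS supports it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(headers, handle, indent=2)


def load_headers(path: str = "auth-headers.json") -> dict:
    """Read previously saved headers back from disk."""
    with open(path) as handle:
        return json.load(handle)


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) of a JWT, or None if unreadable."""
    try:
        payload = token.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # Restore stripped base64 padding.
        return json.loads(base64.urlsafe_b64decode(payload)).get("exp")
    except (IndexError, ValueError, AttributeError):
        return None


def is_expired(headers: dict, now: Optional[float] = None) -> bool:
    """True when the Bearer token's `exp` claim is in the past."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    exp = jwt_expiry(token) if scheme.lower() == "bearer" else None
    if exp is None:
        return False  # Unknown expiry: only a real request can tell.
    return (time.time() if now is None else now) >= exp


# Demonstration with a made-up, unsigned token whose payload is {"exp": 2000000000}.
payload = base64.urlsafe_b64encode(b'{"exp": 2000000000}').rstrip(b"=").decode()
demo_headers = {"Authorization": f"Bearer header.{payload}.signature"}
print(is_expired(demo_headers, now=1_900_000_000))  # False
print(is_expired(demo_headers, now=2_100_000_000))  # True
```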

⚠️ Known Limitations & Troubleshooting

Automated Login Detection

Many major platforms detect automated browsers and block login attempts. This is a security feature, not a bug.

Sites that typically block automation:

  • ❌ GitHub, Google, Microsoft
  • ❌ Banking and financial sites
  • ❌ Social media platforms
  • ❌ Sites with aggressive bot detection

Recommended Approach

✅ Best Practice: Manual Login + Network Capture

1. User logs in manually in the browser window
2. Skill captures the authenticated session
3. Extracts requests with auth tokens
4. Generates reusable scripts

Because you log in yourself, this approach also works on sites that block automated logins.

Common Issues & Solutions

"Could not log in - This browser may not be secure"

Cause: The site detected the automated browser (Chrome DevTools Protocol)

Solution:

  • Have the user log in manually instead of automating the login
  • Focus on capturing the already-authenticated session
  • The skill still works for data extraction once you are logged in

No network requests captured

Cause: The page hasn't loaded its data yet

Solution:

  • Wait for the page to fully load
  • Navigate to the actual page that loads the data
  • Check "All" request types, not just XHR/Fetch

Getting HTML instead of JSON

Cause: The endpoint requires authentication or returned an error page

Solution:

  • Verify you're authenticated on the site
  • Check that the endpoint URL is correct
  • Test the request in the DevTools Network tab first
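
A generated script can detect the "HTML instead of JSON" case itself before trying to parse the body. The sketch below is one way to do that; `classify_body` is a hypothetical helper, and the HTML sniffing is a simple heuristic, not a full content-type detector.

```python
import json


def classify_body(content_type: str, body: str) -> str:
    """Return 'html', 'json', or 'other' for a response body."""
    media_type = content_type.split(";")[0].strip().lower()
    looks_like_html = body.lstrip().lower().startswith(("<!doctype html", "<html"))
    if media_type == "text/html" or looks_like_html:
        return "html"  # Often a login or error page served in place of the data.
    try:
        json.loads(body)
    except ValueError:
        return "other"
    return "json"


print(classify_body("application/json; charset=utf-8", '{"users": []}'))
print(classify_body("text/html", "<!DOCTYPE html><html><body>Please log in</body></html>"))
```

An `html` result for an API endpoint usually means the session has expired or the URL is wrong, so recapturing the headers is the next step.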

What Actually Works

| Feature | Status | Notes |
|---------|--------|-------|
| Network request capture | ✅ Always works | Core functionality |
| Auth header extraction | ✅ Always works | Cookies, tokens, etc. |
| JSON/HTML parsing | ✅ Always works | Any response format |
| Script generation | ✅ Always works | Node.js, Python, cURL |
| Manual login | ✅ Always works | User types credentials |
| Automated login | ⚠️ Site-dependent | Blocked by major platforms |

🔒 Security & Ethics

Important Notes

  • 🔐 Your auth tokens are sensitive - Scripts contain your session credentials
  • ⏰ Tokens expire - You may need to recapture after timeout
  • ✅ Only your data - This is for scraping your own authenticated sessions
  • 📜 Respect ToS - Check if scraping is allowed by the service terms
  • 🚦 Rate limiting - Be respectful with automated requests
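
Because generated scripts embed session credentials, it helps to mask them before sharing a script or pasting logs. A minimal sketch; the `redact_headers` helper is hypothetical, and the set of sensitive header names is an assumption you should extend for your own services.

```python
# Header names treated as secrets (an assumed list; extend it as needed).
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def redact_headers(headers: dict) -> dict:
    """Return a copy of the headers with credential values masked."""
    return {
        name: "<redacted>" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


# Made-up headers standing in for a captured request.
safe = redact_headers({"Authorization": "Bearer <token>", "Accept": "application/json"})
print(safe)
```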

What This Skill Does NOT Do

  • ❌ Bypass security measures
  • ❌ Scrape other users' data
  • ❌ Break authentication systems
  • ❌ Violate terms of service

🤝 Contributing

Contributions welcome! Feel free to:

  • 🐛 Report bugs
  • 💡 Suggest features
  • 🔧 Submit pull requests
  • 📖 Improve documentation

📄 License

MIT License - see LICENSE file for details

👤 Author

Salberg

🙏 Acknowledgments

  • Built for Claude Code
  • Uses Chrome DevTools Protocol
  • Inspired by the "copy as cURL" workflow

❓ FAQ

Do I need to know how to code?
No! This skill guides you through everything. Claude handles all the technical work.

Is my auth token safe?
Yes. Everything runs locally. Tokens are only used in your scripts and never sent to external servers.

What if my session expires?
Just run the skill again to capture fresh tokens. Claude will notify you if requests fail due to expiration.

Can I scrape any website?
You should only scrape websites where you have an authenticated account and proper authorization. Always respect terms of service.

Does this work with single sign-on (SSO)?
Yes! Whether it's Google OAuth, Microsoft, Okta, or any other SSO provider, the skill captures the final authenticated session.

⭐ If this skill saved you time, give it a star!


Created by Salberg

Made with ❤️ for the Claude Code community

