πŸš€ Reddit JSON Scraper

A simple, powerful Python tool to download Reddit posts, comments, images, and videos using Reddit's public JSON API. No login or API keys required!

Python 3.7+ · MIT License

✨ Features

  • πŸ“₯ Download from subreddits - Get posts sorted by hot, new, top, rising, or controversial
  • πŸ‘€ Download user posts - Archive all posts from any Reddit user
  • πŸ’¬ Full comment extraction - Save comment threads with nested replies (up to 5 levels)
  • 🖼️ All media types - Images (JPG, PNG), videos (MP4), GIFs, and more
  • πŸ”— Link preservation - Save link posts as HTML redirects
  • ⚑ Fast and efficient - Rate limiting to avoid IP bans
  • 🎯 No authentication - Uses Reddit's public JSON API
  • πŸ“ Markdown format - Posts and comments saved in readable Markdown
  • πŸ”§ CLI and Python API - Use from command line or import as a library

🎬 Quick Start

Installation

1. Clone this repository

git clone https://github.com/0anxt/reddit-json-scraper.git
cd reddit-json-scraper

2. Create a virtual environment (recommended)

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

That's it! You're ready to go. πŸŽ‰

πŸ’‘ Having installation issues? See the detailed INSTALL.md guide for:

  • Virtual environment setup
  • Platform-specific instructions (Linux/Mac/Windows)
  • Troubleshooting common errors
  • Docker installation

Basic Usage

🎯 Interactive Mode (Recommended for Beginners)

Just run the interactive script and answer the questions:

python interactive.py

The script will guide you through:

  1. What to scrape (subreddit, user, or single post)
  2. Sorting options
  3. Number of posts
  4. Whether to include comments
  5. Confirmation before downloading

Perfect for first-time users! No need to remember command-line arguments.

⚑ Command-Line Mode (For Advanced Users)

Download top posts from a subreddit:

python reddit_scraper.py --subreddit python --sort top --time week --limit 25

Download posts from a user:

python reddit_scraper.py --user username --limit 10

Download a specific post:

python reddit_scraper.py --url "https://reddit.com/r/pics/comments/abc123/"

πŸ“– Detailed Usage

Command Line Options

| Option | Short | Description | Example |
|--------|-------|-------------|---------|
| `--subreddit` | `-s` | Subreddit to download (without r/) | `--subreddit python` |
| `--user` | `-u` | User to download (without u/) | `--user spez` |
| `--url` | | Single post URL to download | `--url "https://..."` |
| `--sort` | | Sort: hot, new, top, rising, controversial | `--sort top` |
| `--time` | | Time: hour, day, week, month, year, all | `--time week` |
| `--limit` | `-l` | Max posts to download | `--limit 50` |
| `--output` | `-o` | Output directory | `--output downloads` |
| `--no-comments` | | Skip downloading comments (faster) | `--no-comments` |

Examples

1. Archive a subreddit's top posts this month

python reddit_scraper.py --subreddit AskReddit --sort top --time month --limit 100

2. Download new posts (good for monitoring)

python reddit_scraper.py --subreddit news --sort new --limit 25

3. Get images from r/pics (skip comments for speed)

python reddit_scraper.py --subreddit pics --sort hot --limit 50 --no-comments

4. Download a specific post with all comments

python reddit_scraper.py --url "https://reddit.com/r/nextfuckinglevel/comments/1o8ze9o/"

5. Custom output directory

python reddit_scraper.py --subreddit python --output my_python_archive --limit 30

🐍 Python API

You can also use the scraper in your Python scripts:

from reddit_scraper import RedditJSONScraper

# Create scraper instance
scraper = RedditJSONScraper(output_dir="downloads")

# Download from subreddit
scraper.download_subreddit(
    subreddit="python",
    sort="top",
    time_filter="week",
    limit=50,
    include_comments=True
)

# Download from user
scraper.download_user(username="spez", limit=25)

# Download single post
scraper.download_from_url("https://reddit.com/r/pics/comments/abc123/")

# Fetch posts without downloading (for analysis)
posts = scraper.fetch_subreddit("python", sort="hot", limit=100)
for post in posts:
    print(f"{post['title']} - {post['score']} upvotes")
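Since fetch_subreddit returns plain dictionaries, analysis is ordinary Python. For example, a small score filter (a sketch; it relies only on the "title" and "score" keys used in the loop above):

```python
def filter_by_score(posts, min_score=100):
    """Keep posts at or above min_score, highest-scored first.
    Assumes each post dict has "title" and "score" keys."""
    keepers = [p for p in posts if p.get("score", 0) >= min_score]
    return sorted(keepers, key=lambda p: p["score"], reverse=True)

# Demonstration on hand-made post dicts:
sample = [
    {"title": "A", "score": 50},
    {"title": "B", "score": 500},
    {"title": "C", "score": 120},
]
top = filter_by_score(sample)
print([p["title"] for p in top])  # ['B', 'C']
```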

See examples.py for more usage examples!

πŸ“ Output Structure

Downloaded content is organized like this:

reddit_downloads/
β”œβ”€β”€ r_python/
β”‚   β”œβ”€β”€ 1abc123_Amazing_Python_Tutorial.md          # Text post with comments
β”‚   β”œβ”€β”€ 1def456_Cool_Screenshot.png                 # Image
β”‚   β”œβ”€β”€ 1ghi789_Funny_Video.mp4                     # Video
β”‚   └── 1jkl012_Interesting_Article_link.html       # Link post
β”œβ”€β”€ r_pics/
β”‚   └── ...
└── u_username/
    └── ...

File Naming

Files are named: {post_id}_{sanitized_title}.{extension}

  • Post ID - Reddit's unique ID (prevents duplicates)
  • Title - Post title with special characters removed
  • Extension - Based on content type (.md, .jpg, .mp4, .html)
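A minimal sketch of how such a naming scheme can be implemented (illustrative only; the actual sanitizer in reddit_scraper.py may differ in detail):

```python
import re

def make_filename(post_id, title, extension, max_len=60):
    """Sketch of the naming scheme: strip special characters from
    the title, replace whitespace with underscores, cap the length.
    (Hypothetical helper, not the scraper's real function.)"""
    safe = re.sub(r"[^A-Za-z0-9 _-]", "", title)
    safe = re.sub(r"\s+", "_", safe.strip())[:max_len]
    return f"{post_id}_{safe}.{extension}"

print(make_filename("1abc123", "Amazing Python Tutorial!", "md"))
# 1abc123_Amazing_Python_Tutorial.md
```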

Content Formats

Text Posts (.md) - Markdown files with:

  • Post title, author, score, timestamp
  • Full post content
  • All comments with nested replies
  • Formatted for easy reading
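Rendering a nested comment thread boils down to a small recursive walk. A sketch (the comment-dict shape is an assumption based on Reddit's JSON, and the depth cap mirrors the 5-level limit from the feature list):

```python
def walk_comments(comments, depth=0, max_depth=5):
    """Yield (depth, author, body) tuples for a comment tree,
    capped at max_depth levels of nesting. Assumes each comment
    dict has "author", "body", and an optional "replies" list."""
    for c in comments:
        yield depth, c.get("author", "[deleted]"), c.get("body", "")
        if depth + 1 < max_depth:
            yield from walk_comments(c.get("replies", []), depth + 1, max_depth)

# Demonstration: indent each reply two spaces per level, as in
# the Markdown output.
tree = [{"author": "alice", "body": "top",
         "replies": [{"author": "bob", "body": "reply"}]}]
lines = [f"{'  ' * d}- {a}: {b}" for d, a, b in walk_comments(tree)]
print("\n".join(lines))
```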

Images - Downloaded in original format (JPG, PNG, etc.)

Videos - MP4 format, including Reddit-hosted videos

Link Posts (.html) - HTML files that auto-redirect to the original URL
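An auto-redirecting page only needs a single meta-refresh tag. A sketch of generating one (hypothetical helper; the scraper's actual HTML template may differ):

```python
def link_post_html(url, title=""):
    """Build a minimal HTML page that immediately redirects to
    `url`, with a plain link as a fallback."""
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title>\n'
        f'<meta http-equiv="refresh" content="0; url={url}"></head>\n'
        f'<body><p>Redirecting to <a href="{url}">{url}</a>...</p></body></html>\n'
    )

page = link_post_html("https://example.com/article", "Interesting Article")
```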

πŸ”§ How It Works

This scraper uses Reddit's public JSON API - no authentication needed!

Simply append .json to any Reddit URL:

  • https://reddit.com/r/python/top.json - Top posts
  • https://reddit.com/r/python/comments/abc123.json - Post with comments

The scraper:

  1. Fetches JSON data from Reddit
  2. Parses posts and comments
  3. Downloads media files
  4. Saves everything in organized folders
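The first two steps can be sketched as follows (the endpoint layout and the "Listing" response shape are assumptions based on Reddit's public JSON API; the fetch itself uses the requests library, the scraper's single dependency):

```python
def listing_url(subreddit, sort="top", limit=25, time_filter="week"):
    """Build the public JSON endpoint for a subreddit listing."""
    return (f"https://www.reddit.com/r/{subreddit}/{sort}.json"
            f"?limit={limit}&t={time_filter}")

def extract_posts(listing_json):
    """Pull the post dicts out of a Reddit "Listing" response
    (shape assumed from the public JSON API)."""
    return [child["data"] for child in listing_json["data"]["children"]]

# Fetching would look like this (Reddit expects a descriptive
# User-Agent, or it may answer 403/429):
#   resp = requests.get(listing_url("python"),
#                       headers={"User-Agent": "reddit-json-scraper example"})
#   posts = extract_posts(resp.json())

# Parsing, illustrated on a canned response:
sample = {"data": {"children": [
    {"kind": "t3", "data": {"id": "abc123", "title": "Hello", "score": 42}},
]}}
posts = extract_posts(sample)
print(posts[0]["title"])  # Hello
```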

Rate Limiting: Automatically waits 2 seconds between requests to avoid IP bans. If rate-limited (429 error), waits 60 seconds and retries.

⚠️ Important Notes

What You Can Download

  • βœ… Any public subreddit
  • βœ… Any public user's posts
  • βœ… Any public post with comments
  • βœ… Images, videos, GIFs
  • βœ… Text posts with full formatting
  • βœ… Link posts

What You Cannot Download

  • ❌ Private subreddits (requires login)
  • ❌ Deleted or removed posts
  • ❌ User-specific content (saved posts, messages)
  • ❌ Content from banned/quarantined subreddits

Best Practices

  • 🕐 Use reasonable limits (don't download 10,000 posts at once)
  • ⏱️ The scraper has built-in rate limiting - don't modify it
  • πŸ’Ύ Large downloads can take significant time and disk space
  • 🀝 Be respectful of Reddit's servers

πŸ†š Comparison with Other Tools

| Feature | This Scraper | easy-reddit-downloader | Gallery-dl |
|---------|--------------|------------------------|------------|
| Language | Python | Node.js | Python |
| Auth required | ❌ No | ❌ No | ❌ No |
| Comments | βœ… Nested | βœ… Yes | ❌ No |
| CLI | βœ… Args | βœ… Interactive | βœ… Args |
| Python API | βœ… Yes | ❌ No | ⚠️ Limited |
| Dependencies | 1 (requests) | Multiple | Multiple |
| Setup time | < 1 min | ~2 min | ~2 min |

πŸ› Troubleshooting

"No items returned" or "403 Forbidden"

  • Subreddit might be private or doesn't exist
  • Try a different subreddit or check the spelling
  • Reddit might be temporarily blocking requests

"Rate limited (429)"

  • The scraper will automatically wait and retry
  • If it persists, wait a few minutes before running again

"No module named 'requests'"

  • Install the dependency: pip install requests

Downloads are slow

  • This is normal - the scraper waits 2 seconds between requests
  • Use --no-comments to skip comments for faster downloads
  • Reduce --limit for smaller batches

Some posts didn't download

  • Posts may be deleted, removed, or failed to load
  • Check the output for error messages
  • This is expected - not all posts will download successfully

πŸ“ Examples

Check out examples.py for 8 different usage examples:

  1. Basic subreddit download
  2. Multiple subreddits
  3. User posts
  4. Single post with comments
  5. Fetch and analyze (no download)
  6. Filter by score
  7. Media-only downloads
  8. Custom data processing

Run examples:

python examples.py

🀝 Contributing

Contributions are welcome! Feel free to:

  • πŸ› Report bugs
  • πŸ’‘ Suggest features
  • πŸ”§ Submit pull requests
  • πŸ“– Improve documentation

πŸ“„ License

MIT License - See LICENSE file for details.

βš–οΈ Disclaimer

This tool is for educational and personal use only. Please:

  • Respect Reddit's Terms of Service
  • Don't use for commercial purposes without permission
  • Don't abuse Reddit's servers with excessive requests
  • Be a good internet citizen! 🌐

πŸ™ Acknowledgments

πŸ’¬ Support


Made with ❀️ for the Reddit community

If you find this useful, give it a ⭐ on GitHub!
