Instagram Data Scraper

Scrape Instagram profile URLs at scale — automated scrolling, smart filtering, clean CSV export.


Quick Start • Features • Configuration • FAQ • Contributing


What is Instagram Data Scraper?

Instagram Data Scraper is a free, open-source Instagram profile extraction tool built with Python and Selenium. It automates the collection of Instagram profile URLs from any page — feed, hashtag, explore, followers — with smart filtering and deduplication.

Need to build a prospect list, analyze followers, or study engagement patterns? Manually copying profiles is slow and tedious. This scraper handles login, scrolling, extraction, deduplication, and CSV export in one command.

Who is this for?

  • Growth Hackers building lead lists for outreach campaigns
  • Digital Marketers studying competitors' follower bases
  • Data Analysts collecting social media datasets
  • Researchers studying engagement patterns and influencer networks
  • Startup Founders identifying potential customers or partners
  • Developers learning Selenium browser automation

Key Features

  • One-Command Setup - Clone, install, run β€” scraping in under 2 minutes
  • Smart Login - Automated Instagram authentication via Selenium
  • Infinite Scroll - Continuous feed scrolling with auto-stop detection
  • Profile Filtering - Extracts only profile URLs, skips /explore/, /reels/, /settings/
  • Deduplication - Built-in set() ensures zero duplicate profiles
  • Human-Like Delays - Randomized scroll timing (0.8s-2.0s) to mimic real behavior
  • Auto-Save - Progress saved every 50 iterations β€” never lose data
  • Graceful Stop - Press Ctrl+C anytime β€” all collected data is saved
  • Secure Credentials - .env file support β€” credentials never in code
  • Clean CSV Output - Full Instagram URLs, sorted alphabetically, UTF-8 encoded
  • Free & Open Source - MIT license, no API key required

Quick Start

Prerequisites

Requirement          Details
Python Version       3.10 or higher (Download)
Google Chrome        Latest version (Download)
Instagram Account    A valid Instagram account

Installation

# 1. Clone the repository
git clone https://github.com/SoCloseSociety/InstagramDataScraper.git
cd InstagramDataScraper

# 2. (Recommended) Create a virtual environment
python -m venv venv

# Activate it:
# Windows:
venv\Scripts\activate
# macOS / Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

Configure Credentials

cp .env.example .env

Edit .env:

INSTA_USERNAME=your_username_or_email
INSTA_PASSWORD=your_password

If you skip the .env file, you will be prompted for your credentials at runtime instead.
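
A minimal sketch of how the credentials are typically picked up, assuming the variable names from .env.example (the exact prompts in main.py may differ):

import os
from getpass import getpass
from dotenv import load_dotenv

load_dotenv()  # reads INSTA_USERNAME / INSTA_PASSWORD from .env if present

# Fall back to interactive prompts when the variables are not set
username = os.getenv("INSTA_USERNAME") or input("Instagram username or email: ")
password = os.getenv("INSTA_PASSWORD") or getpass("Instagram password: ")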

Usage

python main.py

What happens:

  1. Chrome opens and logs in to Instagram
  2. You navigate to the page you want to scrape (feed, hashtag, explore...)
  3. Press ENTER to start
  4. The scraper scrolls and collects profile links automatically
  5. Results are saved to a .csv file

Press Ctrl+C at any time to stop and save.
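
Under the hood, the scroll-and-collect loop looks roughly like the sketch below; extract_links() and save_csv() are hypothetical stand-ins for the real helpers in main.py, and the constants mirror the Configuration table further down:

import random
import time

MAX_STALE_ITERATIONS = 500                       # stop after this many scrolls with no new links
SCROLL_PAUSE_MIN, SCROLL_PAUSE_MAX = 0.8, 2.0    # randomized, human-like delay (seconds)
SCROLL_AMOUNT = 600                              # pixels per scroll step
SAVE_INTERVAL = 50                               # auto-save every N iterations

def scroll_and_collect(driver, extract_links, save_csv):
    """Scroll the current page, harvest profile links, and save periodically."""
    profiles, stale, iteration = set(), 0, 0
    try:
        while stale < MAX_STALE_ITERATIONS:
            iteration += 1
            driver.execute_script(f"window.scrollBy(0, {SCROLL_AMOUNT});")
            time.sleep(random.uniform(SCROLL_PAUSE_MIN, SCROLL_PAUSE_MAX))

            before = len(profiles)
            profiles |= extract_links(driver.page_source)
            stale = 0 if len(profiles) > before else stale + 1

            if iteration % SAVE_INTERVAL == 0:
                save_csv(profiles)  # auto-save progress
    except KeyboardInterrupt:
        pass                        # Ctrl+C: stop gracefully, keep what was collected
    save_csv(profiles)
    return profiles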


How It Works

┌──────────────┐    ┌──────────────┐    ┌──────────────────┐    ┌───────────┐
│ Chrome opens │───>│ Login to     │───>│ Scroll & extract │───>│ Export to │
│ via Selenium │    │ Instagram    │    │ profile links    │    │ CSV file  │
└──────────────┘    └──────────────┘    └──────────────────┘    └───────────┘
                                                 │
                                          ┌──────┴──────┐
                                          │ Deduplicate │
                                          │ + filter    │
                                          └─────────────┘
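
The "Scroll & extract" step parses the rendered HTML with BeautifulSoup and keeps only profile-looking links. A sketch, assuming the same filtering rules as above (the selectors in main.py may be more specific):

from bs4 import BeautifulSoup

SKIP_PREFIXES = ("/explore/", "/reels/", "/settings/")

def extract_links(page_source: str) -> set[str]:
    """Return the set of profile URLs found in the page source."""
    soup = BeautifulSoup(page_source, "lxml")  # lxml backend, per the Tech Stack
    links = set()
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("/") and href != "/" and not href.startswith(SKIP_PREFIXES):
            links.add("https://www.instagram.com" + href)
    return links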

Output Format

ProfileLink
https://www.instagram.com/alice/
https://www.instagram.com/bob/
https://www.instagram.com/charlie/
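
The export step itself is plain csv writing; a sketch (the output filename here is an assumption):

import csv

def save_csv(profiles: set[str], path: str = "profiles.csv") -> None:
    """Write the collected profile URLs, sorted alphabetically, as UTF-8 CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ProfileLink"])  # header, as shown above
        for url in sorted(profiles):      # alphabetical order
            writer.writerow([url])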

Configuration

Edit the constants at the top of main.py:

Variable              Default  Description
MAX_STALE_ITERATIONS  500      Stop after N iterations with no new links
SCROLL_PAUSE_MIN      0.8s     Minimum delay between scrolls
SCROLL_PAUSE_MAX      2.0s     Maximum delay between scrolls
SCROLL_AMOUNT         600      Pixels to scroll down per iteration
SAVE_INTERVAL         50       Save to CSV every N iterations
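
For example, to scrape more cautiously you might slow things down (illustrative values only):

MAX_STALE_ITERATIONS = 300   # give up sooner when nothing new appears
SCROLL_PAUSE_MIN = 1.5       # seconds
SCROLL_PAUSE_MAX = 3.5       # seconds
SCROLL_AMOUNT = 400          # smaller scroll steps
SAVE_INTERVAL = 25           # save to CSV more often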

Tech Stack

Technology         Purpose
Python 3.10+       Core language
Selenium 4.x       Browser automation
BeautifulSoup4     HTML parsing
lxml               Fast HTML parser backend
python-dotenv      Environment variable management
webdriver-manager  Automatic ChromeDriver setup
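
As an illustration of how Selenium and webdriver-manager fit together (a sketch, not the exact code in main.py):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a ChromeDriver matching the installed Chrome
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.get("https://www.instagram.com/")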

Project Structure

InstagramDataScraper/
├── main.py              # Core scraper script
├── requirements.txt     # Python dependencies
├── .env.example         # Credential template
├── assets/
│   └── banner.svg       # Project banner
├── pyproject.toml       # Python project metadata
├── CONTRIBUTING.md      # Contribution guidelines
├── LICENSE              # MIT License
├── README.md            # This file
└── .gitignore           # Git ignore rules

Troubleshooting

Chrome driver issues

If ChromeDriver fails to launch or does not match your installed Chrome version, upgrade webdriver-manager:

pip install --upgrade webdriver-manager

Login fails

If the automated login doesn't work:

  1. Check your credentials in .env
  2. Instagram may require 2FA — complete it manually in the browser window
  3. Try logging in manually first, then press ENTER to start scraping

No profiles found

If the scraper scrolls but doesn't find profiles:

  1. Make sure you navigated to a page with profile links (feed, hashtag page, followers list)
  2. Instagram may have changed its HTML structure — open an issue

FAQ

Q: Is this free? A: Yes. Instagram Data Scraper is 100% free and open source under the MIT license.

Q: Do I need an Instagram API key? A: No. This tool uses browser automation (Selenium), no API key needed.

Q: How many profiles can I scrape? A: No hard limit. The scraper runs until no new profiles are found for 500 consecutive iterations. Be mindful of Instagram's usage policies.

Q: Are my credentials safe? A: Credentials are stored in a local .env file that is gitignored. They are never uploaded or shared.

Q: Can I scrape hashtag pages? A: Yes. After login, navigate to any hashtag page, press ENTER, and the scraper will collect profile links.

Q: Does it work on Mac / Linux? A: Yes. Fully cross-platform on Windows, macOS, and Linux.


Alternatives Comparison

Feature            Instagram Data Scraper  Manual Copy-Paste  Instagram API   Paid Tools
Price              Free                    Free               Free (limited)  $30-100/mo
Bulk extraction    Yes                     No                 Rate limited    Yes
Profile filtering  Yes                     Manual             N/A             Varies
Open source        Yes                     N/A                No              No
API key required   No                      No                 Yes             Yes
Cross-platform     Yes                     Yes                Any             Web only

Contributing

Contributions are welcome! Please read the Contributing Guide before submitting a pull request.


License

This project is licensed under the MIT License.


Disclaimer

This tool is provided for educational and research purposes only. Scraping Instagram may violate their Terms of Service. The authors are not responsible for any misuse or consequences resulting from the use of this software. Always respect platform policies and applicable laws.


If this project helps you, please give it a star!
It helps others discover this tool.

Star this repo


Built with purpose by SoClose — Digital Innovation Through Automation & AI
Website • LinkedIn • Twitter • Contact
