DarkWebDreamMarketScrapper

I built this to help a friend scrape information on drugs sold on the dark web for her economics research project on the effects of legalization on illegal markets. It uses Selenium, Tor, OpenCV, Tesseract, and Python to crawl the dark web and get past the CAPTCHA check that appears on every 100th click.

test_OCR.py - repeatedly refreshes the main page and prints the CAPTCHA text as recognized by OCR. It guesses correctly in roughly 1 of every 8 attempts. When OCR fails to produce anything meaningful, "fail" is returned.
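The "fail" behaviour described above could look something like the sketch below. It is not the code from test_OCR.py, and the function names are hypothetical. The alphanumeric-only filter and the 4-8 character length bounds are assumptions, since this README does not say what the CAPTCHAs look like. In the real script the raw string would come from Tesseract (for example `pytesseract.image_to_string` on an OpenCV-preprocessed image). The second helper turns the roughly 1-in-8 success rate into a retry budget, assuming attempts succeed independently.

```python
import math
import re


def clean_captcha_text(raw: str, min_len: int = 4, max_len: int = 8) -> str:
    """Normalize raw OCR output, or return "fail" if it isn't plausible.

    Assumption: CAPTCHAs are alphanumeric and min_len..max_len characters
    long; the real format is not documented in this README.
    """
    # OCR output often carries stray spaces, newlines and punctuation;
    # keep only ASCII letters and digits.
    text = re.sub(r"[^A-Za-z0-9]", "", raw)
    if not min_len <= len(text) <= max_len:
        return "fail"
    return text


def attempts_needed(p_success: float, confidence: float) -> int:
    """Smallest n with P(at least one success in n tries) >= confidence.

    Assumes each OCR attempt succeeds independently with probability p_success.
    """
    # 1 - (1 - p)^n >= c  <=>  n >= log(1 - c) / log(1 - p)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_success))


print(clean_captcha_text(" aB3 x9\n"))  # aB3x9
print(clean_captcha_text("~\n"))        # fail
print(attempts_needed(1 / 8, 0.95))     # 23
```

With a 1-in-8 success rate, about 23 refreshes give a 95% chance of at least one correct read, which is why a single OCR pass is not enough. Note that this filter only rejects obvious garbage; it cannot catch plausible-looking misreads such as O/0 confusions.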

main.py - runs the main loop. Once the browser is open, the user has one minute to navigate to the listing she wants to scrape. The program then clicks on each drug item in turn, opens its individual page, and extracts the relevant information. When every item on a page has been scraped, it moves on to the next page, continuing until the entire listing has been scraped.
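A minimal, browser-agnostic sketch of that loop follows. It is not the code in main.py: `scrape_listing` and the three callables passed to it are hypothetical stand-ins for the Selenium logic, and the sketch assumes CAPTCHA detection and solving happen inside `scrape_item`, since the CAPTCHA can interrupt any click. Passing the page operations in as functions keeps the pagination logic testable without a browser or Tor.

```python
import time
from typing import Callable, Dict, Iterable, List


def scrape_listing(
    get_item_links: Callable[[], Iterable[str]],
    scrape_item: Callable[[str], Dict[str, str]],
    go_to_next_page: Callable[[], bool],
    startup_wait: float = 60.0,
) -> List[Dict[str, str]]:
    """Scrape every item on every page of a listing (hypothetical helper)."""
    # Give the user time to steer the browser to the listing by hand.
    time.sleep(startup_wait)
    results: List[Dict[str, str]] = []
    while True:
        # Materialize the links up front; in a real browser, navigating to an
        # item page would make element references on the listing stale.
        for link in list(get_item_links()):
            results.append(scrape_item(link))
        # go_to_next_page returns False once the last page has been scraped.
        if not go_to_next_page():
            break
    return results


# Usage with fake pages instead of a real browser:
pages = [["item-1", "item-2"], ["item-3"]]
current = {"page": 0}


def fake_links() -> List[str]:
    return pages[current["page"]]


def fake_next() -> bool:
    if current["page"] + 1 < len(pages):
        current["page"] += 1
        return True
    return False


rows = scrape_listing(fake_links, lambda url: {"url": url}, fake_next, startup_wait=0)
print([row["url"] for row in rows])  # ['item-1', 'item-2', 'item-3']
```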

captcha_hack.py - contains some of the OCR-related functions used in main.py.
