webScraper is a simple automated multitool for scraping websites. It focuses on gathering information such as robots.txt, sitemap.xml, admin page discovery, subdomain enumeration, and DNS queries. The tool is written in Python and supports concurrency for faster enumeration and scraping tasks.
- Retrieve and save the `robots.txt` of a website (see the sketch after this list).
- Retrieve and save the `sitemap.xml`.
- Discover potential admin pages using a wordlist.
- Enumerate subdomains using a wordlist.
- Perform DNS queries.
- Detect the CMS using the Wappalyzer API.
- Search for CVEs based on the detected CMS.
- Search for forms in the HTML.
- Gather data from the SSL certificate.
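As a rough illustration of the first two features, fetching `robots.txt` and parsing the sitemap could look like the sketch below. This is a minimal, self-contained example, not silkWeaver's actual code; the function name and the lack of error handling are assumptions made here for brevity.

```python
# Illustrative sketch only -- not silkWeaver's actual implementation.
import requests
import xml.etree.ElementTree as ET

def fetch_robots_and_sitemap(domain: str) -> tuple[str, list[str]]:
    """Download robots.txt and extract <loc> URLs from sitemap.xml."""
    robots = requests.get(f"https://{domain}/robots.txt", timeout=10).text

    sitemap_xml = requests.get(f"https://{domain}/sitemap.xml", timeout=10).text
    root = ET.fromstring(sitemap_xml)
    # Sitemap entries are namespaced, so match any <loc> element regardless of prefix.
    urls = [el.text for el in root.iter() if el.tag.endswith("loc") and el.text]
    return robots, urls

if __name__ == "__main__":
    robots_txt, sitemap_urls = fetch_robots_and_sitemap("example.com")
    print(robots_txt[:200])
    print(f"{len(sitemap_urls)} URLs found in sitemap.xml")
```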
- Clone the repository:
```bash
git clone https://github.com/laisoJS/silkWeaver.git
```
- Install the required dependencies:
```bash
pip install -r requirements.txt
```
- Create a `.env` file. You need to request an API key from the NVD website: https://nvd.nist.gov/developers/request-an-api-key (a sketch of how the key can be used follows these steps).
```bash
echo NDV_API_KEY=<YourAPIKey> > .env
```
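The API key is only needed for the CVE lookup step. Below is a minimal sketch of how a key loaded from `.env` could be used against the public NVD CVE API; the function name and query logic are illustrative, not silkWeaver's implementation, and it assumes `python-dotenv` and `requests` are installed.

```python
# Illustrative sketch of an NVD CVE lookup -- not silkWeaver's actual code.
import os
import requests
from dotenv import load_dotenv

load_dotenv()  # reads NDV_API_KEY from the .env file created above
API_KEY = os.environ["NDV_API_KEY"]
NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def search_cves(keyword: str) -> list[str]:
    """Return CVE IDs whose descriptions match the keyword (e.g. 'wordpress 6.2')."""
    resp = requests.get(
        NVD_URL,
        params={"keywordSearch": keyword, "resultsPerPage": 20},
        headers={"apiKey": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return [item["cve"]["id"] for item in resp.json().get("vulnerabilities", [])]

if __name__ == "__main__":
    print(search_cves("wordpress 6.2"))
```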
To use the scraper, run it with the following options:
```bash
python silkweaver.py <domain> [options]
```
Example:
```bash
python silkweaver.py example.com -a wordlists/admin_pages.txt -s wordlists/subdomains.txt -c 10 -v
```
- `example.com`: The domain to scrape.
- `-a wordlists/admin_pages.txt`: Use the `admin_pages.txt` wordlist for admin page discovery.
- `-s wordlists/subdomains.txt`: Use the `subdomains.txt` wordlist for subdomain enumeration.
- `-c 10`: Run up to 10 tasks concurrently (see the concurrency sketch after this list).
- `-v`: Enable verbose mode for detailed output.
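To illustrate what the concurrency cap controls, here is a minimal sketch of wordlist-based admin page discovery limited by an `asyncio.Semaphore`. The function names are hypothetical and this is not silkWeaver's actual implementation; it assumes `aiohttp` is available.

```python
# Illustrative sketch of admin page discovery with a concurrency cap.
import asyncio
import aiohttp

async def probe(session, sem, url):
    """Request one candidate URL, honouring the shared concurrency limit."""
    async with sem:
        try:
            async with session.get(url, allow_redirects=False) as resp:
                return url if resp.status in (200, 301, 302, 401, 403) else None
        except aiohttp.ClientError:
            return None

async def discover_admin_pages(domain, wordlist_path, concurrency=10):
    sem = asyncio.Semaphore(concurrency)  # mirrors the -c option
    with open(wordlist_path) as fh:
        paths = [line.strip() for line in fh if line.strip()]
    async with aiohttp.ClientSession() as session:
        tasks = [probe(session, sem, f"https://{domain}/{p}") for p in paths]
        results = await asyncio.gather(*tasks)
    return [r for r in results if r]

if __name__ == "__main__":
    found = asyncio.run(discover_admin_pages("example.com", "wordlists/admin_pages.txt"))
    print("\n".join(found))
```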
| Argument | Description | Example |
|---|---|---|
| `<domain>` | The domain to scrape (without http/https). | `example.com` |
| `-a`, `--admin` | Path to a wordlist file for admin page discovery. | `-a wordlists/admin_pages.txt` |
| `-s`, `--subs` | Path to a wordlist file for subdomain enumeration. | `-s wordlists/subdomains.txt` |
| `-c`, `--concurrency` | Maximum concurrency level for asynchronous tasks. Default: 10. | `-c 20` |
| `-v`, `--verbose` | Enable verbose output with detailed information about the process. | `-v` |
| `--ssl` | Gather data from the SSL certificate and save the public key as a `.pem` file. | `--ssl` |
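As an illustration of the `--ssl` behaviour, the sketch below fetches a server certificate, dumps a few fields, and writes the public key to a `.pem` file. The field selection and file names echo the output section below, but the code itself is an assumption rather than the tool's implementation, and it relies on the `cryptography` package.

```python
# Illustrative sketch of SSL certificate gathering -- not silkWeaver's actual code.
import json
import os
import ssl
from cryptography import x509
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def grab_cert(domain: str) -> dict:
    """Fetch the server certificate, record basic fields, and save the public key as PEM."""
    pem_cert = ssl.get_server_certificate((domain, 443))
    cert = x509.load_pem_x509_certificate(pem_cert.encode())

    info = {
        "subject": cert.subject.rfc4514_string(),
        "issuer": cert.issuer.rfc4514_string(),
        "not_valid_before": str(cert.not_valid_before),
        "not_valid_after": str(cert.not_valid_after),
        "serial_number": cert.serial_number,
    }

    os.makedirs("output", exist_ok=True)
    pub_pem = cert.public_key().public_bytes(Encoding.PEM, PublicFormat.SubjectPublicKeyInfo)
    with open("output/cert_key.pem", "wb") as fh:
        fh.write(pub_pem)
    with open("output/cert.json", "w") as fh:
        json.dump(info, fh, indent=2)
    return info

if __name__ == "__main__":
    print(grab_cert("example.com"))
```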
The output files will be saved in an output/ directory. The tool generates the following files based on the tasks executed:
- `robots.txt`: the `robots.txt` from the target domain.
- `sitemap.xml`: the `sitemap.xml` from the target domain.
- `admin.txt`: discovered admin pages.
- `subdomains.txt`: discovered subdomains.
- `cms.json`: the name, version, and category of the detected CMS.
- `cve.json`: a list of CVEs found by CMS name and version.
- `DNS.json`: the DNS record types queried and their responses (see the sketch after this list).
- `links.txt`: the links found on the website.
- `sitemap_urls.txt`: URLs parsed from the `sitemap.xml`.
- `forms.json`: forms gathered from the pages.
- `cert.json`: data from the SSL certificate.
- `cert_key.pem`: the public key in PEM format.
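As an example of how a `DNS.json`-style report could be produced, the sketch below uses `dnspython`; the record types queried and the JSON layout are assumptions made here, not silkWeaver's exact format.

```python
# Illustrative sketch of building a DNS.json-style report with dnspython.
import json
import os
import dns.exception
import dns.resolver

def query_dns(domain: str, rdtypes=("A", "AAAA", "MX", "NS", "TXT")) -> dict:
    """Query several record types and collect the textual answers per type."""
    results = {}
    for rdtype in rdtypes:
        try:
            answers = dns.resolver.resolve(domain, rdtype)
            results[rdtype] = [rdata.to_text() for rdata in answers]
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN,
                dns.resolver.NoNameservers, dns.exception.Timeout):
            results[rdtype] = []
    return results

if __name__ == "__main__":
    report = query_dns("example.com")
    os.makedirs("output", exist_ok=True)
    with open("output/DNS.json", "w") as fh:
        json.dump(report, fh, indent=2)
```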
Feel free to submit issues and pull requests to improve this tool. Contributions are welcome!
This project is licensed under the MIT License.