- discard_phishtank.csv: all websites discarded from the PhishTank websites.
- discard_text_analysis.csv: all websites discarded from the redirecting approach.
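
The discard lists can be loaded into a set so that discarded sites are easy to skip. The following is only a minimal sketch: the column layout of the CSV files is not documented here, so it assumes (hypothetically) that the site identifier sits in the first column. `load_discarded` and its `column` parameter are illustrative names, not part of the dataset.

```python
import csv
from pathlib import Path


def load_discarded(csv_path, column=0):
    """Return the set of non-empty values found in one column of a discard list.

    Assumption: the site identifier is stored in `column` (default: first
    column). Adjust it to match the real layout of the CSV file.
    """
    discarded = set()
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank rows and rows too short to hold the requested column.
            if len(row) > column and row[column].strip():
                discarded.add(row[column].strip())
    return discarded
```

If the files have a header row, its first cell will end up in the set as well and may need to be removed by hand.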
Each site is available in two forms:
- only the HTML page.
- the whole website, downloaded through wget.
Directory structure:
- all_websites
- all_websites
- websites_captured_from_adv
- websites_from_phishtank
- all_websites_only_html
- all_websites_html_from_adv
- all_websites_html_phishtank
- all_websites
- websites_clean
- websites
- websites_only_html
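
To locate one of the directories listed above after unpacking the data, a small helper can search for it by name. This is a hedged sketch: it does not assume any particular nesting of the directories, and `dataset_root`, `find_dirs` and `count_entries` are hypothetical names introduced for illustration.

```python
import os


def find_dirs(root, name):
    """Return the sorted paths of every directory called `name` below `root`."""
    matches = []
    for current, dirnames, _files in os.walk(root):
        for dirname in dirnames:
            if dirname == name:
                matches.append(os.path.join(current, dirname))
    return sorted(matches)


def count_entries(directory):
    """Count the immediate children (files or folders) of a directory."""
    return len(os.listdir(directory))


if __name__ == "__main__":
    # "dataset_root" is a placeholder for wherever the data was unpacked.
    for path in find_dirs("dataset_root", "websites_from_phishtank"):
        print(path, count_entries(path))
```

Searching by name rather than by a fixed path means a name that occurs more than once, such as all_websites, yields several results, so the caller has to pick the intended one.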