Web Scraping Project 2 for Java (wsp2j). Automated, locally hosted, periodic website monitoring and logging using JSoup.
target.list - Input Target List in CSV
- Required Fields: 3 (Target Classification, Target ID, Target URL)
- Header: Optional (required header format: CLASSIFICATION,xxx,yyy)
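A minimal sketch of how a file in this layout could be read. It assumes plain comma-separated values without quoting, and that an optional header is recognised by a first column named CLASSIFICATION. The names TargetListParser and Target are hypothetical and are not taken from the wsp2j sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the wsp2j code base.
final class TargetListParser {

    // One row of target.list: classification, ID, URL.
    static final class Target {
        final String classification;
        final String id;
        final String url;

        Target(String classification, String id, String url) {
            this.classification = classification;
            this.id = id;
            this.url = url;
        }
    }

    // Parses the lines of a target list. Assumes simple CSV with no quoted fields.
    static List<Target> parse(List<String> lines) {
        List<Target> targets = new ArrayList<>();
        boolean firstRow = true;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumption: the optional header is the first non-blank line
            // and starts with "CLASSIFICATION,".
            if (firstRow && line.toUpperCase().startsWith("CLASSIFICATION,")) {
                firstRow = false;
                continue;
            }
            firstRow = false;
            // Limit of 3 keeps any commas inside the URL in the last field.
            String[] parts = line.split(",", 3);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            targets.add(new Target(parts[0].trim(), parts[1].trim(), parts[2].trim()));
        }
        return targets;
    }
}
```

The lines could be supplied with `Files.readAllLines(Paths.get("target.list"))`.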
wsp2j.frequency - Scheduler Timings File (optional; you will be asked to provide timings in the console if the file is not present)
- Required Format
HH mm SS (hours minutes seconds)
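A short sketch of parsing one line in this format. It assumes the three values are whitespace-separated integers and that they describe the interval between refreshes. Both points are assumptions, as is the name TimingsParser.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of the wsp2j code base.
final class TimingsParser {

    // Converts "HH mm SS" into a Duration (assumed here to be the refresh interval).
    static Duration parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 'HH mm SS' but got: " + line);
        }
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        return Duration.ofHours(hours).plusMinutes(minutes).plusSeconds(seconds);
    }
}
```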
targets.obj - Targets State Space (optional; autogenerated)
- Serialized vector of targets, regenerated every time the targets are refreshed.
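A sketch of writing and reading such a vector, assuming standard Java object serialization is used. The actual element class stored in targets.obj is defined by wsp2j and is not shown here. TargetStateStore is a hypothetical name.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Vector;

// Hypothetical helper for illustration; not part of the wsp2j code base.
final class TargetStateStore {

    // Writes the whole vector to the given file, replacing any previous content.
    static void save(Vector<? extends Serializable> targets, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(targets);
        }
    }

    // Reads a vector back; the element classes must be on the classpath.
    static Vector<?> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Vector<?>) in.readObject();
        }
    }
}
```

A file written this way can only be read back while the serialized classes stay compatible, so a state file from an older build may need to be regenerated from target.list.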
wsp2j-monitoring-YYYYmmDD-HHmmSS.txt - wsp2j Monitor Log (autogenerated)
Keys
l - Input List File,
o - Input State Space,
t - Input Timings File,
- - Any,
x - Not Found
--t: Timings loaded from input file
--x: Timings loaded from console prompt
lx-: Create and Start Mode: Creates target vector database and initiates timing scheduler
xo-: Resume Mode: Loads target vector database from file and initiates timing scheduler
lo-: Update and Resume Mode: Loads and updates the vector database from files and initiates timing scheduler
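The key combinations above can be read as a small decision table. The sketch below encodes it, using hypothetical names (ModeResolver, RunMode, TimingSource). The case where neither target.list nor targets.obj is found is not documented, so rejecting it here is an assumption.

```java
// Hypothetical helper for illustration; not part of the wsp2j code base.
final class ModeResolver {

    enum RunMode { CREATE_AND_START, RESUME, UPDATE_AND_RESUME }

    enum TimingSource { FILE, CONSOLE }

    // listFound: target.list present (l); stateFound: targets.obj present (o).
    static RunMode runMode(boolean listFound, boolean stateFound) {
        if (listFound && stateFound) {
            return RunMode.UPDATE_AND_RESUME; // lo-
        }
        if (listFound) {
            return RunMode.CREATE_AND_START;  // lx-
        }
        if (stateFound) {
            return RunMode.RESUME;            // xo-
        }
        // Assumption: with no input at all there is nothing to monitor.
        throw new IllegalStateException("Neither target.list nor targets.obj found");
    }

    // timingsFound: wsp2j.frequency present (t).
    static TimingSource timingSource(boolean timingsFound) {
        return timingsFound ? TimingSource.FILE : TimingSource.CONSOLE; // --t / --x
    }
}
```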
Initialization Sequence
- Scan targets and load the database into memory
- Load required SSL/TLS certificates (into the local JDK keystore) for relevant domains (skipped if all certificates are up to date)
- Scan and load timings for the scheduler sequence
- Scan and load the required plugins
Run Sequence
- Schedule a single-threaded executor service to refresh target hash codes.
- Trigger the plugin hook when a hash code change is registered.
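The run sequence can be sketched roughly as follows. This is an illustration under assumptions, not the wsp2j implementation: HashMonitor is a hypothetical name, the page content comes from a caller-supplied Supplier so the example runs without network access, and the hash is simply String.hashCode() of the fetched content.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the wsp2j code base.
final class HashMonitor {
    private final Supplier<String> fetcher;  // returns the current page content
    private final Consumer<String> onChange; // stands in for the plugin hook
    private Integer lastHash;                // null until the first refresh

    HashMonitor(Supplier<String> fetcher, Consumer<String> onChange) {
        this.fetcher = fetcher;
        this.onChange = onChange;
    }

    // Recomputes the hash; fires the hook and returns true if it differs
    // from the previous refresh. The first refresh only records a baseline.
    synchronized boolean refresh() {
        String content = fetcher.get();
        int hash = content.hashCode();
        boolean changed = lastHash != null && lastHash != hash;
        lastHash = hash;
        if (changed) {
            onChange.accept(content);
        }
        return changed;
    }

    // Runs refresh() periodically on a single worker thread.
    // Note: if refresh() throws, scheduleAtFixedRate suppresses later runs.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

With jsoup on the classpath, the fetcher could wrap a call such as `Jsoup.connect(url).get().html()`, converting the checked IOException as needed. The caller is expected to call `shutdown()` on the returned executor when monitoring should stop.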
Error Handling
Not present; you are on your own.
Copyright (C) 2024, Prajval K (@prajvalk) & Chlorine Pentoxide (@ChlorinePentoxide)
MIT LICENSE