a data collection tool
The web_scrape collector can handle collection requests that are not defined in its config. Publish the work list directly to the vaqmr topic:
gcloud pubsub topics publish vaqmr --message '{"collector":"web_scrape", "work_list":[{"url":"http://blah.meh","storage_key":"blah"}]}'
- Add new function in main.py
- Add new config in config.yml
- Test locally:
import main
main.vaqmr_worker({'collector':'your_new_collector'}, 'context')
- Check output in gs://dotufp-raw/
- Deploy vaqmr
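The steps above do not show how main.py maps a collector name to its function, so the following is only a hypothetical sketch of one way a worker could dispatch on the `collector` key. The names `COLLECTORS`, `collector`, and `example_worker` are invented for illustration and are not taken from the project; the real `vaqmr_worker` may be organised differently.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping collector names to functions.
# The real main.py may use a different mechanism.
COLLECTORS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def collector(name: str):
    """Register a function under the given collector name."""
    def register(func):
        COLLECTORS[name] = func
        return func
    return register


@collector("your_new_collector")
def your_new_collector(event: Dict[str, Any]) -> List[str]:
    # Placeholder body: a real collector would fetch data and write it
    # to storage. Here we only return the storage keys it was asked for.
    return [item["storage_key"] for item in event.get("work_list", [])]


def example_worker(event: Dict[str, Any], context: Any) -> Any:
    """Look up the requested collector and run it (illustrative only)."""
    name = event.get("collector")
    if name not in COLLECTORS:
        raise ValueError(f"unknown collector: {name!r}")
    return COLLECTORS[name](event)
```

With a layout like this, adding a collector would mean writing one decorated function, and the local test call shown above would exercise it without touching Pub/Sub.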
Create a schedule:
- https://console.cloud.google.com/cloudscheduler?project=dotufp
- Create Job -> Name = twitter-faves -> Target = Pub/Sub -> Topic = vaqmr -> Payload = {"collector":"twitter_faves"}
Test function:
- https://console.cloud.google.com/cloudscheduler?project=dotufp
- Find the schedule and click Run Now.
- Check function logs.
- Check output file.
Test locally:
import main
main.vaqmr_worker(event={'collector':'your_new_collector',
                         'work_list':[{'url':'http://blah.meh',
                                       'storage_key':'blah'}]},
                  context='context')
Or refer to Manual (non-scheduled) collection.
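A manual request like the gcloud example could also be sent from Python. The sketch below assumes the `google-cloud-pubsub` client library is installed and that the project ID is `dotufp` (taken from the console URLs in this document); `build_request` and `publish_request` are illustrative helper names, not part of the project.

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_request(collector: str,
                  work_list: Optional[Iterable[Dict[str, Any]]] = None) -> bytes:
    """Build the JSON message body as UTF-8 bytes."""
    payload: Dict[str, Any] = {"collector": collector}
    if work_list:
        payload["work_list"] = list(work_list)
    return json.dumps(payload).encode("utf-8")


def publish_request(project_id: str, topic_id: str, data: bytes) -> str:
    """Publish the message and return the message ID."""
    # Imported here so build_request works without the client library.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=data)
    return future.result()


if __name__ == "__main__":
    data = build_request(
        "web_scrape",
        [{"url": "http://blah.meh", "storage_key": "blah"}],
    )
    print(data.decode("utf-8"))
    # Uncomment to publish (assumed project ID, requires credentials):
    # publish_request("dotufp", "vaqmr", data)
```

Keeping the payload builder separate from the publish call makes it possible to check the message locally before anything is sent to the topic.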
Reference: https://cloud.google.com/functions/docs/monitoring/