Webscrape

Deploy the lambda folder to an AWS Lambda function.

Because AWS Lambda's Python runtime doesn't include BeautifulSoup or Requests, these need to be bundled with the function.

Make any changes to lambda_function.py in the lambda folder; this is what is run when the Lambda function is invoked. Any libraries need to be installed INTO the lambda folder, e.g.:

# to install requests, navigate to the lambda folder and run
pip install requests -t ./

The libraries required are:

  • requests
  • bs4

The contents of the lambda folder then need to be zipped.

zip -r zip.zip .

Note this zips the contents (everything inside lambda), NOT the folder itself.
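The same archive can also be built with Python's standard library instead of the zip CLI; the folder layout below is a stand-in for illustration:

```python
import pathlib
import shutil
import tempfile
import zipfile

# Throwaway stand-in for the real lambda/ folder (illustrative only).
workdir = pathlib.Path(tempfile.mkdtemp())
src = workdir / "lambda"
src.mkdir()
(src / "lambda_function.py").write_text(
    "def lambda_handler(event, context):\n    return 'ok'\n"
)

# root_dir makes archive paths relative to lambda/, so lambda_function.py
# sits at the archive root — the layout Lambda expects.
archive = shutil.make_archive(str(workdir / "zip"), "zip", root_dir=str(src))

print(zipfile.ZipFile(archive).namelist())
```

Because root_dir is set, the archive holds only the folder's contents, matching the note above.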

Create a Lambda function and select the option to upload from a .zip file.

In the Lambda configuration, create three environment variables:

  • proxy_api
  • secret
  • access_key

proxy_api is used for the web scraping. The service used is Web Scrape API: create an account and get an API access key; this is the proxy_api value.

secret and access_key are the AWS keys for an S3 bucket. The scraped web page is automatically uploaded to that bucket.

Scripts

get_urls - this gets all the URLs for the different categories and saves them as a JSON file.
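The saved output would look something like the following; the category names, URLs, and filename are hypothetical, since the real script scrapes them:

```python
import json

# Hypothetical category -> URL mapping; get_urls would scrape these.
urls = {
    "travel": "https://example.com/catalogue/category/travel",
    "mystery": "https://example.com/catalogue/category/mystery",
}

# Save as JSON so later steps can load the category URLs.
with open("urls.json", "w") as f:
    json.dump(urls, f, indent=2)
```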

bookApp

This is an R Shiny web app, currently in development.

In the bookApp directory, run run_dev.R to start the dev version.

Rscript run_dev.R
