GitHub - AkshayKamath12/Web-Crawler: Takes in a seed URL and repeatedly gathers internal URLs and their HTML contents, storing each page in an indexed file. All of these indexed files are written to a directory of the client's choice. Implementation details and expected usage are included in the README.

General description:

crawler.c takes three command-line parameters: the seedURL, the directory, and the maximum depth to crawl. It validates the parameters and then starts the crawling process. The webpage for the seedURL is added to the bag of webpages to be crawled. crawler.c then opens a file named 1 within the directory and stores the URL, depth, and HTML of the seedURL in it (the HTML is obtained with the download function of url.c). Every other numbered file in the directory follows the same format.
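The numbered-file format described above could be written roughly as follows. This is a minimal sketch, not the code in pagedir.c: the helper name save_page, its signature, and the exact layout (URL on the first line, depth on the second, HTML after that) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes one crawled page to <dir>/<doc_id>.
 * Assumed layout: URL on line 1, depth on line 2, HTML from line 3 on.
 * Returns 0 on success, -1 on any failure. */
int save_page(const char *dir, int doc_id, const char *url,
              int depth, const char *html)
{
    assert(dir != NULL && url != NULL && html != NULL);

    /* Build the path "<dir>/<doc_id>", refusing paths that do not fit. */
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%d", dir, doc_id);
    if (n < 0 || (size_t)n >= sizeof path) {
        return -1;
    }

    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }

    /* Write the three fields; report failure if either write or close fails. */
    int ok = fprintf(fp, "%s\n%d\n%s", url, depth, html) >= 0;
    if (fclose(fp) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

Under these assumptions, the seed page would be saved with a call such as save_page(directory, 1, seedURL, 0, html), and each later page would use the next document number.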

crawler.c processes the HTML of the seedURL and finds the links within it. Each link is checked to confirm that it can be normalized and is internal to the seed, using functions that url.c provides; links that pass are added to a linked list of webpages to be crawled. Every processed URL is added to a hashtable of URLs that have been seen. The crawling process stops once the last URL at the maximum depth has been written to a file.
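The decision of whether a discovered link should be queued can be sketched as below. This is an illustration only: seen_set is a toy array with linear search standing in for hashtable.c, is_internal is a simple prefix comparison rather than the check url.c performs, and the names should_enqueue, seen_insert, and MAX_SEEN are hypothetical. Normalization is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 1024

/* Toy stand-in for the seen-URL hashtable: fixed array, linear search.
 * Stores the pointers it is given; it does not copy the strings. */
typedef struct {
    const char *urls[MAX_SEEN];
    size_t count;
} seen_set;

/* Returns true if url was newly added, false if it was already
 * present or the set is full. */
bool seen_insert(seen_set *s, const char *url)
{
    assert(s != NULL && url != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->urls[i], url) == 0) {
            return false;
        }
    }
    if (s->count == MAX_SEEN) {
        return false;
    }
    s->urls[s->count++] = url;
    return true;
}

/* Simplified "internal" test: url begins with the seed URL. */
bool is_internal(const char *seed, const char *url)
{
    return strncmp(seed, url, strlen(seed)) == 0;
}

/* A link found on a page at parent_depth is queued only if the depth
 * limit allows it, it is internal, and it has not been seen before. */
bool should_enqueue(seen_set *s, const char *seed, const char *url,
                    int parent_depth, int max_depth)
{
    if (parent_depth >= max_depth) {
        return false;
    }
    if (!is_internal(seed, url)) {
        return false;
    }
    return seen_insert(s, url);
}
```

A linear search keeps the sketch short; a real hashtable makes the membership test roughly constant time, which matters once many URLs have been seen.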

Compile by running make, then run the crawler with "./runTest (seedURL) (directory where the files should be placed) (depth)".
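Since the depth arrives as a command-line string, it has to be converted and checked before crawling starts. The following is one possible way to validate it, assuming that depth must be a non-negative integer; parse_depth is a hypothetical helper, not a function from crawler.c.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the depth argument to an int.
 * Returns the depth, or -1 if arg is empty, has trailing characters,
 * is negative, or does not fit in an int. */
int parse_depth(const char *arg)
{
    assert(arg != NULL);

    char *end = NULL;
    errno = 0;
    long value = strtol(arg, &end, 10);

    /* end == arg means no digits were read; *end != '\0' means junk followed. */
    if (errno != 0 || end == arg || *end != '\0') {
        return -1;
    }
    if (value < 0 || value > INT_MAX) {
        return -1;
    }
    return (int)value;
}
```

In main, the third argument would be passed as parse_depth(argv[3]) after confirming that argc is 4, and a result of -1 would be reported as a usage error.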

Folders and files (3 commits):

directory
.gitignore
Makefile
README.md
bag.c
crawler.c
crawler.h
curl.c
curl.h
hashtable.c
hashtable.h
pagedir.c
pagedir.h
set.c
set.h
url.c
url.h
