FinChat Scraper

This project scrapes US equity data from FinChat, including financial filings, SEC documents, and press releases. The extracted data is stored as JSON and pushed to a MongoDB database, where it can be queried for analysis or integrated into other systems.
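Here is a minimal sketch of this storage step. It assumes one JSON file per scraped document and a MongoDB upsert keyed on ticker, document type, and document ID. The record fields (`ticker`, `doc_type`, `doc_id`, `payload`), the directory layout, and both function names are hypothetical and are not taken from the project. The `collection` argument only needs an `update_one` method, so either a `pymongo` collection or a test stand-in will work.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_json(record: dict, out_dir: Path) -> Path:
    """Write one scraped record to <out_dir>/<ticker>/<doc_type>_<doc_id>.json (hypothetical layout)."""
    path = out_dir / record["ticker"] / f"{record['doc_type']}_{record['doc_id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def push_to_mongo(collection, record: dict) -> None:
    """Upsert one record so re-running the scraper does not create duplicates.

    `collection` is assumed to behave like pymongo's Collection
    (update_one(filter, update, upsert=True)).
    """
    # Hypothetical natural key identifying a single scraped document.
    key = {"ticker": record["ticker"], "doc_type": record["doc_type"], "doc_id": record["doc_id"]}
    collection.update_one(
        key,
        {"$set": {**record, "scraped_at": datetime.now(timezone.utc).isoformat()}},
        upsert=True,
    )
```

With `pymongo` installed, you could obtain a collection with `MongoClient("mongodb://localhost:27017")["finchat"]["documents"]`. The connection string and the database and collection names are placeholders. Because the function upserts instead of inserting, re-scraping a ticker updates its existing documents rather than duplicating them.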

Key Features

Automated Pipeline: The entire process is automated and handles equities in bulk, so no manual work is needed for each ticker. The pipeline is designed to keep the extracted data consistent and accurate.
Generic Scraper: The scraper has been tested successfully on more than 250 equities, which suggests it should work for any equity available on FinChat. It is designed so that changes to the website's structure require only minimal modifications.
Data Storage: Extracted data is stored in JSON format and seamlessly integrated into MongoDB, making it easy to query and analyze the data for further insights.
Scalability: The scraper is built to handle large volumes of data efficiently, making it suitable for both small-scale and enterprise-level use cases.
Error Handling: Errors are handled gracefully, so the scraper keeps running when it encounters unexpected issues.
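The bulk-processing, scalability, and error-handling features above could fit together roughly as shown below. This is a sketch under stated assumptions, not the project's implementation. `scrape_one` stands in for whatever function fetches one ticker's data, and the retry count, backoff delay, and worker count are illustrative defaults. If `scrape_one` drives a browser session, each worker thread would need its own session.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable

log = logging.getLogger("finchat_pipeline")


def with_retries(fn: Callable[[str], dict], ticker: str, attempts: int = 3, base_delay: float = 1.0) -> dict:
    """Call fn(ticker), retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(ticker)
        except Exception as exc:  # broad on purpose: any scraping failure is retried
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("%s failed (attempt %d/%d): %s; retrying in %.1fs", ticker, attempt, attempts, exc, delay)
            time.sleep(delay)


def run_pipeline(
    tickers: Iterable[str],
    scrape_one: Callable[[str], dict],
    max_workers: int = 4,
    **retry_kwargs,
) -> tuple[dict, dict]:
    """Scrape tickers concurrently; one ticker's failure never stops the batch."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the ticker it is processing.
        futures = {pool.submit(with_retries, scrape_one, t, **retry_kwargs): t for t in tickers}
        for fut in as_completed(futures):
            ticker = futures[fut]
            try:
                results[ticker] = fut.result()
            except Exception as exc:
                # Record the failure and move on instead of aborting the whole run.
                failures[ticker] = repr(exc)
                log.error("giving up on %s: %s", ticker, exc)
    return results, failures
```

A call might look like `results, failures = run_pipeline(["ASML", "NTNX"], scrape_one)`. Failed tickers are returned rather than raised, so one broken page does not abort a batch of hundreds of equities. You can also pass the keys of `failures` back in for a second pass.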

Demo

Below is a demonstration of the scraper in action:

The demo shows the scraper extracting data, processing it, and storing it in MongoDB in real time.

Contributions

Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Repository: SyedHassanUlHaq/finchat-scrapper