💸 Please consider donating on Paypal to keep this project alive.
A Python-based web scraper that extracts credit card offer information from bank websites, parses the data, and stores it in structured JSON format for analysis and retrieval.
This project scrapes card offers from multiple bank websites (currently supporting NDB and DFCC banks), extracting key information such as:
- Vendor/merchant names
- Discount/savings amounts
- Contact phone numbers
- Offer expiration dates
- Offer images and descriptions
- Card categories and types
The scraper is designed to handle different website structures through configurable parameters, allowing it to adapt to different bank offer page layouts.
- Multi-bank Support: Extensible architecture supports scraping from different bank websites
- Flexible Configuration: Uses JSON configuration files to define CSS selectors for different banks
- Data Validation: Validates extracted data for consistency across multiple category pages
- Organized Storage: Stores offers organized by vendor name and saving type
- Command-line Interface: Multiple search and filtering options via command-line arguments
- Date-based Output: Automatically organizes output files by month and year
Offer Scraper is licensed under the MIT License.
This program is licensed under the MIT License, but the scraped websites have their own licenses and terms of use. PLEASE BE AWARE OF THAT. This program is for educational purposes only.
A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
✔️ Commercial use | Modification | Distribution | Private use
❌ Liability | Warranty
ℹ️ License and copyright notice
Refer to the License declaration for more details.
```
offer-scraper/
├── .gitignore                # Gitignore file
├── LICENSE                   # License file
├── offer-scraper.py          # Main scraper script
├── parameters.json           # Default bank configuration (can be customized)
├── README.md                 # README file
├── requirements.txt          # PIP requirements file
├── parameters/
│   ├── parameters_dfcc.json  # DFCC Bank configuration
│   ├── parameters_ndb.json   # NDB Bank configuration
│   └── ...                   # Additional bank configurations
└── offers/
    └── ...                   # Scraped data (month_year.json format)
```
- Python 3.6+
- pip (Python package manager)
- Clone the repository:

```shell
git clone <repository-url>
cd offer-scraper
```

- Create and activate a virtual environment (recommended):

```shell
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install required dependencies:

```shell
pip install -r requirements.txt
```

Dependencies:

- requests: HTTP library for web scraping
- beautifulsoup4: HTML parsing library for extracting data from pages
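As a minimal sketch of how the two dependencies fit together: requests would fetch the page (skipped below to avoid a network call) and BeautifulSoup extracts fields from the HTML. The markup and class names here are invented for illustration, not taken from any real bank's pages.

```python
# requests would fetch the page; here we parse a made-up HTML snippet
# directly so the example runs offline.
from bs4 import BeautifulSoup

html = """
<div class="offer-item">
  <h3 class="vendor">Cinnamon Grand</h3>
  <span class="saving">20% off</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for item in soup.select("div.offer-item"):
    vendor = item.select_one("h3.vendor").get_text(strip=True)
    saving = item.select_one("span.saving").get_text(strip=True)
    print(f"{vendor}: {saving}")
```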
Run the scraper with default parameters:

```shell
python offer-scraper.py
```

Delete the existing data file and perform a fresh scrape:

```shell
python offer-scraper.py -f
```

Search and display offers for a specific vendor:

```shell
# List all vendors
python offer-scraper.py -v

# Search for a specific vendor (case-insensitive)
python offer-scraper.py -v "cinnamon"
```

Search and display all offers in a specific category:

```shell
# List all categories
python offer-scraper.py -c

# Search for a specific category (case-insensitive)
python offer-scraper.py -c "dining"
```

Display information about the program:

```shell
python offer-scraper.py -i
```

Configuration files are JSON-based and define CSS selectors and extraction rules for each bank's website structure.
Key Parameters:
- `version`: Configuration version string
- `bank`: Bank identifier (e.g., "ndb", "dfcc")
- `url`: Base URL of the bank's card offers page
- `subcategories`: CSS selector for category links
- `items`: CSS selector for individual offer items
- `vendor`: CSS selector for vendor/merchant names
- `saving`: CSS selector for discount/savings amounts
- `phone`: CSS selector for contact phone numbers
- `until`: CSS selector for expiration dates
- `img_items`/`offer_img`: CSS selectors for offer images
- `root_is_content`: Whether the root URL contains offer details
Each selector is defined as an array: [selector_value, selector_type, html_element, navigation_required]
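A four-element selector entry could be applied with BeautifulSoup roughly as below. This is a sketch only: the `extract` helper, the handling of `selector_type`, and the sample HTML are assumptions for illustration, and the real script's logic (including what it does with `navigation_required`) may differ.

```python
from bs4 import BeautifulSoup

def extract(soup, selector):
    """Apply one [selector_value, selector_type, html_element,
    navigation_required] entry to a parsed page (illustrative only)."""
    value, sel_type, element, navigate = selector
    if sel_type == "class":
        node = soup.find(element, class_=value)
    elif sel_type == "id":
        node = soup.find(element, id=value)
    else:
        node = soup.select_one(value)
    return node.get_text(strip=True) if node else None

soup = BeautifulSoup('<p class="vendor-name">Cinnamon Grand</p>', "html.parser")
print(extract(soup, ["vendor-name", "class", "p", False]))  # Cinnamon Grand
```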
- Create a new configuration file in the `parameters/` directory (e.g., `parameters_newbank.json`)
- Define CSS selectors for your target bank's website structure
- Update `parameters.json` to include the new configuration
- Run the scraper
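A new bank configuration might look like the sketch below. The field names follow the Key Parameters list above, but every value here is a placeholder: the URL, class names, and selector entries must be replaced with ones matching the target bank's actual pages.

```json
{
    "version": "1.0",
    "bank": "newbank",
    "url": "https://www.example-bank.com/card-offers",
    "subcategories": ["category-link", "class", "a", false],
    "items": ["offer-item", "class", "div", false],
    "vendor": ["vendor-name", "class", "h3", false],
    "saving": ["saving-amount", "class", "span", false],
    "phone": ["contact", "class", "span", false],
    "until": ["valid-till", "class", "span", false],
    "root_is_content": false
}
```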
Scraped data is stored in JSON files in the offers/ directory with filenames following the pattern: offers_M_YYYY.json (month_year).
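The month_year naming can be reproduced with the standard library; the exact format string used by the script is an assumption inferred from the `offers_M_YYYY.json` pattern above.

```python
from datetime import date

def offers_filename(d: date) -> str:
    # Non-zero-padded month, four-digit year, per the offers_M_YYYY.json pattern
    return f"offers/offers_{d.month}_{d.year}.json"

print(offers_filename(date(2026, 3, 14)))  # offers/offers_3_2026.json
```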
Data Structure:
{
"vendor_name": {
"offer_type": {
"category": "category_name",
"savings": "discount_amount",
"phone": "contact_number",
"until": "expiration_date",
"vendor": "vendor_name",
...
}
}
}- Python 3.6 or higher
- Internet connection for web scraping
- Compliance with website terms of service (web scraping should respect robots.txt and terms of use)
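The vendor → offer-type data structure shown above can be traversed like this. The sample record is invented to match the documented layout; real data lives in the `offers/` JSON files.

```python
import json

# Hypothetical sample in the documented vendor -> offer_type layout
data = json.loads("""
{
  "Cinnamon Grand": {
    "seasonal": {
      "category": "Dining",
      "savings": "20%",
      "phone": "+94 11 000 0000",
      "until": "2026-03-31",
      "vendor": "Cinnamon Grand"
    }
  }
}
""")

for vendor, offers in data.items():
    for offer_type, offer in offers.items():
        print(f"{vendor} ({offer_type}): {offer['savings']} until {offer['until']}")
```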
- The scraper requires an active internet connection to retrieve web pages
- Headers and user-agent information should be configured appropriately when making requests
- Some websites may require additional handling for dynamic content or anti-scraping measures
- Output files are automatically generated and organized by month/year
- It's recommended to review the target websites' robots.txt and terms of service before deploying this scraper in production
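One common way to configure headers with requests is a `Session` whose headers apply to every request. The header values below are examples only; the headers the script actually sends are not specified here, and the request itself is commented out to avoid a network call.

```python
import requests

session = requests.Session()
session.headers.update({
    # A browser-like User-Agent; some sites reject the default python-requests one
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})

# response = session.get("https://www.example-bank.com/card-offers", timeout=10)
```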
ERROR: Failed to retrieve the page!
- Check your internet connection
- Verify the bank URLs are correct and accessible
- Ensure you're not being blocked by the website
ERROR: Could not find parameters in JSON
- Verify the configuration file exists and contains all required fields
- Check that CSS selectors match the current website structure (websites may change over time)
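A quick sanity check for a configuration can list which expected keys are absent. The required-field list below is an assumption based on the Key Parameters section; the script's own validation rules may differ.

```python
# Field names taken from the Key Parameters section; whether the script
# treats all of them as mandatory is an assumption.
REQUIRED = ["version", "bank", "url", "subcategories", "items",
            "vendor", "saving", "phone", "until"]

def missing_fields(config: dict) -> list:
    """Return the expected keys that are absent from a loaded config dict."""
    return [field for field in REQUIRED if field not in config]

print(missing_fields({"version": "1.0", "bank": "ndb", "url": "https://example.com"}))
```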
No offers found
- The website structure may have changed; update CSS selectors in the configuration
- Check if the bank's website is accessible
Bank Card Offers Web Scraper 1.0
- Initial commit
- Added about commands
- License updated to MIT license
- Renamed to 'Offer Scraper'
- Updated README
- Project made public
© 2026 Asanka Sovis
