Your gateway to South African government tenders! This AWS Lambda service is one of the five web scrapers in our tender data pipeline. It connects directly to the National Treasury's eTenders portal API, extracting procurement opportunities and feeding them into our processing system.
- Overview
- Lambda Function (`lambda_handler.py`)
- Data Model (`models.py`)
- AI Tagging Initialization
- Example Tender Data
- Getting Started
- Deployment
- Troubleshooting
## Overview

This service is a crucial component of our multi-source tender aggregation pipeline. It specializes in harvesting tender opportunities from the National Treasury's eTenders portal API, ensuring our system captures every government procurement opportunity available to businesses across South Africa.
What makes it special?

- **Consistent Workflow**: Maintains the same data structure and processing patterns as our other scrapers (Eskom, SANRAL, etc.)
- **Robust Validation**: Uses Pydantic models to ensure data quality and consistency
- **Intelligent Batching**: Groups tenders into batches for efficient SQS processing
- **AI-Ready**: Pre-configures every tender for downstream AI tagging and enrichment
## Lambda Function (`lambda_handler.py`)

The heart of our scraping operation! The `lambda_handler` orchestrates the entire data extraction process:

- **Fetch Data**: Fires off an HTTP GET request to the eTenders paginated API endpoint to retrieve the latest batch of open tenders.
- **Error Handling**: Built like a tank. Handles network hiccups, API timeouts, and malformed responses gracefully, so the function never crashes and burns.
- **Data Extraction**: The eTenders API loves to nest things; it wraps the actual tender list inside a `data` key. The function unwraps it before processing.
- **Data Parsing & Validation**: Each tender runs through our rigorous `eTender` model validation gauntlet. We clean dates, construct proper document URLs, and validate every field. Bad data gets logged and left behind.
- **Smart Batching**: Valid tenders are grouped into batches of up to 10 messages, the maximum SQS allows per batch request.
- **Queue Dispatch**: Each batch is sent to the central `AIQueue.fifo` SQS queue with a `MessageGroupId` of `eTenderScrape`. This keeps our government tenders organized and separate from other sources while maintaining processing order.
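The batching step above can be sketched as a small helper (the function name and the example data are illustrative, not taken from the actual handler):

```python
def chunk(items, size=10):
    """Split validated tenders into batches of at most `size`
    (SQS send_message_batch accepts at most 10 entries per call)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Example: 23 validated tenders become batches of 10, 10, and 3.
tenders = [{"title": f"Tender {n}"} for n in range(23)]
batches = chunk(tenders)

# Each batch would then be dispatched to the FIFO queue, roughly:
# sqs.send_message_batch(QueueUrl=queue_url, Entries=entries_for(batch))
```

Batching this way cuts SQS API calls by up to 10x compared with sending messages one at a time.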
## Data Model (`models.py`)

Our data architecture is built for consistency and extensibility.
### `TenderBase`

The bedrock of our tender universe! This abstract class defines the core attributes shared by all tenders:

- `title`: The tender's headline
- `description`: The details and requirements
- `source`: Always `"eTenders"` for this scraper
- `published_date`: When the opportunity was first published
- `closing_date`: The submission deadline
- `supporting_docs`: PDF documents and specifications
- `tags`: Keywords for AI categorization (starts empty; filled by our AI service)
### `eTender`

This class inherits everything from `TenderBase` and adds government-specific fields:

- `tender_number`: The official government reference (e.g., `"SANPC/2025/003"`)
- `audience`: The purchasing government entity (e.g., `"Strategic Fuel Fund"`)
- `office_location`: Where the briefing happens (e.g., `"Microsoft Teams"`)
- `email`: Direct line to the procurement team
- `address`: Full physical address constructed from multiple API fields
- `province`: Which province holds the opportunity
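Putting the two classes together, the hierarchy might look roughly like this. Field types and optionality here are assumptions for illustration, not the actual `models.py` definitions:

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, Field


class TenderBase(BaseModel):
    """Core attributes shared by every tender, regardless of source."""
    title: str
    description: str
    source: str = "eTenders"
    published_date: datetime
    closing_date: datetime
    supporting_docs: List[dict] = Field(default_factory=list)
    tags: List[str] = Field(default_factory=list)  # filled later by the AI service


class eTender(TenderBase):
    """Government-specific fields layered on top of TenderBase."""
    tender_number: str
    audience: str
    office_location: Optional[str] = None
    email: Optional[str] = None
    address: Optional[str] = None
    province: Optional[str] = None


tender = eTender(
    title="Architectural And Engineering Activities",
    description="RFP to appoint a service provider for flameproof actuators",
    published_date="2025-10-16T00:00:00",  # Pydantic parses ISO strings to datetime
    closing_date="2025-11-13T11:00:00",
    tender_number="SANPC/2025/003",
    audience="Strategic Fuel Fund",
    province="Western Cape",
)
```

Invalid input (a missing `title`, an unparseable date) raises a `ValidationError` at construction time, which is what lets the handler log and skip bad records instead of crashing.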
## AI Tagging Initialization

We're all about AI-ready data! Every tender that leaves our scraper is prepped for the downstream AI tagging service:
```python
# From models.py - setting the stage for downstream AI tagging
return cls(
    # ... other fields
    tags=[],  # Initialize tags as an empty list, ready for the AI service.
    # ... other fields
)
```

This ensures compatibility with our AI pipeline: every tender object arrives with a clean, empty `tags` field, ready to be filled with intelligent categorizations.
## Example Tender Data

Here's what a real government tender looks like after our scraper works its magic:
```json
{
  "title": "Architectural And Engineering Activities; Technical Testing And Analysis",
  "description": "Rfp To Appoint A Service Provider To Remove The Old Electrical Actuators And Design, Manufacture Deliver, Install And Commission Flameproof Electrical Actuators At The Saldanha Terminal And Oil Jetty (4 Ep/Eb Or Higher)",
  "source": "eTenders",
  "publishedDate": "2025-10-16T00:00:00",
  "closingDate": "2025-11-13T11:00:00",
  "supportingDocs": [
    {
      "name": "Tender Document CIDB-SANPC-2025 -003 Actuators.pdf",
      "url": "https://www.etenders.gov.za/home/Download/?blobName=1e4a2580-804b-45ee-bb0c-038142f1f153.pdf&downloadedFileName=Tender%20Document%20CIDB-SANPC-2025%20-003%20Actuators.pdf"
    }
  ],
  "tags": [],
  "tenderNumber": "SANPC/2025/003",
  "audience": "Strategic Fuel Fund",
  "officeLocation": "Microsoft Teams",
  "email": "sanpcprocurement@sa-npc.co.za",
  "address": "151 Frans Conradie Drive, Parow (Petrosa Building), Parow, Cape Town, 7500",
  "province": "Western Cape"
}
```

**What this shows:**
- **High-Value Opportunity**: Engineering services for critical fuel infrastructure
- **Industrial Scope**: Electrical actuator replacement at the Saldanha Terminal
- **Complete Documentation**: PDF tender documents readily available
- **Location Clarity**: Western Cape, with virtual briefing sessions
- **Clear Timeline**: Published 16 October, closing 13 November
## Getting Started

Ready to dive into government tender scraping? Let's get you set up!
Prerequisites:

- AWS CLI configured with appropriate credentials
- Python 3.9+ with pip
- Access to AWS Lambda and SQS services

Setup:

1. Clone the repository
2. Install dependencies: `pip install -r requirements.txt`
3. Run tests: `python -m pytest`
4. Test locally: use AWS SAM or similar tools
## Deployment

This section covers three deployment methods for the eTenders Processing Lambda Service. Choose the method that best fits your workflow and infrastructure preferences.
Before deploying, ensure you have:

- AWS CLI configured with appropriate credentials
- AWS SAM CLI installed (`pip install aws-sam-cli`)
- Python 3.13 runtime support in your target region
- Access to AWS Lambda, SQS, and CloudWatch Logs services
- Required Python dependency: `requests`
### Method 1: AWS Toolkit (IDE)

Deploy directly through your IDE using the AWS Toolkit extension.
1. **Install AWS Toolkit** in your IDE (VS Code, IntelliJ, etc.)
2. **Configure AWS Profile** with your credentials
3. **Open the project** containing `lambda_handler.py` and `models.py`
4. **Right-click** on `lambda_handler.py` in your IDE
5. **Select "Deploy Lambda Function"** from the AWS Toolkit menu
6. **Configure the deployment**:
   - Function Name: `eTendersLambda`
   - Runtime: `python3.13`
   - Handler: `lambda_handler.lambda_handler`
   - Memory: `128 MB`
   - Timeout: `120 seconds`
7. **Add layers** manually after deployment:
   - `requests-library` layer
8. **Set environment variables**:
   - `SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo`
   - `API_TIMEOUT=30`
9. **Configure IAM permissions** for SQS and CloudWatch Logs
10. **Test the function** using the AWS Toolkit test feature
11. **Monitor logs** through CloudWatch integration
12. **Update function code** directly from the IDE for quick iterations
### Method 2: AWS SAM

Use AWS SAM for infrastructure-as-code deployment with the provided template.
```bash
# Install AWS SAM CLI
pip install aws-sam-cli

# Verify installation
sam --version
```

Since the template references a layer not included in the repository, create it:

```bash
# Create layer directory
mkdir -p requests-library/python

# Install requests into the layer
pip install requests -t requests-library/python/
```

Then build and deploy:

```bash
# Build the SAM application
sam build

# Deploy with guided configuration (first time)
sam deploy --guided

# Follow the prompts:
# Stack Name: etenders-lambda-stack
# AWS Region: us-east-1 (or your preferred region)
# Parameter SQSQueueURL: https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo
# Parameter APITimeout: 30
# Confirm changes before deploy: Y
# Allow SAM to create IAM roles: Y
# Save parameters to samconfig.toml: Y
```

Add these parameters to your SAM template or set them after deployment:
```yaml
# Add to template.yml under eTendersLambda Properties
Environment:
  Variables:
    SQS_QUEUE_URL: https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo
    API_TIMEOUT: "30"
```

```bash
# Quick deployment after initial setup
sam build && sam deploy
```

To test locally, create an `env.json` file and invoke the function with it:

```bash
# Create env.json file:
echo '{
  "eTendersLambda": {
    "SQS_QUEUE_URL": "https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo",
    "API_TIMEOUT": "30"
  }
}' > env.json

# Test function locally with environment variables
sam local invoke eTendersLambda --env-vars env.json
```

Benefits of SAM deployment:

- Complete infrastructure management
- Automatic layer creation and management
- IAM permissions defined in the template
- Easy rollback capabilities
- CloudFormation integration
### Method 3: GitHub Actions

Automated deployment using a GitHub Actions workflow for production environments.
Prerequisites:

1. **GitHub repository secrets**:
   - `AWS_ACCESS_KEY_ID`: Your AWS access key
   - `AWS_SECRET_ACCESS_KEY`: Your AWS secret key
   - `AWS_REGION`: `us-east-1` (or your target region)
2. **Pre-existing Lambda function**: The workflow updates an existing function, so deploy initially using Method 1 or 2.

Deployment steps:

1. **Create a release branch**:

   ```bash
   # Create and switch to release branch
   git checkout -b release

   # Make your changes to lambda_handler.py or models.py

   # Commit changes
   git add .
   git commit -m "feat: update eTenders processing logic"

   # Push to trigger deployment
   git push origin release
   ```

2. **Automatic deployment**: The workflow will:
   - Check out the code
   - Configure AWS credentials
   - Create a deployment zip with `lambda_handler.py` and `models.py`
   - Update the existing Lambda function code
   - Maintain the existing configuration (layers, environment variables, etc.)
You can also trigger deployment manually:

1. Go to the Actions tab in your GitHub repository
2. Select the "Deploy Python Scraper to AWS" workflow
3. Click "Run workflow"
4. Choose the `release` branch
5. Click the "Run workflow" button
Benefits of workflow deployment:

- Automated CI/CD pipeline
- Consistent deployment process
- Audit trail of deployments
- Easy rollback to previous commits
- No local environment dependencies
### Post-Deployment Configuration

Regardless of deployment method, configure the following:
Set these environment variables in your Lambda function:

```bash
SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo
API_TIMEOUT=30
USER_AGENT=Mozilla/5.0 (compatible; eTenders-Bot/1.0)
```

Or set them via the AWS CLI:

```bash
aws lambda update-function-configuration \
  --function-name eTendersLambda \
  --environment Variables='{
    "SQS_QUEUE_URL":"https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo",
    "API_TIMEOUT":"30",
    "USER_AGENT":"Mozilla/5.0 (compatible; eTenders-Bot/1.0)"
  }'
```

Set up scheduled execution:

```bash
# Create CloudWatch Events rule for daily execution
aws events put-rule \
  --name "eTendersLambdaSchedule" \
  --schedule-expression "cron(0 9 * * ? *)" \
  --description "Daily eTenders scraping"

# Add Lambda as target
aws events put-targets \
  --rule "eTendersLambdaSchedule" \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:211635102441:function:eTendersLambda"
```

After deployment, test the function:
```bash
# Test via AWS CLI
aws lambda invoke \
  --function-name eTendersLambda \
  --payload '{}' \
  response.json

# Check the response
cat response.json
```

Signs of a successful deployment:

- Function executes without errors
- CloudWatch logs show successful API calls
- SQS queue receives tender messages
- No timeout or memory errors
- Valid JSON tender data in queue messages
- `MessageGroupId` set to `eTenderScrape`
Key CloudWatch metrics to monitor:

- **Duration**: Function execution time
- **Error Rate**: Failed invocations
- **Memory Utilization**: RAM usage patterns
- **Throttles**: Concurrent execution limits
```bash
# View recent logs
aws logs tail /aws/lambda/eTendersLambda --follow

# Search for errors
aws logs filter-log-events \
  --log-group-name /aws/lambda/eTendersLambda \
  --filter-pattern "ERROR"

# Search for successful batches
aws logs filter-log-events \
  --log-group-name /aws/lambda/eTendersLambda \
  --filter-pattern "Successfully sent batch"
```

**Layer Dependencies Missing**
Issue: `requests` import errors

Solution: Ensure the `requests` layer is properly created and attached:

```bash
# For SAM: Verify layer directory exists and contains packages
ls -la requests-library/python/

# For manual deployment: Create and upload the layer separately
```

**Environment Variables Not Set**
Issue: Missing `SQS_QUEUE_URL` or `API_TIMEOUT` configuration

Solution: Set the environment variables using the AWS CLI or console:

```bash
aws lambda update-function-configuration \
  --function-name eTendersLambda \
  --environment Variables='{"SQS_QUEUE_URL":"your-queue-url","API_TIMEOUT":"30"}'
```

**IAM Permission Errors**
Issue: Access denied for SQS or CloudWatch operations

Solution: Verify the Lambda execution role has the required permissions:

- `sqs:SendMessage`
- `sqs:GetQueueUrl`
- `sqs:GetQueueAttributes`
- `logs:CreateLogGroup`
- `logs:CreateLogStream`
- `logs:PutLogEvents`

**Workflow Deployment Fails**
Issue: GitHub Actions workflow errors

Solution: Check that the repository secrets are correctly configured and the target Lambda function exists in AWS.

**API Connection Issues**

Issue: Cannot connect to the eTenders API

Solution: Verify the API endpoint is accessible and consider increasing the `API_TIMEOUT` environment variable.
Choose the deployment method that best fits your development workflow and infrastructure requirements. SAM deployment is recommended for development environments, while workflow deployment excels for production CI/CD pipelines.
## Troubleshooting

**API Rate Limiting**

Issue: Getting HTTP 429 responses from the eTenders API.

Solution: Implement exponential backoff and respect rate limits. Government APIs are usually generous, but not infinite.
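A minimal backoff sketch follows. The helper names are illustrative, and honouring a `Retry-After` header is a common HTTP convention rather than documented eTenders behaviour:

```python
import time

import requests  # third-party; already a layer dependency for this Lambda


def backoff_delay(attempt, base=1.0, retry_after=None):
    """Delay before retry `attempt` (0-based): honour Retry-After when the
    server sends it, otherwise double the base delay on each attempt."""
    return float(retry_after) if retry_after is not None else base * (2 ** attempt)


def fetch_with_backoff(url, max_retries=5, timeout=30):
    """GET that retries on HTTP 429 with exponential backoff."""
    for attempt in range(max_retries):
        resp = requests.get(url, timeout=timeout)
        if resp.status_code != 429:
            resp.raise_for_status()  # surface other HTTP errors immediately
            return resp
        time.sleep(backoff_delay(attempt, retry_after=resp.headers.get("Retry-After")))
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts")
```

With the defaults, the wait grows 1s, 2s, 4s, 8s, 16s, which keeps a transient throttle from failing the whole invocation while staying well under the 120-second Lambda timeout.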
**Data Validation Failures**

Issue: Tenders failing Pydantic validation.

Solution: Check the API response format; government APIs sometimes change structure without notice. Update the model accordingly.
**SQS Send Failures**

Issue: Batches failing to send to SQS.

Solution: Check IAM permissions and queue configuration. FIFO queues are strict about message attributes.
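In particular, every message sent to a FIFO queue needs a `MessageGroupId`, and a `MessageDeduplicationId` unless content-based deduplication is enabled on the queue. A sketch of building compliant batch entries (using the tender number as the dedup key is an assumption for illustration, not the scraper's documented behaviour):

```python
import json


def fifo_entries(tenders, group_id="eTenderScrape"):
    """Build send_message_batch entries that satisfy FIFO queue requirements."""
    return [
        {
            "Id": str(i),  # must be unique within this batch
            "MessageBody": json.dumps(t),
            "MessageGroupId": group_id,  # required on FIFO queues
            # Required unless the queue enables content-based deduplication.
            "MessageDeduplicationId": t["tenderNumber"],
        }
        for i, t in enumerate(tenders)
    ]


entries = fifo_entries([{"tenderNumber": "SANPC/2025/003", "title": "Actuators"}])
# sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
```

If either field is missing, SQS rejects the message with an `InvalidParameterValue` style error rather than silently dropping it, so these errors show up clearly in CloudWatch logs.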
Built with love, bread, and code by Bread Corporation