Building the highways to prosperity, one tender at a time! This AWS Lambda service is the road engineering powerhouse of our tender scraping fleet - one of five specialized crawlers, and the one that captures opportunities from the South African National Roads Agency (SANRAL), South Africa's premier road infrastructure agency. From massive freeway construction to bridge maintenance, we pave the way to every opportunity!
- Overview
- Lambda Function (lambda_function.py)
- Data Model (models.py)
- AI Tagging Initialization
- Example Tender Data
- Getting Started
- Deployment
- Troubleshooting
Welcome to the highway to opportunity! This service is your express lane into SANRAL's vast road infrastructure ecosystem, capturing multi-billion rand highway projects, bridge constructions, geotechnical investigations, and critical maintenance contracts that keep South Africa's road network world-class!
What makes it build better roads?
- Road Infrastructure Expertise: Specialized in highways, bridges, interchanges, and road maintenance
- Dual-Mode Intelligence: Unique two-step process combining API calls with intelligent web scraping
- National Coverage: From Cape Town's coastal routes to Johannesburg's highway networks
- Engineering Precision: Captures complex geotechnical, consulting, and construction opportunities
The highway engineering brain of our operation! The lambda_handler orchestrates our sophisticated dual-phase extraction process:
- Initial Reconnaissance: Connects to the SANRAL JSON API to get the construction site survey - a comprehensive list of all open tenders across South Africa's road network.
- Highway-Grade Error Handling: Built like a reinforced bridge! Handles network construction zones, API maintenance periods, and response detours with civil engineering precision. Always finds an alternate route!
- Deep Excavation Phase: Here's where we get serious! For each tender opportunity, we don't just scrape the surface - we conduct a full geotechnical investigation:
  - Phase 1: Extract the tender URL from the API response
  - Phase 2: Deploy our web scraping bulldozers (BeautifulSoup) to excavate detailed information from each tender page
  - Phase 3: Consolidate surface data with deep-drill information for complete tender profiles
- Civil Engineering Validation: Each tender goes through our rigorous SanralTender model with specialized logic for HTML parsing, regex pattern matching for emails and URLs, and robust date parsing that handles SANRAL's construction timelines.
- Quality Assurance Inspector: Our validation process ensures only structurally sound tenders make it through. Failed excavations get logged and marked for review - no unstable foundations in our pipeline!
- Highway Batching: Valid tenders are efficiently organized into construction batches of 10 messages - optimized for maximum SQS throughput like a well-planned highway interchange.
- Express Lane Delivery: Each batch travels the fast lane to the central AIQueue.fifo SQS queue with the unique MessageGroupId of SanralTenderScrape. This keeps our road infrastructure tenders organized and maintains perfect traffic flow.
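The batching and delivery steps above can be sketched roughly as follows. This is a minimal illustration, not the code in lambda_function.py: the helper names (chunk, build_entries, send_tenders) are hypothetical, the client is passed in so the sketch stays independent of how the real handler creates it, and it assumes content-based deduplication is enabled on the FIFO queue (otherwise each entry would also need a MessageDeduplicationId).

```python
import json
from typing import Any, Dict, Iterator, List

BATCH_SIZE = 10  # SQS accepts at most 10 entries per SendMessageBatch call
MESSAGE_GROUP_ID = "SanralTenderScrape"


def chunk(items: List[Any], size: int = BATCH_SIZE) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_entries(tenders: List[Dict[str, Any]], offset: int = 0) -> List[Dict[str, str]]:
    """Turn tender dicts into SendMessageBatch entries for a FIFO queue."""
    entries = []
    for index, tender in enumerate(tenders):
        entries.append({
            "Id": f"tender-{offset + index}",  # must be unique within one batch
            "MessageBody": json.dumps(tender),
            "MessageGroupId": MESSAGE_GROUP_ID,
        })
    return entries


def send_tenders(sqs_client, queue_url: str, tenders: List[Dict[str, Any]]) -> int:
    """Send all tenders in batches and return how many SQS reported as accepted."""
    sent = 0
    for batch_number, batch in enumerate(chunk(tenders)):
        response = sqs_client.send_message_batch(
            QueueUrl=queue_url,
            Entries=build_entries(batch, offset=batch_number * BATCH_SIZE),
        )
        sent += len(response.get("Successful", []))
    return sent
```

In a real handler, sqs_client would typically be boto3.client("sqs") and queue_url would come from the SQS_QUEUE_URL environment variable. Entries listed under "Failed" in the response are worth logging, since a batch call can partially succeed.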
Our data architecture is engineered for highway-grade performance!
The solid foundation that supports all our tender infrastructure! The abstract TenderBase class defines the core roadway that connects all construction opportunities:
Core Attributes:
- title: The project blueprint - what road infrastructure is being built?
- description: Detailed engineering specifications and construction requirements
- source: Always "SANRAL" for this highway construction specialist
- published_date: When this construction project broke ground
- closing_date: Bid submission deadline - when the construction gate closes!
- supporting_docs: Critical engineering drawings and specifications
- tags: Keywords for AI intelligence (starts empty, gets surveyed by our AI service)
The SanralTender class is an engineering powerhouse that inherits all the foundational strength from TenderBase and adds SANRAL's unique highway construction features:
SANRAL-Specific Attributes:
- tender_number: Official SANRAL project code (e.g., "SUB-CONTRACT SANRAL N.001-250-2024/1D-SS")
- category: Type of construction project (e.g., "Other Projects", "Consulting Engineering")
- region: Which regional office manages this highway (e.g., "Northern Region", "Western Cape")
- email: Direct line to the project engineer (extracted via intelligent web scraping)
- full_notice_text: Complete construction notice with all technical specifications and requirements
Advanced Engineering Process:
The from_api_response method is our master civil engineer! It performs:
- HTML Excavation: BeautifulSoup-powered deep drilling into tender pages
- Regex Survey: Pattern matching for emails, URLs, and technical specifications
- Date Engineering: Robust parsing of construction timelines and deadlines
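The regex survey and date engineering steps could look something like the sketch below. It is an illustration only: the function names, the email pattern, and the list of date formats are assumptions, since this README does not show which formats SANRAL's API and pages actually use. The text passed to extract_email would normally be what BeautifulSoup's get_text() returns for a tender page.

```python
import re
from datetime import datetime
from typing import Optional

# A deliberately simple email pattern (assumption, not the one in models.py).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical candidate formats, tried in order.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M", "%d %B %Y", "%d/%m/%Y")


def extract_email(text: str) -> str:
    """Return the first email address found in `text`, or an empty string."""
    match = EMAIL_PATTERN.search(text)
    return match.group(0) if match else ""


def parse_date(raw: Optional[str]) -> Optional[datetime]:
    """Try each known format in turn; return None when nothing matches."""
    if not raw:
        return None
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None
```

Returning an empty string for a missing email matches the example tender data in this README, where the email field is "" when no address was found on the page.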
We're all about intelligent highway planning! Every tender that travels through our system is perfectly prepared for downstream AI enhancement:
# From models.py - Preparing for AI highway classification!
return cls(
    # ... other fields
    tags=[],  # Initialize tags as an empty list, ready for the AI service.
    # ... other fields
)
This ensures seamless highway integration with our AI pipeline - every tender object arrives with a clean, empty tags field just waiting to be surveyed with intelligent categorizations!
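One detail worth getting right with an "always starts empty" list is that every tender needs its own list, not one shared between objects. The sketch below shows one way to guarantee that with a dataclass. TenderSketch is a hypothetical, simplified stand-in and not the real TenderBase or SanralTender class.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class TenderSketch:
    """Simplified stand-in for the real model classes (hypothetical name)."""
    title: str
    description: str
    source: str = "SANRAL"
    # default_factory builds a fresh empty list for every instance
    tags: List[str] = field(default_factory=list)
    supporting_docs: List[Dict[str, str]] = field(default_factory=list)


first = TenderSketch(title="Tender A", description="Example only")
second = TenderSketch(title="Tender B", description="Example only")

# Simulates what a downstream tagging service might do to one tender.
first.tags.append("geotechnical")

# asdict() gives a plain dict that can be serialized with json.dumps().
payload = asdict(second)
```

Passing tags=[] explicitly in the cls(...) call, as the snippet from models.py does, has the same effect, because a new list literal is created on every call.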
Here's what a real SANRAL highway project looks like after our scraper works its construction magic!
{
"title": "Sub-Contract Sanral N.001-250-2024/1D-Ss",
"description": "For Geotechnical Investigation Including Test Pitting, Laboratory Testing And Rotary Core Drilling For The Upgrading Of National Route 1 Section 25 From Modimolle Interchange (Km 0.0) To Tobias Zyn Loop (Km 30.0) Sub-Contract Sanral N.001-250-2024/1D-Ss",
"source": "SANRAL",
"publishedDate": "2025-09-26T00:00:00",
"closingDate": "2025-10-17T11:00:00",
"supporting_docs": [
{
"name": "Tender Details",
"url": "https://www.nra.co.za/open-tenders/sub-contract-sanral-n-001-250-2024-1d-ss"
}
],
"tags": [],
"tenderNumber": "SUB-CONTRACT SANRAL N.001-250-2024/1D-SS",
"category": "Other Projects",
"region": "Northern Region",
"email": "",
"fullNoticeText": "Tender Notice And Invitation To Tender (Incorporating Sbd1) The South African National Roads Agency Soc Limited (Sanral) On Behalf Of Zutari (Pty) Ltd Invites Suitably Qualified Tenderers For Sanral N.001-250-2024/1D-Ss Geotechnical Investigation Including Test Pitting, Laboratory Testing And Rotary Core Drilling For The Upgrading Of National Route 1 Section 25 From Modimolle Interchange (Km 0.0) To Tobias Zyn Loop (Km 30.0 ). This Project Is In The Province Of Limpopo And In The District Municipality Of Waterberg And Local Municipality Of Modimolle–Mookgophong. The Approximate Duration Is 4.5 Months..."
}
What this highway project delivers:
- Major Highway Upgrade: National Route 1 enhancement project in Limpopo
- Geotechnical Engineering: Comprehensive soil investigation with core drilling
- Strategic Corridor: 30km stretch from Modimolle Interchange to Tobias Zyn Loop
- Fast-Track Timeline: 4.5-month engineering investigation period
- Regional Impact: Critical infrastructure for Waterberg District transportation
- Professional Opportunity: Sub-contract through established engineering firm Zutari
Ready to build the highway to success? Let's lay the foundation!
- AWS CLI configured with appropriate credentials
- Python 3.9+ with pip
- BeautifulSoup4 for web scraping capabilities
- Access to AWS Lambda and SQS services
- Understanding of civil engineering and road construction terminology
- Clone the repository
- Install dependencies: pip install -r requirements.txt
- Run tests: python -m pytest
- Test locally: Use AWS SAM for local Lambda simulation
This section covers three deployment methods for the SANRAL Tender Processing Lambda Service. Choose the method that best fits your workflow and infrastructure preferences.
Before deploying, ensure you have:
- AWS CLI configured with appropriate credentials
- AWS SAM CLI installed (pip install aws-sam-cli)
- Python 3.13 runtime support in your target region
- Access to AWS Lambda, SQS, and CloudWatch Logs services
- Required Python dependencies: beautifulsoup4 and requests
Deploy directly through your IDE using the AWS Toolkit extension.
- Install AWS Toolkit in your IDE (VS Code, IntelliJ, etc.)
- Configure AWS Profile with your credentials
- Open Project containing lambda_function.py and models.py
- Right-click on lambda_function.py in your IDE
- Select "Deploy Lambda Function" from AWS Toolkit menu
- Configure Deployment:
  - Function Name: SanralFunction
  - Runtime: python3.13
  - Handler: lambda_function.lambda_handler
  - Memory: 128 MB
  - Timeout: 120 seconds
- Add Layers manually after deployment:
- beautifulsoup4-library layer
- requests-library layer
- Set Environment Variables as needed
- Configure IAM Permissions for SQS, Logs, and EC2 (for VPC if needed)
- Test the function using the AWS Toolkit test feature
- Monitor logs through CloudWatch integration
- Update function code directly from IDE for quick iterations
Use AWS SAM for infrastructure-as-code deployment with the provided template.
# Install AWS SAM CLI
pip install aws-sam-cli
# Verify installation
sam --version
Since the template references layers not included in the repository, create them:
# Create layer directories
mkdir -p beautifulsoup4-library/python
mkdir -p requests-library/python
# Install beautifulsoup4 layer
pip install beautifulsoup4 -t beautifulsoup4-library/python/
# Install requests layer
pip install requests -t requests-library/python/
# Build the SAM application
sam build
# Deploy with guided configuration (first time)
sam deploy --guided
# Follow the prompts:
# Stack Name: sanral-lambda-stack
# AWS Region: us-east-1 (or your preferred region)
# Confirm changes before deploy: Y
# Allow SAM to create IAM roles: Y
# Save parameters to samconfig.toml: Y
# Quick deployment after initial setup
sam build && sam deploy
# Test function locally
sam local invoke SanralFunction
# Start local API Gateway (if needed)
sam local start-api

- Complete infrastructure management
- Automatic layer creation and management
- IAM permissions defined in template
- Easy rollback capabilities
- CloudFormation integration
Automated deployment using GitHub Actions workflow for production environments.
- GitHub Repository Secrets:
  - AWS_ACCESS_KEY_ID: Your AWS access key
  - AWS_SECRET_ACCESS_KEY: Your AWS secret key
  - AWS_REGION: us-east-1 (or your target region)
- Pre-existing Lambda Function: The workflow only updates an existing function, so do the initial deployment with the AWS Toolkit or AWS SAM method described above.
- Create Release Branch:
  # Create and switch to release branch
  git checkout -b release
  # Make your changes to lambda_function.py or models.py
  # Commit changes
  git add .
  git commit -m "feat: update SANRAL tender processing logic"
  # Push to trigger deployment
  git push origin release
- Automatic Deployment: The workflow will:
  - Check out the code
  - Configure AWS credentials
  - Create deployment zip with lambda_function.py and models.py
  - Update the existing Lambda function code
  - Maintain existing configuration (layers, environment variables, etc.)
You can also trigger deployment manually:
- Go to Actions tab in your GitHub repository
- Select "Deploy Python Scraper to AWS" workflow
- Click "Run workflow"
- Choose the release branch
- Click "Run workflow" button

- Automated CI/CD pipeline
- Consistent deployment process
- Audit trail of deployments
- Easy rollback to previous commits
- No local environment dependencies
Regardless of deployment method, configure the following:
SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/211635102441/AIQueue.fifo
API_TIMEOUT=30
SCRAPING_TIMEOUT=30
BATCH_SIZE=10
USER_AGENT=Mozilla/5.0 (compatible; SANRAL-Tender-Bot/1.0)
Set up scheduled execution:
# Create CloudWatch Events rule for daily execution
aws events put-rule \
--name "SanralLambdaSchedule" \
--schedule-expression "cron(0 9 * * ? *)" \
--description "Daily SANRAL tender scraping"
# Add Lambda as target
aws events put-targets \
--rule "SanralLambdaSchedule" \
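--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:211635102441:function:SanralFunction"
# If the rule was created by hand, also allow it to invoke the function
# (the statement ID below is an arbitrary example name)
aws lambda add-permission \
--function-name SanralFunction \
--statement-id SanralLambdaScheduleInvoke \
--action lambda:InvokeFunction \
--principal events.amazonaws.com \
--source-arn arn:aws:events:us-east-1:211635102441:rule/SanralLambdaSchedule
After deployment, test the function: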
# Test via AWS CLI
# (with AWS CLI v2, add --cli-binary-format raw-in-base64-out so the raw JSON payload is accepted)
aws lambda invoke \
--function-name SanralFunction \
--payload '{}' \
response.json
# Check the response
cat response.json
- Function executes without errors
- CloudWatch logs show successful API calls and scraping activity
- SQS queue receives tender messages
- No timeout or memory errors
- Valid JSON tender data in queue messages
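The last check in the list above, valid JSON tender data in queue messages, can be partly automated. The sketch below is a hypothetical helper (check_message_body is not part of the service) that inspects one message body. The expected keys are taken from the example tender shown earlier in this README, so adjust them if the real payload differs.

```python
import json
from typing import List

# Keys copied from the example tender data in this README.
EXPECTED_KEYS = {
    "title", "description", "source", "publishedDate", "closingDate",
    "supporting_docs", "tags", "tenderNumber", "category", "region",
    "email", "fullNoticeText",
}


def check_message_body(body: str) -> List[str]:
    """Return a list of problems found in one SQS message body (empty if none)."""
    try:
        tender = json.loads(body)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(tender, dict):
        return ["body is not a JSON object"]
    # Report missing keys in a stable, sorted order.
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - set(tender))]
    if tender.get("source") != "SANRAL":
        problems.append("source is not 'SANRAL'")
    return problems
```

An empty list means the body passed every check, so the helper can be run over a handful of messages received from the queue after a test invocation.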
- Duration: Function execution time
- Error Rate: Failed invocations
- Memory Utilization: RAM usage patterns
- Throttles: Concurrent execution limits
# View recent logs
aws logs tail /aws/lambda/SanralFunction --follow
# Search for errors
aws logs filter-log-events \
--log-group-name /aws/lambda/SanralFunction \
--filter-pattern "ERROR"
Layer Dependencies Missing
Issue: beautifulsoup4 or requests import errors
Solution: Ensure layers are properly created and attached:
# For SAM: Verify layer directories exist and contain packages
ls -la beautifulsoup4-library/python/
ls -la requests-library/python/
# For manual deployment: Create and upload layers separately
IAM Permission Errors
Issue: Access denied for SQS or CloudWatch operations
Solution: Verify the Lambda execution role has required permissions:
- sqs:SendMessage
- sqs:GetQueueUrl
- sqs:GetQueueAttributes
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
- ec2:CreateNetworkInterface
- ec2:DeleteNetworkInterface
- ec2:DescribeNetworkInterfaces
Workflow Deployment Fails
Issue: GitHub Actions workflow errors
Solution: Check repository secrets are correctly configured and the target Lambda function exists in AWS.
API Connection Issues
Issue: Cannot connect to SANRAL API endpoints
Solution: Verify network connectivity and consider VPC configuration if the Lambda needs specific network access.
Choose the deployment method that best fits your development workflow and infrastructure requirements. SAM deployment is recommended for development environments, while workflow deployment excels for production CI/CD pipelines.
Web Scraping Timeouts
Issue: BeautifulSoup operations timing out on complex SANRAL tender pages.
Solution: SANRAL tender pages can be engineering document-heavy! Increase the scraping timeout (SCRAPING_TIMEOUT, and the Lambda timeout itself if runs are cut short) and implement intelligent content parsing that focuses on key data sections. Sometimes you need to excavate carefully!
HTML Structure Changes
Issue: SANRAL website updates breaking the scraping logic.
Solution: Highway maintenance never stops! Monitor for HTML structure changes and update your scraping selectors accordingly. Keep your web scraping tools as current as road maintenance!
Dual-Phase Processing Failures
Issue: API call succeeds but web scraping phase fails.
Solution: Implement robust fallback logic. If detailed scraping fails, ensure you can still process basic tender information from the API. A partial highway is better than no highway!
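A minimal sketch of that fallback idea follows. It assumes each API item is a dict with a "url" key and that the page scraper is a callable returning a dict of extra fields. Both are assumptions for illustration, as are the build_profile name and the detailsScraped flag.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def build_profile(api_item: Dict[str, Any],
                  scrape_details: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Merge API data with scraped details, keeping the API data if scraping fails."""
    profile = dict(api_item)            # copy, so the caller's dict is untouched
    profile.setdefault("email", "")
    profile.setdefault("fullNoticeText", "")
    profile["detailsScraped"] = False   # hypothetical flag for later review
    url = api_item.get("url")
    if not url:
        return profile
    try:
        profile.update(scrape_details(url))
        profile["detailsScraped"] = True
    except Exception as exc:  # broad on purpose: any scrape failure falls back
        logger.warning("Detail scrape failed for %s: %s", url, exc)
    return profile
```

Because the scraped fields are merged only after scrape_details returns, a failure partway through scraping never leaves a half-updated profile. The tender simply goes out with the API-level fields and the flag set to False.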
Engineering Document Processing
Issue: Complex technical documents causing parsing failures.
Solution: SANRAL deals in serious civil engineering! Update your parsing logic to handle technical specifications, engineering drawings references, and construction terminology. Build your parser like you'd build a bridge - to last!
Regional Data Variations
Issue: Different regional offices formatting data differently.
Solution: SANRAL operates across diverse regions with varying formatting standards. Implement flexible parsing that can handle Northern Region, Western Cape, and other regional variations!
Built with love, bread, and code by Bread Corporation