A production-ready, enterprise-grade data processing pipeline demonstrating professional AWS architecture with Java 21, Spring Boot 3, and Terraform Infrastructure as Code.
```
┌────────┐     ┌─────┐     ┌────────────┐
│ Client │────▶│ ALB │────▶│    EC2     │
└────────┘     └─────┘     │  (Spring)  │
                           └─────┬──────┘
                                 │
              ┌──────────────────┼──────────────────┐
              ▼                  ▼                  ▼
         ┌─────────┐        ┌──────────┐       ┌─────────┐
         │   S3    │        │   RDS    │       │   SNS   │
         │  (Raw)  │        │(Postgres)│       │  Topic  │
         └─────────┘        └──────────┘       └────┬────┘
                                                    │
                                                    ▼
                                               ┌─────────┐
                                               │   SQS   │
                                               │  Queue  │
                                               └────┬────┘
                                                    │
                                                    ▼
                                               ┌───────────┐
                                               │  Lambda   │
                                               │ Processor │
                                               └────┬──────┘
                                                    │
                          ┌─────────────────────────┤
                          ▼                         ▼
                ┌──────────────────┐          ┌───────────┐
                │  S3 (Processed)  │          │ RDS Proxy │
                └──────────────────┘          └───────────┘
```
This system implements an asynchronous file processing pipeline where:
- Users upload files via REST API (hosted on EC2)
- Files are stored in S3 and metadata saved to PostgreSQL
- SNS broadcasts upload events to subscribed services
- SQS queues processing tasks with fault tolerance (DLQ)
- Lambda function processes files asynchronously
- Processed files are stored separately with updated status
- Users can query processing status and download results
| Component | Technology | Version |
|---|---|---|
| Language | Java | 21 |
| Framework | Spring Boot | 3.2.1 |
| IaC | Terraform | ≥ 1.6.0 |
| Build Tool | Maven | 3.8+ |
| Database | PostgreSQL | 15.4 |
| AWS SDK | AWS SDK for Java | 2.21.0 |
| Runtime | Amazon Corretto | 21 |
Hosts the Spring Boot REST API behind an Application Load Balancer, with an Auto Scaling Group for high availability.
- Raw Bucket: Stores uploaded files with versioning and lifecycle policies (Standard → IA → Glacier)
- Processed Bucket: Stores transformed files with encryption at rest
Serverless function with SnapStart enabled for Java 21, reducing cold start latency by 90%. Processes files asynchronously with automatic scaling.
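SnapStart rewards eager initialization: anything done at class-load time is captured in the snapshot and skipped on restore. The sketch below is a hypothetical simplification, not the project's actual handler code:

```java
// Sketch: SnapStart-friendly initialization (illustrative names only).
// Work done in the static initializer runs once, before the snapshot is
// taken, so restored invocations skip it entirely.
public class FileProcessorHandler {
    // Eagerly built at class load -- captured in the SnapStart snapshot.
    private static final java.util.Map<String, String> CONFIG = loadConfig();

    private static java.util.Map<String, String> loadConfig() {
        // Stand-in for expensive setup (SDK clients, config parsing, warm-up).
        return java.util.Map.of("region", "us-east-1");
    }

    public String handleRequest(String fileKey) {
        // Per-invocation work only; CONFIG is already initialized.
        return "processed:" + fileKey;
    }
}
```

For state that goes stale across restores (open connections, cached credentials), SnapStart also supports CRaC runtime hooks to refresh it after restore.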
PostgreSQL database with Multi-AZ deployment and RDS Proxy for connection pooling, essential for Lambda's connection management.
Publishes file upload events in a pub/sub pattern, enabling fan-out to multiple subscribers.
Queues processing tasks with Dead Letter Queue (DLQ) for failed messages, decoupling upload from processing.
```
aws-data-pipeline/
├── terraform/                      # Infrastructure as Code
│   ├── backend.tf                  # S3 + DynamoDB state management
│   ├── provider.tf                 # AWS provider configuration
│   ├── variables.tf                # Input variables
│   ├── outputs.tf                  # Resource outputs
│   ├── main.tf                     # Root module orchestration
│   └── modules/
│       ├── networking/             # VPC, subnets, NAT, security groups
│       ├── storage/                # S3 buckets with policies
│       ├── compute/                # EC2, ASG, ALB
│       ├── database/               # RDS with Proxy, Multi-AZ
│       ├── messaging/              # SNS topics, SQS queues
│       └── serverless/             # Lambda functions
├── java-application/
│   ├── pom.xml                     # Parent POM with dependency management
│   ├── shared-config/              # Centralized AWS client configuration
│   │   └── src/main/java/com/datapipeline/config/
│   │       ├── AwsClientConfig.java    # Singleton AWS SDK clients
│   │       ├── EnvironmentConfig.java  # Environment variable loader
│   │       └── Constants.java          # Application constants
│   ├── ec2-api-service/            # Spring Boot REST API
│   │   └── src/main/java/com/datapipeline/api/
│   │       ├── Application.java
│   │       ├── controller/         # REST endpoints
│   │       ├── service/            # Business logic
│   │       ├── repository/         # Data access layer
│   │       └── model/              # JPA entities
│   └── lambda-processor/           # Lambda function
│       └── src/main/java/com/datapipeline/lambda/
│           ├── handler/FileProcessorHandler.java
│           ├── service/            # S3, RDS, transformation
│           └── model/              # Event models
├── .env.example                    # Environment variables template
├── .gitignore
└── README.md
```
- AWS Account with appropriate IAM permissions
- Terraform ≥ 1.6.0
- Java JDK 21 or higher
- Maven 3.8 or higher
- AWS CLI configured with credentials
Copy the environment template:

```bash
cp .env.example .env
```

Edit `.env` with your AWS configuration:

```bash
AWS_REGION=us-east-1
AWS_ACCOUNT_ID=123456789012
S3_BUCKET_RAW=data-pipeline-production-raw
S3_BUCKET_PROCESSED=data-pipeline-production-processed
SNS_TOPIC_ARN=arn:aws:sns:us-east-1:123456789012:data-pipeline-production-file-events
SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/123456789012/data-pipeline-production-processing
RDS_PROXY_ENDPOINT=data-pipeline-production-rds-proxy.proxy-xxxxx.us-east-1.rds.amazonaws.com
RDS_ENDPOINT=data-pipeline-production-db.xxxxx.us-east-1.rds.amazonaws.com
DB_NAME=pipeline
DB_USERNAME=admin
DB_PASSWORD=<your-secure-password>
DB_PORT=5432
JWT_SECRET=<your-generated-secret>
```

Important: Never use the placeholder `JWT_SECRET` from `.env.example` in production.
Windows PowerShell:

```powershell
$bytes = New-Object byte[] 64
[Security.Cryptography.RNGCryptoServiceProvider]::Create().GetBytes($bytes)
[Convert]::ToBase64String($bytes)
```

Linux/macOS:

```bash
openssl rand -base64 64
```

Copy the generated key and replace `JWT_SECRET` in your `.env` file.
Security Notes:
- Use cryptographically secure random generators (minimum 512-bit)
- Never commit `.env` to version control
- Rotate secrets regularly
- Use different secrets for each environment (dev, staging, production)
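If you prefer to generate the secret from Java directly, `SecureRandom` gives the same 512-bit strength as the commands above (a standalone sketch, not part of the project):

```java
import java.security.SecureRandom;
import java.util.Base64;

// Generates a Base64-encoded 512-bit secret using a cryptographically
// secure random number generator.
public class SecretGenerator {
    public static String generate() {
        byte[] bytes = new byte[64];          // 64 bytes = 512 bits
        new SecureRandom().nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```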
```bash
cd terraform

# Initialize Terraform (downloads providers)
terraform init

# Preview infrastructure changes
terraform plan

# Deploy all resources
terraform apply

# Note outputs for environment variables
terraform output
```

Important: Update your `.env` file with Terraform outputs (bucket names, ARNs, endpoints).
```bash
cd java-application

# Build all modules (shared-config, ec2-api-service, lambda-processor)
mvn clean package
```

```bash
# Upload Lambda JAR to S3
aws s3 cp lambda-processor/target/lambda-processor-1.0.0.jar \
  s3://your-lambda-deployment-bucket/

# Update Lambda function code
aws lambda update-function-code \
  --function-name data-pipeline-production-file-processor \
  --s3-bucket your-lambda-deployment-bucket \
  --s3-key lambda-processor-1.0.0.jar
```

```bash
# Copy JAR to EC2 instance
scp -i your-key.pem ec2-api-service/target/ec2-api-service-1.0.0.jar \
  ec2-user@<ec2-public-ip>:/home/ec2-user/

# SSH into EC2
ssh -i your-key.pem ec2-user@<ec2-public-ip>

# Run application
java -jar ec2-api-service-1.0.0.jar
```

Base URL: `http://<alb-dns-name>/api/files`
POST /api/files/upload

```bash
# Example with curl (-F sends multipart/form-data with the correct
# boundary automatically; do not set the Content-Type header by hand)
curl -X POST http://alb-dns-name/api/files/upload \
  -F "file=@document.txt"
```

Response:

```json
{
  "id": 1,
  "fileKey": "raw/uuid-123.txt",
  "fileName": "document.txt",
  "status": "uploaded",
  "message": "File uploaded successfully"
}
```

GET /api/files/{id}
```bash
# Example
curl http://alb-dns-name/api/files/1
```

Response:

```json
{
  "id": 1,
  "fileKey": "raw/uuid-123.txt",
  "fileName": "document.txt",
  "fileSize": 1024,
  "contentType": "text/plain",
  "status": "COMPLETED",
  "uploadedAt": "2024-01-06T10:30:00",
  "processedAt": "2024-01-06T10:30:15",
  "processedFileKey": "processed/uuid-123.txt"
}
```

GET /api/files

```bash
curl http://alb-dns-name/api/files
```

GET /api/files/download/{id}

```bash
curl -O http://alb-dns-name/api/files/download/1
```

`AwsClientConfig.java` implements the Singleton pattern with double-checked locking for thread-safe AWS SDK client creation. Clients are configured with retry policies and connection timeouts.
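The double-checked locking idiom looks roughly like this (a minimal sketch with illustrative names; the real class builds SDK clients with retry and timeout settings):

```java
// Thread-safe lazy singleton via double-checked locking.
public class AwsClients {
    // volatile guarantees safe publication of the constructed instance.
    private static volatile AwsClients instance;

    private AwsClients() {
        // In the real AwsClientConfig: build S3/SNS/SQS clients here,
        // with retry policies and connection timeouts.
    }

    public static AwsClients getInstance() {
        if (instance == null) {                    // fast path: no lock
            synchronized (AwsClients.class) {
                if (instance == null) {            // re-check under the lock
                    instance = new AwsClients();
                }
            }
        }
        return instance;
    }
}
```

A static holder class or an enum singleton achieves the same thread safety with less ceremony; double-checked locking is shown here because it is the pattern the project names.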
S3Service.java uses S3TransferManager for efficient multipart uploads, automatically handling large files (>5MB) by splitting them into chunks.
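For intuition, here is the part arithmetic behind a multipart upload (the fixed 5 MB part size is an assumption for this sketch; S3TransferManager chooses its own thresholds and part sizes):

```java
// Illustrative multipart math: how many parts a file of a given size
// splits into at a fixed part size.
public class MultipartMath {
    static final long PART_SIZE = 5L * 1024 * 1024;  // assumed 5 MB parts

    public static long partCount(long fileSizeBytes) {
        if (fileSizeBytes <= PART_SIZE) return 1;    // small file: single PUT
        return (fileSizeBytes + PART_SIZE - 1) / PART_SIZE;  // ceiling division
    }
}
```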
RdsProxyService.java generates temporary authentication tokens instead of hardcoded passwords, implementing IAM Database Authentication for enhanced security.
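Conceptually, the token simply takes the place of the password in the JDBC connection properties. A sketch with placeholder values (the real token comes from the AWS SDK's RDS utilities and expires after 15 minutes):

```java
import java.util.Properties;

// Sketch: wiring an IAM auth token into JDBC connection properties.
public class IamDbAuth {
    public static Properties connectionProps(String username, String iamToken) {
        Properties props = new Properties();
        props.setProperty("user", username);
        props.setProperty("password", iamToken);  // short-lived token, not a stored password
        props.setProperty("ssl", "true");         // IAM auth requires TLS
        return props;
    }
}
```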
SNS publishes events to SQS, which triggers Lambda. This decoupling ensures the upload API responds immediately while processing happens asynchronously.
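One practical detail of this chain: unless raw message delivery is enabled on the SQS subscription, the queue receives the full SNS JSON envelope and the Lambda must unwrap the inner `Message` field. A naive sketch of that unwrapping (real code would use a JSON library):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the inner "Message" field from an SNS notification envelope.
// Regex extraction is for illustration only; use a JSON parser in practice.
public class SnsEnvelope {
    private static final Pattern MESSAGE =
            Pattern.compile("\"Message\"\\s*:\\s*\"([^\"]*)\"");

    public static String unwrap(String envelopeJson) {
        Matcher m = MESSAGE.matcher(envelopeJson);
        return m.find() ? m.group(1) : envelopeJson;  // fall back to raw body
    }
}
```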
| Module | Resources Provisioned |
|---|---|
| `networking` | VPC, 2 public subnets, 2 private subnets, Internet Gateway, NAT Gateway, Route Tables |
| `storage` | S3 buckets with versioning, encryption (AES256), lifecycle policies, public access blocking |
| `compute` | Launch Template, Auto Scaling Group (1-3 instances), Application Load Balancer, Target Group |
| `database` | RDS PostgreSQL Multi-AZ, RDS Proxy, Secrets Manager for credentials, DB Subnet Group |
| `messaging` | SNS topic, SQS queue, Dead Letter Queue, SQS → Lambda event source mapping |
| `serverless` | Lambda function with SnapStart, IAM role, VPC configuration, CloudWatch Logs |
| Annotation | Purpose |
|---|---|
| `@SpringBootApplication` | Enables auto-configuration and component scanning |
| `@RestController` | Marks class as REST API controller returning JSON |
| `@Service` | Declares business logic component for dependency injection |
| `@Repository` | Marks data access layer interface (Spring Data JPA) |
| `@Entity` | Designates JPA entity mapped to database table |
| `@Transactional` | Manages database transactions automatically |
| `@Value` | Injects property values from application.yml |
| `@RequestMapping` | Maps HTTP requests to controller methods |
| `@PostMapping` / `@GetMapping` | Shorthand for POST/GET method mappings |
Java applications suffer from cold start latency. SnapStart creates a snapshot after initialization, reducing startup time from ~10s to ~1s.
Lambda functions scale to thousands of concurrent executions. Without RDS Proxy, each would create a database connection, exhausting the connection pool. Proxy pools and reuses connections.
RDS Multi-AZ automatically replicates data to a standby instance in a different availability zone. If the primary fails, AWS promotes the standby (failover), typically with one to two minutes of downtime.
SNS enables fan-out: one upload event can trigger multiple processing pipelines (Lambda, analytics, archival) by adding more SQS subscribers without changing the upload API.
Approximate monthly costs (assuming moderate usage):
| Service | Configuration | Est. Cost |
|---|---|---|
| EC2 | 1x t3.medium (730 hrs) | $30 |
| RDS | db.t3.micro Multi-AZ | $30 |
| S3 | 100 GB storage + requests | $3 |
| Lambda | 1M requests, 1024MB, 5s avg | $20 |
| Data Transfer | 50 GB out | $5 |
| Total | | ~$88/month |
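As a quick sanity check, the line items above do sum to the stated total:

```java
// Sanity check: monthly line items from the cost table above (USD).
public class CostCheck {
    public static int totalUsd() {
        int ec2 = 30, rds = 30, s3 = 3, lambda = 20, dataTransfer = 5;
        return ec2 + rds + s3 + lambda + dataTransfer;
    }
}
```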
Actual costs vary by usage. Use AWS Cost Calculator for precise estimates.
To avoid ongoing charges, destroy all resources:
```bash
cd terraform
terraform destroy
```

Confirm with `yes` when prompted. This removes all provisioned infrastructure.
- DRY (Don't Repeat Yourself): Shared configuration module eliminates duplication
- SOLID Principles: Single responsibility for each service class
- Dependency Injection: Spring Boot's IoC container manages object lifecycle
- Singleton Pattern: AWS clients instantiated once and reused
- Event-Driven Architecture: Asynchronous processing with SNS/SQS
- Infrastructure as Code: Terraform ensures reproducible environments
- Security Best Practices: No hardcoded credentials, IAM roles, encryption at rest/in-transit
- Auto Scaling Group: EC2 instances scale 1-3 based on CPU utilization
- Lambda Concurrency: Automatically scales to handle bursts (up to 1000 concurrent by default)
- RDS Read Replicas: Can be added for read-heavy workloads
- S3 Unlimited Capacity: No storage limits
- SQS Throughput: Standard queues support a nearly unlimited number of messages per second
- Multi-AZ RDS: Automatic failover to standby instance
- Multi-Subnet EC2: Instances distributed across availability zones
- Application Load Balancer: Health checks and automatic traffic routing
- S3 Durability: 11 nines (99.999999999%) durability across multiple facilities
- VPC Isolation: Private subnets for Lambda and RDS
- Security Groups: Least-privilege network access rules
- IAM Roles: No access keys in code
- Encryption at Rest: S3 (AES256), RDS (default encryption)
- Encryption in Transit: TLS for all AWS API calls
- Secrets Manager: RDS credentials never in code or logs
- API Gateway: Replace ALB with API Gateway for better Lambda integration
- DynamoDB: Add NoSQL for high-speed metadata queries
- CloudFront: CDN for processed file distribution
- EventBridge: Advanced event routing with custom rules
- Step Functions: Orchestrate complex multi-step workflows
- X-Ray: Distributed tracing for performance analysis
This is a portfolio project demonstrating professional AWS architecture patterns. Feedback and suggestions are welcome via issues.
MIT License - Free to use for learning and portfolio purposes.
Author: Senior Java Engineer
Date: January 2026
Purpose: Enterprise-grade AWS portfolio demonstration