NobodyKnows09/log-archiver

Log Archiver

Automated log cleanup and archival tool for EC2 fleets.
Finds old .log files, uploads them to S3, deletes the local copies, and emails a summary via AWS SES.


Overview

Application servers accumulate log files in /var/log over time, eventually consuming all available disk space. Log Archiver solves this by:

  1. Scanning /var/log for .log files older than 30 days
  2. Uploading each file to an S3 bucket (organised by instance ID)
  3. Deleting the local file only after a confirmed upload
  4. Sending an email report via AWS SES with the run summary
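
Step 3 is the safety-critical part: a local file should disappear only once its copy is known to be in S3. The sketch below is one way that could look, not the script's actual code. The function name `archive_file` is hypothetical, and "confirmed" is assumed here to mean that a listing of the bucket shows the key with the same byte size as the local file. A listing is used because the instance policy under IAM Permissions grants s3:ListBucket but not s3:GetObject.

```python
import os


def archive_file(s3_client, path, bucket, key):
    """Upload one file, confirm it is in S3, then delete the local copy.

    `s3_client` is expected to behave like boto3.client("s3").
    Returns True if the file was uploaded and removed, False otherwise.
    """
    local_size = os.path.getsize(path)
    try:
        s3_client.upload_file(path, bucket, key)
        # The exact key sorts first among all keys sharing it as a prefix,
        # so one result is enough to check for it.
        listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    except Exception as exc:  # e.g. AccessDenied surfaced by boto3
        print(f"Failed to upload {path}: {exc}")
        return False

    confirmed = any(
        obj["Key"] == key and obj["Size"] == local_size
        for obj in listing.get("Contents", [])
    )
    if not confirmed:
        print(f"Could not confirm {key} in {bucket}; keeping {path}")
        return False

    os.remove(path)
    return True
```

A failed or unconfirmed upload leaves the file in place, so the next run can retry it.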

The script is designed to be executed across 50+ EC2 instances simultaneously using AWS EventBridge + Systems Manager (SSM) — no SSH or cron required.


Architecture

┌──────────────────────┐
│  EventBridge Rule    │  ← runs every midnight: cron(0 0 * * ? *)
│  (scheduled)         │
└────────┬─────────────┘
         │ triggers
         ▼
┌──────────────────────┐
│  SSM Run Command     │  ← sends the script command to the fleet
└────────┬─────────────┘
         │ executes on
         ▼
┌──────────────────────┐
│  EC2 Instance Fleet  │  ← 50+ servers, each runs log_archiver.py
│  (tagged targets)    │
└────────┬─────────────┘
         │ each instance:
         ▼
   ┌──────────────┐   ┌──────────┐   ┌──────────┐
   │ Scan /var/log│──▶│ Upload S3│──▶│ Send SES │
   │ for old logs │   │ + delete │   │  report  │
   └──────────────┘   └──────────┘   └──────────┘

See architecture.md for a detailed explanation of every design decision.


Prerequisites

| Requirement | Detail |
| --- | --- |
| Python | 3.9 or later |
| AWS Account | With S3, SES, SSM, and EventBridge access |
| S3 Bucket | Created ahead of time (e.g. your-log-backup-bucket) |
| SES | Sender and recipient addresses verified in your SES region |
| IAM Role | Attached to every EC2 instance (see IAM Permissions) |
| SSM Agent | Installed and running on each EC2 instance (Amazon Linux 2 / Ubuntu AMIs include it by default) |

Installation

# Clone the repo
git clone https://github.com/your-org/log-archiver.git
cd log-archiver

# Install Python dependencies
pip install -r requirements.txt

Configuration

Edit config.py to match your environment:

S3_BUCKET   = "your-log-backup-bucket"   # Target S3 bucket
S3_REGION   = "us-east-1"
SES_SENDER  = "devops@example.com"       # Verified SES sender
SES_RECIPIENT = "team@example.com"       # Report recipient
LOG_DIR     = "/var/log"
MAX_AGE_DAYS = 30

Note: Never hard-code AWS credentials. The script uses the IAM role attached to the EC2 instance.
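
LOG_DIR and MAX_AGE_DAYS are the two inputs to the scan. As a rough illustration of how they might be used, the sketch below walks the directory and keeps .log files whose modification time is older than the cutoff. The function name `find_old_logs`, the use of mtime as the file's age, and the recursion into subdirectories are all assumptions; the real script may differ on each point.

```python
import os
import time

LOG_DIR = "/var/log"   # same values as config.py above
MAX_AGE_DAYS = 30


def find_old_logs(log_dir=LOG_DIR, max_age_days=MAX_AGE_DAYS, now=None):
    """Return sorted paths of .log files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    old = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    old.append(path)
            except OSError:
                # File vanished or cannot be read; skip it.
                continue
    return sorted(old)
```

Passing `now` explicitly keeps the function easy to test without waiting 30 days.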


How to Run

Manual (single instance)

sudo python3 log_archiver.py

sudo is needed because some files in /var/log are owned by root.

Automated (fleet-wide)

Use EventBridge + SSM Run Command to execute the script across all tagged instances every day at midnight UTC, without SSH. See deployment.md for full setup instructions.


Scheduling with EventBridge + SSM

Why EventBridge + SSM Instead of Cron?

| Concern | Cron | EventBridge + SSM |
| --- | --- | --- |
| Scale to 50+ instances | Must configure each one individually | One rule targets all tagged instances |
| New instance joins fleet | Must manually add cron entry | Auto-included if it has the tag |
| Instance replaced by ASG | Cron entry is lost | New instance inherits tag → auto-included |
| Audit trail | None | SSM logs every execution to CloudTrail |
| Visibility into results | Must SSH in to check | SSM console shows pass/fail per instance |
| Change the schedule | Edit 50+ crontabs | Change one EventBridge rule |
| Error handling | Silent failures | SSM captures stdout/stderr, can alert on failure |

How the Components Fit Together

| Component | Purpose |
| --- | --- |
| EventBridge Rule | The clock: fires at midnight UTC every day using cron(0 0 * * ? *) |
| SSM Run Command | The executor: sends a shell command to EC2 instances matching a tag filter |
| EC2 Tag | The targeting mechanism: instances tagged log-archiver: enabled are included |
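
The schedule string cron(0 0 * * ? *) is easier to read once its six fields are named. EventBridge cron expressions are ordered minutes, hours, day-of-month, month, day-of-week, year, and one of the two day fields has to be ?. The helper below is purely illustrative (the name `parse_eventbridge_cron` is made up for this README) and only labels the fields; it does not validate ranges.

```python
FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_eventbridge_cron(expression):
    """Split an EventBridge cron(...) expression into its six named fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError("expected an expression of the form cron(...)")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    fields = dict(zip(FIELDS, parts))
    # If one day field holds a value or "*", the other must be "?".
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("day_of_month or day_of_week must be '?'")
    return fields


if __name__ == "__main__":
    for name, value in parse_eventbridge_cron("cron(0 0 * * ? *)").items():
        print(f"{name:>12}: {value}")
```

Read this way, the rule means minute 0 of hour 0, every day of every month and year, with the day of week left unspecified.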

Quick Setup (CLI)

1. Tag your instances:

aws ec2 create-tags \
  --resources i-0a1b2c3d4e5f67890 i-0b2c3d4e5f678901a \
  --tags Key=log-archiver,Value=enabled

2. Create the EventBridge rule (midnight UTC daily):

aws events put-rule \
  --name "log-archiver-midnight" \
  --schedule-expression "cron(0 0 * * ? *)" \
  --state ENABLED \
  --description "Trigger log archiver on all tagged EC2 instances at midnight UTC"

3. Attach SSM Run Command as the target:

aws events put-targets \
  --rule "log-archiver-midnight" \
  --targets '[{
    "Id": "LogArchiverTarget",
    "Arn": "arn:aws:ssm:us-east-1::document/AWS-RunShellScript",
    "RoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/EventBridgeSSMRole",
    "Input": "{\"commands\":[\"cd /opt/log-archiver && sudo python3 log_archiver.py\"]}",
    "RunCommandParameters": {
      "RunCommandTargets": [{
        "Key": "tag:log-archiver",
        "Values": ["enabled"]
      }]
    }
  }]'

Replace <ACCOUNT_ID> with your 12-digit AWS account ID.
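
If you script the setup in Python instead of the CLI, the target for step 3 can be built as a plain dictionary. In the EventBridge target structure, the SSM document parameters travel in Input as a JSON-encoded string, while RunCommandParameters holds only the tag selector. The sketch below assumes boto3 would be used for the actual call. `build_ssm_target` is a hypothetical helper, and its defaults mirror the names used in this README.

```python
import json


def build_ssm_target(account_id, region, command,
                     tag_key="log-archiver", tag_value="enabled",
                     role_name="EventBridgeSSMRole"):
    """Build one EventBridge target that runs `command` via AWS-RunShellScript."""
    return {
        "Id": "LogArchiverTarget",
        # AWS-owned SSM documents have an empty account field in their ARN.
        "Arn": f"arn:aws:ssm:{region}::document/AWS-RunShellScript",
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        # Document parameters are passed as a JSON string, not a nested object.
        "Input": json.dumps({"commands": [command]}),
        "RunCommandParameters": {
            "RunCommandTargets": [
                {"Key": f"tag:{tag_key}", "Values": [tag_value]}
            ]
        },
    }


if __name__ == "__main__":
    target = build_ssm_target(
        "123456789012",  # placeholder account ID
        "us-east-1",
        "cd /opt/log-archiver && sudo python3 log_archiver.py",
    )
    print(json.dumps(target, indent=2))
    # With boto3 installed and credentials available, this could be sent as:
    #   boto3.client("events").put_targets(
    #       Rule="log-archiver-midnight", Targets=[target])
```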

One-Time Deployment of the Script to All Instances

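Use SSM Run Command to push the script from an S3 deployment bucket to every tagged instance in a single command, with no SSH needed. The instance role must also allow s3:ListBucket and s3:GetObject on the deployment bucket, which the log-upload policy under IAM Permissions does not cover: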

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:log-archiver,Values=enabled" \
  --parameters 'commands=[
    "mkdir -p /opt/log-archiver",
    "aws s3 cp s3://your-deployment-bucket/log-archiver/ /opt/log-archiver/ --recursive",
    "pip3 install -r /opt/log-archiver/requirements.txt"
  ]'

What Happens at Midnight (Runtime Flow)

  1. 00:00 UTC — EventBridge rule fires.
  2. EventBridge calls SSM SendCommand, targeting all instances with tag log-archiver=enabled.
  3. SSM Agent on each of the 50+ instances receives the command in parallel.
  4. Each instance runs the configured command (cd /opt/log-archiver && sudo python3 log_archiver.py).
  5. The script runs its full flow: find old logs → upload to S3 → delete locally → email report.
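  6. SSM captures each invocation's stdout and stderr, including the script's log messages. The output is visible in the SSM console, and in CloudWatch Logs if output logging is enabled for the command.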
  7. If any instance fails, you can see exactly which one and why.
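
Step 7 can also be checked from a script rather than the console. SSM reports one invocation per instance, each with an InstanceId and a Status such as Success, Failed, or TimedOut. The helper below is a small sketch that reduces such a list to counts and a list of failed instances. The name `summarise_invocations` is hypothetical, and the input is assumed to have the shape returned by boto3's `list_command_invocations`.

```python
from collections import Counter

# Terminal statuses that mean the command did not complete successfully.
FAILURE_STATUSES = ("Failed", "TimedOut", "Cancelled")


def summarise_invocations(invocations):
    """Return (status_counts, failed_instance_ids) for SSM command invocations."""
    counts = Counter(inv["Status"] for inv in invocations)
    failed = sorted(
        inv["InstanceId"] for inv in invocations
        if inv["Status"] in FAILURE_STATUSES
    )
    return dict(counts), failed


if __name__ == "__main__":
    # In practice the list would come from something like:
    #   ssm = boto3.client("ssm")
    #   ssm.list_command_invocations(CommandId=command_id)["CommandInvocations"]
    # (paginated for large fleets). Sample data is used here instead.
    sample = [
        {"InstanceId": "i-0a1b2c3d4e5f67890", "Status": "Success"},
        {"InstanceId": "i-0b2c3d4e5f678901a", "Status": "Failed"},
    ]
    counts, failed = summarise_invocations(sample)
    print(counts, failed)
```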

For the full step-by-step setup (IAM roles, console walkthrough, verification, troubleshooting), see deployment.md.


IAM Permissions

Attach a role to every EC2 instance with the permissions below. Each JSON block is a single policy statement and belongs inside the Statement array of a policy document:

S3 (log uploads)

{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::your-log-backup-bucket",
    "arn:aws:s3:::your-log-backup-bucket/*"
  ]
}

SES (email reports)

{
  "Effect": "Allow",
  "Action": "ses:SendEmail",
  "Resource": "*"
}

SSM (managed by Systems Manager)

Attach the AWS-managed policy:

arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
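
The S3 and SES blocks above are individual statements, so they need a surrounding policy document before IAM will accept them. The sketch below shows one way to assemble it. The function name `build_instance_policy` and the Sid values are illustrative, and the bucket name is the README's placeholder.

```python
import json


def build_instance_policy(bucket):
    """Combine the S3 and SES statements into one IAM policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LogUploads",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket ARN, for ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # object ARNs, for PutObject
                ],
            },
            {
                "Sid": "EmailReports",
                "Effect": "Allow",
                "Action": "ses:SendEmail",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_instance_policy("your-log-backup-bucket"), indent=2))
```

The printed JSON can be saved to a file and attached to the instance role as an inline or customer-managed policy.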

Repository Structure

log-archiver/
├── log_archiver.py    # Main script — scan, upload, delete, email
├── config.py          # All tuneable settings in one place
├── requirements.txt   # Python dependencies (boto3, requests)
├── README.md          # This file
├── architecture.md    # Design decisions explained for interviews
└── deployment.md      # Step-by-step EventBridge + SSM setup

Demo Steps

  1. Show the code — walk through log_archiver.py top-to-bottom.
  2. Run manually on a test EC2 instance:
    sudo python3 log_archiver.py
  3. Check S3 — verify files appear under s3://your-log-backup-bucket/<instance-id>/.
  4. Check email — show the SES summary report.
  5. Show EventBridge rule in the AWS Console and explain the SSM target.
  6. Explain IAM — open the EC2 instance role and walk through the permissions.
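
For step 3 of the demo it helps to know how a local path maps to an object key. The sketch below assumes the key is the instance ID followed by the file's path relative to the log directory, which matches the key shown in the example output for /var/log/app.log. `build_s3_key` is a hypothetical name. Keeping subdirectories in the key is a design choice made here so that two files with the same name in different folders do not overwrite each other; the actual script may handle this differently.

```python
import posixpath


def build_s3_key(instance_id, path, log_dir="/var/log"):
    """Map a local log path to '<instance-id>/<path relative to log_dir>'."""
    rel = posixpath.relpath(path, log_dir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{path} is not inside {log_dir}")
    return f"{instance_id}/{rel}"


if __name__ == "__main__":
    print(build_s3_key("i-0a1b2c3d4e5f67890", "/var/log/app.log"))
    print(build_s3_key("i-0a1b2c3d4e5f67890", "/var/log/nginx/error.log"))
```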

Example Output

2026-03-06 00:00:01 [INFO] ==================================================
2026-03-06 00:00:01 [INFO] Log Archiver started
2026-03-06 00:00:01 [INFO] ==================================================
2026-03-06 00:00:01 [INFO] Running on EC2 instance: i-0a1b2c3d4e5f67890
2026-03-06 00:00:01 [INFO] Scanning /var/log for files older than 30 days...
2026-03-06 00:00:01 [INFO] Found 12 log file(s) older than 30 days in /var/log
2026-03-06 00:00:01 [INFO] Uploading logs to S3...
2026-03-06 00:00:02 [INFO] Uploaded  -> s3://your-log-backup-bucket/i-0a1b2c3d4e5f67890/app.log
2026-03-06 00:00:02 [INFO] Deleted local file: /var/log/app.log
...
2026-03-06 00:00:05 [ERROR] Failed to upload auth.log: An error occurred (AccessDenied)
2026-03-06 00:00:06 [INFO] --------------------------------------------------
2026-03-06 00:00:06 [INFO] Summary:
2026-03-06 00:00:06 [INFO]   Total logs scanned  : 12
2026-03-06 00:00:06 [INFO]   Successfully uploaded: 10
2026-03-06 00:00:06 [INFO]   Deleted locally     : 10
2026-03-06 00:00:06 [INFO]   Failures            : 2
2026-03-06 00:00:06 [INFO] --------------------------------------------------
2026-03-06 00:00:06 [INFO] Sending email summary via SES...
2026-03-06 00:00:07 [INFO] Email report sent to team@example.com
2026-03-06 00:00:07 [INFO] Log Archiver finished
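
The summary lines in the log above carry the same numbers the email report would need. As an illustration only, the sketch below formats those counts into a plain-text body and sends it through an SES client. `format_summary` and `send_report` are hypothetical names, the subject line is invented, and `ses_client` is assumed to behave like boto3.client("ses").

```python
def format_summary(instance_id, scanned, uploaded, deleted, failures):
    """Build the plain-text body of the run report."""
    return "\n".join([
        f"Log Archiver report for {instance_id}",
        f"Total logs scanned   : {scanned}",
        f"Successfully uploaded: {uploaded}",
        f"Deleted locally      : {deleted}",
        f"Failures             : {failures}",
    ])


def send_report(ses_client, sender, recipient, instance_id, body):
    """Send the report as a plain-text email through SES."""
    ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": f"Log Archiver report: {instance_id}"},
            "Body": {"Text": {"Data": body}},
        },
    )


if __name__ == "__main__":
    print(format_summary("i-0a1b2c3d4e5f67890", 12, 10, 10, 2))
```

Both addresses must be verified in SES while the account is in the sandbox, as noted under Prerequisites.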

License

MIT
