Wayback Machine Domain Auditor

Note: This was vibe coded with Gemini

This script queries the Wayback Machine's CDX API to find archived snapshots of a specific domain within a user-defined date range. It filters for successful captures (HTTP 200) and exports the results to a CSV file for auditing and analysis.

Features

Custom Date Filtering: Specify a start date and an optional end date for the audit.
Status Filtering: Automatically filters for successful 200 OK status codes.
Deduplication: Uses the Wayback Machine's collapse parameter to ensure unique daily captures.
CSV Export: Generates a structured report including:
Formatted capture date (YYYY-MM-DD HH:MM)
HTTP Status Code
Original URL
Direct link to the Wayback Machine archive

Prerequisites

Python 3.x
requests library

To install the required library, run:

pip install requests

Usage

Run the script:

python script_name.py

Enter the Domain: Input the domain you wish to audit (e.g., example.com).
Enter the Start Date: Provide the date in YYYYMMDD format (e.g., 20240101).
Enter the End Date: Provide the end date in YYYYMMDD format, or press Enter to default to the current date.

How it Works

The script interacts with the Internet Archive's CDX API. Below is a high-level overview of the data flow:

Request: The script sends a GET request to web.archive.org/cdx/search/cdx.
Parameters: It applies filters for the url, statuscode, and timestamp.
Processing: The script converts the raw API timestamp (e.g., 20240101123000) into a human-readable format.
Output: Data is written row-by-row into a CSV file named audit_[domain]_[start_date].csv.

Data Structure

The generated CSV follows this format:

Date	Status	Original URL	View Archive Link
2024-01-01 12:00	200	http://example.com/	https://web.archive.org/web/20240101120000/http://example.com/

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
audit.py		audit.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wayback Machine Domain Auditor

Features

Prerequisites

Usage

How it Works

Data Structure

About

Uh oh!

Releases

Packages

Languages

denniskeefe/wayback-audit

Folders and files

Latest commit

History

Repository files navigation

Wayback Machine Domain Auditor

Features

Prerequisites

Usage

How it Works

Data Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages