Scrambler - CSV/Excel Anonymization Tool

A Python tool for anonymizing sensitive data in CSV and Excel files while preserving data structure and relationships. Perfect for creating test datasets, protecting privacy, and preparing data for sharing.

Features

🔒 Smart Data Detection: Automatically identifies email, phone, name, SSN, address, date, ID, and numeric data
🎯 Consistent Mapping: Same input always produces same output (with seed)
📊 Multiple Formats: Supports CSV and Excel files
📋 Clipboard Support: Process data directly from Excel/Google Sheets
⚙️ Custom Rules: Override automatic detection with JSON configuration
🔄 Reproducible: Use seeds for consistent anonymization results

Installation

Requirements

Python 3.6+

Install Dependencies

This installs the pandas, faker, openpyxl, and colorama Python libraries:

pip install -r requirements.txt

Quick Start

Basic Usage

# Anonymize a CSV file
python main.py data.csv

# Anonymize an Excel file  
python main.py data.xlsx

# Specify output file
python main.py data.csv -o anonymized_output.csv

Important: Your input files should have column headers in row 1 (no empty rows above the headers). The program expects the first row to contain the column names.

Clipboard Processing

# Copy data from Excel/Sheets, then run:
python main.py --clipboard

Interactive Processing

# Interactive mode - manually choose anonymization for each column
python main.py data.csv -i
python main.py data.xlsx -i
python main.py --clipboard -i

Advanced Usage

Reproducible Results

# Use seed for consistent anonymization
python main.py data.csv -s 12345
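
To illustrate why a seed makes runs repeatable, here is a minimal sketch using only the standard library. The helpers `randomize_digits` and `anonymize_column` are hypothetical names chosen for this example, not functions from Scrambler; the assumption is simply that all randomness flows through one generator created from the seed.

```python
import random

def randomize_digits(value, rng):
    """Replace every digit with a random digit, keeping other characters."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)

def anonymize_column(values, seed=None):
    # seed=None gives different output on each run; a fixed seed repeats it.
    rng = random.Random(seed)
    return [randomize_digits(v, rng) for v in values]

first = anonymize_column(["1234", "555-0100"], seed=12345)
second = anonymize_column(["1234", "555-0100"], seed=12345)
print(first == second)  # True: same seed and same input order give the same output
```

Note that in a sketch like this, reproducibility also depends on the rows arriving in the same order, since the generator is consumed sequentially.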

Custom Anonymization Rules

Create a JSON file (rules.json) to specify how each column should be anonymized:

{
  "email_address": "email",
  "phone_number": "phone",
  "customer_name": "name", 
  "ssn": "ssn",
  "address": "address",
  "birth_date": "date",
  "user_id": "id",
  "salary": "float",
  "internal_notes": "skip"
}

Then use it:

python main.py data.csv -r rules.json
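
One way to think about how a rules file overrides automatic detection is as a per-column lookup with a fallback. The sketch below is an assumption about the general approach, not Scrambler's code: `resolve_types` is a hypothetical helper, and falling back to `generic` for unknown columns is a choice made for this example.

```python
import json

def resolve_types(columns, detected, rules):
    """Use the rule for a column if one exists, else the detected type, else 'generic'."""
    return {col: rules.get(col, detected.get(col, "generic")) for col in columns}

# Rules as they might appear in rules.json (loaded from a string here for brevity).
rules = json.loads('{"salary": "float", "internal_notes": "skip"}')

# Hypothetical result of automatic detection for the same file.
detected = {"email_address": "email", "salary": "integer", "internal_notes": "generic"}

columns = ["email_address", "salary", "internal_notes", "misc"]
plan = resolve_types(columns, detected, rules)
print(plan)
```

Columns listed in the rules take the rule's type, and every other column keeps whatever detection assigned.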

Supported Data Types

| Type | Description | Example Output |
|------|-------------|----------------|
| `email` | Email addresses | `john.doe@example.com` → `sarah.wilson@fake.com` |
| `phone` | Phone numbers | `(555) 123-4567` → `(555) 987-6543` |
| `name` | Names (consistent mapping) | `John Smith` → `Sarah Wilson` |
| `ssn` | Social Security Numbers | `123-45-6789` → `987-65-4321` |
| `address` | Addresses | `123 Main St` → `456 Oak Ave` |
| `date` | Dates (±30 day offset) | `2023-01-15` → `2023-02-10` |
| `id` | IDs (randomized) | `user123` → `random7digit` |
| `integer` | Integers (digit randomization) | `1234` → `5678` |
| `decimal` | Floats (±10% noise) | `1000.50` → `1050.25` |
| `skip` | Don't anonymize | Original value preserved |
| `generic` | Hash the data | `any text` → `a1b2c3d4e5f6` |
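
The `date` and `decimal` rows describe bounded perturbations rather than full replacement. The following is a rough sketch of what a ±30 day offset and ±10% noise could look like; `shift_date` and `add_noise` are hypothetical helpers, and the ISO date format and two-decimal rounding are assumptions made for the example.

```python
import random
from datetime import datetime, timedelta

def shift_date(iso_date, rng, max_days=30):
    """Move a YYYY-MM-DD date by a random whole number of days within +/- max_days."""
    parsed = datetime.strptime(iso_date, "%Y-%m-%d")
    offset = timedelta(days=rng.randint(-max_days, max_days))
    return (parsed + offset).strftime("%Y-%m-%d")

def add_noise(value, rng, pct=0.10):
    """Scale a number by a random factor between (1 - pct) and (1 + pct)."""
    return round(value * (1 + rng.uniform(-pct, pct)), 2)

rng = random.Random(12345)
shifted = shift_date("2023-01-15", rng)
noisy = add_noise(1000.50, rng)
print(shifted, noisy)
```

Because each value gets its own offset in this sketch, the gap between two dates in the same row can change by up to 60 days; a per-row or per-entity offset would be needed to keep such gaps exact.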

Command Line Options

python main.py [input_file] [options]

Options:
  -o, --output FILE     Output file (default: anonymized_<input>)
  -c, --clipboard       Process clipboard data
  -s, --seed INT        Random seed for reproducible results
  -r, --rules FILE      JSON file with column anonymization rules
  -i, --interactive     Manually process each column for anonymization
  -h, --help            Show help message
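
For readers who want to script around these options, the listing above maps naturally onto the standard `argparse` module. This is a sketch of an equivalent parser, not a copy of the one in `main.py`; the help strings and the `build_parser` name are assumptions.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Anonymize CSV/Excel data (illustrative parser only).",
    )
    # input_file is optional because --clipboard can supply the data instead.
    parser.add_argument("input_file", nargs="?", help="CSV or Excel file to anonymize")
    parser.add_argument("-o", "--output", metavar="FILE", help="output file")
    parser.add_argument("-c", "--clipboard", action="store_true", help="process clipboard data")
    parser.add_argument("-s", "--seed", type=int, metavar="INT", help="random seed")
    parser.add_argument("-r", "--rules", metavar="FILE", help="JSON rules file")
    parser.add_argument("-i", "--interactive", action="store_true", help="choose per column")
    return parser  # -h/--help is added by argparse automatically

args = build_parser().parse_args(["data.csv", "-s", "12345", "-i"])
print(args.input_file, args.seed, args.interactive)  # data.csv 12345 True
```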

Usage Examples

Example 1: Basic File Anonymization

# Input: customer_data.csv
# Output: anonymized_customer_data.csv
python main.py customer_data.csv

Example 2: Excel with Custom Output

python main.py sales_data.xlsx -o anonymized_sales.xlsx

Example 3: Clipboard Processing

1. Copy data from Excel/Google Sheets
2. Run: python main.py --clipboard
3. Anonymized data is copied back to clipboard
4. File is also saved as anonymized_data_YYYYMMDD_HHMMSS.csv
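
The timestamped file name follows the pattern `YYYYMMDD_HHMMSS`. As a small sketch of how such a name can be built with `strftime` (the helper `clipboard_output_name` is hypothetical, and using local time is an assumption):

```python
from datetime import datetime

def clipboard_output_name(now=None):
    """Build a name like anonymized_data_20240305_140709.csv."""
    now = now or datetime.now()
    return "anonymized_data_" + now.strftime("%Y%m%d_%H%M%S") + ".csv"

print(clipboard_output_name(datetime(2024, 3, 5, 14, 7, 9)))
# anonymized_data_20240305_140709.csv
```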

Example 4: Interactive Usage

Interactive mode lets you manually choose how to anonymize each column:

# Using clipboard input:
python main.py --clipboard -i

# Using file input:
python main.py customer_data.csv -i
python main.py customer_data.xlsx -i

In interactive mode, the program will:

1. Show you each column and its detected data type
2. Ask if you want to change the anonymization method
3. Let you choose from available anonymization types
4. Apply your choices and process the file

Example 5: Custom Rules

Create custom_rules.json:

{
  "customer_email": "email",
  "phone": "phone",
  "full_name": "name",
  "salary": "numeric",
  "employee_id": "skip"
}

Run with custom rules:

python main.py employee_data.csv -r custom_rules.json -o safe_employee_data.csv

How It Works

1. Data Type Detection: The script analyzes column names and sample data to automatically detect data types
2. Anonymization: Applies the appropriate anonymization based on the detected or specified type
3. Consistency: Uses a mapping cache to ensure the same input always produces the same output
4. Preservation: Maintains data structure and statistical properties where possible
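
The detection step can be pictured as two passes: look for a hint in the column name, then fall back to matching the sample values. The sketch below is a simplified guess at that idea, not Scrambler's detector; the hint list, the regular expressions, and the `detect_type` name are all assumptions.

```python
import re

# Checked in order, so "email_address" resolves to email before address.
NAME_HINTS = ["email", "phone", "ssn", "address", "date", "name", "id"]

VALUE_PATTERNS = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("ssn", re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("integer", re.compile(r"^-?\d+$")),
    ("decimal", re.compile(r"^-?\d+\.\d+$")),
]

def detect_type(column, samples):
    """Guess a type from the column name first, then from the sample values."""
    # Split on separators so "valid" does not match the "id" hint.
    tokens = re.split(r"[^a-z0-9]+", column.lower())
    for hint in NAME_HINTS:
        if hint in tokens:
            return hint
    values = [s.strip() for s in samples if s and s.strip()]
    for type_name, pattern in VALUE_PATTERNS:
        if values and all(pattern.match(v) for v in values):
            return type_name
    return "generic"

print(detect_type("email_address", []))            # email
print(detect_type("amount", ["10.5", "3.25"]))     # decimal
print(detect_type("notes", ["call back later"]))   # generic
```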

Privacy Features

Deterministic Hashing: IDs and generic data use SHA-256 hashing
Realistic Fake Data: Uses Faker library for believable replacements
Consistent Mapping: Same real name always maps to same fake name
Statistical Preservation: Numeric data gets noise instead of complete replacement
Date Relationships: Preserves relative timing with random offsets
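
Two of the ideas above, deterministic hashing and consistent mapping, are easy to sketch with the standard library. In this example `hash_value` and `ConsistentMapper` are hypothetical names, the 12-character truncation is inferred from the example output in the data types table, and a simple counter stands in for the Faker-generated names the tool would produce.

```python
import hashlib
import itertools

def hash_value(value, length=12):
    """Return the first `length` hex characters of the SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]

class ConsistentMapper:
    """Cache replacements so the same input always maps to the same output."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}

    def __call__(self, value):
        if value not in self._cache:
            self._cache[value] = self._generate(value)
        return self._cache[value]

counter = itertools.count(1)
fake_name = ConsistentMapper(lambda _real: "Person {}".format(next(counter)))

print(fake_name("John Smith"), fake_name("Jane Roe"), fake_name("John Smith"))
# Person 1 Person 2 Person 1
print(hash_value("any text") == hash_value("any text"))  # True
```

An unsalted hash of a low-entropy value (for example a short numeric ID) can be reversed by brute force, so a keyed hash such as HMAC may be worth considering when the hashed output is shared.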

Troubleshooting

Common Issues

"Unsupported file type"

Ensure the file has a .csv, .xlsx, or .xls extension
Check that the file is not corrupted

"Error reading clipboard"

Make sure you've copied data from Excel/Sheets first
Try copying a smaller dataset

Missing dependencies

pip install pandas faker openpyxl colorama

Performance Tips

For large files (>100k rows), consider processing in chunks
Use skip type for columns that don't need anonymization
Set a seed for reproducible results during testing
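
As a sketch of the chunking tip, a large CSV can be streamed a fixed number of rows at a time instead of being loaded whole. This example uses only the standard `csv` module; `iter_chunks` and `anonymize_stream` are hypothetical helpers, and the row-level `transform` callback stands in for whatever anonymization is applied.

```python
import csv
import io
from itertools import islice

def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows until the input is exhausted."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk

def anonymize_stream(src, dst, transform, chunk_size=50000):
    """Read CSV rows from src, transform them chunk by chunk, write to dst."""
    reader = csv.DictReader(src)  # the first row is treated as the header
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        writer.writerows(transform(row) for row in chunk)
        total += len(chunk)
    return total

src = io.StringIO("name,notes\nJohn,a\nJane,b\nJoe,c\n")
dst = io.StringIO()
count = anonymize_stream(src, dst, lambda row: dict(row, name="REDACTED"), chunk_size=2)
print(count)  # 3
```

When processing in chunks, any mapping cache needs to be shared across chunks, otherwise the same input could receive different replacements in different parts of the file.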

Try It Out!

python main.py test_data.csv -i
