This project uses a Genetic Algorithm to break Vigenere cipher encryption. It evolves populations of candidate keys to find those that produce decrypted text closest to English language patterns, measured by letter frequency analysis.
Key Features:
- GA execution with multiple mutation and crossover operators
- Automated result saving and analysis
- Python-based data visualization
- Comprehensive experimental framework
- Java 8 or higher
- Python 3.7+ (for data analysis)
- matplotlib (for visualization)
- pandas (for data processing)
Install Python dependencies:
pip install matplotlib pandasAll commands run from the project root directory.
Command: javac src/*.java && java -cp src GA
Description: Main GA class for running individual experiments. Modify the main method to customize parameters (population size, mutation rate, crossover rate, etc.). Tests various parameter combinations and outputs fitness results to console.
Runtime: Varies based on parameters (typically 1-10 minutes per run).
Command: javac src/*.java && java -cp src experiment
Description: Runs comprehensive experiments across multiple parameter sets. Tests both datasets (Data1.txt and Data2.txt) with different combinations of:
- Mutation rates: 0, 0.1, 0.5
- Crossover rates: 0.9, 1.0
- Mutation types: ASCII Mutation
- Crossover types: Two-Point Crossover
Each parameter combination is run 5 times for statistical significance. Results are saved to CSV files in the Data/ directory.
Runtime: ~30-60 minutes depending on hardware.
Command: python pythonAnalysis/graph.py
Description: Generates fitness evolution graphs from CSV data. Modify the file path in the script to visualize different experimental results. Creates line plots showing fitness progression across generations for different parameter combinations.
genetic-cryptanalysis/
├── Assign2_Attachments/ # Encrypted datasets
│ ├── Data1.txt # 26-char key ciphertexts
│ └── Data2.txt # 40-char key ciphertexts
├── src/ # Source code
│ ├── GA.java # Main GA implementation
│ ├── experiment.java # Comprehensive experiment runner
│ ├── chromosome.java # Chromosome representation
│ ├── Evaluation.java # Fitness functions & decryption
│ ├── FitnessComparator.java # Comparator for sorting
│ └── csvWriter.java # Data logging utilities
├── Data/ # Experimental results (CSV files)
│ ├── averages*.csv # Average fitness per generation
│ └── bestAverages*.csv # Best average fitness values
├── graphs/ # Generated visualization graphs
├── pythonAnalysis/ # Python analysis scripts
│ ├── graph.py # Graph generation
│ └── test.py # Testing utilities
└── Final_Report_COSC_3P71_Assignment_1.pdf
Execute the comprehensive experiment suite:
javac src/*.java && java -cp src experimentWhat happens:
- Loads both datasets (Data1.txt and Data2.txt)
- Runs GA with multiple parameter combinations
- Tests ASCII mutation with two-point crossover
- Saves results to CSV files in
Data/directory - Each combination is run 5 times for statistical validity
For quick testing with custom parameters:
- Open
src/GA.java - Modify the
mainmethod parameters:popSize: Population size (default: 300)maxGenSpan: Maximum generations (default: 100)k: Tournament selection sample size (default: 2)elitePercentage: Elite preservation rate (default: 0.1)targetFitness: Early stopping threshold (default: 0.01)mutationRates: Array of mutation rates to testcrossoverRates: Array of crossover rates to test
- Compile and run:
javac src/*.java && java -cp src GA
Generate graphs from collected data:
- Open
pythonAnalysis/graph.py - Update the CSV file path to point to your desired data file
- Run:
python pythonAnalysis/graph.py
Lower values = better keys (closer to English letter frequencies). The fitness function measures the absolute difference between expected English letter frequencies and the actual frequencies in the decrypted text.
- ASCII Mutation: Randomly modifies a character in the key by ±2 positions in the alphabet
- Inverse Mutation: Inverts a random subsequence of the key
- Uniform Crossover: Each gene position is independently swapped between parents with 50% probability
- Two-Point Crossover: Swaps a contiguous segment between two randomly selected points
- Data1.txt: Shorter 26-character keys
- Data2.txt: Longer 40-character keys
The experiment class tests:
- Mutation Rates: 0, 0.1, 0.5
- Crossover Rates: 0.9, 1.0
- Runs per combination: 5 (for statistical significance)
Results are saved in the Data/ directory:
averages*.csv: Average fitness values tracked every 10 generationsbestAverages*.csv: Best average fitness across all runs for each parameter combination
File naming convention:
ASCIIMutationorInverseMutation(mutation type)TwoPointCrossoverorUniformCrossover(crossover type)1or2(dataset number)
Experiment results include:
- Best chromosomes (decrypted keys) printed to console
- Fitness evolution tracked every 10 generations
- CSV files with detailed fitness data
- Graphs showing comparative performance (in
graphs/directory)
- Selection: Tournament selection with configurable sample size (k)
- Crossover: Uniform or Two-Point crossover with configurable rate
- Mutation: ASCII or Inverse mutation with configurable rate
- Elitism: Top percentage of population preserved each generation
- Fitness: Letter frequency analysis comparing decrypted text to English language patterns
GA: Main genetic algorithm implementationchromosome: Represents a candidate key solutionEvaluation: Contains fitness function and encryption/decryption methodsexperiment: Automated experiment runner with statistical analysiscsvWriter: Utility for logging experimental data
Built by Jay Shah for Brock University (COSC 3P71)