Commit 11eb302

python-benchmark
1 parent 50ac00e commit 11eb302

12 files changed: +2048 -0 lines changed
Lines changed: 343 additions & 0 deletions
@@ -0,0 +1,343 @@
# DB-ESDK Performance Benchmark - Python

This directory contains the Python implementation of the AWS Database Encryption SDK (DB-ESDK) performance benchmark suite.

## Overview

The Python benchmark provides comprehensive performance testing for the DB-ESDK Python runtime, measuring:

- **Throughput**: Operations per second and bytes per second using ItemEncryptor operations
- **Latency**: Encrypt, decrypt, and end-to-end timing for encrypted operations
- **Memory Usage**: Peak memory consumption and efficiency
- **Concurrency**: Multi-threaded performance scaling
- **Statistical Analysis**: P50, P95, P99 latency percentiles

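As a rough illustration of the statistical analysis step, the sketch below computes P50, P95, and P99 from a list of latency samples using linear interpolation between ranks. It uses only the standard library; the function names `percentile` and `summarize_latencies` are hypothetical, and the benchmark itself may compute these values differently (for example with numpy, which is listed among its dependencies).

```python
def percentile(samples, pct):
    """Return the pct-th percentile of samples using linear interpolation."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Fractional rank into the sorted list (0-based).
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize_latencies(latencies_ms):
    """Hypothetical helper: map latency samples to the three percentile fields."""
    return {
        "p50_latency": percentile(latencies_ms, 50),
        "p95_latency": percentile(latencies_ms, 95),
        "p99_latency": percentile(latencies_ms, 99),
    }


# Example with made-up samples (milliseconds), not measured data.
stats = summarize_latencies([5.0, 1.0, 3.0, 2.0, 4.0])
```

With only a handful of measurement iterations, P95 and P99 are interpolated from very few points, so they are best read as indicative rather than precise.
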
## Prerequisites

- Python 3.11 or higher
- Poetry package manager

## Setup

### Install Poetry

```bash
# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Or using pip
pip install poetry
```

### Install Dependencies

```bash
# Install all dependencies including dev dependencies
poetry install

# Install only production dependencies
# (Poetry 1.2+; replaces the deprecated --no-dev flag)
poetry install --only main
```

## Building

```bash
# Build distribution packages
poetry build

# Install in development mode (automatic with poetry install)
poetry install

# Run tests using tox
tox -e py311

# Run all tox environments
tox
```

## Running Benchmarks

### Quick Test

```bash
# Using Poetry
poetry run esdk-benchmark --quick

# Using tox (recommended for isolated environment)
tox -e benchmark

# Using module execution
poetry run python -m esdk_benchmark --quick

# Direct script execution
poetry run python src/esdk_benchmark/program.py --quick
```

### Full Benchmark Suite

```bash
# Using Poetry
poetry run esdk-benchmark

# Using tox (recommended for isolated environment)
tox -e benchmark-full

# Using module execution
poetry run python -m esdk_benchmark

# Direct script execution
poetry run python src/esdk_benchmark/program.py
```

### Custom Configuration

```bash
# Specify custom config and output paths
poetry run esdk-benchmark \
  --config /path/to/config.yaml \
  --output /path/to/results.json
```

## Command Line Options

- `--config, -c`: Path to test configuration file (default: `../../../config/test-scenarios.yaml`)
- `--output, -o`: Path to output results file (default: `../../../results/raw-data/python_results.json`)
- `--quick, -q`: Run quick test with reduced iterations
- `--help, -h`: Show help message

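The options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the contents of `program.py`: the function name `build_parser` and the help strings are assumptions, while the flags and defaults are taken from the list above.

```python
import argparse

# Defaults copied from the documented command line options.
DEFAULT_CONFIG = "../../../config/test-scenarios.yaml"
DEFAULT_OUTPUT = "../../../results/raw-data/python_results.json"


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        prog="esdk-benchmark",
        description="DB-ESDK performance benchmark (Python)",
    )
    parser.add_argument(
        "--config", "-c", default=DEFAULT_CONFIG,
        help="Path to test configuration file",
    )
    parser.add_argument(
        "--output", "-o", default=DEFAULT_OUTPUT,
        help="Path to output results file",
    )
    parser.add_argument(
        "--quick", "-q", action="store_true",
        help="Run quick test with reduced iterations",
    )
    # argparse adds --help / -h automatically.
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--quick", "-o", "results.json"])
```

Because relative defaults are resolved against the current working directory, passing `--config` and `--output` explicitly is the safer choice when running from a directory other than the one the defaults assume.
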
## Configuration

The benchmark uses a YAML configuration file to define test parameters:

```yaml
data_sizes:
  small: [1024, 5120, 10240]
  medium: [102400, 512000, 1048576]
  large: [10485760, 52428800, 104857600]

iterations:
  warmup: 5
  measurement: 10

concurrency_levels: [1, 2, 4, 8]
```

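To make the shape of this configuration concrete, the sketch below expands it into a list of (data size, concurrency level) pairs. The configuration is written as a Python dict that mirrors the YAML above so the example needs no third-party parser. Both helper names are hypothetical, and whether the real runner pairs every size with every concurrency level is an assumption, not something this README states.

```python
from itertools import product

# Same values as the YAML example, expressed as a plain dict.
config = {
    "data_sizes": {
        "small": [1024, 5120, 10240],
        "medium": [102400, 512000, 1048576],
        "large": [10485760, 52428800, 104857600],
    },
    "iterations": {"warmup": 5, "measurement": 10},
    "concurrency_levels": [1, 2, 4, 8],
}


def flatten_sizes(data_sizes):
    """Collect the sizes from every category into one list, in file order."""
    return [size for group in data_sizes.values() for size in group]


def concurrency_scenarios(cfg):
    """Assumed expansion: every data size paired with every concurrency level."""
    sizes = flatten_sizes(cfg["data_sizes"])
    return list(product(sizes, cfg["concurrency_levels"]))


scenarios = concurrency_scenarios(config)
```

Under that assumption the example configuration yields 9 sizes and 36 size/concurrency pairs, which gives a feel for how quickly the run time grows as categories are added.
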
## Output Format

Results are saved in JSON format with the following structure:

```json
{
  "metadata": {
    "language": "python",
    "timestamp": "2025-09-05T15:30:00Z",
    "python_version": "3.11.5",
    "platform": "Darwin-23.1.0-arm64-arm-64bit",
    "cpu_count": 8,
    "total_memory_gb": 16.0,
    "total_tests": 45
  },
  "results": [
    {
      "test_name": "throughput",
      "language": "python",
      "data_size": 1024,
      "concurrency": 1,
      "put_latency_ms": 0.85,
      "get_latency_ms": 0.72,
      "end_to_end_latency_ms": 1.57,
      "ops_per_second": 636.94,
      "bytes_per_second": 652224.0,
      "peak_memory_mb": 0.0,
      "memory_efficiency_ratio": 0.0,
      "p50_latency": 1.55,
      "p95_latency": 1.89,
      "p99_latency": 2.12,
      "timestamp": "2025-09-05T15:30:15Z",
      "python_version": "3.11.5",
      "cpu_count": 8,
      "total_memory_gb": 16.0
    }
  ]
}
```

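The sample values above are consistent, to rounding, with a simple relationship between the latency and throughput fields: end-to-end latency is the sum of the put and get latencies, operations per second is 1000 divided by that latency in milliseconds, and bytes per second is operations per second times the data size. The sketch below encodes that reading. It is inferred from the sample numbers only, the function name `derive_throughput` is hypothetical, and the benchmark may derive these fields differently.

```python
import math


def derive_throughput(put_latency_ms, get_latency_ms, data_size):
    """Derive throughput fields from per-operation latencies (assumed formulas)."""
    end_to_end_ms = put_latency_ms + get_latency_ms
    # One operation takes end_to_end_ms milliseconds, so 1000 / ms = ops per second.
    ops_per_second = 1000.0 / end_to_end_ms
    bytes_per_second = ops_per_second * data_size
    return {
        "end_to_end_latency_ms": end_to_end_ms,
        "ops_per_second": ops_per_second,
        "bytes_per_second": bytes_per_second,
    }


# Inputs taken from the sample result above.
derived = derive_throughput(0.85, 0.72, 1024)

# The derived values agree with the sample within rounding error.
matches_sample = (
    math.isclose(derived["ops_per_second"], 636.94, rel_tol=1e-4)
    and math.isclose(derived["bytes_per_second"], 652224.0, rel_tol=1e-4)
)
```

A check like this can serve as a quick sanity test when comparing result files across languages, since a large mismatch would suggest that the fields were measured or aggregated in a different way.
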
## Key Features

### DB-ESDK Integration
- Uses AWS Database Encryption SDK for DynamoDB with transparent encryption
- Configures attribute actions (`ENCRYPT_AND_SIGN`, `SIGN_ONLY`, `DO_NOTHING`)
- Tests ItemEncryptor operations with client-side encryption
- Uses Raw AES keyring for consistent performance testing

### ItemEncryptor Operations
- Performs `encrypt_python_item` operations using Python dict format
- Measures `decrypt_python_item` operations for consistency
- Tests realistic workloads with encryption overhead
- Supports multiple data formats (Python dict, DynamoDB JSON, DBESDK shapes)

### Performance Metrics
- **Throughput Tests**: Measures ops/sec and bytes/sec for ItemEncryptor operations
- **Memory Tests**: Tracks peak memory usage during encrypted operations using psutil
- **Concurrency Tests**: Evaluates multi-threaded performance scaling with ThreadPoolExecutor
- **Latency Analysis**: P50, P95, P99 percentiles for operation timing

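The concurrency test pattern can be sketched with `ThreadPoolExecutor` as follows. To stay self-contained, the example does not call the DB-ESDK: `stand_in_round_trip` is a placeholder XOR transform standing in for an encrypt-then-decrypt round trip, and `timed_call` and `run_concurrent` are hypothetical names rather than functions from this package.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stand_in_round_trip(payload):
    """Placeholder workload (NOT real encryption): XOR the bytes, then undo it."""
    scrambled = bytes(b ^ 0x5A for b in payload)
    return bytes(b ^ 0x5A for b in scrambled)


def timed_call(payload):
    """Run one round trip and return its latency in milliseconds."""
    start = time.perf_counter()
    result = stand_in_round_trip(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if result != payload:
        raise RuntimeError("round trip did not return the original payload")
    return elapsed_ms


def run_concurrent(payload, workers, iterations):
    """Run `iterations` round trips across `workers` threads and summarize."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies_ms = list(pool.map(timed_call, [payload] * iterations))
    wall_seconds = time.perf_counter() - wall_start
    return {
        "concurrency": workers,
        "ops_per_second": iterations / wall_seconds,
        "latencies_ms": latencies_ms,
    }


# Compare two concurrency levels on a 1 KiB payload.
for level in (1, 4):
    summary = run_concurrent(b"x" * 1024, workers=level, iterations=20)
    print(level, round(summary["ops_per_second"], 1))
```

The placeholder is CPU-bound pure Python, so under CPython's global interpreter lock it will show little or no scaling with more threads. That behaviour belongs to the placeholder and says nothing about how the real ItemEncryptor scales.
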
## Project Structure

```
python/
├── README.md                    # This file
├── pyproject.toml               # Poetry configuration and dependencies
├── tox.ini                      # Tox configuration for testing
├── src/
│   └── esdk_benchmark/
│       ├── __init__.py          # Package initialization
│       ├── __main__.py          # Module execution entry point
│       ├── program.py           # Main program and CLI
│       ├── benchmark.py         # Core benchmark implementation
│       ├── models.py            # Data models and configuration
│       └── tests.py             # Individual test implementations
├── tests/                       # Test suite
│   ├── __init__.py
│   └── test_benchmark.py
└── run_benchmark.py             # Convenience runner script
```

## Dependencies

Key dependencies used in this benchmark:

- **aws-dbesdk-dynamodb**: Core encryption functionality for DynamoDB (with legacy-ddbec extras)
- **boto3**: AWS SDK for Python (DynamoDB client operations)
- **PyYAML**: YAML configuration file processing
- **pydantic**: Data validation and settings management
- **tqdm**: Progress bars for visual feedback
- **psutil**: System and process utilities for memory monitoring
- **numpy**: Numerical operations and statistics

### Development Dependencies
- **pytest**: Testing framework
- **pytest-cov**: Coverage reporting
- **black**: Code formatting
- **flake8**: Linting
- **mypy**: Type checking
- **tox**: Testing automation
- **memory-profiler**: Memory profiling utilities

## Development

### Code Style

The project follows Python best practices with automated tooling:

```bash
# Format code
tox -e format

# Check formatting
tox -e format-check

# Lint code
tox -e lint

# Type checking
tox -e type

# Run all quality checks
tox -e lint,type,format-check
```

### Running Tests

```bash
# Run all tests
tox -e py311

# Run tests with Poetry
poetry run pytest

# Run with coverage
poetry run pytest --cov=esdk_benchmark

# Run specific test file
poetry run pytest tests/test_benchmark.py

# Run all tox environments
tox
```

### Memory Profiling

For detailed memory analysis:

```bash
# Memory profiler is included in dev dependencies
poetry run python -m memory_profiler src/esdk_benchmark/benchmark.py

# Or using tox
tox -e benchmark  # Includes memory profiler
```

### Tox Environments

Available tox environments:

- `py311`: Run tests under Python 3.11
- `lint`: Run linting checks
- `type`: Run type checking
- `format`: Apply code formatting
- `format-check`: Check code formatting
- `benchmark`: Run quick benchmark
- `benchmark-full`: Run full benchmark suite
- `verify`: Verify setup and dependencies
- `clean`: Clean up build artifacts

## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure the Poetry environment is properly set up
   ```bash
   poetry install
   poetry run python -c "import esdk_benchmark; print('✓ OK')"
   ```

2. **Configuration Not Found**: Check that the config file path is correct relative to the directory you run from. The documented `--config` default is `../../../config/test-scenarios.yaml`; if your layout differs, pass `--config` explicitly.
   ```bash
   ls ../../config/test-scenarios.yaml
   ```

3. **Memory Issues**: For large data sizes, ensure sufficient system memory is available

4. **Permission Errors**: Ensure the output directory exists and is writable. It must match the directory portion of the `--output` path in use.
   ```bash
   mkdir -p ../../results/raw-data/
   ```

5. **Poetry Issues**: If the Poetry environment is corrupted, remove and recreate it
   ```bash
   poetry env remove python
   poetry install
   ```

### Debug Mode

Enable verbose logging for troubleshooting:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Performance Comparison

This Python implementation mirrors the Java benchmark structure, enabling:

- Cross-language performance comparisons
- Consistent test scenarios and data sizes
- Standardized output format for analysis
- Similar statistical analysis and reporting

## License

This benchmark suite is part of the AWS Database Encryption SDK project and follows the same Apache-2.0 licensing terms.
