The main class for analyzing test coverage and finding duplicates.
from testiq.analyzer import CoverageDuplicateFinder
finder = CoverageDuplicateFinder(
enable_parallel=True, # Enable parallel processing
max_workers=4, # Number of worker threads
enable_caching=True # Enable result caching
)- enable_parallel (
bool, default:False) - Enable parallel processing for large test suites - max_workers (
int, default:4) - Number of worker threads when parallel processing is enabled - enable_caching (
bool, default:False) - Cache analysis results for improved performance
Add coverage data for a single test.
Parameters:
test_name- Unique identifier for the testcoverage- Dictionary mapping file paths to lists of covered line numbers
Example:
finder.add_test_coverage("test_login", {
"auth.py": [10, 11, 12, 15, 20],
"user.py": [5, 6, 7]
})Find tests with identical coverage.
Returns: List of groups, where each group contains test names with identical coverage
Example:
exact = finder.find_exact_duplicates()
# [["test_login_1", "test_login_2"], ["test_auth_a", "test_auth_b", "test_auth_c"]]Find tests that are subsets of other tests.
print(coverage.test_name) # "test_login" print(len(coverage.covered_lines)) # 3
---
## Class: CoverageDuplicateFinder
Main class for finding duplicate tests based on coverage analysis.
```python
from testiq import CoverageDuplicateFinder
finder = CoverageDuplicateFinder()Creates a new duplicate finder instance.
Add coverage data for a single test.
Signature:
def add_test_coverage(
self,
test_name: str,
coverage: Dict[str, List[int]]
) -> NoneParameters:
test_name(str): Name/identifier of the testcoverage(Dict[str, List[int]]): Dictionary mapping filenames to lists of covered line numbers
Returns: None
Example:
finder = CoverageDuplicateFinder()
finder.add_test_coverage("test_user_login", {
"auth.py": [10, 11, 12, 15, 20],
"user.py": [5, 6, 7]
})
finder.add_test_coverage("test_admin_login", {
"auth.py": [10, 11, 12, 15, 20, 30],
"user.py": [5, 6, 7],
"admin.py": [100]
})Find tests with identical coverage.
Signature:
def find_exact_duplicates(self) -> List[List[str]]Returns: List of test groups where each group has identical coverage.
Example:
finder = CoverageDuplicateFinder()
finder.add_test_coverage("test_v1", {"file.py": [1, 2, 3]})
finder.add_test_coverage("test_v2", {"file.py": [1, 2, 3]})
finder.add_test_coverage("test_v3", {"file.py": [1, 2, 3]})
finder.add_test_coverage("test_different", {"file.py": [10, 20]})
duplicates = finder.find_exact_duplicates()
# [['test_v1', 'test_v2', 'test_v3']]Use Case:
Tests with identical coverage are strong candidates for removal. Keep one test and remove the others to reduce redundancy.
Find tests where one test's coverage is completely contained in another.
Signature:
def find_subset_duplicates(self) -> List[Tuple[str, str, float]]Returns: List of tuples containing:
str: Subset test namestr: Superset test namefloat: Coverage ratio (0.0 to 1.0)
Example:
finder = CoverageDuplicateFinder()
finder.add_test_coverage("test_minimal", {"file.py": [1, 2, 3]})
finder.add_test_coverage("test_complete", {"file.py": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
subsets = finder.find_subset_duplicates()
# [('test_minimal', 'test_complete', 0.33)]Use Case:
If a test's coverage is completely contained in another test, it may be redundant unless it tests specific edge cases or uses different inputs.
Find tests with similar (but not identical) coverage using Jaccard similarity.
Signature:
def find_similar_coverage(
self,
threshold: float = 0.8
) -> List[Tuple[str, str, float]]Parameters:
threshold(float): Minimum similarity ratio (0.0 to 1.0). Default: 0.8
Returns: List of tuples containing:
str: First test namestr: Second test namefloat: Similarity score (0.0 to 1.0)
Results are sorted by similarity score (descending).
Example:
finder = CoverageDuplicateFinder()
finder.add_test_coverage("test_a", {"file.py": [1, 2, 3, 4, 5]})
finder.add_test_coverage("test_b", {"file.py": [1, 2, 3, 4, 10]})
similar = finder.find_similar_coverage(threshold=0.7)
# [('test_a', 'test_b', 0.67)] # 4 common / 6 total = 0.67Similarity Calculation:
Uses Jaccard similarity coefficient:
similarity = |A ∩ B| / |A ∪ B|
Where:
A= lines covered by test AB= lines covered by test B∩= intersection (common lines)∪= union (all unique lines)
Threshold Guidelines:
0.9-1.0: Very similar (likely duplicates)0.7-0.9: Similar (review for consolidation)0.5-0.7: Some overlap (may share common setup)< 0.5: Different tests
Generate a comprehensive markdown report of all duplicate findings.
Signature:
def generate_report(self) -> strReturns: Markdown-formatted string containing:
- Exact duplicates
- Subset duplicates (top 10)
- Similar tests (top 10, threshold 0.7)
- Summary statistics
Example:
finder = CoverageDuplicateFinder()
# Add tests...
finder.add_test_coverage("test1", {"file.py": [1, 2]})
finder.add_test_coverage("test2", {"file.py": [1, 2]})
report = finder.generate_report()
print(report)Output Example:
# Test Duplication Report
## Exact Duplicates (Identical Coverage)
Found 1 groups of tests with identical coverage:
### Group 1 (2 tests):
- test1
- test2
**Action**: Keep one test, remove 1 duplicates
## Summary
- Total tests analyzed: 2
- Exact duplicates: 1 tests can be removed
- Subset duplicates: 0 tests may be redundant
- Similar tests: 0 pairs need reviewfrom testiq import CoverageDuplicateFinder
import json
# Load coverage data
with open('coverage_data.json') as f:
coverage_data = json.load(f)
# Create finder
finder = CoverageDuplicateFinder()
# Add all tests
for test_name, test_coverage in coverage_data.items():
finder.add_test_coverage(test_name, test_coverage)
# Find duplicates
exact_dups = finder.find_exact_duplicates()
print(f"Found {len(exact_dups)} groups of exact duplicates")
for group in exact_dups:
print(f" Group: {', '.join(group)}")
# Find similar tests
similar = finder.find_similar_coverage(threshold=0.8)
print(f"\nFound {len(similar)} pairs of similar tests")
for test1, test2, similarity in similar[:5]:
print(f" {test1} ↔ {test2}: {similarity:.1%}")
# Generate full report
report = finder.generate_report()
with open('duplicate_report.md', 'w') as f:
f.write(report)
print("\nReport saved to duplicate_report.md")TestIQ is fully type-annotated. Use with mypy for type checking:
mypy your_script.pyExample with type hints:
from typing import Dict, List
from testiq import CoverageDuplicateFinder
def analyze_project_tests(coverage_file: str) -> None:
finder: CoverageDuplicateFinder = CoverageDuplicateFinder()
with open(coverage_file) as f:
data: Dict[str, Dict[str, List[int]]] = json.load(f)
for test_name, coverage in data.items():
finder.add_test_coverage(test_name, coverage)
duplicates: List[List[str]] = finder.find_exact_duplicates()
print(f"Found {len(duplicates)} duplicate groups")add_test_coverage: O(L) where L is the number of linesfind_exact_duplicates: O(N × L) where N is number of testsfind_subset_duplicates: O(N² × L)find_similar_coverage: O(N² × L)
- O(N × L) for storing all test coverage data
-
For large test suites (>1000 tests):
- Process in batches
- Use higher similarity thresholds
- Filter by file/module first
-
Memory efficiency:
- Stream large coverage files
- Clear finder after processing batches
Example for large suites:
def analyze_large_suite(coverage_file: str, batch_size: int = 100):
all_tests = load_all_tests(coverage_file)
for i in range(0, len(all_tests), batch_size):
batch = all_tests[i:i + batch_size]
finder = CoverageDuplicateFinder()
for test_name, coverage in batch.items():
finder.add_test_coverage(test_name, coverage)
# Process batch
yield finder.find_exact_duplicates()- CLI Reference - Command-line interface
- Integration Guide - Framework integration
- FAQ - Common questions