Pre-compute spatial graph statistics during Data initialization#65
Merged
Instead of calculating average degree and starting positions for each spatial match, pre-compute these statistics once when loading adjacency graphs. This avoids repeated map/inject operations on graph data during password matching, improving performance by approximately 9.3%.

Performance improvement: 0.097 ms -> 0.088 ms per password (9.3% faster)
Added test coverage for previously untested Math module methods:

lg (logarithm base 2):
- Powers of 2 (exact values)
- Non-power-of-2 values (with tolerance)
- Decimal values (negative logs)

nCk (combinations):
- Edge cases (k > n, k = 0)
- Small combinations (n=5)
- Larger values (poker hands: 52 choose 5)
- Symmetry property verification
- Basic edge cases

Test count increased from 18 to 27 examples.
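The properties listed in this commit message can be illustrated with a small stand-alone sketch. The two method bodies below are assumptions written for illustration (a thin wrapper around Ruby's `Math.log2` and a multiplicative binomial formula). They are not copied from `lib/zxcvbn/math.rb`, whose implementation may differ.

```ruby
# Illustrative stand-ins, not the library's own code.

# Base-2 logarithm via Ruby's built-in Math.log2.
def lg(n)
  Math.log2(n)
end

# Number of ways to choose k items from n, using the multiplicative
# formula so every intermediate value stays an exact integer.
def nCk(n, k)
  return 0 if k > n
  return 1 if k.zero?

  result = 1
  (1..k).each do |d|
    result = result * n / d
    n -= 1
  end
  result
end

puts lg(8)         # power of 2
puts lg(0.5)       # value below 1 gives a negative logarithm
puts nCk(52, 5)    # number of five-card poker hands
puts nCk(5, 2) == nCk(5, 3) # symmetry: choosing k equals choosing n - k
```

Non-power-of-2 inputs such as `lg(10)` return inexact floats, which is why a test for them would compare within a tolerance rather than for equality.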
Created test suite for previously untested Data class covering:

Initialization:
- Dictionary loading (5 expected dictionaries)
- Adjacency graph loading (4 expected graphs)
- Trie building for all dictionaries
- Graph statistics pre-computation

Graph statistics:
- Verification of average_degree values
- Verification of starting_positions values
- Correctness checks for qwerty and keypad

Ranked dictionaries:
- Word ranking verification
- Common password frequency checks

Custom word lists:
- Dictionary addition via add_word_list
- Trie generation for custom dictionaries
- Word searchability via tries
- Empty list handling

Test count increased from 271 to 291 examples (20 new tests).
Context
The spatial entropy calculation uses graph statistics (average degree and starting positions) to compute password strength for keyboard patterns. Previously, these statistics were calculated on-demand for every spatial match by iterating through the adjacency graph data, performing map/compact/inject operations each time.
Since adjacency graphs are immutable after loading, these statistics can be computed once during initialisation and reused, eliminating redundant calculations.
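A minimal sketch of the idea, assuming each adjacency graph is a hash that maps a key to an array of neighbours, with `nil` marking an empty position. The toy graph and the stand-alone method below are hypothetical and only illustrate the approach. The real `compute_graph_stats` is a private method on the Data class and may be written differently.

```ruby
# Compute both statistics once per graph and return them keyed by graph name.
# Assumption: graphs is { name => { key => [neighbour_or_nil, ...] } }.
def compute_graph_stats(graphs)
  graphs.each_with_object({}) do |(name, graph), stats|
    # Count only real neighbours; nil entries are empty positions.
    degree_sum = graph.sum { |_key, neighbours| neighbours.compact.size }
    stats[name] = {
      average_degree: degree_sum.to_f / graph.size,
      starting_positions: graph.size
    }
  end
end

# Hypothetical toy data, not a real keyboard layout.
toy_graphs = {
  "toy" => {
    "a" => ["b", nil, nil],
    "b" => ["a", "c", nil],
    "c" => ["b", nil, nil]
  }
}

graph_stats = compute_graph_stats(toy_graphs)

# Later lookups become plain hash reads instead of a pass over the graph.
puts graph_stats["toy"][:average_degree]
puts graph_stats["toy"][:starting_positions]
```

Because the graphs do not change after loading, the cached values cannot go stale, so no invalidation logic is needed.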
Changes
Data class (`lib/zxcvbn/data.rb`):
- `compute_graph_stats` private method that pre-computes average degree and starting positions for all adjacency graphs during initialisation
- `@graph_stats` hash with structure: `{ graph_name => { average_degree:, starting_positions: } }`
- `graph_stats` reader

Math module (`lib/zxcvbn/math.rb`):
- `average_degree_for_graph` to retrieve pre-computed value from `data.graph_stats`
- `starting_positions_for_graph` to retrieve pre-computed value from `data.graph_stats`

Performance
Benchmark results (1000 iterations across 10 passwords with spatial patterns):
The improvement comes from eliminating repeated iterations over adjacency graph data during entropy calculations. All 262 tests pass, confirming correctness is maintained.
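A comparison of this kind could be sketched with Ruby's standard `Benchmark` module, as below. This is not the benchmark script used for the PR: the toy graph, the helper names, and the iteration count are assumptions, and the timings it prints will vary by machine.

```ruby
require "benchmark"

# On-demand calculation: walks the whole graph on every call.
def average_degree(graph)
  graph.sum { |_key, neighbours| neighbours.compact.size }.to_f / graph.size
end

# Average wall-clock time per call in milliseconds for the given block.
def time_per_call_ms(iterations)
  elapsed = Benchmark.realtime { iterations.times { yield } }
  elapsed * 1000.0 / iterations
end

# Hypothetical graph: 26 keys, each with two neighbours and one empty slot.
graph = ("a".."z").each_with_object({}) do |key, g|
  g[key] = [key.next, nil, key]
end

# Pre-computed once, as the PR does at load time.
cached = average_degree(graph)

on_demand_ms = time_per_call_ms(1000) { average_degree(graph) }
cached_ms = time_per_call_ms(1000) { cached }

puts format("on demand: %.5f ms per call", on_demand_ms)
puts format("cached:    %.5f ms per call", cached_ms)
```

A micro-benchmark like this isolates the statistic lookup only. The per-password figures reported for the PR cover the full matching and scoring path, so the relative gain there is smaller.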