
Conversation


@jucor jucor commented Nov 25, 2025

Summary

  • Fix the Python PCA to use column means for missing votes (instead of 0), matching the slightly healthier Clojure behavior: biasing towards the average rather than towards zero. Still not ideal, and we'll adopt a better approach later, but it's a first step.
  • Add regression tests comparing Python PCA output against Clojure math_blob
  • Add unit tests verifying NaN handling uses column means

Background

Investigation revealed that Python and Clojure PCA implementations differed significantly:

  • Root cause: Python filled NaN with 0, while Clojure fills with the column mean
  • Impact: VW dataset had 16° and 52° angle differences in PC1/PC2

Changes

Core fix (pca.py)

  • pca_project_dataframe: Replace NaN with column means instead of 0 (see the sketch after this list)
  • powerit_pca: Add ValueError if NaN values reach it (enforces preprocessing)
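
A minimal sketch of the two changes above (illustrative only; names follow the PR but the actual pca.py code may differ):

import numpy as np

def fill_missing_with_column_means(matrix_data):
    # Replace NaN (missing votes) with each comment's mean vote instead of 0
    col_means = np.nanmean(matrix_data, axis=0)
    return np.where(np.isnan(matrix_data), col_means, matrix_data)

def powerit_pca(matrix_data, *args, **kwargs):
    # powerit_pca now refuses NaNs outright; callers must fill them first
    if np.isnan(matrix_data).any():
        raise ValueError("powerit_pca received NaN values; fill missing votes first")
    ...  # power iteration proceeds on a fully dense matrix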

New tests

  • test_pca_unit.py: Two tests verifying column-mean NaN handling
  • test_legacy_clojure_regression.py: test_pca_components_match_clojure comparing Python vs Clojure PCA

Results after fix

| Dataset | PC1 correlation | PC1 angle | PC2 correlation | PC2 angle |
|---|---|---|---|---|
| VW | 1.0000 | 0.0° | -1.0000 | 0.0° |
| Biodiversity | 0.9920 | 7.1° | -0.9977 | 3.6° |

The small biodiversity discrepancy is within numerical tolerance for power iteration on high-sparsity (82%) data.
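
For context, the correlation and angle columns above can be computed along these lines (a sketch; the actual test_pca_components_match_clojure implementation may differ in details such as sign handling):

import numpy as np

def compare_components(py_comp, clj_comp):
    # Pearson correlation between the two component vectors
    corr = np.corrcoef(py_comp, clj_comp)[0, 1]
    # Angle between the vectors, normalized so length does not matter and
    # ignoring sign flips (PCA components are only defined up to sign)
    cos_sim = abs(np.dot(py_comp, clj_comp)) / (np.linalg.norm(py_comp) * np.linalg.norm(clj_comp))
    angle_deg = np.degrees(np.arccos(np.clip(cos_sim, 0.0, 1.0)))
    return corr, angle_deg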

Test plan

  • pytest tests/test_pca_unit.py -k nan_handling - NaN handling unit tests pass
  • pytest tests/test_legacy_clojure_regression.py::test_pca_components_match_clojure - Clojure comparison passes
  • Full test suite

🤖 Generated with Claude Code

@jucor jucor changed the title Fix PCA NaN handling to match Clojure implementation Fix PCA NaN handling for slightly better practice Nov 25, 2025

jucor commented Nov 25, 2025

Note: next steps will be to:

  • fix the test runner that can't load -- something probably changed under our feet :)
  • try comparison on other convos (might be useful to add that possibility in the test_regression, in a clean optional way that does not slow down the CI...)
  • find a way to get the vote matrix as Clojure uses it, to check whether there's any difference in the input to the PCA, given there's a small error on biodiversity. I want to check if it's indeed just numerical error in the computation, or if there's something else.
  • update golden record.
  • remove the now-dead code that hacks a scaling to match clojure vector norms

@jucor jucor closed this Nov 25, 2025
@jucor jucor reopened this Nov 25, 2025
@jucor jucor force-pushed the pca_work branch 3 times, most recently from ad69594 to eda5d87 on November 25, 2025 at 17:43

jucor commented Nov 25, 2025

Don't mind the failure of the Python/Python regression tests: the golden record was computed before the fix, and needs to be updated.

However:

  • 1/ Gosh the creation of the fixture on tests/test_legacy_clojure_regression.py is slow... Gotta be a way to speed this up. Polars instead of Pandas? Need to add more timers everywhere.
  • 2/ The clojure PCA comparison, which passes on VW and on biodiversity, fails on BG2050
tests/test_legacy_clojure_regression.py::test_pca_components_match_clojure[bg2050] 
[bg2050] Testing PCA components match Clojure...
  PC1 correlation: 0.994594
  PC1 angle difference: 44.89°
  PC2 correlation: -0.983012
  PC2 angle difference: 53.64°
FAILED

========================================================== FAILURES ==========================================================

FAILURE: check 44.889375337151506 <= 10.0: PC1 angle difference should be ≤10° (got 44.89°)
tests/test_legacy_clojure_regression.py:344 in test_pca_components_match_clojure() -> check.less_equal(angle_deg, 10.0,

FAILURE: check 53.64306927244186 <= 10.0: PC2 angle difference should be ≤10° (got 53.64°)
------------------------------------------------------------
Failed Checks: 2
tests/test_legacy_clojure_regression.py::test_pca_components_match_clojure[bg2050]:0: check 44.889375337151506 <= 10.0: PC1 angle difference should be ≤10° (got 44.89°)

So there's something still amiss. A sign flip can't explain the ~45° angle difference on PC1 ... or the 53.64° on PC2!


jucor commented Nov 29, 2025

Investigation: Why PCA regression tests fail for large conversations (e.g., BG2050)

Summary

The Python PCA implementation matches Clojure for small conversations but diverges significantly for large ones. After investigation, we found this is expected behavior due to different algorithms being used—and we're now confident that dropping the Clojure stochastic approach is the right call.

Root Cause

The Clojure code has a branching condition (conversation.clj:766-796): conversations exceeding 5,000 comments or 10,000 participants switch from exact PCA to a stochastic mini-batch approximation.

| Dataset | Comments | Algorithm used |
|---|---|---|
| vw, biodiversity | < 5,000 | Exact PCA (same as Python) ✅ |
| BG2050 | 7,757 | Stochastic mini-batch PCA ❌ |

The stochastic algorithm is clever—it uses growing-size mini-batches with a learning rate, which is exactly the pattern found in convergence proofs and optimal algorithms for stochastic Monte Carlo methods. However, I could not find any trace in the codebase of comparisons against the exact algorithm for accuracy validation.

The Scary-Looking 50° Angle Difference (and why it's not actually scary)

Initial tests showed a ~50° angle difference between Python and Clojure PCA components for BG2050, which looked alarming. But we also observed 0.99 correlation between the same components—seemingly contradictory.

Resolution: The stochastic PCA doesn't re-normalize its components after blending iterations. The mini-batch update does:

new_component = 0.99 * old_component + 0.01 * batch_component

A weighted average of unit vectors is not a unit vector. After 100 iterations, the Clojure vectors have norms ~0.6-0.7 instead of 1.0.
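
A tiny NumPy illustration (not the Clojure code) of why the blend shrinks the norm:

import numpy as np

u = np.array([1.0, 0.0])                       # unit vector
w = np.array([np.cos(0.5), np.sin(0.5)])       # another unit vector, ~29° away
blend = 0.99 * u + 0.01 * w
print(np.linalg.norm(blend))                   # < 1.0: the convex blend is shorter

# Repeating the update with noisy unit-length "batch" estimates compounds the effect
rng = np.random.default_rng(0)
v = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    batch = v + rng.normal(scale=0.5, size=3)  # noisy estimate of the component
    batch /= np.linalg.norm(batch)             # the batch estimate is unit-length...
    v = 0.99 * v + 0.01 * batch                # ...but the blend is never re-normalized
print(np.linalg.norm(v))                       # drifts below 1.0 (how far depends on the noise)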

My angle computation assumed unit vectors (as exact PCA produces), which gave misleading results. Once properly normalized:

  • The vectors point in nearly the same direction (high correlation confirmed)
  • The ~40% norm difference would cause non-uniform scaling of the opinion map—not ideal, but not catastrophic
  • BG2050 results weren't wrong, just slightly inaccurate

Decision

Drop the stochastic PCA codepath.

The exact PCA in Python (currently custom, but soon backed by efficient NumPy) will handle what used to be computationally challenging for Clojure. If we ever need to optimize for truly massive conversations, we can revisit with:

  • Proper comparison tests between algorithms
  • Battle-tested libraries (scikit-learn's incremental PCA, etc.; see the sketch after this list)
  • Actual validation that approximations match the exact solution
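
If that optimization is ever needed, the battle-tested option named above (scikit-learn's IncrementalPCA) would look roughly like this; a sketch only, assuming missing votes are already filled, and not part of this PR:

import numpy as np
from sklearn.decomposition import IncrementalPCA

def incremental_projection(filled_votes: np.ndarray, n_components: int = 2) -> np.ndarray:
    # Processes the participants x comments matrix in mini-batches instead of all at once
    ipca = IncrementalPCA(n_components=n_components, batch_size=1000)
    return ipca.fit_transform(filled_votes)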


jucor commented Nov 29, 2025

Now that we have verified the PCA algorithm and trust it, next steps:

  • Re-record Golden records with the fixed PCA we have. Remember from previous commits: in addition to the clojure comparison discussed above, we had also fixed an issue on how the Python PCA handles the NaN, and verified that this tracks with Clojure for small convos. NB: for large convos (not in the committed real_data), changing the PCA result will change the clustering, of course. But that clustering isn't really checked for regression wrt Clojure anyway (yet).
  • Optional: modify the legacy Clojure comparison to be more lenient on large convos. Not ideal, but allows it to pass on the large convos. Then again, those are not committed to this repo and are used for debug only. (Cancelled)
  • Then move to sklearn PCA, and see how much faster that is ;-)


jucor commented Nov 29, 2025

Parallelized test execution across datasets

Added xdist_group markers to enable parallel test execution by dataset. This significantly speeds up local testing when running against many large conversations.
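
The pattern looks roughly like this (a sketch; the real make_dataset_params helper added to conftest.py may differ in details):

import pytest

def make_dataset_params(dataset_names):
    # Wrap each dataset name in pytest.param with an xdist_group marker so that,
    # with --dist=loadgroup in the pytest config, all tests for one dataset stay
    # on the same worker and its expensive Conversation fixture is built only once.
    return [
        pytest.param(name, marks=pytest.mark.xdist_group(name=name))
        for name in dataset_names
    ]

@pytest.mark.parametrize("dataset", make_dataset_params(["biodiversity", "vw"]))
def test_pca_smoke(dataset):
    ...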

Local benchmarks (Apple Silicon M1, 64GB RAM):

| Dataset | Participants | Comments | Exact PCA time |
|---|---|---|---|
| bg2050 | 6,624 | 7,730 | 15s |
| EB | 4,222 | 7,728 | 11s |
| P | 8,308 | 8,048 | 53s |

Full test suite with all local datasets: ~3 min (down from 10-12 min sequential).

Notes:

  • The speedup matters less on CI, which only uses vw and biodiversity to keep runner load light
  • For local debugging/algorithm development with large datasets, the parallel execution is very useful
  • Memory footprint is reasonable (e.g. bg2050 ~400MB for full matrix, to which we should add PCA working space -- but since we're asking for only 2 components this should be reasonable)
  • Current implementation uses manual NumPy-based PCA; switching to scikit-learn could leverage optimized BLAS/LAPACK routines for further speedup

The parallel execution infrastructure is now available if we want to enable it on CI runners in the future.

@jucor jucor changed the title Fix PCA NaN handling for slightly better practice Overhaul Python PCA, from NaN handling to SKLearn replacement Jan 5, 2026

jucor commented Jan 6, 2026

We now have fully replaced the Python port of the PCA by a simple call to SKLearn, much faster and much simpler 😄 🎉

The computations on all the datasets I've tried match (from small to very large convos, from public ones to private ones), up to some tiny numerical differences for small values, which barely register when looking at the projections as a whole.
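
For reference, a rough sketch of what the sklearn-backed projection now amounts to (illustrative names; not the actual pca_project_dataframe code):

import numpy as np
from sklearn.decomposition import PCA

def project_votes(vote_matrix, n_components=2):
    # Missing votes are filled with column means, as fixed earlier in this PR
    col_means = np.nanmean(vote_matrix, axis=0)
    filled = np.where(np.isnan(vote_matrix), col_means, vote_matrix)

    # sklearn's SVD-based PCA replaces the custom power iteration
    projections = PCA(n_components=n_components).fit_transform(filled)

    # Sparsity-aware scaling kept from the original implementation, with a floor
    # of 1 vote to avoid division by zero (mirrors Clojure's (max n-votes 1))
    n_cmnts = vote_matrix.shape[1]
    n_seen = np.maximum(np.sum(~np.isnan(vote_matrix), axis=1), 1)
    return projections / np.sqrt(n_seen / n_cmnts)[:, np.newaxis]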

Next step for this PR:

  • Check that all tests do pass, especially the regression test.
  • Record new golden snapshots, as this SKLearn implementation becomes the new standard
  • Remove the code for Clojure comparison of the PCA output; it is not needed anymore, as they now match.
  • Rebase on edge to resolve conflicts.

jucor and others added 14 commits January 6, 2026 22:06
commit 2024dde
Author: Julien Cornebise <julien@cornebise.com>
Date:   Mon Nov 24 22:14:34 2025 +0000

    Update test_powerit_pca_with_nans to expect ValueError

    The previous commit changed powerit_pca to raise ValueError when given
    NaN values, making NaN handling the caller responsibility. This test
    now verifies that behavior instead of testing graceful NaN handling.

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Claude <noreply@anthropic.com>

commit 9839770
Author: Julien Cornebise <julien@cornebise.com>
Date:   Mon Nov 24 22:14:34 2025 +0000

    Fix PCA NaN handling to use column mean (matching Clojure)

    The Python PCA was using 0 to fill missing votes, while Clojure uses
    column means. This caused significant differences in PCA components
    (16 deg and 52 deg angle differences on VW dataset).

    Changes:
    - pca_project_dataframe: Replace NaN with column means instead of 0
    - powerit_pca: Add ValueError if NaN values reach it (caller must preprocess)
    - test_regression.py: Enable ignore_pca_sign_flip for golden comparisons

    New tests:
    - test_pca_unit.py: test_nan_handling_uses_column_mean verifies column
      mean is used, test_nan_handling_differs_from_zero_fill confirms we
      are not using 0
    - test_legacy_clojure_regression.py: test_pca_components_match_clojure
      compares Python vs Clojure PCA components (correlation and angle)

    Results after fix:
    - VW: PC1/PC2 correlation 1.0/-1.0, angle 0 deg (perfect match)
    - Biodiversity: PC1/PC2 correlation 0.99/-1.0, angle 7/4 deg (minor
      numerical differences from power iteration on 82% sparse data)

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Claude <noreply@anthropic.com>

commit 4b33297
Author: Julien Cornebise <julien@cornebise.com>
Date:   Mon Nov 24 22:14:34 2025 +0000

    Test that sign flipping propagates

    If the PCA components flip sign, the projections
    must also flip sign.

commit e307a52
Author: Julien Cornebise <julien@cornebise.com>
Date:   Fri Nov 21 09:36:22 2025 +0000

    Flag an unneeded exception

commit 91a1e9c
Author: Julien Cornebise <julien@cornebise.com>
Date:   Wed Nov 19 15:49:34 2025 +0000

    Report any scaling issues in PCA projections

commit 395e636
Author: Julien Cornebise <julien@cornebise.com>
Date:   Wed Nov 19 15:25:46 2025 +0000

    Turn off the Clojure sign-flip for PCA

    Of course this breaks the comparison to the golden files, as those included
    the sign flip. So we're also giving the option in the regression test to ignore
    PCA sign-flips. That will be handy later when we work on improving the PCA implementation,
    as various PCA implementations have various sign conventions.

commit 0dc1aa7
Author: Julien Cornebise <julien@cornebise.com>
Date:   Mon Nov 24 22:14:34 2025 +0000

    Clean up unused variables and imports

    Address GitHub Copilot review comments:
    - Log superseded votes count in conversation.py instead of leaving unused
    - Remove unused p1_idx/p2_idx index lookups in corr.py
    - Remove unused all_passed variable in regression_comparer.py
    - Remove unused imports (numpy, Path, List, datetime, stats, pca/cluster functions)

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Claude <noreply@anthropic.com>
Until now, one shared object was getting reused across tests for a given dataset,
and the cache kept piling up.
Use pytest-xdist's loadgroup distribution to run tests for different
datasets on separate workers while keeping all tests for the same
dataset on one worker (preserving fixture cache efficiency).

Changes:
- Add xdist_group marker to each dataset parameter via pytest.param()
- Switch from --dist=loadscope to --dist=loadgroup in pytest config
- Clean up conftest.py marker handling

Performance: Tests now complete in ~2 minutes with 4 workers (-n4),
down from ~9 minutes before deduplication and ~4 minutes sequential
after deduplication. The xdist_group marker ensures each dataset's
Conversation object is computed only once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add reusable helpers in conftest.py for creating pytest.param objects
with xdist_group markers, enabling efficient parallel test execution
across all dataset-parametrized tests.

Changes:
- conftest.py: Add make_dataset_params() and get_available_dataset_params()
  helper functions for xdist_group marker creation
- Update pytest_generate_tests to use xdist_group markers
- Update test_pca_smoke.py, test_repness_smoke.py, test_conversation_smoke.py,
  test_legacy_repness_comparison.py, test_golden_data.py, test_pipeline_integrity.py
  to use the new helpers

Pattern: Tests parametrized by dataset now use xdist_group markers via:
- make_dataset_params(["biodiversity", "vw"]) for hardcoded lists
- get_available_dataset_params() for dynamically discovered datasets

With --dist=loadgroup (default in pyproject.toml), pytest-xdist groups
tests by dataset, ensuring fixtures are computed once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
New pytest option allows filtering tests to run on a subset of datasets:
  pytest --datasets=biodiversity          # single dataset
  pytest --datasets=biodiversity,vw       # multiple datasets

Implementation:
- pytest_generate_tests filters dynamic parametrization
- pytest_collection_modifyitems deselects tests with static parametrization
- Report header shows "Filtered to: ..." when active

Works with both:
- Dynamic parametrization (dataset fixture via pytest_generate_tests)
- Static parametrization (@pytest.mark.parametrize with get_available_dataset_params())

Also updated tests/README.md with documentation for all pytest options
including --include-local, --datasets, and parallel execution with -n.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The test file has its own pytest_generate_tests hook that was not
respecting the --datasets option. Now imports _get_requested_datasets
and make_dataset_params from conftest to properly filter datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The original angle computation assumed unit-normalized vectors, which
gave misleading results when comparing Python (exact PCA, unit vectors)
with Clojure stochastic PCA for large conversations (non-unit vectors
due to learning rate blending without re-normalization).

This discrepancy only affects large conversations (>5k comments or >10k
participants), as Clojure uses exact PCA for smaller conversations.

Now outputs both the raw angle and the properly normalized angle,
plus the norms of both vectors for diagnostics. The assertion now
checks the normalized angle, which correctly identifies when vectors
point in similar directions regardless of their magnitudes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
As discussed above, the alignment with clojure was an AI-hallucinated kludge
to pass tests by customized scaling tailored to each dataset, hiding the problems
with the Python PCA implementation (the handling of NaNs). Now that we have fixed
those, we can remove that madness and update the golden records accordingly.
Note: we will update the golden records in a later commit.
I took care to keep the "sparsity-aware" scaling, computed in a vectorized way.
jucor and others added 11 commits January 6, 2026 22:16
It's not needed anymore: the PCA is done by Sklearn, and the edge cases
are already handled by wrapped_pca caller, i.e. pca_project_dataframe.

Less code, less complexity, less maintenance.
Cluster centers are derived from PCA projections, so they inherit the
sign ambiguity. This fix ensures sign flips are detected and corrected
for `.center` paths in addition to `.pca.comps` and `.proj.` paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, sign flips were detected per-projection-vector, which fails
when only some components are flipped (e.g., PC1 unchanged, PC2 flipped).

Now detects flips at the component level (.pca.comps[N]) and stores them,
then applies per-dimension correction to projections and cluster centers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… tests

TLDR: it works! Differences were just numerical errors, artificially looking
huge on small values. Looking at the whole cloud of projections confirmed that
the post-sklearn results match the pre-sklearn results perfectly well.

Problem:
--------
The regression comparer was failing on datasets like FLI and pakistan with
thousands of "differences" showing relative errors up to 874%. Investigation
revealed these were false positives: the high relative errors occurred on
near-zero values (e.g., golden=4.54e-06 vs current=4.42e-05) where even tiny
absolute differences (3e-04) produce huge relative errors.

Diagnosis:
----------
Comparing sklearn SVD-based PCA against power iteration golden snapshots:
- The projection point clouds are visually identical (see scatter plots)
- Important values (Q70-Q100 percentile) match within 0.2%
- Only near-zero values (Q0-Q7 percentile, ~0.0x median) show large rel errors
- These near-zero values represent participants at the origin who don't
  affect visualization or clustering

The element-wise (abs_tol, rel_tol) approach fundamentally cannot handle
this case: it either fails on small values or is too loose for large values.

Solution:
---------
Added projection comparison metrics that measure what actually matters:

| Metric                | Threshold | What it measures                    |
|-----------------------|-----------|-------------------------------------|
| Max |error| / range   | < 1%      | Worst displacement as % of axis    |
| Mean |error| / range  | < 0.1%    | Average displacement as % of axis  |
| R² (all coordinates)  | > 0.9999  | Variance explained (99.99%)        |
| R² (per dimension)    | > 0.999   | Per-PC fit quality (99.9%)         |
| Procrustes disparity  | < 1e-4    | Shape similarity after alignment   |

Results for FLI dataset:
- Max |error| / range: 0.0617% (was flagging 28% rel error on Q1 values)
- R²: 0.9999992
- Procrustes: 8.2e-07

All 7 local datasets now pass regression tests.

Also added:
- Quantile context in error reports (computed from ALL values, not just failures)
- Explanation when element-wise diffs exist but metrics confirm match
Re-recorded golden snapshots using the new sklearn SVD-based PCA.
These snapshots now serve as the baseline for regression tests.
Tests that only verify sorting/moderation now use recompute=False to
skip unnecessary PCA computation. Manager tests that need PCA but use
minimal data have targeted filterwarnings with explanations.

Changes:
- Add recompute=False to sorting tests (no PCA needed for ID ordering)
- Add recompute=False to test_moderation (only checks filtering)
- Add @pytest.mark.filterwarnings to TestConversationManager tests
  that use minimal data through the manager interface

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sklearn PCA implementation has been validated to match Clojure.
Regressions are now caught by test_regression.py golden snapshots.

This test file will be removed once all other Clojure comparison
tests are similarly validated and covered by regression tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ion.py

- Delete test_golden_data.py: redundant with test_legacy_clojure_regression.py
- Change @Skip to @xfail for test_group_clustering and test_comment_priorities
  so they run and document expected failures rather than being silently skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jucor commented Jan 6, 2026

Rebase on edge completed

Rebased this PR on top of edge to incorporate the squash-merged changes from PRs #2312 and #2316.

What happened

This branch was originally built on top of two other PRs (#2312 and #2316) that have since been squash-merged into edge.

Because they were squash-merged (not regular-merged), the original commits disappeared but the changes remained in edge, causing potential conflicts.

Resolution

Before rebase: 48 commits
After rebase: 24 commits

Dropped 24 commits that were already present in edge via the squash merges.

Conflict resolved

File: delphi/polismath/regression/utils.py

Nature of conflict: This PR added include_local=True to a list_available_datasets() check in load_golden_snapshot(). However, PR #2312 subsequently removed that check entirely because it's now redundant—get_dataset_info() internally calls discover_datasets(include_local=True), so the validation happens automatically.

Resolution: Accepted edge's cleaner version which delegates the validation to get_dataset_info().


🤖 Generated with Claude Code

@jucor jucor marked this pull request as ready for review January 6, 2026 22:33

Copilot AI left a comment


Pull request overview

This PR overhauls the Python PCA implementation by replacing a custom power iteration algorithm with sklearn's PCA and fixing NaN handling to use column means instead of zeros, bringing Python closer to Clojure behavior.

Key Changes

  • PCA Implementation: Replaced ~500 lines of custom power iteration code with sklearn's PCA (simpler, more maintainable, better tested)
  • NaN Handling: Changed from filling missing votes with 0 to using column means, reducing bias in covariance estimates
  • Testing Infrastructure: Added PCA sign flip tolerance to regression tests, added unit tests for NaN handling, and improved test parallelization with xdist groups

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
|---|---|
| delphi/polismath/pca_kmeans_rep/pca.py | Core change: replaced custom PCA with sklearn, changed NaN fill from 0 to column mean, simplified projection logic |
| delphi/polismath/regression/comparer.py | Added PCA sign flip detection, outlier tolerance, projection metrics for regression testing |
| delphi/tests/test_pca_unit.py | Removed 200+ lines of custom PCA tests, added 2 focused tests for NaN handling with column means |
| delphi/tests/test_pca_edge_cases.py | Removed tests for custom power iteration functions that no longer exist |
| delphi/tests/test_regression.py | Added tests for PCA sign flip handling and scaling factor detection in comparison logic |
| delphi/tests/test_legacy_clojure_regression.py | Changed to class-based tests with better fixture scoping, added placeholder for deprecated PCA test |
| delphi/tests/conftest.py | Added xdist_group markers for parallel execution, dataset filtering via --datasets flag |
| delphi/tests/test_golden_data.py | Deleted obsolete test file that was marked as xfail |
| delphi/tests/test_conversation.py | Added recompute=False to tests that only verify sorting/moderation logic (performance optimization) |
| delphi/scripts/regression_recorder.py | Added --include-local flag support |
| delphi/scripts/regression_comparer.py | Added CLI options for PCA sign flip tolerance, outlier handling, and local datasets |
| delphi/pyproject.toml | Added pytest config for xdist loadgroup distribution |
| delphi/polismath/benchmarks/bench_pca.py | New benchmark script for profiling PCA performance |
| Various smoke/regression test files | Updated imports to use conftest helper functions for dataset parametrization |


n_cmnts = matrix_data.shape[1]
n_seen = np.sum(~np.isnan(matrix_data), axis=1) # Count non-NaN votes per participant
proportions = np.sqrt(n_seen / n_cmnts)
scaled_projections = projections / proportions[:, np.newaxis]

Copilot AI Jan 6, 2026


Potential division by zero: If a participant has voted on zero comments (all NaN values), n_seen will be 0, leading to proportions containing 0, which will cause division by zero when computing scaled_projections = projections / proportions[:, np.newaxis].

Consider adding a check to handle this edge case, for example:

proportions = np.sqrt(n_seen / n_cmnts)
proportions = np.where(proportions == 0, 1, proportions)  # Avoid division by zero

Or filter out participants with no votes before PCA computation.

Suggested change
scaled_projections = projections / proportions[:, np.newaxis]
# Avoid division by zero for participants with no seen comments
proportions = np.where(proportions == 0, 1.0, proportions)
scaled_projections = projections / proportions[:, np.newaxis]

Collaborator Author


Good catch! Fixed using the same approach as the Clojure implementation.

Clojure reference (math/src/polismath/math/pca.clj:156):

(* (Math/sqrt (/ n-cmnts (max n-votes 1)))
   [p1 p2]))

Fix applied:

n_seen_safe = np.maximum(n_seen, 1)
proportions = np.sqrt(n_seen_safe / n_cmnts)

Also added a unit test test_participant_with_no_votes to cover this edge case.

return (np.array(all_golden) if len(all_golden) > 0 else np.array([]),
np.array(all_current) if len(all_current) > 0 else np.array([]))

def _log_projection_metrics(self, golden_stages: dict, current_stages: dict) -> None:

Copilot AI Jan 6, 2026


Return type mismatch: Function _log_projection_metrics is declared to return None but actually returns False at line 1500 when len(g_all) == 0, and returns all_pass (a boolean) at line 1572. The function signature should be changed to return bool | None or Optional[bool] to match the actual implementation.

Collaborator Author


Fixed: changed return type annotation from -> None to -> bool.

# Column mean is imperfect (pulls participants toward center, assumes Gaussian data
# while votes are ternary) but is better than 0. Future work needed
# to have a more proper way to handle missing data in PCA.
col_means = np.nanmean(matrix_data, axis=0)

Copilot AI Jan 6, 2026


Edge case not handled: If an entire column is NaN, col_means will contain NaN for that column at line 47, and when filling NaN values at line 50, those NaN placeholders will remain NaN. This will cause sklearn's PCA to fail or produce invalid results. Consider adding a check after line 47 to replace any remaining NaN values in col_means with a default value (e.g., 0.0):

col_means = np.where(np.isnan(col_means), 0.0, col_means)
Suggested change
col_means = np.nanmean(matrix_data, axis=0)
col_means = np.nanmean(matrix_data, axis=0)
# Handle edge case: columns that are entirely NaN will produce NaN means.
# Replace any remaining NaNs in col_means with 0.0 to ensure finite fill values.
col_means = np.where(np.isnan(col_means), 0.0, col_means)

Collaborator Author


Thanks for flagging this edge case! After investigation, this scenario is extremely unlikely in practice.

Why it can't happen:

Both Clojure and Python build the rating matrix only from votes that exist:

  • Python uses pivot_table which only creates columns for tids that appear in votes
  • Clojure uses nm/update-nmat which similarly only adds columns for voted comments

A comment with zero votes never gets a column in the matrix - it's simply not added during matrix construction. The columns (colnames/tids) are derived from actual vote records.

When could it theoretically occur?

  1. Manually constructed test DataFrames with all-NaN columns
  2. A future refactor that changes matrix construction logic

For now, leaving as-is since the defensive check would guard against a scenario that can't occur in the current architecture. If we ever change how the matrix is built, we should revisit this.

)
assert not result["match"], "Non-constant scaled data should not match"
# Should not mention a specific scaling factor
reason = result.get("reason", "")

Copilot AI Jan 6, 2026


Variable reason is not used.

Suggested change
reason = result.get("reason", "")

Collaborator Author


Fixed: removed the unused variable and clarified the comment.

if abs(g) > 1e-10:
rel_diffs.append(abs(g - c) / abs(g))
except (TypeError, ValueError):
pass

Copilot AI Jan 6, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
pass
logger.debug(
"Skipping non-numeric difference for stats at path %s: "
"golden_value=%r, current_value=%r",
diff.get("path"),
diff.get("golden_value"),
diff.get("current_value"),
)

Collaborator Author


Added an explanatory comment. The silent pass is intentional:

all_differences can contain non-numeric entries (type mismatches, key mismatches, length mismatches, etc.) that are already recorded and will cause the comparison to fail. This loop only computes numeric error statistics, so non-numeric differences are correctly skipped.

Design rationale: For regression testing, we want to collect all differences (not fail on the first one) so users see the full picture. Structural mismatches are reported - we just can't compute "how far off" a dict is from a list.

jucor and others added 2 commits January 6, 2026 23:15
- Fix division by zero when participant has no votes (pca.py)
  Mirrors Clojure's (max n-votes 1) approach
- Fix return type annotation for _log_projection_metrics (comparer.py)
- Add explanatory comment for intentional except pass (comparer.py)
- Remove unused variable in test (test_regression.py)
- Add unit test for no-votes edge case (test_pca_unit.py)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add "Parallel Test Execution" section to regression_testing.md explaining:
- How to use -n auto for parallel execution
- How xdist_group markers keep dataset fixtures together
- When NOT to use parallel execution (database tests, debugging)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jucor commented Jan 6, 2026

Now that I've answered the GitHub Copilot review, this is ready for human review :)

