Skip to content

Conversation

@lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Aug 28, 2025

Closes #453

Why is this the best possible solution? Were any other approaches considered?

Details in commit messages but in summary this PR greatly improves test coverage for relative references calculation, and refactors the existing function responsible for the core part of that calculation to be much clearer as to what it is doing. It's not necessarily doing the right thing but at least it's a lot easier to read and the behaviour is consistent with the old function. So one of the goals in this PR is to hopefully get a clearer idea of what pyxform should be doing exactly in regards to relative references, which may allow further simplifications.

The new function uses existing and improved object hierarchy traversal functions to find the lowest common ancestor repeat (or other relevant ancestors) instead of the old approach of splitting XPath strings then looking up the objects. A large range of a/symmetrical nesting topologies are also now able to work per e6f5d4a, among which is the case in #453.

To keep the tests simple as possible at this stage, they are written as unit tests i.e. PyxformTestCase may activate other code paths that would further confuse matters, like the application of current() to the generated relative path. But to ensure that the tests are still doing something relevant, the tests optionally run the generated Survey through ODK Validate. And to make that easier in general, there's now an envar for running validate against all PyxformTestCase tests, and that flag is also used to check these unit tests with Validate, and a GitHub actions job is added to run that check for PRs.

What are the regression risks?

While the new test cases are extensive (+477 scenarios) they generally focus on combinations of nesting up to 2 levels under the lowest common ancestor repeat, and one or two levels above that ancestor, in order to cover likely use cases and the apparent code paths. So it's possible that combinations with more nesting above or below the LCAR could still have bugs or acquire new regressions. Existing tests are preserved - except for a couple that needed some changes to work with ODK Validate as mentioned above.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

I don't think it needs docs changes, unless there's docs saying that this sort of thing doesn't work e.g. guidance that suggests using explicit XPaths instead of using ${} references.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

- probably not worth enforcing (e.g. `def f(*, k)`) but this makes it
  clearer how insert_xpaths is being used. For instance:
  - `context` is always the calling SurveyElement, except for when
    processing references for setvalue/setgeopoint where the context
    is the Survey (actually this can just be get_xpath() lookup).
  - `use_current` only used in conjunction with choice filters.
  - `reference_parent` only used when the itemset contains a pyxform
    variable reference e.g. `select_one ${ref}`.
- should not need to be any other value but this approach is hopefully
  backwards compatible in case some external code uses these classes
- this function is the core of relative paths processing but is
  difficult to understand and unnecessarily uses string manipulation
  (i.e. the object tree can be navigated using the actual objects).
- however before it can be refactored, as guidance and to avoid
  regressions, a reasonably comprehensive test suite is needed.
- existing pyxformtestcases (mainly in test_repeat.py) cover many
  topologies but since there are so many different cases that can
  result in different behaviour they have been serialised to csv
  instead of pyxformtestcases.
  - Another reason is that other code in Survey.insert_xpaths modifies
    the output from share_same_repeat_parent which introduces even more
    variables (it's already complicated).
  - Yet another reason is that the csv format of these tests allows
    visualisations for understanding the behaviour (or bug) patterns.
- many of the behaviours seem like they might be bugs and return
  (None, None) rather than a path, so in the current suite those cases
  show what seems like should be returned, but are marked as such
  ("expect_none"=1) so that an equivalent refactored function can be
  implemented; or at least it can be clearer from the git diff how the
  new function differs from the old one.
- the current test suite covers an initial but not fully exhaustive set
  of cases that:
  - seem most likely usages i.e. none of LCAR, target or source nested
    between zero and 2 levels of grouping
  - enter each apparent code path (except for the IndexError branch on
    survey.py L130 - no current tests in pyxform hit that branch)
  - seem to be valid combinations of (see docstring of
    build_survey_from_path_spec for naming conventions):
    - lowest common ancestor repeat contexts:
      - /y
      - /y/g1o
      - /y/r1o
      - /y/r1o/g2o
    - reference target paths under LCAR:
      - /a/t
      - /a/g1t/t
      - /a/r1t/t
      - /a/g1t/g2t/t
      - /a/g1t/r2t/t
      - /a/r1t/g2t/t
      - /a/r1t/r2t/t
    - reference source paths under LCAR:
      - /a/s
      - /a/g1t/s
      - /a/g1s/s
      - /a/r1s/s
      - /a/g1t/g2t/s
      - /a/g1t/g2s/s
      - /a/g1t/r2s/s
      - /a/g1s/g2s/s
      - /a/g1s/r2s/s
      - /a/r1s/g2s/s
      - /a/r1s/r2s/s
- if the inconsistencies and potential bugs can be resolved then these
  tests could be replaced with general rules-based generated tests.
- use objects directly to navigate and hierarchy rather than strings
- simplify the existing logic in a way that is consistent with old
- allow all cases where previously (None, None) was returned, to return
  a relative path (was a small amount of typos in previously failing
  cases that are corrected now that the cases are being executed)
  - it's possible that these were intentionally returning None, which
    (it seems) ultimately results in an absolute path being returned,
    but that doesn't seem to make sense because before
    share_same_repeat_parent is invoked, it's already been decided that
    a relative path is needed via _is_return_relative_path().
- about performance changes, it seems about the same timing but that's
  probably OK considering this version doesn't have a lru_cache, which
  is also maybe why it's using less peak memory.
- has been a manual pre-release step for a long time so probably about
  time to make it easier and run it as part of PR test actions.
- required some minor changes to existing tests that failed when
  ODK Validate was run - see inline comments.
- check that the survey structure in each case generates a valid XForm
@lindsay-stevens lindsay-stevens changed the title pyxform refs + relative references in repeats 453: pyxform relative references improvements Aug 29, 2025
@lindsay-stevens lindsay-stevens marked this pull request as ready for review August 29, 2025 15:06
@lindsay-stevens lindsay-stevens changed the title 453: pyxform relative references improvements 453: pyxform relative references (in repeats) improvements Aug 29, 2025
Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the CSV test format and spot checked a dozen lines to make sure they match my understanding.

I've spent enough time with the new implementations that I feel pretty confident I can explain them. I spent some time thinking about alternative approaches or further simplification but not enough to feel like I would have identified good alternatives. I also didn't spend enough time with the implementation to feel confident I haven't missed edge cases.

Given #778 (comment), I think we should merge now. That will also give some time for any issues to get found on the staging server.

@lindsay-stevens please address conflicts, consider naming suggestions and merge away!

expect_none=case["expect_none"],
)

def test_relative_paths__outer_gg(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not immediately seeing what makes these cases special -- could they also have been in the CSV? Are they maybe just to have examples to reference? I did find that helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSV has a set of 118 tests of unique 2-level (under LCAR) scenarios, repeated for 4 LCAR contexts: /y, /y/g1o, /y/r1o, /y/r1o/g2o. The extra test methods like the one highlighted (outer_gg, outer_gr, etc) are smoke tests for additional LCAR contexts that I kept separate to a) keep the diff down and b) in consideration of future test maintenance (balancing benefit of coverage with burden of updating it).

I think in theory the "LCAR does or does not have an ancestor repeat" behaviour pattern should remain the same no matter how many levels of nesting above the LCAR. The smoke test cases could be added to the CSV as additional sets pretty easily. Among these extra LCAR contexts, are there any that do or do not seem like likely use cases to add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, makes sense! Keeping them separate seems fine.

I'm realizing looking at these cases more closely that I don't fully understand reference_parent. But I don't think it's critical for merge given that all tests pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think the next battles will be understanding / testing / fixing relative references behaviour of reference_parent, use_current, and indexed-repeat.

- conflicts:
  - pyxform/question.py
  - pyxform/survey.py
  - pyxform/survey_element.py
  - tests/test_repeat.py
- no cases added or removed, just to make it easier to read/update
- sorted by the first 4 columns:
  - lcar_context
  - target_path
  - source_path
  - reference_parent
@lindsay-stevens
Copy link
Contributor Author

@ln thanks for the review, I've resolved feedback and merge conflicts. Could you please confirm if you think some more cases should be added per this comment? If not should be ok to merge.

expect_none=case["expect_none"],
)

def test_relative_paths__outer_gg(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, makes sense! Keeping them separate seems fine.

I'm realizing looking at these cases more closely that I don't fully understand reference_parent. But I don't think it's critical for merge given that all tests pass.

@lognaturel lognaturel merged commit 7fcc25f into XLSForm:master Sep 18, 2025
14 checks passed
@lindsay-stevens lindsay-stevens deleted the pyxform-refs branch September 19, 2025 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Calculation using node from nested repeat has absolute reference

2 participants