Skip to content

Conversation

@jameshadfield
Copy link
Member

@jameshadfield jameshadfield commented Feb 13, 2025

Closes #298

See commit messages for details, but there's a few parts I want to call out explicitly:

Merging approach

Specifically, merging of lists is done by extending the original list, rather than simply replacing it. We use a custom approach to decide whether new elements should be added to the list or replace already-present elements. As an example:

{
  "colorings": [
    {"key": "num_date", "title": "date", "type": "continuous"}
  ]
}
{
  "colorings": [
    {"key": "country", "title": "Country", "type": "categorical"},
    {"key": "num_date", "title": "Sampling Date", "type": "continuous"}
  ]
}

Here the second JSON's num_date coloring replaces the first JSONs one, and the country coloring is added in, so we end up with:

{
  "colorings": [
    {"key": "num_date", "title": "Sampling Date", "type": "continuous"}
    {"key": "country", "title": "Country", "type": "categorical"},
  ]
}

I think this is the only practical way to combine colorings in multiple configs, but we also use lists for many other options, e.g. 'panels', 'maintainers'. This PR still uses the same approach, but I'm less convinced for non-coloring options such as these whether we should extend lists or simply replace them.

Note that we don't really do deep merging here at all. This only applies in one situation as far as I can tell, config.display_defaults.panels: list[str] where overlay configs will completely replace the original list, and this is different from how config.panels currently behaves.

Refactoring of warnings code

This changed all usages of (export v2's) warn() to warning(), which in practice only changes the stacklevel, which changes... nothing? The warning() helper function was added years ago but saw ~no usage, so I think we should either use it consistently (as done here) or remove it and just use the default warn().

Validate then merge

The approach here is to validate each config JSON independently, rather than merging then validating. (This works well since our schema has no required properties.) augur validate auspice-config-v2 hasn't been updated to allow multiple auspice configs, but we could do so if we want (requires a little refactoring to avoid circular imports). We could additionally validate the merged config if people want to, although the current implementation guarantees that valid configs produce a valid merged config.

Long term direction

This work paves the way to merging in all config-related command line args within auspice_config.py which should help clean up export_v2.py quite a bit.

I didn't add an option to write out the merged config, but I know this is a subject that's been discussed quite a bit in the context of snakemake config merging so we could add this here if wanted.

Still to do:

  • regenerate docs. I need to follow Jover's instructions here as with my setup (py 3.10) I introduce changes to every docs file.
  • changelog

@jameshadfield jameshadfield force-pushed the james/composable-auspice-config-298 branch from 0c4004e to 5a162fa Compare February 13, 2025 03:41
@jameshadfield jameshadfield force-pushed the james/composable-auspice-config-298 branch 3 times, most recently from 6f4d481 to 5911881 Compare March 25, 2025 23:26
@codecov
Copy link

codecov bot commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 54.20561% with 98 lines in your changes missing coverage. Please review.

Project coverage is 73.24%. Comparing base (8d575c3) to head (f8caa5c).
Report is 165 commits behind head on master.

Files with missing lines Patch % Lines
augur/util_support/auspice_config.py 52.43% 68 Missing and 10 partials ⚠️
augur/export_v2.py 55.55% 5 Missing and 3 partials ⚠️
augur/util_support/warnings.py 66.66% 5 Missing and 1 partial ⚠️
augur/validate.py 45.45% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1756   +/-   ##
=======================================
  Coverage   73.23%   73.24%           
=======================================
  Files          79       81    +2     
  Lines        8402     8498   +96     
  Branches     1715     1732   +17     
=======================================
+ Hits         6153     6224   +71     
- Misses       1960     1981   +21     
- Partials      289      293    +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jameshadfield jameshadfield marked this pull request as ready for review March 25, 2025 23:52
@victorlin victorlin self-requested a review March 26, 2025 18:30
Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed the first 2 commits so far, sending in comments before lab meeting

@genehack
Copy link
Contributor

We could additionally validate the merged config if people want to, although the current implementation guarantees that valid configs produce a valid merged config.

I think re-validating the merged config is a good idea — even if it's the equivalent of a "this never happens" else clause in a conditional — because if somehow an invalid config was produced, I have to imagine it would be extremely confusing to debug.

@j23414
Copy link
Contributor

j23414 commented Mar 27, 2025

Only commenting from a user perspective, not coding:

+1 For this feature, because I'd like to use it to simplify dengue's set of auspice configs

merging of lists is done by extending the original list, rather than simply replacing it.

Cool and on-board with this! And I imagine from a user's perspective, a usage case would be similar to

augur export --auspice-config auspice_config_base.json  add-serotype-specific-coloring-and-configs.json ...
augur export --auspice-config auspice_config_base.json  add-genotype-specific-coloring-and-configs.json ...

jameshadfield added a commit that referenced this pull request Mar 28, 2025
The behaviour of this function was identified as wanting in review¹ and
since we use it twice (within `export_v2.py`) and call `AugurError`
twice I opted to move all usages to the latter. Note that this changes
the printed message slightly, but it's now more consistent with other
code in the project.

¹ <#1756 (comment)>
jameshadfield added a commit that referenced this pull request Mar 28, 2025
The previous code used `warnings.warn` as well as `warning`, where the
latter was a wrapper around the former¹. We now rename the wrapper as
`warn` so that it's consistently used. Note that the only actual usage
of the previous `warning` function was removed in an earlier commit in
this series.

This change was suggested in both the PR description and by others in
review².

¹ The underlying warning category is unchanged (`UserWarning`) but we
now have a stacklevel of 2 not 1. In the current usage I don't observe
any differences.

² <#1756 (comment)>
jameshadfield added a commit that referenced this pull request Mar 28, 2025
Re-validate the merged config, as discussed in review¹, and optionally
write out the merged JSON to a file.

¹ <#1756 (comment)>
@jameshadfield jameshadfield force-pushed the james/composable-auspice-config-298 branch from 5911881 to 62457b6 Compare March 28, 2025 01:04
jameshadfield added a commit that referenced this pull request Mar 28, 2025
Re-validate the merged config, as discussed in review¹, and optionally
write out the merged JSON to a file.

¹ <#1756 (comment)>
@jameshadfield jameshadfield force-pushed the james/composable-auspice-config-298 branch from 62457b6 to 23595ee Compare March 28, 2025 01:19
@jameshadfield
Copy link
Member Author

jameshadfield commented Mar 28, 2025

Force pushed to address requested changes (see comments above) and rebased onto latest master.

I think re-validating the merged config is a good idea

Done, and also added a --output-auspice-config argument as discussed in dev chat earlier today

Read The Docs is failing as it failed to check out the branch, which is unrelated to the changes in this PR. All other CI checks pass, so it's ok to merge from that perspective.

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general merge behavior makes sense to me! My only blocking comment is to add some documentation to the help message to clarify how multiple Auspice configs are merged.

jameshadfield added a commit that referenced this pull request Apr 3, 2025
Re-validate the merged config, as discussed in review¹, and optionally
write out the merged JSON to a file.

¹ <#1756 (comment)>
@jameshadfield jameshadfield force-pushed the james/composable-auspice-config-298 branch from 23595ee to b294cbc Compare April 3, 2025 03:31
@jameshadfield
Copy link
Member Author

jameshadfield commented Apr 3, 2025

We now have new CI failures after this force-push. I think the salient parts of the CI logs are completely unrelated to this PR, implying that it's a flake8 version bump, so we should see this appearing on other PRs imminently 🫠

tests/test_flake8.py::test_flake8 ./augur/filter/include_exclude_rules.py:973:9: F824 `nonlocal replacements` is unused: name is never assigned in scope
./build/lib/augur/filter/include_exclude_rules.py:973:9: F824 `nonlocal replacements` is unused: name is never assigned in scope
FAILED
=================================== FAILURES ===================================
_________________________________ test_flake8 __________________________________
tests/test_flake8.py:9: in test_flake8
    assert result.returncode == 0, "flake8 exited with errors"
E   AssertionError: flake8 exited with errors
E   assert 1 == 0
E    +  where 1 = CompletedProcess(args=['flake8'], returncode=1).returncode
        result     = CompletedProcess(args=['flake8'], returncode=1)

I didn't see these locally with flake8 v7.1.2, but upgrading to flake8 7.2.0 (the latest available on pip) I now see

$ python3 -m pytest --no-cov tests/test_flake8.py 
======================================================================================== test session starts =========================================================================================
platform darwin -- Python 3.11.11, pytest-8.3.5, pluggy-1.5.0 -- /Users/naboo/miniconda3/envs/augur-py11/bin/python3
cachedir: .pytest_cache
rootdir: /Users/naboo/github/nextstrain/augur
configfile: pytest.ini
plugins: cov-6.0.0, anyio-4.9.0, mock-3.14.0
collected 1 item                                                                                                                                                                                     

tests/test_flake8.py::test_flake8 ./augur/filter/include_exclude_rules.py:973:9: F824 `nonlocal replacements` is unused: name is never assigned in scope
FAILED

============================================================================================== FAILURES ==============================================================================================
____________________________________________________________________________________________ test_flake8 _____________________________________________________________________________________________
tests/test_flake8.py:9: in test_flake8
    assert result.returncode == 0, "flake8 exited with errors"
E   AssertionError: flake8 exited with errors
E   assert 1 == 0
E    +  where 1 = CompletedProcess(args=['flake8'], returncode=1).returncode
        result     = CompletedProcess(args=['flake8'], returncode=1)
========================================================================================= 1 failed in 1.04s ==========================================================================================

@victorlin victorlin mentioned this pull request Apr 3, 2025
4 tasks
The behaviour of this function was identified as wanting in review¹ and
since we use it twice (within `export_v2.py`) and call `AugurError`
twice I opted to move all usages to the latter. Note that this changes
the printed message slightly, but it's now more consistent with other
code in the project.

¹ <#1756 (comment)>
The previous code used `warnings.warn` as well as `warning`, where the
latter was a wrapper around the former¹. We now rename the wrapper as
`warn` so that it's consistently used. Note that the only actual usage
of the previous `warning` function was removed in an earlier commit in
this series.

This change was suggested in both the PR description and by others in
review².

¹ The underlying warning category is unchanged (`UserWarning`) but we
now have a stacklevel of 2 not 1. In the current usage I don't observe
any differences.

² <#1756 (comment)>
This is in preparation for another util file wanting to have access
to these files which would otherwise result in a circular dependency if
they remained within `export_v2.py`. Over time we will hopefully make
use of these functions in other parts of Augur.

Note that `deprecationWarningsEmitted` is now a function and we've
updated the comment in `configure_warnings()` to reflect that it can
be used in any augur command, not just export.
This centralises auspice config parsing code and handles all
deprecations at the time the config is parsed, rather than than
scattering them through the `export_v2.py` code. This makes the export
code easier to follow and is a pre-requisite for (upcoming) config file
overlays/merging.

At least two unreported bugs were fixed:
1. 'maintainer' (deprecated) would be used instead of 'maintainers'
2. 'color_options' - any option which used a 'legendTitle' but not a
   'menuTitle' would trigger an uncaught `KeyError`

The `read_config` function (now named `read_json`) is moved out of
`utils.py` (without changes) so that the majority of config-related code
is now in one place.
The structure of (display) defaults changed between auspice configs v1 &
v2. The top-level key was changed ('defaults' → 'display_defaults') and
the internal keys were also changed (e.g. 'colorBy' → 'color_by').

The previously released code remapped both levels of keys (good!) but
also incorrectly remapped the deprecated internal key names for a
v2-like top-level 'display_defaults'. Such a structure is invalid, and
would have failed validation.

Here we restore the intended behaviour which updates the structure of
config['defaults'] at the time we move this deprecated structure to the
(now current) config['display_defaults']
There are a large number of modifications and special-casing around
'author'/'authors' and 'numdate'/'num_date'.

If author or temporal information is set on the metadata / node-data
JSONs (under "author" or "authors", "numdate" or "num_date") we'll
always export it on nodes in the tree. This means the need to specify
author[s] or numdate/num_date in "metadata columns" is a no-op and we
can simplify things by filtering them all out.

For colorings we preserve the existing behaviour of renaming "authors"
to "author".

Going forward a big simplification would be to move the command-line
parsing into the `read_auspice_config` function and generate a unified
config structure ahead of time, but for now I'm only focusing on the
config JSON.
This refactors and updates code in preparation for supporting multiple
auspice config JSONs
The merging approach implemented here is specific to the config files
and differs from other merging approaches we may use in the group.
Specifically, when lists (e.g. 'colorings') are found in multiple
configs we don't replace the first list but rather extend it, and use a
customised approach to choosing when to replace elements in the original
list with an updated version (e.g. for 'colorings' if the 'key' of an
individual coloring matches a previously seen one it will replace it in
the list).
Re-validate the merged config, as discussed in review¹, and optionally
write out the merged JSON to a file.

¹ <#1756 (comment)>
Following instructions in <#1762>
@jameshadfield jameshadfield force-pushed the james/composable-auspice-config-298 branch from b294cbc to f8caa5c Compare April 3, 2025 21:53
@jameshadfield jameshadfield merged commit 51d27d0 into master Apr 3, 2025
36 checks passed
@jameshadfield jameshadfield deleted the james/composable-auspice-config-298 branch April 3, 2025 22:22
victorlin added a commit that referenced this pull request Apr 10, 2025
@victorlin victorlin mentioned this pull request Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

implement composable auspice config file for augur export v2

6 participants