Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a4d78f85e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ALE_python/cli.py
Outdated
| acceptance_prob = math.exp( | ||
| min(0, (new_log_lk + new_log_prior) - (current_log_lk + current_log_prior)) | ||
| ) * hastings_ratio |
There was a problem hiding this comment.
Clamp the full Metropolis-Hastings ratio before acceptance
ALEmcmc_undated computes exp(min(0, ΔlogP)) * hastings_ratio, but the clamp to 1 has to be applied after multiplying by the Hastings term. With the current code, any proposal that improves the posterior but has hastings_ratio < 1 is under-accepted—for example ΔlogP=5, hastings_ratio=0.1 is accepted with probability 0.1 instead of 1. That changes the chain’s stationary distribution and biases all sampled rates/reconciliations.
Useful? React with 👍 / 👎.
ALE_python/cli.py
Outdated
| if rm >= -1: | ||
| print(f"# rate multiplier for rate {rate_name} on branch with ID {e} set to {rm}") | ||
| rate_multipliers.setdefault("rate_multiplier_" + rate_name, {})[e] = rm | ||
| else: | ||
| print(f"# rate multiplier for rate {rate_name} on branch with ID {e} to be optimized ") |
There was a problem hiding this comment.
Preserve optimized branch-multiplier requests in ALEml_undated
When users pass rate_multiplier:<rate>:<branch>:<value> with <value> < -1, this branch only logs "to be optimized" and then drops the request. Nothing later records that branch/rate pair for the minimize() call, so the run still optimizes only delta/tau/lambda/O_R and leaves the requested branch-specific multiplier at its default. That silently produces the wrong likelihood surface and output rates for workflows that rely on optimized branch multipliers.
Useful? React with 👍 / 👎.
| for rm_name, rm_dict in rate_multipliers.items(): | ||
| for e, rm_val in rm_dict.items(): | ||
| model.vector_parameter[rm_name][e] = rm_val |
There was a problem hiding this comment.
Apply MCMC fixed rate multipliers after model construction
If ALEmcmc_undated is invoked with any rate_multiplier=... option, this loop indexes model.vector_parameter[...] before construct_undated() initializes the per-branch arrays. ExODTModel.__init__ starts vector_parameter empty, and construct_undated() is what creates keys like rate_multiplier_delta, so this path raises KeyError instead of running with the requested fixed multipliers.
Useful? React with 👍 / 👎.
…p test - Remove incorrect min(0, ...) clamp from MCMC acceptance probability to match C++ behavior (exp(delta_log_posterior) * hastings_ratio) - Move MCMC rate_multiplier application after construct_undated() so per-branch vectors are initialized before indexing - Warn on unsupported per-branch rate optimization (rm < -1) - Fix bip_counts test to compare by leaf name sets instead of integer IDs, which differ between Python and C++ implementations Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 690d721958
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| model.MLRec_events.clear() | ||
| model.reset_T_to_from() |
There was a problem hiding this comment.
Preserve MCMC transfer frequencies across posterior samples
When ALEmcmc_undated runs with sample>1, this reset clears the per-sample transfer matrix before each draw, but unlike src/ALEmcmc_undated.cpp:458-466 the Python port never copies those counts into a running aggregate before _mcmc.uTs is written from model.T_to_from at ALE_python/cli.py:742-756. The transfer file therefore reflects only the last sampled reconciliation (and then divides by all samples), so posterior transfer frequencies are severely understated or disappear entirely.
Useful? React with 👍 / 👎.
ALE_python/cli.py
Outdated
| f"Total \t{model.MLRec_events.get('D', 0) / samples}\t" | ||
| f"{model.MLRec_events.get('T', 0) / samples}\t" | ||
| f"{model.MLRec_events.get('L', 0) / samples}\t" | ||
| f"{model.MLRec_events.get('S', 0) / samples}\n" |
There was a problem hiding this comment.
Report ALEevaluate totals from all sampled reconciliations
This Total row is computed from model.MLRec_events, but that dict is cleared before every sample at ALE_python/cli.py:993-995. For outputFiles=y with sample>1, the .uml_rec summary therefore reports only the final sampled reconciliation divided by samples, instead of the mean over all sampled reconciliations, so the duplication/transfer/loss/speciation counts in the output file are wrong.
Useful? React with 👍 / 👎.
| s_tree_str = model.string_parameter.get("S_with_ranks", model.string_parameter.get("S_un", Sstring)) | ||
| fout.write(f"S:\t{s_tree_str}\n") |
There was a problem hiding this comment.
Emit the ranked species tree in reconciliation outputs
The writers here fall back to S_un because construct_undated() never populates model.string_parameter['S_with_ranks'] like the C++ implementation does (src/undated.cpp:162-185). As a result, ALEml/ALEmcmc/ALEevaluate outputs write the raw input species tree instead of the branch-ID/rank annotated tree, so sampled reconciliations that refer to internal branch numbers cannot be mapped back to species-tree branches, and output_species_tree=y is similarly uninformative.
Useful? React with 👍 / 👎.
When rate_multiplier:<rate>:<branch>:<value> is passed with value < -1, the branch multiplier is now included as an extra parameter in the Nelder-Mead optimizer alongside delta/tau/lambda, matching the C++ behavior. Optimized multipliers are printed after convergence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Accumulate T_to_from counts across MCMC samples into a running aggregate instead of only writing the last sample's transfers - Accumulate MLRec_events across samples in ALEevaluate so the Total row reports mean counts, not just the last sample divided by N - Generate S_with_ranks in construct_undated() so reconciliation outputs include the branch-ID annotated species tree Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28b6755324
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ALE_python/cli.py
Outdated
| print("Reconciliation model initialised, starting DTL rate optimisation..") | ||
|
|
||
| # Optimize if not all rates are fixed | ||
| if not (delta_fixed and tau_fixed and lambda_fixed and not MLOR): |
There was a problem hiding this comment.
Include branch-multiplier requests in the optimization gate
If the caller fixes delta, tau, and lambda but asks to optimize only a per-branch multiplier via rate_multiplier=...:-2, this condition evaluates false and skips minimize() even though ml_branch_multipliers is non-empty. The later for s in ml_rm_strings path then hits an UnboundLocalError, and even without that crash the requested branch multiplier would never be optimized.
Useful? React with 👍 / 👎.
ALE_python/exodt.py
Outdated
|
|
||
| # D event | ||
| uq_resum += self.PD[e] * (self.uq[gp_i][e] * self.uq[gpp_i][e] * 2) * pp + EPSILON | ||
| if r * uq_sum < uq_resum or no_T: |
There was a problem hiding this comment.
Remove the unconditional duplication shortcut in no-transfer sampling
When tau=0 or sample_undated(..., no_T=True) is used, or no_T makes the first duplication branch win unconditionally once the code reaches this point. That skips the remaining duplication choices and the later SL/DL cases, so DL-only runs produce systematically distorted sampled reconciliations and event counts in .uml_rec.
Useful? React with 👍 / 👎.
ALE_python/ale.py
Outdated
| if self.alpha > 0: | ||
| return (Bip_count + self.alpha / self.N_Gamma | ||
| * self.Bi(self.set_sizes[g_id])) / ( |
There was a problem hiding this comment.
Handle unseen bipartitions before applying alpha smoothing
With alpha > 0, any call that scores an unobserved split (for example p_bip_by_set() on a clade absent from the sample) reaches self.set_sizes[g_id] even though the code just assigned g_id = -10 for that case. The result is a KeyError instead of the intended smoothed prior probability, which makes alpha-corrected scoring unusable on unseen bipartitions.
Useful? React with 👍 / 👎.
ALE_python/ale.py
Outdated
| if (g_id >= len(self.Dip_counts) | ||
| or parts not in self.Dip_counts.get(g_id, {}) |
There was a problem hiding this comment.
Stop calling dict methods on the Dip_counts list
If alpha/beta smoothing makes an unsampled partition selectable in random_tree(), this membership check calls .get() on self.Dip_counts, but Dip_counts is a list. The first such draw raises AttributeError, so corrected random sampling crashes as soon as it needs to pick an unobserved split.
Useful? React with 👍 / 👎.
…ts access - Include ml_branch_multipliers in the optimization gate so per-branch multiplier optimization runs even when global rates are fixed - Remove 'or no_T' shortcut from D event sampling that biased partition selection when transfers are disabled - Use .get() for set_sizes in alpha-smoothed p_bip to avoid KeyError on unseen bipartitions - Fix Dip_counts access in random_tree to use index bounds check instead of .get() on a list Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard prior_origination against zero/negative values in expovariate and _compute_exponential_log_probability to avoid ZeroDivisionError and ValueError - Clamp MH acceptance exponent at 700 to prevent OverflowError on large likelihood improvements (auto-accepts as intended) - Mark optimizer-dependent tests with @pytest.mark.slow; add 'pixi run test-fast' task that skips them (40s vs 8min) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd641f7ae9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ALE_python/cli.py
Outdated
| import random | ||
| import sys | ||
|
|
||
| import numpy as np |
There was a problem hiding this comment.
Avoid importing NumPy at CLI module import time
Because ALE_python/__main__.py imports cli.main, this top-level import makes every CLI entry point fail with ModuleNotFoundError in a plain Python environment—even ALEobserve, ALEcount, or just python -m ALE_python --help, which otherwise only need the standard library. In this patch pyproject.toml does not declare NumPy/Scipy, so users outside the Pixi environment will hit this immediately.
Useful? React with 👍 / 👎.
| if "(" in line: | ||
| tree_i += 1 | ||
| if tree_i > burnin and tree_i % every == 0: | ||
| trees.append(line) |
There was a problem hiding this comment.
Handle singleton
A; trees when observing ALE files
The C++ observe_ALE_from_file has a fallback for lines like A; and produces a one-leaf ALE object, because singleton gene families are valid inputs. Here we only keep lines containing '(', so a file containing a single-leaf tree falls through to ValueError("No trees found..."). That means ALEobserve cannot preprocess singleton families at all.
Useful? React with 👍 / 👎.
pixi.toml
Outdated
| configure = "cmake -S . -B build" | ||
| build = { cmd = "cmake --build build -j4", depends-on = ["configure"] } | ||
| python_ale = "python -m ALE_python" | ||
| test = "pytest ALE_python/tests -v" |
There was a problem hiding this comment.
Make the Pixi test task build the C++ binaries first
ALE_python/tests/conftest.py points cpp_bin at build/bin, and multiple tests spawn ALEobserve/ALEml_undated from there. The new test task runs pytest directly, so on a fresh checkout pixi run test fails with FileNotFoundError unless the user already ran pixi run build by hand. Adding a build dependency (or redirecting this task to test-fast) would make the advertised workflow work out of the box.
Useful? React with 👍 / 👎.
- Move numpy import from module-level into ALEml_undated so CLI
commands that don't need it (ALEobserve, ALEcount, etc.) work
without numpy/scipy installed
- Accept singleton trees (e.g. 'A;') in observe_ALE_from_file by
matching lines with ';' not just '('
- Make 'pixi run test' depend on build so C++ binaries are available
on fresh checkouts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a850dacbfe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ALE_python/ale_util.py
Outdated
| if "(" in line or ";" in line: | ||
| tree_i += 1 | ||
| if tree_i > burnin and tree_i % every == 0: | ||
| trees.append(line) |
There was a problem hiding this comment.
Handle
name; single-leaf inputs explicitly
observe_ALE_from_file() now treats any line containing ; as a normal Newick tree and appends it unchanged. The C++ implementation in src/ALE_util.cpp:32-43 special-cases that form, strips the delimiter, and builds a one-leaf ALE directly. With the Python port, a sample file containing A; produces a leaf named A; instead of A, so single-tip families no longer round-trip correctly and later species-name matching can fail.
Useful? React with 👍 / 👎.
ALE_python/cli.py
Outdated
| tree_i += 1 | ||
| if tree_i > burnin and tree_i % every == 0: | ||
| trees.append(line) |
There was a problem hiding this comment.
Preserve ALEadd's original
every= subsampling order
ALEadd increments tree_i before checking every, which changes which trees are merged whenever every > 1. In src/ALEadd.cpp:56-61 the filter is evaluated before the increment, so every=2 burnin=0 keeps trees 1,3,5,...; this port keeps 2,4,6,... instead. That silently changes the clade counts added to an existing .ale, so downstream posterior estimates diverge for any workflow that relies on subsampling.
Useful? React with 👍 / 👎.
ALE_python/cli.py
Outdated
| bounds = [(1e-10, 10.0)] * (len(x0) - len(ml_branch_multipliers)) | ||
| bounds += [(1e-7, 10000.0)] * len(ml_branch_multipliers) |
There was a problem hiding this comment.
Restore the original optimizer bounds for rates and O_R
These bounds cap every optimized scalar at 10.0, but the C++ optimizer allows D/T/L rates up to 100 (src/ALEml_undated.cpp:34-37) and O_R up to 1000 (src/ALEml_undated.cpp:64-67). Any family whose likelihood keeps improving above 10 will now stick on this artificial boundary and report incorrect ML rates and log-likelihoods; MLOR runs are affected most because their search space is reduced by two orders of magnitude.
Useful? React with 👍 / 👎.
…ounds - Strip ';' from singleton tree lines so leaf name is clean (e.g. 'A' not 'A;'), matching C++ ALE_util.cpp behavior - Fix ALEadd to check burnin/every before incrementing tree_i, matching C++ ALEadd.cpp subsampling order (>=burnin, then increment) - Set optimizer bounds to match C++: rates (1e-10, 100), O_R (1e-10, 1000), branch multipliers (1e-7, 10000); use 1e-6 lower bound for rates when no_T is set Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
- Validate singleton tree lines by tokenizing on C++ delimiters (comma, colon, space) and taking only the first field, rejecting lines like 'not a tree;' that aren't valid single-leaf Newick - Add build dependency to test-fast so C++ binaries are available on fresh checkouts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ately Match C++ behavior where a singleton line (e.g. 'A;') immediately builds a one-leaf ApproxPosterior via a minimal .ale state file and returns, rather than mixing singletons into the normal tree list. This prevents silent data loss with multiple singletons or KeyErrors when mixing singletons with normal trees. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arser Add argparse with subparsers for all 8 commands, providing --help at both top-level and per-command. Each subparser documents positional args, key=value options, defaults, and usage examples. The original bespoke key=value parsing is preserved for actual execution — argparse is only used for help/version display and subcommand dispatch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ivision Add _normalize_argv() preprocessor that converts --key value long options to key=value form before dispatching to the bespoke parsers. Update all subcommand epilogs to show both styles. Guard against ZeroDivisionError when sample=0 in event count output across all commands. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace all bespoke one-line usage strings with the full argparse help for each subcommand, so users see descriptions, options, and examples. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Opus-4.6's attempt at a straight port to Python. I haven't looked at this code at all, so there is probably errors. There are 15 tests that pass, though, fwiw.
I also added a pixi.toml so that
builds the C++ version from source (can rmeove the python-specific tasks of this if helpful)