python_ALE: First stab. #48

Open
wwood wants to merge 15 commits into ssolo:master from wwood:python_ALE

Conversation

@wwood

@wwood wwood commented Mar 20, 2026

Opus-4.6's attempt at a straight port to Python. I haven't looked at this code at all, so there are probably errors. There are 15 tests that pass, though, fwiw.

I also added a pixi.toml so that

pixi run build

builds the C++ version from source (can remove the Python-specific tasks of this if helpful)


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a4d78f85e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +644 to +646
acceptance_prob = math.exp(
min(0, (new_log_lk + new_log_prior) - (current_log_lk + current_log_prior))
) * hastings_ratio

P1 Badge Clamp the full Metropolis-Hastings ratio before acceptance

ALEmcmc_undated computes exp(min(0, ΔlogP)) * hastings_ratio, but the clamp to 1 has to be applied after multiplying by the Hastings term. With the current code, any proposal that improves the posterior but has hastings_ratio < 1 is under-accepted—for example ΔlogP=5, hastings_ratio=0.1 is accepted with probability 0.1 instead of 1. That changes the chain’s stationary distribution and biases all sampled rates/reconciliations.
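A minimal sketch of the fix described above (helper name hypothetical, not the PR's actual function): compute the full Metropolis-Hastings ratio first, then clamp it at 1.

```python
import math

def mh_acceptance_prob(delta_log_post: float, hastings_ratio: float) -> float:
    # Hypothetical helper: clamp the exponent so large improvements cannot
    # overflow, then clamp the *full* product at 1 (not just the posterior
    # term, as the reviewed code did).
    ratio = math.exp(min(700.0, delta_log_post)) * hastings_ratio
    return min(1.0, ratio)
```

With the review's example, delta_log_post=5 and hastings_ratio=0.1, this accepts with probability 1 rather than 0.1.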


Comment on lines +210 to +214
if rm >= -1:
print(f"# rate multiplier for rate {rate_name} on branch with ID {e} set to {rm}")
rate_multipliers.setdefault("rate_multiplier_" + rate_name, {})[e] = rm
else:
print(f"# rate multiplier for rate {rate_name} on branch with ID {e} to be optimized ")

P2 Badge Preserve optimized branch-multiplier requests in ALEml_undated

When users pass rate_multiplier:<rate>:<branch>:<value> with <value> < -1, this branch only logs "to be optimized" and then drops the request. Nothing later records that branch/rate pair for the minimize() call, so the run still optimizes only delta/tau/lambda/O_R and leaves the requested branch-specific multiplier at its default. That silently produces the wrong likelihood surface and output rates for workflows that rely on optimized branch multipliers.


Comment on lines +568 to +570
for rm_name, rm_dict in rate_multipliers.items():
for e, rm_val in rm_dict.items():
model.vector_parameter[rm_name][e] = rm_val

P2 Badge Apply MCMC fixed rate multipliers after model construction

If ALEmcmc_undated is invoked with any rate_multiplier=... option, this loop indexes model.vector_parameter[...] before construct_undated() initializes the per-branch arrays. ExODTModel.__init__ starts vector_parameter empty, and construct_undated() is what creates keys like rate_multiplier_delta, so this path raises KeyError instead of running with the requested fixed multipliers.
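A sketch of the ordering fix, with a minimal hypothetical stand-in for ExODTModel: build the per-branch vectors first, then overwrite the requested entries.

```python
# Hypothetical stand-in for ExODTModel: vector_parameter starts empty,
# and construct_undated() is what creates the per-branch dicts.
class Model:
    def __init__(self):
        self.vector_parameter = {}

    def construct_undated(self, n_branches):
        for name in ("rate_multiplier_delta", "rate_multiplier_tau",
                     "rate_multiplier_lambda"):
            self.vector_parameter[name] = {e: 1.0 for e in range(n_branches)}

model = Model()
model.construct_undated(n_branches=4)  # must run before any indexing
rate_multipliers = {"rate_multiplier_delta": {2: 0.5}}
for rm_name, rm_dict in rate_multipliers.items():
    for e, rm_val in rm_dict.items():
        model.vector_parameter[rm_name][e] = rm_val  # no KeyError now
```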


…p test

- Remove incorrect min(0, ...) clamp from MCMC acceptance probability
  to match C++ behavior (exp(delta_log_posterior) * hastings_ratio)
- Move MCMC rate_multiplier application after construct_undated() so
  per-branch vectors are initialized before indexing
- Warn on unsupported per-branch rate optimization (rm < -1)
- Fix bip_counts test to compare by leaf name sets instead of integer
  IDs, which differ between Python and C++ implementations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wwood

wwood commented Mar 20, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 690d721958


Comment on lines +689 to +690
model.MLRec_events.clear()
model.reset_T_to_from()

P1 Badge Preserve MCMC transfer frequencies across posterior samples

When ALEmcmc_undated runs with sample>1, this reset clears the per-sample transfer matrix before each draw, but unlike src/ALEmcmc_undated.cpp:458-466 the Python port never copies those counts into a running aggregate before _mcmc.uTs is written from model.T_to_from at ALE_python/cli.py:742-756. The transfer file therefore reflects only the last sampled reconciliation (and then divides by all samples), so posterior transfer frequencies are severely understated or disappear entirely.
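A sketch of the aggregation the C++ code performs (function and variable names hypothetical): sum each sample's transfer counts into a running total before the per-sample reset, then divide once by the number of samples.

```python
from collections import defaultdict

def aggregate_transfers(per_sample_counts, n_samples):
    # Hypothetical sketch: accumulate each sample's (to, from) transfer
    # counts into a running aggregate, then average over all samples,
    # instead of keeping only the last sample's matrix.
    agg = defaultdict(float)
    for sample in per_sample_counts:
        for branch_pair, count in sample.items():
            agg[branch_pair] += count
    return {pair: total / n_samples for pair, total in agg.items()}

# A transfer seen in both samples keeps full weight; one seen once gets half.
samples = [{(1, 0): 2.0}, {(1, 0): 1.0, (2, 0): 1.0}]
freqs = aggregate_transfers(samples, 2)
```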


Comment on lines +1015 to +1018
f"Total \t{model.MLRec_events.get('D', 0) / samples}\t"
f"{model.MLRec_events.get('T', 0) / samples}\t"
f"{model.MLRec_events.get('L', 0) / samples}\t"
f"{model.MLRec_events.get('S', 0) / samples}\n"

P2 Badge Report ALEevaluate totals from all sampled reconciliations

This Total row is computed from model.MLRec_events, but that dict is cleared before every sample at ALE_python/cli.py:993-995. For outputFiles=y with sample>1, the .uml_rec summary therefore reports only the final sampled reconciliation divided by samples, instead of the mean over all sampled reconciliations, so the duplication/transfer/loss/speciation counts in the output file are wrong.


Comment on lines +375 to +376
s_tree_str = model.string_parameter.get("S_with_ranks", model.string_parameter.get("S_un", Sstring))
fout.write(f"S:\t{s_tree_str}\n")

P2 Badge Emit the ranked species tree in reconciliation outputs

The writers here fall back to S_un because construct_undated() never populates model.string_parameter['S_with_ranks'] like the C++ implementation does (src/undated.cpp:162-185). As a result, ALEml/ALEmcmc/ALEevaluate outputs write the raw input species tree instead of the branch-ID/rank annotated tree, so sampled reconciliations that refer to internal branch numbers cannot be mapped back to species-tree branches, and output_species_tree=y is similarly uninformative.


wwood and others added 2 commits March 20, 2026 17:20
When rate_multiplier:<rate>:<branch>:<value> is passed with value < -1,
the branch multiplier is now included as an extra parameter in the
Nelder-Mead optimizer alongside delta/tau/lambda, matching the C++
behavior. Optimized multipliers are printed after convergence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Accumulate T_to_from counts across MCMC samples into a running
  aggregate instead of only writing the last sample's transfers
- Accumulate MLRec_events across samples in ALEevaluate so the Total
  row reports mean counts, not just the last sample divided by N
- Generate S_with_ranks in construct_undated() so reconciliation outputs
  include the branch-ID annotated species tree

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wwood

wwood commented Mar 20, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28b6755324


print("Reconciliation model initialised, starting DTL rate optimisation..")

# Optimize if not all rates are fixed
if not (delta_fixed and tau_fixed and lambda_fixed and not MLOR):

P1 Badge Include branch-multiplier requests in the optimization gate

If the caller fixes delta, tau, and lambda but asks to optimize only a per-branch multiplier via rate_multiplier=...:-2, this condition evaluates false and skips minimize() even though ml_branch_multipliers is non-empty. The later for s in ml_rm_strings path then hits an UnboundLocalError, and even without that crash the requested branch multiplier would never be optimized.
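A sketch of a gate that also checks the branch-multiplier list (all names hypothetical):

```python
def should_run_optimizer(delta_fixed, tau_fixed, lambda_fixed, mlor,
                         ml_branch_multipliers):
    # Hypothetical sketch: optimize when any global rate is free, when O_R
    # is being maximized, or when any per-branch multiplier was requested
    # for optimization.
    all_global_fixed = delta_fixed and tau_fixed and lambda_fixed and not mlor
    return (not all_global_fixed) or bool(ml_branch_multipliers)
```

With delta/tau/lambda all fixed but one branch multiplier requested, the gate now stays open instead of skipping minimize().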



# D event
uq_resum += self.PD[e] * (self.uq[gp_i][e] * self.uq[gpp_i][e] * 2) * pp + EPSILON
if r * uq_sum < uq_resum or no_T:

P1 Badge Remove the unconditional duplication shortcut in no-transfer sampling

When tau=0 or sample_undated(..., no_T=True) is used, the trailing "or no_T" clause makes the first duplication branch win unconditionally once the code reaches this point. That skips the remaining duplication choices and the later SL/DL cases, so DL-only runs produce systematically distorted sampled reconciliations and event counts in .uml_rec.


Comment on lines +382 to +384
if self.alpha > 0:
return (Bip_count + self.alpha / self.N_Gamma
* self.Bi(self.set_sizes[g_id])) / (

P2 Badge Handle unseen bipartitions before applying alpha smoothing

With alpha > 0, any call that scores an unobserved split (for example p_bip_by_set() on a clade absent from the sample) reaches self.set_sizes[g_id] even though the code just assigned g_id = -10 for that case. The result is a KeyError instead of the intended smoothed prior probability, which makes alpha-corrected scoring unusable on unseen bipartitions.
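A sketch of the smoothed scoring with the guard in place (names loosely follow the snippet; the formula's stub terms are illustrative):

```python
def p_bip_smoothed(set_sizes, g_id, bip_count, alpha, n_gamma, n_obs, bi):
    # Hypothetical sketch: an unseen clade arrives with the sentinel
    # g_id = -10 and bip_count = 0, so .get() with a default size replaces
    # the KeyError on set_sizes[g_id] with the intended smoothed prior.
    size = set_sizes.get(g_id, 1)
    return (bip_count + alpha / n_gamma * bi(size)) / (n_obs + alpha)

set_sizes = {0: 4, 1: 3}            # sizes for observed clades only
bi = lambda n: 1.0                  # stub for the binary-tree-count term
p_seen = p_bip_smoothed(set_sizes, 0, 10.0, 0.5, 2.0, 20.0, bi)
p_unseen = p_bip_smoothed(set_sizes, -10, 0.0, 0.5, 2.0, 20.0, bi)
```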


Comment on lines +805 to +806
if (g_id >= len(self.Dip_counts)
or parts not in self.Dip_counts.get(g_id, {})

P2 Badge Stop calling dict methods on the Dip_counts list

If alpha/beta smoothing makes an unsampled partition selectable in random_tree(), this membership check calls .get() on self.Dip_counts, but Dip_counts is a list. The first such draw raises AttributeError, so corrected random sampling crashes as soon as it needs to pick an unobserved split.
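A sketch of the bounds-safe membership check (names illustrative):

```python
# Dip_counts is a list (indexed by clade id) of dicts keyed by partition,
# mirroring the snippet above; names are illustrative.
Dip_counts = [{("A", "B"): 3}, {}]

def partition_unseen(g_id, parts):
    # Bounds-check the list index instead of calling dict-style .get()
    # on a list, which raises AttributeError.
    return g_id >= len(Dip_counts) or parts not in Dip_counts[g_id]
```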


wwood and others added 2 commits March 20, 2026 17:59
…ts access

- Include ml_branch_multipliers in the optimization gate so per-branch
  multiplier optimization runs even when global rates are fixed
- Remove 'or no_T' shortcut from D event sampling that biased partition
  selection when transfers are disabled
- Use .get() for set_sizes in alpha-smoothed p_bip to avoid KeyError
  on unseen bipartitions
- Fix Dip_counts access in random_tree to use index bounds check
  instead of .get() on a list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard prior_origination against zero/negative values in expovariate
  and _compute_exponential_log_probability to avoid ZeroDivisionError
  and ValueError
- Clamp MH acceptance exponent at 700 to prevent OverflowError on
  large likelihood improvements (auto-accepts as intended)
- Mark optimizer-dependent tests with @pytest.mark.slow; add
  'pixi run test-fast' task that skips them (40s vs 8min)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wwood

wwood commented Mar 20, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd641f7ae9


import random
import sys

import numpy as np

P2 Badge Avoid importing NumPy at CLI module import time

Because ALE_python/__main__.py imports cli.main, this top-level import makes every CLI entry point fail with ModuleNotFoundError in a plain Python environment—even ALEobserve, ALEcount, or just python -m ALE_python --help, which otherwise only need the standard library. In this patch pyproject.toml does not declare NumPy/Scipy, so users outside the Pixi environment will hit this immediately.
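A sketch of deferring the heavy import into the one subcommand that needs it (function name hypothetical):

```python
import sys

def ALEml_undated_main(args):
    # Hypothetical sketch: import numpy inside the subcommand rather than
    # at module level, so ALEobserve/ALEcount and --help still run with
    # only the standard library.
    try:
        import numpy  # noqa: F401  (scipy would be deferred the same way)
    except ModuleNotFoundError:
        sys.exit("ALEml_undated requires numpy; install it, e.g. via pixi.")
    # ... optimization code that uses numpy goes here ...
```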


Comment on lines +60 to +63
if "(" in line:
tree_i += 1
if tree_i > burnin and tree_i % every == 0:
trees.append(line)

P2 Badge Handle singleton A; trees when observing ALE files

The C++ observe_ALE_from_file has a fallback for lines like A; and produces a one-leaf ALE object, because singleton gene families are valid inputs. Here we only keep lines containing '(', so a file containing a single-leaf tree falls through to ValueError("No trees found..."). That means ALEobserve cannot preprocess singleton families at all.


pixi.toml Outdated
configure = "cmake -S . -B build"
build = { cmd = "cmake --build build -j4", depends-on = ["configure"] }
python_ale = "python -m ALE_python"
test = "pytest ALE_python/tests -v"

P2 Badge Make the Pixi test task build the C++ binaries first

ALE_python/tests/conftest.py points cpp_bin at build/bin, and multiple tests spawn ALEobserve/ALEml_undated from there. The new test task runs pytest directly, so on a fresh checkout pixi run test fails with FileNotFoundError unless the user already ran pixi run build by hand. Adding a build dependency (or redirecting this task to test-fast) would make the advertised workflow work out of the box.
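A sketch of the suggested change, based on the quoted hunk (the [tasks] table header is assumed; only the test task changes):

```toml
[tasks]
configure = "cmake -S . -B build"
build = { cmd = "cmake --build build -j4", depends-on = ["configure"] }
python_ale = "python -m ALE_python"
test = { cmd = "pytest ALE_python/tests -v", depends-on = ["build"] }
```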


- Move numpy import from module-level into ALEml_undated so CLI
  commands that don't need it (ALEobserve, ALEcount, etc.) work
  without numpy/scipy installed
- Accept singleton trees (e.g. 'A;') in observe_ALE_from_file by
  matching lines with ';' not just '('
- Make 'pixi run test' depend on build so C++ binaries are available
  on fresh checkouts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wwood

wwood commented Mar 20, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a850dacbfe


Comment on lines +62 to +65
if "(" in line or ";" in line:
tree_i += 1
if tree_i > burnin and tree_i % every == 0:
trees.append(line)

P2 Badge Handle name; single-leaf inputs explicitly

observe_ALE_from_file() now treats any line containing ; as a normal Newick tree and appends it unchanged. The C++ implementation in src/ALE_util.cpp:32-43 special-cases that form, strips the delimiter, and builds a one-leaf ALE directly. With the Python port, a sample file containing A; produces a leaf named A; instead of A, so single-tip families no longer round-trip correctly and later species-name matching can fail.
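A sketch of reading that keeps full Newick lines as-is but cleans singleton lines (function name hypothetical; the C++ version additionally builds the one-leaf ALE directly):

```python
def read_gene_trees(lines, burnin=0, every=1):
    # Hypothetical sketch: accept both '(...);' trees and singleton
    # 'name;' lines, stripping the trailing ';' from singletons so the
    # leaf name round-trips as 'A', not 'A;'.
    trees, tree_i = [], 0
    for raw in lines:
        line = raw.strip()
        if "(" in line or line.endswith(";"):
            tree_i += 1
            if tree_i > burnin and tree_i % every == 0:
                if "(" not in line:
                    line = line.rstrip(";").strip()
                trees.append(line)
    return trees

trees = read_gene_trees(["((A,B),C);", "A;"])
```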


Comment on lines +895 to +897
tree_i += 1
if tree_i > burnin and tree_i % every == 0:
trees.append(line)

P2 Badge Preserve ALEadd's original every= subsampling order

ALEadd increments tree_i before checking every, which changes which trees are merged whenever every > 1. In src/ALEadd.cpp:56-61 the filter is evaluated before the increment, so every=2 burnin=0 keeps trees 1,3,5,...; this port keeps 2,4,6,... instead. That silently changes the clade counts added to an existing .ale, so downstream posterior estimates diverge for any workflow that relies on subsampling.
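A sketch of the C++ ordering (function name hypothetical): evaluate the filter on the current index before incrementing.

```python
def subsample_trees(lines, burnin=0, every=1):
    # Hypothetical sketch matching the C++ order: check the filter on the
    # current index *before* incrementing, so every=2, burnin=0 keeps
    # trees 1, 3, 5, ... rather than 2, 4, 6, ...
    kept, tree_i = [], 0
    for line in lines:
        if tree_i >= burnin and tree_i % every == 0:
            kept.append(line)
        tree_i += 1
    return kept

kept = subsample_trees(["t1", "t2", "t3", "t4", "t5"], every=2)
```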


Comment on lines +322 to +323
bounds = [(1e-10, 10.0)] * (len(x0) - len(ml_branch_multipliers))
bounds += [(1e-7, 10000.0)] * len(ml_branch_multipliers)

P2 Badge Restore the original optimizer bounds for rates and O_R

These bounds cap every optimized scalar at 10.0, but the C++ optimizer allows D/T/L rates up to 100 (src/ALEml_undated.cpp:34-37) and O_R up to 1000 (src/ALEml_undated.cpp:64-67). Any family whose likelihood keeps improving above 10 will now stick on this artificial boundary and report incorrect ML rates and log-likelihoods; MLOR runs are affected most because their search space is reduced by two orders of magnitude.
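A sketch of bounds matching the C++ ranges the review cites (function and parameter names hypothetical):

```python
def build_bounds(n_rates, mlor, n_branch_multipliers):
    # Hypothetical sketch of the C++ optimizer's search space: D/T/L rates
    # up to 100, O_R up to 1000, per-branch multipliers in (1e-7, 1e4).
    bounds = [(1e-10, 100.0)] * n_rates
    if mlor:
        bounds.append((1e-10, 1000.0))       # O_R
    bounds += [(1e-7, 10000.0)] * n_branch_multipliers
    return bounds

bounds = build_bounds(n_rates=3, mlor=True, n_branch_multipliers=1)
```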


…ounds

- Strip ';' from singleton tree lines so leaf name is clean (e.g. 'A'
  not 'A;'), matching C++ ALE_util.cpp behavior
- Fix ALEadd to check burnin/every before incrementing tree_i,
  matching C++ ALEadd.cpp subsampling order (>=burnin, then increment)
- Set optimizer bounds to match C++: rates (1e-10, 100), O_R (1e-10,
  1000), branch multipliers (1e-7, 10000); use 1e-6 lower bound for
  rates when no_T is set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wwood

wwood commented Mar 20, 2026

@codex review

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

wwood and others added 7 commits March 20, 2026 19:53
- Validate singleton tree lines by tokenizing on C++ delimiters
  (comma, colon, space) and taking only the first field, rejecting
  lines like 'not a tree;' that aren't valid single-leaf Newick
- Add build dependency to test-fast so C++ binaries are available
  on fresh checkouts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ately

Match C++ behavior where a singleton line (e.g. 'A;') immediately
builds a one-leaf ApproxPosterior via a minimal .ale state file and
returns, rather than mixing singletons into the normal tree list.
This prevents silent data loss with multiple singletons or KeyErrors
when mixing singletons with normal trees.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arser

Add argparse with subparsers for all 8 commands, providing --help at
both top-level and per-command. Each subparser documents positional
args, key=value options, defaults, and usage examples. The original
bespoke key=value parsing is preserved for actual execution — argparse
is only used for help/version display and subcommand dispatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ivision

Add _normalize_argv() preprocessor that converts --key value long options
to key=value form before dispatching to the bespoke parsers. Update all
subcommand epilogs to show both styles. Guard against ZeroDivisionError
when sample=0 in event count output across all commands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace all bespoke one-line usage strings with the full argparse help
for each subcommand, so users see descriptions, options, and examples.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
