Fix #116 and #129: label import bug and matplotlib backend conflict#176

Open
jbohnslav wants to merge 5 commits into master from fix/label-import-and-matplotlib-backend

@jbohnslav
Owner

Fixes

Fix #116: add_label_to_project() silently deletes first behavior column

The old code used pd.read_csv(path, index_col=0) which assumes every CSV has a DEG-style unnamed numeric index column. External CSVs without one had their first data column silently consumed as the index.

Fix: Read without index_col and explicitly detect/drop unnamed index columns.

Tests added: Three new test cases covering DEG-style CSVs, external CSVs without index, and external CSVs with background but no index.
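
A minimal sketch of the read-then-detect approach, not the exact code in deepethogram/projects.py. The helper name `read_label_csv` and the sample column `groom` are hypothetical. The only pandas behavior relied on is that an empty header cell is read back as `Unnamed: 0`.

```python
import io

import pandas as pd


def read_label_csv(path_or_buffer):
    """Hypothetical helper: read a label CSV, dropping an unnamed index column if present."""
    # Read every column as data; do not assume the first one is an index.
    df = pd.read_csv(path_or_buffer)
    first_col = df.columns[0]
    # pandas names an empty header cell "Unnamed: 0", which marks a saved index.
    if str(first_col).startswith("Unnamed"):
        df = df.drop(columns=first_col)
    return df


# DEG-style file: the leading empty header cell holds the saved row index.
deg_csv = ",background,groom\n0,1,0\n1,0,1\n"
# External file: no index column at all.
external_csv = "background,groom\n1,0\n0,1\n"

deg_df = read_label_csv(io.StringIO(deg_csv))
external_df = read_label_csv(io.StringIO(external_csv))

# The old call, for comparison: the first data column is consumed as the index.
old_df = pd.read_csv(io.StringIO(external_csv), index_col=0)
```

Both inputs end up with the same two behavior columns, while the old `index_col=0` call leaves only `groom` for the external file.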

Fix #129: matplotlib backend conflict when importing training modules

plt.switch_backend('agg') was called at module level in flow_generator/train.py, feature_extractor/train.py, and sequence/train.py. This killed interactive matplotlib for anyone importing these modules in notebooks or scripts.

Fix: Moved the call inside the training functions. Training still uses the agg backend; importing the module no longer overrides the user's backend.
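
The pattern can be sketched as follows. `train_sketch` is a hypothetical stand-in for the real training functions, which are not reproduced here; `plt.switch_backend` and `plt.get_backend` are the only matplotlib calls assumed.

```python
import matplotlib.pyplot as plt  # importing alone does not change the backend


def train_sketch():
    """Hypothetical stand-in for a training function."""
    # Switch to the non-interactive backend only once training actually starts.
    plt.switch_backend("agg")
    fig, ax = plt.subplots()
    ax.plot([0, 1], [1, 0])
    plt.close(fig)
    return plt.get_backend()


backend_during_training = train_sketch()
```

The switch is global, so the backend stays agg after the function returns; a caller who wants an interactive backend afterwards has to switch back explicitly.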

Testing

All 28 tests pass (pytest -v -m 'not gpu'), including 3 new tests for #116.

James Robinson-Bohnslav added 2 commits March 31, 2026 07:57
…ernal CSVs

The old code used pd.read_csv(path, index_col=0) which assumes every
CSV has a DEG-style unnamed numeric index column. External CSVs without
one had their first data column silently consumed as the index.

Now reads without index_col and explicitly detects/drops unnamed index
columns. Added tests for DEG-style, external-no-index, and
external-with-background CSV formats.
…ing functions

The agg backend was set at import time in flow_generator/train.py,
feature_extractor/train.py, and sequence/train.py. This killed
interactive matplotlib for anyone importing these modules (e.g. in
notebooks or scripts that call plt.show()).

Moving the call inside the training functions preserves headless
behavior during training while allowing interactive use when importing.
@claude

claude bot commented Mar 31, 2026

Claude finished @jbohnslav's task in 1m 38s


Code Review


Both fixes are clean and address the root causes. A few observations below.

Fix #116: add_label_to_project() (deepethogram/projects.py:174-178)

The logic is correct. When pandas reads a CSV with an empty header cell (DEG-generated style: ,background,behavior1,...), it names that column "Unnamed: 0", so str(first_col).startswith("Unnamed") reliably catches it.

Minor: the first_col == "" check is unreachable in practice. Pandas normalizes empty column headers to "Unnamed: X", so this branch will never be true for CSV files. It's harmless dead code, but may be confusing to future readers.

# first_col == "" will never match; pandas always produces "Unnamed: X"
if first_col == "" or str(first_col).startswith("Unnamed"):

Re: Codex concern about named index columns: jbohnslav's response is correct. DEG always saves with df.to_csv(label_dst) (no index_label), which produces an unnamed index, so any re-read DEG file is correctly detected and has its index dropped. Named index columns (e.g. "frame") from other tools would not match the heuristic and would be kept as a behavior column, which is the right behavior here.
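
The two cases can be checked with an in-memory round trip. This is an illustrative sketch: the column names `groom` and `frame` are made up, and the `startswith` check stands in for the heuristic in projects.py.

```python
import io

import pandas as pd

labels = pd.DataFrame({"background": [1, 0], "groom": [0, 1]})


def first_column_after_round_trip(**to_csv_kwargs):
    """Write the labels to CSV in memory, read them back, and return the first column name."""
    buffer = io.StringIO()
    labels.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer).columns[0]


# Saved without index_label: the header starts with an empty cell.
unnamed = first_column_after_round_trip()
# Saved with a named index, as another tool might do.
named = first_column_after_round_trip(index_label="frame")

unnamed_is_dropped = str(unnamed).startswith("Unnamed")
named_is_dropped = str(named).startswith("Unnamed")
```

Only the file saved without `index_label` matches the heuristic; the `frame` column is left in place.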

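Tests are thorough. The three test cases cover all important paths: DEG-style CSVs, external CSVs without an index, and external CSVs with a background column but no index.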

Fix #129 — matplotlib backend (feature_extractor/train.py, flow_generator/train.py, sequence/train.py)

Correct fix. Moving plt.switch_backend("agg") from module level to the first line of the training function means importing these modules no longer clobbers the active backend (fixing notebook/interactive use), while training runs still use agg.

One thing worth being aware of: if a caller sets up a non-agg backend before calling a training function, it will be silently overridden. This is intentional (training shouldn't try to display windows), but it might be worth a comment so it's not mistakenly "fixed" in the future:

# Use non-interactive backend for training; don't move this to module level
plt.switch_backend("agg")

This is entirely a style suggestion, not a bug.

pyproject.toml

Version bump 0.3.0 → 0.4.0 is appropriate given the fixes address data-correctness and API-behavior bugs.

Overall

The PR is well-scoped, the fixes address root causes rather than symptoms, the tests clearly document the regression, and the changes are minimal. No blocking issues.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1d95860d8

Comment thread deepethogram/projects.py
James Robinson-Bohnslav and others added 3 commits March 31, 2026 10:47
Development

Successfully merging this pull request may close these issues.

Error with matplotlib backend while importing train files
projects.add_label_to_project is broken