
Fix bugs, add CLI args, resume support, and reproduction guide #9

Open
CH-chuan wants to merge 3 commits into zhu-minjun:main from CH-chuan:pr/upstream-fixes

Conversation


CH-chuan commented Mar 2, 2026

Summary

  • Fix 6 bugs that prevent the code from running correctly out of the box:
    • Import path (llama_pas → pas), missing PAlign/__init__.py
    • Token trimming ([:-5] → [:-1]) in both main.py and pas.py
    • Answer parsing ([3] → [-1]) for Llama-3 (see the sketch after this list)
    • rs['alpha'] storing entire result dict instead of the alpha value
    • Default batch size OOMs on ≤24GB GPUs (10 → 3)
    • Output path ./log/ never created (→ ./reproduction/)
  • Fix setup.py: broken find_packages(where='./PAlign'), add missing deps (baukit, einops, xlrd, etc.), remove unused openai, raise python_requires to >=3.10
  • Add CLI args: --num_subjects N (0=all) and --output_dir DIR
  • Add resume support: per-subject pickle checkpoints + progress JSONL, auto-skip completed subjects on restart
  • Add raw generation logging for debugging unparseable answers
  • Add REPRODUCTION_GUIDE.md: step-by-step walkthrough for reproducing Table 1
  • Add .gitignore for Python cache and generated outputs
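
For reference, a rough sketch of the two parsing-related fixes above (token trimming and Llama-3 answer extraction). The function names and call sites are illustrative, not the repo's actual code; only the `[:-1]` slice and the `split("<|end_header_id|>")[-1]` indexing come from the changes themselves:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def build_prompt_ids(messages):
    # The old code sliced [:-5], which cut real prompt tokens out of the chat
    # template output; [:-1] drops only the single trailing token.
    ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
    return ids[:, :-1]

def extract_answer(decoded_output: str) -> str:
    # Llama-3 generations end with
    #   ...<|start_header_id|>assistant<|end_header_id|>\n\n<answer><|eot_id|>
    # A hardcoded [3] breaks when the prompt contains a different number of
    # headers; [-1] always takes the text after the final (assistant) header.
    answer = decoded_output.split("<|end_header_id|>")[-1]
    return answer.replace("<|eot_id|>", "").strip()
```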

Test plan

  • pip install . succeeds in a fresh conda environment
  • python main.py --modes PAS --model_file meta-llama/Meta-Llama-3-8B-Instruct --num_subjects 5 completes without errors
  • Results saved to reproduction/PAS_Meta-Llama-3-8B-Instruct_OOD.json with valid scores
  • Interrupting and re-running resumes from last completed subject

CH-chuan added 3 commits March 1, 2026 23:11
…encies

Bug fixes:
- Fix import: PAlign.llama_pas -> PAlign.pas (module was renamed)
- Add missing PAlign/__init__.py so the package is importable
- Fix token trimming: [:-5] -> [:-1] in main.py and pas.py (was cutting
  too many tokens from chat template output)
- Fix answer parsing: split("<|end_header_id|>")[3] -> [-1] for Llama-3
  (hardcoded index breaks when prompt structure varies)
- Fix rs['alpha'] storing entire result dict instead of the alpha value
- Reduce default batch_size from 10 to 3 (OOMs on <=24GB GPUs)
- Fix output path: ./log/ -> ./reproduction/ (log/ was never created)
- Fix argparse help strings (were placeholder text)

setup.py fixes:
- Fix find_packages(where='./PAlign') which broke `import PAlign`
- Add missing dependencies: baukit, einops, openpyxl, xlrd, etc.
- Remove unused openai dependency
- Raise python_requires to >=3.10
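
Roughly what setup.py ends up looking like after these fixes (the package name, version pins, and whether baukit installs from PyPI or a git URL are assumptions, not taken from the repo):

```python
from setuptools import setup, find_packages

setup(
    name="PAlign",
    # find_packages(where='./PAlign') searched *inside* the package directory,
    # so `import PAlign` failed after install; the default root search picks up
    # the PAlign package itself (now that __init__.py exists).
    packages=find_packages(),
    python_requires=">=3.10",
    install_requires=[
        "torch",
        "transformers",
        "baukit",      # may need a git URL depending on where it is published
        "einops",
        "openpyxl",
        "xlrd",
        # "openai" removed: never imported anywhere in the code
    ],
)
```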

Also remove unused `from datasets import load_dataset` in pas.py.

New CLI arguments:
- --num_subjects N: process only the first N subjects (0=all 300)
- --output_dir DIR: configurable output directory (default: ./reproduction)
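
As argparse options, matching the defaults above (the surrounding main.py wiring is assumed):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--num_subjects", type=int, default=0,
                    help="Process only the first N subjects (0 = all).")
parser.add_argument("--output_dir", type=str, default="./reproduction",
                    help="Directory for results, checkpoints, and logs.")
args = parser.parse_args()

# However the subject list is loaded, 0 keeps everything:
# subjects = subjects if args.num_subjects == 0 else subjects[:args.num_subjects]
```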

Resume support:
- Save per-subject results as pickle files in <output_dir>/subject_results/
- On restart, automatically detect and skip completed subjects
- Append per-subject progress to <output_dir>/pas_progress.jsonl
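
A sketch of the checkpoint/resume flow (one pickle per subject, keyed by a subject ID; the helper names and the shape of `subjects` are illustrative):

```python
import json
import os
import pickle

def run_with_resume(subjects, output_dir, run_subject):
    """subjects: {subject_id: subject_data}; run_subject: callable returning a result."""
    ckpt_dir = os.path.join(output_dir, "subject_results")
    os.makedirs(ckpt_dir, exist_ok=True)
    progress_path = os.path.join(output_dir, "pas_progress.jsonl")

    results = {}
    for subject_id, subject in subjects.items():
        ckpt = os.path.join(ckpt_dir, f"{subject_id}.pkl")
        if os.path.exists(ckpt):
            # Completed in a previous run: load the checkpoint and skip.
            with open(ckpt, "rb") as f:
                results[subject_id] = pickle.load(f)
            continue
        result = run_subject(subject)
        with open(ckpt, "wb") as f:
            pickle.dump(result, f)
        with open(progress_path, "a") as f:
            f.write(json.dumps({"subject": subject_id, "done": True}) + "\n")
        results[subject_id] = result
    return results
```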

Raw generation logging:
- Log every model output to <output_dir>/raw_generations.log for debugging
- Threaded through generateAnswer() via raw_logger parameter
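
A sketch of the logging side, using the standard logging module (how generateAnswer() formats each entry is assumed):

```python
import logging
import os

def make_raw_logger(output_dir: str) -> logging.Logger:
    """File logger that records every raw model generation for later debugging."""
    os.makedirs(output_dir, exist_ok=True)
    logger = logging.getLogger("raw_generations")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers if called more than once
        handler = logging.FileHandler(os.path.join(output_dir, "raw_generations.log"))
        handler.setFormatter(logging.Formatter("%(asctime)s | %(message)s"))
        logger.addHandler(handler)
    return logger

# Inside generateAnswer(..., raw_logger=None), after decoding the model output:
#     if raw_logger is not None:
#         raw_logger.info(decoded_text)
```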

Also add .gitignore for Python cache files and generated outputs.

- Add REPRODUCTION_GUIDE.md with step-by-step instructions for
  reproducing Table 1 (Big Five PAS on Llama-3-8B-Instruct)
- Update readme.md with full conda+torch+pip install workflow
- Add docstring for get_activations() in pas.py
