Fix bugs, add CLI args, resume support, and reproduction guide #9
Open
CH-chuan wants to merge 3 commits into zhu-minjun:main from
Conversation
…encies
Bug fixes:
- Fix import: PAlign.llama_pas -> PAlign.pas (module was renamed)
- Add missing PAlign/__init__.py so the package is importable
- Fix token trimming: [:-5] -> [:-1] in main.py and pas.py (was cutting too many tokens from the chat-template output)
- Fix answer parsing: split("<|end_header_id|>")[3] -> [-1] for Llama-3
(hardcoded index breaks when prompt structure varies)
- Fix rs['alpha'] storing entire result dict instead of the alpha value
- Reduce default batch_size from 10 to 3 (OOMs on <=24GB GPUs)
- Fix output path: ./log/ -> ./reproduction/ (log/ was never created)
- Fix argparse help strings (were placeholder text)
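To illustrate the answer-parsing fix above, here is a minimal sketch (the transcript string is a made-up example, not taken from the repo): a Llama-3 chat transcript contains one `<|end_header_id|>` token per message header, so the assistant's reply is always the text after the last one, while a hardcoded index breaks when the number of messages changes.

```python
# Hypothetical transcript illustrating the parsing fix; the real prompts
# in main.py/pas.py are built by the tokenizer's chat template.
transcript = (
    "<|start_header_id|>system<|end_header_id|>\n\nYou are helpful.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\nHello!"
)

# Old code used split(...)[3], which only works for one fixed prompt
# structure (e.g. it fails if there is no system message).
# Taking [-1] always selects the final segment: the assistant's answer.
answer = transcript.split("<|end_header_id|>")[-1].strip()
print(answer)  # Hello!
```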
setup.py fixes:
- Fix find_packages(where='./PAlign') which broke `import PAlign`
- Add missing dependencies: baukit, einops, openpyxl, xlrd, etc.
- Remove unused openai dependency
- Raise python_requires to >=3.10
Also remove unused `from datasets import load_dataset` in pas.py.
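A minimal sketch of the corrected setup.py, assuming the standard layout with PAlign/ at the repo root; the metadata values here are illustrative, not copied from the repo. Calling find_packages(where='./PAlign') made setuptools treat the *contents* of PAlign as top-level packages, so `import PAlign` failed after installation; searching from the repo root preserves the package name.

```python
# Illustrative setup.py fragment (not the repo's exact file).
from setuptools import setup, find_packages

setup(
    name="PAlign",
    python_requires=">=3.10",
    # Search from the repo root so PAlign itself (with its new
    # __init__.py) is discovered as the installable package.
    packages=find_packages(),
    install_requires=["baukit", "einops", "openpyxl", "xlrd"],
)
```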
New CLI arguments:
- --num_subjects N: process only the first N subjects (0 = all 300)
- --output_dir DIR: configurable output directory (default: ./reproduction)
Resume support:
- Save per-subject results as pickle files in <output_dir>/subject_results/
- On restart, automatically detect and skip completed subjects
- Append per-subject progress to <output_dir>/pas_progress.jsonl
Raw generation logging:
- Log every model output to <output_dir>/raw_generations.log for debugging
- Threaded through generateAnswer() via raw_logger parameter
Also add .gitignore for Python cache files and generated outputs.
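The resume mechanism can be sketched as follows. This is a simplified stand-in, not the repo's actual code: run_subjects, the subject IDs, and the placeholder result dict are hypothetical names, and the real loop runs the PAS evaluation instead of building a dummy result.

```python
import os
import pickle

def run_subjects(subjects, output_dir="./reproduction"):
    # Each subject's result is pickled individually, so a restarted run
    # can detect finished subjects and skip recomputing them.
    results_dir = os.path.join(output_dir, "subject_results")
    os.makedirs(results_dir, exist_ok=True)
    results = {}
    for sid in subjects:
        path = os.path.join(results_dir, f"{sid}.pkl")
        if os.path.exists(path):
            # Already completed on a previous run: load and skip.
            with open(path, "rb") as f:
                results[sid] = pickle.load(f)
            continue
        result = {"subject": sid}  # stand-in for the real PAS evaluation
        with open(path, "wb") as f:
            pickle.dump(result, f)
        results[sid] = result
    return results
```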
- Add REPRODUCTION_GUIDE.md with step-by-step instructions for reproducing Table 1 (Big Five PAS on Llama-3-8B-Instruct)
- Update readme.md with full conda+torch+pip install workflow
- Add docstring for get_activations() in pas.py
Summary
- Fix import (llama_pas → pas) and add missing PAlign/__init__.py
- Fix token trimming ([:-5] → [:-1]) in both main.py and pas.py
- Fix answer parsing ([3] → [-1]) for Llama-3
- Fix rs['alpha'] storing entire result dict instead of the alpha value
- Fix output path: ./log/ was never created (→ ./reproduction/)
- setup.py: fix broken find_packages(where='./PAlign'), add missing deps (baukit, einops, xlrd, etc.), remove unused openai, raise python_requires to >=3.10
- Add --num_subjects N (0 = all) and --output_dir DIR
- Add REPRODUCTION_GUIDE.md: step-by-step walkthrough for reproducing Table 1
- Add .gitignore for Python cache and generated outputs
Test plan
- pip install . succeeds in a fresh conda environment
- python main.py --modes PAS --model_file meta-llama/Meta-Llama-3-8B-Instruct --num_subjects 5 completes without errors
- Produces reproduction/PAS_Meta-Llama-3-8B-Instruct_OOD.json with valid scores