Minor update to the repo to ensure it runs on all machines #14
base: main
Conversation
liamhebert left a comment
A few minor comments on root_dir; once those are resolved, this is good to submit :)
liamhebert left a comment
Still need to review EmbeddingGenerator; more soon.
| """Flatten a discussion tree into tensors for model input.""" | ||
| dut.compute_relative_distance(tree) | ||
|
|
||
| flat = { |
We are missing a few parameters here, namely is_root and distances. These should already be attributes of node (I think).
Lines 860 to 867 in 5f50b4a
if is_root:
    parent_id = node["id"]
if node["id"] not in result["id"]:
    node["images"] = node["images"][0] if node["images"] else None
    result["images"].append(node["images"])
    result["distances"].append(node["distances"])
| "out_degree": out_degree, | ||
| "attn_bias": torch.zeros((n, n)), | ||
| "distance": torch.zeros((n, n, 2)), | ||
| "distance_index": torch.zeros((n, n), dtype=torch.int16), |
This doesn't feel right... can you double-check whether there should be a value here?
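In case it helps the check: if distance holds (up, down) tree distances per node pair, one purely hypothetical way to fill distance_index is to fold each pair into a single lookup index rather than leaving zeros, e.g.:

```python
import torch

# Hypothetical sketch, not the repo's code: fold each (up, down) distance
# pair into one index, capping distances so the index stays bounded.
distance = flat["distance"]  # the (n, n, 2) tensor from the dict above
max_dist = 16  # assumed cap on tree distance
up = distance[..., 0].long().clamp(max=max_dist)
down = distance[..., 1].long().clamp(max=max_dist)
distance_index = (up * (max_dist + 1) + down).to(torch.int16)
```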
| "input_ids": tokenized_text["input_ids"], | ||
| "attention_mask": tokenized_text["attention_mask"], | ||
| "token_type_ids": tokenized_text.get( | ||
| "token_type_ids", torch.zeros_like(tokenized_text["input_ids"]) |
Likewise, we should be careful here.
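Worth noting why the fallback exists: some tokenizers (e.g. RoBERTa's) do not return token_type_ids at all, so defaulting to zeros is a common defensive pattern. A small self-contained example of the behavior:

```python
import torch
from transformers import AutoTokenizer

# RoBERTa-style tokenizers omit token_type_ids; BERT-style ones include them.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
tokenized_text = tokenizer("a discussion comment", return_tensors="pt")

# Fall back to zeros so models that expect the key still get a valid tensor.
token_type_ids = tokenized_text.get(
    "token_type_ids", torch.zeros_like(tokenized_text["input_ids"])
)
```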
Replaced manual path setup with rootutils for better management (see the sketch after this list).
Removed comments about batch size command validation.
Optional changes on the side: the YAML file in the configs for the giga pretrain needs to match your directory on your local or cluster machine.
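For context, a minimal sketch of the usual rootutils pattern (the marker file name is an assumption; adjust to whatever indicator the repo uses):

```python
import rootutils

# Search upward from this file for a marker at the repo root, add the root
# to sys.path, and export it as the PROJECT_ROOT environment variable.
root = rootutils.setup_root(
    __file__,
    indicator=".project-root",  # assumed marker file name
    pythonpath=True,
)
```

With PROJECT_ROOT exported this way, the giga pretrain YAML can interpolate it (e.g. an OmegaConf-style ${oc.env:PROJECT_ROOT}) instead of hard-coding a per-machine absolute path.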