random projection in influence functions #231
haochend413 wants to merge 13 commits into TRAIS-Lab:main
Conversation
I ran the IF attributors on the influence_functions_lds example after adding random projection. Explicit on mnist_lr: ? -> 0.4851
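For context, a minimal sketch of how projection might be switched on for such a run. The class name, import path, method names, and dataloader variables below are assumptions based on the dattri codebase and the LDS example; only the projector_kwargs parameter and its keys come from this PR.

```python
# Sketch only: IFAttributorExplicit, cache/attribute, and the dataloaders are
# assumed names; projector_kwargs is the parameter introduced in this PR and
# its keys mirror the random_project API.
from dattri.algorithm.influence_function import IFAttributorExplicit

attributor = IFAttributorExplicit(
    task=task,  # an AttributionTask wrapping the mnist_lr model and loss
    projector_kwargs={
        "proj_dim": 512,
        "proj_max_batch_size": 32,
        "proj_seed": 0,
        "device": "cpu",
    },
)
attributor.cache(train_loader)
scores = attributor.attribute(train_loader, test_loader)  # scores used for LDS
```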
Returns:
    torch.Tensor: Transformed train representations with projected dimension.
"""
from dattri.func.projection import random_project
Could we check the attribute function call to make sure that the projection has some effect on larger models?
Hi! Would you mind sharing your code / repo so that I can try running your script locally?
Yes, I'm using the LiSSA attributor with the CIFAR-10 dataset and a MobileNetV2 model. Here is the link to a running file. You can use the attached checkpoints and datasets. Let me know if you need anything else. Thank you! Checkpoints and datasets:
Thank you for the information! Though I don't think projection is available for LiSSA, since the HVP for LiSSA is calculated using torch's vjp directly?
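For reference, a toy sketch (not dattri's actual implementation) of how LiSSA-style solvers evaluate Hessian-vector products functionally via double backward, which is equivalent in spirit to the vjp-based computation mentioned above. Because H·v is never built from a matrix of per-sample gradient features, there is no obvious place to apply a random projection.

```python
import torch

# Toy HVP via double backward: grad of (grad · v) w.r.t. the parameters.
# No explicit gradient matrix is materialized that a projection could act on.
def hvp(loss_fn, params, v):
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)
    (hv,) = torch.autograd.grad(grad, params, grad_outputs=v)
    return hv

params = torch.randn(5, requires_grad=True)
v = torch.randn(5)
print(hvp(lambda p: (p**2).sum(), params, v))  # equals 2 * v for this toy loss
```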
TheaperDeng
left a comment
Good job. I think we also need unit tests to make sure the change works for all three attributors.
self,
task: AttributionTask,
layer_name: Optional[Union[str, List[str]]] = None,
projector_kwargs: Optional[Dict[str, Any]] = None,
projector_kwargs may not be a very good API for external users, since they may need to check a lot of documentation or even unit tests to learn its structure. We may flatten some key parameters (like proj_dim and proj_seed) here in the init.
Same for projector_kwargs in the other attributors.
I propose either wrapping the kwargs in a pydantic class to keep the structure transparent; flattening works too.
Something like this, so that when a user wants to configure their projector arguments, they'll have autocomplete:
from pydantic import BaseModel

class BaseProjectorConfig(BaseModel):
    config_1: str
    config_2: int

class IFAttributorCGProjectorConfig(BaseProjectorConfig):
    config_3: str

class IFAttributorLiSSAProjectorConfig(BaseProjectorConfig):
    config_4: float

# IFAttributorCGProjectorConfig will have config_1, config_2 and config_3.
Easier to maintain as well
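Hypothetical usage of such a config class; the projector_config parameter name is illustrative, not the current API:

```python
# Illustrative only: the attributor accepts a typed config object, so IDEs
# can autocomplete the fields and pydantic validates them at construction.
config = IFAttributorCGProjectorConfig(
    config_1="normal",   # e.g. proj_type
    config_2=512,        # e.g. proj_dim
    config_3="cpu",      # e.g. device
)
attributor = IFAttributorCG(task=task, projector_config=config)  # hypothetical parameter
```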
    sample_features,
    feature_batch_size=1,
    **self.projector_kwargs,
)
There is no need to change this code block, but I think random_project may need some changes to avoid this kind of cumbersome projector creation. TODO: make this an issue.
    blksz_out,
    feature_batch_size=1,
    **self.projector_kwargs,
)
Again, this cumbersome pattern appears many times. @haochend413 Do you have any suggestions for the API design of random_project?
I think we can definitely improve the attributors' APIs by flattening the projection-related inputs for straightforwardness and adding default values. Since we're currently using a None value to indicate whether projection is used, I think we should probably also add a use_projection_or_not bool indicator (see the sketch after the signature below).
I'm not entirely sure which part of the random_project API is potentially redundant. It uses a sample feature to infer sizes, since it supports per-layer projection where sizes can vary between layers, but everything else looks fine to me:
def random_project(
    feature: Union[Dict[str, Tensor], Tensor],
    feature_batch_size: int,
    proj_dim: int,
    proj_max_batch_size: int,
    proj_seed: int = 0,
    proj_type: str = "normal",
    *,
    device: Union[str, torch.device] = "cpu",
) -> Callable:
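As a concrete version of the flattening idea above, a sketch of what a flattened attributor __init__ could look like; all names and default values here are hypothetical:

```python
# Hypothetical flattened __init__ (not the current API): projection settings
# become explicit keyword arguments with defaults, plus a boolean switch
# instead of relying on projector_kwargs being None.
class IFAttributorCGFlattened:  # illustrative class name
    def __init__(
        self,
        task,
        layer_name=None,
        use_projection=False,
        proj_dim=512,
        proj_max_batch_size=32,
        proj_seed=0,
        proj_device="cpu",
    ):
        self.task = task
        self.layer_name = layer_name
        self.use_projection = use_projection
        # Internally these can still be collected into a dict for random_project.
        self.projector_kwargs = {
            "proj_dim": proj_dim,
            "proj_max_batch_size": proj_max_batch_size,
            "proj_seed": proj_seed,
            "device": proj_device,
        }
```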
input_projectors (Optional[Dict[str, Callable]]): A dict of projector functions
    for projecting input activations. Keys are layer names.
output_projectors (Optional[Dict[str, Callable]]): A dict of projector functions
    for projecting output gradients. Keys are layer names.
Is it possible to have only one of input_projectors and output_projectors? If not, we may want to add a check here to report the error.
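A minimal sketch of such a check, assuming the two dicts must be provided together or not at all:

```python
# Sketch of the suggested validation: either both projector dicts are given
# or neither is; otherwise raise a clear error.
if (input_projectors is None) != (output_projectors is None):
    msg = "input_projectors and output_projectors must be provided together."
    raise ValueError(msg)
```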
    A function that takes a tuple of Tensor `x` and a vector `v` and returns
    the IHVP of the Hessian of `func` and `v`.
"""
from dattri.func.projection import random_project
Is it possible to move this import to the top of the file?
It would cause a circular import, since the projection module uses the hvp_at_x function. I think by the original design the hessian module sits at a lower level? Maybe a solution is to define a higher-level module combining projection and hessian, instead of adding projection to hessian directly.
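One possible shape for such a higher-level module. This is a sketch only; everything other than random_project and its call pattern (taken from the diff below) is an assumption.

```python
# Hypothetical higher-level helper that imports from both low-level modules,
# so neither the hessian module nor the projection module imports the other.
from dattri.func.projection import random_project


def projected_ihvp(ihvp_fn, sample_features, projector_kwargs):
    """Wrap an existing IHVP function so its output is randomly projected.

    ihvp_fn would come from the existing hessian/ihvp module; its exact
    signature and the projector's call signature are assumed here.
    """
    projector = random_project(sample_features, 1, **projector_kwargs)

    def wrapped(x, v):
        return projector(ihvp_fn(x, v))

    return wrapped
```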
projector = random_project(
    sample_features,
    1,
    **projector_kwargs,
Should we control the seed of the projection here? If not, what will happen if we end up with different projection matrices for H and g?
This is a required setting for random projection; you may set it (normally to 32) when you pass projector_kwargs to the attributor. We may set a default value for this key as well.
I looked into the TRAK attributor's projector_kwargs handling. It allows partially defined projector_kwargs by defining a static default parameter dict and using the update() method to merge in new parameters provided by the user. We can do it this way if we want to keep projector_kwargs. However, for simplicity, I think it's better to flatten the attributor inputs.
# TRAKAttributor(...)
DEFAULT_PROJECTOR_KWARGS = {
    "proj_dim": 512,
    "proj_max_batch_size": 32,
    "proj_seed": 0,
    "device": "cpu",
}

# ...
# Copy the defaults so the shared dict is not mutated in place,
# then merge any user-provided overrides.
self.projector_kwargs = dict(DEFAULT_PROJECTOR_KWARGS)
if projector_kwargs is not None:
    self.projector_kwargs.update(projector_kwargs)

# Usage in unit tests
projector_kwargs = {
    "device": "cpu",
}
attributor = TRAKAttributor(
    task=task,
    correct_probability_func=m,
    device=torch.device("cpu"),
    projector_kwargs=projector_kwargs,
)
I rechecked the code. For a single attribution, the seed used in all projectors should be the same, since it is specified by the user in projector_kwargs. So I think the seeds used are fixed.
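To illustrate why a shared seed keeps the projections consistent between H and g, a generic sketch (not dattri's projector implementation):

```python
import torch

# With the same seed, two independently constructed Gaussian projection
# matrices are identical, so quantities projected in different places
# (e.g. the Hessian side and the gradient side) land in the same subspace.
def make_projection(dim_in, proj_dim, seed):
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(dim_in, proj_dim, generator=gen)

P_h = make_projection(1000, 32, seed=0)  # e.g. used on the Hessian side
P_g = make_projection(1000, 32, seed=0)  # e.g. used when projecting gradients
assert torch.equal(P_h, P_g)             # same seed -> same projection matrix
```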
BTW, I wonder what may happen if we directly use
I can try this to see how the numbers look for this specific model. We should probably experiment on other smaller models as well.
# The t here is the sequence length or time steps for sequential input
# t = 1 if the given input is not sequential
if a_prev_raw.ndim == 2:  # noqa: PLR2004
    a_prev = a_prev_raw.unsqueeze(1)
There is a potential bug here and at line 364, where a_prev is not defined on all logic paths. It can cause errors when I run some examples. We should probably make another PR for this since it's in the original code?
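A sketch of a possible fix; the expected tensor ranks below are assumptions based on the comment in the snippet above:

```python
# Possible fix sketch: assign a_prev on every path, or fail loudly.
if a_prev_raw.ndim == 2:        # (batch, features): add a time dimension, t = 1
    a_prev = a_prev_raw.unsqueeze(1)
elif a_prev_raw.ndim == 3:      # (batch, t, features): already sequential
    a_prev = a_prev_raw
else:
    msg = f"Unexpected activation shape: {tuple(a_prev_raw.shape)}"
    raise ValueError(msg)
```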
I added a simple unit test for projection. I noticed that EK-FAC does not pass the self-attribution test after projection, with a max diff of 0.03. I'll look into whether it's the projection's problem (maybe it's related to seed control?).

Description
Add random projection for some IF attributors
Closes #227