Congrats on your great work!
I am evaluating your method on vision tasks and have a small question about the influence of the `conv_kernel_size` of the 2D group convolution: I noticed that you choose relatively large values, such as 35.
In vision tasks, a convolution with such a large kernel is typically used to enlarge the receptive field. However, the proposed Nystrom attention should already be able to model long-range context, just like the original multi-head attention, so I am a little confused about the motivation for this design.
Another question: since image feature maps have a grid structure, should we set `num_landmarks` equal to the feature-map width?
It would be great if you could share your advice on the influence of this parameter!
Nystromformer/code/attention_nystrom.py, lines 23 to 28 in effde25:

```python
self.conv = nn.Conv2d(
    in_channels = self.num_head, out_channels = self.num_head,
    kernel_size = (config["conv_kernel_size"], 1), padding = (config["conv_kernel_size"] // 2, 0),
    bias = False,
    groups = self.num_head)
```
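If I read the paper correctly, this depthwise conv is applied to `V` along the sequence dimension and added as a skip connection on top of the Nystrom-approximated attention. Here is a minimal sketch of what I think the kernel size does (the shapes below are just an example I made up, not the repo's defaults):

```python
import torch
import torch.nn as nn

# Example shapes only: batch 2, 8 heads, a 32x32 feature map flattened to
# 1024 tokens, head dim 64.
B, num_head, H, W, head_dim = 2, 8, 32, 32, 64
seq_len = H * W
V = torch.randn(B, num_head, seq_len, head_dim)

# Same construction as above: a depthwise (grouped) conv whose kernel spans
# conv_kernel_size positions along the sequence axis only.
conv_kernel_size = 35
conv = nn.Conv2d(
    in_channels=num_head, out_channels=num_head,
    kernel_size=(conv_kernel_size, 1), padding=(conv_kernel_size // 2, 0),
    bias=False, groups=num_head)

out = conv(V)
print(out.shape)  # torch.Size([2, 8, 1024, 64]) -- sequence length preserved

# Each output token mixes only the ~35 neighbouring tokens of the *flattened*
# sequence (roughly one row of a 32x32 grid), i.e. a local 1D inductive bias
# rather than a large 2D receptive field.
```

So my understanding is that the conv is a cheap local complement to the approximated global attention, which is why I am wondering whether 35 is still the right choice on 2D feature maps.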
"extra_attn_config":{ |
|
"softmax":{"attention_grad_checkpointing":True}, |
|
"nystrom-32":{"attention_grad_checkpointing":False, "num_landmarks":32, "conv_kernel_size":35}, |
|
"nystrom-64":{"attention_grad_checkpointing":False, "num_landmarks":64, "conv_kernel_size":35}, |
|
"nystrom-128":{"attention_grad_checkpointing":False, "num_landmarks":128, "conv_kernel_size":35}, |
|
"nystrom-256":{"attention_grad_checkpointing":False, "num_landmarks":256, "conv_kernel_size":35}, |
|
"linformer-256":{"attention_grad_checkpointing":False, "linformer_k":256}, |