Dear Authors,
Thank you for your amazing work, it is very inspiring.
I was wondering if you ever tried to visualize the attention your network?
Does it learn anthing meaningful like following: (borrowed from https://jacobgil.github.io/deeplearning/vision-transformer-explainability)

Thank you!