Hello,
Sorry if this is a silly question, but looking at your code in ptr_base.py, line 90, the LinformerEncoder layer doesn't seem to implement linear attention at all; it appears to just perform regular multi-head attention. Is that the case? If not, where in the LinformerEncoder layers does the linearisation take place? For reference, I've sketched below what I expected to see.
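
To be concrete, here is a minimal sketch of the Linformer-style attention I had in mind (my own illustration, not code from your repo; names like `seq_len` and `proj_dim` are placeholders): the keys and values get compressed along the sequence axis by learned projections E and F before standard scaled dot-product attention, which is what reduces the cost from O(n^2) to O(n*k). I couldn't find an equivalent projection step in the LinformerEncoder path.

```python
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Sketch of Linformer attention: low-rank projection of K/V along the sequence axis."""

    def __init__(self, embed_dim: int, num_heads: int, seq_len: int, proj_dim: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        # Learned projections along the *sequence* dimension (seq_len -> proj_dim).
        # This is the step that distinguishes Linformer from vanilla multi-head attention.
        self.E = nn.Parameter(torch.randn(proj_dim, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(proj_dim, seq_len) / seq_len ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q = self.q_proj(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x)
        v = self.v_proj(x)
        # Compress keys/values from length n down to proj_dim: the "linearisation" step.
        k = torch.einsum("kn,bnd->bkd", self.E, k)
        v = torch.einsum("kn,bnd->bkd", self.F, v)
        k = k.view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        # Attention matrix is now (n x proj_dim) per head instead of (n x n).
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out_proj(out)
```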
Thanks,
Josh