https://arxiv.org/pdf/2306.17806.pdf
notes:
- basically gives a "guidance prompt," the idea being that it won't decay in context the way the normal prompt does (the normal prompt is just treated as standard context, so the model doesn't give it any extra priority or weight); see the sketch after these notes
- will be good for things like instructions in instruct models, style enforcement, etc
- can soft tokens go in there?
- probably will be good for things like memory or author's notes (A/N), since those stay pertinent throughout the entire context
- requires two network passes per generated token, so roughly 2x generation time
- supports negative prompting (not sure how this will work outside of instruct...)
from the paper: "We suspect that CFG ... will reduce the entropy of the logit distribution. ... The effect this has is to restrict the number of tokens in the top-p=90% of the vocabulary distribution ... We do observe ... that the top tokens do not shift too much, but they do re-order to some extent, which shows that CFG is not simply having the same effect as the temperature parameter."
- probably better to disable top-a and not set top-p too high, since CFG already narrows the distribution
from the paper: "We conclude with the surprising finding that ... there is a statistically insignificant difference between using CFG and using vanilla prompting with a model of twice the size at p = .01"
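rough sketch of the mechanism for my own reference, not the paper's or the PR's code -- the model name, guidance text, and gamma value are made up, and the unconditional branch could just as well be a negative prompt instead of the bare context:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder, swap in whatever we're actually testing
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# the "guidance prompt" that gets the extra weight, and the normal context
guidance = tok("[ Style: terse, sarcastic ]\n", return_tensors="pt").input_ids
context = tok("The knight walked into the tavern and", return_tensors="pt").input_ids

cond = torch.cat([guidance, context], dim=-1)  # guided branch: guidance prompt + context
uncond = context.clone()                       # unguided branch: context only (or a negative prompt)
gamma = 1.5                                    # guidance scale; 1.0 = vanilla sampling

with torch.no_grad():
    for _ in range(40):
        # two forward passes per token -> the ~2x generation cost noted above
        logp_c = torch.log_softmax(model(cond).logits[:, -1], dim=-1)
        logp_u = torch.log_softmax(model(uncond).logits[:, -1], dim=-1)
        # CFG blend: push the distribution toward the guided branch;
        # this is also where the entropy drop the paper describes comes from
        logits = logp_u + gamma * (logp_c - logp_u)
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        cond = torch.cat([cond, next_tok], dim=-1)
        uncond = torch.cat([uncond, next_tok], dim=-1)

print(tok.decode(cond[0, guidance.shape[-1]:]))
```

(no KV cache, so it recomputes the whole prefix every step -- fine for poking at it, not for real use)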
code:
https://github.com/huggingface/transformers/pull/24654
let's patch this in and see what happens
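if the PR lands as described, generate() should take the guidance scale and an optional negative prompt directly -- something like the below (kwarg names as in the PR, untested here; model name is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = tok("The knight walked into the tavern and", return_tensors="pt")
negative = tok("badly written, repetitive", return_tensors="pt")  # optional negative prompt

out = model.generate(
    **prompt,
    guidance_scale=1.5,                        # >1 enables CFG, 1.0 = normal sampling
    negative_prompt_ids=negative.input_ids,    # leave out for plain (unconditional) CFG
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
```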
todo:
- resize the context tab textareas on focus/unfocus so we don't use like 104392423490823490234px of vertical space