
Correct the chat prompt template #16

Open

bamos wants to merge 2 commits into Yu-Fangxu:main from bamos:main

Conversation


@bamos bamos commented Nov 23, 2024

This attempts to correct the issue raised in #9 that the chat prompt template is being used incorrectly. In the original/published version of the code, the user turn is never closed, so the attack is optimized inside the user-query portion rather than in the assistant's generation portion. This commit corrects that by closing the user turn with [/INST]. After this change, the ASR appears to be 0% for Llama 2, in contrast to the published result of 70% in Table 24.

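For reference, here is a minimal sketch of the Llama-2 chat format at issue; the goal and suffix strings are placeholders of mine, not values from the repo:

    # Llama-2 chat format: the user turn is wrapped in [INST] ... [/INST].
    sys_prompt = "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    goal = "the user query being attacked"   # placeholder
    adv_suffix = "! ! ! ! !"                 # placeholder adversarial suffix

    # Published code (per #9): the user turn is never closed, so the
    # optimized suffix and target completion sit inside the user query.
    prompt_before = sys_prompt + goal + " " + adv_suffix

    # This PR: close the user turn with [/INST] so the target completion
    # is optimized in the assistant's portion of the template.
    prompt_after = sys_prompt + goal + " " + adv_suffix + " [/INST]"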

Here is what some of the outputs look like with this patch.

When you get a chance, could you please take a look and help investigate this further? I could believe the COLD attack is possible with proper usage of the chat template, so it's possible I simply made a mistake here.


I'm running it with:

    python3 cold_decoding.py \
      --seed 0 \
      --mode suffix \
      --pretrained_model Llama-2-7b-chat-hf \
      --init-temp 1 \
      --length 20 \
      --max-length 20 \
      --num-iters 2000 \
      --min-iters 0 \
      --goal-weight 100 \
      --rej-weight 100 \
      --stepsize 0.1 \
      --noise-iters 1 \
      --win-anneal-iters 1000 \
      --start 0 \
      --end 50 \
      --lr-nll-portion 1.0 \
      --topk 10 \
      --output-lgt-temp 1 \
      --verbose \
      --straight-through \
      --large-noise-iters 50,200,500,1500 \
      --large_gs_std 0.1,0.05,0.01,0.001 \
      --stepsize-ratio 1 \
      --batch-size 1 \
      --print-every 100 \
      --fp16 \
      --use-sysprompt

Comment thread on decoding_suffix.py:

      text_post = text
      for bi in range(args.batch_size):
    -     prompt = x + " " + text_post[bi]
    +     prompt = sys_prompt + x + " " + text_post[bi] + user_prompt_suffix
@bamos (Author)

Ah, another comment I meant to make here: I noticed this code previously had no system prompt at all. That seems wrong, because a few lines below, this prompt is tokenized into input_ids and passed into model.generate from scratch, so my correction here adds both the system prompt and what I'm calling the user prompt suffix, [/INST]. I also wonder if the added space here (+ " " +) could be causing some of the transferability issues? I quickly tried removing it, but it didn't seem to change much.
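As a sketch of what the corrected generation path looks like (the model/tokenizer setup and the placeholder strings are mine, not the repo's exact code):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

    sys_prompt = "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    user_prompt_suffix = " [/INST]"
    x = "the user query"         # placeholder for the goal string
    suffix = "optimized suffix"  # placeholder for text_post[bi]

    # Build the full chat-formatted prompt, then generate from scratch.
    prompt = sys_prompt + x + " " + suffix + user_prompt_suffix
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True))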

@bamos (Author) commented Nov 24, 2024

Instead of + " " + here, I think the most correct approach would be to concatenate the token IDs of [prefix, hard-projected adversarial suffix, user prompt suffix] and decode them with the tokenizer.
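Something like the following, using the repo's tokenizer; the names here are illustrative, and hard_projected_suffix_ids stands in for the hard-projected suffix tokens:

    import torch

    # Token-level concatenation instead of string joining with " ".
    prefix_ids = tokenizer(x, add_special_tokens=False, return_tensors="pt").input_ids
    suffix_ids = hard_projected_suffix_ids  # hypothetical: argmax over the relaxed logits
    close_ids = tokenizer(" [/INST]", add_special_tokens=False, return_tensors="pt").input_ids

    full_ids = torch.cat([prefix_ids, suffix_ids, close_ids], dim=1)
    prompt = tokenizer.decode(full_ids[0])  # the tokenizer decides the spacing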

Also, these modifications in get_text_from_logits look a little strange. It is called right before decode_with_model_topk returns the text, a few lines above this part (and in other places):

        text_i = text_i.replace('\n', ' ')
        text_i += ". "
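A quick illustration of why those edits seem off (assuming access to the model's tokenizer): after them, the returned string no longer round-trips to the token IDs that were actually optimized.

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    ids = tok("line one\nline two", add_special_tokens=False).input_ids
    text = tok.decode(ids)

    edited = text.replace('\n', ' ') + ". "  # the edits quoted above
    # Re-encoding the edited string yields different IDs than were optimized.
    print(ids == tok(edited, add_special_tokens=False).input_ids)  # False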

Comment thread decoding_suffix.py
decoded_text = []
for bi in range(args.batch_size):
prompt = x + " " + text_post[bi]
prompt = sys_prompt + x + " " + text_post[bi] + user_prompt_suffix
@bamos (Author)

Following https://github.com/Yu-Fangxu/COLD-Attack/pull/16/files#r1855480182, I think this one should have sys_prompt (and the user prompt suffix) as well. Without these, the code was attacking with the partial chat template but then running the final generation without the chat template at all.
