Correct the chat prompt template #16
    text_post = text
    for bi in range(args.batch_size):
    -       prompt = x + " " + text_post[bi]
    +       prompt = sys_prompt + x + " " + text_post[bi] + user_prompt_suffix
Ah, another comment I meant to make here. I noticed this code previously did not include any system prompt at all. This seems wrong, since a few lines below this the prompt is tokenized into input_ids and passed into model.generate from scratch, so my correction here adds both the system prompt and what I'm calling the user prompt suffix, [/INST]. I also wonder whether the added space here (+ " " +) could be causing some of the transferability issues? I quickly tried removing it, but it didn't seem to change much.
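For reference, here is a minimal sketch of what I mean end to end, assuming the standard Llama-2-chat formatting (the exact system prompt string and some variable names are placeholders, not necessarily what the repo uses):

```python
# Sketch only: assumes Llama-2-chat conventions; the actual system prompt text
# used by the repo may differ.
sys_prompt = (
    "[INST] <<SYS>>\n"
    "You are a helpful, respectful and honest assistant.\n"
    "<</SYS>>\n\n"
)
user_prompt_suffix = " [/INST]"  # closes the user turn so generation happens as the assistant

for bi in range(args.batch_size):
    # x is the user request, text_post[bi] the decoded adversarial suffix
    prompt = sys_prompt + x + " " + text_post[bi] + user_prompt_suffix
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
```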
Instead of + " " + here, I think the most correct way would be to concatenate the token ids of [prefix, hard-projected adversarial suffix, user prompt suffix] and decode that sequence with the tokenizer.
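Roughly what I have in mind, as a sketch (projected_suffix_ids is an illustrative name for the hard-projected suffix tokens, e.g. what decode_with_model_topk selects):

```python
# Sketch: assemble the prompt at the token level and decode once, rather than
# joining separately decoded strings with " ". Names are illustrative.
prefix_ids = tokenizer(sys_prompt + x, add_special_tokens=False).input_ids
suffix_ids = projected_suffix_ids[bi].tolist()           # hard-projected adversarial suffix
closing_ids = tokenizer(user_prompt_suffix, add_special_tokens=False).input_ids

full_ids = prefix_ids + suffix_ids + closing_ids
prompt = tokenizer.decode(full_ids)                      # single decode, no extra " " inserted
```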
Also, these modifications in get_text_from_logits look a little strange. This is called just before decode_with_model_topk returns the text, a few lines above this part (and in other places):
    text_i = text_i.replace('\n', ' ')
    text_i += ". "
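My worry, to spell it out, is that these string-level edits mean the text that eventually reaches the target model no longer round-trips to the token ids that were optimized, roughly:

```python
# Illustration of the concern; suffix_ids stands in for the optimized suffix tokens.
raw_text = tokenizer.decode(suffix_ids)                   # text for the optimized token ids
edited_text = raw_text.replace('\n', ' ') + ". "          # the modifications above
reencoded = tokenizer(edited_text, add_special_tokens=False).input_ids
# reencoded will generally differ from suffix_ids, so the target model sees a
# different token sequence than the one the attack optimized.
```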
    decoded_text = []
    for bi in range(args.batch_size):
    -       prompt = x + " " + text_post[bi]
    +       prompt = sys_prompt + x + " " + text_post[bi] + user_prompt_suffix
Following https://github.com/Yu-Fangxu/COLD-Attack/pull/16/files#r1855480182, I think this one should have sys_prompt (and the user suffix) as well. Without these, the code was attacking with the partial chat template but then running the final generation without the chat template.
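One way to keep the final generation consistent with the attack, as a sketch (assuming a transformers version that supports apply_chat_template; not what the repo currently does):

```python
# Sketch: let the tokenizer build the chat template instead of concatenating
# strings by hand. system_message is whatever system prompt the attack used.
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": x + " " + text_post[bi]},   # request + adversarial suffix
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
```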
This attempts to correct the issue raised in #9 that the chat prompt template is being used incorrectly. In the original/published version of the code, the user query portion is not closed, so the attack is carried out in the user query portion rather than in the assistant's generation portion. This commit corrects that by closing the user query (with [/INST]). After doing this, the ASR appears to be 0% for Llama 2, in contrast to the published result of 70% in Table 24. Here is what some of the outputs with this patch look like.
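To make the change concrete, here is roughly how the final prompt differs under the Llama-2-chat format ({system_prompt}, {user_request}, and {adv_suffix} are placeholders):

```python
# Before this patch (roughly): no system prompt, and the user turn is never
# closed, so optimization and the final generation both happen inside it.
before = "[INST] {user_request} {adv_suffix}"

# After this patch: [/INST] closes the user turn, so the model generates as the
# assistant responding to the (adversarially suffixed) request.
after = "[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_request} {adv_suffix} [/INST]"
```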
When you get a chance, could you please take a look and help investigate this further? I can believe it's possible to do the COLD attack with proper usage of the chat template, so it's possible I simply messed something up here.
I'm running it with: