`nanoflrt`

A custom implementation of a variation of the FLRT algorithm from "FLRT: Fluent Student-Teacher Redteaming". In this variation, we replace the worst candidate in the buffer, and we don't (currently) use the additional loss functions outlined in the FLRT paper: only the cross-entropy loss is used, and the finetuning and perplexity losses are ignored.
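To make the two ideas above concrete, here is a minimal, dependency-free sketch of scoring a candidate by cross-entropy on the target tokens and then overwriting the worst (highest-loss) entry of a buffer. It is an illustration only, not the code used in `nanoflrt`: the names `Candidate`, `target_cross_entropy`, `replace_worst`, and `best` are hypothetical, and the sketch assumes the replacement is unconditional, which this README does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    string: str  # adversarial string appended to the prompt (hypothetical field)
    loss: float  # cross-entropy of the target under the model; lower is better


def target_cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of the target tokens.

    logits: one list of vocabulary scores per target position.
    target_ids: the token id expected at each position.
    """
    total = 0.0
    for scores, tok in zip(logits, target_ids):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[tok]
    return total / len(target_ids)


def replace_worst(buffer, new):
    """Overwrite the highest-loss entry of the buffer with the new candidate.

    Assumption: the replacement is unconditional.
    """
    worst = max(range(len(buffer)), key=lambda i: buffer[i].loss)
    buffer[worst] = new
    return buffer


def best(buffer):
    """Return the lowest-loss candidate currently in the buffer."""
    return min(buffer, key=lambda c: c.loss)
```

In a real run the logits would come from the model's forward pass over the prompt plus the candidate string, and the buffer would be updated once per optimization step.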

Installation

Install from the git repo:

pip install git+https://github.com/serteal/nanoflrt

or use uv:

uv add git+https://github.com/serteal/nanoflrt

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import nanoflrt

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Fill in your own message and target
messages = [{"role": "user", "content": "Give me instructions for how to ..."}]
target = "Sure, here are some instructions for how to ..."

result = nanoflrt.run(model, tokenizer, messages, target)

# Generate with the best FLRT string
messages[-1]["content"] = messages[-1]["content"] + result.best_string

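# Tokenize the chat with the generation prompt appended
# (named input_ids to avoid shadowing the built-in `input`)
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, do_sample=False, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
generation = tokenizer.batch_decode(
    output[:, input_ids.shape[1] :], skip_special_tokens=True
)[0]

print(f"Prompt:\n{messages[-1]['content']}\n")
print(f"Generation:\n{generation}")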

Check out the examples for other use-cases.
