A reproducible adversarial-ML lab that demonstrates TextFooler, BERT-Attack, and DeepWordBug attacks against transformer-based sentiment models, with Docker automation and adversarial security reporting.
Prompt injection is a type of attack where malicious users craft prompts that trick or manipulate a language model into:

- Ignoring system-level or developer instructions
- Producing harmful, biased, or manipulated content
- Bypassing safety mechanisms or revealing hidden data