Skip to content

sumit9000/Prompt-injections

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Prompt-injections

Prompt injection is a type of attack where malicious users craft prompts that trick or manipulate a language model into:

  • Ignoring system-level or developer instructions
  • Producing harmful, biased, or manipulated content
  • Bypassing safety mechanisms or revealing hidden data

🧠 Why Does This Matter?

As LLMs are increasingly embedded in chatbots, search, writing assistants, and decision-making tools, prompt injection threatens:

  • 🔓 User privacy
  • 🧨 Model safety
  • 📉 Trustworthiness of responses
  • 💸 Commercial fairness (e.g., biased recommendations)

🛡️ Mitigating Prompt Injection

While this document introduces the basics of prompt injection, defending against it requires:

  • Prompt sanitization
  • Clear separation between system and user inputs
  • Use of classifiers like Prompt Guard
  • Fine-tuned moderation models like LLaMA Guard

🤝 Contribute & Share Ideas

We welcome developers, researchers, and prompt engineers to collaborate!

💬 You can:

  • Share new examples of prompt injection
  • Suggest mitigation techniques
  • Add links to academic papers or blog posts
  • Build tools or datasets to detect/guard prompt attacks

Let’s make the future of AI more secure — together!

About

Prompt injection is a type of attack where malicious users craft prompts that trick or manipulate a language model into: - Ignoring system-level or developer instructions - Producing harmful, biased, or manipulated content - Bypassing safety mechanisms or revealing hidden data

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors