readme-SVG/Banned-words

Note

Banned-word lists exist across many repositories and websites, but most of the ones I looked through share the same problems: abandoned projects, outdated entries, or coverage limited to one or two languages. I wanted something better, so I collected, merged, and cleaned word lists from every source I could find, across as many languages as possible. This repository is the result: an attempt to build the most complete and actively maintained multilingual banned-word collection available.


A curated collection of banned words in multiple languages.

This repository is designed for teams who want to automatically detect and block unwanted words in pull requests, commits, issue comments, chat pipelines, or any custom moderation workflow.

What this repository contains

  • Language-specific text files under Banned-words-list/.
  • One banned word per line (plain .txt format).
  • Files that can be used directly in scripts, CI jobs, bots, and repository governance tools.

Available languages

The repository includes a separate file for each language (for example, Arabic, Czech, French, Italian, Japanese, Russian, and Ukrainian).

Browse all lists in Banned-words-list/:

File                       Language                Words
🤬 ru.txt                  Russian                 4249
🤬 en.txt                  English                 3225
🤬 es.txt                  Spanish                 500
🤬 federal_government.txt  federal_government.txt  381
🤬 zh.txt                  Chinese                 335
🤬 flags_all.txt           Flags of countries      267
🤬 nl.txt                  Dutch                   190
🤬 ja.txt                  Japanese                180
🤬 ro.txt                  Romanian                175
🤬 it.txt                  Italian                 168
🤬 kk.txt                  Kazakh                  155
🤬 emoji.txt               Emoji                   151
🤬 tr.txt                  Turkish                 142
🤬 uk.txt                  Ukrainian               134
🤬 fi.txt                  Finnish                 130
🤬 hi.txt                  Hindi                   119
🤬 fr.txt                  French                  100
🤬 hu.txt                  Hungarian               96
🤬 ta.txt                  Tamil                   86
🤬 pt.txt                  Portuguese              76
🤬 ko.txt                  Korean                  72
🤬 id.txt                  Indonesian              68
🤬 de.txt                  German                  66
🤬 ar.txt                  Arabic                  58
🤬 pl.txt                  Polish                  54
🤬 he.txt                  Hebrew                  49
🤬 el.txt                  Greek                   48
🤬 fa.txt                  Persian                 45
🤬 sv.txt                  Swedish                 43
🤬 cs.txt                  Czech                   41
🤬 no.txt                  Norwegian               40
🤬 eo.txt                  Esperanto               37
🤬 bn.txt                  Bengali                 33
🤬 gu.txt                  Gujarati                31
🤬 ms.txt                  Malay                   31
🤬 th.txt                  Thai                    31
🤬 vi.txt                  Vietnamese              31
🤬 kn.txt                  Kannada                 29
🤬 placeholder.txt         Placeholder             29
🤬 kab.txt                 Taqbaylit               21
🤬 da.txt                  Danish                  20
🤬 ga.txt                  Irish                   16
🤬 fil.txt                 Tagalog                 13
🤬 tlh.txt                 Klingon                 3

Format rules

To keep the dataset consistent and automation-friendly, each list should follow these rules:

  1. One word per line.
  2. No duplicate entries.
  3. Use lowercase where applicable.
  4. Avoid leading/trailing spaces.

These rules match the contribution guidelines and help keep filters predictable.
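
The format rules above can be checked mechanically. The following is a minimal sketch, not a tool shipped with this repository: the function names `check_word_list` and `check_file` are hypothetical, and it assumes blank lines are ignored rather than reported. It does not try to decide what counts as a single "word", so rule 1 is only covered to the extent that each line is treated as one entry.

```python
from pathlib import Path


def check_word_list(lines):
    """Return (line_number, message) pairs for lines that break the format rules."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        word = line.strip()
        if not word:
            continue  # assumption: blank lines are skipped, not reported
        if line != word:
            problems.append((number, "leading or trailing whitespace"))
        # For scripts without letter case, lower() is a no-op, so this never fires.
        if word != word.lower():
            problems.append((number, "not lowercase"))
        if word in seen:
            problems.append((number, "duplicate entry"))
        seen.add(word)
    return problems


def check_file(path):
    """Check one list file, for example Banned-words-list/en.txt."""
    text = Path(path).read_text(encoding="utf-8")
    return check_word_list(text.splitlines())


# In-memory demo with placeholder entries instead of real list content.
sample = ["alpha", "Beta", "alpha", " gamma"]
for number, message in check_word_list(sample):
    print(f"line {number}: {message}")
# line 2: not lowercase
# line 3: duplicate entry
# line 4: leading or trailing whitespace
```

Duplicates are compared after trimming, so an entry that differs only by surrounding spaces still counts as a repeat.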

Typical use cases

  • Protecting repository discussions from toxic language.
  • Enforcing communication policies in open-source projects.
  • Pre-commit or CI checks for restricted terms.
  • Building moderation bots for multilingual communities.
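
As an illustration of the pre-commit or CI use case, here is a rough sketch of a check that scans files and reports the line of each match. The helper names (`load_banned`, `find_banned`, `main`) and the exit-code convention are assumptions made for this example, not something the repository provides.

```python
import re
from pathlib import Path

WORD_RE = re.compile(r"\w+")


def load_banned(path):
    """Read a one-word-per-line list into a set of lowercase entries."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def find_banned(text, banned):
    """Return (line_number, word) pairs for every banned token in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for token in WORD_RE.findall(line):
            if token.lower() in banned:
                hits.append((number, token))
    return hits


def main(list_path, file_paths):
    """Return 1 if any checked file contains a banned word, else 0."""
    banned = load_banned(list_path)
    exit_code = 0
    for file_path in file_paths:
        text = Path(file_path).read_text(encoding="utf-8", errors="replace")
        for number, word in find_banned(text, banned):
            print(f"{file_path}:{number}: banned word {word!r}")
            exit_code = 1
    return exit_code
```

A CI step could end with something like `sys.exit(main("Banned-words-list/en.txt", sys.argv[1:]))` so that a match fails the job. Whether a match should block or only warn depends on your policy.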

Quick start

  1. Pick the language file(s) you need from Banned-words-list/.
  2. Load each file into your filter tool.
  3. Compare normalized text (trimmed and lowercased) against the list.
  4. Block, flag, or report matches based on your policy.

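Example (Python)

import re
from pathlib import Path

# Load the list into a set; skip blank lines and normalize each entry.
lines = Path("Banned-words-list/en.txt").read_text(encoding="utf-8").splitlines()
banned = {line.strip().lower() for line in lines if line.strip()}
user_text = "text to check"  # placeholder input

# Tokenize on word characters, then compare each normalized token to the list.
for word in re.findall(r"\w+", user_text):
    normalized = word.strip().lower()
    if normalized in banned:
        raise ValueError("Contains banned word")  # stand-in for your reject step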

Contributing

Contributions are welcome.

Before opening a pull request:

  • Read CONTRIBUTING.md.
  • Ensure the file remains one-word-per-line.
  • Remove duplicates.
  • Keep words in lowercase where it makes sense for the language.
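
Before opening a pull request, the cleanup steps in this checklist can be applied in one pass. This is a small sketch under stated assumptions: `clean_entries` is a hypothetical helper, and it lowercases every entry unconditionally, which may not be right for every language (Turkish dotted and dotless i is a common example), so review the result by hand.

```python
def clean_entries(lines):
    """Strip, lowercase, and de-duplicate entries, keeping first-seen order."""
    cleaned = []
    seen = set()
    for line in lines:
        word = line.strip().lower()
        # Drop blank lines and anything already emitted.
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned


# Placeholder entries stand in for real list content.
print(clean_entries(["Alpha ", "beta", "alpha", "", "BETA"]))
# ['alpha', 'beta']
```

Keeping first-seen order, rather than sorting, keeps the diff in the pull request limited to the lines that actually changed.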

Important note

Word lists match terms without regard to context, so they can produce false positives. Treat a match as a baseline signal, and combine it with contextual moderation when possible.
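
One common source of false positives is substring matching, where a listed entry happens to appear inside a harmless longer word. The sketch below contrasts that with whole-token matching. Both function names and the entry "bad" are placeholders chosen for illustration and are not taken from the lists.

```python
import re


def substring_hits(text, banned):
    """Naive check: flags a banned entry anywhere inside the text."""
    lowered = text.lower()
    return sorted(word for word in banned if word in lowered)


def whole_word_hits(text, banned):
    """Stricter check: flags a banned entry only as a standalone token."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted(tokens & set(banned))


banned = {"bad"}  # placeholder entry, not taken from the lists
text = "She earned a badge."
print(substring_hits(text, banned))   # ['bad']
print(whole_word_hits(text, banned))  # []
```

Whole-token matching relies on words being separated by spaces or punctuation. For languages usually written without spaces, such as Chinese, Japanese, or Thai, `\w+` returns long runs of characters as a single token, so a different segmentation step would likely be needed there.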