Note
Banned-word lists exist across many repositories and websites, but after looking through most of them, the story is always the same: abandoned projects, outdated entries, or coverage limited to one or two languages. I wanted something better, so I collected, merged, and cleaned word lists from every source I could find, across as many languages as possible. The result is this repository: an attempt to build the most complete and actively maintained multilingual banned-word collection available.
A curated collection of banned words in multiple languages.
This repository is designed for teams who want to automatically detect and block unwanted words in pull requests, commits, issue comments, chat pipelines, or any custom moderation workflow.
- Language-specific text files under `Banned-words-list/`.
- One banned word per line (plain `.txt` format).
- Files that can be used directly in scripts, CI jobs, bots, and repository governance tools.
The repository includes separate files for many languages (for example: Arabic, Czech, French, Italian, Japanese, Russian, Ukrainian, and others).
Browse all lists in:
| | File | Language | Words |
|---|---|---|---|
| 🤬 | ru.txt | Russian | 4249 |
| 🤬 | en.txt | English | 3225 |
| 🤬 | es.txt | Spanish | 500 |
| 🤬 | federal_government.txt | federal_government.txt | 381 |
| 🤬 | zh.txt | Chinese | 335 |
| 🤬 | flags_all.txt | Flags of countries | 267 |
| 🤬 | nl.txt | Dutch | 190 |
| 🤬 | ja.txt | Japanese | 180 |
| 🤬 | ro.txt | Romanian | 175 |
| 🤬 | it.txt | Italian | 168 |
| 🤬 | kk.txt | Kazakh | 155 |
| 🤬 | emoji.txt | Emoji | 151 |
| 🤬 | tr.txt | Turkish | 142 |
| 🤬 | uk.txt | Ukrainian | 134 |
| 🤬 | fi.txt | Finnish | 130 |
| 🤬 | hi.txt | Hindi | 119 |
| 🤬 | fr.txt | French | 100 |
| 🤬 | hu.txt | Hungarian | 96 |
| 🤬 | ta.txt | Tamil | 86 |
| 🤬 | pt.txt | Portuguese | 76 |
| 🤬 | ko.txt | Korean | 72 |
| 🤬 | id.txt | Indonesian | 68 |
| 🤬 | de.txt | German | 66 |
| 🤬 | ar.txt | Arabic | 58 |
| 🤬 | pl.txt | Polish | 54 |
| 🤬 | he.txt | Hebrew | 49 |
| 🤬 | el.txt | Greek | 48 |
| 🤬 | fa.txt | Persian | 45 |
| 🤬 | sv.txt | Swedish | 43 |
| 🤬 | cs.txt | Czech | 41 |
| 🤬 | no.txt | Norwegian | 40 |
| 🤬 | eo.txt | Esperanto | 37 |
| 🤬 | bn.txt | Bengali | 33 |
| 🤬 | gu.txt | Gujarati | 31 |
| 🤬 | ms.txt | Malay | 31 |
| 🤬 | th.txt | Thai | 31 |
| 🤬 | vi.txt | Vietnamese | 31 |
| 🤬 | kn.txt | Kannada | 29 |
| 🤬 | placeholder.txt | Placeholder | 29 |
| 🤬 | kab.txt | Taqbaylit | 21 |
| 🤬 | da.txt | Danish | 20 |
| 🤬 | ga.txt | Irish | 16 |
| 🤬 | fil.txt | Tagalog | 13 |
| 🤬 | tlh.txt | Klingon | 3 |
To keep the dataset consistent and automation-friendly, each list should follow these rules:
- One word per line.
- No duplicate entries.
- Use lowercase where applicable.
- Avoid leading/trailing spaces.
These rules match the contribution guidelines and help keep filters predictable.
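The format rules above can be checked mechanically. The following is a minimal sketch, not a tool shipped with this repository: the function names `find_format_problems` and `check_file` are hypothetical, files are assumed to be UTF-8, and blank lines are skipped because the rules do not mention them.

```python
from pathlib import Path


def find_format_problems(lines):
    """Return (line_number, message) pairs for entries that break the rules."""
    problems = []
    seen = {}  # word -> line number of its first occurrence
    for number, raw in enumerate(lines, start=1):
        entry = raw.rstrip("\n")
        if entry != entry.strip():
            problems.append((number, "leading or trailing whitespace"))
        word = entry.strip()
        if not word:
            # Assumption: blank lines are ignored rather than reported.
            continue
        # Scripts without letter case (e.g. Chinese, emoji) pass this check
        # unchanged, which matches "lowercase where applicable".
        if word != word.lower():
            problems.append((number, "not lowercase"))
        if word in seen:
            problems.append((number, f"duplicate of line {seen[word]}"))
        else:
            seen[word] = number
    return problems


def check_file(path):
    """Check one list file; the file is assumed to be UTF-8 encoded."""
    with Path(path).open(encoding="utf-8") as handle:
        return find_format_problems(handle.readlines())
```

Calling `check_file("Banned-words-list/en.txt")` would return an empty list for a file that follows all four rules, so a CI job could fail whenever the result is non-empty.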
- Protecting repository discussions from toxic language.
- Enforcing communication policies in open-source projects.
- Pre-commit or CI checks for restricted terms.
- Building moderation bots for multilingual communities.
- Pick the language file(s) you need from `Banned-words-list/`.
- Load each file into your filter tool.
- Compare normalized text (trimmed and lowercased) against the list.
- Block, flag, or report matches based on your policy.
    banned = load_lines("Banned-words-list/en.txt")
    input_words = tokenize(user_text)
    for word in input_words:
        normalized = lowercase(trim(word))
        if normalized in banned:
            reject("Contains banned word")
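The pseudocode above could be made concrete in Python roughly as follows. This is a sketch under assumptions: `load_banned_words` and `find_banned_words` are hypothetical names, list files are assumed to be UTF-8, and the regex tokenizer is a simple stand-in that will not segment languages written without spaces (such as Chinese or Japanese).

```python
import re
from pathlib import Path


def load_banned_words(path):
    """Read one list file (one entry per line) into a set of normalized words."""
    text = Path(path).read_text(encoding="utf-8")
    # Trim and lowercase each entry so lookups match the normalized input.
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def find_banned_words(user_text, banned):
    """Return the banned words found in user_text, in order of appearance."""
    # \w+ splits on whitespace and punctuation; in Python 3 it is
    # Unicode-aware for str patterns, so accented letters stay in one token.
    tokens = re.findall(r"\w+", user_text)
    return [token.lower() for token in tokens if token.lower() in banned]
```

A caller would load a list once with `load_banned_words("Banned-words-list/en.txt")` and then block, flag, or report whenever `find_banned_words(user_text, banned)` returns a non-empty list. Using a set keeps each lookup constant-time even for the larger lists.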
Contributions are welcome.
Before opening a pull request:
- Read `CONTRIBUTING.md`.
- Ensure the file remains one-word-per-line.
- Remove duplicates.
- Keep words in lowercase where it makes sense for the language.
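Before opening a pull request, a contributor could clean a list with a small script along these lines. It is only a sketch: `normalize_entries` and `normalize_file` are hypothetical helpers, the file is assumed to be UTF-8, and lowercasing every entry is an assumption that may not suit every language, so review the result before committing.

```python
from pathlib import Path


def normalize_entries(lines):
    """Trim, lowercase, and de-duplicate entries, keeping first-seen order."""
    cleaned = []
    seen = set()
    for raw in lines:
        word = raw.strip().lower()
        # Skip blank lines and anything already emitted.
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned


def normalize_file(path):
    """Rewrite a list file in place with one normalized entry per line."""
    target = Path(path)
    words = normalize_entries(target.read_text(encoding="utf-8").splitlines())
    target.write_text("".join(word + "\n" for word in words), encoding="utf-8")
```

Keeping first-seen order, rather than sorting, keeps the resulting diff small and easier to review.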
Word lists are context-limited and can produce false positives. Use them as a baseline signal, and combine them with contextual moderation when possible.
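One common source of false positives is substring matching, where a listed word is found inside a longer, harmless word. The sketch below contrasts the two approaches using a made-up entry (`foo`); the function names are hypothetical, and whole-word matching still cannot judge context, so it reduces false positives rather than eliminating them.

```python
import re


def substring_hits(text, banned):
    """Naive check: report every banned word that appears anywhere in the text."""
    lowered = text.lower()
    return sorted(word for word in banned if word in lowered)


def whole_word_hits(text, banned):
    """Stricter check: report banned words only when they appear as whole tokens."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted(tokens & banned)


banned = {"foo"}
text = "Seafood is served at noon."

# The substring check flags "foo" inside "Seafood"; the whole-word check does not.
naive = substring_hits(text, banned)
strict = whole_word_hits(text, banned)
```

Here `naive` contains `"foo"` while `strict` is empty, which illustrates why matching on normalized tokens is usually a safer baseline than searching for substrings.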
