`pyzet grep` two-way support for searching for non-ASCII characters

Alphabets that use non-ASCII characters are annoying to grep for, and there should be a way to enable convenient search patterns that deal with this problem.

## Problem statement

* A user used a non-ASCII character in the zettel content and would like to find it using only ASCII characters

* A user used an ASCII character in the zettel content (for any reason: laziness, a mistake, copied text) and would like to find it when searching for its non-ASCII counterpart

## Example

E.g. for Polish we have:

```
ą -- a
ć -- c
ę -- e
ł -- l
ń -- n
ó -- o
ś -- s
ź -- z
ż -- z
```

Of course, capital letters should also be supported.
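The table above could be kept as plain data, with the capital letters derived rather than typed by hand. This is only a sketch: the names `POLISH_LOWER`, `with_capitals`, and `POLISH` are made up for illustration and are not part of pyzet.

```python
# Hypothetical table (not part of pyzet): Polish letter -> ASCII counterpart,
# copied from the list above.
POLISH_LOWER = {
    "ą": "a", "ć": "c", "ę": "e", "ł": "l", "ń": "n",
    "ó": "o", "ś": "s", "ź": "z", "ż": "z",
}


def with_capitals(lower):
    """Return a copy of `lower` extended with the capital-letter pairs."""
    full = dict(lower)
    for char, ascii_form in lower.items():
        full[char.upper()] = ascii_form.upper()
    return full


POLISH = with_capitals(POLISH_LOWER)
```

With this, `POLISH["Ż"]` gives `"Z"` without a second hand-written table.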

## Behaviors

* grepping for `zolta ges` should find `żółta gęś`

* grepping for `żółta gęś` should find `zolta ges` -- (use case: we want to find text copied from someone who hasn't used diacritics)

* this should probably be controlled with a special flag or even multiple flags (i.e. there can be different modes: a single two-way mode or two one-way modes)
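The behaviors above can be written down as a small executable specification. This is a sketch of the expected results only, not of how `git grep` would be driven: the `Mode` names, `fold`, and `matches` are hypothetical, and only lowercase Polish letters are covered.

```python
from enum import Enum

# Assumption: lowercase Polish letters only; capitals would need more entries.
FOLD = str.maketrans("ąćęłńóśźż", "acelnoszz")


def fold(text):
    """Replace each Polish diacritic with its ASCII counterpart."""
    return text.translate(FOLD)


class Mode(Enum):
    """Hypothetical mode names, not existing pyzet flags."""
    ASCII_QUERY = "ascii-query"  # ASCII query also finds diacritics in the text
    ASCII_TEXT = "ascii-text"    # query with diacritics also finds ASCII text
    TWO_WAY = "two-way"          # both directions at once


def matches(query, text, mode):
    """Return True if `query` should be found in `text` under `mode`."""
    if query in text:  # an exact match is found in every mode
        return True
    if mode is Mode.ASCII_QUERY:
        return query in fold(text)
    if mode is Mode.ASCII_TEXT:
        return fold(query) in text
    return fold(query) in fold(text)
```

Note that the two-way branch folds both sides, so it would also let `ż` match `ź`. That is simpler than what the Implementation section asks for, so it should be read as a description of the modes rather than a design.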

## Implementation

* The `git grep` pattern should probably be modified so that every character that has a counterpart becomes an `OR` group matching either form

* However, multiple non-ASCII chars can map to a single ASCII char, e.g. both `ż` and `ź` map to `z`. In such a case, all three should be detected when grepping for `z`, but only two when grepping for `ż` or `ź` (because `ż` and `ź` shouldn't be treated as the same letter)

* There are many languages, so hard-coding these rules for Polish doesn't seem like a good idea. I would prefer to create some kind of abstraction layer, so the rules can be added independently for each language. Custom mappings could perhaps even live in the config file (how YAML handles non-ASCII characters still needs to be checked), but I think built-in support for selected languages can be included as well.

* Above, I only considered char-to-char mappings. But there are cases where a single non-ASCII char maps to multiple ASCII characters (e.g. German `ß` maps to `ss`). I'm not sure whether the approach extends to that case trivially.
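The points above could be combined roughly as follows. This is a minimal sketch under several assumptions: `RULES` and `expand` are hypothetical names, the rule tables hold lowercase letters only, one language is applied at a time, and the result is checked with Python's `re` rather than with `git grep` itself.

```python
import re

# Hypothetical per-language tables: non-ASCII character -> ASCII replacement.
RULES = {
    "pl": {"ą": "a", "ć": "c", "ę": "e", "ł": "l", "ń": "n",
           "ó": "o", "ś": "s", "ź": "z", "ż": "z"},
    "de": {"ß": "ss"},
}


def expand(query, rules):
    """Return regex source that matches `query` in both directions."""
    # Reverse table: ASCII replacement -> every non-ASCII char mapping to it.
    reverse = {}
    for char, ascii_form in rules.items():
        reverse.setdefault(ascii_form, []).append(char)
    # Try longer replacements first so that "ss" wins over a single "s".
    keys = sorted(reverse, key=len, reverse=True)

    parts = []
    i = 0
    while i < len(query):
        char = query[i]
        if char in rules:
            # Non-ASCII in the query: itself or its ASCII form, nothing else.
            alts, step = [char, rules[char]], 1
        else:
            key = next((k for k in keys if query.startswith(k, i)), None)
            if key is None:
                alts, step = [char], 1  # no rule applies, match literally
            else:
                alts, step = [key] + sorted(reverse[key]), len(key)
        escaped = [re.escape(a) for a in alts]
        if len(escaped) == 1:
            parts.append(escaped[0])
        else:
            parts.append("(" + "|".join(escaped) + ")")
        i += step
    return "".join(parts)
```

Here `expand("z", RULES["pl"])` produces a group with `z`, `ź`, and `ż`, while `expand("ż", RULES["pl"])` only allows `ż` or `z`. The `ß` case works because the lookup is by string prefix rather than by single character. The grouped alternation is the kind of syntax `git grep -E` accepts, but escaping differences between Python's `re.escape` and extended regular expressions are not handled here.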
