62 changes: 62 additions & 0 deletions docs/FrequencyListFormats.md
@@ -0,0 +1,62 @@
# Example frequency list formats


#### Study plan frequency list created with Readability Analyzer
```
#study_plan_frequency 1.0
мистер мистер мистер мистер UNKNOWN UNKNOWN [morph_freq 99992, master_freq 99992]
похоже похоже похоже похоже UNKNOWN UNKNOWN [morph_freq 71157, master_freq 71157]
доктор доктор доктор доктор UNKNOWN UNKNOWN [morph_freq 67918, master_freq 67918]
.
.
.
```
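
As a rough illustration of how this format can be consumed, the sketch below maps each morpheme (the first six columns) to its line rank. It assumes the columns are tab-separated, as the loader in language.py reads them, and that the bracketed trailing column can be ignored. `parse_study_plan` is a hypothetical helper, not part of Morphman.

```python
import csv
import io


def parse_study_plan(text):
    """Map each morpheme (first six columns) to its zero-based line rank."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0][0] != "#study_plan_frequency":
        raise ValueError("missing #study_plan_frequency header")
    ranks = {}
    for rank, row in enumerate(rows[1:]):
        # Assumption: six morpheme fields, then the bracketed frequency info.
        ranks[tuple(row[:6])] = rank
    return ranks


sample = (
    "#study_plan_frequency\t1.0\n"
    "мистер\tмистер\tмистер\tмистер\tUNKNOWN\tUNKNOWN\t[morph_freq 99992, master_freq 99992]\n"
    "похоже\tпохоже\tпохоже\tпохоже\tUNKNOWN\tUNKNOWN\t[morph_freq 71157, master_freq 71157]\n"
)
ranks = parse_study_plan(sample)
```

Earlier lines get lower ranks, so the first morpheme in the file is treated as the highest priority.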

#### Frequency report format
This format is generated by Readability Analyzer for morph_freq_report.txt or instance_freq_report.txt.
Note the new header line, which the new Morphman version creates automatically. Add the header manually if your file lacks it.

```
#frequency_report 1.0
1401 я я я UNKNOWN UNKNOWN 1 1 3.40098073 3.40098073 matches 1
1139 не не не UNKNOWN UNKNOWN 2 2 2.76496577 6.16594650 matches 1
992 в в в UNKNOWN UNKNOWN 3 3 2.40811769 8.57406418 matches 1
798 что что что UNKNOWN UNKNOWN 4 4 1.93717532 10.51123950 matches 1
798 и и и UNKNOWN UNKNOWN 4 5 1.93717532 12.44841482 matches 1
660 это это это UNKNOWN UNKNOWN 5 6 1.60217507 14.05058989 matches 1
.
.
```
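
A minimal sketch of reading this report, assuming every column is tab-separated: the first column is taken as the occurrence count and the next five as the morpheme fields, which is how the loader in language.py indexes them. The remaining columns are ignored here. `parse_frequency_report` is a hypothetical name used only for illustration.

```python
import csv
import io


def parse_frequency_report(text):
    """Return ({morpheme fields: count}, total instances) for a frequency report."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0][0] != "#frequency_report":
        raise ValueError("missing #frequency_report header")
    # Column 0 is the count; columns 1-5 identify the morpheme.
    counts = {tuple(row[1:6]): int(row[0]) for row in rows[1:]}
    return counts, sum(counts.values())


sample = (
    "#frequency_report\t1.0\n"
    "1401\tя\tя\tя\tUNKNOWN\tUNKNOWN\t1\t1\t3.40098073\t3.40098073\tmatches\t1\n"
    "1139\tне\tне\tне\tUNKNOWN\tUNKNOWN\t2\t2\t2.76496577\t6.16594650\tmatches\t1\n"
)
counts, total = parse_frequency_report(sample)
```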

#### Custom frequency list type that also includes a frequency count
```
#HEADERTYPE_count_word
5081568 я
4334804 не
3552532 что
2953981 в
2917112 и
2723798 ты
.
.
```
This is handy if you want to use frequency lists from external sources.
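
For example, an external word list could be converted into this layout with a few lines of Python. This is only a sketch: `write_count_word_list` is a hypothetical helper, and it assumes the count and the word are separated by a tab, which is the delimiter the loader in language.py uses.

```python
import io


def write_count_word_list(pairs, out):
    """Write (word, count) pairs in the #HEADERTYPE_count_word layout."""
    out.write("#HEADERTYPE_count_word\n")
    # Most frequent words first, one "count<TAB>word" entry per line.
    for word, count in sorted(pairs, key=lambda pair: pair[1], reverse=True):
        out.write(f"{count}\t{word}\n")


buffer = io.StringIO()
write_count_word_list([("не", 4334804), ("я", 5081568)], buffer)
```

When writing to a real file instead of a `StringIO` buffer, open it with `encoding="utf-8"` so the loader can read it back.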

#### List with a single word per line and no frequency
This is the fallback format.
```
する
ない
.
.
.
```

If you need to add a custom format, you can do so by editing `loadFrequencyList` in language.py.
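
The sketch below shows one way a new format could be recognised by its header line. It is not the actual `loadFrequencyList` code, which uses an if/elif chain; the dictionary dispatch, the `#word_count` header, and all function names here are hypothetical and chosen only for illustration.

```python
import csv
import io


def parse_count_word(rows):
    # Existing layout: count first, word second.
    return {row[1]: int(row[0]) for row in rows}


def parse_word_count(rows):
    # Hypothetical custom layout: word first, count second.
    return {row[0]: int(row[1]) for row in rows}


PARSERS = {
    "#HEADERTYPE_count_word": parse_count_word,
    "#word_count": parse_word_count,  # hypothetical header
}


def parse_frequency_text(text):
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    parser = PARSERS.get(rows[0][0]) if rows else None
    if parser is None:
        # Fallback: one word per line, ranked by line order.
        return {row[0]: rank for rank, row in enumerate(rows)}
    return parser(rows[1:])
```

With this shape, supporting another layout means adding one parser function and one dictionary entry.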
26 changes: 26 additions & 0 deletions docs/MultiLanguage.md
@@ -0,0 +1,26 @@
# Multi-language support

If you want to study multiple languages while using the same Anki profile, you can now do so with the new multi-language support. Each target language has its own frequency list and databases, kept in separate files.


## Setup

The *Morphman Preferences -> Note Filter* window now has a new column for selecting the target language of each filter. If you select '*Japanese*', for example, Morphman will use the following files for that filter:

- frequency_Japanese.txt
- all_Japanese.db
- seen_Japanese.db
- known_Japanese.db
- mature_Japanese.db
- external_Japanese.db
- priority_Japanese.db

The **Default** setting means that, for backwards compatibility, Morphman keeps using the existing frequency.txt, all.db, known.db, etc. files when processing that filter.
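
The naming rule can be sketched as follows. It mirrors `getPathByLanguage` in language.py, where each configured path contains a `%s` placeholder; `path_for_language` itself is a hypothetical stand-in used here for illustration.

```python
def path_for_language(template, language):
    """Fill the %s placeholder with '' for Default or '_<language>' otherwise."""
    suffix = "" if language == "Default" else "_" + language
    return template % suffix


japanese_known = path_for_language("known%s.db", "Japanese")
default_known = path_for_language("known%s.db", "Default")
```

Here `japanese_known` is `known_Japanese.db`, while `default_known` stays `known.db`, which is what keeps existing setups working.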

The language list is currently hard-coded. If you need a language that is not listed, either add it by editing *preferences.py* or select **Other**. In the latter case Morphman uses files such as *known_Other.db*.

If you have existing **frequency.txt** and **external.db** files, you can rename them to reflect the target language (e.g. *frequency_Japanese.txt* and *external_Japanese.db*). You can then delete the rest of the database files and do a Recalc.
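
The renaming step could be scripted roughly like this. It is a sketch only: `rename_for_language` is a hypothetical helper, and the demonstration runs against a temporary directory rather than a real profile's `dbs` folder. Back up your files before trying anything similar.

```python
import tempfile
from pathlib import Path


def rename_for_language(dbs_dir, language, names=("frequency.txt", "external.db")):
    """Rename e.g. frequency.txt to frequency_<language>.txt inside dbs_dir."""
    renamed = []
    for name in names:
        source = Path(dbs_dir) / name
        target = source.with_name(f"{source.stem}_{language}{source.suffix}")
        # Skip missing files and never overwrite an existing target.
        if source.is_file() and not target.exists():
            source.rename(target)
            renamed.append(target.name)
    return renamed


# Demonstration in a throwaway directory that only contains frequency.txt.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "frequency.txt").write_text("する\n", encoding="utf-8")
    result = rename_for_language(tmp, "Japanese")
```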

## Changes in Readability Analyzer

When using *Readability Analyzer* you must now explicitly select both the Known and Mature database files, because it does not try to infer the mature database file name from the known database file name. When generating frequency lists, the output file name is currently fixed to *frequency.txt*, so you will need to rename it manually for the specific language.
1 change: 1 addition & 0 deletions morph/UI/__init__.py
@@ -1,3 +1,4 @@
 # pylint: disable=W0611

 from .morphemizerComboBox import MorphemizerComboBox
+from .languageComboBox import LanguageComboBox
31 changes: 31 additions & 0 deletions morph/UI/languageComboBox.py
@@ -0,0 +1,31 @@

from PyQt6.QtWidgets import QComboBox


class LanguageComboBox(QComboBox):

    def setLanguages(self, languages):
        # Fall back to a single 'Default' entry when no list is given.
        if isinstance(languages, list):
            self.languages = languages
        else:
            self.languages = ['Default']

        for language in self.languages:
            self.addItem(language)

        self.setCurrentIndex(0)

    def getCurrent(self):
        try:
            return self.languages[self.currentIndex()]
        except IndexError:
            return None

    def setCurrentByName(self, name):
        # None marks "not found"; index 0 is a valid match but is falsy.
        active = None
        for i, language in enumerate(self.languages):
            if language == name:
                active = i
        if active is not None:
            self.setCurrentIndex(active)

14 changes: 7 additions & 7 deletions morph/config.py
@@ -5,13 +5,13 @@
 # 4th (lowest) priority
 default = {
     'path_dbs': os.path.join(mw.pm.profileFolder(), 'dbs'),
-    'path_priority': os.path.join(mw.pm.profileFolder(), 'dbs', 'priority.db'),
-    'path_ext': os.path.join(mw.pm.profileFolder(), 'dbs', 'external.db'),
-    'path_frequency': os.path.join(mw.pm.profileFolder(), 'dbs', 'frequency.txt'),
-    'path_all': os.path.join(mw.pm.profileFolder(), 'dbs', 'all.db'),
-    'path_mature': os.path.join(mw.pm.profileFolder(), 'dbs', 'mature.db'),
-    'path_known': os.path.join(mw.pm.profileFolder(), 'dbs', 'known.db'),
-    'path_seen': os.path.join(mw.pm.profileFolder(), 'dbs', 'seen.db'),
+    'path_priority': os.path.join(mw.pm.profileFolder(), 'dbs', 'priority%s.db'),
+    'path_ext': os.path.join(mw.pm.profileFolder(), 'dbs', 'external%s.db'),
+    'path_frequency': os.path.join(mw.pm.profileFolder(), 'dbs', 'frequency%s.txt'),
+    'path_all': os.path.join(mw.pm.profileFolder(), 'dbs', 'all%s.db'),
+    'path_mature': os.path.join(mw.pm.profileFolder(), 'dbs', 'mature%s.db'),
+    'path_known': os.path.join(mw.pm.profileFolder(), 'dbs', 'known%s.db'),
+    'path_seen': os.path.join(mw.pm.profileFolder(), 'dbs', 'seen%s.db'),
     'path_log': os.path.join(mw.pm.profileFolder(), 'morphman.log'),
     'path_stats': os.path.join(mw.pm.profileFolder(), 'morphman.stats'),

4 changes: 3 additions & 1 deletion morph/graphs.py
@@ -8,6 +8,7 @@
 from .morphemes import AnkiDeck
 from .preferences import get_preference as cfg
 from .util import mw
+from .language import getAllDb

 colYoung = "#7c7"
 colCard = "#282"
@@ -221,7 +222,8 @@ def get_stats(self, db_table, bucket_size_days, day_cutoff_seconds, num_buckets=
         if not all_reviews_for_bucket:
             return stats_by_name

-        all_db = util.allDb()
+        # TODO! Process all.db for each language
+        all_db = getAllDb("Default")
         nid_to_morphs = defaultdict(set)

         for m, ls in all_db.db.items():
131 changes: 131 additions & 0 deletions morph/language.py
@@ -0,0 +1,131 @@
from aqt.utils import tooltip
import os
import io
import csv
import itertools
from .preferences import get_preference as cfg
from .morphemes import Morpheme

# Each language has its own MorphDb
_allDb = {}


class FrequencyList:
    def __init__(self):
        self.map = {}
        self.len = 0
        self.has_morphemes = False
        self.has_frequency_count = False
        self.master_total_instances = 0

def getLanguageList():
    languages = set()
    rowData = cfg('Filter')
    try:
        for row in rowData:
            languages.add(row['Language'])
    except Exception:
        # language per filter not yet configured
        pass
    if not languages:
        languages.add('Default')
    return list(languages)


def getPathByLanguage(path, language):
    # 'Default' keeps the legacy file name; other languages get a '_<language>' suffix.
    if language == 'Default':
        return path % ''
    else:
        return path % ('_' + language)


def getAllDb(language):
    global _allDb

    # Force reload if all.db got deleted
    all_db_path = getPathByLanguage(cfg('path_all'), language)
    reload = not os.path.isfile(all_db_path)

    if reload or (language not in _allDb):
        from .morphemes import MorphDb
        _allDb[language] = MorphDb(all_db_path, ignoreErrors=True)
    return _allDb[language]


def getTotalKnownSet():
    from .morphemes import MorphDb

    # Load known.db for each language and get total morphemes known
    totalVariations = 0
    totalKnown = 0
    languages = getLanguageList()
    for language in languages:
        known_db = MorphDb(getPathByLanguage(cfg('path_known'), language), ignoreErrors=True)
        totalVariations += len(known_db.db)
        totalKnown += len(known_db.groups)

    d = {'totalVariations': totalVariations, 'totalKnown': totalKnown}
    return d


def loadFrequencyList(frequencyListPath, force_morphemes=False):
    """See docs/FrequencyListFormats.md for specific info about the file format."""

    print("Loading Frequency List for file %s.." % frequencyListPath)
    fl = FrequencyList()

    try:
        with io.open(frequencyListPath, encoding='utf-8-sig') as csvfile:
            csvreader = csv.reader(csvfile, delimiter="\t")
            rows = list(csvreader)
        print("First line: [%s]" % rows[0][0])

        if rows[0][0] == "#study_plan_frequency":
            print("Detected Study plan frequency format")
            fl.has_morphemes = True
            fl.map = dict(
                zip([Morpheme(row[0], row[1], row[2], row[3], row[4], row[5]) for row in rows[1:]],
                    itertools.count(0)))

        elif rows[0][0] == "#frequency_report":
            print("Detected Frequency report format")
            fl.has_morphemes = True
            fl.has_frequency_count = True
            for row in rows[1:]:
                fl.map[Morpheme(row[1], row[2], row[2], row[3], row[4], row[5])] = int(row[0])

        elif rows[0][0] == "#HEADERTYPE_count_word":
            print("Detected frequency + word format")
            fl.has_frequency_count = True
            if force_morphemes:
                fl.has_morphemes = True
                for row in rows[1:]:
                    fl.map[Morpheme(row[1], row[1], row[1], row[1], "UNKNOWN", "UNKNOWN")] = int(row[0])
            else:
                for row in rows[1:]:
                    fl.map[row[1]] = int(row[0])
        else:
            print("Assuming one-word-per-line format")
            if force_morphemes:
                fl.has_morphemes = True
                # Rows have a single column in this format, so the word is row[0].
                fl.map = dict(zip([Morpheme(row[0], row[0], row[0], row[0], "UNKNOWN", "UNKNOWN") for row in rows], itertools.count(0)))
            else:
                fl.map = dict(zip([row[0] for row in rows], itertools.count(0)))

        fl.len = len(fl.map)
        if fl.has_frequency_count:
            fl.master_total_instances = sum(fl.map.values())

    except (FileNotFoundError, IndexError):
        err = "Warning! Could not read frequency list %s" % frequencyListPath
        print(err)
        tooltip(err)

    return fl

def loadFrequencyListByLanguage(language):
    frequencyListPath = getPathByLanguage(cfg('path_frequency'), language)
    return loadFrequencyList(frequencyListPath)
