A fork of hoshidicts, adding a Kotlin API for use with Yomihon. It can be added via jitpack: https://jitpack.io/#yomihon/hoshidicts/
For the API, see the following two files: Models.kt and HoshiDicts.kt
This library implements a dictionary backend that works similarly to Yomitan. This was made for Hoshi Reader and was only tested with Japanese. Other languages might need their own deinflector or adjustments to the lookup strategy.
ImportResult dictionary_importer::import(const std::string& zip_path, const std::string& output_dir, bool low_ram = false)Imports a Yomitan .zip dictionary file into a custom format. The resulting folder is stored in output_dir/<dict_title>. Glossaries are compressed using zstd. Term, frequency and pitch dictionaries are generally supported, but only a small part of the pitch accent spec was implemented. Setting low_ram to true can reduce memory usage significantly at the cost of slightly lower import speed.
ImportResult exposes a deterministic storage_path on successful imports. The success flag reflects the final materialized output state, and reconciles to true when a valid output marker already exists.
void DictionaryQuery::add_term_dict(const std::string& path)Adds an imported term dictionary to the query.
void DictionaryQuery::add_freq_dict(const std::string& path)Adds an imported frequency dictionary to the query.
void DictionaryQuery::add_pitch_dict(const std::string& path)Adds an imported pitch dictionary to the query.
bool DictionaryQuery::has_meta_mode_entries(const std::string& path, const std::string& mode, uint32_t min_count = 1)Returns true when the imported dictionary at path contains at least min_count meta entries for a given mode (for example "freq" or "pitch").
std::vector<TermResult> DictionaryQuery::query(const std::string& expression) constQueries all added dictionaries for the given expression. TermResult includes glossary, frequency and pitch data in the order dictionaries were added. Glossaries are decompressed.
std::vector<DictionaryStyle> DictionaryQuery::get_styles() constReturns CSS styles for all dictionaries, if present.
std::vector<char> DictionaryQuery::get_media_file(const std::string& dict_name, const std::string& media_path) constReturns raw bytes for file originally stored at media_path in term dictionary dict_name or an empty vector if the file does not exist.
std::vector<DeconjugationForm> Deconjugator::deconjugate(const std::string& text) constDeconjugates a given Japanese string using a port of Jiten's deconjugator. As this doesn't use any dictionary data, the result may include invalid deconjugations. The result may also include duplicate forms with different processing steps.
Lookup::Lookup(DictionaryQuery& query, Deconjugator& deconjugator)Creates a Lookup object using a given query with dictionaries added and a deconjugator.
std::vector<LookupResult> Lookup::lookup(const std::string& lookup_string, int max_results = 16, size_t scan_length = 16) constFollows a parsing strategy similar to Yomitan. Substrings of lookup_string are tested from length scan_length down to 1. Each substring is processed using hiragana/katakana conversion, deconjugated then queried using the query object.
Results are filtered by part-of-speech tags defined in dictionaries, or added directly if none are present. The results are sorted by matched length first, then by processing steps, then deconjugation step count and finally by frequency.
- Yomitan: Dictionary format | GPLv3
- Jiten: Deconjugator | Apache-2.0
- glaze: MIT
- kuba--/zip: MIT
- xxHash: BSD-2-Clause
- zstd: BSD
- utfcpp: BSL-1.0
- unordered_dense: MIT
hoshidicts (main-mit) is licensed under the MIT license. See LICENSE for details.