-
Notifications
You must be signed in to change notification settings - Fork 135
Korean ITN Time #317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ko_itn_staging_v1
Are you sure you want to change the base?
Korean ITN Time #317
Conversation
Sparrowhawk testing is not done yet. Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci
changes made to 9f7e876. Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci
… test cases Signed-off-by: hmlee245 <hmlee245@gmail.com>
…45/NeMo-text-processing into Draft-Version2/Korean-ITN Signed-off-by: hmlee245 <hmlee245@gmail.com>
Signed-off-by: hmlee245 <hmlee245@gmail.com>
Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…text-processing into koreanITN-ordinal
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…text-processing into koreanITN-ordinal
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…text-processing into koreanITN-ordinal
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…text-processing into koreanITN-ordinal
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…text-processing into koreanITN-ordinal
for more information, see https://pre-commit.ci
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…text-processing into koreanITN-ordinal
Signed-off-by: hmlee245 <hmlee245@gmail.com>
Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments
nemo_text_processing/inverse_text_normalization/ko/data/time/time_minutes_seconds.tsv
Outdated
Show resolved
Hide resolved
nemo_text_processing/inverse_text_normalization/ko/verbalizers/time.py
Outdated
Show resolved
Hide resolved
tests/nemo_text_processing/ko/data_inverse_text_normalization/test_cases_time.txt
Outdated
Show resolved
Hide resolved
Signed-off-by: Hyunmin Lee <hyunminl@hyunminl-mlt.client.nvidia.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the minutes part is too liberal with potential cardinals. correct me if i'm wrong.
|
|
||
| hour_component = pynutil.insert("hours: \"") + (graph_hours + spacing + hour_suffix) + pynutil.insert("\"") | ||
|
|
||
| minute_component = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't this graph beyond 0-59 though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does only accept 0-59 properly. Anything beyond will be accepted awkwardly. For example, "60분" will be tokenized as Cardinal 6, minute 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a block to that to limit awkward examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I've been working on this week + money semiotic class. Will try to update those asap this week.
Signed-off-by: hmlee245 <hmlee245@gmail.com>
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…t-processing into koreanITN-time
for more information, see https://pre-commit.ci
| money = MoneyFst() | ||
| money_graph = money.fst | ||
|
|
||
| graph = cardinal_graph | ordinal_graph | decimal_graph | fraction_graph | time_graph | date_graph | money_graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use pynin union and make it multiple lines. the linter should catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm . minor formatting change
|
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days. |
Signed-off-by: hmlee245 <hmlee245@gmail.com>
…t-processing into koreanITN-time
for more information, see https://pre-commit.ci
| graph_regular = hour + minute + second | ||
|
|
||
| # 오전 = AM, 오후 = PM | ||
| prefix_words = pynini.union( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make union (....) + spacing. Optimization usually catches these but it's not a given so might as well safe the quick op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to union the pynini.accep. It gives me TypeError for str, tuple issue.
| + pynini.closure(delete_space + suffix_tag, 0, 1) | ||
| ) | ||
|
|
||
| cardinal = CardinalFst() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make cardinal fst an argument for the init function. this allow you to pass the fst from the cardinal in the tagger graph and avoid having to instantiate the graph twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not using the cardinal fst for anything else and only to detect hour/minute/second that are out of normal range. So, wouldn't this be instantiating once?
| money = MoneyFst(cardinal) | ||
| money_graph = money.fst | ||
|
|
||
| telephone = TelephoneFst() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass cardinal to your telephone like the above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually not using cardinal for telephone. I am just using the same digits for cardinal class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
request regarding telephone graph
What does this PR do ?
This PR adds time semiotic class to Korean ITN
Before your PR is "Ready for review"
Pre checks:
git commit -sto sign.pytestor (if your machine does not have GPU)pytest --cpufrom the root folder (given you marked your test cases accordingly@pytest.mark.run_only_on('CPU')).bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...pytestand Sparrowhawk here.__init__.pyfor every folder and subfolder, includingdatafolder which has .TSV files?Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.to all newly added Python files?Copyright 2015 and onwards Google, Inc.. See an example here.try import: ... except: ...) if not already done.PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.