Open
Labels
Description
The most recent version raises an error when extracting an empty project. It would be reasonable to just return a notification in this case instead of continuing. I was actually extracting this project because I suspected it was empty and shouldn't be activated for drafting. I can't think of a case where we'd need to do further processing on a truly empty project. Example below:
2026-03-24 11:01:54,855 - silnlp.common.onboard_project - INFO - Onboarding main project 'AAI_2026_03_24'
INFO:silnlp.common.onboard_project:Onboarding main project 'AAI_2026_03_24'
2026-03-24 11:01:54,936 - silnlp.common.onboard_project - INFO - Extracted corpus '/root/M/MT/scripture/anp-AAI_2026_03_24.txt' already exists. Skipping corpus extraction.
2026-03-24 11:01:54,998 - silnlp.common.onboard_project - INFO - Collecting verse counts for project 'AAI_2026_03_24'
0%| | 0/1 [00:00<?, ?it/s]2026-03-24 11:01:55,129 - silnlp.common.collect_verse_counts - INFO - Found verse counts for /root/M/MT/scripture/anp-AAI_2026_03_24.txt
100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1390.68it/s]
2026-03-24 11:01:54,736 - silnlp.common.collect_verse_counts - INFO - No files smaller than 41KB were found.
2026-03-24 11:01:54,736 - silnlp.common.collect_verse_counts - INFO - All files were found.
2026-03-24 11:01:54,738 - silnlp.common.onboard_project - INFO - Running Wildebeest analysis on /root/M/MT/scripture/anp-AAI_2026_03_24.txt.
2026-03-24 11:01:55,601 - silnlp.common.onboard_project - INFO - Calculating tokenization stats for project 'AAI_2026_03_24'
2026-03-24 11:01:59,445 - silnlp.nmt.config - INFO - Preprocessing anp-AAI_2026_03_24 -> anp-AAI_2026_03_24
2026-03-24 11:01:59,685 - silnlp.nmt.config - INFO - train size: 0, val size: 0, test size: 0,
2026-03-24 11:01:59,686 - silnlp.nmt.config - WARNING - Glosses could not be included. No source or target language matches any of the supported gloss language codes: fr, en, id, es, pt.
2026-03-24 11:01:59,686 - silnlp.nmt.config - INFO - terms train size: 0
2026-03-24 11:01:59,686 - silnlp.nmt.config - INFO - Calculating tokenization statistics
Traceback (most recent call last):
  File "/root/miniconda3/envs/silnlp/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/silnlp/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/silnlp/silnlp/common/onboard_project.py", line 775, in <module>
    main()
  File "/root/silnlp/silnlp/common/onboard_project.py", line 771, in main
    onboarding_request.process_onboarding_request()
  File "/root/silnlp/silnlp/common/onboard_project.py", line 439, in process_onboarding_request
    self.main_project.calculate_tokenization_stats(
  File "/root/silnlp/silnlp/common/onboard_project.py", line 209, in calculate_tokenization_stats
    config.preprocess(stats=True, force_align=True)
  File "/root/silnlp/silnlp/nmt/config.py", line 261, in preprocess
    self._build_corpora(tokenizer, stats, force_align)
  File "/root/silnlp/silnlp/nmt/config.py", line 326, in _build_corpora
    self._calculate_tokenization_stats()
  File "/root/silnlp/silnlp/nmt/config.py", line 406, in _calculate_tokenization_stats
    tokens_verse_df = distribution_df(top_header, src_tokens_per_verse, trg_tokens_per_verse)
  File "/root/silnlp/silnlp/nmt/config.py", line 389, in distribution_df
    min(src_data),
ValueError: min() arg is an empty sequence
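A rough sketch of the kind of guard I have in mind, not a patch against the actual code. The helper names `corpus_is_empty` and `min_max_or_none` are hypothetical, and I'm assuming an empty project extracts to a file that contains only blank lines (or nothing at all). The first helper could back an early "project is empty" notification before tokenization stats are attempted; the second shows how the `min()` call could tolerate an empty sequence if the stats step is still reached.

```python
import logging
from pathlib import Path
from typing import Optional, Sequence, Tuple

LOGGER = logging.getLogger(__name__)


def corpus_is_empty(corpus_path: Path) -> bool:
    """Return True if the file has no non-blank lines.

    Assumption: an empty project is extracted as a file of blank lines.
    """
    with corpus_path.open("r", encoding="utf-8") as corpus_file:
        return not any(line.strip() for line in corpus_file)


def min_max_or_none(values: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (min, max) of values, or None when there is nothing to summarize."""
    if len(values) == 0:
        # min() and max() raise ValueError on an empty sequence, so bail out first.
        LOGGER.warning("No values to summarize; skipping statistics.")
        return None
    return min(values), max(values)
```

With something like this, the onboarding step could check `corpus_is_empty(...)` right after extraction, log that the project is empty, and stop there instead of continuing on to the tokenization stats.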
Status
🏗 In progress