
COG Datalyer#2

Open
aasseman wants to merge 70 commits into tkornuta-nvidia:master from aasseman:feat/cog



aasseman commented Jun 2, 2020

Ported the COG dataset from https://github.com/IBM/mi-prometheus as a NeMo datalayer.
I also cleaned up much of the original code. I tried to organize the commits cleanly and logically, so going through them one by one should help with tracking the modifications.

To run the test, from the root of the NeMo Python module directory:

python -m nemo.collections.visual_reasoning.modules.data_layers.cog.datalayer

If an X display is available, the test shows some samples using matplotlib.
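
For illustration only, here is a minimal sketch of how a test script might decide whether to show samples on screen. It assumes that X availability is detected through the `DISPLAY` environment variable; the helper names `x_display_available` and `pick_backend` are hypothetical and are not taken from this PR, which may detect the display differently.

```python
import os


def x_display_available(environ=None):
    """Return True if an X display appears to be configured.

    Assumption: only the DISPLAY environment variable is checked.
    This is a hypothetical helper, not code from the PR.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


def pick_backend(environ=None):
    """Return None to keep matplotlib's default (interactive) backend,
    or "Agg" (a non-interactive backend) when no display is found."""
    return None if x_display_available(environ) else "Agg"


if __name__ == "__main__":
    backend = pick_backend()
    if backend is None:
        print("X display found: samples could be shown with matplotlib.")
    else:
        print("No X display: skipping on-screen samples (backend would be %s)." % backend)
```

With a check like this, the same test command can run both on a workstation and on a headless machine, where the plotting step is skipped.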

gkucsko and others added 30 commits June 2, 2020 19:46
…VIDIA-NeMo#693)

* update sgd numbers after fix in seen services and slot request loss

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix table

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add more information to documentation

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
* Added user sys tag to TRADE.

Signed-off-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
…-NeMo#695)

* megatron glue numbers added, default amp level reverted to O0

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* table reformatted

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
editdistance package for fast WER calculation
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
…IA-NeMo#673)

* add VAD

Signed-off-by: fayejf <fayejf07@gmail.com>

* update with PR comments

Signed-off-by: fayejf <fayejf07@gmail.com>

* revert change on asr notebook 4&5

Signed-off-by: fayejf <fayejf07@gmail.com>

* update with PR comments

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typos

Signed-off-by: fayejf <fayejf07@gmail.com>

* upload docs and resolve parts of PR comments

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix doc bib issue

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix jenkins doc issue

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix some warning/typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* update notebook#6, improve data process scripts, and some fix

Signed-off-by: fayejf <fayejf07@gmail.com>

* some minor changes

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix bib issue

Signed-off-by: fayejf <fayejf07@gmail.com>

* little fix to avoid misunderstanding

Signed-off-by: fayejf <fayejf07@gmail.com>
* update an4 notebook

Signed-off-by: Jason <jasoli@nvidia.com>

* colab bugfix

Signed-off-by: Jason <jasoli@nvidia.com>

* update script

Signed-off-by: Jason <jasoli@nvidia.com>

* fix notebooks

Signed-off-by: Jason <jasoli@nvidia.com>

* fix notebooks

Signed-off-by: Jason <jasoli@nvidia.com>
* Durations extraction with script draft.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Durations extraction notebooks.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Finished bulk part of durations predictor.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add tensorboard logging.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add general-style train logger.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Change LibriSpeech parts order and move train logger callback to core.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add one big file durs saving.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Big batch params change.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add full pad option to data loader as default.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Complete durs pipeline with evaluation.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Rename durs ngc script.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Adjust duration main script default for ngc run.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix problem with torch.bool dist eval.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add LibriTTS processing.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add FasterSpeech full pipeline reaching about 0.4 MSE for LibriTTS.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add QN retrain NGC pipeline, new dur XE steps loss and mel Griffin-Lim sampling.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add train logging for mel with audio sampling and super sampler.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add length sampler.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Set SSS as default and introduce local shuffling.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* New defaults.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* W&B Support, new speaker system and some refactoring

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add simple durs aug.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* New baseline (1)

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix trim bug and make default O2.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add pad16.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix dist eval error and add variable steps.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add WaveGlow inference and fix pad16 bug.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Generalize mel loss.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Generalize pad op.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Move pad16 logic to loss.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add fmin/fmax to griffin-lim vocoding.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Move model params to config.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add denoiser argument to WaveGlow inference.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add new durs with all 1s by default.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* New baseline (3)

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* New baseline (4)

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* New baseline (5)

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Refactor durs predictor script.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Durs predictor baseline

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Adjusted durs script for NGC.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* New durs lj baseline

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Update durs baseline params.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Add durs/blanks acc metrics.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix durs baseline.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Current state

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Update NGC scripts and implement shake_all aug.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Update augmentations implementations.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Bunch of things

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Change name to TalkNet.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Latest notebooks changes

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Working scripts with latest master changes

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Finished trimming durs predictor code.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Trimmed mels part.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Delete dev folder.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix style errors.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix LGTM errors.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Revert simple logging changes.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix problems.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix problems.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Remove WG inference and add type hints for data layer.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
…s) (NVIDIA-NeMo#675)

* git history clean up

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* nlp references to the tutorials

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* sphinx fix

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* review feedback

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
* initial commit of callback documentation

Signed-off-by: Jason <jasoli@nvidia.com>

* some syntax fixes

Signed-off-by: Jason <jasoli@nvidia.com>

* add old callbacks file

Signed-off-by: Jason <jasoli@nvidia.com>

* finalize docs; change train to action

Signed-off-by: Jason <jasoli@nvidia.com>

* style

Signed-off-by: Jason <jasoli@nvidia.com>

* update sphinx style

Signed-off-by: Jason <jasoli@nvidia.com>

* update sphinx warnings

Signed-off-by: Jason <jasoli@nvidia.com>

* train->action rename bug

Signed-off-by: Jason <jasoli@nvidia.com>

* address comments

Signed-off-by: Jason <jasoli@nvidia.com>

* comments

Signed-off-by: Jason <jasoli@nvidia.com>
Update README (pretrained ASR model information)
Bugfix to output ports of Kaldi data layer
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>
* pm+nlg for multiwoz init

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* pipeline is working, init clean up

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* headers added

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* fixed invalid .json file, added db files to multiwoz preprocessing

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* code clean up

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* docs for TRADE update, jenkins for ruled_based example

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* jenkins fix

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* ports refactor wip

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* ports refactor wip

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* wip works

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* neural types refactored

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* remove unused

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* typo

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* state dict split

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* lgtm fixes

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* fixing the process script, moved multiwoz_mapping.pair to multiwoz, enabled utilization of relative paths

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* formatting fix

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* reformatted the code, ready for definition of NG by connecting the modules - and fixing the definitions

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* work in progress-ess, not working, internet issues

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* UtteranceEncoder neural types wip

Signed-off-by: nvidia <tkornuta@nvidia.com>

* utterance encoder neural types

Signed-off-by: nvidia <tkornuta@nvidia.com>

* updating trade outputs

Signed-off-by: nvidia <tkornuta@nvidia.com>

* updating trade outputs

Signed-off-by: nvidia <tkornuta@nvidia.com>

* fighting with belief state

Signed-off-by: nvidia <tkornuta@nvidia.com>

* Cannot make second named tuple work

Signed-off-by: nvidia <tkornuta@nvidia.com>

* reorganized files, whole pipeline handshaking works

Signed-off-by: nvidia <tkornuta@nvidia.com>

* reorganized files, whole pipeline handshaking works

Signed-off-by: nvidia <tkornuta@nvidia.com>

* polish

Signed-off-by: nvidia <tkornuta@nvidia.com>

* Fix of my dummy error

Signed-off-by: nvidia <tkornuta@nvidia.com>

* new examples

Signed-off-by: nvidia <tkornuta@nvidia.com>

* style fix

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* fixed TRADE training

Signed-off-by: Evelina Bakhturina <ebakhturina@nvidia.com>

* Added module responsible for sys uttr dialog history update

Signed-off-by: nvidia <tkornuta@nvidia.com>

* LGTM fix

Signed-off-by: nvidia <tkornuta@nvidia.com>

* moved dialog specific axes and types to nlp/neural_types.py, refactored the modules

Signed-off-by: nvidia <tkornuta@nvidia.com>

* style fix

Signed-off-by: nvidia <tkornuta@nvidia.com>

Co-authored-by: Tomasz Kornuta <tkornuta@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>
* make test better

Signed-off-by: Jason <jasoli@nvidia.com>

* fix rename error during topological sort

Signed-off-by: Jason <jasoli@nvidia.com>

* test fix

Signed-off-by: Jason <jasoli@nvidia.com>
@aasseman
Author

FYI, I just rebased to a more recent upstream master.

okuchaiev and others added 23 commits June 11, 2020 13:03
Fixed 2_Online_ASR_Microphone_Demo notebook to support new config
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
…VIDIA-NeMo#724)

* Added ability to write audio to tensorboard during Tacotron training

Signed-off-by: Polezhaev Sergej <sapolezh@mts.ru>

* Removed unused import

Signed-off-by: Polezhaev Sergej <sapolezh@mts.ru>

Co-authored-by: Sergey Polezhaev <sapolezh@mts.ru>
Signed-off-by: Alexis Asseman <alexis.asseman@ibm.com>
…rent subdirs, as they are not datalayers.

Signed-off-by: Alexis Asseman <alexis.asseman@ibm.com>
Alexis Asseman added 3 commits June 16, 2020 11:26
Signed-off-by: Alexis Asseman <alexis.asseman@ibm.com>
Signed-off-by: Alexis Asseman <alexis.asseman@ibm.com>
@aasseman aasseman marked this pull request as ready for review June 19, 2020 17:08
tkornuta-nvidia pushed a commit that referenced this pull request Aug 25, 2020
* Integrated Megatron-LM

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressing PR comments, trying NER example

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* manual style fix

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* manual style fix #2

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Resolving circular import

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Static analysis warnings addressed

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed code review; Jenkins test added

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Removing parallel from Megatron

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Added more info to tokenizer printout, made megatron bert derivative explicit

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Bumping Megatron-LM version to get APEX fix

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>