This repository was archived by the owner on Jan 3, 2023. It is now read-only.

Training process is killed because of OOM #48


Description

@fancyerii

I have been training neon on the LibriSpeech data, but the training process is always killed because of OOM. My machine has 24 GB of memory and a GeForce GTX 1070 card with 8 GB of memory.

I found this message via dmesg:
[3017506.733819] Out of memory: Kill process 25635 (python) score 974 or sacrifice child
[3017506.736861] Killed process 25635 (python) total-vm:55518724kB, anon-rss:23902876kB, file-rss:154436kB

Is neon leaking memory, or does it require more memory to train? The anon-rss of ~23.9 GB in the log matches my 24 GB of RAM, so it looks like host memory, not GPU memory, is being exhausted.
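To tell a gradual leak from a one-off spike, I could log the process's resident set size while training runs. A minimal sketch using psutil (an extra dependency, not part of neon; the PID is just the one from the dmesg log above):

import time
import psutil

# PID of the running train.py process; 25635 is the PID from the
# dmesg log above, substitute the PID of your own run.
proc = psutil.Process(25635)

while proc.is_running():
    rss_gb = proc.memory_info().rss / float(1024 ** 3)
    # A steadily growing RSS across batches/epochs points to a leak;
    # a flat curve ending in one large jump points to a one-off allocation.
    print("{}  rss = {:.2f} GB".format(time.strftime("%H:%M:%S"), rss_gb))
    time.sleep(60)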

The command I run is:
python train.py --manifest train:/bigdata/lili/deepspeech/librispeech/train-clean-100/train-manifest.csv --manifest val:/bigdata/lili/deepspeech/librispeech/train-clean-100/val-manifest.csv -e 20 -z 16 -s models -b gpu
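Assuming host memory scales with the minibatch (e.g., through data-loading buffers), and -z appears to be neon's batch-size flag, one cheap experiment would be to halve it:

python train.py --manifest train:/bigdata/lili/deepspeech/librispeech/train-clean-100/train-manifest.csv --manifest val:/bigdata/lili/deepspeech/librispeech/train-clean-100/val-manifest.csv -e 20 -z 8 -s models -b gpu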
